GFP_TEMPORARY was introduced by commit e12ba74d8f ("Group short-lived
and reclaimable kernel allocations") along with __GFP_RECLAIMABLE. It's
primary motivation was to allow users to tell that an allocation is
short lived and so the allocator can try to place such allocations close
together and prevent long term fragmentation. As much as this sounds
like a reasonable semantic it becomes much less clear when to use the
highlevel GFP_TEMPORARY allocation flag. How long is temporary? Can the
context holding that memory sleep? Can it take locks? It seems there is
no good answer for those questions.
The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
__GFP_RECLAIMABLE which in itself is tricky because basically none of
the existing caller provide a way to reclaim the allocated memory. So
this is rather misleading and hard to evaluate for any benefits.
I have checked some random users and none of them has added the flag
with a specific justification. I suspect most of them just copied from
other existing users and others just thought it might be a good idea to
use without any measuring. This suggests that GFP_TEMPORARY just
motivates for cargo cult usage without any reasoning.
I believe that our gfp flags are quite complex already and especially
those with highlevel semantic should be clearly defined to prevent from
confusion and abuse. Therefore I propose dropping GFP_TEMPORARY and
replace all existing users to simply use GFP_KERNEL. Please note that
SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
so they will be placed properly for memory fragmentation prevention.
I can see reasons we might want some gfp flag to reflect shorterm
allocations but I propose starting from a clear semantic definition and
only then add users with proper justification.
This was been brought up before LSF this year by Matthew [1] and it
turned out that GFP_TEMPORARY really doesn't have a clear semantic. It
seems to be a heuristic without any measured advantage for most (if not
all) its current users. The follow up discussion has revealed that
opinions on what might be temporary allocation differ a lot between
developers. So rather than trying to tweak existing users into a
semantic which they haven't expected I propose to simply remove the flag
and start from scratch if we really need a semantic for short term
allocations.
[1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org
[akpm@linux-foundation.org: fix typo]
[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: drm/i915: fix up]
Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allow interval trees to quickly check for overlaps to avoid unnecesary
tree lookups in interval_tree_iter_first().
As of this patch, all interval tree flavors will require using a
'rb_root_cached' such that we can have the leftmost node easily
available. While most users will make use of this feature, those with
special functions (in addition to the generic insert, delete, search
calls) will avoid using the cached option as they can do funky things
with insertions -- for example, vma_interval_tree_insert_after().
[jglisse@redhat.com: fix deadlock from typo vm_lock_anon_vma()]
Link: http://lkml.kernel.org/r/20170808225719.20723-1-jglisse@redhat.com
Link: http://lkml.kernel.org/r/20170719014603.19029-12-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Doug Ledford <dledford@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With the addition of __sg_alloc_table_from_pages we can control
the maximum coalescing size and eliminate a separate path for
allocating backing store here.
Similar to 871dfbd67d ("drm/i915: Allow compaction upto
SWIOTLB max segment size") this enables more compact sg lists to
be created and so has a beneficial effect on workloads with many
and/or large objects of this class.
v2:
* Rename helper to i915_sg_segment_size and fix swiotlb override.
* Commit message update.
v3:
* Actually include the swiotlb override fix.
v4:
* Regroup parameters a bit. (Chris Wilson)
v5:
* Rebase for swiotlb_max_segment.
* Add DMA map failure handling as in abb0deacb5
("drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping").
v6: Handle swiotlb_max_segment() returning 1. (Joonas Lahtinen)
v7: Rebase.
v8: Commit spelling fix.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170803091417.23677-1-tvrtko.ursulin@linux.intel.com
Another month, another story in the cache coherency saga. This time, we
come to the realisation that i915_gem_object_is_coherent() has been
reporting whether we can read from the target without requiring a cache
invalidate; but we were using it in places for testing whether we could
write into the object without requiring a cache flush. So split the
tracking into two, one to decide before reads, one after writes.
See commit e27ab73d17 ("drm/i915: Mark CPU cache as dirty on every
transition for CPU writes") for the previous entry in this saga.
v2: Be verbose
v3: Remove unused function (i915_gem_object_is_coherent)
v4: Fix inverted coherency check prior to execbuf (from v2)
v5: Add comment for nasty code where we are optimising on gcc's behalf.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101109
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101555
Testcase: igt/kms_mmap_write_crc
Testcase: igt/kms_pwrite_crc
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Tested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170811111116.10373-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
This simply hides the EAGAIN caused by userptr when userspace causes
resource contention. However, it is quite beneficial with highly
contended userptr users as we avoid repeating the setup costs and
kernel-user context switches.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
For ease of use (i.e. avoiding a few checks and function calls), store
the object's cache coherency next to the cache is dirty bit.
Specifically this patch aims to reduce the frequency of no-op calls to
i915_gem_object_clflush() to counter-act the increase of such calls for
GPU only objects in the previous patch.
v2: Replace cache_dirty & ~cache_coherent with cache_dirty &&
!cache_coherent as gcc generates much better code for the latter
(Tvrtko)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Tested-by: Dongwon Kim <dongwon.kim@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170616105455.16977-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Currently, we only mark the CPU cache as dirty if we skip a clflush.
This leads to some confusion where we have to ask if the object is in
the write domain or missed a clflush. If we always mark the cache as
dirty, this becomes a much simply question to answer.
The goal remains to do as few clflushes as required and to do them as
late as possible, in the hope of deferring the work to a kthread and not
block the caller (e.g. execbuf, flips).
v2: Always call clflush before GPU execution when the cache_dirty flag
is set. This may cause some extra work on llc systems that migrate dirty
buffers back and forth - but we do try to limit that by only setting
cache_dirty at the end of the gpu sequence.
v3: Always mark the cache as dirty upon a level change, as we need to
invalidate any stale cachelines due to external writes.
Reported-by: Dongwon Kim <dongwon.kim@intel.com>
Fixes: a6a7cc4b7d ("drm/i915: Always flush the dirty CPU cache when pinning the scanout")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Tested-by: Dongwon Kim <dongwon.kim@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615123850.26843-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Now that drm_[cm]alloc* helpers are simple one line wrappers around
kvmalloc_array and drm_free_large is just kvfree alias we can drop
them and replace by their native forms.
This shouldn't introduce any functional change.
Changes since v1
- fix typo in drivers/gpu//drm/etnaviv/etnaviv_gem.c - noticed by 0day
build robot
Suggested-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Michal Hocko <mhocko@suse.com>drm: drop drm_[cm]alloc* helpers
[danvet: Fixup vgem which grew another user very recently.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Christian König <christian.koenig@amd.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517122312.GK18247@dhcp22.suse.cz
If we allow the user to convert a GTT mmap address into a userptr, we
may end up in recursion hell, where currently we hit a mutex deadlock
but other possibilities include use-after-free during the
unbind/cancel_userptr.
[ 143.203989] gem_userptr_bli D 0 902 898 0x00000000
[ 143.204054] Call Trace:
[ 143.204137] __schedule+0x511/0x1180
[ 143.204195] ? pci_mmcfg_check_reserved+0xc0/0xc0
[ 143.204274] schedule+0x57/0xe0
[ 143.204327] schedule_timeout+0x383/0x670
[ 143.204374] ? trace_hardirqs_on_caller+0x187/0x280
[ 143.204457] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 143.204507] ? usleep_range+0x110/0x110
[ 143.204657] ? irq_exit+0x89/0x100
[ 143.204710] ? retint_kernel+0x2d/0x2d
[ 143.204794] ? trace_hardirqs_on_caller+0x187/0x280
[ 143.204857] ? _raw_spin_unlock_irq+0x33/0x60
[ 143.204944] wait_for_common+0x1f0/0x2f0
[ 143.205006] ? out_of_line_wait_on_atomic_t+0x170/0x170
[ 143.205103] ? wake_up_q+0xa0/0xa0
[ 143.205159] ? flush_workqueue_prep_pwqs+0x15a/0x2c0
[ 143.205237] wait_for_completion+0x1d/0x20
[ 143.205292] flush_workqueue+0x2e9/0xbb0
[ 143.205339] ? flush_workqueue+0x163/0xbb0
[ 143.205418] ? __schedule+0x533/0x1180
[ 143.205498] ? check_flush_dependency+0x1a0/0x1a0
[ 143.205681] i915_gem_userptr_mn_invalidate_range_start+0x1c7/0x270 [i915]
[ 143.205865] ? i915_gem_userptr_dmabuf_export+0x40/0x40 [i915]
[ 143.205955] __mmu_notifier_invalidate_range_start+0xc6/0x120
[ 143.206044] ? __mmu_notifier_invalidate_range_start+0x51/0x120
[ 143.206123] zap_page_range_single+0x1c7/0x1f0
[ 143.206171] ? unmap_single_vma+0x160/0x160
[ 143.206260] ? unmap_mapping_range+0xa9/0x1b0
[ 143.206308] ? vma_interval_tree_subtree_search+0x75/0xd0
[ 143.206397] unmap_mapping_range+0x18f/0x1b0
[ 143.206444] ? zap_vma_ptes+0x70/0x70
[ 143.206524] ? __pm_runtime_resume+0x67/0xa0
[ 143.206723] i915_gem_release_mmap+0x1ba/0x1c0 [i915]
[ 143.206846] i915_vma_unbind+0x5c2/0x690 [i915]
[ 143.206925] ? __lock_is_held+0x52/0x100
[ 143.207076] i915_gem_object_set_tiling+0x1db/0x650 [i915]
[ 143.207236] i915_gem_set_tiling_ioctl+0x1d3/0x3b0 [i915]
[ 143.207377] ? i915_gem_set_tiling_ioctl+0x5/0x3b0 [i915]
[ 143.207457] drm_ioctl+0x36c/0x670
[ 143.207535] ? debug_lockdep_rcu_enabled.part.0+0x1a/0x30
[ 143.207730] ? i915_gem_object_set_tiling+0x650/0x650 [i915]
[ 143.207793] ? drm_getunique+0x120/0x120
[ 143.207875] ? __handle_mm_fault+0x996/0x14a0
[ 143.207939] ? vm_insert_page+0x340/0x340
[ 143.208028] ? up_write+0x28/0x50
[ 143.208086] ? vm_mmap_pgoff+0x160/0x190
[ 143.208163] do_vfs_ioctl+0x12c/0xa60
[ 143.208218] ? debug_lockdep_rcu_enabled+0x35/0x40
[ 143.208267] ? ioctl_preallocate+0x150/0x150
[ 143.208353] ? __do_page_fault+0x36a/0x6e0
[ 143.208400] ? mark_held_locks+0x23/0xc0
[ 143.208479] ? up_read+0x1f/0x40
[ 143.208526] ? entry_SYSCALL_64_fastpath+0x5/0xc6
[ 143.208669] ? __fget_light+0xa7/0xc0
[ 143.208747] SyS_ioctl+0x41/0x70
To prevent the possibility of a deadlock, we defer scheduling the worker
until after we have proven that given the current mm, the userptr range
does not overlap a GGTT mmaping. If another thread tries to remap the
GGTT over the userptr before the worker is scheduled, it will be stopped
by its invalidate-range flushing the current work, before the deadlock
can occur.
v2: Improve discussion of how we end up in the deadlock.
v3: Don't forget to mark the userptr as active after a successful
gup_fast. Rename overlaps_ggtt to noncontiguous_or_overlaps_ggtt.
v4: Fix test ordering between invalid GTT mmaping and range completion
(Tvrtko)
Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Testcase: igt/gem_userptr_blits/map-fixed-invalidate-gup
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170308215903.24171-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
To avoid waiting for work from other invalidate-range threads where
not required, only wait on the userptr cancel workqueue if we have added
some work to it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170307205851.32578-2-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
If the worker fails, it no longer has pages to release and can be
immediately removed from the invalidate-tree.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170307205851.32578-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/mm.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
The APIs that are going to be moved first are:
mm_alloc()
__mmdrop()
mmdrop()
mmdrop_async_fn()
mmdrop_async()
mmget_not_zero()
mmput()
mmput_async()
get_task_mm()
mm_access()
mm_release()
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We already have the helper, we can convert the rest of the kernel
mechanically using:
git grep -l 'atomic_inc_not_zero.*mm_users' | xargs sed -i 's/atomic_inc_not_zero(&\(.*\)->mm_users)/mmget_not_zero\(\1\)/'
This is needed for a later patch that hooks into the helper, but might
be a worthwhile cleanup on its own.
Link: http://lkml.kernel.org/r/20161218123229.22952-3-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Apart from adding the helper function itself, the rest of the kernel is
converted mechanically using:
git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)->mm_count);/mmgrab\(\1\);/'
git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)\.mm_count);/mmgrab\(\&\1\);/'
This is needed for a later patch that hooks into the helper, but might
be a worthwhile cleanup on its own.
(Michal Hocko provided most of the kerneldoc comment.)
Link: http://lkml.kernel.org/r/20161218123229.22952-1-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJYaYNlAAoJEHm+PkMAQRiGtCUH/18PMUJpHqRKjxL3Yscw+QZC
RmGlD/hwBRLUSgiTCfURNGKP4QZv2kQW7BGsGC72oL01lmxozsU72ixUIO+wXzDY
K2b0OOKGZZWzFtaVm7Qs+5JhHAEKZcT046mLD8sjJuqkrFAhmNLKdwHjihKBEkm9
J3s2tpdXdN0x/Uyga/GY9khEYIrvLPeBoKSz+JXcQKdC0iq3/+PMpWnN47QCNScr
7azojkJkj/rs2cqVdOi7Wbh6PSqIvPsl8E3qJefpaVJF/IQaU1pFdy5g8kYm4V7T
fr6HgIbuN4EQWdN/5cgKrUdpQyV7D8iYx02klk4R8WgfS0QMYoUcsg+XsTd02TI=
=OhGe
-----END PGP SIGNATURE-----
Merge tag 'v4.10-rc2' into drm-intel-next-queued
Backmerge Linux 4.10-rc2 to resync with our -fixes cherry-picks. I've
done the backmerge directly because Dave is on vacation.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Patch series "mm: unexport __get_user_pages_unlocked()".
This patch series continues the cleanup of get_user_pages*() functions
taking advantage of the fact we can now pass gup_flags as we please.
It firstly adds an additional 'locked' parameter to
get_user_pages_remote() to allow for its callers to utilise
VM_FAULT_RETRY functionality. This is necessary as the invocation of
__get_user_pages_unlocked() in process_vm_rw_single_vec() makes use of
this and no other existing higher level function would allow it to do
so.
Secondly existing callers of __get_user_pages_unlocked() are replaced
with the appropriate higher-level replacement -
get_user_pages_unlocked() if the current task and memory descriptor are
referenced, or get_user_pages_remote() if other task/memory descriptors
are referenced (having acquiring mmap_sem.)
This patch (of 2):
Add a int *locked parameter to get_user_pages_remote() to allow
VM_FAULT_RETRY faulting behaviour similar to get_user_pages_[un]locked().
Taking into account the previous adjustments to get_user_pages*()
functions allowing for the passing of gup_flags, we are now in a
position where __get_user_pages_unlocked() need only be exported for his
ability to allow VM_FAULT_RETRY behaviour, this adjustment allows us to
subsequently unexport __get_user_pages_unlocked() as well as allowing
for future flexibility in the use of get_user_pages_remote().
[sfr@canb.auug.org.au: merge fix for get_user_pages_remote API change]
Link: http://lkml.kernel.org/r/20161122210511.024ec341@canb.auug.org.au
Link: http://lkml.kernel.org/r/20161027095141.2569-2-lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Where it is more appropriate and also to be consistent with
the direction of the driver.
v2: Leave out object alloc/free inlining. (Joonas Lahtinen)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
A small selection of macros which can only accept dev_priv from
now on and a resulting trickle of fixups.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Commit 1bec9b0bda ("drm/i915/shrinker: Only shmemfs objects
are backed by swap") stopped considering the userptr objects
in shrinker callbacks.
Restore that so idle userptr objects can be discarded in order
to free up memory.
One change further to what was introduced in 1bec9b0bda is
to start considering userptr objects in oom but that should
also be a correct thing to do.
v2: Introduce I915_GEM_OBJECT_IS_SHRINKABLE. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 1bec9b0bda ("drm/i915/shrinker: Only shmemfs objects are backed by swap")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://patchwork.freedesktop.org/patch/msgid/1478011450-6634-1-git-send-email-tvrtko.ursulin@linux.intel.com
The shrinker may appear to recurse into obj->mm.lock as the shrinker may
be called from a direct reclaim path whilst handling get_pages. We
filter out recursing on the same obj->mm.lock by inspecting
obj->mm.pages, but we do want to take the lock on a second object in
order to reap their pages. lockdep spots the recursion on the same
lockclass and needs annotation to avoid a false positive. To keep the
two paths distinct, create an enum to indicate which subclass of
obj->mm.lock we are using. This removes the false positive and avoids
masking real bugs.
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101121134.27504-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Having moved the locked phase of freeing an object to a separate worker,
we can now declare to the core that we only need the unlocked variant of
driver->gem_free_object, and can use the simple unreference internally.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-20-chris@chris-wilson.co.uk
Break the allocation of the backing storage away from struct_mutex into
a per-object lock. This allows parallel page allocation, provided we can
do so outside of struct_mutex (i.e. set-domain-ioctl, pwrite, GTT
fault), i.e. before execbuf! The increased cost of the atomic counters
are hidden behind i915_vma_pin() for the typical case of execbuf, i.e.
as the object is typically bound between execbufs, the page_pin_count is
static. The cost will be felt around set-domain and pwrite, but offset
by the improvement from reduced struct_mutex contention.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-14-chris@chris-wilson.co.uk
The plan is to move obj->pages out from under the struct_mutex into its
own per-object lock. We need to prune any assumption of the struct_mutex
from the get_pages/put_pages backends, and to make it easier we pass
around the sg_table to operate on rather than indirectly via the obj.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-13-chris@chris-wilson.co.uk
The plan is to make obtaining the backing storage for the object avoid
struct_mutex (i.e. use its own locking). The first step is to update the
API so that normal users only call pin/unpin whilst working on the
backing storage.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-12-chris@chris-wilson.co.uk
A while ago we switched from a contiguous array of pages into an sglist,
for that was both more convenient for mapping to hardware and avoided
the requirement for a vmalloc array of pages on every object. However,
certain GEM API calls (like pwrite, pread as well as performing
relocations) do desire access to individual struct pages. A quick hack
was to introduce a cache of the last access such that finding the
following page was quick - this works so long as the caller desired
sequential access. Walking backwards, or multiple callers, still hits a
slow linear search for each page. One solution is to store each
successful lookup in a radix tree.
v2: Rewrite building the radixtree for clarity, hopefully.
v3: Rearrange execbuf to avoid calling i915_gem_object_get_sg() from
within an atomic section and so relax the allocation context to a simple
GFP_KERNEL and mutex.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-10-chris@chris-wilson.co.uk
Our low-level wait routine has evolved from our generic wait interface
that handled unlocked, RPS boosting, waits with time tracking. If we
push our GEM fence tracking to use reservation_objects (required for
handling multiple timelines), we lose the ability to pass the required
information down to i915_wait_request(). However, if we push the extra
functionality from i915_wait_request() to the individual callsites
(i915_gem_object_wait_rendering and i915_gem_wait_ioctl) that make use
of those extras, we can both simplify our low level wait and prepare for
extending the GEM interface for use of reservation_objects.
v2: Rewrite i915_wait_request() kerneldocs
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-4-chris@chris-wilson.co.uk
This removes the 'write' and 'force' from get_user_pages_remote() and
replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Keep any error reported by the gup_worker until we are notified that the
arena has changed (via the mmu-notifier). This has the importance of
making two consecutive calls to i915_gem_object_get_pages() reporting
the same error, and curtailing a loop of detecting a fault and requeueing
a gup_worker.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-19-chris@chris-wilson.co.uk
We can completely avoid taking the struct_mutex around the non-blocking
waits by switching over to the RCU request management (trading the mutex
for a RCU read lock and some complex atomic operations). The improvement
is that we gain further contention reduction, and overall the code
become simpler due to the reduced mutex dancing.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-3-git-send-email-chris@chris-wilson.co.uk
We are motivated to avoid using a bitfield for obj->active for a couple
of reasons. Firstly, we wish to document our lockless read of obj->active
using READ_ONCE inside i915_gem_busy_ioctl() and that requires an
integral type (i.e. not a bitfield). Secondly, gcc produces abysmal code
when presented with a bitfield and that shows up high on the profiles of
request tracking (mainly due to excess memory traffic as it converts
the bitfield to a register and back and generates frequent AGI in the
process).
v2: BIT, break up a long line in compute the other engines, new paint
for i915_gem_object_is_active (now i915_gem_object_get_active).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-23-git-send-email-chris@chris-wilson.co.uk
The future annotations will track the locking used for access to ensure
that it is always sufficient. We make the preparations now to present
the API ahead and to make sure that GCC can eliminate the unused
parameter.
Before: 6298417 3619610 696320 10614347 a1f64b vmlinux
After: 6298417 3619610 696320 10614347 a1f64b vmlinux
(with i915 builtin)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-12-git-send-email-chris@chris-wilson.co.uk
In the future, we will want to add annotations to the i915_gem_active
struct. The API is thus expanded to hide direct access to the contents
of i915_gem_active and mediated instead through a number of helpers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-11-git-send-email-chris@chris-wilson.co.uk
In the next patch, request tracking is made more generic and for that we
need a new expanded struct and to separate out the logic changes from
the mechanical churn, we split out the structure renaming into this
patch.
v2: Writer's block. Add some spiel about why we track requests.
v3: Now i915_gem_active.
v4: Now with i915_gem_active_set() for attaching to the active request.
v5: Use i915_gem_active_set() from inside the retirement handlers
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-10-git-send-email-chris@chris-wilson.co.uk
When we call i915_vma_unbind(), we will wait upon outstanding rendering.
This will also trigger a retirement phase, which may update the object
lists. If, we extend request tracking to the VMA itself (rather than
keep it at the encompassing object), then there is a potential that the
obj->vma_list be modified for other elements upon i915_vma_unbind(). As
a result, if we walk over the object list and call i915_vma_unbind(), we
need to be prepared for that list to change.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-8-git-send-email-chris@chris-wilson.co.uk
The existing for_each_sg_page() iterator is somewhat heavyweight, and is
limiting i915 driver performance in a few benchmarks. So here we
introduce somewhat lighter weight iterators, primarily for use with GEM
objects or other case where we need only deal with whole aligned pages.
Unlike the old iterator, the new iterators use an internal state
structure which is not intended to be accessed by the caller; instead
each takes as a parameter an output variable which is set before each
iteration. This makes them particularly simple to use :)
One of the new iterators provides the caller with the DMA address of
each page in turn; the other provides the 'struct page' pointer required
by many memory management operations.
Various uses of for_each_sg_page() are then converted to the new macros.
v2: Force inlining of the sg_iter constructor and make the union
anonymous.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1463741647-15666-4-git-send-email-chris@chris-wilson.co.uk
userptr directly only uses drm_device in a single interface where it
meant to use drm_i915_private (everywhere else we have to derive it from
the drm_i915_gem_object and so require going from drm_device).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463671036-3235-1-git-send-email-chris@chris-wilson.co.uk
Reporting -EIO from i915_wait_request() has proven very troublematic
over the years, with numerous hard-to-reproduce bugs cropping up in the
corner case of where a reset occurs and the code wasn't expecting such
an error.
If the we reset the GPU or have detected a hang and wish to reset the
GPU, the request is forcibly complete and the wait broken. Currently, we
report either -EAGAIN or -EIO in order for the caller to retreat and
restart the wait (if appropriate) after dropping and then reacquiring
the struct_mutex (essential to allow the GPU reset to proceed). However,
if we take the view that the request is complete (no further work will
be done on it by the GPU because it is dead and soon to be reset), then
we can proceed with the task at hand and then drop the struct_mutex
allowing the reset to occur. This transfers the burden of checking
whether it is safe to proceed to the caller, which in all but one
instance it is safe - completely eliminating the source of all spurious
-EIO.
Of note, we only have two API entry points where we expect that
userspace can observe an EIO. First is when submitting an execbuf, if
the GPU is terminally wedged, then the operation cannot succeed and an
-EIO is reported. Secondly, existing userspace uses the throttle ioctl
to detect an already wedged GPU before starting using HW acceleration
(or to confirm that the GPU is wedged after an error condition). So if
the GPU is wedged when the user calls throttle, also report -EIO.
v2: Split more carefully the change to i915_wait_request() and assorted
ABI from the reset handling.
v3: Add a couple of WARN_ON(EIO) to the interruptible modesetting code
so that we don't start to leak EIO there in future (and break our hang
resistant modesetting).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-9-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-1-git-send-email-chris@chris-wilson.co.uk
As the request is only valid during the same global reset epoch, we can
record the current reset_counter when constructing the request and reuse
it when waiting upon that request in future. This removes a very hairy
atomic check serialised by the struct_mutex at the time of waiting and
allows us to transfer those waits to a central dispatcher for all
waiters and all requests.
PS: With per-engine resets, we obviously cannot assume a global reset
epoch for the requests - a per-engine epoch makes the most sense. The
challenge then is how to handle checking in the waiter for when to break
the wait, as the fine-grained reset may also want to requeue the
request (i.e. the assumption that just because the epoch changes the
request is completed may be broken - or we just avoid breaking that
assumption with the fine-grained resets).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-7-git-send-email-chris@chris-wilson.co.uk
Since we only ever use the drm_i915_private from the stored
i915_mm_struct->dev, save some electrons by storing the right
backpointer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459864801-28606-3-git-send-email-chris@chris-wilson.co.uk
Holding a reference to the containing task_struct is not sufficient to
prevent the mm_struct from being reaped under memory pressure. If this
happens whilst we are calling get_user_pages(), explosions erupt -
sometimes an immediate GPF, sometimes page flag corruption. To prevent
the target mm from being reaped as we are reading from it, acquire a
reference before we begin.
Testcase: igt/gem_shrink/*userptr
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459864801-28606-2-git-send-email-chris@chris-wilson.co.uk