linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-23 12:42:02 +00:00

Andrea Arcangeli 7066f0f933 mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page()

change_huge_pmd() after arming the numa/protnone pmd doesn't flush the TLB right away. do_huge_pmd_numa_page() flushes the TLB before calling migrate_misplaced_transhuge_page(). By the time do_huge_pmd_numa_page() runs some CPU could still access the page through the TLB.

change_huge_pmd() before arming the numa/protnone transhuge pmd calls mmu_notifier_invalidate_range_start(). So there's no need of mmu_notifier_invalidate_range_start()/mmu_notifier_invalidate_range_only_end() sequence in migrate_misplaced_transhuge_page() too, because by the time migrate_misplaced_transhuge_page() runs, the pmd mapping has already been invalidated in the secondary MMUs. It has to or if a secondary MMU can still write to the page, the migrate_page_copy() would lose data.

However an explicit mmu_notifier_invalidate_range() is needed before migrate_misplaced_transhuge_page() starts copying the data of the transhuge page or the below can happen for MMU notifier users sharing the primary MMU pagetables and only implementing ->invalidate_range:

CPU0                            CPU1                                GPU sharing linux pagetables using
                                                                    only ->invalidate_range
-----------                     ------------                        ---------
                                                                    GPU secondary MMU writes to the page
                                                                    mapped by the transhuge pmd
change_pmd_range()
mmu..._range_start()
->invalidate_range_start() noop
change_huge_pmd()
set_pmd_at(numa/protnone)
pmd_unlock()
                                do_huge_pmd_numa_page()
                                CPU TLB flush globally (1)
                                CPU cannot write to page
                                migrate_misplaced_transhuge_page()
                                                                    GPU writes to the page...
                                migrate_page_copy()
                                                                    ...GPU stops writing to the page
CPU TLB flush (2)
mmu..._range_end() (3)
->invalidate_range_stop() noop
->invalidate_range()
                                                                    GPU secondary MMU is invalidated
                                                                    and cannot write to the page anymore
                                                                    (too late)

Just like we need a CPU TLB flush (1) because the TLB flush (2) arrives too late, we also need a mmu_notifier_invalidate_range() before calling migrate_misplaced_transhuge_page(), because the ->invalidate_range() in (3) also arrives too late.

This requirement is the result of the lazy optimization in change_huge_pmd() that releases the pmd_lock without first flushing the TLB and without first calling mmu_notifier_invalidate_range().

Even converting the removed mmu_notifier_invalidate_range_only_end() into a mmu_notifier_invalidate_range_end() would not have been enough to fix this, because it run after migrate_page_copy().

After the hugepage data copy is done migrate_misplaced_transhuge_page() can proceed and call set_pmd_at without having to flush the TLB nor any secondary MMUs because the secondary MMU invalidate, just like the CPU TLB flush, has to happen before the migrate_page_copy() is called or it would be a bug in the first place (and it was for drivers using ->invalidate_range()).

KVM is unaffected because it doesn't implement ->invalidate_range().

The standard PAGE_SIZEd migrate_misplaced_page is less accelerated and uses the generic migrate_pages which transitions the pte from numa/protnone to a migration entry in try_to_unmap_one() and flushes TLBs and all mmu notifiers there before copying the page.

Link: http://lkml.kernel.org/r/20181013002430.698-3-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2018-10-26 16:38:15 -07:00
..
kasan	mm/kasan/quarantine.c: make quarantine_lock a raw_spinlock_t	2018-10-26 16:38:15 -07:00
backing-dev.c	blkcg: delay blkg destruction until after writeback has finished	2018-08-31 14:48:56 -06:00
balloon_compaction.c	virtio_balloon: fix deadlock on OOM	2017-11-14 23:57:38 +02:00
bootmem.c	docs/mm: bootmem: add overview documentation	2018-08-02 12:17:27 -06:00
cleancache.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
cma_debug.c	mm/cma: remove unsupported gfp_mask parameter from cma_alloc()	2018-08-17 16:20:32 -07:00
cma.c	mm/cma: remove unsupported gfp_mask parameter from cma_alloc()	2018-08-17 16:20:32 -07:00
cma.h	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
compaction.c	psi: pressure stall information for CPU, memory, and IO	2018-10-26 16:26:32 -07:00
debug_page_ref.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
debug.c	mm: provide kernel parameter to allow disabling page init poisoning	2018-10-26 16:26:34 -07:00
dmapool.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
early_ioremap.c	mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep	2017-12-11 14:54:44 +01:00
fadvise.c	vfs: implement readahead(2) using POSIX_FADV_WILLNEED	2018-08-30 20:01:32 +02:00
failslab.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
filemap.c	mm/filemap.c: use vmf_error()	2018-10-26 16:26:35 -07:00
frame_vector.c	mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()'	2017-12-14 16:00:48 -08:00
frontswap.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
gup_benchmark.c	mm/gup_benchmark.c: add additional pinning methods	2018-10-26 16:38:15 -07:00
gup.c	mm/gup: cache dev_pagemap while pinning pages	2018-10-26 16:38:15 -07:00
highmem.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
hmm.c	mm: defer ZONE_DEVICE page initialization to the point where we init pgmap	2018-10-26 16:26:34 -07:00
huge_memory.c	mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page()	2018-10-26 16:38:15 -07:00
hugetlb_cgroup.c	mm: rename page_counter's count/limit into usage/max	2018-06-07 17:34:35 -07:00
hugetlb.c	hugetlb: take PMD sharing into account when flushing tlb/caches	2018-10-05 16:32:04 -07:00
hwpoison-inject.c	mm/memory_failure: Remove unused trapno from memory_failure	2018-01-23 12:17:42 -06:00
init-mm.c	mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids	2018-07-17 09:35:30 +02:00
internal.h	mm: Change return type int to vm_fault_t for fault handlers	2018-08-23 18:48:44 -07:00
interval_tree.c	mm/interval_tree.c: use vma_pages() helper	2018-01-31 17:18:37 -08:00
Kconfig	mm: disable deferred struct page for 32-bit arches	2018-09-20 22:01:11 +02:00
Kconfig.debug	mm: clarify CONFIG_PAGE_POISONING and usage	2018-08-22 10:52:44 -07:00
khugepaged.c	mm: Change return type int to vm_fault_t for fault handlers	2018-08-23 18:48:44 -07:00
kmemleak-test.c
kmemleak.c	kmemleak: add module param to print warnings to dmesg	2018-10-26 16:25:19 -07:00
ksm.c	include/linux/compiler.h: make compiler-.h mutually exclusive	2018-08-22 17:31:34 -07:00
list_lru.c	mm/list_lru: introduce list_lru_shrink_walk_irq()	2018-08-17 16:20:32 -07:00
maccess.c	x86/fault: BUG() when uaccess helpers fault on kernel addresses	2018-09-03 15:12:09 +02:00
madvise.c	mm: madvise(MADV_DODUMP): allow hugetlbfs pages	2018-10-05 16:32:05 -07:00
Makefile	arm64 updates for 4.20:	2018-10-22 17:30:06 +01:00
memblock.c	mm: provide kernel parameter to allow disabling page init poisoning	2018-10-26 16:26:34 -07:00
memcontrol.c	mm: don't raise MEMCG_OOM event due to failed high-order allocation	2018-10-26 16:38:14 -07:00
memfd.c	alloc_file(): switch to passing O_... flags instead of FMODE_... mode	2018-07-12 10:02:57 -04:00
memory_hotplug.c	mm/memory_hotplug.c: clean up node_states_check_changes_offline()	2018-10-26 16:26:33 -07:00
memory-failure.c	libnvdimm-for-4.19_dax-memory-failure	2018-08-25 18:43:59 -07:00
memory.c	mm/memory.c: recheck page table entry with page table lock held	2018-10-26 16:26:35 -07:00
mempolicy.c	mm/mempolicy.c: use match_string() helper to simplify the code	2018-10-26 16:26:33 -07:00
mempool.c	mm/mempool.c: add missing parameter description	2018-08-22 10:52:44 -07:00
memtest.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
migrate.c	mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page()	2018-10-26 16:38:15 -07:00
mincore.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
mlock.c	dax: remove VM_MIXEDMAP for fsdax and device dax	2018-08-17 16:20:27 -07:00
mm_init.c	mm: access zone->node via zone_to_nid() and zone_set_nid()	2018-08-22 10:52:45 -07:00
mmap.c	mm: brk: downgrade mmap_sem to read when shrinking	2018-10-26 16:26:35 -07:00
mmu_context.c
mmu_gather.c	mm/memory: Move mmu_gather and TLB invalidation code into its own file	2018-09-07 15:19:25 +01:00
mmu_notifier.c	Revert "mm, mmu_notifier: annotate mmu notifiers with blockable invalidate callbacks"	2018-10-26 16:25:19 -07:00
mmzone.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
mprotect.c	x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings	2018-06-20 19:10:01 +02:00
mremap.c	mm: mremap: downgrade mmap_sem to read when shrinking	2018-10-26 16:26:35 -07:00
msync.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
nobootmem.c	mm/memblock: add a name for memblock flags enumeration	2018-08-02 12:17:27 -06:00
nommu.c	mm/gup: cache dev_pagemap while pinning pages	2018-10-26 16:38:15 -07:00
oom_kill.c	Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace	2018-10-24 11:22:39 +01:00
page_alloc.c	mm: return zero_resv_unavail optimization	2018-10-26 16:38:15 -07:00
page_counter.c	memcg: introduce memory.min	2018-06-07 17:34:36 -07:00
page_ext.c	mm/page_ext.c: constify lookup_page_ext() argument	2018-08-17 16:20:28 -07:00
page_idle.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
page_io.c	blkcg: associate a blkg for pages being evicted by swap	2018-09-21 20:29:09 -06:00
page_isolation.c	mm, migrate: remove reason argument from new_page_t	2018-04-11 10:28:32 -07:00
page_owner.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
page_poison.c	mm/page_poison.c: make early_page_poison_param() __init	2018-04-05 21:36:26 -07:00
page_vma_mapped.c	mm, page_vma_mapped: Introduce pfn_in_hpage()	2018-01-22 12:15:57 -08:00
page-writeback.c	mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock	2018-10-26 16:38:14 -07:00
pagewalk.c	mm: kernel-doc: add missing parameter descriptions	2018-04-05 21:36:27 -07:00
percpu-internal.h	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
percpu-km.c	percpu: allow select gfp to be passed to underlying allocators	2018-02-18 05:33:01 -08:00
percpu-stats.c	treewide: Use array_size() in vmalloc()	2018-06-12 16:19:22 -07:00
percpu-vm.c	percpu: allow select gfp to be passed to underlying allocators	2018-02-18 05:33:01 -08:00
percpu.c	percpu: stop leaking bitmap metadata blocks	2018-10-07 14:50:12 -07:00
pgtable-generic.c	x86/mm: Page size aware flush_tlb_mm_range()	2018-10-09 16:51:11 +02:00
process_vm_access.c	mm: docs: add blank lines to silence sphinx "Unexpected indentation" errors	2018-02-06 18:32:48 -08:00
quicklist.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
readahead.c	vfs: implement readahead(2) using POSIX_FADV_WILLNEED	2018-08-30 20:01:32 +02:00
rmap.c	mm: migration: fix migration of huge PMD shared pages	2018-10-05 16:32:04 -07:00
rodata_test.c	mm: fix RODATA_TEST failure "rodata_test: test data was not read only"	2017-10-03 17:54:24 -07:00
shmem.c	mm: shmem.c: Correctly annotate new inodes for lockdep	2018-09-20 22:01:11 +02:00
slab_common.c	mm, slab: shorten kmalloc cache names for large sizes	2018-10-26 16:26:32 -07:00
slab.c	mm, slab: combine kmalloc_caches and kmalloc_dma_caches	2018-10-26 16:26:31 -07:00
slab.h	mm: introduce CONFIG_MEMCG_KMEM as combination of CONFIG_MEMCG && !CONFIG_SLOB	2018-08-17 16:20:30 -07:00
slob.c	slab: __GFP_ZERO is incompatible with a constructor	2018-06-07 17:34:34 -07:00
slub.c	mm, slab: combine kmalloc_caches and kmalloc_dma_caches	2018-10-26 16:26:31 -07:00
sparse-vmemmap.c	mm/sparse: delete old sparse_init and enable new one	2018-08-17 16:20:32 -07:00
sparse.c	mm: provide kernel parameter to allow disabling page init poisoning	2018-10-26 16:26:34 -07:00
swap_cgroup.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
swap_slots.c	mm, swap, get_swap_pages: use entry_size instead of cluster in parameter	2018-08-22 10:52:44 -07:00
swap_state.c	mm: workingset: tell cache transitions from workingset thrashing	2018-10-26 16:26:32 -07:00
swap.c	mm/swap.c: remove duplicated include	2018-10-26 16:26:33 -07:00
swapfile.c	mm/swapfile.c: clear si->swap_map[] in swap_free_cluster()	2018-10-26 16:25:19 -07:00
truncate.c	page cache: use xa_lock	2018-04-11 10:28:39 -07:00
usercopy.c	usercopy: Allow boot cmdline disabling of hardening	2018-07-04 08:04:52 -07:00
userfaultfd.c	userfaultfd: prevent non-cooperative events vs mcopy_atomic races	2018-06-07 17:34:38 -07:00
util.c	kvfree(): fix misleading comment	2018-10-26 16:26:33 -07:00
vmacache.c	mm: get rid of vmacache_flush_all() entirely	2018-09-13 15:18:04 -10:00
vmalloc.c	vfree: add debug might_sleep()	2018-10-26 16:26:33 -07:00
vmpressure.c	mm/vmpressure.c: convert to use match_string() helper	2018-06-07 17:34:36 -07:00
vmscan.c	mm: zero-seek shrinkers	2018-10-26 16:26:33 -07:00
vmstat.c	mm/vmstat.c: assert that vmstat_text is in sync with stat_items_size	2018-10-26 16:26:35 -07:00
workingset.c	mm: zero-seek shrinkers	2018-10-26 16:26:33 -07:00
z3fold.c	z3fold: fix reclaim lock-ups	2018-05-11 17:28:45 -07:00
zbud.c	mm: docs: fix parameter names mismatch	2018-02-06 18:32:48 -08:00
zpool.c	mm/zpool.c: zpool_evictable: fix mismatch in parameter name and kernel-doc	2018-02-21 15:35:43 -08:00
zsmalloc.c	mm/zsmalloc.c: fix fall-through annotation	2018-10-26 16:26:35 -07:00
zswap.c	zswap: re-check zswap_is_full() after do zswap_shrink()	2018-07-26 19:38:03 -07:00