linux/mm
Baolin Wang fac35ba763 mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page
On some architectures (like ARM64), it can support CONT-PTE/PMD size
hugetlb, which means it can support not only PMD/PUD size hugetlb (2M and
1G), but also CONT-PTE/PMD size(64K and 32M) if a 4K page size specified.

So when looking up a CONT-PTE size hugetlb page by follow_page(), it will
use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size
hugetlb in follow_page_pte().  However this pte entry lock is incorrect
for the CONT-PTE size hugetlb, since we should use huge_pte_lock() to get
the correct lock, which is mm->page_table_lock.

That means the pte entry of the CONT-PTE size hugetlb under current pte
lock is unstable in follow_page_pte(), we can continue to migrate or
poison the pte entry of the CONT-PTE size hugetlb, which can cause some
potential race issues, even though they are under the 'pte lock'.

For example, suppose thread A is trying to look up a CONT-PTE size hugetlb
page by move_pages() syscall under the lock, however antoher thread B can
migrate the CONT-PTE hugetlb page at the same time, which will cause
thread A to get an incorrect page, if thread A also wants to do page
migration, then data inconsistency error occurs.

Moreover we have the same issue for CONT-PMD size hugetlb in
follow_huge_pmd().

To fix above issues, rename the follow_huge_pmd() as follow_huge_pmd_pte()
to handle PMD and PTE level size hugetlb, which uses huge_pte_lock() to
get the correct pte entry lock to make the pte entry stable.

Mike said:

Support for CONT_PMD/_PTE was added with bb9dd3df8e ("arm64: hugetlb:
refactor find_num_contig()").  Patch series "Support for contiguous pte
hugepages", v4.  However, I do not believe these code paths were
executed until migration support was added with 5480280d3f ("arm64/mm:
enable HugeTLB migration for contiguous bit HugeTLB pages") I would go
with 5480280d3f for the Fixes: targe.

Link: https://lkml.kernel.org/r/635f43bdd85ac2615a58405da82b4d33c6e5eb05.1662017562.git.baolin.wang@linux.alibaba.com
Fixes: 5480280d3f ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-11 19:05:44 -07:00
..
damon damon/sysfs: fix possible memleak on damon_sysfs_add_target 2022-09-30 18:46:31 -07:00
kasan - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
kfence - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
backing-dev.c writeback: avoid use-after-free after removing device 2022-08-28 14:02:43 -07:00
balloon_compaction.c mm: Convert all PageMovable users to movable_operations 2022-08-02 12:34:03 -04:00
bootmem_info.c bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem 2022-08-28 14:02:45 -07:00
cma_debug.c mm/cma_debug.c: align the name buffer length as struct cma 2022-07-29 18:07:16 -07:00
cma_sysfs.c
cma.c Revert "mm/cma.c: remove redundant cma_mutex lock" 2022-05-13 15:11:26 -07:00
cma.h mm/cma: provide option to opt out from exposing pages on activation failure 2022-03-22 15:57:09 -07:00
compaction.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
debug_page_ref.c
debug_vm_pgtable.c docs: rename Documentation/vm to Documentation/mm 2022-06-27 12:52:53 -07:00
debug.c mm: unexport page_init_poison 2022-03-24 19:06:45 -07:00
dmapool.c mm/dmapool.c: revert "make dma pool to use kmalloc_node" 2022-01-15 16:30:28 +02:00
early_ioremap.c mm/early_ioremap: declare early_memremap_pgprot_adjust() 2022-03-22 15:57:11 -07:00
fadvise.c riscv: compat: syscall: Add compat_sys_call_table implementation 2022-04-26 13:36:25 -07:00
failslab.c mm: fix missing handler for __GFP_NOWARN 2022-05-19 14:08:55 -07:00
filemap.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
folio-compat.c mm/folio-compat: Remove migration compatibility functions 2022-08-02 12:34:04 -04:00
frontswap.c frontswap: don't call ->init if no ops are registered 2022-09-26 12:14:34 -07:00
gup_test.c mm: rename is_pinnable_page() to is_longterm_pinnable_page() 2022-07-17 17:14:27 -07:00
gup_test.h
gup.c mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page 2022-10-11 19:05:44 -07:00
highmem.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
hmm.c mm/hmm: fault non-owner device private entries 2022-07-29 11:33:37 -07:00
huge_memory.c mm/huge_memory: use pfn_to_online_page() in split_huge_pages_all() 2022-09-26 12:14:33 -07:00
hugetlb_cgroup.c hugetlb_cgroup: fix wrong hugetlb cgroup numa stat 2022-07-29 18:07:17 -07:00
hugetlb_vmemmap.c mm: hugetlb_vmemmap: use PTRS_PER_PTE instead of PMD_SIZE / PAGE_SIZE 2022-08-08 18:06:43 -07:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability 2022-08-08 18:06:43 -07:00
hugetlb.c mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page 2022-10-11 19:05:44 -07:00
hwpoison-inject.c mm/memory-failure: disable unpoison once hw error happens 2022-06-16 19:11:32 -07:00
init-mm.c kernel/fork: Initialize mm's PASID 2022-02-14 19:51:47 +01:00
internal.h - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: Add ioremap/iounmap_allowed() 2022-06-27 12:22:31 +01:00
Kconfig cxl for 6.0 2022-08-10 11:07:26 -07:00
Kconfig.debug Two followon fixes for the post-5.19 series "Use pageblock_order for cma 2022-05-27 11:40:49 -07:00
khugepaged.c mm: gup: fix the fast GUP race against THP collapse 2022-09-26 12:14:33 -07:00
kmemleak.c mm/kmemleak: prevent soft lockup in first object iteration loop of kmemleak_scan() 2022-06-16 19:48:32 -07:00
ksm.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
list_lru.c mm: kmem: make mem_cgroup_from_obj() vmalloc()-safe 2022-06-16 19:48:31 -07:00
maccess.c asm-generic updates for 5.18 2022-03-23 18:03:08 -07:00
madvise.c mm: fix madivse_pageout mishandling on non-LRU page 2022-09-26 12:14:33 -07:00
Makefile mm: shrinkers: introduce debugfs interface for memory shrinkers 2022-07-03 18:08:40 -07:00
mapping_dirty_helpers.c mm: move tlb_flush_pending inline helpers to mm_inline.h 2022-01-15 16:30:27 +02:00
memblock.c memblock updates for v5.20 2022-08-09 09:48:30 -07:00
memcontrol.c mm: memcontrol: fix potential oom_lock recursion deadlock 2022-07-29 18:07:18 -07:00
memfd.c memfd: fix F_SEAL_WRITE after shmem huge page allocated 2022-03-05 11:08:32 -08:00
memory_hotplug.c mm: use is_zone_movable_page() helper 2022-07-29 18:07:20 -07:00
memory-failure.c mm,hwpoison: check mm when killing accessing process 2022-09-26 12:14:34 -07:00
memory.c mm: bring back update_mmu_cache() to finish_fault() 2022-09-26 12:14:34 -07:00
mempolicy.c mm/mempolicy: remove unneeded out label 2022-07-29 18:07:16 -07:00
mempool.c mm/mempool: use might_alloc() 2022-06-16 19:48:30 -07:00
memremap.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
memtest.c
migrate_device.c mm/migrate_device.c: copy pte dirty bit to page 2022-09-11 16:22:30 -07:00
migrate.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
mincore.c mm: teach core mm about pte markers 2022-05-13 07:20:09 -07:00
mlock.c mm: handling Non-LRU pages returned by vm_normal_pages 2022-07-17 17:14:28 -07:00
mm_init.c
mmap_lock.c
mmap.c mm/hugetlb: fix hugetlb not supporting softdirty tracking 2022-08-20 15:17:45 -07:00
mmu_gather.c mm/mmu_gather: limit free batch count and add schedule point in tlb_batch_pages_flush 2022-04-28 23:16:12 -07:00
mmu_notifier.c mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() 2022-04-21 20:01:10 -07:00
mmzone.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
mprotect.c mm/mprotect: only reference swap pfn page if type match 2022-08-28 14:02:46 -07:00
mremap.c Yang Shi has improved the behaviour of khugepaged collapsing of readonly 2022-05-26 12:32:41 -07:00
msync.c
nommu.c mm: nommu: pass a pointer to virt_to_page() 2022-07-17 17:14:37 -07:00
oom_kill.c mm/oom_kill.c: fix vm_oom_kill_table[] ifdeffery 2022-06-01 15:57:16 -07:00
page_alloc.c mm: prevent page_frag_alloc() from corrupting the memory 2022-09-26 12:14:34 -07:00
page_counter.c mm/page_counter: remove an incorrect call to propagate_protected_usage() 2022-01-15 16:30:27 +02:00
page_ext.c mm: use for_each_online_node and node_online instead of open coding 2022-04-29 14:36:58 -07:00
page_idle.c mm: don't be stuck to rmap lock on reclaim path 2022-05-19 14:08:54 -07:00
page_io.c Yang Shi has improved the behaviour of khugepaged collapsing of readonly 2022-05-26 12:32:41 -07:00
page_isolation.c mm/page_isolation: fix isolate_single_pageblock() isolation behavior 2022-09-26 12:14:34 -07:00
page_owner.c Yang Shi has improved the behaviour of khugepaged collapsing of readonly 2022-05-26 12:32:41 -07:00
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c Six hotfixes. One from Miaohe Lin is considered a minor thing so it isn't 2022-05-27 11:29:35 -07:00
page_vma_mapped.c mm/page_vma_mapped.c: use helper function huge_pte_lock 2022-07-17 17:14:47 -07:00
page-writeback.c writeback: avoid use-after-free after removing device 2022-08-28 14:02:43 -07:00
pagewalk.c
percpu-internal.h percpu: improve percpu_alloc_percpu event trace 2022-05-13 07:20:18 -07:00
percpu-km.c
percpu-stats.c mm: use vmalloc_array and vcalloc for array allocations 2022-03-08 09:30:46 -05:00
percpu-vm.c
percpu.c mm: percpu: use kmemleak_ignore_phys() instead of kmemleak_free() 2022-07-17 17:14:47 -07:00
pgalloc-track.h
pgtable-generic.c mm: avoid unnecessary flush on change_huge_pmd() 2022-05-13 07:20:05 -07:00
process_vm_access.c
ptdump.c mm: sparsemem: use page table lock to protect kernel pmd operations 2022-03-22 15:57:08 -07:00
readahead.c filemap: Fix serialization adding transparent huge pages to page cache 2022-06-23 12:22:00 -04:00
rmap.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
rodata_test.c
secretmem.c mm: fix dereferencing possible ERR_PTR 2022-09-11 16:22:31 -07:00
shmem.c shmem: update folio if shmem_replace_page() updates the page 2022-08-28 14:02:43 -07:00
shrinker_debug.c mm: shrinkers: fix double kfree on shrinker name 2022-07-29 18:07:13 -07:00
shuffle.c
shuffle.h
slab_common.c mm/slab_common: move generic bulk alloc/free functions to SLOB 2022-07-20 13:30:12 +02:00
slab.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
slab.h mm/slab_common: move generic bulk alloc/free functions to SLOB 2022-07-20 13:30:12 +02:00
slob.c mm/slab_common: move generic bulk alloc/free functions to SLOB 2022-07-20 13:30:12 +02:00
slub.c mm/sl[au]b: use own bulk free function when bulk alloc failed 2022-07-20 13:30:11 +02:00
sparse-vmemmap.c mm: hugetlb_vmemmap: move vmemmap code related to HugeTLB to hugetlb_vmemmap.c 2022-08-08 18:06:42 -07:00
sparse.c mm: memory_hotplug: enumerate all supported section flags 2022-07-03 18:08:49 -07:00
swap_cgroup.c mm: use vmalloc_array and vcalloc for array allocations 2022-03-08 09:30:46 -05:00
swap_slots.c arm64: enable THP_SWAP for arm64 2022-07-20 10:52:40 +01:00
swap_state.c mm: fix VM_BUG_ON in __delete_from_swap_cache() 2022-09-11 16:22:31 -07:00
swap.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
swap.h mm/khugepaged: try to free transhuge swapcache when possible 2022-07-03 18:08:52 -07:00
swapfile.c mm/swap: convert delete_from_swap_cache() to take a folio 2022-07-03 18:08:48 -07:00
truncate.c mm: Remove __delete_from_page_cache() 2022-06-29 08:51:05 -04:00
usercopy.c usercopy: use unsigned long instead of uintptr_t 2022-07-01 17:03:38 -07:00
userfaultfd.c mm/uffd: reset write protection when unregister with wp-mode 2022-08-20 15:17:45 -07:00
util.c mm: fix BUG splat with kvmalloc + GFP_ATOMIC 2022-09-30 18:46:31 -07:00
vmacache.c
vmalloc.c mm/vmalloc: extend __find_vmap_area() with one more argument 2022-07-03 18:08:41 -07:00
vmpressure.c mm/vmpressure: fix data-race with memcg->socket_pressure 2021-11-06 13:30:40 -07:00
vmscan.c vmscan: check folio_test_private(), not folio_get_private() 2022-09-11 16:22:31 -07:00
vmstat.c mm: add DEVICE_ZONE to FOR_ALL_ZONES 2022-08-20 15:17:45 -07:00
workingset.c mm: shrinkers: provide shrinkers with names 2022-07-03 18:08:40 -07:00
z3fold.c mm: Convert all PageMovable users to movable_operations 2022-08-02 12:34:03 -04:00
zbud.c
zpool.c zpool: remove the list of pools_head 2022-01-15 16:30:31 +02:00
zsmalloc.c mm/zsmalloc: do not attempt to free IS_ERR handle 2022-08-28 14:02:44 -07:00
zswap.c zswap: memcg accounting 2022-05-19 14:08:53 -07:00