linux/mm
Baolin Wang 5d4af6195c mm: rmap: fix CONT-PTE/PMD size hugetlb issue when migration
On some architectures (like ARM64), it can support CONT-PTE/PMD size
hugetlb, which means it can support not only PMD/PUD size hugetlb: 2M and
1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page size specified.

When migrating a hugetlb page, we will get the relevant page table entry
by huge_pte_offset() only once to nuke it and remap it with a migration
pte entry.  This is correct for PMD or PUD size hugetlb, since they always
contain only one pmd entry or pud entry in the page table.

However this is incorrect for CONT-PTE and CONT-PMD size hugetlb, since
they can contain several continuous pte or pmd entry with same page table
attributes.  So we will nuke or remap only one pte or pmd entry for this
CONT-PTE/PMD size hugetlb page, which is not expected for hugetlb
migration.  The problem is we can still continue to modify the subpages'
data of a hugetlb page during migrating a hugetlb page, which can cause a
serious data consistent issue, since we did not nuke the page table entry
and set a migration pte for the subpages of a hugetlb page.

To fix this issue, we should change to use huge_ptep_clear_flush() to nuke
a hugetlb page table, and remap it with set_huge_pte_at() and
set_huge_swap_pte_at() when migrating a hugetlb page, which already
considered the CONT-PTE or CONT-PMD size hugetlb.

[akpm@linux-foundation.org: fix nommu build]
[baolin.wang@linux.alibaba.com: fix build errors for !CONFIG_MMU]
  Link: https://lkml.kernel.org/r/a4baca670aca637e7198d9ae4543b8873cb224dc.1652270205.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/ea5abf529f0997b5430961012bfda6166c1efc8c.1652147571.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 16:48:55 -07:00
..
damon mm/damon/reclaim: use resource_size function on resource object 2022-05-13 07:20:18 -07:00
kasan kasan: give better names to shadow values 2022-05-13 07:20:19 -07:00
kfence kfence: enable check kfence canary on panic via boot param 2022-05-13 07:20:06 -07:00
backing-dev.c remove congestion tracking framework 2022-03-22 15:57:01 -07:00
balloon_compaction.c mm/balloon_compaction: make balloon page compaction callbacks static 2022-03-28 16:52:57 -04:00
bootmem_info.c bootmem: Use page->index instead of page->freelist 2022-01-06 12:27:03 +01:00
cma_debug.c
cma_sysfs.c
cma.c mm/cma: provide option to opt out from exposing pages on activation failure 2022-03-22 15:57:09 -07:00
cma.h mm/cma: provide option to opt out from exposing pages on activation failure 2022-03-22 15:57:09 -07:00
compaction.c tracing: incorrect gfp_t conversion 2022-05-13 07:20:18 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: add tests for __HAVE_ARCH_PTE_SWP_EXCLUSIVE 2022-05-09 18:20:45 -07:00
debug.c mm: unexport page_init_poison 2022-03-24 19:06:45 -07:00
dmapool.c mm/dmapool.c: revert "make dma pool to use kmalloc_node" 2022-01-15 16:30:28 +02:00
early_ioremap.c mm/early_ioremap: declare early_memremap_pgprot_adjust() 2022-03-22 15:57:11 -07:00
fadvise.c remove inode_congested() 2022-03-22 15:57:01 -07:00
failslab.c
filemap.c mm: teach core mm about pte markers 2022-05-13 07:20:09 -07:00
folio-compat.c mm/rmap: Convert rmap_walk() to take a folio 2022-03-21 13:01:35 -04:00
frontswap.c frontswap: remove support for multiple ops 2022-01-22 08:33:38 +02:00
gup_test.c
gup_test.h
gup.c mm/gup: fix comments to pin_user_pages_*() 2022-05-09 18:20:47 -07:00
highmem.c highmem: fix checks in __kmap_local_sched_{in,out} 2022-04-08 14:20:36 -10:00
hmm.c mm: teach core mm about pte markers 2022-05-13 07:20:09 -07:00
huge_memory.c mm/huge_memory: convert do_huge_pmd_anonymous_page() to use vma_alloc_folio() 2022-05-13 07:20:14 -07:00
hugetlb_cgroup.c hugetlb: add hugetlb.*.numa_stat file 2022-01-15 16:30:29 +02:00
hugetlb_vmemmap.c mm/hugetlb_vmemmap: move comment block to Documentation/vm 2022-04-28 23:16:15 -07:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP* 2022-04-28 23:16:15 -07:00
hugetlb.c mm/hugetlb: handle uffd-wp during fork() 2022-05-13 07:20:11 -07:00
hwpoison-inject.c mm/hwpoison: disable hwpoison filter during removing 2022-05-13 07:20:19 -07:00
init-mm.c kernel/fork: Initialize mm's PASID 2022-02-14 19:51:47 +01:00
internal.h mm/memory-failure.c: move clear_hwpoisoned_pages 2022-05-13 07:20:19 -07:00
interval_tree.c
io-mapping.c
ioremap.c mm: move ioremap_page_range to vmalloc.c 2021-09-08 11:50:24 -07:00
Kconfig mm/uffd: move USERFAULTFD configs into mm/ 2022-05-13 07:20:12 -07:00
Kconfig.debug mm: page table check 2022-01-15 16:30:28 +02:00
khugepaged.c mm/khugepaged: don't recycle vma pgtable if uffd-wp registered 2022-05-13 07:20:11 -07:00
kmemleak.c mm: kmemleak: take a full lowmem check in kmemleak_*_phys() 2022-04-15 14:49:56 -07:00
ksm.c mm: remember exclusively mapped anonymous pages with PG_anon_exclusive 2022-05-09 18:20:44 -07:00
list_lru.c mm/list_lru.c: revert "mm/list_lru: optimize memcg_reparent_list_lru_node()" 2022-04-08 14:20:36 -10:00
maccess.c asm-generic updates for 5.18 2022-03-23 18:03:08 -07:00
madvise.c mm: submit multipage reads for SWP_FS_OPS swap-space 2022-05-09 18:20:49 -07:00
Makefile mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP* 2022-04-28 23:16:15 -07:00
mapping_dirty_helpers.c mm: move tlb_flush_pending inline helpers to mm_inline.h 2022-01-15 16:30:27 +02:00
memblock.c memblock: test suite and a small cleanup 2022-03-27 13:36:06 -07:00
memcontrol.c swap: turn get_swap_page() into folio_alloc_swap() 2022-05-13 07:20:15 -07:00
memfd.c memfd: fix F_SEAL_WRITE after shmem huge page allocated 2022-03-05 11:08:32 -08:00
memory_hotplug.c mm: make alloc_contig_range work at pageblock granularity 2022-05-13 07:20:13 -07:00
memory-failure.c mm/memory-failure.c: simplify num_poisoned_pages_inc/dec 2022-05-13 07:20:19 -07:00
memory.c mm/shmem: remove duplicate include in memory.c 2022-05-13 07:20:14 -07:00
mempolicy.c mm: remove alloc_pages_vma() 2022-05-13 07:20:15 -07:00
mempool.c mm: remove spurious blkdev.h includes 2021-10-18 06:17:01 -06:00
memremap.c mm/page-flags: reuse PG_mappedtodisk as PG_anon_exclusive for PageAnon() pages 2022-05-09 18:20:44 -07:00
memtest.c
migrate_device.c mm: remember exclusively mapped anonymous pages with PG_anon_exclusive 2022-05-09 18:20:44 -07:00
migrate.c mm/migrate: convert move_to_new_page() into move_to_new_folio() 2022-05-13 07:20:17 -07:00
mincore.c mm: teach core mm about pte markers 2022-05-13 07:20:09 -07:00
mlock.c mm/munlock: protect the per-CPU pagevec by a local_lock_t 2022-04-01 11:46:09 -07:00
mm_init.c
mmap_lock.c
mmap.c mmap locking API: fix missed mmap_sem references in comments 2022-05-13 07:20:07 -07:00
mmu_gather.c mm/mmu_gather: limit free batch count and add schedule point in tlb_batch_pages_flush 2022-04-28 23:16:12 -07:00
mmu_notifier.c mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() 2022-04-21 20:01:10 -07:00
mmzone.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
mprotect.c mm/hugetlb: handle UFFDIO_WRITEPROTECT 2022-05-13 07:20:11 -07:00
mremap.c mm: hugetlb: considering PMD sharing when flushing cache/TLBs 2022-05-13 07:20:07 -07:00
msync.c
nommu.c no-MMU: expose vmalloc_huge() for alloc_large_system_hash() 2022-04-25 10:11:49 -07:00
oom_kill.c oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup 2022-04-21 20:01:10 -07:00
page_alloc.c mm/memory-failure.c: simplify num_poisoned_pages_dec 2022-05-13 07:20:19 -07:00
page_counter.c mm/page_counter: remove an incorrect call to propagate_protected_usage() 2022-01-15 16:30:27 +02:00
page_ext.c mm: use for_each_online_node and node_online instead of open coding 2022-04-29 14:36:58 -07:00
page_idle.c mm/rmap: Constify the rmap_walk_control argument 2022-03-21 13:01:35 -04:00
page_io.c MM: handle THP in swap_*page_fs() - count_vm_events() 2022-05-09 18:20:49 -07:00
page_isolation.c mm: page_isolation: enable arbitrary range page isolation. 2022-05-13 07:20:13 -07:00
page_owner.c mm/page_owner: use strscpy() instead of strlcpy() 2022-05-13 07:20:19 -07:00
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c mm: page_table_check: move pxx_user_accessible_page into x86 2022-05-13 07:20:17 -07:00
page_vma_mapped.c mm: pvmw: add support for walking devmap pages 2022-04-28 23:16:10 -07:00
page-writeback.c mm: rework calculation of bdi_min_ratio in bdi_set_min_ratio 2022-04-28 23:15:57 -07:00
pagewalk.c
percpu-internal.h percpu: improve percpu_alloc_percpu event trace 2022-05-13 07:20:18 -07:00
percpu-km.c
percpu-stats.c mm: use vmalloc_array and vcalloc for array allocations 2022-03-08 09:30:46 -05:00
percpu-vm.c
percpu.c percpu: improve percpu_alloc_percpu event trace 2022-05-13 07:20:18 -07:00
pgalloc-track.h
pgtable-generic.c mm: avoid unnecessary flush on change_huge_pmd() 2022-05-13 07:20:05 -07:00
process_vm_access.c
ptdump.c mm: sparsemem: use page table lock to protect kernel pmd operations 2022-03-22 15:57:08 -07:00
readahead.c readahead: Update comments 2022-04-01 14:40:42 -04:00
rmap.c mm: rmap: fix CONT-PTE/PMD size hugetlb issue when migration 2022-05-13 16:48:55 -07:00
rodata_test.c
secretmem.c mm/secretmem: fix panic when growing a memfd_secret 2022-04-15 14:49:54 -07:00
shmem.c mm/shmem: convert shmem_swapin_page() to shmem_swapin_folio() 2022-05-13 07:20:17 -07:00
shuffle.c
shuffle.h
slab_common.c mm: make minimum slab alignment a runtime property 2022-05-13 07:20:07 -07:00
slab.c mm: make minimum slab alignment a runtime property 2022-05-13 07:20:07 -07:00
slab.h mm, kfence: support kmem_dump_obj() for KFENCE objects 2022-04-15 14:49:55 -07:00
slob.c mm: make minimum slab alignment a runtime property 2022-05-13 07:20:07 -07:00
slub.c mm, kfence: support kmem_dump_obj() for KFENCE objects 2022-04-15 14:49:55 -07:00
sparse-vmemmap.c mm/sparse-vmemmap: improve memory savings for compound devmaps 2022-04-28 23:16:16 -07:00
sparse.c mm/memory-failure.c: move clear_hwpoisoned_pages 2022-05-13 07:20:19 -07:00
swap_cgroup.c mm: use vmalloc_array and vcalloc for array allocations 2022-03-08 09:30:46 -05:00
swap_slots.c swap: turn get_swap_page() into folio_alloc_swap() 2022-05-13 07:20:15 -07:00
swap_state.c swap: convert add_to_swap() to take a folio 2022-05-13 07:20:15 -07:00
swap.c mm/munlock: protect the per-CPU pagevec by a local_lock_t 2022-04-01 11:46:09 -07:00
swap.h swap: convert add_to_swap() to take a folio 2022-05-13 07:20:15 -07:00
swapfile.c swap: turn get_swap_page() into folio_alloc_swap() 2022-05-13 07:20:15 -07:00
truncate.c Filesystem folio changes for 5.18 2022-03-22 18:26:56 -07:00
usercopy.c Merge branch 'akpm' (patches from Andrew) 2022-03-22 16:11:53 -07:00
userfaultfd.c mm/uffd: enable write protection for shmem & hugetlbfs 2022-05-13 07:20:11 -07:00
util.c mm: create new mm/swap.h header file 2022-05-09 18:20:47 -07:00
vmacache.c
vmalloc.c mm/vmalloc: use raw_cpu_ptr() for vmap_block_queue access 2022-05-13 07:20:18 -07:00
vmpressure.c mm/vmpressure: fix data-race with memcg->socket_pressure 2021-11-06 13:30:40 -07:00
vmscan.c vmscan: remove remaining uses of page in shrink_page_list 2022-05-13 07:20:16 -07:00
vmstat.c mm/vmstat: add events for ksm cow 2022-04-28 23:16:16 -07:00
workingset.c memcg: sync flush only if periodic flush is delayed 2022-04-21 20:01:09 -07:00
z3fold.c mm/z3fold: remove unneeded PAGE_HEADLESS check in free_handle() 2022-04-28 23:16:06 -07:00
zbud.c
zpool.c zpool: remove the list of pools_head 2022-01-15 16:30:31 +02:00
zsmalloc.c zsmalloc: replace get_cpu_var with local_lock 2022-01-22 08:33:37 +02:00
zswap.c mm: create new mm/swap.h header file 2022-05-09 18:20:47 -07:00