linux/mm
Mike Kravetz e95a985178 hugetlb: skip to end of PT page mapping when pte not present
Patch series "hugetlb: speed up linear address scanning", v2.

At unmap, fork and remap time hugetlb address ranges are linearly scanned.
We can optimize these scans if the ranges are sparsely populated.

Also, enable page table "Lazy copy" for hugetlb at fork.

NOTE: Architectures not defining CONFIG_ARCH_WANT_GENERAL_HUGETLB need to
add an arch-specific version of hugetlb_mask_last_page() to take advantage
of the sparse address scanning improvements.  Baolin Wang added the routine
for arm64.  Other architectures which could be optimized are: ia64, mips,
parisc, powerpc, s390, sh and sparc.


This patch (of 4):

HugeTLB address ranges are linearly scanned during fork, unmap and remap
operations.  If a non-present entry is encountered, the code currently
continues to the next huge page aligned address.  However, a non-present
entry implies that the page table page for that entry is not present. 
Therefore, the linear scan can skip to the end of the range mapped by the
page table page.  This can speed up operations on large, sparsely populated
hugetlb mappings.
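As a rough worked example of the potential saving, assume x86-64 with 4 KiB base pages, 8-byte page table entries and 2 MiB (PMD-sized) huge pages. None of these parameters are stated in the commit message. Under those assumptions, one absent PMD page table page covers a range the old loop would have walked in 512 separate steps:

```latex
\begin{aligned}
\text{entries per page table page} &= \frac{4096\ \text{B}}{8\ \text{B}} = 512 \\
\text{range mapped by one PMD page} &= 512 \times 2\ \text{MiB} = 1\ \text{GiB} \\
\text{loop iterations over that range} &: 512\ \text{(step one huge page)} \;\rightarrow\; 1\ \text{(skip to end)}
\end{aligned}
```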

Create a new routine hugetlb_mask_last_page() that will return an address
mask.  When the mask is ORed with an address, the result will be the
address of the last huge page mapped by the associated page table page. 
Use this mask to update addresses in routines which linearly scan hugetlb
address ranges when a non-present pte is encountered.
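The following user-space C sketch models the mask-and-skip loop described above. It is an illustration under stated assumptions, not the kernel code. It assumes a 64-bit unsigned long and x86-64-style 4-level paging sizes. It also assumes the generic mask is "span of the page table page minus one huge page", i.e. PUD_SIZE - PMD_SIZE for PMD-sized pages. The names mask_last_page(), pt_page_present() and scan() are hypothetical: pt_page_present() stands in for huge_pte_offset() returning non-NULL.

```c
#include <stdbool.h>

/* Assumed x86-64-style sizes with 4-level paging (illustrative only). */
#define PMD_SIZE (1UL << 21)   /* 2 MiB */
#define PUD_SIZE (1UL << 30)   /* 1 GiB */
#define P4D_SIZE (1UL << 39)   /* 512 GiB */

/*
 * Hypothetical model of hugetlb_mask_last_page(): ORing the result with
 * an address yields the last huge page mapped by the same page table page.
 * Returning 0 disables skipping (addr |= 0 is a no-op).
 */
static unsigned long mask_last_page(unsigned long hp_size)
{
	if (hp_size == PUD_SIZE)
		return P4D_SIZE - PUD_SIZE;
	if (hp_size == PMD_SIZE)
		return PUD_SIZE - PMD_SIZE;
	return 0UL;
}

/*
 * Hypothetical stand-in for huge_pte_offset() != NULL: in this toy
 * address space only the PMD page covering [0, 1 GiB) exists.
 */
static bool pt_page_present(unsigned long addr)
{
	return addr < PUD_SIZE;
}

/* Count loop iterations for a linear scan over [start, end). */
static unsigned long scan(unsigned long start, unsigned long end,
			  unsigned long sz, bool skip)
{
	unsigned long mask = skip ? mask_last_page(sz) : 0UL;
	unsigned long iters = 0;

	for (unsigned long addr = start; addr < end; addr += sz) {
		iters++;
		if (!pt_page_present(addr)) {
			/* Jump to the last huge page of this PT page;
			 * the loop increment then moves past it. */
			addr |= mask;
			continue;
		}
		/* ... a present entry would be processed here ... */
	}
	return iters;
}
```

For a 4 GiB range scanned in 2 MiB steps, scan(0, 4 * PUD_SIZE, PMD_SIZE, false) visits 2048 addresses. With skipping enabled it visits 515: 512 in the populated first gigabyte, then one per absent PMD page.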

hugetlb_mask_last_page is tied to the implementation of huge_pte_offset,
since hugetlb_mask_last_page is called when huge_pte_offset returns NULL.
This patch only provides a complete hugetlb_mask_last_page implementation
when CONFIG_ARCH_WANT_GENERAL_HUGETLB is defined.  Architectures which
provide their own versions of huge_pte_offset can also provide their own
version of hugetlb_mask_last_page.

Link: https://lkml.kernel.org/r/20220621235620.291305-1-mike.kravetz@oracle.com
Link: https://lkml.kernel.org/r/20220621235620.291305-2-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: James Houghton <jthoughton@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:14:34 -07:00
damon mm/damon/lru_sort: fix potential memory leak in damon_lru_sort_init() 2022-07-17 17:14:27 -07:00
kasan kasan: separate double free case from invalid free 2022-07-17 17:14:33 -07:00
kfence mm/kfence: select random number before taking raw lock 2022-06-16 19:11:31 -07:00
backing-dev.c init: Initialize noop_backing_dev_info early 2022-06-16 10:55:57 +02:00
balloon_compaction.c mm/balloon_compaction: make balloon page compaction callbacks static 2022-03-28 16:52:57 -04:00
bootmem_info.c bootmem: Use page->index instead of page->freelist 2022-01-06 12:27:03 +01:00
cma_debug.c
cma_sysfs.c
cma.c Revert "mm/cma.c: remove redundant cma_mutex lock" 2022-05-13 15:11:26 -07:00
cma.h mm/cma: provide option to opt out from exposing pages on activation failure 2022-03-22 15:57:09 -07:00
compaction.c mm, docs: fix comments that mention mem_hotplug_end() 2022-07-03 18:08:50 -07:00
debug_page_ref.c
debug_vm_pgtable.c docs: rename Documentation/vm to Documentation/mm 2022-06-27 12:52:53 -07:00
debug.c mm: unexport page_init_poison 2022-03-24 19:06:45 -07:00
dmapool.c mm/dmapool.c: revert "make dma pool to use kmalloc_node" 2022-01-15 16:30:28 +02:00
early_ioremap.c mm/early_ioremap: declare early_memremap_pgprot_adjust() 2022-03-22 15:57:11 -07:00
fadvise.c riscv: compat: syscall: Add compat_sys_call_table implementation 2022-04-26 13:36:25 -07:00
failslab.c mm: fix missing handler for __GFP_NOWARN 2022-05-19 14:08:55 -07:00
filemap.c filemap: Handle sibling entries in filemap_get_read_batch() 2022-06-20 16:37:45 -04:00
folio-compat.c fs: Remove aop flags parameter from grab_cache_page_write_begin() 2022-05-08 14:28:19 -04:00
frontswap.c docs: rename Documentation/vm to Documentation/mm 2022-06-27 12:52:53 -07:00
gup_test.c mm: rename is_pinnable_page() to is_longterm_pinnable_page() 2022-07-17 17:14:27 -07:00
gup_test.h
gup.c mm/gup: migrate device coherent pages when pinning instead of failing 2022-07-17 17:14:28 -07:00
highmem.c highmem: fix checks in __kmap_local_sched_{in,out} 2022-04-08 14:20:36 -10:00
hmm.c mm: teach core mm about pte markers 2022-05-13 07:20:09 -07:00
huge_memory.c mm: khugepaged: reorg some khugepaged helpers 2022-07-17 17:14:33 -07:00
hugetlb_cgroup.c hugetlb: add hugetlb.*.numa_stat file 2022-01-15 16:30:29 +02:00
hugetlb_vmemmap.c mm: memory_hotplug: make hugetlb_optimize_vmemmap compatible with memmap_on_memory 2022-07-03 18:08:49 -07:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP* 2022-04-28 23:16:15 -07:00
hugetlb.c hugetlb: skip to end of PT page mapping when pte not present 2022-07-17 17:14:34 -07:00
hwpoison-inject.c mm/memory-failure: disable unpoison once hw error happens 2022-06-16 19:11:32 -07:00
init-mm.c kernel/fork: Initialize mm's PASID 2022-02-14 19:51:47 +01:00
internal.h mm/gup: migrate device coherent pages when pinning instead of failing 2022-07-17 17:14:28 -07:00
interval_tree.c
io-mapping.c
ioremap.c
Kconfig docs: rename Documentation/vm to Documentation/mm 2022-06-27 12:52:53 -07:00
Kconfig.debug Two followon fixes for the post-5.19 series "Use pageblock_order for cma 2022-05-27 11:40:49 -07:00
khugepaged.c mm: khugepaged: reorg some khugepaged helpers 2022-07-17 17:14:33 -07:00
kmemleak.c mm/kmemleak: prevent soft lockup in first object iteration loop of kmemleak_scan() 2022-06-16 19:48:32 -07:00
ksm.c mm: handling Non-LRU pages returned by vm_normal_pages 2022-07-17 17:14:28 -07:00
list_lru.c mm: kmem: make mem_cgroup_from_obj() vmalloc()-safe 2022-06-16 19:48:31 -07:00
maccess.c asm-generic updates for 5.18 2022-03-23 18:03:08 -07:00
madvise.c mm: handling Non-LRU pages returned by vm_normal_pages 2022-07-17 17:14:28 -07:00
Makefile mm: shrinkers: introduce debugfs interface for memory shrinkers 2022-07-03 18:08:40 -07:00
mapping_dirty_helpers.c mm: move tlb_flush_pending inline helpers to mm_inline.h 2022-01-15 16:30:27 +02:00
memblock.c mm: kmemleak: remove kmemleak_not_leak_phys() and the min_count argument to kmemleak_alloc_phys() 2022-06-16 19:48:30 -07:00
memcontrol.c mm: add zone device coherent type memory support 2022-07-17 17:14:27 -07:00
memfd.c memfd: fix F_SEAL_WRITE after shmem huge page allocated 2022-03-05 11:08:32 -08:00
memory_hotplug.c mm: memory_hotplug: make hugetlb_optimize_vmemmap compatible with memmap_on_memory 2022-07-03 18:08:49 -07:00
memory-failure.c mm: introduce mf_dax_kill_procs() for fsdax case 2022-07-17 17:14:30 -07:00
memory.c mm: thp: kill __transhuge_page_enabled() 2022-07-17 17:14:33 -07:00
mempolicy.c mm: handling Non-LRU pages returned by vm_normal_pages 2022-07-17 17:14:28 -07:00
mempool.c mm/mempool: use might_alloc() 2022-06-16 19:48:30 -07:00
memremap.c mm: add zone device coherent type memory support 2022-07-17 17:14:27 -07:00
memtest.c
migrate_device.c mm/gup: migrate device coherent pages when pinning instead of failing 2022-07-17 17:14:28 -07:00
migrate.c mm: handling Non-LRU pages returned by vm_normal_pages 2022-07-17 17:14:28 -07:00
mincore.c mm: teach core mm about pte markers 2022-05-13 07:20:09 -07:00
mlock.c mm: handling Non-LRU pages returned by vm_normal_pages 2022-07-17 17:14:28 -07:00
mm_init.c
mmap_lock.c
mmap.c docs: rename Documentation/vm to Documentation/mm 2022-06-27 12:52:53 -07:00
mmu_gather.c mm/mmu_gather: limit free batch count and add schedule point in tlb_batch_pages_flush 2022-04-28 23:16:12 -07:00
mmu_notifier.c mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() 2022-04-21 20:01:10 -07:00
mmzone.c Folio changes for 5.18 2022-03-22 17:03:12 -07:00
mprotect.c mm: handling Non-LRU pages returned by vm_normal_pages 2022-07-17 17:14:28 -07:00
mremap.c Yang Shi has improved the behaviour of khugepaged collapsing of readonly 2022-05-26 12:32:41 -07:00
msync.c
nommu.c no-MMU: expose vmalloc_huge() for alloc_large_system_hash() 2022-04-25 10:11:49 -07:00
oom_kill.c mm/oom_kill.c: fix vm_oom_kill_table[] ifdeffery 2022-06-01 15:57:16 -07:00
page_alloc.c mm/page_alloc: make the annotations of available memory more accurate 2022-07-03 18:08:50 -07:00
page_counter.c mm/page_counter: remove an incorrect call to propagate_protected_usage() 2022-01-15 16:30:27 +02:00
page_ext.c mm: use for_each_online_node and node_online instead of open coding 2022-04-29 14:36:58 -07:00
page_idle.c mm: don't be stuck to rmap lock on reclaim path 2022-05-19 14:08:54 -07:00
page_io.c Yang Shi has improved the behaviour of khugepaged collapsing of readonly 2022-05-26 12:32:41 -07:00
page_isolation.c mm/page_isolation.c: fix one kernel-doc comment 2022-06-16 19:11:30 -07:00
page_owner.c Yang Shi has improved the behaviour of khugepaged collapsing of readonly 2022-05-26 12:32:41 -07:00
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c Six hotfixes. One from Miaohe Lin is considered a minor thing so it isn't 2022-05-27 11:29:35 -07:00
page_vma_mapped.c mm/page_vma_mapped.c: check possible huge PMD map with transhuge_vma_suitable() 2022-07-03 18:08:37 -07:00
page-writeback.c sysctl changes for v5.19-rc1 2022-05-26 16:57:20 -07:00
pagewalk.c
percpu-internal.h percpu: improve percpu_alloc_percpu event trace 2022-05-13 07:20:18 -07:00
percpu-km.c
percpu-stats.c mm: use vmalloc_array and vcalloc for array allocations 2022-03-08 09:30:46 -05:00
percpu-vm.c
percpu.c percpu: improve percpu_alloc_percpu event trace 2022-05-13 07:20:18 -07:00
pgalloc-track.h
pgtable-generic.c mm: avoid unnecessary flush on change_huge_pmd() 2022-05-13 07:20:05 -07:00
process_vm_access.c
ptdump.c mm: sparsemem: use page table lock to protect kernel pmd operations 2022-03-22 15:57:08 -07:00
readahead.c filemap: Fix serialization adding transparent huge pages to page cache 2022-06-23 12:22:00 -04:00
rmap.c mm: add zone device coherent type memory support 2022-07-17 17:14:27 -07:00
rodata_test.c
secretmem.c secretmem: Convert to free_folio 2022-05-09 23:12:53 -04:00
shmem.c mm/swap: convert delete_from_swap_cache() to take a folio 2022-07-03 18:08:48 -07:00
shrinker_debug.c mm: shrinkers: add scan interface for shrinker debugfs 2022-07-03 18:08:41 -07:00
shuffle.c
shuffle.h
slab_common.c Yang Shi has improved the behaviour of khugepaged collapsing of readonly 2022-05-26 12:32:41 -07:00
slab.c mm/slab: delete cache_alloc_debugcheck_before() 2022-06-16 19:48:29 -07:00
slab.h slab changes for 5.19 2022-05-25 10:24:04 -07:00
slob.c mm: make minimum slab alignment a runtime property 2022-05-13 07:20:07 -07:00
slub.c mm/slub: add missing TID updates on slab deactivation 2022-06-13 17:41:36 +02:00
sparse-vmemmap.c mm: sparsemem: drop unexpected word 'a' in comments 2022-07-03 18:08:50 -07:00
sparse.c mm: memory_hotplug: enumerate all supported section flags 2022-07-03 18:08:49 -07:00
swap_cgroup.c mm: use vmalloc_array and vcalloc for array allocations 2022-03-08 09:30:46 -05:00
swap_slots.c mm/swap: remove buggy cache->nr check in refill_swap_slots_cache 2022-05-19 14:08:51 -07:00
swap_state.c mm/swap: convert __delete_from_swap_cache() to a folio 2022-07-03 18:08:48 -07:00
swap.c mm: convert destroy_compound_page() to destroy_large_folio() 2022-07-03 18:08:48 -07:00
swap.h mm/khugepaged: try to free transhuge swapcache when possible 2022-07-03 18:08:52 -07:00
swapfile.c mm/swap: convert delete_from_swap_cache() to take a folio 2022-07-03 18:08:48 -07:00
truncate.c Filesystem folio changes for 5.18 2022-03-22 18:26:56 -07:00
usercopy.c usercopy: Make usercopy resilient against ridiculously large copies 2022-06-13 09:54:52 -07:00
userfaultfd.c mm/uffd: enable write protection for shmem & hugetlbfs 2022-05-13 07:20:11 -07:00
util.c docs: rename Documentation/vm to Documentation/mm 2022-06-27 12:52:53 -07:00
vmacache.c
vmalloc.c mm/vmalloc: extend __find_vmap_area() with one more argument 2022-07-03 18:08:41 -07:00
vmpressure.c mm/vmpressure: fix data-race with memcg->socket_pressure 2021-11-06 13:30:40 -07:00
vmscan.c mm, docs: fix comments that mention mem_hotplug_end() 2022-07-03 18:08:50 -07:00
vmstat.c Bitmap patches for 5.19-rc1 2022-06-04 14:04:27 -07:00
workingset.c mm: shrinkers: provide shrinkers with names 2022-07-03 18:08:40 -07:00
z3fold.c mm/z3fold: fix z3fold_page_migrate races with z3fold_map 2022-05-27 09:33:44 -07:00
zbud.c
zpool.c zpool: remove the list of pools_head 2022-01-15 16:30:31 +02:00
zsmalloc.c mm: shrinkers: provide shrinkers with names 2022-07-03 18:08:40 -07:00
zswap.c zswap: memcg accounting 2022-05-19 14:08:53 -07:00