linux/mm
Gang Li fc37bbb328 hugetlb: code clean for hugetlb_hstate_alloc_pages
Patch series "hugetlb: parallelize hugetlb page init on boot", v6.

Introduction
------------
Hugetlb initialization during boot takes up a considerable amount of time.
For instance, on a 2TB system, initializing 1,800 1GB huge pages takes 1-2
seconds of a roughly 10-second boot.  Initializing 11,776 1GB pages on a
12TB Intel host takes more than 1 minute[1].  This is a noteworthy figure.

Inspired by [2] and [3], hugetlb initialization can also be accelerated
through parallelization.  The kernel already has infrastructure for this in
padata_do_multithreaded; this series uses it to achieve effective results
with minimal modifications.

[1] https://lore.kernel.org/all/783f8bac-55b8-5b95-eb6a-11a583675000@google.com/
[2] https://lore.kernel.org/all/20200527173608.2885243-1-daniel.m.jordan@oracle.com/
[3] https://lore.kernel.org/all/20230906112605.2286994-1-usama.arif@bytedance.com/
[4] https://lore.kernel.org/all/76becfc1-e609-e3e8-2966-4053143170b6@google.com/

max_threads
-----------
This series uses `padata_do_multithreaded` like this:

```
job.max_threads	= num_node_state(N_MEMORY) * multiplier;
padata_do_multithreaded(&job);
```
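
For context, a minimal sketch of how such a job might be filled in follows.
The field names match struct padata_mt_job in include/linux/padata.h; the
thread function, chunk sizing, and multiplier value here are illustrative
assumptions rather than the exact code of this series.

```
#include <linux/padata.h>
#include <linux/nodemask.h>
#include <linux/hugetlb.h>

/* Illustrative per-range worker: allocate huge pages for indices [start, end). */
static void __init hugetlb_alloc_range(unsigned long start,
				       unsigned long end, void *arg)
{
	/* arg is the struct hstate *; allocate (end - start) huge pages here */
}

static void __init hugetlb_pages_alloc_parallel(struct hstate *h)
{
	unsigned int multiplier = 2;		/* see the table below */
	struct padata_mt_job job = {
		.thread_fn	= hugetlb_alloc_range,
		.fn_arg		= h,
		.start		= 0,
		.size		= h->max_huge_pages,
		.align		= 1,
		.min_chunk	= 1,
	};

	job.max_threads	= num_node_state(N_MEMORY) * multiplier;
	padata_do_multithreaded(&job);
}
```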

To fully utilize the CPUs, the number of parallel threads needs to be
chosen carefully.  `max_threads = num_node_state(N_MEMORY)` alone does not
keep every CPU busy, so it is scaled by a multiplier.

Tests below indicate that a multiplier of 2 significantly improves
performance, and although larger values also provide improvements, the
gains are marginal.

  multiplier     1       2       3       4       5
 ------------ ------- ------- ------- ------- -------
  256G 2node   358ms   215ms   157ms   134ms   126ms
  2T   4node   979ms   679ms   543ms   489ms   481ms
  50G  2node   71ms    44ms    37ms    30ms    31ms

Therefore, choosing 2 as the multiplier strikes a good balance between
enhancing parallel processing capabilities and maintaining efficient
resource management.

Test result
-----------
      test case       no patch(ms)   patched(ms)   saved
 ------------------- -------------- ------------- --------
  256c2T(4 node) 1G           4745          2024   57.34%
  128c1T(2 node) 1G           3358          1712   49.02%
     12T         1G          77000         18300   76.23%

  256c2T(4 node) 2M           3336          1051   68.52%
  128c1T(2 node) 2M           1943           716   63.15%


This patch (of 8):

The readability of `hugetlb_hstate_alloc_pages` is poor.  Cleaning up the
code improves its readability and facilitates future modifications.

This patch extracts two functions to reduce the complexity of
`hugetlb_hstate_alloc_pages`; there are no functional changes.

- hugetlb_hstate_alloc_pages_node_specific() iterates through each online
  node and performs the allocation if necessary.
- hugetlb_hstate_alloc_pages_report() reports any allocation error and
  updates h->max_huge_pages accordingly; see the sketch below.
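
The following is a rough reconstruction of the two helpers, assuming the
existing hugetlb_hstate_alloc_pages_onenode() path and the usual
string_get_size() warning; the exact bodies in the patch may differ.

```
static bool __init hugetlb_hstate_alloc_pages_node_specific(struct hstate *h)
{
	int i;
	bool node_specific_alloc = false;

	/* Honor per-node "hugepages=...:N" requests, if any were given. */
	for_each_online_node(i) {
		if (h->max_huge_pages_node[i] > 0) {
			hugetlb_hstate_alloc_pages_onenode(h, i);
			node_specific_alloc = true;
		}
	}

	return node_specific_alloc;
}

static void __init hugetlb_hstate_alloc_pages_report(struct hstate *h,
						     unsigned long allocated)
{
	if (allocated < h->max_huge_pages) {
		char buf[32];

		/* Report the shortfall and record what was actually allocated. */
		string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
		pr_warn("HugeTLB: allocating %lu of page size %s failed.  Only allocated %lu hugepages.\n",
			h->max_huge_pages, buf, allocated);
		h->max_huge_pages = allocated;
	}
}
```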

Link: https://lkml.kernel.org/r/20240222140422.393911-1-gang.li@linux.dev
Link: https://lkml.kernel.org/r/20240222140422.393911-2-gang.li@linux.dev
Signed-off-by: Gang Li <ligang.bdlg@bytedance.com>
Tested-by: David Rientjes <rientjes@google.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06 13:04:17 -08:00
damon mm: madvise: pageout: ignore references rather than clearing young 2024-03-04 17:01:18 -08:00
kasan kasan: fix a2 allocation and remove explicit cast in atomic tests 2024-03-04 17:01:17 -08:00
kfence KFENCE: cleanup kfence_guarded_alloc() after CONFIG_SLAB removal 2023-12-05 11:17:58 +01:00
kmsan mm: kmsan: remove runtime checks from kmsan_unpoison_memory() 2024-02-22 10:24:41 -08:00
backing-dev.c blk-wbt: Fix detection of dirty-throttled tasks 2024-02-06 09:44:03 -07:00
balloon_compaction.c
bootmem_info.c bootmem: use kmemleak_free_part_phys in put_page_bootmem 2023-10-25 16:47:13 -07:00
cma_debug.c
cma_sysfs.c mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
cma.c mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
cma.h mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
compaction.c mm/compaction: optimize >0 order folio compaction with free page split. 2024-02-23 17:48:33 -08:00
debug_page_alloc.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: fix BUG_ON with pud advanced test 2024-02-23 17:27:13 -08:00
debug.c
dmapool_test.c
dmapool.c mm/mempool/dmapool: remove CONFIG_DEBUG_SLAB ifdefs 2023-12-05 11:17:58 +01:00
early_ioremap.c
fadvise.c
fail_page_alloc.c
failslab.c
filemap.c mm: support order-1 folios in the page cache 2024-03-04 17:01:19 -08:00
folio-compat.c mm: remove page_add_new_anon_rmap and lru_cache_add_inactive_or_unevictable 2023-12-29 11:58:27 -08:00
gup_test.c
gup_test.h
gup.c mm: convert page_try_share_anon_rmap() to folio_try_share_anon_rmap_[pte|pmd]() 2023-12-29 11:58:56 -08:00
highmem.c x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs 2023-12-29 12:22:28 -08:00
hmm.c mm: enable page walking API to lock vmas during the walk 2023-08-21 13:07:20 -07:00
huge_memory.c mm: use folio more widely in __split_huge_page 2024-03-04 17:01:27 -08:00
hugetlb_cgroup.c mm, hugetlb: remove HUGETLB_CGROUP_MIN_ORDER 2023-10-18 14:34:17 -07:00
hugetlb_vmemmap.c mm: hugetlb_vmemmap: move mmap lock to vmemmap_remap_range() 2023-12-12 10:57:08 -08:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: fix reference to nonexistent file 2023-10-25 16:47:14 -07:00
hugetlb.c hugetlb: code clean for hugetlb_hstate_alloc_pages 2024-03-06 13:04:17 -08:00
hwpoison-inject.c
init-mm.c mm: Deprecate pasid field 2023-12-12 10:11:32 +01:00
internal.h mm: add alloc_contig_migrate_range allocation statistics 2024-03-04 17:01:27 -08:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: remove unneeded ioremap_allowed and iounmap_allowed 2023-08-18 10:12:36 -07:00
Kconfig Introduce cpu_dcache_is_aliasing() across all architectures 2024-02-22 15:27:19 -08:00
Kconfig.debug mm/slab: remove CONFIG_SLAB from all Kconfig and Makefile 2023-12-05 11:14:40 +01:00
khugepaged.c mm: convert free_swap_cache() to take a folio 2024-03-04 17:01:26 -08:00
kmemleak.c kmemleak: avoid RCU stalls when freeing metadata for per-CPU pointers 2023-12-12 10:57:07 -08:00
ksm.c mm: convert page_try_share_anon_rmap() to folio_try_share_anon_rmap_[pte|pmd]() 2023-12-29 11:58:56 -08:00
list_lru.c mm/zswap: stop lru list shrinking when encounter warm region 2024-02-22 10:24:54 -08:00
maccess.c
madvise.c mm: madvise: pageout: ignore references rather than clearing young 2024-03-04 17:01:18 -08:00
Makefile mm/slab: remove CONFIG_SLAB from all Kconfig and Makefile 2023-12-05 11:14:40 +01:00
mapping_dirty_helpers.c mm: fix clean_record_shared_mapping_range kernel-doc 2023-08-24 16:20:30 -07:00
memblock.c mm/memblock: add MEMBLOCK_RSRV_NOINIT into flagname[] array 2024-02-20 14:20:49 -08:00
memcontrol.c memcg: remove mem_cgroup_uncharge_list() 2024-03-04 17:01:25 -08:00
memfd.c mm/memfd: refactor memfd_tag_pins() and memfd_wait_for_pins() 2024-03-04 17:01:21 -08:00
memory_hotplug.c mm/memory_hotplug: export mhp_supports_memmap_on_memory() 2024-02-22 10:24:40 -08:00
memory-failure.c mm/memory-failure: fix crash in split_huge_page_to_list from soft_offline_page 2024-02-07 21:20:34 -08:00
memory-tiers.c mm/demotion: print demotion targets 2024-02-22 10:24:55 -08:00
memory.c mm/memory.c: do_numa_page(): remove a redundant page table read 2024-03-04 17:01:27 -08:00
mempolicy.c mm/mempolicy: protect task interleave functions with tsk->mems_allowed_seq 2024-02-22 10:24:47 -08:00
mempool.c Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
memremap.c mm: remove stale example from comment 2023-12-29 11:58:26 -08:00
memtest.c mm: memtest: convert to memtest_report_meminfo() 2023-08-21 13:37:47 -07:00
migrate_device.c mm: convert page_try_share_anon_rmap() to folio_try_share_anon_rmap_[pte|pmd]() 2023-12-29 11:58:56 -08:00
migrate.c merge mm-hotfixes-stable into mm-nonmm-stable to pick up stackdepot changes 2024-02-23 17:28:43 -08:00
mincore.c mm: enable page walking API to lock vmas during the walk 2023-08-21 13:07:20 -07:00
mlock.c mm: make folios_put() the basis of release_pages() 2024-03-04 17:01:22 -08:00
mm_init.c efi: disable mirror feature during crashkernel 2024-01-12 15:20:47 -08:00
mm_slot.h
mmap_lock.c
mmap.c mm/mmap: pass vma to vma_merge() 2024-02-22 10:24:52 -08:00
mmu_gather.c mm/mmu_gather: improve cond_resched() handling with large folios and expensive page freeing 2024-02-22 15:27:17 -08:00
mmu_notifier.c mmu_notifiers: rename invalidate_range notifier 2023-08-18 10:12:41 -07:00
mmzone.c zswap: shrink zswap pool based on memory pressure 2023-12-12 10:57:02 -08:00
mprotect.c mprotect: use pfn_swap_entry_folio 2024-02-21 16:00:03 -08:00
mremap.c mm: abstract VMA merge and extend into vma_merge_extend() helper 2023-10-18 14:34:18 -07:00
msync.c
nommu.c mm/vmalloc: remove vmap_area_list 2024-02-23 17:48:19 -08:00
oom_kill.c mm: update mark_victim tracepoints fields 2024-03-04 17:01:16 -08:00
page_alloc.c mm: add alloc_contig_migrate_range allocation statistics 2024-03-04 17:01:27 -08:00
page_counter.c
page_ext.c
page_idle.c
page_io.c zswap: memcontrol: implement zswap writeback disabling 2023-12-29 20:22:11 -08:00
page_isolation.c mm: add alloc_contig_migrate_range allocation statistics 2024-03-04 17:01:27 -08:00
page_owner.c mm: page_owner: add support for splitting to any order in split page_owner 2024-03-04 17:01:20 -08:00
page_poison.c mm/page_poison: replace kmap_atomic() with kmap_local_page() 2023-12-10 16:51:50 -08:00
page_reporting.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
page_reporting.h
page_table_check.c mm: convert page_table_check_pte_set() to page_table_check_ptes_set() 2023-08-24 16:20:18 -07:00
page_vma_mapped.c mm: thp: introduce multi-size THP sysfs interface 2023-12-20 14:48:12 -08:00
page-writeback.c writeback: remove a use of write_cache_pages() from do_writepages() 2024-02-23 17:48:38 -08:00
pagewalk.c mm: pagewalk: assert write mmap lock only for walking the user page tables 2023-12-10 16:51:53 -08:00
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c mm: Introduce flush_cache_vmap_early() 2023-12-14 00:23:17 -08:00
pgalloc-track.h
pgtable-generic.c
process_vm_access.c mm: fix process_vm_rw page counts 2023-12-10 16:51:39 -08:00
ptdump.c mm: ptdump: add check_wx_pages debugfs attribute 2024-02-22 10:24:47 -08:00
readahead.c mm: support order-1 folios in the page cache 2024-03-04 17:01:19 -08:00
rmap.c rmap: replace two calls to compound_order with folio_order 2024-02-22 15:27:20 -08:00
rodata_test.c
secretmem.c mm/secretmem: use a folio in secretmem_fault() 2023-08-21 13:38:02 -07:00
shmem_quota.c
shmem.c shmem: properly report quota mount options 2024-02-23 17:48:34 -08:00
show_mem.c mm, treewide: introduce NR_PAGE_ORDERS 2024-01-08 15:27:15 -08:00
shrinker_debug.c mm: shrinker: convert shrinker_rwsem to mutex 2023-10-04 10:32:26 -07:00
shrinker.c mm: shrinker: use kvzalloc_node() from expand_one_shrinker_info() 2024-01-05 09:58:32 -08:00
shuffle.c
shuffle.h mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
slab_common.c slub: use a folio in __kmalloc_large_node 2024-01-05 10:17:46 -08:00
slab.h mm/slab: move kmalloc() functions from slab_common.c to slub.c 2023-12-06 11:57:21 +01:00
slub.c Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
sparse-vmemmap.c mm/vmemmap: allow architectures to override how vmemmap optimization works 2023-08-18 10:12:53 -07:00
sparse.c mm/memory_hotplug: introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers 2024-02-21 16:00:01 -08:00
swap_cgroup.c
swap_slots.c mm/zswap: invalidate zswap entry when swap entry free 2024-02-22 10:24:54 -08:00
swap_state.c mm: convert free_swap_cache() to take a folio 2024-03-04 17:01:26 -08:00
swap.c mm: allow non-hugetlb large folios to be batch processed 2024-03-04 17:01:24 -08:00
swap.h mm/swap: fix race when skipping swapcache 2024-02-20 14:20:48 -08:00
swapfile.c mm/swapfile:__swap_duplicate: drop redundant WRITE_ONCE on swap_map for err cases 2024-02-23 17:48:34 -08:00
truncate.c fs: convert error_remove_page to error_remove_folio 2023-12-10 16:51:42 -08:00
usercopy.c
userfaultfd.c userfaultfd: use per-vma locks in userfaultfd operations 2024-02-22 15:27:20 -08:00
util.c mm/util.c: add byte count to __vm_enough_memory failure warning 2024-03-04 17:01:14 -08:00
vmalloc.c mm: vmalloc: refactor vmalloc_dump_obj() function 2024-02-23 17:48:21 -08:00
vmpressure.c eventfd: simplify eventfd_signal() 2023-11-28 14:08:38 +01:00
vmscan.c mm: free folios directly in move_folios_to_lru() 2024-03-04 17:01:25 -08:00
vmstat.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
workingset.c mm: ratelimit stat flush from workingset shrinker 2024-01-05 10:17:45 -08:00
z3fold.c mm/z3fold: fix the comment for __encode_handle() 2024-02-23 17:48:31 -08:00
zbud.c
zpool.c
zsmalloc.c mm/zsmalloc: don't need to reserve LSB in handle 2024-03-04 17:01:28 -08:00
zswap.c mm/zswap: change zswap_pool kref to percpu_ref 2024-03-04 17:01:13 -08:00