linux/include
Baolin Wang 42d0c3fbb5 mm: hugetlb: make the hugetlb migration strategy consistent
As discussed in a previous thread [1], there is an inconsistency when
handling hugetlb migration.  When handling the migration of freed hugetlb,
it prevents fallback to other NUMA nodes in
alloc_and_dissolve_hugetlb_folio().  However, when dealing with in-use
hugetlb, it allows fallback to other NUMA nodes in
alloc_hugetlb_folio_nodemask(), which can break the per-node hugetlb pool
and might result in unexpected failures when node-bound workloads don't
get what is assumed to be available.

To make the hugetlb migration strategy clearer, we should list all the
scenarios of hugetlb migration and analyze whether allocation fallback is
permitted:

1) Memory offline: will call dissolve_free_huge_pages() to free the
   freed hugetlb, and call do_migrate_range() to migrate the in-use
   hugetlb.  Both can break the per-node hugetlb pool, but as this is an
   explicit offlining operation, there is no better choice.  So the
   hugetlb allocation fallback should be allowed.

2) Memory failure: same as memory offline.  Fallback should be allowed,
   since a different node might be the only option to handle it;
   otherwise the impact of poisoned memory can be amplified.

3) Longterm pinning: will call migrate_longterm_unpinnable_pages() to
   migrate in-use and not-longterm-pinnable hugetlb, which can break the
   per-node pool.  But the longterm pinning should fail if the allocation
   cannot be satisfied on the current node, to avoid breaking the
   per-node pool.

4) Syscalls (mbind, migrate_pages, move_pages): these are explicit
   user operations to move pages to other nodes, so fallback to other
   nodes should not be prohibited.

5) alloc_contig_range: used by CMA allocation and virtio-mem
   fake-offline to allocate a given range of pages.  The freed hugetlb
   migration is currently not allowed to fall back, so, to stay
   consistent, the in-use hugetlb migration should not be allowed to
   fall back either.

6) alloc_contig_pages: used by kfence, pgtable_debug etc.  The strategy
   should be consistent with that of alloc_contig_range().

Based on the analysis of the various scenarios above, introduce a new
helper to determine whether fallback is permitted according to the
migration reason.
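
The scenario analysis above can be condensed into a single decision
function.  The following is only a minimal sketch of that mapping, not the
helper from the patch itself: the enum and the function name are
hypothetical stand-ins chosen for illustration, and the real helper may use
different names and reason values.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the migration reasons discussed above.
 * These names are assumptions for this sketch only.
 */
enum sketch_migrate_reason {
	SKETCH_MR_MEMORY_OFFLINE,	/* scenario 1 */
	SKETCH_MR_MEMORY_FAILURE,	/* scenario 2 */
	SKETCH_MR_LONGTERM_PIN,		/* scenario 3 */
	SKETCH_MR_SYSCALL,		/* scenario 4: mbind, migrate_pages, move_pages */
	SKETCH_MR_CONTIG_RANGE,		/* scenarios 5 and 6 */
};

/*
 * Return true when a hugetlb allocation made for this migration reason
 * may fall back to another NUMA node, false when it should stay on the
 * current node to preserve the per-node hugetlb pool.
 */
static inline bool sketch_allow_alloc_fallback(enum sketch_migrate_reason reason)
{
	switch (reason) {
	case SKETCH_MR_MEMORY_OFFLINE:
	case SKETCH_MR_MEMORY_FAILURE:
	case SKETCH_MR_SYSCALL:
		/* Explicit operations with no better choice: allow fallback. */
		return true;
	default:
		/* Longterm pinning, contiguous allocation, anything else. */
		return false;
	}
}
```

Defaulting to "no fallback" keeps any reason not listed in the analysis on
the conservative side, so the per-node pool is only broken in the cases that
were explicitly judged acceptable.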

[1] https://lore.kernel.org/all/6f26ce22d2fcd523418a085f2c588fe0776d46e7.1706794035.git.baolin.wang@linux.alibaba.com/
Link: https://lkml.kernel.org/r/3519fcd41522817307a05b40fb551e2e17e68101.1709719720.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:56:06 -07:00
..
acpi mm: change inlined allocation helpers to account at the call site 2024-04-25 20:55:59 -07:00
asm-generic mm: change inlined allocation helpers to account at the call site 2024-04-25 20:55:59 -07:00
clocksource
crypto mm: change inlined allocation helpers to account at the call site 2024-04-25 20:55:59 -07:00
drm drm fixes for 6.9-rc1 2024-03-21 19:04:31 -07:00
dt-bindings Char/Misc and other driver subsystem updates for 6.9-rc1 2024-03-21 13:21:31 -07:00
keys
kunit
kvm KVM: arm64: Fix host-programmed guest events in nVHE 2024-03-26 01:51:44 -07:00
linux mm: hugetlb: make the hugetlb migration strategy consistent 2024-04-25 20:56:06 -07:00
math-emu
media media updates for v6.9-rc1 2024-03-15 11:36:54 -07:00
memory
misc
net mm: change inlined allocation helpers to account at the call site 2024-04-25 20:55:59 -07:00
pcmcia
ras PCI/AER: Generalize TLP Header Log reading 2024-03-08 15:26:46 -06:00
rdma fix missing vmalloc.h includes 2024-04-25 20:55:49 -07:00
rv
scsi scsi: sd: Fix TCG OPAL unlock on system resume 2024-03-25 15:46:12 -04:00
soc Char/Misc and other driver subsystem updates for 6.9-rc1 2024-03-21 13:21:31 -07:00
sound ASoC: Fixes for v6.9 2024-04-05 08:48:12 +02:00
target
trace mm: free up PG_slab 2024-04-25 20:56:00 -07:00
uapi vhost-vdpa: change ioctl # for VDPA_GET_VRING_SIZE 2024-04-08 04:11:04 -04:00
ufs scsi: ufs: core: Add config_scsi_dev vops comment 2024-03-10 18:10:24 -04:00
vdso vdso: Use CONFIG_PAGE_SHIFT in vdso/datapage.h 2024-04-03 21:50:04 +02:00
video
xen