A mirror of the official Linux kernel repository just in case
Go to file
Baolin Wang 42d0c3fbb5 mm: hugetlb: make the hugetlb migration strategy consistent
As discussed in previous thread [1], there is an inconsistency when
handing hugetlb migration.  When handling the migration of freed hugetlb,
it prevents fallback to other NUMA nodes in
alloc_and_dissolve_hugetlb_folio().  However, when dealing with in-use
hugetlb, it allows fallback to other NUMA nodes in
alloc_hugetlb_folio_nodemask(), which can break the per-node hugetlb pool
and might result in unexpected failures when node bound workloads doesn't
get what is asssumed available.

To make hugetlb migration strategy more clear, we should list all the scenarios
of hugetlb migration and analyze whether allocation fallback is permitted:

1) Memory offline: will call dissolve_free_huge_pages() to free the
   freed hugetlb, and call do_migrate_range() to migrate the in-use
   hugetlb.  Both can break the per-node hugetlb pool, but as this is an
   explicit offlining operation, no better choice.  So should allow the
   hugetlb allocation fallback.

2) Memory failure: same as memory offline.  Should allow fallback to a
   different node might be the only option to handle it, otherwise the
   impact of poisoned memory can be amplified.

3) Longterm pinning: will call migrate_longterm_unpinnable_pages() to
   migrate in-use and not-longterm-pinnable hugetlb, which can break the
   per-node pool.  But we should fail to longterm pinning if can not
   allocate on current node to avoid breaking the per-node pool.

4) Syscalls (mbind, migrate_pages, move_pages): these are explicit
   users operation to move pages to other nodes, so fallback to other
   nodes should not be prohibited.

5) alloc_contig_range: used by CMA allocation and virtio-mem
   fake-offline to allocate given range of pages.  Now the freed hugetlb
   migration is not allowed to fallback, to keep consistency, the in-use
   hugetlb migration should be also not allowed to fallback.

6) alloc_contig_pages: used by kfence, pgtable_debug etc.  The strategy
   should be consistent with that of alloc_contig_range().

Based on the analysis of the various scenarios above, introducing a new
helper to determine whether fallback is permitted according to the
migration reason..

[1] https://lore.kernel.org/all/6f26ce22d2fcd523418a085f2c588fe0776d46e7.1706794035.git.baolin.wang@linux.alibaba.com/
Link: https://lkml.kernel.org/r/3519fcd41522817307a05b40fb551e2e17e68101.1709719720.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:56:06 -07:00
arch mm: vmalloc: enable memory allocation profiling 2024-04-25 20:55:57 -07:00
block block-6.9-20240412 2024-04-12 10:22:33 -07:00
certs This update includes the following changes: 2023-11-02 16:15:30 -10:00
crypto This push fixes a regression that broke iwd as well as a divide by 2024-03-25 10:48:23 -07:00
Documentation memprofiling: documentation 2024-04-25 20:55:58 -07:00
drivers mm: change inlined allocation helpers to account at the call site 2024-04-25 20:55:59 -07:00
fs mm: change inlined allocation helpers to account at the call site 2024-04-25 20:55:59 -07:00
include mm: hugetlb: make the hugetlb migration strategy consistent 2024-04-25 20:56:06 -07:00
init mm: introduce slabobj_ext to support slab object extensions 2024-04-25 20:55:51 -07:00
io_uring io_uring/net: restore msg_control on sendzc retry 2024-04-08 21:48:41 -06:00
ipc sysctl changes for v6.9-rc1 2024-03-18 14:59:13 -07:00
kernel mm: free up PG_slab 2024-04-25 20:56:00 -07:00
lib alloc_tag: Tighten file permissions on /proc/allocinfo 2024-04-25 20:55:59 -07:00
LICENSES LICENSES: Add the copyleft-next-0.3.1 license 2022-11-08 15:44:01 +01:00
mm mm: hugetlb: make the hugetlb migration strategy consistent 2024-04-25 20:56:06 -07:00
net mm: change inlined allocation helpers to account at the call site 2024-04-25 20:55:59 -07:00
rust rust: add a rust helper for krealloc() 2024-04-25 20:55:55 -07:00
samples Tracing updates for 6.9: 2024-03-18 15:11:44 -07:00
scripts lib: add allocation tagging support for memory allocation profiling 2024-04-25 20:55:52 -07:00
security security: Place security_path_post_mknod() where the original IMA call was 2024-04-03 10:21:32 -07:00
sound fix missing vmalloc.h includes 2024-04-25 20:55:49 -07:00
tools selftests/mm: run_vmtests.sh: fix hugetlb mem size calculation 2024-04-25 20:56:02 -07:00
usr Kbuild updates for v6.8 2024-01-18 17:57:07 -08:00
virt KVM Xen and pfncache changes for 6.9: 2024-03-11 10:42:55 -04:00
.clang-format clang-format: Update with v6.7-rc4's for_each macro list 2023-12-08 23:54:38 +01:00
.cocciconfig
.editorconfig Add .editorconfig file for basic formatting 2023-12-28 16:22:47 +09:00
.get_maintainer.ignore Add Jeff Kirsher to .get_maintainer.ignore 2024-03-08 11:36:54 +00:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore kbuild: create a list of all built DTB files 2024-02-19 18:20:39 +09:00
.mailmap MAINTAINERS: update Naoya Horiguchi's email address 2024-04-16 15:39:51 -07:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Drop Gustavo Pimentel as PCI DWC Maintainer 2024-03-27 13:41:02 -05:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS MAINTAINERS: add entries for code tagging and memory allocation profiling 2024-04-25 20:55:58 -07:00
Makefile Linux 6.9-rc4 2024-04-14 13:38:39 -07:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.