When creating hugetlb pages, the hugetlb code must first allocate
contiguous pages from a low level allocator such as buddy, cma or
memblock. The pages returned from these low level allocators are ref
counted. This creates potential issues with other code taking speculative
references on these pages before they can be transformed to a hugetlb
page. This issue has been addressed with methods and code such as that
provided in [1].

Recent discussions about vmemmap freeing [2] have indicated that it would
be beneficial to freeze all sub pages, including the head page, of pages
returned from low level allocators before converting them to a hugetlb
page. This helps avoid races if we want to replace the page containing
vmemmap for the head page.

There have been proposals to change at least the buddy allocator to return
frozen pages as described at [3]. If such a change is made, it can be
employed by the hugetlb code. However, as mentioned above, hugetlb uses
several low level allocators, so each would need to be modified to return
frozen pages. For now, we can manually freeze the returned pages. This
is done in two places:
1) alloc_buddy_huge_page: only the returned head page is ref counted.
   We freeze the head page, retrying once in the VERY rare case where
   there may be an inflated ref count.
2) prep_compound_gigantic_page: for gigantic pages, the current code
   freezes all pages except the head page. New code will simply freeze
   the head page as well.
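
As an illustration only, the freeze-and-retry step in 1) can be modeled in
userspace with C11 atomics. This is a sketch, not the kernel code: struct
model_page, struct model_pool and every model_* function are hypothetical
names, and it is an assumption here that "retrying" means giving the page
back and allocating again. Freezing is modeled as an atomic
compare-and-exchange of the expected count to 0, loosely following
page_ref_freeze().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the ref count is modeled. */
struct model_page {
	atomic_int refcount;
};

/* Hypothetical stand-in for a low level allocator: hands out pages in order. */
struct model_pool {
	struct model_page *pages;
	size_t count;
	size_t next;
};

/*
 * Model of freezing: atomically replace the expected count with 0.
 * Fails, leaving the count untouched, if the count is not the expected one
 * (for example because someone took a speculative reference).
 */
static bool model_ref_freeze(struct model_page *page, int expected)
{
	assert(page != NULL);
	return atomic_compare_exchange_strong(&page->refcount, &expected, 0);
}

/* Return the next page of the pool as is, or NULL when the pool is empty. */
static struct model_page *model_pool_alloc(struct model_pool *pool)
{
	if (pool->next >= pool->count)
		return NULL;
	return &pool->pages[pool->next++];
}

/*
 * Allocate a page and freeze it. If the count is inflated, drop our own
 * reference and try exactly one more allocation (assumed retry policy).
 */
static struct model_page *model_alloc_frozen(struct model_pool *pool)
{
	for (int attempt = 0; attempt < 2; attempt++) {
		struct model_page *page = model_pool_alloc(pool);

		if (!page)
			return NULL;
		if (model_ref_freeze(page, 1))
			return page;
		/* Inflated count: give up our reference on this page. */
		atomic_fetch_sub(&page->refcount, 1);
	}
	return NULL;
}
```

In this model, a caller presets the counts of the pool pages (1 for a
normal page, 2 for one with an extra speculative reference) and receives
either a page whose count is 0 or NULL after two failed attempts.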

In a few other places, code checks for inflated ref counts on newly
allocated hugetlb pages. With the modifications to freeze after
allocating, this code can be removed.

After hugetlb pages are freshly allocated, they are often added to the
hugetlb free lists. Since these pages were previously ref counted, this
was done via put_page() which would end up calling the hugetlb destructor:
free_huge_page. With changes to freeze pages, we simply call
free_huge_page directly to add the pages to the free list.
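
The difference between the two ways of reaching the free list can be
sketched with a small userspace model. All names below (struct
model_hpage, struct model_hstate, model_put_page, model_free_huge_page)
are hypothetical and the free list is reduced to a singly linked list; the
real destructor does considerably more than this.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a hugetlb page. */
struct model_hpage {
	atomic_int refcount;
	struct model_hpage *next_free;
};

/* Hypothetical stand-in for the per-size hugetlb state. */
struct model_hstate {
	struct model_hpage *free_list;
	size_t free_count;
};

/*
 * Model of the destructor: push the page onto the free list.
 * The page must not hold any reference at this point.
 */
static void model_free_huge_page(struct model_hstate *h,
				 struct model_hpage *page)
{
	assert(atomic_load(&page->refcount) == 0);
	page->next_free = h->free_list;
	h->free_list = page;
	h->free_count++;
}

/*
 * Old path for ref counted pages: drop one reference and run the
 * destructor when the last reference goes away.
 */
static void model_put_page(struct model_hstate *h, struct model_hpage *page)
{
	if (atomic_fetch_sub(&page->refcount, 1) == 1)
		model_free_huge_page(h, page);
}
```

A page that is already frozen has a count of 0, so in this model it is
passed to model_free_huge_page() directly; calling model_put_page() on it
would drive the count negative.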

In a few other places, freshly allocated hugetlb pages were immediately
put into use, and the expectation was they were already ref counted. In
these cases, we must manually ref count the page.
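
A minimal userspace sketch of that last case, assuming a frozen page is
one whose count is 0 and that "ref counting" it means setting the count to
1 before the page is handed out. struct model_page and both functions are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the ref count is modeled. */
struct model_page {
	atomic_int refcount;
};

/* Give a frozen page its first reference. The page must be frozen. */
static void model_set_refcounted(struct model_page *page)
{
	assert(page != NULL);
	assert(atomic_load(&page->refcount) == 0);
	atomic_store(&page->refcount, 1);
}

/*
 * Hypothetical caller that puts a freshly allocated (frozen) page into
 * use right away instead of adding it to the free list.
 */
static struct model_page *model_use_fresh_page(struct model_page *fresh)
{
	if (!fresh)
		return NULL;
	model_set_refcounted(fresh);
	return fresh;
}
```

The assertion in model_set_refcounted() reflects the expectation that no
one else can hold a reference on a frozen page.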

[1] https://lore.kernel.org/linux-mm/20210622021423.154662-3-mike.kravetz@oracle.com/
[2] https://lore.kernel.org/linux-mm/20220802180309.19340-1-joao.m.martins@oracle.com/
[3] https://lore.kernel.org/linux-mm/20220809171854.3725722-1-willy@infradead.org/

[mike.kravetz@oracle.com: fix NULL pointer dereference]
Link: https://lkml.kernel.org/r/20220921202702.106069-1-mike.kravetz@oracle.com
Link: https://lkml.kernel.org/r/20220916214638.155744-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>