linux/include
Baolin Wang fac35ba763 mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page
On some architectures (like ARM64), it can support CONT-PTE/PMD size
hugetlb, which means it can support not only PMD/PUD size hugetlb (2M and
1G), but also CONT-PTE/PMD size(64K and 32M) if a 4K page size specified.

So when looking up a CONT-PTE size hugetlb page by follow_page(), it will
use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size
hugetlb in follow_page_pte().  However this pte entry lock is incorrect
for the CONT-PTE size hugetlb, since we should use huge_pte_lock() to get
the correct lock, which is mm->page_table_lock.

That means the pte entry of the CONT-PTE size hugetlb under current pte
lock is unstable in follow_page_pte(), we can continue to migrate or
poison the pte entry of the CONT-PTE size hugetlb, which can cause some
potential race issues, even though they are under the 'pte lock'.

For example, suppose thread A is trying to look up a CONT-PTE size hugetlb
page by move_pages() syscall under the lock, however antoher thread B can
migrate the CONT-PTE hugetlb page at the same time, which will cause
thread A to get an incorrect page, if thread A also wants to do page
migration, then data inconsistency error occurs.

Moreover we have the same issue for CONT-PMD size hugetlb in
follow_huge_pmd().

To fix above issues, rename the follow_huge_pmd() as follow_huge_pmd_pte()
to handle PMD and PTE level size hugetlb, which uses huge_pte_lock() to
get the correct pte entry lock to make the pte entry stable.

Mike said:

Support for CONT_PMD/_PTE was added with bb9dd3df8e ("arm64: hugetlb:
refactor find_num_contig()").  Patch series "Support for contiguous pte
hugepages", v4.  However, I do not believe these code paths were
executed until migration support was added with 5480280d3f ("arm64/mm:
enable HugeTLB migration for contiguous bit HugeTLB pages") I would go
with 5480280d3f for the Fixes: targe.

Link: https://lkml.kernel.org/r/635f43bdd85ac2615a58405da82b4d33c6e5eb05.1662017562.git.baolin.wang@linux.alibaba.com
Fixes: 5480280d3f ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-11 19:05:44 -07:00
..
acpi Merge branch 'acpi-properties' 2022-08-11 19:21:03 +02:00
asm-generic Seventeen hotfixes. Mostly memory management things. Ten patches are 2022-08-28 14:49:59 -07:00
clocksource - Add the missing DT bindings for the MTU nomadik timer (Linus 2022-07-28 12:33:34 +02:00
crypto for-5.20/block-2022-08-04 2022-08-04 20:00:14 -07:00
drm
dt-bindings power supply and reset changes for the v6.0 series 2022-08-12 09:37:33 -07:00
keys
kunit
kvm
linux mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page 2022-10-11 19:05:44 -07:00
math-emu
media SPDX changes for 6.0-rc1 2022-08-04 12:12:54 -07:00
memory
misc
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf 2022-08-24 19:18:10 -07:00
pcmcia
ras mm, hwpoison: enable memory error handling on 1GB hugepage 2022-08-08 18:06:44 -07:00
rdma dma-mapping updates 2022-08-06 10:56:45 -07:00
rv Documentation/rv: Add deterministic automata monitor synthesis documentation 2022-07-30 14:01:29 -04:00
scsi SCSI misc on 20220813 2022-08-13 13:41:48 -07:00
soc net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset 2022-08-17 21:58:32 -07:00
sound ASoC: More updates for v5.20 2022-08-01 15:26:40 +02:00
target scsi: target: core: De-RCU of se_lun and se_lun acl 2022-08-01 19:36:02 -04:00
trace * Xen timer fixes 2022-08-11 12:10:08 -07:00
uapi io_uring-6.0-2022-08-26 2022-08-26 11:01:52 -07:00
ufs scsi: ufs: core: Enable link lost interrupt 2022-08-11 22:04:32 -04:00
vdso
video
xen x86/xen: Add support for HVMOP_set_evtchn_upcall_vector 2022-08-12 11:28:21 +02:00