linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-04 01:51:34 +00:00

Hugh Dickins 033b5d7755 mm/khugepaged: fix filemap page_to_pgoff(page) != offset

There have been elusive reports of filemap_fault() hitting its VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built with CONFIG_READ_ONLY_THP_FOR_FS=y.

Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged without NUMA reuses the same huge page after collapse_file() failed (whereas NUMA targets its allocation to the respective node each time). And most of us were usually testing with CONFIG_NUMA=y kernels.

    collapse_file(old start)
      new_page = khugepaged_alloc_page(hpage)
      __SetPageLocked(new_page)
      new_page->index = start // hpage->index=old offset
      new_page->mapping = mapping
      xas_store(&xas, new_page)

                              filemap_fault
                                page = find_get_page(mapping, offset)
                                // if offset falls inside hpage then
                                // compound_head(page) == hpage
                                lock_page_maybe_drop_mmap()
                                  __lock_page(page)

      // collapse fails
      xas_store(&xas, old page)
      new_page->mapping = NULL
      unlock_page(new_page)

    collapse_file(new start)
      new_page = khugepaged_alloc_page(hpage)
      __SetPageLocked(new_page)
      new_page->index = start // hpage->index=new offset
      new_page->mapping = mapping // mapping becomes valid again

                                // since compound_head(page) == hpage
                                // page_to_pgoff(page) got changed
                                VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)

An initial patch replaced __SetPageLocked() by lock_page(), which did fix the race which Suren illustrates above. But testing showed that it's not good enough: if the racing task's __lock_page() gets delayed long after its find_get_page(), then it may follow collapse_file(new start)'s successful final unlock_page(), and crash on the same VM_BUG_ON_PAGE.

It could be fixed by relaxing filemap_fault()'s VM_BUG_ON_PAGE to a check and retry (as is done for mapping), with similar relaxations in find_lock_entry() and pagecache_get_page(): but it's not obvious what else might get caught out; and khugepaged non-NUMA appears to be unique in exposing a page to page cache, then revoking, without going through a full cycle of freeing before reuse.

Instead, non-NUMA khugepaged_prealloc_page() release the old page if anyone else has a reference to it (1% of cases when I tested).

Although never reported on huge tmpfs, I believe its find_lock_entry() has been at similar risk; but huge tmpfs does not rely on khugepaged for its normal working nearly so much as READ_ONLY_THP_FOR_FS does.

Reported-by: Denis Lisov <dennis.lissov@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206569
Link: https://lore.kernel.org/linux-mm/?q=20200219144635.3b7417145de19b65f258c943%40linux-foundation.org
Reported-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/linux-xfs/?q=20200616013309.GB815%40lca.pw
Reported-and-analyzed-by: Suren Baghdasaryan <surenb@google.com>
Fixes: `87c460a0bd` ("mm/khugepaged: collapse_shmem() without freezing new_page")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v4.9+
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2020-10-10 15:52:54 -07:00
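
The approach the commit message settles on (drop the cached huge page whenever someone else still holds a reference, instead of reusing it at a new offset) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: `struct toy_page`, `toy_alloc()`, `toy_put()`, `toy_prealloc()` and `toy_collapse_fail()` are hypothetical names invented for illustration, the reference count is a plain `int` with no atomics or locking, and the page pool is a fixed array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page: only the fields the race depends on. */
struct toy_page {
    int refcount;         /* 1 means only the collapsing thread holds it */
    unsigned long index;  /* page cache offset the page was exposed at */
    const void *mapping;  /* NULL while the page is not in the page cache */
};

/* Hypothetical fixed pool standing in for the page allocator. */
static struct toy_page pool[4];
static size_t pool_next;

struct toy_page *toy_alloc(void)
{
    if (pool_next >= sizeof(pool) / sizeof(pool[0]))
        return NULL;
    struct toy_page *p = &pool[pool_next++];
    p->refcount = 1;
    p->index = 0;
    p->mapping = NULL;
    return p;
}

/* Drop one reference; the model never actually frees anything. */
void toy_put(struct toy_page *p)
{
    p->refcount--;
}

/*
 * Models the non-NUMA preallocation step with the fix applied: if anyone
 * else still references the cached page, give up our reference and take a
 * fresh page, so a racing lookup never sees its page change offset.
 */
bool toy_prealloc(struct toy_page **hpage)
{
    if (*hpage && (*hpage)->refcount > 1) {
        toy_put(*hpage);
        *hpage = NULL;
    }
    if (!*hpage)
        *hpage = toy_alloc();
    return *hpage != NULL;
}

/* Models a failed collapse: the page is briefly exposed, then revoked. */
void toy_collapse_fail(struct toy_page *hpage, const void *mapping,
                       unsigned long start)
{
    hpage->index = start;
    hpage->mapping = mapping; /* visible to lookups from here */
    hpage->mapping = NULL;    /* collapse fails, page is revoked */
}
```

In this model, a lookup that took a reference while the page was briefly exposed keeps a page whose `index` never changes afterwards, because the next `toy_prealloc()` call hands the collapsing side a different page. When nobody else holds a reference, the cached page is simply reused.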
..
kasan	Kbuild updates for v5.9	2020-08-09 14:10:26 -07:00
backing-dev.c	writeback: remove struct bdi_writeback_congested	2020-07-08 17:05:53 -06:00
balloon_compaction.c	mm/balloon_compaction: suppress allocation warnings	2019-09-04 07:42:01 -04:00
cleancache.c	Driver Core and debugfs changes for 5.3-rc1	2019-07-12 12:24:03 -07:00
cma_debug.c	debugfs: make sure we can remove u32_array files cleanly	2020-07-10 13:54:00 -07:00
cma.c	cma: don't quit at first error when activating reserved areas	2020-08-12 10:57:57 -07:00
cma.h	mm: cma: fix the name of CMA areas	2020-08-12 10:57:57 -07:00
compaction.c	mm: replace hpage_nr_pages with thp_nr_pages	2020-08-14 19:56:56 -07:00
debug_page_ref.c
debug_vm_pgtable.c	Documentation/mm: add descriptions for arch page table helpers	2020-08-07 11:33:23 -07:00
debug.c	mm, dump_page: do not crash with bad compound_mapcount()	2020-08-07 11:33:23 -07:00
dmapool.c	mm/dmapool.c: micro-optimisation remove unnecessary branch	2020-04-07 10:43:42 -07:00
early_ioremap.c	mm/early_ioremap.c: use %pa to print resource_size_t variables	2020-01-31 10:30:38 -08:00
fadvise.c	mm: return void from various readahead functions	2020-06-02 10:59:06 -07:00
failslab.c	mm/failslab.c: by default, do not fail allocations with direct reclaim only	2019-07-12 11:05:43 -07:00
filemap.c	io_uring-5.9-2020-10-02	2020-10-02 14:38:10 -07:00
frame_vector.c	mmap locking API: convert mmap_sem comments	2020-06-09 09:39:14 -07:00
frontswap.c	mm/frontswap: mark various intentional data races	2020-08-14 19:56:56 -07:00
gup_benchmark.c	mm/gup_benchmark: support pin_user_pages() and related calls	2020-04-02 09:35:27 -07:00
gup.c	mm: do not rely on mm == current->mm in __get_user_pages_locked	2020-09-28 09:21:50 -07:00
highmem.c	mm, x86/mm: Untangle address space layout definitions from basic pgtable type definitions	2019-12-10 10:12:55 +01:00
hmm.c	mm: do page fault accounting in handle_mm_fault	2020-08-12 10:58:02 -07:00
huge_memory.c	mm/thp: Split huge pmds/puds if they're pinned when fork()	2020-09-27 11:21:35 -07:00
hugetlb_cgroup.c	hugetlb_cgroup: convert comma to semicolon	2020-08-21 09:52:52 -07:00
hugetlb.c	mm/hugetlb: fix a race between hugetlb sysctl handlers	2020-09-05 12:14:30 -07:00
hwpoison-inject.c	mm/hwpoison-inject: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops	2019-12-01 12:59:09 -08:00
init-mm.c	mmap locking API: add MMAP_LOCK_INITIALIZER	2020-06-09 09:39:14 -07:00
internal.h	mm: replace hpage_nr_pages with thp_nr_pages	2020-08-14 19:56:56 -07:00
interval_tree.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 248	2019-06-19 17:09:08 +02:00
ioremap.c	mm: move p?d_alloc_track to separate header file	2020-08-07 11:33:26 -07:00
Kconfig	mm/sparse: cleanup the code surrounding memory_present()	2020-08-07 11:33:27 -07:00
Kconfig.debug	treewide: replace '---help---' in Kconfig files with 'help'	2020-06-14 01:57:21 +09:00
khugepaged.c	mm/khugepaged: fix filemap page_to_pgoff(page) != offset	2020-10-10 15:52:54 -07:00
kmemleak-test.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333	2019-06-05 17:37:06 +02:00
kmemleak.c	mm/kmemleak: silence KCSAN splats in checksum	2020-08-14 19:56:56 -07:00
ksm.c	ksm: reinstate memcg charge on copied pages	2020-09-19 13:13:38 -07:00
list_lru.c	mm/list_lru: fix a data race in list_lru_count_one	2020-08-14 19:56:57 -07:00
maccess.c	uaccess: add force_uaccess_{begin,end} helpers	2020-08-12 10:57:59 -07:00
madvise.c	mm: validate pmd after splitting	2020-09-26 10:48:08 -07:00
Makefile	mm: move lib/ioremap.c to mm/	2020-08-07 11:33:26 -07:00
mapping_dirty_helpers.c	mm/mapping_dirty_helpers: update huge page-table entry callbacks	2020-04-02 09:35:29 -07:00
memblock.c	mm/memblock: expose only miminal interface to add/walk physmem	2020-07-10 15:08:09 +02:00
memcontrol.c	mm: memcontrol: fix missing suffix of workingset_restore	2020-09-26 10:33:57 -07:00
memfd.c	mm: page cache: store only head pages in i_pages	2019-09-24 15:54:08 -07:00
memory_hotplug.c	mm: don't rely on system state to detect hot-plug operations	2020-09-26 10:33:57 -07:00
memory-failure.c	mm/migrate: introduce a standard migration target allocation function	2020-08-12 10:58:02 -07:00
memory.c	mm: avoid early COW write protect games during fork()	2020-10-08 10:11:32 -07:00
mempolicy.c	mm: replace hpage_nr_pages with thp_nr_pages	2020-08-14 19:56:56 -07:00
mempool.c	mm/mempool: fix a data race in mempool_free()	2020-08-14 19:56:57 -07:00
memremap.c	memremap: rename MEMORY_DEVICE_DEVDAX to MEMORY_DEVICE_GENERIC	2020-09-04 09:59:59 +02:00
memtest.c
migrate.c	mm/migrate: correct thp migration stats	2020-09-26 10:33:57 -07:00
mincore.c	mmap locking API: use coccinelle to convert mmap_sem rwsem call sites	2020-06-09 09:39:14 -07:00
mlock.c	mlock: fix unevictable_pgs event counts on THP	2020-09-19 13:13:38 -07:00
mm_init.c	mm: adjust vm_committed_as_batch according to vm overcommit policy	2020-08-07 11:33:26 -07:00
mmap.c	mm: remove unnecessary wrapper function do_mmap_pgoff()	2020-08-07 11:33:27 -07:00
mmu_gather.c	mmap locking API: convert mmap_sem comments	2020-06-09 09:39:14 -07:00
mmu_notifier.c	mm: mmu_notifier: fix and extend kerneldoc	2020-08-12 10:57:57 -07:00
mmzone.c
mprotect.c	mmap locking API: convert mmap_sem comments	2020-06-09 09:39:14 -07:00
mremap.c	mm/mremap: start addresses are properly aligned	2020-08-07 11:33:27 -07:00
msync.c	mmap locking API: use coccinelle to convert mmap_sem rwsem call sites	2020-06-09 09:39:14 -07:00
nommu.c	mm/nommu.c: delete duplicated words	2020-08-12 10:57:58 -07:00
oom_kill.c	mm, oom: show process exiting information in __oom_kill_process()	2020-08-12 10:57:56 -07:00
page_alloc.c	mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs	2020-10-03 11:28:12 -07:00
page_counter.c	mm/page_counter: fix various data races at memsw	2020-08-14 19:56:57 -07:00
page_ext.c	mm/page_ext.c: drop pfn_present() check when onlining	2020-04-07 10:43:40 -07:00
page_idle.c	mm/page_idle.c: skip offline pages	2020-06-08 11:05:55 -07:00
page_io.c	mm/page_io: mark various intentional data races	2020-08-14 19:56:56 -07:00
page_isolation.c	mm/memory_hotplug: drain per-cpu pages again during memory offline	2020-09-19 13:13:39 -07:00
page_owner.c	mm: rename gfpflags_to_migratetype to gfp_migratetype for same convention	2020-06-03 20:09:45 -07:00
page_poison.c	mm/page_poison.c: fix a typo in a comment	2019-09-24 15:54:08 -07:00
page_reporting.c	mm/page_reporting: add budget limit on how many pages can be reported per pass	2020-04-07 10:43:39 -07:00
page_reporting.h	mm: introduce include/linux/pgtable.h	2020-06-09 09:39:13 -07:00
page_vma_mapped.c	mm: replace hpage_nr_pages with thp_nr_pages	2020-08-14 19:56:56 -07:00
page-writeback.c	mm: remove vm_total_pages	2020-08-07 11:33:28 -07:00
pagewalk.c	mmap locking API: convert mmap_sem comments	2020-06-09 09:39:14 -07:00
percpu-internal.h	mm: memcg/percpu: account percpu memory to memory cgroups	2020-08-12 10:57:55 -07:00
percpu-km.c	mm: memcg/percpu: account percpu memory to memory cgroups	2020-08-12 10:57:55 -07:00
percpu-stats.c	mm: memcg/percpu: account percpu memory to memory cgroups	2020-08-12 10:57:55 -07:00
percpu-vm.c	mm: memcg/percpu: account percpu memory to memory cgroups	2020-08-12 10:57:55 -07:00
percpu.c	percpu: fix first chunk size calculation for populated bitmap	2020-09-17 17:34:39 +00:00
pgalloc-track.h	mm: move p?d_alloc_track to separate header file	2020-08-07 11:33:26 -07:00
pgtable-generic.c	mm: introduce include/linux/pgtable.h	2020-06-09 09:39:13 -07:00
process_vm_access.c	mm/gup: remove task_struct pointer for all gup code	2020-08-12 10:58:04 -07:00
ptdump.c	mmap locking API: use coccinelle to convert mmap_sem rwsem call sites	2020-06-09 09:39:14 -07:00
readahead.c	mm: use memalloc_nofs_save in readahead path	2020-06-02 10:59:07 -07:00
rmap.c	mm/rmap: fixup copying of soft dirty and uffd ptes	2020-09-05 12:14:30 -07:00
rodata_test.c	mm/rodata_test.c: fix missing function declaration	2020-08-21 09:52:53 -07:00
shmem.c	tmpfs: restore functionality of nr_inodes=0	2020-09-19 13:13:38 -07:00
shuffle.c	mm/shuffle: remove dynamic reconfiguration	2020-08-07 11:33:29 -07:00
shuffle.h	mm/shuffle: remove dynamic reconfiguration	2020-08-07 11:33:29 -07:00
slab_common.c	mm/slab_common.c: delete duplicated word	2020-08-12 10:57:58 -07:00
slab.c	mm: slab: fix potential double free in ___cache_free	2020-09-26 10:15:01 -07:00
slab.h	mm: slab: rename (un)charge_slab_page() to (un)account_slab_page()	2020-08-07 11:33:25 -07:00
slob.c	mm: memcg: convert vmstat slab counters to bytes	2020-08-07 11:33:24 -07:00
slub.c	mm, slub: restore initial kmem_cache flags	2020-10-03 11:28:12 -07:00
sparse-vmemmap.c	mm/sparse: only sub-section aligned range would be populated	2020-08-07 11:33:27 -07:00
sparse.c	mm/sparse: cleanup the code surrounding memory_present()	2020-08-07 11:33:27 -07:00
swap_cgroup.c	mm: memcontrol: make swap tracking an integral part of memory control	2020-06-03 20:09:48 -07:00
swap_slots.c	mm/swap_slots.c: remove redundant check for swap_slot_cache_initialized	2020-08-07 11:33:24 -07:00
swap_state.c	mm/swap_state: mark various intentional data races	2020-08-14 19:56:57 -07:00
swap.c	mlock: fix unevictable_pgs event counts on THP	2020-09-19 13:13:38 -07:00
swapfile.c	mm, THP, swap: fix allocating cluster for swapfile by mistake	2020-09-26 10:33:57 -07:00
truncate.c	mm/thp: allow dropping THP from page cache	2019-10-19 06:32:33 -04:00
usercopy.c	mm/usercopy.c: delete duplicated word	2020-08-12 10:57:58 -07:00
userfaultfd.c	mm/vmscan: protect the workingset on anonymous LRU	2020-08-12 10:57:55 -07:00
util.c	mm: remove unnecessary wrapper function do_mmap_pgoff()	2020-08-07 11:33:27 -07:00
vmacache.c	kernel: better document the use_mm/unuse_mm API contract	2020-06-10 19:14:18 -07:00
vmalloc.c	mm/vunmap: add cond_resched() in vunmap_pmd_range	2020-08-21 09:52:53 -07:00
vmpressure.c	mm: vmpressure: use mem_cgroup_is_root API	2020-04-02 09:35:31 -07:00
vmscan.c	mm: fix check_move_unevictable_pages() on THP	2020-09-19 13:13:38 -07:00
vmstat.c	Merge branch 'simplify-do_wp_page'	2020-09-04 09:31:54 -07:00
workingset.c	mm: replace hpage_nr_pages with thp_nr_pages	2020-08-14 19:56:56 -07:00
z3fold.c	mm/z3fold: silence kmemleak false positives of slots	2020-05-28 11:35:40 -07:00
zbud.c	mm: use false for bool variable	2020-06-04 19:06:24 -07:00
zpool.c	mm/zpool.c: delete duplicated word and fix grammar	2020-08-12 10:57:58 -07:00
zsmalloc.c	mm/zsmalloc.c: fix duplicated words	2020-08-12 10:57:58 -07:00
zswap.c	mm/zswap: allow setting default status, compressor and allocator in Kconfig	2020-04-07 10:43:41 -07:00