linux/mm
Jeongjun Park 818f916e3a mm: swap: prevent possible data-race in __try_to_reclaim_swap
syzbot reported a KCSAN data-race [1].

Since commit 862590ac37 ("mm: swap: allow cache reclaim to skip slot
cache"), __try_to_reclaim_swap() reads folio->swap, and the swap offset
derived from it, without holding the folio lock.

In the KCSAN report below, the race is assumed not to cause real harm,
because the call trace that performs the write already holds the folio
lock when it writes.

However, __try_to_reclaim_swap() previously performed these reads under
the folio lock. A data-race could also occur through a call path other
than the one shown in the KCSAN log [1].

Therefore, move the reads of the folio's swap fields so that they are
performed under the folio lock.
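The locking rule behind this change can be modelled in user space. The sketch below is a hypothetical analogy and not the kernel patch. struct model_folio, model_delete_from_swap_cache() and model_read_swap_entry() are invented names. A pthread mutex stands in for folio_lock()/folio_trylock(), and a boolean stands in for folio_test_swapcache(). The reader looks at the swap field only after it has taken the lock and confirmed that the folio is still in the swap cache. Its read is therefore serialized against the writer's clear.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of a folio; not the kernel's struct folio. */
struct model_folio {
	pthread_mutex_t lock;   /* stands in for the folio lock */
	bool in_swapcache;      /* stands in for folio_test_swapcache() */
	uint64_t swap_val;      /* stands in for folio->swap.val */
};

/*
 * Writer side, loosely modelled on __delete_from_swap_cache(): the swap
 * field is cleared only while the lock is held.
 */
static void model_delete_from_swap_cache(struct model_folio *f)
{
	pthread_mutex_lock(&f->lock);
	f->in_swapcache = false;
	f->swap_val = 0;
	pthread_mutex_unlock(&f->lock);
}

/*
 * Reader side: take the lock with a trylock (like folio_trylock()), then
 * re-check swap cache membership before reading swap_val. Returns false,
 * leaving *out untouched, if the lock is contended or the entry is gone.
 */
static bool model_read_swap_entry(struct model_folio *f, uint64_t *out)
{
	if (pthread_mutex_trylock(&f->lock) != 0)
		return false;

	bool ok = f->in_swapcache;
	if (ok)
		*out = f->swap_val;

	pthread_mutex_unlock(&f->lock);
	return ok;
}
```

A trylock keeps the reader from blocking. If another path holds the lock, the reader gives up and leaves the folio alone instead of waiting or reading an unprotected value.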

[1]

==================================================================
BUG: KCSAN: data-race in __delete_from_swap_cache / __try_to_reclaim_swap

write to 0xffffea0004c90328 of 8 bytes by task 5186 on cpu 0:
 __delete_from_swap_cache+0x1f0/0x290 mm/swap_state.c:163
 delete_from_swap_cache+0x72/0xe0 mm/swap_state.c:243
 folio_free_swap+0x1d8/0x1f0 mm/swapfile.c:1850
 free_swap_cache mm/swap_state.c:293 [inline]
 free_pages_and_swap_cache+0x1fc/0x410 mm/swap_state.c:325
 __tlb_batch_free_encoded_pages mm/mmu_gather.c:136 [inline]
 tlb_batch_pages_flush mm/mmu_gather.c:149 [inline]
 tlb_flush_mmu_free mm/mmu_gather.c:366 [inline]
 tlb_flush_mmu+0x2cf/0x440 mm/mmu_gather.c:373
 zap_pte_range mm/memory.c:1700 [inline]
 zap_pmd_range mm/memory.c:1739 [inline]
 zap_pud_range mm/memory.c:1768 [inline]
 zap_p4d_range mm/memory.c:1789 [inline]
 unmap_page_range+0x1f3c/0x22d0 mm/memory.c:1810
 unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
 unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
 exit_mmap+0x18a/0x690 mm/mmap.c:1864
 __mmput+0x28/0x1b0 kernel/fork.c:1347
 mmput+0x4c/0x60 kernel/fork.c:1369
 exit_mm+0xe4/0x190 kernel/exit.c:571
 do_exit+0x55e/0x17f0 kernel/exit.c:926
 do_group_exit+0x102/0x150 kernel/exit.c:1088
 get_signal+0xf2a/0x1070 kernel/signal.c:2917
 arch_do_signal_or_restart+0x95/0x4b0 arch/x86/kernel/signal.c:337
 exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
 exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
 __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
 syscall_exit_to_user_mode+0x59/0x130 kernel/entry/common.c:218
 do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

read to 0xffffea0004c90328 of 8 bytes by task 5189 on cpu 1:
 __try_to_reclaim_swap+0x9d/0x510 mm/swapfile.c:198
 free_swap_and_cache_nr+0x45d/0x8a0 mm/swapfile.c:1915
 zap_pte_range mm/memory.c:1656 [inline]
 zap_pmd_range mm/memory.c:1739 [inline]
 zap_pud_range mm/memory.c:1768 [inline]
 zap_p4d_range mm/memory.c:1789 [inline]
 unmap_page_range+0xcf8/0x22d0 mm/memory.c:1810
 unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
 unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
 exit_mmap+0x18a/0x690 mm/mmap.c:1864
 __mmput+0x28/0x1b0 kernel/fork.c:1347
 mmput+0x4c/0x60 kernel/fork.c:1369
 exit_mm+0xe4/0x190 kernel/exit.c:571
 do_exit+0x55e/0x17f0 kernel/exit.c:926
 __do_sys_exit kernel/exit.c:1055 [inline]
 __se_sys_exit kernel/exit.c:1053 [inline]
 __x64_sys_exit+0x1f/0x20 kernel/exit.c:1053
 x64_sys_call+0x2d46/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:61
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

value changed: 0x0000000000000242 -> 0x0000000000000000

Link: https://lkml.kernel.org/r/20241007070623.23340-1-aha310510@gmail.com
Reported-by: syzbot+fa43f1b63e3aa6f66329@syzkaller.appspotmail.com
Fixes: 862590ac37 ("mm: swap: allow cache reclaim to skip slot cache")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Acked-by: Chris Li <chrisl@kernel.org>
Reviewed-by: Kairui Song <kasong@tencent.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-17 00:28:11 -07:00
damon mm/damon/tests/sysfs-kunit.h: fix memory leak in damon_sysfs_test_add_targets() 2024-10-17 00:28:08 -07:00
kasan Rust changes for v6.12 2024-09-25 10:25:40 -07:00
kfence mm: kfence: fix elapsed time for allocated/freed track 2024-09-26 14:01:44 -07:00
kmsan kmsan: do not pass NULL pointers as 0 2024-07-03 19:30:26 -07:00
backing-dev.c writeback: support retrieving per group debug writeback stats of bdi 2024-05-05 17:53:51 -07:00
balloon_compaction.c mm: remove MIGRATE_SYNC_NO_COPY mode 2024-07-03 19:30:00 -07:00
bootmem_info.c
cma_debug.c
cma_sysfs.c mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
cma.c mm/cma: add cma_{alloc,free}_folio() 2024-09-03 21:15:36 -07:00
cma.h mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
compaction.c mm:page_alloc: fix the NULL ac->nodemask in __alloc_pages_slowpath() 2024-09-03 21:15:47 -07:00
debug_page_alloc.c mm: page_alloc: consolidate free page accounting 2024-04-25 20:56:04 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries 2024-09-17 01:07:01 -07:00
debug.c mm: support only one page_type per page 2024-09-03 21:15:43 -07:00
dmapool_test.c mm/dmapool: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
dmapool.c
early_ioremap.c
execmem.c mm/execmem, arch: convert remaining overrides of module_alloc to execmem 2024-05-14 00:31:43 -07:00
fadvise.c introduce fd_file(), convert all accessors to it. 2024-08-12 22:00:43 -04:00
fail_page_alloc.c fault-inject: improve build for CONFIG_FAULT_INJECTION=n 2024-09-01 20:43:33 -07:00
failslab.c fault-inject: improve build for CONFIG_FAULT_INJECTION=n 2024-09-01 20:43:33 -07:00
filemap.c mm/filemap: fix filemap_get_folios_contig THP panic 2024-09-26 14:01:43 -07:00
folio-compat.c mm: remove putback_lru_page() 2024-09-09 16:38:59 -07:00
gup_test.c
gup_test.h
gup.c mm/hugetlb: simplify refs in memfd_alloc_folio 2024-09-26 14:01:44 -07:00
highmem.c mm/highmem: make nr_free_highpages() return "unsigned long" 2024-07-03 19:30:06 -07:00
hmm.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
huge_memory.c mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw() 2024-10-17 00:28:10 -07:00
hugetlb_cgroup.c mm: memcg: don't call propagate_protected_usage() needlessly 2024-09-01 20:25:50 -07:00
hugetlb_vmemmap.c mm/hugetlb_vmemmap: don't synchronize_rcu() without HVO 2024-09-01 20:25:45 -07:00
hugetlb_vmemmap.h
hugetlb.c mm/hugetlb: fix memfd_pin_folios resv_huge_pages leak 2024-09-26 14:01:43 -07:00
hwpoison-inject.c mm/hwpoison: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
init-mm.c mm: Deprecate pasid field 2023-12-12 10:11:32 +01:00
internal.h ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
interval_tree.c
io-mapping.c
ioremap.c
Kconfig mm: make SPLIT_PTE_PTLOCKS depend on SMP 2024-09-26 14:01:43 -07:00
Kconfig.debug slub: Introduce CONFIG_SLUB_RCU_DEBUG 2024-08-27 14:12:51 +02:00
khugepaged.c mm: khugepaged: fix the incorrect statistics when collapsing large file folios 2024-10-17 00:28:10 -07:00
kmemleak.c mm/kmemleak: use IS_ERR_PCPU() for pointer in the percpu address space 2024-09-03 21:15:38 -07:00
ksm.c mm: remove PageSwapCache 2024-09-03 21:15:44 -07:00
list_lru.c mm: list_lru: fix UAF for memory cgroup 2024-08-07 18:33:56 -07:00
maccess.c
madvise.c ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
Makefile mm: introduce numa_emulation 2024-09-03 21:15:31 -07:00
mapping_dirty_helpers.c
memblock.c memblock: updates for 6.12-rc1 2024-09-25 11:35:19 -07:00
memcontrol-v1.c struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
memcontrol-v1.h memcg: cleanup with !CONFIG_MEMCG_V1 2024-09-17 01:07:00 -07:00
memcontrol.c ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
memfd.c mm/hugetlb: simplify refs in memfd_alloc_folio 2024-09-26 14:01:44 -07:00
memory_hotplug.c ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
memory-failure.c mm: migrate: add isolate_folio_to_list() 2024-09-03 21:15:59 -07:00
memory-tiers.c memory tiers: use default_dram_perf_ref_source in log message 2024-09-26 14:01:44 -07:00
memory.c mm: don't install PMD mappings when THPs are disabled by the hw/process/vma 2024-10-17 00:28:10 -07:00
mempolicy.c mm,memcg: provide per-cgroup counters for NUMA balancing operations 2024-09-03 21:15:36 -07:00
mempool.c mm: fix xyz_noprof functions calling profiled functions 2024-06-05 19:19:26 -07:00
memremap.c mm: convert put_devmap_managed_page_refs() to put_devmap_managed_folio_refs() 2024-05-05 17:53:49 -07:00
memtest.c memtest: use {READ,WRITE}_ONCE in memory scanning 2024-03-13 12:12:21 -07:00
migrate_device.c mm: remap unused subpages to shared zeropage when splitting isolated thp 2024-09-09 16:39:03 -07:00
migrate.c mm: migrate: annotate data-race in migrate_folio_unmap() 2024-09-26 14:01:44 -07:00
mincore.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
mlock.c Random number generator updates for Linux 6.11-rc1. 2024-07-24 10:29:50 -07:00
mm_init.c mm: drop CONFIG_HAVE_ARCH_NODEDATA_EXTENSION 2024-09-03 21:15:28 -07:00
mm_slot.h
mmap_lock.c mm: mmap_lock: replace get_memcg_path_buf() with on-stack buffer 2024-07-03 19:30:26 -07:00
mmap.c mm/mmap: correct error handling in mmap_region() 2024-10-17 00:28:05 -07:00
mmu_gather.c mm/mmu_gather: improve cond_resched() handling with large folios and expensive page freeing 2024-02-22 15:27:17 -08:00
mmu_notifier.c mm: move internal core VMA manipulation functions to own file 2024-09-01 20:25:54 -07:00
mmzone.c mm: improve code consistency with zonelist_* helper functions 2024-09-01 20:25:55 -07:00
mprotect.c mm/mprotect: replace can_modify_mm with can_modify_vma 2024-09-03 21:15:41 -07:00
mremap.c mm/mremap: fix move_normal_pmd/retract_page_tables race 2024-10-17 00:28:07 -07:00
mseal.c ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
msync.c
nommu.c mm: remove follow_page() 2024-09-01 20:26:01 -07:00
numa_emulation.c mm: introduce numa_emulation 2024-09-03 21:15:31 -07:00
numa_memblks.c mm: make range-to-target_node lookup facility a part of numa_memblks 2024-09-03 21:15:32 -07:00
numa.c mm: make range-to-target_node lookup facility a part of numa_memblks 2024-09-03 21:15:32 -07:00
oom_kill.c memory: remove the now superfluous sentinel element from ctl_table array 2024-04-25 20:56:32 -07:00
page_alloc.c ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
page_counter.c mm, memcg: cg2 memory{.swap,}.peak write handlers 2024-09-01 20:25:53 -07:00
page_ext.c mm: don't account memmap per-node 2024-08-15 22:16:14 -07:00
page_idle.c
page_io.c mm: fix swap_read_folio_zeromap() for large folios with partial zeromap 2024-09-17 01:07:01 -07:00
page_isolation.c mm: remove migration for HugePage in isolate_single_pageblock() 2024-09-03 21:15:40 -07:00
page_owner.c mm/page-owner: use gfp_nested_mask() instead of open coded masking 2024-05-19 14:40:44 -07:00
page_poison.c
page_reporting.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
page_reporting.h
page_table_check.c mm/page_table_check: fix crash on ZONE_DEVICE 2024-06-15 10:43:04 -07:00
page_vma_mapped.c mm: make page_mapped_in_vma conditional on CONFIG_MEMORY_FAILURE 2024-05-05 17:53:45 -07:00
page-writeback.c ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
pagewalk.c mm/pagewalk: check pfnmap for folio_walk_start() 2024-09-17 01:06:58 -07:00
percpu-internal.h mm: remove CONFIG_MEMCG_KMEM 2024-07-10 12:14:54 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c percpu: clean up all mappings when pcpu_map_pages() fails 2024-04-25 20:55:49 -07:00
percpu.c percpu: remove pcpu_alloc_size() 2024-09-01 20:26:04 -07:00
pgalloc-track.h
pgtable-generic.c mm: fix race between __split_huge_pmd_locked() and GUP-fast 2024-05-07 10:37:00 -07:00
process_vm_access.c
ptdump.c mm: ptdump: add check_wx_pages debugfs attribute 2024-02-22 10:24:47 -08:00
readahead.c struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
rmap.c mm: introduce a pageflag for partially mapped folios 2024-09-09 16:39:04 -07:00
rodata_test.c
secretmem.c secretmem: disable memfd_secret() if arch cannot set direct map 2024-10-09 12:47:19 -07:00
shmem_quota.c shmem_quota: build the object file conditionally to the config option 2024-09-01 20:25:45 -07:00
shmem.c mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw() 2024-10-17 00:28:10 -07:00
show_mem.c mm/show_mem.c: report alloc tags in human readable units 2024-09-17 01:07:00 -07:00
shrinker_debug.c mm: shrinker: use min() to improve shrinker_debugfs_scan_write() 2024-09-03 21:15:40 -07:00
shrinker.c mm: shrinker: use kvzalloc_node() from expand_one_shrinker_info() 2024-01-05 09:58:32 -08:00
shuffle.c
shuffle.h mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
slab_common.c mm, slab: suppress warnings in test_leak_destroy kunit test 2024-10-02 16:28:46 +02:00
slab.h mm, slab: suppress warnings in test_leak_destroy kunit test 2024-10-02 16:28:46 +02:00
slub.c mm, slab: suppress warnings in test_leak_destroy kunit test 2024-10-02 16:28:46 +02:00
sparse-vmemmap.c mm: don't account memmap per-node 2024-08-15 22:16:14 -07:00
sparse.c A set of X86 fixes: 2024-09-01 14:43:08 -07:00
swap_cgroup.c mm: attempt to batch free swap entries for zap_pte_range() 2024-09-03 21:15:33 -07:00
swap_slots.c mm: swap: update get_swap_pages() to take folio order 2024-04-25 20:56:37 -07:00
swap_state.c mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios 2024-09-17 01:07:01 -07:00
swap.c mm: remove isolate_lru_page() 2024-09-09 16:38:59 -07:00
swap.h mm: fix swap_read_folio_zeromap() for large folios with partial zeromap 2024-09-17 01:07:01 -07:00
swapfile.c mm: swap: prevent possible data-race in __try_to_reclaim_swap 2024-10-17 00:28:11 -07:00
truncate.c mm: Fix missing folio invalidation calls during truncation 2024-08-24 16:09:16 +02:00
usercopy.c
userfaultfd.c ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
util.c mm: only enforce minimum stack gap size if it's sensible 2024-09-01 20:26:02 -07:00
vma_internal.h mm: remove duplicated include in vma_internal.h 2024-09-01 20:26:02 -07:00
vma.c mm/vma: return the exact errno in vms_gather_munmap_vmas() 2024-09-17 01:07:00 -07:00
vma.h mm: make vma_prepare() and friends static and internal to vma.c 2024-09-03 21:15:54 -07:00
vmalloc.c ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
vmpressure.c
vmscan.c ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
vmstat.c mm: split underused THPs 2024-09-09 16:39:04 -07:00
workingset.c cachestat: do not flush stats in recency check 2024-07-03 22:40:37 -07:00
z3fold.c mm/z3fold: add __percpu annotation to *unbuddied pointer in struct z3fold_pool 2024-09-01 20:25:56 -07:00
zbud.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zpool.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zsmalloc.c ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
zswap.c mm: zswap: delete comments for "value" member of 'struct zswap_entry'. 2024-10-09 12:47:19 -07:00