linux/mm
Huang Ying eb085574a7 mm, swap: fix race between swapoff and some swap operations
When swapin is performed, after getting the swap entry information from
the page table, system will swap in the swap entry, without any lock held
to prevent the swap device from being swapoff.  This may cause the race
like below,

CPU 1				CPU 2
-----				-----
				do_swap_page
				  swapin_readahead
				    __read_swap_cache_async
swapoff				      swapcache_prepare
  p->swap_map = NULL		        __swap_duplicate
					  p->swap_map[?] /* !!! NULL pointer access */

Because swapoff is usually done when system shutdown only, the race may
not hit many people in practice.  But it is still a race need to be fixed.

To fix the race, get_swap_device() is added to check whether the specified
swap entry is valid in its swap device.  If so, it will keep the swap
entry valid via preventing the swap device from being swapoff, until
put_swap_device() is called.

Because swapoff() is very rare code path, to make the normal path runs as
fast as possible, rcu_read_lock/unlock() and synchronize_rcu() instead of
reference count is used to implement get/put_swap_device().  >From
get_swap_device() to put_swap_device(), RCU reader side is locked, so
synchronize_rcu() in swapoff() will wait until put_swap_device() is
called.

In addition to swap_map, cluster_info, etc.  data structure in the struct
swap_info_struct, the swap cache radix tree will be freed after swapoff,
so this patch fixes the race between swap cache looking up and swapoff
too.

Races between some other swap cache usages and swapoff are fixed too via
calling synchronize_rcu() between clearing PageSwapCache() and freeing
swap cache data structure.

Another possible method to fix this is to use preempt_off() +
stop_machine() to prevent the swap device from being swapoff when its data
structure is being accessed.  The overhead in hot-path of both methods is
similar.  The advantages of RCU based method are,

1. stop_machine() may disturb the normal execution code path on other
   CPUs.

2. File cache uses RCU to protect its radix tree.  If the similar
   mechanism is used for swap cache too, it is easier to share code
   between them.

3. RCU is used to protect swap cache in total_swapcache_pages() and
   exit_swap_address_space() already.  The two mechanisms can be
   merged to simplify the logic.

Link: http://lkml.kernel.org/r/20190522015423.14418-1-ying.huang@intel.com
Fixes: 235b621767 ("mm/swap: add cluster lock")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Not-nacked-by: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12 11:05:43 -07:00
..
kasan mm/kasan: change kasan_check_{read,write} to return boolean 2019-07-12 11:05:42 -07:00
backing-dev.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
balloon_compaction.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
cleancache.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482 2019-06-19 17:09:52 +02:00
cma_debug.c mm/cma_debug.c: fix the break condition in cma_maxchunk_get() 2019-05-14 09:47:45 -07:00
cma.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 98 2019-05-24 17:37:54 +02:00
cma.h
compaction.c mm, compaction: make sure we isolate a valid PFN 2019-06-01 15:51:32 -07:00
debug_page_ref.c
debug.c mm: update references to page _refcount 2019-05-14 19:52:47 -07:00
dmapool.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 403 2019-06-05 17:37:13 +02:00
early_ioremap.c
fadvise.c
failslab.c mm/failslab.c: by default, do not fail allocations with direct reclaim only 2019-07-12 11:05:43 -07:00
filemap.c mm/filemap.c: correct the comment about VM_FAULT_RETRY 2019-07-12 11:05:43 -07:00
frame_vector.c
frontswap.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482 2019-06-19 17:09:52 +02:00
gup_benchmark.c mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM 2019-05-14 09:47:45 -07:00
gup.c mm/gup.c: make follow_page_mask() static 2019-07-12 11:05:42 -07:00
highmem.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
hmm.c mm/devm_memremap_pages: fix final page put race 2019-06-13 17:34:56 -10:00
huge_memory.c Revert "mm: page cache: store only head pages in i_pages" 2019-07-05 19:55:18 -07:00
hugetlb_cgroup.c
hugetlb.c mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge 2019-06-29 16:43:45 +08:00
hwpoison-inject.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
init-mm.c
internal.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
interval_tree.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 248 2019-06-19 17:09:08 +02:00
Kconfig Linux 5.2-rc4 2019-06-14 14:18:53 -06:00
Kconfig.debug mm, debug_pagealloc: use a page type instead of page_ext flag 2019-07-12 11:05:43 -07:00
khugepaged.c Revert "mm: page cache: store only head pages in i_pages" 2019-07-05 19:55:18 -07:00
kmemleak-test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333 2019-06-05 17:37:06 +02:00
kmemleak.c mm/kmemleak.c: change error at _write when kmemleak is disabled 2019-07-12 11:05:42 -07:00
ksm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482 2019-06-19 17:09:52 +02:00
list_lru.c mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node 2019-06-13 17:34:56 -10:00
maccess.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
madvise.c mm/mmu_notifier: use correct mmu_notifier events for each invalidation 2019-05-14 09:47:49 -07:00
Makefile mm: shuffle initial free memory to improve memory-side-cache utilization 2019-05-14 19:52:48 -07:00
memblock.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
memcontrol.c mm/memcontrol: fix wrong statistics in memory.stat 2019-07-12 11:05:40 -07:00
memfd.c Revert "mm: page cache: store only head pages in i_pages" 2019-07-05 19:55:18 -07:00
memory_hotplug.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
memory-failure.c Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2019-07-08 21:48:15 -07:00
memory.c mm, swap: fix race between swapoff and some swap operations 2019-07-12 11:05:43 -07:00
mempolicy.c mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemask 2019-06-29 16:43:44 +08:00
mempool.c docs/core-api/mm: fix return value descriptions in mm/ 2019-03-05 21:07:20 -08:00
memtest.c
migrate.c Revert "mm: page cache: store only head pages in i_pages" 2019-07-05 19:55:18 -07:00
mincore.c mm/mincore.c: make mincore() more conservative 2019-05-14 19:52:48 -07:00
mlock.c mm/mlock.c: change count_mm_mlocked_page_nr return type 2019-06-13 17:34:56 -10:00
mm_init.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
mmap.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
mmu_context.c
mmu_gather.c mm: mmu_gather: remove __tlb_reset_range() for force flush 2019-06-13 17:34:56 -10:00
mmu_notifier.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499 2019-06-19 17:09:53 +02:00
mmzone.c
mprotect.c mm/mprotect.c: fix compilation warning because of unused 'mm' variable 2019-05-14 09:47:51 -07:00
mremap.c mm/mmu_notifier: contextual information for event triggering invalidation 2019-05-14 09:47:49 -07:00
msync.c
nommu.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
oom_kill.c mm/oom_kill.c: fix uninitialized oc->constraint 2019-06-29 16:43:45 +08:00
page_alloc.c mm, debug_pagealloc: use a page type instead of page_ext flag 2019-07-12 11:05:43 -07:00
page_counter.c
page_ext.c mm, debug_pagealloc: use a page type instead of page_ext flag 2019-07-12 11:05:43 -07:00
page_idle.c mm/page_idle.c: fix oops because end_pfn is larger than max_pfn 2019-06-29 16:43:45 +08:00
page_io.c swap_readpage(): avoid blk_wake_io_task() if !synchronous 2019-07-05 11:12:07 +09:00
page_isolation.c mm/page_isolation.c: change the prototype of undo_isolate_page_range() 2019-07-12 11:05:43 -07:00
page_owner.c mm/page_owner: Simplify stack trace handling 2019-04-29 12:37:50 +02:00
page_poison.c page_poison: play nicely with KASAN 2019-03-05 21:07:13 -08:00
page_vma_mapped.c mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly 2018-10-31 08:54:11 -07:00
page-writeback.c mm: remove the account_page_dirtied export 2019-07-12 11:05:42 -07:00
pagewalk.c
percpu-internal.h percpu: convert chunk hints to be based on pcpu_block_md 2019-03-13 12:25:31 -07:00
percpu-km.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-stats.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-vm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
pgtable-generic.c
process_vm_access.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
quicklist.c
readahead.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
rmap.c mm/rmap.c: use the pra.mapcount to do the check 2019-05-14 09:47:49 -07:00
rodata_test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
shmem.c Revert "mm: page cache: store only head pages in i_pages" 2019-07-05 19:55:18 -07:00
shuffle.c mm: maintain randomization of page free lists 2019-05-14 19:52:48 -07:00
shuffle.h mm: maintain randomization of page free lists 2019-05-14 19:52:48 -07:00
slab_common.c mm/kasan: add object validation in ksize() 2019-07-12 11:05:42 -07:00
slab.c mm/slab: refactor common ksize KASAN logic into slab_common.c 2019-07-12 11:05:42 -07:00
slab.h mm/slab: sanity-check page type when looking up cache 2019-07-12 11:05:41 -07:00
slob.c mm/slab: refactor common ksize KASAN logic into slab_common.c 2019-07-12 11:05:42 -07:00
slub.c mm/slab: refactor common ksize KASAN logic into slab_common.c 2019-07-12 11:05:42 -07:00
sparse-vmemmap.c mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
sparse.c mm/sparse.c: clean up obsolete code comment 2019-05-14 09:47:48 -07:00
swap_cgroup.c
swap_slots.c
swap_state.c mm, swap: fix race between swapoff and some swap operations 2019-07-12 11:05:43 -07:00
swap.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
swapfile.c mm, swap: fix race between swapoff and some swap operations 2019-07-12 11:05:43 -07:00
truncate.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
usercopy.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
userfaultfd.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499 2019-06-19 17:09:53 +02:00
util.c prctl_set_mm: downgrade mmap_sem to read lock 2019-06-01 15:51:31 -07:00
vmacache.c
vmalloc.c arm64 updates for 5.3: 2019-07-08 09:54:55 -07:00
vmpressure.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
vmscan.c mm: vmscan: scan anonymous pages on file refaults 2019-07-12 11:05:39 -07:00
vmstat.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
workingset.c mm: memcontrol: make cgroup stats and events query API explicitly local 2019-05-14 19:52:53 -07:00
z3fold.c mm/z3fold.c: lock z3fold page before __SetPageMovable() 2019-07-12 11:05:40 -07:00
zbud.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
zpool.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
zsmalloc.c mm/zsmalloc.c: fix fall-through annotation 2018-10-26 16:26:35 -07:00
zswap.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157 2019-05-30 11:26:37 -07:00