linux/mm
Linus Torvalds c2407cf7d2 mm: make wait_on_page_writeback() wait for multiple pending writebacks
Ever since commit 2a9127fcf2 ("mm: rewrite wait_on_page_bit_common()
logic") we've had some very occasional reports of BUG_ON(PageWriteback)
in write_cache_pages(), which we thought we already fixed in commit
073861ed77 ("mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)").

But syzbot just reported another one, even with that commit in place.

And it turns out that there's a simpler way to trigger the BUG_ON() than
the one Hugh found with page re-use.  It all boils down to the fact that
the page writeback is ostensibly serialized by the page lock, but that
isn't actually really true.

Yes, the people _setting_ writeback all do so under the page lock, but
the actual clearing of the bit - and waking up any waiters - happens
without any page lock.

This gives us this fairly simple race condition:

  CPU1 = end previous writeback
  CPU2 = start new writeback under page lock
  CPU3 = write_cache_pages()

  CPU1          CPU2            CPU3
  ----          ----            ----

  end_page_writeback()
    test_clear_page_writeback(page)
    ... delayed...

                lock_page();
                set_page_writeback()
                unlock_page()

                                lock_page()
                                wait_on_page_writeback();

    wake_up_page(page, PG_writeback);
    .. wakes up CPU3 ..

                                BUG_ON(PageWriteback(page));

where the BUG_ON() happens because we woke up the PG_writeback bit
becasue of the _previous_ writeback, but a new one had already been
started because the clearing of the bit wasn't actually atomic wrt the
actual wakeup or serialized by the page lock.

The reason this didn't use to happen was that the old logic in waiting
on a page bit would just loop if it ever saw the bit set again.

The nice proper fix would probably be to get rid of the whole "wait for
writeback to clear, and then set it" logic in the writeback path, and
replace it with an atomic "wait-to-set" (ie the same as we have for page
locking: we set the page lock bit with a single "lock_page()", not with
"wait for lock bit to clear and then set it").

However, out current model for writeback is that the waiting for the
writeback bit is done by the generic VFS code (ie write_cache_pages()),
but the actual setting of the writeback bit is done much later by the
filesystem ".writepages()" function.

IOW, to make the writeback bit have that same kind of "wait-to-set"
behavior as we have for page locking, we'd have to change our roughly
~50 different writeback functions.  Painful.

Instead, just make "wait_on_page_writeback()" loop on the very unlikely
situation that the PG_writeback bit is still set, basically re-instating
the old behavior.  This is very non-optimal in case of contention, but
since we only ever set the bit under the page lock, that situation is
controlled.

Reported-by: syzbot+2fc0712f8f8b8b8fa0ef@syzkaller.appspotmail.com
Fixes: 2a9127fcf2 ("mm: rewrite wait_on_page_bit_common() logic")
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-05 11:33:00 -08:00
..
kasan kasan: fix null pointer dereference in kasan_record_aux_stack 2020-12-29 15:36:49 -08:00
backing-dev.c mm:backing-dev: use sysfs_emit in macro defining functions 2020-12-15 12:13:47 -08:00
balloon_compaction.c
cleancache.c
cma_debug.c debugfs: make sure we can remove u32_array files cleanly 2020-07-10 13:54:00 -07:00
cma.c mm: cma: improve pr_debug log in cma_release() 2020-12-15 12:13:46 -08:00
cma.h mm: cma: use CMA_MAX_NAME to define the length of cma name array 2020-09-01 09:19:43 +02:00
compaction.c mm/lru: replace pgdat lru_lock with lruvec lock 2020-12-15 14:48:04 -08:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: avoid doing memory allocation with pgtable_t mapped. 2020-10-16 11:11:14 -07:00
debug.c mm: memcontrol: Use helpers to read page's memcg data 2020-12-02 18:28:05 -08:00
dmapool.c mm/dmapool.c: replace hard coded function name with __func__ 2020-10-13 18:38:32 -07:00
early_ioremap.c
fadvise.c mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED 2020-10-13 18:38:29 -07:00
failslab.c
filemap.c mm/filemap: fix infinite loop in generic_file_buffered_read() 2020-12-18 13:37:04 -08:00
frame_vector.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
frontswap.c mm/frontswap: mark various intentional data races 2020-08-14 19:56:56 -07:00
gup_test.c mm/gup_test.c: mark gup_test_init as __init function 2020-12-15 12:13:38 -08:00
gup_test.h selftests/vm: gup_test: introduce the dump_pages() sub-test 2020-12-15 12:13:38 -08:00
gup.c Merge branch 'akpm' (patches from Andrew) 2020-12-15 12:53:37 -08:00
highmem.c Merge branch 'akpm' (patches from Andrew) 2020-12-15 12:53:37 -08:00
hmm.c mm: do page fault accounting in handle_mm_fault 2020-08-12 10:58:02 -07:00
huge_memory.c mm: fix some spelling mistakes in comments 2020-12-15 22:46:19 -08:00
hugetlb_cgroup.c hugetlb_cgroup: fix offline of hugetlb cgroup with reservations 2020-12-06 10:19:07 -08:00
hugetlb.c mm/hugetlb: fix deadlock in hugetlb_cow error path 2020-12-29 15:36:49 -08:00
hwpoison-inject.c mm,hwpoison-inject: don't pin for hwpoison_filter 2020-10-16 11:11:16 -07:00
init-mm.c mm/gup: prevent gup_fast from racing with COW during fork 2020-12-15 12:13:39 -08:00
internal.h mm, page_alloc: disable pcplists during memory offline 2020-12-15 12:13:43 -08:00
interval_tree.c
ioremap.c mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
Kconfig mm/Kconfig: fix spelling mistake "whats" -> "what's" 2020-12-19 11:25:41 -08:00
Kconfig.debug mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO 2020-12-15 12:13:46 -08:00
khugepaged.c mm: fix some spelling mistakes in comments 2020-12-15 22:46:19 -08:00
kmemleak.c mm/kmemleak: rely on rcu for task stack scanning 2020-10-13 18:38:27 -07:00
ksm.c mm: cleanup kstrto*() usage 2020-12-15 12:13:47 -08:00
list_lru.c mm: list_lru: set shrinker map bit when child nr_items is not zero 2020-12-06 10:19:07 -08:00
maccess.c uaccess: add force_uaccess_{begin,end} helpers 2020-08-12 10:57:59 -07:00
madvise.c mm,memory_failure: always pin the page in madvise_inject_error 2020-12-15 12:13:44 -08:00
Makefile mm: mmap_lock: add tracepoints around lock acquisition 2020-12-15 12:13:41 -08:00
mapping_dirty_helpers.c mm/mapping_dirty_helpers: enhance the kernel-doc markups 2020-12-15 12:13:41 -08:00
memblock.c memblock: debug enhancements 2020-12-16 14:44:53 -08:00
memcontrol.c mm/memcontrol:rewrite mem_cgroup_page_lruvec() 2020-12-19 11:18:37 -08:00
memfd.c
memory_hotplug.c mm: memmap defer init doesn't work as expected 2020-12-29 15:36:49 -08:00
memory-failure.c mm,hwpoison: return -EBUSY when migration fails 2020-12-15 12:13:44 -08:00
memory.c mm: generalise COW SMC TLB flushing race comment 2020-12-29 15:36:49 -08:00
mempolicy.c mm: migrate: clean up migrate_prep{_local} 2020-12-15 12:13:45 -08:00
mempool.c kasan, mm: rename kasan_poison_kfree 2020-12-22 12:55:09 -08:00
memremap.c mm/mremap_pages: fix static key devmap_managed_key updates 2020-11-02 12:14:18 -08:00
memtest.c
migrate.c mm: fix some spelling mistakes in comments 2020-12-15 22:46:19 -08:00
mincore.c mm: factor find_get_incore_page out of mincore_page 2020-10-13 18:38:29 -07:00
mlock.c mm/lru: introduce relock_page_lruvec() 2020-12-15 14:48:04 -08:00
mm_init.c mm: fix fall-through warnings for Clang 2020-12-15 12:13:47 -08:00
mmap_lock.c mm: mmap_lock: add tracepoints around lock acquisition 2020-12-15 12:13:41 -08:00
mmap.c UAPI Changes: 2020-12-18 12:38:28 -08:00
mmu_gather.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
mmu_notifier.c mm: track mmu notifiers in fs_reclaim_acquire/release 2020-12-15 12:13:41 -08:00
mmzone.c mm/lru: replace pgdat lru_lock with lruvec lock 2020-12-15 14:48:04 -08:00
mprotect.c mm: Add 'mprotect' hook to struct vm_operations_struct 2020-11-17 14:36:14 +01:00
mremap.c mm/mremap.c: fix extent calculation 2020-12-29 15:36:49 -08:00
msync.c mmap locking API: use coccinelle to convert mmap_sem rwsem call sites 2020-06-09 09:39:14 -07:00
nommu.c mm: cleanup: remove unused tsk arg from __access_remote_vm 2020-12-15 12:13:40 -08:00
oom_kill.c mm/oom_kill: change comment and rename is_dump_unreclaim_slabs() 2020-12-15 12:13:45 -08:00
page_alloc.c mm: memmap defer init doesn't work as expected 2020-12-29 15:36:49 -08:00
page_counter.c mm/page_counter: use page_counter_read in page_counter_set_max 2020-12-15 12:13:40 -08:00
page_ext.c mm: fix some spelling mistakes in comments 2020-12-15 22:46:19 -08:00
page_idle.c mm: page_idle_get_page() does not need lru_lock 2020-12-15 14:48:03 -08:00
page_io.c mm: memcontrol: Use helpers to read page's memcg data 2020-12-02 18:28:05 -08:00
page_isolation.c mm/page_isolation: do not isolate the max order page 2020-12-15 12:13:45 -08:00
page_owner.c mm/page_owner: record timestamp and pid 2020-12-15 12:13:38 -08:00
page_poison.c kasan, mm: reset tags when accessing metadata 2020-12-22 12:55:08 -08:00
page_reporting.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_reporting.h
page_vma_mapped.c mm/page_vma_mapped.c: add colon to fix kernel-doc markups error for check_pte 2020-12-15 12:13:41 -08:00
page-writeback.c mm: make wait_on_page_writeback() wait for multiple pending writebacks 2021-01-05 11:33:00 -08:00
pagewalk.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
percpu-internal.h mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-km.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-stats.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-vm.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu.c percpu: convert flexible array initializers to use struct_size() 2020-10-30 23:02:28 +00:00
pgalloc-track.h mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
pgtable-generic.c
process_vm_access.c mm/process_vm_access: remove redundant initialization of iov_r 2020-12-15 12:13:46 -08:00
ptdump.c kasan, arm64: expand CONFIG_KASAN checks 2020-12-22 12:55:08 -08:00
readahead.c mm: use limited read-ahead to satisfy read 2020-10-17 13:49:08 -06:00
rmap.c mm/lru: revise the comments of lru_lock 2020-12-15 14:48:04 -08:00
rodata_test.c mm/rodata_test.c: fix missing function declaration 2020-08-21 09:52:53 -07:00
shmem.c mm: shmem: convert shmem_enabled_show to use sysfs_emit_at 2020-12-15 12:13:47 -08:00
shuffle.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
shuffle.h mm/shuffle: remove dynamic reconfiguration 2020-08-07 11:33:29 -07:00
slab_common.c kasan, mm: allow cache merging with no metadata 2020-12-22 12:55:09 -08:00
slab.c mm: introduce debug_pagealloc_{map,unmap}_pages() helpers 2020-12-15 12:13:43 -08:00
slab.h Networking updates for 5.11 2020-12-15 13:22:29 -08:00
slob.c mm: extract might_alloc() debug check 2020-12-15 12:13:41 -08:00
slub.c mm: slub: call account_slab_page() after slab page initialization 2020-12-29 15:36:49 -08:00
sparse-vmemmap.c mm/sparse: only sub-section aligned range would be populated 2020-08-07 11:33:27 -07:00
sparse.c mm/memory_hotplug: guard more declarations by CONFIG_MEMORY_HOTPLUG 2020-10-16 11:11:18 -07:00
swap_cgroup.c
swap_slots.c mm/swap_slots.c: remove always zero and unused return value of enable_swap_slots_cache() 2020-10-13 18:38:30 -07:00
swap_state.c mm: use sysfs_emit for struct kobject * uses 2020-12-15 12:13:47 -08:00
swap.c mm/lru: introduce relock_page_lruvec() 2020-12-15 14:48:04 -08:00
swapfile.c mm: fix a race on nr_swap_pages 2020-12-15 22:46:15 -08:00
truncate.c mm: fix kernel-doc markups 2020-12-15 12:13:47 -08:00
usercopy.c mm/usercopy.c: delete duplicated word 2020-08-12 10:57:58 -07:00
userfaultfd.c mm/vmscan: protect the workingset on anonymous LRU 2020-08-12 10:57:55 -07:00
util.c mm: introduce vma_set_file function v5 2020-11-19 10:36:36 +01:00
vmacache.c kernel: better document the use_mm/unuse_mm API contract 2020-06-10 19:14:18 -07:00
vmalloc.c mm/vmalloc.c: fix kasan shadow poisoning size 2020-12-15 12:13:42 -08:00
vmpressure.c
vmscan.c mm/lru: revise the comments of lru_lock 2020-12-15 14:48:04 -08:00
vmstat.c arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL 2020-12-15 12:13:42 -08:00
workingset.c Merge branch 'akpm' (patches from Andrew) 2020-12-15 14:55:10 -08:00
z3fold.c z3fold: remove preempt disabled sections for RT 2020-12-15 12:13:45 -08:00
zbud.c mm/zbud: remove redundant initialization 2020-10-13 18:38:34 -07:00
zpool.c mm/zpool.c: delete duplicated word and fix grammar 2020-08-12 10:57:58 -07:00
zsmalloc.c mm/zsmalloc.c: rework the list_add code in insert_zspage() 2020-12-15 12:13:46 -08:00
zswap.c mm/zswap: move to use crypto_acomp API for hardware acceleration 2020-12-15 12:13:46 -08:00