linux/mm
Jann Horn 90717566f8 mm: don't drop VMA locks in mm_drop_all_locks()
Despite its name, mm_drop_all_locks() does not drop _all_ locks; the mmap
lock is held write-locked by the caller, and the caller is responsible for
dropping the mmap lock at a later point (which will also release the VMA
locks).

Calling vma_end_write_all() here is dangerous because the caller might
have write-locked a VMA with the expectation that it will stay
write-locked until the mmap_lock is released, as usual.

This _almost_ becomes a problem in the following scenario:

An anonymous VMA A and an SGX VMA B are mapped adjacent to each other. 
Userspace calls munmap() on a range starting at the start address of A and
ending in the middle of B.

Hypothetical call graph with additional notes in brackets:

do_vmi_align_munmap
  [begin first for_each_vma_range loop]
  vma_start_write [on VMA A]
  vma_mark_detached [on VMA A]
  __split_vma [on VMA B]
    sgx_vma_open [== new->vm_ops->open]
      sgx_encl_mm_add
        __mmu_notifier_register [luckily THIS CAN'T ACTUALLY HAPPEN]
          mm_take_all_locks
          mm_drop_all_locks
            vma_end_write_all [drops VMA lock taken on VMA A before]
  vma_start_write [on VMA B]
  vma_mark_detached [on VMA B]
  [end first for_each_vma_range loop]
  vma_iter_clear_gfp [removes VMAs from maple tree]
  mmap_write_downgrade
  unmap_region
  mmap_read_unlock

In this hypothetical scenario, while do_vmi_align_munmap() thinks it still
holds a VMA write lock on VMA A, the VMA write lock has actually been
invalidated inside __split_vma().

The call from sgx_encl_mm_add() to __mmu_notifier_register() can't
actually happen here, as far as I understand, because we are duplicating
an existing SGX VMA, but sgx_encl_mm_add() only calls
__mmu_notifier_register() for the first SGX VMA created in a given
process.  So this could only happen in fork(), not on munmap().  But in my
view it is just pure luck that this can't happen.

Also, we wouldn't actually have any bad consequences from this in
do_vmi_align_munmap(), because by the time the bug drops the lock on VMA
A, we've already marked VMA A as detached, which makes it completely
ineligible for any VMA-locked page faults.  But again, that's just pure
luck.

So remove the vma_end_write_all(), so that VMA write locks are only ever
released on mmap_write_unlock() or mmap_write_downgrade().

Also add comments to document the locking rules established by this patch.

Link: https://lkml.kernel.org/r/20230720193436.454247-1-jannh@google.com
Fixes: eeff9a5d47 ("mm/mmap: prevent pagefault handler from racing with mmu_notifier registration")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:46 -07:00
..
damon mm/damon/core-test: initialise context before test in damon_test_set_attrs() 2023-07-27 13:07:03 -07:00
kasan kasan, slub: fix HW_TAGS zeroing with slub_debug 2023-07-08 09:29:32 -07:00
kfence mm: kfence: allocate kfence_metadata at runtime 2023-08-18 10:12:39 -07:00
kmsan kasan,kmsan: remove __GFP_KSWAPD_RECLAIM usage from kasan/kmsan 2023-06-23 16:59:26 -07:00
backing-dev.c mm: backing-dev: make bdi_class a static const structure 2023-06-23 16:59:27 -07:00
balloon_compaction.c
bootmem_info.c
cma_debug.c
cma_sysfs.c mm: cma: make kobj_type structure constant 2023-03-28 16:20:06 -07:00
cma.c mm: cma: print cma name as well in cma_alloc debug 2023-08-18 10:12:12 -07:00
cma.h
compaction.c mm/compaction: avoid unneeded pageblock_end_pfn when no_set_skip_hint is set 2023-08-18 10:12:44 -07:00
debug_page_alloc.c mm: page_alloc: split out DEBUG_PAGEALLOC 2023-06-09 16:25:23 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable,page_table_check: warn pte map fails 2023-06-19 16:19:15 -07:00
debug.c mm: update validate_mm() to use vma iterator 2023-06-09 16:25:31 -07:00
dmapool_test.c dmapool: add alloc/free performance test 2023-04-05 19:42:38 -07:00
dmapool.c dmapool: create/destroy cleanup 2023-06-09 16:25:17 -07:00
early_ioremap.c mm/early_ioremap.c: improve the execution efficiency of early_ioremap_setup() 2023-06-09 16:25:56 -07:00
fadvise.c mm: remove unnecessary pagevec includes 2023-06-23 16:59:31 -07:00
fail_page_alloc.c mm: page_alloc: split out FAIL_PAGE_ALLOC 2023-06-09 16:25:23 -07:00
failslab.c mm: fix unexpected changes to {failslab|fail_page_alloc}.attr 2022-11-22 18:50:44 -08:00
filemap.c mm: merge folio_has_private()/filemap_release_folio() call pairs 2023-08-18 10:12:12 -07:00
folio-compat.c - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
frontswap.c mm: zswap: support exclusive loads 2023-06-19 16:19:05 -07:00
gup_test.c Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes. 2023-06-23 16:58:19 -07:00
gup_test.h mm/gup_test: start/stop/read functionality for PIN LONGTERM test 2022-11-08 17:37:15 -08:00
gup.c mm/gup: retire follow_hugetlb_page() 2023-08-18 10:12:04 -07:00
highmem.c mm: ptep_get() conversion 2023-06-19 16:19:25 -07:00
hmm.c mm: ptep_get() conversion 2023-06-19 16:19:25 -07:00
huge_memory.c mmu_notifiers: rename invalidate_range notifier 2023-08-18 10:12:41 -07:00
hugetlb_cgroup.c mm/hugetlb: increase use of folios in alloc_huge_page() 2023-02-13 15:54:27 -08:00
hugetlb_vmemmap.c mm: hugetlb_vmemmap: fix a race between vmemmap pmd split 2023-08-18 10:12:14 -07:00
hugetlb_vmemmap.h
hugetlb.c mmu_notifiers: rename invalidate_range notifier 2023-08-18 10:12:41 -07:00
hwpoison-inject.c mm/hwpoison: add __init/__exit annotations to module init/exit funcs 2022-10-03 14:03:05 -07:00
init-mm.c IOMMU Updates for Linux 6.4 2023-04-30 13:00:38 -07:00
internal.h mm, netfs, fscache: stop read optimisation when folio removed from pagecache 2023-08-18 10:12:13 -07:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: remove unneeded ioremap_allowed and iounmap_allowed 2023-08-18 10:12:36 -07:00
Kconfig mm: make MEMFD_CREATE into a selectable config option 2023-08-18 10:12:01 -07:00
Kconfig.debug mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM 2023-05-29 16:14:28 +01:00
khugepaged.c mm/khugepaged: delete khugepaged_collapse_pte_mapped_thps() 2023-08-18 10:12:25 -07:00
kmemleak.c lib/stackdepot, mm: rename stack_depot_want_early_init 2023-02-16 20:43:49 -08:00
ksm.c ksm: consider KSM-placed zeropages when calculating KSM profit 2023-08-18 10:12:10 -07:00
list_lru.c
maccess.c mm: Fix copy_from_user_nofault(). 2023-04-12 17:36:23 -07:00
madvise.c mm: make PTE_MARKER_SWAPIN_ERROR more general 2023-08-18 10:12:16 -07:00
Makefile mm: page_alloc: split out DEBUG_PAGEALLOC 2023-06-09 16:25:23 -07:00
mapping_dirty_helpers.c mm: ptep_get() conversion 2023-06-19 16:19:25 -07:00
memblock.c Revert "mm,memblock: reset memblock.reserved to system init state to prevent UAF" 2023-07-28 09:47:06 -07:00
memcontrol.c mm/memcg: minor cleanup for mc_handle_present_pte() 2023-08-18 10:12:37 -07:00
memfd.c mm/memfd: sysctl: fix MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED 2023-08-18 10:12:11 -07:00
memory_hotplug.c mm/memory_hotplug: document the signal_pending() check in offline_pages() 2023-08-18 10:12:19 -07:00
memory-failure.c mm/hwpoison: check if a raw page in a hugetlb folio is raw HWPOISON 2023-08-18 10:12:26 -07:00
memory-tiers.c memory tier: use helper macro __ATTR_RW() 2023-08-18 10:12:38 -07:00
memory.c mmu_notifiers: don't invalidate secondary TLBs as part of mmu_notifier_invalidate_range_end() 2023-08-18 10:12:41 -07:00
mempolicy.c mm/mempolicy: Take VMA lock before replacing policy 2023-07-28 09:44:06 -07:00
mempool.c mempool: do not use ksize() for poisoning 2022-11-30 15:58:41 -08:00
memremap.c mm/memremap.c: fix outdated comment in devm_memremap_pages 2023-02-09 16:51:46 -08:00
memtest.c mm/memtest: add results of early memtest to /proc/meminfo 2023-04-05 19:42:55 -07:00
migrate_device.c mmu_notifiers: don't invalidate secondary TLBs as part of mmu_notifier_invalidate_range_end() 2023-08-18 10:12:41 -07:00
migrate.c migrate: use folio_set_bh() instead of set_bh_page() 2023-08-18 10:12:30 -07:00
mincore.c mm: ptep_get() conversion 2023-06-19 16:19:25 -07:00
mlock.c mm/mlock: fix vma iterator conversion of apply_vma_lock_flags() 2023-07-17 12:53:21 -07:00
mm_init.c mm: kfence: allocate kfence_metadata at runtime 2023-08-18 10:12:39 -07:00
mm_slot.h
mmap_lock.c
mmap.c mm: don't drop VMA locks in mm_drop_all_locks() 2023-08-18 10:12:46 -07:00
mmu_gather.c mm: prefer xxx_page() alloc/free functions for order-0 pages 2023-03-28 16:20:16 -07:00
mmu_notifier.c mmu_notifiers: rename invalidate_range notifier 2023-08-18 10:12:41 -07:00
mmzone.c
mprotect.c mm/mprotect: fix obsolete function name in change_pte_range() 2023-08-18 10:12:43 -07:00
mremap.c mm: Update do_vmi_align_munmap() return semantics 2023-07-01 08:10:56 -07:00
msync.c
nommu.c mm: make show_free_areas() static 2023-08-18 10:12:02 -07:00
oom_kill.c mm, oom: do not check 0 mask in out_of_memory() 2023-06-09 16:25:20 -07:00
page_alloc.c mm: page_alloc: avoid false page outside zone error info 2023-08-18 10:12:10 -07:00
page_counter.c
page_ext.c mm/page_ext: move functions around for minor cleanups to page_ext 2023-08-18 10:12:31 -07:00
page_idle.c mm: page_idle: convert page idle to use a folio 2023-01-18 17:12:52 -08:00
page_io.c mm/page_io: convert bio_associate_blkg_from_page() to take in a folio 2023-08-18 10:12:46 -07:00
page_isolation.c mm/hugetlb: get rid of page_hstate() 2023-08-18 10:12:39 -07:00
page_owner.c mm/page_owner/cma: show pfn in cma/page_owner with hex format 2023-06-19 16:19:32 -07:00
page_poison.c
page_reporting.c mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
page_reporting.h
page_table_check.c mm/page_table_check: remove unused parameter in [__]page_table_check_pud_set 2023-08-18 10:12:29 -07:00
page_vma_mapped.c mm: correct stale comment of function check_pte 2023-08-18 10:12:13 -07:00
page-writeback.c writeback: account the number of pages written back 2023-07-08 09:29:30 -07:00
pagewalk.c mm/pagewalk: fix EFI_PGT_DUMP of espfix area 2023-07-27 13:07:04 -07:00
percpu-internal.h percpu-internal/pcpu_chunk: re-layout pcpu_chunk structure to reduce false sharing 2023-06-19 16:19:29 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c mm: memcontrol: rename memcg_kmem_enabled() 2023-02-16 20:43:56 -08:00
pgalloc-track.h
pgtable-generic.c mm/pgtable: notes on pte_offset_map[_lock]() 2023-08-18 10:12:25 -07:00
process_vm_access.c mm/gup: remove unused vmas parameter from pin_user_pages_remote() 2023-06-09 16:25:25 -07:00
ptdump.c mm: ptdump should use ptep_get_lockless() 2023-06-19 16:19:24 -07:00
readahead.c mm: remove unnecessary pagevec includes 2023-06-23 16:59:31 -07:00
rmap.c mmu_notifiers: don't invalidate secondary TLBs as part of mmu_notifier_invalidate_range_end() 2023-08-18 10:12:41 -07:00
rodata_test.c mm/rodata_test: use PAGE_ALIGNED() helper 2022-10-03 14:03:05 -07:00
secretmem.c mm/mlock: rename mlock_future_check() to mlock_future_ok() 2023-06-09 16:25:38 -07:00
shmem.c mm: make PTE_MARKER_SWAPIN_ERROR more general 2023-08-18 10:12:16 -07:00
show_mem.c mm: make show_free_areas() static 2023-08-18 10:12:02 -07:00
shrinker_debug.c Revert "mm: shrinkers: make count and scan in shrinker debugfs lockless" 2023-06-19 13:19:34 -07:00
shuffle.c mm/shuffle: convert module_param_call to module_param_cb 2022-10-03 14:03:07 -07:00
shuffle.h mm, treewide: redefine MAX_ORDER sanely 2023-04-05 19:42:46 -07:00
slab_common.c slab updates for 6.5 2023-06-29 16:34:12 -07:00
slab.c slab updates for 6.5 2023-06-29 16:34:12 -07:00
slab.h kasan, slub: fix HW_TAGS zeroing with slub_debug 2023-07-08 09:29:32 -07:00
slub.c slab updates for 6.5 2023-06-29 16:34:12 -07:00
sparse-vmemmap.c mm: ptep_get() conversion 2023-06-19 16:19:25 -07:00
sparse.c mm/sparse: remove redundant judgments from macro for_each_present_section_nr 2023-08-18 10:12:14 -07:00
swap_cgroup.c mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled 2022-10-03 14:03:36 -07:00
swap_slots.c
swap_state.c mm: remove unnecessary pagevec includes 2023-06-23 16:59:31 -07:00
swap.c mm: remove references to pagevec 2023-06-23 16:59:30 -07:00
swap.h mm: remove the __swap_writepage return value 2023-02-02 22:33:33 -08:00
swapfile.c mm: make PTE_MARKER_SWAPIN_ERROR more general 2023-08-18 10:12:16 -07:00
truncate.c mm: merge folio_has_private()/filemap_release_folio() call pairs 2023-08-18 10:12:12 -07:00
usercopy.c mm: Fix copy_from_user_nofault(). 2023-04-12 17:36:23 -07:00
userfaultfd.c mm: userfaultfd: support UFFDIO_POISON for hugetlbfs 2023-08-18 10:12:17 -07:00
util.c mm: remove page_rmapping() 2023-08-18 10:12:01 -07:00
vmalloc.c Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes. 2023-06-23 16:58:19 -07:00
vmpressure.c
vmscan.c mm: merge folio_has_private()/filemap_release_folio() call pairs 2023-08-18 10:12:12 -07:00
vmstat.c - Yosry Ahmed brought back some cgroup v1 stats in OOM logs. 2023-06-28 10:28:11 -07:00
workingset.c Multi-gen LRU: fix workingset accounting 2023-06-09 16:25:46 -07:00
z3fold.c mm: zswap: remove shrink from zpool interface 2023-06-19 16:19:27 -07:00
zbud.c mm: zswap: remove shrink from zpool interface 2023-06-19 16:19:27 -07:00
zpool.c mm: zswap: remove shrink from zpool interface 2023-06-19 16:19:27 -07:00
zsmalloc.c zsmalloc: remove obj_tagged() 2023-08-18 10:12:18 -07:00
zswap.c mm: zswap: fix double invalidate with exclusive loads 2023-06-23 16:59:31 -07:00