linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-23 04:31:50 +00:00

Linus Torvalds 5df397dec7 mm: delay page_remove_rmap() until after the TLB has been flushed

When we remove a page table entry, we are very careful to only free the page after we have flushed the TLB, because other CPUs could still be using the page through stale TLB entries until after the flush.

However, we have removed the rmap entry for that page early, which means that functions like folio_mkclean() would end up not serializing with the page table lock because the page had already been made invisible to rmap.

And that is a problem, because while the TLB entry exists, we could end up with the following situation:

 (a) one CPU could come in and clean it, never seeing our mapping of the page

 (b) another CPU could continue to use the stale and dirty TLB entry and continue to write to said page

resulting in a page that has been dirtied, but then marked clean again, all while another CPU might have dirtied it some more.

End result: possibly lost dirty data.

This extends our current TLB gather infrastructure to optionally track a "should I do a delayed page_remove_rmap() for this page after flushing the TLB". It uses the newly introduced 'encoded page pointer' to do that without having to keep separate data around.

Note, this is complicated by a couple of issues:

 - we want to delay the rmap removal, but not past the page table lock, because that simplifies the memcg accounting

 - only SMP configurations want to delay TLB flushing, since on UP there are obviously no remote TLBs to worry about, and the page table lock means there are no preemption issues either

 - s390 has its own mmu_gather model that doesn't delay TLB flushing, and as a result also does not want the delayed rmap. As such, we can treat S390 like the UP case and use a common fallback for the "no delays" case.

 - we can track an enormous number of pages in our mmu_gather structure, with MAX_GATHER_BATCH_COUNT batches of MAX_TABLE_BATCH pages each, all set up to be approximately 10k pending pages. We do not want to have a huge number of batched pages that we then need to check for delayed rmap handling inside the page table lock.

Particularly that last point results in a noteworthy detail, where the normal page batch gathering is limited once we have delayed rmaps pending, in such a way that only the last batch (the so-called "active batch") in the mmu_gather structure can have any delayed entries.

NOTE! While the "possibly lost dirty data" sounds catastrophic, for this all to happen you need to have a user thread doing either madvise() with MADV_DONTNEED or a full re-mmap() of the area concurrently with another thread continuing to use said mapping.

So arguably this is about user space doing crazy things, but from a VM consistency standpoint it's better if we track the dirty bit properly even when user space goes off the rails.

[akpm@linux-foundation.org: fix UP build, per Linus]
Link: https://lore.kernel.org/all/B88D3073-440A-41C7-95F4-895D3F657EF2@gmail.com/
Link: https://lkml.kernel.org/r/20221109203051.1835763-4-torvalds@linux-foundation.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Hugh Dickins <hughd@google.com>
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Tested-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-11-30 15:58:50 -08:00
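
The commit message above says the delayed-rmap flag travels with each gathered page through an 'encoded page pointer', so no separate bookkeeping is needed. The following is a minimal user-space sketch of that general idea (stashing flag bits in the low bits of an aligned pointer), not the kernel's implementation. Every name in it (`demo_page`, `demo_encode`, `demo_flags`, `demo_ptr`, `DEMO_ENCODE_BITS`, `DEMO_DELAY_RMAP`) is hypothetical, and the choice of two flag bits is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a page descriptor. The 8-byte alignment
 * guarantees that the low bits of any pointer to it are zero, which
 * is what leaves room for flags.
 */
struct demo_page {
    alignas(8) unsigned long flags;
};

/* Assumption for this sketch: the two lowest pointer bits carry flags. */
#define DEMO_ENCODE_BITS 3ul
/* Hypothetical flag: "remove the rmap for this page after the TLB flush". */
#define DEMO_DELAY_RMAP  1ul

/* Opaque type, so an encoded pointer cannot be dereferenced by accident. */
struct demo_encoded_page;

/* Pack a page pointer and a small flag value into one pointer-sized word. */
static inline struct demo_encoded_page *
demo_encode(struct demo_page *page, unsigned long flags)
{
    assert((flags & ~DEMO_ENCODE_BITS) == 0);          /* flags must fit */
    assert(((uintptr_t)page & DEMO_ENCODE_BITS) == 0); /* low bits free  */
    return (struct demo_encoded_page *)((uintptr_t)page | flags);
}

/* Recover only the flag bits. */
static inline unsigned long demo_flags(const struct demo_encoded_page *enc)
{
    return (unsigned long)((uintptr_t)enc & DEMO_ENCODE_BITS);
}

/* Recover the original pointer by masking the flag bits off. */
static inline struct demo_page *demo_ptr(const struct demo_encoded_page *enc)
{
    return (struct demo_page *)((uintptr_t)enc & ~(uintptr_t)DEMO_ENCODE_BITS);
}
```

In this sketch, a gather batch would store `demo_encode(page, DEMO_DELAY_RMAP)` for a page whose rmap removal has to wait for the TLB flush, and later test `demo_flags(entry) & DEMO_DELAY_RMAP` while walking the batch, calling `demo_ptr(entry)` to get the page back.
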
..
damon	mm/damon: use kstrtobool() instead of strtobool()	2022-11-30 15:58:45 -08:00
kasan	memory: move hotplug memory notifier priority to same file for easy sorting	2022-11-08 17:37:17 -08:00
kfence	kfence: fix stack trace pruning	2022-11-22 18:50:44 -08:00
kmsan	kmsan: core: kmsan_in_runtime() should return true in NMI context	2022-11-08 15:57:24 -08:00
backing-dev.c	mm: backing-dev: Remove the unneeded result variable	2022-09-11 20:26:02 -07:00
balloon_compaction.c	mm: Convert all PageMovable users to movable_operations	2022-08-02 12:34:03 -04:00
bootmem_info.c	bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem	2022-08-28 14:02:45 -07:00
cma_debug.c	mm/cma_debug: show complete cma name in debugfs directories	2022-09-11 20:25:50 -07:00
cma_sysfs.c
cma.c	Revert "mm/cma.c: remove redundant cma_mutex lock"	2022-05-13 15:11:26 -07:00
cma.h	mm/cma: provide option to opt out from exposing pages on activation failure	2022-03-22 15:57:09 -07:00
compaction.c	mm: migrate: fix THP's mapcount on isolation	2022-11-30 14:49:41 -08:00
debug_page_ref.c
debug_vm_pgtable.c	mm: remove unused savedwrite infrastructure	2022-11-30 15:58:49 -08:00
debug.c	mm,thp,rmap: simplify compound page mapcount handling	2022-11-30 15:58:46 -08:00
dmapool.c
early_ioremap.c	mm/early_ioremap: declare early_memremap_pgprot_adjust()	2022-03-22 15:57:11 -07:00
fadvise.c	riscv: compat: syscall: Add compat_sys_call_table implementation	2022-04-26 13:36:25 -07:00
failslab.c	mm: fix unexpected changes to {failslab|fail_page_alloc}.attr	2022-11-22 18:50:44 -08:00
filemap.c	filemap: find_get_entries() now updates start offset	2022-11-08 17:37:12 -08:00
folio-compat.c	mm,thp,rmap: simplify compound page mapcount handling	2022-11-30 15:58:46 -08:00
frontswap.c	frontswap: don't call ->init if no ops are registered	2022-09-26 12:14:34 -07:00
gup_test.c	mm/gup_test: start/stop/read functionality for PIN LONGTERM test	2022-11-08 17:37:15 -08:00
gup_test.h	mm/gup_test: start/stop/read functionality for PIN LONGTERM test	2022-11-08 17:37:15 -08:00
gup.c	hugetlb: simplify hugetlb handling in follow_page_mask	2022-11-08 17:37:10 -08:00
highmem.c	highmem: fix kmap_to_page() for kmap_local_page() addresses	2022-10-12 18:51:51 -07:00
hmm.c	mm/swap: add swp_offset_pfn() to fetch PFN from swap entry	2022-09-26 19:46:05 -07:00
huge_memory.c	mm/autonuma: use can_change_(pte|pmd)_writable() to replace savedwrite	2022-11-30 15:58:49 -08:00
hugetlb_cgroup.c	mm/hugeltb_cgroup: convert hugetlb_cgroup_commit_charge*() to folios	2022-11-30 15:58:43 -08:00
hugetlb_vmemmap.c	mm/hugetlb_vmemmap: remap head page to newly allocated page	2022-11-30 15:58:47 -08:00
hugetlb_vmemmap.h	mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability	2022-08-08 18:06:43 -07:00
hugetlb.c	mm,thp,rmap: simplify compound page mapcount handling	2022-11-30 15:58:46 -08:00
hwpoison-inject.c	mm/hwpoison: add __init/__exit annotations to module init/exit funcs	2022-10-03 14:03:05 -07:00
init-mm.c	mm: remove rb tree.	2022-09-26 19:46:16 -07:00
internal.h	mm/hwpoison: introduce per-memory_block hwpoison counter	2022-11-08 17:37:22 -08:00
interval_tree.c
io-mapping.c
ioremap.c	mm: ioremap: Add ioremap/iounmap_allowed()	2022-06-27 12:22:31 +01:00
Kconfig	mm,hugetlb: use folio fields in second tail page	2022-11-30 15:58:46 -08:00
Kconfig.debug	Two followon fixes for the post-5.19 series "Use pageblock_order for cma	2022-05-27 11:40:49 -07:00
khugepaged.c	mm,thp,rmap: simplify compound page mapcount handling	2022-11-30 15:58:46 -08:00
kmemleak.c	mm/kmemleak: prevent soft lockup in kmemleak_scan()'s object iteration loops	2022-10-28 13:37:22 -07:00
ksm.c	mm/autonuma: use can_change_(pte|pmd)_writable() to replace savedwrite	2022-11-30 15:58:49 -08:00
list_lru.c	mm: kmem: make mem_cgroup_from_obj() vmalloc()-safe	2022-06-16 19:48:31 -07:00
maccess.c	asm-generic updates for 5.18	2022-03-23 18:03:08 -07:00
madvise.c	madvise: use zap_page_range_single for madvise dontneed	2022-11-30 14:49:40 -08:00
Makefile	mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol	2022-10-03 14:03:36 -07:00
mapping_dirty_helpers.c
memblock.c	mm: add pageblock_align() macro	2022-10-03 14:03:04 -07:00
memcontrol.c	mm: vmscan: split khugepaged stats from direct reclaim stats	2022-11-30 15:58:41 -08:00
memfd.c	memfd: fix F_SEAL_WRITE after shmem huge page allocated	2022-03-05 11:08:32 -08:00
memory_hotplug.c	mm: add pageblock_aligned() macro	2022-10-03 14:03:04 -07:00
memory-failure.c	mm,hugetlb: use folio fields in second tail page	2022-11-30 15:58:46 -08:00
memory-tiers.c	memory: move hotplug memory notifier priority to same file for easy sorting	2022-11-08 17:37:17 -08:00
memory.c	mm: delay page_remove_rmap() until after the TLB has been flushed	2022-11-30 15:58:50 -08:00
mempolicy.c	mm/mempolicy: fix mbind_range() arguments to vma_merge()	2022-10-20 21:27:21 -07:00
mempool.c	mempool: do not use ksize() for poisoning	2022-11-30 15:58:41 -08:00
memremap.c	mm/memremap.c: map FS_DAX device memory as decrypted	2022-11-08 15:57:23 -08:00
memtest.c
migrate_device.c	mm/migrate_device: return number of migrating pages in args->cpages	2022-11-22 18:50:43 -08:00
migrate.c	mm/hugetlb: convert move_hugetlb_state() to folios	2022-11-30 15:58:43 -08:00
mincore.c	mm: convert find_get_incore_page() to filemap_get_incore_folio()	2022-11-08 17:37:18 -08:00
mlock.c	mm/mlock: drop dead code in count_mm_mlocked_page_nr()	2022-09-26 19:46:27 -07:00
mm_init.c	memory: move hotplug memory notifier priority to same file for easy sorting	2022-11-08 17:37:17 -08:00
mm_slot.h	mm: introduce common struct mm_slot	2022-10-03 14:02:43 -07:00
mmap_lock.c
mmap.c	Merge branch 'mm-hotfixes-stable' into mm-stable	2022-11-30 14:58:42 -08:00
mmu_gather.c	mm: delay page_remove_rmap() until after the TLB has been flushed	2022-11-30 15:58:50 -08:00
mmu_notifier.c	mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove()	2022-04-21 20:01:10 -07:00
mmzone.c	mm: multi-gen LRU: groundwork	2022-09-26 19:46:09 -07:00
mprotect.c	mm/autonuma: use can_change_(pte|pmd)_writable() to replace savedwrite	2022-11-30 15:58:49 -08:00
mremap.c	mm: add merging after mremap resize	2022-09-26 19:46:28 -07:00
msync.c	mm/msync: use vma_find() instead of vma linked list	2022-09-26 19:46:25 -07:00
nommu.c	mm: remove the vma linked list	2022-09-26 19:46:26 -07:00
oom_kill.c	mm: reduce noise in show_mem for lowmem allocations	2022-09-26 19:46:29 -07:00
page_alloc.c	mm,thp,rmap: subpages_mapcount COMPOUND_MAPPED if PMD-mapped	2022-11-30 15:58:48 -08:00
page_counter.c	mm: page_counter: remove unneeded atomic ops for low/min	2022-09-11 20:26:01 -07:00
page_ext.c	Merge branch 'mm-hotfixes-stable' into mm-stable	2022-11-30 14:58:42 -08:00
page_idle.c	mm: don't be stuck to rmap lock on reclaim path	2022-05-19 14:08:54 -07:00
page_io.c	swap: convert swap_writepage() to use a folio	2022-10-03 14:02:52 -07:00
page_isolation.c	mm/page_isolation: fix clang deadcode warning	2022-10-28 13:37:22 -07:00
page_owner.c	mm: reuse pageblock_start/end_pfn() macro	2022-10-03 14:03:03 -07:00
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c	mm: use kstrtobool() instead of strtobool()	2022-11-30 15:58:45 -08:00
page_vma_mapped.c	mm/swap: add swp_offset_pfn() to fetch PFN from swap entry	2022-09-26 19:46:05 -07:00
page-writeback.c	mm: export balance_dirty_pages_ratelimited_flags()	2022-09-26 12:28:07 +02:00
pagewalk.c	- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in	2022-10-10 17:53:04 -07:00
percpu-internal.h	percpu: improve percpu_alloc_percpu event trace	2022-05-13 07:20:18 -07:00
percpu-km.c
percpu-stats.c	mm: use vmalloc_array and vcalloc for array allocations	2022-03-08 09:30:46 -05:00
percpu-vm.c
percpu.c	mm: percpu: use kmemleak_ignore_phys() instead of kmemleak_free()	2022-07-17 17:14:47 -07:00
pgalloc-track.h
pgtable-generic.c	mm: avoid unnecessary flush on change_huge_pmd()	2022-05-13 07:20:05 -07:00
process_vm_access.c
ptdump.c	mm: pagewalk: Fix race between unmap and page walker	2022-09-03 10:13:13 -07:00
readahead.c	mm: add PSI accounting around ->read_folio and ->readahead calls	2022-09-20 08:24:38 -06:00
rmap.c	mm,thp,rmap: subpages_mapcount COMPOUND_MAPPED if PMD-mapped	2022-11-30 15:58:48 -08:00
rodata_test.c	mm/rodata_test: use PAGE_ALIGNED() helper	2022-10-03 14:03:05 -07:00
secretmem.c	mm/secretmem: remove reduntant return value	2022-10-03 14:03:36 -07:00
shmem.c	mm: use pte markers for swap errors	2022-11-30 15:58:46 -08:00
shrinker_debug.c	mm: shrinkers: fix double kfree on shrinker name	2022-07-29 18:07:13 -07:00
shuffle.c	mm/shuffle: convert module_param_call to module_param_cb	2022-10-03 14:03:07 -07:00
shuffle.h
slab_common.c	- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in	2022-10-10 17:53:04 -07:00
slab.c	Random number generator fixes for Linux 6.1-rc1.	2022-10-16 15:27:07 -07:00
slab.h	- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in	2022-10-10 17:53:04 -07:00
slob.c	Merge branch 'slab/for-6.1/kmalloc_size_roundup' into slab/for-next	2022-09-29 11:30:55 +02:00
slub.c	mm/slub.c: use hotplug_memory_notifier() directly	2022-11-08 17:37:16 -08:00
sparse-vmemmap.c	mm: hugetlb_vmemmap: move vmemmap code related to HugeTLB to hugetlb_vmemmap.c	2022-08-08 18:06:42 -07:00
sparse.c	mm/hwpoison: introduce per-memory_block hwpoison counter	2022-11-08 17:37:22 -08:00
swap_cgroup.c	mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled	2022-10-03 14:03:36 -07:00
swap_slots.c	mm/swap: convert put_swap_page() to put_swap_folio()	2022-10-03 14:02:46 -07:00
swap_state.c	mm: mmu_gather: prepare to gather encoded page pointers with flags	2022-11-30 15:58:50 -08:00
swap.c	mm: teach release_pages() to take an array of encoded page pointers too	2022-11-30 15:58:50 -08:00
swap.h	mm: convert find_get_incore_page() to filemap_get_incore_folio()	2022-11-08 17:37:18 -08:00
swapfile.c	mm: use pte markers for swap errors	2022-11-30 15:58:46 -08:00
truncate.c	filemap: find_get_entries() now updates start offset	2022-11-08 17:37:12 -08:00
usercopy.c	mm: use kstrtobool() instead of strtobool()	2022-11-30 15:58:45 -08:00
userfaultfd.c	mm/shmem: use page_mapping() to detect page cache for uffd continue	2022-11-08 15:57:23 -08:00
util.c	mm,thp,rmap: simplify compound page mapcount handling	2022-11-30 15:58:46 -08:00
vmalloc.c	mm: vmalloc: use trace_free_vmap_area_noflush event	2022-11-08 17:37:17 -08:00
vmpressure.c
vmscan.c	mm: vmscan: split khugepaged stats from direct reclaim stats	2022-11-30 15:58:41 -08:00
vmstat.c	mm: vmscan: split khugepaged stats from direct reclaim stats	2022-11-30 15:58:41 -08:00
workingset.c	mm: vmscan: make rotations a secondary factor in balancing anon vs file	2022-11-08 17:37:11 -08:00
z3fold.c	mm: Convert all PageMovable users to movable_operations	2022-08-02 12:34:03 -04:00
zbud.c
zpool.c
zsmalloc.c	zsmalloc: replace IS_ERR() with IS_ERR_VALUE()	2022-11-30 15:58:46 -08:00
zswap.c	mm/swap: remove the end_write_func argument to __swap_writepage	2022-09-11 20:25:50 -07:00