linux/mm
Johannes Weiner b91ac37434 mm: vmscan: enforce inactive:active ratio at the reclaim root
We split the LRU lists into inactive and an active parts to maximize
workingset protection while allowing just enough inactive cache space to
faciltate readahead and writeback for one-off file accesses (e.g.  a
linear scan through a file, or logging); or just enough inactive anon to
maintain recent reference information when reclaim needs to swap.

With cgroups and their nested LRU lists, we currently don't do this
correctly.  While recursive cgroup reclaim establishes a relative LRU
order among the pages of all involved cgroups, inactive:active size
decisions are done on a per-cgroup level.  As a result, we'll reclaim a
cgroup's workingset when it doesn't have cold pages, even when one of its
siblings has plenty of it that should be reclaimed first.

For example: workload A has 50M worth of hot cache but doesn't do any
one-off file accesses; meanwhile, parallel workload B scans files and
rarely accesses the same page twice.

If these workloads were to run in an uncgrouped system, A would be
protected from the high rate of cache faults from B.  But if they were put
in parallel cgroups for memory accounting purposes, B's fast cache fault
rate would push out the hot cache pages of A.  This is unexpected and
undesirable - the "scan resistance" of the page cache is broken.

This patch moves inactive:active size balancing decisions to the root of
reclaim - the same level where the LRU order is established.

It does this by looking at the recursive size of the inactive and the
active file sets of the cgroup subtree at the beginning of the reclaim
cycle, and then making a decision - scan or skip active pages - that
applies throughout the entire run and to every cgroup involved.

With that in place, in the test above, the VM will recognize that there
are plenty of inactive pages in the combined cache set of workloads A and
B and prefer the one-off cache in B over the hot pages in A.  The scan
resistance of the cache is restored.

[cai@lca.pw: fix some -Wenum-conversion warnings]
  Link: http://lkml.kernel.org/r/1573848697-29262-1-git-send-email-cai@lca.pw
Link: http://lkml.kernel.org/r/20191107205334.158354-4-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01 12:59:07 -08:00
..
kasan kasan: support backing vmalloc space with real shadow memory 2019-12-01 12:59:05 -08:00
backing-dev.c bdi: Do not use freezable workqueue 2019-10-06 09:11:35 -06:00
balloon_compaction.c mm/balloon_compaction: suppress allocation warnings 2019-09-04 07:42:01 -04:00
cleancache.c Driver Core and debugfs changes for 5.3-rc1 2019-07-12 12:24:03 -07:00
cma_debug.c mm/cma_debug.c: fix the break condition in cma_maxchunk_get() 2019-05-14 09:47:45 -07:00
cma.c mm/cma.c: fail if fixed declaration can't be honored 2019-07-16 19:23:21 -07:00
cma.h
compaction.c mm, compaction: fix wrong pfn handling in __reset_isolation_pfn() 2019-10-14 15:04:01 -07:00
debug_page_ref.c
debug.c mm/debug.c: PageAnon() is true for PageKsm() pages 2019-11-15 18:34:00 -08:00
dmapool.c mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options 2019-07-12 11:05:46 -07:00
early_ioremap.c
fadvise.c fs: Export generic_fadvise() 2019-08-30 22:43:58 -07:00
failslab.c mm/failslab.c: by default, do not fail allocations with direct reclaim only 2019-07-12 11:05:43 -07:00
filemap.c mm: drop mmap_sem before calling balance_dirty_pages() in write fault 2019-12-01 06:29:18 -08:00
frame_vector.c mm: untag user pointers in get_vaddr_frames 2019-09-25 17:51:41 -07:00
frontswap.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482 2019-06-19 17:09:52 +02:00
gup_benchmark.c mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM 2019-05-14 09:47:45 -07:00
gup.c mm/gup.c: fix comments of __get_user_pages() and get_user_pages_remote() 2019-12-01 06:29:18 -08:00
highmem.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
hmm.c mm/hmm: remove hmm_range_dma_map and hmm_range_dma_unmap 2019-11-23 19:56:45 -04:00
huge_memory.c mm/thp: fix node page state in split_huge_page_to_list() 2019-10-19 06:32:32 -04:00
hugetlb_cgroup.c mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() 2019-11-15 18:34:00 -08:00
hugetlb.c mm/page_alloc: add alloc_contig_pages() 2019-12-01 12:59:06 -08:00
hwpoison-inject.c hwpoison-inject: no need to check return value of debugfs_create functions 2019-06-03 15:39:40 +02:00
init-mm.c mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch 2019-10-19 06:32:32 -04:00
internal.h mm, pcpu: make zone pcp updates and reset internal to the mm 2019-12-01 12:59:06 -08:00
interval_tree.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 248 2019-06-19 17:09:08 +02:00
Kconfig hmm related patches for 5.5 2019-11-30 10:33:14 -08:00
Kconfig.debug mm, page_owner, debug_pagealloc: save and dump freeing stack trace 2019-09-24 15:54:08 -07:00
khugepaged.c mm,thp: recheck each page before collapsing file THP 2019-11-15 18:34:00 -08:00
kmemleak-test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333 2019-06-05 17:37:06 +02:00
kmemleak.c kmemleak: Do not corrupt the object_list during clean-up 2019-10-14 08:56:16 -07:00
ksm.c mm/ksm.c: don't WARN if page is still mapped in remove_stable_node() 2019-11-22 09:11:18 -08:00
list_lru.c mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages 2019-07-12 11:05:44 -07:00
maccess.c uaccess: Add strict non-pagefault kernel-space read function 2019-11-02 12:39:12 -07:00
madvise.c mm, soft-offline: convert parameter to pfn 2019-12-01 12:59:04 -08:00
Makefile mm: Add write-protect and clean utilities for address space ranges 2019-11-06 13:03:36 +01:00
mapping_dirty_helpers.c mm: Add write-protect and clean utilities for address space ranges 2019-11-06 13:03:36 +01:00
memblock.c mm: memblock: do not enforce current limit for memblock_phys* family 2019-10-19 06:32:32 -04:00
memcontrol.c mm: clean up and clarify lruvec lookup procedure 2019-12-01 12:59:06 -08:00
memfd.c mm: page cache: store only head pages in i_pages 2019-09-24 15:54:08 -07:00
memory_hotplug.c mm/memory_hotplug.c: don't allow to online/offline memory blocks with holes 2019-12-01 12:59:05 -08:00
memory-failure.c mm/memory-failure.c: use page_shift() in add_to_kill() 2019-12-01 12:59:04 -08:00
memory.c mm/memory.c: fix a huge pud insertion race during faulting 2019-12-01 06:29:19 -08:00
mempolicy.c mm: mempolicy: fix the wrong return value and potential pages leak of mbind 2019-11-15 18:33:59 -08:00
mempool.c docs/core-api/mm: fix return value descriptions in mm/ 2019-03-05 21:07:20 -08:00
memremap.c mm/memunmap: don't access uninitialized memmap in memunmap_pages() 2019-10-19 06:32:32 -04:00
memtest.c
migrate.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
mincore.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
mlock.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
mm_init.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
mmap.c mm/mmap.c: use IS_ERR_VALUE to check return value of get_unmapped_area 2019-12-01 06:29:19 -08:00
mmu_context.c
mmu_gather.c mm: remove quicklist page table caches 2019-09-24 15:54:09 -07:00
mmu_notifier.c hmm related patches for 5.5 2019-11-30 10:33:14 -08:00
mmzone.c
mprotect.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
mremap.c mm/mmap.c: use IS_ERR_VALUE to check return value of get_unmapped_area 2019-12-01 06:29:19 -08:00
msync.c mm: untag user pointers passed to memory syscalls 2019-09-25 17:51:41 -07:00
nommu.c mm/mmap.c: rb_parent is not necessary in __vma_link_list() 2019-12-01 06:29:19 -08:00
oom_kill.c mm: introduce MADV_COLD 2019-09-25 17:51:41 -07:00
page_alloc.c mm: clean up and clarify lruvec lookup procedure 2019-12-01 12:59:06 -08:00
page_counter.c memcg: introduce memory.min 2018-06-07 17:34:36 -07:00
page_ext.c mm, page_owner: fix off-by-one error in __set_page_owner_handle() 2019-10-14 15:04:00 -07:00
page_idle.c mm/page_idle.c: fix oops because end_pfn is larger than max_pfn 2019-06-29 16:43:45 +08:00
page_io.c mm/page_io.c: do not free shared swap slots 2019-11-15 18:34:00 -08:00
page_isolation.c mm/page_isolation.c: convert SKIP_HWPOISON to MEMORY_OFFLINE 2019-12-01 12:59:04 -08:00
page_owner.c mm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo 2019-10-19 06:32:31 -04:00
page_poison.c mm/page_poison.c: fix a typo in a comment 2019-09-24 15:54:08 -07:00
page_vma_mapped.c mm: introduce page_size() 2019-09-24 15:54:08 -07:00
page-writeback.c writeback, memcg: Implement foreign dirty flushing 2019-08-27 09:22:38 -06:00
pagewalk.c mm: Add a walk_page_mapping() function to the pagewalk code 2019-11-06 13:02:43 +01:00
percpu-internal.h percpu: convert chunk hints to be based on pcpu_block_md 2019-03-13 12:25:31 -07:00
percpu-km.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-stats.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu-vm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428 2019-06-05 17:37:16 +02:00
percpu.c percpu: Use struct_size() helper 2019-09-04 13:40:49 -07:00
pgtable-generic.c asm-generic/mm: stub out p{4,u}d_clear_bad() if __PAGETABLE_P{4,U}D_FOLDED 2019-12-01 06:29:19 -08:00
process_vm_access.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
readahead.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
rmap.c mm/rmap.c: use VM_BUG_ON_PAGE() in __page_check_anon_rmap() 2019-12-01 06:29:19 -08:00
rodata_test.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
shmem.c mm, memfd: fix COW issue on MAP_PRIVATE and F_SEAL_FUTURE_WRITE mappings 2019-12-01 12:59:03 -08:00
shuffle.c mm: fix -Wmissing-prototypes warnings 2019-10-07 15:47:19 -07:00
shuffle.h mm: maintain randomization of page free lists 2019-05-14 19:52:48 -07:00
slab_common.c mm, slab_common: use enum kmalloc_cache_type to iterate over kmalloc caches 2019-12-01 06:29:17 -08:00
slab.c mm, slab: remove unused kmalloc_size() 2019-12-01 06:29:17 -08:00
slab.h mm: clean up and clarify lruvec lookup procedure 2019-12-01 12:59:06 -08:00
slob.c mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) 2019-10-07 15:47:20 -07:00
slub.c mm/slub.c: clean up validate_slab() 2019-12-01 06:29:18 -08:00
sparse-vmemmap.c mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap() 2019-07-18 17:08:07 -07:00
sparse.c mm/sparse.c: do not waste pre allocated memmap space 2019-12-01 12:59:05 -08:00
swap_cgroup.c
swap_slots.c mm, swap, get_swap_pages: use entry_size instead of cluster in parameter 2018-08-22 10:52:44 -07:00
swap_state.c mm: page cache: store only head pages in i_pages 2019-09-24 15:54:08 -07:00
swap.c mm/swap.c: piggyback lru_add_drain_all() calls 2019-12-01 06:29:19 -08:00
swapfile.c mm, swap: disallow swapon() on zoned block devices 2019-12-01 06:29:18 -08:00
truncate.c mm/thp: allow dropping THP from page cache 2019-10-19 06:32:33 -04:00
usercopy.c usercopy: Avoid HIGHMEM pfn warning 2019-09-17 15:20:17 -07:00
userfaultfd.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499 2019-06-19 17:09:53 +02:00
util.c mm/mmap.c: rb_parent is not necessary in __vma_link_list() 2019-12-01 06:29:19 -08:00
vmacache.c mm: get rid of vmacache_flush_all() entirely 2018-09-13 15:18:04 -10:00
vmalloc.c kasan: support backing vmalloc space with real shadow memory 2019-12-01 12:59:05 -08:00
vmpressure.c mm/vmpressure.c: fix a signedness bug in vmpressure_register_event() 2019-10-07 15:47:19 -07:00
vmscan.c mm: vmscan: enforce inactive:active ratio at the reclaim root 2019-12-01 12:59:07 -08:00
vmstat.c mm, vmstat: reduce zone->lock holding time by /proc/pagetypeinfo 2019-11-06 08:47:50 -08:00
workingset.c mm: vmscan: detect file thrashing at the reclaim root 2019-12-01 12:59:07 -08:00
z3fold.c mm/z3fold.c: claim page in the beginning of free 2019-10-07 15:47:19 -07:00
zbud.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
zpool.c zpool: add malloc_support_movable to zpool_driver 2019-09-24 15:54:12 -07:00
zsmalloc.c mm/zsmalloc.c: fix a -Wunused-function warning 2019-09-24 15:54:12 -07:00
zswap.c zswap: do not map same object twice 2019-09-24 15:54:12 -07:00