linux/mm
Muchun Song cae3af62b3 mm: memcontrol: fix swap undercounting in cgroup2
When pages are swapped in, the VM may retain the swap copy to avoid
repeated writes in the future.  It's also retained if shared pages are
faulted back in some processes, but not in others.  During that time we
have an in-memory copy of the page, as well as an on-swap copy.  Cgroup1
and cgroup2 handle these overlapping lifetimes slightly differently due to
the nature of how they account memory and swap:

Cgroup1 has a unified memory+swap counter that tracks a data page
regardless whether it's in-core or swapped out.  On swapin, we transfer
the charge from the swap entry to the newly allocated swapcache page, even
though the swap entry might stick around for a while.  That's why we have
a mem_cgroup_uncharge_swap() call inside mem_cgroup_charge().

Cgroup2 tracks memory and swap as separate, independent resources and thus
has split memory and swap counters.  On swapin, we charge the newly
allocated swapcache page as memory, while the swap slot in turn must
remain charged to the swap counter as long as its allocated too.

The cgroup2 logic was broken by commit 2d1c498072 ("mm: memcontrol: make
swap tracking an integral part of memory control"), because it
accidentally removed the do_memsw_account() check in the branch inside
mem_cgroup_uncharge() that was supposed to tell the difference between the
charge transfer in cgroup1 and the separate counters in cgroup2.

As a result, cgroup2 currently undercounts retained swap to varying
degrees: swap slots are cached up to 50% of the configured limit or total
available swap space; partially faulted back shared pages are only limited
by physical capacity.  This in turn allows cgroups to significantly
overconsume their alloted swap space.

Add the do_memsw_account() check back to fix this problem.

Link: https://lkml.kernel.org/r/20210217153237.92484-1-songmuchun@bytedance.com
Fixes: 2d1c498072 ("mm: memcontrol: make swap tracking an integral part of memory control")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>	[5.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24 13:38:30 -08:00
..
kasan kasan: fix stack traces dependency for HW_TAGS 2021-02-09 17:26:44 -08:00
backing-dev.c mm: backing-dev: Remove duplicated macro definition 2021-02-24 13:38:28 -08:00
balloon_compaction.c
cleancache.c
cma_debug.c debugfs: make sure we can remove u32_array files cleanly 2020-07-10 13:54:00 -07:00
cma.c mm: cma: improve pr_debug log in cma_release() 2020-12-15 12:13:46 -08:00
cma.h mm: cma: use CMA_MAX_NAME to define the length of cma name array 2020-09-01 09:19:43 +02:00
compaction.c mm, compaction: move high_pfn to the for loop scope 2021-02-05 11:03:47 -08:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable/basic: iterate over entire protection_map[] 2021-02-24 13:38:27 -08:00
debug.c mm/debug: improve memcg debugging 2021-02-24 13:38:27 -08:00
dmapool.c mm/dmapool.c: replace hard coded function name with __func__ 2020-10-13 18:38:32 -07:00
early_ioremap.c mm/early_ioremap.c: use %pa to print resource_size_t variables 2020-01-31 10:30:38 -08:00
fadvise.c mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED 2020-10-13 18:38:29 -07:00
failslab.c
filemap.c mm: memcontrol: convert NR_SHMEM_THPS account to pages 2021-02-24 13:38:29 -08:00
frontswap.c mm/frontswap: mark various intentional data races 2020-08-14 19:56:56 -07:00
gup_test.c mm/gup_test.c: mark gup_test_init as __init function 2020-12-15 12:13:38 -08:00
gup_test.h selftests/vm: gup_test: introduce the dump_pages() sub-test 2020-12-15 12:13:38 -08:00
gup.c Merge branch 'akpm' (patches from Andrew) 2020-12-15 12:53:37 -08:00
highmem.c mm/highmem: prepare for overriding set_pte_at() 2021-01-24 10:34:52 -08:00
hmm.c mm: do page fault accounting in handle_mm_fault 2020-08-12 10:58:02 -07:00
huge_memory.c mm: memcontrol: convert NR_SHMEM_THPS account to pages 2021-02-24 13:38:29 -08:00
hugetlb_cgroup.c hugetlb_cgroup: fix offline of hugetlb cgroup with reservations 2020-12-06 10:19:07 -08:00
hugetlb.c These changes fix MM (soft-)dirty bit management in the procfs code & clean up the API. 2021-02-21 12:19:56 -08:00
hwpoison-inject.c mm,hwpoison-inject: don't pin for hwpoison_filter 2020-10-16 11:11:16 -07:00
init-mm.c mm/gup: prevent gup_fast from racing with COW during fork 2020-12-15 12:13:39 -08:00
internal.h mm, page_alloc: disable pcplists during memory offline 2020-12-15 12:13:43 -08:00
interval_tree.c
ioremap.c mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
Kconfig media: videobuf2: Move frame_vector into media subsystem 2021-01-12 14:15:31 +01:00
Kconfig.debug mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO 2020-12-15 12:13:46 -08:00
khugepaged.c mm: memcontrol: convert NR_SHMEM_THPS account to pages 2021-02-24 13:38:29 -08:00
kmemleak.c mm/kmemleak: rely on rcu for task stack scanning 2020-10-13 18:38:27 -07:00
ksm.c mm: cleanup kstrto*() usage 2020-12-15 12:13:47 -08:00
list_lru.c mm/list_lru.c: remove kvfree_rcu_local() 2021-02-24 13:38:30 -08:00
maccess.c uaccess: add force_uaccess_{begin,end} helpers 2020-08-12 10:57:59 -07:00
madvise.c idmapped-mounts-v5.12 2021-02-23 13:39:45 -08:00
Makefile media: videobuf2: Move frame_vector into media subsystem 2021-01-12 14:15:31 +01:00
mapping_dirty_helpers.c mm/mapping_dirty_helpers: enhance the kernel-doc markups 2020-12-15 12:13:41 -08:00
memblock.c memblock: remove return value of memblock_free_all() 2021-02-22 13:01:23 -08:00
memcontrol.c mm: memcontrol: fix swap undercounting in cgroup2 2021-02-24 13:38:30 -08:00
memfd.c
memory_hotplug.c mm: memmap defer init doesn't work as expected 2020-12-29 15:36:49 -08:00
memory-failure.c mm: fix page reference leak in soft_offline_page() 2021-01-24 10:34:52 -08:00
memory.c Fixes around VM_FPNMAP and follow_pfn 2021-02-22 17:45:02 -08:00
mempolicy.c mm: migrate: initialize err in do_migrate_pages 2021-01-12 18:12:54 -08:00
mempool.c kasan, mm: rename kasan_poison_kfree 2020-12-22 12:55:09 -08:00
memremap.c mm/mremap_pages: fix static key devmap_managed_key updates 2020-11-02 12:14:18 -08:00
memtest.c
migrate.c mm: memcg: add swapcache stat for memcg v2 2021-02-24 13:38:29 -08:00
mincore.c inode: make init and permission helpers idmapped mount aware 2021-01-24 14:27:16 +01:00
mlock.c mm/lru: introduce relock_page_lruvec() 2020-12-15 14:48:04 -08:00
mm_init.c mm: fix fall-through warnings for Clang 2020-12-15 12:13:47 -08:00
mmap_lock.c mm: mmap_lock: add tracepoints around lock acquisition 2020-12-15 12:13:41 -08:00
mmap.c tlb: mmu_gather: Remove start/end arguments from tlb_gather_mmu() 2021-01-29 20:02:29 +01:00
mmu_gather.c tlb: mmu_gather: Remove start/end arguments from tlb_gather_mmu() 2021-01-29 20:02:29 +01:00
mmu_notifier.c mm: track mmu notifiers in fs_reclaim_acquire/release 2020-12-15 12:13:41 -08:00
mmzone.c mm/lru: replace pgdat lru_lock with lruvec lock 2020-12-15 14:48:04 -08:00
mprotect.c mm: Add 'mprotect' hook to struct vm_operations_struct 2020-11-17 14:36:14 +01:00
mremap.c This pull request contains the following changes for UML: 2021-02-21 13:53:00 -08:00
msync.c mmap locking API: use coccinelle to convert mmap_sem rwsem call sites 2020-06-09 09:39:14 -07:00
nommu.c mm/nommu: Fix return type of filemap_map_pages() 2021-01-28 14:10:31 +00:00
oom_kill.c tlb: mmu_gather: Remove start/end arguments from tlb_gather_mmu() 2021-01-29 20:02:29 +01:00
page_alloc.c mm: memcontrol: convert NR_SHMEM_PMDMAPPED account to pages 2021-02-24 13:38:29 -08:00
page_counter.c mm/page_counter: use page_counter_read in page_counter_set_max 2020-12-15 12:13:40 -08:00
page_ext.c mm: fix some spelling mistakes in comments 2020-12-15 22:46:19 -08:00
page_idle.c mm: page_idle_get_page() does not need lru_lock 2020-12-15 14:48:03 -08:00
page_io.c mm/page_io: use pr_alert_ratelimited for swap read/write errors 2021-02-24 13:38:29 -08:00
page_isolation.c mm/page_isolation: do not isolate the max order page 2020-12-15 12:13:45 -08:00
page_owner.c mm/page_owner: use helper function zone_end_pfn() to get end_pfn 2021-02-24 13:38:27 -08:00
page_poison.c kasan, mm: reset tags when accessing metadata 2020-12-22 12:55:08 -08:00
page_reporting.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_reporting.h mm: introduce include/linux/pgtable.h 2020-06-09 09:39:13 -07:00
page_vma_mapped.c mm/page_vma_mapped.c: add colon to fix kernel-doc markups error for check_pte 2020-12-15 12:13:41 -08:00
page-writeback.c mm: make wait_on_page_writeback() wait for multiple pending writebacks 2021-01-05 11:33:00 -08:00
pagewalk.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
percpu-internal.h mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-km.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-stats.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-vm.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu.c percpu: fix clang modpost section mismatch 2021-02-14 18:15:15 +00:00
pgalloc-track.h mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
pgtable-generic.c mm: introduce include/linux/pgtable.h 2020-06-09 09:39:13 -07:00
process_vm_access.c mm/process_vm_access.c: include compat.h 2021-01-12 18:12:54 -08:00
ptdump.c kasan, arm64: expand CONFIG_KASAN checks 2020-12-22 12:55:08 -08:00
readahead.c mm: use limited read-ahead to satisfy read 2020-10-17 13:49:08 -06:00
rmap.c mm: memcontrol: convert NR_FILE_PMDMAPPED account to pages 2021-02-24 13:38:29 -08:00
rodata_test.c mm/rodata_test.c: fix missing function declaration 2020-08-21 09:52:53 -07:00
shmem.c mm: memcontrol: convert NR_SHMEM_THPS account to pages 2021-02-24 13:38:29 -08:00
shuffle.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
shuffle.h mm/shuffle: remove dynamic reconfiguration 2020-08-07 11:33:29 -07:00
slab_common.c mm, slab, slub: stop taking cpu hotplug lock 2021-02-24 13:38:27 -08:00
slab.c mm: memcg/slab: pre-allocate obj_cgroups for slab caches with SLAB_ACCOUNT 2021-02-24 13:38:29 -08:00
slab.h mm: memcg/slab: pre-allocate obj_cgroups for slab caches with SLAB_ACCOUNT 2021-02-24 13:38:29 -08:00
slob.c mm, tracing: record slab name for kmem_cache_free() 2021-02-24 13:38:26 -08:00
slub.c mm: memcg/slab: pre-allocate obj_cgroups for slab caches with SLAB_ACCOUNT 2021-02-24 13:38:29 -08:00
sparse-vmemmap.c mm/sparse: only sub-section aligned range would be populated 2020-08-07 11:33:27 -07:00
sparse.c mm/memory_hotplug: guard more declarations by CONFIG_MEMORY_HOTPLUG 2020-10-16 11:11:18 -07:00
swap_cgroup.c mm: memcontrol: make swap tracking an integral part of memory control 2020-06-03 20:09:48 -07:00
swap_slots.c mm/swap_slots.c: remove redundant NULL check 2021-02-24 13:38:28 -08:00
swap_state.c mm: memcg: add swapcache stat for memcg v2 2021-02-24 13:38:29 -08:00
swap.c mm/lru: introduce relock_page_lruvec() 2020-12-15 14:48:04 -08:00
swapfile.c mm/swapfile.c: fix debugging information problem 2021-02-24 13:38:28 -08:00
truncate.c mm: fix kernel-doc markups 2020-12-15 12:13:47 -08:00
usercopy.c mm/usercopy.c: delete duplicated word 2020-08-12 10:57:58 -07:00
userfaultfd.c mm/vmscan: protect the workingset on anonymous LRU 2020-08-12 10:57:55 -07:00
util.c mm: Make mem_dump_obj() handle vmalloc() memory 2021-01-22 15:24:04 -08:00
vmacache.c kernel: better document the use_mm/unuse_mm API contract 2020-06-10 19:14:18 -07:00
vmalloc.c Merge branch 'for-mingo-rcu' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu 2021-02-12 12:56:55 +01:00
vmpressure.c mm: vmpressure: use mem_cgroup_is_root API 2020-04-02 09:35:31 -07:00
vmscan.c mm: don't put pinned pages into the swap cache 2021-01-17 12:08:04 -08:00
vmstat.c mm: memcg: add swapcache stat for memcg v2 2021-02-24 13:38:29 -08:00
workingset.c Merge branch 'akpm' (patches from Andrew) 2020-12-15 14:55:10 -08:00
z3fold.c z3fold: remove preempt disabled sections for RT 2020-12-15 12:13:45 -08:00
zbud.c mm/zbud: remove redundant initialization 2020-10-13 18:38:34 -07:00
zpool.c mm/zpool.c: delete duplicated word and fix grammar 2020-08-12 10:57:58 -07:00
zsmalloc.c mm/zsmalloc.c: rework the list_add code in insert_zspage() 2020-12-15 12:13:46 -08:00
zswap.c mm/zswap: move to use crypto_acomp API for hardware acceleration 2020-12-15 12:13:46 -08:00