linux/mm
Suren Baghdasaryan 67197a4f28 mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary
Currently __set_oom_adj loops through all processes in the system to keep
oom_score_adj and oom_score_adj_min in sync between processes sharing
their mm.  This is done for any task with more that one mm_users, which
includes processes with multiple threads (sharing mm and signals).
However for such processes the loop is unnecessary because their signal
structure is shared as well.

Android updates oom_score_adj whenever a tasks changes its role
(background/foreground/...) or binds to/unbinds from a service, making it
more/less important.  Such operation can happen frequently.  We noticed
that updates to oom_score_adj became more expensive and after further
investigation found out that the patch mentioned in "Fixes" introduced a
regression.  Using Pixel 4 with a typical Android workload, write time to
oom_score_adj increased from ~3.57us to ~362us.  Moreover this regression
linearly depends on the number of multi-threaded processes running on the
system.

Mark the mm with a new MMF_MULTIPROCESS flag bit when task is created with
(CLONE_VM && !CLONE_THREAD && !CLONE_VFORK).  Change __set_oom_adj to use
MMF_MULTIPROCESS instead of mm_users to decide whether oom_score_adj
update should be synchronized between multiple processes.  To prevent
races between clone() and __set_oom_adj(), when oom_score_adj of the
process being cloned might be modified from userspace, we use
oom_adj_mutex.  Its scope is changed to global.

The combination of (CLONE_VM && !CLONE_THREAD) is rarely used except for
the case of vfork().  To prevent performance regressions of vfork(), we
skip taking oom_adj_mutex and setting MMF_MULTIPROCESS when CLONE_VFORK is
specified.  Clearing the MMF_MULTIPROCESS flag (when the last process
sharing the mm exits) is left out of this patch to keep it simple and
because it is believed that this threading model is rare.  Should there
ever be a need for optimizing that case as well, it can be done by hooking
into the exit path, likely following the mm_update_next_owner pattern.

With the combination of (CLONE_VM && !CLONE_THREAD && !CLONE_VFORK) being
quite rare, the regression is gone after the change is applied.

[surenb@google.com: v3]
  Link: https://lkml.kernel.org/r/20200902012558.2335613-1-surenb@google.com

Fixes: 44a70adec9 ("mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj")
Reported-by: Tim Murray <timmurray@google.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Eugene Syromiatnikov <esyr@redhat.com>
Cc: Christian Kellner <christian@kellner.me>
Cc: Adrian Reber <areber@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Bernd Edlinger <bernd.edlinger@hotmail.de>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Link: https://lkml.kernel.org/r/20200824153036.3201505-1-surenb@google.com
Debugged-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:35 -07:00
..
kasan mm: kasan: do not panic if both panic_on_warn and kasan_multishot set 2020-10-13 18:38:32 -07:00
backing-dev.c bdi: replace BDI_CAP_NO_{WRITEBACK,ACCT_DIRTY} with a single flag 2020-09-24 13:43:39 -06:00
balloon_compaction.c
cleancache.c
cma_debug.c debugfs: make sure we can remove u32_array files cleanly 2020-07-10 13:54:00 -07:00
cma.c cma: don't quit at first error when activating reserved areas 2020-08-12 10:57:57 -07:00
cma.h mm: cma: fix the name of CMA areas 2020-08-12 10:57:57 -07:00
compaction.c mm/compaction.c: micro-optimization remove unnecessary branch 2020-10-13 18:38:34 -07:00
debug_page_ref.c
debug_vm_pgtable.c Documentation/mm: add descriptions for arch page table helpers 2020-08-07 11:33:23 -07:00
debug.c mm, dump_page: rename head_mapcount() --> head_compound_mapcount() 2020-10-13 18:38:29 -07:00
dmapool.c mm/dmapool.c: replace hard coded function name with __func__ 2020-10-13 18:38:32 -07:00
early_ioremap.c mm/early_ioremap.c: use %pa to print resource_size_t variables 2020-01-31 10:30:38 -08:00
fadvise.c mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED 2020-10-13 18:38:29 -07:00
failslab.c
filemap.c mm/filemap: fix filemap_map_pages for THP 2020-10-13 18:38:29 -07:00
frame_vector.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
frontswap.c mm/frontswap: mark various intentional data races 2020-08-14 19:56:56 -07:00
gup_benchmark.c mm/gup_benchmark: use pin_user_pages for FOLL_LONGTERM flag 2020-10-13 18:38:29 -07:00
gup.c mm/gup: protect unpin_user_pages() against npages==-ERRNO 2020-10-13 18:38:29 -07:00
highmem.c mm, x86/mm: Untangle address space layout definitions from basic pgtable type definitions 2019-12-10 10:12:55 +01:00
hmm.c mm: do page fault accounting in handle_mm_fault 2020-08-12 10:58:02 -07:00
huge_memory.c mm/mmap: leave adjust_next as virtual address instead of page frame number 2020-10-13 18:38:31 -07:00
hugetlb_cgroup.c hugetlb_cgroup: convert comma to semicolon 2020-08-21 09:52:52 -07:00
hugetlb.c hugetlb: add lockdep check for i_mmap_rwsem held in huge_pmd_share 2020-10-13 18:38:34 -07:00
hwpoison-inject.c mm/hwpoison-inject: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops 2019-12-01 12:59:09 -08:00
init-mm.c mmap locking API: add MMAP_LOCK_INITIALIZER 2020-06-09 09:39:14 -07:00
internal.h i915: use find_lock_page instead of find_lock_entry 2020-10-13 18:38:29 -07:00
interval_tree.c
ioremap.c mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
Kconfig mm/gup_benchmark: update the documentation in Kconfig 2020-10-13 18:38:29 -07:00
Kconfig.debug treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
khugepaged.c mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged 2020-10-11 10:31:11 -07:00
kmemleak.c mm/kmemleak: rely on rcu for task stack scanning 2020-10-13 18:38:27 -07:00
ksm.c ksm: reinstate memcg charge on copied pages 2020-09-19 13:13:38 -07:00
list_lru.c mm/list_lru: fix a data race in list_lru_count_one 2020-08-14 19:56:57 -07:00
maccess.c uaccess: add force_uaccess_{begin,end} helpers 2020-08-12 10:57:59 -07:00
madvise.c mm: optimise madvise WILLNEED 2020-10-13 18:38:29 -07:00
Makefile mm,kmemleak-test.c: move kmemleak-test.c to samples dir 2020-10-13 18:38:27 -07:00
mapping_dirty_helpers.c mm/mapping_dirty_helpers: update huge page-table entry callbacks 2020-04-02 09:35:29 -07:00
memblock.c memblock: use separate iterators for memory and reserved regions 2020-10-13 18:38:35 -07:00
memcontrol.c mm/memcg: fix device private memcg accounting 2020-10-13 18:38:31 -07:00
memfd.c
memory_hotplug.c mm/memory_hotplug: introduce default phys_to_target_node() implementation 2020-10-13 18:38:27 -07:00
memory-failure.c mm/memory-failure.c: remove unused macro `writeback' 2020-10-13 18:38:32 -07:00
memory.c mm: remove src/dst mm parameter in copy_page_range() 2020-10-13 18:38:32 -07:00
mempolicy.c mm: remove unused alloc_page_vma_node() 2020-10-13 18:38:34 -07:00
mempool.c mm/mempool: add 'else' to split mutually exclusive case 2020-10-13 18:38:34 -07:00
memremap.c mm/memremap.c: convert devmap static branch to {inc,dec} 2020-10-13 18:38:30 -07:00
memtest.c
migrate.c block-5.10-2020-10-12 2020-10-13 12:12:44 -07:00
mincore.c mm: factor find_get_incore_page out of mincore_page 2020-10-13 18:38:29 -07:00
mlock.c mlock: fix unevictable_pgs event counts on THP 2020-09-19 13:13:38 -07:00
mm_init.c mm: adjust vm_committed_as_batch according to vm overcommit policy 2020-08-07 11:33:26 -07:00
mmap.c mm/mmap.c: replace do_brk with do_brk_flags in comment of insert_vm_struct() 2020-10-13 18:38:32 -07:00
mmu_gather.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
mmu_notifier.c mm: mmu_notifier: fix and extend kerneldoc 2020-08-12 10:57:57 -07:00
mmzone.c
mprotect.c mm: Introduce arch_validate_flags() 2020-09-04 12:46:07 +01:00
mremap.c mm/mremap: start addresses are properly aligned 2020-08-07 11:33:27 -07:00
msync.c mmap locking API: use coccinelle to convert mmap_sem rwsem call sites 2020-06-09 09:39:14 -07:00
nommu.c Fix references to nommu-mmap.rst 2020-09-24 11:03:40 -06:00
oom_kill.c mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary 2020-10-13 18:38:35 -07:00
page_alloc.c memblock: use separate iterators for memory and reserved regions 2020-10-13 18:38:35 -07:00
page_counter.c mm/page_counter: correct the obsolete func name in the comment of page_counter_try_charge() 2020-10-13 18:38:30 -07:00
page_ext.c mm/page_ext.c: drop pfn_present() check when onlining 2020-04-07 10:43:40 -07:00
page_idle.c mm/page_idle.c: skip offline pages 2020-06-08 11:05:55 -07:00
page_io.c mm/page_io.c: remove useless out label in __swap_writepage() 2020-10-13 18:38:30 -07:00
page_isolation.c mm/page_isolation: cleanup set_migratetype_isolate() 2020-10-13 18:38:33 -07:00
page_owner.c mm: rename gfpflags_to_migratetype to gfp_migratetype for same convention 2020-06-03 20:09:45 -07:00
page_poison.c
page_reporting.c mm/page_reporting: add budget limit on how many pages can be reported per pass 2020-04-07 10:43:39 -07:00
page_reporting.h mm: introduce include/linux/pgtable.h 2020-06-09 09:39:13 -07:00
page_vma_mapped.c mm: replace hpage_nr_pages with thp_nr_pages 2020-08-14 19:56:56 -07:00
page-writeback.c bdi: replace BDI_CAP_NO_{WRITEBACK,ACCT_DIRTY} with a single flag 2020-09-24 13:43:39 -06:00
pagewalk.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
percpu-internal.h mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-km.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-stats.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu-vm.c mm: memcg/percpu: account percpu memory to memory cgroups 2020-08-12 10:57:55 -07:00
percpu.c percpu: fix first chunk size calculation for populated bitmap 2020-09-17 17:34:39 +00:00
pgalloc-track.h mm: move p?d_alloc_track to separate header file 2020-08-07 11:33:26 -07:00
pgtable-generic.c mm: introduce include/linux/pgtable.h 2020-06-09 09:39:13 -07:00
process_vm_access.c mm: remove compat_process_vm_{readv,writev} 2020-10-03 00:02:15 -04:00
ptdump.c mmap locking API: use coccinelle to convert mmap_sem rwsem call sites 2020-06-09 09:39:14 -07:00
readahead.c mm: use memalloc_nofs_save in readahead path 2020-06-02 10:59:07 -07:00
rmap.c mm/rmap: fixup copying of soft dirty and uffd ptes 2020-09-05 12:14:30 -07:00
rodata_test.c mm/rodata_test.c: fix missing function declaration 2020-08-21 09:52:53 -07:00
shmem.c mm/shmem: return head page from find_lock_entry 2020-10-13 18:38:29 -07:00
shuffle.c mm/shuffle: remove dynamic reconfiguration 2020-08-07 11:33:29 -07:00
shuffle.h mm/shuffle: remove dynamic reconfiguration 2020-08-07 11:33:29 -07:00
slab_common.c mm/slab_common.c: delete duplicated word 2020-08-12 10:57:58 -07:00
slab.c mm: memcg/slab: uncharge during kmem_cache_free_bulk() 2020-10-13 18:38:31 -07:00
slab.h mm: memcg/slab: uncharge during kmem_cache_free_bulk() 2020-10-13 18:38:31 -07:00
slob.c mm: memcg: convert vmstat slab counters to bytes 2020-08-07 11:33:24 -07:00
slub.c mm: memcg/slab: uncharge during kmem_cache_free_bulk() 2020-10-13 18:38:31 -07:00
sparse-vmemmap.c mm/sparse: only sub-section aligned range would be populated 2020-08-07 11:33:27 -07:00
sparse.c arch, mm: replace for_each_memblock() with for_each_mem_pfn_range() 2020-10-13 18:38:35 -07:00
swap_cgroup.c mm: memcontrol: make swap tracking an integral part of memory control 2020-06-03 20:09:48 -07:00
swap_slots.c mm/swap_slots.c: remove always zero and unused return value of enable_swap_slots_cache() 2020-10-13 18:38:30 -07:00
swap_state.c swap: rename SWP_FS to SWAP_FS_OPS to avoid ambiguity 2020-10-13 18:38:29 -07:00
swap.c mm: move call to compound_head() in release_pages() 2020-10-13 18:38:33 -07:00
swapfile.c mm/swapfile.c: fix potential memory leak in sys_swapon 2020-10-13 18:38:30 -07:00
truncate.c mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED 2020-10-13 18:38:29 -07:00
usercopy.c mm/usercopy.c: delete duplicated word 2020-08-12 10:57:58 -07:00
userfaultfd.c mm/vmscan: protect the workingset on anonymous LRU 2020-08-12 10:57:55 -07:00
util.c arm64: mte: Tags-aware aware memcmp_pages() implementation 2020-09-04 12:46:07 +01:00
vmacache.c kernel: better document the use_mm/unuse_mm API contract 2020-06-10 19:14:18 -07:00
vmalloc.c mm/vmalloc.c: fix the comment of find_vm_area 2020-10-13 18:38:32 -07:00
vmpressure.c mm: vmpressure: use mem_cgroup_is_root API 2020-04-02 09:35:31 -07:00
vmscan.c mm/vmscan: fix comments for isolate_lru_page() 2020-10-13 18:38:34 -07:00
vmstat.c Merge branch 'simplify-do_wp_page' 2020-09-04 09:31:54 -07:00
workingset.c mm: replace hpage_nr_pages with thp_nr_pages 2020-08-14 19:56:56 -07:00
z3fold.c mm/z3fold.c: use xx_zalloc instead xx_alloc and memset 2020-10-13 18:38:34 -07:00
zbud.c mm/zbud: remove redundant initialization 2020-10-13 18:38:34 -07:00
zpool.c mm/zpool.c: delete duplicated word and fix grammar 2020-08-12 10:57:58 -07:00
zsmalloc.c mm/zsmalloc.c: fix duplicated words 2020-08-12 10:57:58 -07:00
zswap.c mm/zswap: allow setting default status, compressor and allocator in Kconfig 2020-04-07 10:43:41 -07:00