linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-26 14:12:06 +00:00

Michal Hocko cd04ae1e2d

mm, oom: do not rely on TIF_MEMDIE for memory reserves access

For ages we have been relying on TIF_MEMDIE thread flag to mark OOM victims and then, among other things, to give these threads full access to memory reserves. There are few shortcomings of this implementation, though.

First of all and the most serious one is that the full access to memory reserves is quite dangerous because we leave no safety room for the system to operate and potentially do last emergency steps to move on.

Secondly this flag is per task_struct while the OOM killer operates on mm_struct granularity so all processes sharing the given mm are killed. Giving the full access to all these task_structs could lead to a quick memory reserves depletion. We have tried to reduce this risk by giving TIF_MEMDIE only to the main thread and the currently allocating task but that doesn't really solve this problem while it surely opens up a room for corner cases - e.g. GFP_NO{FS,IO} requests might loop inside the allocator without access to memory reserves because a particular thread was not the group leader.

Now that we have the oom reaper and that all oom victims are reapable after `1b51e65eab` ("oom, oom_reaper: allow to reap mm shared by the kthreads") we can be more conservative and grant only partial access to memory reserves because there are reasonable chances of the parallel memory freeing. We still want some access to reserves because we do not want other consumers to eat up the victim's freed memory. oom victims will still contend with __GFP_HIGH users but those shouldn't be so aggressive to starve oom victims completely.

Introduce ALLOC_OOM flag and give all tsk_is_oom_victim tasks access to the half of the reserves. This makes the access to reserves independent on which task has passed through mark_oom_victim.

Also drop any usage of TIF_MEMDIE from the page allocator proper and replace it by tsk_is_oom_victim as well which will make page_alloc.c completely TIF_MEMDIE free finally.

CONFIG_MMU=n doesn't have oom reaper so let's stick to the original ALLOC_NO_WATERMARKS approach.

There is a demand to make the oom killer memcg aware which will imply many tasks killed at once. This change will allow such a usecase without worrying about complete memory reserves depletion.

Link: http://lkml.kernel.org/r/20170810075019.28998-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2017-09-06 17:27:30 -07:00
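The reserve policy described in the commit message above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the code in page_alloc.c: the `sketch_` function names, the `SKETCH_` flag values, and the single-zone watermark check are all hypothetical simplifications. It shows an OOM victim being allowed to dip into half of the reserve below the min watermark when an oom reaper is available, and ignoring watermarks entirely otherwise (the CONFIG_MMU=n case).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this sketch only; not the kernel's definitions. */
#define SKETCH_ALLOC_NO_WATERMARKS 0x1u /* ignore watermarks entirely */
#define SKETCH_ALLOC_OOM           0x2u /* OOM victim: partial reserve access */

/*
 * Pick the reserve-access flag for the allocating task.
 * has_oom_reaper models CONFIG_MMU=y; without a reaper the commit
 * message says the original no-watermarks approach is kept.
 */
static unsigned int sketch_reserve_flags(bool is_oom_victim, bool has_oom_reaper)
{
	if (!is_oom_victim)
		return 0;
	return has_oom_reaper ? SKETCH_ALLOC_OOM : SKETCH_ALLOC_NO_WATERMARKS;
}

/* Number of pages that must stay free after the allocation succeeds. */
static long sketch_effective_min(long min_wmark, unsigned int alloc_flags)
{
	assert(min_wmark >= 0);
	if (alloc_flags & SKETCH_ALLOC_NO_WATERMARKS)
		return 0;
	if (alloc_flags & SKETCH_ALLOC_OOM)
		return min_wmark - min_wmark / 2; /* half of the reserves */
	return min_wmark;
}

/* Simplified single-zone check: does the request leave enough pages free? */
static bool sketch_may_allocate(long free_pages, long request,
				long min_wmark, unsigned int alloc_flags)
{
	return free_pages - request >= sketch_effective_min(min_wmark, alloc_flags);
}
```

In this model, with a min watermark of 100 pages and 60 pages free, a one-page request from an ordinary task is refused, while the same request from an OOM victim with the `SKETCH_ALLOC_OOM` flag is allowed because its effective minimum drops to 50 pages.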
..
kasan	Merge branch 'linus' into locking/core, to pick up fixes	2017-08-10 12:20:53 +02:00
backing-dev.c	bdi: Drop 'parent' argument from bdi_register[_va]()	2017-04-20 12:09:55 -06:00
balloon_compaction.c	mm/balloon_compaction.c: don't zero ballooned pages	2017-08-10 15:54:07 -07:00
bootmem.c	mm/bootmem.c: cosmetic improvement of code readability	2017-02-22 16:41:29 -08:00
cleancache.c	fs: switch ->s_uuid to uuid_t	2017-06-05 16:59:12 +02:00
cma_debug.c	mm/cma_debug.c: fix stack corruption due to sprintf usage	2017-08-18 15:32:02 -07:00
cma.c	cma: fix calculation of aligned offset	2017-07-10 16:32:32 -07:00
cma.h	cma: Store a name in the cma structure	2017-04-18 20:41:12 +02:00
compaction.c	mm, compaction: skip over holes in __reset_isolation_suitable	2017-07-06 16:24:32 -07:00
debug_page_ref.c	mm/page_ref: add tracepoint to track down page reference manipulation	2016-03-17 15:09:34 -07:00
debug.c	mm: make tlb_flush_pending global	2017-08-10 15:54:07 -07:00
dmapool.c	lib/vsprintf.c: remove %Z support	2017-02-27 18:43:47 -08:00
early_ioremap.c	x86/mm: Add support to access boot related data in the clear	2017-07-18 11:38:02 +02:00
fadvise.c	mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED	2016-12-20 09:48:46 -08:00
failslab.c
filemap.c	mm: use find_get_pages_range() in filemap_range_has_page()	2017-09-06 17:27:27 -07:00
frame_vector.c	treewide: use kv[mz]alloc* rather than opencoded variants	2017-05-08 17:15:13 -07:00
frontswap.c	mm, frontswap: convert frontswap_enabled to static key	2016-07-26 16:19:19 -07:00
gup.c	mm/gup: make __gup_device_* require THP	2017-09-06 17:27:26 -07:00
highmem.c	mm/highmem: make nr_free_highpages() handles all highmem zones by itself	2016-05-19 19:12:14 -07:00
huge_memory.c	mm, THP, swap: support splitting THP for THP swap out	2017-09-06 17:27:28 -07:00
hugetlb_cgroup.c	mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size	2016-05-20 17:58:30 -07:00
hugetlb.c	mm, hugetlb: do not allocate non-migrateable gigantic pages from movable zones	2017-09-06 17:27:29 -07:00
hwpoison-inject.c	mm: hwpoison: call shake_page() unconditionally	2017-05-03 15:52:12 -07:00
init-mm.c	mm: Add a user_ns owner to mm_struct and fix ptrace permission checks	2016-11-22 11:49:48 -06:00
internal.h	mm, oom: do not rely on TIF_MEMDIE for memory reserves access	2017-09-06 17:27:30 -07:00
interval_tree.c
Kconfig	mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups	2017-09-06 17:27:29 -07:00
Kconfig.debug	mm: enable page poisoning early at boot	2017-05-03 15:52:10 -07:00
khugepaged.c	mm: make PR_SET_THP_DISABLE immediately active	2017-07-10 16:32:31 -07:00
kmemcheck.c	mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU	2017-04-18 11:42:36 -07:00
kmemleak-test.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
kmemleak.c	mm: kmemleak: treat vm_struct as alternative reference to vmalloc'ed objects	2017-07-06 16:24:34 -07:00
ksm.c	mm/ksm.c: constify attribute_group structures	2017-09-06 17:27:27 -07:00
list_lru.c	mm/list_lru.c: fix list_lru_count_node() to be race free	2017-07-10 16:32:33 -07:00
maccess.c	x86: remove more uaccess_32.h complexity	2016-05-22 17:21:27 -07:00
madvise.c	mm, madvise: ensure poisoned pages are removed from per-cpu lists	2017-08-31 16:33:15 -07:00
Makefile	percpu: expose statistics about percpu memory via debugfs	2017-06-20 15:31:38 -04:00
memblock.c	mm/memblock.c: reversed logic in memblock_discard()	2017-08-25 16:12:46 -07:00
memcontrol.c	memcg, THP, swap: make mem_cgroup_swapout() support THP	2017-09-06 17:27:28 -07:00
memory_hotplug.c	mm, memory_hotplug: get rid of zonelists_mutex	2017-09-06 17:27:26 -07:00
memory-failure.c	x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages	2017-08-17 10:30:49 +02:00
memory.c	mm, swap: VMA based swap readahead	2017-09-06 17:27:29 -07:00
mempolicy.c	mm/mempolicy: fix use after free when calling get_mempolicy	2017-08-18 15:32:02 -07:00
mempool.c	sched/wait: Rename wait_queue_t => wait_queue_entry_t	2017-06-20 12:18:27 +02:00
memtest.c
migrate.c	Sanitize 'move_pages()' permission checks	2017-08-20 13:26:27 -07:00
mincore.c	mm: remove shmem_mapping() shmem_zero_setup() duplicates	2017-02-24 17:46:56 -08:00
mlock.c	mlock: fix mlock count can not decrease in race condition	2017-06-02 15:07:38 -07:00
mm_init.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
mmap.c	userfaultfd: call userfaultfd_unmap_prep only if __split_vma succeeds	2017-09-06 17:27:29 -07:00
mmu_context.c	sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h>	2017-03-02 08:42:38 +01:00
mmu_notifier.c	mm/mmu_notifier: kill invalidate_page	2017-08-31 16:13:00 -07:00
mmzone.c	mm/mmzone.c: swap likely to unlikely as code logic is different for next_zones_zonelist()	2017-02-22 16:41:29 -08:00
mprotect.c	mm: migrate: prevent racy access to tlb_flush_pending	2017-08-10 15:54:07 -07:00
mremap.c	mm/mremap: fail map duplication attempts for private mappings	2017-09-06 17:27:26 -07:00
msync.c
nobootmem.c	mm: discard memblock data later	2017-08-18 15:32:01 -07:00
nommu.c	mm: rename global_page_state to global_zone_page_state	2017-09-06 17:27:29 -07:00
oom_kill.c	mm, oom: do not rely on TIF_MEMDIE for memory reserves access	2017-09-06 17:27:30 -07:00
page_alloc.c	mm, oom: do not rely on TIF_MEMDIE for memory reserves access	2017-09-06 17:27:30 -07:00
page_counter.c
page_ext.c	mm, page_ext: periodically reschedule during page_ext_init()	2017-09-06 17:27:26 -07:00
page_idle.c	mm/page_idle.c: constify attribute_group structures	2017-09-06 17:27:27 -07:00
page_io.c	mm: test code to write THP to swap device as a whole	2017-09-06 17:27:28 -07:00
page_isolation.c	mm: unify new_node_page and alloc_migrate_target	2017-07-10 16:32:31 -07:00
page_owner.c	mm, page_owner: don't grab zone->lock for init_pages_in_zone()	2017-09-06 17:27:26 -07:00
page_poison.c	mm: enable page poisoning early at boot	2017-05-03 15:52:10 -07:00
page_vma_mapped.c	mm/hugetlb: add size parameter to huge_pte_offset()	2017-07-06 16:24:34 -07:00
page-writeback.c	mm: rename global_page_state to global_zone_page_state	2017-09-06 17:27:29 -07:00
pagewalk.c	mm/hugetlb: add size parameter to huge_pte_offset()	2017-07-06 16:24:34 -07:00
percpu-internal.h	percpu: fix early calls for spinlock in pcpu_stats	2017-06-21 13:53:52 -04:00
percpu-km.c	percpu: fix static checker warnings in pcpu_destroy_chunk	2017-06-29 11:23:38 -04:00
percpu-stats.c	percpu: expose statistics about percpu memory via debugfs	2017-06-20 15:31:38 -04:00
percpu-vm.c	percpu: fix static checker warnings in pcpu_destroy_chunk	2017-06-29 11:23:38 -04:00
percpu.c	percpu: resolve err may not be initialized in pcpu_alloc	2017-06-21 12:00:45 -04:00
pgtable-generic.c	mm: convert generic code to 5-level paging	2017-03-09 11:48:47 -08:00
process_vm_access.c	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h>	2017-03-02 08:42:28 +01:00
quicklist.c	fix Christoph's email addresses	2016-03-17 15:09:34 -07:00
readahead.c	mm: don't cap request size based on read-ahead setting	2016-12-12 18:55:08 -08:00
rmap.c	mm/rmap: update to new mmu_notifier semantic v2	2017-08-31 16:12:59 -07:00
rodata_test.c	mm: remove rodata_test_data export, add pr_fmt	2017-05-03 15:52:09 -07:00
shmem.c	mm, swap: VMA based swap readahead	2017-09-06 17:27:29 -07:00
slab_common.c	mm: allow slab_nomerge to be set at build time	2017-07-06 16:24:31 -07:00
slab.c	mm: memcontrol: account slab stats per lruvec	2017-07-06 16:24:35 -07:00
slab.h	locking/lockdep: Rework FS_RECLAIM annotation	2017-08-10 12:29:03 +02:00
slob.c	locking/lockdep: Rework FS_RECLAIM annotation	2017-08-10 12:29:03 +02:00
slub.c	mm/slub.c: constify attribute_group structures	2017-09-06 17:27:27 -07:00
sparse-vmemmap.c	mm, sparse, page_ext: drop ugly N_HIGH_MEMORY branches for allocations	2017-09-06 17:27:26 -07:00
sparse.c	mm, sparse, page_ext: drop ugly N_HIGH_MEMORY branches for allocations	2017-09-06 17:27:26 -07:00
swap_cgroup.c	mm, THP, swap: delay splitting THP during swap out	2017-07-06 16:24:31 -07:00
swap_slots.c	mm/swap_slots.c: don't disable preemption while taking the per-CPU cache	2017-07-10 16:32:32 -07:00
swap_state.c	mm, swap: add sysfs interface for VMA based swap readahead	2017-09-06 17:27:29 -07:00
swap.c	mm: remove nr_pages argument from pagevec_lookup{,_range}()	2017-09-06 17:27:27 -07:00
swapfile.c	mm, swap: don't use VMA based swap readahead if HDD is used as swap	2017-09-06 17:27:30 -07:00
truncate.c	mm/truncate.c: fix THP handling in invalidate_mapping_pages()	2017-07-10 16:32:32 -07:00
usercopy.c	mm/usercopy: Drop extra is_vmalloc_or_module() check	2017-04-05 12:30:18 -07:00
userfaultfd.c	userfaultfd: shmem: wire up shmem_mfill_zeropage_pte	2017-09-06 17:27:28 -07:00
util.c	mm: rename global_page_state to global_zone_page_state	2017-09-06 17:27:29 -07:00
vmacache.c	sched/headers: Prepare to move 'init_task' and 'init_thread_union' from <linux/sched.h> to <linux/sched/task.h>	2017-03-02 08:42:38 +01:00
vmalloc.c	mm/vmalloc.c: don't reinvent the wheel but use existing llist API	2017-09-06 17:27:29 -07:00
vmpressure.c	mm, vmpressure: pass-through notification support	2017-07-10 16:32:31 -07:00
vmscan.c	mm, THP, swap: add THP swapping out fallback counting	2017-09-06 17:27:28 -07:00
vmstat.c	mm, swap: add swap readahead hit statistics	2017-09-06 17:27:29 -07:00
workingset.c	mm: memcontrol: per-lruvec stats infrastructure	2017-07-06 16:24:35 -07:00
z3fold.c	z3fold: use per-cpu unbuddied lists	2017-09-06 17:27:30 -07:00
zbud.c
zpool.c
zsmalloc.c	zsmalloc: zs_page_migrate: skip unnecessary loops but not return -EBUSY if zspage is not inuse	2017-09-06 17:27:26 -07:00
zswap.c	mm/zswap.c: delete an error message for a failed memory allocation in zswap_dstmem_prepare()	2017-07-06 16:24:35 -07:00