linux

Michal Hocko 7854ea6c28 mm: consider compaction feedback also for costly allocation

PAGE_ALLOC_COSTLY_ORDER retry logic is currently handled mostly inside should_reclaim_retry, where we decide not to retry after at least order worth of pages were reclaimed, or after the watermark check for at least one zone would succeed once all reclaimable pages are reclaimed, if the reclaim hasn't made any progress. Compaction feedback is mostly ignored and we just try to make sure that the compaction did at least something before giving up.

The first condition was added by a41f24ea9f ("page allocator: smarter retry of costly-order allocations") and it assumed that lumpy reclaim could have created a page of sufficient order. Lumpy reclaim has been removed quite some time ago, so the assumption doesn't hold anymore. Remove the check for the number of reclaimed pages and rely solely on the compaction feedback. should_reclaim_retry now only makes sure that we keep retrying reclaim for high order pages if they are hidden by watermarks, so that order-0 reclaim really makes sense.

should_compact_retry now keeps retrying even for costly allocations. The number of retries is reduced with respect to !costly requests because costly requests are less important and harder to grant, so their pressure shouldn't cause contention for other requests or cause over-reclaim. We also do not reset no_progress_loops for costly requests, to make sure we do not keep reclaiming too aggressively.
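The retry policy described above can be modelled in a few lines of C. This is a simplified user-space sketch, not the code in page_alloc.c: the model_* names are hypothetical, the budget of 16 compaction retries and the 1/4 reduction for costly orders are assumptions for illustration, and the kernel's migration-mode escalation and zonelist suitability check are left out. Only the costly threshold (PAGE_ALLOC_COSTLY_ORDER = 3) mirrors the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_COSTLY_ORDER 3          /* PAGE_ALLOC_COSTLY_ORDER in the kernel */
#define MODEL_MAX_COMPACT_RETRIES 16  /* assumed retry budget for this model */

/* Coarse outcome of a compaction attempt as seen by the allocator (hypothetical). */
enum model_compact_result {
	MODEL_COMPACT_FAILED,     /* compaction scanned everything and found nothing */
	MODEL_COMPACT_WITHDRAWN,  /* deferred or bailed out early, e.g. on lock contention */
	MODEL_COMPACT_PROGRESS,   /* compaction ran and made some progress */
};

/* Should the allocator give compaction another try for this request? */
static bool model_should_compact_retry(unsigned int order,
				       enum model_compact_result result,
				       int compaction_retries)
{
	int max_retries = MODEL_MAX_COMPACT_RETRIES;

	assert(compaction_retries >= 0);
	if (order == 0)
		return false;  /* order-0 requests never need compaction */
	if (result == MODEL_COMPACT_FAILED)
		return false;  /* nothing left to gain from another attempt */
	if (result == MODEL_COMPACT_WITHDRAWN)
		return true;   /* it never really ran; this model assumes the zones are worth retrying */

	/* Costly requests now keep retrying too, but with a smaller budget. */
	if (order > MODEL_COSTLY_ORDER)
		max_retries /= 4;
	return compaction_retries <= max_retries;
}

/* Only !costly requests reset the counter when reclaim made progress. */
static int model_next_no_progress_loops(int no_progress_loops,
					bool did_some_progress,
					unsigned int order)
{
	if (did_some_progress && order <= MODEL_COSTLY_ORDER)
		return 0;
	return no_progress_loops + 1;
}
```

Both knobs from the commit message show up directly: a costly request (order > 3) gets a quarter of the compaction retry budget in this model, and its no-progress counter keeps growing even when reclaim makes progress, so it reaches the reclaim retry limit sooner than an order-0 or order-2 request would.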

This has been tested by running a process which fragments memory:
- compact memory
- mmap a large portion of the memory (1920M on a 2G RAM machine with 2G of swap space)
- MADV_DONTNEED a single page in PAGE_SIZE*((1UL<<MAX_ORDER)-1) steps until a certain amount of memory is freed (250M in my test), and reduce the step to (step / 2) + 1 after reaching the end of the mapping
- then run a script which populates the page cache with 2G (MemTotal) from /dev/zero to a new file

And then tries to allocate
nr_hugepages=$(awk '/MemAvailable/{printf "%d\n", $2/(2*1024)}' /proc/meminfo)
huge pages.

root@test1:~# echo 1 > /proc/sys/vm/overcommit_memory;echo 1 > /proc/sys/vm/compact_memory; ./fragment-mem-and-run /root/alloc_hugepages.sh 1920M 250M
Node 0, zone      DMA     31     28     31     10      2      0      2      1      2      3      1
Node 0, zone    DMA32    437    319    171     50     28     25     20     16     16     14    437

* This is the /proc/buddyinfo after the compaction

Done fragmenting. size=2013265920 freed=262144000
Node 0, zone      DMA    165     48      3      1      2      0      2      2      2      2      0
Node 0, zone    DMA32  35109  14575    185     51     41     12      6      0      0      0      0

* /proc/buddyinfo after memory got fragmented

Executing "/root/alloc_hugepages.sh"
Eating some pagecache
508623+0 records in
508623+0 records out
2083319808 bytes (2.1 GB) copied, 11.7292 s, 178 MB/s
Node 0, zone      DMA      3      5      3      1      2      0      2      2      2      2      0
Node 0, zone    DMA32    111    344    153     20     24     10      3      0      0      0      0

* /proc/buddyinfo after page cache got eaten

Trying to allocate 129
129

* 129 hugepages requested and all of them granted.

Node 0, zone      DMA      3      5      3      1      2      0      2      2      2      2      0
Node 0, zone    DMA32    127     97     30     99     11      6      2      1      4      0      0

* /proc/buddyinfo after hugetlb allocation.

10 runs will behave as follows:
Trying to allocate 130
130
--
Trying to allocate 129
129
--
Trying to allocate 128
128
--
Trying to allocate 129
129
--
Trying to allocate 128
128
--
Trying to allocate 129
129
--
Trying to allocate 132
132
--
Trying to allocate 129
129
--
Trying to allocate 128
128
--
Trying to allocate 129
129

So basically 100% success for all 10 attempts.
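The fragment-mem-and-run tool itself is not part of the commit, so the following is only a sketch of the fragmentation step as the list above describes it: map and populate anonymous memory, then release single pages with madvise(MADV_DONTNEED) every (1 << max_order) - 1 pages, shrinking the step to (step / 2) + 1 after each full pass. mmap, madvise and sysconf are the standard Linux/POSIX calls; map_and_populate, fragment_mapping and the bookkeeping array are hypothetical. With MAX_ORDER = 11, which matches the 11 columns of the /proc/buddyinfo output above, the first step is 2047 pages.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS and MADV_DONTNEED under strict C modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map len bytes of anonymous memory and touch all of it so it is backed by real pages. */
static char *map_and_populate(size_t len)
{
	char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (base == MAP_FAILED)
		return NULL;
	memset(base, 1, len);
	return base;
}

/*
 * Release one page every `step` pages until `to_free` bytes are gone.
 * The step starts at (1 << max_order) - 1 pages and becomes (step / 2) + 1
 * after each pass over the mapping. Returns the number of bytes freed.
 */
static size_t fragment_mapping(char *base, size_t len, size_t to_free,
			       unsigned int max_order)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t npages = len / page;
	size_t step = ((size_t)1 << max_order) - 1;
	size_t freed = 0;
	unsigned char *punched = calloc(npages, 1);  /* 1 = page already released */

	assert(step > 0);  /* max_order must be at least 1 */
	if (!punched)
		return 0;

	while (freed < to_free) {
		size_t freed_this_pass = 0;

		for (size_t i = 0; i < npages && freed < to_free; i += step) {
			if (punched[i])
				continue;
			if (madvise(base + i * page, page, MADV_DONTNEED) == 0) {
				punched[i] = 1;
				freed += page;
				freed_this_pass += page;
			}
		}
		/* (step / 2) + 1 converges to 2, so stop once a pass frees nothing new. */
		if (freed_this_pass == 0)
			break;
		step = step / 2 + 1;
	}
	free(punched);
	return freed;
}
```

Because the step never drops below 2, repeated passes can free at most about half of the mapping; this sketch stops when a pass finds nothing new to free instead of looping forever. The freed pages are scattered single pages, which is what keeps the buddy allocator from merging them back into large blocks.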

Without the patch the numbers looked much worse:
Trying to allocate 128
12
--
Trying to allocate 129
14
--
Trying to allocate 129
7
--
Trying to allocate 129
16
--
Trying to allocate 129
30
--
Trying to allocate 129
38
--
Trying to allocate 129
19
--
Trying to allocate 129
37
--
Trying to allocate 129
28
--
Trying to allocate 129
37

Just for completeness, the base kernel without the oom detection rework looks as follows:
Trying to allocate 127
30
--
Trying to allocate 129
12
--
Trying to allocate 129
52
--
Trying to allocate 128
32
--
Trying to allocate 129
12
--
Trying to allocate 129
10
--
Trying to allocate 129
32
--
Trying to allocate 128
14
--
Trying to allocate 128
16
--
Trying to allocate 129
8

As we can see, the success rate is much more volatile and smaller without this patch. So the patch not only makes the retry logic for costly requests more sensible, but the success rate is even higher.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>		2016-05-20 17:58:30 -07:00
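The /proc/buddyinfo snapshots quoted in the commit message list, after the zone name, the number of free blocks of order 0, 1, 2, and so on. The sketch below (hypothetical helper names, assuming the 11-order layout shown in the quoted output) parses one such line and counts how many order-9 blocks could be carved from the free lists; order 9 is the size of a 2 MiB huge page with 4 KiB base pages, as on x86-64.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BUDDY_MAX_ORDERS 11  /* assumed: orders 0..10, as in the quoted output */

/*
 * Parse a line such as "Node 0, zone DMA32 437 319 ..." into per-order
 * free-block counts. Returns the number of orders parsed, or -1 if the
 * line has no "zone <name>" prefix.
 */
static int parse_buddyinfo_line(const char *line,
				unsigned long counts[BUDDY_MAX_ORDERS])
{
	const char *p = strstr(line, "zone");
	char zone[32];
	int consumed = 0;
	int n = 0;

	if (!p || sscanf(p, "zone %31s%n", zone, &consumed) != 1)
		return -1;
	p += consumed;
	while (n < BUDDY_MAX_ORDERS &&
	       sscanf(p, "%lu%n", &counts[n], &consumed) == 1) {
		p += consumed;
		n++;
	}
	return n;
}

/* A free block of order k >= min_order holds 2^(k - min_order) blocks of min_order. */
static unsigned long free_units_of_order(const unsigned long counts[],
					 int norders, unsigned int min_order)
{
	unsigned long units = 0;

	for (int k = (int)min_order; k < norders; k++)
		units += counts[k] << (k - (int)min_order);
	return units;
}
```

Applied to the DMA32 line after fragmentation (35109 14575 185 51 41 12 6 0 0 0 0) it returns 0 for order 9: no free block is large enough for a huge page, so every hugetlb allocation in that test has to rely on reclaim and compaction, which is exactly where the retry policy above matters. The same line right after compaction (14 free order-9 and 437 free order-10 blocks) yields 888 order-9 units.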
kasan	mm, kasan: fix compilation for CONFIG_SLAB	2016-04-01 17:03:37 -05:00
backing-dev.c	mm: throttle on IO only when there are too many dirty and writeback pages	2016-05-20 17:58:30 -07:00
balloon_compaction.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	2016-03-17 21:38:27 -07:00
bootmem.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
cleancache.c	cleancache: constify cleancache_ops structure	2016-01-27 09:09:57 -05:00
cma_debug.c	mm/cma_debug: correct size input to bitmap function	2015-07-17 16:39:54 -07:00
cma.c	mm/cma.c: suppress warning	2015-11-05 19:34:48 -08:00
cma.h	mm: cma: mark cma_bitmap_maxno() inline in header	2015-08-14 15:56:32 -07:00
compaction.c	mm, compaction: distinguish between full and partial COMPACT_COMPLETE	2016-05-20 17:58:30 -07:00
debug_page_ref.c	mm/page_ref: add tracepoint to track down page reference manipulation	2016-03-17 15:09:34 -07:00
debug.c	mm: introduce page reference manipulation functions	2016-03-17 15:09:34 -07:00
dmapool.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
early_ioremap.c	mm/early_ioremap: use offset_in_page macro	2015-11-05 19:34:48 -08:00
fadvise.c	mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros	2016-04-04 10:41:08 -07:00
failslab.c	mm: fault-inject take over bootstrap kmem_cache check	2016-03-15 16:55:16 -07:00
filemap.c	mm: filemap: only do access activations on reads	2016-05-20 17:58:30 -07:00
frame_vector.c	mm/gup: Switch all callers of get_user_pages() to not pass tsk/mm	2016-02-16 10:11:12 +01:00
frontswap.c	frontswap: allow multiple backends	2015-06-24 17:49:45 -07:00
gup.c	Merge branch 'mm-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2016-04-14 19:31:34 -07:00
highmem.c	mm/highmem: make nr_free_highpages() handles all highmem zones by itself	2016-05-19 19:12:14 -07:00
huge_memory.c	huge mm: move_huge_pmd does not need new_vma	2016-05-19 19:12:14 -07:00
hugetlb_cgroup.c	mm: make compound_head() robust	2015-11-06 17:50:42 -08:00
hugetlb.c	mm/hugetlb: add same zone check in pfn_range_valid_gigantic()	2016-05-19 19:12:14 -07:00
hwpoison-inject.c	hwpoison: use page_cgroup_ino for filtering by memcg	2015-09-10 13:29:01 -07:00
init-mm.c
internal.h	mm, compaction: distinguish between full and partial COMPACT_COMPLETE	2016-05-20 17:58:30 -07:00
interval_tree.c	mm: replace vma->sharead.linear with vma->shared	2015-02-10 14:30:31 -08:00
Kconfig	memory_hotplug: introduce CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE	2016-05-19 19:12:14 -07:00
Kconfig.debug	mm/page_ref: add tracepoint to track down page reference manipulation	2016-03-17 15:09:34 -07:00
kmemcheck.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
kmemleak-test.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
kmemleak.c	mm: coalesce split strings	2016-03-17 15:09:34 -07:00
ksm.c	ksm: fix conflict between mmput and scan_get_next_rmap_item	2016-05-12 15:52:50 -07:00
list_lru.c	mm: memcontrol: move kmem accounting code to CONFIG_MEMCG	2016-01-20 17:09:18 -08:00
maccess.c	mm/maccess.c: actually return -EFAULT from strncpy_from_unsafe	2015-11-05 19:34:48 -08:00
madvise.c	mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros	2016-04-04 10:41:08 -07:00
Makefile	mm, kasan: SLAB support	2016-03-25 16:37:42 -07:00
memblock.c	mm: coalesce split strings	2016-03-17 15:09:34 -07:00
memcontrol.c	oom, oom_reaper: try to reap tasks which skip regular OOM killer path	2016-05-19 19:12:14 -07:00
memory_hotplug.c	memory_hotplug: introduce memhp_default_state= command line parameter	2016-05-19 19:12:14 -07:00
memory-failure.c	mm/memory-failure: fix race with compound page split/merge	2016-04-28 19:34:04 -07:00
memory.c	mm: thp: calculate the mapcount correctly for THP pages during WP faults	2016-05-12 15:52:50 -07:00
mempolicy.c	mm, page_alloc: avoid looking up the first zone in a zonelist twice	2016-05-19 19:12:14 -07:00
mempool.c	mm, kasan: add GFP flags to KASAN API	2016-03-25 16:37:42 -07:00
memtest.c	memtest: remove unused header files	2015-09-08 15:35:28 -07:00
migrate.c	mm: use __SetPageSwapBacked and dont ClearPageSwapBacked	2016-05-19 19:12:14 -07:00
mincore.c	mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage	2016-04-04 10:41:08 -07:00
mlock.c	mm: fix mlock accouting	2016-01-21 17:20:51 -08:00
mm_init.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
mmap.c	mm/mmap: kill hook arch_rebalance_pgtables()	2016-05-19 19:12:14 -07:00
mmu_context.c	mm/mmu_context, sched/core: Fix mmu_context.h assumption	2016-04-28 11:44:19 +02:00
mmu_notifier.c	fix Christoph's email addresses	2016-03-17 15:09:34 -07:00
mmzone.c	mm, page_alloc: inline the fast path of the zonelist iterator	2016-05-19 19:12:14 -07:00
mprotect.c	mm/mprotect.c: don't imply PROT_EXEC on non-exec fs	2016-03-22 15:36:02 -07:00
mremap.c	huge pagecache: extend mremap pmd rmap lockout to files	2016-05-19 19:12:14 -07:00
msync.c	mm/msync: use offset_in_page macro	2015-11-05 19:34:48 -08:00
nobootmem.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
nommu.c	Merge branch 'mm-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2016-04-14 19:31:34 -07:00
oom_kill.c	mm, oom_reaper: clear TIF_MEMDIE for all tasks queued for oom_reaper	2016-05-19 19:12:14 -07:00
page_alloc.c	mm: consider compaction feedback also for costly allocation	2016-05-20 17:58:30 -07:00
page_counter.c	mm: page_counter: let page_counter_try_charge() return bool	2015-11-05 19:34:48 -08:00
page_ext.c	mm/page_poisoning.c: allow for zero poisoning	2016-03-15 16:55:16 -07:00
page_idle.c	mm: add page_check_address_transhuge() helper	2016-01-15 17:56:32 -08:00
page_io.c	Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2016-05-17 15:05:23 -07:00
page_isolation.c	mm/memory_hotplug: add comment to some functions related to memory hotplug	2016-05-19 19:12:14 -07:00
page_owner.c	mm, page_alloc: inline pageblock lookup in page free fast paths	2016-05-19 19:12:14 -07:00
page_poison.c	mm/page_poisoning.c: allow for zero poisoning	2016-03-15 16:55:16 -07:00
page-writeback.c	mm/writeback: correct dirty page calculation for highmem	2016-05-19 19:12:14 -07:00
pagewalk.c	thp: rename split_huge_page_pmd() to split_huge_pmd()	2016-01-15 17:56:32 -08:00
percpu-km.c	mm: percpu: use pr_fmt to prefix output	2016-03-17 15:09:34 -07:00
percpu-vm.c	percpu: move region iterations out of pcpu_[de]populate_chunk()	2014-09-02 14:46:02 -04:00
percpu.c	mm: percpu: use pr_fmt to prefix output	2016-03-17 15:09:34 -07:00
pgtable-generic.c	mm/thp/migration: switch from flush_tlb_range to flush_pmd_tlb_range	2016-03-17 15:09:34 -07:00
process_vm_access.c	mm/gup: Introduce get_user_pages_remote()	2016-02-16 10:04:09 +01:00
quicklist.c	fix Christoph's email addresses	2016-03-17 15:09:34 -07:00
readahead.c	mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros	2016-04-04 10:41:08 -07:00
rmap.c	mm: use __SetPageSwapBacked and dont ClearPageSwapBacked	2016-05-19 19:12:14 -07:00
shmem.c	tmpfs: mem_cgroup charge fault to vm_mm not current mm	2016-05-19 19:12:14 -07:00
slab_common.c	mm, kasan: add GFP flags to KASAN API	2016-03-25 16:37:42 -07:00
slab.c	include/linux/nodemask.h: create next_node_in() helper	2016-05-19 19:12:14 -07:00
slab.h	mm, kasan: add GFP flags to KASAN API	2016-03-25 16:37:42 -07:00
slob.c	mm: slab: free kmem_cache_node after destroy sysfs file	2016-02-18 16:23:24 -08:00
slub.c	mm: rename _count, field of the struct page, to _refcount	2016-05-19 19:12:14 -07:00
sparse-vmemmap.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
sparse.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
swap_cgroup.c	mm: convert printk(KERN_<LEVEL> to pr_<level>	2016-03-17 15:09:34 -07:00
swap_state.c	mm: use __SetPageSwapBacked and dont ClearPageSwapBacked	2016-05-19 19:12:14 -07:00
swap.c	thp: keep huge zero page pinned until tlb flush	2016-04-28 19:34:04 -07:00
swapfile.c	mm: thp: calculate the mapcount correctly for THP pages during WP faults	2016-05-12 15:52:50 -07:00
truncate.c	mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros	2016-04-04 10:41:08 -07:00
userfaultfd.c	mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros	2016-04-04 10:41:08 -07:00
util.c	mm: uninline page_mapped()	2016-05-19 19:12:14 -07:00
vmacache.c	mm/vmacache: inline vmacache_valid_mm()	2015-11-05 19:34:48 -08:00
vmalloc.c	mm/vmalloc: use PAGE_ALIGNED() to check PAGE_SIZE alignment	2016-03-17 15:09:34 -07:00
vmpressure.c	mm/vmpressure.c: fix subtree pressure detection	2016-02-03 08:28:43 -08:00
vmscan.c	mm, oom: rework oom detection	2016-05-20 17:58:30 -07:00
vmstat.c	mm, page_alloc: inline pageblock lookup in page free fast paths	2016-05-19 19:12:14 -07:00
workingset.c	mm: workingset: make shadow node shrinker memcg aware	2016-03-17 15:09:34 -07:00
zbud.c	mm/zbud.c: use list_last_entry() instead of list_tail_entry()	2016-01-15 11:40:52 -08:00
zpool.c	mm: zsmalloc: constify struct zs_pool name	2015-11-06 17:50:42 -08:00
zsmalloc.c	zsmalloc: fix zs_can_compact() integer overflow	2016-05-09 17:40:59 -07:00
zswap.c	mm/zswap: provide unique zpool name	2016-05-05 17:38:53 -07:00