linux

Mel Gorman 2876592f23 mm: vmscan: stop reclaim/compaction earlier due to insufficient progress if !__GFP_REPEAT

should_continue_reclaim() for reclaim/compaction allows scanning to continue even if pages are not being reclaimed until the full list is scanned. In terms of allocation success, this makes sense but potentially it introduces unwanted latency for high-order allocations such as transparent hugepages and network jumbo frames that would prefer to fail the allocation attempt and fall back to order-0 pages. Worse, there is a potential that the full LRU scan will clear all the young bits, distort page aging information and potentially push pages into swap that would have otherwise remained resident.

This patch will stop reclaim/compaction if no pages were reclaimed in the last SWAP_CLUSTER_MAX pages that were considered. For allocations such as hugetlbfs that use __GFP_REPEAT and have fewer fallback options, the full LRU list may still be scanned.

Order-0 allocation should not be affected because RECLAIM_MODE_COMPACTION is not set so the following avoids the gfp_mask being examined:

    if (!(sc->reclaim_mode & RECLAIM_MODE_COMPACTION))
        return false;

A tool was developed based on ftrace that tracked the latency of high-order allocations while transparent hugepage support was enabled and three benchmarks were run. The "fix-infinite" figures are 2.6.38-rc4 with Johannes's patch "vmscan: fix zone shrinking exit when scan work is done" applied.

STREAM Highorder Allocation Latency Statistics
               fix-infinite     break-early
1 :: Count            10298           10229
1 :: Min             0.4560          0.4640
1 :: Mean            1.0589          1.0183
1 :: Max            14.5990         11.7510
1 :: Stddev          0.5208          0.4719
2 :: Count                2               1
2 :: Min             1.8610          3.7240
2 :: Mean            3.4325          3.7240
2 :: Max             5.0040          3.7240
2 :: Stddev          1.5715          0.0000
9 :: Count           111696          111694
9 :: Min             0.5230          0.4110
9 :: Mean           10.5831         10.5718
9 :: Max            38.4480         43.2900
9 :: Stddev          1.1147          1.1325

Mean time for order-1 allocations is reduced. order-2 looks increased but with so few allocations, it's not particularly significant. THP mean allocation latency is also reduced. That said, allocation time varies so significantly that the reductions are within noise.

Max allocation time is reduced by a significant amount for low-order allocations but increased for THP allocations which presumably are now breaking before reclaim has done enough work.

SysBench Highorder Allocation Latency Statistics
               fix-infinite     break-early
1 :: Count            15745           15677
1 :: Min             0.4250          0.4550
1 :: Mean            1.1023          1.0810
1 :: Max            14.4590         10.8220
1 :: Stddev          0.5117          0.5100
2 :: Count                1               1
2 :: Min             3.0040          2.1530
2 :: Mean            3.0040          2.1530
2 :: Max             3.0040          2.1530
2 :: Stddev          0.0000          0.0000
9 :: Count             2017            1931
9 :: Min             0.4980          0.7480
9 :: Mean           10.4717         10.3840
9 :: Max            24.9460         26.2500
9 :: Stddev          1.1726          1.1966

Again, mean time for order-1 allocations is reduced while order-2 allocations are too few to draw conclusions from. The mean time for THP allocations is also slightly reduced albeit the reductions are within variance.

Once again, our maximum allocation time is significantly reduced for low-order allocations and slightly increased for THP allocations.

Anon stream mmap reference Highorder Allocation Latency Statistics
1 :: Count             1376            1790
1 :: Min             0.4940          0.5010
1 :: Mean            1.0289          0.9732
1 :: Max             6.2670          4.2540
1 :: Stddev          0.4142          0.2785
2 :: Count                1               -
2 :: Min             1.9060               -
2 :: Mean            1.9060               -
2 :: Max             1.9060               -
2 :: Stddev          0.0000               -
9 :: Count            11266           11257
9 :: Min             0.4990          0.4940
9 :: Mean        27250.4669      24256.1919
9 :: Max      11439211.0000    6008885.0000
9 :: Stddev     226427.4624     186298.1430

This benchmark creates one thread per CPU which references an amount of anonymous memory 1.5 times the size of physical RAM. This pounds swap quite heavily and is intended to exercise THP a bit.

Mean allocation time for order-1 is reduced as before. It's also reduced for THP allocations but the variations here are pretty massive due to swap. As before, maximum allocation times are significantly reduced.

Overall, the patch reduces the mean and maximum allocation latencies for the smaller high-order allocations. This was with Slab configured so it would be expected to be more significant with Slub which uses these size allocations more aggressively.

The mean allocation times for THP allocations are also slightly reduced. The maximum latency was slightly increased as predicted by the comments due to reclaim/compaction breaking early. However, workloads care more about the latency of lower-order allocations than THP so it's an acceptable trade-off.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2011-02-25 15:07:36 -08:00
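The early-exit rule described in the commit message can be modelled in user space. The following is a minimal sketch and not the kernel source: `model_continue_reclaim()`, `struct model_scan_control` and the `MODEL_*` flag values are hypothetical stand-ins, and the condition used for the `__GFP_REPEAT` case (stop only when a batch neither scanned nor reclaimed anything) is an assumption about how "the full LRU list may still be scanned" ends. Any further checks the real should_continue_reclaim() may apply are omitted.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel flags; the values are illustrative only. */
#define MODEL_GFP_REPEAT              0x1u
#define MODEL_RECLAIM_MODE_COMPACTION 0x1u

struct model_scan_control {
    unsigned int gfp_mask;      /* allocation flags of the caller */
    unsigned int reclaim_mode;  /* set for reclaim/compaction (high-order) */
};

/*
 * Simplified model of the decision to keep reclaiming. nr_reclaimed and
 * nr_scanned describe the most recent batch of pages that was considered.
 */
static bool model_continue_reclaim(const struct model_scan_control *sc,
                                   unsigned long nr_reclaimed,
                                   unsigned long nr_scanned)
{
    /* Order-0 path: not in reclaim/compaction mode, gfp_mask is never examined. */
    if (!(sc->reclaim_mode & MODEL_RECLAIM_MODE_COMPACTION))
        return false;

    if (sc->gfp_mask & MODEL_GFP_REPEAT) {
        /* Few fallback options: give up only once nothing is left to scan (assumed). */
        if (nr_reclaimed == 0 && nr_scanned == 0)
            return false;
    } else {
        /* Caller can fall back: give up as soon as a batch reclaims nothing. */
        if (nr_reclaimed == 0)
            return false;
    }

    return true;
}
```

In this model a THP-style caller without `MODEL_GFP_REPEAT` stops after the first batch that reclaims nothing, while a hugetlbfs-style caller with the flag keeps going for as long as pages are still being scanned.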
..
backing-dev.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6	2010-10-26 17:58:44 -07:00
bootmem.c	x86, memblock: Replace e820_/_early string with memblock_	2010-08-27 11:13:47 -07:00
bounce.c	bounce: call flush_dcache_page() after bounce_copy_vec()	2010-09-09 18:57:25 -07:00
compaction.c	mm: compaction: prevent division-by-zero during user-requested compaction	2011-01-20 17:02:05 -08:00
debug-pagealloc.c	generic debug pagealloc	2009-04-01 08:59:13 -07:00
dmapool.c	mm/dmapool.c: use TASK_UNINTERRUPTIBLE in dma_pool_alloc()	2011-01-13 17:32:48 -08:00
fadvise.c	readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM	2010-03-06 11:26:25 -08:00
failslab.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
filemap_xip.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
filemap.c	mm: remove likely() from grab_cache_page_write_begin()	2011-01-13 17:32:36 -08:00
fremap.c	Avoid pgoff overflow in remap_file_pages	2010-09-25 09:34:58 -07:00
highmem.c	mm,x86: fix kmap_atomic_push vs ioremap_32.c	2010-10-27 18:03:05 -07:00
huge_memory.c	thp: prevent hugepages during args/env copying into the user stack	2011-02-15 15:21:11 -08:00
hugetlb.c	hugetlb: fix handling of parse errors in sysfs	2011-01-13 17:32:49 -08:00
hwpoison-inject.c	HWPOISON, hugetlb: support hwpoison injection for hugepage	2010-08-11 09:23:11 +02:00
init-mm.c	mm: provide init_mm mm_context initializer	2010-08-09 20:44:54 -07:00
internal.h	Revert "mm: batch activate_page() to reduce lock contention"	2011-01-17 14:42:19 -08:00
Kconfig	mm: compaction: don't depend on HUGETLB_PAGE	2011-01-26 10:50:02 +10:00
Kconfig.debug	trivial: improve help text for mm debug config options	2009-09-21 15:14:57 +02:00
kmemcheck.c	kmemcheck: Fix build errors due to missing slab.h	2010-03-30 22:02:32 +09:00
kmemleak-test.c	kmemleak: remove memset by using kzalloc	2011-01-27 18:31:51 +00:00
kmemleak.c	kmemleak: Allow kmemleak metadata allocations to fail	2011-01-27 18:32:06 +00:00
ksm.c	ksm: drain pagevecs to lru	2011-01-13 17:32:49 -08:00
maccess.c	MN10300: Save frame pointer in thread_info struct rather than global var	2010-10-27 17:29:01 +01:00
madvise.c	thp: khugepaged: make khugepaged aware about madvise	2011-01-13 17:32:47 -08:00
Makefile	thp: transparent hugepage core	2011-01-13 17:32:42 -08:00
memblock.c	memblock: don't adjust size in memblock_find_base()	2011-02-11 16:12:20 -08:00
memcontrol.c	memcg: fix event counting breakage from recent THP update	2011-02-02 16:03:19 -08:00
memory_hotplug.c	Merge branch 'slub/hotplug' into slab/urgent	2011-01-15 13:28:17 +02:00
memory-failure.c	thp: fix unsuitable behavior for hwpoisoned tail page	2011-02-02 16:03:19 -08:00
memory.c	mm: prevent concurrent unmap_mapping_range() on the same inode	2011-02-23 19:52:52 -08:00
mempolicy.c	thp: add numa awareness to hugepage allocations	2011-01-13 17:32:45 -08:00
mempool.c	mm: remove broken 'kzalloc' mempool	2009-09-22 07:17:35 -07:00
migrate.c	mm: grab rcu read lock in move_pages()	2011-02-25 15:07:36 -08:00
mincore.c	thp: mincore transparent hugepage support	2011-01-13 17:32:44 -08:00
mlock.c	mlock: operate on any regions with protection != PROT_NONE	2011-02-02 10:20:50 +11:00
mm_init.c
mmap.c	brk: fix min_brk lower bound computation for COMPAT_BRK	2011-01-13 17:32:48 -08:00
mmu_context.c	exit: fix oops in sync_mm_rss	2010-03-24 16:31:21 -07:00
mmu_notifier.c	thp: mmu_notifier_test_young	2011-01-13 17:32:46 -08:00
mmzone.c	mm: page allocator: adjust the per-cpu counter threshold when memory is low	2011-01-13 17:32:31 -08:00
mprotect.c	thp: mprotect: transparent huge page support	2011-01-13 17:32:44 -08:00
mremap.c	mm: fix possible cause of a page_mapped BUG	2011-02-23 21:55:06 -08:00
msync.c	sanitize vfs_fsync calling conventions	2010-05-21 18:31:21 -04:00
nommu.c	mlock: do not hold mmap_sem for extended periods of time	2011-01-13 17:32:36 -08:00
oom_kill.c	oom: kill all threads sharing oom killed task's mm	2010-10-26 16:52:05 -07:00
page_alloc.c	mm: clear pages_scanned only if draining a pcp adds pages to the buddy allocator	2011-01-26 10:50:01 +10:00
page_cgroup.c	kmemleak: Annotate false positive in init_section_page_cgroup()	2010-07-19 11:54:14 +01:00
page_io.c	block: unify flags for struct bio and struct request	2010-08-07 18:20:39 +02:00
page_isolation.c	mm: page_isolation: codeclean fix comment and rm unneeded val init	2010-10-26 16:52:11 -07:00
page-writeback.c	writeback: avoid unnecessary determine_dirtyable_memory call	2011-01-13 17:32:38 -08:00
pagewalk.c	thp: split_huge_page_mm/vma	2011-01-13 17:32:41 -08:00
percpu-km.c	percpu: clear memory allocated with the km allocator	2010-10-02 10:28:42 +03:00
percpu-vm.c	mm: remove gfp mask from pcpu_get_vm_areas	2011-01-13 17:32:34 -08:00
percpu.c	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	2011-01-13 10:05:56 -08:00
pgtable-generic.c	mm/pgtable-generic.c: fix CONFIG_SWAP=n build	2011-01-26 10:49:58 +10:00
prio_tree.c
quicklist.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
readahead.c	readahead.c: fix comment	2010-05-25 08:07:00 -07:00
rmap.c	memcg: create extensible page stat update routines	2011-01-13 17:32:50 -08:00
shmem.c	fs: icache RCU free inodes	2011-01-07 17:50:26 +11:00
slab.c	mm/slab.c: make local symbols static	2011-01-15 13:28:36 +02:00
slob.c	kernel: kmem_ptr_validate considered harmful	2011-01-07 17:50:16 +11:00
slub.c	Merge branch 'slub/hotplug' into slab/urgent	2011-01-15 13:28:17 +02:00
sparse-vmemmap.c	tree-wide: fix comment/printk typos	2010-11-01 15:38:34 -04:00
sparse.c	thp: remove PG_buddy	2011-01-13 17:32:43 -08:00
swap_state.c	thp: split_huge_page paging	2011-01-13 17:32:41 -08:00
swap.c	Revert "mm: simplify code of swap.c"	2011-01-17 14:42:34 -08:00
swapfile.c	mm: fix refcounting in swapon	2011-02-24 08:55:01 -08:00
thrash.c	mm: pass mm to grab_swap_token	2009-06-23 12:50:05 -07:00
truncate.c	mm: fix truncate_setsize() comment	2011-01-20 17:02:06 -08:00
util.c	kernel: kmem_ptr_validate considered harmful	2011-01-07 17:50:16 +11:00
vmalloc.c	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6	2011-01-13 20:15:35 -08:00
vmscan.c	mm: vmscan: stop reclaim/compaction earlier due to insufficient progress if !__GFP_REPEAT	2011-02-25 15:07:36 -08:00
vmstat.c	thp: transparent hugepage vmstat	2011-01-13 17:32:43 -08:00