linux/mm
KOSAKI Motohiro 9d8cebd4bc mm: fix mbind vma merge problem
Strangely, current mbind() doesn't merge vma with neighbor vma although it's possible.
Unfortunately, many vma can reduce performance...

This patch fixes it.

    reproduced program
    ----------------------------------------------------------------
     #include <numaif.h>
     #include <numa.h>
     #include <sys/mman.h>
     #include <stdio.h>
     #include <unistd.h>
     #include <stdlib.h>
     #include <string.h>

    static unsigned long pagesize;

    int main(int argc, char** argv)
    {
    	void* addr;
    	int ch;
    	int node;
    	struct bitmask *nmask = numa_allocate_nodemask();
    	int err;
    	int node_set = 0;
    	char buf[128];

    	while ((ch = getopt(argc, argv, "n:")) != -1){
    		switch (ch){
    		case 'n':
    			node = strtol(optarg, NULL, 0);
    			numa_bitmask_setbit(nmask, node);
    			node_set = 1;
    			break;
    		default:
    			;
    		}
    	}
    	argc -= optind;
    	argv += optind;

    	if (!node_set)
    		numa_bitmask_setbit(nmask, 0);

    	pagesize = getpagesize();

    	addr = mmap(NULL, pagesize*3, PROT_READ|PROT_WRITE,
    		    MAP_ANON|MAP_PRIVATE, 0, 0);
    	if (addr == MAP_FAILED)
    		perror("mmap "), exit(1);

    	fprintf(stderr, "pid = %d \n" "addr = %p\n", getpid(), addr);

    	/* make page populate */
    	memset(addr, 0, pagesize*3);

    	/* first mbind */
    	err = mbind(addr+pagesize, pagesize, MPOL_BIND, nmask->maskp,
    		    nmask->size, MPOL_MF_MOVE_ALL);
    	if (err)
    		error("mbind1 ");

    	/* second mbind */
    	err = mbind(addr, pagesize*3, MPOL_DEFAULT, NULL, 0, 0);
    	if (err)
    		error("mbind2 ");

    	sprintf(buf, "cat /proc/%d/maps", getpid());
    	system(buf);

    	return 0;
    }
    ----------------------------------------------------------------

result without this patch

	addr = 0x7fe26ef09000
	[snip]
	7fe26ef09000-7fe26ef0a000 rw-p 00000000 00:00 0
	7fe26ef0a000-7fe26ef0b000 rw-p 00000000 00:00 0
	7fe26ef0b000-7fe26ef0c000 rw-p 00000000 00:00 0
	7fe26ef0c000-7fe26ef0d000 rw-p 00000000 00:00 0

	=> 0x7fe26ef09000-0x7fe26ef0c000 have three vmas.

result with this patch

	addr = 0x7fc9ebc76000
	[snip]
	7fc9ebc76000-7fc9ebc7a000 rw-p 00000000 00:00 0
	7fffbe690000-7fffbe6a5000 rw-p 00000000	00:00 0	[stack]

	=> 0x7fc9ebc76000-0x7fc9ebc7a000 have only one vma.

[minchan.kim@gmail.com: fix file offset passed to vma_merge()]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:25 -08:00
..
backing-dev.c flusher: Fix PF_FROZEN race 2009-12-03 13:49:43 +01:00
bootmem.c x86: Make 64 bit use early_res instead of bootmem before slab 2010-02-12 09:41:59 -08:00
bounce.c block: remove some includings of blktrace_api.h 2009-06-16 11:19:36 +02:00
debug-pagealloc.c generic debug pagealloc 2009-04-01 08:59:13 -07:00
dmapool.c dmapools: protect page_list walk in show_pools() 2009-06-30 18:56:00 -07:00
fadvise.c readahead: move max_sane_readahead() calls into force_page_cache_readahead() 2009-06-16 19:47:28 -07:00
failslab.c failslab: add ability to filter slab caches 2010-02-26 19:19:39 +02:00
filemap_xip.c mm: clean up mm_counter 2010-03-06 11:26:23 -08:00
filemap.c mm: use rlimit helpers 2010-03-06 11:26:24 -08:00
fremap.c mm: clean up mm_counter 2010-03-06 11:26:23 -08:00
highmem.c highmem: Fix debug_kmap_atomic() to also handle KM_IRQ_PTE, KM_NMI, and KM_NMI_PTE 2009-11-10 04:15:47 +01:00
hugetlb.c Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm 2010-03-01 09:15:15 -08:00
hwpoison-inject.c HWPOISON: Don't do early filtering if filter is disabled 2009-12-16 12:20:01 +01:00
init-mm.c mm: consolidate init_mm definition 2009-06-16 19:47:28 -07:00
internal.h HWPOISON: add an interface to switch off/on all the page filters 2009-12-16 12:19:59 +01:00
Kconfig Merge branch 'x86-bootmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-03-03 08:15:05 -08:00
Kconfig.debug trivial: improve help text for mm debug config options 2009-09-21 15:14:57 +02:00
kmemcheck.c kmemcheck: add hooks for the page allocator 2009-06-15 15:48:33 +02:00
kmemleak-test.c percpu: clean up percpu variable definitions 2009-06-24 15:13:48 +09:00
kmemleak.c Merge branch 'kmemleak' of git://linux-arm.org/linux-2.6 2009-12-17 16:00:19 -08:00
ksm.c ksm: remove unswappable max_kernel_pages 2009-12-15 08:53:20 -08:00
maccess.c maccess,probe_kernel: Allow arch specific override probe_kernel_(read|write) 2010-01-07 11:58:36 -06:00
madvise.c HWPOISON: Add a madvise() injector for soft page offlining 2009-12-16 12:20:00 +01:00
Makefile make generic_acl slightly more generic 2009-12-16 12:16:49 -05:00
memcontrol.c memcg: ensure list is empty at rmdir 2010-01-16 12:15:39 -08:00
memory_hotplug.c mm: fix section mismatch in memory_hotplug.c 2009-12-15 08:53:20 -08:00
memory-failure.c HWPOISON: Add PROC_FS dependency to hwpoison injector v2 2009-12-21 19:56:42 +01:00
memory.c mm: count swap usage 2010-03-06 11:26:24 -08:00
mempolicy.c mm: fix mbind vma merge problem 2010-03-06 11:26:25 -08:00
mempool.c mm: remove broken 'kzalloc' mempool 2009-09-22 07:17:35 -07:00
migrate.c Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm 2010-03-01 09:15:15 -08:00
mincore.c mm: hugetlb: fix hugepage memory leak in mincore() 2009-12-15 08:53:24 -08:00
mlock.c mm: use rlimit helpers 2010-03-06 11:26:24 -08:00
mm_init.c mm: mminit_loglevel cannot be __meminitdata anymore 2008-08-20 15:40:30 -07:00
mmap.c mm: use rlimit helpers 2010-03-06 11:26:24 -08:00
mmu_context.c mm: export use_mm/unuse_mm to modules 2010-01-15 01:43:28 -08:00
mmu_notifier.c ksm: add mmu_notifier set_pte_at_notify() 2009-09-22 07:17:31 -07:00
mmzone.c [ARM] Double check memmap is actually valid with a memmap has unexpected holes V2 2009-05-18 11:22:24 +01:00
mprotect.c perf: Do the big rename: Performance Counters -> Performance Events 2009-09-21 14:28:04 +02:00
mremap.c mm: use rlimit helpers 2010-03-06 11:26:24 -08:00
msync.c [CVE-2009-0029] System call wrappers part 13 2009-01-14 14:15:23 +01:00
nommu.c nommu: fix shared mmap after truncate shrinkage problems 2010-01-16 12:15:40 -08:00
oom_kill.c mm: clean up mm_counter 2010-03-06 11:26:23 -08:00
page_alloc.c mm: restore zone->all_unreclaimable to independence word 2010-03-06 11:26:25 -08:00
page_cgroup.c memory hotplug: alloc page from other node in memory online 2009-09-22 07:17:26 -07:00
page_io.c swap: rework map_swap_page() again 2009-12-15 08:53:16 -08:00
page_isolation.c memory hotplug: fix page_zone() calculation in test_pages_isolated() 2008-11-06 15:41:19 -08:00
page-writeback.c writeback: remove unused nonblocking and congestion checks 2009-12-03 13:54:25 +01:00
pagewalk.c mm hugetlb: add hugepage support to pagemap 2009-12-15 08:53:24 -08:00
percpu.c early_res: Add free_early_partial() 2010-02-26 08:25:35 +01:00
prio_tree.c
quicklist.c cpumask: use new-style cpumask ops in mm/quicklist. 2009-09-24 09:34:52 +09:30
readahead.c readahead: add blk_run_backing_dev 2009-12-17 15:45:32 -08:00
rmap.c mm: count swap usage 2010-03-06 11:26:24 -08:00
shmem.c Fix breakage in shmem.c 2009-12-16 19:48:48 -05:00
slab.c Merge branches 'slab/cleanups', 'slab/failslab', 'slab/fixes' and 'slub/percpu' into slab-for-linus 2010-03-04 12:07:50 +02:00
slob.c slab: remove duplicate kmem_cache_init_late() declarations 2009-08-06 11:36:25 +03:00
slub.c SLUB: Fix per-cpu merge conflict 2010-03-04 12:09:43 +02:00
sparse-vmemmap.c sparsemem: Put mem map for one node together. 2010-02-12 09:42:38 -08:00
sparse.c sparsemem: Fix compilation on PowerPC 2010-03-01 17:59:24 -08:00
swap_state.c mm: add_to_swap_cache() does not return -EEXIST 2009-09-22 07:17:35 -07:00
swap.c mm: remove free_hot_page() 2010-03-06 11:26:25 -08:00
swapfile.c mm: count swap usage 2010-03-06 11:26:24 -08:00
thrash.c mm: pass mm to grab_swap_token 2009-06-23 12:50:05 -07:00
truncate.c vfs: Fix vmtruncate() regression 2010-01-13 16:09:33 -08:00
util.c nommu: don't need get_unmapped_area() for NOMMU 2010-01-16 12:15:40 -08:00
vmalloc.c mm: purge fragmented percpu vmap blocks 2010-02-02 12:50:47 -08:00
vmscan.c mm: restore zone->all_unreclaimable to independence word 2010-03-06 11:26:25 -08:00
vmstat.c mm: restore zone->all_unreclaimable to independence word 2010-03-06 11:26:25 -08:00