linux/mm
Naoya Horiguchi 4f16fc107d mm: hugetlb: fix hugepage memory leak in mincore()
Most callers of pmd_none_or_clear_bad() check whether the target page is
in a hugepage or not, but mincore() and walk_page_range() do not check it.
 So if we use mincore() on a hugepage on x86 machine, the hugepage memory
is leaked as shown below.  This patch fixes it by extending mincore()
system call to support hugepages.

Details
=======
My test program (leak_mincore) works as follows:
 - creat() and mmap() a file on hugetlbfs (file size is 200MB == 100 hugepages,)
 - read()/write() something on it,
 - call mincore() for first ten pages and printf() the values of *vec
 - munmap() and unlink() the file on hugetlbfs

Without my patch
----------------
$ cat /proc/meminfo| grep "HugePage"
HugePages_Total:    1000
HugePages_Free:     1000
HugePages_Rsvd:        0
HugePages_Surp:        0
$ ./leak_mincore
vec[0] 0
vec[1] 0
vec[2] 0
vec[3] 0
vec[4] 0
vec[5] 0
vec[6] 0
vec[7] 0
vec[8] 0
vec[9] 0
$ cat /proc/meminfo |grep "HugePage"
HugePages_Total:    1000
HugePages_Free:      999
HugePages_Rsvd:        0
HugePages_Surp:        0
$ ls /hugetlbfs/
$

Return values in *vec from mincore() are set to 0, while the hugepage
should be in memory, and 1 hugepage is still accounted as used while
there is no file on hugetlbfs.

With my patch
-------------
$ cat /proc/meminfo| grep "HugePage"
HugePages_Total:    1000
HugePages_Free:     1000
HugePages_Rsvd:        0
HugePages_Surp:        0
$ ./leak_mincore
vec[0] 1
vec[1] 1
vec[2] 1
vec[3] 1
vec[4] 1
vec[5] 1
vec[6] 1
vec[7] 1
vec[8] 1
vec[9] 1
$ cat /proc/meminfo |grep "HugePage"
HugePages_Total:    1000
HugePages_Free:     1000
HugePages_Rsvd:        0
HugePages_Surp:        0
$ ls /hugetlbfs/
$

Return value in *vec set to 1 and no memory leaks.

[akpm@linux-foundation.org: cleanup]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:24 -08:00
..
backing-dev.c flusher: Fix PF_FROZEN race 2009-12-03 13:49:43 +01:00
bootmem.c mm/bootmem.c: properly __init-annotate helper functions 2009-12-15 08:53:20 -08:00
bounce.c block: remove some includings of blktrace_api.h 2009-06-16 11:19:36 +02:00
debug-pagealloc.c generic debug pagealloc 2009-04-01 08:59:13 -07:00
dmapool.c dmapools: protect page_list walk in show_pools() 2009-06-30 18:56:00 -07:00
fadvise.c readahead: move max_sane_readahead() calls into force_page_cache_readahead() 2009-06-16 19:47:28 -07:00
failslab.c kmemtrace, mm: fix slab.h dependency problem in mm/failslab.c 2009-04-03 12:23:01 +02:00
filemap_xip.c const: mark struct vm_struct_operations 2009-09-27 11:39:25 -07:00
filemap.c kill wait_on_page_writeback_range 2009-12-10 15:02:50 +01:00
fremap.c Do not account for the address space used by hugetlbfs using VM_ACCOUNT 2009-02-10 10:48:42 -08:00
highmem.c highmem: Fix debug_kmap_atomic() to also handle KM_IRQ_PTE, KM_NMI, and KM_NMI_PTE 2009-11-10 04:15:47 +01:00
hugetlb.c hugetlb: abort a hugepage pool resize if a signal is pending 2009-12-15 08:53:24 -08:00
hwpoison-inject.c HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs 2009-09-16 11:50:17 +02:00
init-mm.c mm: consolidate init_mm definition 2009-06-16 19:47:28 -07:00
internal.h mm: remove unevictable_migrate_page function 2009-12-15 08:53:23 -08:00
Kconfig ksm: remove unswappable max_kernel_pages 2009-12-15 08:53:20 -08:00
Kconfig.debug trivial: improve help text for mm debug config options 2009-09-21 15:14:57 +02:00
kmemcheck.c kmemcheck: add hooks for the page allocator 2009-06-15 15:48:33 +02:00
kmemleak-test.c percpu: clean up percpu variable definitions 2009-06-24 15:13:48 +09:00
kmemleak.c tree-wide: fix typos "aquire" -> "acquire", "cumsumed" -> "consumed" 2009-11-09 09:40:57 +01:00
ksm.c ksm: remove unswappable max_kernel_pages 2009-12-15 08:53:20 -08:00
maccess.c [S390] maccess: add weak attribute to probe_kernel_write 2009-06-12 10:27:37 +02:00
madvise.c Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 2009-09-24 07:53:22 -07:00
Makefile percpu: kill legacy percpu allocator 2009-10-02 13:29:29 +09:00
memcontrol.c ksm: mem cgroup charge swapin copy 2009-12-15 08:53:19 -08:00
memory_hotplug.c mm: fix section mismatch in memory_hotplug.c 2009-12-15 08:53:20 -08:00
memory-failure.c mm: CONFIG_MMU for PG_mlocked 2009-12-15 08:53:17 -08:00
memory.c ksm: let shared pages be swappable 2009-12-15 08:53:19 -08:00
mempolicy.c ksm: memory hotremove migration only 2009-12-15 08:53:20 -08:00
mempool.c mm: remove broken 'kzalloc' mempool 2009-09-22 07:17:35 -07:00
migrate.c mm: remove unevictable_migrate_page function 2009-12-15 08:53:23 -08:00
mincore.c mm: hugetlb: fix hugepage memory leak in mincore() 2009-12-15 08:53:24 -08:00
mlock.c mlock: replace stale comments in munlock_vma_page() 2009-12-15 08:53:23 -08:00
mm_init.c mm: mminit_loglevel cannot be __meminitdata anymore 2008-08-20 15:40:30 -07:00
mmap.c mm: uncached vma support with writenotify 2009-12-15 08:53:21 -08:00
mmu_context.c mm: reduce atomic use on use_mm fast path 2009-09-22 07:17:42 -07:00
mmu_notifier.c ksm: add mmu_notifier set_pte_at_notify() 2009-09-22 07:17:31 -07:00
mmzone.c [ARM] Double check memmap is actually valid with a memmap has unexpected holes V2 2009-05-18 11:22:24 +01:00
mprotect.c perf: Do the big rename: Performance Counters -> Performance Events 2009-09-21 14:28:04 +02:00
mremap.c Take arch_mmap_check() into get_unmapped_area() 2009-12-11 06:44:58 -05:00
msync.c [CVE-2009-0029] System call wrappers part 13 2009-01-14 14:15:23 +01:00
nommu.c NOMMU: Don't pass NULL pointers to fput() in do_mmap_pgoff() 2009-10-31 12:11:37 -07:00
oom_kill.c oom: dump stack and VM state when oom killer panics 2009-12-15 08:53:10 -08:00
page_alloc.c mm: CONFIG_MMU for PG_mlocked 2009-12-15 08:53:17 -08:00
page_cgroup.c memory hotplug: alloc page from other node in memory online 2009-09-22 07:17:26 -07:00
page_io.c swap: rework map_swap_page() again 2009-12-15 08:53:16 -08:00
page_isolation.c memory hotplug: fix page_zone() calculation in test_pages_isolated() 2008-11-06 15:41:19 -08:00
page-writeback.c writeback: remove unused nonblocking and congestion checks 2009-12-03 13:54:25 +01:00
pagewalk.c pagemap: pass mm into pagewalkers 2008-06-12 18:05:41 -07:00
percpu.c Merge branch 'for-linus' into for-next 2009-12-08 10:02:12 +09:00
prio_tree.c
quicklist.c cpumask: use new-style cpumask ops in mm/quicklist. 2009-09-24 09:34:52 +09:30
readahead.c readahead: introduce context readahead algorithm 2009-06-16 19:47:30 -07:00
rmap.c mm: simplify try_to_unmap_one() 2009-12-15 08:53:20 -08:00
shmem_acl.c shmfs: use 'check_acl' instead of 'permission' 2009-09-08 11:08:46 -07:00
shmem.c swap_info: note SWAP_MAP_SHMEM 2009-12-15 08:53:16 -08:00
slab.c Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-12-14 10:13:22 -08:00
slob.c slab: remove duplicate kmem_cache_init_late() declarations 2009-08-06 11:36:25 +03:00
slub.c Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-12-14 10:13:22 -08:00
sparse-vmemmap.c memory hotplug: alloc page from other node in memory online 2009-09-22 07:17:26 -07:00
sparse.c memory hotplug: alloc page from other node in memory online 2009-09-22 07:17:26 -07:00
swap_state.c mm: add_to_swap_cache() does not return -EEXIST 2009-09-22 07:17:35 -07:00
swap.c mm: replace various uses of num_physpages by totalram_pages 2009-09-22 07:17:38 -07:00
swapfile.c ksm: let shared pages be swappable 2009-12-15 08:53:19 -08:00
thrash.c mm: pass mm to grab_swap_token 2009-06-23 12:50:05 -07:00
truncate.c mm: fix comments for invalidate_inode_pages2() 2009-12-04 15:39:48 +01:00
util.c fix a struct file leak in do_mmap_pgoff() 2009-12-11 06:44:57 -05:00
vmalloc.c vmalloc(): adjust gfp mask passed on nested vmalloc() invocation 2009-12-15 08:53:13 -08:00
vmscan.c vmscan: simplify code 2009-12-15 08:53:21 -08:00
vmstat.c vmscan: stop kswapd waiting on congestion when the min watermark is not being met 2009-12-15 08:53:16 -08:00