linux/mm
Andy Whitcroft cdfd4325c0 mm: record MAP_NORESERVE status on vmas and fix small page mprotect reservations
With Mel's hugetlb private reservation support patches applied, strict
overcommit semantics are applied to both shared and private huge page
mappings.  This can be a problem if an application relied on unlimited
overcommit semantics for private mappings.  An example of this would be an
application which maps a huge area with the intention of using it very
sparsely.  These application would benefit from being able to opt-out of
the strict overcommit.  It should be noted that prior to hugetlb
supporting demand faulting all mappings were fully populated and so
applications of this type should be rare.

This patch stack implements the MAP_NORESERVE mmap() flag for huge page
mappings.  This flag has the same meaning as for small page mappings,
suppressing reservations for that mapping.

Thanks to Mel Gorman for reviewing a number of early versions of these
patches.

This patch:

When a small page mapping is created with mmap() reservations are created
by default for any memory pages required.  When the region is read/write
the reservation is increased for every page, no reservation is needed for
read-only regions (as they implicitly share the zero page).  Reservations
are tracked via the VM_ACCOUNT vma flag which is present when the region
has reservation backing it.  When we convert a region from read-only to
read-write new reservations are aquired and VM_ACCOUNT is set.  However,
when a read-only map is created with MAP_NORESERVE it is indistinguishable
from a normal mapping.  When we then convert that to read/write we are
forced to incorrectly create reservations for it as we have no record of
the original MAP_NORESERVE.

This patch introduces a new vma flag VM_NORESERVE which records the
presence of the original MAP_NORESERVE flag.  This allows us to
distinguish these two circumstances and correctly account the reserve.

As well as fixing this FIXME in the code, this makes it much easier to
introduce MAP_NORESERVE support for huge pages as this flag is available
consistantly for the life of the mapping.  VM_ACCOUNT on the other hand is
heavily used at the generic level in association with small pages.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Johannes Weiner <hannes@saeurebad.de>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:16 -07:00
..
allocpercpu.c Merge commit 'v2.6.26-rc9' into cpus4096 2008-07-06 14:23:39 +02:00
backing-dev.c mm: bdi: fix race in bdi_class device creation 2008-05-20 13:31:53 -07:00
bootmem.c mm: unexport __alloc_bootmem_core() 2008-07-24 10:47:14 -07:00
bounce.c
dmapool.c dmapool: enable debugging for CONFIG_SLUB_DEBUG_ON too 2008-04-28 08:58:20 -07:00
fadvise.c xip: support non-struct page backed memory 2008-04-28 08:58:23 -07:00
filemap_xip.c xip: support non-struct page backed memory 2008-04-28 08:58:23 -07:00
filemap.c kill generic_file_direct_IO() 2008-07-24 10:47:14 -07:00
fremap.c mm: fix various kernel-doc comments 2008-03-19 18:53:35 -07:00
highmem.c highmem: Export totalhigh_pages. 2008-07-19 22:39:46 -07:00
hugetlb.c huge page private reservation review cleanups 2008-07-24 10:47:16 -07:00
internal.h mm: remove double indirection on tlb parameter to free_pgd_range() & Co 2008-07-24 10:47:15 -07:00
Kconfig Merge git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6 2008-07-14 13:37:29 -07:00
maccess.c kgdb: fix optional arch functions and probe_kernel_* 2008-04-17 20:05:39 +02:00
madvise.c xip: support non-struct page backed memory 2008-04-28 08:58:23 -07:00
Makefile mm: add a basic debugging framework for memory initialisation 2008-07-24 10:47:13 -07:00
memcontrol.c memcg: simple stats for memory resource controller 2008-05-01 08:04:02 -07:00
memory_hotplug.c mm: drop unneeded pgdat argument from free_area_init_node() 2008-07-24 10:47:16 -07:00
memory.c hugetlb: guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed 2008-07-24 10:47:16 -07:00
mempolicy.c mempolicy: mask off internal flags for userspace API 2008-07-04 13:03:05 -07:00
mempool.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
migrate.c mm/migrate.c should #include <linux/syscalls.h> 2008-07-24 10:47:14 -07:00
mincore.c mm: remove nopage 2008-04-28 08:58:18 -07:00
mlock.c
mm_init.c mm: print out the zonelists on request for manual verification 2008-07-24 10:47:14 -07:00
mmap.c mm: record MAP_NORESERVE status on vmas and fix small page mprotect reservations 2008-07-24 10:47:16 -07:00
mmzone.c mm: filter based on a nodemask as well as a gfp_mask 2008-04-28 08:58:19 -07:00
mprotect.c mm: record MAP_NORESERVE status on vmas and fix small page mprotect reservations 2008-07-24 10:47:16 -07:00
mremap.c sparse pointer use of zero as null 2007-10-18 14:37:31 -07:00
msync.c
nommu.c nommu: Correct kobjsize() page validity checks. 2008-06-12 07:56:17 -07:00
oom_kill.c oom_kill: remove unused parameter in badness() 2008-04-28 08:58:26 -07:00
page_alloc.c mm: drop unneeded pgdat argument from free_area_init_node() 2008-07-24 10:47:16 -07:00
page_io.c mm: fix PageUptodate data race 2008-02-05 09:44:19 -08:00
page_isolation.c memory hotremove: unset migrate type "ISOLATE" after removal 2007-11-14 18:45:38 -08:00
page-writeback.c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 2008-07-15 08:36:38 -07:00
pagewalk.c pagemap: pass mm into pagewalkers 2008-06-12 18:05:41 -07:00
pdflush.c mm/pdflush.c: merge the same code in two path 2008-05-13 08:02:24 -07:00
prio_tree.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
quicklist.c quicklists: Only consider memory that can be used with GFP_KERNEL 2008-01-14 08:52:22 -08:00
readahead.c mm: bdi: export BDI attributes in sysfs 2008-04-30 08:29:49 -07:00
rmap.c mm: remove nopage 2008-04-28 08:58:18 -07:00
shmem_acl.c
shmem.c mm: bdi: add separate writeback accounting capability 2008-04-30 08:29:50 -07:00
slab.c Merge branch 'generic-ipi' into generic-ipi-for-linus 2008-07-15 21:55:59 +02:00
slob.c slob: record page flag overlays explicitly 2008-07-24 10:47:15 -07:00
slub.c slub: record page flag overlays explicitly 2008-07-24 10:47:15 -07:00
sparse-vmemmap.c Christoph has moved 2008-07-04 10:40:04 -07:00
sparse.c mm: make defensive checks around PFN values registered for memory usage 2008-07-24 10:47:13 -07:00
swap_state.c mm: bdi: add separate writeback accounting capability 2008-04-30 08:29:50 -07:00
swap.c mm: fix atomic_t overflow in vm 2008-05-24 09:56:09 -07:00
swapfile.c mm: use non-racy method for /proc/swaps creation 2008-04-29 08:06:20 -07:00
thrash.c
tiny-shmem.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2008-03-25 08:57:47 -07:00
truncate.c fix invalidate_inode_pages2_range() to not clear ret 2008-04-28 08:58:18 -07:00
util.c fix mm/util.c:krealloc() 2007-11-14 18:45:41 -08:00
vmalloc.c docbook: fix vmalloc missing parameter notation 2008-05-01 08:03:59 -07:00
vmscan.c mm: fix incorrect variable type in do_try_to_free_pages() 2008-06-12 18:05:39 -07:00
vmstat.c mm/vmstat.c: proper externs 2008-07-24 10:47:14 -07:00