linux/fs
David Gibson b45b5bd65f [PATCH] hugepage: Strict page reservation for hugepage inodes
These days, hugepages are demand-allocated at first fault time.  There's a
somewhat dubious (and racy) heuristic when making a new mmap() to check if
there are enough available hugepages to fully satisfy that mapping.

A particularly obvious case where the heuristic breaks down is where a
process maps its hugepages not as a single chunk, but as a bunch of
individually mmap()ed (or shmat()ed) blocks without touching and
instantiating the pages in between allocations.  In this case the size of
each block is compared against the total number of available hugepages.
It's thus easy for the process to become overcommitted, because each block
mapping will succeed, although the total number of hugepages required by
all blocks exceeds the number available.  In particular, this defeats
programs which detect a mapping failure and adjust their hugepage usage
downward accordingly.

The patch below addresses this problem by strictly reserving a number of
physical hugepages for hugepage inodes which have been mapped, but not
instantiated.  MAP_SHARED mappings are thus "safe" - they will fail on
mmap(), not later with an OOM SIGKILL.  MAP_PRIVATE mappings can still
trigger an OOM.  (Actually, SHARED mappings can technically still OOM, but
only if the sysadmin explicitly reduces the hugepage pool between mapping
and instantiation.)
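
As a rough illustration of the difference described above, the following
is a toy userspace model, not the kernel code from this patch.  All names
(struct toy_pool, toy_map_heuristic(), toy_map_reserved(),
toy_count_mapped()) are hypothetical and exist only for this sketch.  It
assumes a fixed pool and counts how many equally sized blocks would be
allowed to map under a per-mapping check versus a strict reservation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a hugepage pool; NOT the kernel's data structures. */
struct toy_pool {
	long free_pages;     /* hugepages in the pool, none instantiated yet */
	long reserved_pages; /* pages promised to mapped-but-untouched blocks */
};

/*
 * Per-mapping heuristic: each request is compared against the whole
 * free count, so earlier untouched mappings are not accounted for.
 */
static bool toy_map_heuristic(const struct toy_pool *pool, long npages)
{
	assert(npages >= 0);
	return npages <= pool->free_pages;
}

/*
 * Strict reservation: a request succeeds only if the pages not yet
 * promised to another mapping can cover it, and it then records its
 * own claim.
 */
static bool toy_map_reserved(struct toy_pool *pool, long npages)
{
	assert(npages >= 0);
	if (npages > pool->free_pages - pool->reserved_pages)
		return false;
	pool->reserved_pages += npages;
	return true;
}

/* Map nblocks blocks of block_pages each; return how many succeeded. */
static long toy_count_mapped(struct toy_pool *pool, long nblocks,
			     long block_pages, bool strict)
{
	long ok = 0;

	for (long i = 0; i < nblocks; i++) {
		bool mapped = strict
			? toy_map_reserved(pool, block_pages)
			: toy_map_heuristic(pool, block_pages);
		if (mapped)
			ok++;
	}
	return ok;
}
```

With a pool of 10 pages and four blocks of 4 pages each, the heuristic
model accepts all four blocks (16 pages promised against 10 available),
while the reservation model accepts only the first two and refuses the
rest at mapping time.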

This patch appears to address the problem at hand - for instance, it
allows DB2, which previously suffered the failure described above, to
start correctly.

This patch causes no regressions on the libhugetlbfs testsuite, and makes a
test (designed to catch this problem) pass which previously failed (ppc64,
POWER5).

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:03 -08:00
..
9p [PATCH] v9fs: assign dentry ops to negative dentries 2006-03-22 07:53:55 -08:00
adfs [PATCH] changing CONFIG_LOCALVERSION rebuilds too much, for no good reason 2005-11-09 07:55:57 -08:00
affs [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem 2006-01-09 15:59:24 -08:00
afs add loglevel to printk in fs/afs/cmservice.c 2006-01-11 01:52:40 +01:00
autofs [PATCH] capable/capability.h (fs/) 2006-01-11 18:42:13 -08:00
autofs4 [PATCH] autofs4 oops fix 2006-01-14 18:25:19 -08:00
befs remove unused fs/befs/attribute.c 2005-11-08 16:54:53 +01:00
bfs [PATCH] bfs iget() abuses 2005-10-04 13:22:01 -07:00
cifs [CIFS] Always match oplock break (cache notification) to the right tcp 2006-03-05 03:39:55 +00:00
coda [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem 2006-01-09 15:59:24 -08:00
configfs [PATCH] BUG_ON() Conversion in fs/configfs/ 2006-02-03 14:03:09 -08:00
cramfs [PATCH] cramfs mounts provide corrupted content since 2.6.15 2006-03-06 18:40:43 -08:00
debugfs [PATCH] debugfs: Add debugfs_create_blob() helper for exporting binary data 2006-03-20 13:42:59 -08:00
devfs [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem 2006-01-09 15:59:24 -08:00
devpts [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem 2006-01-09 15:59:24 -08:00
efs return statement cleanup - kill pointless parentheses 2006-01-15 02:37:08 +01:00
exportfs [PATCH] exportfs: add find_acceptable_alias helper 2006-01-18 19:20:28 -08:00
ext2 [PATCH] Fix ext2 readdir f_pos re-validation logic 2006-03-15 16:31:51 -08:00
ext3 [PATCH] ext3: fix nobh mode for chattr +j inodes 2006-03-11 09:19:34 -08:00
fat [PATCH] fat: Fix truncate() write ordering 2006-02-03 08:32:10 -08:00
freevxfs [PATCH] fix possible PAGE_CACHE_SHIFT overflows 2006-01-08 20:13:54 -08:00
fuse [PATCH] fuse: fix bug in negative lookup 2006-02-28 20:53:43 -08:00
hfs [PATCH] hfs: cleanup HFS prints 2006-01-18 19:20:23 -08:00
hfsplus [PATCH] hfs: set type/creator for symlinks 2006-01-18 19:20:23 -08:00
hostfs [PATCH] uml: hostfs - fix possible PAGE_CACHE_SHIFT overflows 2005-12-29 09:48:15 -08:00
hpfs [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem 2006-01-09 15:59:24 -08:00
hppfs [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem 2006-01-09 15:59:24 -08:00
hugetlbfs [PATCH] hugepage: Strict page reservation for hugepage inodes 2006-03-22 07:54:03 -08:00
isofs [PATCH] isofs: remove d_splice_alias NULL check from isofs_lookup 2006-01-14 18:27:12 -08:00
jbd [PATCH] jbd: revert checkpoint list changes 2006-02-14 16:09:34 -08:00
jffs [PATCH] fs/jffs/intrep.c: 255 is unsigned char 2006-02-03 08:32:05 -08:00
jffs2 [PATCH] mtd: 64 bit fixes 2006-03-09 19:47:37 -08:00
jfs [PATCH] JFS: Take logsync lock before testing mp->lsn 2006-03-14 14:00:48 -08:00
lockd [PATCH] NLM: Ensure we do not Oops in the case of an unlock 2006-03-14 07:57:18 -08:00
minix [PATCH] update filesystems for new delete_inode behavior 2005-09-09 13:57:27 -07:00
msdos [PATCH] fat: remove the unneeded vfat_find() in vfat_rename() 2005-10-30 17:37:32 -08:00
ncpfs [PATCH] ncpfs: remove kmalloc wrapper 2006-01-14 18:27:12 -08:00
nfs [PATCH] NFSv4: fix mount segfault on errors returned that are < -1000 2006-03-14 07:57:18 -08:00
nfs_common [PATCH] nfsacl: Solaris VxFS compatibility fix 2005-10-11 09:46:54 -07:00
nfsd [PATCH] knfsd: fix nfs4_open lock leak 2006-02-07 16:12:31 -08:00
nls [PATCH] make some things static 2005-05-05 16:36:47 -07:00
ntfs NTFS: Do more detailed reporting of why we cannot mount read-write by 2006-02-24 10:48:14 +00:00
ocfs2 [PATCH] slab: Remove SLAB_NO_REAP option 2006-03-22 07:53:59 -08:00
openpromfs [PATCH] kfree cleanup: fs 2005-11-07 07:54:06 -08:00
partitions [PATCH] s390: dasd partition detection 2006-03-08 14:14:01 -08:00
proc [PATCH] smaps: shared fix 2006-03-06 18:40:45 -08:00
qnx4 fs/qnx4/bitmap.c: #if 0 qnx4_new_block() 2006-01-03 13:21:37 +01:00
ramfs [PATCH] mm: nommu use compound pages 2006-03-22 07:54:01 -08:00
reiserfs [PATCH] reiserfs: fix unaligned bitmap usage 2006-03-02 10:37:59 -08:00
relayfs [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem 2006-01-09 15:59:24 -08:00
romfs [PATCH] fix possible PAGE_CACHE_SHIFT overflows 2006-01-08 20:13:54 -08:00
smbfs [PATCH] smbfs readdir vs signal fix 2006-02-01 08:53:09 -08:00
sysfs [PATCH] sysfs: fix a kobject leak in sysfs_add_link on the error path 2006-03-20 13:42:59 -08:00
sysv correct email address of Manfred Spraul 2006-01-15 02:43:54 +01:00
udf [PATCH] udf: fix uid/gid options and add uid/gid=ignore and forget options 2006-03-08 14:14:00 -08:00
ufs [PATCH] ufs: fix hang during `rm' 2006-02-03 08:32:04 -08:00
vfat [PATCH] fat: remove the unneeded vfat_find() in vfat_rename() 2005-10-30 17:37:32 -08:00
xfs [XFS] Don't map non-uptodate buffers in xfs_probe_cluster; also fixes 2006-02-28 12:30:30 +11:00
aio.c [PATCH] rcu file: use atomic primitives 2006-01-08 20:13:48 -08:00
attr.c [PATCH] capable/capability.h (fs/) 2006-01-11 18:42:13 -08:00
bad_inode.c [PATCH] make some things static 2005-05-05 16:36:47 -07:00
binfmt_aout.c [PATCH] dump_thread() cleanup 2006-01-10 08:01:25 -08:00
binfmt_elf_fdpic.c [PATCH] fs/binfmt_elf: Remove unneeded kmalloc() return value casts 2006-01-10 08:02:01 -08:00
binfmt_elf.c [PATCH] x86_64: Check for bad elf entry address. 2006-02-26 09:53:30 -08:00
binfmt_em86.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
binfmt_flat.c [PATCH] uclinux: delay binfmt_flat trace 2006-01-10 09:31:27 -08:00
binfmt_misc.c [PATCH] Unlinline a bunch of other functions 2006-01-14 18:27:06 -08:00
binfmt_script.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
binfmt_som.c [PATCH] mm: mm_init set_mm_counters 2005-10-29 21:40:38 -07:00
bio.c [BLOCK] A few kerneldoc fixups 2006-01-31 15:24:34 +01:00
block_dev.c [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem 2006-01-09 15:59:24 -08:00
buffer.c [PATCH] page migration: fail if page is in a vma flagged VM_LOCKED 2006-03-14 21:43:02 -08:00
char_dev.c [PATCH] kobj_map semaphore to mutex conversion 2006-03-20 13:42:58 -08:00
compat_ioctl.c [NET] compat ifconf: fix limits 2006-03-08 16:46:08 -08:00
compat.c [PATCH] select: time comparison fixes 2006-02-17 13:59:28 -08:00
dcache.c [PATCH] fix file counting 2006-03-08 14:14:01 -08:00
dcookies.c [PATCH] capable/capability.h (fs/) 2006-01-11 18:42:13 -08:00
direct-io.c Fix a direct I/O locking issue revealed by the new mutex code. 2006-03-15 15:14:45 +11:00
dnotify.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
dquot.c [PATCH] capable/capability.h (fs/) 2006-01-11 18:42:13 -08:00
drop_caches.c [PATCH] drop-pagecache 2006-01-08 20:12:40 -08:00
eventpoll.c [PATCH] epoll: handle timeout overflow 2005-09-28 07:46:41 -07:00
exec.c [PATCH] Add mm->task_size and fix powerpc vdso 2006-02-28 20:53:44 -08:00
fcntl.c [PATCH] fcntl F_SETFL and read-only IS_APPEND files 2006-02-03 08:32:07 -08:00
fifo.c Simplify fifo_open() locking logic 2006-03-07 09:16:35 -08:00
file_table.c [PATCH] fix file counting 2006-03-08 14:14:01 -08:00
file.c [PATCH] percpu data: only iterate over possible CPUs 2006-02-05 11:06:51 -08:00
filesystems.c [PATCH] fix missing includes 2005-10-30 17:37:32 -08:00
fs-writeback.c [PATCH] kernel-docs: fix kernel-doc format problems 2005-11-07 07:53:55 -08:00
inode.c [PATCH] DocBook: fix some kernel-doc comments in fs and block 2006-02-01 08:53:27 -08:00
inotify.c [PATCH] inotify: fix one-shot support 2006-02-07 16:12:33 -08:00
ioctl.c [PATCH] capable/capability.h (fs/) 2006-01-11 18:42:13 -08:00
ioprio.c [PATCH] capable/capability.h (fs/) 2006-01-11 18:42:13 -08:00
Kconfig o Remove confusing Kconfig text for CONFIGFS_FS. 2006-02-03 13:47:17 -08:00
Kconfig.binfmt [PATCH] frv: suppress configuration of certain features for FRV 2006-01-08 20:13:36 -08:00
libfs.c [PATCH] debugfs: hard link count wrong 2006-02-03 08:32:11 -08:00
locks.c [PATCH] tiny: Uninline some fslocks.c functions 2006-01-08 20:14:10 -08:00
Makefile [PATCH] sanitize building of fs/compat_ioctl.c 2006-01-10 08:01:33 -08:00
mbcache.c [PATCH] Unlinline a bunch of other functions 2006-01-14 18:27:06 -08:00
mpage.c [PATCH] fix possible PAGE_CACHE_SHIFT overflows 2006-01-08 20:13:54 -08:00
namei.c [PATCH] ext3: ext3_symlink should use GFP_NOFS allocations inside 2006-03-11 09:19:34 -08:00
namespace.c [PATCH] fs/namespace.c:dup_namespace(): fix a use after free 2006-03-15 09:37:34 -08:00
nfsctl.c [PATCH] nfsservctl(): remove user-triggerable printk 2006-03-17 07:51:25 -08:00
open.c [PATCH] vfs: *at functions: core 2006-01-18 19:20:29 -08:00
pipe.c Mark the pipe file operations static 2006-03-08 14:03:09 -08:00
pnode.c [PATCH] shared mounts: cleanup 2006-01-08 20:13:56 -08:00
pnode.h [PATCH] unbindable mounts 2005-11-07 18:18:11 -08:00
posix_acl.c [PATCH] gfp flags annotations - part 1 2005-10-08 15:00:57 -07:00
quota_v1.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
quota_v2.c [PATCH] quota_v2: printk warning fixes 2006-02-03 08:32:03 -08:00
quota.c [PATCH] capable/capability.h (fs/) 2006-01-11 18:42:13 -08:00
read_write.c [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem 2006-01-09 15:59:24 -08:00
readdir.c [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem 2006-01-09 15:59:24 -08:00
select.c [PATCH] select: time comparison fixes 2006-02-17 13:59:28 -08:00
seq_file.c [PATCH] allow callers of seq_open do allocation themselves 2005-11-07 18:18:09 -08:00
stat.c [PATCH] fstatat64 support 2006-02-11 21:41:10 -08:00
super.c Revert mount/umount uevent removal 2006-02-22 09:39:02 -08:00
xattr_acl.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
xattr.c [PATCH] move xattr permission checks into the VFS 2006-01-10 08:01:29 -08:00