linux/Documentation
Kirill A. Shutemov dc6c9a35b6 mm: account pmd page tables to the process
Dave noticed that unprivileged process can allocate significant amount of
memory -- >500 MiB on x86_64 -- and stay unnoticed by oom-killer and
memory cgroup.  The trick is to allocate a lot of PMD page tables.  Linux
kernel doesn't account PMD tables to the process, only PTE.

The use-cases below use few tricks to allocate a lot of PMD page tables
while keeping VmRSS and VmPTE low.  oom_score for the process will be 0.

	#include <errno.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <sys/mman.h>
	#include <sys/prctl.h>

	#define PUD_SIZE (1UL << 30)
	#define PMD_SIZE (1UL << 21)

	#define NR_PUD 130000

	int main(void)
	{
		char *addr = NULL;
		unsigned long i;

		prctl(PR_SET_THP_DISABLE);
		for (i = 0; i < NR_PUD ; i++) {
			addr = mmap(addr + PUD_SIZE, PUD_SIZE, PROT_WRITE|PROT_READ,
					MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
			if (addr == MAP_FAILED) {
				perror("mmap");
				break;
			}
			*addr = 'x';
			munmap(addr, PMD_SIZE);
			mmap(addr, PMD_SIZE, PROT_WRITE|PROT_READ,
					MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED, -1, 0);
			if (addr == MAP_FAILED)
				perror("re-mmap"), exit(1);
		}
		printf("PID %d consumed %lu KiB in PMD page tables\n",
				getpid(), i * 4096 >> 10);
		return pause();
	}

The patch addresses the issue by account PMD tables to the process the
same way we account PTE.

The main place where PMD tables is accounted is __pmd_alloc() and
free_pmd_range(). But there're few corner cases:

 - HugeTLB can share PMD page tables. The patch handles by accounting
   the table to all processes who share it.

 - x86 PAE pre-allocates few PMD tables on fork.

 - Architectures with FIRST_USER_ADDRESS > 0. We need to adjust sanity
   check on exit(2).

Accounting only happens on configuration where PMD page table's level is
present (PMD is not folded).  As with nr_ptes we use per-mm counter.  The
counter value is used to calculate baseline for badness score by
oom-killer.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: David Rientjes <rientjes@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:04 -08:00
..
ABI Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2015-02-11 09:32:08 -08:00
accounting Documentation: use subdir-y to avoid unnecessary built-in.o files 2014-09-26 11:02:55 +02:00
acpi ACPI / Documentation: add a missing '=' 2015-01-22 01:21:45 +01:00
aoe
arm Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm 2014-12-12 15:26:48 -08:00
arm64 arm64: Emulate CP15 Barrier instructions 2014-11-20 16:34:48 +00:00
auxdisplay Documentation: use subdir-y to avoid unnecessary built-in.o files 2014-09-26 11:02:55 +02:00
backlight
blackfin Documentation: add makefiles for more targets 2014-09-26 11:02:56 +02:00
block Merge branch 'for-3.19/core' of git://git.kernel.dk/linux-block 2014-12-13 14:14:23 -08:00
blockdev zram: report maximum used memory 2014-10-09 22:26:02 -04:00
bus-devices
cdrom
cgroups mm: memcontrol: default hierarchy interface for memory 2015-02-11 17:06:02 -08:00
connector w1: optional bundling of netlink kernel replies 2014-05-27 13:56:21 -07:00
console
cpu-freq intel_pstate: Add num_pstates to sysfs 2015-01-30 01:52:17 +01:00
cpuidle
cris
crypto crypto: doc - userspace interface spec 2014-11-13 22:31:38 +08:00
development-process Documentation: remove outdated references to the linux-next wiki 2014-10-28 09:06:11 -04:00
device-mapper dm cache policy mq: simplify ability to promote sequential IO to the cache 2014-11-10 15:25:30 -05:00
devicetree MMC core: 2015-02-11 10:56:48 -08:00
dmaengine Documentation: dmanegine: move dmatest.txt to dmaengine folder 2014-11-06 11:17:37 +05:30
DocBook media updates for v3.20-rc1 2015-02-11 08:45:40 -08:00
driver-model PCI changes for the v3.18 merge window: 2014-10-09 15:03:49 -04:00
dvb [media] get_dvb_firmware: Update firmware of ITEtech IT9135 2014-09-21 17:03:04 -03:00
early-userspace
EDID
extcon
fault-injection
fb
filesystems Merge branch 'akpm' (patches from Andrew) 2015-02-10 16:45:56 -08:00
firmware_class doc: fix minor typos in firmware_class README 2014-07-17 18:43:40 -07:00
fmc
frv
gpio This is the bulk of GPIO changes for the v3.19 series: 2014-12-14 14:05:05 -08:00
hid HID: uhid: update documentation 2014-08-25 03:28:09 -05:00
hwmon hwmon: (ina2xx) Add ina231 compatible string 2015-01-25 21:23:59 -08:00
i2c Documentation: i2c: Use PM ops instead of legacy suspend/resume 2014-12-04 19:09:03 +01:00
i2o
ia64 kvm: Documentation: remove ia64 2014-11-20 11:08:55 +01:00
ide
infiniband IB/mad: add new ioctl to ABI to support new registration options 2014-08-10 20:36:00 -07:00
input Docs changes for the 3.19 merge window 2014-12-12 14:42:48 -08:00
ioctl cxl: Add documentation for userspace APIs 2014-10-08 20:16:19 +11:00
isdn
ja_JP
kbuild Documentation: kbuild: Improve grammar 2014-08-19 10:02:56 +02:00
kdump kernel: add panic_on_warn 2014-12-10 17:41:10 -08:00
ko_KR
laptops Documentation: update .gitignore files 2014-09-26 11:02:59 +02:00
leds
locking locking/Documentation: Update code path 2015-01-16 09:09:21 +01:00
m68k
memory-devices
metag
mic Documentation: Build mic/mpssd only for x86_64 2014-12-05 11:18:36 -05:00
mips Documentation: au1xxx-ide.c has moved 2014-08-26 09:35:53 +02:00
misc-devices Documentation: use subdir-y to avoid unnecessary built-in.o files 2014-09-26 11:02:55 +02:00
mmc
mn10300
mtd MTD updates for 3.16: 2014-06-11 08:35:34 -07:00
namespaces
netlabel
networking tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks 2015-02-08 01:03:12 -08:00
nfc
nios2 Documentation: Add documentation for Nios2 architecture 2014-12-08 12:56:06 +08:00
parisc
PCI doc: replace "practise" with "practice" in Documentation 2014-06-19 15:28:56 +02:00
pcmcia Documentation: use subdir-y to avoid unnecessary built-in.o files 2014-09-26 11:02:55 +02:00
phy
platform Documentation: Add list of laptop models supported by the Compal driver 2014-06-10 19:11:06 -04:00
power PM / sleep: Mention async suspend in PM_TRACE documentation 2015-01-30 01:29:46 +01:00
powerpc cxl: Add documentation for userspace APIs 2014-10-08 20:16:19 +11:00
pps
prctl Documentation: Restrict TSC test code to x86 2014-10-28 08:46:27 -04:00
pti
ptp ptp: restore the makefile for building the test program. 2014-10-24 16:07:10 -04:00
rapidio rapidio/tsi721_dma: rework scatter-gather list handling 2014-08-08 15:57:24 -07:00
RCU Merge branches 'doc.2015.01.07a', 'fixes.2015.01.15a', 'preempt.2015.01.06a', 'srcu.2015.01.06a', 'stall.2015.01.16a' and 'torture.2015.01.11a' into HEAD 2015-01-15 23:34:34 -08:00
s390 s390/docs: Remove sections that are not related to s390 2014-11-18 18:22:59 +01:00
scheduler Documentation/scheduler/sched-deadline.txt: Add minimal main() appendix 2014-09-16 10:23:45 +02:00
scsi Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2014-12-12 10:08:06 -08:00
security Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity into next 2014-11-19 21:36:07 +11:00
serial serial: Fix locking for uart driver set_termios() method 2014-11-05 18:53:54 -08:00
sh
sound ALSA: hda - Add "eapd" model string for AD1986A codec 2014-12-10 14:00:13 +01:00
spi Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/doc 2014-10-07 21:14:57 -04:00
sysctl mm: account pmd page tables to the process 2015-02-11 17:06:04 -08:00
target Documentation/target: Update fabric_ops to latest code 2015-01-06 13:46:49 -08:00
thermal Documentation: thermal: document of_cpufreq_cooling_register() 2015-01-06 14:39:17 -04:00
timers Documentation: update .gitignore files 2014-09-26 11:02:59 +02:00
tpm
trace Char/Misc driver patches for 3.19-rc1 2014-12-14 16:43:47 -08:00
usb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2015-02-10 18:57:15 -08:00
vDSO vdso: don't require 64-bit math in standalone test 2014-10-25 10:53:44 -04:00
video4linux [media] Documentation/video4linux: remove obsolete text files 2015-01-29 19:16:30 -02:00
virtual Second round of changes for KVM for arm/arm64 for v3.19; fixes reboot 2014-12-15 13:06:40 +01:00
vm mm:add KPF_ZERO_PAGE flag for /proc/kpageflags 2015-02-11 17:06:00 -08:00
w1 w1: new w1_ds2406 driver 2014-06-19 17:45:14 -07:00
watchdog Documentation: use subdir-y to avoid unnecessary built-in.o files 2014-09-26 11:02:55 +02:00
wimax
x86 x86, entry: Switch stacks on a paranoid entry from userspace 2015-01-02 10:22:45 -08:00
xtensa
zh_CN Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2014-08-06 21:03:53 -07:00
00-INDEX locking/Documentation: Move locking related docs into Documentation/locking/ 2014-08-13 10:32:03 +02:00
applying-patches.txt Documentation: change "&" to "and" in Documentation/applying-patches.txt 2014-09-26 11:10:11 +02:00
assoc_array.txt
atomic_ops.txt documentation: Add atomic_long_t to atomic_ops.txt 2014-11-13 10:34:54 -08:00
bad_memory.txt
basic_profiling.txt
bcache.txt
binfmt_misc.txt binfmt_misc: touch up documentation a bit 2014-10-14 02:18:16 +02:00
braille-console.txt
bt8xxgpio.txt
btmrvl.txt
BUG-HUNTING
bus-virt-phys-mapping.txt
cachetlb.txt rmap: drop support of non-linear mappings 2015-02-10 14:30:31 -08:00
Changes Update old iproute2 and Xen Remus links 2014-12-09 13:38:13 -05:00
circular-buffers.txt
clk.txt clk: Change clk_ops->determine_rate to return a clk_hw as the best parent 2014-12-03 16:21:37 -08:00
coccinelle.txt
CodingStyle CodingStyle: add some more error handling guidelines 2014-12-02 08:55:32 -05:00
cpu-hotplug.txt
cpu-load.txt
cputopology.txt
crc32.txt
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt firewire: revert to 4 GB RDMA, fix protocols using Memory Space 2014-05-29 15:50:30 +02:00
dell_rbu.txt
devices.txt
digsig.txt
DMA-API-HOWTO.txt Documentation: correct parameter error for dma_mapping_error 2014-09-26 11:22:29 +02:00
DMA-API.txt
DMA-attributes.txt
dma-buf-sharing.txt Documentation/dma-buf-sharing.txt: update API descriptions 2014-08-28 11:57:24 +05:30
DMA-ISA-LPC.txt
dontdiff
dynamic-debug-howto.txt
edac.txt Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial into next 2014-06-04 08:50:34 -07:00
efi-stub.txt
eisa.txt
email-clients.txt Documentation/email-clients.txt: add info about Claws Mail 2014-12-02 11:55:29 -05:00
flexible-arrays.txt
futex-requeue-pi.txt doc: Fix misnamed FUTEX_CMP_REQUEUE_PI op constants 2015-01-19 12:05:32 +01:00
gcov.txt
highuid.txt
HOWTO Documentation: remove outdated references to the linux-next wiki 2014-10-28 09:06:11 -04:00
hsi.txt
hw_random.txt
hwspinlock.txt
init.txt
initrd.txt
intel_txt.txt
Intel-IOMMU.txt
io_ordering.txt
io-mapping.txt
iostats.txt
IPMI.txt ipmi: Add SMBus interface driver (SSIF) 2014-12-11 15:04:11 -06:00
IRQ-affinity.txt
IRQ-domain.txt irqdomain: Introduce new interfaces to support hierarchy irqdomains 2014-11-23 13:01:45 +01:00
IRQ.txt
irqflags-tracing.txt
isapnp.txt
java.txt
kernel-doc-nano-HOWTO.txt
kernel-docs.txt
kernel-parameters.txt Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2015-02-10 20:01:30 -08:00
kernel-per-CPU-kthreads.txt
kmemcheck.txt
kmemleak.txt Documentation: Add CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF case 2014-10-24 13:59:03 -04:00
kobject.txt kobject: grammar fix 2014-12-08 09:07:11 -05:00
kprobes.txt Documentation/kprobes: add s390 to list of supported architectures 2014-09-09 08:53:27 +02:00
kref.txt
kselftest.txt kselftest: Move the docs to the Documentation dir 2014-11-24 10:49:54 -07:00
ldm.txt
local_ops.txt percpu: update local_ops.txt to reflect this_cpu operations 2014-12-13 12:42:53 -08:00
lockup-watchdogs.txt lockup-watchdogs: Fix a typo 2014-08-26 09:35:52 +02:00
logo.gif
logo.txt
lzo.txt Documentation: lzo: document part of the encoding 2014-09-28 11:08:00 +02:00
magic-number.txt
mailbox.txt Documentation: Fix a typo in mailbox.txt 2014-11-03 11:54:50 -05:00
Makefile Documentation: add makefiles for more targets 2014-09-26 11:02:56 +02:00
ManagementStyle
md.txt
media-framework.txt
memory-barriers.txt documentation: Fix smp typo in memory-barriers.txt 2015-01-07 08:59:32 -08:00
memory-hotplug.txt memory-hotplug: add sysfs valid_zones attribute 2014-10-09 22:25:52 -04:00
module-signing.txt
mono.txt
nommu-mmap.txt
numastat.txt
oops-tracing.txt livepatch: kernel: add TAINT_LIVEPATCH 2014-12-22 15:40:48 +01:00
padata.txt
parport-lowlevel.txt
parport.txt
percpu-rw-semaphore.txt
phy.txt phy: improved lookup method 2014-11-21 19:48:50 +05:30
pi-futex.txt
pinctrl.txt pinctrl: clean up after enable refactoring 2014-09-04 10:05:07 +02:00
pnp.txt
preempt-locking.txt
printk-formats.txt lib/vsprintf: add %*pE[achnops] format specifier 2014-10-14 02:18:26 +02:00
pwm.txt
ramoops.txt pstore-ram: Allow optional mapping with pgprot_noncached 2014-12-11 13:38:31 -08:00
rbtree.txt
remoteproc.txt
rfkill.txt rfkill: document rfkill module parameters 2015-01-09 23:22:12 +01:00
robust-futex-ABI.txt
robust-futexes.txt
rpmsg.txt
rtc.txt
SAK.txt
SecurityBugs
serial-console.txt
sgi-ioc4.txt
SM501.txt
smsc_ece1099.txt
sparse.txt
stable_api_nonsense.txt
stable_kernel_rules.txt stable_kernel_rules: Add pointer to netdev-FAQ for network patches 2014-07-09 15:54:27 -07:00
static-keys.txt
SubmitChecklist
SubmittingDrivers doc: SubmittingPatches: remove dead link, kerneltrap.org no longer works 2014-06-19 15:15:27 +02:00
SubmittingPatches Documentation/SubmittingPatches: Reported-by tags and permission 2014-10-29 08:56:46 -04:00
svga.txt
sysfs-rules.txt Documentation/sysfs-rules.txt: Add device attribute error code documentation 2014-09-19 14:44:51 -07:00
sysrq.txt
this_cpu_ops.txt Docs: this_cpu_ops: remove redundant add forms 2014-09-26 11:03:00 +02:00
unaligned-memory-access.txt
unicode.txt
unshare.txt
vfio.txt drivers/vfio: EEH support for VFIO PCI device 2014-08-05 15:28:48 +10:00
VGA-softcursor.txt
vgaarbiter.txt
video-output.txt
vme_api.txt
volatile-considered-harmful.txt
workqueue.txt
xillybus.txt xillybus: Move out of staging 2014-09-23 23:44:16 -07:00
xz.txt
zorro.txt