linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-03 17:41:22 +00:00


Mel Gorman 90afa5de6f vmscan: properly account for the number of page cache pages zone_reclaim() can reclaim

A bug was brought to my attention against a distro kernel but it affects mainline and I believe problems like this have been reported in various guises on the mailing lists although I don't have specific examples at the moment.

The reported problem was that malloc() stalled for a long time (minutes in some cases) if a large tmpfs mount was occupying a large percentage of memory overall. The pages did not get cleaned or reclaimed by zone_reclaim() because the zone_reclaim_mode was unsuitable, but the lists are uselessly scanned frequently, making the CPU spin at near 100%.

This patchset intends to address that bug and bring the behaviour of zone_reclaim() more in line with expectations which were noticed during investigation. It is based on top of mmotm and takes advantage of Kosaki's work with respect to zone_reclaim().

Patch 1 fixes the heuristics that zone_reclaim() uses to determine if the scan should go ahead. The broken heuristic is what was causing the malloc() stall as it uselessly scanned the LRU constantly. Currently, zone_reclaim is assuming zone_reclaim_mode is 1 and historically it could not deal with tmpfs pages at all. This fixes up the heuristic so that an unnecessary scan is more likely to be correctly avoided.

Patch 2 notes that zone_reclaim() returning a failure automatically means the zone is marked full. This is not always true. It could have failed because the GFP mask or zone_reclaim_mode were unsuitable.

Patch 3 introduces a counter zreclaim_failed that will increment each time the zone_reclaim scan-avoidance heuristics fail. If that counter is rapidly increasing, then zone_reclaim_mode should be set to 0 as a temporary resolution and a bug reported because the scan-avoidance heuristic is still broken.

This patch:

On NUMA machines, the administrator can configure zone_reclaim_mode that is a more targeted form of direct reclaim.
On machines with large NUMA distances for example, zone_reclaim_mode defaults to 1, meaning that clean unmapped pages will be reclaimed if the zone watermarks are not being met.

There is a heuristic that determines if the scan is worthwhile but the problem is that the heuristic is not being properly applied and is basically assuming zone_reclaim_mode is 1 if it is enabled. The lack of proper detection can manifest as high CPU usage as the LRU list is scanned uselessly.

Historically, once enabled it was depending on NR_FILE_PAGES which may include swapcache pages that the reclaim_mode cannot deal with. Patch vmscan-change-the-number-of-the-unmapped-files-in-zone-reclaim.patch by Kosaki Motohiro noted that zone_page_state(zone, NR_FILE_PAGES) included pages that were not file-backed such as swapcache and made a calculation based on the inactive, active and mapped files. This is far superior when zone_reclaim_mode==1 but if RECLAIM_SWAP is set, then NR_FILE_PAGES is a reasonable starting figure.

This patch alters how zone_reclaim() works out how many pages it might be able to reclaim given the current reclaim_mode. If RECLAIM_SWAP is set in the reclaim_mode it will either consider NR_FILE_PAGES as potential candidates or else use NR_{IN}ACTIVE}_PAGES-NR_FILE_MAPPED to discount swapcache and other non-file-backed pages. If RECLAIM_WRITE is not set, then NR_FILE_DIRTY number of pages are not candidates. If RECLAIM_SWAP is not set, then NR_FILE_MAPPED are not.

[kosaki.motohiro@jp.fujitsu.com: Estimate unmapped pages minus tmpfs pages]
[fengguang.wu@intel.com: Fix underflow problem in Kosaki's estimate]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>		2009-06-16 19:47:45 -07:00
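
The candidate-page calculation described in the commit message above can be modelled in plain userspace C. This is only a sketch under stated assumptions, not the kernel code: struct zone_counters and both function names are hypothetical stand-ins for the per-zone counters read with zone_page_state(), the RECLAIM_* bit values are assumed, and the file-LRU estimate is taken to be inactive plus active file pages minus mapped pages. The two clamps reflect the underflow fix credited to Wu Fengguang in the changelog.

```c
/* Assumed bit values for zone_reclaim_mode; illustrative only. */
#define RECLAIM_WRITE (1u << 1)  /* reclaim may write out dirty pages */
#define RECLAIM_SWAP  (1u << 2)  /* reclaim may unmap/swap pages */

/* Hypothetical snapshot of the per-zone counters the message refers to. */
struct zone_counters {
    unsigned long nr_file_pages;    /* NR_FILE_PAGES */
    unsigned long nr_inactive_file; /* inactive file LRU pages */
    unsigned long nr_active_file;   /* active file LRU pages */
    unsigned long nr_file_mapped;   /* NR_FILE_MAPPED */
    unsigned long nr_file_dirty;    /* NR_FILE_DIRTY */
};

/* File LRU pages minus mapped pages, clamped at zero so that tmpfs
 * pages counted as mapped cannot push the estimate negative. */
static unsigned long unmapped_file_pages(const struct zone_counters *z)
{
    unsigned long file_lru = z->nr_inactive_file + z->nr_active_file;

    return file_lru > z->nr_file_mapped ? file_lru - z->nr_file_mapped : 0;
}

/* Estimate how many page cache pages could be reclaimed in this mode. */
static unsigned long pagecache_reclaimable(const struct zone_counters *z,
                                           unsigned int mode)
{
    unsigned long reclaimable;
    unsigned long delta = 0;

    /* With RECLAIM_SWAP every file page is a potential candidate;
     * otherwise fall back to the unmapped file LRU estimate. */
    if (mode & RECLAIM_SWAP)
        reclaimable = z->nr_file_pages;
    else
        reclaimable = unmapped_file_pages(z);

    /* Without RECLAIM_WRITE dirty pages cannot be cleaned, so discount them. */
    if (!(mode & RECLAIM_WRITE))
        delta += z->nr_file_dirty;

    /* Guard against underflow when discounting. */
    if (delta > reclaimable)
        delta = reclaimable;

    return reclaimable - delta;
}
```

With counters of 1000 file pages, 300 inactive and 200 active file LRU pages, 100 mapped and 50 dirty, the sketch yields 350 candidates when neither flag is set and 950 when only RECLAIM_SWAP is set.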
..
ABI	Merge branch 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block	2009-06-11 11:10:35 -07:00
accounting	Documentation/accounting/getdelays.c: fix endless loop	2009-01-15 16:39:37 -08:00
acpi	ACPI: update debug parameter documentation	2008-11-07 21:45:29 -05:00
aoe	aoe: user can ask driver to forget previously detected devices	2008-02-08 09:22:31 -08:00
arm	[ARM] S3C24XX: GPIO: Change to macros for GPIO numbering	2009-05-18 16:26:03 +01:00
auxdisplay	.gitignore updates	2008-10-30 11:38:45 -07:00
blackfin	Blackfin arch: Add document about bfin-gpio	2009-01-07 23:14:38 +08:00
block	trivial: Miscellaneous documentation typo fixes	2009-06-12 18:01:47 +02:00
blockdev	mflash: initial support	2009-04-07 08:12:38 +02:00
cdrom	doc/cdrom: Trvial documentation error, file not present	2008-10-10 08:22:44 +02:00
cgroups	memcg: fix documentation	2009-04-13 15:04:33 -07:00
connector	Documentation/connector/cn_test.c: don't use gfp_any()	2009-02-12 16:47:01 -08:00
console
cpu-freq	[CPUFREQ] ondemand/conservative: sanitize sampling_rate restrictions	2009-02-24 22:47:31 -05:00
cpuidle	cpuidle: Add Documentation	2008-02-14 00:16:13 -05:00
cris	fix random typos	2008-10-16 11:21:30 -07:00
crypto	async_tx, dmaengine: document channel allocation and api rework	2009-01-05 18:10:19 -07:00
development-process	docs: Encourage better changelogs in the development process document	2009-06-04 10:32:49 -06:00
device-mapper	dm crypt: add documentation	2008-04-25 13:27:03 +01:00
DocBook	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6	2009-06-15 03:02:23 -07:00
driver-model	trivial: Miscellaneous documentation typo fixes	2009-06-12 18:01:47 +02:00
dvb	V4L/DVB (11138): get_dvb_firmware: add support for downloading the cx2584x firmware for pvrusb2	2009-03-30 12:43:31 -03:00
early-userspace	Documentation: Remove last references to BitKeeper.	2008-04-21 22:19:05 +00:00
fault-injection
fb	trivial: Miscellaneous documentation typo fixes	2009-06-12 18:01:47 +02:00
filesystems	oom: move oom_adj value from task_struct to mm_struct	2009-06-16 19:47:43 -07:00
firmware_class	firmware_sample_driver.c: fix coding style	2008-04-21 22:23:30 +00:00
frv	move frv docs one level up	2008-02-03 15:54:28 +02:00
hwmon	hwmon: Update documentation on fan_max	2009-06-01 13:46:50 +02:00
i2c	i2c-ocores: Can add I2C devices to the bus	2009-06-13 10:39:28 +01:00
i2o
ia64	trivial: Fix misspelling of firmware	2009-03-30 15:21:59 +02:00
ide	ide: preserve Host Protected Area by default (v2)	2009-06-07 13:52:52 +02:00
infiniband	IPoIB: Document newish features	2009-04-08 13:52:01 -07:00
input	Input: multitouch - augment event semantics documentation	2009-05-23 09:53:26 -07:00
ioctl	V4L/DVB (10870a): remove all references for video_decoder.h	2009-03-30 12:43:15 -03:00
isdn	isdn: extend INTERFACE.CAPI document	2009-06-08 00:45:52 -07:00
ja_JP	Sync patch for jp_JP/stable_kernel_rules.txt	2009-01-28 15:55:48 -08:00
kbuild	kconfig: resort the documentation of the environment variables	2009-06-09 22:37:47 +02:00
kdump	trivial: Miscellaneous documentation typo fixes	2009-06-12 18:01:47 +02:00
ko_KR	HOWTO: update misspelling and word incorrected	2007-12-17 10:33:19 -08:00
laptops	trivial: Miscellaneous documentation typo fixes	2009-06-12 18:01:47 +02:00
lguest	lguest: add support for indirect ring entries	2009-06-12 22:27:13 +09:30
m68k	[SCSI] 53c7xx: fix removal fallout	2008-01-11 18:22:30 -06:00
make	Documentation/make/headers_install.txt	2007-10-17 08:43:05 -07:00
mips	ide: remove unused CONFIG_BLK_DEV_IDE_AU1XXX_SEQTS_PER_RQ	2009-01-14 19:19:03 +01:00
misc-devices	drivers/misc/isl29003.c: driver for the ISL29003 ambient light sensor	2009-04-01 08:59:18 -07:00
mn10300	trivial: Miscellaneous documentation typo fixes	2009-06-12 18:01:47 +02:00
mtd	trivial: Miscellaneous documentation typo fixes	2009-06-12 18:01:47 +02:00
namespaces	The namespaces compatibility list doc	2007-11-29 09:24:53 -08:00
netlabel
networking	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6	2009-06-15 03:02:23 -07:00
parisc
PCI	PCI MSI: Add example request loop to MSI-HOWTO.txt	2009-03-20 11:35:04 -07:00
pcmcia	.gitignore updates	2008-10-30 11:38:45 -07:00
power	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	2009-06-14 13:46:25 -07:00
powerpc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6	2009-06-15 09:40:05 -07:00
prctl	generic, x86: add tests for prctl PR_GET_TSC and PR_SET_TSC	2008-04-19 19:19:55 +02:00
RCU	trivial: Miscellaneous documentation typo fixes	2009-06-12 18:01:47 +02:00
s390	trivial: Miscellaneous documentation typo fixes	2009-06-12 18:01:47 +02:00
scheduler	trivial: Miscellaneous documentation typo fixes	2009-06-12 18:01:47 +02:00
scsi	trivial: Miscellaneous documentation typo fixes	2009-06-12 18:01:47 +02:00
serial	Create/use more directory structure in the Documentation/ tree.	2008-11-14 17:28:53 +00:00
sh	sh: Kill off remaining CONFIG_SH_KGDB bits.	2008-12-22 18:44:05 +09:00
sound	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	2009-06-14 13:46:25 -07:00
sparc	sparc: Remove Documentation/sparc/sbus_drivers.txt	2008-08-29 02:15:25 -07:00
spi	spi: documentation: emphasise spi_master.setup() semantics	2009-04-21 13:41:50 -07:00
sysctl	vmscan: properly account for the number of page cache pages zone_reclaim() can reclaim	2009-06-16 19:47:45 -07:00
telephony	remove mention of CONFIG_KMOD from documentation	2008-07-22 19:24:29 +10:00
thermal	thermal: update the documentation	2008-04-29 02:49:47 -04:00
timers	trivial: Miscellaneous documentation typo fixes	2009-06-12 18:01:47 +02:00
trace	trivial: Remove the hyphen from git commands	2009-06-12 18:01:51 +02:00
uml
usb	trivial: usb: fix missing space typo in doc	2009-06-12 18:01:51 +02:00
video4linux	trivial: Miscellaneous documentation typo fixes	2009-06-12 18:01:47 +02:00
vm	pagemap: add page-types tool	2009-06-16 19:47:38 -07:00
w1	w1: send status messages after command processing	2009-01-08 08:31:14 -08:00
watchdog	.gitignore updates	2008-10-30 11:38:45 -07:00
wimax	i2400m: documentation and instructions for usage	2009-01-07 10:00:18 -08:00
x86	Merge branch 'linus' into x86/mce3	2009-06-11 23:31:52 +02:00
zh_CN	Chinese: add translation of Codingstyle	2008-01-24 20:40:04 -08:00
00-INDEX	trivial: fix where cgroup documentation is not correctly referred to	2009-03-30 15:22:02 +02:00
applying-patches.txt
atomic_ops.txt	documentation: atomic_add_unless() doesn't imply mb() on failure	2008-02-23 17:52:36 -08:00
bad_memory.txt	Document handling of bad memory	2008-12-03 16:09:53 -07:00
basic_profiling.txt
binfmt_misc.txt
braille-console.txt	trivial: Miscellaneous documentation typo fixes	2009-06-12 18:01:47 +02:00
bt8xxgpio.txt	gpio: add bt8xxgpio driver	2008-07-25 10:53:30 -07:00
BUG-HUNTING	Documentation: add hint about call traces & module symbols to BUG-HUNTING	2008-02-06 10:41:09 -08:00
c2port.txt	Add c2 port support	2008-11-12 17:17:18 -08:00
cachetlb.txt	remove unused flush_tlb_pgtables	2007-10-19 11:53:34 -07:00
Changes	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next	2009-06-14 14:12:18 -07:00
CodingStyle	trivial: fix typo milisecond/millisecond for documentation and source comments.	2009-06-12 18:01:46 +02:00
cpu-hotplug.txt	x86: use possible_cpus=NUM to extend the possible cpus allowed	2008-12-18 12:08:05 +01:00
cpu-load.txt
cputopology.txt	cpumask: Use topology_core_cpumask()/topology_thread_cpumask()	2009-01-11 19:12:49 +01:00
credentials.txt	CRED: Documentation	2008-11-14 10:39:26 +11:00
dcdbas.txt
debugging-modules.txt	Documentation: Clarify when module debugging actually works.	2008-02-03 15:27:38 +02:00
debugging-via-ohci1394.txt	firewire: fw-ohci: add option for remote debugging	2008-04-18 17:55:33 +02:00
dell_rbu.txt	trivial: Documentation/dell_rbu.txt: fix typos	2009-06-12 18:01:50 +02:00
devices.txt	lanana: assign a device name and numbering for MAX3100	2009-04-07 08:44:05 -07:00
DMA-API.txt	trivial: Miscellaneous documentation typo fixes	2009-06-12 18:01:47 +02:00
DMA-attributes.txt	powerpc/cell: Add DMA_ATTR_WEAK_ORDERING dma attribute and use in Cell IOMMU code	2008-07-22 10:39:36 +10:00
DMA-ISA-LPC.txt
DMA-mapping.txt	dma-mapping: update the old macro DMA_nBIT_MASK related documentations	2009-04-07 08:31:12 -07:00
dmaengine.txt	async_tx, dmaengine: document channel allocation and api rework	2009-01-05 18:10:19 -07:00
dontdiff	dontdiff: Fix asm exclude	2009-03-26 15:45:43 -07:00
dynamic-debug-howto.txt	Dynamic debug: allow simple quoting of words	2009-03-24 16:38:27 -07:00
edac.txt	trivial: Miscellaneous documentation typo fixes	2009-06-12 18:01:47 +02:00
eisa.txt
email-clients.txt	Documentation/email-clients.txt: add some info about gmail	2008-11-06 15:41:19 -08:00
exception.txt
feature-removal-schedule.txt	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6	2009-06-15 03:02:23 -07:00
futex-requeue-pi.txt	futex: add requeue-pi documentation	2009-05-09 07:12:50 +02:00
gpio.txt	trivial: Miscellaneous documentation typo fixes	2009-06-12 18:01:47 +02:00
highuid.txt	[SPARC]: Remove SunOS and Solaris binary support.	2008-04-21 15:10:15 -07:00
HOWTO	Remove Andrew Morton's http://www.zip.com.au/~akpm/	2008-10-16 11:21:32 -07:00
hw_random.txt	hw_random doc updates	2008-03-24 19:22:19 -07:00
ics932s401	ics932s401: new clock generator chip driver	2008-11-12 17:17:18 -08:00
initrd.txt	use the newc archive format as requested by initramfs	2008-02-03 14:54:41 +02:00
Intel-IOMMU.txt	Documentation cleanup: trivial misspelling, punctuation, and grammar corrections.	2008-07-26 12:00:06 -07:00
io_ordering.txt
io-mapping.txt	io mapping: improve documentation	2008-11-03 18:21:44 +01:00
IO-mapping.txt	Documentation: move DMA-mapping.txt to Doc/PCI/	2009-01-29 18:19:29 -08:00
iostats.txt	Documentation cleanup: trivial misspelling, punctuation, and grammar corrections.	2008-07-26 12:00:06 -07:00
IPMI.txt	IPMI: new NMI handling	2007-10-18 14:37:32 -07:00
IRQ-affinity.txt	genirq: Expose default irq affinity mask (take 3)	2008-06-05 15:18:30 +02:00
IRQ.txt
irqflags-tracing.txt
isapnp.txt
java.txt	Documentation/java.txt: typo and grammar fixes	2007-10-20 02:37:21 +02:00
kernel-doc-nano-HOWTO.txt	kernel-doc: restrict syntax for private: and public:	2009-05-02 15:36:10 -07:00
kernel-docs.txt	doc: update to URL and status of kernel-docs.txt entry	2008-06-06 11:29:10 -07:00
kernel-parameters.txt	Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc	2009-06-15 09:32:52 -07:00
keys-request-key.txt	keys: allow the callout data to be passed as a blob rather than a string	2008-04-29 08:06:16 -07:00
keys.txt	Documentation cleanup: trivial misspelling, punctuation, and grammar corrections.	2008-07-26 12:00:06 -07:00
kmemleak.txt	kmemleak: Add documentation on the memory leak detector	2009-06-11 17:03:29 +01:00
kobject.txt	trivial: Miscellaneous documentation typo fixes	2009-06-12 18:01:47 +02:00
kprobes.txt	kprobes: support kretprobe and jprobe per-probe disabling	2009-04-07 08:31:08 -07:00
kref.txt	docs: convert kref semaphore to mutex	2008-02-06 10:41:09 -08:00
ldm.txt
leds-class.txt	Documentation cleanup: trivial misspelling, punctuation, and grammar corrections.	2008-07-26 12:00:06 -07:00
local_ops.txt	trivial: Miscellaneous documentation typo fixes	2009-06-12 18:01:47 +02:00
lockdep-design.txt	locking: Documentation: lockdep-design.txt, fix note of state bits	2009-04-26 18:21:24 +02:00
lockstat.txt	lockstat: contend with points	2008-10-20 15:43:10 +02:00
logo.gif	Revert "linux.conf.au 2009: Tuz"	2009-04-27 12:00:27 -07:00
logo.txt	Revert "linux.conf.au 2009: Tuz"	2009-04-27 12:00:27 -07:00
magic-number.txt	documentation: update header file paths	2009-01-06 15:59:28 -08:00
Makefile	docsrc: build Documentation/ sources	2008-08-12 16:07:30 -07:00
ManagementStyle	docs: fix ManagementStyle book name	2008-10-30 11:38:46 -07:00
markers.txt	markers: comment marker_synchronize_unregister() on data dependency	2008-11-28 16:47:41 +01:00
mca.txt	The ps2esdi driver was marked as BROKEN more than two years ago due to being	2008-03-17 09:03:05 +01:00
md.txt	Documentation/md.txt update	2009-03-31 15:18:37 +11:00
memory-barriers.txt	sched: Document memory barriers implied by sleep/wake-up primitives	2009-04-29 14:15:55 +02:00
memory-hotplug.txt	trivial: Miscellaneous documentation typo fixes	2009-06-12 18:01:47 +02:00
memory.txt
mono.txt
mutex-design.txt	Documentation: Add nested versions of mutex locks to docs	2007-10-20 00:15:26 +02:00
nmi_watchdog.txt	x86, nmi-watchdog: update procfs nmi_watchdog file documentation v2	2008-10-30 19:07:04 +01:00
nommu-mmap.txt	NOMMU: Make mmap allocation page trimming behaviour configurable.	2009-01-08 12:04:47 +00:00
numastat.txt
oops-tracing.txt	Taint kernel after WARN_ON(condition)	2008-04-29 08:05:59 -07:00
parport-lowlevel.txt	plip: fix parport_register_device name parameter	2007-11-26 19:39:01 -08:00
parport.txt
pi-futex.txt
pnp.txt	Documentation: Replace obsolete "driverfs" with "sysfs".	2008-01-24 20:40:04 -08:00
preempt-locking.txt
printk-formats.txt	DOC: add printk-formats.txt	2008-11-12 17:17:17 -08:00
prio_tree.txt
rbtree.txt	trivial: rbtree.txt: fix rb_entry() parameters in sample code	2009-06-12 18:01:47 +02:00
rfkill.txt	rfkill: document /dev/rfkill	2009-06-03 14:06:15 -04:00
robust-futex-ABI.txt
robust-futexes.txt
rt-mutex-design.txt
rt-mutex.txt
rtc.txt	rtc: cleanup example code	2008-02-06 10:41:14 -08:00
SAK.txt	Remove Andrew Morton's old email accounts	2008-10-16 11:21:32 -07:00
SecurityBugs
SELinux.txt	selinux: add support for installing a dummy policy (v2)	2008-08-27 08:54:08 +10:00
serial-console.txt
sgi-ioc4.txt
sgi-visws.txt
slow-work.txt	Document the slow work thread pool	2009-04-03 16:42:35 +01:00
SM501.txt	trivial: Miscellaneous documentation typo fixes	2009-06-12 18:01:47 +02:00
Smack.txt	smack: implement logging V3	2009-04-14 09:00:23 +10:00
sparse.txt	Documentation: explain the difference between __bitwise and __bitwise__	2009-04-11 08:18:11 +02:00
spinlocks.txt	Add additional examples in Documentation/spinlocks.txt	2008-04-11 13:21:14 -06:00
stable_api_nonsense.txt
stable_kernel_rules.txt	Update stable tree documentation	2008-10-29 15:03:49 -07:00
SubmitChecklist	documentation: explain memory barriers	2008-10-16 11:21:32 -07:00
SubmittingDrivers	Remove Andrew Morton's old email accounts	2008-10-16 11:21:32 -07:00
SubmittingPatches	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	2009-06-14 13:46:25 -07:00
svga.txt
sysfs-rules.txt	Doc/sysfs-rules: Swap the order of the words so the sentence makes more sense	2009-05-08 19:22:20 -07:00
sysrq.txt	Merge branch 'tracing/core-v2' into tracing-for-linus	2009-04-02 00:49:02 +02:00
tomoyo.txt	tomoyo: add Documentation/tomoyo.txt	2009-04-14 09:14:58 +10:00
unaligned-memory-access.txt	introduce HAVE_EFFICIENT_UNALIGNED_ACCESS Kconfig symbol	2008-07-25 10:53:27 -07:00
unicode.txt
unshare.txt
VGA-softcursor.txt
video-output.txt
volatile-considered-harmful.txt	Documentation cleanup: trivial misspelling, punctuation, and grammar corrections.	2008-07-26 12:00:06 -07:00
voyager.txt
zorro.txt