linux/mm
Robin Dong d741c9cdee mm: fix nonuniform page status when writing new file with small buffer
When writing a new file with 2048 bytes buffer, such as write(fd, buffer,
2048), it will call generic_perform_write() twice for every page:

	write_begin
	mark_page_accessed(page)
	write_end

	write_begin
	mark_page_accessed(page)
	write_end

Pages 1-13 will be added to lru-pvecs in write_begin() and will *NOT* be
added to active_list even they have be accessed twice because they are not
PageLRU(page).  But when page 14th comes, all pages in lru-pvecs will be
moved to inactive_list (by __lru_cache_add() ) in first write_begin(), now
page 14th *is* PageLRU(page).  And after second write_end() only page 14th
will be in active_list.

In Hadoop environment, we do comes to this situation: after writing a
file, we find out that only 14th, 28th, 42th...  page are in active_list
and others in inactive_list.  Now kswapd works, shrinks the inactive_list,
the file only have 14th, 28th...pages in memory, the readahead request
size will be broken to only 52k (13*4k), system's performance falls
dramatically.

This problem can also replay by below steps (the machine has 8G memory):

	1. dd if=/dev/zero of=/test/file.out bs=1024 count=1048576
	2. cat another 7.5G file to /dev/null
	3. vmtouch -m 1G -v /test/file.out, it will show:

	/test/file.out
	[oooooooooooooooooooOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO] 187847/262144

	the 'o' means same pages are in memory but same are not.

The solution for this problem is simple: the 14th page should be added to
lru_add_pvecs before mark_page_accessed() just as other pages.

[akpm@linux-foundation.org: tweak comment]
[akpm@linux-foundation.org: grab better comment from the v3 patch]
Signed-off-by: Robin Dong <sanbai@taobao.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:19 +09:00
..
backing-dev.c vfs: kill write_super and sync_supers 2012-08-04 01:24:44 +04:00
bootmem.c bootmem: Fix the short description of reserve_bootmem() 2012-08-27 07:57:21 -07:00
bounce.c bounce: allow use of bounce pool via config option 2012-07-18 16:40:35 -04:00
cleancache.c ->encode_fh() API change 2012-05-29 23:28:33 -04:00
compaction.c mm: compaction: Abort async compaction if locks are contended or taking too long 2012-08-21 16:45:03 -07:00
debug-pagealloc.c mm, x86: Remove debug_pagealloc_enabled 2011-12-06 09:24:07 +01:00
dmapool.c mm: fix implicit stat.h usage in dmapool.c 2011-10-31 09:20:12 -04:00
fadvise.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
failslab.c switch debugfs to umode_t 2012-01-03 22:54:56 -05:00
filemap_xip.c mm: kill vma flag VM_CAN_NONLINEAR 2012-10-09 16:22:17 +09:00
filemap.c mm: kill vma flag VM_CAN_NONLINEAR 2012-10-09 16:22:17 +09:00
fremap.c mm: kill vma flag VM_CAN_NONLINEAR 2012-10-09 16:22:17 +09:00
frontswap.c frontswap: support exclusive gets if tmem backend is capable 2012-09-21 10:38:12 -04:00
highmem.c mm: add support for direct_IO to highmem pages 2012-07-31 18:42:47 -07:00
huge_memory.c mm: kill vma flag VM_INSERTPAGE 2012-10-09 16:22:17 +09:00
hugetlb_cgroup.c hugetlb/cgroup: remove exclude and wakeup rmdir calls from migrate 2012-07-31 18:42:41 -07:00
hugetlb.c mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables 2012-07-31 18:42:50 -07:00
hwpoison-inject.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
init-mm.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
internal.h mm: compaction: Abort async compaction if locks are contended or taking too long 2012-08-21 16:45:03 -07:00
Kconfig mm: factor out memory isolate functions 2012-07-31 18:42:45 -07:00
Kconfig.debug mm: more intensive memory corruption debugging 2012-01-10 16:30:42 -08:00
kmemcheck.c
kmemleak-test.c kmemleak: remove memset by using kzalloc 2011-01-27 18:31:51 +00:00
kmemleak.c kmemleak: Replace list_for_each_continue_rcu with new interface 2012-09-23 07:42:52 -07:00
ksm.c mm: kill vma flag VM_RESERVED and mm->reserved_vm counter 2012-10-09 16:22:19 +09:00
maccess.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
madvise.c mm: prepare VM_DONTDUMP for using in drivers 2012-10-09 16:22:18 +09:00
Makefile mm: factor out memory isolate functions 2012-07-31 18:42:45 -07:00
memblock.c mm/memblock: Use NULL instead of 0 for pointers 2012-09-05 08:32:30 +02:00
memcontrol.c cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them 2012-09-14 12:01:16 -07:00
memory_hotplug.c memory hotplug: fix section info double registration bug 2012-09-17 15:00:38 -07:00
memory-failure.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
memory.c mm: kill vma flag VM_RESERVED and mm->reserved_vm counter 2012-10-09 16:22:19 +09:00
mempolicy.c Remove user-triggerable BUG from mpol_to_str 2012-09-06 09:37:58 -07:00
mempool.c mempool: add @gfp_mask to mempool_create_node() 2012-06-25 11:53:47 +02:00
migrate.c mm: memcg: fix compaction/migration failing due to memcg limits 2012-07-31 18:42:48 -07:00
mincore.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
mlock.c mm: kill vma flag VM_RESERVED and mm->reserved_vm counter 2012-10-09 16:22:19 +09:00
mm_init.c mm: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
mmap.c mm: kill vma flag VM_RESERVED and mm->reserved_vm counter 2012-10-09 16:22:19 +09:00
mmu_context.c mm, counters: remove task argument to sync_mm_rss() and __sync_task_rss_stat() 2012-03-21 17:54:59 -07:00
mmu_notifier.c mm: mmu_notifier: fix freed page still mapped in secondary MMU 2012-07-31 18:42:49 -07:00
mmzone.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
mprotect.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-22 09:04:48 -07:00
mremap.c mm: account the total_vm in the vm_stat_account() 2012-07-31 18:42:39 -07:00
msync.c
nobootmem.c memblock: free allocated memblock_reserved_regions later 2012-07-11 16:04:50 -07:00
nommu.c mm: kill vma flag VM_RESERVED and mm->reserved_vm counter 2012-10-09 16:22:19 +09:00
oom_kill.c mm, memcg: move all oom handling to memcontrol.c 2012-07-31 18:42:45 -07:00
page_alloc.c mm: remove __GFP_NO_KSWAPD 2012-10-09 16:22:15 +09:00
page_cgroup.c memcg: rename config variables 2012-07-31 18:42:43 -07:00
page_io.c mm: add support for direct_IO to highmem pages 2012-07-31 18:42:47 -07:00
page_isolation.c memory-hotplug: fix kswapd looping forever problem 2012-07-31 18:42:45 -07:00
page-writeback.c vfs: kill write_super and sync_supers 2012-08-04 01:24:44 +04:00
pagewalk.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu-km.c
percpu-vm.c mm: fix kernel-doc warnings 2012-06-20 14:39:36 -07:00
percpu.c sections: fix section conflicts in mm/percpu.c 2012-10-06 03:04:44 +09:00
pgtable-generic.c arch/tile: allow building Linux with transparent huge pages enabled 2012-05-25 12:48:21 -04:00
prio_tree.c sanitize <linux/prefetch.h> usage 2011-05-20 12:50:29 -07:00
process_vm_access.c aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector() 2012-05-31 17:49:32 -07:00
quicklist.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
readahead.c switch simple cases of fget_light to fdget 2012-09-26 22:20:08 -04:00
rmap.c mm: remove swap token code 2012-05-29 16:22:19 -07:00
shmem.c mm: kill vma flag VM_CAN_NONLINEAR 2012-10-09 16:22:17 +09:00
slab_common.c Revert "mm/sl[aou]b: Move sysfs_slab_add to common" 2012-09-05 12:07:44 +03:00
slab.c Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2012-10-07 07:53:13 +09:00
slab.h Revert "mm/sl[aou]b: Move sysfs_slab_add to common" 2012-09-05 12:07:44 +03:00
slob.c Merge branch 'slab/tracing' into slab/for-linus 2012-10-03 09:57:17 +03:00
slub.c Merge branch 'slab/common-for-cgroups' into slab/for-linus 2012-10-03 09:56:37 +03:00
sparse-vmemmap.c mm: delete various needless include <linux/module.h> 2011-10-31 09:20:11 -04:00
sparse.c mm/sparse: remove index_init_lock 2012-07-31 18:42:49 -07:00
swap_state.c mm: add support for a filesystem to activate swap files and use direct_IO for writing swap pages 2012-07-31 18:42:47 -07:00
swap.c mm: fix nonuniform page status when writing new file with small buffer 2012-10-09 16:22:19 +09:00
swapfile.c mm: swapfile: clean up unuse_pte race handling 2012-07-31 18:42:48 -07:00
truncate.c mm/fs: remove truncate_range 2012-05-29 16:22:23 -07:00
util.c mm: Use __do_krealloc to do the krealloc job 2012-09-04 10:22:58 +03:00
vmalloc.c mm: kill vma flag VM_RESERVED and mm->reserved_vm counter 2012-10-09 16:22:19 +09:00
vmscan.c memory hotplug: reset pgdat->kswapd to NULL if creating kernel thread fails 2012-09-17 15:00:37 -07:00
vmstat.c workqueue: make deferrable delayed_work initializer names consistent 2012-08-21 13:18:23 -07:00