linux

Shaohua Li 579f82901f swap: add a simple detector for inappropriate swapin readahead

This is a patch to improve the swap readahead algorithm. It's from Hugh and I slightly changed it.

Hugh's original changelog:

Swapin readahead does a blind readahead, whether or not the swapin is sequential. This may be ok on harddisk, because large reads have relatively small costs, and if the readahead pages are unneeded they can be reclaimed easily - though, what if their allocation forced reclaim of useful pages? But on SSD devices large reads are more expensive than small ones: if the readahead pages are unneeded, reading them in caused significant overhead.

This patch adds very simplistic random read detection. Stealing the PageReadahead technique from Konstantin Khlebnikov's patch, avoiding the vma/anon_vma sophistications of Shaohua Li's patch, swapin_nr_pages() simply looks at readahead's current success rate, and narrows or widens its readahead window accordingly. There is little science to its heuristic: it's about as stupid as can be whilst remaining effective.

The table below shows elapsed times (in centiseconds) when running a single repetitive swapping load across a 1000MB mapping in 900MB ram with 1GB swap (the harddisk tests had taken painfully too long when I used mem=500M, but SSD shows similar results for that).

Vanilla is the 3.6-rc7 kernel on which I started; Shaohua denotes his Sep 3 patch in mmotm and linux-next; HughOld denotes my Oct 1 patch which Shaohua showed to be defective; HughNew this Nov 14 patch, with page_cluster as usual at default of 3 (8-page reads); HughPC4 this same patch with page_cluster 4 (16-page reads); HughPC0 with page_cluster 0 (1-page reads: no readahead).

HDD for swapping to harddisk, SSD for swapping to VertexII SSD. Seq for sequential access to the mapping, cycling five times around; Rand for the same number of random touches. Anon for a MAP_PRIVATE anon mapping; Shmem for a MAP_SHARED anon mapping, equivalent to tmpfs.

One weakness of Shaohua's vma/anon_vma approach was that it did not optimize Shmem: seen below. Konstantin's approach was perhaps mistuned, 50% slower on Seq: did not compete and is not shown below.

HDD          Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
Seq Anon       73921   76210   75611   76904   78191  121542
Seq Shmem      73601   73176   73855   72947   74543  118322
Rand Anon     895392  831243  871569  845197  846496  841680
Rand Shmem   1058375 1053486  827935  764955  764376  756489

SSD          Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
Seq Anon       24634   24198   24673   25107   21614   70018
Seq Shmem      24959   24932   25052   25703   22030   69678
Rand Anon      43014   26146   28075   25989   26935   25901
Rand Shmem     45349   45215   28249   24268   24138   24332

These tests are, of course, two extremes of a very simple case: under heavier mixed loads I've not yet observed any consistent improvement or degradation, and wider testing would be welcome.

Shaohua Li:

Tests show Vanilla is slightly better in the sequential workload than Hugh's patch. I observed that with Hugh's patch the readahead size is sometimes shrunk too fast (from 8 to 1 immediately) in the sequential workload if there is no hit. In such a case, continuing to do readahead is actually good.

I have not prepared a sophisticated algorithm for the sequential workload because so far we can't guarantee that sequentially accessed pages are swapped out sequentially. So I slightly changed Hugh's heuristic - don't shrink readahead size too fast.

Here is my test result (in seconds, average of 3 runs):

        Vanilla  Hugh   New
Seq         356   370   360
Random     4525  2447  2444

The attached graph shows the swapin/swapout throughput I collected with 'vmstat 2'. The first part is running a random workload (till around 1200 on the x-axis) and the second part is running a sequential workload. Swapin and swapout throughput are almost identical in steady state in both workloads. This is the expected behavior, whereas in Vanilla swapin is much bigger than swapout, especially in the random workload (because of wrong readahead).

Original patches by: Shaohua Li and Konstantin Khlebnikov.

[fengguang.wu@intel.com: swapin_nr_pages() can be static]
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2014-02-06 13:48:51 -08:00
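The window heuristic described in the changelog can be sketched in user-space C as follows. This is an illustration only, not the code in swap_state.c: `struct ra_state`, `ra_window()`, the "+ 2" rounding rule, the minimum widened window of 4 pages, and the adjacent-offset fallback are all assumptions made for the example. Only three ideas come from the changelog itself: the window is capped at `1 << page_cluster` pages, it widens or narrows with the recent readahead hit count, and it does not shrink too fast (modelled here as at most halving per call).

```c
#include <assert.h>

/* Hypothetical bookkeeping for the sketch; the names are not the kernel's. */
struct ra_state {
    unsigned int hits;          /* readahead pages used since the last call */
    unsigned int last_pages;    /* window size chosen by the last call */
    unsigned long prev_offset;  /* swap offset seen by the last call */
};

/* Return how many pages to read for a swapin fault at 'offset'. */
static unsigned int ra_window(struct ra_state *st, unsigned long offset,
                              unsigned int page_cluster)
{
    unsigned int max_pages, pages;

    assert(page_cluster < 16);          /* keep the shift well defined */
    max_pages = 1u << page_cluster;     /* page_cluster 3 -> 8-page reads */
    if (max_pages <= 1)
        return 1;                       /* page_cluster 0: no readahead */

    if (st->hits == 0) {
        /* Assumed fallback: with no hits to judge by, read ahead only
         * when the offset is adjacent to the previous one. */
        int adjacent = offset == st->prev_offset + 1 ||
                       offset + 1 == st->prev_offset;
        pages = adjacent ? 2 : 1;
    } else {
        /* Widen: smallest power of two >= hits + 2, but at least 4. */
        pages = 4;
        while (pages < st->hits + 2)
            pages <<= 1;
    }
    st->prev_offset = offset;
    st->hits = 0;                       /* start a new measurement period */

    if (pages > max_pages)
        pages = max_pages;

    /* Do not shrink too fast: never drop below half the previous window. */
    if (pages < st->last_pages / 2)
        pages = st->last_pages / 2;
    st->last_pages = pages;

    return pages;
}
```

With `page_cluster` at 3 and a previous window of 8 pages, a run of random faults with no hits steps the window down 4, 2, 1 rather than collapsing straight to 1, which is the behavior Shaohua's change asks for.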
..
backing-dev.c	mm/backing-dev.c: check user buffer length before copying data to the related user buffer	2013-09-11 15:58:03 -07:00
balloon_compaction.c	mm: print more details for bad_page()	2014-01-23 16:36:50 -08:00
bootmem.c	mm/bootmem.c: remove unused local `map'	2013-11-13 12:09:09 +09:00
bounce.c	block: Convert bio_for_each_segment() to bvec_iter	2013-11-23 22:33:49 -08:00
cleancache.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
compaction.c	mm: improve documentation of page_order	2014-01-23 16:36:53 -08:00
debug-pagealloc.c
dmapool.c	dmapool: make DMAPOOL_DEBUG detect corruption of free marker	2012-12-11 17:22:24 -08:00
fadvise.c	teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long	2013-03-03 22:46:22 -05:00
failslab.c
filemap_xip.c	seqcount: Add lockdep functionality to seqcount/seqlock structures	2013-11-06 12:40:26 +01:00
filemap.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2014-01-28 08:38:04 -08:00
fremap.c	mm: fix use-after-free in sys_remap_file_pages	2014-01-02 14:40:30 -08:00
frontswap.c	frontswap: fix incorrect zeroing and allocation size for frontswap_map	2013-06-12 16:29:46 -07:00
highmem.c	Some nice cleanups, and even a patch my wife did as a "live" demo for	2012-12-20 08:37:05 -08:00
huge_memory.c	Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc	2014-01-27 21:03:39 -08:00
hugetlb_cgroup.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
hugetlb.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
hwpoison-inject.c	mm/hwpoison: add '#' to hwpoison_inject	2014-01-21 16:19:48 -08:00
init-mm.c
internal.h	mm/page-writeback.c: do not count anon pages as dirtyable memory	2014-01-29 16:22:39 -08:00
interval_tree.c	mm: add CONFIG_DEBUG_VM_RB build option	2012-10-09 16:22:42 +09:00
Kconfig	zsmalloc: move it under mm	2014-01-30 16:56:55 -08:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c	mm: kmemleak: avoid false negatives on vmalloc'ed objects	2013-11-13 12:09:07 +09:00
ksm.c	mm: audit/fix non-modular users of module_init in core code	2014-01-23 16:36:52 -08:00
list_lru.c	mm: list_lru: fix almost infinite loop causing effective livelock	2013-10-30 12:57:46 -07:00
maccess.c
madvise.c	mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood	2013-09-30 14:31:02 -07:00
Makefile	zsmalloc: move it under mm	2014-01-30 16:56:55 -08:00
memblock.c	memblock: add limit checking to memblock_virt_alloc	2014-01-29 16:22:40 -08:00
memcontrol.c	memcg: fix mutex not unlocked on memcg_create_kmem_cache fail path	2014-01-30 16:56:56 -08:00
memory_hotplug.c	mm/memory_hotplug.c: move register_memory_resource out of the lock_memory_hotplug	2014-01-23 16:36:52 -08:00
memory-failure.c	mm/memory-failure.c: shift page lock from head page to tail page after thp split	2014-01-23 16:36:52 -08:00
memory.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
mempolicy.c	mm/mempolicy.c: fix mempolicy printing in numa_maps	2014-01-30 16:56:56 -08:00
mempool.c	mm/mempool.c: convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...)	2013-09-11 15:58:14 -07:00
migrate.c	mm/migrate.c: fix setting of cpupid on page migration twice against normal page	2014-01-27 21:02:40 -08:00
mincore.c	mm: do_mincore() cleanup	2014-01-23 16:36:52 -08:00
mlock.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
mm_init.c	mm: bring back /sys/kernel/mm	2014-01-27 21:02:39 -08:00
mmap.c	mm: ignore VM_SOFTDIRTY on VMA merging	2014-01-23 16:36:53 -08:00
mmu_context.c	mm: remove old aio use_mm() comment	2013-05-07 18:38:27 -07:00
mmu_notifier.c	mm: audit/fix non-modular users of module_init in core code	2014-01-23 16:36:52 -08:00
mmzone.c	mm: numa: Change page last {nid,pid} into {cpu,pid}	2013-10-09 14:47:45 +02:00
mprotect.c	mm: numa: do not automatically migrate KSM pages	2014-01-21 16:19:48 -08:00
mremap.c	mm: revert mremap pud_free anti-fix	2013-10-16 21:35:53 -07:00
msync.c
nobootmem.c	mm/nobootmem: free_all_bootmem again	2014-01-23 16:36:52 -08:00
nommu.c	mm: add overcommit_kbytes sysctl variable	2014-01-21 16:19:44 -08:00
oom_kill.c	mm, oom: base root bonus on current usage	2014-01-30 16:56:56 -08:00
page_alloc.c	mm: show message when updating min_free_kbytes in thp	2014-01-23 16:36:52 -08:00
page_cgroup.c	Merge branch 'akpm' (incoming from Andrew)	2014-01-21 19:05:45 -08:00
page_io.c	Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block	2014-01-30 11:19:05 -08:00
page_isolation.c	mm: memory-hotplug: enable memory hotplug to handle hugepage	2013-09-11 15:57:48 -07:00
page-writeback.c	mm/page-writeback.c: do not count anon pages as dirtyable memory	2014-01-29 16:22:39 -08:00
pagewalk.c	mm/pagewalk.c: fix walk_page_range() access of wrong PTEs	2013-10-30 14:27:03 -07:00
percpu-km.c
percpu-vm.c	mm: fix kernel-doc warnings	2012-06-20 14:39:36 -07:00
percpu.c	Merge branch 'akpm' (incoming from Andrew)	2014-01-21 19:05:45 -08:00
pgtable-generic.c	mm: fix TLB flush race between migration, and change_protection_range	2013-12-18 19:04:51 -08:00
process_vm_access.c	Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys	2013-03-12 11:05:45 -07:00
quicklist.c
readahead.c	mm/readahead.c: fix do_readahead() for no readpage(s)	2014-01-29 16:22:40 -08:00
rmap.c	mm/rmap: fix coccinelle warnings	2014-01-23 16:36:53 -08:00
shmem.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2014-01-28 08:38:04 -08:00
slab_common.c	slab: fix wrong retval on kmem_cache_create_memcg error path	2014-01-29 16:22:40 -08:00
slab.c	mm: Fix warning on make htmldocs caused by slab.c	2014-01-31 13:52:25 +02:00
slab.h	memcg, slab: RCU protect memcg_params for root caches	2014-01-23 16:36:51 -08:00
slob.c	mm/sl[aou]b: Move kmallocXXX functions to common code	2013-09-04 20:51:33 +03:00
slub.c	Merge branch 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux	2014-02-02 11:30:08 -08:00
sparse-vmemmap.c	mm/sparse: use memblock apis for early memory allocations	2014-01-21 16:19:47 -08:00
sparse.c	mm/sparse: use memblock apis for early memory allocations	2014-01-21 16:19:47 -08:00
swap_state.c	swap: add a simple detector for inappropriate swapin readahead	2014-02-06 13:48:51 -08:00
swap.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
swapfile.c	mm/swapfile.c: do not skip lowest_bit in scan_swap_map() scan loop	2014-01-23 16:36:53 -08:00
truncate.c	truncate: drop 'oldsize' truncate_pagecache() parameter	2013-09-12 15:38:02 -07:00
util.c	mm: add overcommit_kbytes sysctl variable	2014-01-21 16:19:44 -08:00
vmalloc.c	Revert "mm/vmalloc: interchage the implementation of vmalloc_to_{pfn,page}"	2014-01-27 21:02:39 -08:00
vmpressure.c	memcg: make cgroup_event deal with mem_cgroup instead of cgroup_subsys_state	2013-11-22 18:20:43 -05:00
vmscan.c	mm/page-writeback.c: do not count anon pages as dirtyable memory	2014-01-29 16:22:39 -08:00
vmstat.c	mm: numa: return the number of base pages altered by protection changes	2013-11-13 12:09:11 +09:00
zbud.c	mm/zbud: fix some trivial typos in comments	2013-09-11 15:57:35 -07:00
zsmalloc.c	zsmalloc: add copyright	2014-01-30 16:56:55 -08:00
zswap.c	mm/zswap.c: change params from hidden to ro	2014-01-23 16:36:50 -08:00