linux

Huang Ying ec560175c0 mm, swap: VMA based swap readahead

The swap readahead is an important mechanism to reduce the swap-in
latency. Although a purely sequential memory access pattern isn't very
common for anonymous memory, space locality is still considered valid.

In the original swap readahead implementation, the consecutive blocks in
the swap device are read ahead based on a global space locality
estimation. But the consecutive blocks in the swap device just reflect
the order of page reclaiming and do not necessarily reflect the access
pattern in virtual memory. In addition, different tasks in the system may
have different access patterns, which makes the global space locality
estimation incorrect.

In this patch, when a page fault occurs, the virtual pages near the fault
address are read ahead instead of the swap slots near the faulting swap
slot in the swap device. This avoids reading ahead unrelated swap slots.
At the same time, the swap readahead is changed to work per VMA instead
of globally, so that the different access patterns of different VMAs can
be distinguished and a different readahead policy can be applied to each
accordingly. The original core readahead detection and scaling algorithm
is reused, because it is an effective algorithm for detecting space
locality.

The tests and results are as follows.

Common test condition
=====================

Test Machine: Xeon E5 v3 (2 sockets, 72 threads, 32G RAM)
Swap device: NVMe disk

Micro-benchmark with combined access pattern
============================================

vm-scalability, sequential swap test case, 4 processes to eat 50G of
virtual memory space, repeating the sequential memory writing until 300
seconds. The first round of writing triggers swap-out; the following
rounds trigger sequential swap-in and swap-out.

At the same time, run the vm-scalability random swap test case in the
background, 8 processes to eat 30G of virtual memory space, repeating the
random memory writes until 300 seconds. This triggers random swap-in in
the background.

This is a combined workload with sequential and random memory accesses at
the same time. The result (for the sequential workload) is as follows:

                      Base            Optimized
                      ----            ---------
throughput            345413 KB/s     414029 KB/s (+19.9%)
latency.average       97.14 us        61.06 us (-37.1%)
latency.50th          2 us            1 us
latency.60th          2 us            1 us
latency.70th          98 us           2 us
latency.80th          160 us          2 us
latency.90th          260 us          217 us
latency.95th          346 us          369 us
latency.99th          1.34 ms         1.09 ms
ra_hit%               52.69%          99.98%

The original swap readahead algorithm is confused by the background
random access workload, so its readahead hit rate is lower. The VMA-based
readahead algorithm works much better.

Linpack
=======

The test memory size is bigger than RAM to trigger swapping.

                      Base            Optimized
                      ----            ---------
elapsed_time          393.49 s        329.88 s (-16.2%)
ra_hit%               86.21%          98.82%

The scores of the base and optimized kernels show no visible change. But
the elapsed time is reduced and the readahead hit rate is improved, so
the optimized kernel runs better in the startup and teardown stages. The
absolute value of the readahead hit rate is also high, which shows that
space locality is still valid in some practical workloads.

Link: http://lkml.kernel.org/r/20170807054038.1843-4-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2017-09-06 17:27:29 -07:00
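
The idea described in the commit message, reading ahead the virtual pages near the fault address within one VMA, can be illustrated with a small user-space sketch. This is not the kernel's code: the names `vma_ra_window` and `struct ra_window`, the fixed 4 KiB page size, and the choice of a window centered on the faulting page are all assumptions made for illustration. The window size itself is taken as an input, since the commit reuses the existing detection and scaling algorithm to choose it.

```c
#include <assert.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Half-open range [start, end) of virtual page numbers to read ahead. */
struct ra_window {
    unsigned long start;
    unsigned long end;
};

/*
 * Hypothetical helper: choose up to win_pages virtual pages around the
 * faulting page and clamp the range to the VMA [vma_start, vma_end).
 * Only pages inside the faulting VMA are ever selected, so swap slots
 * that belong to unrelated mappings are never read ahead.
 */
struct ra_window vma_ra_window(unsigned long vma_start, unsigned long vma_end,
                               unsigned long fault_addr,
                               unsigned long win_pages)
{
    /* The fault must lie inside the VMA and the window must be non-empty. */
    assert(win_pages >= 1);
    assert(vma_start <= fault_addr && fault_addr < vma_end);

    unsigned long fpfn = fault_addr >> PAGE_SHIFT; /* faulting page */
    unsigned long lo = vma_start >> PAGE_SHIFT;    /* first page of the VMA */
    unsigned long hi = vma_end >> PAGE_SHIFT;      /* one past the last page */
    unsigned long left = (win_pages - 1) / 2;      /* pages before the fault */
    unsigned long right = win_pages - left;        /* fault page and after */

    struct ra_window w;
    w.start = (fpfn - lo > left) ? fpfn - left : lo;
    w.end = (hi - fpfn > right) ? fpfn + right : hi;
    return w;
}
```

For a VMA covering 0x100000 to 0x200000, a fault at 0x150000 with an 8-page window yields pages 0x14d up to (but not including) 0x155. A fault one page after the start of the VMA yields a window that begins at the first page of the VMA rather than extending below it.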
..
kasan	Merge branch 'linus' into locking/core, to pick up fixes	2017-08-10 12:20:53 +02:00
backing-dev.c	bdi: Drop 'parent' argument from bdi_register[_va]()	2017-04-20 12:09:55 -06:00
balloon_compaction.c	mm/balloon_compaction.c: don't zero ballooned pages	2017-08-10 15:54:07 -07:00
bootmem.c	mm/bootmem.c: cosmetic improvement of code readability	2017-02-22 16:41:29 -08:00
cleancache.c	fs: switch ->s_uuid to uuid_t	2017-06-05 16:59:12 +02:00
cma_debug.c	mm/cma_debug.c: fix stack corruption due to sprintf usage	2017-08-18 15:32:02 -07:00
cma.c	cma: fix calculation of aligned offset	2017-07-10 16:32:32 -07:00
cma.h	cma: Store a name in the cma structure	2017-04-18 20:41:12 +02:00
compaction.c	mm, compaction: skip over holes in __reset_isolation_suitable	2017-07-06 16:24:32 -07:00
debug_page_ref.c
debug.c	mm: make tlb_flush_pending global	2017-08-10 15:54:07 -07:00
dmapool.c	lib/vsprintf.c: remove %Z support	2017-02-27 18:43:47 -08:00
early_ioremap.c	x86/mm: Add support to access boot related data in the clear	2017-07-18 11:38:02 +02:00
fadvise.c	mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED	2016-12-20 09:48:46 -08:00
failslab.c
filemap.c	mm: use find_get_pages_range() in filemap_range_has_page()	2017-09-06 17:27:27 -07:00
frame_vector.c	treewide: use kv[mz]alloc* rather than opencoded variants	2017-05-08 17:15:13 -07:00
frontswap.c
gup.c	mm/gup: make __gup_device_* require THP	2017-09-06 17:27:26 -07:00
highmem.c
huge_memory.c	mm, THP, swap: support splitting THP for THP swap out	2017-09-06 17:27:28 -07:00
hugetlb_cgroup.c
hugetlb.c	mm, hugetlb: do not allocate non-migrateable gigantic pages from movable zones	2017-09-06 17:27:29 -07:00
hwpoison-inject.c	mm: hwpoison: call shake_page() unconditionally	2017-05-03 15:52:12 -07:00
init-mm.c	mm: Add a user_ns owner to mm_struct and fix ptrace permission checks	2016-11-22 11:49:48 -06:00
internal.h	mm, memory_hotplug: drop zone from build_all_zonelists	2017-09-06 17:27:25 -07:00
interval_tree.c
Kconfig	mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups	2017-09-06 17:27:29 -07:00
Kconfig.debug	mm: enable page poisoning early at boot	2017-05-03 15:52:10 -07:00
khugepaged.c	mm: make PR_SET_THP_DISABLE immediately active	2017-07-10 16:32:31 -07:00
kmemcheck.c	mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU	2017-04-18 11:42:36 -07:00
kmemleak-test.c
kmemleak.c	mm: kmemleak: treat vm_struct as alternative reference to vmalloc'ed objects	2017-07-06 16:24:34 -07:00
ksm.c	mm/ksm.c: constify attribute_group structures	2017-09-06 17:27:27 -07:00
list_lru.c	mm/list_lru.c: fix list_lru_count_node() to be race free	2017-07-10 16:32:33 -07:00
maccess.c
madvise.c	mm, madvise: ensure poisoned pages are removed from per-cpu lists	2017-08-31 16:33:15 -07:00
Makefile	percpu: expose statistics about percpu memory via debugfs	2017-06-20 15:31:38 -04:00
memblock.c	mm/memblock.c: reversed logic in memblock_discard()	2017-08-25 16:12:46 -07:00
memcontrol.c	memcg, THP, swap: make mem_cgroup_swapout() support THP	2017-09-06 17:27:28 -07:00
memory_hotplug.c	mm, memory_hotplug: get rid of zonelists_mutex	2017-09-06 17:27:26 -07:00
memory-failure.c	x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages	2017-08-17 10:30:49 +02:00
memory.c	mm, swap: VMA based swap readahead	2017-09-06 17:27:29 -07:00
mempolicy.c	mm/mempolicy: fix use after free when calling get_mempolicy	2017-08-18 15:32:02 -07:00
mempool.c	sched/wait: Rename wait_queue_t => wait_queue_entry_t	2017-06-20 12:18:27 +02:00
memtest.c
migrate.c	Sanitize 'move_pages()' permission checks	2017-08-20 13:26:27 -07:00
mincore.c	mm: remove shmem_mapping() shmem_zero_setup() duplicates	2017-02-24 17:46:56 -08:00
mlock.c	mlock: fix mlock count can not decrease in race condition	2017-06-02 15:07:38 -07:00
mm_init.c
mmap.c	userfaultfd: call userfaultfd_unmap_prep only if __split_vma succeeds	2017-09-06 17:27:29 -07:00
mmu_context.c	sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h>	2017-03-02 08:42:38 +01:00
mmu_notifier.c	mm/mmu_notifier: kill invalidate_page	2017-08-31 16:13:00 -07:00
mmzone.c	mm/mmzone.c: swap likely to unlikely as code logic is different for next_zones_zonelist()	2017-02-22 16:41:29 -08:00
mprotect.c	mm: migrate: prevent racy access to tlb_flush_pending	2017-08-10 15:54:07 -07:00
mremap.c	mm/mremap: fail map duplication attempts for private mappings	2017-09-06 17:27:26 -07:00
msync.c
nobootmem.c	mm: discard memblock data later	2017-08-18 15:32:01 -07:00
nommu.c	mm: rename global_page_state to global_zone_page_state	2017-09-06 17:27:29 -07:00
oom_kill.c	mm/oom_kill.c: add tracepoints for oom reaper-related events	2017-07-10 16:32:32 -07:00
page_alloc.c	mm: rename global_page_state to global_zone_page_state	2017-09-06 17:27:29 -07:00
page_counter.c
page_ext.c	mm, page_ext: periodically reschedule during page_ext_init()	2017-09-06 17:27:26 -07:00
page_idle.c	mm/page_idle.c: constify attribute_group structures	2017-09-06 17:27:27 -07:00
page_io.c	mm: test code to write THP to swap device as a whole	2017-09-06 17:27:28 -07:00
page_isolation.c	mm: unify new_node_page and alloc_migrate_target	2017-07-10 16:32:31 -07:00
page_owner.c	mm, page_owner: don't grab zone->lock for init_pages_in_zone()	2017-09-06 17:27:26 -07:00
page_poison.c	mm: enable page poisoning early at boot	2017-05-03 15:52:10 -07:00
page_vma_mapped.c	mm/hugetlb: add size parameter to huge_pte_offset()	2017-07-06 16:24:34 -07:00
page-writeback.c	mm: rename global_page_state to global_zone_page_state	2017-09-06 17:27:29 -07:00
pagewalk.c	mm/hugetlb: add size parameter to huge_pte_offset()	2017-07-06 16:24:34 -07:00
percpu-internal.h	percpu: fix early calls for spinlock in pcpu_stats	2017-06-21 13:53:52 -04:00
percpu-km.c	percpu: fix static checker warnings in pcpu_destroy_chunk	2017-06-29 11:23:38 -04:00
percpu-stats.c	percpu: expose statistics about percpu memory via debugfs	2017-06-20 15:31:38 -04:00
percpu-vm.c	percpu: fix static checker warnings in pcpu_destroy_chunk	2017-06-29 11:23:38 -04:00
percpu.c	percpu: resolve err may not be initialized in pcpu_alloc	2017-06-21 12:00:45 -04:00
pgtable-generic.c	mm: convert generic code to 5-level paging	2017-03-09 11:48:47 -08:00
process_vm_access.c	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h>	2017-03-02 08:42:28 +01:00
quicklist.c
readahead.c	mm: don't cap request size based on read-ahead setting	2016-12-12 18:55:08 -08:00
rmap.c	mm/rmap: update to new mmu_notifier semantic v2	2017-08-31 16:12:59 -07:00
rodata_test.c	mm: remove rodata_test_data export, add pr_fmt	2017-05-03 15:52:09 -07:00
shmem.c	mm, swap: VMA based swap readahead	2017-09-06 17:27:29 -07:00
slab_common.c	mm: allow slab_nomerge to be set at build time	2017-07-06 16:24:31 -07:00
slab.c	mm: memcontrol: account slab stats per lruvec	2017-07-06 16:24:35 -07:00
slab.h	locking/lockdep: Rework FS_RECLAIM annotation	2017-08-10 12:29:03 +02:00
slob.c	locking/lockdep: Rework FS_RECLAIM annotation	2017-08-10 12:29:03 +02:00
slub.c	mm/slub.c: constify attribute_group structures	2017-09-06 17:27:27 -07:00
sparse-vmemmap.c	mm, sparse, page_ext: drop ugly N_HIGH_MEMORY branches for allocations	2017-09-06 17:27:26 -07:00
sparse.c	mm, sparse, page_ext: drop ugly N_HIGH_MEMORY branches for allocations	2017-09-06 17:27:26 -07:00
swap_cgroup.c	mm, THP, swap: delay splitting THP during swap out	2017-07-06 16:24:31 -07:00
swap_slots.c	mm/swap_slots.c: don't disable preemption while taking the per-CPU cache	2017-07-10 16:32:32 -07:00
swap_state.c	mm, swap: VMA based swap readahead	2017-09-06 17:27:29 -07:00
swap.c	mm: remove nr_pages argument from pagevec_lookup{,_range}()	2017-09-06 17:27:27 -07:00
swapfile.c	mm, THP, swap: support splitting THP for THP swap out	2017-09-06 17:27:28 -07:00
truncate.c	mm/truncate.c: fix THP handling in invalidate_mapping_pages()	2017-07-10 16:32:32 -07:00
usercopy.c	mm/usercopy: Drop extra is_vmalloc_or_module() check	2017-04-05 12:30:18 -07:00
userfaultfd.c	userfaultfd: shmem: wire up shmem_mfill_zeropage_pte	2017-09-06 17:27:28 -07:00
util.c	mm: rename global_page_state to global_zone_page_state	2017-09-06 17:27:29 -07:00
vmacache.c	sched/headers: Prepare to move 'init_task' and 'init_thread_union' from <linux/sched.h> to <linux/sched/task.h>	2017-03-02 08:42:38 +01:00
vmalloc.c	mm/vmalloc.c: don't reinvent the wheel but use existing llist API	2017-09-06 17:27:29 -07:00
vmpressure.c	mm, vmpressure: pass-through notification support	2017-07-10 16:32:31 -07:00
vmscan.c	mm, THP, swap: add THP swapping out fallback counting	2017-09-06 17:27:28 -07:00
vmstat.c	mm, swap: add swap readahead hit statistics	2017-09-06 17:27:29 -07:00
workingset.c	mm: memcontrol: per-lruvec stats infrastructure	2017-07-06 16:24:35 -07:00
z3fold.c	z3fold: fix page locking in z3fold_alloc()	2017-04-13 18:24:20 -07:00
zbud.c
zpool.c
zsmalloc.c	zsmalloc: zs_page_migrate: skip unnecessary loops but not return -EBUSY if zspage is not inuse	2017-09-06 17:27:26 -07:00
zswap.c	mm/zswap.c: delete an error message for a failed memory allocation in zswap_dstmem_prepare()	2017-07-06 16:24:35 -07:00