linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-21 18:42:44 +00:00

Aaron Lu 4efaceb1c5 mm, swap: use rbtree for swap_extent

swap_extent is used to map swap page offset to backing device's block offset. For a continuous block range, one swap_extent is used and all these swap_extents are managed in a linked list.

These swap_extents are used by map_swap_entry() during swap's read and write path. To find out the backing device's block offset for a page offset, the swap_extent list will be traversed linearly, with curr_swap_extent being used as a cache to speed up the search.

This works well as long as swap_extents are not huge or when the number of processes that access swap device are few, but when the swap device has many extents and there are a number of processes accessing the swap device concurrently, it can be a problem. On one of our servers, the disk's remaining size is tight:

  $ df -h
  Filesystem      Size  Used Avail Use% Mounted on
  ... ...
  /dev/nvme0n1p1  1.8T  1.3T  504G  72% /home/t4

When creating a 80G swapfile there, there are as many as 84656 swap extents. The end result is, kernel spends about 30% time in map_swap_entry() and swap throughput is only 70MB/s.

As a comparison, when I used smaller sized swapfile, like 4G whose swap_extent dropped to 2000, swap throughput is back to 400-500MB/s and map_swap_entry() is about 3%.

One downside of using rbtree for swap_extent is, 'struct rbtree' takes 24 bytes while 'struct list_head' takes 16 bytes, that's 8 bytes more for each swap_extent. For a swapfile that has 80k swap_extents, that means 625KiB more memory consumed.

Test:

Since it's not possible to reboot that server, I can not test this patch directly there. Instead, I tested it on another server with NVMe disk.

I created a 20G swapfile on an NVMe backed XFS fs. By default, the filesystem is quite clean and the created swapfile has only 2 extents. Testing vanilla and this patch shows no obvious performance difference when swapfile is not fragmented.

To see the patch's effects, I used some tweaks to manually fragment the swapfile by breaking the extent at 1M boundary. This made the swapfile have 20K extents.

  nr_task=4
  kernel   swapout(KB/s)  map_swap_entry(perf)  swapin(KB/s)   map_swap_entry(perf)
  vanilla  165191         90.77%                171798         90.21%
  patched  858993 +420%    2.16%                715827 +317%    0.77%

  nr_task=8
  kernel   swapout(KB/s)  map_swap_entry(perf)  swapin(KB/s)   map_swap_entry(perf)
  vanilla  306783         92.19%                318145         87.76%
  patched  954437 +211%    2.35%                1073741 +237%   1.57%

  swapout: the throughput of swap out, in KB/s, higher is better
  1st map_swap_entry: cpu cycles percent sampled by perf
  swapin: the throughput of swap in, in KB/s, higher is better.
  2nd map_swap_entry: cpu cycles percent sampled by perf

nr_task=1 doesn't show any difference, this is due to the curr_swap_extent can be effectively used to cache the correct swap extent for single task workload.

[akpm@linux-foundation.org: s/BUG_ON(1)/BUG()/]
Link: http://lkml.kernel.org/r/20190523142404.GA181@aaronlu
Signed-off-by: Aaron Lu <ziqian.lzq@antfin.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2019-07-12 11:05:43 -07:00
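The lookup the commit message describes (swap page offset to backing-device block offset) can be illustrated with a small userspace sketch. This is not the kernel code: struct extent_sketch, demo_extents and the find_* helpers are hypothetical names, and the ordered lookup uses binary search over a sorted array as a stand-in for the rbtree, since both give O(log n) search where the list walk is O(n).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct swap_extent. */
struct extent_sketch {
	uint64_t start_page;   /* first swap page offset covered by the extent */
	uint64_t nr_pages;     /* number of contiguous pages in the extent */
	uint64_t start_block;  /* backing-device block that holds start_page */
};

/* Made-up sample table, sorted by start_page, with no overlaps. */
static const struct extent_sketch demo_extents[] = {
	{ 0,  4, 1000 },
	{ 4,  2, 5000 },
	{ 6, 10,  200 },
};
#define DEMO_NR (sizeof(demo_extents) / sizeof(demo_extents[0]))

/* List-style lookup: visit extents one by one, O(n) per lookup. */
static const struct extent_sketch *
find_linear(const struct extent_sketch *ext, size_t n, uint64_t page)
{
	for (size_t i = 0; i < n; i++) {
		if (page >= ext[i].start_page &&
		    page < ext[i].start_page + ext[i].nr_pages)
			return &ext[i];
	}
	return NULL;
}

/*
 * Ordered lookup: halve the search range on every step, O(log n).
 * Assumption: a sorted array replaces the rbtree used by the patch.
 */
static const struct extent_sketch *
find_ordered(const struct extent_sketch *ext, size_t n, uint64_t page)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (page < ext[mid].start_page)
			hi = mid;               /* target lies to the left */
		else if (page >= ext[mid].start_page + ext[mid].nr_pages)
			lo = mid + 1;           /* target lies to the right */
		else
			return &ext[mid];       /* page falls inside this extent */
	}
	return NULL;
}

/* Translate a page offset to a block offset within a matching extent. */
static uint64_t page_to_block(const struct extent_sketch *se, uint64_t page)
{
	assert(se != NULL);
	return se->start_block + (page - se->start_page);
}
```

With 20K or 84656 extents, the linear walk may touch tens of thousands of entries per lookup once concurrent tasks defeat the single cached extent, while the ordered search needs roughly 15 to 17 comparisons, which is in line with the drop in map_swap_entry() cycles reported above.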
..
kasan	mm/kasan: change kasan_check_{read,write} to return boolean	2019-07-12 11:05:42 -07:00
backing-dev.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
balloon_compaction.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
cleancache.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482	2019-06-19 17:09:52 +02:00
cma_debug.c	mm/cma_debug.c: fix the break condition in cma_maxchunk_get()	2019-05-14 09:47:45 -07:00
cma.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 98	2019-05-24 17:37:54 +02:00
cma.h	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
compaction.c	mm, compaction: make sure we isolate a valid PFN	2019-06-01 15:51:32 -07:00
debug_page_ref.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
debug.c	mm: update references to page _refcount	2019-05-14 19:52:47 -07:00
dmapool.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 403	2019-06-05 17:37:13 +02:00
early_ioremap.c	mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep	2017-12-11 14:54:44 +01:00
fadvise.c	vfs: implement readahead(2) using POSIX_FADV_WILLNEED	2018-08-30 20:01:32 +02:00
failslab.c	mm/failslab.c: by default, do not fail allocations with direct reclaim only	2019-07-12 11:05:43 -07:00
filemap.c	mm/filemap.c: correct the comment about VM_FAULT_RETRY	2019-07-12 11:05:43 -07:00
frame_vector.c	mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()'	2017-12-14 16:00:48 -08:00
frontswap.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482	2019-06-19 17:09:52 +02:00
gup_benchmark.c	mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM	2019-05-14 09:47:45 -07:00
gup.c	mm/gup.c: make follow_page_mask() static	2019-07-12 11:05:42 -07:00
highmem.c	mm: convert totalram_pages and totalhigh_pages variables to atomic	2018-12-28 12:11:47 -08:00
hmm.c	mm/devm_memremap_pages: fix final page put race	2019-06-13 17:34:56 -10:00
huge_memory.c	Revert "mm: page cache: store only head pages in i_pages"	2019-07-05 19:55:18 -07:00
hugetlb_cgroup.c	mm: rename page_counter's count/limit into usage/max	2018-06-07 17:34:35 -07:00
hugetlb.c	mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge	2019-06-29 16:43:45 +08:00
hwpoison-inject.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
init-mm.c	mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids	2018-07-17 09:35:30 +02:00
internal.h	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152	2019-05-30 11:26:32 -07:00
interval_tree.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 248	2019-06-19 17:09:08 +02:00
Kconfig	Linux 5.2-rc4	2019-06-14 14:18:53 -06:00
Kconfig.debug	mm, debug_pagealloc: use a page type instead of page_ext flag	2019-07-12 11:05:43 -07:00
khugepaged.c	Revert "mm: page cache: store only head pages in i_pages"	2019-07-05 19:55:18 -07:00
kmemleak-test.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333	2019-06-05 17:37:06 +02:00
kmemleak.c	mm/kmemleak.c: change error at _write when kmemleak is disabled	2019-07-12 11:05:42 -07:00
ksm.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482	2019-06-19 17:09:52 +02:00
list_lru.c	mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node	2019-06-13 17:34:56 -10:00
maccess.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
madvise.c	mm/mmu_notifier: use correct mmu_notifier events for each invalidation	2019-05-14 09:47:49 -07:00
Makefile	mm: shuffle initial free memory to improve memory-side-cache utilization	2019-05-14 19:52:48 -07:00
memblock.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152	2019-05-30 11:26:32 -07:00
memcontrol.c	mm/memcontrol: fix wrong statistics in memory.stat	2019-07-12 11:05:40 -07:00
memfd.c	Revert "mm: page cache: store only head pages in i_pages"	2019-07-05 19:55:18 -07:00
memory_hotplug.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
memory-failure.c	Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace	2019-07-08 21:48:15 -07:00
memory.c	mm, swap: fix race between swapoff and some swap operations	2019-07-12 11:05:43 -07:00
mempolicy.c	mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemask	2019-06-29 16:43:44 +08:00
mempool.c	docs/core-api/mm: fix return value descriptions in mm/	2019-03-05 21:07:20 -08:00
memtest.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
migrate.c	Revert "mm: page cache: store only head pages in i_pages"	2019-07-05 19:55:18 -07:00
mincore.c	mm/mincore.c: make mincore() more conservative	2019-05-14 19:52:48 -07:00
mlock.c	mm/mlock.c: change count_mm_mlocked_page_nr return type	2019-06-13 17:34:56 -10:00
mm_init.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
mmap.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
mmu_context.c
mmu_gather.c	mm: mmu_gather: remove __tlb_reset_range() for force flush	2019-06-13 17:34:56 -10:00
mmu_notifier.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499	2019-06-19 17:09:53 +02:00
mmzone.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
mprotect.c	mm/mprotect.c: fix compilation warning because of unused 'mm' variable	2019-05-14 09:47:51 -07:00
mremap.c	mm/mmu_notifier: contextual information for event triggering invalidation	2019-05-14 09:47:49 -07:00
msync.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
nommu.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
oom_kill.c	mm/oom_kill.c: fix uninitialized oc->constraint	2019-06-29 16:43:45 +08:00
page_alloc.c	mm, debug_pagealloc: use a page type instead of page_ext flag	2019-07-12 11:05:43 -07:00
page_counter.c	memcg: introduce memory.min	2018-06-07 17:34:36 -07:00
page_ext.c	mm, debug_pagealloc: use a page type instead of page_ext flag	2019-07-12 11:05:43 -07:00
page_idle.c	mm/page_idle.c: fix oops because end_pfn is larger than max_pfn	2019-06-29 16:43:45 +08:00
page_io.c	mm, swap: use rbtree for swap_extent	2019-07-12 11:05:43 -07:00
page_isolation.c	mm/page_isolation.c: change the prototype of undo_isolate_page_range()	2019-07-12 11:05:43 -07:00
page_owner.c	mm/page_owner: Simplify stack trace handling	2019-04-29 12:37:50 +02:00
page_poison.c	page_poison: play nicely with KASAN	2019-03-05 21:07:13 -08:00
page_vma_mapped.c	mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly	2018-10-31 08:54:11 -07:00
page-writeback.c	mm: remove the account_page_dirtied export	2019-07-12 11:05:42 -07:00
pagewalk.c	mm: kernel-doc: add missing parameter descriptions	2018-04-05 21:36:27 -07:00
percpu-internal.h	percpu: convert chunk hints to be based on pcpu_block_md	2019-03-13 12:25:31 -07:00
percpu-km.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428	2019-06-05 17:37:16 +02:00
percpu-stats.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428	2019-06-05 17:37:16 +02:00
percpu-vm.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428	2019-06-05 17:37:16 +02:00
percpu.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428	2019-06-05 17:37:16 +02:00
pgtable-generic.c	x86/mm: Page size aware flush_tlb_mm_range()	2018-10-09 16:51:11 +02:00
process_vm_access.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152	2019-05-30 11:26:32 -07:00
quicklist.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
readahead.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
rmap.c	mm/rmap.c: use the pra.mapcount to do the check	2019-05-14 09:47:49 -07:00
rodata_test.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441	2019-06-05 17:37:17 +02:00
shmem.c	Revert "mm: page cache: store only head pages in i_pages"	2019-07-05 19:55:18 -07:00
shuffle.c	mm: maintain randomization of page free lists	2019-05-14 19:52:48 -07:00
shuffle.h	mm: maintain randomization of page free lists	2019-05-14 19:52:48 -07:00
slab_common.c	mm/kasan: add object validation in ksize()	2019-07-12 11:05:42 -07:00
slab.c	mm/slab: refactor common ksize KASAN logic into slab_common.c	2019-07-12 11:05:42 -07:00
slab.h	mm/slab: sanity-check page type when looking up cache	2019-07-12 11:05:41 -07:00
slob.c	mm/slab: refactor common ksize KASAN logic into slab_common.c	2019-07-12 11:05:42 -07:00
slub.c	mm/slab: refactor common ksize KASAN logic into slab_common.c	2019-07-12 11:05:42 -07:00
sparse-vmemmap.c	mm: remove include/linux/bootmem.h	2018-10-31 08:54:16 -07:00
sparse.c	mm/sparse.c: clean up obsolete code comment	2019-05-14 09:47:48 -07:00
swap_cgroup.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
swap_slots.c	mm, swap, get_swap_pages: use entry_size instead of cluster in parameter	2018-08-22 10:52:44 -07:00
swap_state.c	mm/swap_state.c: simplify total_swapcache_pages() with get_swap_device()	2019-07-12 11:05:43 -07:00
swap.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
swapfile.c	mm, swap: use rbtree for swap_extent	2019-07-12 11:05:43 -07:00
truncate.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
usercopy.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500	2019-06-19 17:09:55 +02:00
userfaultfd.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499	2019-06-19 17:09:53 +02:00
util.c	prctl_set_mm: downgrade mmap_sem to read lock	2019-06-01 15:51:31 -07:00
vmacache.c	mm: get rid of vmacache_flush_all() entirely	2018-09-13 15:18:04 -10:00
vmalloc.c	arm64 updates for 5.3:	2019-07-08 09:54:55 -07:00
vmpressure.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500	2019-06-19 17:09:55 +02:00
vmscan.c	mm: vmscan: scan anonymous pages on file refaults	2019-07-12 11:05:39 -07:00
vmstat.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
workingset.c	mm: memcontrol: make cgroup stats and events query API explicitly local	2019-05-14 19:52:53 -07:00
z3fold.c	mm/z3fold.c: lock z3fold page before __SetPageMovable()	2019-07-12 11:05:40 -07:00
zbud.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
zpool.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
zsmalloc.c	mm/zsmalloc.c: fix fall-through annotation	2018-10-26 16:26:35 -07:00
zswap.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157	2019-05-30 11:26:37 -07:00