linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-12 22:23:55 +00:00

Kirill Tkhai f90280d6b7 mm/vmscan.c: clear shrinker bit if there are no objects related to memcg

To avoid further unneeded calls of do_shrink_slab() for shrinkers which no longer have any charged objects in a memcg, their bits have to be cleared.

This patch introduces a lockless mechanism to do that without racing against a parallel list_lru add. After do_shrink_slab() returns SHRINK_EMPTY the first time, we clear the bit and call it once again. Then we restore the bit if the new return value is different.

Note that the single smp_mb__after_atomic() in shrink_slab_memcg() covers two situations:

1)list_lru_add()     shrink_slab_memcg
  list_add_tail()    for_each_set_bit() <--- read bit
                     do_shrink_slab()   <--- missed list update (no barrier)
  <MB>               <MB>
  set_bit()          do_shrink_slab()   <--- seen list update

This situation, where the first do_shrink_slab() sees the set bit but does not see the list update (i.e., a race with the queueing of the first element), is rare. So, to avoid slowing down the generic case, we do not add <MB> before the first call of do_shrink_slab(). The second call is needed anyway, as shown in (2) below.

2)list_lru_add()      shrink_slab_memcg()
  list_add_tail()     ...
  set_bit()           ...
  ...                 for_each_set_bit()
  do_shrink_slab()    do_shrink_slab()
  clear_bit()         ...
  ...                 ...
  list_lru_add()      ...
  list_add_tail()     clear_bit()
  <MB>                <MB>
  set_bit()           do_shrink_slab()

The barriers guarantee that the second do_shrink_slab() in the right-side task sees the list update if it really cleared the bit. This case is drawn in the code comment.

[Results/performance of the patchset]

After the whole patchset is applied, the test below shows a significant increase in performance:

  $echo 1 > /sys/fs/cgroup/memory/memory.use_hierarchy
  $mkdir /sys/fs/cgroup/memory/ct
  $echo 4000M > /sys/fs/cgroup/memory/ct/memory.kmem.limit_in_bytes
  $for i in `seq 0 4000`; do mkdir /sys/fs/cgroup/memory/ct/$i; echo $$ > /sys/fs/cgroup/memory/ct/$i/cgroup.procs; mkdir -p s/$i; mount -t tmpfs $i s/$i; touch s/$i/file; done

Then, 5 sequential calls of drop caches:

  $time echo 3 > /proc/sys/vm/drop_caches

1)Before:
  0.00user 13.78system 0:13.78elapsed 99%CPU
  0.00user 5.59system 0:05.60elapsed 99%CPU
  0.00user 5.48system 0:05.48elapsed 99%CPU
  0.00user 8.35system 0:08.35elapsed 99%CPU
  0.00user 8.34system 0:08.35elapsed 99%CPU

2)After
  0.00user 1.10system 0:01.10elapsed 99%CPU
  0.00user 0.00system 0:00.01elapsed 64%CPU
  0.00user 0.01system 0:00.01elapsed 82%CPU
  0.00user 0.00system 0:00.01elapsed 64%CPU
  0.00user 0.01system 0:00.01elapsed 82%CPU

The results show a performance increase of at least 548 times (5.48s vs. 0.01s elapsed) on the runs after the first one.

Shakeel Butt tested this patchset with fork-bomb on his configuration:

 > I created 255 memcgs, 255 ext4 mounts and made each memcg create a
 > file containing few KiBs on corresponding mount. Then in a separate
 > memcg of 200 MiB limit ran a fork-bomb.
 >
 > I ran the "perf record -ag -- sleep 60" and below are the results:
 >
 > Without the patch series:
 > Samples: 4M of event 'cycles', Event count (approx.): 3279403076005
 > +  36.40%  fb.sh  [kernel.kallsyms]  [k] shrink_slab
 > +  18.97%  fb.sh  [kernel.kallsyms]  [k] list_lru_count_one
 > +   6.75%  fb.sh  [kernel.kallsyms]  [k] super_cache_count
 > +   0.49%  fb.sh  [kernel.kallsyms]  [k] down_read_trylock
 > +   0.44%  fb.sh  [kernel.kallsyms]  [k] mem_cgroup_iter
 > +   0.27%  fb.sh  [kernel.kallsyms]  [k] up_read
 > +   0.21%  fb.sh  [kernel.kallsyms]  [k] osq_lock
 > +   0.13%  fb.sh  [kernel.kallsyms]  [k] shmem_unused_huge_count
 > +   0.08%  fb.sh  [kernel.kallsyms]  [k] shrink_node_memcg
 > +   0.08%  fb.sh  [kernel.kallsyms]  [k] shrink_node
 >
 > With the patch series:
 > Samples: 4M of event 'cycles', Event count (approx.): 2756866824946
 > +  47.49%  fb.sh  [kernel.kallsyms]  [k] down_read_trylock
 > +  30.72%  fb.sh  [kernel.kallsyms]  [k] up_read
 > +   9.51%  fb.sh  [kernel.kallsyms]  [k] mem_cgroup_iter
 > +   1.69%  fb.sh  [kernel.kallsyms]  [k] shrink_node_memcg
 > +   1.35%  fb.sh  [kernel.kallsyms]  [k] mem_cgroup_protected
 > +   1.05%  fb.sh  [kernel.kallsyms]  [k] queued_spin_lock_slowpath
 > +   0.85%  fb.sh  [kernel.kallsyms]  [k] _raw_spin_lock
 > +   0.78%  fb.sh  [kernel.kallsyms]  [k] lruvec_lru_size
 > +   0.57%  fb.sh  [kernel.kallsyms]  [k] shrink_node
 > +   0.54%  fb.sh  [kernel.kallsyms]  [k] queue_work_on
 > +   0.46%  fb.sh  [kernel.kallsyms]  [k] shrink_slab_memcg

[ktkhai@virtuozzo.com: v9]
Link: http://lkml.kernel.org/r/153112561772.4097.11011071937553113003.stgit@localhost.localdomain
Link: http://lkml.kernel.org/r/153063070859.1818.11870882950920963480.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Tested-by: Shakeel Butt <shakeelb@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Li RongQing <lirongqing@baidu.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Sahitya Tummala <stummala@codeaurora.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2018-08-17 16:20:31 -07:00
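
The clear-and-recheck step described in the commit message above can be sketched in userspace C11. This is a minimal single-threaded model under stated assumptions, not the kernel code: every name ending in _sim is hypothetical, an atomic counter stands in for the list_lru, and atomic_thread_fence() only marks the place where the commit puts smp_mb__after_atomic().

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_SHRINKERS_SIM 8          /* hypothetical bitmap width */
#define SHRINK_EMPTY_SIM ULONG_MAX  /* stand-in for the kernel's SHRINK_EMPTY */

/* One bit per shrinker id; a set bit means "this shrinker may have objects". */
static atomic_ulong shrinker_map_sim;
/* Number of charged objects per shrinker (assumed stand-in for a list_lru). */
static atomic_ulong nr_objects_sim[NR_SHRINKERS_SIM];

static bool shrinker_bit_is_set_sim(int id)
{
	return (atomic_load(&shrinker_map_sim) >> id) & 1UL;
}

/* Producer side, modelled on list_lru_add(): publish the object, then the bit. */
static void list_lru_add_sim(int id)
{
	assert(id >= 0 && id < NR_SHRINKERS_SIM);
	/* "list_add_tail()": a seq_cst read-modify-write, so it also orders
	 * the update before the bit is set below. */
	if (atomic_fetch_add(&nr_objects_sim[id], 1) == 0)
		atomic_fetch_or(&shrinker_map_sim, 1UL << id); /* "set_bit()" */
}

/* Frees every object of one shrinker, or reports that there were none. */
static unsigned long do_shrink_slab_sim(int id)
{
	unsigned long n = atomic_exchange(&nr_objects_sim[id], 0);

	return n ? n : SHRINK_EMPTY_SIM;
}

/* Consumer side: visit only shrinkers whose bit is set. */
static unsigned long shrink_slab_memcg_sim(void)
{
	unsigned long freed = 0;
	unsigned long map = atomic_load(&shrinker_map_sim);

	for (int id = 0; id < NR_SHRINKERS_SIM; id++) {
		if (!(map & (1UL << id)))
			continue;

		unsigned long ret = do_shrink_slab_sim(id);
		if (ret == SHRINK_EMPTY_SIM) {
			/* First empty result: clear the bit ... */
			atomic_fetch_and(&shrinker_map_sim, ~(1UL << id));
			/* ... full barrier, where the commit places its <MB> ... */
			atomic_thread_fence(memory_order_seq_cst);
			/* ... and look once more for an object added in between. */
			ret = do_shrink_slab_sim(id);
			if (ret == SHRINK_EMPTY_SIM)
				ret = 0; /* really empty: bit stays cleared */
			else
				atomic_fetch_or(&shrinker_map_sim, 1UL << id); /* restore */
		}
		freed += ret;
	}
	return freed;
}
```

In this model, a pass that finds objects leaves the bit set, the next pass finds the shrinker empty and clears the bit, and later passes skip the shrinker until list_lru_add_sim() sets the bit again. The demo does not exercise real concurrency; it only illustrates the order of operations.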
..
kasan	kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN	2018-08-17 16:20:30 -07:00
backing-dev.c	bdi: Fix another oops in wb_workfn()	2018-06-22 12:08:07 -06:00
balloon_compaction.c	virtio_balloon: fix deadlock on OOM	2017-11-14 23:57:38 +02:00
bootmem.c	docs/mm: bootmem: add overview documentation	2018-08-02 12:17:27 -06:00
cleancache.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
cma_debug.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
cma.c	Revert "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE"	2018-05-24 10:07:50 -07:00
cma.h	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
compaction.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
debug_page_ref.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
debug.c	mm: teach dump_page() to correctly output poisoned struct pages	2018-07-03 17:32:19 -07:00
dmapool.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
early_ioremap.c	mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep	2017-12-11 14:54:44 +01:00
fadvise.c	mm/fadvise.c: fix signed overflow UBSAN complaint	2018-08-17 16:20:30 -07:00
failslab.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
filemap.c	mm: use new return type vm_fault_t	2018-06-07 17:34:36 -07:00
frame_vector.c	mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()'	2017-12-14 16:00:48 -08:00
frontswap.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
gup_benchmark.c	treewide: kvzalloc() -> kvcalloc()	2018-06-12 16:19:22 -07:00
gup.c	mm: do not bug_on on incorrect length in __mm_populate()	2018-07-14 11:11:10 -07:00
highmem.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
hmm.c	mm: convert return type of handle_mm_fault() caller to vm_fault_t	2018-08-17 16:20:28 -07:00
huge_memory.c	mm, huge page: copy target sub-page last when copy huge page	2018-08-17 16:20:29 -07:00
hugetlb_cgroup.c	mm: rename page_counter's count/limit into usage/max	2018-06-07 17:34:35 -07:00
hugetlb.c	mm, hugetlbfs: pass fault address to cow handler	2018-08-17 16:20:29 -07:00
hwpoison-inject.c	mm/memory_failure: Remove unused trapno from memory_failure	2018-01-23 12:17:42 -06:00
init-mm.c	mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids	2018-07-17 09:35:30 +02:00
internal.h	Changes for 4.18:	2018-06-05 13:24:20 -07:00
interval_tree.c	mm/interval_tree.c: use vma_pages() helper	2018-01-31 17:18:37 -08:00
Kconfig	mm: make DEFERRED_STRUCT_PAGE_INIT explicitly depend on SPARSEMEM	2018-08-17 16:20:30 -07:00
Kconfig.debug	kmemcheck: rip it out	2017-11-15 18:21:05 -08:00
khugepaged.c	mm: thp: pass correct vm_flags to hugepage_vma_check()	2018-08-17 16:20:30 -07:00
kmemleak-test.c
kmemleak.c	mm: kernel-doc: add missing parameter descriptions	2018-04-05 21:36:27 -07:00
ksm.c	mm: convert return type of handle_mm_fault() caller to vm_fault_t	2018-08-17 16:20:28 -07:00
list_lru.c	mm/list_lru.c: set bit in memcg shrinker bitmap on first list_lru item appearance	2018-08-17 16:20:31 -07:00
maccess.c	mm: docs: fix parameter names mismatch	2018-02-06 18:32:48 -08:00
madvise.c	mm/memory_failure: Remove unused trapno from memory_failure	2018-01-23 12:17:42 -06:00
Makefile	mm: restructure memfd code	2018-06-07 17:34:35 -07:00
memblock.c	mm/memblock.c: replace u64 with phys_addr_t where appropriate	2018-08-17 16:20:30 -07:00
memcontrol.c	mm/vmscan.c: clear shrinker bit if there are no objects related to memcg	2018-08-17 16:20:31 -07:00
memfd.c	alloc_file(): switch to passing O_... flags instead of FMODE_... mode	2018-07-12 10:02:57 -04:00
memory_hotplug.c	mm/memory_hotplug.c: make register_mem_sect_under_node() a callback of walk_memory_range()	2018-08-17 16:20:29 -07:00
memory-failure.c	mm, migrate: remove reason argument from new_page_t	2018-04-11 10:28:32 -07:00
memory.c	memcg, oom: move out_of_memory back to the charge path	2018-08-17 16:20:30 -07:00
mempolicy.c	mm: use vma_init() to initialize VMAs on stack and data segments	2018-07-26 19:38:03 -07:00
mempool.c	mm/mempool.c: remove unused argument in kasan_unpoison_element() and remove_element()	2018-08-17 16:20:28 -07:00
memtest.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
migrate.c	dax: remove VM_MIXEDMAP for fsdax and device dax	2018-08-17 16:20:27 -07:00
mincore.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
mlock.c	dax: remove VM_MIXEDMAP for fsdax and device dax	2018-08-17 16:20:27 -07:00
mm_init.c
mmap.c	dax: remove VM_MIXEDMAP for fsdax and device dax	2018-08-17 16:20:27 -07:00
mmu_context.c
mmu_notifier.c	mm, mmu_notifier: annotate mmu notifiers with blockable invalidate callbacks	2018-01-31 17:18:38 -08:00
mmzone.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
mprotect.c	x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings	2018-06-20 19:10:01 +02:00
mremap.c	mremap: remove LATENCY_LIMIT from mremap to reduce the number of TLB shootdowns	2018-06-15 07:55:24 +09:00
msync.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
nobootmem.c	mm/memblock: add a name for memblock flags enumeration	2018-08-02 12:17:27 -06:00
nommu.c	mm: provide a fallback for PAGE_KERNEL_EXEC for architectures	2018-08-17 16:20:29 -07:00
oom_kill.c	mm: fix oom_kill event handling	2018-06-15 07:55:25 +09:00
page_alloc.c	mm: drop VM_BUG_ON from __get_free_pages	2018-08-17 16:20:29 -07:00
page_counter.c	memcg: introduce memory.min	2018-06-07 17:34:36 -07:00
page_ext.c	mm/page_ext.c: constify lookup_page_ext() argument	2018-08-17 16:20:28 -07:00
page_idle.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
page_io.c	swap,blkcg: issue swap io with the appropriate context	2018-07-09 09:07:54 -06:00
page_isolation.c	mm, migrate: remove reason argument from new_page_t	2018-04-11 10:28:32 -07:00
page_owner.c	mm: use octal not symbolic permissions	2018-06-15 07:55:25 +09:00
page_poison.c	mm/page_poison.c: make early_page_poison_param() __init	2018-04-05 21:36:26 -07:00
page_vma_mapped.c	mm, page_vma_mapped: Introduce pfn_in_hpage()	2018-01-22 12:15:57 -08:00
page-writeback.c	mm/page-writeback.c: update stale account_page_redirty() comment	2018-08-17 16:20:30 -07:00
pagewalk.c	mm: kernel-doc: add missing parameter descriptions	2018-04-05 21:36:27 -07:00
percpu-internal.h	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
percpu-km.c	percpu: allow select gfp to be passed to underlying allocators	2018-02-18 05:33:01 -08:00
percpu-stats.c	treewide: Use array_size() in vmalloc()	2018-06-12 16:19:22 -07:00
percpu-vm.c	percpu: allow select gfp to be passed to underlying allocators	2018-02-18 05:33:01 -08:00
percpu.c	arch: remove obsolete architecture ports	2018-04-02 20:20:12 -07:00
pgtable-generic.c	mm: do not lose dirty and accessed bits in pmdp_invalidate()	2018-01-31 17:18:38 -08:00
process_vm_access.c	mm: docs: add blank lines to silence sphinx "Unexpected indentation" errors	2018-02-06 18:32:48 -08:00
quicklist.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
readahead.c	readahead: stricter check for bdi io_pages	2018-07-27 09:09:53 -06:00
rmap.c	mm: do not drop unused pages when userfaultd is running	2018-07-14 11:11:09 -07:00
rodata_test.c	mm: fix RODATA_TEST failure "rodata_test: test data was not read only"	2017-10-03 17:54:24 -07:00
shmem.c	shmem: use monotonic time for i_generation	2018-08-17 16:20:28 -07:00
slab_common.c	mm: introduce CONFIG_MEMCG_KMEM as combination of CONFIG_MEMCG && !CONFIG_SLOB	2018-08-17 16:20:30 -07:00
slab.c	treewide: kzalloc() -> kcalloc()	2018-06-12 16:19:22 -07:00
slab.h	mm: introduce CONFIG_MEMCG_KMEM as combination of CONFIG_MEMCG && !CONFIG_SLOB	2018-08-17 16:20:30 -07:00
slob.c	slab: __GFP_ZERO is incompatible with a constructor	2018-06-07 17:34:34 -07:00
slub.c	mm, slub: restore the original intention of prefetch_freepointer()	2018-08-17 16:20:28 -07:00
sparse-vmemmap.c	mm: merge vmem_altmap_alloc into altmap_alloc_block_buf	2018-01-08 11:46:23 -08:00
sparse.c	mm/sparse.c: make sparse_init_one_section void and remove check	2018-08-17 16:20:30 -07:00
swap_cgroup.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
swap_slots.c	mm/swap_slots.c: make swap_slots_cache_mutex and swap_slots_cache_enable_mutex static	2018-08-17 16:20:30 -07:00
swap_state.c	treewide: kvzalloc() -> kvcalloc()	2018-06-12 16:19:22 -07:00
swap.c	mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS	2018-05-22 06:59:39 -07:00
swapfile.c	for-4.19/block-20180812	2018-08-14 10:23:25 -07:00
truncate.c	page cache: use xa_lock	2018-04-11 10:28:39 -07:00
usercopy.c	usercopy: Allow boot cmdline disabling of hardening	2018-07-04 08:04:52 -07:00
userfaultfd.c	userfaultfd: prevent non-cooperative events vs mcopy_atomic races	2018-06-07 17:34:38 -07:00
util.c	mm: kvmalloc does not fallback to vmalloc for incompatible gfp flags	2018-06-07 17:34:38 -07:00
vmacache.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
vmalloc.c	mm: provide a fallback for PAGE_KERNEL_EXEC for architectures	2018-08-17 16:20:29 -07:00
vmpressure.c	mm/vmpressure.c: convert to use match_string() helper	2018-06-07 17:34:36 -07:00
vmscan.c	mm/vmscan.c: clear shrinker bit if there are no objects related to memcg	2018-08-17 16:20:31 -07:00
vmstat.c	Revert mm/vmstat.c: fix vmstat_update() preemption BUG	2018-06-28 11:16:44 -07:00
workingset.c	mm: add SHRINK_EMPTY shrinker methods return value	2018-08-17 16:20:31 -07:00
z3fold.c	z3fold: fix reclaim lock-ups	2018-05-11 17:28:45 -07:00
zbud.c	mm: docs: fix parameter names mismatch	2018-02-06 18:32:48 -08:00
zpool.c	mm/zpool.c: zpool_evictable: fix mismatch in parameter name and kernel-doc	2018-02-21 15:35:43 -08:00
zsmalloc.c	mm/zsmalloc.c: make several functions and a struct static	2018-08-17 16:20:30 -07:00
zswap.c	zswap: re-check zswap_is_full() after do zswap_shrink()	2018-07-26 19:38:03 -07:00