linux/kernel
Feng Tang 56f3547bfa mm: adjust vm_committed_as_batch according to vm overcommit policy
When checking a performance change for will-it-scale scalability mmap test
[1], we found very high lock contention for spinlock of percpu counter
'vm_committed_as':

    94.14%     0.35%  [kernel.kallsyms]         [k] _raw_spin_lock_irqsave
    48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap;
    45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap;

Actually this heavy lock contention is not always necessary.  The
'vm_committed_as' needs to be very precise when the strict
OVERCOMMIT_NEVER policy is set, which requires a rather small batch number
for the percpu counter.

So keep 'batch' number unchanged for strict OVERCOMMIT_NEVER policy, and
lift it to 64X for OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS policies.  Also
add a sysctl handler to adjust it when the policy is reconfigured.

Benchmark with the same testcase in [1] shows 53% improvement on a 8C/16T
desktop, and 2097%(20X) on a 4S/72C/144T server.  We tested with test
platforms in 0day (server, desktop and laptop), and 80%+ platforms shows
improvements with that test.  And whether it shows improvements depends on
if the test mmap size is bigger than the batch number computed.

And if the lift is 16X, 1/3 of the platforms will show improvements,
though it should help the mmap/unmap usage generally, as Michal Hocko
mentioned:

: I believe that there are non-synthetic worklaods which would benefit from
: a larger batch.  E.g.  large in memory databases which do large mmaps
: during startups from multiple threads.

[1] https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Qian Cai <cai@lca.pw>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: kernel test robot <rong.a.chen@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1589611660-89854-4-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1592725000-73486-4-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1594389708-60781-5-git-send-email-feng.tang@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:26 -07:00
..
bpf Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-08-03 18:27:40 -07:00
cgroup for-5.9/block-20200802 2020-08-03 11:57:03 -07:00
configs compiler: remove CONFIG_OPTIMIZE_INLINING entirely 2020-04-07 10:43:42 -07:00
debug Remove uninitialized_var() macro for v5.9-rc1 2020-08-04 13:49:43 -07:00
dma It's been a busy cycle for documentation - hopefully the busiest for a 2020-08-04 22:47:54 -07:00
entry entry: Correct 'noinstr' attributes 2020-07-26 15:42:20 +02:00
events Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-08-05 20:13:21 -07:00
gcov treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
irq This tree adds the sched_set_fifo*() encapsulation APIs to remove 2020-08-06 11:55:43 -07:00
kcsan Merge branch 'kcsan' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into locking/core 2020-08-01 09:26:27 +02:00
livepatch livepatch: Make klp_apply_object_relocs static 2020-05-11 00:31:38 +02:00
locking s390: implement diag318 2020-08-06 12:59:31 -07:00
power mm: memcg: convert vmstat slab counters to bytes 2020-08-07 11:33:24 -07:00
printk Printk changes for 5.9 2020-08-04 22:22:25 -07:00
rcu This tree adds the sched_set_fifo*() encapsulation APIs to remove 2020-08-06 11:55:43 -07:00
sched This tree adds the sched_set_fifo*() encapsulation APIs to remove 2020-08-06 11:55:43 -07:00
time Time, timers and related driver updates: 2020-08-04 18:17:37 -07:00
trace This tree adds the sched_set_fifo*() encapsulation APIs to remove 2020-08-06 11:55:43 -07:00
.gitignore .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
acct.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
async.c treewide: Remove uninitialized_var() usage 2020-07-16 12:35:15 -07:00
audit_fsnotify.c fsnotify: use helpers to access data by data_type 2020-03-23 18:19:06 +01:00
audit_tree.c audit: Use struct_size() helper in alloc_chunk 2020-06-17 16:43:11 -04:00
audit_watch.c \n 2020-04-06 08:58:42 -07:00
audit.c audit/stable-5.9 PR 20200803 2020-08-04 14:20:26 -07:00
audit.h revert: 1320a4052e ("audit: trigger accompanying records when no rules present") 2020-07-29 10:00:36 -04:00
auditfilter.c audit: fix a net reference leak in audit_list_rules_send() 2020-04-22 15:23:10 -04:00
auditsc.c audit/stable-5.9 PR 20200803 2020-08-04 14:20:26 -07:00
backtracetest.c treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD() 2020-07-30 11:15:58 -07:00
bounds.c
capability.c
compat.c uaccess: Selectively open read or write user access 2020-05-01 12:35:21 +10:00
configs.c proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
context_tracking.c context_tracking: Ensure that the critical path cannot be instrumented 2020-06-11 15:14:36 +02:00
cpu_pm.c kernel/cpu_pm: Fix uninitted local in cpu_pm 2020-05-15 11:44:34 -07:00
cpu.c The changes in this cycle are: 2020-06-03 13:06:42 -07:00
crash_core.c crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo 2020-07-02 17:56:11 +01:00
crash_dump.c crash_dump: Remove no longer used saved_max_pfn 2020-04-15 11:21:54 +02:00
cred.c exec: Teach prepare_exec_creds how exec treats uids & gids 2020-05-20 14:44:21 -05:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c Merge branch 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-08-04 14:27:25 -07:00
extable.c kernel/extable.c: use address-of operator on section symbols 2020-04-07 10:43:42 -07:00
fail_function.c
fork.c mm: memcontrol: account kernel stack per node 2020-08-07 11:33:25 -07:00
freezer.c
futex.c Remove uninitialized_var() macro for v5.9-rc1 2020-08-04 13:49:43 -07:00
gen_kheaders.sh kbuild: add variables for compression tools 2020-06-06 23:42:01 +09:00
groups.c mm: remove the pgprot argument to __vmalloc 2020-06-02 10:59:11 -07:00
hung_task.c kernel/hung_task.c: introduce sysctl to print all traces when a hung task is detected 2020-06-08 11:05:56 -07:00
iomem.c
irq_work.c irq_work, smp: Allow irq_work on call_single_queue 2020-05-28 10:54:15 +02:00
jump_label.c
kallsyms.c Linux 5.8-rc7 2020-07-28 13:18:01 +02:00
kcmp.c kernel/kcmp.c: Use new infrastructure to fix deadlocks in execve 2020-03-25 10:04:01 -05:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks sched/rt, locking: Use CONFIG_PREEMPTION 2019-12-08 14:37:36 +01:00
Kconfig.preempt sched/Kconfig: Fix spelling mistake in user-visible help text 2019-11-12 11:35:32 +01:00
kcov.c kcov: check kcov_softirq in kcov_remote_stop() 2020-06-10 19:14:17 -07:00
kexec_core.c kexec: add machine_kexec_post_load() 2020-01-08 16:32:55 +00:00
kexec_elf.c
kexec_file.c integrity-v5.9 2020-08-06 11:35:57 -07:00
kexec_internal.h kexec: add machine_kexec_post_load() 2020-01-08 16:32:55 +00:00
kexec.c kexec: add machine_kexec_post_load() 2020-01-08 16:32:55 +00:00
kheaders.c
kmod.c kmod: make request_module() return an error when autoloading is disabled 2020-04-10 15:36:22 -07:00
kprobes.c kprobes: Remove unnecessary module_mutex locking from kprobe_optimizer() 2020-07-28 13:19:05 +02:00
ksysfs.c
kthread.c kthread: remove incorrect comment in kthread_create_on_cpu() 2020-08-07 11:33:21 -07:00
latencytop.c sysctl: pass kernel pointers to ->proc_handler 2020-04-27 02:07:40 -04:00
Makefile Generic implementation of common syscall, interrupt and exception 2020-08-04 21:00:11 -07:00
module_signature.c
module_signing.c
module-internal.h
module.c dyndbg: rename __verbose section to __dyndbg 2020-07-24 17:00:08 +02:00
notifier.c mm: remove vmalloc_sync_(un)mappings() 2020-06-02 10:59:12 -07:00
nsproxy.c nsproxy: support CLONE_NEWTIME with setns() 2020-07-08 11:14:22 +02:00
padata.c padata: remove padata_parallel_queue 2020-07-23 17:34:18 +10:00
panic.c bug: Annotate WARN/BUG/stackfail as noinstr safe 2020-06-11 15:14:36 +02:00
params.c
pid_namespace.c pid_namespace: use checkpoint_restore_ns_capable() for ns_last_pid 2020-07-19 20:14:42 +02:00
pid.c cap-checkpoint-restore-v5.9 2020-08-04 15:02:07 -07:00
profile.c proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
ptrace.c ptrace: reintroduce usage of subjective credentials in ptrace_has_cap() 2020-01-18 13:51:39 +01:00
range.c
reboot.c arch: remove unicore32 port 2020-07-01 12:09:13 +03:00
relay.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
resource.c /dev/mem: Revoke mappings when a driver claims the region 2020-05-27 11:10:05 +02:00
rseq.c rseq: Reject unknown flags on rseq unregister 2019-12-25 10:41:20 +01:00
scs.c mm: memcontrol: account kernel stack per node 2020-08-07 11:33:25 -07:00
seccomp.c seccomp: Introduce addfd ioctl to seccomp user notifier 2020-07-14 16:29:42 -07:00
signal.c signal: fix typo in dequeue_synchronous_signal() 2020-07-26 23:57:52 +02:00
smp.c smp: Fix a potential usage of stale nr_cpus 2020-07-22 10:22:04 +02:00
smpboot.c
smpboot.h
softirq.c tasklets API update for v5.9-rc1 2020-08-04 13:40:35 -07:00
stackleak.c gcc-plugins/stackleak: Use asm instrumentation to avoid useless register saving 2020-06-24 07:48:28 -07:00
stacktrace.c
stop_machine.c stop_machine: Make stop_cpus() static 2020-01-17 10:19:21 +01:00
sys_ni.c y2038: allow disabling time32 system calls 2019-11-15 14:38:30 +01:00
sys.c prctl: exe link permission error changed from -EINVAL to -EPERM 2020-07-19 20:14:42 +02:00
sysctl_binary.c sysctl: Remove the sysctl system call 2019-11-26 13:03:56 -06:00
sysctl-test.c kunit: allow kunit tests to be loaded as a module 2020-01-09 16:42:29 -07:00
sysctl.c mm: adjust vm_committed_as_batch according to vm overcommit policy 2020-08-07 11:33:26 -07:00
task_work.c task_work: teach task_work_add() to do signal_wake_up() 2020-06-30 12:18:08 -06:00
taskstats.c taskstats: fix data-race 2019-12-04 15:18:39 +01:00
test_kprobes.c
torture.c torture: Dump ftrace at shutdown only if requested 2020-06-29 12:01:45 -07:00
tracepoint.c
tsacct.c tsacct: add 64-bit btime field 2019-12-18 18:07:31 +01:00
ucount.c ucount: Make sure ucounts in /proc/sys/user don't regress again 2020-04-07 21:51:27 +02:00
uid16.c
uid16.h
umh.c exec: Implement kernel_execve 2020-07-21 08:24:52 -05:00
up.c smp/up: Make smp_call_function_single() match SMP semantics 2020-02-07 15:34:12 +01:00
user_namespace.c nsproxy: add struct nsset 2020-05-09 13:57:12 +02:00
user-return-notifier.c
user.c user.c: make uidhash_table static 2020-06-04 19:06:24 -07:00
usermode_driver.c umd: Stop using split_argv 2020-07-07 11:58:59 -05:00
utsname_sysctl.c sysctl: pass kernel pointers to ->proc_handler 2020-04-27 02:07:40 -04:00
utsname.c nsproxy: add struct nsset 2020-05-09 13:57:12 +02:00
watch_queue.c Notifications over pipes + Keyring notifications 2020-06-13 09:56:21 -07:00
watchdog_hld.c
watchdog.c kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases 2020-06-08 11:05:56 -07:00
workqueue_internal.h
workqueue.c maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault 2020-06-17 10:57:41 -07:00