linux/kernel
Mel Gorman 2ebb177175 sched/core: Offload wakee task activation if it the wakee is descheduling
The previous commit:

  c6e7bd7afa: ("sched/core: Optimize ttwu() spinning on p->on_cpu")

avoids spinning on p->on_rq when the task is descheduling, but only if the
wakee is on a CPU that does not share cache with the waker.

This patch offloads the activation of the wakee to the CPU that is about to
go idle if the task is the only one on the runqueue. This potentially allows
the waker task to continue making progress when the wakeup is not strictly
synchronous.

This is very obvious with netperf UDP_STREAM running on localhost. The
waker is sending packets as quickly as possible without waiting for any
reply. It frequently wakes the server for the processing of packets and
when netserver is using local memory, it quickly completes the processing
and goes back to idle. The waker often observes that netserver is on_rq
and spins excessively leading to a drop in throughput.

This is a comparison of 5.7-rc6 against "sched: Optimize ttwu() spinning
on p->on_cpu" and against this patch labeled vanilla, optttwu-v1r1 and
localwakelist-v1r2 respectively.

                                  5.7.0-rc6              5.7.0-rc6              5.7.0-rc6
                                    vanilla           optttwu-v1r1     localwakelist-v1r2
Hmean     send-64         251.49 (   0.00%)      258.05 *   2.61%*      305.59 *  21.51%*
Hmean     send-128        497.86 (   0.00%)      519.89 *   4.43%*      600.25 *  20.57%*
Hmean     send-256        944.90 (   0.00%)      997.45 *   5.56%*     1140.19 *  20.67%*
Hmean     send-1024      3779.03 (   0.00%)     3859.18 *   2.12%*     4518.19 *  19.56%*
Hmean     send-2048      7030.81 (   0.00%)     7315.99 *   4.06%*     8683.01 *  23.50%*
Hmean     send-3312     10847.44 (   0.00%)    11149.43 *   2.78%*    12896.71 *  18.89%*
Hmean     send-4096     13436.19 (   0.00%)    13614.09 (   1.32%)    15041.09 *  11.94%*
Hmean     send-8192     22624.49 (   0.00%)    23265.32 *   2.83%*    24534.96 *   8.44%*
Hmean     send-16384    34441.87 (   0.00%)    36457.15 *   5.85%*    35986.21 *   4.48%*

Note that this benefit is not universal to all wakeups, it only applies
to the case where the waker often spins on p->on_rq.

The impact can be seen from a "perf sched latency" report generated from
a single iteration of one packet size:

   -----------------------------------------------------------------------------------------------------------------
    Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |
   -----------------------------------------------------------------------------------------------------------------

  vanilla
    netperf:4337          |  21709.193 ms |     2932 | avg:    0.002 ms | max:    0.041 ms | max at:    112.154512 s
    netserver:4338        |  14629.459 ms |  5146990 | avg:    0.001 ms | max: 1615.864 ms | max at:    140.134496 s

  localwakelist-v1r2
    netperf:4339          |  29789.717 ms |     2460 | avg:    0.002 ms | max:    0.059 ms | max at:    138.205389 s
    netserver:4340        |  18858.767 ms |  7279005 | avg:    0.001 ms | max:    0.362 ms | max at:    135.709683 s
   -----------------------------------------------------------------------------------------------------------------

Note that the average wakeup delay is quite small on both the vanilla
kernel and with the two patches applied. However, there are significant
outliers with the vanilla kernel with the maximum one measured as 1615
milliseconds with a vanilla kernel but never worse than 0.362 ms with
both patches applied and a much higher rate of context switching.

Similarly a separate profile of cycles showed that 2.83% of all cycles
were spent in try_to_wake_up() with almost half of the cycles spent
on spinning on p->on_rq. With the two patches, the percentage of cycles
spent in try_to_wake_up() drops to 1.13%

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jirka Hladky <jhladky@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: valentin.schneider@arm.com
Cc: Hillf Danton <hdanton@sina.com>
Cc: Rik van Riel <riel@surriel.com>
Link: https://lore.kernel.org/r/20200524202956.27665-3-mgorman@techsingularity.net
2020-05-25 07:04:10 +02:00
..
bpf bpf: Add bpf_probe_read_{user, kernel}_str() to do_refine_retval_range 2020-05-15 08:10:36 -07:00
cgroup Merge branch 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2020-04-03 11:30:20 -07:00
configs compiler: remove CONFIG_OPTIMIZE_INLINING entirely 2020-04-07 10:43:42 -07:00
debug SPDX patches for 5.7-rc1. 2020-04-03 13:12:26 -07:00
dma dma-debug: fix displaying of dma allocation type 2020-04-08 21:46:57 +02:00
events perf/core: fix parent pid/tid in task exit events 2020-04-22 23:10:14 +02:00
gcov kernel/gcov/fs.c: gcov_seq_next() should increase position index 2020-04-10 15:36:22 -07:00
irq genirq: Remove setup_irq() and remove_irq() 2020-04-14 10:08:50 +02:00
livepatch New tracing features: 2019-11-27 11:42:01 -08:00
locking locking/lockdep: Improve 'invalid wait context' splat 2020-04-08 12:05:07 +02:00
power PM: hibernate: Freeze kernel threads in software_resume() 2020-04-27 10:30:30 +02:00
printk printk: queue wake_up_klogd irq_work only if per-CPU areas are ready 2020-04-10 13:18:57 -07:00
rcu Merge branch 'urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent 2020-04-14 08:36:41 +02:00
sched sched/core: Offload wakee task activation if it the wakee is descheduling 2020-05-25 07:04:10 +02:00
time proc, time/namespace: Show clock symbolic names in /proc/pid/timens_offsets 2020-04-16 12:10:54 +02:00
trace Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-15 13:10:06 -07:00
.gitignore .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
acct.c acct: stop using get_seconds() 2019-12-18 18:07:31 +01:00
async.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
audit_fsnotify.c fsnotify: use helpers to access data by data_type 2020-03-23 18:19:06 +01:00
audit_tree.c
audit_watch.c \n 2020-04-06 08:58:42 -07:00
audit.c audit: check the length of userspace generated audit records 2020-04-20 17:10:58 -04:00
audit.h audit: trigger accompanying records when no rules present 2020-03-12 10:42:51 -04:00
auditfilter.c audit: fix error handling in audit_data_to_entry() 2020-02-22 20:36:47 -05:00
auditsc.c audit: trigger accompanying records when no rules present 2020-03-12 10:42:51 -04:00
backtracetest.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
bounds.c
capability.c
compat.c y2038: remove unused time32 interfaces 2020-02-21 11:22:15 -08:00
configs.c proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
context_tracking.c context-tracking: Introduce CONFIG_HAVE_TIF_NOHZ 2020-02-14 16:05:04 +01:00
cpu_pm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 282 2019-06-05 17:36:37 +02:00
cpu.c sched/core: Fix illegal RCU from offline CPUs 2020-04-30 20:14:41 +02:00
crash_core.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 230 2019-06-19 17:09:06 +02:00
crash_dump.c
cred.c kernel: doc: remove outdated comment cred.c 2020-03-25 10:04:01 -05:00
delayacct.c
dma.c
elfcore.c kernel/elfcore.c: include proper prototypes 2019-09-25 17:51:39 -07:00
exec_domain.c
exit.c exit: Move preemption fixup up, move blocking operations down 2020-04-30 20:14:38 +02:00
extable.c kernel/extable.c: use address-of operator on section symbols 2020-04-07 10:43:42 -07:00
fail_function.c fail_function: no need to check return value of debugfs_create functions 2019-06-03 15:49:06 +02:00
fork.c fork: prevent accidental access to clone3 features 2020-05-08 17:31:50 +02:00
freezer.c Revert "libata, freezer: avoid block device removal while system is frozen" 2019-10-06 09:11:37 -06:00
futex.c Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-03-30 16:17:15 -07:00
gen_kheaders.sh kheaders: explain why include/config/autoconf.h is excluded from md5sum 2019-11-11 20:10:01 +09:00
groups.c
hung_task.c
iomem.c mm/nvdimm: add is_ioremap_addr and use that to check ioremap address 2019-07-12 11:05:40 -07:00
irq_work.c lockdep: Annotate irq_work 2020-03-21 16:00:24 +01:00
jump_label.c jump_label: Don't warn on __exit jump entries 2019-08-29 15:10:10 +01:00
kallsyms.c kallsyms: unexport kallsyms_lookup_name() and kallsyms_on_each_symbol() 2020-04-07 10:43:44 -07:00
kcmp.c kernel/kcmp.c: Use new infrastructure to fix deadlocks in execve 2020-03-25 10:04:01 -05:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks sched/rt, locking: Use CONFIG_PREEMPTION 2019-12-08 14:37:36 +01:00
Kconfig.preempt sched/Kconfig: Fix spelling mistake in user-visible help text 2019-11-12 11:35:32 +01:00
kcov.c kernel/kcov.c: fix typos in kcov_remote_start documentation 2020-05-07 19:27:20 -07:00
kexec_core.c kexec: add machine_kexec_post_load() 2020-01-08 16:32:55 +00:00
kexec_elf.c kexec_elf: support 32 bit ELF files 2019-09-06 23:58:44 +02:00
kexec_file.c kexec: add machine_kexec_post_load() 2020-01-08 16:32:55 +00:00
kexec_internal.h kexec: add machine_kexec_post_load() 2020-01-08 16:32:55 +00:00
kexec.c kexec: add machine_kexec_post_load() 2020-01-08 16:32:55 +00:00
kheaders.c
kmod.c kmod: make request_module() return an error when autoloading is disabled 2020-04-10 15:36:22 -07:00
kprobes.c kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic 2020-01-09 12:40:13 +01:00
ksysfs.c
kthread.c kthread: Do not preempt current task if it is going to call schedule() 2020-03-20 13:06:20 +01:00
latencytop.c proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
Makefile kcov: ignore fault-inject and stacktrace 2020-01-31 10:30:41 -08:00
module_signature.c MODSIGN: Export module signature definitions 2019-08-05 18:39:56 -04:00
module_signing.c MODSIGN: Export module signature definitions 2019-08-05 18:39:56 -04:00
module-internal.h
module.c Modules updates for v5.7 2020-04-09 12:52:34 -07:00
notifier.c x86/mm: split vmalloc_sync_all() 2020-03-21 18:56:06 -07:00
nsproxy.c ns: Introduce Time Namespace 2020-01-14 12:20:48 +01:00
padata.c crypto: pcrypt - simplify error handling in pcrypt_create_aead() 2020-03-06 12:28:24 +11:00
panic.c locking/refcount: Remove unused 'refcount_error_report()' function 2019-11-25 09:15:42 +01:00
params.c lockdown: Lock down module params that specify hardware parameters (eg. ioport) 2019-08-19 21:54:16 -07:00
pid_namespace.c pid: Improve the comment about waiting in zap_pid_ns_processes 2020-02-28 16:29:12 -06:00
pid.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-04-10 12:59:56 -07:00
profile.c proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
ptrace.c ptrace: reintroduce usage of subjective credentials in ptrace_has_cap() 2020-01-18 13:51:39 +01:00
range.c
reboot.c
relay.c
resource.c mm/memory_hotplug.c: use PFN_UP / PFN_DOWN in walk_system_ram_range() 2019-09-24 15:54:09 -07:00
rseq.c rseq: Reject unknown flags on rseq unregister 2019-12-25 10:41:20 +01:00
seccomp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-03-31 17:29:33 -07:00
signal.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-04-23 13:30:18 -07:00
smp.c cpu/hotplug: Move bringup of secondary CPUs out of smp_init() 2020-03-25 12:59:37 +01:00
smpboot.c
smpboot.h
softirq.c lockdep: Rename trace_{hard,soft}{irq_context,irqs_enabled}() 2020-03-21 16:03:54 +01:00
stackleak.c
stacktrace.c stacktrace: Get rid of unneeded '!!' pattern 2019-11-11 10:30:59 +01:00
stop_machine.c stop_machine: Make stop_cpus() static 2020-01-17 10:19:21 +01:00
sys_ni.c y2038: allow disabling time32 system calls 2019-11-15 14:38:30 +01:00
sys.c sys/sysinfo: Respect boottime inside time namespace 2020-03-03 19:34:32 +01:00
sysctl_binary.c sysctl: Remove the sysctl system call 2019-11-26 13:03:56 -06:00
sysctl-test.c kunit: allow kunit tests to be loaded as a module 2020-01-09 16:42:29 -07:00
sysctl.c mm/compaction: Disable compact_unevictable_allowed on RT 2020-04-02 09:35:31 -07:00
task_work.c task_work_run: don't take ->pi_lock unconditionally 2020-03-02 14:06:33 -07:00
taskstats.c taskstats: fix data-race 2019-12-04 15:18:39 +01:00
test_kprobes.c
torture.c CPU (hotplug) updates: 2020-03-30 18:06:39 -07:00
tracepoint.c The main changes in this release include: 2019-07-18 11:51:00 -07:00
tsacct.c tsacct: add 64-bit btime field 2019-12-18 18:07:31 +01:00
ucount.c ucount: Make sure ucounts in /proc/sys/user don't regress again 2020-04-07 21:51:27 +02:00
uid16.c
uid16.h
umh.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-15 13:10:06 -07:00
up.c smp/up: Make smp_call_function_single() match SMP semantics 2020-02-07 15:34:12 +01:00
user_namespace.c Keyrings namespacing 2019-07-08 19:36:47 -07:00
user-return-notifier.c
user.c Keyrings namespacing 2019-07-08 19:36:47 -07:00
utsname_sysctl.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
utsname.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
watchdog_hld.c
watchdog.c watchdog/softlockup: Enforce that timestamp is valid on boot 2020-01-17 11:19:22 +01:00
workqueue_internal.h
workqueue.c workqueue: Remove the warning in wq_worker_sleeping() 2020-04-08 11:35:20 +02:00