linux/kernel
Steven Rostedt (Google) 86087383ec tracing/hist: Call hist functions directly via a switch statement
Due to retpolines, indirect calls are much more expensive than direct
calls. The histogram code uses only a small, fixed set of functions.
Instead of calling them through function pointers, create a
hist_fn_call() function that uses a switch statement to call the
histogram functions directly. This gives a 13% speedup to the histogram
logic.
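
The idea can be illustrated with a minimal userspace sketch. It is not the
kernel's code: the enum values, the struct layout, and the three helper
functions are hypothetical stand-ins, and only the name hist_fn_call() comes
from the description above. Each field stores an enum tag instead of a
function pointer, and the dispatcher turns that tag into a direct call:

```c
#include <stdint.h>

/* Hypothetical tags, one per histogram function (illustration only). */
enum hist_fn_num {
	HIST_FN_CONST,
	HIST_FN_COUNTER,
	HIST_FN_DOUBLE,
};

/* Simplified stand-in: an enum tag takes the place of a function pointer. */
struct hist_field {
	enum hist_fn_num fn_num;
	uint64_t constant;
};

/* Returns the constant stored in the field, ignoring the input. */
static uint64_t hist_fn_const(const struct hist_field *f, uint64_t input)
{
	(void)input;
	return f->constant;
}

/* Always returns 1, as a hit counter would. */
static uint64_t hist_fn_counter(const struct hist_field *f, uint64_t input)
{
	(void)f;
	(void)input;
	return 1;
}

/* Returns twice the input value. */
static uint64_t hist_fn_double(const struct hist_field *f, uint64_t input)
{
	(void)f;
	return input * 2;
}

/*
 * Dispatch on the tag with a switch. Every case is a direct call, so the
 * compiler does not need to emit an indirect call through a pointer.
 */
uint64_t hist_fn_call(const struct hist_field *f, uint64_t input)
{
	switch (f->fn_num) {
	case HIST_FN_CONST:
		return hist_fn_const(f, input);
	case HIST_FN_COUNTER:
		return hist_fn_counter(f, input);
	case HIST_FN_DOUBLE:
		return hist_fn_double(f, input);
	}
	return 0; /* unknown tag */
}
```

A caller that previously wrote something like field->fn(field, input) would
instead write hist_fn_call(field, input). The trade-off is that every new
histogram function needs both a new enum value and a new case in the switch.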

Using the histogram benchmark:

Before:

 # event histogram
 #
 # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
 #

{ delta:        129 } hitcount:       2213
{ delta:        130 } hitcount:     285965
{ delta:        131 } hitcount:    1146545
{ delta:        132 } hitcount:    5185432
{ delta:        133 } hitcount:   19896215
{ delta:        134 } hitcount:   53118616
{ delta:        135 } hitcount:   83816709
{ delta:        136 } hitcount:   68329562
{ delta:        137 } hitcount:   41859349
{ delta:        138 } hitcount:   46257797
{ delta:        139 } hitcount:   54400831
{ delta:        140 } hitcount:   72875007
{ delta:        141 } hitcount:   76193272
{ delta:        142 } hitcount:   49504263
{ delta:        143 } hitcount:   38821072
{ delta:        144 } hitcount:   47702679
{ delta:        145 } hitcount:   41357297
{ delta:        146 } hitcount:   22058238
{ delta:        147 } hitcount:    9720002
{ delta:        148 } hitcount:    3193542
{ delta:        149 } hitcount:     927030
{ delta:        150 } hitcount:     850772
{ delta:        151 } hitcount:    1477380
{ delta:        152 } hitcount:    2687977
{ delta:        153 } hitcount:    2865985
{ delta:        154 } hitcount:    1977492
{ delta:        155 } hitcount:    2475607
{ delta:        156 } hitcount:    3403612

After:

 # event histogram
 #
 # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
 #

{ delta:        113 } hitcount:        272
{ delta:        114 } hitcount:        840
{ delta:        118 } hitcount:        344
{ delta:        119 } hitcount:      25428
{ delta:        120 } hitcount:     350590
{ delta:        121 } hitcount:    1892484
{ delta:        122 } hitcount:    6205004
{ delta:        123 } hitcount:   11583521
{ delta:        124 } hitcount:   37590979
{ delta:        125 } hitcount:  108308504
{ delta:        126 } hitcount:  131672461
{ delta:        127 } hitcount:   88700598
{ delta:        128 } hitcount:   65939870
{ delta:        129 } hitcount:   45055004
{ delta:        130 } hitcount:   33174464
{ delta:        131 } hitcount:   31813493
{ delta:        132 } hitcount:   29011676
{ delta:        133 } hitcount:   22798782
{ delta:        134 } hitcount:   22072486
{ delta:        135 } hitcount:   17034113
{ delta:        136 } hitcount:    8982490
{ delta:        137 } hitcount:    2865908
{ delta:        138 } hitcount:     980382
{ delta:        139 } hitcount:    1651944
{ delta:        140 } hitcount:    4112073
{ delta:        141 } hitcount:    3963269
{ delta:        142 } hitcount:    1712508
{ delta:        143 } hitcount:     575941
{ delta:        144 } hitcount:     351427
{ delta:        145 } hitcount:     218077
{ delta:        146 } hitcount:     167297
{ delta:        147 } hitcount:     146198
{ delta:        148 } hitcount:     116122
{ delta:        149 } hitcount:      58993
{ delta:        150 } hitcount:      40228

The delta above is in nanoseconds. The change brings the fastest time down
from 129ns to 113ns, and the peak from 141ns to 126ns.

Link: https://lkml.kernel.org/r/20220906225529.411545333@goodmis.org

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26 13:01:10 -04:00
..
bpf Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 2022-08-26 12:19:09 +01:00
cgroup cgroup: cgroup_get_from_id() must check the looked-up kn is a directory 2022-09-23 07:18:45 -10:00
configs xen: branch for v6.0-rc1b 2022-08-14 09:28:54 -07:00
debug Modules updates for v5.19-rc1 2022-05-26 17:13:43 -07:00
dma dma-mapping: mark dma_supported static 2022-09-07 10:38:28 +02:00
entry context_tracking: Take NMI eqs entrypoints over RCU 2022-07-05 13:32:59 -07:00
events Misc fixes to kprobes and the faddr2line script, plus a cleanup. 2022-08-06 17:28:12 -07:00
futex drm for 5.19-rc1 2022-05-25 16:18:27 -07:00
gcov gcov: Remove compiler version check 2021-12-02 17:25:21 +09:00
irq irqchip/genirq updates for 5.20: 2022-07-28 12:36:35 +02:00
kcsan kcsan: test: Add a .kunitconfig to run KCSAN tests 2022-07-22 09:22:59 -06:00
livepatch Livepatching changes for 5.19 2022-06-02 08:55:01 -07:00
locking RCU pull request for v5.20 (or whatever) 2022-08-02 19:12:45 -07:00
module module: kunit: Load .kunit_test_suites section when CONFIG_KUNIT=m 2022-08-15 13:51:07 -06:00
power Char / Misc driver changes for 6.0-rc1 2022-08-04 11:05:48 -07:00
printk printk: do not wait for consoles when suspended 2022-07-15 10:52:11 +02:00
rcu - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
sched Driver core fixes for 6.0-rc5 2022-09-09 15:08:40 -04:00
time time: Correct the prototype of ns_to_kernel_old_timeval and ns_to_timespec64 2022-08-09 20:02:13 +02:00
trace tracing/hist: Call hist functions directly via a switch statement 2022-09-26 13:01:10 -04:00
.gitignore
acct.c kernel/acct: move acct sysctls to its own file 2022-04-06 13:43:44 -07:00
async.c Revert "module, async: async_synchronize_full() on module init iff async is used" 2022-02-03 11:20:34 -08:00
audit_fsnotify.c audit: fix potential double free on error path from fsnotify_add_inode_mark 2022-08-22 18:50:06 -04:00
audit_tree.c audit: use fsnotify group lock helpers 2022-04-25 14:37:28 +02:00
audit_watch.c fsnotify: pass flags argument to fsnotify_alloc_group() 2022-04-25 14:37:12 +02:00
audit.c audit: make is_audit_feature_set() static 2022-06-13 14:08:57 -04:00
audit.h audit: log AUDIT_TIME_* records only from rules 2022-02-22 13:51:40 -05:00
auditfilter.c audit/stable-5.17 PR 20220110 2022-01-11 13:08:21 -08:00
auditsc.c audit: move audit_return_fixup before the filters 2022-08-25 17:25:08 -04:00
backtracetest.c
bounds.c
capability.c xfs: don't generate selinux audit messages for capability testing 2022-03-09 10:32:06 -08:00
cfi.c context_tracking: Take IRQ eqs entrypoints over RCU 2022-07-05 13:32:59 -07:00
compat.c
configs.c
context_tracking.c MAINTAINERS: Add Paul as context tracking maintainer 2022-07-05 13:33:00 -07:00
cpu_pm.c context_tracking: Take IRQ eqs entrypoints over RCU 2022-07-05 13:32:59 -07:00
cpu.c Intel Trust Domain Extensions 2022-05-23 17:51:12 -07:00
crash_core.c vmcoreinfo: add kallsyms_num_syms symbol 2022-08-28 14:02:44 -07:00
crash_dump.c
cred.c x86: Mark __invalid_creds() __noreturn 2022-03-15 10:32:44 +01:00
delayacct.c delayacct: track delays from write-protect copy 2022-06-01 15:55:25 -07:00
dma.c
exec_domain.c
exit.c exit: Fix typo in comment: s/sub-theads/sub-threads 2022-08-03 10:44:54 +02:00
extable.c context_tracking: Take NMI eqs entrypoints over RCU 2022-07-05 13:32:59 -07:00
fail_function.c
fork.c execve reverts for v6.0-rc7 2022-09-20 08:38:55 -07:00
freezer.c
gen_kheaders.sh kheaders: Have cpio unconditionally replace files 2022-05-08 03:16:59 +09:00
groups.c security: Add LSM hook to setgroups() syscall 2022-07-15 18:21:49 +00:00
hung_task.c kernel/hung_task: fix address space of proc_dohung_task_timeout_secs 2022-07-29 18:12:35 -07:00
iomem.c
irq_work.c irq_work: use kasan_record_aux_stack_noalloc() record callstack 2022-04-15 14:49:55 -07:00
jump_label.c jump_label: make initial NOP patching the special case 2022-06-24 09:48:55 +02:00
kallsyms_internal.h kallsyms: move declarations to internal header 2022-07-17 17:31:39 -07:00
kallsyms.c Updates to various subsystems which I help look after. lib, ocfs2, 2022-08-07 10:03:24 -07:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt Revert "signal, x86: Delay calling signals in atomic on RT enabled kernels" 2022-03-31 10:36:55 +02:00
kcov.c kcov: update pos before writing pc in trace function 2022-05-25 13:05:42 -07:00
kexec_core.c kexec: drop weak attribute from functions 2022-07-15 12:21:16 -04:00
kexec_elf.c
kexec_file.c Updates to various subsystems which I help look after. lib, ocfs2, 2022-08-07 10:03:24 -07:00
kexec_internal.h
kexec.c
kheaders.c
kmod.c
kprobes.c kprobes: Prohibit probes in gate area 2022-09-08 17:08:43 -04:00
ksysfs.c kernel/ksysfs.c: use helper macro __ATTR_RW 2022-03-23 19:00:33 -07:00
kthread.c kthread: make it clear that kthread_create_on_node() might be terminated by any fatal signal 2022-06-16 19:11:30 -07:00
latencytop.c latencytop: move sysctl to its own file 2022-04-21 11:40:59 -07:00
Makefile kernel: remove platform_has() infrastructure 2022-08-01 07:42:56 +02:00
module_signature.c
notifier.c notifier: Add blocking/atomic_notifier_chain_register_unique_prio() 2022-05-19 19:30:30 +02:00
nsproxy.c Revert "fs/exec: allow to unshare a time namespace on vfork+exec" 2022-09-13 10:38:43 -07:00
padata.c padata: replace cpumask_weight with cpumask_empty in padata.c 2022-01-31 11:21:46 +11:00
panic.c linux-kselftest-kunit-5.20-rc1 2022-08-02 19:34:45 -07:00
params.c kobject: remove kset from struct kset_uevent_ops callbacks 2021-12-28 11:26:18 +01:00
pid_namespace.c kernel: pid_namespace: use NULL instead of using plain integer as pointer 2022-04-29 14:38:00 -07:00
pid.c pid: add pidfd_get_task() helper 2021-10-14 13:29:18 +02:00
profile.c profile: setup_profiling_timer() is moslty not implemented 2022-07-29 18:12:36 -07:00
ptrace.c ptrace: fix clearing of JOBCTL_TRACED in ptrace_unfreeze_traced() 2022-07-09 11:06:19 -07:00
range.c
reboot.c Merge branch 'rework/kthreads' into for-linus 2022-06-23 19:11:28 +02:00
regset.c
relay.c relay: remove redundant assignment to pointer buf 2022-05-12 20:38:37 -07:00
resource_kunit.c
resource.c resource: Introduce alloc_free_mem_region() 2022-07-21 17:19:25 -07:00
rseq.c rseq: Kill process when unknown flags are encountered in ABI structures 2022-08-01 15:21:42 +02:00
scftorture.c scftorture: Fix distribution of short handler delays 2022-04-11 17:07:29 -07:00
scs.c kasan, vmalloc: only tag normal vmalloc allocations 2022-03-24 19:06:48 -07:00
seccomp.c seccomp: Add wait_killable semantic to seccomp user notifier 2022-05-03 14:11:58 -07:00
signal.c signal handling: don't use BUG_ON() for debugging 2022-07-07 09:53:43 -07:00
smp.c locking/csd_lock: Change csdlock_debug from early_param to __setup 2022-07-19 11:40:00 -07:00
smpboot.c cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again. 2022-04-12 14:13:01 +02:00
smpboot.h
softirq.c context_tracking: Take IRQ eqs entrypoints over RCU 2022-07-05 13:32:59 -07:00
stackleak.c stackleak: add on/off stack variants 2022-05-08 01:33:09 -07:00
stacktrace.c uaccess: remove CONFIG_SET_FS 2022-02-25 09:36:06 +01:00
static_call_inline.c static_call: Don't make __static_call_return0 static 2022-04-05 09:59:38 +02:00
static_call.c static_call: Don't make __static_call_return0 static 2022-04-05 09:59:38 +02:00
stop_machine.c Scheduler changes in this cycle were: 2022-05-24 11:11:13 -07:00
sys_ni.c kernel/sys_ni: add compat entry for fadvise64_64 2022-08-20 15:17:45 -07:00
sys.c arm64/sme: Implement vector length configuration prctl()s 2022-04-22 18:50:54 +01:00
sysctl-test.c
sysctl.c kernel/sysctl.c: Remove trailing white space 2022-08-08 09:01:36 -07:00
task_work.c task_work: allow TWA_SIGNAL without a rescheduling IPI 2022-04-30 08:39:32 -06:00
taskstats.c kernel: make taskstats available from all net namespaces 2022-04-29 14:38:03 -07:00
torture.c torture: Wake up kthreads after storing task_struct pointer 2022-02-01 17:24:39 -08:00
tracepoint.c tracepoint: Allow trace events in modules with TAINT_TEST 2022-09-06 22:26:00 -04:00
tsacct.c taskstats: version 12 with thread group and exe info 2022-04-29 14:38:03 -07:00
ucount.c ucounts: Handle wrapping in is_ucounts_overlimit 2022-02-17 09:11:57 -06:00
uid16.c
uid16.h
umh.c kthread: Don't allocate kthread_struct for init and umh 2022-05-06 14:49:44 -05:00
up.c
user_namespace.c ucounts: Fix systemd LimitNPROC with private users regression 2022-02-25 10:40:14 -06:00
user-return-notifier.c
user.c
usermode_driver.c blob_to_mnt(): kern_unmount() is needed to undo kern_mount() 2022-05-19 23:25:47 -04:00
utsname_sysctl.c
utsname.c
watch_queue.c This was a moderately busy cycle for documentation, but nothing all that 2022-08-02 19:24:24 -07:00
watchdog_hld.c Revert "printk: add functions to prefer direct printing" 2022-06-23 18:41:40 +02:00
watchdog.c powerpc updates for 6.0 2022-08-06 16:38:17 -07:00
workqueue_internal.h
workqueue.c workqueue: don't skip lockdep work dependency in cancel_work_sync() 2022-08-16 06:27:35 -10:00