forked from Minki/linux
722eddaa40
The division is a slow operation. If the divisor is a power of 2, use a shift instead. Results were obtained using Android's version of perf (simpleperf[1]) as described below: 1. hist_field_div() is modified to call 2 test functions: test_hist_field_div_[not]_optimized(); passing them the same args. Use noinline and volatile to ensure these are not optimized out by the compiler. 2. Create a hist event trigger that uses division: events/kmem/rss_stat$ echo 'hist:keys=common_pid:x=size/<divisor>' >> trigger events/kmem/rss_stat$ echo 'hist:keys=common_pid:vals=$x' >> trigger 3. Run Android's lmkd_test[2] to generate rss_stat events, and record CPU samples with Android's simpleperf: simpleperf record -a --exclude-perf --post-unwind=yes -m 16384 -g -f 2000 -o perf.data == Results == Divisor is a power of 2 (divisor == 32): test_hist_field_div_not_optimized | 8,717,091 cpu-cycles test_hist_field_div_optimized | 1,643,137 cpu-cycles If the divisor is a power of 2, the optimized version is ~5.3x faster. Divisor is not a power of 2 (divisor == 33): test_hist_field_div_not_optimized | 4,444,324 cpu-cycles test_hist_field_div_optimized | 5,497,958 cpu-cycles If the divisor is not a power of 2, as expected, the optimized version is slightly slower (~24% slower). [1] https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/README.md [2] https://cs.android.com/android/platform/superproject/+/master:system/memory/lmkd/tests/lmkd_test.cpp Link: https://lkml.kernel.org/r/20211025200852.3002369-7-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> |
||
---|---|---|
.. | ||
blktrace.c | ||
bpf_trace.c | ||
bpf_trace.h | ||
error_report-traces.c | ||
fgraph.c | ||
ftrace_internal.h | ||
ftrace.c | ||
Kconfig | ||
kprobe_event_gen_test.c | ||
Makefile | ||
pid_list.c | ||
pid_list.h | ||
power-traces.c | ||
preemptirq_delay_test.c | ||
ring_buffer_benchmark.c | ||
ring_buffer.c | ||
rpm-traces.c | ||
synth_event_gen_test.c | ||
trace_benchmark.c | ||
trace_benchmark.h | ||
trace_boot.c | ||
trace_branch.c | ||
trace_clock.c | ||
trace_dynevent.c | ||
trace_dynevent.h | ||
trace_entries.h | ||
trace_eprobe.c | ||
trace_event_perf.c | ||
trace_events_filter_test.h | ||
trace_events_filter.c | ||
trace_events_hist.c | ||
trace_events_inject.c | ||
trace_events_synth.c | ||
trace_events_trigger.c | ||
trace_events.c | ||
trace_export.c | ||
trace_functions_graph.c | ||
trace_functions.c | ||
trace_hwlat.c | ||
trace_irqsoff.c | ||
trace_kdb.c | ||
trace_kprobe_selftest.c | ||
trace_kprobe_selftest.h | ||
trace_kprobe.c | ||
trace_mmiotrace.c | ||
trace_nop.c | ||
trace_osnoise.c | ||
trace_output.c | ||
trace_output.h | ||
trace_preemptirq.c | ||
trace_printk.c | ||
trace_probe_tmpl.h | ||
trace_probe.c | ||
trace_probe.h | ||
trace_recursion_record.c | ||
trace_sched_switch.c | ||
trace_sched_wakeup.c | ||
trace_selftest_dynamic.c | ||
trace_selftest.c | ||
trace_seq.c | ||
trace_stack.c | ||
trace_stat.c | ||
trace_stat.h | ||
trace_synth.h | ||
trace_syscalls.c | ||
trace_uprobe.c | ||
trace.c | ||
trace.h | ||
tracing_map.c | ||
tracing_map.h |