linux/kernel/trace
Steven Rostedt (VMware) af0009fc16 tracing: Move trace_handle_return() out of line
Currently trace_handle_return() looks like this:

 static inline enum print_line_t trace_handle_return(struct trace_seq *s)
 {
        return trace_seq_has_overflowed(s) ?
                TRACE_TYPE_PARTIAL_LINE : TRACE_TYPE_HANDLED;
 }

Where trace_seq_overflowed(s) is:

 static inline bool trace_seq_has_overflowed(struct trace_seq *s)
 {
	return s->full || seq_buf_has_overflowed(&s->seq);
 }

And seq_buf_has_overflowed(&s->seq) is:

 static inline bool
 seq_buf_has_overflowed(struct seq_buf *s)
 {
	return s->len > s->size;
 }

Making trace_handle_return() into:

 return (s->full || (s->seq->len > s->seq->size)) ?
           TRACE_TYPE_PARTIAL_LINE :
           TRACE_TYPE_HANDLED;

One would think this is not an issue to keep as an inline. But because this
is used in the TRACE_EVENT() macro, it is extended for every tracepoint in
the system. Taking a look at a single tracepoint x86_irq_vector (was the
first one I randomly chosen). As trace_handle_return is used in the
TRACE_EVENT() macro of trace_raw_output_##call() we disassemble
trace_raw_output_x86_irq_vector and do a diff:

- is the original
+ is the out-of-line code

I removed identical lines that were different just due to different
addresses.

--- /tmp/irq-vec-orig	2017-03-16 09:12:48.569384851 -0400
+++ /tmp/irq-vec-ool	2017-03-16 09:13:39.378153385 -0400
@@ -6,27 +6,23 @@
        53                      push   %rbx
        48 89 fb                mov    %rdi,%rbx
        4c 8b a7 c0 20 00 00    mov    0x20c0(%rdi),%r12
        e8 f7 72 13 00          callq  ffffffff81155c80 <trace_raw_output_prep>
        83 f8 01                cmp    $0x1,%eax
        74 05                   je     ffffffff8101e993 <trace_raw_output_x86_irq_vector+0x23>
        5b                      pop    %rbx
        41 5c                   pop    %r12
        5d                      pop    %rbp
        c3                      retq
        41 8b 54 24 08          mov    0x8(%r12),%edx
-       48 8d bb 98 10 00 00    lea    0x1098(%rbx),%rdi
+       48 81 c3 98 10 00 00    add    $0x1098,%rbx
-       48 c7 c6 7b 8a a0 81    mov    $0xffffffff81a08a7b,%rsi
+       48 c7 c6 ab 8a a0 81    mov    $0xffffffff81a08aab,%rsi
-       e8 c5 85 13 00          callq  ffffffff81156f70 <trace_seq_printf>

 === here's the start of the main difference ===

+       48 89 df                mov    %rbx,%rdi
+       e8 62 7e 13 00          callq  ffffffff81156810 <trace_seq_printf>
-       8b 93 b8 20 00 00       mov    0x20b8(%rbx),%edx
-       31 c0                   xor    %eax,%eax
-       85 d2                   test   %edx,%edx
-       75 11                   jne    ffffffff8101e9c8 <trace_raw_output_x86_irq_vector+0x58>
-       48 8b 83 a8 20 00 00    mov    0x20a8(%rbx),%rax
-       48 39 83 a0 20 00 00    cmp    %rax,0x20a0(%rbx)
-       0f 93 c0                setae  %al
+       48 89 df                mov    %rbx,%rdi
+       e8 4a c5 12 00          callq  ffffffff8114af00 <trace_handle_return>
        5b                      pop    %rbx
-       0f b6 c0                movzbl %al,%eax

 === end ===

        41 5c                   pop    %r12
        5d                      pop    %rbp
        c3                      retq

If you notice, the original has 22 bytes of text more than the out of line
version. As this is for every TRACE_EVENT() defined in the system, this can
become quite large.

   text	   data	    bss	    dec	    hex	filename
8690305	5450490	1298432	15439227	 eb957b	vmlinux-orig
8681725	5450490	1298432	15430647	 eb73f7	vmlinux-handle

This change has a total of 8580 bytes in savings.

 $ objdump -dr /tmp/vmlinux-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l
324

That's 324 tracepoints. But this does not include modules (which contain
many more tracepoints). For an allyesconfig build:

 $ objdump -dr vmlinux-allyes-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l
1401

That's 1401 tracepoints giving us:

   text    data     bss     dec     hex filename
137920629       140221067       53264384        331406080       13c0db00 vmlinux-allyes-orig
137827709       140221067       53264384        331313160       13bf7008 vmlinux-allyes-handle

92920 bytes in savings!!!

Link: http://lkml.kernel.org/r/20170315021431.13107-2-andi@firstfloor.org

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-03-24 20:51:50 -04:00
..
blktrace.c blktrace: use existing disk debugfs directory 2017-02-02 10:20:16 -07:00
bpf_trace.c bpf: mark all registered map/prog types as __ro_after_init 2017-02-17 13:40:04 -05:00
ftrace.c ftrace: Allow for function tracing to record init functions on boot up 2017-03-24 20:51:49 -04:00
Kconfig perf/core: Rename CONFIG_[UK]PROBE_EVENT to CONFIG_[UK]PROBE_EVENTS 2017-03-01 10:26:39 +01:00
Makefile perf/core: Rename CONFIG_[UK]PROBE_EVENT to CONFIG_[UK]PROBE_EVENTS 2017-03-01 10:26:39 +01:00
power-traces.c cpufreq: schedutil: New governor based on scheduler utilization data 2016-04-02 01:09:12 +02:00
ring_buffer_benchmark.c sched/headers: Prepare for new header dependencies before moving code to <uapi/linux/sched/types.h> 2017-03-02 08:42:27 +01:00
ring_buffer.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> 2017-03-02 08:42:27 +01:00
rpm-traces.c
trace_benchmark.c tracing: Fix return value check in trace_benchmark_reg() 2017-02-15 09:02:27 -05:00
trace_benchmark.h tracing: Have the reg function allow to fail 2016-12-09 09:13:30 -05:00
trace_branch.c tracing: Add the constant count for branch tracer 2017-01-19 08:57:41 -05:00
trace_clock.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> 2017-03-02 08:42:27 +01:00
trace_entries.h tracing: Add the constant count for branch tracer 2017-01-19 08:57:41 -05:00
trace_event_perf.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-05-17 16:26:30 -07:00
trace_events_filter_test.h
trace_events_filter.c tracing/filter: Define op as the enum that it is 2016-11-14 16:42:59 -05:00
trace_events_hist.c sched/headers: Prepare to use <linux/rcuupdate.h> instead of <linux/rculist.h> in <linux/sched.h> 2017-03-02 08:42:38 +01:00
trace_events_trigger.c sched/headers: Prepare to use <linux/rcuupdate.h> instead of <linux/rculist.h> in <linux/sched.h> 2017-03-02 08:42:38 +01:00
trace_events.c tracing: Have system enable return error if one of the events fail 2016-12-09 09:15:41 -05:00
trace_export.c tracing: ftrace_event_is_function() can return boolean 2015-11-02 14:28:05 -05:00
trace_functions_graph.c tracing/fgraph: Have wakeup and irqsoff tracers ignore graph functions too 2016-12-09 09:21:35 -05:00
trace_functions.c ftrace: Have function tracing start in early boot up 2017-03-24 20:51:48 -04:00
trace_hwlat.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> 2017-03-02 08:42:27 +01:00
trace_irqsoff.c clocksource: Use a plain u64 instead of cycle_t 2016-12-25 11:04:12 +01:00
trace_kdb.c tracing: Move trace_flags from global to a trace_array field 2015-09-30 15:22:55 -04:00
trace_kprobe.c sched/headers: Prepare to use <linux/rcuupdate.h> instead of <linux/rculist.h> in <linux/sched.h> 2017-03-02 08:42:38 +01:00
trace_mmiotrace.c tracing: Expose CPU physical addresses (resource values) for PCI devices 2016-06-20 09:54:22 -04:00
trace_nop.c tracing: Fix typoes in code comment and printk in trace_nop.c 2016-03-08 11:23:57 -05:00
trace_output.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h> 2017-03-02 08:42:28 +01:00
trace_output.h tracing: Turn seq_print_user_ip() into a static function 2015-09-28 10:16:12 -04:00
trace_printk.c tracing: Handle NULL formats in hold_module_trace_bprintk_format() 2016-06-20 09:46:12 -04:00
trace_probe.c tracing/probes: Fix a warning message to show correct maximum length 2017-02-15 10:32:48 -05:00
trace_probe.h perf/core: Rename CONFIG_[UK]PROBE_EVENT to CONFIG_[UK]PROBE_EVENTS 2017-03-01 10:26:39 +01:00
trace_sched_switch.c sched/core: Fix trace_sched_switch() 2015-10-06 17:08:15 +02:00
trace_sched_wakeup.c clocksource: Use a plain u64 instead of cycle_t 2016-12-25 11:04:12 +01:00
trace_selftest_dynamic.c
trace_selftest.c sched/headers: Prepare for new header dependencies before moving code to <uapi/linux/sched/types.h> 2017-03-02 08:42:27 +01:00
trace_seq.c
trace_stack.c scripts/spelling.txt: add "overide" pattern and fix typo instances 2017-03-09 17:01:09 -08:00
trace_stat.c kernel/...: convert pr_warning to pr_warn 2016-03-22 15:36:02 -07:00
trace_stat.h
trace_syscalls.c tracing/syscalls: fix multiline in error message text 2016-09-29 10:25:23 +02:00
trace_uprobe.c sched/headers: Prepare to use <linux/rcuupdate.h> instead of <linux/rculist.h> in <linux/sched.h> 2017-03-02 08:42:38 +01:00
trace.c tracing: Move trace_handle_return() out of line 2017-03-24 20:51:50 -04:00
trace.h ftrace: Have function tracing start in early boot up 2017-03-24 20:51:48 -04:00
tracing_map.c tracing: Handle tracing_map_alloc_elts() error path correctly 2016-04-26 09:40:30 -04:00
tracing_map.h tracing: Update some tracing_map constants and comments 2016-04-19 12:16:06 -04:00