linux/kernel/trace
Alexei Starovoitov c4f6699dfc bpf: introduce BPF_RAW_TRACEPOINT
Introduce BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access
kernel internal arguments of the tracepoints in their raw form.

>From bpf program point of view the access to the arguments look like:
struct bpf_raw_tracepoint_args {
       __u64 args[0];
};

int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
{
  // program can read args[N] where N depends on tracepoint
  // and statically verified at program load+attach time
}

kprobe+bpf infrastructure allows programs access function arguments.
This feature allows programs access raw tracepoint arguments.

Similar to proposed 'dynamic ftrace events' there are no abi guarantees
to what the tracepoints arguments are and what their meaning is.
The program needs to type cast args properly and use bpf_probe_read()
helper to access struct fields when argument is a pointer.

For every tracepoint __bpf_trace_##call function is prepared.
In assembler it looks like:
(gdb) disassemble __bpf_trace_xdp_exception
Dump of assembler code for function __bpf_trace_xdp_exception:
   0xffffffff81132080 <+0>:     mov    %ecx,%ecx
   0xffffffff81132082 <+2>:     jmpq   0xffffffff811231f0 <bpf_trace_run3>

where

TRACE_EVENT(xdp_exception,
        TP_PROTO(const struct net_device *dev,
                 const struct bpf_prog *xdp, u32 act),

The above assembler snippet is casting 32-bit 'act' field into 'u64'
to pass into bpf_trace_run3(), while 'dev' and 'xdp' args are passed as-is.
All of ~500 of __bpf_trace_*() functions are only 5-10 byte long
and in total this approach adds 7k bytes to .text.

This approach gives the lowest possible overhead
while calling trace_xdp_exception() from kernel C code and
transitioning into bpf land.
Since tracepoint+bpf are used at speeds of 1M+ events per second
this is valuable optimization.

The new BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced
that returns anon_inode FD of 'bpf-raw-tracepoint' object.

The user space looks like:
// load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
prog_fd = bpf_prog_load(...);
// receive anon_inode fd for given bpf_raw_tracepoint with prog attached
raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);

Ctrl-C of tracing daemon or cmdline tool that uses this feature
will automatically detach bpf program, unload it and
unregister tracepoint probe.

On the kernel side the __bpf_raw_tp_map section of pointers to
tracepoint definition and to __bpf_trace_*() probe function is used
to find a tracepoint with "xdp_exception" name and
corresponding __bpf_trace_xdp_exception() probe function
which are passed to tracepoint_probe_register() to connect probe
with tracepoint.

Addition of bpf_raw_tracepoint doesn't interfere with ftrace and perf
tracepoint mechanisms. perf_event_open() can be used in parallel
on the same tracepoint.
Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) are permitted.
Each with its own bpf program. The kernel will execute
all tracepoint probes and all attached bpf programs.

In the future bpf_raw_tracepoints can be extended with
query/introspection logic.

__bpf_raw_tp_map section logic was contributed by Steven Rostedt

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-28 22:55:19 +02:00
..
blktrace.c blktrace: fix trace mutex deadlock 2017-11-27 12:03:58 -07:00
bpf_trace.c bpf: introduce BPF_RAW_TRACEPOINT 2018-03-28 22:55:19 +02:00
ftrace.c ftrace: Remove incorrect setting of glob search field 2018-02-08 10:11:11 -05:00
Kconfig Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-01-17 00:10:42 -05:00
Makefile Tracing updates for 4.15: 2017-11-17 14:58:01 -08:00
power-traces.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ring_buffer_benchmark.c ring-buffer: Have ring_buffer_alloc_read_page() return error on offline CPU 2017-08-02 14:23:02 -04:00
ring_buffer.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
rpm-traces.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
trace_benchmark.c trace: Eliminate cond_resched_rcu_qs() in favor of cond_resched() 2017-12-04 10:28:58 -08:00
trace_benchmark.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
trace_branch.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
trace_clock.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> 2017-03-02 08:42:27 +01:00
trace_entries.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
trace_event_perf.c perf/ftrace: Small cleanup 2017-10-16 18:13:28 -04:00
trace_events_filter_test.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
trace_events_filter.c tracing: Fix parsing of globs with a wildcard at the beginning 2018-02-08 10:11:47 -05:00
trace_events_hist.c tracing: Reimplement log2 2017-10-04 13:10:39 -04:00
trace_events_trigger.c tracing: Update stack trace skipping for ORC unwinder 2018-01-23 15:57:00 -05:00
trace_events.c tracing: Make sure the parsed string always terminates with '\0' 2018-01-23 15:57:28 -05:00
trace_export.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
trace_functions_graph.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
trace_functions.c tracing: Update stack trace skipping for ORC unwinder 2018-01-23 15:57:00 -05:00
trace_hwlat.c trace: make trace_hwlat timestamp y2038 safe 2017-05-08 17:15:15 -07:00
trace_irqsoff.c tracing: Add support for preempt and irq enable/disable events 2017-10-10 18:58:43 -04:00
trace_kdb.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
trace_kprobe.c error-injection: Separate error-injection from kprobe 2018-01-12 17:33:38 -08:00
trace_mmiotrace.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
trace_nop.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
trace_output.c sched/debug: Rename task-state printing helpers 2017-10-10 11:43:29 +02:00
trace_output.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
trace_printk.c
trace_probe.c tracing: Make traceprobe parsing code reusable 2017-10-04 13:06:44 -04:00
trace_probe.h tracing/kprobe: bpf: Check error injectable event is on function entry 2018-01-12 17:33:37 -08:00
trace_sched_switch.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
trace_sched_wakeup.c Merge branch 'linus' into sched/core, to pick up fixes 2017-11-08 10:17:15 +01:00
trace_selftest_dynamic.c ftrace: Mark function tracer test functions noinline/noclone 2018-01-23 15:57:29 -05:00
trace_selftest.c Tracing updates for 4.15: 2017-11-17 14:58:01 -08:00
trace_seq.c
trace_stack.c tracing: Have stack trace not record if RCU is not watching 2017-12-14 20:48:22 -05:00
trace_stat.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
trace_stat.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
trace_syscalls.c Tracing updates for 4.15: 2017-11-17 14:58:01 -08:00
trace_uprobe.c trace_uprobe: Display correct offset in uprobe_events 2018-01-23 15:57:29 -05:00
trace.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
trace.h Tracing updates for 4.15: 2017-11-17 14:58:01 -08:00
tracing_map.c tracing: Remove lookups from tracing_map hitcount 2017-10-04 13:05:42 -04:00
tracing_map.h Tracing updates for 4.15: 2017-11-17 14:58:01 -08:00