mirror of
https://github.com/torvalds/linux.git
synced 2024-11-01 01:31:44 +00:00
654443e20d
Pull user-space probe instrumentation from Ingo Molnar: "The uprobes code originates from SystemTap and has been used for years in Fedora and RHEL kernels. This version is much rewritten, reviews from PeterZ, Oleg and myself shaped the end result. This tree includes uprobes support in 'perf probe' - but SystemTap (and other tools) can take advantage of user probe points as well. Sample usage of uprobes via perf, for example to profile malloc() calls without modifying user-space binaries. First boot a new kernel with CONFIG_UPROBE_EVENT=y enabled. If you don't know which function you want to probe you can pick one from 'perf top' or can get a list all functions that can be probed within libc (binaries can be specified as well): $ perf probe -F -x /lib/libc.so.6 To probe libc's malloc(): $ perf probe -x /lib64/libc.so.6 malloc Added new event: probe_libc:malloc (on 0x7eac0) You can now use it in all perf tools, such as: perf record -e probe_libc:malloc -aR sleep 1 Make use of it to create a call graph (as the flat profile is going to look very boring): $ perf record -e probe_libc:malloc -gR make [ perf record: Woken up 173 times to write data ] [ perf record: Captured and wrote 44.190 MB perf.data (~1930712 $ perf report | less 32.03% git libc-2.15.so [.] malloc | --- malloc 29.49% cc1 libc-2.15.so [.] malloc | --- malloc | |--0.95%-- 0x208eb1000000000 | |--0.63%-- htab_traverse_noresize 11.04% as libc-2.15.so [.] malloc | --- malloc | 7.15% ld libc-2.15.so [.] malloc | --- malloc | 5.07% sh libc-2.15.so [.] malloc | --- malloc | 4.99% python-config libc-2.15.so [.] malloc | --- malloc | 4.54% make libc-2.15.so [.] malloc | --- malloc | |--7.34%-- glob | | | |--93.18%-- 0x41588f | | | --6.82%-- glob | 0x41588f ... Or: $ perf report -g flat | less # Overhead Command Shared Object Symbol # ........ ............. ............. .......... # 32.03% git libc-2.15.so [.] malloc 27.19% malloc 29.49% cc1 libc-2.15.so [.] malloc 24.77% malloc 11.04% as libc-2.15.so [.] malloc 11.02% malloc 7.15% ld libc-2.15.so [.] malloc 6.57% malloc ... The core uprobes design is fairly straightforward: uprobes probe points register themselves at (inode:offset) addresses of libraries/binaries, after which all existing (or new) vmas that map that address will have a software breakpoint injected at that address. vmas are COW-ed to preserve original content. The probe points are kept in an rbtree. If user-space executes the probed inode:offset instruction address then an event is generated which can be recovered from the regular perf event channels and mmap-ed ring-buffer. Multiple probes at the same address are supported, they create a dynamic callback list of event consumers. The basic model is further complicated by the XOL speedup: the original instruction that is probed is copied (in an architecture specific fashion) and executed out of line when the probe triggers. The XOL area is a single vma per process, with a fixed number of entries (which limits probe execution parallelism). The API: uprobes are installed/removed via /sys/kernel/debug/tracing/uprobe_events, the API is integrated to align with the kprobes interface as much as possible, but is separate to it. Injecting a probe point is privileged operation, which can be relaxed by setting perf_paranoid to -1. You can use multiple probes as well and mix them with kprobes and regular PMU events or tracepoints, when instrumenting a task." Fix up trivial conflicts in mm/memory.c due to previous cleanup of unmap_single_vma(). * 'perf-uprobes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) perf probe: Detect probe target when m/x options are absent perf probe: Provide perf interface for uprobes tracing: Fix kconfig warning due to a typo tracing: Provide trace events interface for uprobes tracing: Extract out common code for kprobes/uprobes trace events tracing: Modify is_delete, is_return from int to bool uprobes/core: Decrement uprobe count before the pages are unmapped uprobes/core: Make background page replacement logic account for rss_stat counters uprobes/core: Optimize probe hits with the help of a counter uprobes/core: Allocate XOL slots for uprobes use uprobes/core: Handle breakpoint and singlestep exceptions uprobes/core: Rename bkpt to swbp uprobes/core: Make order of function parameters consistent across functions uprobes/core: Make macro names consistent uprobes: Update copyright notices uprobes/core: Move insn to arch specific structure uprobes/core: Remove uprobe_opcode_sz uprobes/core: Make instruction tables volatile uprobes: Move to kernel/events/ uprobes/core: Clean up, refactor and improve the code ...
513 lines
15 KiB
Plaintext
513 lines
15 KiB
Plaintext
#
|
|
# Architectures that offer an FUNCTION_TRACER implementation should
|
|
# select HAVE_FUNCTION_TRACER:
|
|
#
|
|
|
|
config USER_STACKTRACE_SUPPORT
|
|
bool
|
|
|
|
config NOP_TRACER
|
|
bool
|
|
|
|
config HAVE_FTRACE_NMI_ENTER
|
|
bool
|
|
help
|
|
See Documentation/trace/ftrace-design.txt
|
|
|
|
config HAVE_FUNCTION_TRACER
|
|
bool
|
|
help
|
|
See Documentation/trace/ftrace-design.txt
|
|
|
|
config HAVE_FUNCTION_GRAPH_TRACER
|
|
bool
|
|
help
|
|
See Documentation/trace/ftrace-design.txt
|
|
|
|
config HAVE_FUNCTION_GRAPH_FP_TEST
|
|
bool
|
|
help
|
|
See Documentation/trace/ftrace-design.txt
|
|
|
|
config HAVE_FUNCTION_TRACE_MCOUNT_TEST
|
|
bool
|
|
help
|
|
See Documentation/trace/ftrace-design.txt
|
|
|
|
config HAVE_DYNAMIC_FTRACE
|
|
bool
|
|
help
|
|
See Documentation/trace/ftrace-design.txt
|
|
|
|
config HAVE_FTRACE_MCOUNT_RECORD
|
|
bool
|
|
help
|
|
See Documentation/trace/ftrace-design.txt
|
|
|
|
config HAVE_SYSCALL_TRACEPOINTS
|
|
bool
|
|
help
|
|
See Documentation/trace/ftrace-design.txt
|
|
|
|
config HAVE_C_RECORDMCOUNT
|
|
bool
|
|
help
|
|
C version of recordmcount available?
|
|
|
|
config TRACER_MAX_TRACE
|
|
bool
|
|
|
|
config RING_BUFFER
|
|
bool
|
|
|
|
config FTRACE_NMI_ENTER
|
|
bool
|
|
depends on HAVE_FTRACE_NMI_ENTER
|
|
default y
|
|
|
|
config EVENT_TRACING
|
|
select CONTEXT_SWITCH_TRACER
|
|
bool
|
|
|
|
config EVENT_POWER_TRACING_DEPRECATED
|
|
depends on EVENT_TRACING
|
|
bool "Deprecated power event trace API, to be removed"
|
|
default y
|
|
help
|
|
Provides old power event types:
|
|
C-state/idle accounting events:
|
|
power:power_start
|
|
power:power_end
|
|
and old cpufreq accounting event:
|
|
power:power_frequency
|
|
This is for userspace compatibility
|
|
and will vanish after 5 kernel iterations,
|
|
namely 3.1.
|
|
|
|
config CONTEXT_SWITCH_TRACER
|
|
bool
|
|
|
|
config RING_BUFFER_ALLOW_SWAP
|
|
bool
|
|
help
|
|
Allow the use of ring_buffer_swap_cpu.
|
|
Adds a very slight overhead to tracing when enabled.
|
|
|
|
# All tracer options should select GENERIC_TRACER. For those options that are
|
|
# enabled by all tracers (context switch and event tracer) they select TRACING.
|
|
# This allows those options to appear when no other tracer is selected. But the
|
|
# options do not appear when something else selects it. We need the two options
|
|
# GENERIC_TRACER and TRACING to avoid circular dependencies to accomplish the
|
|
# hiding of the automatic options.
|
|
|
|
config TRACING
|
|
bool
|
|
select DEBUG_FS
|
|
select RING_BUFFER
|
|
select STACKTRACE if STACKTRACE_SUPPORT
|
|
select TRACEPOINTS
|
|
select NOP_TRACER
|
|
select BINARY_PRINTF
|
|
select EVENT_TRACING
|
|
|
|
config GENERIC_TRACER
|
|
bool
|
|
select TRACING
|
|
|
|
#
|
|
# Minimum requirements an architecture has to meet for us to
|
|
# be able to offer generic tracing facilities:
|
|
#
|
|
config TRACING_SUPPORT
|
|
bool
|
|
# PPC32 has no irqflags tracing support, but it can use most of the
|
|
# tracers anyway, they were tested to build and work. Note that new
|
|
# exceptions to this list aren't welcomed, better implement the
|
|
# irqflags tracing for your architecture.
|
|
depends on TRACE_IRQFLAGS_SUPPORT || PPC32
|
|
depends on STACKTRACE_SUPPORT
|
|
default y
|
|
|
|
if TRACING_SUPPORT
|
|
|
|
menuconfig FTRACE
|
|
bool "Tracers"
|
|
default y if DEBUG_KERNEL
|
|
help
|
|
Enable the kernel tracing infrastructure.
|
|
|
|
if FTRACE
|
|
|
|
config FUNCTION_TRACER
|
|
bool "Kernel Function Tracer"
|
|
depends on HAVE_FUNCTION_TRACER
|
|
select KALLSYMS
|
|
select GENERIC_TRACER
|
|
select CONTEXT_SWITCH_TRACER
|
|
help
|
|
Enable the kernel to trace every kernel function. This is done
|
|
by using a compiler feature to insert a small, 5-byte No-Operation
|
|
instruction at the beginning of every kernel function, which NOP
|
|
sequence is then dynamically patched into a tracer call when
|
|
tracing is enabled by the administrator. If it's runtime disabled
|
|
(the bootup default), then the overhead of the instructions is very
|
|
small and not measurable even in micro-benchmarks.
|
|
|
|
config FUNCTION_GRAPH_TRACER
|
|
bool "Kernel Function Graph Tracer"
|
|
depends on HAVE_FUNCTION_GRAPH_TRACER
|
|
depends on FUNCTION_TRACER
|
|
depends on !X86_32 || !CC_OPTIMIZE_FOR_SIZE
|
|
default y
|
|
help
|
|
Enable the kernel to trace a function at both its return
|
|
and its entry.
|
|
Its first purpose is to trace the duration of functions and
|
|
draw a call graph for each thread with some information like
|
|
the return value. This is done by setting the current return
|
|
address on the current task structure into a stack of calls.
|
|
|
|
|
|
config IRQSOFF_TRACER
|
|
bool "Interrupts-off Latency Tracer"
|
|
default n
|
|
depends on TRACE_IRQFLAGS_SUPPORT
|
|
depends on !ARCH_USES_GETTIMEOFFSET
|
|
select TRACE_IRQFLAGS
|
|
select GENERIC_TRACER
|
|
select TRACER_MAX_TRACE
|
|
select RING_BUFFER_ALLOW_SWAP
|
|
help
|
|
This option measures the time spent in irqs-off critical
|
|
sections, with microsecond accuracy.
|
|
|
|
The default measurement method is a maximum search, which is
|
|
disabled by default and can be runtime (re-)started
|
|
via:
|
|
|
|
echo 0 > /sys/kernel/debug/tracing/tracing_max_latency
|
|
|
|
(Note that kernel size and overhead increase with this option
|
|
enabled. This option and the preempt-off timing option can be
|
|
used together or separately.)
|
|
|
|
config PREEMPT_TRACER
|
|
bool "Preemption-off Latency Tracer"
|
|
default n
|
|
depends on !ARCH_USES_GETTIMEOFFSET
|
|
depends on PREEMPT
|
|
select GENERIC_TRACER
|
|
select TRACER_MAX_TRACE
|
|
select RING_BUFFER_ALLOW_SWAP
|
|
help
|
|
This option measures the time spent in preemption-off critical
|
|
sections, with microsecond accuracy.
|
|
|
|
The default measurement method is a maximum search, which is
|
|
disabled by default and can be runtime (re-)started
|
|
via:
|
|
|
|
echo 0 > /sys/kernel/debug/tracing/tracing_max_latency
|
|
|
|
(Note that kernel size and overhead increase with this option
|
|
enabled. This option and the irqs-off timing option can be
|
|
used together or separately.)
|
|
|
|
config SCHED_TRACER
|
|
bool "Scheduling Latency Tracer"
|
|
select GENERIC_TRACER
|
|
select CONTEXT_SWITCH_TRACER
|
|
select TRACER_MAX_TRACE
|
|
help
|
|
This tracer tracks the latency of the highest priority task
|
|
to be scheduled in, starting from the point it has woken up.
|
|
|
|
config ENABLE_DEFAULT_TRACERS
|
|
bool "Trace process context switches and events"
|
|
depends on !GENERIC_TRACER
|
|
select TRACING
|
|
help
|
|
This tracer hooks to various trace points in the kernel,
|
|
allowing the user to pick and choose which trace point they
|
|
want to trace. It also includes the sched_switch tracer plugin.
|
|
|
|
config FTRACE_SYSCALLS
|
|
bool "Trace syscalls"
|
|
depends on HAVE_SYSCALL_TRACEPOINTS
|
|
select GENERIC_TRACER
|
|
select KALLSYMS
|
|
help
|
|
Basic tracer to catch the syscall entry and exit events.
|
|
|
|
config TRACE_BRANCH_PROFILING
|
|
bool
|
|
select GENERIC_TRACER
|
|
|
|
choice
|
|
prompt "Branch Profiling"
|
|
default BRANCH_PROFILE_NONE
|
|
help
|
|
The branch profiling is a software profiler. It will add hooks
|
|
into the C conditionals to test which path a branch takes.
|
|
|
|
The likely/unlikely profiler only looks at the conditions that
|
|
are annotated with a likely or unlikely macro.
|
|
|
|
The "all branch" profiler will profile every if-statement in the
|
|
kernel. This profiler will also enable the likely/unlikely
|
|
profiler.
|
|
|
|
Either of the above profilers adds a bit of overhead to the system.
|
|
If unsure, choose "No branch profiling".
|
|
|
|
config BRANCH_PROFILE_NONE
|
|
bool "No branch profiling"
|
|
help
|
|
No branch profiling. Branch profiling adds a bit of overhead.
|
|
Only enable it if you want to analyse the branching behavior.
|
|
Otherwise keep it disabled.
|
|
|
|
config PROFILE_ANNOTATED_BRANCHES
|
|
bool "Trace likely/unlikely profiler"
|
|
select TRACE_BRANCH_PROFILING
|
|
help
|
|
This tracer profiles all likely and unlikely macros
|
|
in the kernel. It will display the results in:
|
|
|
|
/sys/kernel/debug/tracing/trace_stat/branch_annotated
|
|
|
|
Note: this will add a significant overhead; only turn this
|
|
on if you need to profile the system's use of these macros.
|
|
|
|
config PROFILE_ALL_BRANCHES
|
|
bool "Profile all if conditionals"
|
|
select TRACE_BRANCH_PROFILING
|
|
help
|
|
This tracer profiles all branch conditions. Every if ()
|
|
taken in the kernel is recorded whether it hit or miss.
|
|
The results will be displayed in:
|
|
|
|
/sys/kernel/debug/tracing/trace_stat/branch_all
|
|
|
|
This option also enables the likely/unlikely profiler.
|
|
|
|
This configuration, when enabled, will impose a great overhead
|
|
on the system. This should only be enabled when the system
|
|
is to be analyzed in much detail.
|
|
endchoice
|
|
|
|
config TRACING_BRANCHES
|
|
bool
|
|
help
|
|
Selected by tracers that will trace the likely and unlikely
|
|
conditions. This prevents the tracers themselves from being
|
|
profiled. Profiling the tracing infrastructure can only happen
|
|
when the likelys and unlikelys are not being traced.
|
|
|
|
config BRANCH_TRACER
|
|
bool "Trace likely/unlikely instances"
|
|
depends on TRACE_BRANCH_PROFILING
|
|
select TRACING_BRANCHES
|
|
help
|
|
This traces the events of likely and unlikely condition
|
|
calls in the kernel. The difference between this and the
|
|
"Trace likely/unlikely profiler" is that this is not a
|
|
histogram of the callers, but actually places the calling
|
|
events into a running trace buffer to see when and where the
|
|
events happened, as well as their results.
|
|
|
|
Say N if unsure.
|
|
|
|
config STACK_TRACER
|
|
bool "Trace max stack"
|
|
depends on HAVE_FUNCTION_TRACER
|
|
select FUNCTION_TRACER
|
|
select STACKTRACE
|
|
select KALLSYMS
|
|
help
|
|
This special tracer records the maximum stack footprint of the
|
|
kernel and displays it in /sys/kernel/debug/tracing/stack_trace.
|
|
|
|
This tracer works by hooking into every function call that the
|
|
kernel executes, and keeping a maximum stack depth value and
|
|
stack-trace saved. If this is configured with DYNAMIC_FTRACE
|
|
then it will not have any overhead while the stack tracer
|
|
is disabled.
|
|
|
|
To enable the stack tracer on bootup, pass in 'stacktrace'
|
|
on the kernel command line.
|
|
|
|
The stack tracer can also be enabled or disabled via the
|
|
sysctl kernel.stack_tracer_enabled
|
|
|
|
Say N if unsure.
|
|
|
|
config BLK_DEV_IO_TRACE
|
|
bool "Support for tracing block IO actions"
|
|
depends on SYSFS
|
|
depends on BLOCK
|
|
select RELAY
|
|
select DEBUG_FS
|
|
select TRACEPOINTS
|
|
select GENERIC_TRACER
|
|
select STACKTRACE
|
|
help
|
|
Say Y here if you want to be able to trace the block layer actions
|
|
on a given queue. Tracing allows you to see any traffic happening
|
|
on a block device queue. For more information (and the userspace
|
|
support tools needed), fetch the blktrace tools from:
|
|
|
|
git://git.kernel.dk/blktrace.git
|
|
|
|
Tracing also is possible using the ftrace interface, e.g.:
|
|
|
|
echo 1 > /sys/block/sda/sda1/trace/enable
|
|
echo blk > /sys/kernel/debug/tracing/current_tracer
|
|
cat /sys/kernel/debug/tracing/trace_pipe
|
|
|
|
If unsure, say N.
|
|
|
|
config KPROBE_EVENT
|
|
depends on KPROBES
|
|
depends on HAVE_REGS_AND_STACK_ACCESS_API
|
|
bool "Enable kprobes-based dynamic events"
|
|
select TRACING
|
|
select PROBE_EVENTS
|
|
default y
|
|
help
|
|
This allows the user to add tracing events (similar to tracepoints)
|
|
on the fly via the ftrace interface. See
|
|
Documentation/trace/kprobetrace.txt for more details.
|
|
|
|
Those events can be inserted wherever kprobes can probe, and record
|
|
various register and memory values.
|
|
|
|
This option is also required by perf-probe subcommand of perf tools.
|
|
If you want to use perf tools, this option is strongly recommended.
|
|
|
|
config UPROBE_EVENT
|
|
bool "Enable uprobes-based dynamic events"
|
|
depends on ARCH_SUPPORTS_UPROBES
|
|
depends on MMU
|
|
select UPROBES
|
|
select PROBE_EVENTS
|
|
select TRACING
|
|
default n
|
|
help
|
|
This allows the user to add tracing events on top of userspace
|
|
dynamic events (similar to tracepoints) on the fly via the trace
|
|
events interface. Those events can be inserted wherever uprobes
|
|
can probe, and record various registers.
|
|
This option is required if you plan to use perf-probe subcommand
|
|
of perf tools on user space applications.
|
|
|
|
config PROBE_EVENTS
|
|
def_bool n
|
|
|
|
config DYNAMIC_FTRACE
|
|
bool "enable/disable ftrace tracepoints dynamically"
|
|
depends on FUNCTION_TRACER
|
|
depends on HAVE_DYNAMIC_FTRACE
|
|
default y
|
|
help
|
|
This option will modify all the calls to ftrace dynamically
|
|
(will patch them out of the binary image and replace them
|
|
with a No-Op instruction) as they are called. A table is
|
|
created to dynamically enable them again.
|
|
|
|
This way a CONFIG_FUNCTION_TRACER kernel is slightly larger, but
|
|
otherwise has native performance as long as no tracing is active.
|
|
|
|
The changes to the code are done by a kernel thread that
|
|
wakes up once a second and checks to see if any ftrace calls
|
|
were made. If so, it runs stop_machine (stops all CPUS)
|
|
and modifies the code to jump over the call to ftrace.
|
|
|
|
config FUNCTION_PROFILER
|
|
bool "Kernel function profiler"
|
|
depends on FUNCTION_TRACER
|
|
default n
|
|
help
|
|
This option enables the kernel function profiler. A file is created
|
|
in debugfs called function_profile_enabled which defaults to zero.
|
|
When a 1 is echoed into this file profiling begins, and when a
|
|
zero is entered, profiling stops. A "functions" file is created in
|
|
the trace_stats directory; this file shows the list of functions that
|
|
have been hit and their counters.
|
|
|
|
If in doubt, say N.
|
|
|
|
config FTRACE_MCOUNT_RECORD
|
|
def_bool y
|
|
depends on DYNAMIC_FTRACE
|
|
depends on HAVE_FTRACE_MCOUNT_RECORD
|
|
|
|
config FTRACE_SELFTEST
|
|
bool
|
|
|
|
config FTRACE_STARTUP_TEST
|
|
bool "Perform a startup test on ftrace"
|
|
depends on GENERIC_TRACER
|
|
select FTRACE_SELFTEST
|
|
help
|
|
This option performs a series of startup tests on ftrace. On bootup
|
|
a series of tests are made to verify that the tracer is
|
|
functioning properly. It will do tests on all the configured
|
|
tracers of ftrace.
|
|
|
|
config EVENT_TRACE_TEST_SYSCALLS
|
|
bool "Run selftest on syscall events"
|
|
depends on FTRACE_STARTUP_TEST
|
|
help
|
|
This option will also enable testing every syscall event.
|
|
It only enables the event and disables it and runs various loads
|
|
with the event enabled. This adds a bit more time for kernel boot
|
|
up since it runs this on every system call defined.
|
|
|
|
TBD - enable a way to actually call the syscalls as we test their
|
|
events
|
|
|
|
config MMIOTRACE
|
|
bool "Memory mapped IO tracing"
|
|
depends on HAVE_MMIOTRACE_SUPPORT && PCI
|
|
select GENERIC_TRACER
|
|
help
|
|
Mmiotrace traces Memory Mapped I/O access and is meant for
|
|
debugging and reverse engineering. It is called from the ioremap
|
|
implementation and works via page faults. Tracing is disabled by
|
|
default and can be enabled at run-time.
|
|
|
|
See Documentation/trace/mmiotrace.txt.
|
|
If you are not helping to develop drivers, say N.
|
|
|
|
config MMIOTRACE_TEST
|
|
tristate "Test module for mmiotrace"
|
|
depends on MMIOTRACE && m
|
|
help
|
|
This is a dumb module for testing mmiotrace. It is very dangerous
|
|
as it will write garbage to IO memory starting at a given address.
|
|
However, it should be safe to use on e.g. unused portion of VRAM.
|
|
|
|
Say N, unless you absolutely know what you are doing.
|
|
|
|
config RING_BUFFER_BENCHMARK
|
|
tristate "Ring buffer benchmark stress tester"
|
|
depends on RING_BUFFER
|
|
help
|
|
This option creates a test to stress the ring buffer and benchmark it.
|
|
It creates its own ring buffer such that it will not interfere with
|
|
any other users of the ring buffer (such as ftrace). It then creates
|
|
a producer and consumer that will run for 10 seconds and sleep for
|
|
10 seconds. Each interval it will print out the number of events
|
|
it recorded and give a rough estimate of how long each iteration took.
|
|
|
|
It does not disable interrupts or raise its priority, so it may be
|
|
affected by processes that are running.
|
|
|
|
If unsure, say N.
|
|
|
|
endif # FTRACE
|
|
|
|
endif # TRACING_SUPPORT
|
|
|