forked from Minki/linux
78d904b46a
Impact: prevent deadlock in NMI The ring buffers are not yet totally lockless with writing to the buffer. When a writer crosses a page, it grabs a per cpu spinlock to protect against a reader. The spinlocks taken by a writer are not to protect against other writers, since a writer can only write to its own per cpu buffer. The spinlocks protect against readers that can touch any cpu buffer. The writers are made to be reentrant with the spinlocks disabling interrupts. The problem arises when an NMI writes to the buffer, and that write crosses a page boundary. If it grabs a spinlock, it can be racing with another writer (since disabling interrupts does not protect against NMIs) or with a reader on the same CPU. Luckily, most of the users are not reentrant and protects against this issue. But if a user of the ring buffer becomes reentrant (which is what the ring buffers do allow), if the NMI also writes to the ring buffer then we risk the chance of a deadlock. This patch moves the ftrace_nmi_enter called by nmi_enter() to the ring buffer code. It replaces the current ftrace_nmi_enter that is used by arch specific code to arch_ftrace_nmi_enter and updates the Kconfig to handle it. When an NMI is called, it will set a per cpu variable in the ring buffer code and will clear it when the NMI exits. If a write to the ring buffer crosses page boundaries inside an NMI, a trylock is used on the spin lock instead. If the spinlock fails to be acquired, then the entry is discarded. This bug appeared in the ftrace work in the RT tree, where event tracing is reentrant. This workaround solved the deadlocks that appeared there. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
368 lines
11 KiB
Plaintext
368 lines
11 KiB
Plaintext
#
|
|
# Architectures that offer an FUNCTION_TRACER implementation should
|
|
# select HAVE_FUNCTION_TRACER:
|
|
#
|
|
|
|
config USER_STACKTRACE_SUPPORT
|
|
bool
|
|
|
|
config NOP_TRACER
|
|
bool
|
|
|
|
config HAVE_FTRACE_NMI_ENTER
|
|
bool
|
|
|
|
config HAVE_FUNCTION_TRACER
|
|
bool
|
|
|
|
config HAVE_FUNCTION_GRAPH_TRACER
|
|
bool
|
|
|
|
config HAVE_FUNCTION_TRACE_MCOUNT_TEST
|
|
bool
|
|
help
|
|
This gets selected when the arch tests the function_trace_stop
|
|
variable at the mcount call site. Otherwise, this variable
|
|
is tested by the called function.
|
|
|
|
config HAVE_DYNAMIC_FTRACE
|
|
bool
|
|
|
|
config HAVE_FTRACE_MCOUNT_RECORD
|
|
bool
|
|
|
|
config HAVE_HW_BRANCH_TRACER
|
|
bool
|
|
|
|
config TRACER_MAX_TRACE
|
|
bool
|
|
|
|
config RING_BUFFER
|
|
bool
|
|
|
|
config FTRACE_NMI_ENTER
|
|
bool
|
|
depends on HAVE_FTRACE_NMI_ENTER
|
|
default y
|
|
|
|
config TRACING
|
|
bool
|
|
select DEBUG_FS
|
|
select RING_BUFFER
|
|
select STACKTRACE if STACKTRACE_SUPPORT
|
|
select TRACEPOINTS
|
|
select NOP_TRACER
|
|
|
|
menu "Tracers"
|
|
|
|
config FUNCTION_TRACER
|
|
bool "Kernel Function Tracer"
|
|
depends on HAVE_FUNCTION_TRACER
|
|
depends on DEBUG_KERNEL
|
|
select FRAME_POINTER
|
|
select TRACING
|
|
select CONTEXT_SWITCH_TRACER
|
|
help
|
|
Enable the kernel to trace every kernel function. This is done
|
|
by using a compiler feature to insert a small, 5-byte No-Operation
|
|
instruction to the beginning of every kernel function, which NOP
|
|
sequence is then dynamically patched into a tracer call when
|
|
tracing is enabled by the administrator. If it's runtime disabled
|
|
(the bootup default), then the overhead of the instructions is very
|
|
small and not measurable even in micro-benchmarks.
|
|
|
|
config FUNCTION_GRAPH_TRACER
|
|
bool "Kernel Function Graph Tracer"
|
|
depends on HAVE_FUNCTION_GRAPH_TRACER
|
|
depends on FUNCTION_TRACER
|
|
default y
|
|
help
|
|
Enable the kernel to trace a function at both its return
|
|
and its entry.
|
|
It's first purpose is to trace the duration of functions and
|
|
draw a call graph for each thread with some informations like
|
|
the return value.
|
|
This is done by setting the current return address on the current
|
|
task structure into a stack of calls.
|
|
|
|
config IRQSOFF_TRACER
|
|
bool "Interrupts-off Latency Tracer"
|
|
default n
|
|
depends on TRACE_IRQFLAGS_SUPPORT
|
|
depends on GENERIC_TIME
|
|
depends on DEBUG_KERNEL
|
|
select TRACE_IRQFLAGS
|
|
select TRACING
|
|
select TRACER_MAX_TRACE
|
|
help
|
|
This option measures the time spent in irqs-off critical
|
|
sections, with microsecond accuracy.
|
|
|
|
The default measurement method is a maximum search, which is
|
|
disabled by default and can be runtime (re-)started
|
|
via:
|
|
|
|
echo 0 > /debugfs/tracing/tracing_max_latency
|
|
|
|
(Note that kernel size and overhead increases with this option
|
|
enabled. This option and the preempt-off timing option can be
|
|
used together or separately.)
|
|
|
|
config PREEMPT_TRACER
|
|
bool "Preemption-off Latency Tracer"
|
|
default n
|
|
depends on GENERIC_TIME
|
|
depends on PREEMPT
|
|
depends on DEBUG_KERNEL
|
|
select TRACING
|
|
select TRACER_MAX_TRACE
|
|
help
|
|
This option measures the time spent in preemption off critical
|
|
sections, with microsecond accuracy.
|
|
|
|
The default measurement method is a maximum search, which is
|
|
disabled by default and can be runtime (re-)started
|
|
via:
|
|
|
|
echo 0 > /debugfs/tracing/tracing_max_latency
|
|
|
|
(Note that kernel size and overhead increases with this option
|
|
enabled. This option and the irqs-off timing option can be
|
|
used together or separately.)
|
|
|
|
config SYSPROF_TRACER
|
|
bool "Sysprof Tracer"
|
|
depends on X86
|
|
select TRACING
|
|
help
|
|
This tracer provides the trace needed by the 'Sysprof' userspace
|
|
tool.
|
|
|
|
config SCHED_TRACER
|
|
bool "Scheduling Latency Tracer"
|
|
depends on DEBUG_KERNEL
|
|
select TRACING
|
|
select CONTEXT_SWITCH_TRACER
|
|
select TRACER_MAX_TRACE
|
|
help
|
|
This tracer tracks the latency of the highest priority task
|
|
to be scheduled in, starting from the point it has woken up.
|
|
|
|
config CONTEXT_SWITCH_TRACER
|
|
bool "Trace process context switches"
|
|
depends on DEBUG_KERNEL
|
|
select TRACING
|
|
select MARKERS
|
|
help
|
|
This tracer gets called from the context switch and records
|
|
all switching of tasks.
|
|
|
|
config BOOT_TRACER
|
|
bool "Trace boot initcalls"
|
|
depends on DEBUG_KERNEL
|
|
select TRACING
|
|
select CONTEXT_SWITCH_TRACER
|
|
help
|
|
This tracer helps developers to optimize boot times: it records
|
|
the timings of the initcalls and traces key events and the identity
|
|
of tasks that can cause boot delays, such as context-switches.
|
|
|
|
Its aim is to be parsed by the /scripts/bootgraph.pl tool to
|
|
produce pretty graphics about boot inefficiencies, giving a visual
|
|
representation of the delays during initcalls - but the raw
|
|
/debug/tracing/trace text output is readable too.
|
|
|
|
You must pass in ftrace=initcall to the kernel command line
|
|
to enable this on bootup.
|
|
|
|
config TRACE_BRANCH_PROFILING
|
|
bool "Trace likely/unlikely profiler"
|
|
depends on DEBUG_KERNEL
|
|
select TRACING
|
|
help
|
|
This tracer profiles all the the likely and unlikely macros
|
|
in the kernel. It will display the results in:
|
|
|
|
/debugfs/tracing/profile_annotated_branch
|
|
|
|
Note: this will add a significant overhead, only turn this
|
|
on if you need to profile the system's use of these macros.
|
|
|
|
Say N if unsure.
|
|
|
|
config PROFILE_ALL_BRANCHES
|
|
bool "Profile all if conditionals"
|
|
depends on TRACE_BRANCH_PROFILING
|
|
help
|
|
This tracer profiles all branch conditions. Every if ()
|
|
taken in the kernel is recorded whether it hit or miss.
|
|
The results will be displayed in:
|
|
|
|
/debugfs/tracing/profile_branch
|
|
|
|
This configuration, when enabled, will impose a great overhead
|
|
on the system. This should only be enabled when the system
|
|
is to be analyzed
|
|
|
|
Say N if unsure.
|
|
|
|
config TRACING_BRANCHES
|
|
bool
|
|
help
|
|
Selected by tracers that will trace the likely and unlikely
|
|
conditions. This prevents the tracers themselves from being
|
|
profiled. Profiling the tracing infrastructure can only happen
|
|
when the likelys and unlikelys are not being traced.
|
|
|
|
config BRANCH_TRACER
|
|
bool "Trace likely/unlikely instances"
|
|
depends on TRACE_BRANCH_PROFILING
|
|
select TRACING_BRANCHES
|
|
help
|
|
This traces the events of likely and unlikely condition
|
|
calls in the kernel. The difference between this and the
|
|
"Trace likely/unlikely profiler" is that this is not a
|
|
histogram of the callers, but actually places the calling
|
|
events into a running trace buffer to see when and where the
|
|
events happened, as well as their results.
|
|
|
|
Say N if unsure.
|
|
|
|
config POWER_TRACER
|
|
bool "Trace power consumption behavior"
|
|
depends on DEBUG_KERNEL
|
|
depends on X86
|
|
select TRACING
|
|
help
|
|
This tracer helps developers to analyze and optimize the kernels
|
|
power management decisions, specifically the C-state and P-state
|
|
behavior.
|
|
|
|
|
|
config STACK_TRACER
|
|
bool "Trace max stack"
|
|
depends on HAVE_FUNCTION_TRACER
|
|
depends on DEBUG_KERNEL
|
|
select FUNCTION_TRACER
|
|
select STACKTRACE
|
|
help
|
|
This special tracer records the maximum stack footprint of the
|
|
kernel and displays it in debugfs/tracing/stack_trace.
|
|
|
|
This tracer works by hooking into every function call that the
|
|
kernel executes, and keeping a maximum stack depth value and
|
|
stack-trace saved. If this is configured with DYNAMIC_FTRACE
|
|
then it will not have any overhead while the stack tracer
|
|
is disabled.
|
|
|
|
To enable the stack tracer on bootup, pass in 'stacktrace'
|
|
on the kernel command line.
|
|
|
|
The stack tracer can also be enabled or disabled via the
|
|
sysctl kernel.stack_tracer_enabled
|
|
|
|
Say N if unsure.
|
|
|
|
config HW_BRANCH_TRACER
|
|
depends on HAVE_HW_BRANCH_TRACER
|
|
bool "Trace hw branches"
|
|
select TRACING
|
|
help
|
|
This tracer records all branches on the system in a circular
|
|
buffer giving access to the last N branches for each cpu.
|
|
|
|
config KMEMTRACE
|
|
bool "Trace SLAB allocations"
|
|
select TRACING
|
|
help
|
|
kmemtrace provides tracing for slab allocator functions, such as
|
|
kmalloc, kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected
|
|
data is then fed to the userspace application in order to analyse
|
|
allocation hotspots, internal fragmentation and so on, making it
|
|
possible to see how well an allocator performs, as well as debug
|
|
and profile kernel code.
|
|
|
|
This requires an userspace application to use. See
|
|
Documentation/vm/kmemtrace.txt for more information.
|
|
|
|
Saying Y will make the kernel somewhat larger and slower. However,
|
|
if you disable kmemtrace at run-time or boot-time, the performance
|
|
impact is minimal (depending on the arch the kernel is built for).
|
|
|
|
If unsure, say N.
|
|
|
|
config WORKQUEUE_TRACER
|
|
bool "Trace workqueues"
|
|
select TRACING
|
|
help
|
|
The workqueue tracer provides some statistical informations
|
|
about each cpu workqueue thread such as the number of the
|
|
works inserted and executed since their creation. It can help
|
|
to evaluate the amount of work each of them have to perform.
|
|
For example it can help a developer to decide whether he should
|
|
choose a per cpu workqueue instead of a singlethreaded one.
|
|
|
|
|
|
config DYNAMIC_FTRACE
|
|
bool "enable/disable ftrace tracepoints dynamically"
|
|
depends on FUNCTION_TRACER
|
|
depends on HAVE_DYNAMIC_FTRACE
|
|
depends on DEBUG_KERNEL
|
|
default y
|
|
help
|
|
This option will modify all the calls to ftrace dynamically
|
|
(will patch them out of the binary image and replaces them
|
|
with a No-Op instruction) as they are called. A table is
|
|
created to dynamically enable them again.
|
|
|
|
This way a CONFIG_FUNCTION_TRACER kernel is slightly larger, but otherwise
|
|
has native performance as long as no tracing is active.
|
|
|
|
The changes to the code are done by a kernel thread that
|
|
wakes up once a second and checks to see if any ftrace calls
|
|
were made. If so, it runs stop_machine (stops all CPUS)
|
|
and modifies the code to jump over the call to ftrace.
|
|
|
|
config FTRACE_MCOUNT_RECORD
|
|
def_bool y
|
|
depends on DYNAMIC_FTRACE
|
|
depends on HAVE_FTRACE_MCOUNT_RECORD
|
|
|
|
config FTRACE_SELFTEST
|
|
bool
|
|
|
|
config FTRACE_STARTUP_TEST
|
|
bool "Perform a startup test on ftrace"
|
|
depends on TRACING && DEBUG_KERNEL
|
|
select FTRACE_SELFTEST
|
|
help
|
|
This option performs a series of startup tests on ftrace. On bootup
|
|
a series of tests are made to verify that the tracer is
|
|
functioning properly. It will do tests on all the configured
|
|
tracers of ftrace.
|
|
|
|
config MMIOTRACE
|
|
bool "Memory mapped IO tracing"
|
|
depends on HAVE_MMIOTRACE_SUPPORT && DEBUG_KERNEL && PCI
|
|
select TRACING
|
|
help
|
|
Mmiotrace traces Memory Mapped I/O access and is meant for
|
|
debugging and reverse engineering. It is called from the ioremap
|
|
implementation and works via page faults. Tracing is disabled by
|
|
default and can be enabled at run-time.
|
|
|
|
See Documentation/tracers/mmiotrace.txt.
|
|
If you are not helping to develop drivers, say N.
|
|
|
|
config MMIOTRACE_TEST
|
|
tristate "Test module for mmiotrace"
|
|
depends on MMIOTRACE && m
|
|
help
|
|
This is a dumb module for testing mmiotrace. It is very dangerous
|
|
as it will write garbage to IO memory starting at a given address.
|
|
However, it should be safe to use on e.g. unused portion of VRAM.
|
|
|
|
Say N, unless you absolutely know what you are doing.
|
|
|
|
endmenu
|