The kernel watchdog is a great debugging tool for finding tasks that
consume a disproportionate amount of CPU time in contiguous chunks. One
can imagine building a similar watchdog for arbitrary driver threads
using save_stack_trace_tsk() and print_stack_trace(). However, this is
not viable for dynamically loaded driver modules on ARM platforms
because save_stack_trace_tsk() is not exported for those architectures.
Export save_stack_trace_tsk() for the ARM64 architecture to align with
x86 and support various debugging use cases such as arbitrary driver
thread watchdog timers.
Signed-off-by: Dustin Brown <dustinb@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We are going to split <linux/sched/task_stack.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/task_stack.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/debug.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When CONFIG_THREAD_INFO_IN_TASK is selected, task stacks may be freed
before a task is destroyed. To account for this, the stacks are
refcounted, and when manipulating the stack of another task, it is
necessary to get/put the stack to ensure it isn't freed and/or re-used
while we do so.
This patch reworks the arm64 stack walking code to account for this.
When CONFIG_THREAD_INFO_IN_TASK is not selected these perform no
refcounting, and this should only be a structural change that does not
affect behaviour.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The walk_stackframe functions is architecture-specific, with a varying
prototype, and common code should not use it directly. None of its
current users can be built as modules. With THREAD_INFO_IN_TASK, users
will also need to hold a stack reference before calling it.
There's no reason for it to be exported, and it's very easy to misuse,
so unexport it for now.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We define current_stack_pointer in <asm/thread_info.h>, though other
files and header relying upon it do not have this necessary include, and
are thus fragile to changes in the header soup.
Subsequent patches will affect the header soup such that directly
including <asm/thread_info.h> may result in a circular header include in
some of these cases, so we can't simply include <asm/thread_info.h>.
Instead, factor current_thread_info into its own header, and have all
existing users include this explicitly.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In some places, dump_backtrace() is called with a NULL tsk parameter,
e.g. in bug_handler() in arch/arm64, or indirectly via show_stack() in
core code. The expectation is that this is treated as if current were
passed instead of NULL. Similar is true of unwind_frame().
Commit a80a0eb70c ("arm64: make irq_stack_ptr more robust") didn't
take this into account. In dump_backtrace() it compares tsk against
current *before* we check if tsk is NULL, and in unwind_frame() we never
set tsk if it is NULL.
Due to this, we won't initialise irq_stack_ptr in either function. In
dump_backtrace() this results in calling dump_mem() for memory
immediately above the IRQ stack range, rather than for the relevant
range on the task stack. In unwind_frame we'll reject unwinding frames
on the IRQ stack.
In either case this results in incomplete or misleading backtrace
information, but is not otherwise problematic. The initial percpu areas
(including the IRQ stacks) are allocated in the linear map, and dump_mem
uses __get_user(), so we shouldn't access anything with side-effects,
and will handle holes safely.
This patch fixes the issue by having both functions handle the NULL tsk
case before doing anything else with tsk.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: a80a0eb70c ("arm64: make irq_stack_ptr more robust")
Acked-by: James Morse <james.morse@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, enabling stacktrace of a kprobe events generates warning:
echo stacktrace > /sys/kernel/debug/tracing/trace_options
echo "p xhci_irq" > /sys/kernel/debug/tracing/kprobe_events
echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
save_stack_trace_regs() not implemented yet.
------------[ cut here ]------------
WARNING: CPU: 1 PID: 0 at ../kernel/stacktrace.c:74 save_stack_trace_regs+0x3c/0x48
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-rc4-dirty #5128
Hardware name: ARM Juno development board (r1) (DT)
task: ffff800975dd1900 task.stack: ffff800975ddc000
PC is at save_stack_trace_regs+0x3c/0x48
LR is at save_stack_trace_regs+0x3c/0x48
pc : [<ffff000008126c64>] lr : [<ffff000008126c64>] pstate: 600003c5
sp : ffff80097ef52c00
Call trace:
save_stack_trace_regs+0x3c/0x48
__ftrace_trace_stack+0x168/0x208
trace_buffer_unlock_commit_regs+0x5c/0x7c
kprobe_trace_func+0x308/0x3d8
kprobe_dispatcher+0x58/0x60
kprobe_breakpoint_handler+0xbc/0x18c
brk_handler+0x50/0x90
do_debug_exception+0x50/0xbc
This patch implements save_stack_trace_regs(), so that stacktrace of a
kprobe events can be obtained.
After this patch, there is no warning and we can see the stacktrace for
kprobe events in trace buffer.
more /sys/kernel/debug/tracing/trace
<idle>-0 [004] d.h. 1356.000496: p_xhci_irq_0:(xhci_irq+0x0/0x9ac)
<idle>-0 [004] d.h. 1356.000497: <stack trace>
=> xhci_irq
=> __handle_irq_event_percpu
=> handle_irq_event_percpu
=> handle_irq_event
=> handle_fasteoi_irq
=> generic_handle_irq
=> __handle_domain_irq
=> gic_handle_irq
=> el1_irq
=> arch_cpu_idle
=> default_idle_call
=> cpu_startup_entry
=> secondary_start_kernel
=>
Tested-by: David A. Long <dave.long@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Switching between stacks is only valid if we are tracing ourselves while on the
irq_stack, so it is only valid when in current and non-preemptible context,
otherwise is is just zeroed off.
Fixes: 132cd887b5 ("arm64: Modify stack trace and dump for use with irq_stack")
Acked-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Function graph tracer modifies a return address (LR) in a stack frame
to hook a function return. This will result in many useless entries
(return_to_handler) showing up in
a) a stack tracer's output
b) perf call graph (with perf record -g)
c) dump_backtrace (at panic et al.)
For example, in case of a),
$ echo function_graph > /sys/kernel/debug/tracing/current_tracer
$ echo 1 > /proc/sys/kernel/stack_trace_enabled
$ cat /sys/kernel/debug/tracing/stack_trace
Depth Size Location (54 entries)
----- ---- --------
0) 4504 16 gic_raise_softirq+0x28/0x150
1) 4488 80 smp_cross_call+0x38/0xb8
2) 4408 48 return_to_handler+0x0/0x40
3) 4360 32 return_to_handler+0x0/0x40
...
In case of b),
$ echo function_graph > /sys/kernel/debug/tracing/current_tracer
$ perf record -e mem:XXX:x -ag -- sleep 10
$ perf report
...
| | |--0.22%-- 0x550f8
| | | 0x10888
| | | el0_svc_naked
| | | sys_openat
| | | return_to_handler
| | | return_to_handler
...
In case of c),
$ echo function_graph > /sys/kernel/debug/tracing/current_tracer
$ echo c > /proc/sysrq-trigger
...
Call trace:
[<ffffffc00044d3ac>] sysrq_handle_crash+0x24/0x30
[<ffffffc000092250>] return_to_handler+0x0/0x40
[<ffffffc000092250>] return_to_handler+0x0/0x40
...
This patch replaces such entries with real addresses preserved in
current->ret_stack[] at unwind_frame(). This way, we can cover all
the cases.
Reviewed-by: Jungseok Lee <jungseoklee85@gmail.com>
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
[will: fixed minor context changes conflicting with irq stack bits]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Function graph tracer modifies a return address (LR) in a stack frame
to hook a function's return. This will result in many useless entries
(return_to_handler) showing up in a call stack list.
We will fix this problem in a later patch ("arm64: ftrace: fix a stack
tracer's output under function graph tracer"). But since real return
addresses are saved in ret_stack[] array in struct task_struct,
unwind functions need to be notified of, in addition to a stack pointer
address, which task is being traced in order to find out real return
addresses.
This patch extends unwind functions' interfaces by adding an extra
argument of a pointer to task_struct.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The code for switching to irq_stack stores three pieces of information on
the stack, fp+lr, as a fake stack frame (that lets us walk back onto the
interrupted tasks stack frame), and the address of the struct pt_regs that
contains the register values from kernel entry. (which dump_backtrace()
will print in any stack trace).
To reduce this, we store fp, and the pointer to the struct pt_regs.
unwind_frame() can recognise this as the irq_stack dummy frame, (as it only
appears at the top of the irq_stack), and use the struct pt_regs values
to find the missing interrupted link-register.
Suggested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When unwind_frame() reaches the bottom of the irq_stack, the last fp
points to the original task stack. unwind_frame() uses
IRQ_STACK_TO_TASK_STACK() to find the sp value. If either values is
wrong, we may end up walking a corrupt stack.
Check these values are sane by testing if they are both on the stack
pointed to by current->stack.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch allows unwind_frame() to traverse from interrupt stack to task
stack correctly. It requires data from a dummy stack frame, created
during irq_stack_entry(), added by a later patch.
A similar approach is taken to modify dump_backtrace(), which expects to
find struct pt_regs underneath any call to functions marked __exception.
When on an irq_stack, the struct pt_regs is stored on the old task stack,
the location of which is stored in the dummy stack frame.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
[james.morse: merged two patches, reworked for per_cpu irq_stacks, and
no alignment guarantees, added irq_stack definitions]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This reverts commit e306dfd06f.
With this patch applied, we were the only architecture making this sort
of adjustment to the PC calculation in the unwinder. This causes
problems for ftrace, where the PC values are matched against the
contents of the stack frames in the callchain and fail to match any
records after the address adjustment.
Whilst there has been some effort to change ftrace to workaround this,
those patches are not yet ready for mainline and, since we're the odd
architecture in this regard, let's just step in line with other
architectures (like arch/arm/) for now.
Cc: <stable@vger.kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Use the global current_stack_pointer to get the value of the stack pointer.
This change supports being able to compile the kernel with both gcc and clang.
Signed-off-by: Behan Webster <behanw@converseincode.com>
Signed-off-by: Mark Charlebois <charlebm@gmail.com>
Reviewed-by: Jan-Simon Möller <dl9pf@gmx.de>
Reviewed-by: Olof Johansson <olof@lixom.net>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
walk_stackframe() calls unwind_frame(), and if walk_stackframe() is
"notrace", unwind_frame() should be also "notrace".
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The frame PC value in the unwind code used to just take the saved LR
value and use that. That's incorrect as a stack trace, since it shows
the return path stack, not the call path stack.
In particular, it shows faulty information in case the bl is done as
the very last instruction of one label, since the return point will be
in the next label. That can easily be seen with tail calls to panic(),
which is marked __noreturn and thus doesn't have anything useful after it.
Easiest here is to just correct the unwind code and do a -4, to get the
actual call site for the backtrace instead of the return site.
Signed-off-by: Olof Johansson <olof@lixom.net>
Cc: stable@vger.kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We need at least 24 bytes above frame pointer.
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The patch contains the exception entry code (kernel/entry.S), pt_regs
structure and related accessors, undefined instruction trapping and
stack tracing.
AArch64 Linux kernel (including kernel threads) runs in EL1 mode using
the SP1 stack. The vectors don't have a fixed address, only alignment
(2^11) requirements.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>