mirror of
https://github.com/torvalds/linux.git
synced 2024-11-24 21:21:41 +00:00
powerpc/fadump: Fix inaccurate CPU state info in vmcore generated with panic

In the panic path, fadump is triggered via a panic notifier function.
Before calling panic notifier functions, smp_send_stop() gets called,
which stops all CPUs except the panic'ing CPU. Commit 8389b37dff
("powerpc: stop_this_cpu: remove the cpu from the online map.") and
again commit bab26238bb ("powerpc: Offline CPU in stop_this_cpu()")
started marking CPUs as offline while stopping them. So, if a kernel
has either of the above commits, a vmcore captured with fadump via the
panic path would have register data processed only for the panic'ing
CPU and not for any other CPU. Sample output of crash-utility with
such a vmcore:

  # crash vmlinux vmcore
  ...
        KERNEL: vmlinux
      DUMPFILE: vmcore  [PARTIAL DUMP]
          CPUS: 1
          DATE: Wed Nov 10 09:56:34 EST 2021
        UPTIME: 00:00:42
  LOAD AVERAGE: 2.27, 0.69, 0.24
         TASKS: 183
      NODENAME: XXXXXXXXX
       RELEASE: 5.15.0+
       VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021
       MACHINE: ppc64le  (2500 Mhz)
        MEMORY: 8 GB
         PANIC: "Kernel panic - not syncing: sysrq triggered crash"
           PID: 3394
       COMMAND: "bash"
          TASK: c0000000150a5f80  [THREAD_INFO: c0000000150a5f80]
           CPU: 1
         STATE: TASK_RUNNING (PANIC)

  crash> p -x __cpu_online_mask
  __cpu_online_mask = $1 = {
    bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>
  crash>
  crash> p -x __cpu_active_mask
  __cpu_active_mask = $2 = {
    bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>

While this has been the case since fadump was introduced, the issue
was not identified for two probable reasons:

  - In general, the bulk of the vmcores analyzed were from crashes
    due to an exception.

  - The above did change once commit 8341f2f222 ("sysrq: Use panic()
    to force a crash") started using panic() instead of dereferencing
    a NULL pointer to force a kernel crash. But then commit de6e5d3841
    ("powerpc: smp_send_stop do not offline stopped CPUs") stopped
    marking CPUs as offline, until kernel commit bab26238bb
    ("powerpc: Offline CPU in stop_this_cpu()") reverted that change.

To ensure post processing of register data for all other CPUs happens
as intended, let the panic() function take the crash friendly path
(read crash_smp_send_stop()) with the help of the
crash_kexec_post_notifiers option. Also, as register data for all CPUs
is captured by f/w, skip IPI callbacks here for fadump, to avoid any
complications in finding the right backtraces.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211207103719.91117-2-hbathini@linux.ibm.com
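
The two masks in the crash output above can be read off by hand, but a small userspace sketch makes the mismatch explicit. The helper names (`mask_weight`, `mask_test_cpu`) are hypothetical and are not kernel APIs; the sketch only assumes that each `bits` word is a 64-bit little-endian bitmap word, as printed by crash on ppc64le.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace helpers (not kernel code): interpret the
 * "bits" array printed by crash as a bitmap of 64-bit words.
 */

/* Count the CPUs set in a bitmap of nwords 64-bit words. */
static unsigned int mask_weight(const uint64_t *bits, size_t nwords)
{
	unsigned int count = 0;

	for (size_t i = 0; i < nwords; i++) {
		uint64_t w = bits[i];

		while (w) {
			w &= w - 1;	/* clear the lowest set bit */
			count++;
		}
	}
	return count;
}

/* Return 1 if the given CPU number is set in the bitmap, else 0. */
static int mask_test_cpu(const uint64_t *bits, size_t nwords,
			 unsigned int cpu)
{
	size_t word = cpu / 64;

	if (word >= nwords)
		return 0;
	return (int)((bits[word] >> (cpu % 64)) & 1u);
}
```

Applied to the sample output, an online mask of `{0x2, 0x0, ...}` has only CPU 1 set (the panic'ing CPU), while an active mask of `{0xff, 0x0, ...}` has CPUs 0 through 7 set, which is consistent with crash reporting `CPUS: 1` on an 8-CPU system.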
parent 219572d2fc
commit 06e629c25d
@@ -1641,6 +1641,14 @@ int __init setup_fadump(void)
 	else if (fw_dump.reserve_dump_area_size)
 		fw_dump.ops->fadump_init_mem_struct(&fw_dump);
 
+	/*
+	 * In case of panic, fadump is triggered via ppc_panic_event()
+	 * panic notifier. Setting crash_kexec_post_notifiers to 'true'
+	 * lets panic() function take crash friendly path before panic
+	 * notifiers are invoked.
+	 */
+	crash_kexec_post_notifiers = true;
+
 	return 1;
 }
 subsys_initcall(setup_fadump);
@@ -61,6 +61,7 @@
 #include <asm/cpu_has_feature.h>
 #include <asm/ftrace.h>
 #include <asm/kup.h>
+#include <asm/fadump.h>
 
 #ifdef DEBUG
 #include <asm/udbg.h>
@@ -638,6 +639,15 @@ void crash_smp_send_stop(void)
 {
 	static bool stopped = false;
 
+	/*
+	 * In case of fadump, register data for all CPUs is captured by f/w
+	 * on ibm,os-term rtas call. Skip IPI callbacks to other CPUs before
+	 * this rtas call to avoid tricky post processing of those CPUs'
+	 * backtraces.
+	 */
+	if (should_fadump_crash())
+		return;
+
 	if (stopped)
 		return;
 
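
The combined effect of the two changes can be illustrated with a userspace model of the ordering in the panic path. This is a simplified sketch, not the kernel's implementation: the function bodies are stand-ins that only record a trace string, `fadump_registered` is a hypothetical flag standing in for whatever `should_fadump_crash()` checks, `model_panic()` is an invented driver, and the kexec attempt that the real panic path makes before stopping CPUs is omitted.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical trace buffer recording which steps ran, in order. */
static char trace[256];

static void log_step(const char *s)
{
	strncat(trace, s, sizeof(trace) - strlen(trace) - 1);
	strncat(trace, ";", sizeof(trace) - strlen(trace) - 1);
}

static bool crash_kexec_post_notifiers;
static bool fadump_registered;	/* assumption: models should_fadump_crash() */
static bool stopped;

static bool should_fadump_crash(void)
{
	return fadump_registered;
}

/* Stand-in: the real function stops other CPUs, marking them offline. */
static void smp_send_stop(void)
{
	log_step("smp_send_stop");
}

/* Mirrors the early returns shown in the hunk above. */
static void crash_smp_send_stop(void)
{
	if (should_fadump_crash()) {
		log_step("crash_smp_send_stop:skip");	/* f/w captures registers */
		return;
	}
	if (stopped)
		return;
	stopped = true;
	log_step("crash_smp_send_stop:ipi");
}

/* Stand-in for the panic notifier chain (where fadump is triggered). */
static void panic_notifiers(void)
{
	log_step("notifiers");
}

/* Invented driver: run the modeled panic path and return the trace. */
static const char *model_panic(bool post_notifiers, bool fadump)
{
	trace[0] = '\0';
	stopped = false;
	crash_kexec_post_notifiers = post_notifiers;
	fadump_registered = fadump;

	if (!crash_kexec_post_notifiers)
		smp_send_stop();
	else
		crash_smp_send_stop();

	panic_notifiers();
	return trace;
}
```

In this model, `model_panic(false, true)` runs `smp_send_stop` before the notifiers, which corresponds to the problematic case described in the commit message. With the flag set, `model_panic(true, true)` takes the `crash_smp_send_stop` path and returns early, so the other CPUs are left untouched until the notifier triggers fadump.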