linux/arch/x86/kernel
Hidehiro Kawai 58c5661f21 panic, x86: Allow CPUs to save registers even if looping in NMI context
Currently, kdump_nmi_shootdown_cpus(), a subroutine of crash_kexec(),
sends an NMI IPI to CPUs which haven't called panic() to stop them,
save their register information and do some cleanups for crash dumping.
However, if such a CPU is infinitely looping in NMI context, we fail to
save its register information into the crash dump.

For example, this can happen when unknown NMIs are broadcast to all
CPUs as follows:

  CPU 0                             CPU 1
  ===========================       ==========================
  receive an unknown NMI
  unknown_nmi_error()
    panic()                         receive an unknown NMI
      spin_trylock(&panic_lock)     unknown_nmi_error()
      crash_kexec()                   panic()
                                        spin_trylock(&panic_lock)
                                        panic_smp_self_stop()
                                          infinite loop
        kdump_nmi_shootdown_cpus()
          issue NMI IPI -----------> blocked until IRET
                                          infinite loop...

Here, since CPU 1 is in NMI context, the second NMI from CPU 0 is
blocked until CPU 1 executes IRET. However, CPU 1 never executes IRET,
so the NMI is not handled and the callback function to save registers is
never called.

In practice, this can happen on some servers which broadcast NMIs to all
CPUs when the NMI button is pushed.

To save registers in this case, we need to:

  a) Return from NMI handler instead of looping infinitely
  or
  b) Call the callback function directly from the infinite loop

Inherently, a) is risky because NMI is also used to prevent corrupted
data from being propagated to devices.  So, we chose b).

This patch does the following:

1. Move the infinite looping of CPUs which haven't called panic() in NMI
   context (actually done by panic_smp_self_stop()) outside of panic() to
   enable us to refer pt_regs. Please note that panic_smp_self_stop() is
   still used for normal context.

2. Call a callback of kdump_nmi_shootdown_cpus() directly to save
   registers and do some cleanups after setting waiting_for_crash_ipi which
   is used for counting down the number of CPUs which handled the callback

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Link: http://lkml.kernel.org/r/20151210014628.25437.75256.stgit@softrs
[ Cleanup comments, fixup formatting. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 11:07:01 +01:00
..
acpi x86, ACPI: Handle apic/x2apic entries in MADT in correct order 2015-10-15 01:31:06 +02:00
apic Merge branch 'linus' into x86/apic 2015-12-19 11:03:18 +01:00
cpu Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-12-08 13:01:23 -08:00
fpu x86/fpu: Fix get_xsave_addr() behavior under virtualization 2015-11-12 09:34:58 +01:00
kprobes Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-04-14 14:37:47 -07:00
.gitignore
alternative.c x86/alternatives: Make optimize_nops() interrupt safe and synced 2015-09-03 21:27:47 +02:00
amd_gart_64.c
amd_nb.c x86/gart: Check for GART support before accessing GART registers 2015-05-06 11:15:53 +02:00
apb_timer.c x86/asm/tsc: Rename native_read_tsc() to rdtsc() 2015-07-06 15:23:28 +02:00
aperture_64.c x86/gart: Check for GART support before accessing GART registers 2015-05-06 11:15:53 +02:00
apm_32.c apm32: Fix cputime == jiffies assumption 2015-07-29 15:44:58 +02:00
asm-offsets_32.c x86: Remove unused TI_cpu 2015-05-05 20:48:02 +02:00
asm-offsets_64.c x86/asm/entry: (Re-)rename __NR_entry_INT80_compat_max to __NR_syscall_compat_max 2015-06-08 23:43:38 +02:00
asm-offsets.c Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-11-03 21:05:40 -08:00
audit_64.c x86: hook up execveat system call 2014-12-13 12:42:51 -08:00
bootflag.c x86: don't use module_init for non-modular core bootflag code 2015-06-16 14:12:34 -04:00
check.c Linux 4.2-rc8 2015-08-25 09:59:19 +02:00
cpuid.c x86: Drop bogus __ref / __refdata annotations 2015-07-20 18:57:20 +02:00
crash_dump_32.c
crash_dump_64.c
crash.c x86/kexec: Remove obsolete 'in_crash_kexec' flag 2015-10-12 09:43:11 +02:00
devicetree.c Replace module_init with equivalent device_initcall in non modules. 2015-07-02 10:30:48 -07:00
doublefault.c
dumpstack_32.c Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-04-13 13:23:34 -07:00
dumpstack_64.c x86/kernel: Use kstack_end() in dumpstack_64.c 2015-02-23 18:37:13 +01:00
dumpstack.c Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-04-13 13:23:34 -07:00
e820.c x86/e820: Deinline e820_type_to_string, save 126 bytes 2015-09-30 21:54:40 +02:00
early_printk.c x86/early_printk: Set __iomem address space for IO 2015-10-13 21:45:56 +02:00
early-quirks.c timers/x86/hpet: Type adjustments 2015-10-21 11:17:32 +02:00
espfix_64.c Merge branch 'x86/urgent' into x86/asm, before applying dependent patches 2015-07-31 10:23:35 +02:00
ftrace.c ftrace: Calculate the correct dyn_ftrace number to report to the userspace 2015-10-22 15:44:24 -04:00
head32.c x86: Store a per-cpu shadow copy of CR4 2015-02-04 12:10:42 +01:00
head64.c x86/kasan: Fix KASAN shadow region page tables 2015-07-06 14:53:13 +02:00
head_32.S x86/microcode: Merge the early microcode loader 2015-10-21 11:22:12 +02:00
head_64.S x86/cpu: Call verify_cpu() after having entered long mode too 2015-11-07 10:45:02 +01:00
head.c
hpet.c timers/x86/hpet: Type adjustments 2015-10-21 11:17:32 +02:00
hw_breakpoint.c perf/x86/hw_breakpoints: Fix check for kernel-space breakpoints 2015-08-04 10:16:55 +02:00
i386_ksyms_32.c preempt: Use preempt_schedule_context() as the official tracing preemption point 2015-06-07 15:57:42 +02:00
i8237.c
i8253.c clockevents/drivers/i8253: Migrate to new 'set-state' interface 2015-08-10 11:40:30 +02:00
i8259.c x86/irq: Probe for PIC presence before allocating descs for legacy IRQs 2015-11-07 10:37:37 +01:00
io_delay.c
ioport.c x86/asm/entry: Rename 'init_tss' to 'cpu_tss' 2015-03-06 08:32:58 +01:00
irq_32.c genirq: Remove irq argument from irq flow handlers 2015-09-16 15:47:51 +02:00
irq_64.c x86/irq: Drop unlikely before IS_ERR_OR_NULL 2015-10-01 11:08:56 +02:00
irq_work.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
irq.c Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 15:20:51 -07:00
irqinit.c x86/irq: Store irq descriptor in vector array 2015-08-06 00:14:59 +02:00
jump_label.c jump_label: Rename JUMP_LABEL_{EN,DIS}ABLE to JUMP_LABEL_{JMP,NOP} 2015-08-03 11:34:12 +02:00
kdebugfs.c
kexec-bzimage64.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2015-09-08 12:41:25 -07:00
kgdb.c x86/kgdb: Replace bool_int_array[NR_CPUS] with bitmap 2015-09-28 10:13:31 +02:00
ksysfs.c
kvm.c The bulk of the changes here is for x86. And for once it's not 2015-06-24 09:36:49 -07:00
kvmclock.c x86: kvmclock: abolish PVCLOCK_COUNTS_FROM_ZERO 2015-10-01 15:06:42 +02:00
ldt.c x86/ldt: Fix small LDT allocation for Xen 2015-09-14 12:10:50 +02:00
livepatch.c livepatch: Fix crash with !CONFIG_DEBUG_SET_MODULE_RONX 2015-11-06 11:10:03 +01:00
machine_kexec_32.c x86, irq: Move IOAPIC related declarations from hw_irq.h into io_apic.h 2014-12-16 14:08:17 +01:00
machine_kexec_64.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching 2015-06-23 14:07:26 -07:00
Makefile kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
mcount_64.S x86/ftrace: Add comment on static function tracing 2015-11-19 11:07:49 +01:00
mmconf-fam10h_64.c
module.c x86/mm/KASLR: Propagate KASLR status to kernel proper 2015-04-03 15:26:15 +02:00
mpparse.c x86: Cleanup irq_domain ops 2015-04-24 15:36:55 +02:00
msr.c
nmi_selftest.c
nmi.c panic, x86: Allow CPUs to save registers even if looping in NMI context 2015-12-19 11:07:01 +01:00
paravirt_patch_32.c x86/asm/tsc, x86/paravirt: Remove read_tsc() and read_tscp() paravirt hooks 2015-07-06 15:23:26 +02:00
paravirt_patch_64.c Merge branch 'locking/core' into x86/core, to prepare for dependent patch 2015-06-03 10:07:35 +02:00
paravirt-spinlocks.c locking/pvqspinlock: Rename QUEUED_SPINLOCK to QUEUED_SPINLOCKS 2015-05-11 09:52:09 +02:00
paravirt.c x86/paravirt: Replace the paravirt nop with a bona fide empty function 2015-09-22 22:40:28 +02:00
pci-calgary_64.c
pci-dma.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
pci-iommu_table.c
pci-nommu.c
pci-swiotlb.c x86/swiotlb: Try coherent allocations with __GFP_NOWARN 2015-06-11 08:28:38 +02:00
pcspeaker.c
perf_regs.c perf/x86/64: Report regs_user->ax too in get_regs_user() 2015-04-11 13:08:53 +02:00
pmem.c libnvdimm, e820: skip module loading when no type-12 2015-11-30 09:10:33 -08:00
probe_roms.c
process_32.c sched/core, sched/x86: Kill thread_info::saved_preempt_count 2015-10-06 17:08:18 +02:00
process_64.c sched/x86: Fix typo in __switch_to() comments 2015-10-19 10:18:53 +02:00
process.c x86/vm86: Set thread.vm86 to NULL on fork/clone 2015-10-31 09:50:25 +01:00
ptrace.c x86/entry: Move C entry and exit code to arch/x86/entry/common.c 2015-07-07 10:59:05 +02:00
pvclock.c x86: pvclock: Really remove the sched notifier for cross-cpu migrations 2015-04-27 15:49:30 +02:00
quirks.c timers/x86/hpet: Type adjustments 2015-10-21 11:17:32 +02:00
reboot_fixups_32.c
reboot.c panic, x86: Allow CPUs to save registers even if looping in NMI context 2015-12-19 11:07:01 +01:00
relocate_kernel_32.S x86/asm: Optimize unnecessarily wide TEST instructions 2015-03-07 11:12:43 +01:00
relocate_kernel_64.S x86/asm: Replace "MOVQ $imm, %reg" with MOVL 2015-04-01 13:17:39 +02:00
resource.c
rtc.c kernel.h: remove ancient __FUNCTION__ hack 2015-02-12 18:54:13 -08:00
setup_percpu.c
setup.c x86/microcode: Initialize the driver late when facilities are up 2015-11-23 10:39:49 +01:00
signal_compat.c x86/compat: Move copy_siginfo_*_user32() to signal_compat.c 2015-07-06 15:28:55 +02:00
signal.c x86/signal: Fix restart_syscall number for x32 tasks 2015-12-05 18:52:14 +01:00
smp.c x86/smp: Remove single IPI wrapper 2015-11-05 13:07:54 +01:00
smpboot.c x86 smpboot: Re-enable init_udelay=0 by default on modern CPUs 2015-11-25 23:17:48 +01:00
stacktrace.c
step.c Merge branch 'x86/urgent' into x86/asm to fix up conflicts and to pick up fixes 2015-08-18 09:39:47 +02:00
sys_x86_64.c x86/mm: Improve AMD Bulldozer ASLR workaround 2015-03-31 10:01:17 +02:00
sysfb_efi.c
sysfb_simplefb.c
sysfb.c
tboot.c
tce_64.c
test_nx.c
test_rodata.c treewide: Fix typo in printk messages 2015-03-06 23:05:39 +01:00
time.c x86/asm/entry: Change all 'user_mode_vm()' calls to 'user_mode()' 2015-03-23 11:14:17 +01:00
tls.c x86, tls: Interpret an all-zero struct user_desc as "no segment" 2015-01-22 21:45:07 +01:00
tls.h
topology.c x86: Drop bogus __ref / __refdata annotations 2015-07-20 18:57:20 +02:00
trace_clock.c x86/asm/tsc: Add rdtsc_ordered() and use it in trivial call sites 2015-07-06 15:23:29 +02:00
tracepoint.c
traps.c x86/fpu/mpx: Rework MPX 'xstate' types 2015-09-14 12:22:00 +02:00
tsc_msr.c
tsc_sync.c x86/asm/tsc/sync: Use rdtsc_ordered() in check_tsc_warp() and drop extra barriers 2015-07-06 15:23:29 +02:00
tsc.c perf/x86: Fix time_shift in perf_event_mmap_page 2015-10-20 10:30:52 +02:00
uprobes.c uprobes/x86: Make arch_uretprobe_is_alive(RP_CHECK_CALL) more clever 2015-07-31 10:38:06 +02:00
verify_cpu.S x86/cpu: Call verify_cpu() after having entered long mode too 2015-11-07 10:45:02 +01:00
vm86_32.c x86/vm86: Block non-root vm86(old) if mmap_min_addr != 0 2015-09-05 09:01:16 +02:00
vmlinux.lds.S kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
vsmp_64.c x86: replace __init_or_module with __init in non-modular vsmp_64.c 2015-06-16 14:12:41 -04:00
x86_init.c PCI changes for the v4.2 merge window: 2015-06-23 13:41:24 -07:00
x8664_ksyms_64.c preempt: Use preempt_schedule_context() as the official tracing preemption point 2015-06-07 15:57:42 +02:00