linux/arch/x86/kernel
HATAYAMA Daisuke b292d7a104 perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
Currently, any NMI is falsely handled by a NMI handler of NMI watchdog
if CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR is set.

For example, we use external NMI to make system panic to get crash
dump, but in this case, the external NMI is falsely handled do to the
issue.

This commit deals with the issue simply by ignoring CondChgd bit.

Here is explanation in detail.

On x86 NMI watchdog uses performance monitoring feature to
periodically signal NMI each time performance counter gets overflowed.

intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI
handler of NMI watchdog, perf_event_nmi_handler(). It identifies an
owner of a given NMI by looking at overflow status bits in
MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it
handles the given NMI as its own NMI.

The problem is that the intel_pmu_handle_irq() doesn't distinguish
CondChgd bit from other bits. Unlike the other status bits, CondChgd
bit doesn't represent overflow status for performance counters. Thus,
CondChgd bit cannot be thought of as a mark indicating a given NMI is
NMI watchdog's.

As a result, if CondChgd bit is set, any NMI is falsely handled by the
NMI handler of NMI watchdog. Also, if type of the falsely handled NMI
is either NMI_UNKNOWN, NMI_SERR or NMI_IO_CHECK, the corresponding
action is never performed until CondChgd bit is cleared.

I noticed this behavior on systems with Ivy Bridge processors: Intel
Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems,
CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set
in the beginning at boot. Then the CondChgd bit is immediately cleared
by next wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain
0.

On the other hand, on older processors such as Nehalem, Xeon E7540,
CondChgd bit is not set in the beginning at boot.

I'm not sure about exact behavior of CondChgd bit, in particular when
this bit is set. Although I read Intel System Programmer's Manual to
figure out that, the descriptions I found are:

  In 18.9.1:

  "The MSR_PERF_GLOBAL_STATUS MSR also provides a ¡sticky bit¢ to
   indicate changes to the state of performancmonitoring hardware"

  In Table 35-2 IA-32 Architectural MSRs

  63 CondChg: status bits of this register has changed.

These are different from the bahviour I see on the actual system as I
explained above.

At least, I think ignoring CondChgd bit should be enough for NMI
watchdog perspective.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20140625.103503.409316067.d.hatayama@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-02 08:35:55 +02:00
..
acpi asmlinkage, x86: Add explicit __visible to arch/x86/* 2014-05-05 16:07:44 -07:00
apic Merge branch 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-06-12 20:03:47 -07:00
cpu perf/x86/intel: ignore CondChgd bit to avoid false NMI handling 2014-07-02 08:35:55 +02:00
kprobes kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation 2014-04-24 10:26:38 +02:00
.gitignore
alternative.c kprobes, x86: Allow kprobes on text_poke/hw_breakpoint 2014-04-24 10:03:02 +02:00
amd_gart_64.c x86: enable DMA CMA with swiotlb 2014-06-04 16:53:57 -07:00
amd_nb.c A bunch of EDAC updates all over the place: 2014-04-01 13:54:00 -07:00
apb_timer.c intel_mid: Renamed *mrst* to *intel_mid* 2013-10-17 16:40:47 -07:00
aperture_64.c x86/gart: Tidy messages and add bridge device info 2014-05-23 10:47:19 -06:00
apm_32.c sched/idle, x86: Switch from TS_POLLING to TIF_POLLING_NRFLAG 2014-05-08 09:16:56 +02:00
asm-offsets_32.c x86: Get rid of ->hard_math and all the FPU asm fu 2013-06-06 14:32:04 -07:00
asm-offsets_64.c x86, gdt, hibernate: Store/load GDT for hibernate path. 2013-05-02 11:27:35 -07:00
asm-offsets.c sched, x86: Provide a per-cpu preempt_count implementation 2013-09-25 14:07:57 +02:00
audit_64.c
bootflag.c
check.c x86/mm: memblock: switch to use NUMA_NO_NODE 2014-01-21 16:19:47 -08:00
cpuid.c x86, cpuid: Fix CPU hotplug callback registration 2014-03-20 13:43:42 +01:00
crash_dump_32.c
crash_dump_64.c
crash.c x86, crash: Unify ifdef 2014-03-13 15:32:44 -07:00
devicetree.c x86: use FDT accessors for FDT blob header data 2014-04-30 00:59:19 -05:00
doublefault.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
dumpstack_32.c Merge branch 'x86-threadinfo-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-01 10:17:18 -07:00
dumpstack_64.c x86: Fix dumpstack_64 irq stack handling 2014-04-02 11:46:50 -07:00
dumpstack.c kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation 2014-04-24 10:26:38 +02:00
e820.c x86/mm: memblock: switch to use NUMA_NO_NODE 2014-01-21 16:19:47 -08:00
early_printk.c Merge branch 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 11:12:22 +09:00
early-quirks.c Merge commit '9e9a928eed8796a0a1aaed7e0b676db86ba84594' into drm-next 2014-06-05 20:28:59 +10:00
entry_32.S Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-06-12 19:18:49 -07:00
entry_64.S Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-06-12 19:18:49 -07:00
espfix_64.c x86, espfix: Move espfix definitions into a separate header file 2014-05-01 14:16:15 -07:00
ftrace.c ftrace/x86: Call text_ip_addr() instead of the duplicated code 2014-06-03 19:44:37 -04:00
head32.c asmlinkage, x86: Add explicit __visible to arch/x86/* 2014-05-05 16:07:44 -07:00
head64.c kernel/printk: use symbolic defines for console loglevels 2014-06-04 16:54:17 -07:00
head_32.S x86: fix compile error due to X86_TRAP_NMI use in asm files 2014-03-07 18:58:40 -08:00
head_64.S x86: fix compile error due to X86_TRAP_NMI use in asm files 2014-03-07 18:58:40 -08:00
head.c x86: Make sure we can boot in the case the BDA contains pure garbage 2013-02-27 13:38:57 -08:00
hpet.c Merge branch 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next 2014-06-05 08:05:29 -07:00
hw_breakpoint.c kprobes, x86: Allow kprobes on text_poke/hw_breakpoint 2014-04-24 10:03:02 +02:00
i386_ksyms_32.c sched, x86: Optimize the preempt_schedule() call 2013-09-25 14:23:07 +02:00
i387.c x86, fpu: Check tsk_used_math() in kernel_fpu_end() for eager FPU 2014-03-11 12:32:52 -07:00
i8237.c
i8253.c
i8259.c x86, irq, pic: Probe for legacy PIC and set legacy_pic appropriately 2014-04-14 11:49:55 -07:00
io_delay.c
ioport.c x86: get rid of pt_regs argument of iopl(2) 2013-02-03 18:16:24 -05:00
iosf_mbi.c x86, iosf: Add PCI ID macros for better readability 2014-05-09 14:57:35 -07:00
irq_32.c x86, threadinfo: Redo "x86: Use inline assembler to get sp" 2014-03-10 17:32:01 -07:00
irq_64.c irq: Consolidate do_softirq() arch overriden implementations 2013-10-01 12:53:25 +02:00
irq_work.c x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible 2013-08-06 14:18:23 -07:00
irq.c Merge branch 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-06-12 20:03:47 -07:00
irqinit.c x86/irq: Fix do_IRQ() interrupt warning for cpu hotplug retriggered irqs 2014-01-12 13:13:02 +01:00
jump_label.c x86/jump_label: expect default_nop if static_key gets enabled on boot-up 2013-10-19 19:45:35 -04:00
kdebugfs.c
kgdb.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
ksysfs.c x86: ksysfs.c build fix 2014-01-03 14:37:13 +00:00
kvm.c Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-06-12 19:18:49 -07:00
kvmclock.c kvm: remove redundant registration of BSP's hv_clock area 2014-02-22 15:53:32 +01:00
ldt.c Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option" 2014-05-21 10:22:59 -07:00
machine_kexec_32.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
machine_kexec_64.c x86, kaslr: export offset in VMCOREINFO ELF notes 2014-02-25 16:57:47 -08:00
Makefile Lots of tweaks, small fixes, optimizations, and some helper functions 2014-06-09 16:39:15 -07:00
mcount_64.S ftrace/x86: Move the mcount/fentry code out of entry_64.S 2014-05-14 11:37:31 -04:00
mmconf-fam10h_64.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
module.c x86, kaslr: fix module lock ordering problem 2014-03-24 10:18:26 -07:00
mpparse.c
msr.c x86, msr: Fix CPU hotplug callback registration 2014-03-20 13:43:42 +01:00
nmi_selftest.c
nmi.c kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation 2014-04-24 10:26:38 +02:00
paravirt_patch_32.c
paravirt_patch_64.c
paravirt-spinlocks.c x86, ticketlock: Add slowpath logic 2013-08-09 07:54:00 -07:00
paravirt.c kprobes, x86: Prohibit probing on native_set_debugreg()/load_idt() 2014-04-24 10:02:58 +02:00
pci-calgary_64.c x86, calgary: Use 8M TCE table size by default 2014-04-10 19:51:32 -07:00
pci-dma.c arch/x86/kernel/pci-dma.c: fix dma_generic_alloc_coherent() when CONFIG_DMA_CMA is enabled 2014-06-04 16:53:57 -07:00
pci-iommu_table.c
pci-nommu.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
pci-swiotlb.c x86: enable DMA CMA with swiotlb 2014-06-04 16:53:57 -07:00
pcspeaker.c
perf_regs.c perf: Fix off by one test in perf_reg_value() 2012-09-19 17:08:40 +02:00
preempt.S sched, x86: Optimize the preempt_schedule() call 2013-09-25 14:23:07 +02:00
probe_roms.c x86/pci/probe_roms: Add missing __iomem annotation to pci_map_biosrom() 2012-09-05 10:52:25 +02:00
process_32.c x86: Keep thread_info on thread stack in x86_32 2014-03-06 16:56:55 -08:00
process_64.c Merge branch 'perf/urgent' into perf/core, to resolve conflict and to prepare for new patches 2014-06-06 07:55:06 +02:00
process.c sched/idle, x86: Remove redundant cpuidle_idle_call() 2014-02-11 09:58:28 +01:00
ptrace.c x86: Keep thread_info on thread stack in x86_32 2014-03-06 16:56:55 -08:00
pvclock.c hung_task: add method to reset detector 2013-11-06 09:49:02 +02:00
quirks.c x86/amd/numa: Fix northbridge quirk to assign correct NUMA node 2014-03-14 11:05:36 +01:00
reboot_fixups_32.c
reboot.c x86/reboot: Add reboot quirk for Certec BPC600 2014-05-07 11:22:10 +02:00
relocate_kernel_32.S x86, asm, cleanup: Replace open-coded control register values with symbolic 2013-06-25 16:26:06 -07:00
relocate_kernel_64.S x86, reloc: Use xorl instead of xorq in relocate_kernel_64.S 2013-06-20 21:30:04 -07:00
resource.c
rtc.c Merge branch 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 11:12:22 +09:00
setup_percpu.c
setup.c cma: add placement specifier for "cma=" kernel parameter 2014-06-04 16:53:57 -07:00
signal.c x86, vdso: Reimplement vdso.so preparation in build-time C 2014-05-05 13:18:51 -07:00
smp.c asmlinkage, x86: Add explicit __visible to arch/x86/* 2014-05-05 16:07:44 -07:00
smpboot.c Merge branch 'next' (accumulated 3.16 merge window patches) into master 2014-06-08 11:31:16 -07:00
stacktrace.c
step.c ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL 2013-01-22 10:08:00 -08:00
sys_x86_64.c x86 get_unmapped_area: Access mmap_legacy_base through mm_struct member 2013-08-22 10:19:35 -07:00
syscall_32.c x86, asmlinkage: Make syscall tables visible 2013-08-06 14:20:18 -07:00
syscall_64.c x86, asmlinkage: Make syscall tables visible 2013-08-06 14:20:18 -07:00
sysfb_efi.c x86: sysfb: move EFI quirks from efifb to sysfb 2013-08-02 16:17:47 -07:00
sysfb_simplefb.c x86/simplefb: Mark framebuffer mem-resources as IORESOURCE_BUSY to avoid bootup warning 2013-10-03 07:51:11 +02:00
sysfb.c x86: sysfb: move EFI quirks from efifb to sysfb 2013-08-02 16:17:47 -07:00
tboot.c x86 / tboot / ACPI: Fail extended mode reduced hardware sleep 2013-07-31 14:25:51 +02:00
tce_64.c
test_nx.c
test_rodata.c
time.c Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-01 11:22:57 -07:00
tls.c make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect 2013-03-03 22:58:33 -05:00
tls.h
topology.c hotplug, powerpc, x86: Remove cpu_hotplug_driver_lock() 2013-09-30 19:55:51 +02:00
trace_clock.c tracing,x86: Add a TSC trace_clock 2012-11-13 15:48:27 -05:00
tracepoint.c x86: Make sure IDT is page aligned 2013-07-16 15:14:48 -07:00
traps.c x86/kprobes: Fix build errors and blacklist context_track_user 2014-06-14 09:07:44 +02:00
tsc_msr.c x86: tsc: Add missing Baytrail frequency to the table 2014-02-19 17:12:24 +01:00
tsc_sync.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
tsc.c Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-02 12:26:43 -07:00
uprobes.c uprobes/x86: Rename arch_uprobe->def to ->defparam, minor comment updates 2014-06-05 16:21:57 +02:00
verify_cpu.S
vm86_32.c x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...) 2013-05-02 20:36:32 -04:00
vmlinux.lds.S x86, vdso: Zero-pad the VVAR page 2014-03-18 12:52:44 -07:00
vsmp_64.c asmlinkage, x86: Add explicit __visible to arch/x86/* 2014-05-05 16:07:44 -07:00
vsyscall_64.c x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSO 2014-05-05 13:19:01 -07:00
vsyscall_emu_64.S
vsyscall_gtod.c x86, vdso, time: Cast tv_nsec to u64 for proper shifting in update_vsyscall() 2014-05-09 08:45:52 -07:00
vsyscall_trace.h
x86_init.c PCI: Drop "irq" param from *_restore_msi_irqs() 2013-12-13 08:44:30 -07:00
x8664_ksyms_64.c sched, x86: Optimize the preempt_schedule() call 2013-09-25 14:23:07 +02:00
xsave.c x86, xsave: Support eager-only xsave features, add MPX support 2013-12-06 17:17:42 -08:00