linux/arch/x86_64/kernel
Tim Hockin bd78432c8f x86_64: mcelog tolerant level cleanup
Background:
 The MCE handler has several paths that it can take, depending on various
 conditions of the MCE status and the value of the 'tolerant' knob.  The
 exact semantics are not well defined and the code is a bit twisty.

Description:
 This patch makes the MCE handler's behavior more clear by documenting the
 behavior for various 'tolerant' levels.  It also fixes or enhances
 several small things in the handler.  Specifically:
     * If RIPV is not set it is not safe to restart, so set the 'no way
       out' flag rather than the 'kill it' flag.
     * Don't panic() on correctable MCEs.
     * If the _OVER bit is set *and* the _UC bit is set (meaning possibly
       dropped uncorrected errors), set the 'no way out' flag.
     * Use EIPV for testing whether an app can be killed (SIGBUS) rather
       than RIPV.  According to docs, EIPV indicates that the error is
       related to the IP, while RIPV simply means the IP is valid to
       restart from.
     * Don't clear the MCi_STATUS registers until after the panic() path.
       This leaves the status bits set after the panic() so clever BIOSes
       can find them (and dumb BIOSes can do nothing).
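
 The rules in the list above can be sketched in user-space C roughly as
 follows. This is an illustrative sketch, not the code from mce.c: the
 function name mce_decide(), the enum, and the user_mode parameter are
 hypothetical. The bit positions follow the architectural MCG_STATUS and
 MCi_STATUS layout. Treating tolerant >= 3 as "never panic or kill" and
 panicking on any uncorrected error that cannot be pinned on a user task
 are simplifying assumptions.

```c
#include <stdint.h>

/* Architectural bit positions in MCG_STATUS and MCi_STATUS. */
#define MCG_STATUS_RIPV (1ULL << 0)   /* restart IP is valid */
#define MCG_STATUS_EIPV (1ULL << 1)   /* error is related to the IP */
#define MCI_STATUS_OVER (1ULL << 62)  /* a previous error was overwritten */
#define MCI_STATUS_UC   (1ULL << 61)  /* error was not corrected */

/* Hypothetical result type for this sketch. */
enum mce_action { MCE_LOG_ONLY, MCE_KILL_TASK, MCE_PANIC };

/*
 * Hypothetical helper: decide what to do for one bank's status.
 * user_mode is nonzero if the interrupted context was userspace.
 */
static enum mce_action mce_decide(uint64_t mcgstatus, uint64_t status,
                                  int user_mode, int tolerant)
{
    int no_way_out = 0;
    int kill_it = 0;

    /* Without RIPV the saved IP is not a safe restart point. */
    if (!(mcgstatus & MCG_STATUS_RIPV))
        no_way_out = 1;

    if (status & MCI_STATUS_UC) {
        /* _OVER and _UC together: uncorrected errors may have been dropped. */
        if (status & MCI_STATUS_OVER)
            no_way_out = 1;
        kill_it = 1;
    }
    /* A correctable error sets neither flag on its own, so it is only logged. */

    /* Assumption: the most tolerant level never panics or kills. */
    if (tolerant >= 3)
        return MCE_LOG_ONLY;

    if (no_way_out)
        return MCE_PANIC;

    if (kill_it) {
        /* EIPV, not RIPV, ties the error to the interrupted task. */
        if ((mcgstatus & MCG_STATUS_EIPV) && user_mode)
            return MCE_KILL_TASK;   /* SIGBUS in the real handler */
        return MCE_PANIC;           /* simplification, see lead-in */
    }

    return MCE_LOG_ONLY;
}
```

 In this sketch, mce_decide(MCG_STATUS_RIPV | MCG_STATUS_EIPV,
 MCI_STATUS_UC, 1, 1) selects the kill path, while the same call with
 MCI_STATUS_OVER also set selects the panic path.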

 This patch also calls nonseekable_open() in mce_open (as suggested by akpm).

Result:
 Tolerant levels behave almost identically to how they always have, but
 now the behavior is well defined.  There's a slightly higher chance of
 panic()ing when multiple errors happen (a good thing, IMHO).  If you take
 an MCE and panic(), the error status bits are not cleared.

Alternatives:
 None.

Testing:
 I used software to inject correctable and uncorrectable errors.  With
 tolerant = 3, the system usually survives.  With tolerant = 2, the system
 usually panic()s (PCC) but not always.  With tolerant = 1, the system
 always panic()s.  When the system panic()s, the BIOS is able to detect
 that the cause of death was an MC4.  I was not able to reproduce the
 case of a non-PCC error in userspace, with EIPV, with (tolerant < 3).
 That will be rare at best.

Signed-off-by: Tim Hockin <thockin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21 18:37:10 -07:00
acpi PM: Integrate beeping flag with existing acpi_sleep flags 2007-07-19 10:04:43 -07:00
cpufreq [CPUFREQ] the overdue removal of X86_SPEEDSTEP_CENTRINO_ACPI 2007-07-13 01:29:51 -04:00
aperture.c x86_64: off-by-two error in aperture.c 2007-05-11 12:53:00 -07:00
apic.c x86_64: apic.c coding style janitor work 2007-07-21 18:37:09 -07:00
asm-offsets.c [PATCH] x86-64: Auto compute __NR_syscall_max at compile time 2007-05-02 19:27:18 +02:00
audit.c [PATCH] audit signal recipients 2007-05-11 05:38:25 -04:00
bugs.c x86_64: Add asm/mtrr.h include for some builds 2007-05-12 09:47:15 -07:00
crash_dump.c [PATCH] kdump: read previous kernel's memory 2006-01-10 08:01:28 -08:00
crash.c move die notifier handling to common code 2007-05-08 11:15:04 -07:00
e820.c x86_64: extract helper function from e820_register_active_regions 2007-07-21 18:37:10 -07:00
early_printk.c xen: use the hvc console infrastructure for Xen console 2007-07-18 08:47:44 -07:00
early-quirks.c [PATCH] x86: revert x86_64-mm-fix-the-irqbalance-quirk-for-e7320-e7520-e7525 2007-05-02 19:27:04 +02:00
entry.S x86_64: support poll() on /dev/mcelog 2007-07-21 18:37:10 -07:00
genapic_flat.c [PATCH] x86-64: Fix allnoconfig error in genapic_flat.c 2007-05-02 19:27:21 +02:00
genapic.c [PATCH] x86: adjust inclusion of asm/fixmap.h 2007-05-02 19:27:04 +02:00
head64.c x86_64: display more intuitive error message if kernel is not 2MB aligned 2007-05-11 08:29:32 -07:00
head.S x86: initial fixmap support 2007-07-16 09:05:35 -07:00
hpet.c x86_64: fiuxp pt_reqs leftovers 2007-07-21 18:37:09 -07:00
i387.c [PATCH] x86-64: use BUILD_BUG_ON in FPU code 2006-12-07 02:14:01 +01:00
i8259.c header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
init_task.c use the new percpu interface for shared data 2007-07-19 10:04:45 -07:00
io_apic.c x86_64: set the irq_chip name for lapic 2007-06-26 16:54:29 -07:00
ioport.c header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
irq.c x86_64 irq: use mask/unmask and proper locking in fixup_irqs() 2007-06-26 16:54:29 -07:00
k8.c Avoid zero size allocation in cache_k8_northbridges() 2007-05-23 20:14:12 -07:00
kprobes.c Kprobes: The ON/OFF knob thru debugfs 2007-05-08 11:15:19 -07:00
ldt.c header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
machine_kexec.c Revert "[PATCH] x86: __pa and __pa_symbol address space separation" 2007-05-07 08:44:24 -07:00
Makefile Use a new CPU feature word to cover features that are spread around 2007-07-12 10:55:54 -07:00
mce_amd.c x86_64: Fix APIC typo 2007-07-21 18:37:09 -07:00
mce_intel.c [PATCH] x86: Add a cumulative thermal throttle event counter. 2006-09-26 10:52:42 +02:00
mce.c x86_64: mcelog tolerant level cleanup 2007-07-21 18:37:10 -07:00
module.c [PATCH] Generic BUG for x86-64 2006-12-08 08:28:39 -08:00
mpparse.c x86_64: remove unused variable maxcpus 2007-07-21 18:37:09 -07:00
nmi.c x86_64: speedup touch_nmi_watchdog 2007-07-17 10:23:04 -07:00
pci-calgary.c [PATCH] x86-64: dma_ops as const 2007-05-02 19:27:06 +02:00
pci-dma.c PCI: remove pci_dac_dma_... APIs 2007-07-11 16:02:11 -07:00
pci-gart.c x86_64: off-by-two error in aperture.c 2007-05-11 12:53:00 -07:00
pci-nommu.c [PATCH] x86-64: dma_ops as const 2007-05-02 19:27:06 +02:00
pci-swiotlb.c [PATCH] x86-64: dma_ops as const 2007-05-02 19:27:06 +02:00
pmtimer.c [PATCH] time: x86_64: convert x86_64 to use GENERIC_TIME 2007-02-16 08:14:00 -08:00
process.c x86_64: Quicklist support for x86_64 2007-07-21 18:37:09 -07:00
ptrace.c Handle bogus %cs selector in single-step instruction decoding 2007-07-18 12:09:01 -07:00
reboot.c Detach sched.h from mm.h 2007-05-21 09:18:19 -07:00
relocate_kernel.S [PATCH] Avoid overwriting the current pgd (V4, x86_64) 2006-09-26 10:52:38 +02:00
setup64.c x86_64: Ignore compat mode SYSCALL when IA32_EMULATION is not defined 2007-06-22 18:41:19 -07:00
setup.c i386: Add L3 cache support to AMD CPUID4 emulation 2007-07-21 18:37:08 -07:00
signal.c x86_64: support poll() on /dev/mcelog 2007-07-21 18:37:10 -07:00
smp.c x86_64: Quicklist support for x86_64 2007-07-21 18:37:09 -07:00
smpboot.c header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
stacktrace.c simplify the stacktrace code 2007-05-08 11:14:58 -07:00
suspend_asm.S [PATCH] x86-64: Relocatable Kernel Support 2007-05-02 19:27:07 +02:00
suspend.c [PATCH] x86: Save and restore the fixed-range MTRRs of the BSP when suspending 2007-05-02 19:27:17 +02:00
sys_x86_64.c header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
syscall.c [PATCH] x86-64: Auto compute __NR_syscall_max at compile time 2007-05-02 19:27:18 +02:00
tce.c Remove all inclusions of <linux/config.h> 2006-10-04 03:38:54 -04:00
time.c x86_64: time.c white space wreckage cleanup 2007-07-21 18:37:09 -07:00
trampoline.S [PATCH] x86-64: Move cpu verification code to common file 2007-05-02 19:27:08 +02:00
traps.c drivers/edac: add new nmi rescan 2007-07-19 10:04:53 -07:00
tsc_sync.c [PATCH] x86: Log reason why TSC was marked unstable 2007-05-02 19:27:08 +02:00
tsc.c x86_64: Remove dead code and other janitor work in tsc.c 2007-07-21 18:37:08 -07:00
verify_cpu.S Unify the CPU features vectors between i386 and x86-64 2007-07-12 10:55:54 -07:00
vmlinux.lds.S x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu 2007-07-21 18:37:08 -07:00
vsmp.c [PATCH] Fix build breakage with CONFIG_X86_VSMP 2006-10-12 12:25:27 -07:00
vsyscall.c x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu 2007-07-21 18:37:08 -07:00
x8664_ksyms.c [PATCH] x86: Export _proxy_pda for gcc 4.2 2007-03-16 21:07:36 +01:00