linux/arch/x86/include/asm/nmi.h
Don Zickus 58687acba5 lockup_detector: Combine nmi_watchdog and softlockup detector
The new nmi_watchdog (which uses the perf event subsystem) is very
similar in structure to the softlockup detector.  Using Ingo's
suggestion, I combined the two functionalities into one file:
kernel/watchdog.c.

Now both the nmi_watchdog (or hardlockup detector) and softlockup
detector sit on top of the perf event subsystem, which is run every
60 seconds or so to see if there are any lockups.

To detect hardlockups, cpus not responding to interrupts, I
implemented an hrtimer that runs 5 times for every perf event
overflow event.  If that stops counting on a cpu, then the cpu is
most likely in trouble.

To detect softlockups, tasks not yielding to the scheduler, I used the
previous kthread idea that now gets kicked every time the hrtimer fires.
If the kthread isn't being scheduled neither is anyone else and the
warning is printed to the console.

I tested this on x86_64 and both the softlockup and hardlockup paths
work.

V2:
- cleaned up the Kconfig and softlockup combination
- surrounded hardlockup cases with #ifdef CONFIG_PERF_EVENTS_NMI
- seperated out the softlockup case from perf event subsystem
- re-arranged the enabling/disabling nmi watchdog from proc space
- added cpumasks for hardlockup failure cases
- removed fallback to soft events if no PMU exists for hard events

V3:
- comment cleanups
- drop support for older softlockup code
- per_cpu cleanups
- completely remove software clock base hardlockup detector
- use per_cpu masking on hard/soft lockup detection
- #ifdef cleanups
- rename config option NMI_WATCHDOG to LOCKUP_DETECTOR
- documentation additions

V4:
- documentation fixes
- convert per_cpu to __get_cpu_var
- powerpc compile fixes

V5:
- split apart warn flags for hard and soft lockups

TODO:
- figure out how to make an arch-agnostic clock2cycles call
  (if possible) to feed into perf events as a sample period

[fweisbec: merged conflict patch]

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <1273266711-18706-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-05-12 23:55:33 +02:00

81 lines
2.2 KiB
C

#ifndef _ASM_X86_NMI_H
#define _ASM_X86_NMI_H
#include <linux/pm.h>
#include <asm/irq.h>
#include <asm/io.h>
#ifdef ARCH_HAS_NMI_WATCHDOG
/**
* do_nmi_callback
*
* Check to see if a callback exists and execute it. Return 1
* if the handler exists and was handled successfully.
*/
int do_nmi_callback(struct pt_regs *regs, int cpu);
extern void die_nmi(char *str, struct pt_regs *regs, int do_panic);
extern int check_nmi_watchdog(void);
#if !defined(CONFIG_LOCKUP_DETECTOR)
extern int nmi_watchdog_enabled;
#endif
extern int avail_to_resrv_perfctr_nmi_bit(unsigned int);
extern int reserve_perfctr_nmi(unsigned int);
extern void release_perfctr_nmi(unsigned int);
extern int reserve_evntsel_nmi(unsigned int);
extern void release_evntsel_nmi(unsigned int);
extern void setup_apic_nmi_watchdog(void *);
extern void stop_apic_nmi_watchdog(void *);
extern void disable_timer_nmi_watchdog(void);
extern void enable_timer_nmi_watchdog(void);
extern int nmi_watchdog_tick(struct pt_regs *regs, unsigned reason);
extern void cpu_nmi_set_wd_enabled(void);
extern atomic_t nmi_active;
extern unsigned int nmi_watchdog;
#define NMI_NONE 0
#define NMI_IO_APIC 1
#define NMI_LOCAL_APIC 2
#define NMI_INVALID 3
struct ctl_table;
extern int proc_nmi_enabled(struct ctl_table *, int ,
void __user *, size_t *, loff_t *);
extern int unknown_nmi_panic;
void arch_trigger_all_cpu_backtrace(void);
#define arch_trigger_all_cpu_backtrace arch_trigger_all_cpu_backtrace
static inline void localise_nmi_watchdog(void)
{
if (nmi_watchdog == NMI_IO_APIC)
nmi_watchdog = NMI_LOCAL_APIC;
}
/* check if nmi_watchdog is active (ie was specified at boot) */
static inline int nmi_watchdog_active(void)
{
/*
* actually it should be:
* return (nmi_watchdog == NMI_LOCAL_APIC ||
* nmi_watchdog == NMI_IO_APIC)
* but since they are power of two we could use a
* cheaper way --cvg
*/
return nmi_watchdog & (NMI_LOCAL_APIC | NMI_IO_APIC);
}
#endif
void lapic_watchdog_stop(void);
int lapic_watchdog_init(unsigned nmi_hz);
int lapic_wd_event(unsigned nmi_hz);
unsigned lapic_adjust_nmi_hz(unsigned hz);
void disable_lapic_nmi_watchdog(void);
void enable_lapic_nmi_watchdog(void);
void stop_nmi(void);
void restart_nmi(void);
#endif /* _ASM_X86_NMI_H */