linux/arch/x86_64/kernel
Ingo Molnar 5d0e600d90 [PATCH] x86: fix laptop bootup hang in init_acpi()
During kernel bootup, a new T60 laptop (CoreDuo, 32-bit) hangs about
10%-20% of the time in acpi_init():

 Calling initcall 0xc055ce1a: topology_init+0x0/0x2f()
 Calling initcall 0xc055d75e: mtrr_init_finialize+0x0/0x2c()
 Calling initcall 0xc05664f3: param_sysfs_init+0x0/0x175()
 Calling initcall 0xc014cb65: pm_sysrq_init+0x0/0x17()
 Calling initcall 0xc0569f99: init_bio+0x0/0xf4()
 Calling initcall 0xc056b865: genhd_device_init+0x0/0x50()
 Calling initcall 0xc056c4bd: fbmem_init+0x0/0x87()
 Calling initcall 0xc056dd74: acpi_init+0x0/0x1ee()

It's a hard hang that not even an NMI could punch through!  Frustratingly,
adding printks or function tracing to the ACPI code made the hangs go away
...

After some time an additional detail emerged: disabling the NMI watchdog
made these occasional hangs go away.

So i spent the better part of today trying to debug this and trying out
various theories when i finally found the likely reason for the hang: if
acpi_ns_initialize_devices() executes an _INI AML method and an NMI
happens to hit that AML execution at the wrong moment, the machine
hangs.  (My theory is that this is some sort of chipset setup method
doing stores to chipset MMIO registers.)

Unfortunately, given the characteristics of the hang, it was all but
impossible to figure out which of the numerous AML methods is affected
by this problem.

As a workaround i wrote an interface to disable chipset-based NMIs while
executing _INI sections - and indeed this fixed the hang.  I did a
boot-loop of 100 separate reboots and none hung - while without the patch
it would hang every 5-10 attempts.  Out of caution i did not touch the
nmi_watchdog=2 case (it's not related to the chipset anyway and didn't
hang).
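
The shape of the workaround can be illustrated with a small user-space
sketch. This is not the code from the patch: every name below
(struct nmi_state, chipset_nmi_disable(), chipset_nmi_enable(),
run_ini_method(), sample_ini()) is hypothetical, and the NMI source is
modelled as a plain flag rather than real hardware state. It only shows
the bracketing idea: mask the chipset-driven watchdog NMI before an _INI
method runs, unmask it afterwards, and leave the nmi_watchdog=2 case
untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Watchdog modes, numbered like the nmi_watchdog= boot parameter. */
enum watchdog_mode { WD_NONE = 0, WD_IO_APIC = 1, WD_LOCAL_APIC = 2 };

/* Hypothetical model of the watchdog NMI source (a flag, not hardware). */
struct nmi_state {
    enum watchdog_mode mode;
    bool nmi_enabled;            /* true: watchdog NMIs may fire */
    int ini_runs_exposed_to_nmi; /* _INI runs that an NMI could have hit */
};

/* Mask the NMI source, but only for the chipset-driven mode (=1). */
static void chipset_nmi_disable(struct nmi_state *s)
{
    if (s->mode == WD_IO_APIC)
        s->nmi_enabled = false;
}

/* Unmask the NMI source again, under the same mode check. */
static void chipset_nmi_enable(struct nmi_state *s)
{
    if (s->mode == WD_IO_APIC)
        s->nmi_enabled = true;
}

typedef void (*ini_method_fn)(struct nmi_state *s);

/* Run one stand-in _INI method with the chipset NMI masked around it. */
static void run_ini_method(struct nmi_state *s, ini_method_fn method)
{
    assert(method != NULL);
    chipset_nmi_disable(s);
    method(s);
    chipset_nmi_enable(s);
}

/* Stand-in for an AML _INI method: records whether an NMI could hit it. */
static void sample_ini(struct nmi_state *s)
{
    if (s->nmi_enabled)
        s->ini_runs_exposed_to_nmi++;
}
```

Calling run_ini_method(&state, sample_ini) on a state in WD_IO_APIC mode
leaves the exposure counter at zero and the NMI flag re-enabled on
return; in WD_LOCAL_APIC mode the flag is never touched, mirroring the
decision above to leave nmi_watchdog=2 alone.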

I implemented this for both x86_64 and i686, tested the i686 laptop both
with nmi_watchdog=1 [which triggered the hangs] and nmi_watchdog=2, and
tested an Athlon64 box with the 64-bit kernel as well. Everything builds
and works with the patch applied.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-02-13 13:26:24 +01:00
..
acpi [PATCH] x86-64: Remove fastcall references in x86_64 code 2007-02-13 13:26:22 +01:00
cpufreq [CPUFREQ] select consistently 2006-12-22 22:45:41 -05:00
aperture.c [PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1 2006-09-26 10:52:41 +02:00
apic.c [PATCH] x86-64: Remove unused GET_APIC_VERSION call from clear_local_APIC 2006-12-07 02:14:11 +01:00
asm-offsets.c [CRYPTO] all: Pass tfm instead of ctx to algorithms 2006-06-26 17:34:39 +10:00
audit.c [PATCH] audit: AUDIT_PERM support 2006-09-11 13:32:30 -04:00
crash_dump.c
crash.c [PATCH] Kexec / Kdump: Unify elf note code 2006-12-07 08:39:46 -08:00
e820.c [PATCH] x86-64: Fix fake numa for x86_64 machines with big IO hole 2007-02-13 13:26:22 +01:00
early_printk.c [PATCH] x86_64: fix 'earlyprintk=...,keep' regression 2006-11-28 10:58:21 -08:00
early-quirks.c ACPICA: Remove duplicate table definitions (non-conflicting), cont 2007-02-02 21:14:29 -05:00
entry.S Remove stack unwinder for now 2006-12-15 08:47:51 -08:00
functionlist [NET]: make skb_release_data() static 2006-06-29 16:58:30 -07:00
genapic_cluster.c [PATCH] x86_64 irq: Allocate a vector across all cpus for genapic_flat. 2006-10-08 12:24:02 -07:00
genapic_flat.c [PATCH] x86-64: Put more than one cpu in TARGET_CPUS 2006-10-21 18:37:02 +02:00
genapic.c ACPICA: use new ACPI headers. 2007-02-02 21:14:28 -05:00
head64.c [PATCH] Dynamic kernel command-line: fixups 2007-02-12 09:48:39 -08:00
head.S [PATCH] x86-64: x86_64 - Fix FS/GS registers for VT execution 2007-02-13 13:26:24 +01:00
i387.c [PATCH] x86-64: use BUILD_BUG_ON in FPU code 2006-12-07 02:14:01 +01:00
i8259.c [PATCH] x86_64: interrupt array size should be aligned to NR_VECTORS 2006-12-07 02:14:12 +01:00
init_task.c [PATCH] nsproxy: move init_nsproxy into kernel/nsproxy.c 2006-10-02 07:57:20 -07:00
io_apic.c msi: Make MSI useable more architectures 2007-02-07 15:50:08 -08:00
ioport.c [PATCH] x86-64: Use constant instead of raw number in x86_64 ioperm.c 2007-02-13 13:26:22 +01:00
irq.c [PATCH] x86-64: Rate limit no irq handler messages 2006-12-07 02:14:09 +01:00
k8.c [PATCH] x86_64: Clean and enhance up K8 northbridge access code 2006-06-26 10:48:15 -07:00
kprobes.c [PATCH] kprobes: enable booster on the preemptible kernel 2006-12-07 08:39:38 -08:00
ldt.c
machine_kexec.c [PATCH] Avoid overwriting the current pgd (V4, x86_64) 2006-09-26 10:52:38 +02:00
Makefile [PATCH] x86: Refactor thermal throttle processing 2006-09-26 10:52:42 +02:00
mce_amd.c [PATCH] x86-64: Allow to run a program when a machine check event is detected 2007-02-13 13:26:23 +01:00
mce_intel.c [PATCH] x86: Add a cumulative thermal throttle event counter. 2006-09-26 10:52:42 +02:00
mce.c [PATCH] x86-64: Allow to run a program when a machine check event is detected 2007-02-13 13:26:23 +01:00
module.c [PATCH] Generic BUG for x86-64 2006-12-08 08:28:39 -08:00
mpparse.c ACPICA: use new ACPI headers. 2007-02-02 21:14:28 -05:00
nmi.c [PATCH] x86: fix laptop bootup hang in init_acpi() 2007-02-13 13:26:24 +01:00
pci-calgary.c [PATCH] x86-64: robustify bad_dma_address handling 2007-02-13 13:26:24 +01:00
pci-dma.c [PATCH] x86-64: improved iommu documentation 2007-02-13 13:26:21 +01:00
pci-gart.c [PATCH] x86-64: Fix off by one error in IOMMU boundary checking 2007-02-13 13:26:24 +01:00
pci-nommu.c [PATCH] remove superflous BUG_ON's in nommu and gart 2006-09-26 10:52:32 +02:00
pci-swiotlb.c [IA64] swiotlb cleanup 2007-02-05 18:51:25 -08:00
pmtimer.c [PATCH] make pmtmr_ioport __read_mostly 2006-06-26 09:58:21 -07:00
process.c [PATCH] sched: fix bad missed wakeups in the i386, x86_64, ia64, ACPI and APM idle code 2006-12-22 08:55:51 -08:00
ptrace.c [PATCH] x86-64: Check return value of putreg in PTRACE_SETREGS 2007-02-13 13:26:24 +01:00
reboot.c [PATCH] x86_64: Move export symbols to their C functions 2006-06-26 10:48:22 -07:00
relocate_kernel.S [PATCH] Avoid overwriting the current pgd (V4, x86_64) 2006-09-26 10:52:38 +02:00
setup64.c [PATCH] x86-64: Unexport __supported_pte_mask 2007-02-13 13:26:24 +01:00
setup.c [PATCH] x86-64: Don't reserve ROMs 2007-02-13 13:26:24 +01:00
signal.c [PATCH] Remove all traces of signal number conversion 2006-09-26 10:52:41 +02:00
smp.c Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6 2006-12-07 08:59:11 -08:00
smpboot.c Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6 2006-12-07 08:59:11 -08:00
stacktrace.c [PATCH] x86-64: do not always end the stack trace with ULONG_MAX 2007-02-13 13:26:21 +01:00
suspend_asm.S [PATCH] Change the name of pagedir_nosave 2006-09-26 08:49:01 -07:00
suspend.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
sys_x86_64.c [PATCH] namespaces: utsname: switch to using uts namespaces 2006-10-02 07:57:21 -07:00
syscall.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
tce.c Remove all inclusions of <linux/config.h> 2006-10-04 03:38:54 -04:00
time.c [PATCH] x86-64: - Ignore long SMI interrupts in clock calibration code - update 1 2007-02-13 13:26:24 +01:00
trampoline.S [PATCH] Fix gdt table size in trampoline.S 2006-09-26 10:52:32 +02:00
traps.c [PATCH] x86_64: Fix dump_trace() 2007-01-03 08:49:59 -08:00
vmlinux.lds.S [PATCH] disable init/initramfs.c: architectures 2007-02-11 10:51:25 -08:00
vsmp.c [PATCH] Fix build breakage with CONFIG_X86_VSMP 2006-10-12 12:25:27 -07:00
vsyscall.c [PATCH] sysctl: remove unused "context" param 2006-12-10 09:55:41 -08:00
x8664_ksyms.c [PATCH] x86-64: Remove fastcall references in x86_64 code 2007-02-13 13:26:22 +01:00