Some buggy systems can machine check when config space accesses
happen for some non existent devices. i386/x86-64 do some early
device scans that might trigger this. Allow pci=noearly to disable
this. Also when type 1 is disabling also don't do any early
accesses which are always type1.
This moves the pci= configuration parsing to be a early parameter.
I don't think this can break anything because it only changes
a single global that is only used by PCI.
Cc: gregkh@suse.de
Cc: Trammell Hudson <hudson@osresearch.net>
Signed-off-by: Andi Kleen <ak@suse.de>
This is useful on systems with broken PCI bus. Affects various
scans in x86-64 and i386's early ACPI quirk scan.
Cc: gregkh@suse.de
Cc: len.brown@intel.com
Cc: Trammell Hudson <hudson@osresearch.net>
Signed-off-by: Andi Kleen <ak@suse.de>
Current gcc generates calls not jumps to noreturn functions. When that happens the
return address can point to the next function, which confuses the unwinder.
This patch works around it by marking asynchronous exception
frames in contrast normal call frames in the unwind information. Then teach
the unwinder to decode this.
For normal call frames the unwinder now subtracts one from the address which avoids
this problem. The standard libgcc unwinder uses the same trick.
It doesn't include adjustment of the printed address (i.e. for the original
example, it'd still be kernel_math_error+0 that gets displayed, but the
unwinder wouldn't get confused anymore.
This only works with binutils 2.6.17+ and some versions of H.J.Lu's 2.6.16
unfortunately because earlier binutils don't support .cfi_signal_frame
[AK: added automatic detection of the new binutils and wrote description]
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Got it. i8259A_resume calls init_8259A(0) unconditionally, even if
auto_eoi has been set. Keep track of the current status and restore that
on resume. This fixes it for AMD64 and i386.
Signed-off-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Sometimes, bug reports come in where we've had an oops, and the
only record we have is what the reporter saw on screen shortly
before the system locked up completely. Unfortunatly, syslog
only prints lines beginning with KERN_EMERG to the console, so
some lines get lost.
An example of this can be seen at https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=203723
Some of this information isn't vital to diagnosis, but some parts
are useful, such as the tainted flag.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Needs earlier patch to split type 1 probing from use.
This patch should fix the x86 macs where type 1 PCI config space access
doesn't work, but MCFG does. They also don't have a usable e820 table
so the e820 sanity check failed.
Instead assume now that if type 1 doesn't work then MCFG must work
and don't do the e820 check.
Signed-off-by: Andi Kleen <ak@suse.de>
First probe if type1/2 accesses work, but then only initialize them at the end.
This is useful for a later patch that needs this information inbetween.
Signed-off-by: Andi Kleen <ak@suse.de>
kexec has been marked experimental for a year now and all
of the serious problems have been worked through. So it
is time (if not past time) to remove the experimental mark.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Add HPET(s) into resource map. This will allow for the HPET(s) to be
visibile within /proc/iomem.
Signed-off-by: Aaron Durbin <adurbin@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Add early i386 fault handlers with debug information for common faults.
Handles:
divide error
invalid opcode
protection fault
page fault
Also adds code to detect early recursive/multiple faults and halt the
system when they happen (taken from x86_64.)
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
We allow for the fact that the guest kernel may not run in ring 0. This
requires some abstraction in a few places when setting %cs or checking
privilege level (user vs kernel).
This is Chris' [RFC PATCH 15/33] move segment checks to subarch, except rather
than using #define USER_MODE_MASK which depends on a config option, we use
Zach's more flexible approach of assuming ring 3 == userspace. I also used
"get_kernel_rpl()" over "get_kernel_cs()" because I think it reads better in
the code...
1) Remove the hardcoded 3 and introduce #define SEGMENT_RPL_MASK 3 2) Add a
get_kernel_rpl() macro, and don't assume it's zero.
And:
Clean up of patch for letting kernel run other than ring 0:
a. Add some comments about the SEGMENT_IS_*_CODE() macros.
b. Add a USER_RPL macro. (Code was comparing a value to a mask
in some places and to the magic number 3 in other places.)
c. Add macros for table indicator field and use them.
d. Change the entry.S tests for LDT stack segment to use the macros
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Abstract sensitive instructions in assembler code, replacing them with macros
(which currently are #defined to the native versions). We use long names:
assembler is case-insensitive, so if something goes wrong and macros do not
expand, it would assemble anyway.
Resulting object files are exactly the same as before.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
kexec: Avoid overwriting the current pgd (V4, i386)
This patch upgrades the i386-specific kexec code to avoid overwriting the
current pgd. Overwriting the current pgd is bad when CONFIG_CRASH_DUMP is used
to start a secondary kernel that dumps the memory of the previous kernel.
The code introduces a new set of page tables. These tables are used to provide
an executable identity mapping without overwriting the current pgd.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
The EDD code would scan the command line as a fixed array, without
taking account of either whitespace, null-termination, the old
command-line protocol, late overrides early, or the fact that the
command line may not be reachable from INITSEG.
This should fix those problems, and enable us to use a longer command
line.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andi Kleen <ak@suse.de>
In i386's entry.S, FIX_STACK() needs annotation because it
replaces the stack pointer. And the rest of nmi() needs
annotation in order to compile with these new annotations.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
A kprobe executes IRET early and that could cause NMI recursion and stack
corruption.
Note: This problem was originally spotted and solved by Andi Kleen in the
x86_64 architecture. This patch is an adaption of his patch for i386.
AK: Merged with current code which was a bit different.
AK: Removed printk in nmi handler that shouldn't be there in the first time
AK: Added missing include.
AK: added KPROBES_END
Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
A kprobe executes IRET early and that could cause NMI recursion and stack
corruption.
Note: This problem was originally spotted by Andi Kleen. This patch
adds fixes not included in his original patch.
[AK: Jan Beulich originally discovered these classes of bugs]
Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
Mark i386-specific cpu cache functions as __cpuinit. They are all
only called from arch/i386/common.c:display_cache_info() that already is
marked as __cpuinit.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
Mark i386-specific cpu identification functions as __cpuinit. They are all
only called from arch/i386/common.c:identify_cpu() that already is marked as
__cpuinit.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
Mark i386-specific cpu init functions as __cpuinit. They are all
only called from arch/i386/common.c:identify_cpu() that already is marked as
__cpuinit. This patch also removes the empty function init_umc().
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
The different cpu_dev structures are all used from __cpuinit callers what
I can tell. So mark them as __cpuinitdata instead of __initdata. I am a
little bit unsure about arch/i386/common.c:default_cpu, especially when it
comes to the purpose of this_cpu.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
cpu_dev->c_identify is only called from arch/i386/common.c:identify_cpu(), and
this after generic_identify() already has been called. There is no need to call
this function twice and hook it in c_identify - but I may be wrong, please
double check before applying.
This patch also removes generic_identify() from cpu.h to avoid unnecessary
future nesting.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch enables ACPI based physical CPU hotplug support for x86_64.
Implements acpi_map_lsapic() and acpi_unmap_lsapic() to support physical cpu
hotplug.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
cyrix_identify() should be __init because transmeta_identify() is.
tsc_init() is only called from setup_arch() which is marked as __init.
These two section mismatches have been detected using running modpost on
a vmlinux image compiled with CONFIG_RELOCATABLE=y.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
The implementation comes from Zach's [RFC, PATCH 10/24] i386 Vmi
descriptor changes:
Descriptor and trap table cleanups. Add cleanly written accessors for
IDT and GDT gates so the subarch may override them. Note that this
allows the hypervisor to transparently tweak the DPL of the descriptors
as well as the RPL of segments in those descriptors, with no unnecessary
kernel code modification. It also allows the hypervisor implementation
of the VMI to tweak the gates, allowing for custom exception frames or
extra layers of indirection above the guest fault / IRQ handlers.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
And add proper CFI annotation to it which was previously
impossible. This prevents "stuck" messages by the dwarf2 unwinder
when reaching the top of a kernel stack.
Includes feedback from Jan Beulich
Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
It's needed for external debuggers and overhead is very small.
Also make the actual notifier chain they use static
Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Fix
linux/arch/i386/kernel/mpparse.c: In function #MP_bus_info#:
linux/arch/i386/kernel/mpparse.c:232: warning: comparison is always false due to limited range of data type
Signed-off-by: Andi Kleen <ak@suse.de>
Improve Kconfig description of CONFIG_CRASH_DUMP. Previously
it was too brief to be useful.
Cc: vgoyal@in.ibm.com
Cc: ebiederm@xmission.com
Signed-off-by: Andi Kleen <ak@suse.de>
This patch moves the entry.S:error_entry to .kprobes.text section,
since code marked unsafe for kprobes jumps directly to entry.S::error_entry,
that must be marked unsafe as well.
This patch also moves all the ".previous.text" asm directives to ".previous"
for kprobes section.
AK: Following a similar i386 patch from Chuck Ebbert
AK: Also merged Jeremy's fix in.
+From: Jeremy Fitzhardinge <jeremy@goop.org>
KPROBE_ENTRY does a .section .kprobes.text, and expects its users to
do a .previous at the end of the function.
Unfortunately, if any code within the function switches sections, for
example .fixup, then the .previous ends up putting all subsequent code
into .fixup. Worse, any subsequent .fixup code gets intermingled with
the code its supposed to be fixing (which is also in .fixup). It's
surprising this didn't cause more havok.
The fix is to use .pushsection/.popsection, so this stuff nests
properly. A further cleanup would be to get rid of all
.section/.previous pairs, since they're inherently fragile.
+From: Chuck Ebbert <76306.1226@compuserve.com>
Because code marked unsafe for kprobes jumps directly to
entry.S::error_code, that must be marked unsafe as well.
The easiest way to do that is to move the page fault entry
point to just before error_code and let it inherit the same
section.
Also moved all the ".previous" asm directives for kprobes
sections to column 1 and removed ".text" from them.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
We have a test that looks for invalid pairings of certain athlon/durons
that weren't designed for SMP, and taint accordingly (with 'S') if we find
such a configuration. However, this test shouldn't fire if there's only
a single CPU present. It's perfectly valid for an SMP kernel to boot on UP
hardware for example.
AK: changed to num_possible_cpus()
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Fix a very dubious piece of code in
arch/i386/kernel/cpu/common.c:cpu_init(). This clears out %fs and
%gs, but clobbers %eax in the process without telling gcc. It turns
out that gcc happens to be not using %eax at that point anyway so it
doesn't matter much, but it looks like a bomb waiting to go off.
This does end up saving an instruction, because gcc wants %eax==0 for
the set_debugreg()s below.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Following x86-64 patches. Reuses code from them in fact.
Convert the standard backtracer to do all output using
callbacks. Use the x86-64 stack tracer implementation
that uses these callbacks to implement the stacktrace interface.
This allows to use the new dwarf2 unwinder for stacktrace
and get better backtraces.
Cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
- Remove unused all_contexts parameter
No caller used it
- Move skip argument into the structure (needed for
followon patches)
Cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
is_at_popf() needs to test for the iret instruction as well as
popf. So add that test and rename it to is_setting_trap_flag().
Also change max insn length from 16 to 15 to match reality.
LAHF / SAHF can't affect TF, so the comment in x86_64 is removed.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Removes code duplication between i386/x86-64.
Not needed anymore in setup.c since early_param cleanup
Cc: len.brown@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
This patch replaces the open-coded early commandline parsing
throughout the i386 boot code with the generic mechanism (already used
by ppc, powerpc, ia64 and s390). The code was inconsistent with
whether it deletes the option from the cmdline or not, meaning some of
these will get passed through the environment into init.
This transformation is mainly mechanical, but there are some notable
parts:
1) Grammar: s/linux never set's it up/linux never sets it up/
2) Remove hacked-in earlyprintk= option scanning. When someone
actually implements CONFIG_EARLY_PRINTK, then they can use
early_param().
[AK: actually it is implemented, but I'm adding the early_param it in the next
x86-64 patch]
3) Move declaration of generic_apic_probe() from setup.c into asm/apic.h
4) Various parameters now moved into their appropriate files (thanks Andi).
5) All parse functions which examine arg need to check for NULL,
except one where it has subtle humor value.
AK: readded acpi_sci handling which was completely dropped
AK: moved some more variables into acpi/boot.c
Cc: len.brown@intel.com
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Lock sections don't work the new dwarf2 unwinder
This generates slightly smaller code. It adds one more taken
jump to the fast path.
Also move the trampolines into semaphore.S and add proper CFI
annotations.
Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Move initialization of all memory end variables to as early as
possible, so that dependent code doesn't need to check whether these
variables have already been set.
Change the range check in kunmap_atomic to actually make use of this
so that the no-mapping-estabished path (under CONFIG_DEBUG_HIGHMEM)
gets used only when the address is inside the lowmem area (and BUG()
otherwise).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
... instead of using a CONFIG option. The config option still controls
if the resulting executable actually has unwind information.
This is useful to prevent compilation errors when users select
CONFIG_STACK_UNWIND on old binutils and also allows to use
CFI in the future for non kernel debugging applications.
Cc: jbeulich@novell.com
Cc: sam@ravnborg.org
Signed-off-by: Andi Kleen <ak@suse.de>
Remove some unlinuxy ways to write function parameter definitions.
Remove some stray "return;"s
No functional change.
Cc: len.brown@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
Bugzilla #6552 says:
"In arch/i386/boot/setup.S, movw is used instead of movb for PS/2 mouse
information, although it is unsigned char. This does not harm, because
the jmp instruction overwritten by movw is used before executing movw,
and never be used again"
I've no idea if this is a real bug or how it gets fixed, so I'm submitting
it for review instead of letting it die of boredom in bugzilla. Aditionally
to i386, I've changed x86-64, which mirrors the same code.
Credits to Yoshinori K. Okuji, who found the problem and suggested a fix.
Signed-off-by: Diego Calleja <diegocg@gmail.com>
Signed-off-by: Andi Kleen <ak@suse.de>
The IO APIC code had lots of duplicated code to read/write 64bit
routing entries into the IO-APIC. Factor this out int common read/write
functions
In a few cases the IO APIC lock is taken more often now, but this
isn't a problem because it's all initialization/shutdown only
slow path code.
Similar to earlier x86-64 patch.
Includes a fix by Jiri Slaby for a mistake that broke resume
Signed-off-by: Andi Kleen <ak@suse.de>
- Move them to a pure assembly file. Previously they were in
a C file that only consisted of inline assembly. Doing it in pure
assembler is much nicer.
- Add a frame.i include with FRAME/ENDFRAME macros to easily
add frame pointers to assembly functions
- Add dwarf2 annotation to them so that the new dwarf2 unwinder
doesn't get stuck on them
- Random cleanups
Includes feedback from Jan Beulich and a UML build fix from Andrew
Morton.
Cc: jbeulich@novell.com
Cc: jdike@addtoit.com
Signed-off-by: Andi Kleen <ak@suse.de>
This ports the algorithm from x86-64 (with improvements) to i386.
Previously this only worked for frame pointer enabled kernels.
But spinlocks have a very simple stack frame that can be manually
analyzed. Do this.
Signed-off-by: Andi Kleen <ak@suse.de>
For NUMA optimization and some other algorithms it is useful to have a fast
to get the current CPU and node numbers in user space.
x86-64 added a fast way to do this in a vsyscall. This adds a generic
syscall for other architectures to make it a generic portable facility.
I expect some of them will also implement it as a faster vsyscall.
The cache is an optimization for the x86-64 vsyscall optimization. Since
what the syscall returns is an approximation anyways and user space
often wants very fast results it can be cached for some time. The norma
methods to get this information in user space are relatively slow
The vsyscall is in a better position to manage the cache because it has direct
access to a fast time stamp (jiffies). For the generic syscall optimization
it doesn't help much, but enforce a valid argument to keep programs
portable
I only added an i386 syscall entry for now. Other architectures can follow
as needed.
AK: Also added some cleanups from Andrew Morton
Signed-off-by: Andi Kleen <ak@suse.de>
AK: This redoes the changes I temporarily reverted.
Intel now has support for Architectural Performance Monitoring Counters
( Refer to IA-32 Intel Architecture Software Developer's Manual
http://www.intel.com/design/pentium4/manuals/253669.htm ). This
feature is present starting from Intel Core Duo and Intel Core Solo processors.
What this means is, the performance monitoring counters and some performance
monitoring events are now defined in an architectural way (using cpuid).
And there will be no need to check for family/model etc for these architectural
events.
Below is the patch to use this performance counters in nmi watchdog driver.
Patch handles both i386 and x86-64 kernels.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
I've had good experiences with having this on by default on x86-64.
It turns nasty hangs into easier to debug oopses.
Enable the local APIC wdog by default for systems newer than 2004.
This comes from a strange compromise: according to arjan the reason
it was off by default was some old IBM systems that corrupted
registered when NMI happened in SMI. Can't remember more specific,
but >= 2004 should avoid these. It's probably overly broad
because most older systems should be ok (and the really old systems
won't be supported by the local apic watchdog anyways)
Signed-off-by: Andi Kleen <ak@suse.de>
After a crash we should wait for NMI IPI event and not for external NMI or
NMI watchdog tick.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
This patch makes the following needlessly global functions static:
- nmi_int.c: profile_exceptions_notify()
- nmi_timer_int.c: profile_timer_exceptions_notify()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Making NMI suspend/resume work with SMP. We use CPU hotplug to offline
APs in SMP suspend/resume. Only BSP executes sysdev's .suspend/.resume
method. APs should follow CPU hotplug code path.
And:
+From: Don Zickus <dzickus@redhat.com>
Makes the start/stop paths of nmi watchdog more robust to handle the
suspend/resume cases more gracefully.
AK: I merged the two patches together
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Clean up some of the output messages on the nmi error paths to make more
sense when they are displayed. This is mainly a cosmetic fix and
shouldn't impact any normal code path.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
To quote Alan Cox:
The default Linux behaviour on an NMI of either memory or unknown is to
continue operation. For many environments such as scientific computing
it is preferable that the box is taken out and the error dealt with than
an uncorrected parity/ECC error get propogated.
A small number of systems do generate NMI's for bizarre random reasons
such as power management so the default is unchanged. In other respects
the new proc/sys entry works like the existing panic controls already in
that directory.
This is separate to the edac support - EDAC allows supported chipsets to
handle ECC errors well, this change allows unsupported cases to at least
panic rather than cause problems further down the line.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Adds a new /proc/sys/kernel/nmi_watchdog call that will enable/disable the
nmi watchdog.
By entering a non-zero value here, a user can enable the nmi watchdog to
monitor the online cpus in the system. By entering a zero value here, a
user can disable the nmi watchdog and free up a performance counter which
could then be utilized by the oprofile subsystem, otherwise oprofile may be
short a counter when in use.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Adds a new /proc/sys/kernel/nmi call that will enable/disable the nmi
watchdog.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Removes the un/set_nmi_callback and reserve/release_lapic_nmi functions as
they are no longer needed. The various subsystems are modified to register
with the die_notifier instead.
Also includes compile fixes by Andrew Morton.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch cleans up the NMI interrupt path. Instead of being gated by if
the 'nmi callback' is set, the interrupt handler now calls everyone who is
registered on the die_chain and additionally checks the nmi watchdog,
reseting it if enabled. This allows more subsystems to hook into the NMI if
they need to (without being block by set_nmi_callback).
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch includes the changes to make the nmi watchdog on i386 SMP aware.
A bunch of code was moved around to make it simpler to read. In addition,
it is now possible to determine if a particular NMI was the result of the
watchdog or not. This feature allows the kernel to filter out unknown NMIs
easier.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Incorporates the new performance counter reservation system in oprofile.
Also cleans up a lot of the initialization code. The code original zero'd
out every register associated with performance counters regardless if those
registers were used or not. This causes issues with the nmi watchdog.
Now oprofile tries to reserve registers and gives up if it can't get them.
Cc: levon@movementarian.org
Cc: oprofile-list@lists.sf.net
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Adds basic infrastructure to allow subsystems to reserve performance
counters on the x86 chips. Only UP kernels are supported in this patch to
make reviewing easier. The SMP portion makes a lot more changes.
Think of this as a locking mechanism where each bit represents a different
counter. In addition, each subsystem should also reserve an appropriate
event selection register that will correspond to the performance counter it
will be using (this is mainly neccessary for the Pentium 4 chips as they
break the 1:1 relationship to performance counters).
This will help prevent subsystems like oprofile from interfering with the
nmi watchdog.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
There are some machines around (large xSeries or Unisys ES7000) that
need physical IO-APIC destination mode to access all of their IO
devices. This currently doesn't work in UP kernels as used in
distribution installers.
This patch allows to compile even UP kernels as GENERICARCH which
allows to use physical or clustered APIC mode.
Signed-off-by: Andi Kleen <ak@suse.de>
If there is only 1 node in the system cpus should think they are apart of
some other node.
If cases where a real numa system boots the Flat numa option make sure the
cpus don't claim to be apart on a non-existent node.
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With CONFIG_PHYSICAL_START set to a non default values the i386
boot_ioremap code calculated its pte index wrong and users of boot_ioremap
have their areas incorrectly mapped (for me SRAT table not mapped during
early boot). This patch removes the addr < BOOT_PTE_PTRS constraint.
[ Keith says this is applicable to 2.6.16 and 2.6.17 as well ]
Signed-off-by: Keith Mannthey<kmannth@us.ibm.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: <stable@kernel.org>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] sw_any_bug_dmi_table can be used on resume, so it isn't initdata
[CPUFREQ] Fix some more CPU hotplug locking.
[CPUFREQ] Workaround for BIOS bug in software coordination of frequency
[CPUFREQ] Longhaul - Add voltage scaling to driver
[CPUFREQ] Fix sparse warning in ondemand
[CPUFREQ] make drivers/cpufreq/cpufreq_ondemand.c:powersave_bias_target() static
[CPUFREQ] Longhaul - Add ignore_latency option
[CPUFREQ] Longhaul - Disable arbiter
[CPUFREQ][2/2] ondemand: updated add powersave_bias tunable
[CPUFREQ][1/2] ondemand: updated tune for hardware coordination
[CPUFREQ] Fix typo.
sw_any_bug_dmi_table can be used on resume, so it isn't initdata.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Now that the tfm is passed directly to setkey instead of the ctx, we no
longer need to pass the &tfm->crt_flags pointer.
This patch also gets rid of a few unnecessary checks on the key length
for ciphers as the cipher layer guarantees that the key length is within
the bounds specified by the algorithm.
Rather than testing dia_setkey every time, this patch does it only once
during crypto_alloc_tfm. The redundant check from crypto_digest_setkey
is also removed.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This reverts commits 11012d419c and
40dd2d20f2, which allowed us to use the
MMIO accesses for PCI config cycles even without the area being marked
reserved in the e820 memory tables.
Those changes were needed for EFI-environment Intel macs, but broke some
newer Intel 965 boards, so for now it's better to revert to our old
2.6.17 behaviour and at least avoid introducing any new breakage.
Andi Kleen has a set of patches that work with both EFI and the broken
Intel 965 boards, which will be applied once they get wider testing.
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Edgar Hucek <hostmaster@ed-soft.at>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
(And reset it on new thread creation)
It turns out that eflags is important to save and restore not just
because of iopl, but due to the magic bits like the NT bit, which we
don't want leaking between different threads.
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Take default arch/*/kernel/audit.c to lib/, have those with special
needs (== biarch) define AUDIT_ARCH in their Kconfig.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Some buggy BIOSes do a "software any" kind of coordination without telling
about it to OS. So, when OS sets frequency on one CPU on these platforms,
it will also impact all the other logical CPUs that are in the same power
domain. Attached patch is a workaround for those buggy BIOSes.
Patch should be a noop on the normal non-buggy platforms.
Applies over previously sent acpi-cpufreq and software coordination
bug fix patch
Signed-off-by: Denis Sadykov <denis.m.sadykov@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Rename option "dont_scale_voltage" to "scale_voltage" because
don't will be default.
Use "pos" for calculating voltage. In this way driver don't need
to know mV value or low level value. Simply min U is one pos and
max U is second pos. All pos between these two are used.
Assume that min U is for min f and max U for max f. For frequency
between min and max calculate pos based on difference between
current frequency and min f.
Values in mobile VRM table changed to values from
C3-M datasheet.
Signed-off-by: Rafa³ Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
Apparently some systems export valid HPET addresses, but hpet_enable()
fails. Then when the HPET clocksource starts up, it only checks for a
valid HPET address, and the result is a system where time does not advance.
See http://bugme.osdl.org/show_bug.cgi?id=7062 for details.
This patch just makes sure we better check that the HPET is functional
before registering the HPET clocksource.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There was a bogus hunk from the genirq merge that essentially
broke stack switching for hard interrupts. Remove it since it isn't
needed.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As a replacement for the earlier removal of the e820 MCFG check
we blacklist the Intel SDV with the original BIOS bug that
motivated that check. On those machines don't use MMCONFIG.
This also adds a new pci=mmconf parameter to override the blacklist.
Cc: Greg KH <gregkh@suse.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The unwinder fallback logic still had potential for falling through to
the legacy stack trace code without printing an indication (at once
serving as a separator) of this.
Further, the stack pointer retrieval for the fallback should be as
restrictive as possible (in order to avoid having the legacy stack
tracer try to access invalid memory). The patch tightens that, but
this could certainly be further improved.
Also making the call_trace command line option now conditional upon
CONFIG_STACK_UNWIND (as it's meaningless otherwise).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
One open question: Should this added push perhaps be made conditional
upon CONFIG_STACK_UNWIND or CONFIG_UNWIND_INFO?
[AK: not needed, these are all very slow paths]
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The check for the MCFG table being reserved in the e820 map was originally
added to detect a broken BIOS in a preproduction Intel SDV. However it also
breaks the Apple x86 Macs, which can't supply this properly, but need
a working MCFG. With this patch they wouldn't use the MCFG and not work.
After some discussion I think it's best to remove the heuristic again.
It also failed on some other boxes (although it didn't cause much
problems there because old style port access for PCI config space
still works as fallback), but the preproduction SDVs can just use
pci=nommcfg. Supporting production machines properly is more
important.
Edgar Hucek did all the debugging work.
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Edgar Hucek <hostmaster@ed-soft.at>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When we select NUMA with i386, the system is only X86_NUMAQ or using ACPI.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ignore the return value of early_init_acpi(), as it can give false error
messages. If there is something really wrong, then register_driver will
fail cleanly with EINVAL later.
[ background: modprobe acpi-cpufreq on systems not capable of speed-scaling
started failing with 'invalid argument', where previously it would only
ever -ENODEV
I'm not 100% happy with the solution. It'd be better to handle
failure properly, but this is a low-impact change for 2.6.18
We can always revisit doing this better in .19 --davej.]
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On i386 PCI mmconfig forgets the bus number when setting the fallback_slots
bits which means fallback to conf1 only works for bus 0.
Signed-off-by: Daniel Ritz <daniel.ritz@gmx.ch>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
there was a change in 2.6.17 which affected the order in which the PCI
access methods are probed. this gives regressions on some machines with
broken BIOS. the problem is that PCBIOS sometimes reports last bus wrong,
leaving cardbus non-funcational. previously those system worked fine with
direct access.
The patch changes the PCI init code to have PCBIOS as last fallback, yet
the PCBIOS code still has to run first to set pcibios_last_bus to the value
reported by the BIOS. this is needed in case legacy PCI probing
(arch/i386/pci/legacy.c) is used to detect peer busses. using direct
access if available fixes the cardbus problems.
Signed-off-by: Daniel Ritz <daniel.ritz@gmx.ch>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
ACPI 3.0 appended a variable length UID string to the LAPIC structure
as part of support for > 256 processors. So the BAD_MADT_ENTRY() sanity
check can no longer compare for equality with a fixed structure length.
Signed-off-by: Alexey Y Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>