- Moved check for online cpu out of smp_prepare_cpu()
- Moved default declaration of smp_prepare_cpu() to kernel/cpu.c
- Removed lock_cpu_hotplug() from smp_prepare_cpu() to around it, since
its called from cpu_up() as well now.
- Removed clearing from cpu_present_map during cpu_offline as it breaks
using cpu_up() directly during a subsequent online operation.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Attached patch fixes invalid pointer arithmetic in DMI code to make onboard
device discovery working again.
akpm: bug has been present since dmi_find_device() was added in 2.6.14.
Affects ipmi only (I think) - the symptoms weren't described.
akpm: changed to use pointer arithmetic rather than open-coded sizeof.
Signed-off-by: Andrey Panin <pazke@donpac.ru>
Cc: Corey Minyard <minyard@acm.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
include/linux/platform.h contained nothing that was actually used except
the default_idle() prototype, and is therefore removed by this patch.
This patch does the following with the platform specific default_idle()
functions on different architectures:
- remove the unused function:
- parisc
- sparc64
- make the needlessly global function static:
- arm
- h8300
- m68k
- m68knommu
- s390
- v850
- x86_64
- add a prototype in asm/system.h:
- cris
- i386
- ia64
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Patrick Mochel <mochel@digitalimplant.org>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I screwed up this conversion - we should be iterating across online CPUs, not
possible ones.
Spotted by Joe Perches <joe@perches.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On Tue, 21 Feb 2006, Ivan Kokshaysky wrote:
> There are two bogus entries in the BIOS memory map table which are
> conflicting with a prefetchable memory range of the AGP bridge:
>
> BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
> BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
>
> 0000:00:02.0 PCI bridge: Silicon Integrated Systems [SiS] Virtual PCI-to-PCI bridge (AGP) (prog-if 00 [Normal decode])
> Flags: bus master, fast devsel, latency 0
> Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> I/O behind bridge: 0000c000-0000cfff
> Memory behind bridge: e7e00000-e7efffff
> Prefetchable memory behind bridge: fec00000-ffcfffff
> ^^^^^^^^^^^^^^^^^
Yes. However, it's pretty clear that the e820 entries are there for a
reason. Probably they are a hack by the BIOS maintainers to keep Windows
from stomping/moving that region, exactly because they want to keep the
bridge where it is (or, it's actually for the BIOS itself - the BIOS
tables are a horrid mess, and BIOS engineers are pretty hacky people:
they'll add random entries to make their own broken algorithms do the
"right thing").
> Starting from 2.6.13, kernel tries to resolve that sort of conflicts,
> so that prefetch window of the bridge and the framebuffer memory behind
> it get moved to 0x10000000.
I think we could (and probably should) solve this another way: consider
the ACPI "reserved regions" from the e820 map exactly the same way that we
do other ACPI hints - they should restrict _new_ allocations, but not
impact stuff we figure out on our own.
Basically, right now we assign _unassigned_ resources at "fs_initcall"
time. If we were to add in the e820 "reserved region" stuff before that
(but after we've done PCI discovery), we'd probably do the right thing.
Right now we do the e820 reserved regions very early indeed: we call
"register_memory()" from setup_arch(). We could move at least part of it
(the part that registers the resources) down a bit.
Here's a test-patch. I'm not saying we should absolutely do this, but it
might be interesting to try...
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: <bjk@luxsci.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When we stop allocating percpu memory for not-possible CPUs we must not touch
the percpu data for not-possible CPUs at all. The correct way of doing this
is to test cpu_possible() or to use for_each_cpu().
This patch is a kernel-wide sweep of all instances of NR_CPUS. I found very
few instances of this bug, if any. But the patch converts lots of open-coded
test to use the preferred helper macros.
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Christian Zankel <chris@zankel.net>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Jens Axboe <axboe@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Attempt to fix the problem wherein people's oops reports scroll off the screen
due to repeated oopsing or to oopses on other CPUs.
If this happens the user can reboot with the `pause_on_oops=<seconds>' option.
It will allow the first oopsing CPU to print an oops record just a single
time. Second oopsing attempts, or oopses on other CPUs will cause those CPUs
to enter a tight loop until the specified number of seconds have elapsed.
The patch implements the infrastructure generically in the expectation that
architectures other than x86 will find it useful.
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Consolidate all kernel bug printouts to begin with the "BUG: " string.
Makes it easier to find them in large bootup logs.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch from Pavel moves userland freeze signals handling into more logical
place. It now hits even with mysqld running.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Don't create "online" control file for BSP (i386/x86_64) since its
not removable.
We originally added this to support ppc64 if the kernel has support but
BIOS indicated no offline support, we just didnt create online files for
them.
We used the same method in ia64 as well, if we have a cpu taking platform
interrupts but cannot be removed if those interrupts cannot be re-targeted
to another cpu.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Checking APIC version instead of CPU family to determine XAPIC. Family 6
CPU could have xapic as well.
Signed-off-by: Shaohua Li<shaohua.li@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: "Seth, Rohit" <rohit.seth@intel.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
i386 has a small bug in the stack dump code where it prints an extra log
level code. Remove that and fix the alignment of normal stack dump
printout. Also remove some unnecessary printk() calls.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With cpu_gdt_descr having been converted to per-CPU data, the old object
(in head.S) no longer needs to reserve space for each CPU's instance. With
cpu_gdt_table not being used for CPU 0 anymore, it doesn't seem to need
page alignment (or if in fact there is a need for it to retain that
alignment, the whole object should go into .data.page_align).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
arch/i386/kernel/cpu/centaur.c: In function `centaur_mcr_insert':
arch/i386/kernel/cpu/centaur.c:33: warning: implicit declaration of function `mtrr_centaur_report_mcr'
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Document a limitation of vsyscall-sysenter, since patches to fix it have
been rejected.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>commit 76381fee7e
>Author: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
>Date: Thu Jun 23 00:08:46 2005 -0700
>
> [PATCH] xen: x86_64: use more usermode macro
>
> Make use of the user_mode macro where it's possible. This is useful for Xen
> because it will need only to redefine only the macro to a hypervisor call.
I am of the opinion that the above changeset is incomplete, i.e. it missed
converting some previous uses of user_mode to user_mode_vm. While most of
them could be considered just cosmetical, at least the one in die_nmi
doesn't appear to be.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
Cc: Zachary Amsden <zach@vmware.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Registering a callback handler through register_die_notifier() is obviously
primarily intended for use by modules. However, the way these currently
get called it is basically impossible for them to actually be used by
modules, as there is, on non-PAE configurationes, a good chance (the larger
the module, the better) for the system to crash as a result.
This is because the callback gets invoked
(a) in the page fault path before the top level page table propagation
gets carried out (hence a fault to propagate the top level page table
entry/entries mapping to module's code/data would nest infinitly) and
(b) in the NMI path, where nested faults must absolutely not happen,
since otherwise the IRET from the nested fault re-enables NMIs,
potentially resulting in nested NMI occurences.
Besides the modular aspect, similar problems would even arise for in-
kernel consumers of the API if they touched ioremap()ed or vmalloc()ed
memory inside their handlers.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The history is that -mm kernels do not work for me for a few months
already. The things started from crashing somewhere after starting init,
and for the last month - no boot at all, just "Uncompressing... OK,
booting kernel", and silence. Early console didn't work too. With the
latest releases this degraded into an infinite stream of the "Unknown
interrupt or fault" messages. So today my patience ran out and I started
to think how can I collect at least some info for the bug-report. Attached
is the patch that allows to gather some valueable debug info on the problem
by making an early console more useable. I can't properly test the patch,
as the kernel still doesn't boot, so I'll explain it in details in a hope
someone else can justify the intrusive changes.
arch_hooks.h: added prototypes for setup_early_printk() and early_printk().
setup.c: killed wrong setup_early_printk() prototype. Moved
setup_early_printk() a bit earlier, as it was not "early enough" to cover
the bug I was fighting with.
early_printk.c: made it to start printing from the bottom of the screen,
otherwise the messages interfere with the ones of the boot-loader, so you
can't read them.
Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Cc: Andi Kleen <ak@muc.de>
Cc: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allow signal handlers to set the RF bit in EFLAGS. This lets a simple
debugger using SIGTRAP skip one instruction after returning from a signal.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's no good reason for allowing ptrace to set the NT bit in EFLAGS, so
mask it off.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Merge a few printk calls in i386 traps.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ES7000 platform code clean up for compilation errors and a warning.
Ifdef'd the ACPI related parts in the ES7000 platform code. They were
causing compile errors in certain configuration (without ACPI defined). I
think this approach would be best (as opposed to Kconfig changes) since it
only touches the subarch...
Signed-off-by: <Natalie.Protasevich@unisys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When vendor-specific i386 initialization code is unavailable the kernel
falls back to a default CPU model name. Make that model name reflect the
CPU family instead of an internal vendor index.
Tested on Pentium II (family 6 model 5).
/proc/cpuinfo before:
model name : ff/05
after:
model name : 06/05
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Acked-by: "Seth, Rohit" <rohit.seth@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allow the x86 "sep" feature to be disabled at bootup. This forces use of the
int80 vsyscall. Mainly for testing or benchmarking the int80 vsyscall code.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Several places in arch/i386/kernel/cpu and kernel/cpu were using __devinit
when they should have been __cpuinit. Fixing that saves ~4K when
CONFIG_HOTPLUG && !CONFIG_HOTPLUG_CPU.
Noticed by Andrew Morton.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement SMP alternatives, i.e. switching at runtime between different
code versions for UP and SMP. The code can patch both SMP->UP and UP->SMP.
The UP->SMP case is useful for CPU hotplug.
With CONFIG_CPU_HOTPLUG enabled the code switches to UP at boot time and
when the number of CPUs goes down to 1, and switches to SMP when the number
of CPUs goes up to 2.
Without CONFIG_CPU_HOTPLUG or on non-SMP-capable systems the code is
patched once at boot time (if needed) and the tables are released
afterwards.
The changes in detail:
* The current alternatives bits are moved to a separate file,
the SMP alternatives code is added there.
* The patch adds some new elf sections to the kernel:
.smp_altinstructions
like .altinstructions, also contains a list
of alt_instr structs.
.smp_altinstr_replacement
like .altinstr_replacement, but also has some space to
save original instruction before replaving it.
.smp_locks
list of pointers to lock prefixes which can be nop'ed
out on UP.
The first two are used to replace more complex instruction
sequences such as spinlocks and semaphores. It would be possible
to deal with the lock prefixes with that as well, but by handling
them as special case the table sizes become much smaller.
* The sections are page-aligned and padded up to page size, so they
can be free if they are not needed.
* Splitted the code to release init pages to a separate function and
use it to release the elf sections if they are unused.
Signed-off-by: Gerd Hoffmann <kraxel@suse.de>
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Print stack backtraces in multiple columns, saving screen space. Number of
columns is configurable and defaults to one so behavior is
backwards-compatible.
Also removes the brackets around addresses when printing more
that one entry per line so they print as:
<address>
instead of:
[<address>]
This helps multiple entries fit better on one line.
Original idea by Dave Jones, taken from x86_64.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the inlining of the new vs old mmap system call common code. This
reduces the size of the resulting vmlinux for defconfig as follows:
mb@pc1:~/develop/git/linux-2.6$ size vmlinux.mmap*
text data bss dec hex filename
3303749 521524 186564 4011837 3d373d vmlinux.mmapinline
3303557 521524 186564 4011645 3d367d vmlinux.mmapnoinline
The new sys_mmap2() has also one function call overhead removed, now.
(probably it was already optimized to a jmp before, but anyway...)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When on_each_cpu() runs the callback on other CPUs, it runs with local
interrupts disabled. So we should run the function with local interrupts
disabled on this CPU, too.
And do the same for UP, so the callback is run in the same environment on both
UP and SMP. (strictly it should do preempt_disable() too, but I think
local_irq_disable is sufficiently equivalent).
Also uninlines on_each_cpu(). softirq.c was the most appropriate file I could
find, but it doesn't seem to justify creating a new file.
Oh, and fix up that comment over (under?) x86's smp_call_function(). It
drives me nuts.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This variable is rarely written to. Mark the variable accordingly.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
arch/i386/kernel/efi.c: In function `efi_call_phys_epilog': arch/i386/kernel/efi.c:118: warning: assignment makes integer from pointer without a cast
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
hi,
The motivation behind the patch below was to address messages in
/var/log/messages such as:
Jan 31 10:54:15 mets kernel: audit(:0): major=252 name_count=0: freeing
multiple contexts (1)
Jan 31 10:54:15 mets kernel: audit(:0): major=113 name_count=0: freeing
multiple contexts (2)
I can reproduce by running 'get-edid' from:
http://john.fremlin.de/programs/linux/read-edid/.
These messages come about in the log b/c the vm86 calls do not exit via
the normal system call exit paths and thus do not call
'audit_syscall_exit'. The next system call will then free the context for
itself and for the vm86 context, thus generating the above messages. This
patch addresses the issue by simply adding a call to 'audit_syscall_exit'
from the vm86 code.
Besides fixing the above error messages the patch also now allows vm86
system calls to become auditable. This is useful since strace does not
appear to properly record the return values from sys_vm86.
I think this patch is also a step in the right direction in terms of
cleaning up some core auditing code. If we can correct any other paths
that do not properly call the audit exit and entries points, then we can
also eliminate the notion of context chaining.
I've tested this patch by verifying that the log messages no longer
appear, and that the audit records for sys_vm86 appear to be correct.
Also, 'read_edid' produces itentical output.
thanks,
-Jason
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Bryce reported a bug wherein offlining CPU0 (on x86 box) and then
subsequently onlining it resulted in a lockup.
On x86, CPU0 is never offlined. The subsequent attempt to online CPU0
doesn't take that into account. It actually tries to bootup the already
booted CPU. Following patch fixes the problem (as acknowledged by Bryce).
Please consider for inclusion in 2.6.16.
Check if cpu is already online.
Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
lapic_shutdown() re-enables interrupts which is un-desirable for panic
case, so use local_irq_save() and local_irq_restore() to keep the irqs
disabled for kexec on panic case, and close a possible race window while
kdump shutdown as shown in this stack trace
-- BUG: spinlock lockup on CPU#1, bash/4396, c52781a0
[<c01c1870>] _raw_spin_lock+0xb7/0xd2
[<c029e148>] _spin_lock+0x6/0x8
[<c011b33f>] scheduler_tick+0xe7/0x328
[<c0128a7c>] update_process_times+0x51/0x5d
[<c0114592>] smp_apic_timer_interrupt+0x4f/0x58
[<c01141ff>] lapic_shutdown+0x76/0x7e
[<c0104d7c>] apic_timer_interrupt+0x1c/0x30
[<c01141ff>] lapic_shutdown+0x76/0x7e
[<c0116659>] machine_crash_shutdown+0x83/0xaa
[<c013cc36>] crash_kexec+0xc1/0xe3
[<c029e148>] _spin_lock+0x6/0x8
[<c013cc22>] crash_kexec+0xad/0xe3
[<c0215280>] __handle_sysrq+0x84/0xfd
[<c018d937>] write_sysrq_trigger+0x2c/0x35
[<c015e47b>] vfs_write+0xa2/0x13b
[<c015ea73>] sys_write+0x3b/0x64
[<c0103c69>] syscall_call+0x7/0xb
Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The current pcspkr code combines the device and driver registration.
This patch splits these, putting the device registration in the arch
specific code.
PowerPC and MIPS only have the pcspkr present sometimes.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
This seems to work for a short period of time, but when
used in conjunction with a userspace governor that changes
the frequency regularly, it's only a matter of time before
everything just locks up.
Signed-off-by: Dave Jones <davej@redhat.com>
Fix the code to disable freqs less than 2GHz in N60 errata.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
ATI chipsets tend to generate double timer interrupts for the local APIC
timer when both the 8254 and the IO-APIC timer pins are enabled. This is
because they route it to both and the result is anded together and the CPU
ends up processing it twice.
This patch changes check_timer to disable the 8254 routing for interrupt 0.
I think it would be safe on all chipsets actually (i tested it on a couple
and it worked everywhere) and Windows seems to do it in a similar way, but
to be conservative this patch only enables this mode on ATI (and adds
options to enable/disable too)
Ported over from a similar x86-64 change.
I reused the ACPI earlyquirk infrastructure for the ATI bridge check, but
tweaked it a bit to work even without ACPI.
Inspired by a patch from Chuck Ebbert, but redone.
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While testing kexec and kdump we hit problems where the new kernel would
freeze or instantly reboot. The easiest way to trigger it was to kexec a
kernel compiled for CONFIG_M586 on an athlon cpu. Compiling for CONFIG_MK7
instead would work fine.
The patch fixes a few problems with the kexec inline asm.
Signed-off-by: Chris Mason <mason@suse.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The x86_model calculation also applies for family 6. early_cpu_detect
does the right thing, but generic_identify misses.
Signed-off-by: Shaohua Li<shaohua.li@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: "Seth, Rohit" <rohit.seth@intel.com>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix i386 nmi_watchdog that does not meet watchdog timeout condition. It
does not hit die_nmi when it should be triggered, because the current
nmi_watchdog_tick in arch/i386/kernel/nmi.c never count up alert_counter
like this:
void nmi_watchdog_tick (struct pt_regs * regs) {
if (last_irq_sums[cpu] == sum) {
alert_counter[cpu]++; <- count up alert_counter, but
if (alert_counter[cpu] == 5*nmi_hz)
die_nmi(regs, "NMI Watchdog detected LOCKUP");
alert_counter[cpu] = 0; <- reset alert_counter
This patch changes it back to the previous and working version.
This was found and originally written by Kohta NAKASHIMA.
(akpm: also uninline write_watchdog_counter(), saving 184 byets)
Signed-off-by: GOTO Masanori <gotom@sanori.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes the kernel bootable again on ia32 EFI systems.
Signed-off-by: Edgar Hucek <hostmaster@ed-soft.at>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
i386 timer_resume is updating jiffies, not jiffies_64. It looks there is a
potential overflow problem. And jiffies_64 and wall_jiffies should be
protected by xtime_lock.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
powernow-k8: Let cpufreq driver handle affected CPUs
Let the cpufreq driver manage AMD Dual-Core CPUs being tied together.
Since cpufreq driver's affected CPUs data, cpufreq_policy->cpus, already
knows about which cores are tied together, powernow driver does not have
keep its internal data for every core. (even a pointer.. it will never
be called on) Telling cpufreq driver about cpu_core_map at init time is
sufficient.
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
This driver loops over 'num_online_cpus', but it doesn't account for holes
in the online map created by offlined cpus, and assumes that the cpu
numbers stay linear.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This reverts commit 13a229abc2.
Quoth Andi:
"After some consideration and feedback from various people it turns
out this wasn't that good an idea. It has some problems and needs
more work. Since it was only an optimization anyways it's best to
just back it out again for now."
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[description from AK]
This fixes booting in APIC mode on some ACER laptops. x86-64
did a similar change some time ago.
See http://bugzilla.kernel.org/show_bug.cgi?id=4700 for details
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Big Unisys systems have multiple clusters too, but they have an
synchronized TSC.
I'm using the SMBIOS to check for vendor == IBM.
Cc: Chris McDermott <lcm@us.ibm.com>
Cc: "Protasevich, Natalie" <Natalie.Protasevich@unisys.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When compiling a non-default subarch, topology.c is missing from the kernel
build. This causes builds with CONFIG_HOTPLUG_CPU to fail. In addition,
on Intel processors with cpuid level > 4, it causes intel_cacheinfo.c to
reference uninitialized data that should have been set up by the initcall
in topology.c which calls register_cpu. This causes a kernel panic on boot
on newer Intel processors. Moving topology.c to arch/i386/kernel fixes
both of these problems.
Thanks to Dan Hecht for finding and fixing this problem.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Dan Hecht <dhect@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Recent GDT changes broke the SMP boot sequence if the booting CPU is
numbered anything other than zero. There's also a subtle source of error
in that the boot time CPU now uses cpu_gdt_table (which is actually the GDT
for booting CPUs in head.S). This patch fixes both problems by making GDT
descriptors themselves allocated from a per_cpu area and switching to them
in cpu_init(), which now means that cpu_gdt_table is exclusively used for
booting CPUs again.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Matt Tolentino <metolent@snoqualmie.dp.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a problem seen on i686 machine with NX support where the instruction
could not be single stepped because of NX bit set on the memory pages
allocated by kprobes module. This patch provides allocation of instruction
solt so that the processor can execute the instruction from that location
similar to x86_64 architecture. Thanks to Bibo and Masami for testing this
patch.
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I'm seeing a kernel panic on an ES7000-600 when booting in virtual wire
mode. The panic happens because smp_read_mpc() is passed a physical
address, and it should be virtual. I tested the attached patch on the
ES7000-600 and on a 2 cpu Dell box, and saw no problems on either.
Signed-off-by: Dan Yeisley <dan.yeisley@unisys.com>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This matches the fix for a bug seen on x86-64. Test booted on old hardware
that had 32 byte cachelines to begin with.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
AMD SimNow!'s JIT doesn't like them at all in the guest. For distribution
installation it's easiest if it's a boot time option.
Also I moved the variable to a more appropiate place and make
it independent from sysctl
And marked __read_mostly which it is.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add some more gitignore files for i386 architecture. This files are
created during the build process of a i386 kernel.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This path isn't obvious. It looks as if the kernel will be taking three
args from the user stack, but it only takes one from there.
Signed-off-by: Albert Cahalan <acahalan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Lost a few hours debugging an early-bootup fault within printk itself,
which manifested itself as a hard to debug early hang.
This patch makes it much easier by printing out early faults via
early_printk(), which function is a lot simpler than a full printk, and
hence more likely to succeed in emergencies. (We do not recover from early
faults anyway, so there's no loss from not having these messages in the
normal printk buffer.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The *at patches introduced fstatat and, due to inusfficient research, I
used the newfstat functions generally as the guideline. The result is that
on 32-bit platforms we don't have all the information needed to implement
fstatat64.
This patch modifies the code to pass up 64-bit information if
__ARCH_WANT_STAT64 is defined. I renamed the syscall entry point to make
this clear. Other archs will continue to use the existing code. On x86-64
the compat code is implemented using a new sys32_ function. this is what
is done for the other stat syscalls as well.
This patch might break some other archs (those which define
__ARCH_WANT_STAT64 and which already wired up the syscall). Yet others
might need changes to accomodate the compatibility mode. I really don't
want to do that work because all this stat handling is a mess (more so in
glibc, but the kernel is also affected). It should be done by the arch
maintainers. I'll provide some stand-alone test shortly. Those who are
eager could compile glibc and run 'make check' (no installation needed).
The patch below has been tested on x86 and x86-64.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initialising cpu_possible_map to all-ones with CONFIG_HOTPLUG_CPU means that
a) All for_each_cpu() loops will iterate across all NR_CPUS CPUs, rather
than over possible ones. That can be quite expensive.
b) Soon we'll be allocating per-cpu areas only for possible CPUs. So with
CPU_MASK_ALL, we'll be wasting memory.
I also switched voyager over to not use CPU_MASK_ALL in the non-CPU-hotplug
case. Should be OK..
I note that parisc is also using CPU_MASK_ALL. Suggest that it stop doing
that.
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Paul Jackson <pj@sgi.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Zwane Mwaikambo <zwane@linuxpower.ca>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Registers system call for the i386 architecture.
Signed-off-by: Janak Desai <janak@us.ibm.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix wrong '!' in bad apic fix
I forgot to remove the ! when moving the code from x86-64 to i386 x86-64
tested !disable_apic, but of course for cpu_has_apic it shouldn't be
negated.
Credit goes to Jan Beulich for spotting it with eagle eyes.
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Show first field of kernel version in register dumps like x86_64 does.
Changes output from e.g.:
(2.6.16-rc1)
to:
(2.6.16-rc1 #12)
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
i386 CPU init code accesses freed init memory when booting a newly-started
processor after CPU hotplug. The cpu_devs array is searched to find the
vendor and it contains pointers to freed data.
Fix that by:
1. Zeroing entries for freed vendor data after bootup.
2. Changing Transmeta, NSC and UMC to all __init[data].
3. Printing a warning (once only) and setting this_cpu
to a safe default when the vendor is not found.
This does not change behavior for AMD systems. They were broken already
but no error was reported.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
dump_stack() on page allocation failure presently has an irritating habit
of shouting just "====" at everyone: please stop it.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
percpu_data blindly allocates bootmem memory to store NR_CPUS instances of
cpudata, instead of allocating memory only for possible cpus.
As a preparation for changing that, we need to convert various 0 -> NR_CPUS
loops to use for_each_cpu().
(The above only applies to users of asm-generic/percpu.h. powerpc has gone it
alone and is presently only allocating memory for present CPUs, so it's
currently corrupting memory).
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <axboe@suse.de>
Cc: Anton Blanchard <anton@samba.org>
Acked-by: William Irwin <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It's bad juju to touch the APIC when it hasn't been enabled.
I also moved ack_bad_irq for x86-64 out of line following i386.
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some broken BIOS's had processors disabled, but
same apic id as a valid processor. This causes
acpi_processor_start() to think this disabled
cpu is ok, and croak. So we dont record bad
apicid's anymore.
http://bugzilla.kernel.org/show_bug.cgi?id=5930
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Test for old_freq equals 0 to insure not to divide by 0:
______________________________________________
Check for not initialized freq on cpufreq changes
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
Avoid lost tick compensation early in boot before the TSCs are
synchronized. Currently timekeeping is enabled before the TSCs are
synchronized, thus when the TSCs are synched (reset to zero), it appears
that a number of lost ticks have occurred. This can cause premature expiry
of timers and in extreme cases can cause the soft lockup detection to fire.
This resolves issues reported by Andy Whitcroft as well as bug #5366
reported by Tim Mann.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ignore clock frequencies below 2Ghz for CPU's detected with N60 errata bug.
Signed-off-by: Ben Collins <bcollins@ubuntu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
This patch fixes the following compile error:
...
CC arch/i386/kernel/cpu/cpufreq/gx-suspmod.o
arch/i386/kernel/cpu/cpufreq/gx-suspmod.c: In function 'gx_detect_chipset':
arch/i386/kernel/cpu/cpufreq/gx-suspmod.c:193: error: implicit declaration of function 'pci_match_id'
arch/i386/kernel/cpu/cpufreq/gx-suspmod.c:193: warning: comparison between pointer and integer
make[3]: *** [arch/i386/kernel/cpu/cpufreq/gx-suspmod.o] Error 1
<-- snip -->
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Dave Jones <davej@redhat.com>
This is a subset of the bluesmoke project core code, stripped of the NMI work
which isn't ready to merge and some of the "interesting" proc functionality
that needs reworking or just has no place in kernel. It requires no core
kernel changes except the added scrub functions already posted.
The goal is to merge further functionality only after the core code is
accepted and proven in the base kernel, and only at the point the upstream
extras are really ready to merge.
From: doug thompson <norsk5@xmission.com>
This converts EDAC to sysfs and is the final chunk neccessary before EDAC
has a stable user space API and can be considered for submission into the
base kernel.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: doug thompson <norsk5@xmission.com>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the sys_pselect6() and sys_poll() calls to the i386 syscall table.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Handle TIF_RESTORE_SIGMASK as added by David Woodhouse's patch entitled:
[PATCH] 2/3 Add TIF_RESTORE_SIGMASK support for arch/powerpc
[PATCH] 3/3 Generic sys_rt_sigsuspend
It does the following:
(1) Declares TIF_RESTORE_SIGMASK for i386.
(2) Invokes it over to do_signal() when TIF_RESTORE_SIGMASK is set.
(3) Makes do_signal() support TIF_RESTORE_SIGMASK, using the signal mask saved
in current->saved_sigmask.
(4) Discards sys_rt_sigsuspend() from the arch, using the generic one instead.
(5) Makes sys_sigsuspend() save the signal mask and set TIF_RESTORE_SIGMASK
rather than attempting to fudge the return registers.
(6) Makes sys_sigsuspend() return -ERESTARTNOHAND rather than looping
intrinsically.
(7) Makes setup_frame(), setup_rt_frame() and handle_signal() return 0 or
-EFAULT rather than true/false to be consistent with the rest of the
kernel.
Due to the fact do_signal() is then only called from one place:
(8) Makes do_signal() no longer have a return value is it was just being
ignored; force_sig() takes care of this.
(9) Discards the old sigmask argument to do_signal() as it's no longer
necessary.
(10) Makes do_signal() static.
(11) Marks the second argument to do_notify_resume() as unused. The unused
argument should remain in the middle as the arguments are passed in as
registers, and the ordering is specific in entry.S
Given the way do_signal() is now no longer called from sys_{,rt_}sigsuspend(),
they no longer need access to the exception frame, and so can just take
arguments normally.
This patch depends on sys_rt_sigsuspend patch.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Wire up the x86 syscalls
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
cpufreq init can be called when a CPU is set online.
Need to make powernow-k8's initialisation functions __cpuinit to
prevents oopses when a CPU is off/onlined on a AMD system
Cc: trenn@suse.de
Cc: mark.langsdorf@amd.com
Cc: davej@redhat.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Recent changes caused part of stack traces from SysRq-T to print at
KERN_EMERG loglevel. Also, parts of stack dump during oops were failing to
print at that level when they should.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I tried to send the forcedeth maintainer an email, but it came back with:
"The mail address manfreds@colorfullife.com is not read anymore.
Please resent your mail to manfred@ instead of manfreds@."
This patch fixes this.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
)
From: Al Viro <viro@ftp.linux.org.uk>
task_pt_regs() needs the same offset-by-8 to match copy_thread()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
)
From: Ingo Molnar <mingo@elte.hu>
This is the latest version of the scheduler cache-hot-auto-tune patch.
The first problem was that detection time scaled with O(N^2), which is
unacceptable on larger SMP and NUMA systems. To solve this:
- I've added a 'domain distance' function, which is used to cache
measurement results. Each distance is only measured once. This means
that e.g. on NUMA distances of 0, 1 and 2 might be measured, on HT
distances 0 and 1, and on SMP distance 0 is measured. The code walks
the domain tree to determine the distance, so it automatically follows
whatever hierarchy an architecture sets up. This cuts down on the boot
time significantly and removes the O(N^2) limit. The only assumption
is that migration costs can be expressed as a function of domain
distance - this covers the overwhelming majority of existing systems,
and is a good guess even for more assymetric systems.
[ People hacking systems that have assymetries that break this
assumption (e.g. different CPU speeds) should experiment a bit with
the cpu_distance() function. Adding a ->migration_distance factor to
the domain structure would be one possible solution - but lets first
see the problem systems, if they exist at all. Lets not overdesign. ]
Another problem was that only a single cache-size was used for measuring
the cost of migration, and most architectures didnt set that variable
up. Furthermore, a single cache-size does not fit NUMA hierarchies with
L3 caches and does not fit HT setups, where different CPUs will often
have different 'effective cache sizes'. To solve this problem:
- Instead of relying on a single cache-size provided by the platform and
sticking to it, the code now auto-detects the 'effective migration
cost' between two measured CPUs, via iterating through a wide range of
cachesizes. The code searches for the maximum migration cost, which
occurs when the working set of the test-workload falls just below the
'effective cache size'. I.e. real-life optimized search is done for
the maximum migration cost, between two real CPUs.
This, amongst other things, has the positive effect hat if e.g. two
CPUs share a L2/L3 cache, a different (and accurate) migration cost
will be found than between two CPUs on the same system that dont share
any caches.
(The reliable measurement of migration costs is tricky - see the source
for details.)
Furthermore i've added various boot-time options to override/tune
migration behavior.
Firstly, there's a blanket override for autodetection:
migration_cost=1000,2000,3000
will override the depth 0/1/2 values with 1msec/2msec/3msec values.
Secondly, there's a global factor that can be used to increase (or
decrease) the autodetected values:
migration_factor=120
will increase the autodetected values by 20%. This option is useful to
tune things in a workload-dependent way - e.g. if a workload is
cache-insensitive then CPU utilization can be maximized by specifying
migration_factor=0.
I've tested the autodetection code quite extensively on x86, on 3
P3/Xeon/2MB, and the autodetected values look pretty good:
Dual Celeron (128K L2 cache):
---------------------
migration cost matrix (max_cache_size: 131072, cpu: 467 MHz):
---------------------
[00] [01]
[00]: - 1.7(1)
[01]: 1.7(1) -
---------------------
cacheflush times [2]: 0.0 (0) 1.7 (1784008)
---------------------
Here the slow memory subsystem dominates system performance, and even
though caches are small, the migration cost is 1.7 msecs.
Dual HT P4 (512K L2 cache):
---------------------
migration cost matrix (max_cache_size: 524288, cpu: 2379 MHz):
---------------------
[00] [01] [02] [03]
[00]: - 0.4(1) 0.0(0) 0.4(1)
[01]: 0.4(1) - 0.4(1) 0.0(0)
[02]: 0.0(0) 0.4(1) - 0.4(1)
[03]: 0.4(1) 0.0(0) 0.4(1) -
---------------------
cacheflush times [2]: 0.0 (33900) 0.4 (448514)
---------------------
Here it can be seen that there is no migration cost between two HT
siblings (CPU#0/2 and CPU#1/3 are separate physical CPUs). A fast memory
system makes inter-physical-CPU migration pretty cheap: 0.4 msecs.
8-way P3/Xeon [2MB L2 cache]:
---------------------
migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz):
---------------------
[00] [01] [02] [03] [04] [05] [06] [07]
[00]: - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[01]: 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[02]: 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[03]: 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[04]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1)
[05]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1)
[06]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1)
[07]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) -
---------------------
cacheflush times [2]: 0.0 (0) 19.2 (19281756)
---------------------
This one has huge caches and a relatively slow memory subsystem - so the
migration cost is 19 msecs.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: <wilder@us.ibm.com>
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The explicit and implicit calls to setup_early_printk() were passing
inconsistent arguments.
Signed-Off-By: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
early_cpu_detect only runs on the BP, but this code needs to run
on all CPUs.
Looks like a mismerge somewhere. Also add a warning comment.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>