The "Type 2" SMBIOS record that contains Board Name is not
strictly required and may be absent in the SMBIOS on some
platforms.
( Please note that Type 2 is not listed in Table 3 in Sec 6.2
("Required Structures and Data") of the SMBIOS v2.7
Specification. )
Use the Manufacturer Name (aka System Vendor) name.
Print Board Name only when it is present.
Before the fix:
(i) dmesg output: DMI: /ProLiant DL380 G6, BIOS P62 01/29/2011
(ii) oops output: Pid: 2170, comm: bash Not tainted 2.6.38-rc4+ #3 /ProLiant DL380 G6
After the fix:
(i) dmesg output: DMI: HP ProLiant DL380 G6, BIOS P62 01/29/2011
(ii) oops output: Pid: 2278, comm: bash Not tainted 2.6.38-rc4+ #4 HP ProLiant DL380 G6
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Reviewed-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: <stable@kernel.org> # .3x - good for debugging, please apply as far back as it applies cleanly
LKML-Reference: <20110214224423.2182.13929.sendpatchset@nchumbalkar.americas.hpqcorp.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
mp_find_ioapic() prints errors like:
ERROR: Unable to locate IOAPIC for GSI 13
if it can't find the IOAPIC that manages that specific GSI. I
see errors like that at every boot of a laptop that apparently
doesn't have any IOAPICs.
But if there are no IOAPICs it doesn't seem to be an error that
none can be found. A solution that gets rid of this message is
to directly return if nr_ioapics (still) is zero. (But keep
returning -1 in that case, so nothing breaks from this change.)
The call chain that generates this error is:
pnpacpi_allocated_resource()
case ACPI_RESOURCE_TYPE_IRQ:
pnpacpi_parse_allocated_irqresource()
acpi_get_override_irq()
mp_find_ioapic()
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit d518573de6 ("x86, amd: Normalize compute unit IDs on
multi-node processors") introduced compute unit normalization
but causes a compiler warning:
arch/x86/kernel/cpu/amd.c: In function 'amd_detect_cmp':
arch/x86/kernel/cpu/amd.c:268: warning: 'cores_per_cu' may be used uninitialized in this function
arch/x86/kernel/cpu/amd.c:268: note: 'cores_per_cu' was declared here
The compiler is right - initialize it with a proper value.
Also, fix up a comment while at it.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20110214171451.GB10076@kryptos.osrc.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some wall clock devices use MMIO based HW register, this new
function will give them a chance to do some initialization work
before their get/set_time service get called, which is usually
in early kernel boot phase.
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
One of the error printouts in generic_processor_info() prints out
the APIC version instead of the cpu index the warning text describes.
Move version validation down, after we get the right cpu index.
-v2: add comments about reason why we can have cpu=0 there.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D5240A9.4080703@kernel.org>
[ Cleaned up and made the BIOS bug printouts more consistent ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Emit warning when "mem=nopentium" is specified on any arch other
than x86_32 (the only that arch supports it).
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/553464
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
LKML-Reference: <1296783486-23033-2-git-send-email-kamal@canonical.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Avoid removing all of memory and panicing when "mem={invalid}"
is specified, e.g. mem=blahblah, mem=0, or mem=nopentium (on
platforms other than x86_32).
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/553464
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: <stable@kernel.org> # .3x: as far back as it applies
LKML-Reference: <1296783486-23033-1-git-send-email-kamal@canonical.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add up to 32 invalidate_interrupt handlers. How many handlers are
added depends on NUM_INVALIDATE_TLB_VECTORS. So if
NUM_INVALIDATE_TLB_VECTORS is smaller than 32, we reduce code
size.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
LKML-Reference: <1295232725.1949.708.camel@sli10-conroe>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We use it in non __cpuinit code now too so drop marker.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20110211171754.GA21047@aftab>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/x86/mm/numa_64.c
Merge reason: fix the conflict, update to latest -rc and pick up this
dependent fix from Yinghai:
e6d2e2b2b1: memblock: don't adjust size in memblock_find_base()
Signed-off-by: Ingo Molnar <mingo@elte.hu>
commit a3c08e5d(x86: Convert irq_chip access to new functions)
accidentally zapped desc = irq_to_desc(irq); in the vector loop.
So we lock some random irq descriptor.
Add it back.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@kernel.org> # .37
amd_nb_misc_ids[] can live in .rodata, and enable_pci_io_ecs()
can be moved into .cpuinit.text.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <4D525DDD0200007800030F07@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Just consolidating the common parts. Full unification would seem
straight forward, but it's not clear the necessary #ifdef-s would
be acceptable.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4D525D520200007800030EE9@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This complements commit:
47f19a0814: percpu: Remove the multi-page alignment facility
reverting one leftover of:
fe8e0c25ca: x86, 32-bit: Align percpu area and irq stacks to THREAD_SIZE
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4D525CE60200007800030EE5@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Additionally doing things conditionally upon smp_processor_id()
being zero is generally a bad idea, as this means CPU 0 cannot
be offlined and brought back online later again.
While there may be other places where this is done, I think adding
more of those should be avoided so that some day SMP can really
become "symmetrical".
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
LKML-Reference: <4D525C7E0200007800030EE1@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The different families have a different max size for the ucode patch,
adjust size checking to the family we're running on. Also, do not
vzalloc the max size of the ucode but only the actual size that is
passed on from the firmware loader.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Unify pr_* to use pr_fmt, shorten messages, correct type formatting.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Andreas Herrmann <Andreas.Herrmann3@amd.com>
collect_cpu_info_amd() clears its csig arg but this is done in the
microcode_core's collect_cpu_info() by clearing the embedding struct
ucode_cpu_info. Drop it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Andreas Herrmann <Andreas.Herrmann3@amd.com>
Do not copy the section header but look at it directly through the
pointer. Also, make it return a ptr to a ucode header directly
thus dropping a bunch of unneeded casts. Finally, simplify
generic_load_microcode(), while at it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Andreas Herrmann <Andreas.Herrmann3@amd.com>
There's no need to memcpy the ucode header in order to look at it only
in this function - use the original buffer instead. Also, fix return
type semantics by returning a negative value on error and a positive
otherwise.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Andreas Herrmann <Andreas.Herrmann3@amd.com>
When the ucode magic is wrong, for whatever reason, we don't release the
loaded firmware binary and its related resources. Make sure we do. Also,
fix function naming to fit this driver's convention and shorten variable
names.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Andreas Herrmann <Andreas.Herrmann3@amd.com>
When we encounter an error while initting the microcode driver on a CPU,
we must undo the previously added sysfs group.
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Andreas Herrmann <Andreas.Herrmann3@amd.com>
L3 Cache Partitioning allows selecting which of the 4 L3 subcaches can be used
for evictions by the L2 cache of each compute unit. By writing a 4-bit
hexadecimal mask into the the sysfs file
/sys/devices/system/cpu/cpuX/cache/index3/subcaches, the user can set the
enabled subcaches for a CPU.
The settings are directly read from and written to the hardware, so there is no
way to have contradicting settings for two CPUs belonging to the same compute
unit. Writing will always overwrite any previous setting for a compute unit.
Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: <Andreas.Herrmann3@amd.com>
LKML-Reference: <1297098639-431383-1-git-send-email-hans.rosenfeld@amd.com>
[ -v3: minor style fixes ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We reserve lowmem for the things that need it, like the ACPI
wakeup code, way early to guarantee availability. This happens
before we set up the proper pagetables, so set_memory_x() has no
effect.
Until we have a better solution, use an initcall to mark the
wakeup code executable.
Originally-by: Matthieu Castet <castet.matthieu@free.fr>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Matthias Hopf <mhopf@suse.de>
Cc: rjw@sisk.pl
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <4D4F8019.2090104@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-32: Make sure the stack is set up before we use it
x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platforms
x86, nx: Don't force pages RW when setting NX bits
Since checkin ebba638ae7 we call
verify_cpu even in 32-bit mode. Unfortunately, calling a function
means using the stack, and the stack pointer was not initialized in
the 32-bit setup code! This code initializes the stack pointer, and
simplifies the interface slightly since it is easier to rely on just a
pointer value rather than a descriptor; we need to have different
values for the segment register anyway.
This retains start_stack as a virtual address, even though a physical
address would be more convenient for 32 bits; the 64-bit code wants
the other way around...
Reported-by: Matthieu Castet <castet.matthieu@free.fr>
LKML-Reference: <4D41E86D.8060205@free.fr>
Tested-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Markus Kohn ran into a hard hang regression on an acer aspire
1310, when acpi is enabled. git bisect showed the following
commit as the bad one that introduced the boot regression.
commit d0af9eed5a
Author: Suresh Siddha <suresh.b.siddha@intel.com>
Date: Wed Aug 19 18:05:36 2009 -0700
x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init
Because of the UP configuration of that platform,
native_smp_prepare_cpus() bailed out (in smp_sanity_check())
before doing the set_mtrr_aps_delayed_init()
Further down the boot path, native_smp_cpus_done() will call the
delayed MTRR initialization for the AP's (mtrr_aps_init()) with
mtrr_aps_delayed_init not set. This resulted in the boot
processor reprogramming its MTRR's to the values seen during the
start of the OS boot. While this is not needed ideally, this
shouldn't have caused any side-effects. This is because the
reprogramming of MTRR's (set_mtrr_state() that gets called via
set_mtrr()) will check if the live register contents are
different from what is being asked to write and will do the actual
write only if they are different.
BP's mtrr state is read during the start of the OS boot and
typically nothing would have changed when we ask to reprogram it
on BP again because of the above scenario on an UP platform. So
on a normal UP platform no reprogramming of BP MTRR MSR's
happens and all is well.
However, on this platform, bios seems to be modifying the fixed
mtrr range registers between the start of OS boot and when we
double check the live registers for reprogramming BP MTRR
registers. And as the live registers are modified, we end up
reprogramming the MTRR's to the state seen during the start of
the OS boot.
During ACPI initialization, something in the bios (probably smi
handler?) don't like this fact and results in a hard lockup.
We didn't see this boot hang issue on this platform before the
commit d0af9eed5a, because only
the AP's (if any) will program its MTRR's to the value that BP
had at the start of the OS boot.
Fix this issue by checking mtrr_aps_delayed_init before
continuing further in the mtrr_aps_init(). Now, only AP's (if
any) will program its MTRR's to the BP values during boot.
Addresses https://bugzilla.novell.com/show_bug.cgi?id=623393
[ By the way, this behavior of the bios modifying MTRR's after the start
of the OS boot is not common and the kernel is not prepared to
handle this situation well. Irrespective of this issue, during
suspend/resume, linux kernel will try to reprogram the BP's MTRR values
to the values seen during the start of the OS boot. So suspend/resume might
be already broken on this platform for all linux kernel versions. ]
Reported-and-bisected-by: Markus Kohn <jabber@gmx.org>
Tested-by: Markus Kohn <jabber@gmx.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Thomas Renninger <trenn@novell.com>
Cc: Rafael Wysocki <rjw@novell.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: stable@kernel.org # [v2.6.32+]
LKML-Reference: <1296694975.4418.402.camel@sbsiddha-MOBL3.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds the clock_adjtime system call to the x86 architecture.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Acked-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20110201134419.968905083@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Commit 4c321ff8 (x86: Replace cpu_2_logical_apicid[] with early
percpu variable) and following changes introduced and used
x86_cpu_to_logical_apicid percpu variable. It was declared and
defined inside CONFIG_SMP && CONFIG_X86_32 but if
CONFIG_X86_UP_APIC is set UP configuration makes use of it and
build fails.
Fix it by declaring and defining it inside CONFIG_X86_LOCAL_APIC
&& CONFIG_X86_32.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <20110128162248.GA25746@htj.dyndns.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that everything else is unified, NUMA initialization can be
unified too.
* numa_init_array() and init_cpu_to_node() are moved from
numa_64 to numa.
* numa_32::initmem_init() is updated to call numa_init_array()
and setup_arch() to call init_cpu_to_node() on 32bit too.
* x86_cpu_to_node_map is now initialized to NUMA_NO_NODE on
32bit too. This is safe now as numa_init_array() will initialize
it early during boot.
This makes NUMA mapping fully initialized before
setup_per_cpu_areas() on 32bit too and thus makes the first
percpu chunk which contains all the static variables and some of
dynamic area allocated with NUMA affinity correctly considered.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-17-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
x86_32 has been managing node_to_cpumask_map explicitly from
map_cpu_to_node() and friends in a rather ugly way. With
previous changes, it's now possible to share the code with
64bit.
* When CONFIG_NUMA_EMU is disabled, numa_add/remove_cpu() are
implemented in numa.c and shared by 32 and 64bit. CONFIG_NUMA_EMU
versions still live in numa_64.c.
NUMA_EMU's dependency on 64bit is planned to be removed and the
above should go away together.
* identify_cpu() now calls numa_add_cpu() for 32bit too. This
makes the explicit mask management from map_cpu_to_node() unnecessary.
* The whole x86_32 specific map_cpu_to_node() chunk is no longer
necessary. Dropped.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-16-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: David Rientjes <rientjes@google.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Unlike 64bit, 32bit has been using its own cpu_to_node_map[] for
CPU -> NUMA node mapping. Replace it with early_percpu variable
x86_cpu_to_node_map and share the mapping code with 64bit.
* USE_PERCPU_NUMA_NODE_ID is now enabled for 32bit too.
* x86_cpu_to_node_map and numa_set/clear_node() are moved from
numa_64 to numa. For now, on 32bit, x86_cpu_to_node_map is initialized
with 0 instead of NUMA_NO_NODE. This is to avoid introducing unexpected
behavior change and will be updated once init path is unified.
* srat_detect_node() is now enabled for x86_32 too. It calls
numa_set_node() and initializes the mapping making explicit
cpu_to_node_map[] updates from map/unmap_cpu_to_node() unnecessary.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-15-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: David Rientjes <rientjes@google.com>
The mapping between cpu/apicid and node is done via
apicid_to_node[] on 64bit and apicid_2_node[] +
apic->x86_32_numa_cpu_node() on 32bit. This difference makes it
difficult to further unify 32 and 64bit NUMA handling.
This patch unifies it by replacing both apicid_to_node[] and
apicid_2_node[] with __apicid_to_node[] array, which is accessed
by two accessors - set_apicid_to_node() and numa_cpu_node(). On
64bit, numa_cpu_node() always consults __apicid_to_node[]
directly while 32bit goes through apic->numa_cpu_node() method
to allow apic implementations to override it.
srat_detect_node() for amd cpus contains workaround for broken
NUMA configuration which assumes relationship between APIC ID,
HT node ID and NUMA topology. Leave it to access
__apicid_to_node[] directly as mapping through CPU might result
in undesirable behavior change. The comment is reformatted and
updated to note the ugliness.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-14-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: David Rientjes <rientjes@google.com>
apic->apicid_to_node() is 32bit specific apic operation which
determines NUMA node for a CPU. Depending on the APIC
implementation, it can be easier to determine NUMA node from
either physical or logical apicid. Currently,
->apicid_to_node() takes @logical_apicid and calls
hard_smp_processor_id() if the physical apicid is needed.
This prevents NUMA mapping from being queried from a different
CPU, which in turn makes it impossible to initialize NUMA
mapping before SMP bringup.
This patch replaces apic->apicid_to_node() with
->x86_32_numa_cpu_node() which takes @cpu, from which both
logical and physical apicids can easily be determined. While at
it, drop duplicate implementations from bigsmp_32 and summit_32,
and use the default one.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-13-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On x86_32, the mapping between cpu and logical apic ID differs
depending on the specific apic implementation in use. The
mapping is initialized while bringing up CPUs; however, this
makes early inits ignore memory topology.
Add a x86_32 specific apic->x86_32_early_logical_apicid() which
is called early during boot to query the mapping. The mapping
is later verified against the result of init_apic_ldr(). The
method is allowed to return BAD_APICID if it can't be determined
early.
noop variant which always returns BAD_APICID is implemented and
added to all x86_32 apic implementations.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-8-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After the previous patch, apic->cpu_to_logical_apicid() is no
longer used. Kill it.
For apic types with custom cpu_to_logical_apicid() which is also
used for other purposes, remove the function and modify its
users to do the mapping directly.
#ifdef's on CONFIG_SMP in es7000_32 and summit_32 are ignored
during conversion as they are not used for UP kernels.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-7-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, cpu -> logical apic id translation is done by
apic->cpu_to_logical_apicid() callback which may or may not use
x86_cpu_to_logical_apicid. This is unnecessary as it should
always equal logical_smp_processor_id() which is known early
during CPU bring up.
Initialize x86_cpu_to_logical_apicid after apic->init_apic_ldr()
in setup_local_APIC() and always use x86_cpu_to_logical_apicid
for cpu -> logical apic id mapping.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-6-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Unlike x86_64, on x86_32, the mapping from cpu to logical apicid
may vary depending on apic in use. cpu_2_logical_apicid[] array
is used for this mapping. Replace it with early percpu variable
x86_cpu_to_logical_apicid to make it better aligned with other
mappings.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: yinghai@kernel.org
Cc: brgerst@gmail.com
Cc: gorcunov@gmail.com
Cc: penberg@kernel.org
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-5-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Both functions are used only in 32bit. Put them inside
CONFIG_X86_32. This is to prepare for logical apicid handling
update.
- Cyrill Gorcunov spotted that I forgot to move declarations in
ipi.h under CONFIG_X86_32. Fixed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: eric.dumazet@gmail.com
Cc: brgerst@gmail.com
Cc: shaohui.zheng@intel.com
Cc: rientjes@google.com
LKML-Reference: <1295789862-25482-4-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
init_hw_perf_events() is called via early_initcall now.
x86_pmu_event_init is x86_pmu member function.
So we can change them to static.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
LKML-Reference: <4D3A16F9.109@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch fixes some issues with raw event validation on
Pentium 4 (Netburst) based processors.
As I was testing libpfm4 Netburst support, I ran into two
problems in the p4_validate_raw_event() function:
- the shared field must be checked ONLY when HT is on
- the binding to ESCR register was missing
The second item was causing raw events to not be encoded
correctly compared to generic PMU events.
With this patch, I can now pass Netburst events to libpfm4
examples and get meaningful results:
$ task -e global_power_events🏃u noploop 1
noploop for 1 seconds
3,206,304,898 global_power_events:running
Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: peterz@infradead.org
Cc: paulus@samba.org
Cc: davem@davemloft.net
Cc: fweisbec@gmail.com
Cc: perfmon2-devel@lists.sf.net
Cc: eranian@gmail.com
Cc: robert.richter@amd.com
Cc: acme@redhat.com
Cc: gorcunov@gmail.com
Cc: ming.m.lin@intel.com
LKML-Reference: <4d3efb2f.1252d80a.1a80.ffffc83f@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
cpu_info is already with per_cpu, We can take llc_shared_map out
of cpu_info, and declare it as per_cpu variable directly.
So later referencing could be simple and directly instead of
diving to find cpu_info at first.
Also could make smp_store_cpu_info() much simple to avoid to do
save and restore trick.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Alok N Kataria <akataria@vmware.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Hans J. Koch <hjk@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <4D3A16E8.5020608@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
"Link Control" devices (NB function 4) will be used by L3 cache
partitioning on family 0x15.
Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: <andreas.herrmann3@amd.com>
LKML-Reference: <1295881543-572552-4-git-send-email-hans.rosenfeld@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
AMD family 0x15 CPUs support L3 cache index disable, so enable
it on them.
Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: <andreas.herrmann3@amd.com>
LKML-Reference: <1295881543-572552-3-git-send-email-hans.rosenfeld@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On multi-node CPUs we don't need the socket wide compute unit ID
but the node-wide compute unit ID. Thus we need to normalize the
value. This is similar to what we do with cpu_core_id.
A compute unit is then identified by physical_package_id,
node_id, and compute_unit_id.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <1295881543-572552-2-git-send-email-hans.rosenfeld@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
memmove_64.c only implements memmove() function which is completely written in
inline assembly code. Therefore it doesn't make sense to keep the assembly code
in .c file.
Currently memmove() doesn't store return value to rax. This may cause issue if
caller uses the return value. The patch fixes this issue.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
LKML-Reference: <1295314755-6625-1-git-send-email-fenghua.yu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Currently percpu readmostly subsection may share cachelines with other
percpu subsections which may result in unnecessary cacheline bounce
and performance degradation.
This patch adds @cacheline parameter to PERCPU() and PERCPU_VADDR()
linker macros, makes each arch linker scripts specify its cacheline
size and use it to align percpu subsections.
This is based on Shaohua's x86 only patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Shaohua Li <shaohua.li@intel.com>
In arch/x86/kernel/dumpstack_64.c::dump_trace() we have this code:
...
if (!stack) {
unsigned long dummy;
stack = &dummy;
if (task && task != current)
stack = (unsigned long *)task->thread.sp;
}
bp = stack_frame(task, regs);
/*
* Print function call entries in all stacks, starting at the
* current stack address. If the stacks consist of nested
* exceptions
*/
tinfo = task_thread_info(task);
for (;;) {
char *id;
unsigned long *estack_end;
estack_end = in_exception_stack(cpu, (unsigned long)stack,
&used, &id);
...
You'll notice that we assign to 'stack' the address of the variable
'dummy' which is only in-scope inside the 'if (!stack)'. So when we later
access stack (at the end of the above, and assuming we did not take the
'if (task && task != current)' branch) we'll be using the address of a
variable that is no longer in scope. I believe this patch is the proper
fix, but I freely admit that I'm not 100% certain.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
LKML-Reference: <alpine.LNX.2.00.1101242232590.10252@swampdragon.chaosbits.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
ea53069231 made a CPU use monitor/mwait
when offline. This is not the optimal choice for AMD wrt to powersavings
and we'd prefer our cores to halt (i.e. enter C1) instead. For this, the
same selection whether to use monitor/mwait has to be used as when we
select the idle routine for the machine.
With this patch, offlining cores 1-5 on a X6 machine allows core0 to
boost again.
[ hpa: putting this in urgent since it is a (power) regression fix ]
Reported-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: stable@kernel.org # 37.x
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.hl>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1295534572-10730-1-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
In therm_throt.c, commit
9e76a97efd patch doesn't export
the symbol platform_thermal_notify.
Other drivers (e.g. drivers/hwmon/coretemp.c) can not find the
symbol platform_thermal_notify when defining threshould
interrupt handler.
Please apply this patch to allow threshold interrupt handler in
coretemp.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: R Durgadoss <durgadoss.r@intel.com>
Cc: khali@linux-fr.org <khali@linux-fr.org>
Cc: lm-sensors@lm-sensors.org <lm-sensors@lm-sensors.org>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
LKML-Reference: <20110121041239.GB26954@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Update to latest definitions in:
http://www.intel.com/Assets/PDF/appnote/241618.pdf
[ Note, this update of the doc has removed some old values which
we have listed. I think until we have clarification that they
were never used in production, they should be left there. ]
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
LKML-Reference: <20110120012055.GA15985@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit 86b1e8dd83 ("x86: Make relocatable kernel work with
new binutils").
Markus Trippelsdorf reported a boot failure caused by this patch.
The real solution to the original patch will likely involve an
arch-generic solution to define an overlaid jiffies_64 and jiffies
variables.
Until that's done and tested on all architectures revert this commit to
solve the regression.
Reported-and-bisected-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: "Lu, Hongjiu" <hongjiu.lu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <4D36A759.60704@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Clear irqstack thread_info
x86: Make relocatable kernel work with new binutils
Mathias Merz reported that v2.6.37 failed to boot on his
system.
Make sure that the thread_info part of the irqstack is
initialized to zeroes.
Reported-and-Tested-by: Matthias Merz <linux@merz-ka.de>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <AANLkTimyKXfJ1x8tgwrr1hYnNLrPfgE1NTe4z7L6tUDm@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The CONFIG_RELOCATABLE=y option is broken with new binutils, which will make
boot panic.
According to Lu Hongjiu, the affected binutils are from 2.20.51.0.12 to
2.21.51.0.3, which are release since Oct 22 this year. At least ubuntu 10.10 is
using such binutils. See:
http://sourceware.org/bugzilla/show_bug.cgi?id=12327
The reason of the boot panic is that we have 'jiffies = jiffies_64;' in
vmlinux.lds.S. The jiffies isn't in any section. In kernel build, there is
warning saying jiffies is an absolute address and can't be relocatable. At
runtime, jiffies will have virtual address 0.
Signed-off-by: Shaohua Li<shaohua.li@intel.com>
Cc: Lu Hongjiu<hongjiu.lu@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <1295312269.1949.725.camel@sli10-conroe>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcu: avoid pointless blocked-task warnings
rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status
rtmutex: Fix comment about why new_owner can be NULL in wake_futex_pi()
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, olpc: Add missing Kconfig dependencies
x86, mrst: Set correct APB timer IRQ affinity for secondary cpu
x86: tsc: Fix calibration refinement conditionals to avoid divide by zero
x86, ia64, acpi: Clean up x86-ism in drivers/acpi/numa.c
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
timekeeping: Make local variables static
time: Rename misnamed minsec argument of clocks_calc_mult_shift()
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: Remove syscall_exit_fields
tracing: Only process module tracepoints once
perf record: Add "nodelay" mode, disabled by default
perf sched: Fix list of events, dropping unsupported ':r' modifier
Revert "perf tools: Emit clearer message for sys_perf_event_open ENOENT return"
perf top: Fix annotate segv
perf evsel: Fix order of event list deletion
Offlining the secondary CPU causes the timer irq affinity to be set to
CPU 0. When the secondary CPU is back online again, the wrong irq
affinity will be used.
This patch ensures secondary per CPU timer always has the correct
IRQ affinity when enabled.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
LKML-Reference: <1294963604-18111-1-git-send-email-jacob.jun.pan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@kernel.org> 2.6.37
Konrad Wilk reported that the new delayed calibration crashes with a
divide by zero on Xen. The reason is that Xen sets the pmtimer
address, but reading from it returns 0xffffff. That results in the
ref_start and ref_stop value being the same, so the delta is zero
which causes the divide by zero later in the calculation.
The conditional (!hpet && !ref_start && !ref_stop) which sanity checks
the calibration reference values doesn't really make sense. If the
refs are null, but hpet is on, we still want to break out.
The div by zero would be possible to trigger by chance if both reads
from the hardware provided the exact same value (due to hardware
wrapping).
So checking if both the ref values are the same should handle if we
don't have hardware (both null) or if they are the same value (either by
invalid hardware, or by chance), avoiding the div by zero issue.
[ tglx: Applied the same fix to native_calibrate_tsc() where this
check was copied from ]
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1295024788-15619-1-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (59 commits)
ACPI / PM: Fix build problems for !CONFIG_ACPI related to NVS rework
ACPI: fix resource check message
ACPI / Battery: Update information on info notification and resume
ACPI: Drop device flag wake_capable
ACPI: Always check if _PRW is present before trying to evaluate it
ACPI / PM: Check status of power resources under mutexes
ACPI / PM: Rename acpi_power_off_device()
ACPI / PM: Drop acpi_power_nocheck
ACPI / PM: Drop acpi_bus_get_power()
Platform / x86: Make fujitsu_laptop use acpi_bus_update_power()
ACPI / Fan: Rework the handling of power resources
ACPI / PM: Register power resource devices as soon as they are needed
ACPI / PM: Register acpi_power_driver early
ACPI / PM: Add function for updating device power state consistently
ACPI / PM: Add function for device power state initialization
ACPI / PM: Introduce __acpi_bus_get_power()
ACPI / PM: Introduce function for refcounting device power resources
ACPI / PM: Add functions for manipulating lists of power resources
ACPI / PM: Prevent acpi_power_get_inferred_state() from making changes
ACPICA: Update version to 20101209
...
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle events from the cpuidle layer
intel_idle: open broadcast clock event
cpuidle: CPUIDLE_FLAG_CHECK_BM is omap3_idle specific
cpuidle: CPUIDLE_FLAG_TLB_FLUSHED is specific to intel_idle
cpuidle: delete unused CPUIDLE_FLAG_SHALLOW, BALANCED, DEEP definitions
SH, cpuidle: delete use of NOP CPUIDLE_FLAGS_SHALLOW
cpuidle: delete NOP CPUIDLE_FLAG_POLL
ACPI: processor_idle: delete use of NOP CPUIDLE_FLAGs
cpuidle: Rename X86 specific idle poll state[0] from C0 to POLL
ACPI, intel_idle: Cleanup idle= internal variables
cpuidle: Make cpuidle_enable_device() call poll_idle_init()
intel_idle: update Sandy Bridge core C-state residency targets
split_huge_page_pmd compat code. Each one of those would need to be
expanded to hundred of lines of complex code without a fully reliable
split_huge_page_pmd design.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pte alloc routines must wait for split_huge_page if the pmd is not present
and not null (i.e. pmd_trans_splitting). The additional branches are
optimized away at compile time by pmd_trans_splitting if the config option
is off. However we must pass the vma down in order to know the anon_vma
lock to wait for.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paravirt ops pmd_update/pmd_update_defer/pmd_set_at. Not all might be
necessary (vmware needs pmd_update, Xen needs set_pmd_at, nobody needs
pmd_update_defer), but this is to keep full simmetry with pte paravirt
ops, which looks cleaner and simpler from a common code POV.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Four architectures (arm, mips, sparc, x86) use __vmalloc_area() for
module_init(). Much of the code is duplicated and can be generalized in a
globally accessible function, __vmalloc_node_range().
__vmalloc_node() now calls into __vmalloc_node_range() with a range of
[VMALLOC_START, VMALLOC_END) for functionally equivalent behavior.
Each architecture may then use __vmalloc_node_range() directly to remove
the duplication of code.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'x86-olpc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, olpc: Speed up device tree creation during boot
x86, olpc: Add OLPC device-tree support
x86, of: Define irq functions to allow drivers/of/* to build on x86
* 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits)
KVM: Initialize fpu state in preemptible context
KVM: VMX: when entering real mode align segment base to 16 bytes
KVM: MMU: handle 'map_writable' in set_spte() function
KVM: MMU: audit: allow audit more guests at the same time
KVM: Fetch guest cr3 from hardware on demand
KVM: Replace reads of vcpu->arch.cr3 by an accessor
KVM: MMU: only write protect mappings at pagetable level
KVM: VMX: Correct asm constraint in vmcs_load()/vmcs_clear()
KVM: MMU: Initialize base_role for tdp mmus
KVM: VMX: Optimize atomic EFER load
KVM: VMX: Add definitions for more vm entry/exit control bits
KVM: SVM: copy instruction bytes from VMCB
KVM: SVM: implement enhanced INVLPG intercept
KVM: SVM: enhance mov DR intercept handler
KVM: SVM: enhance MOV CR intercept handler
KVM: SVM: add new SVM feature bit names
KVM: cleanup emulate_instruction
KVM: move complete_insn_gp() into x86.c
KVM: x86: fix CR8 handling
KVM guest: Fix kvm clock initialization when it's configured out
...
Occasionally the system gets into a state where the CMOS clock has gotten
slightly ahead of current time and the periodic update of RTC fails. The
message is a nuisance and repeats spamming the log.
See: http://www.ntp.org/ntpfaq/NTP-s-trbl-spec.htm#Q-LINUX-SET-RTC-MMSS
Rather than just removing the message, make it show only once and reduce
severity since it indicates a normal and non urgent condition.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently intel_idle and acpi_idle driver show double cpu_idle "exit idle"
events -> this patch fixes it and makes cpu_idle events throwing less complex.
It also introduces cpu_idle events for all architectures which use
the cpuidle subsystem, namely:
- arch/arm/mach-at91/cpuidle.c
- arch/arm/mach-davinci/cpuidle.c
- arch/arm/mach-kirkwood/cpuidle.c
- arch/arm/mach-omap2/cpuidle34xx.c
- arch/drivers/acpi/processor_idle.c (for all cases, not only mwait)
- arch/x86/kernel/process.c (did throw events before, but was a mess)
- drivers/idle/intel_idle.c (did throw events before)
Convention should be:
Fire cpu_idle events inside the current pm_idle function (not somewhere
down the the callee tree) to keep things easy.
Current possible pm_idle functions in X86:
c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle
-> this is really easy is now.
This affects userspace:
The type field of the cpu_idle power event can now direclty get
mapped to:
/sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...}
instead of throwing very CPU/mwait specific values.
This change is not visible for the intel_idle driver.
For the acpi_idle driver it should only be visible if the vendor
misses out C-states in his BIOS.
Another (perf timechart) patch reads out cpuidle info of cpu_idle
events from:
/sys/.../cpuidle/stateX/*, then the cpuidle events are mapped
to the correct C-/cpuidle state again, even if e.g. vendors miss
out C-states in their BIOS and for example only export C1 and C3.
-> everything is fine.
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Robert Schoene <robert.schoene@tu-dresden.de>
CC: Jean Pihet <j-pihet@ti.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: linux-pm@lists.linux-foundation.org
CC: linux-acpi@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-perf-users@vger.kernel.org
CC: linux-omap@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
Having four variables for the same thing:
idle_halt, idle_nomwait, force_mwait and boot_option_idle_overrides
is rather confusing and unnecessary complex.
if idle= boot param is passed, only set up one variable:
boot_option_idle_overrides
Introduces following functional changes/fixes:
- intel_idle driver does not register if any idle=xy
boot param is passed.
- processor_idle.c will also not register a cpuidle driver
and get active if idle=halt is passed.
Before a cpuidle driver with one (C1, halt) state got registered
Now the default_idle function will be used which finally uses
the same idle call to enter sleep state (safe_halt()), but
without registering a whole cpuidle driver.
That means idle= param will always avoid cpuidle drivers to register
with one exception (same behavior as before):
idle=nomwait
may still register acpi_idle cpuidle driver, but C1 will not use
mwait, but hlt. This can be a workaround for IO based deeper sleep
states where C1 mwait causes problems.
Signed-off-by: Thomas Renninger <trenn@suse.de>
cc: x86@kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
init_fpu() (which is indirectly called by the fpu switching code) assumes
it is in process context. Rather than makeing init_fpu() use an atomic
allocation, which can cause a task to be killed, make sure the fpu is
already initialized when we enter the run loop.
KVM-Stable-Tag.
Reported-and-tested-by: Kirill A. Shutemov <kas@openvz.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
If guest can detect that it runs in non-preemptable context it can
handle async PFs at any time, so let host know that it can send async
PF even if guest cpu is not in userspace.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If async page fault is received by idle task or when preemp_count is
not zero guest cannot reschedule, so do sti; hlt and wait for page to be
ready. vcpu can still process interrupts while it waits for the page to
be ready.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
When async PF capability is detected hook up special page fault handler
that will handle async page fault events and bypass other page faults to
regular page fault handler. Also add async PF handling to nested SVM
emulation. Async PF always generates exit to L1 where vcpu thread will
be scheduled out until page is available.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Enable async PF in a guest if async PF capability is discovered.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Async PF also needs to hook into smp_prepare_boot_cpu so move the hook
into generic code.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Generic Hardware Error Source provides a way to report platform
hardware errors (such as that from chipset). It works in so called
"Firmware First" mode, that is, hardware errors are reported to
firmware firstly, then reported to Linux by firmware. This way, some
non-standard hardware error registers or non-standard hardware link
can be checked by firmware to produce more valuable hardware error
information for Linux.
This patch adds POLL/IRQ/NMI notification types support.
Because the memory area used to transfer hardware error information
from BIOS to Linux can be determined only in NMI, IRQ or timer
handler, but general ioremap can not be used in atomic context, so a
special version of atomic ioremap is implemented for that.
Known issue:
- Error information can not be printed for recoverable errors notified
via NMI, because printk is not NMI-safe. Will fix this via delay
printing to IRQ context via irq_work or make printk NMI-safe.
v2:
- adjust printk format per comments.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Fix Moorestown VRTC fixmap placement
x86/gpio: Implement x86 gpio_to_irq convert function
x86, UV: Fix APICID shift for Westmere processors
x86: Use PCI method for enabling AMD extended config space before MSR method
x86: tsc: Prevent delayed init if initial tsc calibration failed
x86, lapic-timer: Increase the max_delta to 31 bits
x86: Fix sparse non-ANSI function warnings in smpboot.c
x86, numa: Fix CONFIG_DEBUG_PER_CPU_MAPS without NUMA emulation
x86, AMD, PCI: Add AMD northbridge PCI device id for CPU families 12h and 14h
x86, numa: Fix cpu to node mapping for sparse node ids
x86, numa: Fake node-to-cpumask for NUMA emulation
x86, numa: Fake apicid and pxm mappings for NUMA emulation
x86, numa: Avoid compiling NUMA emulation functions without CONFIG_NUMA_EMU
x86, numa: Reduce minimum fake node size to 32M
Fix up trivial conflict in arch/x86/kernel/apic/x2apic_uv_x.c
Westmere processors use a different algorithm for
assigning APICIDs on SGI UV systems. The location of the
node number within the apicid is now a function of the
processor type.
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20110110195210.GA18737@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
While both methods should work equivalently well for the native
case, the Xen Dom0 case can't reliably work with the MSR one,
since there's no guarantee that the virtual CPUs it has
available fully cover all necessary physical ones.
As per the suggestion of Robert Richter the patch only adds the
PCI method, but leaves the MSR one as a fallback to cover new
systems the PCI IDs of which may not have got added to the code
base yet.
The only change in v2 is the breaking out of the new CPI
initialization method into a separate function, as requested by
Ingo.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Robert Richter <robert.richter@amd.com>
Cc: Andreas Herrmann3 <Andreas.Herrmann3@amd.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
LKML-Reference: <4D2B3FD7020000780002B67D@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
commit a8760ec (x86: Check tsc available/disabled in the delayed init
function) missed to prevent the setup of the delayed init function in
case the initial tsc calibration failed. This results in the same
divide by zero bug as we have seen without the tsc disabled check.
Skip the delayed work setup when tsc_khz (the initial calibration
value) is 0.
Bisected-and-tested-by: Kirill A. Shutemov <kas@openvz.org>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'stable/generic' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: HVM X2APIC support
apic: Move hypervisor detection of x2apic to hypervisor.h
Latest atom socs(penwell) does not have hpet timer.
As their local APIC timer is clocked at 400KHZ, and the current
code limit their Initial Counter register to 23 bits, they
cannot sleep more than 1.34 seconds which leads to ~2 spurious
wakeup per second (1 per thread)
These SOCs support 32bit timer so we change the max_delta to at
least 31bits. So we can at least sleep for 300 seconds.
We could not find any previous chip errata where lapic would
only have 23 bit precision As powertop is suggesting to activate
HPET to "sleep longer", this could mean this problem is already
known.
Problem is here since very first implementation of lapic timer
as a clock event e9e2cdb [PATCH] clockevents: i386 drivers.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Pierre Tardy <pierre.tardy@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Adrian Bunk <bunk@stusta.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andi Kleen <ak@suse.de>
LKML-Reference: <1294327409-19426-1-git-send-email-pierre.tardy@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Don found that P4 PMU reads CCCR register instead of counter
itself (in attempt to catch unflagged event) this makes P4
NMI handler to consume all NMIs it observes. So the other
NMI users such as kgdb simply have no chance to get NMI
on their hands.
Side note: at moment there is no way to run nmi-watchdog
together with perf tool. This is because both 'perf top' and
nmi-watchdog use same event. So while nmi-watchdog reserves
one event/counter for own needs there is no room for perf tool
left (there is a way to disable nmi-watchdog on boot of course).
Ming has tested this patch with the following results
| 1. watchdog disabled
|
| kgdb tests on boot OK
| perf works OK
|
| 2. watchdog enabled, without patch perf-x86-p4-nmi-4
|
| kgdb tests on boot hang
|
| 3. watchdog enabled, without patch perf-x86-p4-nmi-4 and do not run kgdb
| tests on boot
|
| "perf top" partialy works
| cpu-cycles no
| instructions yes
| cache-references no
| cache-misses no
| branch-instructions no
| branch-misses yes
| bus-cycles no
|
| 4. watchdog enabled, with patch perf-x86-p4-nmi-4 applied
|
| kgdb tests on boot OK
| perf does not work, NMI "Dazed and confused" messages show up
|
Which means we still have problems with p4 box due to 'unknown'
nmi happens but at least it should fix kgdb test cases.
Reported-by: Jason Wessel <jason.wessel@windriver.com>
Reported-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Don Zickus <dzickus@redhat.com>
Acked-by: Lin Ming <ming.m.lin@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <4D275E7E.3040903@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix sparse warning for non-ANSI function declaration:
arch/x86/kernel/smpboot.c💯30: warning: non-ANSI function declaration of function 'cpu_hotplug_driver_lock'
arch/x86/kernel/smpboot.c:105:32: warning: non-ANSI function declaration of function 'cpu_hotplug_driver_unlock'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <20110108195914.95d366ea.randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits)
gameport: use this_cpu_read instead of lookup
x86: udelay: Use this_cpu_read to avoid address calculation
x86: Use this_cpu_inc_return for nmi counter
x86: Replace uses of current_cpu_data with this_cpu ops
x86: Use this_cpu_ops to optimize code
vmstat: User per cpu atomics to avoid interrupt disable / enable
irq_work: Use per cpu atomics instead of regular atomics
cpuops: Use cmpxchg for xchg to avoid lock semantics
x86: this_cpu_cmpxchg and this_cpu_xchg operations
percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support
percpu,x86: relocate this_cpu_add_return() and friends
connector: Use this_cpu operations
xen: Use this_cpu_inc_return
taskstats: Use this_cpu_ops
random: Use this_cpu_inc_return
fs: Use this_cpu_inc_return in buffer.c
highmem: Use this_cpu_xx_return() operations
vmstat: Use this_cpu_inc_return for vm statistics
x86: Support for this_cpu_add, sub, dec, inc_return
percpu: Generic support for this_cpu_add, sub, dec, inc_return
...
Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c}
as per Tejun.
From the x86_64 low level interrupt handlers, the frame pointer is
saved right after the partial pt_regs frame.
rbp is not supposed to be part of the irq partial saved registers,
but it only requires to extend the pt_regs frame by 8 bytes to
do so, plus a tiny stack offset fixup on irq exit.
This changes a bit the semantics or get_irq_entry() that is supposed
to provide only the value of caller saved registers and the cpu
saved frame. However it's a win for unwinders that can walk through
stack frames on top of get_irq_regs() snapshots.
A noticeable impact is that it makes perf events cpu-clock and
task-clock events based callchains working on x86_64.
Let's then save rbp into the irq pt_regs.
As a result with:
perf record -e cpu-clock perf bench sched messaging
perf report --stdio
Before:
20.94% perf [kernel.kallsyms] [k] lock_acquire
|
--- lock_acquire
|
|--44.01%-- __write_nocancel
|
|--43.18%-- __read
|
|--6.08%-- fork
| create_worker
|
|--0.88%-- _dl_fixup
|
|--0.65%-- do_lookup_x
|
|--0.53%-- __GI___libc_read
--4.67%-- [...]
After:
19.23% perf [kernel.kallsyms] [k] __lock_acquire
|
--- __lock_acquire
|
|--97.74%-- lock_acquire
| |
| |--21.82%-- _raw_spin_lock
| | |
| | |--37.26%-- unix_stream_recvmsg
| | | sock_aio_read
| | | do_sync_read
| | | vfs_read
| | | sys_read
| | | system_call
| | | __read
| | |
| | |--24.09%-- unix_stream_sendmsg
| | | sock_aio_write
| | | do_sync_write
| | | vfs_write
| | | sys_write
| | | system_call
| | | __write_nocancel
v2: Fix cfi annotations.
Reported-by: Soeren Sandmann Pedersen <sandmann@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Jan Beulich <JBeulich@novell.com>
In dump_stack function, bp isn't used anymore, which is introduced by
commit 9c0729dc80. This patch removes bp
completely.
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Cc: Soeren Sandmann <sandmann@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <AANLkTik9U_Z0WSZ7YjrykER_pBUfPDdgUUmtYx=R74nL@mail.gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Then we can reuse it for Xen later.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Avi Kivity <avi@redhat.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Just re-arrange the code a bit to make it easier to follow what is
going on. Basically un-negating the if-statement and swapping the code
inside the if-statement with code outside.
No functional changes.
Originally-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294348732-15030-7-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In original NMI handler, NMI reason io port (0x61) is only processed
on BSP. This makes it impossible to hot-remove BSP. To solve the
issue, a raw spinlock is used to allow the port to be processed on any
CPU.
Originally-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294348732-15030-6-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With priorities in place and no one really understanding the difference between
DIE_NMI and DIE_NMI_IPI, just remove DIE_NMI_IPI and convert everyone to DIE_NMI.
This also simplifies default_do_nmi() a little bit. Instead of calling the
die_notifier in both the if and else part, just pull it out and call it before
the if-statement. This has the side benefit of avoiding a call to the ioport
to see if there is an external NMI sitting around until after the (more frequent)
internal NMIs are dealt with.
Patch-Inspired-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294348732-15030-5-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In order to consolidate the NMI die_chain events, we need to setup the priorities
for the die notifiers.
I started by defining a bunch of common priorities that can be used by the
notifier blocks. Then I modified the notifier blocks to use the newly created
priorities.
Now that the priorities are straightened out, it should be easier to remove the
event DIE_NMI_IPI.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294348732-15030-4-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
They are a handful of places in the code that register a die_notifier
as a catch all in case no claims the NMI. Unfortunately, they trigger
on events like DIE_NMI and DIE_NMI_IPI, which depending on when they
registered may collide with other handlers that have the ability to
determine if the NMI is theirs or not.
The function unknown_nmi_error() makes one last effort to walk the
die_chain when no one else has claimed the NMI before spitting out
messages that the NMI is unknown.
This is a better spot for these devices to execute any code without
colliding with the other handlers.
The two drivers modified are only compiled on x86 arches I believe, so
they shouldn't be affected by other arches that may not have
DIE_NMIUNKNOWN defined.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: openipmi-developer@lists.sourceforge.net
Cc: dann frazier <dannf@hp.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294348732-15030-3-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Replace the NMI related magic numbers with symbol constants.
Memory parity error is only valid for IBM PC-AT, newer machine use
bit 7 (0x80) of 0x61 port for PCI SERR. While memory error is usually
reported via MCE. So corresponding function name and kernel log string
is changed.
But on some machines, PCI SERR line is still used to report memory
errors. This is used by EDAC, so corresponding EDAC call is reserved.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294348732-15030-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/x86/include/asm/io_apic.h
Merge reason: Resolve the conflict, update to a more recent -rc base
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The saving of the ACPI NVS area during hibernation and suspend and
restoring it during the subsequent resume is entirely specific to
ACPI, so move it to drivers/acpi and drop the CONFIG_SUSPEND_NVS
configuration option which is redundant.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, suspend: Avoid unnecessary smp alternatives switch during suspend/resume
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-64, asm: Use fxsaveq/fxrestorq in more places
* 'x86-hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, hwmon: Add core threshold notification to therm_throt.c
* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, paravirt: Use native_halt on a halt, not native_safe_halt
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
locking, lockdep: Convert sprintf_symbol to %pS
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
irq: Better struct irqaction layout
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, UV, BAU: Extend for more than 16 cpus per socket
x86, UV: Fix the effect of extra bits in the hub nodeid register
x86, UV: Add common uv_early_read_mmr() function for reading MMRs
* 'x86-tsc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Check tsc available/disabled in the delayed init function
x86: Improve TSC calibration using a delayed workqueue
x86: Make tsc=reliable override boot time stability checks
* 'x86-security-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
module: Move RO/NX module protection to after ftrace module update
x86: Resume trampoline must be executable
x86: Add RO/NX protection for loadable kernel modules
x86: Add NX protection for kernel data
x86: Fix improper large page preservation
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, earlyprintk: Move mrst early console to platform/ and fix a typo
x86, apbt: Setup affinity for apb timers acting as per-cpu timer
ce4100: Add errata fixes for UART on CE4100
x86: platform: Move iris to x86/platform where it belongs
x86, mrst: Check platform_device_register() return code
x86/platform: Add Eurobraille/Iris power off support
x86, mrst: Add explanation for using 1960 as the year offset for vrtc
x86, mrst: Fix dependencies of "select INTEL_SCU_IPC"
x86, mrst: The shutdown for MRST requires the SCU IPC mechanism
x86: Ce4100: Add reboot_fixup() for CE4100
ce4100: Add PCI register emulation for CE4100
x86: Add CE4100 platform support
x86: mrst: Set vRTC's IRQ to level trigger type
x86: mrst: Add audio driver bindings
rtc: Add drivers/rtc/rtc-mrst.c
x86: mrst: Add vrtc driver which serves as a wall clock device
x86: mrst: Add Moorestown specific reboot/shutdown support
x86: mrst: Parse SFI timer table for all timer configs
x86/mrst: Add SFI platform device parsing code
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, microcode, AMD: Cleanup code a bit
x86, microcode, AMD: Replace vmalloc+memset with vzalloc
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Fix included-by file reference comments
x86, cpu: Only CPU features determine NX capabilities
x86, cpu: Call verify_cpu during 32bit CPU startup
x86, cpu: Clear XD_DISABLED flag on Intel to regain NX
x86, cpu: Rename verify_cpu_64.S to verify_cpu.S
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Fix APIC ID sizing bug on larger systems, clean up MAX_APICS confusion
x86, acpi: Parse all SRAT cpu entries even above the cpu number limitation
x86, acpi: Add MAX_LOCAL_APIC for 32bit
x86: io_apic: Split setup_ioapic_ids_from_mpc()
x86: io_apic: Fix CONFIG_X86_IO_APIC=n breakage
x86: apic: Move probe_nr_irqs_gsi() into ioapic_init_mappings()
x86: Allow platforms to force enable apic
* 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, cacheinfo: Cleanup L3 cache index disable support
x86, amd-nb: Cleanup AMD northbridge caching code
x86, amd-nb: Complete the rename of AMD NB and related code
Prevent the long delay in io_check_error making NMI watchdog
timeout.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
LKML-Reference: <1294198689-15447-3-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The spin_lock_debug/rcu_cpu_stall detector uses
trigger_all_cpu_backtrace() to dump cpu backtrace.
Therefore it is possible that trigger_all_cpu_backtrace()
could be called at the same time on different CPUs, which
triggers and 'unknown reason NMI' warning. The following case
illustrates the problem:
CPU1 CPU2 ... CPU N
trigger_all_cpu_backtrace()
set "backtrace_mask" to cpu mask
|
generate NMI interrupts generate NMI interrupts ...
\ | /
\ | /
The "backtrace_mask" will be cleaned by the first NMI interrupt
at nmi_watchdog_tick(), then the following NMI interrupts
generated by other cpus's arch_trigger_all_cpu_backtrace() will
be taken as unknown reason NMI interrupts.
This patch uses a test_and_set to avoid the problem, and stop
the arch_trigger_all_cpu_backtrace() from calling to avoid
dumping a double cpu backtrace info when there is already a
trigger_all_cpu_backtrace() in progress.
Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Reviewed-by: Bruce Ashfield <bruce.ashfield@windriver.com>
Cc: fweisbec@gmail.com
LKML-Reference: <1294198689-15447-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Don Zickus <dzickus@redhat.com>
There are some paths that walk the die_chain with preemption on.
Make sure we are in an NMI call before we start doing anything.
This was triggered by do_general_protection calling notify_die
with DIE_GPF.
Reported-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Don Zickus <dzickus@redhat.com>
LKML-Reference: <1294198689-15447-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Found one x2apic pre-enabled system, x2apic_mode suddenly get
corrupted after register some cpus, when compiled
CONFIG_NR_CPUS=255 instead of 512.
It turns out that generic_processor_info() ==> phyid_set(apicid,
phys_cpu_present_map) causes the problem.
phys_cpu_present_map is sized by MAX_APICS bits, and pre-enabled
system some cpus have an apic id > 255.
The variable after phys_cpu_present_map may get corrupted
silently:
ffffffff828e8420 B phys_cpu_present_map
ffffffff828e8440 B apic_verbosity
ffffffff828e8444 B local_apic_timer_c2_ok
ffffffff828e8448 B disable_apic
ffffffff828e844c B x2apic_mode
ffffffff828e8450 B x2apic_disabled
ffffffff828e8454 B num_processors
...
Actually phys_cpu_present_map is referenced via apic id, instead
index. We should use MAX_LOCAL_APIC instead MAX_APICS.
For 64-bit it will be 32768 in all cases. BSS will increase by 4k bytes
on 64-bit:
text data bss dec filename
21696943 4193748 12787712 38678403 vmlinux.before
21696943 4193748 12791808 38682499 vmlinux.after
No change on 32bit.
Finally we can remove MAX_APCIS that was rather confusing.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
LKML-Reference: <4D23BD9C.3070102@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It is not related to init_memory_mapping(), and init_memory_mapping() is
getting more bigger.
So make it as seperated function and call it from reserve_brk() and that is
point when _brk_end is concluded.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D1933E0.7090305@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
v2.6.36-rc8-54-gb40827f (x86-32, mm: Add an initial page table
for core bootstrapping) made x86 boot using initial_page_table
and broke lguest.
For 2.6.37 we simply cut & paste the initialization code into
lguest (da32dac101 "lguest: populate initial_page_table"), now
we fix it properly by doing that initialization before the
paravirt jump.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: lguest <lguest@ozlabs.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <201101041720.54535.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add these new power trace events:
power:cpu_idle
power:cpu_frequency
power:machine_suspend
The old C-state/idle accounting events:
power:power_start
power:power_end
Have now a replacement (but we are still keeping the old
tracepoints for compatibility):
power:cpu_idle
and
power:power_frequency
is replaced with:
power:cpu_frequency
power:machine_suspend is newly introduced.
Jean Pihet has a patch integrated into the generic layer
(kernel/power/suspend.c) which will make use of it.
the type= field got removed from both, it was never
used and the type is differed by the event type itself.
perf timechart userspace tool gets adjusted in a separate patch.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: rjw@sisk.pl
LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
This patch adds code to therm_throt.c to notify core thermal threshold
events. These thresholds are supported by the IA32_THERM_INTERRUPT register.
The status/log for the same is monitored using the IA32_THERM_STATUS register.
The necessary #defines are in msr-index.h. A call back is added to mce.h, to
further notify the thermal stack, about the threshold events.
Signed-off-by: Durgadoss R <durgadoss.r@intel.com>
LKML-Reference: <D6D887BA8C9DFF48B5233887EF04654105C1251710@bgsmsx502.gar.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Replace all uses of current_cpu_data with this_cpu operations on the
per cpu structure cpu_info. The scala accesses are replaced with the
matching this_cpu ops which results in smaller and more efficient
code.
In the long run, it might be a good idea to remove cpu_data() macro
too and use per_cpu macro directly.
tj: updated description
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Go through x86 code and replace __get_cpu_var and get_cpu_var
instances that refer to a scalar and are not used for address
determinations.
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Move it into head file. to prepare use it in other files.
[ hpa: added missing <linux/types.h> and changed type to phys_addr_t. ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D1933BA.8000508@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
When trying to change alloc_bootmem with memblock to go with real top-down
Found one old system:
[ 0.000000] Node 0: aperture @ ac000000 size 64 MB
[ 0.000000] Aperture pointing to e820 RAM. Ignoring.
[ 0.000000] Your BIOS doesn't leave a aperture memory hole
[ 0.000000] Please enable the IOMMU option in the BIOS setup
[ 0.000000] This costs you 64 MB of RAM
[ 0.000000] memblock_x86_reserve_range: [0x2020000000-0x2023ffffff] aperture64
[ 0.000000] Cannot allocate aperture memory hole (ffff882020000000,65536K)
[ 0.000000] memblock_x86_free_range: [0x2020000000-0x2023ffffff]
[ 0.000000] Kernel panic - not syncing: Not enough memory for aperture
[ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.37-rc5-tip-yh-06229-gb792dc2-dirty #331
[ 0.000000] Call Trace:
[ 0.000000] [<ffffffff81cf50fe>] ? panic+0x91/0x1a3
[ 0.000000] [<ffffffff827c66b2>] ? gart_iommu_hole_init+0x3d7/0x4a3
[ 0.000000] [<ffffffff81d026a9>] ? _etext+0x0/0x3
[ 0.000000] [<ffffffff827ba940>] ? pci_iommu_alloc+0x47/0x71
[ 0.000000] [<ffffffff827c820b>] ? mem_init+0x19/0xec
[ 0.000000] [<ffffffff827b3c40>] ? start_kernel+0x20a/0x3e8
[ 0.000000] [<ffffffff827b32cc>] ? x86_64_start_reservations+0x9c/0xa0
[ 0.000000] [<ffffffff827b33e4>] ? x86_64_start_kernel+0x114/0x11b
it means __alloc_bootmem_nopanic() get too high for that aperture.
Use memblock_find_in_range() with limit directly.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D0C0740.90104@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
In arch/x86/kernel/microcode_intel.c::generic_load_microcode()
we have this:
while (leftover) {
...
if (get_ucode_data(mc, ucode_ptr, mc_size) ||
microcode_sanity_check(mc) < 0) {
vfree(mc);
break;
}
...
}
if (mc)
vfree(mc);
This will cause a double free of 'mc'. This patch fixes that by
just removing the vfree() call in the loop since 'mc' will be
freed nicely just after we break out of the loop.
There's also a second change in the patch. I noticed a lot of
checks for pointers being NULL before passing them to vfree().
That's completely redundant since vfree() deals gracefully with
being passed a NULL pointer. Removing the redundant checks
yields a nice size decrease for the object file.
Size before the patch:
text data bss dec hex filename
4578 240 1032 5850 16da arch/x86/kernel/microcode_intel.o
Size after the patch:
text data bss dec hex filename
4489 240 984 5713 1651 arch/x86/kernel/microcode_intel.o
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Acked-by: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <alpine.LNX.2.00.1012251946100.10759@swampdragon.chaosbits.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf probe: Fix to support libdwfl older than 0.148
perf tools: Fix lazy wildcard matching
perf buildid-list: Fix error return for success
perf buildid-cache: Fix symbolic link handling
perf symbols: Stop using vmlinux files with no symbols
perf probe: Fix use of kernel image path given by 'k' option
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, kexec: Limit the crashkernel address appropriately
Recent Intel new system have different order in MADT, aka will list all thread0
at first, then all thread1.
But SRAT table still old order, it will list cpus in one socket all together.
If the user have compiled limited NR_CPUS or boot with nr_cpus=, could have missed
to put some cpus apic id to node mapping into apicid_to_node[].
for example for 4 sockets system with 64 cpus with nr_cpus=32 will get crash...
[ 9.106288] Total of 32 processors activated (136190.88 BogoMIPS).
[ 9.235021] divide error: 0000 [#1] SMP
[ 9.235315] last sysfs file:
[ 9.235481] CPU 1
[ 9.235592] Modules linked in:
[ 9.245398]
[ 9.245478] Pid: 2, comm: kthreadd Not tainted 2.6.37-rc1-tip-yh-01782-ge92ef79-dirty #274 /Sun Fire x4800
[ 9.265415] RIP: 0010:[<ffffffff81075a8f>] [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
...
[ 9.645938] RIP [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
[ 9.665356] RSP <ffff88103f8d1c40>
[ 9.665568] ---[ end trace 2296156d35fdfc87 ]---
So let just parse all cpu entries in SRAT.
Also add apicid checking with MAX_LOCAL_APIC, in case We could out of boundaries of
apicid_to_node[].
it fixes following bug too.
https://bugzilla.kernel.org/show_bug.cgi?id=22662
-v2: expand to 32bit according to hpa
need to add MAX_LOCAL_APIC for 32bit
Reported-and-Tested-by: Wu Fengguang <fengguang.wu@intel.com>
Reported-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Tested-by: Myron Stowe <myron.stowe@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D0AD486.9020704@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
We should use MAX_LOCAL_APIC for max apic ids and MAX_APICS as number
of local apics.
Also apic_version[] array should use MAX_LOCAL_APICs.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D0AD464.2020408@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The x86 arch has shifted its use of the nmi_watchdog from a
local implementation to the global one provide by
kernel/watchdog.c. This shift has caused a whole bunch of
compile problems under different config options. I attempt to
simplify things with the patch below.
In order to simplify things, I had to come to terms with the
meaning of two terms ARCH_HAS_NMI_WATCHDOG and
CONFIG_HARDLOCKUP_DETECTOR. Basically they mean the same thing,
the former on a local level and the latter on a global level.
With the old x86 nmi watchdog gone, there is no need to rely on
defining the ARCH_HAS_NMI_WATCHDOG variable because it doesn't
make sense any more. x86 will now use the global
implementation.
The changes below do a few things. First it changes the few
places that relied on ARCH_HAS_NMI_WATCHDOG to use
CONFIG_X86_LOCAL_APIC (the former was an alias for the latter
anyway, so nothing unusual here). Those pieces of code were
relying more on local apic functionality the nmi watchdog
functionality, so the change should make sense.
Second, I removed the x86 implementation of
touch_nmi_watchdog(). It isn't need now, instead x86 will rely
on kernel/watchdog.c's implementation.
Third, I removed the #define ARCH_HAS_NMI_WATCHDOG itself from
x86. And tweaked the include/linux/nmi.h file to tell users to
look for an externally defined touch_nmi_watchdog in the case of
ARCH_HAS_NMI_WATCHDOG _or_ CONFIG_HARDLOCKUP_DETECTOR. This
changes removes some of the ugliness in that file.
Finally, I added a Kconfig dependency for
CONFIG_HARDLOCKUP_DETECTOR that said you can't have
ARCH_HAS_NMI_WATCHDOG _and_ CONFIG_HARDLOCKUP_DETECTOR. You can
only have one nmi_watchdog.
Tested with
ARCH=i386: allnoconfig, defconfig, allyesconfig, (various broken
configs) ARCH=x86_64: allnoconfig, defconfig, allyesconfig,
(various broken configs)
Hopefully, after this patch I won't get any more compile broken
emails. :-)
v3:
changed a couple of 'linux/nmi.h' -> 'asm/nmi.h' to pick-up correct function
prototypes when CONFIG_HARDLOCKUP_DETECTOR is not set.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: fweisbec@gmail.com
LKML-Reference: <1293044403-14117-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
MAINTAINERS
arch/arm/mach-omap2/pm24xx.c
drivers/scsi/bfa/bfa_fcpim.c
Needed to update to apply fixes for which the old branch was too
outdated.
UV systems can be partitioned into multiple independent SSIs.
Large partitioned systems may have extra bits in the node_id
register. These bits are used when the total memory on all SSIs
exceeds 16TB. These extra bits need to be ignored when
calculating x2apic_extra_bits.
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20101130195926.972776133@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Early in boot, reading MMRs from the UV hub controller require
calls to early_ioremap()/early_iounmap(). Rather than
duplicating code, add a common function to do the
map/read/unmap.
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20101130195926.834804371@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-32: Make sure we can map all of lowmem if we need to
x86, vt-d: Handle previous faults after enabling fault handling
x86: Enable the intr-remap fault handling after local APIC setup
x86, vt-d: Fix the vt-d fault handling irq migration in the x2apic mode
x86, vt-d: Quirk for masking vtd spec errors to platform error handling logic
x86, xsave: Use alloc_bootmem_align() instead of alloc_bootmem()
bootmem: Add alloc_bootmem_align()
x86, gcc-4.6: Use gcc -m options when building vdso
x86: HPET: Chose a paranoid safe value for the ETIME check
x86: io_apic: Avoid unused variable warning when CONFIG_GENERIC_PENDING_IRQ=n
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: Fix off by one in perf_swevent_init()
perf: Fix duplicate events with multiple-pmu vs software events
ftrace: Have recordmcount honor endianness in fn_ELF_R_INFO
scripts/tags.sh: Add magic for trace-events
tracing: Fix panic when lseek() called on "trace" opened for writing
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
x86: avoid high BIOS area when allocating address space
x86: avoid E820 regions when allocating address space
x86: avoid low BIOS area when allocating address space
resources: add arch hook for preventing allocation in reserved areas
Revert "resources: support allocating space within a region from the top down"
Revert "PCI: allocate bus resources from the top down"
Revert "x86/PCI: allocate space from the end of a region, not the beginning"
Revert "x86: allocate space within a region top-down"
Revert "PCI: fix pci_bus_alloc_resource() hang, prefer positive decode"
PCI: Update MCP55 quirk to not affect non HyperTransport variants
Keep the crash kernel address below 512 MiB for 32 bits and 896 MiB
for 64 bits. For 32 bits, this retains compatibility with earlier
kernel releases, and makes it work even if the vmalloc= setting is
adjusted.
For 64 bits, we should be able to increase this substantially once a
hard-coded limit in kexec-tools is fixed.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20101217195035.GE14502@redhat.com>
This prevents allocation of the last 2MB before 4GB.
The experiment described here shows Windows 7 ignoring the last 1MB:
https://bugzilla.kernel.org/show_bug.cgi?id=23542#c27
This patch ignores the top 2MB instead of just 1MB because H. Peter Anvin
says "There will be ROM at the top of the 32-bit address space; it's a fact
of the architecture, and on at least older systems it was common to have a
shadow 1 MiB below."
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
When we allocate address space, e.g., to assign it to a PCI device, don't
allocate anything mentioned in the BIOS E820 memory map.
On recent machines (2008 and newer), we assign PCI resources from the
windows described by the ACPI PCI host bridge _CRS. On many Dell
machines, these windows overlap some E820 reserved areas, e.g.,
BIOS-e820: 00000000bfe4dc00 - 00000000c0000000 (reserved)
pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff]
If we put devices at 0xbff00000, they don't work, probably because
that's really RAM, not I/O memory. This patch prevents that by removing
the 0xbfe4dc00-0xbfffffff area from the "available" resource.
I'm not very happy with this solution because Windows solves the problem
differently (it seems to ignore E820 reserved areas and it allocates
top-down instead of bottom-up; details at comment 45 of the bugzilla
below). That means we're vulnerable to BIOS defects that Windows would not
trip over. For example, if BIOS described a device in ACPI but didn't
mention it in E820, Windows would work fine but Linux would fail.
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This implements arch_remove_reservations() so allocate_resource() can
avoid any arch-specific reserved areas. This currently just avoids the
BIOS area (the first 1MB), but could be used for E820 reserved areas if
that turns out to be necessary.
We previously avoided this area in pcibios_align_resource(). This patch
moves the test from that PCI-specific path to a generic path, so *all*
resource allocations will avoid this area.
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Use this_cpu ops in various places to optimize per cpu data access.
Cc: Jason Baron <jbaron@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
A relocatable kernel can be anywhere in lowmem -- and in the case of a
kdump kernel, is likely to be fairly high. Since the early page
tables map everything from address zero up we need to make sure we
allocate enough brk that we can map all of lowmem if we need to.
Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Tested-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D0AD3ED.8070607@kernel.org>
Extend the perf_pmu_register() interface to allow for named and
dynamic pmu types.
Because we need to support the existing static types we cannot use
dynamic types for everything, hence provide a type argument.
If we want to enumerate the PMUs they need a name, provide one.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20101117222056.259707703@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some BIOSes use PMU resources, which can cause various bugs:
- Non-working or erratic PMU based statistics - the PMU can end up
counting the wrong thing, resulting in misleading statistics
- Profiling can stop working or it can profile the wrong thing
- A non-working or erratic NMI watchdog that cannot be relied on
- The kernel may disturb whatever thing the BIOS tries to use the
PMU for - possibly causing hardware malfunction in extreme cases.
- ... and other forms of potential misbehavior
Various forms of such misbehavior has been observed in practice - there are
BIOSes that just corrupt the PMU state, consequences be damned.
The PMU is a CPU resource that is handled by the kernel and the BIOS
stealing+corrupting it is not acceptable nor robust, so we detect it,
warn about it and further refuse to touch the PMU ourselves.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Two x86 patches broke lguest:
1) v2.6.35-492-g72d7c3b, which changed x86 to use the memblock allocator.
In lguest, the host places linear page tables at the top of mem, which
used to be enough to get us up to the swapper_pg_dir page tables. With
the first patch, the direct mapping tables used that memory:
Before: kernel direct mapping tables up to 4000000 @ 7000-1a000
After: kernel direct mapping tables up to 4000000 @ 3fed000-4000000
I initially fixed this by lying about the amount of memory we had, so
the kernel wouldn't blatt the lguest boot pagetables (yuk!), but then...
2) v2.6.36-rc8-54-gb40827f, which made x86 boot use initial_page_table.
This was initialized in a part of head_32.S which isn't executed by
lguest; it is then copied into swapper_pg_dir. So we have to initialize
it; and anyway we switch to it before we blatt the old tables, so that
fixes the previous damage as well.
For the moment, I cut & pasted the code into lguest's boot code, but
next merge window I will merge them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: x86@kernel.org
- Define a stub irq_create_of_mapping for x86 as a stop-gap solution until
drivers/of/irq is further along.
- Define irq_dispose_mapping for x86 to appease of_i2c.c
These are needed to allow stuff in drivers/of/ to build on x86. This stuff
will eventually get replaced; quoting Grant,
"The long term plan is to have the drivers/of/ code handling the mapping
intelligently like powerpc currently does." But for now, just provide
these functions.
Signed-off-by: Andres Salomon <dilinger@queued.net>
LKML-Reference: <20101111214526.5de7121b@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Interrupt-remapping gets enabled very early in the boot, as it determines the
apic mode that the processor can use. And the current code enables the vt-d
fault handling before the setup_local_APIC(). And hence the APIC LDR registers
and data structure in the memory may not be initialized. So the vt-d fault
handling in logical xapic/x2apic modes were broken.
Fix this by enabling the vt-d fault handling in the end_local_APIC_setup()
A cleaner fix of enabling fault handling while enabling intr-remapping
will be addressed for v2.6.38. [ Enabling intr-remapping determines the
usage of x2apic mode and the apic mode determines the fault-handling
configuration. ]
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
LKML-Reference: <20101201062244.541996375@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org [v2.6.32+]
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
In x2apic mode, we need to set the upper address register of the fault
handling interrupt register of the vt-d hardware. Without this
irq migration of the vt-d fault handling interrupt is broken.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
LKML-Reference: <1291225233.2648.39.camel@sbsiddha-MOBL3>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org [v2.6.32+]
Acked-by: Chris Wright <chrisw@sous-sol.org>
Tested-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
During suspend, we disable all the non boot cpus. And during resume we bring
them all back again. So no need to do alternatives_smp_switch() in between.
On my core 2 based laptop, this speeds up the suspend path by 15msec and the
resume path by 5 msec (suspend/resume speed up differences can be attributed
to the different P-states that the cpu is in during suspend/resume).
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1290557500.4946.8.camel@sbsiddha-MOBL3.sc.intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Alignment of alloc_bootmem() depends on the value of
L1_CACHE_SHIFT. What we need here, however, is 64 byte alignment. Use
alloc_bootmem_align() and explicitly specify the alignment instead.
This fixes a kernel boot crash reported by Jody when the cpu in .config
is set to MPENTIUMII but the kernel is booted on a xsave-capable CPU.
Reported-by: Jody Bruchon <jody@nctritech.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20101116212442.059967454@sbsiddha-MOBL3.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@kernel.org>
When adjusting the code to handle removing the old nmi watchdog,
I forgot to consider the compile case when the local apic is not
enabled.
This change fixes the following build error:
arch/x86/kernel/apic/hw_nmi.c:28:6: error: redefinition of ‘touch_nmi_watchdog’
Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Rakib Mullick <rakib.mullick@gmail.com>
LKML-Reference: <20101213153719.GD18577@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
commit 995bd3bb5 (x86: Hpet: Avoid the comparator readback penalty)
chose 8 HPET cycles as a safe value for the ETIME check, as we had the
confirmation that the posted write to the comparator register is
delayed by two HPET clock cycles on Intel chipsets which showed
readback problems.
After that patch hit mainline we got reports from machines with newer
AMD chipsets which seem to have an even longer delay. See
http://thread.gmane.org/gmane.linux.kernel/1054283 and
http://thread.gmane.org/gmane.linux.kernel/1069458 for further
information.
Boris tried to come up with an ACPI based selection of the minimum
HPET cycles, but this failed on a couple of test machines. And of
course we did not get any useful information from the hardware folks.
For now our only option is to chose a paranoid high and safe value for
the minimum HPET cycles used by the ETIME check. Adjust the minimum ns
value for the HPET clockevent accordingly.
Reported-Bistected-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <alpine.LFD.2.00.1012131222420.2653@localhost6.localdomain6>
Cc: Simon Kirby <sim@hostway.ca>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com>
Cc: John Stultz <johnstul@us.ibm.com>
The delayed TSC init function does not check whether the system has no
TSC or TSC is disabled at the kernel command line, which results in a
crash in the work queue based extended calibration due to division by
zero because the basic calibration never happened.
Add the missing checks and do not touch TSC when not available or
disabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <johnstul@us.ibm.com>
setup_local_APIC() is used to setup local APIC early during CPU
initialization and already assumes that preemption is disabled on
entry. However, The function unnecessarily disables and enables
preemption and uses smp_processor_id() multiple times in and out of
the nested preemption disabled section. This gives the wrong
impression that the function might be able to handle being called with
preemption enabled and/or migrated to another processor in the middle.
Make it clear that the function is always called with preemption
disabled, drop the confusing preemption disable block and call
smp_processor_id() once at the beginning of the function.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: brgerst@gmail.com
LKML-Reference: <4D00B3B9.7060702@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Originally adapted from Huang Ying's patch which moved the
unknown_nmi_panic to the traps.c file. Because the old nmi
watchdog was deleted before this change happened, the
unknown_nmi_panic sysctl was lost. This re-adds it.
Also, the nmi_watchdog sysctl was re-implemented and its
documentation updated accordingly.
Patch-inspired-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: fweisbec@gmail.com
LKML-Reference: <1291068437-5331-3-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
My patch that removed the old x86 nmi watchdog broke other
arches. This change reverts a piece of that patch and puts the
change in the correct spot.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: fweisbec@gmail.com
Cc: yinghai@kernel.org
LKML-Reference: <1291068437-5331-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
assign_to_mp_irq() is copying the struct mpc_intsrc members one by
one. That's silly. Use memcpy() and let the compiler figure it out.
Same for the identical function assign_to_mpc_intsrc()
mp_irq_mpc_intsrc_cmp() is comparing the struct members one by one,
but no caller ever checks the different return codes. Use memcmp()
instead.
Remove the extra printk in MP_ioapic_info()
Signed-off-by: Feng Tang <feng.tang@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "Alan Cox <alan@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <20101208151857.212f0018@feng-i7>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There are 3 places defining similar functions of saving IRQ vector
info into mp_irqs[] array: mmparse/acpi/mrst.
Replace the redundant code by a common function in io_apic.c as it's
only called when CONFIG_X86_IO_APIC=y
Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20101207133204.4d913c5a@feng-i7>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
For 32bit mptable path, setup_ids_from_mpc() always writes the io_apic
id register, even there is no change needed.
Skip the write, when readout and mptable match.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
LKML-Reference: <4CFDF785.7010401@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If x2apic is preenabled and used by the kernel, we don't need to map
the lapic address. That mapping will never be used.
So just skip that in register_lapic_address()
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
LKML-Reference: <4CFDF69C.9070501@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Remove the printk as well, we don't want to print when nothing
changed. We print in register_lapic_address() already.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
LKML-Reference: <4CFDF68A.7020902@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It is almost the same as smp_register_lapic_addr(). We just need to
let smp_read_mpc() call smp_register_lapic_addr() when early==1.
Add the apic_printk to smp_register_lapic_address()
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
LKML-Reference: <4CFDF681.3030509@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
They are the same, move the common function to apic.c to allow
further cleanups.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Len Brown <lenb@kernel.org>
LKML-Reference: <4CFDF675.4060305@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reason: apic cleanup series depends on x86/apic, x86/amd-nb x86/platform
Conflicts:
arch/x86/include/asm/io_apic.h
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
arch/x86/kernel/apic/io_apic.c: In function 'ack_apic_level':
arch/x86/kernel/apic/io_apic.c:2433: warning: unused variable 'desc'
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <201010272107.o9RL7rse018212@imap1.linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
arch/x86/kernel/apic/hw_nmi.c:29: warning: backtrace_mask defined but not used
commit 0e2af2a9(x86, hw_nmi: Move backtrace_mask declaration under
ARCH_HAS_NMI_WATCHDOG) addressed this warning, but it was reintroduced
by commit 5f2b0ba4(x86, nmi_watchdog: Remove the old nmi_watchdog).
Move backtrace_mask into the #ifdef arch_trigger_all_cpu_backtrace
section again.
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <AANLkTi=rcc38QzoKa6LFy4m++-p_9=Zt4_kDQE=GeKxf@mail.gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Since all the hotplug stuff is serialized by the hotplug mutex,
do away with the amd_nb_lock.
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86/pvclock: Zero last_value on resume
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf record: Fix eternal wait for stillborn child
perf header: Don't assume there's no attr info if no sample ids is provided
perf symbols: Figure out start address of kernel map from kallsyms
perf symbols: Fix kallsyms kernel/module map splitting
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
nohz: Fix printk_needs_cpu() return value on offline cpus
printk: Fix wake_up_klogd() vs cpu hotplug
Move the code to arch/x86/platform/mrst/. Also fix a typo to use
the correct config option: ONFIG_EARLY_PRINTK_MRST
Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: alan@linux.intel.com
LKML-Reference: <1291348298-21263-1-git-send-email-feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use text_poke_smp_batch() on unoptimization path for reducing
the number of stop_machine() issues. If the number of
unoptimizing probes is more than MAX_OPTIMIZE_PROBES(=256),
kprobes unoptimizes first MAX_OPTIMIZE_PROBES probes and kicks
optimizer for remaining probes.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20101203095434.2961.22657.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use text_poke_smp_batch() in optimization path for reducing
the number of stop_machine() issues. If the number of optimizing
probes is more than MAX_OPTIMIZE_PROBES(=256), kprobes optimizes
first MAX_OPTIMIZE_PROBES probes and kicks optimizer for
remaining probes.
Changes in v5:
- Use kick_kprobe_optimizer() instead of directly calling
schedule_delayed_work().
- Rescheduling optimizer outside of kprobe mutex lock.
Changes in v2:
- Allocate code buffer and parameters in arch_init_kprobes()
instead of using static arraies.
- Merge previous max optimization limit patch into this patch.
So, this patch introduces upper limit of optimization at
once.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20101203095428.2961.8994.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce text_poke_smp_batch(). This function modifies several
text areas with one stop_machine() on SMP. Because calling
stop_machine() is heavy task, it is better to aggregate
text_poke requests.
( Note: I've talked with Rusty about this interface, and
he would not like to expand stop_machine() interface, since
it is not for generic use. )
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
LKML-Reference: <20101203095422.2961.51217.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Unoptimization occurs when a probe is unregistered or disabled,
and is heavy because it recovers instructions by using
stop_machine(). This patch delays unoptimization operations and
unoptimize several probes at once by using
text_poke_smp_batch(). This can avoid unexpected system slowdown
coming from stop_machine().
Changes in v5:
- Split this patch into several cleanup patches and this patch.
- Fix some text_mutex lock miss.
- Use bool instead of int for behavior flags.
- Add additional comment for (un)optimizing path.
Changes in v2:
- Use dynamic allocated buffers and params.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
LKML-Reference: <20101203095409.2961.82733.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit a5ef2e70 "x86: Sanitize apb timer interrupt handling" forgot
the affinity setup when cleaning up the code, this patch just
adds the forgotten part
Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Alan Cox <alan@linux.intel.com>
LKML-Reference: <1291348298-21263-2-git-send-email-feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sodaville needs to setup the IO_APIC ids as the boot loader leaves
them uninitialized. Split out the setter function so it can be called
unconditionally from the sodaville board code.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20101126165020.GA26361@www.tglx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Boot to boot the TSC calibration may vary by quite a large amount.
While normal variance of 50-100ppm can easily be seen, the quick
calibration code only requires 500ppm accuracy, which is the limit
of what NTP can correct for.
This can cause problems for systems being used as NTP servers, as
every time they reboot it can take hours for them to calculate the
new drift error caused by the calibration.
The classic trade-off here is calibration accuracy vs slow boot times,
as during the calibration nothing else can run.
This patch uses a delayed workqueue to calibrate the TSC over the
period of a second. This allows very accurate calibration (in my
tests only varying by 1khz or 0.4ppm boot to boot). Additionally this
refined calibration step does not block the boot process, and only
delays the TSC clocksoure registration by a few seconds in early boot.
If the refined calibration strays 1% from the early boot calibration
value, the system will fall back to already calculated early boot
calibration.
Credit to Andi Kleen who suggested using a timer quite awhile back,
but I dismissed it thinking the timer calibration would be done after
the clocksource was registered (which would break things). Forgive
me for my short-sightedness.
This patch has worked very well in my testing, but TSC hardware is
quite varied so it would probably be good to get some extended
testing, possibly pushing inclusion out to 2.6.39.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1289003985-29060-1-git-send-email-johnstul@us.ibm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@elte.hu>
CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
CC: Clark Williams <williams@redhat.com>
CC: Andi Kleen <andi@firstfloor.org>
If the guest domain has been suspend/resumed or migrated, then the
system clock backing the pvclock clocksource may revert to a smaller
value (ie, can be non-monotonic across the migration/save-restore).
Make sure we zero last_value in that case so that the domain
continues to see clock updates.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
dmar, x86: Use function stubs when CONFIG_INTR_REMAP is disabled
x86-64: Fix and clean up AMD Fam10 MMCONF enabling
x86: UV: Address interrupt/IO port operation conflict
x86: Use online node real index in calulate_tbl_offset()
x86, asm: Fix binutils 2.15 build failure
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf symbols: Remove incorrect open-coded container_of()
perf record: Handle restrictive permissions in /proc/{kallsyms,modules}
x86/kprobes: Prevent kprobes to probe on save_args()
irq_work: Drop cmpxchg() result
perf: Fix owner-list vs exit
x86, hw_nmi: Move backtrace_mask declaration under ARCH_HAS_NMI_WATCHDOG
tracing: Fix recursive user stack trace
perf,hw_breakpoint: Initialize hardware api earlier
x86: Ignore trap bits on single step exceptions
tracing: Force arch_local_irq_* notrace for paravirt
tracing: Fix module use of trace_bprintk()
The perf hardware pmu got initialized at various points in the boot,
some before early_initcall() some after (notably arch_initcall).
The problem is that the NMI lockup detector is ran from early_initcall()
and expects the hardware pmu to be present.
Sanitize this by moving all architecture hardware pmu implementations to
initialize at early_initcall() and move the lockup detector to an explicit
initcall right after that.
Cc: paulus <paulus@samba.org>
Cc: davem <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1290707759.2145.119.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When booting up a CPU set the various topology masks before
calling the CPU_STARTING notifier. This way the notifier
can actually use the masks.
This is needed for a perf change.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1290077254-12165-2-git-send-email-andi@firstfloor.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
and use it when appropriate.
Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1290525705-6265-1-git-send-email-fbuihuu@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In a kvm virt guests, the perf counters are not emulated. Instead they
return zero on a rdmsrl. The perf nmi handler uses the fact that crossing
a zero means the counter overflowed (for those counters that do not have
specific interrupt bits). Therefore on kvm guests, perf will swallow all
NMIs thinking the counters overflowed.
This causes problems for subsystems like kgdb which needs NMIs to do its
magic. This problem was discovered by running kgdb tests.
The solution is to write garbage into a perf counter during the
initialization and hopefully reading back the same number. On kvm
guests, the value will be read back as zero and we disable perf as
a result.
Reported-by: Jason Wessel <jason.wessel@windriver.com>
Patch-inspired-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
LKML-Reference: <1290462923-30734-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kgdb,ppc: Fix regression in evr register handling
kgdb,x86: fix regression in detach handling
kdb: fix crash when KDB_BASE_CMD_MAX is exceeded
kdb: fix memory leak in kdb_main.c
Adaptions to the changes of the AMD northbridge caching code: instead
of a bool in each l3 struct, use a flag in amd_northbridges.flags to
indicate L3 cache index disable support; use a pointer to the whole
northbridge instead of the misc device in the l3 struct; simplify the
initialisation; dynamically generate sysfs attribute array.
Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Support more than just the "Misc Control" part of the northbridges.
Support more flags by turning "gart_supported" into a single bit flag
that is stored in a flags member. Clean up related code by using a set
of functions (amd_nb_num(), amd_nb_has_feature() and node_to_amd_nb())
instead of accessing the NB data structures directly. Reorder the
initialization code and put the GART flush words caching in a separate
function.
Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Not only the naming of the files was confusing, it was even more so for
the function and variable names.
Renamed the K8 NB and NUMA stuff that is also used on other AMD
platforms. This also renames the CONFIG_K8_NUMA option to
CONFIG_AMD_NUMA and the related file k8topology_64.c to
amdtopology_64.c. No functional changes intended.
Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
The various stack tracing routines take a 'bp' argument in which the
caller is supposed to provide the base pointer to use, or 0 if doesn't
have one. Since bp is garbage whenever CONFIG_FRAME_POINTER is not
defined, this means all callers in principle should either always pass
0, or be conditional on CONFIG_FRAME_POINTER.
However, there are only really three use cases for stack tracing:
(a) Trace the current task, including IRQ stack if any
(b) Trace the current task, but skip IRQ stack
(c) Trace some other task
In all cases, if CONFIG_FRAME_POINTER is not defined, bp should just
be 0. If it _is_ defined, then
- in case (a) bp should be gotten directly from the CPU's register, so
the caller should pass NULL for regs,
- in case (b) the caller should should pass the IRQ registers to
dump_trace(),
- in case (c) bp should be gotten from the top of the task's stack, so
the caller should pass NULL for regs.
Hence, the bp argument is not necessary because the combination of
task and regs is sufficient to determine an appropriate value for bp.
This patch introduces a new inline function stack_frame(task, regs)
that computes the desired bp. This function is then called from the
two versions of dump_stack().
Signed-off-by: Soren Sandmann <ssp@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arjan van de Ven <arjan@infradead.org>,
Cc: Frederic Weisbecker <fweisbec@gmail.com>,
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>,
LKML-Reference: <m3oc9rop28.fsf@dhcp-100-3-82.bos.redhat.com>>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Candidate memory ranges were not calculated properly (start
addresses got needlessly rounded down, and end addresses didn't
get rounded up at all), address comparison for secondary CPUs
was done on only part of the address, and disabled status wasn't
tracked properly.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <4CE24DF40200007800022737@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Prevent kprobes to probe on save_args() since this function
will be called from breakpoint exception handler. That will
cause infinit loop on breakpoint handling.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
LKML-Reference: <20101118101655.2779.2816.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch is a logical extension of the protection provided by
CONFIG_DEBUG_RODATA to LKMs. The protection is provided by
splitting module_core and module_init into three logical parts
each and setting appropriate page access permissions for each
individual section:
1. Code: RO+X
2. RO data: RO+NX
3. RW data: RW+NX
In order to achieve proper protection, layout_sections() have
been modified to align each of the three parts mentioned above
onto page boundary. Next, the corresponding page access
permissions are set right before successful exit from
load_module(). Further, free_module() and sys_init_module have
been modified to set module_core and module_init as RW+NX right
before calling module_free().
By default, the original section layout and access flags are
preserved. When compiled with CONFIG_DEBUG_SET_MODULE_RONX=y,
the patch will page-align each group of sections to ensure that
each page contains only one type of content and will enforce
RO/NX for each group of pages.
-v1: Initial proof-of-concept patch.
-v2: The patch have been re-written to reduce the number of #ifdefs
and to make it architecture-agnostic. Code formatting has also
been corrected.
-v3: Opportunistic RO/NX protection is now unconditional. Section
page-alignment is enabled when CONFIG_DEBUG_RODATA=y.
-v4: Removed most macros and improved coding style.
-v5: Changed page-alignment and RO/NX section size calculation
-v6: Fixed comments. Restricted RO/NX enforcement to x86 only
-v7: Introduced CONFIG_DEBUG_SET_MODULE_RONX, added
calls to set_all_modules_text_rw() and set_all_modules_text_ro()
in ftrace
-v8: updated for compatibility with linux 2.6.33-rc5
-v9: coding style fixes
-v10: more coding style fixes
-v11: minor adjustments for -tip
-v12: minor adjustments for v2.6.35-rc2-tip
-v13: minor adjustments for v2.6.37-rc1-tip
Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com>
Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dave Jones <davej@redhat.com>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4CE2F914.9070106@free.fr>
[ minor cleanliness edits, -v14: build failure fix ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch expands functionality of CONFIG_DEBUG_RODATA to set main
(static) kernel data area as NX.
The following steps are taken to achieve this:
1. Linker script is adjusted so .text always starts and ends on a page bound
2. Linker script is adjusted so .rodata always start and end on a page boundary
3. NX is set for all pages from _etext through _end in mark_rodata_ro.
4. free_init_pages() sets released memory NX in arch/x86/mm/init.c
5. bios rom is set to x when pcibios is used.
The results of patch application may be observed in the diff of kernel page
table dumps:
pcibios:
-- data_nx_pt_before.txt 2009-10-13 07:48:59.000000000 -0400
++ data_nx_pt_after.txt 2009-10-13 07:26:46.000000000 -0400
0x00000000-0xc0000000 3G pmd
---[ Kernel Mapping ]---
-0xc0000000-0xc0100000 1M RW GLB x pte
+0xc0000000-0xc00a0000 640K RW GLB NX pte
+0xc00a0000-0xc0100000 384K RW GLB x pte
-0xc0100000-0xc03d7000 2908K ro GLB x pte
+0xc0100000-0xc0318000 2144K ro GLB x pte
+0xc0318000-0xc03d7000 764K ro GLB NX pte
-0xc03d7000-0xc0600000 2212K RW GLB x pte
+0xc03d7000-0xc0600000 2212K RW GLB NX pte
0xc0600000-0xf7a00000 884M RW PSE GLB NX pmd
0xf7a00000-0xf7bfe000 2040K RW GLB NX pte
0xf7bfe000-0xf7c00000 8K pte
No pcibios:
-- data_nx_pt_before.txt 2009-10-13 07:48:59.000000000 -0400
++ data_nx_pt_after.txt 2009-10-13 07:26:46.000000000 -0400
0x00000000-0xc0000000 3G pmd
---[ Kernel Mapping ]---
-0xc0000000-0xc0100000 1M RW GLB x pte
+0xc0000000-0xc0100000 1M RW GLB NX pte
-0xc0100000-0xc03d7000 2908K ro GLB x pte
+0xc0100000-0xc0318000 2144K ro GLB x pte
+0xc0318000-0xc03d7000 764K ro GLB NX pte
-0xc03d7000-0xc0600000 2212K RW GLB x pte
+0xc03d7000-0xc0600000 2212K RW GLB NX pte
0xc0600000-0xf7a00000 884M RW PSE GLB NX pmd
0xf7a00000-0xf7bfe000 2040K RW GLB NX pte
0xf7bfe000-0xf7c00000 8K pte
The patch has been originally developed for Linux 2.6.34-rc2 x86 by
Siarhei Liakh <sliakh.lkml@gmail.com> and Xuxian Jiang <jiang@cs.ncsu.edu>.
-v1: initial patch for 2.6.30
-v2: patch for 2.6.31-rc7
-v3: moved all code into arch/x86, adjusted credits
-v4: fixed ifdef, removed credits from CREDITS
-v5: fixed an address calculation bug in mark_nxdata_nx()
-v6: added acked-by and PT dump diff to commit log
-v7: minor adjustments for -tip
-v8: rework with the merge of "Set first MB as RW+NX"
Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com>
Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu>
Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: James Morris <jmorris@namei.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dave Jones <davej@redhat.com>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4CE2F82E.60601@free.fr>
[ minor cleanliness edits ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch for SGI UV systems addresses a problem whereby
interrupt transactions being looped back from a local IOH,
through the hub to a local CPU can (erroneously) conflict with
IO port operations and other transactions.
To workaound this we set a high bit in the APIC IDs used for
interrupts. This bit appears to be ignored by the sockets, but
it avoids the conflict in the hub.
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
LKML-Reference: <20101116222352.GA8155@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
___
arch/x86/include/asm/uv/uv_hub.h | 4 ++++
arch/x86/include/asm/uv/uv_mmrs.h | 19 ++++++++++++++++++-
arch/x86/kernel/apic/x2apic_uv_x.c | 25 +++++++++++++++++++++++--
arch/x86/platform/uv/tlb_uv.c | 2 +-
arch/x86/platform/uv/uv_time.c | 4 +++-
5 files changed, 49 insertions(+), 5 deletions(-)
The Iris machines from Eurobraille do not have APM or ACPI support
to shut themselves down properly. A special I/O sequence is
needed to do so. This modle runs this I/O sequence at
kernel shutdown when its force parameter is set to 1.
Signed-off-by: Shérab <Sebastien.Hinderer@ens-lyon.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
[ did minor coding style edits ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Adjust the paths for files that are including verify_cpu.S.
Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
LKML-Reference: <1289931004-16066-1-git-send-email-kees.cook@canonical.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add parentheses around one pushl_cfi argument.
Commit df5d1874 "x86: Use {push,pop}{l,q}_cfi in more places"
caused GNU assembler 2.15 (Debian Sarge) to fail. It is still
failing as of commit 07bd8516 "x86, asm: Restore parentheses
around one pushl_cfi argument". This patch solves build failure
with GNU assembler 2.15.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Jan Beulich <jbeulich@novell.com>
Cc: heukelum@fastmail.fm
Cc: hpa@linux.intel.com
LKML-Reference: <201011160445.oAG4jGif079860@www262.sakura.ne.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
backtrace_mask has been used under the code context of
ARCH_HAS_NMI_WATCHDOG. So put it into that context.
We were warned by the following warning:
arch/x86/kernel/apic/hw_nmi.c:21: warning: ‘backtrace_mask’ defined but not used
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
LKML-Reference: <1289573455-3410-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that the bulk of the old nmi_watchdog is gone, remove all
the stub variables and hooks associated with it.
This touches lots of files mainly because of how the io_apic
nmi_watchdog was implemented. Now that the io_apic nmi_watchdog
is forever gone, remove all its fingers.
Most of this code was not being exercised by virtue of
nmi_watchdog != NMI_IO_APIC, so there shouldn't be anything to
risky here.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: fweisbec@gmail.com
Cc: gorcunov@openvz.org
LKML-Reference: <1289578944-28564-3-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that we have a new nmi_watchdog that is more generic and
sits on top of the perf subsystem, we really do not need the old
nmi_watchdog any more.
In addition, the old nmi_watchdog doesn't really work if you are
using the default clocksource, hpet. The old nmi_watchdog code
relied on local apic interrupts to determine if the cpu is still
alive. With hpet as the clocksource, these interrupts don't
increment any more and the old nmi_watchdog triggers false
postives.
This piece removes the old nmi_watchdog code and stubs out any
variables and functions calls. The stubs are the same ones used
by the new nmi_watchdog code, so it should be well tested.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: fweisbec@gmail.com
Cc: gorcunov@openvz.org
LKML-Reference: <1289578944-28564-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The fix from ba773f7c51
(x86,kgdb: Fix hw breakpoint regression) was not entirely complete.
The kgdb_remove_all_hw_break() function also needs to call the
hw_break_release_slot() or else a breakpoint can get activated again
after the debugger has detached.
The kgdb test suite exposes the behavior in the form of either a hang
or repetitive failure. The kernel config that exposes the problem
contains all of the following:
CONFIG_DEBUG_RODATA=y
CONFIG_KGDB_TESTS=y
CONFIG_KGDB_TESTS_ON_BOOT=y
CONFIG_KGDB_TESTS_BOOT_STRING="V1F100"
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Tested-by: Frederic Weisbecker <fweisbec@gmail.com>
The big kernel lock has been removed from all these files at some point,
leaving only the #include.
Remove this too as a cleanup.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a single step exception fires, the trap bits, used to
signal hardware breakpoints, are in a random state.
These trap bits might be set if another exception will follow,
like a breakpoint in the next instruction, or a watchpoint in the
previous one. Or there can be any junk there.
So if we handle these trap bits during the single step exception,
we are going to handle an exception twice, or we are going to
handle junk.
Just ignore them in this case.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=21332
Reported-by: Michael Stefaniuc <mstefani@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Cc: Alexandre Julliard <julliard@winehq.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: All since 2.6.33.x <stable@kernel.org>
This patch adds the CE4100 reboot fixup to reboot_fixups_32.c
[ tglx: Moved PCI id to reboot_fixups_32.c ]
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
LKML-Reference: <5bdcfb4f0206fa721570504e95659a03b815bc5e.1289331834.git.dirk.brandewie@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The XD_DISABLE-clearing side-effect needs to happen for both 32bit
and 64bit, but the 32bit init routines were not calling verify_cpu()
yet. This adds that call to gain the side-effect.
The longmode/SSE tests being performed in verify_cpu() need to happen very
early for 64bit but not for 32bit. Instead of including it in two places
for 32bit, we can just include it once in arch/x86/kernel/head_32.S.
Signed-off-by: Kees Cook <kees.cook@canonical.com>
LKML-Reference: <1289414154-7829-4-git-send-email-kees.cook@canonical.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Intel CPUs have an additional MSR bit to indicate if the BIOS was
configured to disable the NX cpu feature. This bit was traditionally
used for operating systems that did not understand how to handle the
NX bit. Since Linux understands this, this BIOS flag should be ignored
by default.
In a review[1] of reported hardware being used by Ubuntu bug reporters,
almost 10% of systems had an incorrectly configured BIOS, leaving their
systems unable to use the NX features of their CPU.
This change will clear the MSR_IA32_MISC_ENABLE_XD_DISABLE bit so that NX
cannot be inappropriately controlled by the BIOS on Intel CPUs. If, under
very strange hardware configurations, NX actually needs to be disabled,
"noexec=off" can be used to restore the prior behavior.
[1] http://www.outflux.net/blog/archives/2010/02/18/data-mining-for-nx-bit/
Signed-off-by: Kees Cook <kees.cook@canonical.com>
LKML-Reference: <1289414154-7829-3-git-send-email-kees.cook@canonical.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The code is 32bit already, and can be used in 32bit routines.
Signed-off-by: Kees Cook <kees.cook@canonical.com>
LKML-Reference: <1289414154-7829-2-git-send-email-kees.cook@canonical.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Jasper suggested we use the zeroing capability of the allocators
instead of calling memset ourselves. Add node affinity while we're at
it.
Reported-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
get_ucode_data is a memcpy() wrapper which always returns 0. Move it
into the header and make it an inline. Remove all code checking its
return value and turn it into a void.
There should be no functionality change resulting from this patch.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
We don't have to do memset() ourselves after vmalloc() when we have
vzalloc(), so change that in
arch/x86/kernel/microcode_amd.c::get_next_ucode().
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
check_enable_amd_mmconf_dmi() gets called only for the BSP,
hence everything hanging off of it can be __init*.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CD2DE1E0200007800020990@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
A new version of the SGI UV hub node controller is being
developed. A few of the MMRs (control registers) that exist on
the current hub no longer exist on the new hub. Fortunately,
there are alternate MMRs that are are functionally equivalent
and that exist on both hubs.
This patch changes the UV code to use MMRs that exist in BOTH
versions of the hub node controller.
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20101106204056.GA27584@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>