Spinlocks on shared processor partitions use H_YIELD to notify the
hypervisor we are waiting on another virtual CPU. Unfortunately this means
the hcall tracepoints can recurse.
The patch below adds a percpu depth and checks it on both the entry and
exit hcall tracepoints.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: stable@kernel.org
When converting to the new cpumask code I screwed up:
- if (cpu_isset(cpu, numa_cpumask_lookup_table[node])) {
- cpu_clear(cpu, numa_cpumask_lookup_table[node]);
+ if (cpumask_test_cpu(cpu, node_to_cpumask_map[node])) {
+ cpumask_set_cpu(cpu, node_to_cpumask_map[node]);
This was introduced in commit 25863de07a (powerpc/cpumask: Convert NUMA code
to new cpumask API)
Fix it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There is no need to start up the timer and monitor topology changes on a
dedicated processor partition, so disable it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The rest of the NUMA code expects an OF associativity property with
the first cell containing the length. Without this fix all topology changes
cause us to misparse the property and put the cpu into node 0.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The hypervisor uses unsigned 1 byte counters to signal topology changes to
the OS. Since they can wrap we need to check for any difference, not just if
the hypervisor count is greater than the previous count.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
VPHN supports up to 8 distance fields but the number of entries in
ibm,associativity-reference-points signifies how many are in use.
Don't look at all the VPHN counts, only distance_ref_points_depth
worth.
Since we already cap our distance metrics at MAX_DISTANCE_REF_POINTS,
use that to size the VPHN arrays and add a BUILD_BUG_ON to avoid it growing
larger than the VPHN maximum of 8.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Correct a spelling error in VPHN comments in numa.c.
Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some of those functions try to adjust the CPU features, for example
to remove NAP support on some revisions. However, they seem to use
r5 as an index into the CPU table entry, which might have been right
a long time ago but no longer is. r4 is the right register to use.
This probably caused some off behaviours on some PowerMac variants
using 750cx or 7455 processor revisions.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: stable@kernel.org
When calling setup_cpu() on 64-bit, we pass a pointer to the
cputable entry we have found. This used to be fine when cur_cpu_spec
was a pointer to that entry, but nowadays, we copy the entry into
a separate variable, and we do so before we call the setup_cpu()
callback. That means that any attempt by that callback at patching
the CPU table entry (to adjust CPU features for example) will patch
the wrong table.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
max_mapnr is a pfn, not an index innto mem_map[]. So don't add
ARCH_PFN_OFFSET a second time.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
m32r: Fixup last __do_IRQ leftover
genirq: Add missing status flags to modification mask
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-32: Make sure the stack is set up before we use it
x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platforms
x86, nx: Don't force pages RW when setting NX bits
FREQ is a ridiculously short name for a platform-specific macro in a
generic header, and it now conflicts with an enumeration in the
gspca/ov519 driver.
Also delete conditional reference to ixp4xx_get_board_tick_rate()
which is not defined anywhere.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
Queues should be empty when released, if not, there is a safety valve.
Make sure the queue is usable after it triggers.
Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
Somehow I managed to miss the last __do_IRQ caller when I cleanup the
remaining users. m32r is fully converted to the generic irq layer, but
I managed to not commit the conversion of __do_IRQ() to
generic_handle_irq() after compile testing the quilt series :(
Pointed-out-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Since checkin ebba638ae7 we call
verify_cpu even in 32-bit mode. Unfortunately, calling a function
means using the stack, and the stack pointer was not initialized in
the 32-bit setup code! This code initializes the stack pointer, and
simplifies the interface slightly since it is easier to rely on just a
pointer value rather than a descriptor; we need to have different
values for the segment register anyway.
This retains start_stack as a virtual address, even though a physical
address would be more convenient for 32 bits; the 64-bit code wants
the other way around...
Reported-by: Matthieu Castet <castet.matthieu@free.fr>
LKML-Reference: <4D41E86D.8060205@free.fr>
Tested-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Clearing the cpu in prev's mm_cpumask early will avoid the flush tlb
IPI's while the cr3 is still pointing to the prev mm. And this window
can lead to the possibility of bogus TLB fills resulting in strange
failures. One such problematic scenario is mentioned below.
T1. CPU-1 is context switching from mm1 to mm2 context and got a NMI
etc between the point of clearing the cpu from the mm_cpumask(mm1)
and before reloading the cr3 with the new mm2.
T2. CPU-2 is tearing down a specific vma for mm1 and will proceed with
flushing the TLB for mm1. It doesn't send the flush TLB to CPU-1
as it doesn't see that cpu listed in the mm_cpumask(mm1).
T3. After the TLB flush is complete, CPU-2 goes ahead and frees the
page-table pages associated with the removed vma mapping.
T4. CPU-2 now allocates those freed page-table pages for something
else.
T5. As the CR3 and TLB caches for mm1 is still active on CPU-1, CPU-1
can potentially speculate and walk through the page-table caches
and can insert new TLB entries. As the page-table pages are
already freed and being used on CPU-2, this page walk can
potentially insert a bogus global TLB entry depending on the
(random) contents of the page that is being used on CPU-2.
T6. This bogus TLB entry being global will be active across future CR3
changes and can result in weird memory corruption etc.
To avoid this issue, for the prev mm that is handing over the cpu to
another mm, clear the cpu from the mm_cpumask(prev) after the cr3 is
changed.
Marking it for -stable, though we haven't seen any reported failure that
can be attributed to this.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: stable@kernel.org [v2.6.32+]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Markus Kohn ran into a hard hang regression on an acer aspire
1310, when acpi is enabled. git bisect showed the following
commit as the bad one that introduced the boot regression.
commit d0af9eed5a
Author: Suresh Siddha <suresh.b.siddha@intel.com>
Date: Wed Aug 19 18:05:36 2009 -0700
x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init
Because of the UP configuration of that platform,
native_smp_prepare_cpus() bailed out (in smp_sanity_check())
before doing the set_mtrr_aps_delayed_init()
Further down the boot path, native_smp_cpus_done() will call the
delayed MTRR initialization for the AP's (mtrr_aps_init()) with
mtrr_aps_delayed_init not set. This resulted in the boot
processor reprogramming its MTRR's to the values seen during the
start of the OS boot. While this is not needed ideally, this
shouldn't have caused any side-effects. This is because the
reprogramming of MTRR's (set_mtrr_state() that gets called via
set_mtrr()) will check if the live register contents are
different from what is being asked to write and will do the actual
write only if they are different.
BP's mtrr state is read during the start of the OS boot and
typically nothing would have changed when we ask to reprogram it
on BP again because of the above scenario on an UP platform. So
on a normal UP platform no reprogramming of BP MTRR MSR's
happens and all is well.
However, on this platform, bios seems to be modifying the fixed
mtrr range registers between the start of OS boot and when we
double check the live registers for reprogramming BP MTRR
registers. And as the live registers are modified, we end up
reprogramming the MTRR's to the state seen during the start of
the OS boot.
During ACPI initialization, something in the bios (probably smi
handler?) don't like this fact and results in a hard lockup.
We didn't see this boot hang issue on this platform before the
commit d0af9eed5a, because only
the AP's (if any) will program its MTRR's to the value that BP
had at the start of the OS boot.
Fix this issue by checking mtrr_aps_delayed_init before
continuing further in the mtrr_aps_init(). Now, only AP's (if
any) will program its MTRR's to the BP values during boot.
Addresses https://bugzilla.novell.com/show_bug.cgi?id=623393
[ By the way, this behavior of the bios modifying MTRR's after the start
of the OS boot is not common and the kernel is not prepared to
handle this situation well. Irrespective of this issue, during
suspend/resume, linux kernel will try to reprogram the BP's MTRR values
to the values seen during the start of the OS boot. So suspend/resume might
be already broken on this platform for all linux kernel versions. ]
Reported-and-bisected-by: Markus Kohn <jabber@gmx.org>
Tested-by: Markus Kohn <jabber@gmx.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Thomas Renninger <trenn@novell.com>
Cc: Rafael Wysocki <rjw@novell.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: stable@kernel.org # [v2.6.32+]
LKML-Reference: <1296694975.4418.402.camel@sbsiddha-MOBL3.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Xen want page table pages read only.
But the initial page table (from head_*.S) live in .data or .bss.
That was broken by 64edc8ed5f. There is
absolutely no reason to force these pages RW after they have already
been marked RO.
Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* 'next' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Fix ASM optimized code for LE
microblaze: Fix unaligned issue on MMU system with BS=0 DIV=1
microblaze: Fix DTB passing from bootloader
* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
arch/arm/mach-omap2/dma.c: Convert IS_ERR result to PTR_ERR
arm: omap2: mux: fix compile warning
omap1: Simplify use of omap_irq_flags
omap2+: Fix unused variable warning for omap_irq_base
Allow non-ARM SMP processors to use the SMP_ON_UP feature. CPUs
supporting SMP must have the new CPU ID format, so check for this first.
Then check for ARM11MPCore, which fails the MPIDR check. Lastly check
the MPIDR reports multiprocessing extensions and that the CPU is part of
a multiprocessing system.
Cc: <stable@kernel.org>
Reported-and-Tested-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Ensure that the ISA/PCI IO space accessors are properly ordered on
ARMv6+ architectures. These should always be ordered with respect to
all other accesses.
This also fixes __iormb() and __iowmb() not being visible to ioread/
iowrite if a platform defines its own MMIO accessors.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Disable the initrd if the passed address already overlaps the reserved
region. This avoids oopses on Netwinders when NeTTrom tells the kernel
that an initrd is located at mem+4MB, but this overlaps the BSS,
resulting in the kernels in-use BSS being freed.
This should be applied to v2.6.37-stable.
Cc: <stable@kernel.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
0ea1293 (arm: return both physical and virtual addresses from addruart)
changed the way the 'addruart' worked, making it return both the virt
and phys addresses. Unfortunately, for footbridge, these were reversed.
Fix that. Tested on Netwinder.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
6f9a3c33 "[S390] cleanup s390 Kconfig" accidentally changed
the default for CONFIG_CHSC_SCH. Reset it to m.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The implementation of the cache flushing interfaces on the s390
is identical with the default implementation in asm-generic.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix this build error with !CONFIG_SWAP caused by tranparent huge pages support:
In file included from mm/pgtable-generic.c:9:0:
/linux-2.6/arch/s390/include/asm/tlb.h: In function 'tlb_remove_page':
/linux-2.6/arch/s390/include/asm/tlb.h:92:2: error: implicit declaration of function 'page_cache_release'
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The uaccess functions copy_in_user_std and clear_user_std fail to
switch back from secondary space mode to primary space mode with sacf
in case of an unresolvable page fault. We need to make sure that the
switch back to primary mode is done in all cases, otherwise the code
following the uaccess inline assembly will crash.
Reported-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
After page_table_free_rcu removed a page from the pgtable_list
page_table_free better not add it again. Otherwise a page_table_alloc
can reuse a page table fragment that is still in the rcu process.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Microblaze little-endian doesn't support ASM optimized library
functions(memcpy/memmove). Kconfig doens't contain
any information about endian that's why it is necessary to
check it in the source code.
The code is used with barrel shifter is used.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Unaligned code use shift for finding register operand.
There is used BSRLI(r8,r8,2) macro which is expand for BS=0, DIV=1
by
ori rD, r0, (1 << imm); \
idivu rD, rD, rA
but if rD is equal rA then ori instruction rewrite value which
should be devide.
The patch remove this macro which use idivu instruction because
idivu takes 32/34 cycles. The highest shifting is 20 which takes
20 cycles.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Little endian system needs to check OF_DT_HEADER
but it is swapped because it is in big-endian.
Microblaze LE provides lwr instruction which loads
magic number in BIG endian format which can be compared.
There is used the fact that if you write 0x1 as word
and load it as byte then you get for big-endian zero
and 1 for little-endian.
Signed-off-by: Michal Simek <monstr@monstr.eu>
* 'stable/bug-fixes-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/setup: Route halt operations to safe_halt pvop.
xen/e820: Guard against E820_RAM not having page-aligned size or start.
xen/p2m: Mark INVALID_P2M_ENTRY the mfn_list past max_pfn.
This code elsewhere returns a negative constant to an indicate an error,
while IS_ERR returns the result of a >= operation.
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression x;
@@
if (...) { ...
- return IS_ERR(x);
+ return PTR_ERR(x);
}
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Jarkko Nikula <jhnikula@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Commit 8419fdbaf2
(omap2+: Add omap_mux_get_by_name) introduced the following
compile warning:
arch/arm/mach-omap2/mux.c: In function '_omap_mux_get_by_name':
arch/arm/mach-omap2/mux.c:163:17: warning: 'found_mode' may be used
uninitialized in this function
Signed-off-by: Felipe Balbi <balbi@ti.com>
[tony@atomide.com: updated comments]
Signed-off-by: Tony Lindgren <tony@atomide.com>
Commit 03a9e51261
(omap1: Use asm_irq_flags for entry-macro.S) added support
for multi-omap builds with addition of the omap_irq_flags.
Commit 9f9605c2ed
(omap2+: Fix unused variable warning for omap_irq_base)
simplified omap2+ entry-macro.S by moving omap_irq_flags
out of entry-macro.S.
Simplify omap1 entry-macro.S in a similar way to keep the
code consistent. Based on a similar earlier patch for omap2+
by Russell King <rmk+kernel@arm.linux.org.uk>.
Signed-off-by: Tony Lindgren <tony@atomide.com>
Commit 5d190c4010
(omap2+: Initialize omap_irq_base for entry-macro.S from
platform code) simplified the handling of omap_irq_base
for multi-omap builds. However, this patch also introduced
a build warning for !MULTI_OMAP2 builds:
arch/arm/mach-omap2/io.c: In function 'omap_irq_base_init':
arch/arm/mach-omap2/io.c:322: warning: unused variable 'omap_irq_base'
Fix this by removing the ifdef. Also simplify things further
by moving omap_irq_base out of entry-macro.S.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[tony@atomide.com: updated comments]
Signed-off-by: Tony Lindgren <tony@atomide.com>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
percpu, x86: Fix percpu_xchg_op()
x86: Remove left over system_64.h
x86-64: Don't use pointer to out-of-scope variable in dump_trace()
This patch fixes some issues with raw event validation on
Pentium 4 (Netburst) based processors.
As I was testing libpfm4 Netburst support, I ran into two
problems in the p4_validate_raw_event() function:
- the shared field must be checked ONLY when HT is on
- the binding to ESCR register was missing
The second item was causing raw events to not be encoded
correctly compared to generic PMU events.
With this patch, I can now pass Netburst events to libpfm4
examples and get meaningful results:
$ task -e global_power_events🏃u noploop 1
noploop for 1 seconds
3,206,304,898 global_power_events:running
Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: peterz@infradead.org
Cc: paulus@samba.org
Cc: davem@davemloft.net
Cc: fweisbec@gmail.com
Cc: perfmon2-devel@lists.sf.net
Cc: eranian@gmail.com
Cc: robert.richter@amd.com
Cc: acme@redhat.com
Cc: gorcunov@gmail.com
Cc: ming.m.lin@intel.com
LKML-Reference: <4d3efb2f.1252d80a.1a80.ffffc83f@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With this patch, the cpuidle driver does not load and
does not issue the mwait operations. Instead the hypervisor
is doing them (b/c we call the safe_halt pvops call).
This fixes quite a lot of bootup issues wherein the user had
to force interrupts for the continuation of the bootup.
Details are discussed in:
http://lists.xensource.com/archives/html/xen-devel/2011-01/msg00535.html
[v2: Wrote the commit description]
Reported-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Tested-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>