Commit Graph

10441 Commits

Author SHA1 Message Date
Robert Richter
da759fe5be oprofile/x86: move IBS code
Moving code to make future changes easier. This groups all IBS code
together.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-05-04 11:35:28 +02:00
Robert Richter
8617f98c00 oprofile/x86: return -EBUSY if counters are already reserved
In case a counter is already reserved by the watchdog or perf_event
subsystem, oprofile ignored this counters silently. This case is
handled now and oprofile_setup() now reports an error.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-05-04 11:35:28 +02:00
Robert Richter
83300ce0df oprofile/x86: moving shutdown functions
Moving some code in preparation of the next patch.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-05-04 11:35:27 +02:00
Robert Richter
d0e4120fda oprofile/x86: reserve counter msrs pairwise
For AMD's and Intel's P6 generic performance counters have pairwise
counter and control msrs. This patch changes the counter reservation
in a way that both msrs must be registered. It joins some counter
loops and also removes the unnecessary NUM_CONTROLS macro in the AMD
implementation.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-05-04 11:35:26 +02:00
Robert Richter
8f5a2dd83a oprofile/x86: rework error handler in nmi_setup()
This patch improves the error handler in nmi_setup(). Most parts of
the code are moved to allocate_msrs(). In case of an error
allocate_msrs() also frees already allocated memory. nmi_setup()
becomes easier and better extendable.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-05-04 11:35:07 +02:00
H. Peter Anvin
097c1bd567 x86, cpu: Make APERF/MPERF a normal table-driven flag
APERF/MPERF can be handled via the table like all the other scattered
CPU flags.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1270065406-1814-4-git-send-email-bp@amd64.org>
2010-05-03 15:49:31 -07:00
Borislav Petkov
d88d95eb1c x86, k8: Fix build error when K8_NB is disabled
K8_NB depends on PCI and when the last is disabled (allnoconfig) we fail
at the final linking stage due to missing exported num_k8_northbridges.
Add a header stub for that.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100503183036.GJ26107@aftab>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-03 14:34:24 -07:00
Brian Gerst
250825008f x86-32: Don't set ignore_fpu_irq in simd exception
Any processor that supports simd will have an internal fpu, and the
irq13 handler will not be enabled.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1269176446-2489-5-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-03 13:39:32 -07:00
Brian Gerst
e2e75c915d x86: Merge kernel_math_error() into math_error()
Clean up the kernel exception handling and make it more similar to
the other traps.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1269176446-2489-4-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-03 13:39:31 -07:00
Brian Gerst
9b6dba9e07 x86: Merge simd_math_error() into math_error()
The only difference between FPU and SIMD exceptions is where the
status bits are read from (cwd/swd vs. mxcsr).  This also fixes
the discrepency introduced by commit adf77bac, which fixed FPU
but not SIMD.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1269176446-2489-3-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-03 13:39:29 -07:00
Brian Gerst
40d2e76315 x86-32: Rework cache flush denied handler
The cache flush denied error is an erratum on some AMD 486 clones.  If an invd
instruction is executed in userspace, the processor calls exception 19 (13 hex)
instead of #GP (13 decimal).  On cpus where XMM is not supported, redirect
exception 19 to do_general_protection().  Also, remove die_if_kernel(), since
this was the last user.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1269176446-2489-2-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-03 13:39:26 -07:00
Mark Langsdorf
b810e94c9d powernow-k8: Fix frequency reporting
With F10, model 10, all valid frequencies are in the ACPI _PST table.

Cc: <stable@kernel.org> # 33.x 32.x
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
LKML-Reference: <1270065406-1814-6-git-send-email-bp@amd64.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-03 15:04:18 +02:00
Ingo Molnar
56f0e74c9c x86: Fix parse_reservetop() build failure on certain configs
Commit e67a807 ("x86: Fix 'reservetop=' functionality") added a
fixup_early_ioremap() call to parse_reservetop() and declared it
in io.h.

But asm/io.h was only included indirectly - and on some configs
not at all, causing a build failure on those configs.

Cc: Liang Li <liang.li@windriver.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Wang Chen <wangchen@cn.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1272621711-8683-1-git-send-email-liang.li@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-03 09:22:19 +02:00
Ingo Molnar
53ba4f2fa7 Merge commit 'v2.6.34-rc6' into core/locking 2010-05-03 09:17:01 +02:00
Frederic Weisbecker
feef47d0cb hw-breakpoints: Get the number of available registers on boot dynamically
The breakpoint generic layer assumes that archs always know in advance
the static number of address registers available to host breakpoints
through the HBP_NUM macro.

However this is not true for every archs. For example Arm needs to get
this information dynamically to handle the compatiblity between
different versions.

To solve this, this patch proposes to drop the static HBP_NUM macro
and let the arch provide the number of available slots through a
new hw_breakpoint_slots() function. For archs that have
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS selected, it will be called once
as the number of registers fits for instruction and data breakpoints
together.
For the others it will be called first to get the number of
instruction breakpoint registers and another time to get the
data breakpoint registers, the targeted type is given as a
parameter of hw_breakpoint_slots().

Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: K. Prasad <prasad@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
2010-05-01 04:32:14 +02:00
Frederic Weisbecker
0102752e4c hw-breakpoints: Separate constraint space for data and instruction breakpoints
There are two outstanding fashions for archs to implement hardware
breakpoints.

The first is to separate breakpoint address pattern definition
space between data and instruction breakpoints. We then have
typically distinct instruction address breakpoint registers
and data address breakpoint registers, delivered with
separate control registers for data and instruction breakpoints
as well. This is the case of PowerPc and ARM for example.

The second consists in having merged breakpoint address space
definition between data and instruction breakpoint. Address
registers can host either instruction or data address and
the access mode for the breakpoint is defined in a control
register. This is the case of x86 and Super H.

This patch adds a new CONFIG_HAVE_MIXED_BREAKPOINTS_REGS config
that archs can select if they belong to the second case. Those
will have their slot allocation merged for instructions and
data breakpoints.

The others will have a separate slot tracking between data and
instruction breakpoints.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: K. Prasad <prasad@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
2010-05-01 04:32:11 +02:00
Frederic Weisbecker
b2812d031d hw-breakpoints: Change/Enforce some breakpoints policies
The current policies of breakpoints in x86 and SH are the following:

- task bound breakpoints can only break on userspace addresses
- cpu wide breakpoints can only break on kernel addresses

The former rule prevents ptrace breakpoints to be set to trigger on
kernel addresses, which is good. But as a side effect, we can't
breakpoint on kernel addresses for task bound breakpoints.

The latter rule simply makes no sense, there is no reason why we
can't set breakpoints on userspace while performing cpu bound
profiles.

We want the following new policies:

- task bound breakpoint can set userspace address breakpoints, with
no particular privilege required.
- task bound breakpoints can set kernelspace address breakpoints but
must be privileged to do that.
- cpu bound breakpoints can do what they want as they are privileged
already.

To implement these new policies, this patch checks if we are dealing
with a kernel address breakpoint, if so and if the exclude_kernel
parameter is set, we tell the user that the breakpoint is invalid,
which makes a good generic ptrace protection.
If we don't have exclude_kernel, ensure the user has the right
privileges as kernel breakpoints are quite sensitive (risk of
trap recursion attacks and global performance impacts).

[ Paul Mundt: keep addr space check for sh signal delivery and fix
  double function declaration]

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: K. Prasad <prasad@linux.vnet.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-05-01 04:32:10 +02:00
Frederic Weisbecker
73266fc1df hw-breakpoints: Tag ptrace breakpoint as exclude_kernel
Tag ptrace breakpoints with the exclude_kernel attribute set. This
will make it easier to set generic policies on breakpoints, when it
comes to ensure nobody unpriviliged try to breakpoint on the kernel.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: K. Prasad <prasad@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
2010-05-01 04:32:07 +02:00
Prarit Bhargava
bbd391a15d x86: Fix NULL pointer access in irq_force_complete_move() for Xen guests
Upstream PV guests fail to boot because of a NULL pointer in
irq_force_complete_move().  It is possible that xen guests have
irq_desc->chip_data = NULL.

Test for NULL chip_data pointer before attempting to complete an irq move.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
LKML-Reference: <20100427152434.16193.49104.sendpatchset@prarit.bos.redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org> [2.6.33]
2010-04-30 14:31:38 -07:00
Liang Li
e67a807f3d x86: Fix 'reservetop=' functionality
When specifying the 'reservetop=0xbadc0de' kernel parameter,
the kernel will stop booting due to a early_ioremap bug that
relates to commit 8827247ff.

The root cause of boot failure problem is the value of
'slot_virt[i]' was initialized in setup_arch->early_ioremap_init().
But later in setup_arch, the function 'parse_early_param' will
modify 'FIXADDR_TOP' when 'reservetop=0xbadc0de' being specified.

The simplest fix might be use __fix_to_virt(idx0) to get updated
value of 'FIXADDR_TOP' in '__early_ioremap' instead of reference
old value from slot_virt[slot] directly.

Changelog since v0:

-v1: When reservetop being handled then FIXADDR_TOP get
     adjusted, Hence check prev_map then re-initialize slot_virt and
     PMD based on new FIXADDR_TOP.

-v2: place fixup_early_ioremap hence call early_ioremap_init in
     reserve_top_address  to re-initialize slot_virt and
     corresponding PMD when parse_reservertop

-v3: move fixup_early_ioremap out of reserve_top_address to make
     sure other clients of reserve_top_address like xen/lguest won't
     broken

Signed-off-by: Liang Li <liang.li@windriver.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Wang Chen <wangchen@cn.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1272621711-8683-1-git-send-email-liang.li@windriver.com>
[ fixed three small cleanliness details in fixup_early_ioremap() ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-30 12:19:53 +02:00
Ingo Molnar
3ca50496c2 Merge commit 'v2.6.34-rc6' into perf/core
Merge reason: update to the latest -rc.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-30 09:56:44 +02:00
H. Peter Anvin
d9c5841e22 Merge branch 'x86/asm' into x86/atomic
Merge reason:
	Conflict between LOCK_PREFIX_HERE and relative alternatives
	pointers

Resolved Conflicts:
	arch/x86/include/asm/alternative.h
	arch/x86/kernel/alternative.c

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-29 16:53:17 -07:00
H. Peter Anvin
b701a47ba4 x86: Fix LOCK_PREFIX_HERE for uniprocessor build
Checkin b3ac891b67:
x86: Add support for lock prefix in alternatives

... did not define LOCK_PREFIX_HERE in the case of a uniprocessor
build.  As a result, it would cause any of the usages of this macro to
fail on a uniprocessor build.  Fix this by defining LOCK_PREFIX_HERE
as a null string.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Luca Barbieri <luca@luca-barbieri.com>
LKML-Reference: <1267005265-27958-2-git-send-email-luca@luca-barbieri.com>
2010-04-29 16:08:54 -07:00
Linus Torvalds
dfad53d48e Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
  x86: Disable large pages on CPUs with Atom erratum AAE44
  x86-64: Clear a 64-bit FS/GS base on fork if selector is nonzero
  x86, mrst: Conditionally register cpu hotplug notifier for apbt
2010-04-28 20:41:55 -07:00
Thomas Gleixner
30a564be9d x86, hpet: Restrict read back to affected ATI chipsets
After programming the HPET, we do a readback as a workaround for
ATI/SBx00 chipsets as a synchronization.  Unfortunately this triggers
an erratum in newer ICH chipsets (ICH9+) where reading the comparator
immediately after the write returns the old value.  Furthermore, as
always, I/O reads are bad for performance.

Therefore, restrict the readback to the chipsets that need it, or, for
debugging purposes, when we are running with hpet=verbose.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Venkatesh Pallipadi <venki@google.com>
LKML-Reference: <20100225185348.GA9674@linux-os.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-28 18:14:29 -07:00
Jan Beulich
6fc108a08d x86: Clean up arch/x86/Kconfig*
No functional change intended.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4BCF2690020000780003B340@vpn.id2.novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-28 17:25:53 -07:00
Jan Beulich
47f9fe2629 x86-64: Don't export init_level4_pgt
It's not used by any module, and i386 (as well as some other arches)
also doesn't export its equivalent (swapper_pg_dir).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4BCF33BD020000780003B3E4@vpn.id2.novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-28 17:25:47 -07:00
Jan Beulich
5967ed87ad x86-64: Reduce SMP locks table size
Reduce the SMP locks table size by using relative pointers instead of
absolute ones, thus cutting the table size by half.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4BCF30FE020000780003B3B6@vpn.id2.novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-28 17:15:47 -07:00
Jan Beulich
2e61878698 x86-64: Combine SRAT regions when possible
... i.e. when the hole between two regions isn't occupied by memory on
another node. This reduces the memory->node table size, thus reducing
cache footprint of lookups, which got increased significantly some
time ago, and things go back to how they were before that change on
the systems I looked at.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4BCF3230020000780003B3CA@vpn.id2.novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-28 17:14:11 -07:00
Jan Beulich
402af0d7c6 x86, asm: Introduce and use percpu_inc()
... generating slightly smaller code.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4BCF261F020000780003B33C@vpn.id2.novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-28 16:58:49 -07:00
Bjorn Helgaas
48728e0774 x86/PCI: compute Address Space length rather than using _LEN
ACPI _CRS Address Space Descriptors have _MIN, _MAX, and _LEN.  Linux has
been computing Address Spaces as [_MIN to _MIN + _LEN - 1].  Based on the
tests in the bug reports below, Windows apparently uses [_MIN to _MAX].

Per spec (ACPI 4.0, Table 6-40), for _CRS fixed-size, fixed location
descriptors, "_LEN must be (_MAX - _MIN + 1)", and when that's true, it
doesn't matter which way we compute the end.  But of course, there are
BIOSes that don't follow this rule, and we're better off if Linux handles
those exceptions the same way as Windows.

This patch makes Linux use [_MIN to _MAX], as Windows seems to do.  This
effectively reverts d558b483d5 and 03db42adfe and replaces them with
simpler code.

    https://bugzilla.kernel.org/show_bug.cgi?id=14337 (round)
    https://bugzilla.kernel.org/show_bug.cgi?id=15480 (truncate)

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-04-28 09:17:45 -07:00
Bjorn Helgaas
55051feb57 x86/PCI: never allocate PCI MMIO resources below BIOS_END
When we move a PCI device or assign resources to a device not configured
by the BIOS, we want to avoid the BIOS region below 1MB.  Note that if the
BIOS places devices below 1MB, we leave them there.

See https://bugzilla.kernel.org/show_bug.cgi?id=15744
and https://bugzilla.kernel.org/show_bug.cgi?id=15841

Tested-by: Andy Isaacson <adi@hexapodia.org>
Tested-by: Andy Bailey <bailey@akamai.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-04-26 12:30:03 -07:00
Linus Torvalds
383bee6b54 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: Ensure we re-enable devices on resume
  x86/PCI: parse additional host bridge window resource types
  PCI: revert broken device warning
  PCI aerdrv: use correct bit defines and add 2ms delay to aer_root_reset
  x86/PCI: ignore Consumer/Producer bit in ACPI window descriptions
2010-04-24 11:32:12 -07:00
Dmitry Torokhov
453dc65931 VMware Balloon driver
This is a standalone version of VMware Balloon driver.  Ballooning is a
technique that allows hypervisor dynamically limit the amount of memory
available to the guest (with guest cooperation).  In the overcommit
scenario, when hypervisor set detects that it needs to shuffle some
memory, it instructs the driver to allocate certain number of pages, and
the underlying memory gets returned to the hypervisor.  Later hypervisor
may return memory to the guest by reattaching memory to the pageframes and
instructing the driver to "deflate" balloon.

We are submitting a standalone driver because KVM maintainer (Avi Kivity)
expressed opinion (rightly) that our transport does not fit well into
virtqueue paradigm and thus it does not make much sense to integrate with
virtio.

There were also some concerns whether current ballooning technique is the
right thing.  If there appears a better framework to achieve this we are
prepared to evaluate and switch to using it, but in the meantime we'd like
to get this driver upstream.

We want to get the driver accepted in distributions so that users do not
have to deal with an out-of-tree module and many distributions have
"upstream first" requirement.

The driver has been shipping for a number of years and users running on
VMware platform will have it installed as part of VMware Tools even if it
will not come from a distribution, thus there should not be additional
risk in pulling the driver into mainline.  The driver will only activate
if host is VMware so everyone else should not be affected at all.

Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-24 11:31:26 -07:00
H. Peter Anvin
7a0fc404ae x86: Disable large pages on CPUs with Atom erratum AAE44
Atom erratum AAE44/AAF40/AAG38/AAH41:

"If software clears the PS (page size) bit in a present PDE (page
directory entry), that will cause linear addresses mapped through this
PDE to use 4-KByte pages instead of using a large page after old TLB
entries are invalidated. Due to this erratum, if a code fetch uses
this PDE before the TLB entry for the large page is invalidated then
it may fetch from a different physical address than specified by
either the old large page translation or the new 4-KByte page
translation. This erratum may also cause speculative code fetches from
incorrect addresses."

[http://download.intel.com/design/processor/specupdt/319536.pdf]

Where as commit 211b3d03c7 seems to
workaround errata AAH41 (mixed 4K TLBs) it reduces the window of
opportunity for the bug to occur and does not totally remove it.  This
patch disables mixed 4K/4MB page tables totally avoiding the page
splitting and not tripping this processor issue.

This is based on an original patch by Colin King.

Originally-by: Colin Ian King <colin.king@canonical.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <1269271251-19775-1-git-send-email-colin.king@canonical.com>
Cc: <stable@kernel.org>
2010-04-23 16:49:51 -07:00
H. Peter Anvin
7ce5a2b9bb x86-64: Clear a 64-bit FS/GS base on fork if selector is nonzero
When we do a thread switch, we clear the outgoing FS/GS base if the
corresponding selector is nonzero.  This is taken by __switch_to() as
an entry invariant; it does not verify that it is true on entry.
However, copy_thread() doesn't enforce this constraint, which can
result in inconsistent results after fork().

Make copy_thread() match the behavior of __switch_to().

Reported-and-tested-by: Samuel Thibault <samuel.thibault@inria.fr>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <4BD1E061.8030605@zytor.com>
Cc: <stable@kernel.org>
2010-04-23 16:49:51 -07:00
Robin Holt
1f9cc3cb6a x86, pat: Update the page flags for memtype atomically instead of using memtype_lock
While testing an application using the xpmem (out of kernel) driver, we
noticed a significant page fault rate reduction of x86_64 with respect
to ia64.  For one test running with 32 cpus, one thread per cpu, it
took 01:08 for each of the threads to vm_insert_pfn 2GB worth of pages.
For the same test running on 256 cpus, one thread per cpu, it took 14:48
to vm_insert_pfn 2 GB worth of pages.

The slowdown was tracked to lookup_memtype which acquires the
spinlock memtype_lock.  This heavily contended lock was slowing down
vm_insert_pfn().

With the cmpxchg on page->flags method, both the 32 cpu and 256 cpu
cases take approx 00:01.3 seconds to complete.

Signed-off-by: Robin Holt <holt@sgi.com>
LKML-Reference: <20100423153627.751194346@gulag1.americas.sgi.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@gmail.com>
Cc: Rafael Wysocki <rjw@novell.com>
Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-23 15:57:23 -07:00
Ingo Molnar
70bce3ba77 Merge branch 'linus' into perf/core
Merge reason: merge the latest fixes, update to latest -rc.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-23 11:10:30 +02:00
Borislav Petkov
59d3b38874 x86, cacheinfo: Disable index in all four subcaches
When disabling an L3 cache index, make sure we disable that index in
all four subcaches of the L3. Clarify nomenclature while at it, wrt to
disable slots versus disable index and rename accordingly.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1271945222-5283-6-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-22 17:17:27 -07:00
Borislav Petkov
ba06edb63f x86, cacheinfo: Make L3 cache info per node
Currently, we're allocating L3 cache info and calculating indices for
each online cpu which is clearly superfluous. Instead, we need to do
this per-node as is each L3 cache.

No functional change, only per-cpu memory savings.

-v2: Allocate L3 cache descriptors array dynamically.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1271945222-5283-5-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-22 17:17:23 -07:00
Borislav Petkov
9350f982e4 x86, cacheinfo: Reorganize AMD L3 cache structure
Add a struct representing L3 cache attributes (subcache sizes and
indices count) and move the respective members out of _cpuid4_info.
Also, stash the struct pci_dev ptr into the struct simplifying the code
even more.

There should be no functionality change resulting from this patch except
slightly slimming the _cpuid4_info per-cpu vars.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1271945222-5283-4-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-22 17:17:23 -07:00
Frank Arnold
f2b20e4140 x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments
When running a quest kernel on xen we get:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
PGD 0
Oops: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:

Pid: 0, comm: swapper Tainted: G        W  2.6.34-rc3 #1 /HVM domU
RIP: 0010:[<ffffffff8142f2fb>]  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x
2ca/0x3df
RSP: 0018:ffff880002203e08  EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000060
RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
RBP: ffff880002203ed8 R08: 00000000000017c0 R09: ffff880002203e38
R10: ffff8800023d5d40 R11: ffffffff81a01e28 R12: ffff880187e6f5c0
R13: ffff880002203e34 R14: ffff880002203e58 R15: ffff880002203e68
FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000038 CR3: 0000000001a3c000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a44020)
Stack:
 ffffffff810d7ecb ffff880002203e20 ffffffff81059140 ffff880002203e30
<0> ffffffff810d7ec9 0000000002203e40 000000000050d140 ffff880002203e70
<0> 0000000002008140 0000000000000086 ffff880040020140 ffffffff81068b8b
Call Trace:
 <IRQ>
 [<ffffffff810d7ecb>] ? sync_supers_timer_fn+0x0/0x1c
 [<ffffffff81059140>] ? mod_timer+0x23/0x25
 [<ffffffff810d7ec9>] ? arm_supers_timer+0x34/0x36
 [<ffffffff81068b8b>] ? hrtimer_get_next_event+0xa7/0xc3
 [<ffffffff81058e85>] ? get_next_timer_interrupt+0x19a/0x20d
 [<ffffffff8142fa23>] get_cpu_leaves+0x5c/0x232
 [<ffffffff8106a7b1>] ? sched_clock_local+0x1c/0x82
 [<ffffffff8106a9a0>] ? sched_clock_tick+0x75/0x7a
 [<ffffffff8107748c>] generic_smp_call_function_single_interrupt+0xae/0xd0
 [<ffffffff8101f6ef>] smp_call_function_single_interrupt+0x18/0x27
 [<ffffffff8100a773>] call_function_single_interrupt+0x13/0x20
 <EOI>
 [<ffffffff8143c468>] ? notifier_call_chain+0x14/0x63
 [<ffffffff810295c6>] ? native_safe_halt+0xc/0xd
 [<ffffffff810114eb>] ? default_idle+0x36/0x53
 [<ffffffff81008c22>] cpu_idle+0xaa/0xe4
 [<ffffffff81423a9a>] rest_init+0x7e/0x80
 [<ffffffff81b10dd2>] start_kernel+0x40e/0x419
 [<ffffffff81b102c8>] x86_64_start_reservations+0xb3/0xb7
 [<ffffffff81b103c4>] x86_64_start_kernel+0xf8/0x107
Code: 14 d5 40 ff ae 81 8b 14 02 31 c0 3b 15 47 1c 8b 00 7d 0e 48 8b 05 36 1c 8b
 00 48 63 d2 48 8b 04 d0 c7 85 5c ff ff ff 00 00 00 00 <8b> 70 38 48 8d 8d 5c ff
 ff ff 48 8b 78 10 ba c4 01 00 00 e8 eb
RIP  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
 RSP <ffff880002203e08>
CR2: 0000000000000038
---[ end trace a7919e7f17c0a726 ]---

The L3 cache index disable feature of AMD CPUs has to be disabled if the
kernel is running as guest on top of a hypervisor because northbridge
devices are not available to the guest. Currently, this fixes a boot
crash on top of Xen. In the future this will become an issue on KVM as
well.

Check if northbridge devices are present and do not enable the feature
if there are none.

Signed-off-by: Frank Arnold <frank.arnold@amd.com>
LKML-Reference: <1271945222-5283-3-git-send-email-bp@amd64.org>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-22 17:17:21 -07:00
Borislav Petkov
b1ab1b4d9a x86, cacheinfo: Unify AMD L3 cache index disable checking
All F10h CPUs starting with model 8 resp. 9, stepping 1, support L3
cache index disable. Concentrate the family, model, stepping checking at
one place and enable the feature implicitly on upcoming Fam10h models.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1271945222-5283-2-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-22 17:17:20 -07:00
Jiri Kosina
6c9468e9eb Merge branch 'master' into for-next 2010-04-23 02:08:44 +02:00
Bjorn Helgaas
66528fdd45 x86/PCI: parse additional host bridge window resource types
This adds support for Memory24, Memory32, and Memory32Fixed descriptors in
PCI host bridge _CRS.

I experimentally determined that Windows (2008 R2) accepts these descriptors
and treats them as windows that are forwarded to the PCI bus, e.g., if
it finds any PCI devices with BARs outside the windows, it moves them into
the windows.

I don't know whether any machines actually use these descriptors in PCI
host bridge _CRS methods, but if any exist and they're new enough that we
automatically turn on "pci=use_crs", they will work with Windows but not
with Linux.

Here are the details: https://bugzilla.kernel.org/show_bug.cgi?id=15817

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-04-22 16:13:22 -07:00
Linus Torvalds
a486b0af79 Merge branch 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Fix TSS size check for 16-bit tasks
  KVM: Add missing srcu_read_lock() for kvm_mmu_notifier_release()
  KVM: Increase NR_IOBUS_DEVS limit to 200
  KVM: fix the handling of dirty bitmaps to avoid overflows
  KVM: MMU: fix kvm_mmu_zap_page() and its calling path
  KVM: VMX: Save/restore rflags.vm correctly in real mode
  KVM: allow bit 10 to be cleared in MSR_IA32_MC4_CTL
  KVM: Don't spam kernel log when injecting exceptions due to bad cr writes
  KVM: SVM: Fix memory leaks that happen when svm_create_vcpu() fails
  KVM: take srcu lock before call to complete_pio()
2010-04-21 12:29:46 -07:00
Jan Kiszka
e8861cfe2c KVM: x86: Fix TSS size check for 16-bit tasks
A 16-bit TSS is only 44 bytes long. So make sure to test for the correct
size on task switch.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-21 13:51:42 +03:00
Jacob Pan
ae7c9b70dc x86, mrst: Conditionally register cpu hotplug notifier for apbt
APB timer is used on Moorestown platforms but not on a standard PC.
If APB timer code is compiled in but not initialized at run-time due
to lack of FW reported SFI table, kernel would panic when the non-boot
CPUs are offlined and notifier is called.

https://bugzilla.kernel.org/show_bug.cgi?id=15786

This patch ensures CPU hotplug notifier for APB timer is only registered
when the APBT timer block is initialized.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
LKML-Reference: <1271701423-1162-1-git-send-email-jacob.jun.pan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-20 14:38:28 -07:00
Linus Torvalds
34388d1c4f Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Fix unsafe frame rewinding with hot regs fetching
2010-04-20 09:20:23 -07:00
Christoph Hellwig
4cecd935f6 x86: correctly wire up the newuname system call
Before commit e28cbf2293 ("improve
sys_newuname() for compat architectures") 64-bit x86 had a private
implementation of sys_uname which was just called sys_uname, which other
architectures used for the old uname.

Due to some merge issues with the uname refactoring patches we ended up
calling the old uname version for both the old and new system call
slots, which lead to the domainname filed never be set which caused
failures with libnss_nis.

Reported-and-tested-by: Andy Isaacson <adi@hexapodia.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-20 09:17:21 -07:00
Lin Ming
aa2110cb1a ACPI: add boot option acpi=copy_dsdt to fix corrupt DSDT
Some BIOS on Toshiba machines corrupt the DSDT, so add a new
boot option acpi=copy_dsdt to workaround it.
Add warning message to ask users to use this option if corrupt DSDT detected.

Also build a DMI blacklist to check it and automatically copy DSDT.

https://bugzilla.kernel.org/show_bug.cgi?id=14679

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-04-20 10:43:16 -04:00
Justin P. Mattock
40f0a5d0a1 Fix comment typo in percpu.h
Fix a typo in arch/x86/include/asm/percpu.h

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-04-20 16:38:03 +02:00
Takuya Yoshikawa
87bf6e7de1 KVM: fix the handling of dirty bitmaps to avoid overflows
Int is not long enough to store the size of a dirty bitmap.

This patch fixes this problem with the introduction of a wrapper
function to calculate the sizes of dirty bitmaps.

Note: in mark_page_dirty(), we have to consider the fact that
  __set_bit() takes the offset as int, not long.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-04-20 13:06:55 +03:00
Xiao Guangrong
77662e0028 KVM: MMU: fix kvm_mmu_zap_page() and its calling path
This patch fix:

- calculate zapped page number properly in mmu_zap_unsync_children()
- calculate freeed page number properly kvm_mmu_change_mmu_pages()
- if zapped children page it shoud restart hlist walking

KVM-Stable-Tag.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-04-20 12:59:32 +03:00
Avi Kivity
78ac8b47c5 KVM: VMX: Save/restore rflags.vm correctly in real mode
Currently we set eflags.vm unconditionally when entering real mode emulation
through virtual-8086 mode, and clear it unconditionally when we enter protected
mode.  The means that the following sequence

  KVM_SET_REGS  (rflags.vm=1)
  KVM_SET_SREGS (cr0.pe=1)

Ends up with rflags.vm clear due to KVM_SET_SREGS triggering enter_pmode().

Fix by shadowing rflags.vm (and rflags.iopl) correctly while in real mode:
reads and writes to those bits access a shadow register instead of the actual
register.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-04-20 12:59:31 +03:00
Andre Przywara
114be429c8 KVM: allow bit 10 to be cleared in MSR_IA32_MC4_CTL
There is a quirk for AMD K8 CPUs in many Linux kernels (see
arch/x86/kernel/cpu/mcheck/mce.c:__mcheck_cpu_apply_quirks()) that
clears bit 10 in that MCE related MSR. KVM can only cope with all
zeros or all ones, so it will inject a #GP into the guest, which
will let it panic.
So lets add a quirk to the quirk and ignore this single cleared bit.
This fixes -cpu kvm64 on all machines and -cpu host on K8 machines
with some guest Linux kernels.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-20 12:59:31 +03:00
Avi Kivity
d6a23895aa KVM: Don't spam kernel log when injecting exceptions due to bad cr writes
These are guest-triggerable.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-20 12:55:05 +03:00
Takuya Yoshikawa
b7af404338 KVM: SVM: Fix memory leaks that happen when svm_create_vcpu() fails
svm_create_vcpu() does not free the pages allocated during the creation
when it fails to complete the allocations. This patch fixes it.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-20 12:55:04 +03:00
Gleb Natapov
7567cae105 KVM: take srcu lock before call to complete_pio()
complete_pio() may use slot table which is protected by srcu.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-20 12:55:04 +03:00
Zhang, Yanmin
dcf46b9443 perf & kvm: Clean up some of the guest profiling callback API details
Fix some build bug and programming style issues:

 - use valid C
 - fix up various style details

Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sheng Yang <sheng@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: oerg Roedel <joro@8bytes.org>
Cc: Jes Sorensen <Jes.Sorensen@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Zachary Amsden <zamsden@redhat.com>
Cc: zhiteng.huang@intel.com
Cc: tim.c.chen@intel.com
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
LKML-Reference: <1271729638.2078.624.camel@ymzhang.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-20 08:08:28 +02:00
Zhang, Yanmin
ff9d07a0e7 KVM: Implement perf callbacks for guest sampling
Below patch implements the perf_guest_info_callbacks on kvm.

Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-19 12:36:50 +03:00
Zhang, Yanmin
39447b386c perf: Enhance perf to allow for guest statistic collection from host
Below patch introduces perf_guest_info_callbacks and related
register/unregister functions. Add more PERF_RECORD_MISC_XXX bits
meaning guest kernel and guest user space.

Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-19 12:35:33 +03:00
Randy Dunlap
a289cc7c70 x86, UV: uv_irq.c: Fix all sparse warnings
Fix all sparse warnings in building uv_irq.c.

 arch/x86/kernel/uv_irq.c:46:17: warning: symbol 'uv_irq_chip' was not declared. Should it be static?
 arch/x86/kernel/uv_irq.c:143:50: error: no identifier for function argument
 arch/x86/kernel/uv_irq.c:162:13: error: typename in expression
 arch/x86/kernel/uv_irq.c:162:13: error: undefined identifier 'restrict'
 arch/x86/kernel/uv_irq.c:250:44: error: no identifier for function argument
 arch/x86/kernel/uv_irq.c:260:17: error: typename in expression
 arch/x86/kernel/uv_irq.c:260:17: error: undefined identifier 'restrict'
 arch/x86/kernel/uv_irq.c:233:50: warning: incorrect type in argument 3 (different signedness)
 arch/x86/kernel/uv_irq.c:233:50:    expected int *pnode
 arch/x86/kernel/uv_irq.c:233:50:    got unsigned int *<noident>
 arch/x86/include/asm/uv/uv_hub.h:318:44: warning: incorrect type in argument 2 (different address spaces)
 arch/x86/include/asm/uv/uv_hub.h:318:44:    expected void volatile [noderef] <asn:2>*addr
 arch/x86/include/asm/uv/uv_hub.h:318:44:    got unsigned long *

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Cliff Wickman <cpw@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20100416175142.f4b59683.randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-17 10:37:20 +02:00
Linus Torvalds
dc57da3875 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/gart: Disable GART explicitly before initialization
  dma-debug: Cleanup for copy-loop in filter_write()
  x86/amd-iommu: Remove obsolete parameter documentation
  x86/amd-iommu: use for_each_pci_dev
  Revert "x86: disable IOMMUs on kernel crash"
  x86/amd-iommu: warn when issuing command to uninitialized cmd buffer
  x86/amd-iommu: enable iommu before attaching devices
  x86/amd-iommu: Use helper function to destroy domain
  x86/amd-iommu: Report errors in acpi parsing functions upstream
  x86/amd-iommu: Pt mode fix for domain_destroy
  x86/amd-iommu: Protect IOMMU-API map/unmap path
  x86/amd-iommu: Remove double NULL check in check_device
2010-04-15 12:20:56 -07:00
Cliff Wickman
b8f7fb13d2 x86, UV: Improve BAU performance and error recovery
- increase performance of the interrupt handler

- release timed-out software acknowledge resources

- recover from continuous-busy status due to a hardware issue

- add a 'throttle' to keep a uvhub from sending more than a
  specified number of broadcasts concurrently (work around the hardware issue)

- provide a 'nobau' boot command line option

- rename 'pnode' and 'node' to 'uvhub' (the 'node' terminology
  is ambiguous)

- add some new statistics about the scope of broadcasts, retries, the
  hardware issue and the 'throttle'

- split off new function uv_bau_retry_msg() from
  uv_bau_process_message() per community coding style feedback.

- simplify the argument list to uv_bau_process_message(), per
  community coding style feedback.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: linux-mm@kvack.org
Cc: Jack Steiner <steiner@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Mike Travis <travis@sgi.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <E1O25Z4-0004Ur-PB@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-14 18:49:53 +02:00
Rusty Russell
091ebf07a2 lguest: stop using KVM hypercall mechanism
This is a partial revert of 4cd8b5e2a1 "lguest: use KVM hypercalls";
we revert to using (just as questionable but more reliable) int $15 for
hypercalls.  I didn't revert the register mapping, so we still use the
same calling convention as kvm.

KVM in more recent incarnations stopped injecting a fault when a guest
tried to use the VMCALL instruction from ring 1, so lguest under kvm
fails to make hypercalls.  It was nice to share code with our KVM
cousins, but this was overreach.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Matias Zabaljauregui <zabaljauregui@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
2010-04-14 21:43:56 +09:30
Arnd Bergmann
3f10940e4f x86/microcode: Use nonseekable_open()
No need to seek on this file, so prevent it outright so we can
avoid using default_llseek - removes one more BKL usage.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[drop useless llseek = no_llseek and smp_lock.h inclusion]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnd Bergmann <arnd@relay.de.ibm.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
LKML-Reference: <1270910781-8786-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-14 12:27:34 +02:00
Ingo Molnar
2b2f862ee6 Merge branch 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2010-04-13 13:24:54 +02:00
Mark Langsdorf
679370641e powernow-k8: Fix frequency reporting
With F10, model 10, all valid frequencies are in the ACPI _PST table.

Cc: <stable@kernel.org> # 33.x 32.x
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
LKML-Reference: <1270065406-1814-6-git-send-email-bp@amd64.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-09 14:07:47 -07:00
Mark Langsdorf
a2fed573f0 x86, cpufreq: Add APERF/MPERF support for AMD processors
Starting with model 10 of Family 0x10, AMD processors may have
support for APERF/MPERF.  Add support for identifying it and using
it within cpufreq.  Move the APERF/MPERF functions out of the
acpi-cpufreq code and into their own file so they can easily be
shared.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
LKML-Reference: <20100401141956.GA1930@aftab>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-09 14:07:40 -07:00
Borislav Petkov
d65ad45cd8 x86: Unify APERF/MPERF support
Initialize this CPUID flag feature in common code. It could be made a
standalone function later, maybe, if more functionality is duplicated.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1270065406-1814-4-git-send-email-bp@amd64.org>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-09 14:05:50 -07:00
Borislav Petkov
73860c6b2f powernow-k8: Add core performance boost support
Starting with F10h, revE, AMD processors add support for a dynamic
core boosting feature called Core Performance Boost. When a specific
condition is present, a subset of the cores on a system are boosted
beyond their P0 operating frequency to speed up the performance of
single-threaded applications.

In the normal case, the system comes out of reset with core boosting
enabled. This patch adds a sysfs knob with which core boosting can be
switched on or off for benchmarking purposes.

While at it, make the CPB code hotplug-aware so that taking cores
offline wouldn't interfere with boosting the remaining online cores.
Furthermore, add cpu_online_mask hotplug protection as suggested by
Andrew.

Finally, cleanup the driver init codepath and update copyrights.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1270065406-1814-3-git-send-email-bp@amd64.org>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-09 14:05:43 -07:00
Borislav Petkov
5958f1d5d7 x86, cpu: Add AMD core boosting feature flag to /proc/cpuinfo
By semi-popular demand, this adds the Core Performance Boost feature
flag to /proc/cpuinfo. Possible use case for this is userspace tools
like cpufreq-aperf, for example, so that they don't have to jump through
hoops of accessing "/dev/cpu/%d/cpuid" in order to check for CPB hw
support, or call cpuid from userspace.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1270065406-1814-2-git-send-email-bp@amd64.org>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-09 14:05:23 -07:00
Frederic Weisbecker
ab285f2b52 perf: Fix unsafe frame rewinding with hot regs fetching
When we fetch the hot regs and rewind to the nth caller, it
might happen that we dereference a frame pointer outside the
kernel stack boundaries, like in this example:

	perf_trace_sched_switch+0xd5/0x120
        schedule+0x6b5/0x860
        retint_careful+0xd/0x21

Since we directly dereference a userspace frame pointer here while
rewinding behind retint_careful, this may end up in a crash.

Fix this by simply using probe_kernel_address() when we rewind the
frame pointer.

This issue will have a much more proper fix in the next version of the
perf_arch_fetch_caller_regs() API that will only need to rewind to the
first caller.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: David Miller <davem@davemloft.net>
Cc: Archs <linux-arch@vger.kernel.org>
2010-04-08 19:03:28 +02:00
Bjorn Helgaas
73a0e61458 x86/PCI: ignore Consumer/Producer bit in ACPI window descriptions
ACPI Address Space Descriptors (used in _CRS) have a Consumer/Producer
bit that is supposed to distinguish regions that are consumed directly
by a device from those that are forwarded ("produced") by a bridge.
But BIOSes have apparently not used this consistently, and Windows
seems to ignore it, so I think Linux should ignore it as well.

I can't point to any of these supposed broken BIOSes, but since we
now rely on _CRS by default, I think it's safer to ignore this bit
from the start.

Here are details of my experiments with how Windows handles it:
    https://bugzilla.kernel.org/show_bug.cgi?id=15701

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-04-08 09:23:42 -07:00
Ingo Molnar
ca7e0c6120 Merge branch 'linus' into perf/core
Semantic conflict: arch/x86/kernel/cpu/perf_event_intel_ds.c

Merge reason: pick up latest fixes, fix the conflict

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-08 13:37:18 +02:00
Linus Torvalds
48de8cb784 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86: Enable Nehalem-EX support
  perf kmem: Fix breakage introduced by 5a0e3ad slab.h script
2010-04-07 14:01:51 -07:00
Linus Torvalds
fb1ae63577 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
  x86: Fix double enable_IR_x2apic() call on SMP kernel on !SMP boards
  x86: Increase CONFIG_NODES_SHIFT max to 10
  ibft, x86: Change reserve_ibft_region() to find_ibft_region()
  x86, hpet: Fix bug in RTC emulation
  x86, hpet: Erratum workaround for read after write of HPET comparator
  bootmem, x86: Fix 32bit numa system without RAM on node 0
  nobootmem, x86: Fix 32bit numa system without RAM on node 0
  x86: Handle overlapping mptables
  x86: Make e820_remove_range to handle all covered case
  x86-32, resume: do a global tlb flush in S4 resume
2010-04-07 11:02:23 -07:00
Joerg Roedel
4b83873d3d x86/gart: Disable GART explicitly before initialization
If we boot into a crash-kernel the gart might still be
enabled and its caches might be dirty. This can result in
undefined behavior later. Fix it by explicitly disabling the
gart hardware before initialization and flushing the caches
after enablement.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-04-07 14:36:30 +02:00
Joerg Roedel
12ff4bf58b Merge branch 'amd-iommu/fixes' into iommu/fixes 2010-04-07 14:36:20 +02:00
Chris Wright
d18c69d389 x86/amd-iommu: use for_each_pci_dev
Replace open coded version with for_each_pci_dev

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-04-07 11:51:34 +02:00
Chris Wright
8f9f55e83e Revert "x86: disable IOMMUs on kernel crash"
This effectively reverts commit 61d047be99.

Disabling the IOMMU can potetially allow DMA transactions to
complete without being translated.  Leave it enabled, and allow
crash kernel to do the IOMMU reinitialization properly.

Cc: stable@kernel.org
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-04-07 11:51:17 +02:00
Chris Wright
549c90dc9a x86/amd-iommu: warn when issuing command to uninitialized cmd buffer
To catch future potential issues we can add a warning whenever we issue
a command before the command buffer is fully initialized.

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-04-07 11:51:15 +02:00
Chris Wright
75f66533bc x86/amd-iommu: enable iommu before attaching devices
Hit another kdump problem as reported by Neil Horman.  When initializaing
the IOMMU, we attach devices to their domains before the IOMMU is
fully (re)initialized.  Attaching a device will issue some important
invalidations.  In the context of the newly kexec'd kdump kernel, the
IOMMU may have stale cached data from the original kernel.  Because we
do the attach too early, the invalidation commands are placed in the new
command buffer before the IOMMU is updated w/ that buffer.  This leaves
the stale entries in the kdump context and can renders device unusable.
Simply enable the IOMMU before we do the attach.

Cc: stable@kernel.org
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-04-07 11:50:50 +02:00
Borislav Petkov
d61931d89b x86: Add optimized popcnt variants
Add support for the hardware version of the Hamming weight function,
popcnt, present in CPUs which advertize it under CPUID, Function
0x0000_0001_ECX[23]. On CPUs which don't support it, we fallback to the
default lib/hweight.c sw versions.

A synthetic benchmark comparing popcnt with __sw_hweight64 showed almost
a 3x speedup on a F10h machine.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100318112015.GC11152@aftab>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-06 15:52:11 -07:00
Vince Weaver
134fbadf02 perf, x86: Enable Nehalem-EX support
According to Intel Software Devel Manual Volume 3B, the
Nehalem-EX PMU is just like regular Nehalem (except for the
uncore support, which is completely different).

Signed-off-by:  Vince Weaver <vweaver1@eecs.utk.edu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <alpine.DEB.2.00.1004060956580.1417@cl320.eecs.utk.edu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-06 17:52:59 +02:00
Tejun Heo
336f5899d2 Merge branch 'master' into export-slabh 2010-04-05 11:37:28 +09:00
Frederic Weisbecker
6f4dee06fb perf: Drop the frame reliablity check
It is useless now that we have a pure stack frame
walker, as given addr are always reliable.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
2010-04-04 15:23:05 +02:00
Suresh Siddha
472a474c66 x86: Fix double enable_IR_x2apic() call on SMP kernel on !SMP boards
Jan Grossmann reported kernel boot panic while booting SMP
kernel on his system with a single core cpu. SMP kernels call
enable_IR_x2apic() from native_smp_prepare_cpus() and on
platforms where the kernel doesn't find SMP configuration we
ended up again calling enable_IR_x2apic() from the
APIC_init_uniprocessor() call in the smp_sanity_check(). Thus
leading to kernel panic.

Don't call enable_IR_x2apic() and default_setup_apic_routing()
from APIC_init_uniprocessor() in CONFIG_SMP case.

NOTE: this kind of non-idempotent and assymetric initialization
sequence is rather fragile and unclean, we'll clean that up
in v2.6.35. This is the minimal fix for v2.6.34.

Reported-by: Jan.Grossmann@kielnet.net
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: <jbarnes@virtuousgeek.org>
Cc: <david.woodhouse@intel.com>
Cc: <weidong.han@intel.com>
Cc: <youquan.song@intel.com>
Cc: <Jan.Grossmann@kielnet.net>
Cc: <stable@kernel.org> # [v2.6.32.x, v2.6.33.x]
LKML-Reference: <1270083887.7835.78.camel@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 20:48:47 +02:00
Peter Zijlstra
40b91cd10f perf, x86: Add Nehalem programming quirk to Westmere
According to the Xeon-5600 errata the Westmere suffers the same PMU
programming bug as the original Nehalem did.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 19:52:06 +02:00
Peter Zijlstra
caaa8be3b6 perf, x86: Fix __initconst vs const
All variables that have __initconst should also be const.

Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 19:52:05 +02:00
Peter Zijlstra
b4cdc5c264 perf, x86: Fix up the ANY flag stuff
Stephane noticed that the ANY flag was in generic arch code, and Cyrill
reported that it broke the P4 code.

Solve this by merging x86_pmu::raw_event into x86_pmu::hw_config and
provide intel_pmu and amd_pmu specific versions of this callback.

The intel_pmu one deals with the ANY flag, the amd_pmu adds the few extra
event bits AMD64 has.

Reported-by: Stephane Eranian <eranian@google.com>
Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Robert Richter <robert.richter@amd.com>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1269968113.5258.442.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 19:52:04 +02:00
Robert Richter
a098f4484b perf, x86: implement ARCH_PERFMON_EVENTSEL bit masks
ARCH_PERFMON_EVENTSEL bit masks are often used in the kernel. This
patch adds macros for the bit masks and removes local defines. The
function intel_pmu_raw_event() becomes x86_pmu_raw_event() which is
generic for x86 models and same also for p6. Duplicate code is
removed.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100330092821.GH11907@erda.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 19:52:03 +02:00
Robert Richter
948b1bb89a perf, x86: Undo some some *_counter* -> *_event* renames
The big rename:

 cdd6c48 perf: Do the big rename: Performance Counters -> Performance Events

accidentally renamed some members of stucts that were named after
registers in the spec. To avoid confusion this patch reverts some
changes. The related specs are MSR descriptions in AMD's BKDGs and the
ARCHITECTURAL PERFORMANCE MONITORING section in the Intel 64 and IA-32
Architectures Software Developer's Manuals.

This patch does:

 $ sed -i -e 's:num_events:num_counters:g' \
   arch/x86/include/asm/perf_event.h \
   arch/x86/kernel/cpu/perf_event_amd.c \
   arch/x86/kernel/cpu/perf_event.c \
   arch/x86/kernel/cpu/perf_event_intel.c \
   arch/x86/kernel/cpu/perf_event_p6.c \
   arch/x86/kernel/cpu/perf_event_p4.c \
   arch/x86/oprofile/op_model_ppro.c

 $ sed -i -e 's:event_bits:cntval_bits:g' -e 's:event_mask:cntval_mask:g' \
   arch/x86/kernel/cpu/perf_event_amd.c \
   arch/x86/kernel/cpu/perf_event.c \
   arch/x86/kernel/cpu/perf_event_intel.c \
   arch/x86/kernel/cpu/perf_event_p6.c \
   arch/x86/kernel/cpu/perf_event_p4.c

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1269880612-25800-2-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 19:52:02 +02:00
Ingo Molnar
ec5e61aabe Merge branch 'perf/urgent' into perf/core
Conflicts:
	arch/x86/kernel/cpu/perf_event.c

Merge reason: Resolve the conflict, pick up fixes

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 19:38:10 +02:00
Torok Edwin
257ef9d21f perf, x86: Fix callgraphs of 32-bit processes on 64-bit kernels
When profiling a 32-bit process on a 64-bit kernel, callgraph tracing
stopped after the first function, because it has seen a garbage memory
address (tried to interpret the frame pointer, and return address as a
64-bit pointer).

Fix this by using a struct stack_frame with 32-bit pointers when the
TIF_IA32 flag is set.

Note that TIF_IA32 flag must be used, and not is_compat_task(), because
the latter is only set when the 32-bit process is executing a syscall,
which may not always be the case (when tracing page fault events for
example).

Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
LKML-Reference: <1268820436-13145-1-git-send-email-edwintorok@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 19:30:03 +02:00
Peter Zijlstra
b38b24ead3 perf, x86: Fix AMD hotplug & constraint initialization
Commit 3f6da39 ("perf: Rework and fix the arch CPU-hotplug hooks") moved
the amd northbridge allocation from CPUS_ONLINE to CPUS_PREPARE_UP
however amd_nb_id() doesn't work yet on prepare so it would simply bail
basically reverting to a state where we do not properly track node wide
constraints - causing weird perf results.

Fix up the AMD NorthBridge initialization code by allocating from
CPU_UP_PREPARE and installing it from CPU_STARTING once we have the
proper nb_id. It also properly deals with the allocation failing.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[ robustify using amd_has_nb() ]
Signed-off-by: Stephane Eranian <eranian@google.com>
LKML-Reference: <1269353485.5109.48.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 19:30:02 +02:00
Peter Zijlstra
8525702409 x86: Move notify_cpu_starting() callback to a later stage
Because we need to have cpu identification things done by the time we run
CPU_STARTING notifiers.

( This init ordering will be relied on by the next fix. )

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1269353485.5109.48.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 19:30:01 +02:00
Ingo Molnar
50d11d190a Merge branch 'perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/urgent 2010-04-02 19:29:17 +02:00
David Rientjes
51591e31dc x86: Increase CONFIG_NODES_SHIFT max to 10
Some larger systems require more than 512 nodes, so increase the
maximum CONFIG_NODES_SHIFT to 10 for a new max of 1024 nodes.

This was tested with numa=fake=64M on systems with more than
64GB of RAM. A total of 1022 nodes were initialized.

Successfully builds with no additional warnings on x86_64
allyesconfig.

( No effect on any existing config. Newly enabled CONFIG_MAXSMP=y
  will see the new default. )

Signed-off-by: David Rientjes <rientjes@google.com>
LKML-Reference: <alpine.DEB.2.00.1003251538060.8589@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 19:09:31 +02:00
Yinghai Lu
042be38e61 ibft, x86: Change reserve_ibft_region() to find_ibft_region()
This allows arch code could decide the way to reserve the ibft.

And we should reserve ibft as early as possible, instead of BOOTMEM
stage, in case the table is in RAM range and is not reserved by BIOS
(this will often be the case.)

Move to just after find_smp_config().

Also when CONFIG_NO_BOOTMEM=y, We will not have reserve_bootmem() anymore.

-v2: fix typo about ibft pointed by Konrad Rzeszutek Wilk <konrad@darnok.org>

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4BB510FB.80601@kernel.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Peter Jones <pjones@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
CC: Jan Beulich <jbeulich@novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-01 16:12:48 -07:00
Alok Kataria
b4a5e8a1de x86, hpet: Fix bug in RTC emulation
We think there exists a bug in the HPET code that emulates the RTC.

In the normal case, when the RTC frequency is set, the rtc driver tells
the hpet code about it here:

int hpet_set_periodic_freq(unsigned long freq)
{
        uint64_t clc;

        if (!is_hpet_enabled())
                return 0;

        if (freq <= DEFAULT_RTC_INT_FREQ)
                hpet_pie_limit = DEFAULT_RTC_INT_FREQ / freq;
        else {
                clc = (uint64_t) hpet_clockevent.mult * NSEC_PER_SEC;
                do_div(clc, freq);
                clc >>= hpet_clockevent.shift;
                hpet_pie_delta = (unsigned long) clc;
        }
        return 1;
}

If freq is set to 64Hz (DEFAULT_RTC_INT_FREQ) or lower, then
hpet_pie_limit (a static) is set to non-zero.  Then, on every one-shot
HPET interrupt, hpet_rtc_timer_reinit is called to compute the next
timeout.  Well, that function has this logic:

        if (!(hpet_rtc_flags & RTC_PIE) || hpet_pie_limit)
                delta = hpet_default_delta;
        else
                delta = hpet_pie_delta;

Since hpet_pie_limit is not 0, hpet_default_delta is used.  That
corresponds to 64Hz.

Now, if you set a different rtc frequency, you'll take the else path
through hpet_set_periodic_freq, but unfortunately no one resets
hpet_pie_limit back to 0.

Boom....now you are stuck with 64Hz RTC interrupts forever.

The patch below just resets the hpet_pie_limit value when requested freq
is greater than DEFAULT_RTC_INT_FREQ, which we think fixes this problem.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
LKML-Reference: <201003112200.o2BM0Hre012875@imap1.linux-foundation.org>
Signed-off-by: Daniel Hecht <dhecht@vmware.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-01 15:21:48 -07:00
Pallipadi, Venkatesh
8da854cb02 x86, hpet: Erratum workaround for read after write of HPET comparator
On Wed, Feb 24, 2010 at 03:37:04PM -0800, Justin Piszcz wrote:
> Hello,
>
> Again, on the Intel DP55KG board:
>
> # uname -a
> Linux host 2.6.33 #1 SMP Wed Feb 24 18:31:00 EST 2010 x86_64 GNU/Linux
>
> [    1.237600] ------------[ cut here ]------------
> [    1.237890] WARNING: at arch/x86/kernel/hpet.c:404 hpet_next_event+0x70/0x80()
> [    1.238221] Hardware name:
> [    1.238504] hpet: compare register read back failed.
> [    1.238793] Modules linked in:
> [    1.239315] Pid: 0, comm: swapper Not tainted 2.6.33 #1
> [    1.239605] Call Trace:
> [    1.239886]  <IRQ>  [<ffffffff81056c13>] ? warn_slowpath_common+0x73/0xb0
> [    1.240409]  [<ffffffff81079608>] ? tick_dev_program_event+0x38/0xc0
> [    1.240699]  [<ffffffff81056cb0>] ? warn_slowpath_fmt+0x40/0x50
> [    1.240992]  [<ffffffff81079608>] ? tick_dev_program_event+0x38/0xc0
> [    1.241281]  [<ffffffff81041ad0>] ? hpet_next_event+0x70/0x80
> [    1.241573]  [<ffffffff81079608>] ? tick_dev_program_event+0x38/0xc0
> [    1.241859]  [<ffffffff81078e32>] ? tick_handle_oneshot_broadcast+0xe2/0x100
> [    1.246533]  [<ffffffff8102a67a>] ? timer_interrupt+0x1a/0x30
> [    1.246826]  [<ffffffff81085499>] ? handle_IRQ_event+0x39/0xd0
> [    1.247118]  [<ffffffff81087368>] ? handle_edge_irq+0xb8/0x160
> [    1.247407]  [<ffffffff81029f55>] ? handle_irq+0x15/0x20
> [    1.247689]  [<ffffffff810294a2>] ? do_IRQ+0x62/0xe0
> [    1.247976]  [<ffffffff8146be53>] ? ret_from_intr+0x0/0xa
> [    1.248262]  <EOI>  [<ffffffff8102f277>] ? mwait_idle+0x57/0x80
> [    1.248796]  [<ffffffff8102645c>] ? cpu_idle+0x5c/0xb0
> [    1.249080] ---[ end trace db7f668fb6fef4e1 ]---
>
> Is this something Intel has to fix or is it a bug in the kernel?

This is a chipset erratum.

Thomas: You mentioned we can retain this check only for known-buggy and
hpet debug kind of options. But here is the simple workaround patch for
this particular erratum.

Some chipsets have a erratum due to which read immediately following a
write of HPET comparator returns old comparator value instead of most
recently written value.

Erratum 15 in
"Intel I/O Controller Hub 9 (ICH9) Family Specification Update"
(http://www.intel.com/assets/pdf/specupdate/316973.pdf)

Workaround for the errata is to read the comparator twice if the first
one fails.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
LKML-Reference: <20100225185348.GA9674@linux-os.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@gmail.com>
Cc: <stable@kernel.org>
2010-04-01 15:21:47 -07:00
Andi Kleen
909fc87b32 x86: Handle overlapping mptables
We found a system where the MP table MPC and MPF structures overlap.

That doesn't really matter because the mptable is not used anyways with ACPI,
but it leads to a panic in the early allocator due to the overlapping
reservations in 2.6.33.

Earlier kernels handled this without problems.

Simply change these reservations to reserve_early_overlap_ok to avoid
the panic.

Reported-by: Thomas Renninger <trenn@suse.de>
Tested-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <20100329074111.GA22821@basil.fritz.box>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org>
2010-04-01 13:31:07 -07:00
Jason Wessel
ab310b5edb x86,kgdb: Always initialize the hw breakpoint attribute
It is required to call hw_breakpoint_init() on an attr before using it
in any other calls.  This fixes the problem where kgdb will sometimes
fail to initialize on x86_64.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: 2.6.33 <stable@kernel.org>
LKML-Reference: <1269975907-27602-1-git-send-email-jason.wessel@windriver.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-04-01 08:26:32 +02:00
Frederic Weisbecker
e49a5bd381 perf: Use hot regs with software sched switch/migrate events
Scheduler's task migration events don't work because they always
pass NULL regs perf_sw_event(). The event hence gets filtered
in perf_swevent_add().

Scheduler's context switches events use task_pt_regs() to get
the context when the event occured which is a wrong thing to
do as this won't give us the place in the kernel where we went
to sleep but the place where we left userspace. The result is
even more wrong if we switch from a kernel thread.

Use the hot regs snapshot for both events as they belong to the
non-interrupt/exception based events family. Unlike page faults
or so that provide the regs matching the exact origin of the event,
we need to save the current context.

This makes the task migration event working and fix the context
switch callchains and origin ip.

Example: perf record -a -e cs

Before:

    10.91%      ksoftirqd/0                  0  [k] 0000000000000000
                |
                --- (nil)
                    perf_callchain
                    perf_prepare_sample
                    __perf_event_overflow
                    perf_swevent_overflow
                    perf_swevent_add
                    perf_swevent_ctx_event
                    do_perf_sw_event
                    __perf_sw_event
                    perf_event_task_sched_out
                    schedule
                    run_ksoftirqd
                    kthread
                    kernel_thread_helper

After:

    23.77%  hald-addon-stor  [kernel.kallsyms]  [k] schedule
            |
            --- schedule
               |
               |--60.00%-- schedule_timeout
               |          wait_for_common
               |          wait_for_completion
               |          blk_execute_rq
               |          scsi_execute
               |          scsi_execute_req
               |          sr_test_unit_ready
               |          |
               |          |--66.67%-- sr_media_change
               |          |          media_changed
               |          |          cdrom_media_changed
               |          |          sr_block_media_changed
               |          |          check_disk_change
               |          |          cdrom_open

v2: Always build perf_arch_fetch_caller_regs() now that software
events need that too. They don't need it from modules, unlike trace
events, so we keep the EXPORT_SYMBOL in trace_event_perf.c

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
2010-04-01 08:26:31 +02:00
Yinghai Lu
9f3a5f52aa x86: Make e820_remove_range to handle all covered case
Rusty found on lguest with trim_bios_range, max_pfn is not right anymore, and
looks e820_remove_range does not work right.

[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  LGUEST: 0000000000000000 - 0000000004000000 (usable)
[    0.000000] Notice: NX (Execute Disable) protection missing in CPU or disabled in BIOS!
[    0.000000] DMI not present or invalid.
[    0.000000] last_pfn = 0x3fa0 max_arch_pfn = 0x100000
[    0.000000] init_memory_mapping: 0000000000000000-0000000003fa0000

root cause is: the e820_remove_range doesn't handle the all covered
case.  e820_remove_range(BIOS_START, BIOS_END - BIOS_START, ...)
produces a bogus range as a result.

Make it match e820_update_range() by handling that case too.

Reported-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <4BB18E55.6090903@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-03-31 17:40:57 -07:00
Shaohua Li
8ae06d223f x86-32, resume: do a global tlb flush in S4 resume
Colin King reported a strange oops in S4 resume code path (see below). The test
system has i5/i7 CPU. The kernel doesn't open PAE, so 4M page table is used.
The oops always happen a virtual address 0xc03ff000, which is mapped to the
last 4k of first 4M memory. Doing a global tlb flush fixes the issue.

EIP: 0060:[<c0493a01>] EFLAGS: 00010086 CPU: 0
EIP is at copy_loop+0xe/0x15
EAX: 36aeb000 EBX: 00000000 ECX: 00000400 EDX: f55ad46c
ESI: 0f800000 EDI: c03ff000 EBP: f67fbec4 ESP: f67fbea8
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
...
...
CR2: 00000000c03ff000

Tested-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <20100305005932.GA22675@sli10-desk.sh.intel.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org>
2010-03-30 11:46:02 -07:00
Tejun Heo
5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Tejun Heo
57f4c226d1 x86: don't include slab.h from arch/x86/include/asm/pgtable_32.h
Including slab.h from x86 pgtable_32.h creates a troublesome
dependency chain w/ ftrace enabled.  The following chain leads to
inclusion of pgtable_32.h from define_trace.h.

 trace/define_trace.h
 trace/ftrace.h
 linux/ftrace_event.h
 linux/ring_buffer.h
 linux/mm.h
 asm/pgtable.h
 asm/pgtable_32.h

slab.h itself defines trace hooks via

 linux/sl[aou]b_def.h
 linux/kmemtrace.h
 trace/events/kmem.h

If slab.h is not included before define_trace.h is included, this
leads to duplicate definitions of kmemtrace hooks or other include
dependency problems.

pgtable_32.h doesn't need slab.h to begin with.  Don't include it from
there.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
2010-03-30 22:02:21 +09:00
Yinghai Lu
c967da6a0b x86: Make sure free_init_pages() frees pages on page boundary
When CONFIG_NO_BOOTMEM=y, it could use memory more effiently, or
in a more compact fashion.

Example:

 Allocated new RAMDISK: 00ec2000 - 0248ce57
 Move RAMDISK from 000000002ea04000 - 000000002ffcee56 to 00ec2000 - 0248ce56

The new RAMDISK's end is not page aligned.
Last page could be shared with other users.

When free_init_pages are called for initrd or .init, the page
could be freed and we could corrupt other data.

code segment in free_init_pages():

 |        for (; addr < end; addr += PAGE_SIZE) {
 |                ClearPageReserved(virt_to_page(addr));
 |                init_page_count(virt_to_page(addr));
 |                memset((void *)(addr & ~(PAGE_SIZE-1)),
 |                        POISON_FREE_INITMEM, PAGE_SIZE);
 |                free_page(addr);
 |                totalram_pages++;
 |        }

last half page could be used as one whole free page.

So page align the boundaries.

-v2: make the original initramdisk to be aligned, according to
     Johannes, otherwise we have the chance to lose one page.
     we still need to keep initrd_end not aligned, otherwise it could
     confuse decompressor.
-v3: change to WARN_ON instead, suggested by Johannes.
-v4: use PAGE_ALIGN, suggested by Johannes.
     We may fix that macro name later to PAGE_ALIGN_UP, and PAGE_ALIGN_DOWN
     Add comments about assuming ramdisk start is aligned
     in relocate_initrd(), change to re get ramdisk_image instead of save it
     to make diff smaller. Add warning for wrong range, suggested by Johannes.
-v6: remove one WARN()
     We need to align beginning in free_init_pages()
     do not copy more than ramdisk_size, noticed by Johannes

Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Tested-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1269830604-26214-3-git-send-email-yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-29 18:55:33 +02:00
Yinghai Lu
596b711ed6 x86: Make smp_locks end with page alignment
Fix:

 ------------[ cut here ]------------
 WARNING: at arch/x86/mm/init.c:342 free_init_pages+0x4c/0xfa()
 free_init_pages: range [0x40daf000, 0x40db5c24] is not aligned
 Modules linked in:
 Pid: 0, comm: swapper Not tainted
 2.6.34-rc2-tip-03946-g4f16b23-dirty #50 Call Trace:
  [<40232e9f>] warn_slowpath_common+0x65/0x7c
  [<4021c9f0>] ? free_init_pages+0x4c/0xfa
  [<40881434>] ? _etext+0x0/0x24
  [<40232eea>] warn_slowpath_fmt+0x24/0x27
  [<4021c9f0>] free_init_pages+0x4c/0xfa
  [<40881434>] ? _etext+0x0/0x24
  [<40d3f4bd>] alternative_instructions+0xf6/0x100
  [<40d3fe4f>] check_bugs+0xbd/0xbf
  [<40d398a7>] start_kernel+0x2d5/0x2e4
  [<40d390ce>] i386_start_kernel+0xce/0xd5
 ---[ end trace 4eaa2a86a8e2da22 ]---

Comments in vmlinux.lds.S already said:

 |        /*
 |         * smp_locks might be freed after init
 |         * start/end must be page aligned
 |         */

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1269830604-26214-2-git-send-email-yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-29 18:42:30 +02:00
Linus Torvalds
b72c40949b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  x86/PCI: truncate _CRS windows with _LEN > _MAX - _MIN + 1
  x86/PCI: for host bridge address space collisions, show conflicting resource
  frv/PCI: remove redundant warnings
  x86/PCI: remove redundant warnings
  PCI: don't say we claimed a resource if we failed
  PCI quirk: Disable MSI on VIA K8T890 systems
  PCI quirk: RS780/RS880: work around missing MSI initialization
  PCI quirk: only apply CX700 PCI bus parking quirk if external VT6212L is present
  PCI: complain about devices that seem to be broken
  PCI: print resources consistently with %pR
  PCI: make disabled window printk style match the enabled ones
  PCI: break out primary/secondary/subordinate for readability
  PCI: for address space collisions, show conflicting resource
  resources: add interfaces that return conflict information
  PCI: cleanup error return for pcix get and set mmrbc functions
  PCI: fix access of PCI_X_CMD by pcix get and set mmrbc functions
  PCI: kill off pci_register_set_vga_state() symbol export.
  PCI: fix return value from pcix_get_max_mmrbc()
2010-03-26 16:34:29 -07:00
Linus Torvalds
f3845f3f60 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, amd: Restrict usage of c1e_idle()
  x86: Fix placement of FIX_OHCI1394_BASE
  x86: Handle legacy PIC interrupts on all the cpu's
2010-03-26 15:10:56 -07:00
Peter Zijlstra
11164cd4f6 perf, x86: Add Nehelem PMU programming errata workaround
Implement the workaround for Intel Errata AAK100 and AAP53.

Also, remove the Core-i7 name for Nehalem events since there are
also Westmere based i7 chips.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
LKML-Reference: <1269608924.12097.147.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-26 15:47:24 +01:00
Peter Zijlstra
ea8e61b7bb x86, ptrace: Fix block-step
Implement ptrace-block-step using TIF_BLOCKSTEP which will set
DEBUGCTLMSR_BTF when set for a task while preserving any other
DEBUGCTLMSR bits.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100325135414.017536066@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-26 11:33:57 +01:00
Peter Zijlstra
faa4602e47 x86, perf, bts, mm: Delete the never used BTS-ptrace code
Support for the PMU's BTS features has been upstreamed in
v2.6.32, but we still have the old and disabled ptrace-BTS,
as Linus noticed it not so long ago.

It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without
regard for other uses (perf) and doesn't provide the flexibility
needed for perf either.

Its users are ptrace-block-step and ptrace-bts, since ptrace-bts
was never used and ptrace-block-step can be implemented using a
much simpler approach.

So axe all 3000 lines of it. That includes the *locked_memory*()
APIs in mm/mlock.c as well.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Markus Metzger <markus.t.metzger@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20100325135413.938004390@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-26 11:33:55 +01:00
Peter Zijlstra
7c5ecaf766 perf, x86: Clean up debugctlmsr bit definitions
Move all debugctlmsr thingies into msr-index.h

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100325135413.861425293@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-26 09:41:03 +01:00
Cyrill Gorcunov
d814f30105 x86, perf: Add raw events support for the P4 PMU
The adding of raw event support lead to complete code
refactoring. I hope is became more readable then it was.

The list of changes:

1)  The 64bit config field is enough to hold all information we need
    to track event details. To achieve it we used *own* enum for
    events selection in ESCR register and map this key into proper
    value at moment of event enabling.

    For the same reason we use 12LSB bits in CCCR register -- to track
    which exactly cache trace event was requested. And we cear this bits
    at real 'write' moment.

2)  There is no per-cpu area reserved for P4 PMU anymore. We
    don't need it. All is held by config.

3)  Now we may use any available counter, ie we try to grab any
    possible counter.

v2:
  - Lin Ming reported the lack of ESCR selector in CCCR for cache events

v3:
  - Don't loose cache event codes at config unpacking procedure, we may
    need it one day so no obscure hack behind our back, better to clear
    reserved bits explicitly when needed (thanks Ming for pointing out)

  - Lin Ming fixed misplaced opcodes in cache events

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Tested-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1269403766.3409.6.camel@minggr.sh.intel.com>
[ v4: did a few whitespace fixlets ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-26 08:45:49 +01:00
Bjorn Helgaas
d558b483d5 x86/PCI: truncate _CRS windows with _LEN > _MAX - _MIN + 1
Yanko's GA-MA78GM-S2H (BIOS F11) reports the following resource in a PCI
host bridge _CRS:

    [07] 32-Bit DWORD Address Space Resource
         Min Relocatability : MinFixed
         Max Relocatability : MaxFixed
            Address Minimum : CFF00000  (_MIN)
            Address Maximum : FEBFFFFF  (_MAX)
             Address Length : 3EE10000  (_LEN)

This is invalid per spec (ACPI 4.0, 6.4.3.5) because it's a fixed size,
fixed location descriptor, but _LEN != _MAX - _MIN + 1.

Based on https://bugzilla.kernel.org/show_bug.cgi?id=15480#c15, I think
Windows handles this by truncating the window so it fits between _MIN and
_MAX.  I also verified this by modifying the SeaBIOS DSDT and booting
Windows 2008 R2 with qemu.

This patch makes Linux truncate the window, too, which fixes:
    http://bugzilla.kernel.org/show_bug.cgi?id=15480

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Tested-by: Yanko Kaneti <yaneti@declera.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-03-25 10:14:13 -07:00
Bjorn Helgaas
eb9fc8ef7c x86/PCI: for host bridge address space collisions, show conflicting resource
With insert_resource_conflict(), we can learn what the actual conflict is,
so print that info for debugging purposes.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-03-25 10:14:07 -07:00
Bjorn Helgaas
c9c9b56471 x86/PCI: remove redundant warnings
pci_claim_resource() already prints more detailed error messages, so these
are really redundant.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-03-25 08:51:38 -07:00
Ingo Molnar
d2f1e15b66 Merge commit 'v2.6.34-rc2' into perf/core
Merge reason: Pick up latest perf fixes from upstream.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-22 18:47:01 +01:00
Rafael J. Wysocki
a90110c610 x86 / perf: Fix suspend to RAM on HP nx6325
Commit 3f6da39053
(perf: Rework and fix the arch CPU-hotplug hooks) broke suspend to
RAM on my HP nx6325 (and most likely on other AMD-based boxes too)
by allowing amd_pmu_cpu_offline() to be executed for CPUs that are
going offline as part of the suspend process.  The problem is that
cpuhw->amd_nb may be NULL already, so the function should make sure
it's not NULL before accessing the object pointed to by it.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-22 09:57:19 -07:00
Andreas Herrmann
035a02c1e1 x86, amd: Restrict usage of c1e_idle()
Currently c1e_idle returns true for all CPUs greater than or equal to
family 0xf model 0x40. This covers too many CPUs.

Meanwhile a respective erratum for the underlying problem was filed
(#400). This patch adds the logic to check whether erratum #400
applies to a given CPU.
Especially for CPUs where SMI/HW triggered C1e is not supported,
c1e_idle() doesn't need to be used. We can check this by looking at
the respective OSVW bit for erratum #400.

Cc: <stable@kernel.org> # .32.x .33.x
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100319110922.GA19614@alberich.amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-03-19 14:43:36 -07:00
Shane Wang
4bd96a7a81 x86, tboot: Add support for S3 memory integrity protection
This patch adds support for S3 memory integrity protection within an Intel(R)
TXT launched kernel, for all kernel and userspace memory.  All RAM used by the
kernel and userspace, as indicated by memory ranges of type E820_RAM and
E820_RESERVED_KERN in the e820 table, will be integrity protected.

The MAINTAINERS file is also updated to reflect the maintainers of the
TXT-related code.

All MACing is done in tboot, based on a complexity analysis and tradeoff.

v3: Compared with v2, this patch adds a check of array size in
tboot.c, and a note to specify which c/s of tboot supports this kind
of MACing in intel_txt.txt.

Signed-off-by: Shane Wang <shane.wang@intel.com>
LKML-Reference: <4B973DDA.6050902@intel.com>
Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-03-19 13:39:58 -07:00
Lin Ming
40b7e05e17 perf, x86: Fix key indexing in Pentium-4 PMU
Index 0-6 in p4_templates are reserved for common hardware
events. So p4_templates is arranged as below:

    0  -    6:  common hardware events
    7  -    N:  cache events
  N+1  -  ...:  other raw events

Reported-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1268983738.13901.142.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-19 09:23:17 +01:00
Linus Torvalds
f82c37e7bb Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (35 commits)
  perf: Fix unexported generic perf_arch_fetch_caller_regs
  perf record: Don't try to find buildids in a zero sized file
  perf: export perf_trace_regs and perf_arch_fetch_caller_regs
  perf, x86: Fix hw_perf_enable() event assignment
  perf, ppc: Fix compile error due to new cpu notifiers
  perf: Make the install relative to DESTDIR if specified
  kprobes: Calculate the index correctly when freeing the out-of-line execution slot
  perf tools: Fix sparse CPU numbering related bugs
  perf_event: Fix oops triggered by cpu offline/online
  perf: Drop the obsolete profile naming for trace events
  perf: Take a hot regs snapshot for trace events
  perf: Introduce new perf_fetch_caller_regs() for hot regs snapshot
  perf/x86-64: Use frame pointer to walk on irq and process stacks
  lockdep: Move lock events under lockdep recursion protection
  perf report: Print the map table just after samples for which no map was found
  perf report: Add multiple event support
  perf session: Change perf_session post processing functions to take histogram tree
  perf session: Add storage for seperating event types in report
  perf session: Change add_hist_entry to take the tree root instead of session
  perf record: Add ID and to recorded event data when recording multiple events
  ...
2010-03-18 16:52:46 -07:00
Cyrill Gorcunov
9c8c6bad31 x86, perf: Fix few cosmetic dabs for P4 pmu (comments and constantify)
- A few ESCR have escaped fixing at previous attempt.
- p4_escr_map is read only, make it const.

Nothing serious.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <20100318211256.GH5062@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-18 22:17:46 +01:00
Stephane Eranian
4b24a88b35 perf_events: Fix resource leak in x86 __hw_perf_event_init()
If reserve_pmc_hardware() succeeds but reserve_ds_buffers()
fails, then we need to release_pmc_hardware. It won't be done
by the destroy() callback because we return before setting it
in case of error.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: <stable@kernel.org>
Cc: peterz@infradead.org
Cc: paulus@samba.org
Cc: davem@davemloft.net
Cc: fweisbec@gmail.com
Cc: robert.richter@amd.com
Cc: perfmon2-devel@lists.sf.net
LKML-Reference: <4ba1568b.15185e0a.182a.7802@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
--
 arch/x86/kernel/cpu/perf_event.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
2010-03-18 18:39:40 +01:00
Lin Ming
cb7d6b5053 perf, x86: Add cache events for the Pentium-4 PMU
Move the HT bit setting code from p4_pmu_event_map to
p4_hw_config. So the cache events can get HT bit set correctly.

Tested on my P4 desktop, below 6 cache events work:

 L1-dcache-load-misses
 LLC-load-misses
 dTLB-load-misses
 dTLB-store-misses
 iTLB-loads
 iTLB-load-misses

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1268908392.13901.128.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-18 17:04:02 +01:00
Lin Ming
f34edbc1cd perf, x86: Add a key to simplify template lookup in Pentium-4 PMU
Currently, we use opcode(Event and Event-Selector) + emask to
look up template in p4_templates.

But cache events (L1-dcache-load-misses, LLC-load-misses, etc)
use the same event(P4_REPLAY_EVENT) to do the counting, ie, they
have the same opcode and emask. So we can not use current lookup
mechanism to find the template for cache events.

This patch introduces a "key", which is the index into
p4_templates. The low 12 bits of CCCR are reserved, so we can
hide the "key" in the low 12 bits of hwc->config.

We extract the key from hwc->config and then quickly find the
template.

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1268908387.13901.127.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-18 17:03:51 +01:00
Cyrill Gorcunov
7335f75e9c x86, perf: Use apic_write unconditionally
Since apic_write() maps to a plain noop in the !CONFIG_X86_LOCAL_APIC
case we're safe to remove this conditional compilation and clean up
the code a bit.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: fweisbec@gmail.com
Cc: acme@redhat.com
Cc: eranian@google.com
Cc: peterz@infradead.org
LKML-Reference: <20100317104356.232371479@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-18 17:03:22 +01:00
Cyrill Gorcunov
d674cd1963 x86, apic: Allow to use certain functions without APIC built-in support
In case even if the kernel is configured so that
no APIC support is built-in we still may allow
to use certain apic functions as dummy calls.

In particular we start using it in perf-events code.

Note that this is not that same as NOOP apic driver (which
is used if APIC support is present but no physical APIC is
available), this is for the case when we don't have apic code
compiled in at all.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20100317104356.011052632@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-18 17:03:21 +01:00
Jack Steiner
2acebe9ecb x86, UV: Delete unneeded boot messages
SGI:UV: Delete extra boot messages that describe the system
topology. These messages are no longer useful.

Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20100317154038.GA29346@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-17 17:00:55 +01:00
Robert Richter
d6dc0b4ead perf/core, x86: Remove duplicate perf_event_mask variable
The same information is stored also in x86_pmu.intel_ctrl. This
patch removes perf_event_mask and instead uses
x86_pmu.intel_ctrl directly.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268826553-19518-5-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-17 13:06:59 +01:00
Robert Richter
10f1014d86 perf/core, x86: Remove cpu_hw_events.interrupts
This member in the struct is not used anymore and can be
removed.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268826553-19518-4-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-17 13:06:59 +01:00
Robert Richter
b27ea29c62 perf/core, x86: Reduce number of CONFIG_X86_LOCAL_APIC macros
The function reserve_pmc_hardware() and release_pmc_hardware()
were hard to read. This patch improves readability of the code by
removing most of the CONFIG_X86_LOCAL_APIC macros.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268826553-19518-2-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-17 13:06:58 +01:00
Frederic Weisbecker
dcd5c1662d perf: Fix unexported generic perf_arch_fetch_caller_regs
perf_arch_fetch_caller_regs() is exported for the overriden x86
version, but not for the generic weak version.

As a general rule, weak functions should not have their symbol
exported in the same file they are defined.

So let's export it on trace_event_perf.c as it is used by trace
events only.

This fixes:

	ERROR: ".perf_arch_fetch_caller_regs" [fs/xfs/xfs.ko] undefined!
	ERROR: ".perf_arch_fetch_caller_regs" [arch/powerpc/platforms/cell/spufs/spufs.ko] undefined!

-v2: And also only build it if trace events are enabled.
-v3: Fix changelog mistake

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1268697902-9518-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-17 12:26:49 +01:00
Robert Richter
984763cb90 perf, x86: Report error code that returned from x86_pmu.hw_config()
If x86_pmu.hw_config() fails a fixed error code (-EOPNOTSUPP) is
returned even if a different error was reported. This patch fixes
this.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Lin Ming <ming.m.lin@intel.com>
Cc: acme@redhat.com
Cc: eranian@google.com
Cc: gorcunov@openvz.org
Cc: peterz@infradead.org
Cc: fweisbec@gmail.com
LKML-Reference: <20100316160733.GR1585@erda.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-17 10:43:50 +01:00
Jan Beulich
ff30a0543e x86: Fix placement of FIX_OHCI1394_BASE
Ever for 32-bit with sufficiently high NR_CPUS, and starting
with commit 789d03f584 also for
64-bit, the statically allocated early fixmap page tables were
not covering FIX_OHCI1394_BASE, leading to a boot time crash
when "ohci1394_dma=early" was used. Despite this entry not being
a permanently used one, it needs to be moved into the permanent
range since it has to be close to FIX_DBGP_BASE and
FIX_EARLYCON_MEM_BASE.

Reported-bisected-and-tested-by: Justin P. Mattock <justinmattock@gmail.com>
Fixes-bug: http://bugzilla.kernel.org/show_bug.cgi?id=14487
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: <stable@kernel.org> # [as far back as long as it still applies]
LKML-Reference: <4B9E15D30200007800034D23@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-16 11:16:27 +01:00
Lin Ming
8ea7f54410 x86, perf: Fix comments in Pentium-4 PMU definitions
Reported-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1268705556.3379.8.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-16 09:37:46 +01:00
Frederic Weisbecker
1d199b1ad6 perf: Fix unexported generic perf_arch_fetch_caller_regs
perf_arch_fetch_caller_regs() is exported for the overriden x86
version, but not for the generic weak version.

As a general rule, weak functions should not have their symbol
exported in the same file they are defined.

So let's export it on trace_event_perf.c as it is used by trace
events only.

This fixes:

	ERROR: ".perf_arch_fetch_caller_regs" [fs/xfs/xfs.ko] undefined!
	ERROR: ".perf_arch_fetch_caller_regs" [arch/powerpc/platforms/cell/spufs/spufs.ko] undefined!

-v2: And also only build it if trace events are enabled.
-v3: Fix changelog mistake

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1268697902-9518-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-16 09:27:27 +01:00
Suresh Siddha
36e9e1eab7 x86: Handle legacy PIC interrupts on all the cpu's
Ingo Molnar reported that with the recent changes of not
statically blocking IRQ0_VECTOR..IRQ15_VECTOR's on all the
cpu's, broke an AMD platform (with Nvidia chipset) boot when
"noapic" boot option is used.

On this platform, legacy PIC interrupts are getting delivered to
all the cpu's instead of just the boot cpu. Thus not
initializing the vector to irq mapping for the legacy irq's
resulted in not handling certain interrupts causing boot hang.

Fix this by initializing the vector to irq mapping on all the
logical cpu's, if the legacy IRQ is handled by the legacy PIC.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
[ -v2: io-apic-enabled improvement ]
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1268692386.3296.43.camel@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-16 06:36:35 +01:00
Cyrill Gorcunov
e449526282 perf, x86: Enable not tagged retired instruction counting on P4s
This should turn on instruction counting on P4s, which was missing in
the first version of the new PMU driver.

It's inaccurate for now, we still need dependant event to tag mops
before we can count them precisely. The result is that the number of
instruction may be lifted up.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1268629102.3355.11.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-15 08:14:34 +01:00
Len Brown
ec28dcc6b4 Merge branches 'battery-2.6.34', 'bugzilla-10805', 'bugzilla-14668', 'bugzilla-531916-power-state', 'ht-warn-2.6.34', 'pnp', 'processor-rename', 'sony-2.6.34', 'suse-bugzilla-531547', 'tz-check', 'video' and 'misc-2.6.34' into release 2010-03-14 21:30:17 -04:00
Alex Chiang
d8191fa4a3 ACPI: processor: driver doesn't need to evaluate _PDC
Now that the early _PDC evaluation path knows how to correctly
evaluate _PDC on only physically present processors, there's no
need for the processor driver to evaluate it later when it loads.

To cover the hotplug case, push _PDC evaluation down into the
hotplug paths.

Cc: x86@kernel.org
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-03-14 21:17:22 -04:00
Len Brown
4c81ba4900 ACPI: plan to delete "acpi=ht" boot option
Signed-off-by: Len Brown <len.brown@intel.com>
2010-03-14 20:58:24 -04:00
Len Brown
8144c88039 ACPI: remove "acpi=ht" DMI blacklist
SuSE added these entries when deploying ACPI in Linux-2.4.
I pulled them into Linux-2.6 on 2003-08-09.
Over the last 6+ years, several entries have proven to be
unnecessary and deleted, while no new entries have been added.
Matthew suggests that they now have negative value, and I agree.

Based-on-patch-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-03-14 20:57:02 -04:00
Ingo Molnar
2aa2b50dd6 x86/mce: Fix build bug with CONFIG_PROVE_LOCKING=y && CONFIG_X86_MCE_INTEL=y
Commit f56e8a076 "x86/mce: Fix RCU lockdep splats" introduced the
following build bug:

  arch/x86/kernel/cpu/mcheck/mce.c: In function 'mce_log':
  arch/x86/kernel/cpu/mcheck/mce.c:166: error: 'mce_read_mutex' undeclared (first use in this function)
  arch/x86/kernel/cpu/mcheck/mce.c:166: error: (Each undeclared identifier is reported only once
  arch/x86/kernel/cpu/mcheck/mce.c:166: error: for each function it appears in.)

Move the in-the-middle-of-file lock variable up to the variable
definition section, the top of the .c file.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1267830207-9474-3-git-send-email-paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-14 08:57:03 +01:00