Commit Graph

5462 Commits

Author SHA1 Message Date
Jeremy Fitzhardinge
e25371d60c x86/ioapic.c: unify ioapic_retrigger_irq()
The 32 and 64-bit versions of ioapic_retrigger_irq() are identical
except the 64-bit one takes vector_lock.  vector_lock is defined and
used on 32-bit too, so just use a common ioapic_retrigger_irq().

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-07-14 13:32:51 -07:00
Jeremy Fitzhardinge
638f2f8c52 x86/ioapic.c: convert __target_IO_APIC_irq to conventional for() loop
Use a normal for() loop in __target_IO_APIC_irq().

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-07-14 13:32:50 -07:00
Jeremy Fitzhardinge
4eea6fff61 x86/ioapic.c: clean up replace_pin_at_irq_node logic and comments
There's no need for a control variable in replace_pin_at_irq_node();
it can just return if it finds the old apic/pin to replace.

If the loop terminates, then it didn't find the old apic/pin, so it can
add the new ones.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-07-14 13:32:50 -07:00
Jeremy Fitzhardinge
535b64291a x86/ioapic.c: convert replace_pin_at_irq_node to conventional for() loop
Use a conventional for() loop in replace_pin_at_irq_node().

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-07-14 13:32:50 -07:00
Jeremy Fitzhardinge
875e68ec32 x86/ioapic.c: simplify add_pin_to_irq_node()
Rather than duplicating the same alloc/init code twice, restructure
the function to look for duplicates and then add an entry
if none is found.

This function is not performance critical; all but one of its callers
are __init functions, and the non-__init caller is for PCI device setup.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-07-14 13:32:50 -07:00
Jeremy Fitzhardinge
d8c52063ed x86/ioapic.c: convert io_apic_level_ack_pending loop to normal for() loop
Convert the unconventional loop in io_apic_level_ack_pending() to
a conventional for() loop.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-07-14 13:32:50 -07:00
Jeremy Fitzhardinge
8e13d697fe x86/ioapic.c: move lost comment to what seems like appropriate place
The comment got separated from its subject, so move it to what
appears to be the right place, and update to describe the current
structure.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-07-14 13:32:49 -07:00
Jeremy Fitzhardinge
83c21bedf6 x86/ioapic.c: remove redundant declaration of irq_pin_list
The structure is defined immediately below, so there's no need
to forward declare it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-07-14 13:32:49 -07:00
Jeremy Fitzhardinge
916a0fe739 x86/ioapic.c: remove #ifdef for 82093AA workaround
While no 64-bit hardware will have a version 0x11 I/O APIC which needs
the level/edge bug workaround, that's not a particular reason to use
CONFIG_X86_32 to #ifdef the code out.  Most 32-bit machines will no
longer need the workaround either, so the test to see whether it is
necessary should be more fine-grained than "32-bit=yes, 64-bit=no".

(Also fix formatting of block comment.)

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-07-14 13:32:49 -07:00
Jeremy Fitzhardinge
890aeacf64 x86/ioapic.c: unify __mask_IO_APIC_irq()
The main difference between 32 and 64-bit __mask_IO_APIC_irq() does a
readback from the I/O APIC to synchronize it.

If there's a hardware requirement to do a readback sync after updating
an APIC register, then it will be a hardware requrement regardless of
whether the kernel is compiled 32 or 64-bit.

Unify __mask_IO_APIC_irq() using the 64-bit version which always syncs
with io_apic_sync().

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-07-14 13:32:49 -07:00
Jeremy Fitzhardinge
2f210deba9 x86/ioapic.c: ioapic_modify_irq is too large to inline
If ioapic_modify_irq() is marked inline, it gets inlined several times.
Un-inlining it saves around 200 bytes in .text for me.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-07-14 13:32:48 -07:00
Jeremy Fitzhardinge
6b2b171a77 x86/acpi: acpi_parse_madt_ioapic_entries: remove redundant braces
We don't put braces around a single statement.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-07-14 13:32:48 -07:00
Dave Jones
2ad76643ff x86: Fix warning in pvclock.c
when building 32-bit, I see this ..
arch/x86/kernel/pvclock.c:63:7: warning: "__x86_64__" is not defined

Signed-off-by: Dave Jones <davej@redhat.com>
LKML-Reference: <20090713201437.GA12165@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-07-14 16:25:05 +02:00
Rakib Mullick
7473727be8 x86, apic: Fix false positive section mismatch in numaq_32.c
The variable apic_numaq placed in noninit section references the
function wakeup_secondary_cpu_via_nmi(), which is in __cpuinit
section. Thus causes a section mismatch warning. To avoid such
mismatch we mark apic_numaq as __refdata.

We were warned by the following warning:

  WARNING: arch/x86/kernel/built-in.o(.data+0x932c): Section mismatch in
  reference from the variable apic_numaq to the function
  .cpuinit.text:wakeup_secondary_cpu_via_nmi()

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
LKML-Reference: <b9df5fa10907120407p6b4f67dtf4d563155488188a@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-13 11:03:27 +02:00
Rakib Mullick
151586d0f7 x86: Fix false positive section mismatch in es7000_32.c
The variable apic_es7000_cluster references the function __cpuinit
wakeup_secondary_cpu_via_mip() from a noninit section. So we've been
warned by the following warning. To avoid possible collision between
init/noninit, its best to mark the variable as __refdata.

We were warned by the following warning:

  LD      arch/x86/kernel/apic/built-in.o
  WARNING: arch/x86/kernel/apic/built-in.o(.data+0x198c): Section
  mismatch in reference from the variable apic_es7000_cluster to the
  function .cpuinit.text:wakeup_secondary_cpu_via_mip()

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
LKML-Reference: <b9df5fa10907120404k6279a10ch5e9682432272706f@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-13 11:03:26 +02:00
Daniel Qarras
f1c6a58121 perf_counter, x86: Extend perf_counter Pentium M support
I've attached a patch to remove the Pentium M special casing of
EMON and as noticed at least with my Pentium M the hardware PMU
now works:

 Performance counter stats for '/bin/ls /var/tmp':

       1.809988  task-clock-msecs         #      0.125 CPUs
              1  context-switches         #      0.001 M/sec
              0  CPU-migrations           #	 0.000 M/sec
            224  page-faults              #	 0.124 M/sec
        1425648  cycles                   #    787.656 M/sec
         912755  instructions             #	 0.640 IPC

Vince suggested that this code was trying to address erratum
Y17 in Pentium-M's:

  http://download.intel.com/support/processors/mobile/pm/sb/25266532.pdf

But that erratum (related to IA32_MISC_ENABLES.7) does not
affect perfcounters as we dont use this toggle to disable RDPMC
and WRMSR/RDMSR access to performance counters. We keep cr4's
bit 8 (X86_CR4_PCE) clear so unprivileged RDPMC access is not
allowed anyway.

Cc: Vince Weaver <vince@deater.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@googlemail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-13 08:46:51 +02:00
Alan Cox
8bdbd962ec x86/cpu: Clean up various files a bit
No code changes except printk levels (although some of the K6
mtrr code might be clearer if there were a few as would
splitting out some of the intel cache code).

Signed-off-by: Alan Cox <alan@linux.intel.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-11 11:24:09 +02:00
Huang Weiyi
e90476d3ba x86: Remove duplicated #include
Remove duplicated #include in:

  arch/x86/kernel/dumpstack.c

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-11 10:17:08 +02:00
Linus Torvalds
85be928c41 Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
  perf report: Add "Fractal" mode output - support callchains with relative overhead rate
  perf_counter tools: callchains: Manage the cumul hits on the fly
  perf report: Change default callchain parameters
  perf report: Use a modifiable string for default callchain options
  perf report: Warn on callchain output request from non-callchain file
  x86: atomic64: Inline atomic64_read() again
  x86: atomic64: Clean up atomic64_sub_and_test() and atomic64_add_negative()
  x86: atomic64: Improve atomic64_xchg()
  x86: atomic64: Export APIs to modules
  x86: atomic64: Improve atomic64_read()
  x86: atomic64: Code atomic(64)_read and atomic(64)_set in C not CPP
  x86: atomic64: Fix unclean type use in atomic64_xchg()
  x86: atomic64: Make atomic_read() type-safe
  x86: atomic64: Reduce size of functions
  x86: atomic64: Improve atomic64_add_return()
  x86: atomic64: Improve cmpxchg8b()
  x86: atomic64: Improve atomic64_read()
  x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file
  x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too
  perf report: Annotate variable initialization
  ...
2009-07-10 14:25:03 -07:00
Yinghai Lu
857fdc53a0 x86/pci: insert ioapic resource before assigning unassigned resources
Stephen reported that his DL585 G2 needed noapic after 2.6.22 (?)

Dann bisected it down to:
  commit 30a18d6c3f
  Date:   Tue Feb 19 03:21:20 2008 -0800

      x86: multi pci root bus with different io resource range, on
      64-bit

It turns out that:
  1. that AMD-based systems have two HT chains.
  2. BIOS doesn't allocate resources for BAR 6 of devices under 8132 etc
  3. that multi-peer-root patch will try to split root resources to peer
     root resources according to PCI conf of NB
  4. PCI core assigns unassigned resources, but they overlap with BARs
     that are used by ioapic addr of io4 and 8132.

The reason: at that point ioapic address are not inserted yet.  Solution
is to insert ioapic resources into the tree a bit earlier.

Reported-by: Stephen Frost <sfrost@snowman.net>
Reported-and-Tested-by: dann frazier <dannf@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jesse Barnes <jbarnes@jbarnes-g45.(none)>
2009-07-10 13:03:14 -07:00
Cyrill Gorcunov
a1b4f1a5b7 x86, ipi: Clean up safe_smp_processor_id() by using the cpu_has_apic() macro helper
We already use a lot of cpu_has_ helpers.
Lets do here the same for consistency.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090705160154.GB4791@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-10 15:58:34 +02:00
Cyrill Gorcunov
9ff8094299 x86: Clean up idt_descr and idt_tableby using NR_VECTORS instead of hardcoded number
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090708180353.GH5301@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-10 13:57:13 +02:00
Peter Zijlstra
984b838ce6 perf_counter: Clean up global vs counter enable
Ingo noticed that both AMD and P6 call
x86_pmu_disable_counter() on *_pmu_enable_counter(). This is
because we rely on the side effect of that call to program
the event config but not touch the EN bit.

We change that for AMD by having enable_all() simply write
the full config in, and for P6 by explicitly coding it.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-10 10:28:29 +02:00
Peter Zijlstra
9c74fb5086 perf_counter: Fix up P6 PMU details
The P6 doesn't seem to support cache ref/hit/miss counts, so
we extend the generic hardware event codes to have 0 and -1
mean the same thing as for the generic cache events.

Furthermore, it turns out the 0 event does not count
(that is, its reported that on PPro it actually does count
something), therefore use a event configuration that's
specified not to count to disable the counters.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-10 10:28:27 +02:00
Vince Weaver
11d1578f94 perf_counter: Add P6 PMU support
Add basic P6 PMU support. The P6 uses the EVNTSEL0 EN bit to
enable/disable both its counters. We use this for the
global enable/disable, and clear all config bits (except EN)
to disable individual counters.

Actual ia32 hardware doesn't support lfence, so use a locked
op without side-effect to implement a full barrier.

perf stat and perf record seem to function correctly.

[a.p.zijlstra@chello.nl: cleanups and complete the enable/disable code]

Signed-off-by: Vince Weaver <vince@deater.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <Pine.LNX.4.64.0907081718450.2715@pianoman.cluster.toy>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-10 10:28:26 +02:00
Andi Kleen
a2d32bcbc0 x86: mce: macros to compute banks MSRs
Instead of open coded calculations for bank MSRs hide the indexing of higher
banks MCE register MSRs in new macros.

No semantic changes.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-07-09 18:39:47 -07:00
Andi Kleen
cebe182033 x86: mce: Move per bank data in a single datastructure
This addresses one of the leftover review comments.

Move the per bank data into a single structure. This avoids
several separate variables and also separate allocation of sysfs objects.

I didn't move the CMCI ownership information so far because
that would have needed some non trivial changes in the algorithms.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-07-09 18:39:47 -07:00
Andi Kleen
9eda8cb3ac x86: mce: Move code in mce.c
Now that the X86_OLD_MCE ifdefs are gone move some code that
used to be outside the big ifdef to a more natural place
near its user.

No code change.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-07-09 18:39:47 -07:00
Andi Kleen
c1ebf83561 x86: mce: Rename CONFIG_X86_NEW_MCE to CONFIG_X86_MCE
Drop the CONFIG_X86_NEW_MCE symbol and change all
references to it to check for CONFIG_X86_MCE directly.

No code changes

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-07-09 18:39:47 -07:00
Andi Kleen
5bb38adcb5 x86: mce: Remove old i386 machine check code
As announced in feature-remove-schedule.txt remove CONFIG_X86_OLD_MCE

This patch only removes code.

The ancient machine check code for very old systems that are not supported
by CONFIG_X86_NEW_MCE is still kept.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-07-09 18:39:46 -07:00
Tejun Heo
023bf6f1b8 linker script: unify usage of discard definition
Discarded sections in different archs share some commonality but have
considerable differences.  This led to linker script for each arch
implementing its own /DISCARD/ definition, which makes maintaining
tedious and adding new entries error-prone.

This patch makes all linker scripts to move discard definitions to the
end of the linker script and use the common DISCARDS macro.  As ld
uses the first matching section definition, archs can include default
discarded sections by including them earlier in the linker script.

ia64 is notable because it first throws away some ia64 specific
subsections and then include the rest of the sections into the final
image, so those sections must be discarded before the inclusion.

defconfig compile tested for x86, x86-64, powerpc, powerpc64, ia64,
alpha, sparc, sparc64 and s390.  Michal Simek tested microblaze.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Tested-by: Michal Simek <monstr@monstr.eu>
Cc: linux-arch@vger.kernel.org
Cc: Michal Simek <monstr@monstr.eu>
Cc: microblaze-uclinux@itee.uq.edu.au
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Tony Luck <tony.luck@intel.com>
2009-07-09 11:27:40 +09:00
Joe Perches
ad361c9884 Remove multiple KERN_ prefixes from printk formats
Commit 5fd29d6ccb ("printk: clean up
handling of log-levels and newlines") changed printk semantics.  printk
lines with multiple KERN_<level> prefixes are no longer emitted as
before the patch.

<level> is now included in the output on each additional use.

Remove all uses of multiple KERN_<level>s in formats.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-08 10:30:03 -07:00
Mark Langsdorf
a2e1b4c312 [CPUFREQ] Powernow-k8: support family 0xf with 2 low p-states
Provide support for family 0xf processors with 2 P-states
below the elevator voltage.  Remove the checks that prevent
this configuration from being supported and increase the
transition voltage to prevent errors during the transition.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-07-06 21:38:29 -04:00
Linus Torvalds
faf80d62e4 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix usage of bios intcall()
  x86: Remove unused function lapic_watchdog_ok()
  x86: Remove unused variable disable_x2apic
  x86, kvm: Fix section mismatches in kvm.c
  x86: Add missing annotation to arch/x86/lib/copy_user_64.S::copy_to_user
  x86: Fix fixmap page order for FIX_TEXT_POKE0,1
  amd-iommu: set evt_buf_size correctly
  amd-iommu: handle alias entries correctly in init code
  x86: Fix printk call in print_local_apic()
  x86: Declare check_efer() before it gets used
  x86: Mark device_nb as static and fix NULL noise
  x86: Remove double declaration of MSR_P6_EVNTSEL0 and MSR_P6_EVNTSEL1
  xen: Use kcalloc() in xen_init_IRQ()
  x86: Fix fixmap ordering
  x86: Fix symbol annotation for arch/x86/lib/clear_page_64.S::clear_page_c
2009-07-06 17:45:44 -07:00
Peter Oberparleiter
f386c61fe1 gcov: exclude code operating in userspace from profiling
Fix for this issue on x86_64:

rostedt@goodmis.org wrote:
> On bootup of the latest kernel my init segfaults. Debugging it,
> I found  that vread_tsc (a vsyscall) increments some strange
> kernel memory:
>
> 0000000000000000 <vread_tsc>:
>    0:   55                      push   %rbp
>    1:   48 ff 05 00 00 00 00    incq   0(%rip)
>                         # 8 <vread_tsc+0x8>
>                         4: R_X86_64_PC32        .bss+0x3c
>    8:   48 89 e5                mov    %rsp,%rbp
>    b:   66 66 90                xchg   %ax,%ax
>    e:   48 ff 05 00 00 00 00    incq   0(%rip)
>                         # 15 <vread_tsc+0x15>
>                         11: R_X86_64_PC32       .bss+0x44
>   15:   66 66 90                xchg   %ax,%ax
>   18:   48 ff 05 00 00 00 00    incq   0(%rip)
>                         # 1f <vread_tsc+0x1f>
>                         1b: R_X86_64_PC32       .bss+0x4c
>   1f:   0f 31                   rdtsc
>
>
> Those "incq" is very bad to happen in vsyscall memory, since
> userspace can not modify it. You need to make something prevent
> profiling of vsyscall  memory (like I do with ftrace).

Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-06 13:57:03 -07:00
Ingo Molnar
e3d0e69268 x86: Further clean up of mtrr/generic.c
Yinghai noticed that i defined BIOS_BUG_MSG but added no
usage for it. The usage is to clean up this turd in generic.c:

			printk(KERN_WARNING "WARNING: BIOS bug: VAR MTRR %d "
				"contains strange UC entry under 1M, check "
				"with your system vendor!\n", i);

Breaking printk lines in the middle looks ugly, is hard to read
and breaks 'git grep'. Use the BIOS_BUG_MSG instead.

Also complete the moving of structure definitions and variables
to the top of the file.

Reported-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090703164225.GA21447@elte.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-05 09:46:10 +02:00
Jaswinder Singh Rajput
dbd51be026 x86: Clean up mtrr/main.c
Fix following trivial style problems:

  ERROR: trailing whitespace X 25
  WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
  WARNING: Use #include <linux/kvm_para.h> instead of <asm/kvm_para.h>
  ERROR: do not initialise externals to 0 or NULL X 2
  ERROR: "foo * bar" should be "foo *bar" X 5
  ERROR: do not use assignment in if condition X 2
  WARNING: line over 80 characters X 8
  ERROR: return is not a function, parentheses are not required
  WARNING: braces {} are not necessary for any arm of this statement
  ERROR: space required before the open parenthesis '(' X 2
  ERROR: open brace '{' following function declarations go on the next line
  ERROR: space required after that ',' (ctx:VxV) X 8
  ERROR: space required before the open parenthesis '(' X 3
  ERROR: else should follow close brace '}'
  WARNING: space prohibited between function name and open parenthesis '('
  WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable X 2

Also use pr_debug and pr_warning where possible.

total: 50 errors, 14 warnings

arch/x86/kernel/cpu/mtrr/main.o:

   text	   data	    bss	    dec	    hex	filename
   3668	    116	   4156	   7940	   1f04	main.o.before
   3668	    116	   4156	   7940	   1f04	main.o.after

md5:
   e01af2fd28deef77c8d01e71acfbd365  main.o.before.asm
   e01af2fd28deef77c8d01e71acfbd365  main.o.after.asm

Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090703164225.GA21447@elte.hu>
Cc: Avi Kivity <avi@redhat.com> # Avi, please have a look at the kvm_para.h bit
[ More cleanups ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-04 11:19:55 +02:00
Jaswinder Singh Rajput
09b22c85d5 x86: Clean up mtrr/state.c
Fix:

  WARNING: Use #include <linux/io.h> instead of <asm/io.h>
  WARNING: line over 80 characters X 4

arch/x86/kernel/cpu/mtrr/state.o:

   text	   data	    bss	    dec	    hex	filename
    864	      0	      0	    864	    360	state.o.before
    864	      0	      0	    864	    360	state.o.after

md5:
   c5c4364b9aeac74d70111e1e49667a2c  state.o.before.asm
   c5c4364b9aeac74d70111e1e49667a2c  state.o.after.asm

Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090703164225.GA21447@elte.hu>
[ More cleanups ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-04 11:19:53 +02:00
Jaswinder Singh Rajput
3ec8dbcb09 x86: Clean up mtrr/mtrr.h
Fix:

  ERROR: do not use C99 // comments
  ERROR: "foo * bar" should be "foo *bar" X 2

Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090703164225.GA21447@elte.hu>
[ More tidyups ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-04 11:19:52 +02:00
Jaswinder Singh Rajput
26dc67eda1 x86: Clean up mtrr/if.c
Fix:

  WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
  ERROR: trailing whitespace X 7
  ERROR: trailing statements should be on next line X 3
  WARNING: line over 80 characters X 5
  ERROR: space required before the open parenthesis '('

arch/x86/kernel/cpu/mtrr/if.o:

   text	   data	    bss	    dec	    hex	filename
   2239	      4	      0	   2243	    8c3	if.o.before
   2239	      4	      0	   2243	    8c3	if.o.after

md5:
   78d1f2aa4843ec6509c18e2dee54bc7f  if.o.before.asm
   78d1f2aa4843ec6509c18e2dee54bc7f  if.o.after.asm

Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090703164225.GA21447@elte.hu>
[ More cleanups to make the code more consistent. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-04 11:19:48 +02:00
Jaswinder Singh Rajput
a1a499a399 x86: Clean up mtrr/generic.c
Fix following trivial style problems:

  ERROR: trailing whitespace X 4
  WARNING: Use #include <linux/io.h> instead of <asm/io.h>
  WARNING: braces {} are not necessary for single statement blocks X 3
  ERROR: "foo * bar" should be "foo *bar"
  WARNING: line over 80 characters X 6
  ERROR: "foo * bar" should be "foo *bar"
  ERROR: spaces required around that '=' (ctx:VxO)
  ERROR: space required before that '-' (ctx:OxV)
  WARNING: suspect code indent for conditional statements (8, 12)
  ERROR: spaces required around that '=' (ctx:VxV)
  ERROR: do not initialise statics to 0 or NULL
  ERROR: space prohibited after that open parenthesis '(' X 2
  ERROR: space prohibited before that close parenthesis ')' X 2
  ERROR: trailing statements should be on next line
  ERROR: return is not a function, parentheses are not required

Also use pr_debug and pr_warning where possible.

arch/x86/kernel/cpu/mtrr/generic.o:

   text	   data	    bss	    dec	    hex	filename
   5652	     77	   4224	   9953	   26e1	generic.o.before
   5652	     77	   4220	   9949	   26dd	generic.o.after

The md5 changed:
   b34d6c045f06daa4ed092b90cc760e8f  generic.o.before.asm
   a490c6251cfd8442fbffecc0e09a573d  generic.o.after.asm

Because mtrr_state moved from data to bss, changing its
offsets - and also because __LINE__ numbers changed.

Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090703164225.GA21447@elte.hu>
[ Further cleanups to make the code more consistent ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-04 11:10:47 +02:00
Jaswinder Singh Rajput
2311037708 x86: Clean up mtrr/cyrix.c
Fix trivial style problems:

  WARNING: Use #include <linux/io.h> instead of <asm/io.h>
  WARNING: line over 80 characters
  ERROR: do not initialise statics to 0 or NULL
  ERROR: space prohibited after that open parenthesis '(' X 2
  ERROR: space prohibited before that close parenthesis ')' X 2
  ERROR: trailing whitespace X 2
  ERROR: trailing statements should be on next line
  ERROR: do not use C99 // comments X 2

arch/x86/kernel/cpu/mtrr/cyrix.o:

   text	   data	    bss	    dec	    hex	filename
   1637	     32	      8	   1677	    68d	cyrix.o.before
   1637	     32	      8	   1677	    68d	cyrix.o.after

md5:
   6f52abd06905be3f4cabb5239f9b0ff0  cyrix.o.before.asm
   6f52abd06905be3f4cabb5239f9b0ff0  cyrix.o.after.asm

Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090703164225.GA21447@elte.hu>
[ Made the code more consistent ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-04 11:10:47 +02:00
Jaswinder Singh Rajput
63f9600fad x86: Clean up mtrr/cleanup.c
Fix trivial style problems:

  WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
  WARNING: Use #include <linux/kvm_para.h> instead of <asm/kvm_para.h>

Also, nr_mtrr_spare_reg should be unsigned long.

arch/x86/kernel/cpu/mtrr/cleanup.o:

   text	   data	    bss	    dec	    hex	filename
   6241	   8992	   2056	  17289	   4389	cleanup.o.before
   6241	   8992	   2056	  17289	   4389	cleanup.o.after

The md5 has changed:
   1a7a27513aef1825236daf29110fe657  cleanup.o.before.asm
   bcea358efa2532b6020e338e158447af  cleanup.o.after.asm

Because a WARN_ON()'s __LINE__ value changed by 3 lines.

Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090703164225.GA21447@elte.hu>
[ Did lots of other cleanups to make the code look more consistent. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-04 11:10:46 +02:00
Jaswinder Singh Rajput
6c4caa1ab7 x86: Clean up mtrr/centaur.c
Remove dead code and fix trivial style problems:

  ERROR: trailing whitespace X 2
  WARNING: line over 80 characters X 3
  ROR: trailing whitespace
  ERROR: do not use C99 // comments X 2

arch/x86/kernel/cpu/mtrr/centaur.o:

   text	   data	    bss	    dec	    hex	filename
    605	     32	     68	    705	    2c1	centaur.o.before
    605	     32	     68	    705	    2c1	centaur.o.after

md5:
   a4865ea98ce3c163bb1d376a3949b3e3  centaur.o.before.asm
   a4865ea98ce3c163bb1d376a3949b3e3  centaur.o.after.asm

Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090703164225.GA21447@elte.hu>
[ Standardized comments, DocBook, curly braces, newlines. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-04 11:10:45 +02:00
Jaswinder Singh Rajput
42204455f1 x86: Clean up mtrr/amd.c:
Fix trivial style problems :

  ERROR: trailing whitespace
  WARNING: line over 80 characters
  ERROR: do not use C99 // comments

arch/x86/kernel/cpu/mtrr/amd.o:

   text	   data	    bss	    dec	    hex	filename
    501	     32	      0	    533	    215	amd.o.before
    501	     32	      0	    533	    215	amd.o.after

md5:
   62f795eb840ee2d17b03df89e789e76c  amd.o.before.asm
   62f795eb840ee2d17b03df89e789e76c  amd.o.after.asm

Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090703164225.GA21447@elte.hu>
[ Also restructured comments to be standard, removed stray return,
  converted function description to DocBook style, etc. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-04 11:10:45 +02:00
Ingo Molnar
d7e57676e3 Merge branch 'linus' into x86/cleanups
Merge reason: We were on an older pre-rc1 base, move to almost-rc2.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-04 11:00:42 +02:00
Tejun Heo
a530b79586 percpu: teach large page allocator about NUMA
Large page first chunk allocator is primarily used for NUMA machines;
however, its NUMA handling is extremely simplistic.  Regardless of
their proximity, each cpu is put into separate large page just to
return most of the allocated space back wasting large amount of
vmalloc space and increasing cache footprint.

This patch teachs NUMA details to large page allocator.  Given
processor proximity information, pcpu_lpage_build_unit_map() will find
fitting cpu -> unit mapping in which cpus in LOCAL_DISTANCE share the
same large page and not too much virtual address space is wasted.

This greatly reduces the unit and thus chunk size and wastes much less
address space for the first chunk.  For example, on 4/4 NUMA machine,
the original code occupied 16MB of virtual space for the first chunk
while the new code only uses 4MB - one 2MB page for each node.

[ Impact: much better space efficiency on NUMA machines ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Miller <davem@davemloft.net>
2009-07-04 08:11:00 +09:00
Tejun Heo
8c4bfc6e88 x86,percpu: generalize lpage first chunk allocator
Generalize and move x86 setup_pcpu_lpage() into
pcpu_lpage_first_chunk().  setup_pcpu_lpage() now is a simple wrapper
around the generalized version.  Other than taking size parameters and
using arch supplied callbacks to allocate/free/map memory,
pcpu_lpage_first_chunk() is identical to the original implementation.

This simplifies arch code and will help converting more archs to
dynamic percpu allocator.

While at it, factor out pcpu_calc_fc_sizes() which is common to
pcpu_embed_first_chunk() and pcpu_lpage_first_chunk().

[ Impact: code reorganization and generalization ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
2009-07-04 08:10:59 +09:00
Tejun Heo
d4b95f8039 x86,percpu: generalize 4k first chunk allocator
Generalize and move x86 setup_pcpu_4k() into pcpu_4k_first_chunk().
setup_pcpu_4k() now is a simple wrapper around the generalized
version.  Other than taking size parameters and using arch supplied
callbacks to allocate/free memory, pcpu_4k_first_chunk() is identical
to the original implementation.

This simplifies arch code and will help converting more archs to
dynamic percpu allocator.

While at it, s/pcpu_populate_pte_fn_t/pcpu_fc_populate_pte_fn_t/ for
consistency.

[ Impact: code reorganization and generalization ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
2009-07-04 08:10:59 +09:00
Tejun Heo
788e5abc54 percpu: drop @unit_size from embed first chunk allocator
The only extra feature @unit_size provides is making dead space at the
end of the first chunk which doesn't have any valid usecase.  Drop the
parameter.  This will increase consistency with generalized 4k
allocator.

James Bottomley spotted missing conversion for the default
setup_per_cpu_areas() which caused build breakage on all arcsh which
use it.

[ Impact: drop unused code path ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Ingo Molnar <mingo@elte.hu>
2009-07-04 08:10:58 +09:00
Tejun Heo
c43768cbb7 Merge branch 'master' into for-next
Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix
changes.  As alpha in percpu tree uses 'weak' attribute instead of
inline assembly, there's no need for __used attribute.

Conflicts:
	arch/alpha/include/asm/percpu.h
	arch/mn10300/kernel/vmlinux.lds.S
	include/linux/percpu-defs.h
2009-07-04 07:13:18 +09:00
Ingo Molnar
22a26e6663 Merge branch 'amd-iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2009-07-03 14:35:02 +02:00
Jaswinder Singh Rajput
c7210e1ff8 x86: Remove unused function lapic_watchdog_ok()
lapic_watchdog_ok() is a global function but no one is using it.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1246554335.2242.29.camel@jaswinder.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-03 14:34:31 +02:00
Jaswinder Singh Rajput
23d0cd8e71 x86: Remove unused variable disable_x2apic
setup_nox2apic() is writing 1 to disable_x2apic but no one is reading it.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
LKML-Reference: <1246554239.2242.27.camel@jaswinder.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-03 14:34:27 +02:00
Rakib Mullick
d3ac88157c x86, kvm: Fix section mismatches in kvm.c
The function paravirt_ops_setup() has been refering the
variable no_timer_check, which is a __initdata. Thus generates
the following warning. paravirt_ops_setup() function is called
from kvm_guest_init() which is a __init function. So to fix
this we mark paravirt_ops_setup as __init.

The sections-check output that warned us about this was:

   LD      arch/x86/built-in.o
  WARNING: arch/x86/built-in.o(.text+0x166ce): Section mismatch in
  reference from the function paravirt_ops_setup() to the variable
  .init.data:no_timer_check
  The function paravirt_ops_setup() references
  the variable __initdata no_timer_check.
  This is often because paravirt_ops_setup lacks a __initdata
  annotation or the annotation of no_timer_check is wrong.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Acked-by: Avi Kivity <avi@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <b9df5fa10907012240y356427b8ta4bd07f0efc6a049@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-03 14:34:22 +02:00
Linus Torvalds
405d7ca515 Merge git://git.infradead.org/iommu-2.6
* git://git.infradead.org/iommu-2.6: (38 commits)
  intel-iommu: Don't keep freeing page zero in dma_pte_free_pagetable()
  intel-iommu: Introduce first_pte_in_page() to simplify PTE-setting loops
  intel-iommu: Use cmpxchg64_local() for setting PTEs
  intel-iommu: Warn about unmatched unmap requests
  intel-iommu: Kill superfluous mapping_lock
  intel-iommu: Ensure that PTE writes are 64-bit atomic, even on i386
  intel-iommu: Make iommu=pt work on i386 too
  intel-iommu: Performance improvement for dma_pte_free_pagetable()
  intel-iommu: Don't free too much in dma_pte_free_pagetable()
  intel-iommu: dump mappings but don't die on pte already set
  intel-iommu: Combine domain_pfn_mapping() and domain_sg_mapping()
  intel-iommu: Introduce domain_sg_mapping() to speed up intel_map_sg()
  intel-iommu: Simplify __intel_alloc_iova()
  intel-iommu: Performance improvement for domain_pfn_mapping()
  intel-iommu: Performance improvement for dma_pte_clear_range()
  intel-iommu: Clean up iommu_domain_identity_map()
  intel-iommu: Remove last use of PHYSICAL_PAGE_MASK, for reserving PCI BARs
  intel-iommu: Make iommu_flush_iotlb_psi() take pfn as argument
  intel-iommu: Change aligned_size() to aligned_nrpages()
  intel-iommu: Clean up intel_map_sg(), remove domain_page_mapping()
  ...
2009-07-02 16:51:09 -07:00
Yinghai Lu
7c5371c403 x86: add boundary check for 32bit res before expand e820 resource to alignment
fix hang with HIGHMEM_64G and 32bit resource.  According to hpa and
Linus, use (resource_size_t)-1 to fend off big ranges.

Analyzed by hpa

Reported-and-tested-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-02 12:11:12 -07:00
Joerg Roedel
1bc6f83813 amd-iommu: set evt_buf_size correctly
The setting of this variable got lost during the suspend/resume
implementation.  But keeping this variable zero causes a divide-by-zero
error in the interrupt handler. This patch fixes this.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-07-02 18:32:05 +02:00
Joerg Roedel
7a6a3a086f amd-iommu: handle alias entries correctly in init code
An alias entry in the ACPI table means that the device can send requests to the
IOMMU with both device ids, its own and the alias. This is not handled properly
in the ACPI init code. This patch fixes the issue.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-07-02 12:23:23 +02:00
Ingo Molnar
251e1e44b9 x86: Fix printk call in print_local_apic()
Instead of this:

[   75.690022] <7>printing local APIC contents on CPU#0/0:
[   75.704406] ... APIC ID:      00000000 (0)
[   75.707905] ... APIC VERSION: 00060015
[   75.722551] ... APIC TASKPRI: 00000000 (00)
[   75.725473] ... APIC PROCPRI: 00000000
[   75.728592] ... APIC LDR: 00000001
[   75.742137] ... APIC SPIV: 000001ff
[   75.744101] ... APIC ISR field:
[   75.746648] 0123456789abcdef0123456789abcdef
[   75.746649] <7>00000000000000000000000000000000

Improve the code to be saner and simpler and just print out
the bitfield in a single line using hexa values - not as a
(rather pointless) binary bitfield.

Partially reused Linus's initial fix for this.

Reported-and-Tested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4A4C43BC.90506@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 08:54:08 +02:00
Frederic Weisbecker
0406ca6d8e perf_counter: Ignore the nmi call frames in the x86-64 backtraces
About every callchains recorded with perf record are filled up
including the internal perfcounter nmi frame:

 perf_callchain
 perf_counter_overflow
 intel_pmu_handle_irq
 perf_counter_nmi_handler
 notifier_call_chain
 atomic_notifier_call_chain
 notify_die
 do_nmi
 nmi

We want ignore this frame as it's not interesting for
instrumentation. To solve this, we simply ignore every frames
from nmi context.

New example of "perf report -s sym -c" after this patch:

9.59%  [k] search_by_key
             4.88%
                search_by_key
                reiserfs_read_locked_inode
                reiserfs_iget
                reiserfs_lookup
                do_lookup
                __link_path_walk
                path_walk
                do_path_lookup
                user_path_at
                vfs_fstatat
                vfs_lstat
                sys_newlstat
                system_call_fastpath
                __lxstat
                0x406fb1

             3.19%
                search_by_key
                search_by_entry_key
                reiserfs_find_entry
                reiserfs_lookup
                do_lookup
                __link_path_walk
                path_walk
                do_path_lookup
                user_path_at
                vfs_fstatat
                vfs_lstat
                sys_newlstat
                system_call_fastpath
                __lxstat
                0x406fb1
[...]

For now this patch only solves the problem in x86-64.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246474930-6088-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01 22:37:23 +02:00
David Woodhouse
3238c0c4d6 intel-iommu: Make iommu=pt work on i386 too
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-07-01 18:56:16 +01:00
Jaswinder Singh Rajput
b25ae679f6 x86: Mark device_nb as static and fix NULL noise
This sparse warning:

  arch/x86/kernel/amd_iommu.c:1195:23: warning: symbol 'device_nb' was not declared. Should it be static?

triggers because device_nb is global but is only used in a
single .c file. change device_nb to static to fix that - this
also addresses the sparse warning.

This sparse warning:

  arch/x86/kernel/amd_iommu.c:1766:10: warning: Using plain integer as NULL pointer

triggers because plain integer 0 is used in place of a NULL
pointer. change 0 to NULL to fix that - this also address the
sparse warning.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <1246458194.6940.20.camel@hpdv5.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01 16:52:53 +02:00
Linus Torvalds
55bcab4695 Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (47 commits)
  perf report: Add --symbols parameter
  perf report: Add --comms parameter
  perf report: Add --dsos parameter
  perf_counter tools: Adjust only prelinked symbol's addresses
  perf_counter: Provide a way to enable counters on exec
  perf_counter tools: Reduce perf stat measurement overhead/skew
  perf stat: Use percentages for scaling output
  perf_counter, x86: Update x86_pmu after WARN()
  perf stat: Micro-optimize the code: memcpy is only required if no event is selected and !null_run
  perf stat: Improve output
  perf stat: Fix multi-run stats
  perf stat: Add -n/--null option to run without counters
  perf_counter tools: Remove dead code
  perf_counter: Complete counter swap
  perf report: Print sorted callchains per histogram entries
  perf_counter tools: Prepare a small callchain framework
  perf record: Fix unhandled io return value
  perf_counter tools: Add alias for 'l1d' and 'l1i'
  perf-report: Add bare minimum PERF_EVENT_READ parsing
  perf-report: Add modes for inherited stats and no-samples
  ...
2009-06-30 19:02:59 -07:00
Linus Torvalds
e717f33e98 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  Revert "x86: cap iomem_resource to addressable physical memory"
2009-06-29 09:42:01 -07:00
Yinghai Lu
4078c444cf perf_counter, x86: Update x86_pmu after WARN()
The print out should read the value before changing the value.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4A487017.4090007@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-29 10:19:25 +02:00
Linus Torvalds
8326e284f8 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, delay: tsc based udelay should have rdtsc_barrier
  x86, setup: correct include file in <asm/boot.h>
  x86, setup: Fix typo "CONFIG_x86_64" in <asm/boot.h>
  x86, mce: percpu mcheck_timer should be pinned
  x86: Add sysctl to allow panic on IOCK NMI error
  x86: Fix uv bau sending buffer initialization
  x86, mce: Fix mce resume on 32bit
  x86: Move init_gbpages() to setup_arch()
  x86: ensure percpu lpage doesn't consume too much vmalloc space
  x86: implement percpu_alloc kernel parameter
  x86: fix pageattr handling for lpage percpu allocator and re-enable it
  x86: reorganize cpa_process_alias()
  x86: prepare setup_pcpu_lpage() for pageattr fix
  x86: rename remap percpu first chunk allocator to lpage
  x86: fix duplicate free in setup_pcpu_remap() failure path
  percpu: fix too lazy vunmap cache flushing
  x86: Set cpu_llc_id on AMD CPUs
2009-06-28 11:05:28 -07:00
H. Peter Anvin
ff8a4bae45 Revert "x86: cap iomem_resource to addressable physical memory"
This reverts commit 95ee14e437.
Mikael Petterson <mikepe@it.uu.se> reported that at least one of his
systems will not boot as a result.  We have ruled out the detection
algorithm malfunctioning, so it is not a matter of producing the
incorrect bitmasks; rather, something in the application of them
fails.

Revert the commit until we can root cause and correct this problem.

-stable team: this means the underlying commit should be rejected.

Reported-and-isolated-by: Mikael Petterson <mikpe@it.uu.se>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <200906261559.n5QFxJH8027336@pilspetsen.it.uu.se>
Cc: stable@kernel.org
Cc: Grant Grundler <grundler@parisc-linux.org>
2009-06-28 09:38:47 +02:00
Hidetoshi Seto
5be6066a7f x86, mce: percpu mcheck_timer should be pinned
If CONFIG_NO_HZ + CONFIG_SMP, timer added via add_timer() might
be migrated on other cpu.  Use add_timer_on() instead.

Avoids the following failure:

Maciej Rutecki wrote:
> > After normal boot I try:
> >
> > echo 1 > /sys/devices/system/machinecheck/machinecheck0/check_interval
> >
> > I found this in dmesg:
> >
> > [  141.704025] ------------[ cut here ]------------
> > [  141.704039] WARNING: at arch/x86/kernel/cpu/mcheck/mce.c:1102
> > mcheck_timer+0xf5/0x100()

Reported-by: Maciej Rutecki <maciej.rutecki@gmail.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Tested-by: Maciej Rutecki <maciej.rutecki@gmail.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-25 13:33:02 -07:00
Kurt Garloff
5211a242d0 x86: Add sysctl to allow panic on IOCK NMI error
This patch introduces a new sysctl:

    /proc/sys/kernel/panic_on_io_nmi

which defaults to 0 (off).

When enabled, the kernel panics when the kernel receives an NMI
caused by an IO error.

The IO error triggered NMI indicates a serious system
condition, which could result in IO data corruption. Rather
than contiuing, panicing and dumping might be a better choice,
so one can figure out what's causing the IO error.

This could be especially important to companies running IO
intensive applications where corruption must be avoided, e.g. a
bank's databases.

[ SuSE has been shipping it for a while, it was done at the
  request of a large database vendor, for their users. ]

Signed-off-by: Kurt Garloff <garloff@suse.de>
Signed-off-by: Roberto Angelino <robertangelino@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
LKML-Reference: <20090624213211.GA11291@kroah.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-25 22:06:11 +02:00
Peter Zijlstra
194002b274 perf_counter, x86: Add mmap counter read support
Update the mmap control page with the needed information to
use the userspace RDPMC instruction for self monitoring.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-25 21:39:06 +02:00
Linus Torvalds
0c26d7cc31 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (72 commits)
  asus-laptop: remove EXPERIMENTAL dependency
  asus-laptop: use pr_fmt and pr_<level>
  eeepc-laptop: cpufv updates
  eeepc-laptop: sync eeepc-laptop with asus_acpi
  asus_acpi: Deprecate in favor of asus-laptop
  acpi4asus: update MAINTAINER and KConfig links
  asus-laptop: platform dev as parent for led and backlight
  eeepc-laptop: enable camera by default
  ACPI: Rename ACPI processor device bus ID
  acerhdf: Acer Aspire One fan control
  ACPI: video: DMI workaround broken Acer 7720 BIOS enabling display brightness
  ACPI: run ACPI device hot removal in kacpi_hotplug_wq
  ACPI: Add the reference count to avoid unloading ACPI video bus twice
  ACPI: DMI to disable Vista compatibility on some Sony laptops
  ACPI: fix a deadlock in hotplug case
  Show the physical device node of backlight class device.
  ACPI: pdc init related memory leak with physical CPU hotplug
  ACPI: pci_root: remove unused dev/fn information
  ACPI: pci_root: simplify list traversals
  ACPI: pci_root: use driver data rather than list lookup
  ...
2009-06-24 10:17:07 -07:00
Cliff Wickman
9c26f52b90 x86: Fix uv bau sending buffer initialization
The initialization of the UV Broadcast Assist Unit's sending
buffers was making an invalid assumption about the
initialization of an MMR that defines its address.

The BIOS will not be providing that MMR.  So
uv_activation_descriptor_init() should unconditionally set it.

Tested on UV simulator.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: <stable@kernel.org> # for v2.6.30.x
LKML-Reference: <E1MJTfj-0005i1-W8@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-24 17:33:58 +02:00
Yong Wang
c14dab5c07 perf_counter, x86: Set global control MSR correctly
Previous code made an assumption that the power on value of global
control MSR has enabled all fixed and general purpose counters properly.

However, this is not the case for certain Intel processors, such as
Atom - and it might also be firmware dependent.

Each enable bit in IA32_PERF_GLOBAL_CTRL is AND'ed with the
enable bits for all privilege levels in the respective IA32_PERFEVTSELx
or IA32_PERF_FIXED_CTR_CTRL MSRs to start/stop the counting of
respective counters. Counting is enabled if the AND'ed results is true;
counting is disabled when the result is false.

The end result is that all fixed counters are always disabled on Atom
processors because the assumption is just invalid.

Fix this by not initializing the ctrl-mask out of the global MSR,
but setting it to perf_counter_mask.

Reported-by: Stephane Eranian <eranian@googlemail.com>
Signed-off-by: Yong Wang <yong.y.wang@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20090624021324.GA2788@ywang-moblin2.bj.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-24 10:51:24 +02:00
Tejun Heo
245b2e70ea percpu: clean up percpu variable definitions
Percpu variable definition is about to be updated such that all percpu
symbols including the static ones must be unique.  Update percpu
variable definitions accordingly.

* as,cfq: rename ioc_count uniquely

* cpufreq: rename cpu_dbs_info uniquely

* xen: move nesting_count out of xen_evtchn_do_upcall() and rename it

* mm: move ratelimits out of balance_dirty_pages_ratelimited_nr() and
  rename it

* ipv4,6: rename cookie_scratch uniquely

* x86 perf_counter: rename prev_left to pmc_prev_left, irq_entry to
  pmc_irq_entry and nmi_entry to pmc_nmi_entry

* perf_counter: rename disable_count to perf_disable_count

* ftrace: rename test_event_disable to ftrace_test_event_disable

* kmemleak: rename test_pointer to kmemleak_test_pointer

* mce: rename next_interval to mce_next_interval

[ Impact: percpu usage cleanups, no duplicate static percpu var names ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: linux-mm <linux-mm@kvack.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andi Kleen <andi@firstfloor.org>
2009-06-24 15:13:48 +09:00
Tejun Heo
204fba4aa3 percpu: cleanup percpu array definitions
Currently, the following three different ways to define percpu arrays
are in use.

1. DEFINE_PER_CPU(elem_type[array_len], array_name);
2. DEFINE_PER_CPU(elem_type, array_name[array_len]);
3. DEFINE_PER_CPU(elem_type, array_name)[array_len];

Unify to #1 which correctly separates the roles of the two parameters
and thus allows more flexibility in the way percpu variables are
defined.

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: linux-mm@kvack.org
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: David S. Miller <davem@davemloft.net>
2009-06-24 15:13:45 +09:00
Len Brown
fbe8cddd2d Merge branches 'acerhdf', 'acpi-pci-bind', 'bjorn-pci-root', 'bugzilla-12904', 'bugzilla-13121', 'bugzilla-13396', 'bugzilla-13533', 'bugzilla-13612', 'c3_lock', 'hid-cleanups', 'misc-2.6.31', 'pdc-leak-fix', 'pnpacpi', 'power_nocheck', 'thinkpad_acpi', 'video' and 'wmi' into release 2009-06-24 01:19:50 -04:00
Weidong Han
f007e99c8e Intel-IOMMU, intr-remap: source-id checking
To support domain-isolation usages, the platform hardware must be
capable of uniquely identifying the requestor (source-id) for each
interrupt message. Without source-id checking for interrupt remapping
, a rouge guest/VM with assigned devices can launch interrupt attacks
to bring down anothe guest/VM or the VMM itself.

This patch adds source-id checking for interrupt remapping, and then
really isolates interrupts for guests/VMs with assigned devices.

Because PCI subsystem is not initialized yet when set up IOAPIC
entries, use read_pci_config_byte to access PCI config space directly.

Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-06-23 22:09:17 +01:00
Pekka J Enberg
854c879f5a x86: Move init_gbpages() to setup_arch()
The init_gbpages() function is conditionally called from
init_memory_mapping() function. There are two call-sites where
this 'after_bootmem' condition can be true: setup_arch() and
mem_init() via pci_iommu_alloc().

Therefore, it's safe to move the call to init_gbpages() to
setup_arch() as it's always called before mem_init().

This removes an after_bootmem use - paving the way to remove
all uses of that state variable.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <Pine.LNX.4.64.0906221731210.19474@melkki.cs.Helsinki.FI>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-23 10:33:32 +02:00
Linus Torvalds
687d680985 Merge git://git.infradead.org/~dwmw2/iommu-2.6.31
* git://git.infradead.org/~dwmw2/iommu-2.6.31:
  intel-iommu: Fix one last ia64 build problem in Pass Through Support
  VT-d: support the device IOTLB
  VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps
  VT-d: add device IOTLB invalidation support
  VT-d: parse ATSR in DMA Remapping Reporting Structure
  PCI: handle Virtual Function ATS enabling
  PCI: support the ATS capability
  intel-iommu: dmar_set_interrupt return error value
  intel-iommu: Tidy up iommu->gcmd handling
  intel-iommu: Fix tiny theoretical race in write-buffer flush.
  intel-iommu: Clean up handling of "caching mode" vs. IOTLB flushing.
  intel-iommu: Clean up handling of "caching mode" vs. context flushing.
  VT-d: fix invalid domain id for KVM context flush
  Fix !CONFIG_DMAR build failure introduced by Intel IOMMU Pass Through Support
  Intel IOMMU Pass Through Support

Fix up trivial conflicts in drivers/pci/{intel-iommu.c,intr_remapping.c}
2009-06-22 21:38:22 -07:00
Ingo Molnar
b7f797cb60 Merge branch 'for-tip' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu into x86/urgent 2009-06-22 10:24:43 +02:00
Tejun Heo
0017c869dd x86: ensure percpu lpage doesn't consume too much vmalloc space
On extreme configuration (e.g. 32bit 32-way NUMA machine), lpage
percpu first chunk allocator can consume too much of vmalloc space.
Make it fall back to 4k allocator if the consumption goes over 20%.

[ Impact: add sanity check for lpage percpu first chunk allocator ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jan Beulich <JBeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
2009-06-22 11:56:24 +09:00
Tejun Heo
fa8a7094ba x86: implement percpu_alloc kernel parameter
According to Andi, it isn't clear whether lpage allocator is worth the
trouble as there are many processors where PMD TLB is far scarcer than
PTE TLB.  The advantage or disadvantage probably depends on the actual
size of percpu area and specific processor.  As performance
degradation due to TLB pressure tends to be highly workload specific
and subtle, it is difficult to decide which way to go without more
data.

This patch implements percpu_alloc kernel parameter to allow selecting
which first chunk allocator to use to ease debugging and testing.

While at it, make sure all the failure paths report why something
failed to help determining why certain allocator isn't working.  Also,
kill the "Great future plan" comment which had already been realized
quite some time ago.

[ Impact: allow explicit percpu first chunk allocator selection ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jan Beulich <JBeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
2009-06-22 11:56:24 +09:00
Tejun Heo
e59a1bb2fd x86: fix pageattr handling for lpage percpu allocator and re-enable it
lpage allocator aliases a PMD page for each cpu and returns whatever
is unused to the page allocator.  When the pageattr of the recycled
pages are changed, this makes the two aliases point to the overlapping
regions with different attributes which isn't allowed and known to
cause subtle data corruption in certain cases.

This can be handled in simliar manner to the x86_64 highmap alias.
pageattr code should detect if the target pages have PMD alias and
split the PMD alias and synchronize the attributes.

pcpur allocator is updated to keep the allocated PMD pages map sorted
in ascending address order and provide pcpu_lpage_remapped() function
which binary searches the array to determine whether the given address
is aliased and if so to which address.  pageattr is updated to use
pcpu_lpage_remapped() to detect the PMD alias and split it up as
necessary from cpa_process_alias().

Jan Beulich spotted the original problem and incorrect usage of vaddr
instead of laddr for lookup.

With this, lpage percpu allocator should work correctly.  Re-enable
it.

[ Impact: fix subtle lpage pageattr bug and re-enable lpage ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jan Beulich <JBeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
2009-06-22 11:56:24 +09:00
Tejun Heo
0ff2587fd5 x86: prepare setup_pcpu_lpage() for pageattr fix
Make the following changes in preparation of coming pageattr updates.

* Define and use array of struct pcpul_ent instead of array of
  pointers.  The only difference is ->cpu field which is set but
  unused yet.

* Rename variables according to the above change.

* Rename local variable vm to pcpul_vm and move it out of the
  function.

[ Impact: no functional difference ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
2009-06-22 11:56:24 +09:00
Tejun Heo
97c9bf0618 x86: rename remap percpu first chunk allocator to lpage
The "remap" allocator remaps large pages to build the first chunk;
however, the name isn't very good because 4k allocator remaps too and
the whole point of the remap allocator is using large page mapping.
The allocator will be generalized and exported outside of x86, rename
it to lpage before that happens.

percpu_alloc kernel parameter is updated to accept both "remap" and
"lpage" for lpage allocator.

[ Impact: code cleanup, kernel parameter argument updated ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
2009-06-22 11:56:24 +09:00
Tejun Heo
c5806df923 x86: fix duplicate free in setup_pcpu_remap() failure path
In the failure path, setup_pcpu_remap() tries to free the area which
has already been freed to make holes in the large page.  Fix it.

[ Impact: fix duplicate free in failure path ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
2009-06-22 11:56:24 +09:00
Jaswinder Singh Rajput
d9f2a5ecb2 perf_counter, x8: Fix L1-data-Cache-Store-Referencees for AMD
Fix AMD's Data Cache Refills from System event.

After this patch :

 ./tools/perf/perf stat -e l1d -e l1d-misses -e l1d-write -e l1d-prefetch -e l1d-prefetch-miss -e l1i -e l1i-misses -e l1i-prefetch -e l2 -e l2-misses -e l2-write -e dtlb -e dtlb-misses -e itlb -e itlb-misses -e bpu -e bpu-misses ls /dev/ > /dev/null

 Performance counter stats for 'ls /dev/':

        2499484  L1-data-Cache-Load-Referencees             (scaled from 3.97%)
          70347  L1-data-Cache-Load-Misses                  (scaled from 7.30%)
           9360  L1-data-Cache-Store-Referencees            (scaled from 8.64%)
          32804  L1-data-Cache-Prefetch-Referencees         (scaled from 17.72%)
           7693  L1-data-Cache-Prefetch-Misses              (scaled from 22.97%)
        2180945  L1-instruction-Cache-Load-Referencees      (scaled from 28.48%)
          14518  L1-instruction-Cache-Load-Misses           (scaled from 35.00%)
           2405  L1-instruction-Cache-Prefetch-Referencees  (scaled from 34.89%)
          71387  L2-Cache-Load-Referencees                  (scaled from 34.94%)
          18732  L2-Cache-Load-Misses                       (scaled from 34.92%)
          79918  L2-Cache-Store-Referencees                 (scaled from 36.02%)
        1295294  Data-TLB-Cache-Load-Referencees            (scaled from 35.99%)
          30896  Data-TLB-Cache-Load-Misses                 (scaled from 33.36%)
        1222030  Instruction-TLB-Cache-Load-Referencees     (scaled from 29.46%)
            357  Instruction-TLB-Cache-Load-Misses          (scaled from 20.46%)
         530888  Branch-Cache-Load-Referencees              (scaled from 11.48%)
           8638  Branch-Cache-Load-Misses                   (scaled from 5.09%)

    0.011295149  seconds time elapsed.

Earlier it always shows value 0.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
LKML-Reference: <1245484165.3102.6.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-21 13:25:55 +02:00
Andreas Herrmann
99bd0c0fc4 x86: Set cpu_llc_id on AMD CPUs
This counts when building sched domains in case NUMA information
is not available.

( See cpu_coregroup_mask() which uses llc_shared_map which in turn is
  created based on cpu_llc_id. )

Currently Linux builds domains as follows:
(example from a dual socket quad-core system)

 CPU0 attaching sched-domain:
  domain 0: span 0-7 level CPU
   groups: 0 1 2 3 4 5 6 7

  ...

 CPU7 attaching sched-domain:
  domain 0: span 0-7 level CPU
   groups: 7 0 1 2 3 4 5 6

Ever since that is borked for multi-core AMD CPU systems.
This patch fixes that and now we get a proper:

 CPU0 attaching sched-domain:
  domain 0: span 0-3 level MC
   groups: 0 1 2 3
   domain 1: span 0-7 level CPU
    groups: 0-3 4-7

  ...

 CPU7 attaching sched-domain:
  domain 0: span 4-7 level MC
   groups: 7 4 5 6
   domain 1: span 0-7 level CPU
    groups: 4-7 0-3

This allows scheduler to assign tasks to cores on different sockets
(i.e. that don't share last level cache) for performance reasons.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20090619085909.GJ5218@alberich.amd.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-21 10:13:32 +02:00
Borislav Petkov
a95436e44a x86, mce: use atomic_inc_return() instead of add by 1
Use atomic_inc_return() instead of atomic_add_return() by 1.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-20 23:28:22 -07:00
Linus Torvalds
12e24f34cb Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
  perfcounter: Handle some IO return values
  perf_counter: Push perf_sample_data through the swcounter code
  perf_counter tools: Define and use our own u64, s64 etc. definitions
  perf_counter: Close race in perf_lock_task_context()
  perf_counter, x86: Improve interactions with fast-gup
  perf_counter: Simplify and fix task migration counting
  perf_counter tools: Add a data file header
  perf_counter: Update userspace callchain sampling uses
  perf_counter: Make callchain samples extensible
  perf report: Filter to parent set by default
  perf_counter tools: Handle lost events
  perf_counter: Add event overlow handling
  fs: Provide empty .set_page_dirty() aop for anon inodes
  perf_counter: tools: Makefile tweaks for 64-bit powerpc
  perf_counter: powerpc: Add processor back-end for MPC7450 family
  perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels
  perf_counter: powerpc: Change how processor-specific back-ends get selected
  perf_counter: powerpc: Use unsigned long for register and constraint values
  perf_counter: powerpc: Enable use of software counters on 32-bit powerpc
  perf_counter tools: Add and use isprint()
  ...
2009-06-20 11:29:32 -07:00
Linus Torvalds
1eb51c33b2 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Fix out of scope variable access in sched_slice()
  sched: Hide runqueues from direct refer at source code level
  sched: Remove unneeded __ref tag
  sched, x86: Fix cpufreq + sched_clock() TSC scaling
2009-06-20 10:57:40 -07:00
Linus Torvalds
b0b7065b64 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits)
  tracing/urgent: warn in case of ftrace_start_up inbalance
  tracing/urgent: fix unbalanced ftrace_start_up
  function-graph: add stack frame test
  function-graph: disable when both x86_32 and optimize for size are configured
  ring-buffer: have benchmark test print to trace buffer
  ring-buffer: do not grab locks in nmi
  ring-buffer: add locks around rb_per_cpu_empty
  ring-buffer: check for less than two in size allocation
  ring-buffer: remove useless compile check for buffer_page size
  ring-buffer: remove useless warn on check
  ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index
  tracing: update sample event documentation
  tracing/filters: fix race between filter setting and module unload
  tracing/filters: free filter_string in destroy_preds()
  ring-buffer: use commit counters for commit pointer accounting
  ring-buffer: remove unused variable
  ring-buffer: have benchmark test handle discarded events
  ring-buffer: prevent adding write in discarded area
  tracing/filters: strloc should be unsigned short
  tracing/filters: operand can be negative
  ...

Fix up kmemcheck-induced conflict in kernel/trace/ring_buffer.c manually
2009-06-20 10:56:46 -07:00
Linus Torvalds
c4c5ab3089 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (45 commits)
  x86, mce: fix error path in mce_create_device()
  x86: use zalloc_cpumask_var for mce_dev_initialized
  x86: fix duplicated sysfs attribute
  x86: de-assembler-ize asm/desc.h
  i386: fix/simplify espfix stack switching, move it into assembly
  i386: fix return to 16-bit stack from NMI handler
  x86, ioapic: Don't call disconnect_bsp_APIC if no APIC present
  x86: Remove duplicated #include's
  x86: msr.h linux/types.h is only required for __KERNEL__
  x86: nmi: Add Intel processor 0x6f4 to NMI perfctr1 workaround
  x86, mce: mce_intel.c needs <asm/apic.h>
  x86: apic/io_apic.c: dmar_msi_type should be static
  x86, io_apic.c: Work around compiler warning
  x86: mce: Don't touch THERMAL_APIC_VECTOR if no active APIC present
  x86: mce: Handle banks == 0 case in K7 quirk
  x86, boot: use .code16gcc instead of .code16
  x86: correct the conversion of EFI memory types
  x86: cap iomem_resource to addressable physical memory
  x86, mce: rename _64.c files which are no longer 64-bit-specific
  x86, mce: mce.h cleanup
  ...

Manually fix up trivial conflict in arch/x86/mm/fault.c
2009-06-20 10:49:48 -07:00
Jaswinder Singh Rajput
feaa0457ec x86: ds.c fix invalid assignment
Fixes the type mixups that cause the following sparse warnings:

  CHECK   arch/x86/kernel/ds.c
arch/x86/kernel/ds.c:549:19: warning: incorrect type in argument 2 (invalid types)
arch/x86/kernel/ds.c:549:19:    expected bad type enum bts_field field
arch/x86/kernel/ds.c:549:19:    got int
arch/x86/kernel/ds.c:514:35: error: incompatible types for operation (*)
arch/x86/kernel/ds.c:514:35:    left side has type unsigned char static [unsigned] [toplevel] sizeof_ptr_field
arch/x86/kernel/ds.c:514:35:    right side has type bad type enum bts_field field
arch/x86/kernel/ds.c:514:7: error: invalid assignment
arch/x86/kernel/ds.c:514:35: error: incompatible types for operation (*)
arch/x86/kernel/ds.c:514:35:    left side has type unsigned char static [unsigned] [toplevel] sizeof_ptr_field
arch/x86/kernel/ds.c:514:35:    right side has type bad type enum bts_field field
arch/x86/kernel/ds.c:514:7: error: invalid assignment
arch/x86/kernel/ds.c:514:35: error: incompatible types for operation (*)
arch/x86/kernel/ds.c:514:35:    left side has type unsigned char static [unsigned] [toplevel] sizeof_ptr_field
arch/x86/kernel/ds.c:514:35:    right side has type bad type enum bts_field field
arch/x86/kernel/ds.c:514:7: error: invalid assignment
arch/x86/kernel/ds.c:514:35: error: incompatible types for operation (*)
arch/x86/kernel/ds.c:514:35:    left side has type unsigned char static [unsigned] [toplevel] sizeof_ptr_field
arch/x86/kernel/ds.c:514:35:    right side has type bad type enum bts_field field
arch/x86/kernel/ds.c:514:7: error: invalid assignment
arch/x86/kernel/ds.c:514:35: error: incompatible types for operation (*)
arch/x86/kernel/ds.c:514:35:    left side has type unsigned char static [unsigned] [toplevel] sizeof_ptr_field
arch/x86/kernel/ds.c:514:35:    right side has type bad type enum bts_field field
arch/x86/kernel/ds.c:514:7: error: invalid assignment
arch/x86/kernel/ds.c:514:35: error: incompatible types for operation (*)
arch/x86/kernel/ds.c:514:35:    left side has type unsigned char static [unsigned] [toplevel] sizeof_ptr_field
arch/x86/kernel/ds.c:514:35:    right side has type bad type enum bts_field field
arch/x86/kernel/ds.c:514:7: error: invalid assignment
arch/x86/kernel/ds.c:520:35: error: incompatible types for operation (*)
arch/x86/kernel/ds.c:520:35:    left side has type unsigned char static [unsigned] [toplevel] sizeof_ptr_field
arch/x86/kernel/ds.c:520:35:    right side has type bad type enum bts_field field
arch/x86/kernel/ds.c:520:7: error: invalid assignment
arch/x86/kernel/ds.c:520:35: error: incompatible types for operation (*)

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Markus Metzger <markus.t.metzger@intel.com>
LKML-Reference: <1245494740.8613.12.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-20 17:53:13 +02:00
Ingo Molnar
1d99100120 Merge branch 'x86/mce3' into x86/urgent 2009-06-20 10:54:22 +02:00
Pallipadi, Venkatesh
7b768f07dc ACPI: pdc init related memory leak with physical CPU hotplug
arch_acpi_processor_cleanup_pdc() in x86 and ia64 results in memory allocated
for _PDC objects that is never freed and will cause memory leak in case of
physical CPU remove and add. Patch fixes the memory leak by freeing the
objects soon after _PDC is evaluated.

Reported-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-06-20 00:50:52 -04:00
Peter Zijlstra
f9188e023c perf_counter: Make callchain samples extensible
Before exposing upstream tools to a callchain-samples ABI, tidy it
up to make it more extensible in the future:

Use markers in the IP chain to denote context, use (u64)-1..-4095 range
for these context markers because we use them for ERR_PTR(), so these
addresses are unlikely to be mapped.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-19 13:42:34 +02:00
Steven Rostedt
71e308a239 function-graph: add stack frame test
In case gcc does something funny with the stack frames, or the return
from function code, we would like to detect that.

An arch may implement passing of a variable that is unique to the
function and can be saved on entering a function and can be tested
when exiting the function. Usually the frame pointer can be used for
this purpose.

This patch also implements this for x86. Where it passes in the stack
frame of the parent function, and will test that frame on exit.

There was a case in x86_32 with optimize for size (-Os) where, for a
few functions, gcc would align the stack frame and place a copy of the
return address into it. The function graph tracer modified the copy and
not the actual return address. On return from the funtion, it did not go
to the tracer hook, but returned to the parent. This broke the function
graph tracer, because the return of the parent (where gcc did not do
this funky manipulation) returned to the location that the child function
was suppose to. This caused strange kernel crashes.

This test detected the problem and pointed out where the issue was.

This modifies the parameters of one of the functions that the arch
specific code calls, so it includes changes to arch code to accommodate
the new prototype.

Note, I notice that the parsic arch implements its own push_return_trace.
This is now a generic function and the ftrace_push_return_trace should be
used instead. This patch does not touch that code.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-18 18:40:18 -04:00
Peter Oberparleiter
7bf99fb673 gcov: enable GCOV_PROFILE_ALL for x86_64
Enable gcov profiling of the entire kernel on x86_64. Required changes
include disabling profiling for:

* arch/kernel/acpi/realmode and arch/kernel/boot/compressed:
  not linked to main kernel
* arch/vdso, arch/kernel/vsyscall_64 and arch/kernel/hpet:
  profiling causes segfaults during boot (incompatible context)

Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Li Wei <W.Li@Sun.COM>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:58 -07:00
Hidetoshi Seto
b1f49f9582 x86, mce: fix error path in mce_create_device()
Don't skip removing mce_attrs in route from error2.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-18 07:02:32 -07:00
Yinghai Lu
e92fae064a x86: use zalloc_cpumask_var for mce_dev_initialized
We need a cleared cpu_mask to record if mce is initialized, especially
when MAXSMP is used.

used zalloc_... instead

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: stable@kernel.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-17 21:47:18 -07:00
Yinghai Lu
74b602c714 x86: fix duplicated sysfs attribute
The sysfs attribute cmci_disabled was accidentall turned into a
duplicate of ignore_ce, breaking all other attributes.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-17 21:43:16 -07:00
Alexander van Heukelum
bc3f5d3dbd x86: de-assembler-ize asm/desc.h
asm/desc.h is included in three assembly files, but the only macro
it defines, GET_DESC_BASE, is never used. This patch removes the
includes, removes the macro GET_DESC_BASE and the ASSEMBLY guard
from asm/desc.h.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-17 21:35:10 -07:00
Alexander van Heukelum
dc4c2a0aed i386: fix/simplify espfix stack switching, move it into assembly
The espfix code triggers if we have a protected mode userspace
application with a 16-bit stack. On returning to userspace, with iret,
the CPU doesn't restore the high word of the stack pointer. This is an
"official" bug, and the work-around used in the kernel is to temporarily
switch to a 32-bit stack segment/pointer pair where the high word of the
pointer is equal to the high word of the userspace stackpointer.

The current implementation uses THREAD_SIZE to determine the cut-off,
but there is no good reason not to use the more natural 64kb... However,
implementing this by simply substituting THREAD_SIZE with 65536 in
patch_espfix_desc crashed the test application. patch_espfix_desc tries
to do what is described above, but gets it subtly wrong if the userspace
stack pointer is just below a multiple of THREAD_SIZE: an overflow
occurs to bit 13... With a bit of luck, when the kernelspace
stackpointer is just below a 64kb-boundary, the overflow then ripples
trough to bit 16 and userspace will see its stack pointer changed by
65536.

This patch moves all espfix code into entry_32.S. Selecting a 16-bit
cut-off simplifies the code. The game with changing the limit dynamically
is removed too. It complicates matters and I see no value in it. Changing
only the top 16-bit word of ESP is one instruction and it also implies
that only two bytes of the ESPFIX GDT entry need to be changed and this
can be implemented in just a handful simple to understand instructions.
As a side effect, the operation to compute the original ESP from the
ESPFIX ESP and the GDT entry simplifies a bit too, and the remaining
three instructions have been expanded inline in entry_32.S.

impact: can now reliably run userspace with ESP=xxxxfffc on 16-bit
stack segment

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Acked-by: Stas Sergeev <stsp@aknet.ru>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-17 21:35:09 -07:00
Alexander van Heukelum
2e04bc7656 i386: fix return to 16-bit stack from NMI handler
Returning to a task with a 16-bit stack requires special care: the iret
instruction does not restore the high word of esp in that case. The
espfix code fixes this, but currently is not invoked on NMIs. This means
that a running task gets the upper word of esp clobbered due intervening
NMIs. To reproduce, compile and run the following program with the nmi
watchdog enabled (nmi_watchdog=2 on the command line). Using gdb you can
see that the high bits of esp contain garbage, while the low bits are
still correct.

This patch puts the espfix code back into the NMI code path.

The patch is slightly complicated due to the irqtrace infrastructure not
being NMI-safe. The NMI return path cannot call TRACE_IRQS_IRET.
Otherwise, the tail of the normal iret-code is correct for the nmi code
path too. To be able to share this code-path, the TRACE_IRQS_IRET was
move up a bit. The espfix code exists after the TRACE_IRQS_IRET, but
this code explicitly disables interrupts. This short interrupts-off
section is now not traced anymore. The return-to-kernel path now always
includes the preliminary test to decide if the espfix code should be
called. This is never the case, but doing it this way keeps the patch as
simple as possible and the few extra instructions should not affect
timing in any significant way.

 #define _GNU_SOURCE
 #include <stdio.h>
 #include <sys/types.h>
 #include <sys/mman.h>
 #include <unistd.h>
 #include <sys/syscall.h>
 #include <asm/ldt.h>

int modify_ldt(int func, void *ptr, unsigned long bytecount)
{
        return syscall(SYS_modify_ldt, func, ptr, bytecount);
}

/* this is assumed to be usable */
 #define SEGBASEADDR 0x10000
 #define SEGLIMIT 0x20000

/* 16-bit segment */
struct user_desc desc = {
        .entry_number = 0,
        .base_addr = SEGBASEADDR,
        .limit = SEGLIMIT,
        .seg_32bit = 0,
        .contents = 0, /* ??? */
        .read_exec_only = 0,
        .limit_in_pages = 0,
        .seg_not_present = 0,
        .useable = 1
};

int main(void)
{
        setvbuf(stdout, NULL, _IONBF, 0);

        /* map a 64 kb segment */
        char *pointer = mmap((void *)SEGBASEADDR, SEGLIMIT+1,
                        PROT_EXEC|PROT_READ|PROT_WRITE,
                        MAP_SHARED|MAP_ANONYMOUS, -1, 0);
        if (pointer == NULL) {
                printf("could not map space\n");
                return 0;
        }

        /* write ldt, new mode */
        int err = modify_ldt(0x11, &desc, sizeof(desc));
        if (err) {
                printf("error modifying ldt: %i\n", err);
                return 0;
        }

        for (int i=0; i<1000; i++) {
        asm volatile (
                "pusha\n\t"
                "mov %ss, %eax\n\t" /* preserve ss:esp */
                "mov %esp, %ebp\n\t"
                "push $7\n\t" /* index 0, ldt, user mode */
                "push $65536-4096\n\t" /* esp */
                "lss (%esp), %esp\n\t" /* switch to new stack */
                "push %eax\n\t" /* save old ss:esp on new stack */
                "push %ebp\n\t"
                "add $17*65536, %esp\n\t" /* set high bits */
                "mov %esp, %edx\n\t"

                "mov $10000000, %ecx\n\t" /* wait... */
                "1: loop 1b\n\t" /* ... a bit */

                "cmp %esp, %edx\n\t"
                "je 1f\n\t"
                "ud2\n\t" /* esp changed inexplicably! */
                "1:\n\t"
                "sub $17*65536, %esp\n\t" /* restore high bits */
                "lss (%esp), %esp\n\t" /* restore old ss:esp */
                "popa\n\t");

                printf("\rx%ix", i);
        }

        return 0;
}

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Acked-by: Stas Sergeev <stsp@aknet.ru>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-17 21:35:09 -07:00
Jeremy Fitzhardinge
17950c5b24 x86-64: move clts into batch cpu state updates when preloading fpu
When a task is likely to be using the fpu, we preload its state during
the context switch, rather than waiting for it to run an fpu instruction.
Make sure the clts() happens while we're doing batched fpu state updates
to optimise paravirtualized context switches.

[ Impact: optimise paravirtual FPU context switch ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2009-06-17 13:27:58 -07:00
Jeremy Fitzhardinge
16d9dbf0c2 x86-64: move unlazy_fpu() into lazy cpu state part of context switch
Make sure that unlazy_fpu()'s stts gets batched along with the other
cpu state changes during context switch.  (32-bit already does this.)

This makes sure it gets batched when running paravirtualized.

[ Impact: optimise paravirtual FPU context switch ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2009-06-17 13:21:26 -07:00
Jeremy Fitzhardinge
2fcddce10f x86-32: make sure clts is batched during context switch
If we're preloading the fpu state during context switch, make sure the clts
happens while we're batching the cpu context update, then do the actual
__math_state_restore once the updates are flushed.

This allows more efficient context switches when running paravirtualized,
as all the hypercalls can be folded together into one.

[ Impact: optimise paravirtual FPU context switch ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2009-06-17 13:21:25 -07:00
Jeremy Fitzhardinge
e6e9cac8c3 x86: split out core __math_state_restore
Split the core fpu state restoration out into __math_state_restore, which
assumes that cr0.TS is clear and that the fpu context has been initialized.

This will be used during context switch.  There are two reasons this is
desireable:

- There's a small clarification.  When __switch_to() calls math_state_restore,
  it relies on the fact that tsk_used_math() returns true, and so will
  never do a blocking init_fpu().  __math_state_restore() does not have
  (or need) that logic, so the question never arises.

- It allows the clts() to be moved earler in __switch_to() so it can be performed
  while cpu context updates are batched (will be done in a later patch).

[ Impact: refactor code to make reuse cleaner; no functional change ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2009-06-17 13:21:25 -07:00
Cyrill Gorcunov
3f4c3955ea x86, ioapic: Don't call disconnect_bsp_APIC if no APIC present
Vegard Nossum reported:

[  503.576724] ACPI: Preparing to enter system sleep state S5
[  503.710857] Disabling non-boot CPUs ...
[  503.716853] Power down.
[  503.717770] ------------[ cut here ]------------
[  503.717770] WARNING: at arch/x86/kernel/apic/apic.c:249 native_apic_write_du)
[  503.717770] Hardware name: OptiPlex GX100
[  503.717770] Modules linked in:
[  503.717770] Pid: 2136, comm: halt Not tainted 2.6.30 #443
[  503.717770] Call Trace:
[  503.717770]  [<c154d327>] ? printk+0x18/0x1a
[  503.717770]  [<c1017358>] ? native_apic_write_dummy+0x38/0x50
[  503.717770]  [<c10360fc>] warn_slowpath_common+0x6c/0xc0
[  503.717770]  [<c1017358>] ? native_apic_write_dummy+0x38/0x50
[  503.717770]  [<c1036165>] warn_slowpath_null+0x15/0x20
[  503.717770]  [<c1017358>] native_apic_write_dummy+0x38/0x50
[  503.717770]  [<c1017173>] disconnect_bsp_APIC+0x63/0x100
[  503.717770]  [<c1019e48>] disable_IO_APIC+0xb8/0xc0
[  503.717770]  [<c1214231>] ? acpi_power_off+0x0/0x29
[  503.717770]  [<c1015e55>] native_machine_shutdown+0x65/0x80
[  503.717770]  [<c1015c36>] native_machine_power_off+0x26/0x30
[  503.717770]  [<c1015c49>] machine_power_off+0x9/0x10
[  503.717770]  [<c1046596>] kernel_power_off+0x36/0x40
[  503.717770]  [<c104680d>] sys_reboot+0xfd/0x1f0
[  503.717770]  [<c109daa0>] ? perf_swcounter_event+0xb0/0x130
[  503.717770]  [<c109db7d>] ? perf_counter_task_sched_out+0x5d/0x120
[  503.717770]  [<c102dfc6>] ? finish_task_switch+0x56/0xd0
[  503.717770]  [<c154da1e>] ? schedule+0x49e/0xb40
[  503.717770]  [<c10444b0>] ? sys_kill+0x70/0x160
[  503.717770]  [<c119d9db>] ? selinux_file_ioctl+0x3b/0x50
[  503.717770]  [<c10dd443>] ? sys_ioctl+0x63/0x70
[  503.717770]  [<c1003024>] sysenter_do_call+0x12/0x22
[  503.717770] ---[ end trace 8157b5d0ed378f15 ]---

|
| That's including this commit:
|
| commit 103428e57b
|Author: Cyrill Gorcunov <gorcunov@openvz.org>
|Date:   Sun Jun 7 16:48:40 2009 +0400
|
|    x86, apic: Fix dummy apic read operation together with broken MP handling
|

If we have apic disabled we don't even switch to APIC mode and do not
calling for connect_bsp_APIC. Though on SMP compiled kernel the
native_machine_shutdown does try to write the apic register anyway.

Fix it with explicit check if we really should touch apic registers.

Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090617181322.GG10822@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 20:24:39 +02:00
Peter Zijlstra
60f916dee6 perf_counter: x86: Set the period in the intel overflow handler
Commit 9e350de37a ("perf_counter: Accurate period data")
missed a spot, which caused all Intel-PMU samples to have a
period of 0.

This broke auto-freq sampling.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 19:23:52 +02:00
Huang Weiyi
8653f88ff9 x86: Remove duplicated #include's
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
LKML-Reference: <1244895686-2348-1-git-send-email-weiyi.huang@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 19:02:35 +02:00
Linus Torvalds
c30938d59e Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] cpumask: new cpumask operators for arch/x86/kernel/cpu/cpufreq/powernow-k8.c
  [CPUFREQ] cpumask: avoid playing with cpus_allowed in powernow-k8.c
  [CPUFREQ] cpumask: avoid cpumask games in arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
  [CPUFREQ] cpumask: avoid playing with cpus_allowed in speedstep-ich.c
  [CPUFREQ] powernow-k8: get drv data for correct CPU
  [CPUFREQ] powernow-k8: read P-state from HW
  [CPUFREQ] reduce scope of ACPI_PSS_BIOS_BUG_MSG[]
  [CPUFREQ] Clean up convoluted code in arch/x86/kernel/tsc.c:time_cpufreq_notifier()
  [CPUFREQ] minor correction to cpu-freq documentation
  [CPUFREQ] powernow-k8.c: mess cleanup
  [CPUFREQ] Only set sampling_rate_max deprecated, sampling_rate_min is useful
  [CPUFREQ] powernow-k8: Set transition latency to 1 if ACPI tables export 0
  [CPUFREQ] ondemand: Uncouple minimal sampling rate from HZ in NO_HZ case
2009-06-17 09:51:50 -07:00
Ingo Molnar
813400060f Merge branch 'x86/urgent' into x86/mce3
Conflicts:
	arch/x86/kernel/cpu/mcheck/mce_intel.c

Merge reason: merge with an urgent-branch MCE fix.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 18:21:41 +02:00
Prarit Bhargava
fe955e5c79 x86: nmi: Add Intel processor 0x6f4 to NMI perfctr1 workaround
Expand Intel NMI perfctr1 workaround to include a Core2 processor stepping
(cpuid family-6, model-f, stepping-4).  Resolves a situation where the NMI
would not enable on these processors.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: prarit@redhat.com
Cc: suresh.b.siddha@intel.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 18:20:39 +02:00
H. Peter Anvin
1bf7b31efa x86, mce: mce_intel.c needs <asm/apic.h>
mce_intel.c uses apic_write() and lapic_get_maxlvt(), and so it needs
<asm/apic.h>.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
2009-06-17 08:31:15 -07:00
Jaswinder Singh Rajput
8f7007aabe x86: apic/io_apic.c: dmar_msi_type should be static
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 17:16:08 +02:00
Figo.zhang
50a8d4d297 x86, io_apic.c: Work around compiler warning
This compiler warning:

  arch/x86/kernel/apic/io_apic.c: In function ‘ioapic_write_entry’:
  arch/x86/kernel/apic/io_apic.c:466: warning: ‘eu’ is used uninitialized in this function
  arch/x86/kernel/apic/io_apic.c:465: note: ‘eu’ was declared here

Is bogus as 'eu' is always initialized. But annotate it away by
initializing the variable, to make it easier for people to notice
real warnings. A compiler that sees through this logic will
optimize away the initialization.

Signed-off-by: Figo.zhang <figo1802@gmail.com>
LKML-Reference: <1245248720.3312.27.camel@myhost>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 17:13:25 +02:00
Cyrill Gorcunov
5ce4243dce x86: mce: Don't touch THERMAL_APIC_VECTOR if no active APIC present
If APIC was disabled (for some reason) and as result
it's not even mapped we should not try to enable thermal
interrupts at all.

Reported-by: Simon Holm Thøgersen <odie@cs.aau.dk>
Tested-by: Simon Holm Thøgersen <odie@cs.aau.dk>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <20090615182633.GA7606@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 17:10:22 +02:00
Peter Zijlstra
84599f8a59 sched, x86: Fix cpufreq + sched_clock() TSC scaling
For freqency dependent TSCs we only scale the cycles, we do not account
for the discrepancy in absolute value.

Our current formula is: time = cycles * mult

(where mult is a function of the cpu-speed on variable tsc machines)

Suppose our current cycle count is 10, and we have a multiplier of 5,
then our time value would end up being 50.

Now cpufreq comes along and changes the multiplier to say 3 or 7,
which would result in our time being resp. 30 or 70.

That means that we can observe random jumps in the time value due to
frequency changes in both fwd and bwd direction.

So what this patch does is change the formula to:

  time = cycles * frequency + offset

And we calculate offset so that time_before == time_after, thereby
ridding us of these jumps in time.

[ Impact: fix/reduce sched_clock() jumps across frequency changing events ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Chucked-on-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2009-06-17 16:03:54 +02:00
Ingo Molnar
a3d06cc6aa Merge branch 'linus' into perfcounters/core
Conflicts:
	arch/x86/include/asm/kmap_types.h
	include/linux/mm.h

	include/asm-generic/kmap_types.h

Merge reason: We crossed changes with kmap_types.h cleanups in mainline.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 13:06:17 +02:00
Andi Kleen
203abd67b7 x86: mce: Handle banks == 0 case in K7 quirk
Vegard Nossum reported:

> I get an MCE-related crash like this in latest linus tree:
>
> [    0.115341] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> [    0.116396] CPU: L2 Cache: 512K (64 bytes/line)
> [    0.120570] mce: CPU supports 0 MCE banks
> [    0.124870] BUG: unable to handle kernel NULL pointer dereference at 00000000 00000010
> [    0.128001] IP: [<ffffffff813b98ad>] mcheck_init+0x278/0x320
> [    0.128001] PGD 0
> [    0.128001] Thread overran stack, or stack corrupted
> [    0.128001] Oops: 0002 [#1] PREEMPT SMP
> [    0.128001] last sysfs file:
> [    0.128001] CPU 0
> [    0.128001] Modules linked in:
> [    0.128001] Pid: 0, comm: swapper Not tainted 2.6.30 #426
> [    0.128001] RIP: 0010:[<ffffffff813b98ad>]  [<ffffffff813b98ad>] mcheck_init+0x278/0x320
> [    0.128001] RSP: 0018:ffffffff81595e38  EFLAGS: 00000246
> [    0.128001] RAX: 0000000000000010 RBX: ffffffff8158f900 RCX: 0000000000000000
> [    0.128001] RDX: 0000000000000000 RSI: 00000000000000ff RDI: 0000000000000010
> [    0.128001] RBP: ffffffff81595e68 R08: 0000000000000001 R09: 0000000000000000
> [    0.128001] R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000000
> [    0.128001] R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000000
> [    0.128001] FS:  0000000000000000(0000) GS:ffff880002288000(0000) knlGS:00000
> 00000000000
> [    0.128001] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [    0.128001] CR2: 0000000000000010 CR3: 0000000001001000 CR4: 00000000000006b0
> [    0.128001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    0.128001] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
> [    0.128001] Process swapper (pid: 0, threadinfo ffffffff81594000, task ffffff
> ff8152a4a0)
> [    0.128001] Stack:
> [    0.128001]  0000000081595e68 5aa50ed3b4ddbe6e ffffffff8158f900 ffffffff8158f
> 914
> [    0.128001]  ffffffff8158f948 0000000000000000 ffffffff81595eb8 ffffffff813b8
> 69c
> [    0.128001]  5aa50ed3b4ddbe6e 00000001078bfbfd 0000062300000800 5aa50ed3b4ddb
> e6e
> [    0.128001] Call Trace:
> [    0.128001]  [<ffffffff813b869c>] identify_cpu+0x331/0x392
> [    0.128001]  [<ffffffff815a1445>] identify_boot_cpu+0x23/0x6e
> [    0.128001]  [<ffffffff815a14ac>] check_bugs+0x1c/0x60
> [    0.128001]  [<ffffffff8159c075>] start_kernel+0x403/0x46e
> [    0.128001]  [<ffffffff8159b2ac>] x86_64_start_reservations+0xac/0xd5
> [    0.128001]  [<ffffffff8159b3ea>] x86_64_start_kernel+0x115/0x14b
> [    0.128001]  [<ffffffff8159b140>] ? early_idt_handler+0x0/0x71

This happens on QEMU which reports MCA capability, but no banks.
Without this patch there is a buffer overrun and boot ops because
the code would try to initialize the 0 element of a zero length
kmalloc() buffer.

Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Tested-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <20090615125200.GD31969@one.firstfloor.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 08:59:45 +02:00
Ingo Molnar
cc4949e1fd Merge branch 'linus' into x86/urgent
Merge reason: pull in latest to fix a bug in it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 08:59:10 +02:00
Linus Torvalds
517d08699b Merge branch 'akpm'
* akpm: (182 commits)
  fbdev: bf54x-lq043fb: use kzalloc over kmalloc/memset
  fbdev: *bfin*: fix __dev{init,exit} markings
  fbdev: *bfin*: drop unnecessary calls to memset
  fbdev: bfin-t350mcqb-fb: drop unused local variables
  fbdev: blackfin has __raw I/O accessors, so use them in fb.h
  fbdev: s1d13xxxfb: add accelerated bitblt functions
  tcx: use standard fields for framebuffer physical address and length
  fbdev: add support for handoff from firmware to hw framebuffers
  intelfb: fix a bug when changing video timing
  fbdev: use framebuffer_release() for freeing fb_info structures
  radeon: P2G2CLK_ALWAYS_ONb tested twice, should 2nd be P2G2CLK_DAC_ALWAYS_ONb?
  s3c-fb: CPUFREQ frequency scaling support
  s3c-fb: fix resource releasing on error during probing
  carminefb: fix possible access beyond end of carmine_modedb[]
  acornfb: remove fb_mmap function
  mb862xxfb: use CONFIG_OF instead of CONFIG_PPC_OF
  mb862xxfb: restrict compliation of platform driver to PPC
  Samsung SoC Framebuffer driver: add Alpha Channel support
  atmel-lcdc: fix pixclock upper bound detection
  offb: use framebuffer_alloc() to allocate fb_info struct
  ...

Manually fix up conflicts due to kmemcheck in mm/slab.c
2009-06-16 19:50:13 -07:00
Minchan Kim
a9c5695393 use printk_once() in several places
There are some places to be able to use printk_once instead of hard coding.

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16 19:47:50 -07:00
Alexey Dobriyan
bb1f17b037 mm: consolidate init_mm definition
* create mm/init-mm.c, move init_mm there
* remove INIT_MM, initialize init_mm with C99 initializer
* unexport init_mm on all arches:

  init_mm is already unexported on x86.

  One strange place is some OMAP driver (drivers/video/omap/) which
  won't build modular, but it's already wants get_vm_area() export.
  Somebody should look there.

[akpm@linux-foundation.org: add missing #includes]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16 19:47:28 -07:00
Arnd Bergmann
08604bd993 time: move PIT_TICK_RATE to linux/timex.h
PIT_TICK_RATE is currently defined in four architectures, but in three
different places.  While linux/timex.h is not the perfect place for it, it
is still a reasonable replacement for those drivers that traditionally use
asm/timex.h to get CLOCK_TICK_RATE and expect it to be the PIT frequency.

Note that for Alpha, the actual value changed from 1193182UL to 1193180UL.
 This is unlikely to make a difference, and probably can only improve
accuracy.  There was a discussion on the correct value of CLOCK_TICK_RATE
a few years ago, after which every existing instance was getting changed
to 1193182.  According to the specification, it should be
1193181.818181...

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Len Brown <lenb@kernel.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16 19:47:27 -07:00
Cliff Wickman
e2a7147640 x86: correct the conversion of EFI memory types
This patch causes all the EFI_RESERVED_TYPE memory reservations to be recorded
in the e820 table as type E820_RESERVED.

(This patch replaces one called 'x86: vendor reserved memory type'.
 This version has been discussed a bit with Peter and Yinghai but not given
 a final opinion.)

Without this patch EFI_RESERVED_TYPE memory reservations may be
marked usable in the e820 table. There may be a collision between
kernel use and some reserver's use of this memory.

(An example use of this functionality is the UV system, which
 will access extremely large areas of memory with a memory engine
 that allows a user to address beyond the processor's range.  Such
 areas are reserved in the EFI table by the BIOS.
 Some loaders have a restricted number of entries possible in the e820 table,
 hence the need to record the reservations in the unrestricted EFI table.)

The call to do_add_efi_memmap() is only made if "add_efi_memmap" is specified
on the kernel command line.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-16 17:47:32 -07:00
H. Peter Anvin
95ee14e437 x86: cap iomem_resource to addressable physical memory
iomem_resource is by default initialized to -1, which means 64 bits of
physical address space if 64-bit resources are enabled.  However, x86
CPUs cannot address 64 bits of physical address space.  Thus, we want
to cap the physical address space to what the union of all CPU can
actually address.

Without this patch, we may end up assigning inaccessible values to
uninitialized 64-bit PCI memory resources.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Martin Mares <mj@ucw.cz>
Cc: stable@kernel.org
2009-06-16 17:47:31 -07:00
Hidetoshi Seto
1af0815f96 x86, mce: rename _64.c files which are no longer 64-bit-specific
Rename files that are no longer 64bit specific:
	mce_amd_64.c	=> mce_amd.c
	mce_intel_64.c	=> mce_intel.c

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-16 16:56:11 -07:00
Hidetoshi Seto
1149e72645 x86, mce: remove therm_throt.h
Now all symbols in the header are static.  Remove the header.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-16 16:56:09 -07:00
Hidetoshi Seto
8363fc82d3 x86, mce: remove intel_set_thermal_handler()
and make intel_thermal_interrupt() static.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-16 16:56:08 -07:00
Hidetoshi Seto
895287c0a6 x86, mce: squash mce_intel.c into therm_throt.c
move intel_init_thermal() into therm_throt.c

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-16 16:56:08 -07:00
Hidetoshi Seto
a65c88dd2c x86, mce: unify smp_thermal_interrupt
Put common functions into therm_throt.c, modify Makefile.

	unexpected_thermal_interrupt
	intel_thermal_interrupt
	smp_thermal_interrupt
	intel_set_thermal_handler

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-16 16:56:08 -07:00
Hidetoshi Seto
e8ce2c5ee8 x86, mce: unify smp_thermal_interrupt, prepare
Let them in same shape.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-16 16:56:08 -07:00
Hidetoshi Seto
5335612a57 x86, mce: unify smp_thermal_interrupt, prepare mce_intel_64
Break smp_thermal_interrupt() into two functions.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-16 16:56:08 -07:00
Hidetoshi Seto
3adacb70d3 x86, mce: unify smp_thermal_interrupt, prepare p4
Remove unused argument regs from handlers, and use inc_irq_stat.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-16 16:56:07 -07:00
Hidetoshi Seto
c697836985 x86, mce: make mce_disabled boolean
The mce_disabled on 32bit is a tristate variable [1,0,-1],
while 64bit version is boolean [0,1].
This patch makes mce_disabled always boolean, and use mce_p5_enabled
to indicate the third state instead.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-16 16:56:07 -07:00
Hidetoshi Seto
9e55e44e39 x86, mce: unify mce.h
There are 2 headers:
	arch/x86/include/asm/mce.h
	arch/x86/kernel/cpu/mcheck/mce.h
and in the latter small header:
	#include <asm/mce.h>

This patch move all contents in the latter header into the former,
and fix all files using the latter to include the former instead.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-16 16:56:07 -07:00
Hidetoshi Seto
9af43b54ab x86, mce: sysfs entries for new mce options
Add sysfs interface for admins who want to tweak these options without
rebooting the system.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-16 16:56:06 -07:00
Hidetoshi Seto
1020bcbcc7 x86, mce: rename static variables around trigger
"trigger" is not straight forward name for valiable that holds name
of user mode helper program which triggered by machine check events.

This patch renames this valiable and kins to more recognizable names.

	trigger		=> mce_helper
	trigger_argv	=> mce_helper_argv
	notify_user	=> mce_need_notify

No functional changes.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-16 16:56:06 -07:00
Hidetoshi Seto
4e5b3e690d x86, mce: add __read_mostly
Add __read_mostly to data written during setup.

Suggested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-16 16:56:05 -07:00
Hidetoshi Seto
7fb06fc967 x86, mce: cleanup mce_start()
Simplify interface of mce_start():

-       no_way_out = mce_start(no_way_out, &order);
+       order = mce_start(&no_way_out);

Now Monarch and Subjects share same exit(return) in usual path.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-16 16:56:05 -07:00
Hidetoshi Seto
33edbf02a9 x86, mce: don't init timer if !mce_available
In mce_cpu_restart, mce_init_timer is called unconditionally.
If !mce_available (e.g. mce is disabled), there are no useful work
for timer.  Stop running it.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-16 16:56:04 -07:00
Huang Ying
184e1fdfea x86, mce: fix a race condition about mce_callin and no_way_out
If one CPU has no_way_out == 1, all other CPUs should have no_way_out
== 1. But despite global_nwo is read after mce_callin, global_nwo is
updated after mce_callin too. So it is possible that some CPU read
global_nwo before some other CPU update global_nwo, so that no_way_out
== 1 for some CPU, while no_way_out == 0 for some other CPU.

This patch fixes this race condition via moving mce_callin updating
after global_nwo updating, with a smp_wmb in between. A smp_rmb is
added between their reading too.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
2009-06-16 16:56:04 -07:00
Linus Torvalds
b3fec0fe35 Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck: (39 commits)
  signal: fix __send_signal() false positive kmemcheck warning
  fs: fix do_mount_root() false positive kmemcheck warning
  fs: introduce __getname_gfp()
  trace: annotate bitfields in struct ring_buffer_event
  net: annotate struct sock bitfield
  c2port: annotate bitfield for kmemcheck
  net: annotate inet_timewait_sock bitfields
  ieee1394/csr1212: fix false positive kmemcheck report
  ieee1394: annotate bitfield
  net: annotate bitfields in struct inet_sock
  net: use kmemcheck bitfields API for skbuff
  kmemcheck: introduce bitfield API
  kmemcheck: add opcode self-testing at boot
  x86: unify pte_hidden
  x86: make _PAGE_HIDDEN conditional
  kmemcheck: make kconfig accessible for other architectures
  kmemcheck: enable in the x86 Kconfig
  kmemcheck: add hooks for the page allocator
  kmemcheck: add hooks for page- and sg-dma-mappings
  kmemcheck: don't track page tables
  ...
2009-06-16 13:09:51 -07:00
Ingo Molnar
8a4a6182fd Merge branch 'amd-iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2009-06-16 11:51:24 +02:00
Chris Wright
6a047d8b9e amd-iommu: resume cleanup
Now that enable_iommus() will call iommu_disable() for each iommu,
the call to disable_iommus() during resume is redundant.  Also, the order
for an invalidation is to invalidate device table entries first, then
domain translations.

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-06-16 10:19:16 +02:00
Kay Sievers
07e9bb8eeb Driver Core: x86: add nodename for cpuid and msr drivers.
This adds support to the x86 cpuid and msr drivers to report the proper
device name to userspace for their devices.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-15 21:30:25 -07:00
Kay Sievers
d405640539 Driver Core: misc: add nodename support for misc devices.
This adds support for misc devices to report their requested nodename to
userspace.  It also updates a number of misc drivers to provide the
needed subdirectory and device name to be used for them.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-15 21:30:25 -07:00
Linus Torvalds
19035e5b5d Merge branch 'timers-for-linus-migration' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus-migration' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  timers: Logic to move non pinned timers
  timers: /proc/sys sysctl hook to enable timer migration
  timers: Identifying the existing pinned timers
  timers: Framework for identifying pinned timers
  timers: allow deferrable timers for intervals tv2-tv5 to be deferred

Fix up conflicts in kernel/sched.c and kernel/timer.c manually
2009-06-15 10:06:19 -07:00
Rusty Russell
8e7c25971b [CPUFREQ] cpumask: new cpumask operators for arch/x86/kernel/cpu/cpufreq/powernow-k8.c
Remove all old-style cpumask operators, and cpumask_t.

Also: get rid of the unused define_siblings function.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
Tested-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-06-15 11:49:43 -04:00
Rusty Russell
1ff6e97f1d [CPUFREQ] cpumask: avoid playing with cpus_allowed in powernow-k8.c
cpumask: avoid playing with cpus_allowed in powernow-k8.c

It's generally a very bad idea to mug some process's cpumask: it could
legitimately and reasonably be changed by root, which could break us
(if done before our code) or them (if we restore the wrong value).

I did not replace powernowk8_target; it needs fixing, but it grabs a
mutex (so no smp_call_function_single here) but Mark points out it can
be called multiple times per second, so work_on_cpu is too heavy.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To: cpufreq@vger.kernel.org
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
Tested-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-06-15 11:49:43 -04:00
Rusty Russell
e3f996c26f [CPUFREQ] cpumask: avoid cpumask games in arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
Impact: don't play with current's cpumask

It's generally a very bad idea to mug some process's cpumask: it could
legitimately and reasonably be changed by root, which could break us
(if done before our code) or them (if we restore the wrong value).

Use rdmsr_on_cpu and wrmsr_on_cpu instead.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To: cpufreq@vger.kernel.org
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-06-15 11:49:43 -04:00
Rusty Russell
394122ab14 [CPUFREQ] cpumask: avoid playing with cpus_allowed in speedstep-ich.c
Impact: don't play with current's cpumask

It's generally a very bad idea to mug some process's cpumask: it could
legitimately and reasonably be changed by root, which could break us
(if done before our code) or them (if we restore the wrong value).

We use smp_call_function_single: this had the advantage of being more
efficient, too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To: cpufreq@vger.kernel.org
Cc: Dominik Brodowski <linux@brodo.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-06-15 11:49:43 -04:00
Naga Chumbalkar
e15bc4559b [CPUFREQ] powernow-k8: get drv data for correct CPU
Make powernowk8_get() similar to powernowk8_target() and powernowk8_verify()
in the way it obtains "powernow_data" for a given CPU.

Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Langsdorf, Mark <mark.langsdorf@amd.com>
Cc: Thomas Renninger <trenn@suse.de>

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Langsdorf, Mark <mark.langsdorf@amd.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-06-15 11:49:42 -04:00
Naga Chumbalkar
532cfee6ba [CPUFREQ] powernow-k8: read P-state from HW
By definition, "cpuinfo_cur_freq" should report the value from HW. So, don't
depend on the cached value. Instead read P-state directly from HW, while
taking into account the erratum 311 workaround for Fam 11h processors.

Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Langsdorf, Mark <mark.langsdorf@amd.com>
Cc: Thomas Renninger <trenn@suse.de>

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Langsdorf, Mark <mark.langsdorf@amd.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-06-15 11:49:42 -04:00
Andrew Morton
b394f1dfc0 [CPUFREQ] reduce scope of ACPI_PSS_BIOS_BUG_MSG[]
This symbol doesn't need file-global scope.

Cc: "Zhang, Rui" <rui.zhang@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Langsdorf, Mark <mark.langsdorf@amd.com>
Cc: Leo Milano <lmilano@gmx.net>
Cc: Thomas Renninger <trenn@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-06-15 11:49:42 -04:00
Dave Jones
931db6a32d [CPUFREQ] Clean up convoluted code in arch/x86/kernel/tsc.c:time_cpufreq_notifier()
Christoph Hellwig noticed the following potential uninitialised use:

 > arch/x86/kernel/tsc.c: In function 'time_cpufreq_notifier':
 > arch/x86/kernel/tsc.c:634: warning: 'dummy' may be used uninitialized in this function
 >
 > where we do have CONFIG_SMP set, freq->flags & CPUFREQ_CONST_LOOPS is
 > true and ref_freq is false.

It seems plausable, though the circumstances for hitting it are really low.
Nearly all SMP capable cpufreq drivers set CPUFREQ_CONST_LOOPS.
powernow-k8 is really the only exception. The older CPUs were typically
only ever UP. (powernow-k7 never supported SMP for eg)

It's worth fixing regardless, as it cleans up the code.

Fix possible uninitialized use of dummy, by just removing it,
and making the setting of lpj more obvious.

Signed-off-by: Dave Jones <davej@redhat.com>
2009-06-15 11:49:42 -04:00
Luis Henriques
21335d0214 [CPUFREQ] powernow-k8.c: mess cleanup
Mess cleanup in powernow_k8_acpi_pst_values() function.

Signed-off-by: Luis Henriques <henrix@sapo.pt>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-06-15 11:49:41 -04:00
Thomas Renninger
86e13684aa [CPUFREQ] powernow-k8: Set transition latency to 1 if ACPI tables export 0
This doesn't fix anything, but it's expected that a transition latency of 0
could cause trouble in the future.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Cc: Langsdorf, Mark <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-06-15 11:49:41 -04:00
Joerg Roedel
09067207f6 amd-iommu: set event buffer head and tail to 0 manually
These registers may contain values from previous kernels. So reset them
to known values before enable the event buffer again.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-06-15 16:06:48 +02:00
Peter Zijlstra
74193ef0ec perf_counter: x86: Fix call-chain support to use NMI-safe methods
__copy_from_user_inatomic() isn't NMI safe in that it can trigger
the page fault handler which is another trap and its return path
invokes IRET which will also close the NMI context.

Therefore use a GUP based approach to copy the stack frames over.

We tried an alternative solution as well: we used a forward ported
version of Mathieu Desnoyers's "NMI safe INT3 and Page Fault" patch
that modifies the exception return path to use an open-coded IRET with
explicit stack unrolling and TF checking.

This didnt work as it interacted with faulting user-space instructions,
causing them not to restart properly, which corrupts user-space
registers.

Solving that would probably involve disassembling those instructions
and backtracing the RIP. But even without that, the code was deemed
rather complex to the already non-trivial x86 entry assembly code,
so instead we went for this GUP based method that does a
software-walk of the pagetables.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-15 15:57:53 +02:00
Chris Wright
a8c485bb68 amd-iommu: disable cmd buffer and evt logging before reprogramming iommu
The IOMMU spec states that IOMMU behavior may be undefined when the
IOMMU registers are rewritten while command or event buffer is enabled.
Disable them in IOMMU disable path.

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-06-15 15:53:45 +02:00
Vegard Nossum
722f2a6c87 Merge commit 'linus/master' into HEAD
Conflicts:
	MAINTAINERS

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15 15:50:49 +02:00
Chris Wright
42a49f965a amd-iommu: flush domain tlb when attaching a new device
When kexec'ing to a new kernel (for example, when crashing and launching
a kdump session), the AMD IOMMU may have cached translations.  The kexec'd
kernel, during initialization, will invalidate the IOMMU device table
entries, but not the domain translations.  These stale entries can cause
a device's DMA to fail, makes it rough to write a dump to disk when the
disk controller can't DMA ;-)

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-06-15 15:42:00 +02:00
Joerg Roedel
61d047be99 x86: disable IOMMUs on kernel crash
If the IOMMUs are still enabled when the kexec kernel boots access to
the disk is not possible. This is bad for tools like kdump or anything
else which wants to use PCI devices.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-06-15 15:20:40 +02:00
Joerg Roedel
0975904276 amd-iommu: disable IOMMU hardware on shutdown
When the IOMMU stays enabled the BIOS may not be able to finish the
machine shutdown properly. So disable the hardware on shutdown.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-06-15 15:20:40 +02:00
Vegard Nossum
2dff440525 kmemcheck: add mm functions
With kmemcheck enabled, the slab allocator needs to do this:

1. Tell kmemcheck to allocate the shadow memory which stores the status of
   each byte in the allocation proper, e.g. whether it is initialized or
   uninitialized.
2. Tell kmemcheck which parts of memory that should be marked uninitialized.
   There are actually a few more states, such as "not yet allocated" and
   "recently freed".

If a slab cache is set up using the SLAB_NOTRACK flag, it will never return
memory that can take page faults because of kmemcheck.

If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still
request memory with the __GFP_NOTRACK flag. This does not prevent the page
faults from occuring, however, but marks the object in question as being
initialized so that no warnings will ever be produced for this object.

In addition to (and in contrast to) __GFP_NOTRACK, the
__GFP_NOTRACK_FALSE_POSITIVE flag indicates that the allocation should
not be tracked _because_ it would produce a false positive. Their values
are identical, but need not be so in the future (for example, we could now
enable/disable false positives with a config option).

Parts of this patch were contributed by Pekka Enberg but merged for
atomicity.

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

[rebased for mainline inclusion]
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15 12:40:03 +02:00
Vegard Nossum
f85612967c x86: add hooks for kmemcheck
The hooks that we modify are:
- Page fault handler (to handle kmemcheck faults)
- Debug exception handler (to hide pages after single-stepping
  the instruction that caused the page fault)

Also redefine memset() to use the optimized version if kmemcheck is
enabled.

(Thanks to Pekka Enberg for minimizing the impact on the page fault
handler.)

As kmemcheck doesn't handle MMX/SSE instructions (yet), we also disable
the optimized xor code, and rely instead on the generic C implementation
in order to avoid false-positive warnings.

Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>

[whitespace fixlet]
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

[rebased for mainline inclusion]
Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
2009-06-15 12:40:02 +02:00
Ingo Molnar
038e836e97 perf_counter, x86: Fix kernel-space call-chains
Kernel-space call-chains were trimmed at the first entry because
we never processed anything beyond the first stack context.

Allow the backtrace to jump from NMI to IRQ stack then to task stack
and finally user-space stack.

Also calculate the stack and bp variables correctly so that the
stack walker does not exit early.

We can get deep traces as a result, visible in perf report -D output:

0x32af0 [0xe0]: PERF_EVENT (IP, 5): 15134: 0xffffffff815225fd period: 1
... chain: u:2, k:22, nr:24
.....  0: 0xffffffff815225fd
.....  1: 0xffffffff810ac51c
.....  2: 0xffffffff81018e29
.....  3: 0xffffffff81523939
.....  4: 0xffffffff81524b8f
.....  5: 0xffffffff81524bd9
.....  6: 0xffffffff8105e498
.....  7: 0xffffffff8152315a
.....  8: 0xffffffff81522c3a
.....  9: 0xffffffff810d9b74
..... 10: 0xffffffff810dbeec
..... 11: 0xffffffff810dc3fb

This is a 22-entries kernel-space chain.

(We still only record reliable stack entries.)

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-15 09:08:08 +02:00
Ingo Molnar
5a6cec3abb perf_counter, x86: Fix call-chain walking
Fix the ptregs variant when we hit user-mode tasks.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-14 22:37:15 +02:00
Thomas Gleixner
507fa3a3d8 x86: hpet: Mark per cpu interrupts IRQF_TIMER to prevent resume failure
timer interrupts are excluded from being disabled during suspend. The
clock events code manages the disabling of clock events on its own
because the timer interrupt needs to be functional before the resume
code reenables the device interrupts.

The hpet per cpu timers request their interrupt without setting the
IRQF_TIMER flag so suspend_device_irqs() disables them as well which
results in a fatal resume failure on the boot CPU.

Adding IRQF_TIMER to the interupt flags when requesting the hpet per
cpu timer interrupts solves the problem.

Reported-by: Benjamin S. <sbenni@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Benjamin S. <sbenni@gmx.de>
Cc: stable@kernel.org
2009-06-14 18:24:29 +02:00
Jaswinder Singh Rajput
c64b04fe6e x86, cpu: cpu/proc.c display cache alignment and address sizes for 32 bit
32 bits can also access x86_cache_alignment, x86_phys_bits and
x86_virt_bits, make them available to user space just as on 64 bits.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
LKML-Reference: <1244921390.11733.30.camel@ht.satnam>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-13 14:00:49 -07:00
Linus Torvalds
a2ee2981ae Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (80 commits)
  x86, mce: Add boot options for corrected errors
  x86, mce: Fix mce printing
  x86, mce: fix for mce counters
  x86, mce: support action-optional machine checks
  x86, mce: define MCE_VECTOR
  x86, mce: rename mce_notify_user to mce_notify_irq
  x86: fix panic with interrupts off (needed for MCE)
  x86, mce: export MCE severities coverage via debugfs
  x86, mce: implement new status bits
  x86, mce: print header/footer only once for multiple MCEs
  x86, mce: default to panic timeout for machine checks
  x86, mce: improve mce_get_rip
  x86, mce: make non Monarch panic message "Fatal machine check" too
  x86, mce: switch x86 machine check handler to Monarch election.
  x86, mce: implement panic synchronization
  x86, mce: implement bootstrapping for machine check wakeups
  x86, mce: check early in exception handler if panic is needed
  x86, mce: add table driven machine check grading
  x86, mce: remove TSC print heuristic
  x86, mce: log corrected errors when panicing
  ...
2009-06-13 13:14:51 -07:00
Jaswinder Singh Rajput
f4db43a38f perf_counter, x86: Update AMD hw caching related event table
All AMD models share the same hw caching related event table.

Also complete the table with more events.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1244835381.2802.2.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 12:58:25 +02:00
Jaswinder Singh Rajput
4d2be1267f perf_counter, x86: Check old-AMD performance monitoring support
AMD supports performance monitoring start from K7 (i.e. family 6),
so disable it for earlier AMD CPUs.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1244714289.6923.0.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 12:58:25 +02:00
Len Brown
c4bf2f372d ACPI, PCI, x86: move MCFG parsing routine from ACPI to PCI file
Move
arch/x86/kernel/acpi/boot.c: acpi_parse_mcfg()
to
arch/x86/pci/mmconfig-shared.c: pci_parse_mcfg()
where it is used, and make it static.

Move associated globals and helper routine with it.

No functional change.

This code move is in preparation for SFI support,
which will allow the PCI code to find the MCFG table
on systems which do not support ACPI.

Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-06-12 20:50:38 -04:00
Olivier Berger
d023e49118 ACPI: Remove Asus P4B266 from blacklist
See http://marc.info/?l=linux-acpi&m=124068823904429&w=2 for discussion

Signed-off-by: Olivier Berger <oberger@ouvaton.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-06-12 20:50:37 -04:00
Len Brown
c636f753b5 ACPI: delete dead acpi_disabled setting code
Testing CONFIG_ACPI inside boot.c is a waste of text,
since boot.c is built only when CONFIG_ACPI=y

Signed-off-by: Len Brown <len.brown@intel.com>
2009-06-12 20:49:50 -04:00
Vegard Nossum
acc6be5405 x86: add save_stack_trace_bp() for tracing from a specific stack frame
This will help kmemcheck (and possibly other debugging tools) since we
can now simply pass regs->bp to the stack tracer instead of specifying
the number of stack frames to skip, which is unreliable if gcc decides
to inline functions, etc.

Note that this makes the API incomplete for other architectures, but I
expect that those can be updated lazily, e.g. when they need it.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-12 23:01:05 +02:00
Linus Torvalds
947ec0b0c1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM: Add empty suspend/resume device irq functions
  PM/Hibernate: Move NVS routines into a seperate file (v2).
  PM/Hibernate: Rename disk.c to hibernate.c
  PM: Separate suspend to RAM functionality from core
  Driver Core: Rework platform suspend/resume, print warning
  PM: Remove device_type suspend()/resume()
  PM/Hibernate: Move memory shrinking to snapshot.c (rev. 2)
  PM/Suspend: Do not shrink memory before suspend
  PM: Remove bus_type suspend_late()/resume_early() V2
  PM core: rename suspend and resume functions
  PM: Rename device_power_down/up()
  PM: Remove unused asm/suspend.h
  x86: unify power/cpu_(32|64).c
  x86: unify power/cpu_(32|64) copyright notes
  x86: unify power/cpu_(32|64) regarding restoring processor state
  x86: unify power/cpu_(32|64) regarding saving processor state
  x86: unify power/cpu_(32|64) global variables
  x86: unify power/cpu_(32|64) headers
  PM: Warn if interrupts are enabled during suspend-resume of sysdevs
  PM/ACPI/x86: Fix sparse warning in arch/x86/kernel/acpi/sleep.c
2009-06-12 13:17:27 -07:00
Linus Torvalds
4ddbac9898 Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf_counter: Start documenting HAVE_PERF_COUNTERS requirements
  perf_counter: Add forward/backward attribute ABI compatibility
  perf record: Explicity program a default counter
  perf_counter: Remove PERF_TYPE_RAW special casing
  perf_counter: PERF_TYPE_HW_CACHE is a hardware counter too
  powerpc, perf_counter: Fix performance counter event types
  perf_counter/x86: Add a quirk for Atom processors
  perf_counter tools: Remove one L1-data alias
2009-06-12 13:16:52 -07:00
Alan Stern
d161630297 PM core: rename suspend and resume functions
This patch (as1241) renames a bunch of functions in the PM core.
Rather than go through a boring list of name changes, suffice it to
say that in the end we have a bunch of pairs of functions:

	device_resume_noirq	dpm_resume_noirq
	device_resume		dpm_resume
	device_complete		dpm_complete
	device_suspend_noirq	dpm_suspend_noirq
	device_suspend		dpm_suspend
	device_prepare		dpm_prepare

in which device_X does the X operation on a single device and dpm_X
invokes device_X for all devices in the dpm_list.

In addition, the old dpm_power_up and device_resume_noirq have been
combined into a single function (dpm_resume_noirq).

Lastly, dpm_suspend_start and dpm_resume_end are the renamed versions
of the former top-level device_suspend and device_resume routines.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-06-12 21:32:31 +02:00
Magnus Damm
e39a71ef80 PM: Rename device_power_down/up()
Rename the functions performing "_noirq" dev_pm_ops
operations from device_power_down() and device_power_up()
to device_suspend_noirq() and device_resume_noirq().

The new function names are chosen to show that the functions
are responsible for calling the _noirq() versions to finalize
the suspend/resume operation. The current function names do
not perform power down/up anymore so the names may be misleading.

Global function renames:
- device_power_down() -> device_suspend_noirq()
- device_power_up() -> device_resume_noirq()

Static function renames:
- suspend_device_noirq() -> __device_suspend_noirq()
- resume_device_noirq() -> __device_resume_noirq()

Signed-off-by: Magnus Damm <damm@igel.co.jp>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Len Brown <lenb@kernel.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-06-12 21:32:31 +02:00
Jaswinder Singh Rajput
ce4b3c5547 PM/ACPI/x86: Fix sparse warning in arch/x86/kernel/acpi/sleep.c
One of the numbers in arch/x86/kernel/acpi/sleep.c is long, but it is
not annotated appropriately, so sparese warns about it.  Fix that.

[rjw: added the changelog.]

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-06-12 21:32:29 +02:00
Linus Torvalds
6d21491838 Merge branch 'topic/slab/earlyboot-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6
* 'topic/slab/earlyboot-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  slab: setup cpu caches later on when interrupts are enabled
  slab,slub: don't enable interrupts during early boot
  slab: fix gfp flag in setup_cpu_cache()
  x86: make zap_low_mapping could be used early
  irq: slab alloc for default irq_affinity
  memcg: fix page_cgroup fatal error in FLATMEM
2009-06-12 09:52:30 -07:00
Linus Torvalds
7f3591cfac Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest: (31 commits)
  lguest: add support for indirect ring entries
  lguest: suppress notifications in example Launcher
  lguest: try to batch interrupts on network receive
  lguest: avoid sending interrupts to Guest when no activity occurs.
  lguest: implement deferred interrupts in example Launcher
  lguest: remove obsolete LHREQ_BREAK call
  lguest: have example Launcher service all devices in separate threads
  lguest: use eventfds for device notification
  eventfd: export eventfd_signal and eventfd_fget for lguest
  lguest: allow any process to send interrupts
  lguest: PAE fixes
  lguest: PAE support
  lguest: Add support for kvm_hypercall4()
  lguest: replace hypercall name LHCALL_SET_PMD with LHCALL_SET_PGD
  lguest: use native_set_* macros, which properly handle 64-bit entries when PAE is activated
  lguest: map switcher with executable page table entries
  lguest: fix writev returning short on console output
  lguest: clean up length-used value in example launcher
  lguest: Segment selectors are 16-bit long. Fix lg_cpu.ss1 definition.
  lguest: beyond ARRAY_SIZE of cpu->arch.gdt
  ...
2009-06-12 09:32:26 -07:00
Linus Torvalds
65d52cc9d4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-module-and-param
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-module-and-param:
  module: cleanup FIXME comments about trimming exception table entries.
  module: trim exception table on init free.
  module: merge module_alloc() finally
  uml module: fix uml build process due to this merge
  x86 module: merge the rest functions with macros
  x86 module: merge the same functions in module_32.c and module_64.c
  uvesafb: improve parameter handling.
  module_param: allow 'bool' module_params to be bool, not just int.
  module_param: add __same_type convenience wrapper for __builtin_types_compatible_p
  module_param: split perm field into flags and perm
  module_param: invbool should take a 'bool', not an 'int'
  cyber2000fb.c: use proper method for stopping unload if CONFIG_ARCH_SHARK
2009-06-12 09:30:36 -07:00
Linus Torvalds
db8e7f10ed Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Provide _sdata in the vmlinux.lds.S file
  x86: handle initrd that extends into unusable memory
2009-06-12 09:26:32 -07:00
Rusty Russell
61f4bc83fe lguest: optimize by coding restore_flags and irq_enable in assembler.
The downside of the last patch which made restore_flags and irq_enable
check interrupts is that they are now too big to be patched directly
into the callsites, so the C versions are always used.

But the C versions go via PV_CALLEE_SAVE_REGS_THUNK which saves all
the registers.  In fact, we don't need any registers in the fast path,
so we can do better than this if we actually code them in assembler.

The results are in the noise, but since it's about the same amount of
code, it's worth applying.

1GB Guest->Host: input(suppressed),output(suppressed)
Before:
	Seconds: 0:16.53
	Packets: 377268,753673
	Interrupts: 22461,24297
	Notifications: 1(5245),21303(732370)
	Net IRQs triggered: 377023(245),42578(711095)

After:
	Seconds: 0:16.48
	Packets: 377289,753673
	Interrupts: 22281,24465
	Notifications: 1(5245),21296(732377)
	Net IRQs triggered: 377060(229),42564(711109)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:27:03 +09:30
Rusty Russell
5933048c69 module: cleanup FIXME comments about trimming exception table entries.
Everyone cut and paste this comment from my original one.  We now do
it generically, so cut the comments.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Amerigo Wang <amwang@redhat.com>
2009-06-12 21:47:05 +09:30
Amerigo Wang
c398df30d5 module: merge module_alloc() finally
As Christoph Hellwig suggested, module_alloc() actually can be
unified for i386 and x86_64 (of course, also UML).

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: 'Ingo Molnar' <mingo@elte.hu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 21:47:03 +09:30
Amerigo Wang
0fdc83b950 x86 module: merge the rest functions with macros
Merge the rest functions together, with proper preprocessing directives.
Finally remove module_{32|64}.c.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 21:47:01 +09:30
Amerigo Wang
2d5bf28fb9 x86 module: merge the same functions in module_32.c and module_64.c
Merge the same functions both in module_32.c and module_64.c into
module.c.

This is the first step to merge both of them finally.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 21:47:00 +09:30
Yong Wang
dff5da6d09 perf_counter/x86: Add a quirk for Atom processors
The fixed-function performance counters do not work on current Atom
processors. Use the general-purpose ones instead.

Signed-off-by: Yong Wang <yong.y.wang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <20090612080855.GA2286@ywang-moblin2.bj.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-12 13:48:32 +02:00
Yinghai Lu
55cd63676e x86: make zap_low_mapping could be used early
Only one cpu is there, just call __flush_tlb for it. Fixes the following boot
warning on x86:

  [    0.000000] Memory: 885032k/915540k available (5993k kernel code, 29844k reserved, 3842k data, 428k init, 0k highmem)
  [    0.000000] virtual kernel memory layout:
  [    0.000000]     fixmap  : 0xffe17000 - 0xfffff000   (1952 kB)
  [    0.000000]     vmalloc : 0xf8615000 - 0xffe15000   ( 120 MB)
  [    0.000000]     lowmem  : 0xc0000000 - 0xf7e15000   ( 894 MB)
  [    0.000000]       .init : 0xc19a5000 - 0xc1a10000   ( 428 kB)
  [    0.000000]       .data : 0xc15da4bb - 0xc199af6c   (3842 kB)
  [    0.000000]       .text : 0xc1000000 - 0xc15da4bb   (5993 kB)
  [    0.000000] Checking if this processor honours the WP bit even in supervisor mode...Ok.
  [    0.000000] ------------[ cut here ]------------
  [    0.000000] WARNING: at kernel/smp.c:369 smp_call_function_many+0x50/0x1b0()
  [    0.000000] Hardware name: System Product Name
  [    0.000000] Modules linked in:
  [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.30-tip #52504
  [    0.000000] Call Trace:
  [    0.000000]  [<c104aa16>] warn_slowpath_common+0x65/0x95
  [    0.000000]  [<c104aa58>] warn_slowpath_null+0x12/0x15
  [    0.000000]  [<c1073bbe>] smp_call_function_many+0x50/0x1b0
  [    0.000000]  [<c1037615>] ? do_flush_tlb_all+0x0/0x41
  [    0.000000]  [<c1037615>] ? do_flush_tlb_all+0x0/0x41
  [    0.000000]  [<c1073d4f>] smp_call_function+0x31/0x58
  [    0.000000]  [<c1037615>] ? do_flush_tlb_all+0x0/0x41
  [    0.000000]  [<c104f635>] on_each_cpu+0x26/0x65
  [    0.000000]  [<c10374b5>] flush_tlb_all+0x19/0x1b
  [    0.000000]  [<c1032ab3>] zap_low_mappings+0x4d/0x56
  [    0.000000]  [<c15d64b5>] ? printk+0x14/0x17
  [    0.000000]  [<c19b42a8>] mem_init+0x23d/0x245
  [    0.000000]  [<c19a56a1>] start_kernel+0x17a/0x2d5
  [    0.000000]  [<c19a5347>] ? unknown_bootoption+0x0/0x19a
  [    0.000000]  [<c19a5039>] __init_begin+0x39/0x41
  [    0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-12 13:50:24 +03:00
Catalin Marinas
1260866a27 x86: Provide _sdata in the vmlinux.lds.S file
_sdata is a common symbol defined by many architectures and made
available to the kernel via asm-generic/sections.h. Kmemleak uses this
symbol when scanning the data sections.

[ Impact: add new global symbol ]

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
LKML-Reference: <20090511122105.26556.96593.stgit@pc1117.cambridge.arm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-12 09:21:33 +02:00
Yinghai Lu
12274e96b4 x86: use zalloc_cpumask_var in arch_early_irq_init
So we make sure MAXSMP gets a cleared cpumask

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-11 20:04:36 -07:00
Yinghai Lu
8c5dd8f433 x86: handle initrd that extends into unusable memory
On a system where system memory (according e820) is not covered by
mtrr, mtrr_trim_memory converts a portion of memory to reserved, but
bootloader has already put the initrd in that range.

Thus, we need to have 64bit to use relocate_initrd too.

[ Impact: fix using initrd when mtrr_trim_memory happen ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: stable@kernel.org
2009-06-11 15:19:13 -07:00
Ingo Molnar
0d5959723e Merge branch 'linus' into x86/mce3
Conflicts:
	arch/x86/kernel/cpu/mcheck/mce_64.c
	arch/x86/kernel/irq.c

Merge reason: Resolve the conflicts above.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 23:31:52 +02:00
Linus Torvalds
8a1ca8cedd Merge branch 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (574 commits)
  perf_counter: Turn off by default
  perf_counter: Add counter->id to the throttle event
  perf_counter: Better align code
  perf_counter: Rename L2 to LL cache
  perf_counter: Standardize event names
  perf_counter: Rename enums
  perf_counter tools: Clean up u64 usage
  perf_counter: Rename perf_counter_limit sysctl
  perf_counter: More paranoia settings
  perf_counter: powerpc: Implement generalized cache events for POWER processors
  perf_counters: powerpc: Add support for POWER7 processors
  perf_counter: Accurate period data
  perf_counter: Introduce struct for sample data
  perf_counter tools: Normalize data using per sample period data
  perf_counter: Annotate exit ctx recursion
  perf_counter tools: Propagate signals properly
  perf_counter tools: Small frequency related fixes
  perf_counter: More aggressive frequency adjustment
  perf_counter/x86: Fix the model number of Intel Core2 processors
  perf_counter, x86: Correct some event and umask values for Intel processors
  ...
2009-06-11 14:01:07 -07:00
Linus Torvalds
b640f042fa Merge branch 'topic/slab/earlyboot' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6
* 'topic/slab/earlyboot' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  vgacon: use slab allocator instead of the bootmem allocator
  irq: use kcalloc() instead of the bootmem allocator
  sched: use slab in cpupri_init()
  sched: use alloc_cpumask_var() instead of alloc_bootmem_cpumask_var()
  memcg: don't use bootmem allocator in setup code
  irq/cpumask: make memoryless node zero happy
  x86: remove some alloc_bootmem_cpumask_var calling
  vt: use kzalloc() instead of the bootmem allocator
  sched: use kzalloc() instead of the bootmem allocator
  init: introduce mm_init()
  vmalloc: use kzalloc() instead of alloc_bootmem()
  slab: setup allocators earlier in the boot sequence
  bootmem: fix slab fallback on numa
  bootmem: use slab if bootmem is no longer available
2009-06-11 12:25:06 -07:00
Linus Torvalds
6cd8e300b4 Merge branch 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (138 commits)
  KVM: Prevent overflow in largepages calculation
  KVM: Disable large pages on misaligned memory slots
  KVM: Add VT-x machine check support
  KVM: VMX: Rename rmode.active to rmode.vm86_active
  KVM: Move "exit due to NMI" handling into vmx_complete_interrupts()
  KVM: Disable CR8 intercept if tpr patching is active
  KVM: Do not migrate pending software interrupts.
  KVM: inject NMI after IRET from a previous NMI, not before.
  KVM: Always request IRQ/NMI window if an interrupt is pending
  KVM: Do not re-execute INTn instruction.
  KVM: skip_emulated_instruction() decode instruction if size is not known
  KVM: Remove irq_pending bitmap
  KVM: Do not allow interrupt injection from userspace if there is a pending event.
  KVM: Unprotect a page if #PF happens during NMI injection.
  KVM: s390: Verify memory in kvm run
  KVM: s390: Sanity check on validity intercept
  KVM: s390: Unlink vcpu on destroy - v2
  KVM: s390: optimize float int lock: spin_lock_bh --> spin_lock
  KVM: s390: use hrtimer for clock wakeup from idle - v2
  KVM: s390: Fix memory slot versus run - v3
  ...
2009-06-11 10:03:30 -07:00
Yinghai Lu
dad213aeb5 irq/cpumask: make memoryless node zero happy
Don't hardcode to node zero for early boot IRQ setup memory allocations.

[ penberg@cs.helsinki.fi: minor cleanups ]
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-11 19:27:08 +03:00
Yinghai Lu
38c7fed2f5 x86: remove some alloc_bootmem_cpumask_var calling
Now that we set up the slab allocator earlier, we can get rid of some
alloc_bootmem_cpumask_var() calls in boot code.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-11 19:27:07 +03:00
Ingo Molnar
940010c5a3 Merge branch 'linus' into perfcounters/core
Conflicts:
	arch/x86/kernel/irqinit.c
	arch/x86/kernel/irqinit_64.c
	arch/x86/kernel/traps.c
	arch/x86/mm/fault.c
	include/linux/sched.h
	kernel/exit.c
2009-06-11 17:55:42 +02:00
Peter Zijlstra
8be6e8f3c3 perf_counter: Rename L2 to LL cache
The top (fastest) and last level (biggest) caches are the most
interesting ones, performance wise.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
[ Fixed the Nehalem LL table to LLC Reference/Miss events ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 17:54:17 +02:00
Peter Zijlstra
f4dbfa8f31 perf_counter: Standardize event names
Pure renames only, to PERF_COUNT_HW_* and PERF_COUNT_SW_*.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 17:54:15 +02:00
Hidetoshi Seto
62fdac5913 x86, mce: Add boot options for corrected errors
This patch introduces three boot options (no_cmci, dont_log_ce
and ignore_ce) to control handling for corrected errors.

The "mce=no_cmci" boot option disables the CMCI feature.

Since CMCI is a new feature so having boot controls to disable
it will be a help if the hardware is misbehaving.

The "mce=dont_log_ce" boot option disables logging for corrected
errors. All reported corrected errors will be cleared silently.
This option will be useful if you never care about corrected
errors.

The "mce=ignore_ce" boot option disables features for corrected
errors, i.e. polling timer and cmci.  All corrected events are
not cleared and kept in bank MSRs.

Usually this disablement is not recommended, however it will be
a help if there are some conflict with the BIOS or hardware
monitoring applications etc., that clears corrected events in
banks instead of OS.

[ And trivial cleanup (space -> tab) for doc is included. ]

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <4A30ACDF.5030408@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 11:42:18 +02:00
Hidetoshi Seto
77e26cca20 x86, mce: Fix mce printing
This patch:

 - Adds print_mce_head() instead of first flag
 - Makes the header to be printed always
 - Stops double printing of corrected errors

[ This portion originates from Huang Ying's patch ]

Originally-From: Huang Ying <ying.huang@intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
LKML-Reference: <4A30AC83.5010708@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 11:42:17 +02:00
Linus Torvalds
8623661180 Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits)
  Revert "x86, bts: reenable ptrace branch trace support"
  tracing: do not translate event helper macros in print format
  ftrace/documentation: fix typo in function grapher name
  tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK
  tracing: add protection around module events unload
  tracing: add trace_seq_vprint interface
  tracing: fix the block trace points print size
  tracing/events: convert block trace points to TRACE_EVENT()
  ring-buffer: fix ret in rb_add_time_stamp
  ring-buffer: pass in lockdep class key for reader_lock
  tracing: add annotation to what type of stack trace is recorded
  tracing: fix multiple use of __print_flags and __print_symbolic
  tracing/events: fix output format of user stack
  tracing/events: fix output format of kernel stack
  tracing/trace_stack: fix the number of entries in the header
  ring-buffer: discard timestamps that are at the start of the buffer
  ring-buffer: try to discard unneeded timestamps
  ring-buffer: fix bug in ring_buffer_discard_commit
  ftrace: do not profile functions when disabled
  tracing: make trace pipe recognize latency format flag
  ...
2009-06-10 19:53:40 -07:00
Linus Torvalds
8f40642ad3 Merge branch 'signal-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'signal-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: hookup sys_rt_tgsigqueueinfo
  signals: implement sys_rt_tgsigqueueinfo
  signals: split do_tkill
2009-06-10 19:50:52 -07:00
Peter Zijlstra
9e350de37a perf_counter: Accurate period data
We currently log hw.sample_period for PERF_SAMPLE_PERIOD, however this is
incorrect. When we adjust the period, it will only take effect the next
cycle but report it for the current cycle. So when we adjust the period
for every cycle, we're always wrong.

Solve this by keeping track of the last_period.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 02:39:02 +02:00
Peter Zijlstra
df1a132bf3 perf_counter: Introduce struct for sample data
For easy extension of the sample data, put it in a structure.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 02:39:02 +02:00
Linus Torvalds
3f6280ddf2 Merge branch 'iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (61 commits)
  amd-iommu: remove unnecessary "AMD IOMMU: " prefix
  amd-iommu: detach device explicitly before attaching it to a new domain
  amd-iommu: remove BUS_NOTIFY_BOUND_DRIVER handling
  dma-debug: simplify logic in driver_filter()
  dma-debug: disable/enable irqs only once in device_dma_allocations
  dma-debug: use pr_* instead of printk(KERN_* ...)
  dma-debug: code style fixes
  dma-debug: comment style fixes
  dma-debug: change hash_bucket_find from first-fit to best-fit
  x86: enable GART-IOMMU only after setting up protection methods
  amd_iommu: fix lock imbalance
  dma-debug: add documentation for the driver filter
  dma-debug: add dma_debug_driver kernel command line
  dma-debug: add debugfs file for driver filter
  dma-debug: add variables and checks for driver filter
  dma-debug: fix debug_dma_sync_sg_for_cpu and debug_dma_sync_sg_for_device
  dma-debug: use sg_dma_len accessor
  dma-debug: use sg_dma_address accessor instead of using dma_address directly
  amd-iommu: don't free dma adresses below 512MB with CONFIG_IOMMU_STRESS
  amd-iommu: don't preallocate page tables with CONFIG_IOMMU_STRESS
  ...
2009-06-10 16:19:14 -07:00
Linus Torvalds
be15f9d63b Merge branch 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (42 commits)
  xen: cache cr0 value to avoid trap'n'emulate for read_cr0
  xen/x86-64: clean up warnings about IST-using traps
  xen/x86-64: fix breakpoints and hardware watchpoints
  xen: reserve Xen start_info rather than e820 reserving
  xen: add FIX_TEXT_POKE to fixmap
  lguest: update lazy mmu changes to match lguest's use of kvm hypercalls
  xen: honour VCPU availability on boot
  xen: add "capabilities" file
  xen: drop kexec bits from /sys/hypervisor since kexec isn't implemented yet
  xen/sys/hypervisor: change writable_pt to features
  xen: add /sys/hypervisor support
  xen/xenbus: export xenbus_dev_changed
  xen: use device model for suspending xenbus devices
  xen: remove suspend_cancel hook
  xen/dev-evtchn: clean up locking in evtchn
  xen: export ioctl headers to userspace
  xen: add /dev/xen/evtchn driver
  xen: add irq_from_evtchn
  xen: clean up gate trap/interrupt constants
  xen: set _PAGE_NX in __supported_pte_mask before pagetable construction
  ...
2009-06-10 16:16:27 -07:00
Linus Torvalds
595dc54a1d Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: move rdtsc_barrier() into the TSC vread method
2009-06-10 16:15:59 -07:00
Linus Torvalds
9b29e8228a Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Clear TS in irq_ts_save() when in an atomic section
  x86: Detect use of extended APIC ID for AMD CPUs
  x86: memtest: remove 64-bit division
  x86, UV: Fix macros for multiple coherency domains
  x86: Fix non-lazy GS handling in sys_vm86()
  x86: Add quirk for reboot stalls on a Dell Optiplex 360
  x86: Fix UV BAU activation descriptor init
2009-06-10 16:15:14 -07:00
Linus Torvalds
bec706838e Merge branch 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, setup: fix comment in the "glove box" code
  x86, setup: "glove box" BIOS interrupts in the video code
  x86, setup: "glove box" BIOS interrupts in the MCA code
  x86, setup: "glove box" BIOS interrupts in the EDD code
  x86, setup: "glove box" BIOS interrupts in the APM code
  x86, setup: "glove box" BIOS interrupts in the core boot code
  x86, setup: "glove box" BIOS calls -- infrastructure
2009-06-10 16:14:41 -07:00
Linus Torvalds
bb7762961d Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits)
  x86: fix system without memory on node0
  x86, mm: Fix node_possible_map logic
  mm, x86: remove MEMORY_HOTPLUG_RESERVE related code
  x86: make sparse mem work in non-NUMA mode
  x86: process.c, remove useless headers
  x86: merge process.c a bit
  x86: use sparse_memory_present_with_active_regions() on UMA
  x86: unify 64-bit UMA and NUMA paging_init()
  x86: Allow 1MB of slack between the e820 map and SRAT, not 4GB
  x86: Sanity check the e820 against the SRAT table using e820 map only
  x86: clean up and and print out initial max_pfn_mapped
  x86/pci: remove rounding quirk from e820_setup_gap()
  x86, e820, pci: reserve extra free space near end of RAM
  x86: fix typo in address space documentation
  x86: 46 bit physical address support on 64 bits
  x86, mm: fault.c, use printk_once() in is_errata93()
  x86: move per-cpu mmu_gathers to mm/init.c
  x86: move max_pfn_mapped and max_low_pfn_mapped to setup.c
  x86: unify noexec handling
  x86: remove (null) in /sys kernel_page_tables
  ...
2009-06-10 16:13:20 -07:00
Linus Torvalds
48c72d1ab4 Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, microcode: Simplify vfree() use
  x86: microcode: use smp_call_function_single instead of set_cpus_allowed, cleanup of synchronization logic
2009-06-10 16:13:06 -07:00
Linus Torvalds
5301e0de34 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86_64: fix incorrect comments
  x86: unify restore_fpu_checking
  x86_32: introduce restore_fpu_checking()
2009-06-10 15:55:04 -07:00
Linus Torvalds
c44e3ed539 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: cpu_debug: Remove model information to reduce encoding-decoding
  x86: fixup numa_node information for AMD CPU northbridge functions
  x86: k8 convert node_to_k8_nb_misc() from a macro to an inline function
  x86: cacheinfo: complete L2/L3 Cache and TLB associativity field definitions
  x86/docs: add description for cache_disable sysfs interface
  x86: cacheinfo: disable L3 ECC scrubbing when L3 cache index is disabled
  x86: cacheinfo: replace sysfs interface for cache_disable feature
  x86: cacheinfo: use cached K8 NB_MISC devices instead of scanning for it
  x86: cacheinfo: correct return value when cache_disable feature is not active
  x86: cacheinfo: use L3 cache index disable feature only for CPUs that support it
2009-06-10 15:51:15 -07:00
Linus Torvalds
7dc3ca39cb Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, nmi: Use predefined numbers instead of hardcoded one
  x86: asm/processor.h: remove double declaration
  x86, mtrr: replace MTRRdefType_MSR with msr-index's MSR_MTRRdefType
  x86, mtrr: replace MTRRfix4K_C0000_MSR with msr-index's MSR_MTRRfix4K_C0000
  x86, mtrr: remove mtrr MSRs double declaration
  x86, mtrr: replace MTRRfix16K_80000_MSR with msr-index's MSR_MTRRfix16K_80000
  x86, mtrr: replace MTRRfix64K_00000_MSR with msr-index's MSR_MTRRfix64K_00000
  x86, mtrr: replace MTRRcap_MSR with msr-index's MSR_MTRRcap
  x86: mce: remove duplicated #include
  x86: msr-index.h remove duplicate MSR C001_0015 declaration
  x86: clean up arch/x86/kernel/tsc_sync.c a bit
  x86: use symbolic name for VM86_SIGNAL when used as vm86 default return
  x86: added 'ifndef _ASM_X86_IOMAP_H' to iomap.h
  x86: avoid multiple declaration of kstack_depth_to_print
  x86: vdso/vma.c declare vdso_enabled and arch_setup_additional_pages before they get used
  x86: clean up declarations and variables
  x86: apic/x2apic_cluster.c x86_cpu_to_logical_apicid should be static
  x86 early quirks: eliminate unused function
2009-06-10 15:49:36 -07:00
Linus Torvalds
aa98936e4f Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, 64-bit: ifdef out struct thread_struct::ip
  x86, 32-bit: ifdef out struct thread_struct::fs
  x86: clean up alternative.h
2009-06-10 15:49:10 -07:00
Linus Torvalds
82782ca77d Merge branch 'x86-kbuild-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-kbuild-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (46 commits)
  x86, boot: add new generated files to the appropriate .gitignore files
  x86, boot: correct the calculation of ZO_INIT_SIZE
  x86-64: align __PHYSICAL_START, remove __KERNEL_ALIGN
  x86, boot: correct sanity checks in boot/compressed/misc.c
  x86: add extension fields for bootloader type and version
  x86, defconfig: update kernel position parameters
  x86, defconfig: update to current, no material changes
  x86: make CONFIG_RELOCATABLE the default
  x86: default CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN to 16 MB
  x86: document new bzImage fields
  x86, boot: make kernel_alignment adjustable; new bzImage fields
  x86, boot: remove dead code from boot/compressed/head_*.S
  x86, boot: use LOAD_PHYSICAL_ADDR on 64 bits
  x86, boot: make symbols from the main vmlinux available
  x86, boot: determine compressed code offset at compile time
  x86, boot: use appropriate rep string for move and clear
  x86, boot: zero EFLAGS on 32 bits
  x86, boot: set up the decompression stack as early as possible
  x86, boot: straighten out ranges to copy/zero in compressed/head*.S
  x86, boot: stylistic cleanups for boot/compressed/head_64.S
  ...

Fixed trivial conflict in arch/x86/configs/x86_64_defconfig manually
2009-06-10 15:30:41 -07:00
Linus Torvalds
f0d5e12bd4 Merge branch 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (76 commits)
  x86, apic: Fix dummy apic read operation together with broken MP handling
  x86, apic: Restore irqs on fail paths
  x86: Print real IOAPIC version for x86-64
  x86: enable_update_mptable should be a macro
  sparseirq: Allow early irq_desc allocation
  x86, io-apic: Don't mark pin_programmed early
  x86, irq: don't call mp_config_acpi_gsi() if update_mptable is not enabled
  x86, irq: update_mptable needs pci_routeirq
  x86: don't call read_apic_id if !cpu_has_apic
  x86, apic: introduce io_apic_irq_attr
  x86/pci: add 4 more return parameters to IO_APIC_get_PCI_irq_vector(), fix
  x86: read apic ID in the !acpi_lapic case
  x86: apic: Fixmap apic address even if apic disabled
  x86: display extended apic registers with print_local_APIC and cpu_debug code
  x86: read apic ID in the !acpi_lapic case
  x86: clean up and fix setup_clear/force_cpu_cap handling
  x86: apic: Check rev 3 fadt correctly for physical_apic bit
  x86/pci: update pirq_enable_irq() to setup io apic routing
  x86/acpi: move setup io apic routing out of CONFIG_ACPI scope
  x86/pci: add 4 more return parameters to IO_APIC_get_PCI_irq_vector()
  ...
2009-06-10 15:25:41 -07:00
Harald Welte
0fea615e52 CPUFREQ: Mark e_powersaver driver as EXPERIMENTAL and DANGEROUS
The e_powersaver driver for VIA's C7 CPU's needs to be marked as
DANGEROUS as it configures the CPU to power states that are out
of specification.

According to Centaur, all systems with C7 and Nano CPU's support
the ACPI p-state method.  Thus, the acpi-cpufreq driver should
be used instead.

Signed-off-by: Harald Welte <HaraldWelte@viatech.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-10 15:22:44 -07:00
Harald Welte
0de51088e6 CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs
The VIA/Centaur C7, C7-M and Nano CPU's all support ACPI based cpu p-states
using a MSR interface.  The Linux driver just never made use of it, since in
addition to the check for the EST flag it also checked if the vendor is Intel.

Signed-off-by: Harald Welte <HaraldWelte@viatech.com>
[ Removed the vendor checks entirely  - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-10 15:22:44 -07:00
Peter Zijlstra
bd2b5b1284 perf_counter: More aggressive frequency adjustment
Also employ the overflow handler to adjust the frequency, this results
in a stable frequency in about 40~50 samples, instead of that many ticks.

This also means we can start sampling at a sample period of 1 without
running head-first into the throttle.

It relies on sched_clock() to accurately measure the time difference
between the overflow NMIs.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-10 16:55:26 +02:00
Yong Wang
dc81081b2d perf_counter/x86: Fix the model number of Intel Core2 processors
Fix the model number of Intel Core2 processors according to the
documentation: Intel Processor Identification with the CPUID
Instruction: http://www.intel.com/support/processors/sb/cs-009861.htm

Signed-off-by: Yong Wang <yong.y.wang@intel.com>
Also-Reported-by: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <20090610090612.GA26580@ywang-moblin2.bj.intel.com>
[ Added two more model numbers suggested by Arnd Bergmann ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-10 13:04:43 +02:00
Andi Kleen
a0861c02a9 KVM: Add VT-x machine check support
VT-x needs an explicit MC vector intercept to handle machine checks in the
hyper visor.

It also has a special option to catch machine checks that happen
during VT entry.

Do these interceptions and forward them to the Linux machine check
handler. Make it always look like user space is interrupted because
the machine check handler treats kernel/user space differently.

Thanks to Jiang Yunhong for help and testing.

Cc: stable@kernel.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 12:27:08 +03:00
Marcelo Tosatti
32f8840064 KVM: use smp_send_reschedule in kvm_vcpu_kick
KVM uses a function call IPI to cause the exit of a guest running on a
physical cpu. For virtual interrupt notification there is no need to
wait on IPI receival, or to execute any function.

This is exactly what the reschedule IPI does, without the overhead
of function IPI. So use it instead of smp_call_function_single in
kvm_vcpu_kick.

Also change the "guest_mode" variable to a bit in vcpu->requests, and
use that to collapse multiple IPI's that would be issued between the
first one and zeroing of guest mode.

This allows kvm_vcpu_kick to called with interrupts disabled.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:53 +03:00
Marcelo Tosatti
a90ede7b17 KVM: x86: paravirt skip pit-through-ioapic boot check
Skip the test which checks if the PIT is properly routed when
using the IOAPIC, aimed at buggy hardware.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:24 +03:00
Yong Wang
fecc8ac849 perf_counter, x86: Correct some event and umask values for Intel processors
Correct some event and UMASK values according to Intel SDM,
in the Nehalem and Atom tables.

Signed-off-by: Yong Wang <yong.y.wang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <20090609131553.GA12489@ywang-moblin2.bj.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-09 16:50:07 +02:00
Andreas Herrmann
42937e81a8 x86: Detect use of extended APIC ID for AMD CPUs
Booting a 32-bit kernel on Magny-Cours results in the following panic:

  ...
  Using APIC driver default
  ...
  Overriding APIC driver with bigsmp
  ...
  Getting VERSION: 80050010
  Getting VERSION: 80050010
  Getting ID: 10000000
  Getting ID: ef000000
  Getting LVT0: 700
  Getting LVT1: 10000
  Kernel panic - not syncing: Boot APIC ID in local APIC unexpected (16 vs 0)
  Pid: 1, comm: swapper Not tainted 2.6.30-rcX #2
  Call Trace:
   [<c05194da>] ? panic+0x38/0xd3
   [<c0743102>] ? native_smp_prepare_cpus+0x259/0x31f
   [<c073b19d>] ? kernel_init+0x3e/0x141
   [<c073b15f>] ? kernel_init+0x0/0x141
   [<c020325f>] ? kernel_thread_helper+0x7/0x10

The reason is that default_get_apic_id handled extension of local APIC
ID field just in case of XAPIC.

Thus for this AMD CPU, default_get_apic_id() returns 0 and
bigsmp_get_apic_id() returns 16 which leads to the respective kernel
panic.

This patch introduces a Linux specific feature flag to indicate
support for extended APIC id (8 bits instead of 4 bits width) and sets
the flag on AMD CPUs if applicable.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: <stable@kernel.org>
LKML-Reference: <20090608135509.GA12431@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-09 15:28:46 +02:00
Yinghai Lu
eaa958402e cpumask: alloc zeroed cpumask for static cpumask_var_ts
These are defined as static cpumask_var_t so if MAXSMP is not used,
they are cleared already.  Avoid surprises when MAXSMP is enabled.

Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-09 22:30:27 +09:30
Joerg Roedel
e9a22a13c7 amd-iommu: remove unnecessary "AMD IOMMU: " prefix
That prefix is already included in the DUMP_printk macro. So there is no
need to repeat it in the format string.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-06-09 12:01:58 +02:00
Joerg Roedel
71ff3bca2f amd-iommu: detach device explicitly before attaching it to a new domain
This fixes a bug with a device that could not be assigned to a KVM guest
because it is still assigned to a dma_ops protection domain.

[chrisw: simply remove WARN_ON(), will always fire since dev->driver
will be pci-sub]

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-06-09 11:14:14 +02:00
Joerg Roedel
29150078d7 amd-iommu: remove BUS_NOTIFY_BOUND_DRIVER handling
Handling this event causes device assignment in KVM to fail because the
device gets re-attached as soon as the pci-stub registers as the driver
for the device.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-06-09 10:54:18 +02:00
Joerg Roedel
d2dd01de99 Merge commit 'tip/core/iommu' into amd-iommu/fixes 2009-06-09 10:50:57 +02:00
Thomas Gleixner
820a644211 perf_counter, x86: Clean up hw_cache_event ids copies
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-08 23:10:42 +02:00
Thomas Gleixner
f86748e91a perf_counter, x86: Implement generalized cache event types, add AMD support
Fill in amd_hw_cache_event_id[] with the AMD CPU specific events,
for family 0x0f, 0x10 and 0x11.

There's apparently no distinction between load and store events, so
we only fill in the load events.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-08 23:10:37 +02:00
Jack Steiner
c4ed3f04ba x86, UV: Fix macros for multiple coherency domains
Fix bug in the SGI UV macros that support systems with multiple
coherency domains.  The macros used for referencing global MMR
(chipset registers) are failing to correctly "or" the NASID
(node identifier) bits that reside above M+N. These high bits
are supplied automatically by the chipset for memory accesses
coming from the processor socket.

However, the bits must be present for references to the special
global MMR space used to map chipset registers. (See uv_hub.h
for more details ...)

The bug results in references to invalid/incorrect nodes.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: <stable@kernel.org>
LKML-Reference: <20090608154405.GA16395@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-08 18:57:47 +02:00
Ingo Molnar
1123e3ad73 perf_counter: Clean up x86 boot messages
Standardize and tidy up all the messages we print during
perfcounter initialization.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-08 12:29:30 +02:00
Thomas Gleixner
ad68922061 perf_counter, x86: Implement generalized cache event types, add Atom support
Fill in core2_hw_cache_event_id[] with the Atom model specific events.

The events can be used in all the tools via the -e (--event) parameter,
for example "-e l1-misses" or -"-e l2-accesses" or "-e l2-write-misses".

( Note: these are straight from the Intel manuals - not tested yet.)

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-08 11:18:27 +02:00
Thomas Gleixner
0312af8416 perf_counter, x86: Implement generalized cache event types, add Core2 support
Fill in core2_hw_cache_event_id[] with the Core2 model specific events.

The events can be used in all the tools via the -e (--event) parameter,
for example "-e l1-misses" or -"-e l2-accesses" or "-e l2-write-misses".

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-08 11:18:26 +02:00
Figo.zhang
aeef50bc04 x86, microcode: Simplify vfree() use
vfree() does its own 'NULL' check, so no need for check before
calling it.

In v2, remove the stray newline.

[ Impact: cleanup ]

Signed-off-by: Figo.zhang <figo1802@gmail.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
LKML-Reference: <1244385036.3402.11.camel@myhost>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-07 16:35:11 +02:00
Lubomir Rintel
3aa6b186f8 x86: Fix non-lazy GS handling in sys_vm86()
This fixes a stack corruption panic or null dereference oops
due to a bad GS in resume_userspace() when returning from
sys_vm86() and calling lockdep_sys_exit().

Only a problem when CONFIG_LOCKDEP and CONFIG_CC_STACKPROTECTOR
enabled.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1244384628.2323.4.camel@bimbo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-07 16:31:23 +02:00
Cyrill Gorcunov
103428e57b x86, apic: Fix dummy apic read operation together with broken MP handling
Ingo Molnar reported that read_apic is buggy novadays:

[    0.000000] Using APIC driver default
[    0.000000] SMP: Allowing 1 CPUs, 0 hotplug CPUs
[    0.000000] Local APIC disabled by BIOS -- you can enable it with "lapic"
[    0.000000] APIC: disable apic facility
[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: at arch/x86/kernel/apic/apic.c:254 native_apic_read_dummy+0x2d/0x3b()
[    0.000000] Hardware name: HP OmniBook PC

Indeed we still rely on apic->read operation for SMP compiled
kernel. And instead of disfigure the SMP code with #ifdef we
allow to call apic->read. To capture any unexpected results
we check for apic->read being called for sane reason via
WARN_ON_ONCE but(!) instead of OR we should use AND logical
operation (thanks Yinghai for spotting the root of the problem).

Along with that we could be have bad MP table and we are
to fix it that way no SMP started and no complains about
BIOS bug if apic was just disabled via command line.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090607124840.GD4547@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-07 16:08:05 +02:00
Jean Delvare
4a4aca641b x86: Add quirk for reboot stalls on a Dell Optiplex 360
The Dell Optiplex 360 hangs on reboot, just like the Optiplex 330, so
the same quirk is needed.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Steve Conklin <steve.conklin@canonical.com>
Cc: Leann Ogasawara <leann.ogasawara@canonical.com>
Cc: <stable@kernel.org>
LKML-Reference: <200906051202.38311.jdelvare@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-07 15:51:20 +02:00
Jaswinder Singh Rajput
5095f59bda x86: cpu_debug: Remove model information to reduce encoding-decoding
Remove model information, encoding/decoding and reduce bookkeeping.

This, besides removing a lot of code and cleaning up the code, also
enables these features on many more CPUs that were enumerated before.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
LKML-Reference: <1244224637.8212.6.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-07 12:22:56 +02:00
Ingo Molnar
5f4457a4f6 Merge branch 'linus' into x86/cpu 2009-06-07 12:22:15 +02:00
Ingo Molnar
56fdd18c7b Merge branch 'linus' into core/iommu
Merge reason: This branch was on an -rc5 base so pull almost-2.6.30
              to resync with the latest upstream fixes and make sure
              the combination works fine.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-07 11:35:05 +02:00
Ingo Molnar
75b5032212 Merge branch 'linus' into perfcounters/core
Merge reason: Pick up the latest fixes before the -v8 perfcounters
	      release.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-06 20:21:28 +02:00
Ingo Molnar
8326f44da0 perf_counter: Implement generalized cache event types
Extend generic event enumeration with the PERF_TYPE_HW_CACHE
method.

This is a 3-dimensional space:

       { L1-D, L1-I, L2, ITLB, DTLB, BPU } x
       { load, store, prefetch } x
       { accesses, misses }

User-space passes in the 3 coordinates and the kernel provides
a counter. (if the hardware supports that type and if the
combination makes sense.)

Combinations that make no sense produce a -EINVAL.
Combinations that are not supported by the hardware produce -ENOTSUP.

Extend the tools to deal with this, and rewrite the event symbol
parsing code with various popular aliases for the units and
access methods above. So 'l1-cache-miss' and 'l1d-read-ops' are
both valid aliases.

( x86 is supported for now, with the Nehalem event table filled in,
  and with Core2 and Atom having placeholder tables. )

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-06 13:14:47 +02:00
Ingo Molnar
a21ca2cac5 perf_counter: Separate out attr->type from attr->config
Counter type is a frequently used value and we do a lot of
bit juggling by encoding and decoding it from attr->config.

Clean this up by creating a separate attr->type field.

Also clean up the various similarly complex user-space bits
all around counter attribute management.

The net improvement is significant, and it will be easier
to add a new major type (which is what triggered this cleanup).

(This changes the ABI, all tools are adapted.)
(PowerPC build-tested.)

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-06 11:37:22 +02:00
Mark Langsdorf
fe2245c905 x86: enable GART-IOMMU only after setting up protection methods
The current code to set up the GART as an IOMMU enables GART
translations before it removes the aperture from the kernel memory
map, sets the GART PTEs to UC, sets up the guard and scratch
pages, or does a wbinvd().  This leaves the possibility of cache
aliasing open and can cause system crashes.

Re-order the code so as to enable the GART translations only
after all safeguards are in place and the tlb has been flushed.

AMD has tested this patch on both Istanbul systems and 1st
generation Opteron systems with APG enabled and seen no adverse
effects.  Istanbul systems with HT Assist enabled sometimes
see MCE errors due to cache artifacts with the unmodified
code.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Cc: <stable@kernel.org>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: akpm@linux-foundation.org
Cc: jbarnes@virtuousgeek.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-06 09:42:09 +02:00
Dave Jones
2c701b1028 [CPUFREQ] powernow-k8: check space_id of _PCT registers to be FFH
The powernow-k8 driver checks to see that the Performance Control/Status
Registers are declared as FFH (functional fixed hardware) by the BIOS.
However, this check got broken in the commit:
 0e64a0c982
 [CPUFREQ] checkpatch cleanups for powernow-k8

Fix based on an original patch from Naga Chumbalkar.

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-06-05 13:25:25 -04:00
Hidetoshi Seto
8051dbd2df x86, mce: fix for mce counters
Make the MCE counters work on 32bit and add poll count in
arch_irq_stat_cpu.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:48:59 -07:00
Andi Kleen
9b1beaf2b5 x86, mce: support action-optional machine checks
Newer Intel CPUs support a new class of machine checks called recoverable
action optional.

Action Optional means that the CPU detected some form of corruption in
the background and tells the OS about using a machine check
exception. The OS can then take appropiate action, like killing the
process with the corrupted data or logging the event properly to disk.

This is done by the new generic high level memory failure handler added
in a earlier patch. The high level handler takes the address with the
failed memory and does the appropiate action, like killing the process.

In this version of the patch the high level handler is stubbed out
with a weak function to not create a direct dependency on the hwpoison
branch.

The high level handler cannot be directly called from the machine check
exception though, because it has to run in a defined process context to
be able to sleep when taking VM locks (it is not expected to sleep for a
long time, just do so in some exceptional cases like lock contention)

Thus the MCE handler has to queue a work item for process context,
trigger process context and then call the high level handler from there.

This patch adds two path to process context: through a per thread kernel
exit notify_user() callback or through a high priority work item.
The first runs when the process exits back to user space, the other when
it goes to sleep and there is no higher priority process.

The machine check handler will schedule both, and whoever runs first
will grab the event. This is done because quick reaction to this
event is critical to avoid a potential more fatal machine check
when the corruption is consumed.

There is a simple lock less ring buffer to queue the corrupted
addresses between the exception handler and the process context handler.
Then in process context it just calls the high level VM code with
the corrupted PFNs.

The code adds the required code to extract the failed address from
the CPU's machine check registers. It doesn't try to handle all
possible cases -- the specification has 6 different ways to specify
memory address -- but only the linear address.

Most of the required checking has been already done earlier in the
mce_severity rule checking engine.  Following the Intel
recommendations Action Optional errors are only enabled for known
situations (encoded in MCACODs). The errors are ignored otherwise,
because they are action optional.

v2: Improve comment, disable preemption while processing ring buffer
    (reported by Ying Huang)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:48:59 -07:00
Andi Kleen
9ff36ee966 x86, mce: rename mce_notify_user to mce_notify_irq
Rename the mce_notify_user function to mce_notify_irq. The next
patch will split the wakeup handling of interrupt context
and of process context and it's better to give it a clearer
name for this.

Contains a fix from Ying Huang

[ Impact: cleanup ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:48:04 -07:00
Andi Kleen
4ef702c10b x86: fix panic with interrupts off (needed for MCE)
For some time each panic() called with interrupts disabled
triggered the !irqs_disabled() WARN_ON in smp_call_function(),
producing ugly backtraces and confusing users.

This is a common situation with machine checks for example which
tend to call panic with interrupts disabled, but will also hit
in other situations e.g. panic during early boot.  In fact it
means that panic cannot be called in many circumstances, which
would be bad.

This all started with the new fancy queued smp_call_function,
which is then used by the shutdown path to shut down the other
CPUs.

On closer examination it turned out that the fancy RCU
smp_call_function() does lots of things not suitable in a panic
situation anyways, like allocating memory and relying on complex
system state.

I originally tried to patch this over by checking for panic
there, but it was quite complicated and the original patch
was also not very popular.  This also didn't fix some of the
underlying complexity problems.

The new code in post 2.6.29 tries to patch around this by
checking for oops_in_progress, but that is not enough to make
this fully safe and I don't think that's a real solution
because panic has to be reliable.

So instead use an own vector to reboot.  This makes the reboot
code extremly straight forward, which is definitely a big plus
in a panic situation where it is important to avoid relying on
too much kernel state.  The new simple code is also safe to be
called from interupts off region because it is very very simple.

There can be situations where it is important that panic
is reliable.  For example on a fatal machine check the panic
is needed to get the system up again and running as quickly
as possible.  So it's important that panic is reliable and
all function it calls simple.

This is why I came up with this simple vector scheme.
It's very hard to beat in simplicity.  Vectors are not
particularly precious anymore since all big systems are
using per CPU vectors.

Another possibility would have been to use an NMI similar
to kdump, but there is still the problem that NMIs don't
work reliably on some systems due to BIOS issues.  NMIs
would have been able to stop CPUs running with interrupts
off too.  In the sake of universal reliability I opted for
using a non NMI vector for now.

I put the reboot vector into the highest priority bucket of
the APIC vectors and moved the 64bit UV_BAU message down
instead into the next lower priority.

[ Impact: bug fix, fixes an old regression ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:45:35 -07:00
Huang Ying
4611a6fa4b x86, mce: export MCE severities coverage via debugfs
The MCE severity judgement code is data-driven, so code coverage tools
such as gcov can not be used for measuring coverage. Instead a dedicated
coverage mechanism is implemented.  The kernel keeps track of rules
executed and reports them in debugfs.

This is useful for increasing coverage of the mce-test testsuite.

Right now it's unconditionally enabled because it's very little code.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:45:34 -07:00
Andi Kleen
ed7290d0ee x86, mce: implement new status bits
The x86 architecture recently added some new machine check status bits:
S(ignalled) and AR (Action-Required). Signalled allows to check
if a specific event caused an exception or was just logged through CMCI.
AR allows the kernel to decide if an event needs immediate action
or can be delayed or ignored.

Implement support for these new status bits. mce_severity() uses
the new bits to grade the machine check correctly and decide what
to do. The exception handler uses AR to decide to kill or not.
The S bit is used to separate events between the poll/CMCI handler
and the exception handler.

Classical UC always leads to panic. That was true before anyways
because the existing CPUs always passed a PCC with it.

Also corrects the rules whether to kill in user or kernel context
and how to handle missing RIPV.

The machine check handler largely uses the mce-severity grading
engine now instead of making its own decisions. This means the logic
is centralized in one place.  This is useful because it has to be
evaluated multiple times.

v2: Some rule fixes; Add AO events
Fix RIPV, RIPV|EIPV order (Ying Huang)
Fix UCNA with AR=1 message (Ying Huang)
Add comment about panicing in m_c_p.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:45:34 -07:00
Andi Kleen
86503560e4 x86, mce: print header/footer only once for multiple MCEs
When multiple MCEs are printed print the "HARDWARE ERROR" header
and "This is not a software error" footer only once. This
makes the output much more compact with many CPUs.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:45:34 -07:00
Andi Kleen
29b0f591d6 x86, mce: default to panic timeout for machine checks
Fatal machine checks can be logged to disk after boot, but only if
the system did a warm reboot. That's unfortunately difficult with the
default panic behaviour, which waits forever and the admin has to
press the power button because modern systems usually miss a reset button.
This clears the machine checks in the registers and make
it impossible to log them.

This patch changes the default for machine check panic to always
reboot after 30s. Then the mce can be successfully logged after
reboot.

I believe this will improve machine check experience for any
system running the X server.

This is dependent on successfull boot logging of MCEs. This currently
only works on Intel systems, on AMD there are quite a lot of systems
around which leave junk in the machine check registers after boot,
so it's disabled here. These systems will continue to default
to endless waiting panic.

v2: Only force panic timeout when it's shorter (H.Seto)
v3: Only force timeout when there is no timeout
(based on comment H.Seto)

[ Fix changelog - HS ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:45:33 -07:00
Huang Ying
1b2797dcc9 x86, mce: improve mce_get_rip
Assume IP on the stack is valid when either EIPV or RIPV are set.
This influences whether the machine check exception handler decides
to return or panic.

This fixes a test case in the mce-test suite and is more compliant
to the specification.

This currently only makes a difference in a artificial testing
scenario with the mce-test test suite.

Also in addition do not force the EIPV to be valid with the exact
register MSRs, and keep in trust the CS value on stack even if MSR
is available.

[AK: combination of patches from Huang Ying and Hidetoshi Seto, with
new description by me]
[add some description, no code changed - HS]

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:45:33 -07:00
Andi Kleen
ac9603754d x86, mce: make non Monarch panic message "Fatal machine check" too
... instead of "Machine check". This is for consistency with the Monarch
panic message.

Based on a report from Ying Huang.

v2: But add a descriptive postfix so that the test suite can distingush.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:45:12 -07:00
Andi Kleen
3c0797925f x86, mce: switch x86 machine check handler to Monarch election.
On Intel platforms machine check exceptions are always broadcast to
all CPUs.  This patch makes the machine check handler synchronize all
these machine checks, elect a Monarch to handle the event and collect
the worst event from all CPUs and then process it first.

This has some advantages:

- When there is a truly data corrupting error the system panics as
  quickly as possible. This improves containment of corrupted
  data and makes sure the corrupted data never hits stable storage.

- The panics are synchronized and do not reenter the panic code
  on multiple CPUs (which currently does not handle this well).

- All the errors are reported. Currently it often happens that
  another CPU happens to do the panic first, but reports useless
  information (empty machine check) because the real error
  happened on another CPU which came in later.
  This is a big advantage on Nehalem where the 8 threads per CPU
  lead to often the wrong CPU winning the race and dumping
  useless information on a machine check.  The problem also occurs
  in a less severe form on older CPUs.

- The system can detect when no CPUs detected a machine check
  and shut down the system.  This can happen when one CPU is so
  badly hung that that it cannot process a machine check anymore
  or when some external agent wants to stop the system by
  asserting the machine check pin.  This follows Intel hardware
  recommendations.

- This matches the recommended error model by the CPU designers.

- The events can be output in true severity order

- When a panic happens on another CPU it makes sure to be actually
  be able to process the stop IPI by enabling interrupts.

The code is extremly careful to handle timeouts while waiting
for other CPUs. It can't rely on the normal timing mechanisms
(jiffies, ktime_get) because of its asynchronous/lockless nature,
so it uses own timeouts using ndelay() and a "SPINUNIT"

The timeout is configurable. By default it waits for upto one
second for the other CPUs.  This can be also disabled.

From some informal testing AMD systems do not see to broadcast
machine checks, so right now it's always disabled by default on
non Intel CPUs or also on very old Intel systems.

Includes fixes from Ying Huang
Fixed a "ecception" in a comment (H.Seto)
Moved global_nwo reset later based on suggestion from H.Seto
v2: Avoid duplicate messages

[ Impact: feature, fixes long standing problems. ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:45:12 -07:00
Andi Kleen
f94b61c2c9 x86, mce: implement panic synchronization
In some circumstances multiple CPUs can enter mce_panic() in parallel.
This gives quite confused output because they will all dump the same
machine check buffer.

The other problem is that they would all panic in parallel, but not
process each other's shutdown IPIs because interrupts are disabled.

Detect this situation early on in mce_panic(). On the first CPU
entering will do the panic, the others will just wait to be killed.

For paranoia reasons in case the other CPU dies during the MCE I added
a 5 seconds timeout. If it expires each CPU will panic on its own again.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:45:12 -07:00
Andi Kleen
ccc3c3192a x86, mce: implement bootstrapping for machine check wakeups
Machine checks support waking up the mcelog daemon quickly.

The original wake up code for this was pretty ugly, relying on
a idle notifier and a special process flag. The reason it did
it this way is that the machine check handler is not subject
to normal interrupt locking rules so it's not safe
to call wake_up().  Instead it set a process flag
and then either did the wakeup in the syscall return
or in the idle notifier.

This patch adds a new "bootstraping" method as replacement.

The idea is that the handler checks if it's in a state where
it is unsafe to call wake_up(). If it's safe it calls it directly.
When it's not safe -- that is it interrupted in a critical
section with interrupts disables -- it uses a new "self IPI" to trigger
an IPI to its own CPU. This can be done safely because IPI
triggers are atomic with some care. The IPI is raised
once the interrupts are reenabled and can then safely call
wake_up().

When APICs are disabled the event is just queued and will be picked up
eventually by the next polling timer. I think that's a reasonable
compromise, since it should only happen quite rarely.

Contains fixes from Ying Huang.

[ solve conflict on irqinit, make it work on 32bit (entry_arch.h) - HS ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:44:05 -07:00
Andi Kleen
bd19a5e6b7 x86, mce: check early in exception handler if panic is needed
The exception handler should behave differently if the exception is
fatal versus one that can be returned from.  In the first case it should
never clear any registers because these need to be preserved
for logging after the next boot. Otherwise it should clear them
on each CPU step by step so that other CPUs sharing the same bank don't
see duplicate events. Otherwise we risk reporting events multiple
times on any CPUs which have shared machine check banks, which
is a common problem on Intel Nehalem which has both SMT (two
CPU threads sharing banks) and shared machine check banks in the uncore.

Determine early in a special pass if any event requires a panic.
This uses the mce_severity() function added earlier.

This is needed for the next patch.

Also fixes a problem together with an earlier patch
that corrected events weren't logged on a fatal MCE.

[ Impact: Feature ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:40:39 -07:00
Andi Kleen
817f32d02a x86, mce: add table driven machine check grading
The machine check grading (as in deciding what should be done for a given
register value) has to be done multiple times soon and it's also getting
more complicated.
So it makes sense to consolidate it into a single function. To get smaller
and more straight forward and possibly more extensible code I opted towards
a new table driven method. The various rules are put into a table
when is then executed by a very simple interpreter.

The grading engine is in a new file mce-severity.c. I also added a private
include file mce-internal.h, because mce.h is already a bit too cluttered.

This is dead code right now, but will be used in followon patches.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:40:39 -07:00
Andi Kleen
a0189c70e5 x86, mce: remove TSC print heuristic
Previously mce_panic used a simple heuristic to avoid printing
old so far unreported machine check events on a mce panic. This worked
by comparing the TSC value at the start of the machine check handler
with the event time stamp and only printing newer ones.

This has a couple of issues, in particular on systems where the TSC
is not fully synchronized between CPUs it could lose events or print
old ones.

It is also problematic with full system synchronization as it is
added by the next patch.

Remove the TSC heuristic and instead replace it with a simple heuristic
to print corrected errors first and after that uncorrected errors
and finally the worst machine check as determined by the machine
check handler.

This simplifies the code because there is no need to pass the
original TSC value around.

Contains fixes from Ying Huang

[ Impact: bug fix, cleanup ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Ying Huang <ying.huang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:40:39 -07:00
Andi Kleen
de8a84d85a x86, mce: log corrected errors when panicing
Normally the machine check handler ignores corrected errors and leaves
them to machine_check_poll(). But when panicing mcp won't run, so
log all errors.

Note: this can still miss some cases until the "early no way out"
patch later is applied too.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:40:39 -07:00
Andi Kleen
8ee08347c1 x86, mce: extend struct mce user interface with more information.
Experience has shown that struct mce which is used to pass an machine
check to the user space daemon currently a few limitations.  Also some
data which is useful to print at panic level is also missing.

This patch addresses most of them. The same information is also
printed out together with mce panic.

struct mce can be painlessly extended in a compatible way, the mcelog
user space code just ignores additional fields with a warning.

- It doesn't provide a wall time timestamp. There have been a few
  complaints about that. Fix that by adding a 64bit time_t

- It doesn't provide the exact CPU identification. This makes
  it awkward for mcelog to decode the event correctly, especially
  when there are variations in the supported MCE codes on different
  CPU models or when mcelog is running on a different host after a panic.
  Previously the administrator had to specify the correct CPU
  when mcelog ran on a different host, but with the more variation
  in machine checks now it's better to auto detect that.
  It's also useful for more detailed analysis of CPU events.
  Pass CPUID 1.EAX and the cpu vendor (as encoded in processor.h) instead.

- Socket ID and initial APIC ID are useful to report because they
  allow to identify the failing CPU in some (not all) cases.
  This is also especially useful for the panic situation.
  This addresses one of the complaints from Thomas Gleixner earlier.

- The MCG capabilities MSR needs to be reported for some advanced
  error processing in mcelog

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:40:38 -07:00
Andi Kleen
d620c67fb9 x86, mce: support more than 256 CPUs in struct mce
The old struct mce had a limitation to 256 CPUs. But x86 Linux supports
more than that now with x2apic. Add a new field extcpu to report the
extended number.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:40:38 -07:00
Andi Kleen
f6fb0ac086 x86, mce: store record length into memory struct mce anchor
This makes it easier for tools who want to extract the mcelog out of
crash images or memory dumps to adapt to changing struct mce size.
The length field replaces padding, so it's fully compatible.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:40:38 -07:00
Andi Kleen
ca84f69697 x86, mce: add MCE poll count to /proc/interrupts
Keep a count of the machine check polls (or CMCI events) in
/proc/interrupts.

Andi needs this for debugging, but it's also useful in general
to see what's going in by the kernel.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:40:38 -07:00
Andi Kleen
01ca79f141 x86, mce: add machine check exception count in /proc/interrupts
Useful for debugging, but it's also good general policy
to have a counter for all special interrupts there. This makes it easier
to diagnose where a CPU is spending its time.

[ Impact: feature, debugging tool ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03 14:40:38 -07:00
Ingo Molnar
128f048f0f perf_counter: Fix throttling lock-up
Throttling logic is broken and we can lock up with too small
hw sampling intervals.

Make the throttling code more robust: disable counters even
if we already disabled them.

( Also clean up whitespace damage i noticed while reading
  various pieces of code related to throttling. )

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-03 23:39:51 +02:00
Cliff Wickman
0e2595cdfd x86: Fix UV BAU activation descriptor init
The UV tlb shootdown code has a serious initialization error.

An array of structures [32*8] is initialized as if it were [32].
The array is indexed by (cpu number on the blade)*8, so the short
initialization works for up to 4 cpus on a blade.
But above that, we provide an invalid opcode to the hub's
broadcast assist unit.

This patch changes the allocation of the array to use its symbolic
dimensions for better clarity. And initializes all 32*8 entries.

Shortened 'UV_ACTIVATION_DESCRIPTOR_SIZE' to 'UV_ADP_SIZE' per Ingo's
recommendation.

Tested on the UV simulator.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: <stable@kernel.org>
LKML-Reference: <E1M6lZR-0007kV-Aq@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-03 13:07:31 +02:00
Jiri Slaby
367d04c4ec amd_iommu: fix lock imbalance
In alloc_coherent there is an omitted unlock on the path where mapping
fails. Add the unlock.

[ Impact: fix lock imbalance in alloc_coherent ]

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-06-03 10:34:55 +02:00
Yong Wang
a32881066e perf_counter/x86: Remove the IRQ (non-NMI) handling bits
Remove the IRQ (non-NMI) handling bits as NMI will be used always.

Signed-off-by: Yong Wang <yong.y.wang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <20090603051255.GA2791@ywang-moblin2.bj.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-03 09:53:34 +02:00
Peter Zijlstra
0d48696f87 perf_counter: Rename perf_counter_hw_event => perf_counter_attr
The structure isn't hw only and when I read event, I think about those
things that fall out the other end. Rename the thing.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
Cc: Stephane Eranian <eranian@googlemail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-02 21:45:33 +02:00
Peter Zijlstra
e4abb5d4f7 perf_counter: x86: Emulate longer sample periods
Do as Power already does, emulate sample periods up to 2^63-1 by
composing them of smaller values limited by hardware capabilities.
Only once we wrap the software period do we generate an overflow
event.

Just 10 lines of new code.

Reported-by: Stephane Eranian <eranian@googlemail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-02 21:45:31 +02:00
Peter Zijlstra
8a016db386 perf_counter: Remove the last nmi/irq bits
IRQ (non-NMI) sampling is not used anymore - remove the last few bits.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-02 21:45:31 +02:00
Peter Zijlstra
b23f3325ed perf_counter: Rename various fields
A few renames:

  s/irq_period/sample_period/
  s/irq_freq/sample_freq/
  s/PERF_RECORD_/PERF_SAMPLE_/
  s/record_type/sample_type/

And change both the new sample_type and read_format to u64.

Reported-by: Stephane Eranian <eranian@googlemail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-02 21:45:30 +02:00
Jiri Slaby
3d58829b05 x86, apic: Restore irqs on fail paths
lapic_resume forgets to restore interrupts on fail paths.
Fix that.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <1243497289-18591-1-git-send-email-jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
2009-06-02 02:48:59 +02:00
Naga Chumbalkar
58f892e022 x86: Print real IOAPIC version for x86-64
Fix the fact that the IOAPIC version number in the x86_64 code path always
gets assigned to 0, instead of the correct value.

Before the patch: (from "dmesg" output):

 ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
 IOAPIC[0]: apic_id 8, version 0, address 0xfec00000, GSI 0-23     <---

 After the patch:
 ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
 IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23    <---

History:

io_apic_get_version() was compiled out of the x86_64 code path in the commit
f2c2cca3ac:

Author: Andi Kleen <ak@suse.de>
Date:   Tue Sep 26 10:52:37 2006 +0200

    [PATCH] Remove APIC version/cpu capability mpparse checking/printing

    ACPI went to great trouble to get the APIC version and CPU capabilities
    of different CPUs before passing them to the mpparser. But all
    that data was used was to print it out.  Actually it even faked some data
    based on the boot cpu, not on the actual CPU being booted.

    Remove all this code because it's not needed.

    Cc: len.brown@intel.com

At the time, the IOAPIC version number was deliberately not printed
in the x86_64 code path. However, after the x86 and x86_64 files were
merged, the net result is that the IOAPIC version is printed incorrectly
in the x86_64 code path.

The patch below provides a fix. I have tested it with acpi, and with
acpi=off, and did not see any problems.

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Acked-by: Yinghai Lu <yhlu.kernel@gmail.com>
LKML-Reference: <20090416014230.4885.94926.sendpatchset@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
*************************
2009-06-02 02:03:18 +02:00
H. Peter Anvin
48b1fddbb1 Merge branch 'irq/numa' into x86/mce3
Merge reason: arch/x86/kernel/irqinit_{32,64}.c unified in irq/numa
and modified in x86/mce3; this merge resolves the conflict.

Conflicts:
	arch/x86/kernel/irqinit.c

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-01 15:25:31 -07:00
Ingo Molnar
ee4c24a5c9 Merge branch 'x86/cpufeature' into irq/numa
Merge reason: irq/numa didnt build because this commit:

  2759c32: x86: don't call read_apic_id if !cpu_has_apic

Had a dependency on x86/cpufeature changes. Pull in that
(small) branch to fix the dependency.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-01 22:30:01 +02:00
Ingo Molnar
3d58f48ba0 Merge branch 'linus' into irq/numa
Conflicts:
	arch/mips/sibyte/bcm1480/irq.c
	arch/mips/sibyte/sb1250/irq.c

Merge reason: we gathered a few conflicts plus update to latest upstream fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-01 21:06:21 +02:00
Ingo Molnar
23db9f430b Merge branch 'linus' into perfcounters/core
Merge reason: merge almost-rc8 into perfcounters/core, which was -rc6
              based - to pick up the latest upstream fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-01 10:01:39 +02:00
Joe Perches
61c8c67e3a acpi-cpufreq: fix printk typo and indentation
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-05-29 21:26:26 -04:00
Yong Wang
c323d95fa4 perf_counter/x86: Always use NMI for performance-monitoring interrupt
Always use NMI for performance-monitoring interrupt as there could be
racy situations if we switch between irq and nmi mode frequently.

Signed-off-by: Yong Wang <yong.y.wang@intel.com>
LKML-Reference: <20090529052835.GA13657@ywang-moblin2.bj.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-29 09:04:58 +02:00
Hidetoshi Seto
98a9c8c3ba x86, mce: trivial clean up for mce-inject.c
Fix for:

WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
+#include <asm/uaccess.h>

WARNING: usage of NR_CPUS is often wrong - consider using cpu_possible(), num_possible_cpus(), for_each_possible_cpu(), etc
+       if (m.cpu >= NR_CPUS || !cpu_online(m.cpu))

ERROR: trailing whitespace
+/* $

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:16 -07:00
Hidetoshi Seto
61a021a070 x86, mce: trivial clean up for mce_intel_64.c
Fix for:

WARNING: space prohibited between function name and open parenthesis '('
+       for_each_online_cpu (cpu) {

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:16 -07:00
Hidetoshi Seto
34fa1967aa x86, mce: trivial clean up for mce_amd_64.c
Fix for followings:

WARNING: Use #include <linux/percpu.h> instead of <asm/percpu.h>
+#include <asm/percpu.h>

ERROR: Macros with multiple statements should be enclosed in a do - while
loop
+#define THRESHOLD_ATTR(_name, _mode, _show, _store)                    \
+{                                                                      \
+       .attr   = {.name = __stringify(_name), .mode = _mode },         \
+       .show   = _show,                                                \
+       .store  = _store,                                               \
+};

WARNING: usage of NR_CPUS is often wrong - consider using cpu_possible(),
num_possible_cpus(), for_each_possible_cpu(), etc
+       if (cpu >= NR_CPUS)

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:16 -07:00
Hidetoshi Seto
14a02530e2 x86, mce: trivial clean up for mce.c
This fixs following checkpatch warnings:

WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
+#include <asm/uaccess.h>

WARNING: Use #include <linux/smp.h> instead of <asm/smp.h>
+#include <asm/smp.h>

WARNING: line over 80 characters
+                               set_bit(MCE_OVERFLOW, (unsigned long *)&mcelog.flags);

WARNING: braces {} are not necessary for any arm of this statement
+       if (mce_notify_user()) {
[...]
+       } else {
[...]

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:16 -07:00
Hidetoshi Seto
cc3aec52ab x86, mce: trivial clean up for therm_throt.c
This patch removes following checkpatch warning:

WARNING: Use #include <linux/cpu.h> instead of <asm/cpu.h>
+#include <asm/cpu.h>

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:15 -07:00
Hidetoshi Seto
9319cec8c1 x86, mce: use strict_strtoull
Use strict_strtoull instead of simple_strtoull.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:15 -07:00
Andi Kleen
b170204ddb x86, mce: drop BKL in mce_open
BKL is not needed for anything in mce_open because it has
an own spinlock. Remove it.

[ Impact: cleanup ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:15 -07:00
Andi Kleen
32561696c2 x86, mce: rename and align out2 label
There's only a single out path in do_machine_check now, so rename the
label from out2 to out.  Also align it at the first column.

[ Impact: minor cleanup, no functional changes ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:15 -07:00
Thomas Gleixner
8be9110569 x86, mce: remove mce_init unused argument
Remove unused mce_init argument.

[ Impact: cleanup ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:15 -07:00
Andi Kleen
fc016a49c2 x86, mce: remove unused mce_events variable
Remove unused mce_events static variable.

[ Impact: cleanup ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:15 -07:00
Andi Kleen
b56f642d2b x86, mce: use extended sysattrs for the check_interval attribute.
Instead of using own callbacks use the generic ones provided by
the sysdev later.

This finally allows to get rid of the ugly ACCESSOR macros. Should
also save some text size.

[ Impact: cleanup ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:15 -07:00
Andi Kleen
88921be302 x86, mce: synchronize core after machine check handling
The example code in the IA32 SDM recommends to synchronize the CPU
after machine check handling. So do that here.

[ Impact: Spec compliance ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:14 -07:00
H. Peter Anvin
5706001aac x86, mce: fix comment style in mce-inject.c
Fix style of winged comment in mce-inject.c.

[ Impact: comment only ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:14 -07:00
H. Peter Anvin
a1ff41bfc1 x86, mce: add comment about mce_chrdev_ops being writable
Add a comment explaining that mce_chrdev_ops is intentionally
writable.

[ Impact: comment only ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:14 -07:00
Andi Kleen
ea149b36c7 x86, mce: add basic error injection infrastructure
Allow user programs to write mce records into /dev/mcelog. When they do
that a fake machine check is triggered to test the machine check code.

This uses the MCE MSR wrappers added earlier.

The implementation is straight forward. There is a struct mce record
per CPU and the MCE MSR accesses get data from there if there is valid
data injected there. This allows to test the machine check code
relatively realistically because only the lowest layer of hardware
access is intercepted.

The test suite and injector are available at
git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git
git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:14 -07:00
Andi Kleen
5f8c1a54ca x86, mce: add MSR read wrappers for easier error injection
This will be used by future patches to allow machine check error injection.
Right now it's a nop, except for adding some wrappers around the MSR reads.

This is early in the sequence to avoid too many conflicts.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:14 -07:00
Andi Kleen
7856f6cce4 x86, mce: enable MCE_INTEL for 32bit new MCE
Enable the 64bit MCE_INTEL code (CMCI, thermal interrupts) for 32bit NEW_MCE.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:13 -07:00
Andi Kleen
4efc0670ba x86, mce: use 64bit machine check code on 32bit
The 64bit machine check code is in many ways much better than
the 32bit machine check code: it is more specification compliant,
is cleaner, only has a single code base versus one per CPU,
has better infrastructure for recovery, has a cleaner way to communicate
with user space etc. etc.

Use the 64bit code for 32bit too.

This is the second attempt to do this. There was one a couple of years
ago to unify this code for 32bit and 64bit.  Back then this ran into some
trouble with K7s and was reverted.

I believe this time the K7 problems (and some others) are addressed.
I went over the old handlers and was very careful to retain
all quirks.

But of course this needs a lot of testing on old systems. On newer
64bit capable systems I don't expect much problems because they have been
already tested with the 64bit kernel.

I made this a CONFIG for now that still allows to select the old
machine check code. This is mostly to make testing easier,
if someone runs into a problem we can ask them to try
with the CONFIG switched.

The new code is default y for more coverage.

Once there is confidence the 64bit code works well on older hardware
too the CONFIG_X86_OLD_MCE and the associated code can be easily
removed.

This causes a behaviour change for 32bit installations. They now
have to install the mcelog package to be able to log
corrected machine checks.

The 64bit machine check code only handles CPUs which support the
standard Intel machine check architecture described in the IA32 SDM.
The 32bit code has special support for some older CPUs which
have non standard machine check architectures, in particular
WinChip C3 and Intel P5.  I made those a separate CONFIG option
and kept them for now. The WinChip variant could be probably
removed without too much pain, it doesn't really do anything
interesting. P5 is also disabled by default (like it
was before) because many motherboards have it miswired, but
according to Alan Cox a few embedded setups use that one.

Forward ported/heavily changed version of old patch, original patch
included review/fixes from Thomas Gleixner, Bert Wesarg.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:13 -07:00
Andi Kleen
d896a940ef x86, mce: remove oops_begin() use in 64bit machine check
First 32bit doesn't have oops_begin, so it's a barrier of using
this code on 32bit.

On closer examination it turns out oops_begin is not
a good idea in a machine check panic anyways. All oops_begin
does it so check for recursive/parallel oopses and implement the
"wait on oops" heuristic. But there's actually no good reason
to lock machine checks against oopses or prevent them
from recursion. Also "wait on oops" does not really make
sense for a machine check too.

Replace it with a manual bust_spinlocks/console_verbose.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:13 -07:00
Andi Kleen
8e97aef5f4 x86, mce: remove machine check handler idle notify on 64bit
i386 has no idle notifiers, but the 64bit machine check
code uses them to wake up mcelog from a fatal machine check
exception.

For corrected machine checks found by the poller or
threshold interrupts going through an idle notifier is not needed
because the wake_up can is just done directly and doesn't
need the idle notifier. It is only needed for logging
exceptions.

To be honest I never liked the idle notifier even though I signed
off on it. On closer investigation the code actually turned out
to be nearly. Right now machine check exceptions on x86 are always
unrecoverable (lead to panic due to PCC), which means we never execute
the idle notifier path.

The only exception is the somewhat weird tolerant==3 case, which
ignores PCC. I'll fix this in a future patch in a much cleaner way.

So remove the "mcelog wakeup through idle notifier" code
from 64bit.

This allows to compile the 64bit machine check handler on 32bit
which doesn't have idle notifiers.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:12 -07:00
Andi Kleen
d7c3c9a609 x86, mce: move mce_disabled option into common 32bit/64bit code
It's the same function, so let's share it.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:12 -07:00
Andi Kleen
04b2b1a4df x86, mce: rename 64bit mce_dont_init to mce_disabled
Give it the same name as on 32bit. This makes further merging easier.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:12 -07:00
Andi Kleen
5d7279268b x86, mce: use a call vector to call the 64bit mce handler
Allows to call different machine check handlers from the low
level machine check entry vector.

This is needed for later when it will be used for 32bit too.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:12 -07:00
Andi Kleen
2e6f694fde x86, mce: port K7 bank 0 quirk to 64bit mce code
Various K7 have broken bank 0s. Don't enable it by default

Port from the 32bit code.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:12 -07:00
Andi Kleen
06b7a7a5ec x86, mce: implement the PPro bank 0 quirk in the 64bit machine check code
Quoting the comment:

* SDM documents that on family 6 bank 0 should not be written
* because it aliases to another special BIOS controlled
* register.
* But it's not aliased anymore on model 0x1a+
* Don't ignore bank 0 completely because there could be a valid
* event later, merely don't write CTL0.

This is mostly a port on the 32bit code, except that 32bit
always didn't write it and didn't have the 0x1a heuristic. I checked
with the CPU designers that the quirk is not required starting with
this model.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:12 -07:00
Andi Kleen
3cde5c8c83 x86, mce: initial steps to make 64bit mce code 32bit clean
Replace unsigned long with u64s if they need to contain 64bit values.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:12 -07:00
Thomas Gleixner
01c6680a54 x86, mce: Cleanup MCG definitions
Decode more magic constants and turn them into symbols.

[ Sort definitions bitwise, introduce MCG_EXT_CNT - HS ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:12 -07:00
Thomas Gleixner
ba2d0f2b0c x86, mce: Cleanup symbols in intel thermal codes
Decode magic constants and turn them into symbols.

[ Cleanup to use symbols already exists - HS ]

[ Impact: cleanup ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:11 -07:00
Ingo Molnar
b659294b77 x86, mce: print number of MCE banks
The number of MCE banks supported by a CPU is a useful number to know,
so print it out during CPU initialization.

[ Impact: add printout ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:11 -07:00
Ingo Molnar
cb491fca55 x86, mce: Rename sysfs variables
Shorten variable names. This also compacts the code a bit.

	device_mce		=> mce_dev
	mce_device_initialized	=> mce_dev_initialized
	mce_attribute		=> mce_attrs

[ Impact: cleanup ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:11 -07:00
Ingo Molnar
dba3725d44 x86, mce: unify
move mce_64.c => mce.c and glue it up in the Makefile.
Remove mce_32.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:11 -07:00
Ingo Molnar
711c2e481c x86, mce: unify, prepare for 32-bit v2
Prepare the 64-bit mce_64.c code side to be built on 32-bit.

[ includes ifdef relocation by Andi Kleen ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@firstfloor.org>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:11 -07:00
Ingo Molnar
a988d334ae x86, mce: unify, prepare codes
Move current 32-bit mce_32.c code into mce_64.c.

[ Remove unused artifact stop/restart_mce pointed by Andi Kleen ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@firstfloor.org>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:11 -07:00
Thomas Gleixner
a65d086235 x86, mce: unify Intel thermal init
Mechanic unification. No change in code.

[ Impact: cleanup, 32-bit / 64-bit unification ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:10 -07:00
Thomas Gleixner
6cc6f3ebd1 x86, mce: unify Intel thermal init, prepare
Prepare for unification, make two intel_init_thermal equal.

[ Impact: cleanup ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:10 -07:00
Ingo Molnar
1cb2a8e176 x86, mce: clean up mce_amd_64.c
Make the coding style match that of the rest of the x86 arch code.

[ Impact: cleanup ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:10 -07:00
Ingo Molnar
cb6f3c155b x86, mce: clean up therm_throt.c
Make the coding style match that of the rest of the x86 arch code.

[ Impact: cleanup ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:10 -07:00
Ingo Molnar
bdbfbdd5e8 x86, mce: clean up non-fatal.c
Make the coding style match that of the rest of the x86 arch code.

[ Impact: cleanup ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:10 -07:00
Ingo Molnar
91425084f7 x86, mce: clean up winchip.c
Make the coding style match that of the rest of the x86 arch code.

[ Impact: cleanup ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:10 -07:00
Ingo Molnar
efee4ca809 x86, mce: clean up k7.c
Make the coding style match that of the rest of the x86 arch code.

[ Impact: cleanup ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:10 -07:00
Ingo Molnar
ea2566ff80 x86, mce: clean up p6.c
Make the coding style match that of the rest of the x86 arch code.

[ Impact: cleanup ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:09 -07:00
Ingo Molnar
ed8bc7ed9a x86, mce: clean up p5.c
Make the coding style match that of the rest of the x86 arch code.

[ Impact: cleanup ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:09 -07:00
Ingo Molnar
c5aaf0e070 x86, mce: clean up p4.c
Make the coding style match that of the rest of the x86 arch code.

[ Impact: cleanup ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:09 -07:00
Ingo Molnar
3b58dfd04b x86, mce: clean up mce_32.c
Make the coding style match that of the rest of the x86 arch code.

[ Impact: cleanup ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:09 -07:00
Ingo Molnar
e9eee03e99 x86, mce: clean up mce_64.c
This file has been modified many times along the years, by multiple
authors, so the general style and structure has diverged in a number
of areas making this file hard to read.

So fix the coding style match that of the rest of the x86 arch code.

[ Impact: cleanup ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:09 -07:00
Hidetoshi Seto
13503fa913 x86, mce: Cleanup param parser
- Fix the comment formatting.

- The error path does not return 0, and printk lacks level and "\n".

- Move __setup("nomce") next to mcheck_disable().

- Improve readability etc.

[ Impact: cleanup ]

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <49CB3F38.7090703@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28 09:24:09 -07:00
Joerg Roedel
83cce2b69e Merge branches 'amd-iommu/fixes', 'amd-iommu/debug', 'amd-iommu/suspend-resume' and 'amd-iommu/extended-allocator' into amd-iommu/2.6.31
Conflicts:
	arch/x86/kernel/amd_iommu.c
	arch/x86/kernel/amd_iommu_init.c
2009-05-28 18:23:56 +02:00
Joerg Roedel
47bccd6bb2 amd-iommu: don't free dma adresses below 512MB with CONFIG_IOMMU_STRESS
This will test the automatic aperture enlargement code. This is
important because only very few devices will ever trigger this code
path. So force it under CONFIG_IOMMU_STRESS.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:18:33 +02:00
Joerg Roedel
f5e9705c64 amd-iommu: don't preallocate page tables with CONFIG_IOMMU_STRESS
This forces testing of on-demand page table allocation code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:18:08 +02:00
Joerg Roedel
fe16f088a8 amd-iommu: disable round-robin allocator for CONFIG_IOMMU_STRESS
Disabling the round-robin allocator results in reusing the same
dma-addresses again very fast. This is a good test if the iotlb flushing
is working correctly.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:17:13 +02:00
Joerg Roedel
d9cfed9254 amd-iommu: remove amd_iommu_size kernel parameter
This parameter is not longer necessary when aperture increases
dynamically.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:16:49 +02:00
Joerg Roedel
11b83888ae amd-iommu: enlarge the aperture dynamically
By dynamically increasing the aperture the extended allocator is now
ready for use.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:15:57 +02:00
Joerg Roedel
00cd122ae5 amd-iommu: handle exlusion ranges and unity mappings in alloc_new_range
This patch makes sure no reserved addresses are allocated in an dma_ops
domain when the aperture is increased dynamically.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:15:19 +02:00
Joerg Roedel
9cabe89b99 amd-iommu: move aperture_range allocation code to seperate function
This patch prepares the dynamic increasement of dma_ops domain
apertures.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:14:35 +02:00
Joerg Roedel
803b8cb4d9 amd-iommu: change dma_dom->next_bit to dma_dom->next_address
Simplify the code a little bit by using the same unit for all address
space related state in the dma_ops domain structure.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:14:26 +02:00
Joerg Roedel
384de72910 amd-iommu: make address allocator aware of multiple aperture ranges
This patch changes the AMD IOMMU address allocator to allow up to 32
aperture ranges per dma_ops domain.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:14:15 +02:00
Joerg Roedel
53812c115c amd-iommu: handle page table allocation failures in dma_ops code
The code will be required when the aperture size increases dynamically
in the extended address allocator.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:13:43 +02:00
Joerg Roedel
8bda3092bc amd-iommu: move page table allocation code to seperate function
This patch makes page table allocation usable for dma_ops code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:13:20 +02:00
Joerg Roedel
c3239567a2 amd-iommu: introduce aperture_range structure
This is a preperation for extended address allocator.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:12:52 +02:00
Joerg Roedel
736501ee00 amd-iommu: implement suspend/resume
This patch puts everything together and enables suspend/resume support
in the AMD IOMMU driver.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:11:39 +02:00
Joerg Roedel
05f92db9f4 amd_iommu: un __init functions required for suspend/resume
This patch makes sure that no function required for suspend/resume of
AMD IOMMU driver is thrown away after boot.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:10:56 +02:00
Joerg Roedel
7d7a110c61 amd-iommu: add function to flush tlb for all devices
This function is required for suspend/resume support with AMD IOMMU
enabled.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:10:43 +02:00
Joerg Roedel
bfd1be1857 amd-iommu: add function to flush tlb for all domains
This function is required for suspend/resume support with AMD IOMMU
enabled.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:10:12 +02:00
Joerg Roedel
92ac4320af amd-iommu: add function to disable all iommus
This function is required for suspend/resume support with AMD IOMMU
enabled.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:09:26 +02:00
Joerg Roedel
d91cecdd79 amd-iommu: remove support for msi-x
Current hardware uses msi instead of msi-x so this code it not necessary
and can not be tested. The best thing is to drop this code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:09:18 +02:00
Joerg Roedel
fab6afa309 amd-iommu: drop pointless iommu-loop in msi setup code
It is not necessary to loop again over all IOMMUs in this code. So drop
the loop.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:09:08 +02:00
Joerg Roedel
58492e1288 amd-iommu: consolidate hardware initialization to one function
This patch restructures the AMD IOMMU initialization code to initialize
all hardware registers with one single function call.
This is helpful for suspend/resume support.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:08:58 +02:00
Joerg Roedel
3bd221724a amd-iommu: introduce for_each_iommu* macros
This patch introduces the for_each_iommu and for_each_iommu_safe macros
to simplify the developers life when having to iterate over all AMD
IOMMUs in the system.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:08:50 +02:00
Chris Wright
c1eee67b2d amd iommu: properly detach from protection domain on ->remove
Some drivers may use the dma api during ->remove which will
cause a protection domain to get reattached to a device.  Delay the
detach until after the driver is completely unbound.

[ joro: added a little merge helper ]

[ Impact: fix too early device<->domain removal ]

Signed-off-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:06:54 +02:00
Joerg Roedel
0bc252f430 amd-iommu: make sure only ivmd entries are parsed
The bug never triggered. But it should be fixed to protect against
broken ACPI tables in the future.

[ Impact: protect against broken ivrs acpi table ]

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:06:47 +02:00
Neil Turton
7455aab1f9 amd-iommu: fix the handling of device aliases in the AMD IOMMU driver.
The devid parameter to set_dev_entry_from_acpi is the requester ID
rather than the device ID since it is used to index the IOMMU device
table.  The handling of IVHD_DEV_ALIAS used to pass the device ID.
This patch fixes it to pass the requester ID.

[ Impact: fix setting the wrong req-id in acpi-table parsing ]

Signed-off-by: Neil Turton <nturton@solarflare.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:06:38 +02:00
Neil Turton
421f909c80 amd-iommu: fix an off-by-one error in the AMD IOMMU driver.
The variable amd_iommu_last_bdf holds the maximum bdf of any device
controlled by an IOMMU, so the number of device entries needed is
amd_iommu_last_bdf+1.  The function tbl_size used amd_iommu_last_bdf
instead.  This would be a problem if the last device were a large
enough power of 2.

[ Impact: fix amd_iommu_last_bdf off-by-one error ]

Signed-off-by: Neil Turton <nturton@solarflare.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 18:06:27 +02:00
Joerg Roedel
2e8b569614 amd-iommu: disable device isolation with CONFIG_IOMMU_STRESS
With device isolation disabled we can test better for race conditions in
dma_ops related code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 17:56:57 +02:00
Joerg Roedel
b3b99ef8b4 amd-iommu: move protection domain printk to dump code
This information is only helpful for debugging. Don't print it anymore
unless explicitly requested.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 17:55:08 +02:00
Joerg Roedel
02acc43a29 amd-iommu: print ivmd information to dmesg when requested
Add information about device memory mapping requirements for the IOMMU
as described in the IVRS ACPI table to the kernel log if amd_iommu_dump
was specified on the kernel command line.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 17:53:30 +02:00
Joerg Roedel
42a698f40a amd-iommu: print ivhd information to dmesg when requested
Add information about devices belonging to an IOMMU as described in the
IVRS ACPI table to the kernel log if amd_iommu_dump was specified on the
kernel command line.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 17:52:04 +02:00
Joerg Roedel
9c72041f71 amd-iommu: add dump for iommus described in ivrs table
Add information about IOMMU devices described in the IVRS ACPI table to
the kernel log if amd_iommu_dump was specified on the kernel command
line.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 17:50:56 +02:00
Joerg Roedel
fefda117dd amd-iommu: add amd_iommu_dump parameter
This kernel parameter will be useful to get some AMD IOMMU related
information in dmesg that is not necessary for the default user but may
be helpful in debug situations.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-05-28 17:49:56 +02:00
Petr Tesarik
7d96fd41ca x86: move rdtsc_barrier() into the TSC vread method
The *fence instructions were moved to vsyscall_64.c by commit
cb9e35dce9.  But this breaks the
vDSO, because vread methods are also called from there.

Besides, the synchronization might be unnecessary for other
time sources than TSC.

[ Impact: fix potential time warp in VDSO ]

Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
LKML-Reference: <9d0ea9ea0f866bdc1f4d76831221ae117f11ea67.1243241859.git.ptesarik@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@kernel.org>
2009-05-28 14:15:54 +02:00
Pallipadi, Venkatesh
ee1ca48fae ACPI: Disable ARB_DISABLE on platforms where it is not needed
ARB_DISABLE is a NOP on all of the recent Intel platforms.

For such platforms, reduce contention on c3_lock
by skipping the fake ARB_DISABLE.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-05-27 21:57:30 -04:00
Andreas Herrmann
ca446d0635 [CPUFREQ] powernow-k8: determine exact CPU frequency for HW Pstates
Slightly modified by trenn@suse.de -> only do this on fam 10h and fam 11h.

Currently powernow-k8 determines CPU frequency from ACPI PSS objects, but
according to AMD family 11h BKDG this frequency is just a rounded value:

  "CoreFreq (MHz) = The CPU COF specified by MSRC001_00[6B:64][CpuFid]
  rounded to the nearest 100 Mhz."

As a consequnce powernow-k8 reports wrong CPU frequency on some systems,
e.g. on Turion X2 Ultra:

  powernow-k8: Found 1 AMD Turion(tm)X2 Ultra DualCore Mobile ZM-82
               processors (2 cpu cores) (version 2.20.00)
  powernow-k8:    0 : pstate 0 (2200 MHz)
  powernow-k8:    1 : pstate 1 (1100 MHz)
  powernow-k8:    2 : pstate 2 (600 MHz)

But this is wrong as frequency for Pstate2 is 550 MHz. x86info reports it
correctly:

  #x86info -a |grep Pstate
  ...
  Pstate-0: fid=e, did=0, vid=24 (2200MHz)
  Pstate-1: fid=e, did=1, vid=30 (1100MHz)
  Pstate-2: fid=e, did=2, vid=3c (550MHz) (current)

Solution is to determine the frequency directly from Pstate MSRs instead
of using rounded values from ACPI table.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-05-26 12:04:51 -04:00
Thomas Renninger
df1829770d [CPUFREQ] powernow-k8 cleanup msg if BIOS does not export ACPI _PSS cpufreq data
- Make the message shorter and easier to grep for
- Use printk_once instead of WARN_ONCE (functionality of these was mixed)

Signed-off-by: Thomas Renninger <trenn@suse.de>
Cc: Langsdorf, Mark <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-05-26 12:04:51 -04:00
Dave Jones
d38e73e8da [CPUFREQ] powernow-k7 build fix when ACPI=n
arch/x86/kernel/cpu/cpufreq/powernow-k7.c:172: warning: 'invalidate_entry' defined but not used

Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-05-26 12:04:50 -04:00
Jarod Wilson
4319503779 [CPUFREQ] add atom family to p4-clockmod
Some atom procs don't do freq scaling (such as the atom 330 on my own
littlefalls2 board). By adding the atom family here, we at least get
the benefit of passive cooling in a thermal emergency. Not sure how
to see that its actually helping any, but the driver does bind and
claim its functioning on my atom 330.

Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-05-26 12:04:50 -04:00
Ingo Molnar
aaba98018b perf_counter, x86: Make NMI lockups more robust
We have a debug check that detects stuck NMIs and returns with
the PMU disabled in the global ctrl MSR - but i managed to trigger
a situation where this was not enough to deassert the NMI.

So clear/reset the full PMU and keep the disable count balanced when
exiting from here. This way the box produces a debug warning but
stays up and is more debuggable.

[ Impact: in case of PMU related bugs, recover more gracefully ]

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-26 09:52:03 +02:00
Ingo Molnar
79202ba9ff perf_counter, x86: Fix APIC NMI programming
My Nehalem box locks up in certain situations (with an
always-asserted NMI causing a lockup) if the PMU LVT
entry is programmed between NMI and IRQ mode with a
high frequency.

Standardize exlusively on NMIs instead.

[ Impact: fix lockup ]

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-26 09:49:28 +02:00
Ingo Molnar
53b441a565 Revert "perf_counter, x86: speed up the scheduling fast-path"
This reverts commit b68f1d2e7a.

It is causing problems (stuck/stuttering profiling) - when mixed
NMI and non-NMI counters are used.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <20090525153931.703093461@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-25 21:41:28 +02:00
Peter Zijlstra
a78ac32587 perf_counter: Generic per counter interrupt throttle
Introduce a generic per counter interrupt throttle.

This uses the perf_counter_overflow() quick disable to throttle a specific
counter when its going too fast when a pmu->unthrottle() method is provided
which can undo the quick disable.

Power needs to implement both the quick disable and the unthrottle method.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <20090525153931.703093461@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-25 21:41:12 +02:00
Peter Zijlstra
48e22d56ec perf_counter: x86: Remove interrupt throttle
remove the x86 specific interrupt throttle

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <20090525153931.616671838@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-25 21:41:12 +02:00
Peter Zijlstra
ff99be573e perf_counter: x86: Expose INV and EDGE bits
Expose the INV and EDGE bits of the PMU to raw configs.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <20090525153931.494709027@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-25 21:41:11 +02:00
Tejun Heo
71c9d8b68b x86: Remove remap percpu allocator for the time being
Remap percpu allocator has subtle bug when combined with page
attribute changing.  Remap percpu allocator aliases PMD pages for the
first chunk and as pageattr doesn't know about the alias it ends up
updating page attributes of the original mapping thus leaving the
alises in inconsistent state which might lead to subtle data
corruption.  Please read the following threads for more information:

  http://thread.gmane.org/gmane.linux.kernel/835783

The following is the proposed fix which teaches pageattr about percpu
aliases.

  http://thread.gmane.org/gmane.linux.kernel/837157

However, the above changes are deemed too pervasive for upstream
inclusion for 2.6.30 release, so this patch essentially disables
the remap allocator for the time being.

Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <4A1A0A27.4050301@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-25 05:37:55 +02:00
H. Peter Anvin
ee0736627d Merge branch 'x86/urgent' into x86/setup
Resolved conflicts:
	arch/x86/boot/memory.c

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-23 16:42:19 -07:00
Suresh Siddha
0c752a9335 x86: introduce noxsave boot parameter
Introduce "noxsave" boot parameter which will disable the cpu's xsave/xrstor
capabilities. Useful for debugging and working around xsave related issues.

[ Impact: make it possible to debug problems in the field ]

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-22 13:10:54 -07:00
Paul Mackerras
a63eaf34ae perf_counter: Dynamically allocate tasks' perf_counter_context struct
This replaces the struct perf_counter_context in the task_struct with
a pointer to a dynamically allocated perf_counter_context struct.  The
main reason for doing is this is to allow us to transfer a
perf_counter_context from one task to another when we do lazy PMU
switching in a later patch.

This has a few side-benefits: the task_struct becomes a little smaller,
we save some memory because only tasks that have perf_counters attached
get a perf_counter_context allocated for them, and we can remove the
inclusion of <linux/perf_counter.h> in sched.h, meaning that we don't
end up recompiling nearly everything whenever perf_counter.h changes.

The perf_counter_context structures are reference-counted and freed
when the last reference is dropped.  A context can have references
from its task and the counters on its task.  Counters can outlive the
task so it is possible that a context will be freed well after its
task has exited.

Contexts are allocated on fork if the parent had a context, or
otherwise the first time that a per-task counter is created on a task.
In the latter case, we set the context pointer in the task struct
locklessly using an atomic compare-and-exchange operation in case we
raced with some other task in creating a context for the subject task.

This also removes the task pointer from the perf_counter struct.  The
task pointer was not used anywhere and would make it harder to move a
context from one task to another.  Anything that needed to know which
task a counter was attached to was already using counter->ctx->task.

The __perf_counter_init_context function moves up in perf_counter.c
so that it can be called from find_get_context, and now initializes
the refcount, but is otherwise unchanged.

We were potentially calling list_del_counter twice: once from
__perf_counter_exit_task when the task exits and once from
__perf_counter_remove_from_context when the counter's fd gets closed.
This adds a check in list_del_counter so it doesn't do anything if
the counter has already been removed from the lists.

Since perf_counter_task_sched_in doesn't do anything if the task doesn't
have a context, and leaves cpuctx->task_ctx = NULL, this adds code to
__perf_install_in_context to set cpuctx->task_ctx if necessary, i.e. in
the case where the current task adds the first counter to itself and
thus creates a context for itself.

This also adds similar code to __perf_counter_enable to handle a
similar situation which can arise when the counters have been disabled
using prctl; that also leaves cpuctx->task_ctx = NULL.

[ Impact: refactor counter context management to prepare for new feature ]

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <18966.10075.781053.231153@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-22 12:18:19 +02:00
Zhang Rui
88dff4936c x86: DMI match for the Sony VGN-Z540N as it needs BIOS reboot
x86: DMI match for the Sony VGN-Z540N as it needs BIOS reboot,
see:

  http://bugzilla.kernel.org/show_bug.cgi?id=12901

[ Impact: fix hung reboot on certain systems ]

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <lenb@kernel.org>
LKML-Reference: <1242963350.32574.53.camel@rzhang-dt>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-22 09:11:30 +02:00
Ingo Molnar
34adc80622 perf_counter: Fix context removal deadlock
Disable the PMU globally before removing a counter from a
context. This fixes the following lockup:

[22081.741922] ------------[ cut here ]------------
[22081.746668] WARNING: at arch/x86/kernel/cpu/perf_counter.c:803 intel_pmu_handle_irq+0x9b/0x24e()
[22081.755624] Hardware name: X8DTN
[22081.758903] perfcounters: irq loop stuck!
[22081.762985] Modules linked in:
[22081.766136] Pid: 11082, comm: perf Not tainted 2.6.30-rc6-tip #226
[22081.772432] Call Trace:
[22081.774940]  <NMI>  [<ffffffff81019aed>] ? intel_pmu_handle_irq+0x9b/0x24e
[22081.781993]  [<ffffffff81019aed>] ? intel_pmu_handle_irq+0x9b/0x24e
[22081.788368]  [<ffffffff8104505c>] ? warn_slowpath_common+0x77/0xa3
[22081.794649]  [<ffffffff810450d3>] ? warn_slowpath_fmt+0x40/0x45
[22081.800696]  [<ffffffff81019aed>] ? intel_pmu_handle_irq+0x9b/0x24e
[22081.807080]  [<ffffffff814d1a72>] ? perf_counter_nmi_handler+0x3f/0x4a
[22081.813751]  [<ffffffff814d2d09>] ? notifier_call_chain+0x58/0x86
[22081.819951]  [<ffffffff8105b250>] ? notify_die+0x2d/0x32
[22081.825392]  [<ffffffff814d1414>] ? do_nmi+0x8e/0x242
[22081.830538]  [<ffffffff814d0f0a>] ? nmi+0x1a/0x20
[22081.835342]  [<ffffffff8117e102>] ? selinux_file_free_security+0x0/0x1a
[22081.842105]  [<ffffffff81018793>] ? x86_pmu_disable_counter+0x15/0x41
[22081.848673]  <<EOE>>  [<ffffffff81018f3d>] ? x86_pmu_disable+0x86/0x103
[22081.855512]  [<ffffffff8108fedd>] ? __perf_counter_remove_from_context+0x0/0xfe
[22081.862926]  [<ffffffff8108fcbc>] ? counter_sched_out+0x30/0xce
[22081.868909]  [<ffffffff8108ff36>] ? __perf_counter_remove_from_context+0x59/0xfe
[22081.876382]  [<ffffffff8106808a>] ? smp_call_function_single+0x6c/0xe6
[22081.882955]  [<ffffffff81091b96>] ? perf_release+0x86/0x14c
[22081.888600]  [<ffffffff810c4c84>] ? __fput+0xe7/0x195
[22081.893718]  [<ffffffff810c213e>] ? filp_close+0x5b/0x62
[22081.899107]  [<ffffffff81046a70>] ? put_files_struct+0x64/0xc2
[22081.905031]  [<ffffffff8104841a>] ? do_exit+0x1e2/0x6ef
[22081.910360]  [<ffffffff814d0a60>] ? _spin_lock_irqsave+0x9/0xe
[22081.916292]  [<ffffffff8104898e>] ? do_group_exit+0x67/0x93
[22081.921953]  [<ffffffff810489cc>] ? sys_exit_group+0x12/0x16
[22081.927759]  [<ffffffff8100baab>] ? system_call_fastpath+0x16/0x1b
[22081.934076] ---[ end trace 3a3936ce3e1b4505 ]---

And could potentially also fix the lockup reported by Marcelo Tosatti.

Also, print more debug info in case of a detected lockup.

[ Impact: fix lockup ]

Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-20 20:12:54 +02:00
Yinghai Lu
4c6f18fc81 x86, io-apic: Don't mark pin_programmed early
Peter bisected that:

| commit b9c61b7007
| Date:   Wed May 6 10:10:06 2009 -0700
|
|     x86/pci: update pirq_enable_irq() to setup io apic routing
|
|     So we can set io apic routing only when enabling the device irq.

wrecked his opteron box, ata1 interrupts fail to get through.

ata1 is using irq 11:

[    1.451839] sata_svw 0000:01:0e.0: version 2.3
[    1.456333] sata_svw 0000:01:0e.0: PCI INT A -> GSI 11 (level, low) -> IRQ 11
[    1.463639] scsi0 : sata_svw
[    1.466949] scsi1 : sata_svw
[    1.470022] scsi2 : sata_svw
[    1.473090] scsi3 : sata_svw
[    1.476112] ata1: SATA max UDMA/133 mmio m8192@0xff3fe000 port 0xff3fe000 irq 11
[    1.483490] ata2: SATA max UDMA/133 mmio m8192@0xff3fe000 port 0xff3fe100 irq 11
[    1.490870] ata3: SATA max UDMA/133 mmio m8192@0xff3fe000 port 0xff3fe200 irq 11
[    1.498247] ata4: SATA max UDMA/133 mmio m8192@0xff3fe000 port 0xff3fe300 irq 11

that pin is overlapped with pin with legacy ones.

We should not set bits in pin_programmed here, so that those bit could
be set later via io_apic_set_pci_routing().

[ Impact: fix boot hang on certain systems ]

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org>
Tested-by: Peter Zijlstra <peterz@infradead.org>
Cc: Jack Steiner <steiner@sgi.com>
LKML-Reference: <4A119990.9020606@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-19 14:26:51 +02:00
Linus Torvalds
13bba6fda9 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix performance regression caused by paravirt_ops on native kernels
  xen: use header for EXPORT_SYMBOL_GPL
  x86, 32-bit: fix kernel_trap_sp()
  x86: fix percpu_{to,from}_op()
  x86: mtrr: Fix high_width computation when phys-addr is >= 44bit
  x86: Fix false positive section mismatch warnings in the apic code
2009-05-18 09:17:37 -07:00
Linus Torvalds
0130b2d701 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: Append prompt in /debug/tracing/README file
  x86/function-graph: fix constraint for recording old return value
2009-05-18 09:15:41 -07:00
Ingo Molnar
1079cac0f4 Merge commit 'v2.6.30-rc6' into tracing/core
Merge reason: we were on an -rc4 base, sync up to -rc6

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-18 10:15:35 +02:00
Ingo Molnar
b68f1d2e7a perf_counter, x86: speed up the scheduling fast-path
We have to set up the LVT entry only at counter init time, not at
every switch-in time.

There's friction between NMI and non-NMI use here - we'll probably
remove the per counter configurability of it - but until then, dont
slow down things ...

[ Impact: micro-optimization ]

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-18 09:37:09 +02:00
Yinghai Lu
f1bdb52388 x86, irq: don't call mp_config_acpi_gsi() if update_mptable is not enabled
Len expressed concern that the update_mptable feature has
side-effects on the ACPI code.

Make it sure explicitly that the code only ever gets called if
the (default disabled) update_mptable boot quirk option is
disabled.

[ Impact: isolate the update_mptable feature from ACPI code more ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Len Brown <lenb@kernel.org>
LKML-Reference: <4A0DC832.5090200@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-18 09:33:29 +02:00
Yinghai Lu
629e15d245 x86, irq: update_mptable needs pci_routeirq
To get all device irq routing and to save them.

This is basically an implicit pci=routeirq enablement if (and on if)
the update_mptable boot option (which is off by default) has been
specified.

[ Impact: extend the update_mptable boot opion's scope ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
LKML-Reference: <4A0DB7B4.4060702@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-18 09:33:17 +02:00
Yinghai Lu
35d5a9a614 x86: fix system without memory on node0
Jack found a boot crash on a system which doesn't have memory on node0.

It turns out with recent per_cpu changes, node_number for BSP will always
be 0, and it is not consistent to cpu_to_node() that might set it to a
different (nearer) node already.

aka when numa_set_node() for node0 is called early before per_cpu area is
setup:

two places touched that per_cpu(node_number,):

1. in cpu/common.c::cpu_init() and it is not for BP
| #ifdef CONFIG_NUMA
|        if (cpu != 0 && percpu_read(node_number) == 0 &&
|            cpu_to_node(cpu) != NUMA_NO_NODE)
|                percpu_write(node_number, cpu_to_node(cpu));
| #endif
for BP: traps_init ==> cpu_init
for AP: start_secondary ==> cpu_init

2. cpu/intel.c or amd.c::srat_detect_node via numa_set_node()
for BP: check_bugs ==> identify_boot_cpu ==> identify_cpu()
	 that is rather later before numa_node_id() is used for BP...
for AP: start_secondary => smp_callin => smp_store_cpu_info() =>
	=> identify_secondary_cpu => identify_cpu()

so try to set that for BP earlier in setup_per_cpu_areas(), and
don't bother to set that for APs there (it will be updated later
and will be used later)

(and don't mess the 0 before the copying BP per_cpu data to APs)

[ Impact: fix boot crash on memoryless node-0 ]

Reported-and-tested-by: Jack Steiner <steiner@sgi.com>
Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4A0C4A02.7050401@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-18 09:27:09 +02:00
Ingo Molnar
b286e21868 Merge commit 'v2.6.30-rc6' into x86/mm
Merge reason: sync up to -rc6 which has changes to mm/ which we are
              going to touch in the commits to follow as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-18 09:12:51 +02:00
Yinghai Lu
2759c3287d x86: don't call read_apic_id if !cpu_has_apic
should not call that if apic is disabled.

[ Impact: fix crash on certain UP configs ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
LKML-Reference: <4A09CCBB.2000306@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-18 08:43:25 +02:00
Yinghai Lu
e5198075c6 x86, apic: introduce io_apic_irq_attr
according to Ingo, io_apic irq-setup related functions have too many
parameters with a repetitive signature.

So reduce related funcs to get less params by passing a pointer
to a newly defined io_apic_irq_attr structure.

v2: io_apic_irq ==> irq_attr
    triggering ==> trigger

v3: add set_io_apic_irq_attr

[ Impact: cleanup ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Len Brown <lenb@kernel.org>
LKML-Reference: <4A08ACD3.2070401@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-18 08:38:55 +02:00
Ingo Molnar
dc3f81b129 Merge commit 'v2.6.30-rc6' into perfcounters/core
Merge reason: this branch was on an -rc4 base, merge it up to -rc6
              to get the latest upstream fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-18 07:37:49 +02:00
Ingo Molnar
d2517a49d5 perf_counter, x86: fix zero irq_period counters
The quirk to irq_period unearthed an unrobustness we had in the
hw_counter initialization sequence: we left irq_period at 0, which
was then quirked up to 2 ... which then generated a _lot_ of
interrupts during 'perf stat' runs, slowed them down and skewed
the counter results in general.

Initialize irq_period to the maximum instead.

[ Impact: fix perf stat results ]

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-17 12:27:37 +02:00
Jeremy Fitzhardinge
b4ecc12699 x86: Fix performance regression caused by paravirt_ops on native kernels
Xiaohui Xin and some other folks at Intel have been looking into what's
behind the performance hit of paravirt_ops when running native.

It appears that the hit is entirely due to the paravirtualized
spinlocks introduced by:

 | commit 8efcbab674
 | Date:   Mon Jul 7 12:07:51 2008 -0700
 |
 |     paravirt: introduce a "lock-byte" spinlock implementation

The extra call/return in the spinlock path is somehow
causing an increase in the cycles/instruction of somewhere around 2-7%
(seems to vary quite a lot from test to test).  The working theory is
that the CPU's pipeline is getting upset about the
call->call->locked-op->return->return, and seems to be failing to
speculate (though I haven't seen anything definitive about the precise
reasons).  This doesn't entirely make sense, because the performance
hit is also visible on unlock and other operations which don't involve
locked instructions.  But spinlock operations clearly swamp all the
other pvops operations, even though I can't imagine that they're
nearly as common (there's only a .05% increase in instructions
executed).

If I disable just the pv-spinlock calls, my tests show that pvops is
identical to non-pvops performance on native (my measurements show that
it is actually about .1% faster, but Xiaohui shows a .05% slowdown).

Summary of results, averaging 10 runs of the "mmperf" test, using a
no-pvops build as baseline:

		nopv		Pv-nospin	Pv-spin
CPU cycles	100.00%		99.89%		102.18%
instructions	100.00%		100.10%		100.15%
CPI		100.00%		99.79%		102.03%
cache ref	100.00%		100.84%		100.28%
cache miss	100.00%		90.47%		88.56%
cache miss rate	100.00%		89.72%		88.31%
branches	100.00%		99.93%		100.04%
branch miss	100.00%		103.66%		107.72%
branch miss rt	100.00%		103.73%		107.67%
wallclock	100.00%		99.90%		102.20%

The clear effect here is that the 2% increase in CPI is
directly reflected in the final wallclock time.

(The other interesting effect is that the more ops are
out of line calls via pvops, the lower the cache access
and miss rates.  Not too surprising, but it suggests that
the non-pvops kernel is over-inlined.  On the flipside,
the branch misses go up correspondingly...)

So, what's the fix?

Paravirt patching turns all the pvops calls into direct calls, so
_spin_lock etc do end up having direct calls.  For example, the compiler
generated code for paravirtualized _spin_lock is:

<_spin_lock+0>:		mov    %gs:0xb4c8,%rax
<_spin_lock+9>:		incl   0xffffffffffffe044(%rax)
<_spin_lock+15>:	callq  *0xffffffff805a5b30
<_spin_lock+22>:	retq

The indirect call will get patched to:
<_spin_lock+0>:		mov    %gs:0xb4c8,%rax
<_spin_lock+9>:		incl   0xffffffffffffe044(%rax)
<_spin_lock+15>:	callq <__ticket_spin_lock>
<_spin_lock+20>:	nop; nop		/* or whatever 2-byte nop */
<_spin_lock+22>:	retq

One possibility is to inline _spin_lock, etc, when building an
optimised kernel (ie, when there's no spinlock/preempt
instrumentation/debugging enabled).  That will remove the outer
call/return pair, returning the instruction stream to a single
call/return, which will presumably execute the same as the non-pvops
case.  The downsides arel 1) it will replicate the
preempt_disable/enable code at eack lock/unlock callsite; this code is
fairly small, but not nothing; and 2) the spinlock definitions are
already a very heavily tangled mass of #ifdefs and other preprocessor
magic, and making any changes will be non-trivial.

The other obvious answer is to disable pv-spinlocks.  Making them a
separate config option is fairly easy, and it would be trivial to
enable them only when Xen is enabled (as the only non-default user).
But it doesn't really address the common case of a distro build which
is going to have Xen support enabled, and leaves the open question of
whether the native performance cost of pv-spinlocks is worth the
performance improvement on a loaded Xen system (10% saving of overall
system CPU when guests block rather than spin).  Still it is a
reasonable short-term workaround.

[ Impact: fix pvops performance regression when running native ]

Analysed-by: "Xin Xiaohui" <xiaohui.xin@intel.com>
Analysed-by: "Li Xin" <xin.li@intel.com>
Analysed-by: "Nakajima Jun" <jun.nakajima@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Xen-devel <xen-devel@lists.xensource.com>
LKML-Reference: <4A0B62F7.5030802@goop.org>
[ fixed the help text ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15 20:07:42 +02:00
Jaswinder Singh Rajput
52650257ea x86, mtrr: replace MTRRdefType_MSR with msr-index's MSR_MTRRdefType
Use standard msr-index.h's MSR declaration and no need to declare again.

[ Impact: cleanup, no object code change ]

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-15 07:49:01 -07:00
Jaswinder Singh Rajput
ba5673ff1f x86, mtrr: replace MTRRfix4K_C0000_MSR with msr-index's MSR_MTRRfix4K_C0000
Use standard msr-index.h's MSR declaration and no need to declare again.

[ Impact: cleanup, no object code change ]

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-15 07:49:01 -07:00
Jaswinder Singh Rajput
654ac05801 x86, mtrr: remove mtrr MSRs double declaration
Removed MTRR MSR from mtrr/mtrr.h as these are already declared in
msr-index.h and nobody is using them:
 MTRRfix16K_A0000_MSR
 MTRRfix4K_C8000_MSR
 MTRRfix4K_D0000_MSR
 MTRRfix4K_D8000_MSR
 MTRRfix4K_E0000_MSR
 MTRRfix4K_E8000_MSR
 MTRRfix4K_F0000_MSR
 MTRRfix4K_F8000_MSR

Use standard msr-index.h's MSR declaration and no need to declare again

[ Impact: cleanup, no object code change ]

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-15 07:49:01 -07:00
Jaswinder Singh Rajput
7d9d55e449 x86, mtrr: replace MTRRfix16K_80000_MSR with msr-index's MSR_MTRRfix16K_80000
Use standard msr-index.h's MSR declaration and no need to declare again

[ Impact: cleanup, no object code change ]

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-15 07:49:00 -07:00
Jaswinder Singh Rajput
a036c7a358 x86, mtrr: replace MTRRfix64K_00000_MSR with msr-index's MSR_MTRRfix64K_00000
Use standard msr-index.h's MSR declaration and no need to declare again.

[ Impact: cleanup, no object code change ]

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-15 07:49:00 -07:00
Jaswinder Singh Rajput
d9bcc01d58 x86, mtrr: replace MTRRcap_MSR with msr-index's MSR_MTRRcap
Use standard msr-index.h's MSR declaration and no need to declare again.

[ Impact: cleanup, no object code change ]

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-15 07:49:00 -07:00
Peter Zijlstra
60db5e09c1 perf_counter: frequency based adaptive irq_period
Instead of specifying the irq_period for a counter, provide a target interrupt
frequency and dynamically adapt the irq_period to match this frequency.

[ Impact: new perf-counter attribute/feature ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <20090515132018.646195868@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15 15:26:56 +02:00
Jason Wessel
33ab1979bc kgdb,i386: use address that SP register points to in the exception frame
The treatment of the SP register is different on x86_64 and i386.
This is a regression fix that lived outside the mainline kernel from
2.6.27 to now.  The regression was a result of the original merge
consolidation of the i386 and x86_64 archs to x86.

The incorrectly reported SP on i386 prevented stack tracebacks from
working correctly in gdb.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2009-05-15 07:56:25 -05:00
Ingo Molnar
9029a5e380 perf_counter: x86: Protect against infinite loops in intel_pmu_handle_irq()
intel_pmu_handle_irq() can lock up in an infinite loop if the hardware
does not allow the acking of irqs. Alas, this happened in testing so
make this robust and emit a warning if it happens in the future.

Also, clean up the IRQ handlers a bit.

[ Impact: improve perfcounter irq/nmi handling robustness ]

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15 09:47:06 +02:00
Ingo Molnar
1c80f4b598 perf_counter: x86: Disallow interval of 1
On certain CPUs i have observed a stuck PMU if interval was set to
1 and NMIs were used. The PMU had PMC0 set in MSR_CORE_PERF_GLOBAL_STATUS,
but it was not possible to ack it via MSR_CORE_PERF_GLOBAL_OVF_CTRL,
and the NMI loop got stuck infinitely.

[ Impact: fix rare hangs during high perfcounter load ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15 09:47:05 +02:00
Peter Zijlstra
a4016a79fc perf_counter: x86: Robustify interrupt handling
Two consecutive NMIs could daze and confuse the machine when the
first would handle the overflow of both counters.

[ Impact: fix false-positive syslog messages under multi-session profiling ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15 09:47:03 +02:00
Peter Zijlstra
9e35ad388b perf_counter: Rework the perf counter disable/enable
The current disable/enable mechanism is:

	token = hw_perf_save_disable();
	...
	/* do bits */
	...
	hw_perf_restore(token);

This works well, provided that the use nests properly. Except we don't.

x86 NMI/INT throttling has non-nested use of this, breaking things. Therefore
provide a reference counter disable/enable interface, where the first disable
disables the hardware, and the last enable enables the hardware again.

[ Impact: refactor, simplify the PMU disable/enable logic ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15 09:47:02 +02:00
Peter Zijlstra
962bf7a66e perf_counter: x86: Fix up the amd NMI/INT throttle
perf_counter_unthrottle() restores throttle_ctrl, buts its never set.
Also, we fail to disable all counters when throttling.

[ Impact: fix rare stuck perf-counters when they are throttled ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15 09:47:01 +02:00
Peter Zijlstra
a026dfecc0 perf_counter: x86: Allow unpriviliged use of NMIs
Apply sysctl_perf_counter_priv to NMIs. Also, fail the counter
creation instead of silently down-grading to regular interrupts.

[ Impact: allow wider perf-counter usage ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15 09:46:57 +02:00
Ingo Molnar
f5a5a2f6e6 perf_counter: x86: Fix throttling
If counters are disabled globally when a perfcounter IRQ/NMI hits,
and if we throttle in that case, we'll promote the '0' value to
the next lapic IRQ and disable all perfcounters at that point,
permanently ...

Fix it.

[ Impact: fix hung perfcounters under load ]

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15 09:46:56 +02:00
Peter Zijlstra
ec3232bdf8 perf_counter: x86: More accurate counter update
Take the counter width into account instead of assuming 32 bits.

In particular Nehalem has 44 bit wide counters, and all
arithmetics should happen on a 44-bit signed integer basis.

[ Impact: fix rare event imprecision, warning message on Nehalem ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15 09:46:54 +02:00
Steven Rostedt
29a679754b x86/stacktrace: return 0 instead of -1 for stack ops
If we return -1 in the ops->stack for the stacktrace saving, we end up
breaking out of the loop if the stack we are tracing is in the exception
stack. This causes traces like:

          <idle>-0     [002] 34263.745825: raise_softirq_irqoff <-__blk_complete_request
          <idle>-0     [002] 34263.745826:
 <= 0
 <= 0
 <= 0
 <= 0
 <= 0
 <= 0
 <= 0

By returning "0" instead, the irq stack is saved as well, and we see:

          <idle>-0     [003]   883.280992: raise_softirq_irqoff <-__hrtimer_star
t_range_ns
          <idle>-0     [003]   883.280992:
 <= hrtimer_start_range_ns
 <= tick_nohz_restart_sched_tick
 <= cpu_idle
 <= start_secondary
 <=
 <= 0
 <= 0

[ Impact: record stacks from interrupts ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-14 23:19:09 -04:00
Steven Rostedt
aa512a27e9 x86/function-graph: fix constraint for recording old return value
After upgrading from gcc 4.2.2 to 4.4.0, the function graph tracer broke.
Investigating, I found that in the asm that replaces the return value,
gcc was using the same register for the old value as it was for the
new value.

	mov	(addr), old
	mov	new, (addr)

But if old and new are the same register, we clobber new with old!
I first thought this was a bug in gcc 4.4.0 and reported it:

  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40132

Andrew Pinski responded (quickly), saying that it was correct gcc behavior
and the code needed to denote old as an "early clobber".

Instead of "=r"(old), we need "=&r"(old).

[Impact: keep function graph tracer from breaking with gcc 4.4.0 ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-13 13:52:19 -04:00
Arun R Bharadwaj
5c333864a6 timers: Identifying the existing pinned timers
* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-16 12:11:36]:

The following pinned hrtimers have been identified and marked:
1)sched_rt_period_timer
2)tick_sched_timer
3)stack_trace_timer_fn

[ tglx: fixup the hrtimer pinned mode ]

Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-05-13 16:52:42 +02:00
Peter Zijlstra
5bb9efe33e perf_counter: fix print debug irq disable
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
bash/15802 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (sysrq_key_table_lock){?.....},

Don't unconditionally enable interrupts in the perf_counter_print_debug()
path.

[ Impact: fix potential deadlock pointed out by lockdep ]

LKML-Reference: <new-submission>
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2009-05-13 08:17:37 +02:00
Yinghai Lu
4797f6b021 x86: read apic ID in the !acpi_lapic case
Ed found that on 32-bit, boot_cpu_physical_apicid is not read right,
when the mptable is broken.

Interestingly, actually three paths use/set it:

 1. acpi: at that time that is already read from reg
 2. mptable: only read from mptable
 3. no madt, and no mptable, that use default apic id 0 for 64-bit, -1 for 32-bit

so we could read the apic id for the 2/3 path. We trust the hardware
register more than we trust a BIOS data structure (the mptable).

We can also avoid the double set_fixmap() when acpi_lapic
is used, and also need to move cpu_has_apic earlier and
call apic_disable().

Also when need to update the apic id, we'd better read and
set the apic version as well - so that quirks are applied precisely.

v2: make path 3 with 64bit, use -1 as apic id, so could read it later.
v3: fix whitespace problem pointed out by Ed Swierk
v5: fix boot crash

[ Impact: get correct apic id for bsp other than acpi path ]

Reported-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <49FC85A9.2070702@kernel.org>
[ v4: sanity-check in the ACPI case too ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-12 12:22:06 +02:00
Ingo Molnar
6cda3eb62e Merge branch 'x86/apic' into irq/numa
Merge reason: both topics modify the APIC code but were able to do it in
              parallel so far. An upcoming patch generates a conflict so
              merge them to avoid the conflict.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-12 12:17:36 +02:00
Amerigo Wang
bf78ad69cd x86: process.c, remove useless headers
<stdarg.h> is not needed by these files, remove them.

[ Impact: cleanup ]

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: akpm@linux-foundation.org
LKML-Reference: <20090512032956.5040.77055.sendpatchset@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-12 11:26:32 +02:00
Amerigo Wang
9d62dcdfa6 x86: merge process.c a bit
Merge arch_align_stack() and arch_randomize_brk(), since
they are the same.

Tested on x86_64.

[ Impact: cleanup ]

Signed-off-by: Amerigo Wang <amwang@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-12 11:13:45 +02:00
Dmitry Adamushko
871b72dd1e x86: microcode: use smp_call_function_single instead of set_cpus_allowed, cleanup of synchronization logic
* Solve issues described in 6f66cbc630
  in a way that doesn't resort to set_cpus_allowed();

* in fact, only collect_cpu_info and apply_microcode callbacks
  must run on a target cpu, others will do just fine on any other.
  smp_call_function_single() (as suggested by Ingo) is used to run
  these callbacks on a target cpu.

* cleanup of synchronization logic of the 'microcode_core' part

  The generic 'microcode_core' part guarantees that only a single cpu
  (be it a full-fledged cpu, one of the cores or HT)
  is being updated at any particular moment of time.

  In general, there is no need for any additional sync. mechanism in
  arch-specific parts (the patch removes existing spinlocks).

  See also the "Synchronization" section in microcode_core.c.

* return -EINVAL instead of -1 (which is translated into -EPERM) in
  microcode_write(), reload_cpu() and mc_sysdev_add(). Other suggestions
  for an error code?

* use 'enum ucode_state' as return value of request_microcode_{fw, user}
  to gain more flexibility by distinguishing between real error cases
  and situations when an appropriate ucode was not found (which is not an
  error per-se).

* some minor cleanups

Thanks a lot to Hugh Dickins for review/suggestions/testing!

   Reference: http://marc.info/?l=linux-kernel&m=124025889012541&w=2

[ Impact: refactor and clean up microcode driver locking code ]

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Peter Oruba <peter.oruba@amd.com>
Cc: Arjan van de Ven <arjan@infradead.org>
LKML-Reference: <1242078507.5560.9.camel@earth>
[ did some more cleanups ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
 arch/x86/include/asm/microcode.h  |   25 ++
 arch/x86/kernel/microcode_amd.c   |   58 ++----
 arch/x86/kernel/microcode_core.c  |  326 +++++++++++++++++++++-----------------
 arch/x86/kernel/microcode_intel.c |   92 +++-------
 4 files changed, 261 insertions(+), 240 deletions(-)

(~20 new comment lines)
2009-05-12 10:36:44 +02:00
H. Peter Anvin
5031296c57 x86: add extension fields for bootloader type and version
A long ago, in days of yore, it all began with a god named Thor.
There were vikings and boats and some plans for a Linux kernel
header.  Unfortunately, a single 8-bit field was used for bootloader
type and version.  This has generally worked without *too* much pain,
but we're getting close to flat running out of ID fields.

Add extension fields for both type and version.  The type will be
extended if it the old field is 0xE; the version is a simple MSB
extension.

Keep /proc/sys/kernel/bootloader_type containing
(type << 4) + (ver & 0xf) for backwards compatiblity, but also add
/proc/sys/kernel/bootloader_version which contains the full version
number.

[ Impact: new feature to support more bootloaders ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-11 17:45:06 -07:00
H. Peter Anvin
37ba7ab5e3 x86, boot: make kernel_alignment adjustable; new bzImage fields
Make the kernel_alignment field adjustable; this allows us to set it
to a large value (intended to be 16 MB to avoid ZONE_DMA contention,
memory holes and other weirdness) while a smart bootloader can still
force a loading at a lesser alignment if absolutely necessary.

Also export pref_address (preferred loading address, corresponding to
the link-time address) and init_size, the total amount of linear
memory the kernel will require during initialization.

[ Impact: allows better kernel placement, gives bootloader more info ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-11 17:44:39 -07:00
Alexey Dobriyan
0c23590f00 x86, 64-bit: ifdef out struct thread_struct::ip
struct thread_struct::ip isn't used on x86_64, struct pt_regs::ip is used
instead.

kgdb should be reading 0 always, but I can't check it.

[ Impact: (potentially) reduce thread_struct size on 64-bit ]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: containers@lists.linux-foundation.org
LKML-Reference: <20090503233015.GJ16631@x200.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11 16:23:54 +02:00
Cyrill Gorcunov
cec6be6d10 x86: apic: Fixmap apic address even if apic disabled
In case if apic were disabled by boot option
we still need read_apic operation. So fixmap
a fake apic area if needed.

[ Impact: fix boot crash ]

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: yinghai@kernel.org
Cc: eswierk@aristanetworks.com
LKML-Reference: <20090511134140.GH4624@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11 15:50:58 +02:00
Ingo Molnar
41fb454ebe Merge commit 'v2.6.30-rc5' into core/iommu
Merge reason: core/iommu was on an .30-rc1 base,
              update it to .30-rc5 to refresh.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11 14:44:31 +02:00
Andreas Herrmann
97a5271465 x86: display extended apic registers with print_local_APIC and cpu_debug code
Both print_local_APIC (used when apic=debug kernel param is set) and
cpu_debug code missed support for some extended APIC registers that
I'd like to see.

This adds support to show:

 - extended APIC feature register
 - extended APIC control register
 - extended LVT registers

[ Impact: print more debug info ]

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Jaswinder Singh Rajput <jaswinder@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <20090508162350.GO29045@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11 14:37:36 +02:00
Mike Galbraith
8823392360 perf_counter, x86: clean up throttling printk
s/PERFMON/perfcounters for perfcounter interrupt throttling warning.

'perfmon' is the CPU feature name that is Intel-only, while we do
throttling in a generic way.

[ Impact: cleanup ]

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11 12:04:30 +02:00
Yinghai Lu
917a015362 x86: mtrr: Fix high_width computation when phys-addr is >= 44bit
found one system where cpu address line is 44bits, mtrr printout
is not right:

 [    0.000000] MTRR variable ranges enabled:
 [    0.000000]   0 base 0   00000000 mask FF0 00000000 write-back
 [    0.000000]   1 base 10  00000000 mask FFF 80000000 write-back
 [    0.000000]   2 base 0   80000000 mask FFF 80000000 uncachable
 [    0.000000]   3 base 0   7F800000 mask FFF FF800000 uncachable

Li Zefan and Frederic pointed out the high_width could be -4 some how.

It turns out when phys_addr is 44bit, size_or_mask will be
ffffffff,00000000 so ffs(size_or_mask) will be 0.

Try to check low 32 bit, to get correct high_width.

Signed-off-by: Yinghai Lu <yinghai@kerne.org>
Also-analyzed-by: Frederic Weisbecker <fweisbec@gmail.com>
Also-analyzed-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Zhaolei <zhaolei@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4A026540.8060504@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11 11:40:43 +02:00
Yinghai Lu
4401da6111 x86: read apic ID in the !acpi_lapic case
Ed found that on 32-bit, boot_cpu_physical_apicid is not read right,
when the mptable is broken.

Interestingly, actually three paths use/set it:

 1. acpi: at that time that is already read from reg
 2. mptable: only read from mptable
 3. no madt, and no mptable, that use default apic id 0 for 64-bit, -1 for 32-bit

so we could read the apic id for the 2/3 path. We trust the hardware
register more than we trust a BIOS data structure (the mptable).

We can also avoid the double set_fixmap() when acpi_lapic
is used, and also need to move cpu_has_apic earlier and
call apic_disable().

Also when need to update the apic id, we'd better read and
set the apic version as well - so that quirks are applied precisely.

v2: make path 3 with 64bit, use -1 as apic id, so could read it later.
v3: fix whitespace problem pointed out by Ed Swierk

[ Impact: get correct apic id for bsp other than acpi path ]

Reported-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <49FC85A9.2070702@kernel.org>
[ v4: sanity-check in the ACPI case too ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11 11:29:23 +02:00
Yinghai Lu
80989ce064 x86: clean up and and print out initial max_pfn_mapped
Do this so we can check the range that is mapped before
init_memory_mapping().

To be able to print out meaningful info, we first have to fix
64-bit to have max_pfn_mapped assigned before that call. This
also unifies the code-path a bit.

[ Impact: print more debug info, cleanup ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49BF0978.40605@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11 11:11:12 +02:00
Yinghai Lu
3e0c373749 x86: clean up and fix setup_clear/force_cpu_cap handling
setup_force_cpu_cap() only have one user (Xen guest code),
but it should not reuse cleared_cpu_cpus, otherwise it
will have problems on SMP.

Need to have a separate cpu_cpus_set array too, for forced-on
flags, beyond the forced-off flags.

Also need to setup handling before all cpus caps are combined.

[ Impact: fix the forced-set CPU feature flag logic ]

Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11 10:57:24 +02:00
Yinghai Lu
61fe91e131 x86: apic: Check rev 3 fadt correctly for physical_apic bit
Impact: fix fadt version checking

FADT2_REVISION_ID has value 3 aka rev 3 FADT. So need to use >= instead
of >, as other places in the code do.

[ Impact: extend scope of APIC boot quirk ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11 10:52:40 +02:00
Yinghai Lu
b9c61b7007 x86/pci: update pirq_enable_irq() to setup io apic routing
So we can set io apic routing only when enabling the device irq.

This is advantageous for IRQ descriptor allocation affinity: if we set up
the IO-APIC entry later, we have a chance to allocate the IRQ descriptor
later and know which device it is on and can set affinity accordingly.

[ Impact: standardize/enhance irq-enabling sequence for mptable irqs ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4A01C46E.8000501@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11 10:35:10 +02:00
Yinghai Lu
5ef2183768 x86/acpi: move setup io apic routing out of CONFIG_ACPI scope
So we could set io apic routing when ACPI is not enabled.

[ Impact: prepare for new functionality ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Len Brown <lenb@kernel.org>
LKML-Reference: <4A01C422.5070400@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11 10:35:09 +02:00
Yinghai Lu
e20c06fd69 x86/pci: add 4 more return parameters to IO_APIC_get_PCI_irq_vector()
To prepare those params for pcibios_irq_enable() to call setup_io_apic_routing().

[ Impact: extend function call API to prepare for new functionality ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4A01C406.2040303@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11 10:35:09 +02:00
Yinghai Lu
bdfe8ac153 x86/acpi: move pin_programmed bit map to io_apic.c
Prepare to call setup_io_apic_routing() in pcibios_irq_enable()
also remove not needed member apic_id.

[ Impact: clean up, prepare for future change ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Len Brown <lenb@kernel.org>
LKML-Reference: <4A01C3DD.3050104@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11 10:35:08 +02:00