Conflicts:
arch/x86/kernel/apic/io_apic.c
Merge reason: non-trivial interaction between ongoing work in io_apic.c
and the NUMA migration feature in the irq tree.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
move_irq_desc() will try to move irq_desc to the home node if
the allocated one is not correct, in create_irq_nr().
( This can happen on devices that are on different nodes that
are using MSI, when drivers are loaded and unloaded randomly. )
v2: fix non-smp build
v3: add NUMA_IRQ_DESC to eliminate #ifdefs
[ Impact: improve irq descriptor locality on NUMA systems ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <49F95EAE.2050903@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When we don't have any perf-counters active, don't act like we know
what the NMI is for.
[ Impact: fix hard hang with nmi_watchdog=2 ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090501102533.109867793@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is necessary to avoid the conflict of syscall numbers.
Conflicts:
arch/x86/ia32/ia32entry.S
arch/x86/include/asm/unistd_32.h
arch/x86/include/asm/unistd_64.h
Fixes up the borked syscall numbers of perfcounters versus
preadv/pwritev as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch moves the max_pfn_mapped and max_low_pfn_mapped global
variables to kernel/setup.c where they're initialized.
[ Impact: cleanup ]
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <1240923649.1982.21.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Standardize on explicitly mentioning '_mask' in fields that
are not plain flags but masks. This avoids typos like:
if (cpuc->used)
(which could easily slip through review unnoticed), while if a
typo looks like this:
if (cpuc->used_mask)
it might get noticed during review.
[ Impact: cleanup ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1241016956-24648-1-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The mask argument is unused and thus can be removed.
[ Impact: cleanup ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-29-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The function fixed_mode_idx() is used generically. Now it checks the
num_counters_fixed value instead of the vendor to decide if fixed
counters are present.
[ Impact: generalize code ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-28-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The maximum counter period that can be programmed differs between x86
pmus. This introduces a max_period value and allows the generic
implementation for all models to check the max period.
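
A minimal sketch of such a generic check, for illustration only. The struct
and helper names are hypothetical stand-ins, not the kernel's definitions, and
treating a period of 0 as "use the maximum" is an assumption of this sketch:

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the per-model pmu description. */
struct pmu_model {
	uint64_t max_period;	/* largest period the model can be programmed with */
};

/*
 * Clamp a requested period to what the model supports.
 * Mapping 0 to the maximum is an assumption made for this sketch.
 */
static uint64_t clamp_period(const struct pmu_model *pmu, uint64_t period)
{
	if (period == 0 || period > pmu->max_period)
		return pmu->max_period;
	return period;
}
```

Because the limit lives in the model description, the same helper can serve
every model without a vendor check.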
[ Impact: generalize code ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-27-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To check on AMD cpus if a counter overflows, the upper bit of the raw
counter value must be checked. This value is already internally
available in x86_perf_counter_update(). Now, the value is returned so
that it can be used directly to check for overflows.
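
A rough sketch of the upper-bit test, under two assumptions that are not
stated above: the counter is 48 bits wide, and it is programmed with the
negated period so that the top bit stays set until the counter wraps. The
helper name is made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed counter width for this sketch; real hardware may differ. */
#define COUNTER_BITS 48

/*
 * With the counter started at the negated period, the top bit is set
 * while it counts up and clears once the value wraps past zero.
 * Returns true if the raw value indicates an overflow.
 */
static bool counter_overflowed(uint64_t raw)
{
	return !(raw & (1ULL << (COUNTER_BITS - 1)));
}
```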
[ Impact: micro-optimization ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-26-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch implements the interrupt handler for AMD performance
counters. In contrast to the Intel pmu, there is no single status
register and there are no fixed counters. This makes the handler
very different, so it is useful to make the handler vendor
specific. To check whether a counter has overflowed, the upper bit of
the counter is checked. Only counters whose active bit is set are
checked.
With this patch throttling is enabled for AMD performance counters.
This patch also reenables Linux performance counters on AMD cpus.
[ Impact: re-enable perfcounters on AMD CPUs ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-25-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some functions are only called if the pmu was properly initialized, so
the initialization checks in them can be removed. The way to check
initialization changed too. Now, the pointer to the interrupt handler is
checked. If it exists, the pmu is initialized. This also removes a static
variable and uses struct x86_pmu as the only data source for the check.
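
The idea can be sketched as follows. The struct is a hypothetical,
trimmed-down stand-in for struct x86_pmu and the function names are
illustrative only:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the pmu description. */
struct pmu_desc {
	const char *name;
	int (*handle_irq)(int nmi);	/* NULL until a model has been set up */
};

/* Static storage is zero-initialized, so the handler starts out NULL. */
static struct pmu_desc pmu;

static int demo_handle_irq(int nmi)
{
	return nmi;
}

/* No separate "initialized" flag: the handler pointer is the only source. */
static bool pmu_initialized(void)
{
	return pmu.handle_irq != NULL;
}

static void pmu_init(void)
{
	pmu.name = "demo";
	pmu.handle_irq = demo_handle_irq;
}
```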
[ Impact: simplify code ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-24-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
As for the enable function, this patch reworks the disable functions
and introduces x86_pmu_disable_counter(). The internal function i/f in
struct x86_pmu changed too.
[ Impact: refactor and generalize code ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-23-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is vendor specific code in generic x86 code, and there is vendor
specific code that could be generic. This patch introduces
x86_pmu_enable_counter() for x86 generic code. Fixed counter code for
Intel is moved to Intel only functions. In the end, checks and calls
via function pointers were reduced to the necessary minimum. Also, the
internal function i/f changed.
[ Impact: refactor and generalize code ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-22-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The type of counter index is sometimes implemented as unsigned
int. This patch changes this to have a consistent usage of int.
[ Impact: cleanup ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-21-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
cpuc->active will now be used to indicate an enabled counter, which
also implies valid pointers in cpuc->counters[]. In contrast,
cpuc->used only locks the counter, which can still be uninitialized.
[ Impact: refactor and generalize code ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-20-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is to have a consistent naming scheme with cpuc->used.
[ Impact: cleanup ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-19-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This makes the use of the version variable generic. Also, some debug
messages have been generalized.
[ Impact: refactor and generalize code ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-17-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Instead of using a pointer to reference the x86 pmu, we now have one
single data structure that is initialized at the beginning. This saves
the pointer access when using this memory.
[ Impact: micro-optimization ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-15-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds an error handler and changes initialization of struct
x86_pmu. No functional changes. Needed for follow-on patches.
[ Impact: cleanup ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-14-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This function is Intel only and not necessary for AMD cpus.
[ Impact: simplify code ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-11-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This function is Intel only and not necessary for AMD cpus.
[ Impact: simplify code ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-10-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This separates the perfcounter interrupt handler for AMD and Intel
cpus. The AMD interrupt handler implementation is a follow-on patch.
[ Impact: refactor and clean up code ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-9-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch renames struct pmc_x86_ops into struct x86_pmu. It
introduces a structure to describe an x86 model specific pmu
(performance monitoring unit). It may contain ops and data. The new
name of the structure fits better, is shorter, and thus easier to
handle. Where appropriate, function and variable names have
been changed too.
[ Impact: cleanup ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-8-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch renames struct hw_perf_counter_ops into struct pmu. It
introduces a structure to describe a cpu specific pmu (performance
monitoring unit). It may contain ops and data. The new name of the
structure fits better, is shorter, and thus easier to handle. Where
appropriate, function and variable names have been changed too.
[ Impact: cleanup ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-7-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Per-cpu variables need not be protected with cpu barriers
(smp_wmb()). Protection is only needed against preemption on the same cpu
(rescheduling or the nmi handler). This can be done using a compiler
barrier only.
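
A small user-space sketch of the pattern, assuming GCC/Clang inline asm
syntax. The struct and function are made up for illustration; only the
barrier() idiom mirrors what the text describes:

```c
/*
 * Compiler-only barrier (GCC/Clang syntax): emits no instruction, but
 * stops the compiler from reordering memory accesses across it.
 */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Stand-in for state that only the local cpu ever touches. */
struct cpu_state {
	int slot_value;
	int slot_active;
};

static void activate_slot(struct cpu_state *s, int value)
{
	s->slot_value = value;
	/*
	 * Keep the data store ahead of the flag store in program order, so
	 * a handler interrupting this cpu never sees the flag without data.
	 */
	barrier();
	s->slot_active = 1;
}
```

No hardware fence is issued here; that is only sufficient because no other
cpu reads this state.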
[ Impact: micro-optimization ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-6-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
MSR reads and writes are expensive. This patch adds checks to avoid
them where possible.
[ Impact: micro-optimization on AMD CPUs ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-5-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This quits hw counter initialization immediately if no cpu is
detected.
[ Impact: cleanup ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1241002046-8832-4-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
X86_FEATURE_ARCH_PERFMON is an Intel hardware feature that does not
work on AMD CPUs. The flag is now only used in Intel specific code
(especially initialization).
[ Impact: refactor code ]
Signed-off-by: Robert Richter <robert.richter@amd.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1241002046-8832-2-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
32 bit:
- increase alignment from 4 to 8 for .parainstructions
- increase alignment from 4 to 8 for .altinstructions
64 bit:
- move ALIGN() outside output section for .altinstructions
None of the above should result in any functional change.
[ Impact: refactor and unify linker script ]
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Tim Abbott <tabbott@MIT.EDU>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1240991249-27117-10-git-send-email-sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
32-bit:
- Move definition of __init_begin outside output_section
because it covers more than one section
- Move ALIGN() for end-of-section inside .smp_locks output section.
Same effect but the intent is better documented that
we need both start and end aligned.
64-bit:
- Move ALIGN() outside output section in .init.setup
- Deleted unused __smp_alt_* symbols
None of the above should result in any functional change.
[ Impact: refactor and unify linker script ]
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Tim Abbott <tabbott@MIT.EDU>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1240991249-27117-9-git-send-email-sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For 64 bit the following functional changes are introduced:
- .data.page_aligned has moved
- .data.cacheline_aligned has moved
- .data.read_mostly has moved
- ALIGN() moved out of output section for .data.cacheline_aligned
- ALIGN() moved out of output section for .data.page_aligned
Notice that 32 bit and 64 bit have different locations for _edata.
.data_nosave is 32 bit only as 64 bit is special due to PERCPU.
[ Impact: 32-bit: cleanup, 64-bit: use 32-bit linker script ]
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Tim Abbott <tabbott@MIT.EDU>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1240991249-27117-7-git-send-email-sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
32 bit x86 had a dedicated .text.head output section,
whereas 64 bit had it all in a single output section.
In the unified version the dedicated .text.head output section
was kept to have full control over the head code.
32 bit:
- Moved definition of _stext to the linker script.
The definition is located _after_ .text.page_aligned as this
is what 32 bit did before.
The ALIGN(8) was introduced so we hit the exact same address
(on the tested config) before and after the move.
I assume that it is a bug that _stext did not cover the
.text.page_aligned section - if this is true it can be fixed
in a follow-up patch (and the ugly ALIGN() can be dropped).
[ Impact: 64-bit: cleanup, 32-bit: use the 64-bit linker script ]
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Tim Abbott <tabbott@MIT.EDU>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1240991249-27117-5-git-send-email-sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
PHDRS are not equal for the two - so
use ifdefs to cover up for that.
On the assumption that they may become equal the ifdef
is inside the PHDRS definition.
[ Impact: cleanup ]
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Tim Abbott <tabbott@MIT.EDU>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1240991249-27117-3-git-send-email-sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Beautify vmlinux_32.lds.S:
- Use tabs for indent
- Located curly braces like in C code
- Rearranged a few comments
To see actual differences use "git diff -b" which
ignores 'whitespace' changes.
The beautification is done to prepare a unification
of the _32 and _64 variants of the linker scripts.
[ Impact: cleanup ]
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Tim Abbott <tabbott@MIT.EDU>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1240991249-27117-1-git-send-email-sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Try to get irq_desc on the home node in create_irq_nr().
v2: don't check if we can move it when sparse_irq is not used
v3: use move_irq_desc(), if that node is not what we want
[ Impact: optimization, make MSI IRQ descriptors more NUMA aware ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <49F6559F.7070005@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make actual use of the device parameter passed down to
io_apic_set_pci_routing() - to have the IRQ descriptor
on the home node of the device.
If no device has been passed down, we assume it's a platform
device and use the boot node ID for the IRQ descriptor.
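
The node selection described above amounts to something like the following
sketch. The struct, its field, and the helper are hypothetical; the real code
would obtain the node through dev_to_node():

```c
#include <stddef.h>

/* Hypothetical stand-in for a device that knows its NUMA node. */
struct demo_device {
	int numa_node;
};

/*
 * Pick the node for the IRQ descriptor: the device's home node if a
 * device was passed down, otherwise the boot node.
 */
static int irq_desc_node(const struct demo_device *dev, int boot_node)
{
	return dev ? dev->numa_node : boot_node;
}
```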
[ Impact: optimization, make IO-APIC code more NUMA aware ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <49F6557E.3080101@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We want to use dev_to_node() later on, to be aware of the 'home node'
of the GSI in question.
[ Impact: cleanup, prepare the IRQ code to be more NUMA aware ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Len Brown <lenb@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Len Brown <lenb@kernel.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-acpi@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
LKML-Reference: <49F65560.20904@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This simplifies the node awareness of the code. All our allocators
only deal with NUMA node ID locality, not with CPU ids, anyway - so
there's no need to maintain (and transform) a CPU id all across the
IRQ layer.
v2: keep move_irq_desc related
[ Impact: cleanup, prepare IRQ code to be NUMA-aware ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
LKML-Reference: <49F65536.2020300@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
According to Ingo, set_affinity() in irq_chip should return int,
because that way we can handle failure cases in a much cleaner way, in
the genirq layer.
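
A minimal sketch of why an int return helps, using a hypothetical one-callback
struct rather than the real irq_chip. The argument types and error codes are
assumptions of this example:

```c
#include <errno.h>

/* Hypothetical, minimal stand-in for a chip with one callback. */
struct demo_chip {
	int (*set_affinity)(unsigned int irq, unsigned long mask);
};

/* Example callback: rejects an empty mask instead of failing silently. */
static int demo_set_affinity(unsigned int irq, unsigned long mask)
{
	(void)irq;
	if (mask == 0)
		return -EINVAL;
	return 0;
}

/* Caller side: the generic layer can now see and propagate a failure. */
static int set_affinity_checked(const struct demo_chip *chip,
				unsigned int irq, unsigned long mask)
{
	if (!chip->set_affinity)
		return -EINVAL;
	return chip->set_affinity(irq, mask);
}
```

With a void callback the caller would have no way to tell the second case
apart from success.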
v2: fix two typos
[ Impact: extend API ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-arch@vger.kernel.org
LKML-Reference: <49F654E9.4070809@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The original feature of migrating irq_desc dynamically was too fragile
and was causing problems: it caused crashes on systems with lots of
cards with MSI-X when user-space irq-balancer was enabled.
We now have new patches that create irq_desc according to device
numa node. This patch removes the leftover bits of the dynamic balancer.
[ Impact: remove dead code ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <49F654AF.8000808@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There will be systems with two or more sockets, each with 8 cores and
2 threads per core. We currently treat them as multi chassis, although
they could have a stable TSC domain.
Use a DMI check instead.
[ Impact: do not turn possibly stable TSCs off incorrectly ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
LKML-Reference: <49F5532A.5000802@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
XAPIC_DEST_* duplicates the definitions in apicdef.h
[ Impact: cleanup ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49F552D0.5050505@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Beautify vmlinux_64.lds.S:
- Use tabs for indent
- Located curly braces like in C code
- Rearranged a few comments
There are no functional changes in this patch.
The beautification is done to prepare a unification
of the _32 and the _64 variants of the linker scripts.
[ Impact: cleanup ]
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Tim Abbott <tabbott@MIT.EDU>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20090426210742.GA3464@uranus.ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, hpet: Stop soliciting hpet=force users on ICH4M
x86: check boundary in setup_node_bootmem()
uv_time: add parameter to uv_read_rtc()
x86: hpet: fix periodic mode programming on AMD 81xx
x86: more than 8 32-bit CPUs requires X86_BIGSMP
x86: avoid theoretical spurious NMI backtraces with CONFIG_CPUMASK_OFFSTACK=y
x86: fix boot crash in NMI watchdog with CONFIG_CPUMASK_OFFSTACK=y and flat APIC
x86-64: fix FPU corruption with signals and preemption
x86/uv: fix for no memory at paddr 0
docs, x86: add nox2apic back to kernel-parameters.txt
x86: mm/numa_32.c calculate_numa_remap_pages should use __init
x86, kbuild: make "make install" not depend on vmlinux
x86/uv: fix init of cpu-less nodes
x86/uv: fix init of memory-less nodes
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86/irq: mark NUMA_MIGRATE_IRQ_DESC broken
x86, irq: Remove IRQ_DISABLED check in process context IRQ move
The current mm interface is asymmetric. One function allocates a locked
buffer, another function only refunds the memory.
Change this to have two functions for accounting and refunding locked
memory, respectively; and do the actual buffer allocation in ptrace.
[ Impact: refactor BTS buffer allocation code ]
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090424095143.A30265@sedona.ch.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The debug store selftest code uses a stack-allocated buffer, which is
not necessarily correctly aligned.
For tests using a buffer to hold a single entry, the buffer that is
passed to ds_request must already be suitably aligned.
Pass a suitably aligned portion of the bigger buffer.
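
Carving an aligned portion out of a larger buffer can be sketched like this.
The helper is hypothetical and is not the selftest's actual code; it assumes
the alignment is a power of two:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the first address inside buf that is a multiple of align
 * (align must be a power of two), or NULL if fewer than need bytes
 * remain after rounding up.
 */
static void *aligned_portion(void *buf, size_t size, size_t align, size_t need)
{
	uintptr_t start = (uintptr_t)buf;
	uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);
	size_t skipped = (size_t)(aligned - start);

	if (skipped > size || size - skipped < need)
		return NULL;
	return (void *)aligned;
}
```

The caller over-allocates by up to align - 1 bytes so that a suitably aligned
region of the needed size always fits.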
[ Impact: fix hw-branch-tracer self-test failure ]
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Cc: markus.t.metzger@gmail.com
LKML-Reference: <20090424094309.A30145@sedona.ch.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/x86/kernel/ptrace.c
Merge reason: fix the conflict above, and also pick up the CONFIG_BROKEN
dependency change from upstream so that we can remove it
here.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The HPET in the ICH4M is not documented in the data sheet
because it was not officially validated.
While it is fine for hackers to continue to use "hpet=force"
to enable the hardware that they have, it is not prudent to
solicit additional "hpet=force" users on this hardware.
[ Impact: remove hpet=force syslog message on old-ICH systems ]
Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
LKML-Reference: <alpine.LFD.2.00.0904231918510.15843@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The earlier patch to change the poller to a separate function subtly
broke the boot logging logic. This could lead to machine checks
getting logged at boot even when disabled or defaulting to off
on some systems. Fix that.
[ Impact: bug fix - avoid spurious MCE in log ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The polling timer, while running per CPU, still uses a global next_interval
variable, which led to some CPUs polling either too fast or too slow.
This was not a serious problem because all errors get picked up eventually,
but it's still better to avoid it. Turn next_interval into a per cpu variable.
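
A user-space sketch of the per-CPU idea, with a plain array standing in for a
real per cpu variable. The halve-on-error, double-otherwise policy and all
names are assumptions made for illustration:

```c
#define NR_CPUS_DEMO 4	/* arbitrary size for this sketch */

/* One polling interval per CPU instead of a single shared variable. */
static int next_interval[NR_CPUS_DEMO];

static void init_intervals(int initial)
{
	for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
		next_interval[cpu] = initial;
}

/*
 * Assumed policy: poll sooner after an error was logged, back off
 * otherwise, always staying within [min, max]. Only this CPU's
 * interval changes.
 */
static void update_interval(int cpu, int logged_error, int min, int max)
{
	int next = logged_error ? next_interval[cpu] / 2
				: next_interval[cpu] * 2;

	if (next < min)
		next = min;
	if (next > max)
		next = max;
	next_interval[cpu] = next;
}
```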
v2: Fix check_interval == 0 case (Hidetoshi Seto)
[ Impact: minor bug fix ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
uv_read_rtc() is referenced by read member of struct clocksource clocksource_uv.
In include/linux/clocksource.h, read of struct clocksource is declared as:
cycle_t (*read)(struct clocksource *cs)
This got introduced recently in:
8e19608: clocksource: pass clocksource to read() callback
But arch/x86/kernel/uv_time.c was not properly converted by that patch.
This patch adds a dummy parameter (of type struct clocksource *) to
uv_read_rtc() to fix the incompatible reference in clocksource_uv, and adds
a NULL argument in all places where uv_read_rtc() gets called.
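
The shape of the fix can be sketched in user space as follows. The struct is
reduced to two fields and the demo_* names are hypothetical; only the callback
signature is taken from the declaration quoted above:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t cycle_t;	/* stand-in for the kernel typedef */

/* Reduced to the fields this sketch needs. */
struct clocksource {
	const char *name;
	cycle_t (*read)(struct clocksource *cs);
};

static cycle_t fake_rtc;	/* stand-in for the hardware register */

/* The parameter is unused; it exists only to match the callback type. */
static cycle_t demo_read_rtc(struct clocksource *cs)
{
	(void)cs;
	return fake_rtc;
}

static struct clocksource clocksource_demo = {
	.name = "demo",
	.read = demo_read_rtc,
};
```

Direct callers that have no clocksource at hand can pass NULL, since the
argument is never dereferenced.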
[ Impact: cleanup, address compiler warning ]
Signed-off-by: Coly Li <coly.li@suse.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Magnus Damm <damm@igel.co.jp>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hugh@veritas.com>
LKML-Reference: <49EF3614.1050806@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Dimitri Sivanich <sivanich@sgi.com>
(See http://bugzilla.kernel.org/show_bug.cgi?id=12961)
It partially reverts commit c23e253e67
(x86: hpet: stop HPET_COUNTER when programming periodic mode)
HPET on AMD 81xx chipset needs a second write (with HPET_TN_SETVAL
cleared) to T0_CMP register to set the period in periodic mode.
With this patch HPET_COUNTER is still stopped but not reset when HPET
is programmed in periodic mode. This should help to avoid races when
HPET is programmed in periodic mode and fixes a boot time hang that
I've observed on a machine when using 1000HZ.
[ Impact: fix boot time hang on machines with AMD 81xx chipset ]
Reported-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Tested-by: Jeff Mahoney <jeffm@suse.com>
LKML-Reference: <20090421180037.GA2763@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When interrupt-remapping is enabled, we are relying on
setup_IO_APIC_irqs() to configure remapped entries in the
IO-APIC, which comes a little bit later after enabling
interrupt-remapping.
Meanwhile, restoration of old io-apic entries after enabling
interrupt-remapping will not make the interrupts through
io-apic functional anyway.
So remove the unnecessary reinit_intr_remapped_IO_APIC() step.
The longer story:
When interrupt-remapping is enabled, IO-APIC entries need to be
setup in the re-mappable format (pointing to
interrupt-remapping table entries setup by the OS). This
remapping configuration is happening in the same place where we
traditionally configure IO-APIC (i.e., in
setup_IO_APIC_irqs()).
So when we enable interrupt-remapping successfully, there is no
need to restore old io-apic RTE entries before we actually do a
complete configuration shortly in setup_IO_APIC_irqs(). Old
IO-APIC RTE's may be in traditional format (non re-mappable) or
in re-mappable format pointing to interrupt-remapping table
entries setup by BIOS. Restoring both of these will not make
IO-APIC functional. We have to rely on setup_IO_APIC_irqs() for
proper configuration by OS.
So I am removing this unnecessary and broken step.
[ Impact: remove unnecessary/broken IO-APIC setup step ]
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Weidong Han <weidong.han@intel.com>
Cc: dwmw2@infradead.org
LKML-Reference: <20090420200450.552359000@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Pass clocksource pointer to the read() callback for clocksources. This
allows us to share the callback between multiple instances.
[hugh@veritas.com: fix powerpc build of clocksource pass clocksource mods]
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In theory (though not shown in practice) alloc_cpumask_var() doesn't zero
memory, so CPUs might print an "NMI backtrace for cpu %d" once on boot.
(Bug introduced in fcef8576d8).
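
A user-space analogy of the problem, with malloc() standing in for an
allocator that does not zero memory. The helper names are made up and this is
not the kernel's cpumask API:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Allocate a bitmask and clear it explicitly, because the allocator
 * makes no promise about the initial contents.
 */
static unsigned long *alloc_mask_cleared(size_t nlongs)
{
	unsigned long *mask = malloc(nlongs * sizeof(*mask));

	if (!mask)
		return NULL;
	memset(mask, 0, nlongs * sizeof(*mask));
	return mask;
}

/* Test one bit; a stale set bit here is what would trigger the message. */
static int mask_test_bit(const unsigned long *mask, unsigned int bit)
{
	unsigned int bits_per_long = 8 * sizeof(unsigned long);

	return (mask[bit / bits_per_long] >> (bit % bits_per_long)) & 1UL;
}
```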
[ Impact: avoid theoretical syslog noise in rare configs ]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <alpine.DEB.2.00.0904202113520.10097@gandalf.stny.rr.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fcef8576d8 converted backtrace_mask to a
cpumask_var_t, and assumed check_nmi_watchdog was called before
nmi_watchdog_tick was ever called. Steven's oops shows I was wrong.
This is something of a bandaid: I'm not sure we *should* be calling
nmi_watchdog_tick before check_nmi_watchdog. Note that gcc eliminates
this test for the CONFIG_CPUMASK_OFFSTACK=n case.
[ Impact: fix boot crash in rare configs ]
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <alpine.DEB.2.00.0904202113520.10097@gandalf.stny.rr.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Instead of panic() ignore the "nox2apic" boot option when BIOS
has already enabled x2apic prior to OS handover.
[ Impact: printk warning instead of panic() when BIOS has enabled x2apic already ]
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: dwmw2@infradead.org
Cc: Weidong Han <weidong.han@intel.com>
LKML-Reference: <20090420200450.425091000@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Address the following compiler warning:
arch/x86/kernel/apic/io_apic.c:2543: warning: `eoi_ioapic_irq' defined but not used
By moving that function (and eoi_ioapic_irq()) into an existing
#ifdef CONFIG_INTR_REMAP section of the code.
[ Impact: cleanup ]
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: dwmw2@infradead.org
Cc: Weidong Han <weidong.han@intel.com>
LKML-Reference: <20090420200450.271099000@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Weidong Han <weidong.han@intel.com>
In 64bit signal delivery path, clear_used_math() was happening before saving
the current active FPU state on to the user stack for signal handling. Between
clear_used_math() and the state store on to the user stack, potentially we
can get a page fault for the user address and can block. In fact, while testing
we were hitting the might_fault() in __clear_user() which can do a schedule().
At a later point in time, we will schedule back into this process and
resume the save state (using "xsave/fxsave" instruction) which can lead
to DNA fault. And as used_math was cleared before, we will reinit the FP state
in the DNA fault and continue. This reinit will result in losing the
FPU state of the process.
Move clear_used_math() to a point after the FPU state has been stored
onto the user stack.
This issue has been present for a long time (even before the xsave changes
and the x86 merge). But it can easily be exposed in 2.6.28.x and 2.6.29.x
series because of the __clear_user() in this path, which has an explicit
__cond_resched() leading to a context switch with CONFIG_PREEMPT_VOLUNTARY.
[ Impact: fix FPU state corruption ]
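The ordering problem can be illustrated with a toy userspace model. This
is a sketch only: struct task_model, dna_fault() and the other names are
hypothetical stand-ins, and a single integer stands in for the whole FPU
register state.

```c
#include <stdbool.h>

struct task_model {
	bool used_math;	/* models the used_math flag */
	int fpu_state;	/* stands in for the live FPU register contents */
};

/* Models the DNA fault taken on the first FPU use after being rescheduled. */
static void dna_fault(struct task_model *t)
{
	if (!t->used_math)
		t->fpu_state = 0;	/* reinit: previous contents are lost */
}

/* Models storing the FPU state to the user stack; it may fault and block. */
static int save_to_user_stack(struct task_model *t, bool blocks)
{
	if (blocks)
		dna_fault(t);	/* context switch, then FPU use on resume */
	return t->fpu_state;
}

/* Old order: the flag is cleared before the state has been stored. */
static int deliver_signal_buggy(struct task_model *t, bool blocks)
{
	t->used_math = false;
	return save_to_user_stack(t, blocks);
}

/* New order: store first, clear the flag afterwards. */
static int deliver_signal_fixed(struct task_model *t, bool blocks)
{
	int saved = save_to_user_stack(t, blocks);

	t->used_math = false;
	return saved;
}
```

With the old order the value that reaches the user stack is the
reinitialized state whenever the store blocks; with the new order the
original state survives.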
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: <stable@kernel.org> [2.6.28.x, 2.6.29.x]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Fix corner case where the memory at physical address 0 does not really
exist AND one of the sockets on blade 0 has no active cpus.
The memory that _appears_ to be at physical address 0 is actually
memory that is located at a different address but has been remapped by
the chipset so that it appears to be at physical address 0.
When determining the UV pnode, the algorithm for determining the pnode
incorrectly used the relocated physical address instead of the actual
(global) address.
[ Impact: boot failure on partitioned systems ]
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20090420132530.GA23156@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Take already available policy->cpuinfo.max_freq and get rid of acpi-cpufreq
specific max_freq variable.
This implies that P0 is always the highest frequency which should always
be true as ACPI spec says:
As a result, the zeroth entry describes the highest performance state
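As a rough illustration of that assumption, the maximum frequency can be
read straight from the zeroth entry of a performance state table. The
struct and function names below are hypothetical and only model the
idea; they are not the acpi-cpufreq data structures.

```c
#include <stddef.h>

/* Hypothetical stand-in for one entry of a performance state table. */
struct pstate_model {
	unsigned int core_freq_mhz;
};

/* Returns the maximum frequency in kHz, trusting that entry 0 is P0. */
static unsigned int model_max_freq_khz(const struct pstate_model *states,
				       size_t count)
{
	if (count == 0)
		return 0;
	return states[0].core_freq_mhz * 1000u;
}
```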
Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
A caller (do_boot_cpu) already has __cpuinit attribute.
Since HOTPLUG_CPU depends on SMP && HOTPLUG it doesn't
lead to panic at the moment.
[ Impact: cleanup ]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <20090418194528.GD25510@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, when x2apic is not enabled, interrupt remapping
will be enabled in init_dmars(), where it is too late to remap
ioapic interrupts, that is, ioapic interrupts are really in
compatibility mode, not remappable mode.
This patch always enables interrupt remapping before ioapic
setup, it guarantees all interrupts will be remapped when
interrupt remapping is enabled. Thus it doesn't need to set
the compatibility interrupt bit.
[ Impact: refactor intr-remap init sequence, enable fuller remap mode ]
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: allen.m.kay@intel.com
Cc: fenghua.yu@intel.com
LKML-Reference: <1239957736-6161-4-git-send-email-weidong.han@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Shouldn't call ack_apic_edge() in ir_ack_apic_edge(), because
ack_apic_edge() does more than just ack: it also does irq migration
in the non-interrupt-remapping case. But there is no such need for
interrupt-remapping case, as irq migration is done in the process
context.
Similarly, ir_ack_apic_level() shouldn't call ack_apic_level, and
instead should do the local cpu's EOI + directed EOI to the io-apic.
ack_x2APIC_irq() is not necessary, because ack_APIC_irq() will use MSR
write for x2apic, and uncached write for non-x2apic.
[ Impact: simplify/standardize intr-remap IRQ acking, fix on !x2apic ]
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: allen.m.kay@intel.com
Cc: fenghua.yu@intel.com
LKML-Reference: <1239957736-6161-3-git-send-email-weidong.han@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I hit the check_flags error of lockdep:
WARNING: at kernel/lockdep.c:2893 check_flags+0x1a7/0x1d0()
[...]
hardirqs last enabled at (12567): [<ffffffff8026206a>] local_bh_enable+0xaa/0x110
hardirqs last disabled at (12569): [<ffffffff80610c76>] int3+0x16/0x40
softirqs last enabled at (12566): [<ffffffff80514d2b>] lock_sock_nested+0xfb/0x110
softirqs last disabled at (12568): [<ffffffff8058454e>] tcp_prequeue_process+0x2e/0xa0
The check_flags warning of lockdep tells me that lockdep thought interrupts
were disabled, but they were really enabled.
The numbers in the above parentheses show the order of events:
12566: softirqs last enabled: lock_sock_nested
12567: hardirqs last enabled: local_bh_enable
12568: softirqs last disabled: tcp_prequeue_process
12569: hardirqs last disabled: int3
int3 is a breakpoint!
Examining this further, I have CONFIG_NET_TCPPROBE enabled which adds
break points into the kernel.
The paranoid_exit of the return of int3 does not account for enabling
interrupts on return to kernel. This code is a bit tricky since it
is also used by the nmi handler (when lockdep is off), and we must be
careful about the swapgs. We can not call kernel code after the swapgs
has been performed.
[ Impact: fix lockdep check_flags warning + self-turn-off ]
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix a corner case in the UV initialization code for the "UV large system mode"
of apicids. If node zero contains no cpus, cpus on another node will be the
boot cpu. The percpu data that contains the extra apicid bits was not
being initialized early enough.
[ Impact: fix potential boot crash on cpu-less UV nodes ]
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20090417142447.GA23759@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: fix microcode driver newly spewing warnings
x86, PAT: Remove page granularity tracking for vm_insert_pfn maps
x86: disable X86_PTRACE_BTS for now
x86, documentation: kernel-parameters replace X86-32,X86-64 with X86
x86: pci-swiotlb.c swiotlb_dma_ops should be static
x86, PAT: Remove duplicate memtype reserve in devmem mmap
x86, PAT: Consolidate code in pat_x_mtrr_type() and reserve_memtype()
x86, PAT: Changing memtype to WC ensuring no WB alias
x86, PAT: Handle faults cleanly in set_memory_ APIs
x86, PAT: Change order of cpa and free in set_memory_wb
x86, CPA: Change idmap attribute before ioremap attribute setup
Currently the numa_node attribute for these PCI devices is 0 (it
corresponds to the numa_node for PCI bus 0). This is not a big issue
but incorrect.
This inconsistency can be fixed by reading the node number from CPU
NB function 0.
[ Impact: fill in dev->numa_node information, to optimize DMA allocations ]
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: jbarnes@virtuousgeek.org
LKML-Reference: <20090417100746.GG16198@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86/uv' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: UV BAU distribution and payload MMRs
x86: UV: BAU partition-relative distribution map
x86, uv: add Kconfig dependency on NUMA for UV systems
x86: prevent /sys/firmware/sgi_uv from being created on non-uv systems
x86, UV: Fix for nodes with memory and no cpus
x86, UV: system table in bios accessed after unmap
x86: UV BAU messaging timeouts
x86: UV BAU and nodes with no memory
Jeff Garzik reported this WARN_ON() noise:
> Kernel: 2.6.30-rc1-00306-g8371f87
> Hardware: ICH10 x86-64
>
> This is a regression from 2.6.29. Microcode spews the following WARNING
> multiple times during boot:
>
> ------------[ cut here ]------------
> WARNING: at fs/sysfs/group.c:138 sysfs_remove_group+0xeb/0xf0()
> Hardware name: sysfs group ffffffffa0209700 not found for
> kobject 'cpu0'
Keep sysfs files around for cpus even when we failed to locate
microcode for them at the moment of module loading. The appropriate
microcode firmware can become available later on.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch correctly sets BAU memory mapped registers to point
to the sending activation descriptor table and target payload table.
The "Broadcast Assist Unit" is used for TLB shootdown in UV.
The memory mapped registers that point to sending and receiving
memory structures contain node numbers.
In one case the __pa() function did not provide the node id of
memory on blade zero in configurations where that id is nonzero.
In another case, it was assumed that memory was allocated on
the local node. That assumption is not true in a configuration
in which the node has no memory.
Tested on the UV hardware simulator.
[ Impact: fix possible runtime crash due to incorrect TLB logic ]
Signed-off-by: Cliff Wickman <cpw@sgi.com>
LKML-Reference: <E1LuR5Z-0007An-B8@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix crash with many devices
I found this crash:
[ 552.616646] general protection fault: 0403 [#1] SMP
[ 552.620013] last sysfs file:
/sys/devices/pci0000:00/0000:00:02.0/usb1/1-1/1-1:1.0/host13/target13:0:0/13:0:0:0/block/sr0/size
[ 552.620013] CPU 0
[ 552.620013] Modules linked in:
[ 552.620013] Pid: 0, comm: swapper Not tainted 2.6.30-rc1-tip-01931-g8fcafd8-dirty #28 Sun Fire X4440
[ 552.620013] RIP: 0010:[<ffffffff8023bada>] [<ffffffff8023bada>] default_idle+0x7d/0xda
[ 552.620013] RSP: 0018:ffffffff81345e68 EFLAGS: 00010246
[ 552.620013] RAX: 0000000000000000 RBX: ffffffff8133d870 RCX: ffffc20000000000
[ 552.620013] RDX: 00000000001d0620 RSI: ffffffff8023bad8 RDI: ffffffff802a3169
[ 552.620013] RBP: ffffffff81345e98 R08: 0000000000000000 R09: ffffffff812244a0
[ 552.620013] R10: ffffffff81345dc8 R11: 7ebe1b6fa0bcac50 R12: 4ec4ec4ec4ec4ec5
[ 552.620013] R13: ffffffff813a54d0 R14: ffffffff813a7a40 R15: 0000000000000000
[ 552.620013] FS: 00000000006d1880(0000) GS:ffffc20000000000(0000) knlGS:0000000000000000
[ 552.620013] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 552.620013] CR2: 00007fec9d936a50 CR3: 000000007d1a9000 CR4: 00000000000006e0
[ 552.620013] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 552.620013] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 552.620013] Process swapper (pid: 0, threadinfo ffffffff81344000,task ffffffff812244a0)
[ 552.620013] Stack:
[ 552.620013] 0000000000000000 ffffc20000000000 00000000001d0620 7ebe1b6fa0bcac50
[ 552.620013] ffffffff8133d870 4ec4ec4ec4ec4ec5 ffffffff81345ec8 ffffffff8023bd84
[ 552.620013] 4ec4ec4ec4ec4ec5 ffffffff813a54d0 7ebe1b6fa0bcac50 ffffffff8133d870
[ 552.620013] Call Trace:
[ 552.620013] [<ffffffff8023bd84>] c1e_idle+0x109/0x124
[ 552.620013] [<ffffffff8023314b>] cpu_idle+0xb8/0x101
[ 552.620013] [<ffffffff80c16d6a>] rest_init+0x7e/0x94
[ 552.620013] [<ffffffff81357efc>] start_kernel+0x3dc/0x3fd
[ 552.620013] [<ffffffff813572a9>] x86_64_start_reservations+0xb9/0xd4
[ 552.620013] [<ffffffff813573b2>] x86_64_start_kernel+0xee/0x109
[ 552.620013] Code: 48 8b 04 25 f8 b4 00 00 83 a0 3c e0 ff ff fb 0f ae f0 65 48 8b 04 25 f8 b4 00 00 f6 80 38 e0 ff ff 08 75 09 e8 71 76 06 00 fb f4 <eb> 06 e8 68 76 06 00 fb 65 48 8b 04 25 f8 b4 00 00 83 88 3c e0
[ 552.620013] RIP [<ffffffff8023bada>] default_idle+0x7d/0xda
[ 552.620013] RSP <ffffffff81345e68>
[ 552.828646] ---[ end trace 4cbfc5c01382af7f ]---
Joerg Roedel said
"The 0403 error code means that there was an external interrupt with vector
0x80. Yinghai, my theory is that the kernel on this machine has no 32bit
emulation compiled in, right? In this case the selector points to a zero entry
which may cause the #gpf right after the hlt.
But I have no idea where the external int 0x80 comes from"
It turns out that on 64-bit we can use vector 0x80 for external devices
when 32-bit emulation is disabled, but we forgot to set the gate for it.
Set the gate for it by checking used_vectors.
Also move apic_intr_init() earlier to avoid setting
that gate twice.
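The idea of consulting used_vectors can be sketched with a small bitmap
model. This is an illustration only: struct vector_map, the MODEL_*
constants and install_external_gates() are hypothetical names, and the
"gates" bitmap merely records where an external interrupt gate would be
installed.

```c
#include <limits.h>
#include <stdbool.h>

#define MODEL_NR_VECTORS	256
#define MODEL_FIRST_EXTERNAL	0x20	/* assumed first external vector */
#define MODEL_WORD_BITS		(sizeof(unsigned long) * CHAR_BIT)
#define MODEL_WORDS \
	((MODEL_NR_VECTORS + MODEL_WORD_BITS - 1) / MODEL_WORD_BITS)

struct vector_map {
	unsigned long words[MODEL_WORDS];
};

static void vector_set(struct vector_map *m, unsigned int v)
{
	m->words[v / MODEL_WORD_BITS] |= 1UL << (v % MODEL_WORD_BITS);
}

static bool vector_test(const struct vector_map *m, unsigned int v)
{
	return (m->words[v / MODEL_WORD_BITS] >> (v % MODEL_WORD_BITS)) & 1UL;
}

/*
 * Mark an external interrupt gate for every vector that is not already
 * claimed in "used". Returns the number of gates marked.
 */
static unsigned int install_external_gates(const struct vector_map *used,
					   struct vector_map *gates)
{
	unsigned int v, installed = 0;

	for (v = MODEL_FIRST_EXTERNAL; v < MODEL_NR_VECTORS; v++) {
		if (vector_test(used, v))
			continue;	/* already has its own gate */
		vector_set(gates, v);
		installed++;
	}
	return installed;
}
```

In this model, vector 0x80 gets an external interrupt gate exactly when
nothing else (such as a 32-bit syscall gate) has claimed it.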
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Joerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <49E62DFD.6010904@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It turns out that 'smp_call_function_many()' doesn't work at all like
'smp_call_function_single()', and my change to Andrew's patch to use it
rather than a loop over all CPUs in acpi-cpufreq doesn't work.
My bad.
'smp_call_function_many()' has two "features" (aka "documented bugs"):
(a) it needs to be called with preemption disabled, because it uses
smp_processor_id() without guarding the CPU lookup with 'get_cpu()'
and 'put_cpu()' like the 'single' variant does.
(b) even if the current CPU is part of the CPU mask, it won't do the
call on that CPU.
Still, we're better off trying to use 'smp_call_function_many()' than
looping over CPU's, since it at least in theory allows us to use a
broadcast IPI and do it all in parallel. So let's just work around the
silly semantic bugs in that function.
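A userspace model of point (b) and of the workaround might look like the
following. It is a sketch under stated assumptions: the CPU mask is a
plain 64-bit integer, this_cpu is below 64, and every function name here
is hypothetical. In the kernel the caller would additionally pin itself
with get_cpu()/put_cpu() to address point (a).

```c
#include <stdint.h>

typedef void (*cpu_fn)(unsigned int cpu, void *info);

/* Models the 'many' variant: every CPU in the mask except the caller. */
static void model_call_function_many(uint64_t mask, unsigned int this_cpu,
				     cpu_fn fn, void *info)
{
	unsigned int cpu;

	for (cpu = 0; cpu < 64; cpu++) {
		if (cpu == this_cpu)
			continue;	/* the calling CPU is skipped */
		if (mask & (UINT64_C(1) << cpu))
			fn(cpu, info);
	}
}

/* Workaround: use the 'many' variant, then run the local call by hand. */
static void model_call_on_mask(uint64_t mask, unsigned int this_cpu,
			       cpu_fn fn, void *info)
{
	/* kernel code would disable preemption here, e.g. via get_cpu() */
	model_call_function_many(mask, this_cpu, fn, info);
	if (mask & (UINT64_C(1) << this_cpu))
		fn(this_cpu, info);
	/* ... and re-enable it here, e.g. via put_cpu() */
}

/* Example callback: records which CPUs ran it in a bitmask. */
static void record_cpu(unsigned int cpu, void *info)
{
	*(uint64_t *)info |= UINT64_C(1) << cpu;
}
```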
Reported-and-tested-by: Ali Gholami Rudi <ali@rudi.ir>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
IOMMU_LEAK, GART's own feature, dumps the used IOMMU entries when
the IOMMU entries are full, which might be useful to find a bad driver that
eats IOMMU entries.
DMA_API_DEBUG provides a similar feature, debug_dma_dump_mappings,
and it's better than GART's IOMMU_LEAK feature. GART's IOMMU_LEAK
feature doesn't say who uses IOMMU entries so it's hard to find a bad
driver.
This patch reimplements the GART's IOMMU_LEAK feature by using
DMA_API_DEBUG.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1239669799-23579-2-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Revert part of af5c820a31 ("x86: cpumask:
use work_on_cpu in arch/x86/kernel/microcode_core.c")
That change is causing only one Intel CPU's microcode to be updated e.g.
microcode: CPU3 updated from revision 0x9 to 0x17, date = 2005-04-22
where before it announced that also for CPU0 and CPU1 and CPU2.
We cannot use work_on_cpu() in the CONFIG_MICROCODE_OLD_INTERFACE code,
because Intel's request_microcode_user() involves a copy_from_user() from
/sbin/microcode_ctl, which therefore needs to be on that CPU at the time.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch enables each partition's BAU distribution bit map
to be partition-relative.
The distribution bitmap had been constructed assuming 0 as the base
node number. That construct would not have allowed a total system of
greater than 256 nodes.
It also corrects an error that occurred when the first blade's nasid
was not zero. That nasid was stored as the base node.
The base node number gets added by hardware to the node numbers implied
in the distribution bitmap, resulting in invalid target nasids.
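A partition-relative bitmap can be sketched as follows. The names and
the 256-bit map size are assumptions made for illustration, not the
actual BAU structures: a node is stored as its offset from the
partition's base node, and the base is added back when a bit is turned
into a target node.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_DIST_BITS 256	/* assumed size of the distribution map */

struct dist_map_model {
	uint64_t bits[MODEL_DIST_BITS / 64];
};

/* Set the bit for "node", stored relative to the partition's base node. */
static bool dist_map_set_node(struct dist_map_model *m, unsigned int node,
			      unsigned int base_node)
{
	unsigned int rel;

	if (node < base_node)
		return false;	/* node is outside this partition */
	rel = node - base_node;
	if (rel >= MODEL_DIST_BITS)
		return false;	/* does not fit in the map */
	m->bits[rel / 64] |= UINT64_C(1) << (rel % 64);
	return true;
}

static bool dist_map_test_bit(const struct dist_map_model *m, unsigned int bit)
{
	return (m->bits[bit / 64] >> (bit % 64)) & 1;
}

/* Models the hardware side: the base node is added to the bit index. */
static unsigned int dist_map_bit_to_node(unsigned int bit,
					 unsigned int base_node)
{
	return base_node + bit;
}
```

With a relative map, the limit applies to the size of one partition
rather than to the absolute node numbers of the whole system.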
Tested on the UV hardware simulator.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
LKML-Reference: <E1Ltl0C-0004Ob-37@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
As discussed in the thread here:
http://marc.info/?l=linux-kernel&m=123964468521142&w=2
Eric W. Biederman observed:
> It looks like some additional bugs have slipped in since last I looked.
>
> set_irq_affinity does this:
> #ifdef CONFIG_GENERIC_PENDING_IRQ
> if (desc->status & IRQ_MOVE_PCNTXT || desc->status & IRQ_DISABLED) {
> cpumask_copy(desc->affinity, cpumask);
> desc->chip->set_affinity(irq, cpumask);
> } else {
> desc->status |= IRQ_MOVE_PENDING;
> cpumask_copy(desc->pending_mask, cpumask);
> }
> #else
>
> That IRQ_DISABLED case is a software state and as such it has nothing to
> do with how safe it is to move an irq in process context.
[...]
>
> The only reason we migrate MSIs in interrupt context today is that there
> wasn't infrastructure for support migration both in interrupt context
> and outside of it.
Yes. The idea here was to force the MSI migration to happen in process
context. One of the patches in the series did
disable_irq(dev->irq);
irq_set_affinity(dev->irq, cpumask_of(dev->cpu));
enable_irq(dev->irq);
with the above patch adding irq/manage code check for interrupt disabled
and moving the interrupt in process context.
IIRC, there was no IRQ_MOVE_PCNTXT when we were developing this HPET
code and we ended up having this ugly hack. IRQ_MOVE_PCNTXT was there
when we eventually submitted the patch upstream. But, looks like I did a
blind rebasing instead of using IRQ_MOVE_PCNTXT in hpet MSI code.
Below patch fixes this. i.e., revert commit 932775a4ab
and add PCNTXT to HPET MSI setup. Also removes copying of desc->affinity
in generic code as set_affinity routines are doing it internally.
Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Li Shaohua" <shaohua.li@intel.com>
Cc: Gary Hade <garyhade@us.ibm.com>
Cc: "lcm@us.ibm.com" <lcm@us.ibm.com>
Cc: suresh.b.siddha@intel.com
LKML-Reference: <20090413222058.GB8211@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
CONFIG_IOMMU_DEBUG depends on CONFIG_GART_IOMMU:
config IOMMU_DEBUG
bool "Enable IOMMU debugging"
depends on GART_IOMMU && DEBUG_KERNEL
depends on X86_64
So it's not useful to have CONFIG_IOMMU_DEBUG in Calgary IOMMU code,
which does the extra checking of the bitmap space management.
And Calgary uses the iommu helper for the bitmap space management now
so it would be better to have the extra checking feature in the iommu
helper rather than Calgary code (if necessary).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: alexisb@us.ibm.com
LKML-Reference: <20090414120827G.fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We ended up incorrectly using '&cur' instead of '&readin' in the
work_on_cpu() -> smp_call_function_single() transformation in commit
01599fca67 ("cpufreq: use
smp_call_function_[single|many]() in acpi-cpufreq.c").
Andrew explains:
"OK, the acpi tree went and had conflicting changes merged into it after
I'd written the patch and it appears that I incorrectly reverted part
of 18b2646fe3 while fixing the resulting
rejects.
Switching it to `readin' looks correct."
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: reduce kernel size a bit, address sparse warning
Addresses the problem pointed out by this sparse warning:
arch/x86/kernel/pci-swiotlb.c:53:20: warning: symbol 'swiotlb_dma_ops' was not declared. Should it be static?
For x86: swiotlb_dma_ops can be static, because it's not used outside
of pci-swiotlb.c
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <1239558861.3938.2.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The patch "introduce imcr_ helpers" introduced good comments, but
also a few new compile warnings. This fixes the function definitions
to have a 'void' return type.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <20090413153924.GA20287@mailshack.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: add linux kernel support for YMM state
x86: fix wrong section of pat_disable & make it static
x86: Fix section mismatches in mpparse
x86: fix set_fixmap to use phys_addr_t
x86: Document get_user_pages_fast()
x86, intr-remap: fix eoi for interrupt remapping without x2apic
Attempting to rid us of the problematic work_on_cpu(). Just use
smp_call_function_single() here.
This repairs a 10% sysbench(oltp)+mysql regression which Mike reported,
due to
commit 6b44003e5c
Author: Andrew Morton <akpm@linux-foundation.org>
Date: Thu Apr 9 09:50:37 2009 -0600
work_on_cpu(): rewrite it to create a kernel thread on demand
It seems that the kernel calls these acpi-cpufreq functions at a quite
high frequency.
Valdis Kletnieks also reports that this causes 70-90 forks per second on
his hardware.
Cc: Valdis.Kletnieks@vt.edu
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Dave Jones <davej@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mike Galbraith <efault@gmx.de>
Cc: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
[ Made it use smp_call_function_many() instead of looping over cpu's
with smp_call_function_single() - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: cleanup
It's a bit hard to parse by eyes without
them being aligned.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <20090412165058.924175574@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: refactor, speed up and robustize code
If the apic was disabled by kernel option
or by hardware limits we can use dummy operations
in apic->write to simplify the ack_APIC_irq() code.
At the same time the patch fixes the missed EOI in the
do_IRQ function (which happens if the kernel is compiled
as X86-32 and an interrupt without a handler arrives while the
apic was not asked to be disabled via kernel option).
Note that native_apic_write_dummy() consists of
WARN_ON_ONCE to catch any buggy writes on enabled
APICs. Could be removed after some time of testing.
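The dummy-operation idea can be modelled in userspace with a function
pointer. This is a sketch only: struct apic_model and the model_*
functions are hypothetical, the write callbacks just count calls, and the
real dummy write additionally warns once as noted above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_APIC_EOI 0xB0	/* EOI register offset */

struct apic_model {
	void (*write)(uint32_t reg, uint32_t val);
};

static unsigned int native_writes;	/* writes that reached "hardware" */
static unsigned int dummy_writes;	/* writes swallowed by the dummy */

static void model_native_write(uint32_t reg, uint32_t val)
{
	(void)reg;
	(void)val;
	native_writes++;
}

static void model_dummy_write(uint32_t reg, uint32_t val)
{
	(void)reg;
	(void)val;
	dummy_writes++;
}

/* Pick the write operation once, at setup time. */
static void model_apic_setup(struct apic_model *apic, bool apic_disabled)
{
	apic->write = apic_disabled ? model_dummy_write : model_native_write;
}

/* The ack path needs no "is the apic disabled?" check of its own. */
static void model_ack_irq(const struct apic_model *apic)
{
	apic->write(MODEL_APIC_EOI, 0);
}
```

Because the choice is made once at setup, callers such as the ack path
can write the EOI unconditionally.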
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <20090412165058.724788431@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, robustization
1) guard ack_bad_irq with printk_ratelimit since there is no
guarantee we will not be flooded one day
2) use pr_emerg() helper
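Point 1) can be illustrated with a simple userspace rate limiter. This
is a sketch, not the kernel's printk_ratelimit() implementation: the
struct, its fields and the explicit "now" parameter are assumptions made
so the example is deterministic.

```c
#include <stdbool.h>

struct ratelimit_model {
	long interval;		/* length of one window, in ticks */
	int burst;		/* messages allowed per window */
	long window_start;	/* tick at which the current window began */
	int printed;		/* messages let through in this window */
};

/* Returns true if a message may be printed at time "now". */
static bool ratelimit_ok(struct ratelimit_model *rl, long now)
{
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;	/* start a new window */
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return false;		/* suppress: budget used up */
	rl->printed++;
	return true;
}
```

A caller would wrap the message in a check such as
"if (ratelimit_ok(&rl, now))" so that a storm of bad interrupts cannot
flood the log.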
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <20090412165058.277579847@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, no code changed
- syscalls.h update declarations due to unifications
- irq.c declare smp_generic_interrupt() before it gets used
- process.c declare sys_fork() and sys_vfork() before they get used
- tsc.c rename tsc_khz shadowed variable
- apic/probe_32.c declare apic_default before it gets used
- apic/nmi.c prev_nmi_count should be unsigned
- apic/io_apic.c declare smp_irq_move_cleanup_interrupt() before it gets used
- mm/init.c declare direct_gbpages and free_initrd_mem before they get used
Signed-off-by: Jaswinder Singh Rajput <jaswinder@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: save/restore Intel-AVX state properly between tasks
Intel Advanced Vector Extensions (AVX) introduce 256-bit vector processing
capability. More about AVX at http://software.intel.com/sites/avx
Add OS support for YMM state management using xsave/xrstor infrastructure
to support AVX.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1239402084.27006.8057.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: reduce kernel size a bit, address sparse warning
Addresses the problem pointed out by this sparse warning:
arch/x86/kernel/apic/x2apic_cluster.c:13:1: warning: symbol 'per_cpu__x86_cpu_to_logical_apicid' was not declared. Should it be static?
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1239434726.4418.24.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix section mismatch
In arch/x86/kernel/mpparse.c, smp_reserve_bootmem() has been called
and also refers to a function which is in .init section. Thus causes
the first warning. And check_irq_src() also requires an __init,
because it refers to an .init section.
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <b9df5fa10904102004g51265d9axc8d07278bfdb6ba0@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To make the topic merge life easier for tip:perfcounters/core,
include two (inactive in this topic) IRQ vector initializations
here.
Also fix build bug - missing kprobes.h inclusion.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
We can remove some #ifdefs if we define IA32_SYSCALL_VECTOR on 32-bit.
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
In preparation for unifying irqinit_{32,64}.c, make
x86_quirk_pre_intr_init() local to irqinit_32.c.
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
(Use correct mask to zero out bits 24-28 by Andreas)
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20090409132406.GK31527@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: replace sysfs attribute
The current interface violates the "one-value-per-sysfs-attribute"
rule. This patch replaces the current attribute with two attributes --
one for each L3 Cache Index Disable register.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20090409131849.GJ31527@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: bug fix
If user writes to "cache_disable" attribute on a CPU that does not support
this feature, the process hangs due to an invalid return value in
store_cache_disable().
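One way such a hang can arise is sketched below, assuming (this is not
stated above) that the invalid return value was 0: a writer that retries
until all bytes are consumed never makes progress. All names are
hypothetical, and max_tries exists only so the model terminates.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*store_fn)(const char *buf, size_t count);

/* Assumed buggy behaviour: claims that zero bytes were consumed. */
static long store_unsupported_buggy(const char *buf, size_t count)
{
	(void)buf;
	(void)count;
	return 0;
}

/* Reports an error instead, so the writer can give up. */
static long store_unsupported_fixed(const char *buf, size_t count)
{
	(void)buf;
	(void)count;
	return -EINVAL;
}

/*
 * Models a writer that retries until everything is written or an error
 * is returned. Returns the number of attempts, or -1 if the attempt
 * budget ran out (the model's stand-in for "hangs forever").
 */
static int model_write_all(store_fn store, const char *buf, size_t count,
			   int max_tries)
{
	size_t done = 0;
	int tries = 0;

	while (done < count) {
		long n;

		if (tries == max_tries)
			return -1;
		tries++;
		n = store(buf + done, count - done);
		if (n < 0)
			return tries;	/* error: stop retrying */
		done += (size_t)n;
	}
	return tries;
}
```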
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Langsdorf <mark.langsdorf@amd.com>
LKML-Reference: <20090409130729.GH31527@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
AMD family 0x11 CPU doesn't support the feature.
Some AMD family 0x10 CPUs do not support it or have an erratum, see
erratum #382 in "Revision Guide for AMD Family 10h Processors, 41322
Rev. 3.40 February 2009".
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
CC: Mark Langsdorf <mark.langsdorf@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20090409130510.GG31527@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To simplify level irq migration in the presence of interrupt-remapping,
Suresh used a virtual vector (io-apic pin number) to eliminate io-apic
RTE modification. Level triggered interrupt will appear as an edge to
the local apic cpu but still as level to the IO-APIC. So in addition to
doing the local apic EOI, it still needs to do an IO-APIC directed EOI to clear
the remote IRR bit in the IO-APIC RTE. Please refer to Suresh's patch for
more details (commit 0280f7c416).
Now that interrupt remapping is decoupled from x2apic, it also needs to do the
directed EOI for apic. Otherwise, apic interrupts won't work correctly.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: Weidong Han <weidong.han@intel.com>
Cc: suresh.b.siddha@intel.com
Cc: dwmw2@infradead.org
Cc: allen.m.kay@intel.com
LKML-Reference: <1239355037-22856-1-git-send-email-weidong.han@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: speed up
The return to handler portion of the function graph tracer should only
need to save the return values. The caller already saved off the
registers that the callee can modify. The returning function already
saved the registers it modified. When we call our own trace function
it too will save the registers that the callee must restore.
There's no reason to save off anything more than the registers used
to return the values.
Note, I did a complete kernel build with this modification and the
function graph tracer running on x86_64.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: new interfaces (not yet used)
For all the platforms out there, there is an infinite number of buggy
BIOSes. This adds infrastructure to treat BIOS interrupts more like
toxic waste and "glove box" them -- we switch out the register set,
perform the BIOS interrupt, and then restore the previous state.
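The "glove box" pattern can be sketched in plain C as save, switch,
call, restore. This is a userspace model only: struct regs_model,
live_regs and glovebox_call() are hypothetical names, and a global struct
stands in for the CPU's register file.

```c
#include <stddef.h>
#include <stdint.h>

struct regs_model {
	uint32_t eax, ebx, ecx, edx, esi, edi, ebp;
};

typedef void (*bios_fn)(struct regs_model *live);

static struct regs_model live_regs;	/* stands in for the real registers */

/*
 * Run "fn" with the register set "in" switched in, copy what it leaves
 * behind to "out" (if non-NULL), then restore the caller's registers.
 */
static void glovebox_call(bios_fn fn, const struct regs_model *in,
			  struct regs_model *out)
{
	struct regs_model saved = live_regs;	/* save the caller's state */

	live_regs = *in;			/* switch register sets */
	fn(&live_regs);				/* the untrusted call */
	if (out != NULL)
		*out = live_regs;		/* hand back the results */
	live_regs = saved;			/* restore previous state */
}

/* Example of a routine that clobbers registers it should preserve. */
static void misbehaving_bios(struct regs_model *r)
{
	r->eax = 0x4f;
	r->ebx = 0xdead;
	r->ebp = 0xbad;
}
```

Whatever the called routine does to the live register set, the caller
sees only the explicit output copy.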
LKML-Reference: <49DE7F79.4030106@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: cpu_debug remove execute permission
x86: smarten /proc/interrupts output for new counters
x86: DMI match for the Dell DXP061 as it needs BIOS reboot
x86: make 64 bit to use default_inquire_remote_apic
x86, setup: un-resequence mode setting for VGA 80x34 and 80x60 modes
x86, intel-iommu: fix X2APIC && !ACPI build failure
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: consolidate documents
blktrace: pass the right pointer to kfree()
tracing/syscalls: use a dedicated file header
tracing: append a comma to INIT_FTRACE_GRAPH
It seems by mistake these files got execute permissions so removing it.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
LKML-Reference: <1239211186.9037.2.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix build warnings and possible compat misbehavior on IA64
Building a kernel on ia64 might trigger these ugly build warnings:
CC arch/ia64/ia32/sys_ia32.o
In file included from arch/ia64/ia32/sys_ia32.c:55:
arch/ia64/ia32/ia32priv.h:290:1: warning: "elf_check_arch" redefined
In file included from include/linux/elf.h:7,
from include/linux/module.h:14,
from include/linux/ftrace.h:8,
from include/linux/syscalls.h:68,
from arch/ia64/ia32/sys_ia32.c:18:
arch/ia64/include/asm/elf.h:19:1: warning: this is the location of the previous definition
[...]
sys_ia32.c includes linux/syscalls.h which in turn includes linux/ftrace.h
to import the syscalls tracing prototypes.
But including ftrace.h can pull in too many things for a low level file,
especially on ia64 where the ia32 private headers conflict with higher
level headers.
Now we isolate the syscall tracing headers in their own lightweight file.
Reported-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jason Baron <jbaron@redhat.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Jiaying Zhang <jiayingz@google.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Michael Davidson <md@google.com>
LKML-Reference: <20090408184058.GB6017@nowhere>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Cleanup
Reorganizes the code in arch/x86/kernel/io_apic.c by
combining two '#ifdef CONFIG_SMP' regions. In addition
to making the code easier to understand, the first
'#ifdef CONFIG_SMP' region is moved to a location later
in the file, which will reduce the need for function
forward declarations when the code is subsequently revised.
The only changes other than relocating code to a different
position in the file were the removal of the assign_irq_vector()
forward declaration which was no longer needed and some line
length reduction formatting changes.
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Cc: lcm@us.ibm.com
LKML-Reference: <20090408210725.GC11159@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Paul suggested we allow for data addresses to be recorded along with
the traditional IPs as power can provide these.
For now, only the software pagefault events provide data addresses,
but in the future power might as well for some events.
x86 doesn't seem capable of providing this atm.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090408130409.394816925@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now /proc/interrupts of tip tree has new counters:
PLT: Platform interrupts
The output format change made by commit:
commit 7a81d9a7da
x86: smarten /proc/interrupts output
should be applied to these new counters too.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49C98DEA.8060208@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now /proc/interrupts of tip tree has new counters:
CNT: Performance counter interrupts
The output format change made by commit:
commit 7a81d9a7da
x86: smarten /proc/interrupts output
should be applied to these new counters too.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49C98DEA.8060208@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: restore old behavior
for flat and phys_flat
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49DCBBF1.8080903@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add a hwdev argument that is needed on some architectures
in order to access a per-device offset that is taken into
account when producing a physical address (also needed to
get from bus address to virtual address because the physical
address is an intermediate step).
Also make swiotlb_bus_to_virt weak so architectures can
override it.
Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Cc: jeremy@goop.org
Cc: ian.campbell@citrix.com
LKML-Reference: <1239199761-22886-8-git-send-email-galak@kernel.crashing.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
/sys/firmware/sgi_uv should only be created on uv systems.
Signed-off-by: Russ Anderson <rja@sgi.com>
LKML-Reference: <20090403222423.GA28546@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
didn't set it for UV_NON_UNIQUE_APIC, so don't restore it
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49D6A6B9.6060501@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
this warning:
arch/x86/kernel/early-quirks.c:99: warning: ‘ati_ixp4x0_rev’ defined but not used
triggers because ati_ixp4x0_rev() is only used in the
ACPI && X86_IO_APIC case.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, prepare FPU code unification
Like on x86_64, return an error from restore_fpu and kill the task
if it fails.
Also rename restore_fpu to restore_fpu_checking which allows ifdefs
to be removed in math_state_restore().
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
LKML-Reference: <1239190320-23952-1-git-send-email-jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/powerpc/include/asm/systbl.h
arch/powerpc/include/asm/unistd.h
include/linux/init_task.h
Merge reason: the conflicts are non-trivial: PowerPC placement
of sys_perf_counter_open has to be mixed with the
new preadv/pwritev syscalls.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add support for Always Running APIC timer, CPUID_0x6_EAX_Bit2.
This bit means the APIC timer continues to run even when CPU is
in deep C-states.
The advantage is that we can use LAPIC timer on these CPUs
always, and there is no need for "slow to read and program"
external timers (HPET/PIT) and the timer broadcast logic
and related code in C-state entry and exit.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Len Brown <len.brown@intel.com>
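As a rough illustration of the feature bit named above, the check amounts to testing bit 2 of the EAX value returned by CPUID leaf 0x6. The helper and macro names below are hypothetical and are not the kernel's identifiers; this is a sketch of the bit test only, not of the timer setup.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 0x6, EAX bit 2: Always Running APIC Timer (ARAT). */
#define CPUID6_EAX_ARAT (1u << 2)

/*
 * Hypothetical helper: 'eax' is the EAX value returned by CPUID leaf 0x6.
 * Returns true when the APIC timer keeps running in deep C-states.
 */
static bool cpuid6_has_arat(uint32_t eax)
{
	return (eax & CPUID6_EAX_ARAT) != 0;
}
```

When this returns true, a caller could skip the HPET/PIT broadcast fallback described in the message.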
Do not write zeroes to APERF and MPERF by ondemand governor. With this
change, other users can share these MSRs for reads.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
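Once the counters are no longer zeroed, a reader presumably has to work with differences between two snapshots instead of absolute values. The following is a minimal sketch of that idea, assuming free-running 64-bit counters; the struct and function names are made up for illustration and do not come from the cpufreq code.

```c
#include <stdint.h>

/* Hypothetical snapshot of the two free-running counters. */
struct perf_snapshot {
	uint64_t aperf;
	uint64_t mperf;
};

/*
 * Ratio of the APERF delta to the MPERF delta, in percent.
 * Unsigned subtraction gives the right delta even if a counter wrapped
 * once between the snapshots. Assumes the APERF delta is small enough
 * that multiplying it by 100 does not overflow 64 bits.
 */
static unsigned int perf_ratio_percent(const struct perf_snapshot *prev,
				       const struct perf_snapshot *cur)
{
	uint64_t da = cur->aperf - prev->aperf;
	uint64_t dm = cur->mperf - prev->mperf;

	if (dm == 0)
		return 0;	/* no reference cycles elapsed */
	return (unsigned int)((da * 100) / dm);
}
```

Because nobody resets the counters, several independent readers can each keep their own previous snapshot without disturbing one another.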
Change structure name to make the code cleaner and simpler. No
functionality change in this patch.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y
branch tracer: Fix for enabling branch profiling makes sparse unusable
ftrace: Correct a text align for event format output
Update /debug/tracing/README
tracing/ftrace: alloc the started cpumask for the trace file
tracing, x86: remove duplicated #include
ftrace: Add check of sched_stopped for probe_sched_wakeup
function-graph: add proper initialization for init task
tracing/ftrace: fix missing include string.h
tracing: fix incorrect return type of ns2usecs()
tracing: remove CALLER_ADDR2 from wakeup tracer
blktrace: fix pdu_len when tracing packet command requests
blktrace: small cleanup in blk_msg_write()
blktrace: NUL-terminate user space messages
tracing: move scripts/trace/power.pl to scripts/tracing/power.pl
* commit 'origin/master': (4825 commits)
Fix build errors due to CONFIG_BRANCH_TRACER=y
parport: Use the PCI IRQ if offered
tty: jsm cleanups
Adjust path to gpio headers
KGDB_SERIAL_CONSOLE check for module
Change KCONFIG name
tty: Blackin CTS/RTS
Change hardware flow control from poll to interrupt driven
Add support for the MAX3100 SPI UART.
lanana: assign a device name and numbering for MAX3100
serqt: initial clean up pass for tty side
tty: Use the generic RS485 ioctl on CRIS
tty: Correct inline types for tty_driver_kref_get()
splice: fix deadlock in splicing to file
nilfs2: support nanosecond timestamp
nilfs2: introduce secondary super block
nilfs2: simplify handling of active state of segments
nilfs2: mark minor flag for checkpoint created by internal operation
nilfs2: clean up sketch file
nilfs2: super block operations fix endian bug
...
Conflicts:
arch/x86/include/asm/thread_info.h
arch/x86/lguest/boot.c
drivers/xen/manage.c
Replace all DMA_24BIT_MASK macro with DMA_BIT_MASK(24)
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace all DMA_32BIT_MASK macro with DMA_BIT_MASK(32)
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace all DMA_40BIT_MASK macro with DMA_BIT_MASK(40)
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
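For readers unfamiliar with the macro used in the three replacements above, here is a userspace restatement of DMA_BIT_MASK(). It is written from memory of the kernel header and should be treated as a sketch rather than a copy; the special case for 64 avoids shifting a 64-bit value by its full width.

```c
/*
 * Userspace restatement of the kernel macro (assumed to match
 * include/linux/dma-mapping.h): a mask with the low 'n' bits set.
 */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))
```

With this definition, DMA_BIT_MASK(24), DMA_BIT_MASK(32) and DMA_BIT_MASK(40) evaluate to the same values as the fixed-width constants they replace.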
Add debug store support for Core i7.
Core i7 adds a reset value for each performance counter and a new
PEBS record format.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Cc: roland@redhat.com
Cc: eranian@googlemail.com
Cc: oleg@redhat.com
Cc: juan.villacis@intel.com
Cc: ak@linux.jf.intel.com
LKML-Reference: <20090403144607.088997000@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Check the buffer size more precisely to allow buffers for exactly
one element provided the base address is already properly aligned.
Add a debug store selftest.
Reported-by: Stephane Eranian <eranian@googlemail.com>
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Cc: roland@redhat.com
Cc: eranian@googlemail.com
Cc: oleg@redhat.com
Cc: juan.villacis@intel.com
Cc: ak@linux.jf.intel.com
LKML-Reference: <20090403144606.139137000@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
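One way to picture the more precise size check is to subtract only the bytes actually lost to alignment before dividing by the record size. The sketch below assumes a power-of-two alignment and uses hypothetical names; the real check in the debug store code may be structured differently.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Number of whole records that fit in [base, base + size) once 'base'
 * is rounded up to 'align' (which must be a power of two).
 * Hypothetical helper, not the kernel's function.
 */
static size_t ds_usable_records(uintptr_t base, size_t size,
				size_t rec_size, size_t align)
{
	uintptr_t aligned = (base + align - 1) & ~(uintptr_t)(align - 1);
	size_t skipped = (size_t)(aligned - base);

	if (size < skipped)
		return 0;
	return (size - skipped) / rec_size;
}
```

An already aligned base loses no bytes, so a buffer of exactly one record size is accepted, while a misaligned base with the same size is not.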
ds_reset_pebs() passed the wrong qualifier to a shared function, resulting
in a reset of bts rather than pebs.
Reported-by: Stephane Eranian <eranian@googlemail.com>
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Cc: roland@redhat.com
Cc: eranian@googlemail.com
Cc: oleg@redhat.com
Cc: juan.villacis@intel.com
Cc: ak@linux.jf.intel.com
LKML-Reference: <20090403144605.206510000@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Debug store already uses TIF_DS_AREA_MSR to trigger debug store context
switch handling. No need to use TIF_DEBUGCTLMSR, as well.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Cc: roland@redhat.com
Cc: eranian@googlemail.com
Cc: oleg@redhat.com
Cc: juan.villacis@intel.com
Cc: ak@linux.jf.intel.com
LKML-Reference: <20090403144604.256645000@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add the ptrace bts context field to task_struct unconditionally.
Initialize the field directly in copy_process().
Remove all the unneeded functionality used to initialize that field.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Cc: roland@redhat.com
Cc: eranian@googlemail.com
Cc: oleg@redhat.com
Cc: juan.villacis@intel.com
Cc: ak@linux.jf.intel.com
LKML-Reference: <20090403144603.292754000@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add a warning in case a debug store context is not removed before
the task it is attached to is freed.
Remove the old warning at thread exit. It is too early.
Declare the debug store context field in thread_struct unconditionally.
Remove ds_copy_thread() and ds_exit_thread() and do the work directly
in process*.c.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Cc: roland@redhat.com
Cc: eranian@googlemail.com
Cc: oleg@redhat.com
Cc: juan.villacis@intel.com
Cc: ak@linux.jf.intel.com
LKML-Reference: <20090403144601.254472000@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Perform debug store selftests on each cpu.
Cover both the normal and the _noirq variant of the debug store interface.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Cc: roland@redhat.com
Cc: eranian@googlemail.com
Cc: oleg@redhat.com
Cc: juan.villacis@intel.com
Cc: ak@linux.jf.intel.com
LKML-Reference: <20090403144559.394583000@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The hw-branch-tracer uses debug store functions from an on_each_cpu()
context, which is simply wrong since the functions may sleep.
Add _noirq variants for most functions, which may be called with
interrupts disabled.
Separate per-cpu and per-task tracing and allow per-cpu tracing to be
controlled from any cpu.
Make the hw-branch-tracer use the new debug store interface, synchronize
with hotplug cpu event using get/put_online_cpus(), and remove the
unnecessary spinlock.
Make the ptrace bts and the ds selftest code use the new interface.
Defer the ds selftest.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Cc: roland@redhat.com
Cc: eranian@googlemail.com
Cc: oleg@redhat.com
Cc: juan.villacis@intel.com
Cc: ak@linux.jf.intel.com
LKML-Reference: <20090403144555.658136000@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Per-task branch tracing installs a debug store context with the traced
task. This immediately results in the branch trace control bits being
cleared for the next context switch of that task, if not set before.
Only one of per-cpu and per-task tracing is allowed at a time.
An active per-cpu tracing would be disabled even if the per-task tracing
request is rejected and the task debug store context removed.
Check the tracing type (per-cpu or per-task) before installing a task
debug store context.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Cc: roland@redhat.com
Cc: eranian@googlemail.com
Cc: oleg@redhat.com
Cc: juan.villacis@intel.com
Cc: ak@linux.jf.intel.com
LKML-Reference: <20090403144552.856000000@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In order to stop branch tracing for a running task, we need to first
clear the branch tracing control bits before we may free the tracing
buffer.
If the traced task is running, the cpu might still trace that task
after the branch trace control bits have been cleared.
Wait until the traced task has been scheduled out before proceeding.
A similar problem affects the task debug store context. We first remove
the context, then we need to wait until the task has been scheduled
out before we can free the context memory.
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Cc: roland@redhat.com
Cc: eranian@googlemail.com
Cc: juan.villacis@intel.com
Cc: ak@linux.jf.intel.com
LKML-Reference: <20090403144551.919636000@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When a ptraced task is unlinked, we need to stop branch tracing for
that task.
Since the unlink is called with interrupts disabled, and we need
interrupts enabled to stop branch tracing, we defer the work.
Collect all branch tracing related stuff in a branch tracing context.
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: roland@redhat.com
Cc: eranian@googlemail.com
Cc: juan.villacis@intel.com
Cc: ak@linux.jf.intel.com
LKML-Reference: <20090403144550.712401000@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When the bts tracer is removed while the traced task is running,
the write to clear the bts tracer pointer races with context switch code.
Read the tracer once during a context switch.
When a new tracer is installed, the bts tracer is set in the ds context
before the tracer is initialized in order to claim the context for that
tracer.
This may result in write accesses using an uninitialized trace configuration
when scheduling timestamps have been requested.
Store active tracing flags separately and only set active flags after
the tracing configuration has been initialized.
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Cc: roland@redhat.com
Cc: eranian@googlemail.com
Cc: juan.villacis@intel.com
Cc: ak@linux.jf.intel.com
LKML-Reference: <20090403144548.881338000@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Prepare for more generic overflow handling. The new perf_counter_overflow()
method will handle the generic bits of the counter overflow, and can return
a !0 return value, in which case the counter should be (soft) disabled, so
that it won't count until it's properly disabled.
XXX: do powerpc and swcounter
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090406094517.812109629@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Implement set_perf_counter_pending() with a self-IPI so that it will
run ASAP in a usable context.
For now use a second IRQ vector, because the primary vector pokes
the apic in funny ways that seem to confuse things.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090406094517.724626696@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove duplicated #include in arch/x86/kernel/cpu/cpufreq/longhaul.c.
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Put in counts to tell which ips belong to what context.
-----
| | hv
| --
nr | | kernel
| --
| | user
-----
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Orig-LKML-Reference: <20090402091319.493101305@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
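The layout in the diagram can be read as one array of ips preceded by per-context counts. The sketch below assumes the entries are stored in the order shown (hv first, then kernel, then user) and that nr is the sum of the three counts; the type names, field widths and the array bound are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_STACK_DEPTH 64	/* hypothetical bound */

/* Hypothetical layout following the diagram: nr = hv + kernel + user. */
struct callchain_entry {
	uint16_t nr;
	uint16_t hv;
	uint16_t kernel;
	uint16_t user;
	uint64_t ip[MAX_STACK_DEPTH];
};

enum ip_context { IP_HV, IP_KERNEL, IP_USER, IP_INVALID };

/* Classify the ip at index 'i' using only the counts. */
static enum ip_context callchain_context(const struct callchain_entry *e,
					 size_t i)
{
	if (i >= e->nr)
		return IP_INVALID;
	if (i < e->hv)
		return IP_HV;
	if (i < (size_t)e->hv + e->kernel)
		return IP_KERNEL;
	return IP_USER;
}
```

A consumer can thus split the flat ip array into its three segments without tagging each entry.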
Follow the example set by powerpc and try to play nice with oprofile
and the nmi watchdog.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Orig-LKML-Reference: <20090330171024.459968444@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Provide the x86 perf_callchain() implementation.
Code based on the ftrace/sysprof code from Soeren Sandmann Pedersen.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Soeren Sandmann Pedersen <sandmann@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Orig-LKML-Reference: <20090330171024.341993293@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that Paul cleaned up the error propagation paths, pass down the
x86 error as well.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Orig-LKML-Reference: <20090330171023.792822360@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
While going over the wakeup code I noticed delayed wakeups only work
for hardware counters but basically all software counters rely on
them.
This patch unifies and generalizes the delayed wakeup to fix this
issue.
Since we're dealing with NMI context bits here, use a cmpxchg() based
singly linked list implementation to track counters that have pending
wakeups.
[ This should really be generic code for delayed wakeups, but since we
cannot use cmpxchg()/xchg() in generic code, I've let it live in the
perf_counter code. -- Eric Dumazet could use it to aggregate the
network wakeups. ]
Furthermore, the x86 method of using TIF flags was flawed in that it's
quite possible to end up setting the bit on the idle task, losing the
wakeup.
The powerpc method uses per-cpu storage and does appear to be
sufficient.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Orig-LKML-Reference: <20090330171023.153932974@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
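The general technique of a compare-and-exchange based singly linked list can be sketched in userspace with C11 atomics. This is not the perf_counter implementation: the names are hypothetical, and the sketch omits details the kernel code would need, such as guarding against queueing the same entry twice.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical list node for something with a pending wakeup. */
struct pending_entry {
	struct pending_entry *next;
	int id;
};

static _Atomic(struct pending_entry *) pending_head = NULL;

/* Push without locks: retry until the head swap succeeds. */
static void pending_push(struct pending_entry *e)
{
	struct pending_entry *old = atomic_load(&pending_head);

	do {
		e->next = old;	/* 'old' is refreshed on each failed attempt */
	} while (!atomic_compare_exchange_weak(&pending_head, &old, e));
}

/* Detach the whole list in one atomic step; the caller walks it later. */
static struct pending_entry *pending_take_all(void)
{
	return atomic_exchange(&pending_head, NULL);
}
```

The producer side never blocks, which is what makes this shape usable from contexts where taking a lock is not an option; the consumer drains the list later from a safer context.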
Since the bitfields turned into a bit of a mess, remove them and rely on
good old masks.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Orig-LKML-Reference: <20090323172417.059499915@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Having 3 slightly different copies of the same code around does nobody
any good. First step in revamping the output format.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Orig-LKML-Reference: <20090319194233.929962222@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: modify ABI
The hardware/software classification in hw_event->type became a little
strained due to the addition of tracepoint tracing.
Instead split up the field and provide a type field to explicitly specify
the counter type, while using the event_id field to specify which event to
use.
Raw counters still work as before, only the raw config now goes into
raw_event.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Orig-LKML-Reference: <20090319194233.836807573@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix boot crash on Intel Perfmon Version 1 systems
Intel Perfmon v1 does not support the global MSRs, nor does
it offer the generalized MSR ranges. So support v2 and later
CPUs only.
Also mark pmc_ops as read-mostly - to avoid false cacheline
sharing.
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix a build warning on 32bit machines by explicitly marking the
constants as 64-bit.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We need to ensure the enabled=0 write happens before we
start disabling the actual counters, so that a pmc_amd_enable()
will not enable one underneath us.
I think the race is impossible anyway, we always balance the
ops within any one context and perform enable() with IRQs disabled.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Merge reason: we have gathered quite a few conflicts, need to merge upstream
Conflicts:
arch/powerpc/kernel/Makefile
arch/x86/ia32/ia32entry.S
arch/x86/include/asm/hardirq.h
arch/x86/include/asm/unistd_32.h
arch/x86/include/asm/unistd_64.h
arch/x86/kernel/cpu/common.c
arch/x86/kernel/irq.c
arch/x86/kernel/syscall_table_32.S
arch/x86/mm/iomap_32.c
include/linux/sched.h
kernel/Makefile
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits)
tracing, net: fix net tree and tracing tree merge interaction
tracing, powerpc: fix powerpc tree and tracing tree interaction
ring-buffer: do not remove reader page from list on ring buffer free
function-graph: allow unregistering twice
trace: make argument 'mem' of trace_seq_putmem() const
tracing: add missing 'extern' keywords to trace_output.h
tracing: provide trace_seq_reserve()
blktrace: print out BLK_TN_MESSAGE properly
blktrace: extract duplidate code
blktrace: fix memory leak when freeing struct blk_io_trace
blktrace: fix blk_probes_ref chaos
blktrace: make classic output more classic
blktrace: fix off-by-one bug
blktrace: fix the original blktrace
blktrace: fix a race when creating blk_tree_root in debugfs
blktrace: fix timestamp in binary output
tracing, Text Edit Lock: cleanup
tracing: filter fix for TRACE_EVENT_FORMAT events
ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release()
x86: kretprobe-booster interrupt emulation code fix
...
Fix up trivial conflicts in
arch/parisc/include/asm/ftrace.h
include/linux/memory.h
kernel/extable.c
kernel/module.c
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask: (36 commits)
cpumask: remove cpumask allocation from idle_balance, fix
numa, cpumask: move numa_node_id default implementation to topology.h, fix
cpumask: remove cpumask allocation from idle_balance
x86: cpumask: x86 mmio-mod.c use cpumask_var_t for downed_cpus
x86: cpumask: update 32-bit APM not to mug current->cpus_allowed
x86: microcode: cleanup
x86: cpumask: use work_on_cpu in arch/x86/kernel/microcode_core.c
cpumask: fix CONFIG_CPUMASK_OFFSTACK=y cpu hotunplug crash
numa, cpumask: move numa_node_id default implementation to topology.h
cpumask: convert node_to_cpumask_map[] to cpumask_var_t
cpumask: remove x86 cpumask_t uses.
cpumask: use cpumask_var_t in uv_flush_tlb_others.
cpumask: remove cpumask_t assignment from vector_allocation_domain()
cpumask: make Xen use the new operators.
cpumask: clean up summit's send_IPI functions
cpumask: use new cpumask functions throughout x86
x86: unify cpu_callin_mask/cpu_callout_mask/cpu_initialized_mask/cpu_sibling_setup_mask
cpumask: convert struct cpuinfo_x86's llc_shared_map to cpumask_var_t
cpumask: convert node_to_cpumask_map[] to cpumask_var_t
x86: unify 32 and 64-bit node_to_cpumask_map
...
Interrupt remapping must be enabled before enabling x2apic, but
interrupt remapping doesn't depend on x2apic; it can be used
separately. Enable interrupt remapping in init_dmars even if x2apic
is not supported.
[dwmw2: Update Kconfig accordingly, fix build with INTR_REMAP && !X2APIC]
Signed-off-by: Weidong Han <weidong.han@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
All logical processors with APIC ID values of 255 and greater will have their
APIC reported through Processor X2APIC structure (type-9 entry type) and all
logical processors with APIC ID less than 255 will have their APIC reported
through legacy Processor Local APIC (type-0 entry type) only. This is the
same case even for NMI structure reporting.
The Processor X2APIC Affinity structure provides the association between the
X2APIC ID of a logical processor and the proximity domain to which the logical
processor belongs.
For OSPM, Processor IDs outside the 0-254 range are to be declared as Device()
objects in the ACPI namespace.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
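The reporting rule above reduces to a threshold on the APIC ID. A small sketch, using the entry type numbers given in the message (0 and 9) and otherwise hypothetical names:

```c
#include <stdint.h>

/* Entry type numbers as stated in the message above. */
enum madt_entry_type {
	MADT_LOCAL_APIC   = 0,	/* legacy Processor Local APIC */
	MADT_LOCAL_X2APIC = 9	/* Processor X2APIC */
};

/* Which structure type reports a logical processor with this APIC ID. */
static enum madt_entry_type madt_type_for_apic_id(uint32_t apic_id)
{
	return apic_id >= 255 ? MADT_LOCAL_X2APIC : MADT_LOCAL_APIC;
}
```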
This patch enables suspend/resume for interrupt remapping. During suspend,
interrupt remapping is disabled. On resume, interrupt remapping is enabled
again.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Fix initialization of UV blade information for systems that have
nodes with memory but no cpus.
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20090330140111.GA18461@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* git://git.infradead.org/iommu-2.6:
intel-iommu: Fix address wrap on 32-bit kernel.
intel-iommu: Enable DMAR on 32-bit kernel.
intel-iommu: fix PCI device detach from virtual machine
intel-iommu: VT-d page table to support snooping control bit
iommu: Add domain_has_cap iommu_ops
intel-iommu: Snooping control support
Fixed trivial conflicts in arch/x86/Kconfig and drivers/pci/intel-iommu.c
Use the copy of UV system table in kernel memory, not the one in
bios after unmapping.
Signed-off-by: Russ Anderson <rja@sgi.com>
LKML-Reference: <20090330225240.GA22776@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch replaces a 'nop' uv_enable_timeouts() in the
UV TLB shootdown code. (somehow, long ago that function got
eviscerated)
If any cpu in the destination node does not get interrupted by the
message and post completion in a reasonable time the hardware
should respond to the sender with an error. This function
enables such timeouts.
Tested on the UV hardware simulator.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
LKML-Reference: <E1LpjXU-00007e-Qh@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch fixes BAU initialization for systems containing
nodes with no memory and for systems with non-consecutive
node numbers.
Fixes and clarifies situations where pnode should be used instead
of node id.
Tested on the UV hardware simulator.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
LKML-Reference: <E1LpjX3-00007N-12@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds preadv and pwritev system calls. These syscalls are a
pretty straightforward combination of pread and readv (same for write).
They are quite useful for doing vectored I/O in threaded applications.
Using lseek+readv instead opens race windows you'll have to plug with
locking.
Other systems have such system calls too, for example NetBSD, check
here: http://www.daemon-systems.org/man/preadv.2.html
The application-visible interface provided by glibc should look like
this to be compatible with the existing implementations in the *BSD family:
ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset);
ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset);
This prototype has one problem though: on 32-bit archs the (64-bit)
offset argument is unaligned, which the syscall ABI of several archs
doesn't allow. At least s390 needs a wrapper in glibc to handle this. As
we'll need wrappers in glibc anyway, I've decided to push the problem to
glibc entirely and use a syscall prototype which works without
arch-specific wrappers inside the kernel: the offset argument is
explicitly split into two 32-bit values.
The patch contains the actual system call implementation and the wire-up in
the x86 system call tables. Other archs follow as separate patches.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <linux-api@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
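The offset handling described above can be pictured as a split on the library side and a join on the kernel side. The helper names here are hypothetical, and the sketch says nothing about the order in which the two halves are passed to the syscall.

```c
#include <stdint.h>

/* Library side (sketch): split a 64-bit file offset into two 32-bit halves. */
static void offset_split(uint64_t offset, uint32_t *hi, uint32_t *lo)
{
	*hi = (uint32_t)(offset >> 32);
	*lo = (uint32_t)(offset & 0xffffffffu);
}

/* Kernel side (sketch): rebuild the 64-bit offset from the two halves. */
static uint64_t offset_join(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}
```

Since each half fits in an ordinary 32-bit argument slot, no architecture needs special alignment or register pairing for the offset.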
Add macros for using the UV hub to send interrupts. Change the IPI code
to use these macros. These macros will also be used in additional patches
that will follow.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Container-init must behave like global-init to processes within the
container and hence it must be immune to unhandled fatal signals from
within the container (i.e. SIG_DFL signals that terminate the process).
But the same container-init must behave like a normal process to processes
in ancestor namespaces and so if it receives the same fatal signal from a
process in ancestor namespace, the signal must be processed.
Implementing these semantics requires that send_signal() determine pid
namespace of the sender but since signals can originate from workqueues/
interrupt-handlers, determining pid namespace of sender may not always be
possible or safe.
This patchset implements the design/simplified semantics suggested by
Oleg Nesterov. The simplified semantics for container-init are:
- container-init must never be terminated by a signal from a
descendant process.
- container-init must never be immune to SIGKILL from an ancestor
namespace (so a process in parent namespace must always be able
to terminate a descendant container).
- container-init may be immune to unhandled fatal signals (like
SIGUSR1) even if they are from ancestor namespace. SIGKILL/SIGSTOP
are the only reliable signals to a container-init from ancestor
namespace.
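The three rules above can be restated as a small decision function for an unhandled fatal signal aimed at a container-init. This is only a reading aid with hypothetical names; it does not reflect how send_signal() or sig_ignored() are actually structured.

```c
#include <signal.h>
#include <stdbool.h>

enum ci_outcome {
	CI_DROP,	/* container-init is never terminated */
	CI_DELIVER,	/* signal is reliably delivered */
	CI_MAY_DROP	/* container-init may be immune */
};

/*
 * Outcome for an unhandled fatal signal sent to a container-init.
 * 'from_ancestor' is true when the sender lives in an ancestor namespace.
 */
static enum ci_outcome container_init_outcome(bool from_ancestor, int sig)
{
	if (!from_ancestor)
		return CI_DROP;
	if (sig == SIGKILL || sig == SIGSTOP)
		return CI_DELIVER;
	return CI_MAY_DROP;
}
```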
This patch:
Based on an earlier patch submitted by Oleg Nesterov and comments from
Roland McGrath (http://lkml.org/lkml/2008/11/19/258).
The handler parameter is currently unused in the tracehook functions.
Besides, the tracehook functions are called with siglock held, so the
functions can check the handler if they later need to.
Removing the parameter simplifies changes to sig_ignored() in a follow-on
patch.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Daniel Lezcano <daniel.lezcano@free.fr>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: clean up
The code that duplicates pcpu_need_numa() should be removed.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: David Miller <davem@davemloft.net>
LKML-Reference: <49D31770.9090502@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
setup_percpu_remap() is for NUMA machines yet it bailed out with
-EINVAL if pcpu_need_numa(). Fix the inverted condition.
This problem was reported by David Miller and verified by Yinghai Lu.
Reported-by: David Miller <davem@davemloft.net>
Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <49D30469.8020006@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (88 commits)
PCI: fix HT MSI mapping fix
PCI: don't enable too much HT MSI mapping
x86/PCI: make pci=lastbus=255 work when acpi is on
PCI: save and restore PCIe 2.0 registers
PCI: update fakephp for bus_id removal
PCI: fix kernel oops on bridge removal
PCI: fix conflict between SR-IOV and config space sizing
powerpc/PCI: include pci.h in powerpc MSI implementation
PCI Hotplug: schedule fakephp for feature removal
PCI Hotplug: rename legacy_fakephp to fakephp
PCI Hotplug: restore fakephp interface with complete reimplementation
PCI: Introduce /sys/bus/pci/devices/.../rescan
PCI: Introduce /sys/bus/pci/devices/.../remove
PCI: Introduce /sys/bus/pci/rescan
PCI: Introduce pci_rescan_bus()
PCI: do not enable bridges more than once
PCI: do not initialize bridges more than once
PCI: always scan child buses
PCI: pci_scan_slot() returns newly found devices
PCI: don't scan existing devices
...
Fix trivial append-only conflict in Documentation/feature-removal-schedule.txt
Make the following header file changes:
- remove arch ifdefs and asm/suspend.h from linux/suspend.h
- add asm/suspend.h to disk.c (for arch_prepare_suspend())
- add linux/io.h to swsusp.c (for ioremap())
- x86 32/64 bit compile fixes
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Cc: Paul Mundt <lethal@linux-sh.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: fix redundant and incorrect check
Oleg Nesterov noticed wrt commit:
14fc9fb: x86: signal: check signal stack overflow properly
>> No need to check SA_ONSTACK if we're already using alternate signal stack.
>
> Yes, but this also mean that we don't need sas_ss_flags() under
> "if (!onsigstack)",
Checking on_sig_stack() in sas_ss_flags() at get_sigframe() is redundant
and not correct on 64 bit. Checking sas_ss_size is enough.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: roland@redhat.com
LKML-Reference: <49CBB54C.5080201@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc:
Revert "proc: revert /proc/uptime to ->read_proc hook"
proc 2/2: remove struct proc_dir_entry::owner
proc 1/2: do PDE usecounting even for ->read_proc, ->write_proc
proc: fix sparse warnings in pagemap_read()
proc: move fs/proc/inode-alloc.txt comment into a source file
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
PCI PM: Make pci_prepare_to_sleep() disable wake-up if needed
radeonfb: Use __pci_complete_power_transition()
PCI PM: Introduce __pci_[start|complete]_power_transition() (rev. 2)
PCI PM: Restore config spaces of all devices during early resume
PCI PM: Make pci_set_power_state() handle devices with no PM support
PCI PM: Put devices into low power states during late suspend (rev. 2)
PCI PM: Move pci_restore_standard_config to pci-driver.c
PCI PM: Use pci_set_power_state during early resume
PCI PM: Consistently use variable name "error" for pm call return values
kexec: Change kexec jump code ordering
PM: Change hibernation code ordering
PM: Change suspend code ordering
PM: Rework handling of interrupts during suspend-resume
PM: Introduce functions for suspending and resuming device interrupts
Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy
as correctly noted at bug #12454. Someone can look up an entry with NULL
->owner, thus not pinning anything, and release it later, resulting
in module refcount underflow.
We can keep ->owner and supply it at registration time like ->proc_fops
and ->data.
But this leaves ->owner as an easily manipulated field (just one C
assignment), and somebody will forget to unpin the previous module and
pin the current one when switching ->owner. ->proc_fops is declared as
"const", which should give some food for thought.
->read_proc/->write_proc were just fixed to not require ->owner for
protection.
rmmod'ed directories will be empty and return "." and ".." -- no harm.
And directories with tricky enough readdir and lookup shouldn't be modular.
We definitely don't want such modular code.
Removing ->owner will also make PDE smaller.
So, let's nuke it.
Kudos to Jeff Layton for reminding about this, let's say, oversight.
http://bugzilla.kernel.org/show_bug.cgi?id=12454
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
* 'iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits)
dma-debug: make memory range checks more consistent
dma-debug: warn of unmapping an invalid dma address
dma-debug: fix dma_debug_add_bus() definition for !CONFIG_DMA_API_DEBUG
dma-debug/x86: register pci bus for dma-debug leak detection
dma-debug: add a check dma memory leaks
dma-debug: add checks for kernel text and rodata
dma-debug: print stacktrace of mapping path on unmap error
dma-debug: Documentation update
dma-debug: x86 architecture bindings
dma-debug: add function to dump dma mappings
dma-debug: add checks for sync_single_sg_*
dma-debug: add checks for sync_single_range_*
dma-debug: add checks for sync_single_*
dma-debug: add checking for [alloc|free]_coherent
dma-debug: add add checking for map/unmap_sg
dma-debug: add checking for map/unmap_page/single
dma-debug: add core checking functions
dma-debug: add debugfs interface
dma-debug: add kernel command line parameters
dma-debug: add initialization code
...
Fix trivial conflicts due to whitespace changes in arch/x86/kernel/pci-nommu.c
Use the functions introduced by the previous patch,
suspend_device_irqs(), resume_device_irqs() and check_wakeup_irqs(),
to rework the handling of interrupts during suspend (hibernation) and
resume. Namely, interrupts will only be disabled on the CPU right
before suspending sysdevs, while device drivers will be prevented
from receiving interrupts, with the help of the new helper function,
before their "late" suspend callbacks run (and analogously during
resume).
In addition, since the device interrupts are now disabled before the
CPU has turned all interrupts off and the CPU will ACK the interrupts
setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
any wake-up interrupts are pending and abort suspend if that's the
case.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Ingo Molnar <mingo@elte.hu>
* 'x86-stage-3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (190 commits)
Revert "cpuacct: reduce one NULL check in fast-path"
Revert "x86: don't compile vsmp_64 for 32bit"
x86: Correct behaviour of irq affinity
x86: early_ioremap_init(), use __fix_to_virt(), because we are sure it's safe
x86: use default_cpu_mask_to_apicid for 64bit
x86: fix set_extra_move_desc calling
x86, PAT, PCI: Change vma prot in pci_mmap to reflect inherited prot
x86/dmi: fix dmi_alloc() section mismatches
x86: e820 fix various signedness issues in setup.c and e820.c
x86: apic/io_apic.c define msi_ir_chip and ir_ioapic_chip all the time
x86: irq.c keep CONFIG_X86_LOCAL_APIC interrupts together
x86: irq.c use same path for show_interrupts
x86: cpu/cpu.h cleanup
x86: Fix a couple of sparse warnings in arch/x86/kernel/apic/io_apic.c
Revert "x86: create a non-zero sized bm_pte only when needed"
x86: pci-nommu.c cleanup
x86: io_delay.c cleanup
x86: rtc.c cleanup
x86: i8253 cleanup
x86: kdebugfs.c cleanup
...
Impact: cleanup
It's unused, since about 1995. So remove all initialization of it in
preparation for actually removing the field.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Impact: minor optimisation
percpu_read/write is a slightly more direct way of getting
to percpu data.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Impact: remove obsolete checks, simplification
Lift restrictions on preemption with lazy mmu mode, as it is now allowed.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Impact: fix lazy context switch API
Pass the previous and next tasks into the context switch start and
end calls, so that the called functions can properly access the
task state (esp in end_context_switch, in which the next task
is not yet completely current).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Impact: allow preemption during lazy mmu updates
If we're in lazy mmu mode when context switching, leave
lazy mmu mode, but remember the task's state in
TIF_LAZY_MMU_UPDATES. When we resume the task, check this
flag and re-enter lazy mmu mode if it's set.
This sets things up for allowing lazy mmu mode while preemptible,
though that won't actually be active until the next change.
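The save/restore scheme described above can be sketched in user space
roughly as follows. This is an illustration under assumptions: the
struct, the lazy_mmu_updates field (standing in for
TIF_LAZY_MMU_UPDATES) and the switch_out()/switch_in() helpers are
hypothetical names, not the kernel's.

```c
#include <stdbool.h>

enum lazy_mode { LAZY_NONE, LAZY_MMU };

/* Hypothetical task: the flag stands in for TIF_LAZY_MMU_UPDATES. */
struct task {
	bool lazy_mmu_updates;
};

/* Lazy mode of the (single, simulated) cpu. */
enum lazy_mode cpu_lazy_mode = LAZY_NONE;

/* Switching away: leave lazy mmu mode, but remember it in the task. */
void switch_out(struct task *prev)
{
	if (cpu_lazy_mode == LAZY_MMU) {
		cpu_lazy_mode = LAZY_NONE;
		prev->lazy_mmu_updates = true;
	}
}

/* Resuming: re-enter lazy mmu mode if the task was in it before. */
void switch_in(struct task *next)
{
	if (next->lazy_mmu_updates) {
		next->lazy_mmu_updates = false;
		cpu_lazy_mode = LAZY_MMU;
	}
}
```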
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Impact: simplification, prepare for later changes
Make lazy cpu mode more specific to context switching, so that
it makes sense to do more context-switch specific things in
the callbacks.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Impact: simplification, robustness
Make paravirt_lazy_mode() always return PARAVIRT_LAZY_NONE
when in an interrupt. This prevents interrupt code from
accidentally inheriting an outer lazy state, and instead
does everything synchronously. Outer batched operations
are left deferred.
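As a rough illustration of the rule above, assuming a simplified per-cpu
state: the enum values, struct cpu_state and pv_get_lazy_mode() below
are hypothetical names for this sketch, not the paravirt API.

```c
#include <stdbool.h>

enum pv_lazy { PV_LAZY_NONE, PV_LAZY_MMU, PV_LAZY_CPU };

/* Hypothetical per-cpu state for the sketch. */
struct cpu_state {
	bool in_interrupt;
	enum pv_lazy lazy_mode;	/* mode set by the interrupted (outer) code */
};

/*
 * Interrupt code always sees "no lazy mode" and so works synchronously;
 * the outer mode is left untouched, so its batch stays deferred.
 */
enum pv_lazy pv_get_lazy_mode(const struct cpu_state *cpu)
{
	if (cpu->in_interrupt)
		return PV_LAZY_NONE;
	return cpu->lazy_mode;
}
```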
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Some BIOSes report very high frequency transition latencies which are plainly
wrong on CPUs that can change frequency using the native MSR interface.
One such system is IBM T42 (2327-8ZU) as reported by Owen Taylor and
Rik van Riel.
cpufreq_ondemand driver uses this transition latency to come up with a
reasonable sampling interval to sample CPU usage and with such high
latency value, ondemand sampling interval ends up being very high
(0.5 sec, in this particular case), resulting in performance impact due to
slow response to increasing frequency.
Fix it by capping-off the transition latency to 20uS for native MSR based
frequency transitions.
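A minimal sketch of the capping rule, assuming the latency is kept in
nanoseconds. The constant and function names are made up for this
illustration; only the 20uS bound and the "native MSR only" condition
come from the text above.

```c
/* 20 us expressed in ns; hypothetical name for this sketch. */
#define NATIVE_MSR_LATENCY_CAP_NS 20000u

/*
 * Return the latency to use: BIOS-reported values above the cap are
 * clamped, but only for native MSR based transitions.
 */
unsigned int cap_transition_latency(unsigned int bios_latency_ns,
				    int native_msr)
{
	if (native_msr && bios_latency_ns > NATIVE_MSR_LATENCY_CAP_NS)
		return NATIVE_MSR_LATENCY_CAP_NS;
	return bios_latency_ns;
}
```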
mjg: We've confirmed that this also helps on the X31
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
> arch/x86/kernel/cpu/cpufreq/longhaul.c: In function 'longhaul_setstate':
> arch/x86/kernel/cpu/cpufreq/longhaul.c:308: error: implicit declaration of function 'acpi_set_register'
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Compile-tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'sched-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (46 commits)
sched: Add comments to find_busiest_group() function
sched: Refactor the power savings balance code
sched: Optimize the !power_savings_balance during fbg()
sched: Create a helper function to calculate imbalance
sched: Create helper to calculate small_imbalance in fbg()
sched: Create a helper function to calculate sched_domain stats for fbg()
sched: Define structure to store the sched_domain statistics for fbg()
sched: Create a helper function to calculate sched_group stats for fbg()
sched: Define structure to store the sched_group statistics for fbg()
sched: Fix indentations in find_busiest_group() using gotos
sched: Simple helper functions for find_busiest_group()
sched: remove unused fields from struct rq
sched: jiffies not printed per CPU
sched: small optimisation of can_migrate_task()
sched: fix typos in documentation
sched: add avg_overlap decay
x86, sched_clock(): mark variables read-mostly
sched: optimize ttwu vs group scheduling
sched: TIF_NEED_RESCHED -> need_reshed() cleanup
sched: don't rebalance if attached on NULL domain
...
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: (35 commits)
[CPUFREQ] Prevent p4-clockmod from auto-binding to the ondemand governor.
[CPUFREQ] Make cpufreq-nforce2 less obnoxious
[CPUFREQ] p4-clockmod reports wrong frequency.
[CPUFREQ] powernow-k8: Use a common exit path.
[CPUFREQ] Change link order of x86 cpufreq modules
[CPUFREQ] conservative: remove 10x from def_sampling_rate
[CPUFREQ] conservative: fixup governor to function more like ondemand logic
[CPUFREQ] conservative: fix dbs_cpufreq_notifier so freq is not locked
[CPUFREQ] conservative: amend author's email address
[CPUFREQ] Use swap() in longhaul.c
[CPUFREQ] checkpatch cleanups for acpi-cpufreq
[CPUFREQ] powernow-k8: Only print error message once, not per core.
[CPUFREQ] ondemand/conservative: sanitize sampling_rate restrictions
[CPUFREQ] ondemand/conservative: deprecate sampling_rate{min,max}
[CPUFREQ] powernow-k8: Always compile powernow-k8 driver with ACPI support
[CPUFREQ] Introduce /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_transition_latency
[CPUFREQ] checkpatch cleanups for powernow-k8
[CPUFREQ] checkpatch cleanups for ondemand governor.
[CPUFREQ] checkpatch cleanups for powernow-k7
[CPUFREQ] checkpatch cleanups for speedstep related drivers.
...
Partial revert of commit 129d8bc828
titled 'x86: don't compile vsmp_64 for 32bit'
Commit reverted to compile vsmp_64.c if CONFIG_X86_64 is defined,
since is_vsmp_box() needs to indicate that TSCs are not synchronized, and
hence, not a valid time source, even when CONFIG_X86_VSMP is not defined.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: shai@scalex86.org
LKML-Reference: <20090324061429.GH7278@localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix interrupt emulation code in kretprobe-booster according to
pt_regs update (es/ds change and gs adding).
This issue has been reported on systemtap-bugzilla:
http://sources.redhat.com/bugzilla/show_bug.cgi?id=9965
| On a -tip kernel on x86_32, kretprobe_example (from samples) triggers the
| following backtrace when its retprobing a class of functions that cause a
| copy_from/to_user().
|
| BUG: sleeping function called from invalid context at mm/memory.c:3196
| in_atomic(): 0, irqs_disabled(): 1, pid: 2286, name: cat
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Tested-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: systemtap-ml <systemtap@sources.redhat.com>
LKML-Reference: <49C7995C.2010601@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: get correct smp_affinity as user requested
The effect of setting desc->affinity (ie. from userspace via sysfs) has
varied over time. In 2.6.27, the 32-bit code anded the value with
cpu_online_map, and both 32 and 64-bit did that anding whenever a cpu
was unplugged.
2.6.29 consolidated this into one routine (and fixed hotplug) but
introduced another variation: anding the affinity with cfg->domain.
We should just set it to what the user said - if possible.
(cpu_mask_to_apicid_and already takes cpu_online_mask into account)
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <49C94DDF.2010703@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix bug with irq-descriptor moving when logical flat
Rusty observed:
> The effect of setting desc->affinity (ie. from userspace via sysfs) has varied
> over time. In 2.6.27, the 32-bit code anded the value with cpu_online_map,
> and both 32 and 64-bit did that anding whenever a cpu was unplugged.
>
> 2.6.29 consolidated this into one routine (and fixed hotplug) but introduced
> another variation: anding the affinity with cfg->domain. Is this right, or
> should we just set it to what the user said? Or as now, indicate that we're
> restricting it.
Eric pointed out that desc->affinity should be what the user requested,
if it is at all possible to honor the user space request.
This bug got introduced by commit 22f65d31b "x86: Update io_apic.c to use
new cpumask API".
Fix it by moving the masking to before the descriptor moving ...
Reported-by: Rusty Russell <rusty@rustcorp.com.au>
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <49C94134.4000408@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch moves the timestamping out of the arch specific code and
into the general code. This gives the tracer better control over
time manipulation.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
This iommu_op can tell whether a domain has a specific capability, like
snooping control for Intel IOMMU, which can be used by other components of
the kernel to adjust their behaviour.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Impact: cleanup
This fixes various signedness issues in setup.c and e820.c:
arch/x86/kernel/setup.c:455:53: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/setup.c:455:53: expected int *pnr_map
arch/x86/kernel/setup.c:455:53: got unsigned int extern [toplevel] *<noident>
arch/x86/kernel/setup.c:639:53: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/setup.c:639:53: expected int *pnr_map
arch/x86/kernel/setup.c:639:53: got unsigned int extern [toplevel] *<noident>
arch/x86/kernel/setup.c:820:54: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/setup.c:820:54: expected int *pnr_map
arch/x86/kernel/setup.c:820:54: got unsigned int extern [toplevel] *<noident>
arch/x86/kernel/e820.c:670:53: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/e820.c:670:53: expected int *pnr_map
arch/x86/kernel/e820.c:670:53: got unsigned int [toplevel] *<noident>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Move msi_ir_chip and ir_ioapic_chip out of the CONFIG_INTR_REMAP shadow.
Fix:
arch/x86/kernel/apic/io_apic.c:1431: warning: ‘msi_ir_chip’ defined but not used
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Impact: cleanup
This patch fixes the following sparse warnings:
arch/x86/kernel/apic/io_apic.c:3602:17: warning: symbol 'hpet_msi_type'
was not declared. Should it be static?
arch/x86/kernel/apic/io_apic.c:3467:30: warning: Using plain integer as
NULL pointer
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.com>
LKML-Reference: <1237741871-5827-2-git-send-email-dmitri.vorobiev@movial.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reduce the size of the oversized function __get_smp_config().
There should be no impact to functionality.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Impact: fix incorrect error message
- IO APIC resource allocation error message contains one too many "be".
- Print the error message iff there are IO APICs in the system.
I've seen this error message for some time on my x86-32 laptop...
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Alan Bartlett <ajb.stxsl@googlemail.com>
LKML-Reference: <200903202100.30789.bzolnier@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Check alternate signal stack overflow with proper stack pointer.
The stack pointer of the next signal frame is different if that
task has i387 state.
On x86_64, redzone would be included.
No need to check SA_ONSTACK if we're already using alternate signal stack.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: Roland McGrath <roland@redhat.com>
LKML-Reference: <49C2874D.3080002@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add the new API pci_enable_msi_block() to allow drivers to
request multiple MSI and reimplement pci_enable_msi in terms of
pci_enable_msi_block. Ensure that the architecture back ends don't
have to know about multiple MSI.
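The layering can be pictured with the toy model below. It is a sketch
only: struct fake_pci_dev and the fake_* functions are invented for
illustration and do not reproduce the real pci_enable_msi_block()
semantics (in particular its return values); they only show the
single-MSI call being expressed as a block request of one vector.

```c
/* Toy device: records how many MSI vectors were granted. */
struct fake_pci_dev {
	unsigned int msi_vectors;
};

/* Toy block API: grant nvec vectors, reject a request for zero. */
int fake_enable_msi_block(struct fake_pci_dev *dev, unsigned int nvec)
{
	if (nvec == 0)
		return -1;
	dev->msi_vectors = nvec;
	return 0;
}

/* The single-MSI entry point is just the block call with nvec == 1. */
int fake_enable_msi(struct fake_pci_dev *dev)
{
	return fake_enable_msi_block(dev, 1);
}
```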
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch changes a VIA PCI quirk to use dev_info() rather than printk().
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
ds_request_bts() needs to allocate memory. It uses GFP_KERNEL.
The hw-branch-tracer calls ds_request_bts() within on_each_cpu().
Use atomic memory allocation to allow it to be used in that context.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
LKML-Reference: <20090318192700.A6038@sedona.ch.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
While reviewing the sensitive code in ftrace_nmi_enter(), I found that
the atomic variable nmi_running does protect NMI vs. do_ftrace_mod_code(),
but it cannot protect NMI (already entered) vs. NMI (ftrace_nmi_enter()).
cpu#1                   | cpu#2                 | cpu#3
ftrace_nmi_enter()      | do_ftrace_mod_code()  |
  not modify            |                       |
------------------------|-----------------------|--------------------
executing               | set mod_code_write = 1|
executing             --|-----------------------|--------------------
executing               |                       | ftrace_nmi_enter()
executing               |                       |   do modify
------------------------|-----------------------|--------------------
ftrace_nmi_exit()       |                       |
cpu#3 may be modifying code that is still being executed on cpu#1;
this has undefined results and may cause a GPF. This patch
prevents that from occurring.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <49C0B411.30003@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: cleanup
Refactor the MP-table parsing code via the introduction of the
following helper functions:
skip_entry()
smp_reserve_bootmem()
check_irq_src()
check_slot()
To simplify the code flow and to reduce the size of the
following oversized functions: smp_read_mpc(), smp_scan_config().
There should be no impact to functionality.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, avoid cpumask games
The APM code wants to run on CPU 0: we create an "on_cpu0" wrapper
which uses work_on_cpu() if we're not already on cpu 0.
This introduces a new failure mode: -ENOMEM, so we add an explicit
err arg and handle Linux-style errnos in apm_err().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
LKML-Reference: <200903111631.29787.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: don't play with current's cpumask
Straightforward indirection through work_on_cpu(). One change is
that the error code from microcode_update_cpu() is now actually
plumbed back to microcode_init_cpu(), so now we printk if it fails
on cpu hotplug.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Peter Oruba <peter.oruba@amd.com>
LKML-Reference: <200903111632.37279.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/kernel/cpu/mtrr/cleanup.c:197: warning: format ‘%d’ expects type ‘int’, but argument 2 has type ‘long unsigned int’
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1237378015.13488.1.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix boot crash on UV systems
Commit 76ba0ecda0 "cpumask: use
cpumask_var_t in uv_flush_tlb_others" used cur_cpu as an iterator;
it was supposed to be zero for the code below it.
Reported-by: Cliff Wickman <cpw@sgi.com>
Original-From: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Mike Travis <travis@sgi.com>
Cc: steiner@sgi.com
Cc: <stable@kernel.org>
LKML-Reference: <200903180822.31196.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Fix cpu offline when CONFIG_MAXSMP=y
Changeset bc9b83dd1f "cpumask: convert
c1e_mask in arch/x86/kernel/process.c to cpumask_var_t" contained a
bug: c1e_mask is manipulated even if C1E isn't detected (and hence
not allocated).
This is simply fixed by checking for NULL (which gcc optimizes out
anyway if CONFIG_CPUMASK_OFFSTACK=n, since it knows c1e_mask can never
be NULL).
In addition, fix a leak where select_idle_routine re-allocates
(and re-clears) c1e_mask on every cpu init.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Mike Travis <travis@sgi.com>
LKML-Reference: <200903171450.34549.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: optimize APIC IPI related barriers
Uncached MMIO accesses for xapic are inherently serializing and hence
we don't need explicit barriers for xapic IPI paths.
x2apic MSR writes/reads don't have serializing semantics and hence need
a serializing instruction or mfence, to make all the previous memory
stores globally visible before the x2apic msr write for IPI.
Add x2apic_wrmsr_fence() in flush tlb path to x2apic specific paths.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "steiner@sgi.com" <steiner@sgi.com>
Cc: Nick Piggin <npiggin@suse.de>
LKML-Reference: <1237313814.27006.203.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Attempting to rid us of the problematic work_on_cpu(). Just use
smp_call_function_single() here.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <20090318042217.EF3F1DDF39@ozlabs.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix spurious IRQs
During irq migration, we send a low priority interrupt to the previous
irq destination. In the non interrupt-remapping case this happens after
the interrupt starts arriving at the new destination, and in the
interrupt-remapping case after modifying and flushing the
interrupt-remapping table entry caches.
This low priority irq cleanup handler can clean up multiple vectors, as
multiple irqs can be migrated at almost the same time. While
there will be multiple invocations of the irq cleanup handler (one cleanup
IPI for each irq migration), the first invocation of the cleanup handler
can potentially clean up more than one vector (as the first invocation can
see the requests for more than one vector cleanup). When we clean up
multiple vectors during the first invocation of
smp_irq_move_cleanup_interrupt(), other vectors that are to be cleaned up
can still be pending in the local cpu's IRR (as
smp_irq_move_cleanup_interrupt() runs with interrupts disabled).
When we are ready to unhook a vector corresponding to an irq, check if that
vector is registered in the local cpu's IRR. If so, skip that cleanup and
do a self IPI with the cleanup vector, so that we give a chance to
service the pending vector interrupt and then clean up that vector
allocation once we execute the lowest priority handler.
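The IRR check described above can be simulated in user space roughly as
follows. This is a sketch under assumptions: the IRR is modelled as
eight 32-bit words covering 256 vectors, and vector_pending() and
cleanup_pass() are hypothetical helpers, not the kernel code. A pending
vector is skipped and the caller is told to retry (the self IPI in the
real handler).

```c
#include <stdbool.h>
#include <stdint.h>

#define IRR_WORDS 8	/* 256 vectors, 32 per word */

/* True if @vector (0..255) is still pending in the simulated IRR. */
bool vector_pending(const uint32_t irr[IRR_WORDS], unsigned int vector)
{
	return (irr[vector / 32] >> (vector % 32)) & 1u;
}

/*
 * Walk the vectors queued for cleanup. Pending ones are skipped and
 * *need_self_ipi is set so the caller can retry later; the rest are
 * counted as released.
 */
unsigned int cleanup_pass(const uint32_t irr[IRR_WORDS],
			  const unsigned int *vectors, unsigned int nr,
			  bool *need_self_ipi)
{
	unsigned int released = 0;

	*need_self_ipi = false;
	for (unsigned int i = 0; i < nr; i++) {
		if (vector_pending(irr, vectors[i])) {
			*need_self_ipi = true;	/* retry once it is serviced */
			continue;
		}
		released++;
	}
	return released;
}
```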
This fixes spurious interrupts seen when migrating multiple vectors
at the same time.
[ This is apparently possible even on conventional xapic, although to
the best of our knowledge it has never been seen. The stable
maintainers may wish to consider this one for -stable. ]
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: stable@kernel.org
Impact: fix possible race
save_mask_IO_APIC_setup() was using non atomic memory allocation while getting
called with interrupts disabled. Fix this by splitting it into two different
functions. The allocation part, save_IO_APIC_setup(), now happens before
disabling interrupts.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: cleanup
Clean up #ifdefs and replace them with helper functions.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: simplification
In the current code, for level triggered migration, we need to modify the
io-apic RTE with the updated vector information, along with modifying the
interrupt remapping table entry (IRTE) with vector and destination. This is to
ensure that the remote IRR bit in the IOAPIC RTE gets cleared when the cpu
does EOI.
With this patch, for level triggered, we eliminate the io-apic RTE modification
(with the updated vector information), by using a virtual vector (io-apic pin
number). Real vector that is used for interrupting cpu will be coming from
the interrupt-remapping table entry. Trigger mode in the IRTE will always be
edge, and the actual level or edge trigger will be setup in the IO-APIC RTE.
So a level triggered interrupt will appear as an edge to the local apic
cpu but still as level to the IO-APIC.
With this change, level irq migration can be done by simply modifying
the interrupt-remapping table entry without changing the io-apic RTE.
And as the interrupt appears as edge at the cpu, in addition to doing the
local apic EOI, we need to do an IO-APIC directed EOI to clear the remote
IRR bit in the IO-APIC RTE.
This simplifies the irq migration in the presence of interrupt-remapping.
Idea-by: Rajesh Sankaran <rajesh.sankaran@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: cleanup, paranoia
We were not clearing the local APIC in clear_local_APIC() in the
presence of x2apic. Fix it.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: make kexec work with x2apic
disable_IO_APIC() gets called during crashdump as well, which configures the
IO-APIC/LAPIC so that legacy interrupts can be delivered for the kexec'd kernel.
In the presence of interrupt-remapping, we need to change the
interrupt-remapping configuration as well as modifying IO-APIC for virtual wire
B mode.
To keep things simple during the crash, use virtual wire A mode
(for which we don't need to touch io-apic and interrupt-remapping tables).
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: interface augmentation (not yet used)
Enable fault handling flow for intr-remapping as well. Fault handling
code is now shared by both dma-remapping and intr-remapping.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: build fix with CONFIG_RELOCATABLE
Move _end into a dummy section, so that relocs.c will know it is a
relocatable symbol.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Impact: disambiguate real .bss variables from .brk storage
Add a .brk section after the .bss section. This has no effect
on the final vmlinux, but it more clearly distinguishes the space
taken by actual .bss symbols, and the variable space reserved
by .brk users.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Impact: Tighten bound to avoid masking errors
The definition of MAPPING_BEYOND_END was excessive; this has a nasty
tendency to mask bugs. We have learned over time that this kind of
bug hiding can cause some very strange errors. Therefore, tighten the
bound to only need to map the actual kernel area.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Impact: cleanup
ALLOCATOR_SLOP is a vestigial remnant from when we used the
bootmem allocator to allocate the kernel's linear memory mapping.
Now we directly reserve pages from the e820 mapping, and no
longer require secondary structures to keep track of allocated
pages.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: crash fix
head_32.S needs to map the kernel itself, and enough space so
that mm/init.c can allocate space from the e820 allocator
for the linear map of low memory.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Don't boost at addresses which are listed in the exception tables,
because a major page fault may occur at those addresses. In that case,
kprobes cannot tell when the instruction buffer can be freed, since
some processes may sleep on the buffer.
kprobes-ia64 already has the same check.
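As a rough illustration of the check described above, the following
user-space sketch refuses to boost a probe whose address appears in a
sorted table of faulting addresses. The names in_exception_table() and
can_boost_sketch() and the flat array layout are assumptions made for
the example; they are not the kernel's actual data structures or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for an exception table lookup: binary search
 * over an ascending array of instruction addresses.
 */
bool in_exception_table(const unsigned long *table, size_t n,
			unsigned long addr)
{
	size_t lo = 0, hi = n;

	assert(table != NULL || n == 0);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (table[mid] == addr)
			return true;
		if (table[mid] < addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return false;
}

/* Boosting is allowed only for addresses that are not in the table. */
bool can_boost_sketch(const unsigned long *table, size_t n,
		      unsigned long addr)
{
	return !in_exception_table(table, n, addr);
}
```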
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In order for ntpd to correctly synchronize the clocks, the frequency of
the system clock must not be off by more than 500 ppm (or, put another
way, 1:2000), or ntpd will end up giving up on trying to synchronize
properly, and ends up resetting the clock in jumps instead.
The fast TSC PIT calibration sometimes failed this test - it was
assuming that the PIT reads always took about one microsecond each (2us
for the two reads to get a 16-bit timer), and that calibrating TSC to
the PIT over 15ms should thus be sufficient to get much closer than
500ppm (max 2us error on both sides giving 4us over 15ms: a 270 ppm
error value).
However, that assumption does not always hold: apparently some hardware
is either very much slower at reading the PIT registers, or there was
other noise causing at least one machine to get 700+ ppm errors.
So instead of using a fixed 15ms timing loop, this changes the fast PIT
calibration to read the TSC delta over the individual PIT timer reads,
and use the result to calculate the error bars on the PIT read timing
properly. We then successfully calibrate the TSC only if the maximum
error bars fall below 500ppm.
In the process, we also relax the timing to allow up to 25ms for the
calibration, although it can happen much faster depending on hardware.
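A minimal sketch of the error-bar test, assuming the uncertainty is
the sum of the TSC deltas measured across the first and last PIT reads
(d1 and d2 below) and that the interval is the TSC delta over the whole
calibration loop. The helper names are hypothetical and the integer
arithmetic is illustrative only; it is not the code in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Error in parts per million, rounded down. Units only need to match. */
uint64_t error_ppm(uint64_t uncertainty, uint64_t interval)
{
	assert(interval != 0);
	return uncertainty * 1000000ULL / interval;
}

/* Accept the calibration only if the error bars stay below 500 ppm. */
bool calibration_ok(uint64_t d1, uint64_t d2, uint64_t interval)
{
	return error_ppm(d1 + d2, interval) < 500;
}
```

With the old assumption (4us of uncertainty over 15000us), error_ppm()
gives 266, which is the "270 ppm" figure quoted above. An uncertainty
of 11us over the same interval gives 733 ppm and would be rejected.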
Reported-and-tested-by: Jesper Krogh <jesper@krogh.cc>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
During bootup, when we reprogram the PIT (programmable interval timer)
to start counting down from 0xffff in order to use it for the fast TSC
calibration, we should also make sure to delay a bit afterwards to allow
the PIT hardware to actually start counting with the new value.
That will happen at the next CLK pulse (1.193182 MHz), so the easiest
way to do that is to just wait at least one microsecond after
programming the new PIT counter value. We do that by just reading the
counter value back once - which will take about 2us on PC hardware.
Reported-and-tested-by: john stultz <johnstul@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: don't trim e820 according to wrong mtrr
Ozan reports that his server emits a strange warning.
It turns out the BIOS sets the MTRRs incorrectly.
Ignore those strange ranges and don't trim e820;
just emit one warning about the BIOS.
Reported-by: Ozan Çağlayan <ozan@pardus.org.tr>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49BEE1E7.7020706@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Bug fix on UP
In commit cc3ca22063, Peter removed the __cpuinit annotations from
mce_cpu_features() and the functions it calls, because they caused
trouble on UP configurations.
However, intel_init_cmci() was introduced after that, and it also
has a __cpuinit annotation even though it is called from
mce_cpu_features(). Remove the annotation from that function
too.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: help debug e820 bugs
Try to print out more info, to catch wrong call parameters.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49BCB557.3030000@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix boot crash
Need to exit early if the addr is far above 64k.
The crash got exposed by:
78a8b35: x86: make e820_update_range() handle small range update
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: <stable@kernel.org>
LKML-Reference: <49BC2279.2030101@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: makes vmlinux section information more useful
Don't blindly use RAM after _end for pagetables. Instead, keep the
initial page tables before _end by putting them into .bss.
[Adapted to use brk segment - Jeremy]
v2: keep initial page table up to 512M only.
v4: put initial page tables just before _end
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: new interface; remove hard-coded limit
Add RESERVE_BRK(name, size) macro to reserve space in the brk
area. This should be a conservative (ie, larger) estimate of
how much space might possibly be required from the brk area.
Any unused space will be freed, so there's no real downside
to making the reservation too large (within limits).
The name should be unique within a given file, and somewhat
descriptive.
The C definition of RESERVE_BRK() ends up being more complex than
one would expect to work around a cluster of gcc infelicities:
The first attempt was to simply try putting __section(.brk_reservation)
on a variable. This doesn't work because it ends up making it a
@progbits section, which gets actual space allocated in the vmlinux
executable.
The second attempt was to emit the space into a section using asm,
but gcc doesn't allow arguments to be passed to file-level asm()
statements, making it hard to pass in the size.
The final attempt is to wrap the asm() in a function to allow
it to have arguments, and put the function itself into the
.discard section, which vmlinux*.lds drops entirely from the
emitted vmlinux.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: simplification
We only need to map the kernel in head_32.S, not the whole of
lowmem. We use 512MB as a reasonable (but arbitrary) limit on
the maximum size of the kernel image.
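To give a feel for what a 512MB limit costs in page tables, here is a
small sketch that computes the space needed to map a region with 4KB
pages. It assumes one 4KB page table per group of entries_per_table
pages (1024 entries for classic non-PAE 32-bit paging, 512 for PAE)
and ignores the page directory itself. The function name is made up
for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL

/* Bytes of page tables needed to map map_bytes using 4KB pages. */
uint64_t pagetable_bytes(uint64_t map_bytes, uint64_t entries_per_table)
{
	uint64_t bytes_per_table, tables;

	assert(entries_per_table != 0);

	/* Each page table covers entries_per_table pages. */
	bytes_per_table = entries_per_table * PAGE_SIZE_BYTES;
	/* Round up so a partially covered table still counts. */
	tables = (map_bytes + bytes_per_table - 1) / bytes_per_table;

	return tables * PAGE_SIZE_BYTES;
}
```

Under these assumptions, mapping 512MB takes 128 page tables (512KB)
with 1024-entry tables and 256 page tables (1MB) with 512-entry tables.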
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: use new interface instead of previous ad hoc implementation
Use extend_brk() to allocate memory for DMI rather than having an
ad-hoc allocator.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: use new interface instead of previous ad hoc implementation
Rather than having special purpose init_pg_table_start/end variables
to delimit the kernel pagetable built by head_32.S, just use the brk
mechanism to extend the bss for the new pagetable.
This patch removes init_pg_table_start/end and pg0, defines __brk_base
(which is page-aligned and immediately follows _end), initializes
the brk region to start there, and uses it for the 32-bit pagetable.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: build fix
The brk initialization functions were incorrectly located inside
an #ifdef CONFIG_BLK_DEV_INITRD block, causing the obvious build failure in
minimal configurations.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Impact: new interface
Add a brk()-like allocator which effectively extends the bss in order
to allow very early code to do dynamic allocations. This is better than
using statically allocated arrays for data in subsystems which may never
get used.
The space for brk allocations is in the bss ELF segment, so that the
space is mapped properly by the code which maps the kernel, and so
that bootloaders keep the space free rather than putting a ramdisk or
something into it.
The bss itself, delimited by __bss_stop, ends before the brk area
(__brk_base to __brk_limit). The kernel text, data and bss are reserved
up to __bss_stop.
Any brk-allocated data is reserved separately just before the kernel
pagetable is built, as that code allocates from unreserved spaces
in the e820 map, potentially allocating from any unused brk memory.
Ultimately any unused memory in the brk area is used in the general
kernel memory pool.
Initially the brk space is set to 1MB, which is probably much larger
than any user needs (the largest current user is i386 head_32.S's code
to build the pagetables to map the kernel, which can get fairly large
with a big kernel image and no PSE support). So long as the system
has sufficient memory for the bootloader to reserve the kernel+1MB brk,
there are no bad effects resulting from an over-large brk.
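The idea can be sketched in user space as a simple bump allocator over
a fixed region. Everything below is illustrative: brk_area stands in
for the __brk_base..__brk_limit range, extend_brk_sketch() is a
made-up name, and returning NULL on exhaustion is an assumption of the
sketch rather than a statement about how the kernel handles overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BRK_SIZE (1024 * 1024)	/* 1MB, as in the initial setting above */

static unsigned char brk_area[BRK_SIZE];	/* stand-in for the brk region */
static size_t brk_used;				/* offset of the current brk end */

/* Extend the brk by size bytes; align must be a power of two. */
void *extend_brk_sketch(size_t size, size_t align)
{
	uintptr_t base = (uintptr_t)brk_area;
	uintptr_t mask = (uintptr_t)align - 1;
	uintptr_t start;
	size_t off;

	assert(align != 0 && (align & (align - 1)) == 0);

	/* Round the current end up to the requested alignment. */
	start = (base + brk_used + mask) & ~mask;
	off = (size_t)(start - base);

	if (off > BRK_SIZE || size > BRK_SIZE - off)
		return NULL;	/* region exhausted (sketch-only behaviour) */

	brk_used = off + size;
	return brk_area + off;
}
```

There is no free operation: memory is handed out once during early
boot, and whatever lies beyond the final brk end is simply left over
for general use.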
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: cleanup
Make the symbols delimiting a section part of the section
(section relative) rather than absolute. This avoids any
unexpected gaps between the section-start symbol and the first
data in the section, which could be caused by implicit
alignment of the section data. It also makes the general
form of vmlinux_64.lds.S consistent with vmlinux_32.lds.S.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
there should be no difference, except:
* the 64bit variant now also initializes the padlock unit.
* ->c_early_init() is executed again from ->c_init()
* the 64bit fixups made into 32bit path.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: herbert@gondor.apana.org.au
LKML-Reference: <1237029843-28076-2-git-send-email-sebastian@breakpoint.cc>
Signed-off-by: Ingo Molnar <mingo@elte.hu>