linux

Author	SHA1	Message	Date
Jeremy Fitzhardinge	e25371d60c	x86/ioapic.c: unify ioapic_retrigger_irq() The 32 and 64-bit versions of ioapic_retrigger_irq() are identical except the 64-bit one takes vector_lock. vector_lock is defined and used on 32-bit too, so just use a common ioapic_retrigger_irq(). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-07-14 13:32:51 -07:00
Jeremy Fitzhardinge	638f2f8c52	x86/ioapic.c: convert __target_IO_APIC_irq to conventional for() loop Use a normal for() loop in __target_IO_APIC_irq(). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-07-14 13:32:50 -07:00
Jeremy Fitzhardinge	4eea6fff61	x86/ioapic.c: clean up replace_pin_at_irq_node logic and comments There's no need for a control variable in replace_pin_at_irq_node(); it can just return if it finds the old apic/pin to replace. If the loop terminates, then it didn't find the old apic/pin, so it can add the new ones. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-07-14 13:32:50 -07:00
Jeremy Fitzhardinge	535b64291a	x86/ioapic.c: convert replace_pin_at_irq_node to conventional for() loop Use a conventional for() loop in replace_pin_at_irq_node(). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-07-14 13:32:50 -07:00
Jeremy Fitzhardinge	875e68ec32	x86/ioapic.c: simplify add_pin_to_irq_node() Rather than duplicating the same alloc/init code twice, restructure the function to look for duplicates and then add an entry if none is found. This function is not performance critical; all but one of its callers are __init functions, and the non-__init caller is for PCI device setup. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-07-14 13:32:50 -07:00
Jeremy Fitzhardinge	d8c52063ed	x86/ioapic.c: convert io_apic_level_ack_pending loop to normal for() loop Convert the unconventional loop in io_apic_level_ack_pending() to a conventional for() loop. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-07-14 13:32:50 -07:00
Jeremy Fitzhardinge	8e13d697fe	x86/ioapic.c: move lost comment to what seems like appropriate place The comment got separated from its subject, so move it to what appears to be the right place, and update to describe the current structure. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-07-14 13:32:49 -07:00
Jeremy Fitzhardinge	83c21bedf6	x86/ioapic.c: remove redundant declaration of irq_pin_list The structure is defined immediately below, so there's no need to forward declare it. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-07-14 13:32:49 -07:00
Jeremy Fitzhardinge	916a0fe739	x86/ioapic.c: remove #ifdef for 82093AA workaround While no 64-bit hardware will have a version 0x11 I/O APIC which needs the level/edge bug workaround, that's not a particular reason to use CONFIG_X86_32 to #ifdef the code out. Most 32-bit machines will no longer need the workaround either, so the test to see whether it is necessary should be more fine-grained than "32-bit=yes, 64-bit=no". (Also fix formatting of block comment.) Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-07-14 13:32:49 -07:00
Jeremy Fitzhardinge	890aeacf64	x86/ioapic.c: unify __mask_IO_APIC_irq() The main difference between 32 and 64-bit __mask_IO_APIC_irq() does a readback from the I/O APIC to synchronize it. If there's a hardware requirement to do a readback sync after updating an APIC register, then it will be a hardware requrement regardless of whether the kernel is compiled 32 or 64-bit. Unify __mask_IO_APIC_irq() using the 64-bit version which always syncs with io_apic_sync(). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-07-14 13:32:49 -07:00
Jeremy Fitzhardinge	2f210deba9	x86/ioapic.c: ioapic_modify_irq is too large to inline If ioapic_modify_irq() is marked inline, it gets inlined several times. Un-inlining it saves around 200 bytes in .text for me. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-07-14 13:32:48 -07:00
Jeremy Fitzhardinge	6b2b171a77	x86/acpi: acpi_parse_madt_ioapic_entries: remove redundant braces We don't put braces around a single statement. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-07-14 13:32:48 -07:00
Dave Jones	2ad76643ff	x86: Fix warning in pvclock.c when building 32-bit, I see this .. arch/x86/kernel/pvclock.c:63:7: warning: "__x86_64__" is not defined Signed-off-by: Dave Jones <davej@redhat.com> LKML-Reference: <20090713201437.GA12165@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-07-14 16:25:05 +02:00
Rakib Mullick	7473727be8	x86, apic: Fix false positive section mismatch in numaq_32.c The variable apic_numaq placed in noninit section references the function wakeup_secondary_cpu_via_nmi(), which is in __cpuinit section. Thus causes a section mismatch warning. To avoid such mismatch we mark apic_numaq as __refdata. We were warned by the following warning: WARNING: arch/x86/kernel/built-in.o(.data+0x932c): Section mismatch in reference from the variable apic_numaq to the function .cpuinit.text:wakeup_secondary_cpu_via_nmi() Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> LKML-Reference: <b9df5fa10907120407p6b4f67dtf4d563155488188a@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-13 11:03:27 +02:00
Rakib Mullick	151586d0f7	x86: Fix false positive section mismatch in es7000_32.c The variable apic_es7000_cluster references the function __cpuinit wakeup_secondary_cpu_via_mip() from a noninit section. So we've been warned by the following warning. To avoid possible collision between init/noninit, its best to mark the variable as __refdata. We were warned by the following warning: LD arch/x86/kernel/apic/built-in.o WARNING: arch/x86/kernel/apic/built-in.o(.data+0x198c): Section mismatch in reference from the variable apic_es7000_cluster to the function .cpuinit.text:wakeup_secondary_cpu_via_mip() Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> LKML-Reference: <b9df5fa10907120404k6279a10ch5e9682432272706f@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-13 11:03:26 +02:00
Daniel Qarras	f1c6a58121	perf_counter, x86: Extend perf_counter Pentium M support I've attached a patch to remove the Pentium M special casing of EMON and as noticed at least with my Pentium M the hardware PMU now works: Performance counter stats for '/bin/ls /var/tmp': 1.809988 task-clock-msecs # 0.125 CPUs 1 context-switches # 0.001 M/sec 0 CPU-migrations # 0.000 M/sec 224 page-faults # 0.124 M/sec 1425648 cycles # 787.656 M/sec 912755 instructions # 0.640 IPC Vince suggested that this code was trying to address erratum Y17 in Pentium-M's: http://download.intel.com/support/processors/mobile/pm/sb/25266532.pdf But that erratum (related to IA32_MISC_ENABLES.7) does not affect perfcounters as we dont use this toggle to disable RDPMC and WRMSR/RDMSR access to performance counters. We keep cr4's bit 8 (X86_CR4_PCE) clear so unprivileged RDPMC access is not allowed anyway. Cc: Vince Weaver <vince@deater.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@googlemail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-13 08:46:51 +02:00
Alan Cox	8bdbd962ec	x86/cpu: Clean up various files a bit No code changes except printk levels (although some of the K6 mtrr code might be clearer if there were a few as would splitting out some of the intel cache code). Signed-off-by: Alan Cox <alan@linux.intel.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-11 11:24:09 +02:00
Huang Weiyi	e90476d3ba	x86: Remove duplicated #include Remove duplicated #include in: arch/x86/kernel/dumpstack.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-11 10:17:08 +02:00
Linus Torvalds	85be928c41	Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits) perf report: Add "Fractal" mode output - support callchains with relative overhead rate perf_counter tools: callchains: Manage the cumul hits on the fly perf report: Change default callchain parameters perf report: Use a modifiable string for default callchain options perf report: Warn on callchain output request from non-callchain file x86: atomic64: Inline atomic64_read() again x86: atomic64: Clean up atomic64_sub_and_test() and atomic64_add_negative() x86: atomic64: Improve atomic64_xchg() x86: atomic64: Export APIs to modules x86: atomic64: Improve atomic64_read() x86: atomic64: Code atomic(64)_read and atomic(64)_set in C not CPP x86: atomic64: Fix unclean type use in atomic64_xchg() x86: atomic64: Make atomic_read() type-safe x86: atomic64: Reduce size of functions x86: atomic64: Improve atomic64_add_return() x86: atomic64: Improve cmpxchg8b() x86: atomic64: Improve atomic64_read() x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too perf report: Annotate variable initialization ...	2009-07-10 14:25:03 -07:00
Yinghai Lu	857fdc53a0	x86/pci: insert ioapic resource before assigning unassigned resources Stephen reported that his DL585 G2 needed noapic after 2.6.22 (?) Dann bisected it down to: commit `30a18d6c3f` Date: Tue Feb 19 03:21:20 2008 -0800 x86: multi pci root bus with different io resource range, on 64-bit It turns out that: 1. that AMD-based systems have two HT chains. 2. BIOS doesn't allocate resources for BAR 6 of devices under 8132 etc 3. that multi-peer-root patch will try to split root resources to peer root resources according to PCI conf of NB 4. PCI core assigns unassigned resources, but they overlap with BARs that are used by ioapic addr of io4 and 8132. The reason: at that point ioapic address are not inserted yet. Solution is to insert ioapic resources into the tree a bit earlier. Reported-by: Stephen Frost <sfrost@snowman.net> Reported-and-Tested-by: dann frazier <dannf@hp.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: stable@kernel.org Signed-off-by: Jesse Barnes <jbarnes@jbarnes-g45.(none)>	2009-07-10 13:03:14 -07:00
Cyrill Gorcunov	a1b4f1a5b7	x86, ipi: Clean up safe_smp_processor_id() by using the cpu_has_apic() macro helper We already use a lot of cpu_has_ helpers. Lets do here the same for consistency. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090705160154.GB4791@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-10 15:58:34 +02:00
Cyrill Gorcunov	9ff8094299	x86: Clean up idt_descr and idt_tableby using NR_VECTORS instead of hardcoded number Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090708180353.GH5301@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-10 13:57:13 +02:00
Peter Zijlstra	984b838ce6	perf_counter: Clean up global vs counter enable Ingo noticed that both AMD and P6 call x86_pmu_disable_counter() on *_pmu_enable_counter(). This is because we rely on the side effect of that call to program the event config but not touch the EN bit. We change that for AMD by having enable_all() simply write the full config in, and for P6 by explicitly coding it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-10 10:28:29 +02:00
Peter Zijlstra	9c74fb5086	perf_counter: Fix up P6 PMU details The P6 doesn't seem to support cache ref/hit/miss counts, so we extend the generic hardware event codes to have 0 and -1 mean the same thing as for the generic cache events. Furthermore, it turns out the 0 event does not count (that is, its reported that on PPro it actually does count something), therefore use a event configuration that's specified not to count to disable the counters. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-10 10:28:27 +02:00
Vince Weaver	11d1578f94	perf_counter: Add P6 PMU support Add basic P6 PMU support. The P6 uses the EVNTSEL0 EN bit to enable/disable both its counters. We use this for the global enable/disable, and clear all config bits (except EN) to disable individual counters. Actual ia32 hardware doesn't support lfence, so use a locked op without side-effect to implement a full barrier. perf stat and perf record seem to function correctly. [a.p.zijlstra@chello.nl: cleanups and complete the enable/disable code] Signed-off-by: Vince Weaver <vince@deater.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <Pine.LNX.4.64.0907081718450.2715@pianoman.cluster.toy> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-10 10:28:26 +02:00
Andi Kleen	a2d32bcbc0	x86: mce: macros to compute banks MSRs Instead of open coded calculations for bank MSRs hide the indexing of higher banks MCE register MSRs in new macros. No semantic changes. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-07-09 18:39:47 -07:00
Andi Kleen	cebe182033	x86: mce: Move per bank data in a single datastructure This addresses one of the leftover review comments. Move the per bank data into a single structure. This avoids several separate variables and also separate allocation of sysfs objects. I didn't move the CMCI ownership information so far because that would have needed some non trivial changes in the algorithms. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-07-09 18:39:47 -07:00
Andi Kleen	9eda8cb3ac	x86: mce: Move code in mce.c Now that the X86_OLD_MCE ifdefs are gone move some code that used to be outside the big ifdef to a more natural place near its user. No code change. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-07-09 18:39:47 -07:00
Andi Kleen	c1ebf83561	x86: mce: Rename CONFIG_X86_NEW_MCE to CONFIG_X86_MCE Drop the CONFIG_X86_NEW_MCE symbol and change all references to it to check for CONFIG_X86_MCE directly. No code changes Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-07-09 18:39:47 -07:00
Andi Kleen	5bb38adcb5	x86: mce: Remove old i386 machine check code As announced in feature-remove-schedule.txt remove CONFIG_X86_OLD_MCE This patch only removes code. The ancient machine check code for very old systems that are not supported by CONFIG_X86_NEW_MCE is still kept. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-07-09 18:39:46 -07:00
Tejun Heo	023bf6f1b8	linker script: unify usage of discard definition Discarded sections in different archs share some commonality but have considerable differences. This led to linker script for each arch implementing its own /DISCARD/ definition, which makes maintaining tedious and adding new entries error-prone. This patch makes all linker scripts to move discard definitions to the end of the linker script and use the common DISCARDS macro. As ld uses the first matching section definition, archs can include default discarded sections by including them earlier in the linker script. ia64 is notable because it first throws away some ia64 specific subsections and then include the rest of the sections into the final image, so those sections must be discarded before the inclusion. defconfig compile tested for x86, x86-64, powerpc, powerpc64, ia64, alpha, sparc, sparc64 and s390. Michal Simek tested microblaze. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Mike Frysinger <vapier@gentoo.org> Tested-by: Michal Simek <monstr@monstr.eu> Cc: linux-arch@vger.kernel.org Cc: Michal Simek <monstr@monstr.eu> Cc: microblaze-uclinux@itee.uq.edu.au Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Tony Luck <tony.luck@intel.com>	2009-07-09 11:27:40 +09:00
Joe Perches	ad361c9884	Remove multiple KERN_ prefixes from printk formats Commit `5fd29d6ccb` ("printk: clean up handling of log-levels and newlines") changed printk semantics. printk lines with multiple KERN_<level> prefixes are no longer emitted as before the patch. <level> is now included in the output on each additional use. Remove all uses of multiple KERN_<level>s in formats. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-08 10:30:03 -07:00
Mark Langsdorf	a2e1b4c312	[CPUFREQ] Powernow-k8: support family 0xf with 2 low p-states Provide support for family 0xf processors with 2 P-states below the elevator voltage. Remove the checks that prevent this configuration from being supported and increase the transition voltage to prevent errors during the transition. Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Dave Jones <davej@redhat.com>	2009-07-06 21:38:29 -04:00
Linus Torvalds	faf80d62e4	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: fix usage of bios intcall() x86: Remove unused function lapic_watchdog_ok() x86: Remove unused variable disable_x2apic x86, kvm: Fix section mismatches in kvm.c x86: Add missing annotation to arch/x86/lib/copy_user_64.S::copy_to_user x86: Fix fixmap page order for FIX_TEXT_POKE0,1 amd-iommu: set evt_buf_size correctly amd-iommu: handle alias entries correctly in init code x86: Fix printk call in print_local_apic() x86: Declare check_efer() before it gets used x86: Mark device_nb as static and fix NULL noise x86: Remove double declaration of MSR_P6_EVNTSEL0 and MSR_P6_EVNTSEL1 xen: Use kcalloc() in xen_init_IRQ() x86: Fix fixmap ordering x86: Fix symbol annotation for arch/x86/lib/clear_page_64.S::clear_page_c	2009-07-06 17:45:44 -07:00
Peter Oberparleiter	f386c61fe1	gcov: exclude code operating in userspace from profiling Fix for this issue on x86_64: rostedt@goodmis.org wrote: > On bootup of the latest kernel my init segfaults. Debugging it, > I found that vread_tsc (a vsyscall) increments some strange > kernel memory: > > 0000000000000000 <vread_tsc>: > 0: 55 push %rbp > 1: 48 ff 05 00 00 00 00 incq 0(%rip) > # 8 <vread_tsc+0x8> > 4: R_X86_64_PC32 .bss+0x3c > 8: 48 89 e5 mov %rsp,%rbp > b: 66 66 90 xchg %ax,%ax > e: 48 ff 05 00 00 00 00 incq 0(%rip) > # 15 <vread_tsc+0x15> > 11: R_X86_64_PC32 .bss+0x44 > 15: 66 66 90 xchg %ax,%ax > 18: 48 ff 05 00 00 00 00 incq 0(%rip) > # 1f <vread_tsc+0x1f> > 1b: R_X86_64_PC32 .bss+0x4c > 1f: 0f 31 rdtsc > > > Those "incq" is very bad to happen in vsyscall memory, since > userspace can not modify it. You need to make something prevent > profiling of vsyscall memory (like I do with ftrace). Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Reported-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-06 13:57:03 -07:00
Ingo Molnar	e3d0e69268	x86: Further clean up of mtrr/generic.c Yinghai noticed that i defined BIOS_BUG_MSG but added no usage for it. The usage is to clean up this turd in generic.c: printk(KERN_WARNING "WARNING: BIOS bug: VAR MTRR %d " "contains strange UC entry under 1M, check " "with your system vendor!\n", i); Breaking printk lines in the middle looks ugly, is hard to read and breaks 'git grep'. Use the BIOS_BUG_MSG instead. Also complete the moving of structure definitions and variables to the top of the file. Reported-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090703164225.GA21447@elte.hu> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-05 09:46:10 +02:00
Jaswinder Singh Rajput	dbd51be026	x86: Clean up mtrr/main.c Fix following trivial style problems: ERROR: trailing whitespace X 25 WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h> WARNING: Use #include <linux/kvm_para.h> instead of <asm/kvm_para.h> ERROR: do not initialise externals to 0 or NULL X 2 ERROR: "foo * bar" should be "foo *bar" X 5 ERROR: do not use assignment in if condition X 2 WARNING: line over 80 characters X 8 ERROR: return is not a function, parentheses are not required WARNING: braces {} are not necessary for any arm of this statement ERROR: space required before the open parenthesis '(' X 2 ERROR: open brace '{' following function declarations go on the next line ERROR: space required after that ',' (ctx:VxV) X 8 ERROR: space required before the open parenthesis '(' X 3 ERROR: else should follow close brace '}' WARNING: space prohibited between function name and open parenthesis '(' WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable X 2 Also use pr_debug and pr_warning where possible. total: 50 errors, 14 warnings arch/x86/kernel/cpu/mtrr/main.o: text data bss dec hex filename 3668 116 4156 7940 1f04 main.o.before 3668 116 4156 7940 1f04 main.o.after md5: e01af2fd28deef77c8d01e71acfbd365 main.o.before.asm e01af2fd28deef77c8d01e71acfbd365 main.o.after.asm Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090703164225.GA21447@elte.hu> Cc: Avi Kivity <avi@redhat.com> # Avi, please have a look at the kvm_para.h bit [ More cleanups ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-04 11:19:55 +02:00
Jaswinder Singh Rajput	09b22c85d5	x86: Clean up mtrr/state.c Fix: WARNING: Use #include <linux/io.h> instead of <asm/io.h> WARNING: line over 80 characters X 4 arch/x86/kernel/cpu/mtrr/state.o: text data bss dec hex filename 864 0 0 864 360 state.o.before 864 0 0 864 360 state.o.after md5: c5c4364b9aeac74d70111e1e49667a2c state.o.before.asm c5c4364b9aeac74d70111e1e49667a2c state.o.after.asm Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090703164225.GA21447@elte.hu> [ More cleanups ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-04 11:19:53 +02:00
Jaswinder Singh Rajput	3ec8dbcb09	x86: Clean up mtrr/mtrr.h Fix: ERROR: do not use C99 // comments ERROR: "foo * bar" should be "foo *bar" X 2 Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090703164225.GA21447@elte.hu> [ More tidyups ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-04 11:19:52 +02:00
Jaswinder Singh Rajput	26dc67eda1	x86: Clean up mtrr/if.c Fix: WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h> ERROR: trailing whitespace X 7 ERROR: trailing statements should be on next line X 3 WARNING: line over 80 characters X 5 ERROR: space required before the open parenthesis '(' arch/x86/kernel/cpu/mtrr/if.o: text data bss dec hex filename 2239 4 0 2243 8c3 if.o.before 2239 4 0 2243 8c3 if.o.after md5: 78d1f2aa4843ec6509c18e2dee54bc7f if.o.before.asm 78d1f2aa4843ec6509c18e2dee54bc7f if.o.after.asm Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090703164225.GA21447@elte.hu> [ More cleanups to make the code more consistent. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-04 11:19:48 +02:00
Jaswinder Singh Rajput	a1a499a399	x86: Clean up mtrr/generic.c Fix following trivial style problems: ERROR: trailing whitespace X 4 WARNING: Use #include <linux/io.h> instead of <asm/io.h> WARNING: braces {} are not necessary for single statement blocks X 3 ERROR: "foo * bar" should be "foo bar" WARNING: line over 80 characters X 6 ERROR: "foo bar" should be "foo *bar" ERROR: spaces required around that '=' (ctx:VxO) ERROR: space required before that '-' (ctx:OxV) WARNING: suspect code indent for conditional statements (8, 12) ERROR: spaces required around that '=' (ctx:VxV) ERROR: do not initialise statics to 0 or NULL ERROR: space prohibited after that open parenthesis '(' X 2 ERROR: space prohibited before that close parenthesis ')' X 2 ERROR: trailing statements should be on next line ERROR: return is not a function, parentheses are not required Also use pr_debug and pr_warning where possible. arch/x86/kernel/cpu/mtrr/generic.o: text data bss dec hex filename 5652 77 4224 9953 26e1 generic.o.before 5652 77 4220 9949 26dd generic.o.after The md5 changed: b34d6c045f06daa4ed092b90cc760e8f generic.o.before.asm a490c6251cfd8442fbffecc0e09a573d generic.o.after.asm Because mtrr_state moved from data to bss, changing its offsets - and also because __LINE__ numbers changed. Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090703164225.GA21447@elte.hu> [ Further cleanups to make the code more consistent ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-04 11:10:47 +02:00
Jaswinder Singh Rajput	2311037708	x86: Clean up mtrr/cyrix.c Fix trivial style problems: WARNING: Use #include <linux/io.h> instead of <asm/io.h> WARNING: line over 80 characters ERROR: do not initialise statics to 0 or NULL ERROR: space prohibited after that open parenthesis '(' X 2 ERROR: space prohibited before that close parenthesis ')' X 2 ERROR: trailing whitespace X 2 ERROR: trailing statements should be on next line ERROR: do not use C99 // comments X 2 arch/x86/kernel/cpu/mtrr/cyrix.o: text data bss dec hex filename 1637 32 8 1677 68d cyrix.o.before 1637 32 8 1677 68d cyrix.o.after md5: 6f52abd06905be3f4cabb5239f9b0ff0 cyrix.o.before.asm 6f52abd06905be3f4cabb5239f9b0ff0 cyrix.o.after.asm Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090703164225.GA21447@elte.hu> [ Made the code more consistent ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-04 11:10:47 +02:00
Jaswinder Singh Rajput	63f9600fad	x86: Clean up mtrr/cleanup.c Fix trivial style problems: WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h> WARNING: Use #include <linux/kvm_para.h> instead of <asm/kvm_para.h> Also, nr_mtrr_spare_reg should be unsigned long. arch/x86/kernel/cpu/mtrr/cleanup.o: text data bss dec hex filename 6241 8992 2056 17289 4389 cleanup.o.before 6241 8992 2056 17289 4389 cleanup.o.after The md5 has changed: 1a7a27513aef1825236daf29110fe657 cleanup.o.before.asm bcea358efa2532b6020e338e158447af cleanup.o.after.asm Because a WARN_ON()'s __LINE__ value changed by 3 lines. Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090703164225.GA21447@elte.hu> [ Did lots of other cleanups to make the code look more consistent. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-04 11:10:46 +02:00
Jaswinder Singh Rajput	6c4caa1ab7	x86: Clean up mtrr/centaur.c Remove dead code and fix trivial style problems: ERROR: trailing whitespace X 2 WARNING: line over 80 characters X 3 ROR: trailing whitespace ERROR: do not use C99 // comments X 2 arch/x86/kernel/cpu/mtrr/centaur.o: text data bss dec hex filename 605 32 68 705 2c1 centaur.o.before 605 32 68 705 2c1 centaur.o.after md5: a4865ea98ce3c163bb1d376a3949b3e3 centaur.o.before.asm a4865ea98ce3c163bb1d376a3949b3e3 centaur.o.after.asm Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090703164225.GA21447@elte.hu> [ Standardized comments, DocBook, curly braces, newlines. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-04 11:10:45 +02:00
Jaswinder Singh Rajput	42204455f1	x86: Clean up mtrr/amd.c: Fix trivial style problems : ERROR: trailing whitespace WARNING: line over 80 characters ERROR: do not use C99 // comments arch/x86/kernel/cpu/mtrr/amd.o: text data bss dec hex filename 501 32 0 533 215 amd.o.before 501 32 0 533 215 amd.o.after md5: 62f795eb840ee2d17b03df89e789e76c amd.o.before.asm 62f795eb840ee2d17b03df89e789e76c amd.o.after.asm Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090703164225.GA21447@elte.hu> [ Also restructured comments to be standard, removed stray return, converted function description to DocBook style, etc. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-04 11:10:45 +02:00
Ingo Molnar	d7e57676e3	Merge branch 'linus' into x86/cleanups Merge reason: We were on an older pre-rc1 base, move to almost-rc2. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-04 11:00:42 +02:00
Tejun Heo	a530b79586	percpu: teach large page allocator about NUMA Large page first chunk allocator is primarily used for NUMA machines; however, its NUMA handling is extremely simplistic. Regardless of their proximity, each cpu is put into separate large page just to return most of the allocated space back wasting large amount of vmalloc space and increasing cache footprint. This patch teachs NUMA details to large page allocator. Given processor proximity information, pcpu_lpage_build_unit_map() will find fitting cpu -> unit mapping in which cpus in LOCAL_DISTANCE share the same large page and not too much virtual address space is wasted. This greatly reduces the unit and thus chunk size and wastes much less address space for the first chunk. For example, on 4/4 NUMA machine, the original code occupied 16MB of virtual space for the first chunk while the new code only uses 4MB - one 2MB page for each node. [ Impact: much better space efficiency on NUMA machines ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jan Beulich <JBeulich@novell.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Miller <davem@davemloft.net>	2009-07-04 08:11:00 +09:00
Tejun Heo	8c4bfc6e88	x86,percpu: generalize lpage first chunk allocator Generalize and move x86 setup_pcpu_lpage() into pcpu_lpage_first_chunk(). setup_pcpu_lpage() now is a simple wrapper around the generalized version. Other than taking size parameters and using arch supplied callbacks to allocate/free/map memory, pcpu_lpage_first_chunk() is identical to the original implementation. This simplifies arch code and will help converting more archs to dynamic percpu allocator. While at it, factor out pcpu_calc_fc_sizes() which is common to pcpu_embed_first_chunk() and pcpu_lpage_first_chunk(). [ Impact: code reorganization and generalization ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu>	2009-07-04 08:10:59 +09:00
Tejun Heo	d4b95f8039	x86,percpu: generalize 4k first chunk allocator Generalize and move x86 setup_pcpu_4k() into pcpu_4k_first_chunk(). setup_pcpu_4k() now is a simple wrapper around the generalized version. Other than taking size parameters and using arch supplied callbacks to allocate/free memory, pcpu_4k_first_chunk() is identical to the original implementation. This simplifies arch code and will help converting more archs to dynamic percpu allocator. While at it, s/pcpu_populate_pte_fn_t/pcpu_fc_populate_pte_fn_t/ for consistency. [ Impact: code reorganization and generalization ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu>	2009-07-04 08:10:59 +09:00
Tejun Heo	788e5abc54	percpu: drop @unit_size from embed first chunk allocator The only extra feature @unit_size provides is making dead space at the end of the first chunk which doesn't have any valid usecase. Drop the parameter. This will increase consistency with generalized 4k allocator. James Bottomley spotted missing conversion for the default setup_per_cpu_areas() which caused build breakage on all arcsh which use it. [ Impact: drop unused code path ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Ingo Molnar <mingo@elte.hu>	2009-07-04 08:10:58 +09:00
Tejun Heo	c43768cbb7	Merge branch 'master' into for-next Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix changes. As alpha in percpu tree uses 'weak' attribute instead of inline assembly, there's no need for __used attribute. Conflicts: arch/alpha/include/asm/percpu.h arch/mn10300/kernel/vmlinux.lds.S include/linux/percpu-defs.h	2009-07-04 07:13:18 +09:00
Ingo Molnar	22a26e6663	Merge branch 'amd-iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent	2009-07-03 14:35:02 +02:00
Jaswinder Singh Rajput	c7210e1ff8	x86: Remove unused function lapic_watchdog_ok() lapic_watchdog_ok() is a global function but no one is using it. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1246554335.2242.29.camel@jaswinder.satnam> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-03 14:34:31 +02:00
Jaswinder Singh Rajput	23d0cd8e71	x86: Remove unused variable disable_x2apic setup_nox2apic() is writing 1 to disable_x2apic but no one is reading it. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> LKML-Reference: <1246554239.2242.27.camel@jaswinder.satnam> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-03 14:34:27 +02:00
Rakib Mullick	d3ac88157c	x86, kvm: Fix section mismatches in kvm.c The function paravirt_ops_setup() has been refering the variable no_timer_check, which is a __initdata. Thus generates the following warning. paravirt_ops_setup() function is called from kvm_guest_init() which is a __init function. So to fix this we mark paravirt_ops_setup as __init. The sections-check output that warned us about this was: LD arch/x86/built-in.o WARNING: arch/x86/built-in.o(.text+0x166ce): Section mismatch in reference from the function paravirt_ops_setup() to the variable .init.data:no_timer_check The function paravirt_ops_setup() references the variable __initdata no_timer_check. This is often because paravirt_ops_setup lacks a __initdata annotation or the annotation of no_timer_check is wrong. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Acked-by: Avi Kivity <avi@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <b9df5fa10907012240y356427b8ta4bd07f0efc6a049@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-03 14:34:22 +02:00
Linus Torvalds	405d7ca515	Merge git://git.infradead.org/iommu-2.6 * git://git.infradead.org/iommu-2.6: (38 commits) intel-iommu: Don't keep freeing page zero in dma_pte_free_pagetable() intel-iommu: Introduce first_pte_in_page() to simplify PTE-setting loops intel-iommu: Use cmpxchg64_local() for setting PTEs intel-iommu: Warn about unmatched unmap requests intel-iommu: Kill superfluous mapping_lock intel-iommu: Ensure that PTE writes are 64-bit atomic, even on i386 intel-iommu: Make iommu=pt work on i386 too intel-iommu: Performance improvement for dma_pte_free_pagetable() intel-iommu: Don't free too much in dma_pte_free_pagetable() intel-iommu: dump mappings but don't die on pte already set intel-iommu: Combine domain_pfn_mapping() and domain_sg_mapping() intel-iommu: Introduce domain_sg_mapping() to speed up intel_map_sg() intel-iommu: Simplify __intel_alloc_iova() intel-iommu: Performance improvement for domain_pfn_mapping() intel-iommu: Performance improvement for dma_pte_clear_range() intel-iommu: Clean up iommu_domain_identity_map() intel-iommu: Remove last use of PHYSICAL_PAGE_MASK, for reserving PCI BARs intel-iommu: Make iommu_flush_iotlb_psi() take pfn as argument intel-iommu: Change aligned_size() to aligned_nrpages() intel-iommu: Clean up intel_map_sg(), remove domain_page_mapping() ...	2009-07-02 16:51:09 -07:00
Yinghai Lu	7c5371c403	x86: add boundary check for 32bit res before expand e820 resource to alignment fix hang with HIGHMEM_64G and 32bit resource. According to hpa and Linus, use (resource_size_t)-1 to fend off big ranges. Analyzed by hpa Reported-and-tested-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-02 12:11:12 -07:00
Joerg Roedel	1bc6f83813	amd-iommu: set evt_buf_size correctly The setting of this variable got lost during the suspend/resume implementation. But keeping this variable zero causes a divide-by-zero error in the interrupt handler. This patch fixes this. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-07-02 18:32:05 +02:00
Joerg Roedel	7a6a3a086f	amd-iommu: handle alias entries correctly in init code An alias entry in the ACPI table means that the device can send requests to the IOMMU with both device ids, its own and the alias. This is not handled properly in the ACPI init code. This patch fixes the issue. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-07-02 12:23:23 +02:00
Ingo Molnar	251e1e44b9	x86: Fix printk call in print_local_apic() Instead of this: [ 75.690022] <7>printing local APIC contents on CPU#0/0: [ 75.704406] ... APIC ID: 00000000 (0) [ 75.707905] ... APIC VERSION: 00060015 [ 75.722551] ... APIC TASKPRI: 00000000 (00) [ 75.725473] ... APIC PROCPRI: 00000000 [ 75.728592] ... APIC LDR: 00000001 [ 75.742137] ... APIC SPIV: 000001ff [ 75.744101] ... APIC ISR field: [ 75.746648] 0123456789abcdef0123456789abcdef [ 75.746649] <7>00000000000000000000000000000000 Improve the code to be saner and simpler and just print out the bitfield in a single line using hexa values - not as a (rather pointless) binary bitfield. Partially reused Linus's initial fix for this. Reported-and-Tested-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <4A4C43BC.90506@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-02 08:54:08 +02:00
Frederic Weisbecker	0406ca6d8e	perf_counter: Ignore the nmi call frames in the x86-64 backtraces About every callchains recorded with perf record are filled up including the internal perfcounter nmi frame: perf_callchain perf_counter_overflow intel_pmu_handle_irq perf_counter_nmi_handler notifier_call_chain atomic_notifier_call_chain notify_die do_nmi nmi We want ignore this frame as it's not interesting for instrumentation. To solve this, we simply ignore every frames from nmi context. New example of "perf report -s sym -c" after this patch: 9.59% [k] search_by_key 4.88% search_by_key reiserfs_read_locked_inode reiserfs_iget reiserfs_lookup do_lookup __link_path_walk path_walk do_path_lookup user_path_at vfs_fstatat vfs_lstat sys_newlstat system_call_fastpath __lxstat 0x406fb1 3.19% search_by_key search_by_entry_key reiserfs_find_entry reiserfs_lookup do_lookup __link_path_walk path_walk do_path_lookup user_path_at vfs_fstatat vfs_lstat sys_newlstat system_call_fastpath __lxstat 0x406fb1 [...] For now this patch only solves the problem in x86-64. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1246474930-6088-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-01 22:37:23 +02:00
David Woodhouse	3238c0c4d6	intel-iommu: Make iommu=pt work on i386 too Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2009-07-01 18:56:16 +01:00
Jaswinder Singh Rajput	b25ae679f6	x86: Mark device_nb as static and fix NULL noise This sparse warning: arch/x86/kernel/amd_iommu.c:1195:23: warning: symbol 'device_nb' was not declared. Should it be static? triggers because device_nb is global but is only used in a single .c file. change device_nb to static to fix that - this also addresses the sparse warning. This sparse warning: arch/x86/kernel/amd_iommu.c:1766:10: warning: Using plain integer as NULL pointer triggers because plain integer 0 is used in place of a NULL pointer. change 0 to NULL to fix that - this also address the sparse warning. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Joerg Roedel <joerg.roedel@amd.com> LKML-Reference: <1246458194.6940.20.camel@hpdv5.satnam> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-01 16:52:53 +02:00
Linus Torvalds	55bcab4695	Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (47 commits) perf report: Add --symbols parameter perf report: Add --comms parameter perf report: Add --dsos parameter perf_counter tools: Adjust only prelinked symbol's addresses perf_counter: Provide a way to enable counters on exec perf_counter tools: Reduce perf stat measurement overhead/skew perf stat: Use percentages for scaling output perf_counter, x86: Update x86_pmu after WARN() perf stat: Micro-optimize the code: memcpy is only required if no event is selected and !null_run perf stat: Improve output perf stat: Fix multi-run stats perf stat: Add -n/--null option to run without counters perf_counter tools: Remove dead code perf_counter: Complete counter swap perf report: Print sorted callchains per histogram entries perf_counter tools: Prepare a small callchain framework perf record: Fix unhandled io return value perf_counter tools: Add alias for 'l1d' and 'l1i' perf-report: Add bare minimum PERF_EVENT_READ parsing perf-report: Add modes for inherited stats and no-samples ...	2009-06-30 19:02:59 -07:00
Linus Torvalds	e717f33e98	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: Revert "x86: cap iomem_resource to addressable physical memory"	2009-06-29 09:42:01 -07:00
Yinghai Lu	4078c444cf	perf_counter, x86: Update x86_pmu after WARN() The print out should read the value before changing the value. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4A487017.4090007@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-29 10:19:25 +02:00
Linus Torvalds	8326e284f8	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, delay: tsc based udelay should have rdtsc_barrier x86, setup: correct include file in <asm/boot.h> x86, setup: Fix typo "CONFIG_x86_64" in <asm/boot.h> x86, mce: percpu mcheck_timer should be pinned x86: Add sysctl to allow panic on IOCK NMI error x86: Fix uv bau sending buffer initialization x86, mce: Fix mce resume on 32bit x86: Move init_gbpages() to setup_arch() x86: ensure percpu lpage doesn't consume too much vmalloc space x86: implement percpu_alloc kernel parameter x86: fix pageattr handling for lpage percpu allocator and re-enable it x86: reorganize cpa_process_alias() x86: prepare setup_pcpu_lpage() for pageattr fix x86: rename remap percpu first chunk allocator to lpage x86: fix duplicate free in setup_pcpu_remap() failure path percpu: fix too lazy vunmap cache flushing x86: Set cpu_llc_id on AMD CPUs	2009-06-28 11:05:28 -07:00
H. Peter Anvin	ff8a4bae45	Revert "x86: cap iomem_resource to addressable physical memory" This reverts commit `95ee14e437`. Mikael Petterson <mikepe@it.uu.se> reported that at least one of his systems will not boot as a result. We have ruled out the detection algorithm malfunctioning, so it is not a matter of producing the incorrect bitmasks; rather, something in the application of them fails. Revert the commit until we can root cause and correct this problem. -stable team: this means the underlying commit should be rejected. Reported-and-isolated-by: Mikael Petterson <mikpe@it.uu.se> Signed-off-by: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <200906261559.n5QFxJH8027336@pilspetsen.it.uu.se> Cc: stable@kernel.org Cc: Grant Grundler <grundler@parisc-linux.org>	2009-06-28 09:38:47 +02:00
Hidetoshi Seto	5be6066a7f	x86, mce: percpu mcheck_timer should be pinned If CONFIG_NO_HZ + CONFIG_SMP, timer added via add_timer() might be migrated on other cpu. Use add_timer_on() instead. Avoids the following failure: Maciej Rutecki wrote: > > After normal boot I try: > > > > echo 1 > /sys/devices/system/machinecheck/machinecheck0/check_interval > > > > I found this in dmesg: > > > > [ 141.704025] ------------[ cut here ]------------ > > [ 141.704039] WARNING: at arch/x86/kernel/cpu/mcheck/mce.c:1102 > > mcheck_timer+0xf5/0x100() Reported-by: Maciej Rutecki <maciej.rutecki@gmail.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Tested-by: Maciej Rutecki <maciej.rutecki@gmail.com> Acked-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-25 13:33:02 -07:00
Kurt Garloff	5211a242d0	x86: Add sysctl to allow panic on IOCK NMI error This patch introduces a new sysctl: /proc/sys/kernel/panic_on_io_nmi which defaults to 0 (off). When enabled, the kernel panics when the kernel receives an NMI caused by an IO error. The IO error triggered NMI indicates a serious system condition, which could result in IO data corruption. Rather than contiuing, panicing and dumping might be a better choice, so one can figure out what's causing the IO error. This could be especially important to companies running IO intensive applications where corruption must be avoided, e.g. a bank's databases. [ SuSE has been shipping it for a while, it was done at the request of a large database vendor, for their users. ] Signed-off-by: Kurt Garloff <garloff@suse.de> Signed-off-by: Roberto Angelino <robertangelino@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> LKML-Reference: <20090624213211.GA11291@kroah.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-25 22:06:11 +02:00
Peter Zijlstra	194002b274	perf_counter, x86: Add mmap counter read support Update the mmap control page with the needed information to use the userspace RDPMC instruction for self monitoring. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-25 21:39:06 +02:00
Linus Torvalds	0c26d7cc31	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (72 commits) asus-laptop: remove EXPERIMENTAL dependency asus-laptop: use pr_fmt and pr_<level> eeepc-laptop: cpufv updates eeepc-laptop: sync eeepc-laptop with asus_acpi asus_acpi: Deprecate in favor of asus-laptop acpi4asus: update MAINTAINER and KConfig links asus-laptop: platform dev as parent for led and backlight eeepc-laptop: enable camera by default ACPI: Rename ACPI processor device bus ID acerhdf: Acer Aspire One fan control ACPI: video: DMI workaround broken Acer 7720 BIOS enabling display brightness ACPI: run ACPI device hot removal in kacpi_hotplug_wq ACPI: Add the reference count to avoid unloading ACPI video bus twice ACPI: DMI to disable Vista compatibility on some Sony laptops ACPI: fix a deadlock in hotplug case Show the physical device node of backlight class device. ACPI: pdc init related memory leak with physical CPU hotplug ACPI: pci_root: remove unused dev/fn information ACPI: pci_root: simplify list traversals ACPI: pci_root: use driver data rather than list lookup ...	2009-06-24 10:17:07 -07:00
Cliff Wickman	9c26f52b90	x86: Fix uv bau sending buffer initialization The initialization of the UV Broadcast Assist Unit's sending buffers was making an invalid assumption about the initialization of an MMR that defines its address. The BIOS will not be providing that MMR. So uv_activation_descriptor_init() should unconditionally set it. Tested on UV simulator. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: <stable@kernel.org> # for v2.6.30.x LKML-Reference: <E1MJTfj-0005i1-W8@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-24 17:33:58 +02:00
Yong Wang	c14dab5c07	perf_counter, x86: Set global control MSR correctly Previous code made an assumption that the power on value of global control MSR has enabled all fixed and general purpose counters properly. However, this is not the case for certain Intel processors, such as Atom - and it might also be firmware dependent. Each enable bit in IA32_PERF_GLOBAL_CTRL is AND'ed with the enable bits for all privilege levels in the respective IA32_PERFEVTSELx or IA32_PERF_FIXED_CTR_CTRL MSRs to start/stop the counting of respective counters. Counting is enabled if the AND'ed results is true; counting is disabled when the result is false. The end result is that all fixed counters are always disabled on Atom processors because the assumption is just invalid. Fix this by not initializing the ctrl-mask out of the global MSR, but setting it to perf_counter_mask. Reported-by: Stephane Eranian <eranian@googlemail.com> Signed-off-by: Yong Wang <yong.y.wang@intel.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <20090624021324.GA2788@ywang-moblin2.bj.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-24 10:51:24 +02:00
Tejun Heo	245b2e70ea	percpu: clean up percpu variable definitions Percpu variable definition is about to be updated such that all percpu symbols including the static ones must be unique. Update percpu variable definitions accordingly. * as,cfq: rename ioc_count uniquely * cpufreq: rename cpu_dbs_info uniquely * xen: move nesting_count out of xen_evtchn_do_upcall() and rename it * mm: move ratelimits out of balance_dirty_pages_ratelimited_nr() and rename it * ipv4,6: rename cookie_scratch uniquely * x86 perf_counter: rename prev_left to pmc_prev_left, irq_entry to pmc_irq_entry and nmi_entry to pmc_nmi_entry * perf_counter: rename disable_count to perf_disable_count * ftrace: rename test_event_disable to ftrace_test_event_disable * kmemleak: rename test_pointer to kmemleak_test_pointer * mce: rename next_interval to mce_next_interval [ Impact: percpu usage cleanups, no duplicate static percpu var names ] Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Dave Jones <davej@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: linux-mm <linux-mm@kvack.org> Cc: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andi Kleen <andi@firstfloor.org>	2009-06-24 15:13:48 +09:00
Tejun Heo	204fba4aa3	percpu: cleanup percpu array definitions Currently, the following three different ways to define percpu arrays are in use. 1. DEFINE_PER_CPU(elem_type[array_len], array_name); 2. DEFINE_PER_CPU(elem_type, array_name[array_len]); 3. DEFINE_PER_CPU(elem_type, array_name)[array_len]; Unify to #1 which correctly separates the roles of the two parameters and thus allows more flexibility in the way percpu variables are defined. [ Impact: cleanup ] Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Tony Luck <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: linux-mm@kvack.org Cc: Christoph Lameter <cl@linux-foundation.org> Cc: David S. Miller <davem@davemloft.net>	2009-06-24 15:13:45 +09:00
Len Brown	fbe8cddd2d	Merge branches 'acerhdf', 'acpi-pci-bind', 'bjorn-pci-root', 'bugzilla-12904', 'bugzilla-13121', 'bugzilla-13396', 'bugzilla-13533', 'bugzilla-13612', 'c3_lock', 'hid-cleanups', 'misc-2.6.31', 'pdc-leak-fix', 'pnpacpi', 'power_nocheck', 'thinkpad_acpi', 'video' and 'wmi' into release	2009-06-24 01:19:50 -04:00
Weidong Han	f007e99c8e	Intel-IOMMU, intr-remap: source-id checking To support domain-isolation usages, the platform hardware must be capable of uniquely identifying the requestor (source-id) for each interrupt message. Without source-id checking for interrupt remapping , a rouge guest/VM with assigned devices can launch interrupt attacks to bring down anothe guest/VM or the VMM itself. This patch adds source-id checking for interrupt remapping, and then really isolates interrupts for guests/VMs with assigned devices. Because PCI subsystem is not initialized yet when set up IOAPIC entries, use read_pci_config_byte to access PCI config space directly. Signed-off-by: Weidong Han <weidong.han@intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2009-06-23 22:09:17 +01:00
Pekka J Enberg	854c879f5a	x86: Move init_gbpages() to setup_arch() The init_gbpages() function is conditionally called from init_memory_mapping() function. There are two call-sites where this 'after_bootmem' condition can be true: setup_arch() and mem_init() via pci_iommu_alloc(). Therefore, it's safe to move the call to init_gbpages() to setup_arch() as it's always called before mem_init(). This removes an after_bootmem use - paving the way to remove all uses of that state variable. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <Pine.LNX.4.64.0906221731210.19474@melkki.cs.Helsinki.FI> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-23 10:33:32 +02:00
Linus Torvalds	687d680985	Merge git://git.infradead.org/~dwmw2/iommu-2.6.31 * git://git.infradead.org/~dwmw2/iommu-2.6.31: intel-iommu: Fix one last ia64 build problem in Pass Through Support VT-d: support the device IOTLB VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps VT-d: add device IOTLB invalidation support VT-d: parse ATSR in DMA Remapping Reporting Structure PCI: handle Virtual Function ATS enabling PCI: support the ATS capability intel-iommu: dmar_set_interrupt return error value intel-iommu: Tidy up iommu->gcmd handling intel-iommu: Fix tiny theoretical race in write-buffer flush. intel-iommu: Clean up handling of "caching mode" vs. IOTLB flushing. intel-iommu: Clean up handling of "caching mode" vs. context flushing. VT-d: fix invalid domain id for KVM context flush Fix !CONFIG_DMAR build failure introduced by Intel IOMMU Pass Through Support Intel IOMMU Pass Through Support Fix up trivial conflicts in drivers/pci/{intel-iommu.c,intr_remapping.c}	2009-06-22 21:38:22 -07:00
Ingo Molnar	b7f797cb60	Merge branch 'for-tip' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu into x86/urgent	2009-06-22 10:24:43 +02:00
Tejun Heo	0017c869dd	x86: ensure percpu lpage doesn't consume too much vmalloc space On extreme configuration (e.g. 32bit 32-way NUMA machine), lpage percpu first chunk allocator can consume too much of vmalloc space. Make it fall back to 4k allocator if the consumption goes over 20%. [ Impact: add sanity check for lpage percpu first chunk allocator ] Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jan Beulich <JBeulich@novell.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu>	2009-06-22 11:56:24 +09:00
Tejun Heo	fa8a7094ba	x86: implement percpu_alloc kernel parameter According to Andi, it isn't clear whether lpage allocator is worth the trouble as there are many processors where PMD TLB is far scarcer than PTE TLB. The advantage or disadvantage probably depends on the actual size of percpu area and specific processor. As performance degradation due to TLB pressure tends to be highly workload specific and subtle, it is difficult to decide which way to go without more data. This patch implements percpu_alloc kernel parameter to allow selecting which first chunk allocator to use to ease debugging and testing. While at it, make sure all the failure paths report why something failed to help determining why certain allocator isn't working. Also, kill the "Great future plan" comment which had already been realized quite some time ago. [ Impact: allow explicit percpu first chunk allocator selection ] Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jan Beulich <JBeulich@novell.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu>	2009-06-22 11:56:24 +09:00
Tejun Heo	e59a1bb2fd	x86: fix pageattr handling for lpage percpu allocator and re-enable it lpage allocator aliases a PMD page for each cpu and returns whatever is unused to the page allocator. When the pageattr of the recycled pages are changed, this makes the two aliases point to the overlapping regions with different attributes which isn't allowed and known to cause subtle data corruption in certain cases. This can be handled in simliar manner to the x86_64 highmap alias. pageattr code should detect if the target pages have PMD alias and split the PMD alias and synchronize the attributes. pcpur allocator is updated to keep the allocated PMD pages map sorted in ascending address order and provide pcpu_lpage_remapped() function which binary searches the array to determine whether the given address is aliased and if so to which address. pageattr is updated to use pcpu_lpage_remapped() to detect the PMD alias and split it up as necessary from cpa_process_alias(). Jan Beulich spotted the original problem and incorrect usage of vaddr instead of laddr for lookup. With this, lpage percpu allocator should work correctly. Re-enable it. [ Impact: fix subtle lpage pageattr bug and re-enable lpage ] Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jan Beulich <JBeulich@novell.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu>	2009-06-22 11:56:24 +09:00
Tejun Heo	0ff2587fd5	x86: prepare setup_pcpu_lpage() for pageattr fix Make the following changes in preparation of coming pageattr updates. * Define and use array of struct pcpul_ent instead of array of pointers. The only difference is ->cpu field which is set but unused yet. * Rename variables according to the above change. * Rename local variable vm to pcpul_vm and move it out of the function. [ Impact: no functional difference ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu>	2009-06-22 11:56:24 +09:00
Tejun Heo	97c9bf0618	x86: rename remap percpu first chunk allocator to lpage The "remap" allocator remaps large pages to build the first chunk; however, the name isn't very good because 4k allocator remaps too and the whole point of the remap allocator is using large page mapping. The allocator will be generalized and exported outside of x86, rename it to lpage before that happens. percpu_alloc kernel parameter is updated to accept both "remap" and "lpage" for lpage allocator. [ Impact: code cleanup, kernel parameter argument updated ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu>	2009-06-22 11:56:24 +09:00
Tejun Heo	c5806df923	x86: fix duplicate free in setup_pcpu_remap() failure path In the failure path, setup_pcpu_remap() tries to free the area which has already been freed to make holes in the large page. Fix it. [ Impact: fix duplicate free in failure path ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu>	2009-06-22 11:56:24 +09:00
Jaswinder Singh Rajput	d9f2a5ecb2	perf_counter, x8: Fix L1-data-Cache-Store-Referencees for AMD Fix AMD's Data Cache Refills from System event. After this patch : ./tools/perf/perf stat -e l1d -e l1d-misses -e l1d-write -e l1d-prefetch -e l1d-prefetch-miss -e l1i -e l1i-misses -e l1i-prefetch -e l2 -e l2-misses -e l2-write -e dtlb -e dtlb-misses -e itlb -e itlb-misses -e bpu -e bpu-misses ls /dev/ > /dev/null Performance counter stats for 'ls /dev/': 2499484 L1-data-Cache-Load-Referencees (scaled from 3.97%) 70347 L1-data-Cache-Load-Misses (scaled from 7.30%) 9360 L1-data-Cache-Store-Referencees (scaled from 8.64%) 32804 L1-data-Cache-Prefetch-Referencees (scaled from 17.72%) 7693 L1-data-Cache-Prefetch-Misses (scaled from 22.97%) 2180945 L1-instruction-Cache-Load-Referencees (scaled from 28.48%) 14518 L1-instruction-Cache-Load-Misses (scaled from 35.00%) 2405 L1-instruction-Cache-Prefetch-Referencees (scaled from 34.89%) 71387 L2-Cache-Load-Referencees (scaled from 34.94%) 18732 L2-Cache-Load-Misses (scaled from 34.92%) 79918 L2-Cache-Store-Referencees (scaled from 36.02%) 1295294 Data-TLB-Cache-Load-Referencees (scaled from 35.99%) 30896 Data-TLB-Cache-Load-Misses (scaled from 33.36%) 1222030 Instruction-TLB-Cache-Load-Referencees (scaled from 29.46%) 357 Instruction-TLB-Cache-Load-Misses (scaled from 20.46%) 530888 Branch-Cache-Load-Referencees (scaled from 11.48%) 8638 Branch-Cache-Load-Misses (scaled from 5.09%) 0.011295149 seconds time elapsed. Earlier it always shows value 0. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> LKML-Reference: <1245484165.3102.6.camel@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-21 13:25:55 +02:00
Andreas Herrmann	99bd0c0fc4	x86: Set cpu_llc_id on AMD CPUs This counts when building sched domains in case NUMA information is not available. ( See cpu_coregroup_mask() which uses llc_shared_map which in turn is created based on cpu_llc_id. ) Currently Linux builds domains as follows: (example from a dual socket quad-core system) CPU0 attaching sched-domain: domain 0: span 0-7 level CPU groups: 0 1 2 3 4 5 6 7 ... CPU7 attaching sched-domain: domain 0: span 0-7 level CPU groups: 7 0 1 2 3 4 5 6 Ever since that is borked for multi-core AMD CPU systems. This patch fixes that and now we get a proper: CPU0 attaching sched-domain: domain 0: span 0-3 level MC groups: 0 1 2 3 domain 1: span 0-7 level CPU groups: 0-3 4-7 ... CPU7 attaching sched-domain: domain 0: span 4-7 level MC groups: 7 4 5 6 domain 1: span 0-7 level CPU groups: 4-7 0-3 This allows scheduler to assign tasks to cores on different sockets (i.e. that don't share last level cache) for performance reasons. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20090619085909.GJ5218@alberich.amd.com> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-21 10:13:32 +02:00
Borislav Petkov	a95436e44a	x86, mce: use atomic_inc_return() instead of add by 1 Use atomic_inc_return() instead of atomic_add_return() by 1. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-20 23:28:22 -07:00
Linus Torvalds	12e24f34cb	Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits) perfcounter: Handle some IO return values perf_counter: Push perf_sample_data through the swcounter code perf_counter tools: Define and use our own u64, s64 etc. definitions perf_counter: Close race in perf_lock_task_context() perf_counter, x86: Improve interactions with fast-gup perf_counter: Simplify and fix task migration counting perf_counter tools: Add a data file header perf_counter: Update userspace callchain sampling uses perf_counter: Make callchain samples extensible perf report: Filter to parent set by default perf_counter tools: Handle lost events perf_counter: Add event overlow handling fs: Provide empty .set_page_dirty() aop for anon inodes perf_counter: tools: Makefile tweaks for 64-bit powerpc perf_counter: powerpc: Add processor back-end for MPC7450 family perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels perf_counter: powerpc: Change how processor-specific back-ends get selected perf_counter: powerpc: Use unsigned long for register and constraint values perf_counter: powerpc: Enable use of software counters on 32-bit powerpc perf_counter tools: Add and use isprint() ...	2009-06-20 11:29:32 -07:00
Linus Torvalds	1eb51c33b2	Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix out of scope variable access in sched_slice() sched: Hide runqueues from direct refer at source code level sched: Remove unneeded __ref tag sched, x86: Fix cpufreq + sched_clock() TSC scaling	2009-06-20 10:57:40 -07:00
Linus Torvalds	b0b7065b64	Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits) tracing/urgent: warn in case of ftrace_start_up inbalance tracing/urgent: fix unbalanced ftrace_start_up function-graph: add stack frame test function-graph: disable when both x86_32 and optimize for size are configured ring-buffer: have benchmark test print to trace buffer ring-buffer: do not grab locks in nmi ring-buffer: add locks around rb_per_cpu_empty ring-buffer: check for less than two in size allocation ring-buffer: remove useless compile check for buffer_page size ring-buffer: remove useless warn on check ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index tracing: update sample event documentation tracing/filters: fix race between filter setting and module unload tracing/filters: free filter_string in destroy_preds() ring-buffer: use commit counters for commit pointer accounting ring-buffer: remove unused variable ring-buffer: have benchmark test handle discarded events ring-buffer: prevent adding write in discarded area tracing/filters: strloc should be unsigned short tracing/filters: operand can be negative ... Fix up kmemcheck-induced conflict in kernel/trace/ring_buffer.c manually	2009-06-20 10:56:46 -07:00
Linus Torvalds	c4c5ab3089	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (45 commits) x86, mce: fix error path in mce_create_device() x86: use zalloc_cpumask_var for mce_dev_initialized x86: fix duplicated sysfs attribute x86: de-assembler-ize asm/desc.h i386: fix/simplify espfix stack switching, move it into assembly i386: fix return to 16-bit stack from NMI handler x86, ioapic: Don't call disconnect_bsp_APIC if no APIC present x86: Remove duplicated #include's x86: msr.h linux/types.h is only required for __KERNEL__ x86: nmi: Add Intel processor 0x6f4 to NMI perfctr1 workaround x86, mce: mce_intel.c needs <asm/apic.h> x86: apic/io_apic.c: dmar_msi_type should be static x86, io_apic.c: Work around compiler warning x86: mce: Don't touch THERMAL_APIC_VECTOR if no active APIC present x86: mce: Handle banks == 0 case in K7 quirk x86, boot: use .code16gcc instead of .code16 x86: correct the conversion of EFI memory types x86: cap iomem_resource to addressable physical memory x86, mce: rename _64.c files which are no longer 64-bit-specific x86, mce: mce.h cleanup ... Manually fix up trivial conflict in arch/x86/mm/fault.c	2009-06-20 10:49:48 -07:00
Jaswinder Singh Rajput	feaa0457ec	x86: ds.c fix invalid assignment Fixes the type mixups that cause the following sparse warnings: CHECK arch/x86/kernel/ds.c arch/x86/kernel/ds.c:549:19: warning: incorrect type in argument 2 (invalid types) arch/x86/kernel/ds.c:549:19: expected bad type enum bts_field field arch/x86/kernel/ds.c:549:19: got int arch/x86/kernel/ds.c:514:35: error: incompatible types for operation () arch/x86/kernel/ds.c:514:35: left side has type unsigned char static [unsigned] [toplevel] sizeof_ptr_field arch/x86/kernel/ds.c:514:35: right side has type bad type enum bts_field field arch/x86/kernel/ds.c:514:7: error: invalid assignment arch/x86/kernel/ds.c:514:35: error: incompatible types for operation () arch/x86/kernel/ds.c:514:35: left side has type unsigned char static [unsigned] [toplevel] sizeof_ptr_field arch/x86/kernel/ds.c:514:35: right side has type bad type enum bts_field field arch/x86/kernel/ds.c:514:7: error: invalid assignment arch/x86/kernel/ds.c:514:35: error: incompatible types for operation () arch/x86/kernel/ds.c:514:35: left side has type unsigned char static [unsigned] [toplevel] sizeof_ptr_field arch/x86/kernel/ds.c:514:35: right side has type bad type enum bts_field field arch/x86/kernel/ds.c:514:7: error: invalid assignment arch/x86/kernel/ds.c:514:35: error: incompatible types for operation () arch/x86/kernel/ds.c:514:35: left side has type unsigned char static [unsigned] [toplevel] sizeof_ptr_field arch/x86/kernel/ds.c:514:35: right side has type bad type enum bts_field field arch/x86/kernel/ds.c:514:7: error: invalid assignment arch/x86/kernel/ds.c:514:35: error: incompatible types for operation () arch/x86/kernel/ds.c:514:35: left side has type unsigned char static [unsigned] [toplevel] sizeof_ptr_field arch/x86/kernel/ds.c:514:35: right side has type bad type enum bts_field field arch/x86/kernel/ds.c:514:7: error: invalid assignment arch/x86/kernel/ds.c:514:35: error: incompatible types for operation () arch/x86/kernel/ds.c:514:35: left side has type unsigned char static [unsigned] [toplevel] sizeof_ptr_field arch/x86/kernel/ds.c:514:35: right side has type bad type enum bts_field field arch/x86/kernel/ds.c:514:7: error: invalid assignment arch/x86/kernel/ds.c:520:35: error: incompatible types for operation () arch/x86/kernel/ds.c:520:35: left side has type unsigned char static [unsigned] [toplevel] sizeof_ptr_field arch/x86/kernel/ds.c:520:35: right side has type bad type enum bts_field field arch/x86/kernel/ds.c:520:7: error: invalid assignment arch/x86/kernel/ds.c:520:35: error: incompatible types for operation () Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Markus Metzger <markus.t.metzger@intel.com> LKML-Reference: <1245494740.8613.12.camel@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-20 17:53:13 +02:00
Ingo Molnar	1d99100120	Merge branch 'x86/mce3' into x86/urgent	2009-06-20 10:54:22 +02:00
Pallipadi, Venkatesh	7b768f07dc	ACPI: pdc init related memory leak with physical CPU hotplug arch_acpi_processor_cleanup_pdc() in x86 and ia64 results in memory allocated for _PDC objects that is never freed and will cause memory leak in case of physical CPU remove and add. Patch fixes the memory leak by freeing the objects soon after _PDC is evaluated. Reported-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2009-06-20 00:50:52 -04:00
Peter Zijlstra	f9188e023c	perf_counter: Make callchain samples extensible Before exposing upstream tools to a callchain-samples ABI, tidy it up to make it more extensible in the future: Use markers in the IP chain to denote context, use (u64)-1..-4095 range for these context markers because we use them for ERR_PTR(), so these addresses are unlikely to be mapped. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-19 13:42:34 +02:00
Steven Rostedt	71e308a239	function-graph: add stack frame test In case gcc does something funny with the stack frames, or the return from function code, we would like to detect that. An arch may implement passing of a variable that is unique to the function and can be saved on entering a function and can be tested when exiting the function. Usually the frame pointer can be used for this purpose. This patch also implements this for x86. Where it passes in the stack frame of the parent function, and will test that frame on exit. There was a case in x86_32 with optimize for size (-Os) where, for a few functions, gcc would align the stack frame and place a copy of the return address into it. The function graph tracer modified the copy and not the actual return address. On return from the funtion, it did not go to the tracer hook, but returned to the parent. This broke the function graph tracer, because the return of the parent (where gcc did not do this funky manipulation) returned to the location that the child function was suppose to. This caused strange kernel crashes. This test detected the problem and pointed out where the issue was. This modifies the parameters of one of the functions that the arch specific code calls, so it includes changes to arch code to accommodate the new prototype. Note, I notice that the parsic arch implements its own push_return_trace. This is now a generic function and the ftrace_push_return_trace should be used instead. This patch does not touch that code. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Helge Deller <deller@gmx.de> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-18 18:40:18 -04:00
Peter Oberparleiter	7bf99fb673	gcov: enable GCOV_PROFILE_ALL for x86_64 Enable gcov profiling of the entire kernel on x86_64. Required changes include disabling profiling for: * arch/kernel/acpi/realmode and arch/kernel/boot/compressed: not linked to main kernel * arch/vdso, arch/kernel/vsyscall_64 and arch/kernel/hpet: profiling causes segfaults during boot (incompatible context) Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Li Wei <W.Li@Sun.COM> Cc: Michael Ellerman <michaele@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com> Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:58 -07:00
Hidetoshi Seto	b1f49f9582	x86, mce: fix error path in mce_create_device() Don't skip removing mce_attrs in route from error2. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-18 07:02:32 -07:00
Yinghai Lu	e92fae064a	x86: use zalloc_cpumask_var for mce_dev_initialized We need a cleared cpu_mask to record if mce is initialized, especially when MAXSMP is used. used zalloc_... instead Signed-off-by: Yinghai Lu <yinghai@kernel.org> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: stable@kernel.org Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-17 21:47:18 -07:00
Yinghai Lu	74b602c714	x86: fix duplicated sysfs attribute The sysfs attribute cmci_disabled was accidentall turned into a duplicate of ignore_ce, breaking all other attributes. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-17 21:43:16 -07:00
Alexander van Heukelum	bc3f5d3dbd	x86: de-assembler-ize asm/desc.h asm/desc.h is included in three assembly files, but the only macro it defines, GET_DESC_BASE, is never used. This patch removes the includes, removes the macro GET_DESC_BASE and the ASSEMBLY guard from asm/desc.h. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-17 21:35:10 -07:00
Alexander van Heukelum	dc4c2a0aed	i386: fix/simplify espfix stack switching, move it into assembly The espfix code triggers if we have a protected mode userspace application with a 16-bit stack. On returning to userspace, with iret, the CPU doesn't restore the high word of the stack pointer. This is an "official" bug, and the work-around used in the kernel is to temporarily switch to a 32-bit stack segment/pointer pair where the high word of the pointer is equal to the high word of the userspace stackpointer. The current implementation uses THREAD_SIZE to determine the cut-off, but there is no good reason not to use the more natural 64kb... However, implementing this by simply substituting THREAD_SIZE with 65536 in patch_espfix_desc crashed the test application. patch_espfix_desc tries to do what is described above, but gets it subtly wrong if the userspace stack pointer is just below a multiple of THREAD_SIZE: an overflow occurs to bit 13... With a bit of luck, when the kernelspace stackpointer is just below a 64kb-boundary, the overflow then ripples trough to bit 16 and userspace will see its stack pointer changed by 65536. This patch moves all espfix code into entry_32.S. Selecting a 16-bit cut-off simplifies the code. The game with changing the limit dynamically is removed too. It complicates matters and I see no value in it. Changing only the top 16-bit word of ESP is one instruction and it also implies that only two bytes of the ESPFIX GDT entry need to be changed and this can be implemented in just a handful simple to understand instructions. As a side effect, the operation to compute the original ESP from the ESPFIX ESP and the GDT entry simplifies a bit too, and the remaining three instructions have been expanded inline in entry_32.S. impact: can now reliably run userspace with ESP=xxxxfffc on 16-bit stack segment Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Acked-by: Stas Sergeev <stsp@aknet.ru> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-17 21:35:09 -07:00
Alexander van Heukelum	2e04bc7656	i386: fix return to 16-bit stack from NMI handler Returning to a task with a 16-bit stack requires special care: the iret instruction does not restore the high word of esp in that case. The espfix code fixes this, but currently is not invoked on NMIs. This means that a running task gets the upper word of esp clobbered due intervening NMIs. To reproduce, compile and run the following program with the nmi watchdog enabled (nmi_watchdog=2 on the command line). Using gdb you can see that the high bits of esp contain garbage, while the low bits are still correct. This patch puts the espfix code back into the NMI code path. The patch is slightly complicated due to the irqtrace infrastructure not being NMI-safe. The NMI return path cannot call TRACE_IRQS_IRET. Otherwise, the tail of the normal iret-code is correct for the nmi code path too. To be able to share this code-path, the TRACE_IRQS_IRET was move up a bit. The espfix code exists after the TRACE_IRQS_IRET, but this code explicitly disables interrupts. This short interrupts-off section is now not traced anymore. The return-to-kernel path now always includes the preliminary test to decide if the espfix code should be called. This is never the case, but doing it this way keeps the patch as simple as possible and the few extra instructions should not affect timing in any significant way. #define _GNU_SOURCE #include <stdio.h> #include <sys/types.h> #include <sys/mman.h> #include <unistd.h> #include <sys/syscall.h> #include <asm/ldt.h> int modify_ldt(int func, void ptr, unsigned long bytecount) { return syscall(SYS_modify_ldt, func, ptr, bytecount); } / this is assumed to be usable / #define SEGBASEADDR 0x10000 #define SEGLIMIT 0x20000 / 16-bit segment / struct user_desc desc = { .entry_number = 0, .base_addr = SEGBASEADDR, .limit = SEGLIMIT, .seg_32bit = 0, .contents = 0, / ??? / .read_exec_only = 0, .limit_in_pages = 0, .seg_not_present = 0, .useable = 1 }; int main(void) { setvbuf(stdout, NULL, _IONBF, 0); / map a 64 kb segment / char pointer = mmap((void )SEGBASEADDR, SEGLIMIT+1, PROT_EXEC\|PROT_READ\|PROT_WRITE, MAP_SHARED\|MAP_ANONYMOUS, -1, 0); if (pointer == NULL) { printf("could not map space\n"); return 0; } / write ldt, new mode / int err = modify_ldt(0x11, &desc, sizeof(desc)); if (err) { printf("error modifying ldt: %i\n", err); return 0; } for (int i=0; i<1000; i++) { asm volatile ( "pusha\n\t" "mov %ss, %eax\n\t" / preserve ss:esp / "mov %esp, %ebp\n\t" "push $7\n\t" / index 0, ldt, user mode / "push $65536-4096\n\t" / esp / "lss (%esp), %esp\n\t" / switch to new stack / "push %eax\n\t" / save old ss:esp on new stack / "push %ebp\n\t" "add $1765536, %esp\n\t" /* set high bits / "mov %esp, %edx\n\t" "mov $10000000, %ecx\n\t" / wait... / "1: loop 1b\n\t" / ... a bit / "cmp %esp, %edx\n\t" "je 1f\n\t" "ud2\n\t" / esp changed inexplicably! / "1:\n\t" "sub $1765536, %esp\n\t" /* restore high bits / "lss (%esp), %esp\n\t" / restore old ss:esp */ "popa\n\t"); printf("\rx%ix", i); } return 0; } Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Acked-by: Stas Sergeev <stsp@aknet.ru> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-17 21:35:09 -07:00
Jeremy Fitzhardinge	17950c5b24	x86-64: move clts into batch cpu state updates when preloading fpu When a task is likely to be using the fpu, we preload its state during the context switch, rather than waiting for it to run an fpu instruction. Make sure the clts() happens while we're doing batched fpu state updates to optimise paravirtualized context switches. [ Impact: optimise paravirtual FPU context switch ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Rusty Russell <rusty@rustcorp.com.au>	2009-06-17 13:27:58 -07:00
Jeremy Fitzhardinge	16d9dbf0c2	x86-64: move unlazy_fpu() into lazy cpu state part of context switch Make sure that unlazy_fpu()'s stts gets batched along with the other cpu state changes during context switch. (32-bit already does this.) This makes sure it gets batched when running paravirtualized. [ Impact: optimise paravirtual FPU context switch ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Rusty Russell <rusty@rustcorp.com.au>	2009-06-17 13:21:26 -07:00
Jeremy Fitzhardinge	2fcddce10f	x86-32: make sure clts is batched during context switch If we're preloading the fpu state during context switch, make sure the clts happens while we're batching the cpu context update, then do the actual __math_state_restore once the updates are flushed. This allows more efficient context switches when running paravirtualized, as all the hypercalls can be folded together into one. [ Impact: optimise paravirtual FPU context switch ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Rusty Russell <rusty@rustcorp.com.au>	2009-06-17 13:21:25 -07:00
Jeremy Fitzhardinge	e6e9cac8c3	x86: split out core __math_state_restore Split the core fpu state restoration out into __math_state_restore, which assumes that cr0.TS is clear and that the fpu context has been initialized. This will be used during context switch. There are two reasons this is desireable: - There's a small clarification. When __switch_to() calls math_state_restore, it relies on the fact that tsk_used_math() returns true, and so will never do a blocking init_fpu(). __math_state_restore() does not have (or need) that logic, so the question never arises. - It allows the clts() to be moved earler in __switch_to() so it can be performed while cpu context updates are batched (will be done in a later patch). [ Impact: refactor code to make reuse cleaner; no functional change ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Rusty Russell <rusty@rustcorp.com.au>	2009-06-17 13:21:25 -07:00
Cyrill Gorcunov	3f4c3955ea	x86, ioapic: Don't call disconnect_bsp_APIC if no APIC present Vegard Nossum reported: [ 503.576724] ACPI: Preparing to enter system sleep state S5 [ 503.710857] Disabling non-boot CPUs ... [ 503.716853] Power down. [ 503.717770] ------------[ cut here ]------------ [ 503.717770] WARNING: at arch/x86/kernel/apic/apic.c:249 native_apic_write_du) [ 503.717770] Hardware name: OptiPlex GX100 [ 503.717770] Modules linked in: [ 503.717770] Pid: 2136, comm: halt Not tainted 2.6.30 #443 [ 503.717770] Call Trace: [ 503.717770] [<c154d327>] ? printk+0x18/0x1a [ 503.717770] [<c1017358>] ? native_apic_write_dummy+0x38/0x50 [ 503.717770] [<c10360fc>] warn_slowpath_common+0x6c/0xc0 [ 503.717770] [<c1017358>] ? native_apic_write_dummy+0x38/0x50 [ 503.717770] [<c1036165>] warn_slowpath_null+0x15/0x20 [ 503.717770] [<c1017358>] native_apic_write_dummy+0x38/0x50 [ 503.717770] [<c1017173>] disconnect_bsp_APIC+0x63/0x100 [ 503.717770] [<c1019e48>] disable_IO_APIC+0xb8/0xc0 [ 503.717770] [<c1214231>] ? acpi_power_off+0x0/0x29 [ 503.717770] [<c1015e55>] native_machine_shutdown+0x65/0x80 [ 503.717770] [<c1015c36>] native_machine_power_off+0x26/0x30 [ 503.717770] [<c1015c49>] machine_power_off+0x9/0x10 [ 503.717770] [<c1046596>] kernel_power_off+0x36/0x40 [ 503.717770] [<c104680d>] sys_reboot+0xfd/0x1f0 [ 503.717770] [<c109daa0>] ? perf_swcounter_event+0xb0/0x130 [ 503.717770] [<c109db7d>] ? perf_counter_task_sched_out+0x5d/0x120 [ 503.717770] [<c102dfc6>] ? finish_task_switch+0x56/0xd0 [ 503.717770] [<c154da1e>] ? schedule+0x49e/0xb40 [ 503.717770] [<c10444b0>] ? sys_kill+0x70/0x160 [ 503.717770] [<c119d9db>] ? selinux_file_ioctl+0x3b/0x50 [ 503.717770] [<c10dd443>] ? sys_ioctl+0x63/0x70 [ 503.717770] [<c1003024>] sysenter_do_call+0x12/0x22 [ 503.717770] ---[ end trace 8157b5d0ed378f15 ]--- \| \| That's including this commit: \| \| commit `103428e57b` \|Author: Cyrill Gorcunov <gorcunov@openvz.org> \|Date: Sun Jun 7 16:48:40 2009 +0400 \| \| x86, apic: Fix dummy apic read operation together with broken MP handling \| If we have apic disabled we don't even switch to APIC mode and do not calling for connect_bsp_APIC. Though on SMP compiled kernel the native_machine_shutdown does try to write the apic register anyway. Fix it with explicit check if we really should touch apic registers. Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090617181322.GG10822@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-17 20:24:39 +02:00
Peter Zijlstra	60f916dee6	perf_counter: x86: Set the period in the intel overflow handler Commit `9e350de37a` ("perf_counter: Accurate period data") missed a spot, which caused all Intel-PMU samples to have a period of 0. This broke auto-freq sampling. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-17 19:23:52 +02:00
Huang Weiyi	8653f88ff9	x86: Remove duplicated #include's Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> LKML-Reference: <1244895686-2348-1-git-send-email-weiyi.huang@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-17 19:02:35 +02:00
Linus Torvalds	c30938d59e	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] cpumask: new cpumask operators for arch/x86/kernel/cpu/cpufreq/powernow-k8.c [CPUFREQ] cpumask: avoid playing with cpus_allowed in powernow-k8.c [CPUFREQ] cpumask: avoid cpumask games in arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c [CPUFREQ] cpumask: avoid playing with cpus_allowed in speedstep-ich.c [CPUFREQ] powernow-k8: get drv data for correct CPU [CPUFREQ] powernow-k8: read P-state from HW [CPUFREQ] reduce scope of ACPI_PSS_BIOS_BUG_MSG[] [CPUFREQ] Clean up convoluted code in arch/x86/kernel/tsc.c:time_cpufreq_notifier() [CPUFREQ] minor correction to cpu-freq documentation [CPUFREQ] powernow-k8.c: mess cleanup [CPUFREQ] Only set sampling_rate_max deprecated, sampling_rate_min is useful [CPUFREQ] powernow-k8: Set transition latency to 1 if ACPI tables export 0 [CPUFREQ] ondemand: Uncouple minimal sampling rate from HZ in NO_HZ case	2009-06-17 09:51:50 -07:00
Ingo Molnar	813400060f	Merge branch 'x86/urgent' into x86/mce3 Conflicts: arch/x86/kernel/cpu/mcheck/mce_intel.c Merge reason: merge with an urgent-branch MCE fix. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-17 18:21:41 +02:00
Prarit Bhargava	fe955e5c79	x86: nmi: Add Intel processor 0x6f4 to NMI perfctr1 workaround Expand Intel NMI perfctr1 workaround to include a Core2 processor stepping (cpuid family-6, model-f, stepping-4). Resolves a situation where the NMI would not enable on these processors. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: prarit@redhat.com Cc: suresh.b.siddha@intel.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-17 18:20:39 +02:00
H. Peter Anvin	1bf7b31efa	x86, mce: mce_intel.c needs <asm/apic.h> mce_intel.c uses apic_write() and lapic_get_maxlvt(), and so it needs <asm/apic.h>. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>	2009-06-17 08:31:15 -07:00
Jaswinder Singh Rajput	8f7007aabe	x86: apic/io_apic.c: dmar_msi_type should be static Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-17 17:16:08 +02:00
Figo.zhang	50a8d4d297	x86, io_apic.c: Work around compiler warning This compiler warning: arch/x86/kernel/apic/io_apic.c: In function ‘ioapic_write_entry’: arch/x86/kernel/apic/io_apic.c:466: warning: ‘eu’ is used uninitialized in this function arch/x86/kernel/apic/io_apic.c:465: note: ‘eu’ was declared here Is bogus as 'eu' is always initialized. But annotate it away by initializing the variable, to make it easier for people to notice real warnings. A compiler that sees through this logic will optimize away the initialization. Signed-off-by: Figo.zhang <figo1802@gmail.com> LKML-Reference: <1245248720.3312.27.camel@myhost> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-17 17:13:25 +02:00
Cyrill Gorcunov	5ce4243dce	x86: mce: Don't touch THERMAL_APIC_VECTOR if no active APIC present If APIC was disabled (for some reason) and as result it's not even mapped we should not try to enable thermal interrupts at all. Reported-by: Simon Holm Thøgersen <odie@cs.aau.dk> Tested-by: Simon Holm Thøgersen <odie@cs.aau.dk> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <20090615182633.GA7606@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-17 17:10:22 +02:00
Peter Zijlstra	84599f8a59	sched, x86: Fix cpufreq + sched_clock() TSC scaling For freqency dependent TSCs we only scale the cycles, we do not account for the discrepancy in absolute value. Our current formula is: time = cycles * mult (where mult is a function of the cpu-speed on variable tsc machines) Suppose our current cycle count is 10, and we have a multiplier of 5, then our time value would end up being 50. Now cpufreq comes along and changes the multiplier to say 3 or 7, which would result in our time being resp. 30 or 70. That means that we can observe random jumps in the time value due to frequency changes in both fwd and bwd direction. So what this patch does is change the formula to: time = cycles * frequency + offset And we calculate offset so that time_before == time_after, thereby ridding us of these jumps in time. [ Impact: fix/reduce sched_clock() jumps across frequency changing events ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu> Chucked-on-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2009-06-17 16:03:54 +02:00
Ingo Molnar	a3d06cc6aa	Merge branch 'linus' into perfcounters/core Conflicts: arch/x86/include/asm/kmap_types.h include/linux/mm.h include/asm-generic/kmap_types.h Merge reason: We crossed changes with kmap_types.h cleanups in mainline. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-17 13:06:17 +02:00
Andi Kleen	203abd67b7	x86: mce: Handle banks == 0 case in K7 quirk Vegard Nossum reported: > I get an MCE-related crash like this in latest linus tree: > > [ 0.115341] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) > [ 0.116396] CPU: L2 Cache: 512K (64 bytes/line) > [ 0.120570] mce: CPU supports 0 MCE banks > [ 0.124870] BUG: unable to handle kernel NULL pointer dereference at 00000000 00000010 > [ 0.128001] IP: [<ffffffff813b98ad>] mcheck_init+0x278/0x320 > [ 0.128001] PGD 0 > [ 0.128001] Thread overran stack, or stack corrupted > [ 0.128001] Oops: 0002 [#1] PREEMPT SMP > [ 0.128001] last sysfs file: > [ 0.128001] CPU 0 > [ 0.128001] Modules linked in: > [ 0.128001] Pid: 0, comm: swapper Not tainted 2.6.30 #426 > [ 0.128001] RIP: 0010:[<ffffffff813b98ad>] [<ffffffff813b98ad>] mcheck_init+0x278/0x320 > [ 0.128001] RSP: 0018:ffffffff81595e38 EFLAGS: 00000246 > [ 0.128001] RAX: 0000000000000010 RBX: ffffffff8158f900 RCX: 0000000000000000 > [ 0.128001] RDX: 0000000000000000 RSI: 00000000000000ff RDI: 0000000000000010 > [ 0.128001] RBP: ffffffff81595e68 R08: 0000000000000001 R09: 0000000000000000 > [ 0.128001] R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000000 > [ 0.128001] R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000000 > [ 0.128001] FS: 0000000000000000(0000) GS:ffff880002288000(0000) knlGS:00000 > 00000000000 > [ 0.128001] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > [ 0.128001] CR2: 0000000000000010 CR3: 0000000001001000 CR4: 00000000000006b0 > [ 0.128001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 0.128001] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000 > [ 0.128001] Process swapper (pid: 0, threadinfo ffffffff81594000, task ffffff > ff8152a4a0) > [ 0.128001] Stack: > [ 0.128001] 0000000081595e68 5aa50ed3b4ddbe6e ffffffff8158f900 ffffffff8158f > 914 > [ 0.128001] ffffffff8158f948 0000000000000000 ffffffff81595eb8 ffffffff813b8 > 69c > [ 0.128001] 5aa50ed3b4ddbe6e 00000001078bfbfd 0000062300000800 5aa50ed3b4ddb > e6e > [ 0.128001] Call Trace: > [ 0.128001] [<ffffffff813b869c>] identify_cpu+0x331/0x392 > [ 0.128001] [<ffffffff815a1445>] identify_boot_cpu+0x23/0x6e > [ 0.128001] [<ffffffff815a14ac>] check_bugs+0x1c/0x60 > [ 0.128001] [<ffffffff8159c075>] start_kernel+0x403/0x46e > [ 0.128001] [<ffffffff8159b2ac>] x86_64_start_reservations+0xac/0xd5 > [ 0.128001] [<ffffffff8159b3ea>] x86_64_start_kernel+0x115/0x14b > [ 0.128001] [<ffffffff8159b140>] ? early_idt_handler+0x0/0x71 This happens on QEMU which reports MCA capability, but no banks. Without this patch there is a buffer overrun and boot ops because the code would try to initialize the 0 element of a zero length kmalloc() buffer. Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Tested-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andi Kleen <ak@linux.intel.com> LKML-Reference: <20090615125200.GD31969@one.firstfloor.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-17 08:59:45 +02:00
Ingo Molnar	cc4949e1fd	Merge branch 'linus' into x86/urgent Merge reason: pull in latest to fix a bug in it. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-17 08:59:10 +02:00
Linus Torvalds	517d08699b	Merge branch 'akpm' * akpm: (182 commits) fbdev: bf54x-lq043fb: use kzalloc over kmalloc/memset fbdev: bfin: fix __dev{init,exit} markings fbdev: bfin: drop unnecessary calls to memset fbdev: bfin-t350mcqb-fb: drop unused local variables fbdev: blackfin has __raw I/O accessors, so use them in fb.h fbdev: s1d13xxxfb: add accelerated bitblt functions tcx: use standard fields for framebuffer physical address and length fbdev: add support for handoff from firmware to hw framebuffers intelfb: fix a bug when changing video timing fbdev: use framebuffer_release() for freeing fb_info structures radeon: P2G2CLK_ALWAYS_ONb tested twice, should 2nd be P2G2CLK_DAC_ALWAYS_ONb? s3c-fb: CPUFREQ frequency scaling support s3c-fb: fix resource releasing on error during probing carminefb: fix possible access beyond end of carmine_modedb[] acornfb: remove fb_mmap function mb862xxfb: use CONFIG_OF instead of CONFIG_PPC_OF mb862xxfb: restrict compliation of platform driver to PPC Samsung SoC Framebuffer driver: add Alpha Channel support atmel-lcdc: fix pixclock upper bound detection offb: use framebuffer_alloc() to allocate fb_info struct ... Manually fix up conflicts due to kmemcheck in mm/slab.c	2009-06-16 19:50:13 -07:00
Minchan Kim	a9c5695393	use printk_once() in several places There are some places to be able to use printk_once instead of hard coding. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: David S. Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:50 -07:00
Alexey Dobriyan	bb1f17b037	mm: consolidate init_mm definition * create mm/init-mm.c, move init_mm there * remove INIT_MM, initialize init_mm with C99 initializer * unexport init_mm on all arches: init_mm is already unexported on x86. One strange place is some OMAP driver (drivers/video/omap/) which won't build modular, but it's already wants get_vm_area() export. Somebody should look there. [akpm@linux-foundation.org: add missing #includes] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Mike Frysinger <vapier.adi@gmail.com> Cc: Americo Wang <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:28 -07:00
Arnd Bergmann	08604bd993	time: move PIT_TICK_RATE to linux/timex.h PIT_TICK_RATE is currently defined in four architectures, but in three different places. While linux/timex.h is not the perfect place for it, it is still a reasonable replacement for those drivers that traditionally use asm/timex.h to get CLOCK_TICK_RATE and expect it to be the PIT frequency. Note that for Alpha, the actual value changed from 1193182UL to 1193180UL. This is unlikely to make a difference, and probably can only improve accuracy. There was a discussion on the correct value of CLOCK_TICK_RATE a few years ago, after which every existing instance was getting changed to 1193182. According to the specification, it should be 1193181.818181... Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Len Brown <lenb@kernel.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Dmitry Torokhov <dtor@mail.ru> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:27 -07:00
Cliff Wickman	e2a7147640	x86: correct the conversion of EFI memory types This patch causes all the EFI_RESERVED_TYPE memory reservations to be recorded in the e820 table as type E820_RESERVED. (This patch replaces one called 'x86: vendor reserved memory type'. This version has been discussed a bit with Peter and Yinghai but not given a final opinion.) Without this patch EFI_RESERVED_TYPE memory reservations may be marked usable in the e820 table. There may be a collision between kernel use and some reserver's use of this memory. (An example use of this functionality is the UV system, which will access extremely large areas of memory with a memory engine that allows a user to address beyond the processor's range. Such areas are reserved in the EFI table by the BIOS. Some loaders have a restricted number of entries possible in the e820 table, hence the need to record the reservations in the unrestricted EFI table.) The call to do_add_efi_memmap() is only made if "add_efi_memmap" is specified on the kernel command line. Signed-off-by: Cliff Wickman <cpw@sgi.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-16 17:47:32 -07:00
H. Peter Anvin	95ee14e437	x86: cap iomem_resource to addressable physical memory iomem_resource is by default initialized to -1, which means 64 bits of physical address space if 64-bit resources are enabled. However, x86 CPUs cannot address 64 bits of physical address space. Thus, we want to cap the physical address space to what the union of all CPU can actually address. Without this patch, we may end up assigning inaccessible values to uninitialized 64-bit PCI memory resources. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Martin Mares <mj@ucw.cz> Cc: stable@kernel.org	2009-06-16 17:47:31 -07:00
Hidetoshi Seto	1af0815f96	x86, mce: rename _64.c files which are no longer 64-bit-specific Rename files that are no longer 64bit specific: mce_amd_64.c => mce_amd.c mce_intel_64.c => mce_intel.c Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-16 16:56:11 -07:00
Hidetoshi Seto	1149e72645	x86, mce: remove therm_throt.h Now all symbols in the header are static. Remove the header. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-16 16:56:09 -07:00
Hidetoshi Seto	8363fc82d3	x86, mce: remove intel_set_thermal_handler() and make intel_thermal_interrupt() static. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-16 16:56:08 -07:00
Hidetoshi Seto	895287c0a6	x86, mce: squash mce_intel.c into therm_throt.c move intel_init_thermal() into therm_throt.c Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-16 16:56:08 -07:00
Hidetoshi Seto	a65c88dd2c	x86, mce: unify smp_thermal_interrupt Put common functions into therm_throt.c, modify Makefile. unexpected_thermal_interrupt intel_thermal_interrupt smp_thermal_interrupt intel_set_thermal_handler Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-16 16:56:08 -07:00
Hidetoshi Seto	e8ce2c5ee8	x86, mce: unify smp_thermal_interrupt, prepare Let them in same shape. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-16 16:56:08 -07:00
Hidetoshi Seto	5335612a57	x86, mce: unify smp_thermal_interrupt, prepare mce_intel_64 Break smp_thermal_interrupt() into two functions. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-16 16:56:08 -07:00
Hidetoshi Seto	3adacb70d3	x86, mce: unify smp_thermal_interrupt, prepare p4 Remove unused argument regs from handlers, and use inc_irq_stat. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-16 16:56:07 -07:00
Hidetoshi Seto	c697836985	x86, mce: make mce_disabled boolean The mce_disabled on 32bit is a tristate variable [1,0,-1], while 64bit version is boolean [0,1]. This patch makes mce_disabled always boolean, and use mce_p5_enabled to indicate the third state instead. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-16 16:56:07 -07:00
Hidetoshi Seto	9e55e44e39	x86, mce: unify mce.h There are 2 headers: arch/x86/include/asm/mce.h arch/x86/kernel/cpu/mcheck/mce.h and in the latter small header: #include <asm/mce.h> This patch move all contents in the latter header into the former, and fix all files using the latter to include the former instead. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-16 16:56:07 -07:00
Hidetoshi Seto	9af43b54ab	x86, mce: sysfs entries for new mce options Add sysfs interface for admins who want to tweak these options without rebooting the system. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-16 16:56:06 -07:00
Hidetoshi Seto	1020bcbcc7	x86, mce: rename static variables around trigger "trigger" is not straight forward name for valiable that holds name of user mode helper program which triggered by machine check events. This patch renames this valiable and kins to more recognizable names. trigger => mce_helper trigger_argv => mce_helper_argv notify_user => mce_need_notify No functional changes. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-16 16:56:06 -07:00
Hidetoshi Seto	4e5b3e690d	x86, mce: add __read_mostly Add __read_mostly to data written during setup. Suggested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-16 16:56:05 -07:00
Hidetoshi Seto	7fb06fc967	x86, mce: cleanup mce_start() Simplify interface of mce_start(): - no_way_out = mce_start(no_way_out, &order); + order = mce_start(&no_way_out); Now Monarch and Subjects share same exit(return) in usual path. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-16 16:56:05 -07:00
Hidetoshi Seto	33edbf02a9	x86, mce: don't init timer if !mce_available In mce_cpu_restart, mce_init_timer is called unconditionally. If !mce_available (e.g. mce is disabled), there are no useful work for timer. Stop running it. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-16 16:56:04 -07:00
Huang Ying	184e1fdfea	x86, mce: fix a race condition about mce_callin and no_way_out If one CPU has no_way_out == 1, all other CPUs should have no_way_out == 1. But despite global_nwo is read after mce_callin, global_nwo is updated after mce_callin too. So it is possible that some CPU read global_nwo before some other CPU update global_nwo, so that no_way_out == 1 for some CPU, while no_way_out == 0 for some other CPU. This patch fixes this race condition via moving mce_callin updating after global_nwo updating, with a smp_wmb in between. A smp_rmb is added between their reading too. Signed-off-by: Huang Ying <ying.huang@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>	2009-06-16 16:56:04 -07:00
Linus Torvalds	b3fec0fe35	Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck: (39 commits) signal: fix __send_signal() false positive kmemcheck warning fs: fix do_mount_root() false positive kmemcheck warning fs: introduce __getname_gfp() trace: annotate bitfields in struct ring_buffer_event net: annotate struct sock bitfield c2port: annotate bitfield for kmemcheck net: annotate inet_timewait_sock bitfields ieee1394/csr1212: fix false positive kmemcheck report ieee1394: annotate bitfield net: annotate bitfields in struct inet_sock net: use kmemcheck bitfields API for skbuff kmemcheck: introduce bitfield API kmemcheck: add opcode self-testing at boot x86: unify pte_hidden x86: make _PAGE_HIDDEN conditional kmemcheck: make kconfig accessible for other architectures kmemcheck: enable in the x86 Kconfig kmemcheck: add hooks for the page allocator kmemcheck: add hooks for page- and sg-dma-mappings kmemcheck: don't track page tables ...	2009-06-16 13:09:51 -07:00
Ingo Molnar	8a4a6182fd	Merge branch 'amd-iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent	2009-06-16 11:51:24 +02:00
Chris Wright	6a047d8b9e	amd-iommu: resume cleanup Now that enable_iommus() will call iommu_disable() for each iommu, the call to disable_iommus() during resume is redundant. Also, the order for an invalidation is to invalidate device table entries first, then domain translations. Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-06-16 10:19:16 +02:00
Kay Sievers	07e9bb8eeb	Driver Core: x86: add nodename for cpuid and msr drivers. This adds support to the x86 cpuid and msr drivers to report the proper device name to userspace for their devices. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-06-15 21:30:25 -07:00
Kay Sievers	d405640539	Driver Core: misc: add nodename support for misc devices. This adds support for misc devices to report their requested nodename to userspace. It also updates a number of misc drivers to provide the needed subdirectory and device name to be used for them. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-06-15 21:30:25 -07:00
Linus Torvalds	19035e5b5d	Merge branch 'timers-for-linus-migration' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus-migration' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timers: Logic to move non pinned timers timers: /proc/sys sysctl hook to enable timer migration timers: Identifying the existing pinned timers timers: Framework for identifying pinned timers timers: allow deferrable timers for intervals tv2-tv5 to be deferred Fix up conflicts in kernel/sched.c and kernel/timer.c manually	2009-06-15 10:06:19 -07:00
Rusty Russell	8e7c25971b	[CPUFREQ] cpumask: new cpumask operators for arch/x86/kernel/cpu/cpufreq/powernow-k8.c Remove all old-style cpumask operators, and cpumask_t. Also: get rid of the unused define_siblings function. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Mark Langsdorf <mark.langsdorf@amd.com> Tested-by: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Dave Jones <davej@redhat.com>	2009-06-15 11:49:43 -04:00
Rusty Russell	1ff6e97f1d	[CPUFREQ] cpumask: avoid playing with cpus_allowed in powernow-k8.c cpumask: avoid playing with cpus_allowed in powernow-k8.c It's generally a very bad idea to mug some process's cpumask: it could legitimately and reasonably be changed by root, which could break us (if done before our code) or them (if we restore the wrong value). I did not replace powernowk8_target; it needs fixing, but it grabs a mutex (so no smp_call_function_single here) but Mark points out it can be called multiple times per second, so work_on_cpu is too heavy. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> To: cpufreq@vger.kernel.org Acked-by: Mark Langsdorf <mark.langsdorf@amd.com> Tested-by: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Dave Jones <davej@redhat.com>	2009-06-15 11:49:43 -04:00
Rusty Russell	e3f996c26f	[CPUFREQ] cpumask: avoid cpumask games in arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c Impact: don't play with current's cpumask It's generally a very bad idea to mug some process's cpumask: it could legitimately and reasonably be changed by root, which could break us (if done before our code) or them (if we restore the wrong value). Use rdmsr_on_cpu and wrmsr_on_cpu instead. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> To: cpufreq@vger.kernel.org Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Dave Jones <davej@redhat.com>	2009-06-15 11:49:43 -04:00
Rusty Russell	394122ab14	[CPUFREQ] cpumask: avoid playing with cpus_allowed in speedstep-ich.c Impact: don't play with current's cpumask It's generally a very bad idea to mug some process's cpumask: it could legitimately and reasonably be changed by root, which could break us (if done before our code) or them (if we restore the wrong value). We use smp_call_function_single: this had the advantage of being more efficient, too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> To: cpufreq@vger.kernel.org Cc: Dominik Brodowski <linux@brodo.de> Signed-off-by: Dave Jones <davej@redhat.com>	2009-06-15 11:49:43 -04:00
Naga Chumbalkar	e15bc4559b	[CPUFREQ] powernow-k8: get drv data for correct CPU Make powernowk8_get() similar to powernowk8_target() and powernowk8_verify() in the way it obtains "powernow_data" for a given CPU. Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Langsdorf, Mark <mark.langsdorf@amd.com> Cc: Thomas Renninger <trenn@suse.de> Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com> Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Langsdorf, Mark <mark.langsdorf@amd.com> Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Dave Jones <davej@redhat.com>	2009-06-15 11:49:42 -04:00
Naga Chumbalkar	532cfee6ba	[CPUFREQ] powernow-k8: read P-state from HW By definition, "cpuinfo_cur_freq" should report the value from HW. So, don't depend on the cached value. Instead read P-state directly from HW, while taking into account the erratum 311 workaround for Fam 11h processors. Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Langsdorf, Mark <mark.langsdorf@amd.com> Cc: Thomas Renninger <trenn@suse.de> Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com> Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Langsdorf, Mark <mark.langsdorf@amd.com> Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Dave Jones <davej@redhat.com>	2009-06-15 11:49:42 -04:00
Andrew Morton	b394f1dfc0	[CPUFREQ] reduce scope of ACPI_PSS_BIOS_BUG_MSG[] This symbol doesn't need file-global scope. Cc: "Zhang, Rui" <rui.zhang@intel.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: Langsdorf, Mark <mark.langsdorf@amd.com> Cc: Leo Milano <lmilano@gmx.net> Cc: Thomas Renninger <trenn@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Jones <davej@redhat.com>	2009-06-15 11:49:42 -04:00
Dave Jones	931db6a32d	[CPUFREQ] Clean up convoluted code in arch/x86/kernel/tsc.c:time_cpufreq_notifier() Christoph Hellwig noticed the following potential uninitialised use: > arch/x86/kernel/tsc.c: In function 'time_cpufreq_notifier': > arch/x86/kernel/tsc.c:634: warning: 'dummy' may be used uninitialized in this function > > where we do have CONFIG_SMP set, freq->flags & CPUFREQ_CONST_LOOPS is > true and ref_freq is false. It seems plausable, though the circumstances for hitting it are really low. Nearly all SMP capable cpufreq drivers set CPUFREQ_CONST_LOOPS. powernow-k8 is really the only exception. The older CPUs were typically only ever UP. (powernow-k7 never supported SMP for eg) It's worth fixing regardless, as it cleans up the code. Fix possible uninitialized use of dummy, by just removing it, and making the setting of lpj more obvious. Signed-off-by: Dave Jones <davej@redhat.com>	2009-06-15 11:49:42 -04:00
Luis Henriques	21335d0214	[CPUFREQ] powernow-k8.c: mess cleanup Mess cleanup in powernow_k8_acpi_pst_values() function. Signed-off-by: Luis Henriques <henrix@sapo.pt> Signed-off-by: Dave Jones <davej@redhat.com>	2009-06-15 11:49:41 -04:00
Thomas Renninger	86e13684aa	[CPUFREQ] powernow-k8: Set transition latency to 1 if ACPI tables export 0 This doesn't fix anything, but it's expected that a transition latency of 0 could cause trouble in the future. Signed-off-by: Thomas Renninger <trenn@suse.de> Cc: Langsdorf, Mark <mark.langsdorf@amd.com> Signed-off-by: Dave Jones <davej@redhat.com>	2009-06-15 11:49:41 -04:00
Joerg Roedel	09067207f6	amd-iommu: set event buffer head and tail to 0 manually These registers may contain values from previous kernels. So reset them to known values before enable the event buffer again. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-06-15 16:06:48 +02:00
Peter Zijlstra	74193ef0ec	perf_counter: x86: Fix call-chain support to use NMI-safe methods __copy_from_user_inatomic() isn't NMI safe in that it can trigger the page fault handler which is another trap and its return path invokes IRET which will also close the NMI context. Therefore use a GUP based approach to copy the stack frames over. We tried an alternative solution as well: we used a forward ported version of Mathieu Desnoyers's "NMI safe INT3 and Page Fault" patch that modifies the exception return path to use an open-coded IRET with explicit stack unrolling and TF checking. This didnt work as it interacted with faulting user-space instructions, causing them not to restart properly, which corrupts user-space registers. Solving that would probably involve disassembling those instructions and backtracing the RIP. But even without that, the code was deemed rather complex to the already non-trivial x86 entry assembly code, so instead we went for this GUP based method that does a software-walk of the pagetables. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <npiggin@suse.de> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-15 15:57:53 +02:00
Chris Wright	a8c485bb68	amd-iommu: disable cmd buffer and evt logging before reprogramming iommu The IOMMU spec states that IOMMU behavior may be undefined when the IOMMU registers are rewritten while command or event buffer is enabled. Disable them in IOMMU disable path. Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-06-15 15:53:45 +02:00
Vegard Nossum	722f2a6c87	Merge commit 'linus/master' into HEAD Conflicts: MAINTAINERS Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-15 15:50:49 +02:00
Chris Wright	42a49f965a	amd-iommu: flush domain tlb when attaching a new device When kexec'ing to a new kernel (for example, when crashing and launching a kdump session), the AMD IOMMU may have cached translations. The kexec'd kernel, during initialization, will invalidate the IOMMU device table entries, but not the domain translations. These stale entries can cause a device's DMA to fail, makes it rough to write a dump to disk when the disk controller can't DMA ;-) Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-06-15 15:42:00 +02:00
Joerg Roedel	61d047be99	x86: disable IOMMUs on kernel crash If the IOMMUs are still enabled when the kexec kernel boots access to the disk is not possible. This is bad for tools like kdump or anything else which wants to use PCI devices. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-06-15 15:20:40 +02:00
Joerg Roedel	0975904276	amd-iommu: disable IOMMU hardware on shutdown When the IOMMU stays enabled the BIOS may not be able to finish the machine shutdown properly. So disable the hardware on shutdown. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-06-15 15:20:40 +02:00
Vegard Nossum	2dff440525	kmemcheck: add mm functions With kmemcheck enabled, the slab allocator needs to do this: 1. Tell kmemcheck to allocate the shadow memory which stores the status of each byte in the allocation proper, e.g. whether it is initialized or uninitialized. 2. Tell kmemcheck which parts of memory that should be marked uninitialized. There are actually a few more states, such as "not yet allocated" and "recently freed". If a slab cache is set up using the SLAB_NOTRACK flag, it will never return memory that can take page faults because of kmemcheck. If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still request memory with the __GFP_NOTRACK flag. This does not prevent the page faults from occuring, however, but marks the object in question as being initialized so that no warnings will ever be produced for this object. In addition to (and in contrast to) __GFP_NOTRACK, the __GFP_NOTRACK_FALSE_POSITIVE flag indicates that the allocation should not be tracked _because_ it would produce a false positive. Their values are identical, but need not be so in the future (for example, we could now enable/disable false positives with a config option). Parts of this patch were contributed by Pekka Enberg but merged for atomicity. Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu> [rebased for mainline inclusion] Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-15 12:40:03 +02:00
Vegard Nossum	f85612967c	x86: add hooks for kmemcheck The hooks that we modify are: - Page fault handler (to handle kmemcheck faults) - Debug exception handler (to hide pages after single-stepping the instruction that caused the page fault) Also redefine memset() to use the optimized version if kmemcheck is enabled. (Thanks to Pekka Enberg for minimizing the impact on the page fault handler.) As kmemcheck doesn't handle MMX/SSE instructions (yet), we also disable the optimized xor code, and rely instead on the generic C implementation in order to avoid false-positive warnings. Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no> [whitespace fixlet] Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu> [rebased for mainline inclusion] Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>	2009-06-15 12:40:02 +02:00
Ingo Molnar	038e836e97	perf_counter, x86: Fix kernel-space call-chains Kernel-space call-chains were trimmed at the first entry because we never processed anything beyond the first stack context. Allow the backtrace to jump from NMI to IRQ stack then to task stack and finally user-space stack. Also calculate the stack and bp variables correctly so that the stack walker does not exit early. We can get deep traces as a result, visible in perf report -D output: 0x32af0 [0xe0]: PERF_EVENT (IP, 5): 15134: 0xffffffff815225fd period: 1 ... chain: u:2, k:22, nr:24 ..... 0: 0xffffffff815225fd ..... 1: 0xffffffff810ac51c ..... 2: 0xffffffff81018e29 ..... 3: 0xffffffff81523939 ..... 4: 0xffffffff81524b8f ..... 5: 0xffffffff81524bd9 ..... 6: 0xffffffff8105e498 ..... 7: 0xffffffff8152315a ..... 8: 0xffffffff81522c3a ..... 9: 0xffffffff810d9b74 ..... 10: 0xffffffff810dbeec ..... 11: 0xffffffff810dc3fb This is a 22-entries kernel-space chain. (We still only record reliable stack entries.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-15 09:08:08 +02:00
Ingo Molnar	5a6cec3abb	perf_counter, x86: Fix call-chain walking Fix the ptregs variant when we hit user-mode tasks. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-14 22:37:15 +02:00
Thomas Gleixner	507fa3a3d8	x86: hpet: Mark per cpu interrupts IRQF_TIMER to prevent resume failure timer interrupts are excluded from being disabled during suspend. The clock events code manages the disabling of clock events on its own because the timer interrupt needs to be functional before the resume code reenables the device interrupts. The hpet per cpu timers request their interrupt without setting the IRQF_TIMER flag so suspend_device_irqs() disables them as well which results in a fatal resume failure on the boot CPU. Adding IRQF_TIMER to the interupt flags when requesting the hpet per cpu timer interrupts solves the problem. Reported-by: Benjamin S. <sbenni@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Benjamin S. <sbenni@gmx.de> Cc: stable@kernel.org	2009-06-14 18:24:29 +02:00
Jaswinder Singh Rajput	c64b04fe6e	x86, cpu: cpu/proc.c display cache alignment and address sizes for 32 bit 32 bits can also access x86_cache_alignment, x86_phys_bits and x86_virt_bits, make them available to user space just as on 64 bits. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> LKML-Reference: <1244921390.11733.30.camel@ht.satnam> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-13 14:00:49 -07:00
Linus Torvalds	a2ee2981ae	Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (80 commits) x86, mce: Add boot options for corrected errors x86, mce: Fix mce printing x86, mce: fix for mce counters x86, mce: support action-optional machine checks x86, mce: define MCE_VECTOR x86, mce: rename mce_notify_user to mce_notify_irq x86: fix panic with interrupts off (needed for MCE) x86, mce: export MCE severities coverage via debugfs x86, mce: implement new status bits x86, mce: print header/footer only once for multiple MCEs x86, mce: default to panic timeout for machine checks x86, mce: improve mce_get_rip x86, mce: make non Monarch panic message "Fatal machine check" too x86, mce: switch x86 machine check handler to Monarch election. x86, mce: implement panic synchronization x86, mce: implement bootstrapping for machine check wakeups x86, mce: check early in exception handler if panic is needed x86, mce: add table driven machine check grading x86, mce: remove TSC print heuristic x86, mce: log corrected errors when panicing ...	2009-06-13 13:14:51 -07:00
Jaswinder Singh Rajput	f4db43a38f	perf_counter, x86: Update AMD hw caching related event table All AMD models share the same hw caching related event table. Also complete the table with more events. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1244835381.2802.2.camel@ht.satnam> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-13 12:58:25 +02:00
Jaswinder Singh Rajput	4d2be1267f	perf_counter, x86: Check old-AMD performance monitoring support AMD supports performance monitoring start from K7 (i.e. family 6), so disable it for earlier AMD CPUs. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1244714289.6923.0.camel@ht.satnam> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-13 12:58:25 +02:00
Len Brown	c4bf2f372d	ACPI, PCI, x86: move MCFG parsing routine from ACPI to PCI file Move arch/x86/kernel/acpi/boot.c: acpi_parse_mcfg() to arch/x86/pci/mmconfig-shared.c: pci_parse_mcfg() where it is used, and make it static. Move associated globals and helper routine with it. No functional change. This code move is in preparation for SFI support, which will allow the PCI code to find the MCFG table on systems which do not support ACPI. Signed-off-by: Len Brown <len.brown@intel.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-06-12 20:50:38 -04:00
Olivier Berger	d023e49118	ACPI: Remove Asus P4B266 from blacklist See http://marc.info/?l=linux-acpi&m=124068823904429&w=2 for discussion Signed-off-by: Olivier Berger <oberger@ouvaton.org> Signed-off-by: Len Brown <len.brown@intel.com>	2009-06-12 20:50:37 -04:00
Len Brown	c636f753b5	ACPI: delete dead acpi_disabled setting code Testing CONFIG_ACPI inside boot.c is a waste of text, since boot.c is built only when CONFIG_ACPI=y Signed-off-by: Len Brown <len.brown@intel.com>	2009-06-12 20:49:50 -04:00
Vegard Nossum	acc6be5405	x86: add save_stack_trace_bp() for tracing from a specific stack frame This will help kmemcheck (and possibly other debugging tools) since we can now simply pass regs->bp to the stack tracer instead of specifying the number of stack frames to skip, which is unreliable if gcc decides to inline functions, etc. Note that this makes the API incomplete for other architectures, but I expect that those can be updated lazily, e.g. when they need it. Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-12 23:01:05 +02:00
Linus Torvalds	947ec0b0c1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM: Add empty suspend/resume device irq functions PM/Hibernate: Move NVS routines into a seperate file (v2). PM/Hibernate: Rename disk.c to hibernate.c PM: Separate suspend to RAM functionality from core Driver Core: Rework platform suspend/resume, print warning PM: Remove device_type suspend()/resume() PM/Hibernate: Move memory shrinking to snapshot.c (rev. 2) PM/Suspend: Do not shrink memory before suspend PM: Remove bus_type suspend_late()/resume_early() V2 PM core: rename suspend and resume functions PM: Rename device_power_down/up() PM: Remove unused asm/suspend.h x86: unify power/cpu_(32\|64).c x86: unify power/cpu_(32\|64) copyright notes x86: unify power/cpu_(32\|64) regarding restoring processor state x86: unify power/cpu_(32\|64) regarding saving processor state x86: unify power/cpu_(32\|64) global variables x86: unify power/cpu_(32\|64) headers PM: Warn if interrupts are enabled during suspend-resume of sysdevs PM/ACPI/x86: Fix sparse warning in arch/x86/kernel/acpi/sleep.c	2009-06-12 13:17:27 -07:00
Linus Torvalds	4ddbac9898	Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf_counter: Start documenting HAVE_PERF_COUNTERS requirements perf_counter: Add forward/backward attribute ABI compatibility perf record: Explicity program a default counter perf_counter: Remove PERF_TYPE_RAW special casing perf_counter: PERF_TYPE_HW_CACHE is a hardware counter too powerpc, perf_counter: Fix performance counter event types perf_counter/x86: Add a quirk for Atom processors perf_counter tools: Remove one L1-data alias	2009-06-12 13:16:52 -07:00
Alan Stern	d161630297	PM core: rename suspend and resume functions This patch (as1241) renames a bunch of functions in the PM core. Rather than go through a boring list of name changes, suffice it to say that in the end we have a bunch of pairs of functions: device_resume_noirq dpm_resume_noirq device_resume dpm_resume device_complete dpm_complete device_suspend_noirq dpm_suspend_noirq device_suspend dpm_suspend device_prepare dpm_prepare in which device_X does the X operation on a single device and dpm_X invokes device_X for all devices in the dpm_list. In addition, the old dpm_power_up and device_resume_noirq have been combined into a single function (dpm_resume_noirq). Lastly, dpm_suspend_start and dpm_resume_end are the renamed versions of the former top-level device_suspend and device_resume routines. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Magnus Damm <damm@igel.co.jp> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2009-06-12 21:32:31 +02:00
Magnus Damm	e39a71ef80	PM: Rename device_power_down/up() Rename the functions performing "_noirq" dev_pm_ops operations from device_power_down() and device_power_up() to device_suspend_noirq() and device_resume_noirq(). The new function names are chosen to show that the functions are responsible for calling the _noirq() versions to finalize the suspend/resume operation. The current function names do not perform power down/up anymore so the names may be misleading. Global function renames: - device_power_down() -> device_suspend_noirq() - device_power_up() -> device_resume_noirq() Static function renames: - suspend_device_noirq() -> __device_suspend_noirq() - resume_device_noirq() -> __device_resume_noirq() Signed-off-by: Magnus Damm <damm@igel.co.jp> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Len Brown <lenb@kernel.org> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2009-06-12 21:32:31 +02:00
Jaswinder Singh Rajput	ce4b3c5547	PM/ACPI/x86: Fix sparse warning in arch/x86/kernel/acpi/sleep.c One of the numbers in arch/x86/kernel/acpi/sleep.c is long, but it is not annotated appropriately, so sparese warns about it. Fix that. [rjw: added the changelog.] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2009-06-12 21:32:29 +02:00
Linus Torvalds	6d21491838	Merge branch 'topic/slab/earlyboot-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'topic/slab/earlyboot-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: slab: setup cpu caches later on when interrupts are enabled slab,slub: don't enable interrupts during early boot slab: fix gfp flag in setup_cpu_cache() x86: make zap_low_mapping could be used early irq: slab alloc for default irq_affinity memcg: fix page_cgroup fatal error in FLATMEM	2009-06-12 09:52:30 -07:00
Linus Torvalds	7f3591cfac	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest: (31 commits) lguest: add support for indirect ring entries lguest: suppress notifications in example Launcher lguest: try to batch interrupts on network receive lguest: avoid sending interrupts to Guest when no activity occurs. lguest: implement deferred interrupts in example Launcher lguest: remove obsolete LHREQ_BREAK call lguest: have example Launcher service all devices in separate threads lguest: use eventfds for device notification eventfd: export eventfd_signal and eventfd_fget for lguest lguest: allow any process to send interrupts lguest: PAE fixes lguest: PAE support lguest: Add support for kvm_hypercall4() lguest: replace hypercall name LHCALL_SET_PMD with LHCALL_SET_PGD lguest: use native_set_* macros, which properly handle 64-bit entries when PAE is activated lguest: map switcher with executable page table entries lguest: fix writev returning short on console output lguest: clean up length-used value in example launcher lguest: Segment selectors are 16-bit long. Fix lg_cpu.ss1 definition. lguest: beyond ARRAY_SIZE of cpu->arch.gdt ...	2009-06-12 09:32:26 -07:00
Linus Torvalds	65d52cc9d4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-module-and-param * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-module-and-param: module: cleanup FIXME comments about trimming exception table entries. module: trim exception table on init free. module: merge module_alloc() finally uml module: fix uml build process due to this merge x86 module: merge the rest functions with macros x86 module: merge the same functions in module_32.c and module_64.c uvesafb: improve parameter handling. module_param: allow 'bool' module_params to be bool, not just int. module_param: add __same_type convenience wrapper for __builtin_types_compatible_p module_param: split perm field into flags and perm module_param: invbool should take a 'bool', not an 'int' cyber2000fb.c: use proper method for stopping unload if CONFIG_ARCH_SHARK	2009-06-12 09:30:36 -07:00
Linus Torvalds	db8e7f10ed	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Provide _sdata in the vmlinux.lds.S file x86: handle initrd that extends into unusable memory	2009-06-12 09:26:32 -07:00
Rusty Russell	61f4bc83fe	lguest: optimize by coding restore_flags and irq_enable in assembler. The downside of the last patch which made restore_flags and irq_enable check interrupts is that they are now too big to be patched directly into the callsites, so the C versions are always used. But the C versions go via PV_CALLEE_SAVE_REGS_THUNK which saves all the registers. In fact, we don't need any registers in the fast path, so we can do better than this if we actually code them in assembler. The results are in the noise, but since it's about the same amount of code, it's worth applying. 1GB Guest->Host: input(suppressed),output(suppressed) Before: Seconds: 0:16.53 Packets: 377268,753673 Interrupts: 22461,24297 Notifications: 1(5245),21303(732370) Net IRQs triggered: 377023(245),42578(711095) After: Seconds: 0:16.48 Packets: 377289,753673 Interrupts: 22281,24465 Notifications: 1(5245),21296(732377) Net IRQs triggered: 377060(229),42564(711109) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-12 22:27:03 +09:30
Rusty Russell	5933048c69	module: cleanup FIXME comments about trimming exception table entries. Everyone cut and paste this comment from my original one. We now do it generically, so cut the comments. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Amerigo Wang <amwang@redhat.com>	2009-06-12 21:47:05 +09:30
Amerigo Wang	c398df30d5	module: merge module_alloc() finally As Christoph Hellwig suggested, module_alloc() actually can be unified for i386 and x86_64 (of course, also UML). Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: 'Ingo Molnar' <mingo@elte.hu> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-12 21:47:03 +09:30
Amerigo Wang	0fdc83b950	x86 module: merge the rest functions with macros Merge the rest functions together, with proper preprocessing directives. Finally remove module_{32\|64}.c. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-12 21:47:01 +09:30
Amerigo Wang	2d5bf28fb9	x86 module: merge the same functions in module_32.c and module_64.c Merge the same functions both in module_32.c and module_64.c into module.c. This is the first step to merge both of them finally. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-12 21:47:00 +09:30
Yong Wang	dff5da6d09	perf_counter/x86: Add a quirk for Atom processors The fixed-function performance counters do not work on current Atom processors. Use the general-purpose ones instead. Signed-off-by: Yong Wang <yong.y.wang@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <20090612080855.GA2286@ywang-moblin2.bj.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-12 13:48:32 +02:00
Yinghai Lu	55cd63676e	x86: make zap_low_mapping could be used early Only one cpu is there, just call __flush_tlb for it. Fixes the following boot warning on x86: [ 0.000000] Memory: 885032k/915540k available (5993k kernel code, 29844k reserved, 3842k data, 428k init, 0k highmem) [ 0.000000] virtual kernel memory layout: [ 0.000000] fixmap : 0xffe17000 - 0xfffff000 (1952 kB) [ 0.000000] vmalloc : 0xf8615000 - 0xffe15000 ( 120 MB) [ 0.000000] lowmem : 0xc0000000 - 0xf7e15000 ( 894 MB) [ 0.000000] .init : 0xc19a5000 - 0xc1a10000 ( 428 kB) [ 0.000000] .data : 0xc15da4bb - 0xc199af6c (3842 kB) [ 0.000000] .text : 0xc1000000 - 0xc15da4bb (5993 kB) [ 0.000000] Checking if this processor honours the WP bit even in supervisor mode...Ok. [ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: at kernel/smp.c:369 smp_call_function_many+0x50/0x1b0() [ 0.000000] Hardware name: System Product Name [ 0.000000] Modules linked in: [ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.30-tip #52504 [ 0.000000] Call Trace: [ 0.000000] [<c104aa16>] warn_slowpath_common+0x65/0x95 [ 0.000000] [<c104aa58>] warn_slowpath_null+0x12/0x15 [ 0.000000] [<c1073bbe>] smp_call_function_many+0x50/0x1b0 [ 0.000000] [<c1037615>] ? do_flush_tlb_all+0x0/0x41 [ 0.000000] [<c1037615>] ? do_flush_tlb_all+0x0/0x41 [ 0.000000] [<c1073d4f>] smp_call_function+0x31/0x58 [ 0.000000] [<c1037615>] ? do_flush_tlb_all+0x0/0x41 [ 0.000000] [<c104f635>] on_each_cpu+0x26/0x65 [ 0.000000] [<c10374b5>] flush_tlb_all+0x19/0x1b [ 0.000000] [<c1032ab3>] zap_low_mappings+0x4d/0x56 [ 0.000000] [<c15d64b5>] ? printk+0x14/0x17 [ 0.000000] [<c19b42a8>] mem_init+0x23d/0x245 [ 0.000000] [<c19a56a1>] start_kernel+0x17a/0x2d5 [ 0.000000] [<c19a5347>] ? unknown_bootoption+0x0/0x19a [ 0.000000] [<c19a5039>] __init_begin+0x39/0x41 [ 0.000000] ---[ end trace 4eaa2a86a8e2da22 ]--- Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-06-12 13:50:24 +03:00
Catalin Marinas	1260866a27	x86: Provide _sdata in the vmlinux.lds.S file _sdata is a common symbol defined by many architectures and made available to the kernel via asm-generic/sections.h. Kmemleak uses this symbol when scanning the data sections. [ Impact: add new global symbol ] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> LKML-Reference: <20090511122105.26556.96593.stgit@pc1117.cambridge.arm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-12 09:21:33 +02:00
Yinghai Lu	12274e96b4	x86: use zalloc_cpumask_var in arch_early_irq_init So we make sure MAXSMP gets a cleared cpumask Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-11 20:04:36 -07:00
Yinghai Lu	8c5dd8f433	x86: handle initrd that extends into unusable memory On a system where system memory (according e820) is not covered by mtrr, mtrr_trim_memory converts a portion of memory to reserved, but bootloader has already put the initrd in that range. Thus, we need to have 64bit to use relocate_initrd too. [ Impact: fix using initrd when mtrr_trim_memory happen ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: stable@kernel.org	2009-06-11 15:19:13 -07:00
Ingo Molnar	0d5959723e	Merge branch 'linus' into x86/mce3 Conflicts: arch/x86/kernel/cpu/mcheck/mce_64.c arch/x86/kernel/irq.c Merge reason: Resolve the conflicts above. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-11 23:31:52 +02:00
Linus Torvalds	8a1ca8cedd	Merge branch 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (574 commits) perf_counter: Turn off by default perf_counter: Add counter->id to the throttle event perf_counter: Better align code perf_counter: Rename L2 to LL cache perf_counter: Standardize event names perf_counter: Rename enums perf_counter tools: Clean up u64 usage perf_counter: Rename perf_counter_limit sysctl perf_counter: More paranoia settings perf_counter: powerpc: Implement generalized cache events for POWER processors perf_counters: powerpc: Add support for POWER7 processors perf_counter: Accurate period data perf_counter: Introduce struct for sample data perf_counter tools: Normalize data using per sample period data perf_counter: Annotate exit ctx recursion perf_counter tools: Propagate signals properly perf_counter tools: Small frequency related fixes perf_counter: More aggressive frequency adjustment perf_counter/x86: Fix the model number of Intel Core2 processors perf_counter, x86: Correct some event and umask values for Intel processors ...	2009-06-11 14:01:07 -07:00
Linus Torvalds	b640f042fa	Merge branch 'topic/slab/earlyboot' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'topic/slab/earlyboot' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: vgacon: use slab allocator instead of the bootmem allocator irq: use kcalloc() instead of the bootmem allocator sched: use slab in cpupri_init() sched: use alloc_cpumask_var() instead of alloc_bootmem_cpumask_var() memcg: don't use bootmem allocator in setup code irq/cpumask: make memoryless node zero happy x86: remove some alloc_bootmem_cpumask_var calling vt: use kzalloc() instead of the bootmem allocator sched: use kzalloc() instead of the bootmem allocator init: introduce mm_init() vmalloc: use kzalloc() instead of alloc_bootmem() slab: setup allocators earlier in the boot sequence bootmem: fix slab fallback on numa bootmem: use slab if bootmem is no longer available	2009-06-11 12:25:06 -07:00
Linus Torvalds	6cd8e300b4	Merge branch 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (138 commits) KVM: Prevent overflow in largepages calculation KVM: Disable large pages on misaligned memory slots KVM: Add VT-x machine check support KVM: VMX: Rename rmode.active to rmode.vm86_active KVM: Move "exit due to NMI" handling into vmx_complete_interrupts() KVM: Disable CR8 intercept if tpr patching is active KVM: Do not migrate pending software interrupts. KVM: inject NMI after IRET from a previous NMI, not before. KVM: Always request IRQ/NMI window if an interrupt is pending KVM: Do not re-execute INTn instruction. KVM: skip_emulated_instruction() decode instruction if size is not known KVM: Remove irq_pending bitmap KVM: Do not allow interrupt injection from userspace if there is a pending event. KVM: Unprotect a page if #PF happens during NMI injection. KVM: s390: Verify memory in kvm run KVM: s390: Sanity check on validity intercept KVM: s390: Unlink vcpu on destroy - v2 KVM: s390: optimize float int lock: spin_lock_bh --> spin_lock KVM: s390: use hrtimer for clock wakeup from idle - v2 KVM: s390: Fix memory slot versus run - v3 ...	2009-06-11 10:03:30 -07:00
Yinghai Lu	dad213aeb5	irq/cpumask: make memoryless node zero happy Don't hardcode to node zero for early boot IRQ setup memory allocations. [ penberg@cs.helsinki.fi: minor cleanups ] Cc: Ingo Molnar <mingo@elte.hu> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-06-11 19:27:08 +03:00
Yinghai Lu	38c7fed2f5	x86: remove some alloc_bootmem_cpumask_var calling Now that we set up the slab allocator earlier, we can get rid of some alloc_bootmem_cpumask_var() calls in boot code. Cc: Ingo Molnar <mingo@elte.hu> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-06-11 19:27:07 +03:00
Ingo Molnar	940010c5a3	Merge branch 'linus' into perfcounters/core Conflicts: arch/x86/kernel/irqinit.c arch/x86/kernel/irqinit_64.c arch/x86/kernel/traps.c arch/x86/mm/fault.c include/linux/sched.h kernel/exit.c	2009-06-11 17:55:42 +02:00
Peter Zijlstra	8be6e8f3c3	perf_counter: Rename L2 to LL cache The top (fastest) and last level (biggest) caches are the most interesting ones, performance wise. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> [ Fixed the Nehalem LL table to LLC Reference/Miss events ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-11 17:54:17 +02:00
Peter Zijlstra	f4dbfa8f31	perf_counter: Standardize event names Pure renames only, to PERF_COUNT_HW_* and PERF_COUNT_SW_*. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-11 17:54:15 +02:00
Hidetoshi Seto	62fdac5913	x86, mce: Add boot options for corrected errors This patch introduces three boot options (no_cmci, dont_log_ce and ignore_ce) to control handling for corrected errors. The "mce=no_cmci" boot option disables the CMCI feature. Since CMCI is a new feature so having boot controls to disable it will be a help if the hardware is misbehaving. The "mce=dont_log_ce" boot option disables logging for corrected errors. All reported corrected errors will be cleared silently. This option will be useful if you never care about corrected errors. The "mce=ignore_ce" boot option disables features for corrected errors, i.e. polling timer and cmci. All corrected events are not cleared and kept in bank MSRs. Usually this disablement is not recommended, however it will be a help if there are some conflict with the BIOS or hardware monitoring applications etc., that clears corrected events in banks instead of OS. [ And trivial cleanup (space -> tab) for doc is included. ] Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> LKML-Reference: <4A30ACDF.5030408@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-11 11:42:18 +02:00
Hidetoshi Seto	77e26cca20	x86, mce: Fix mce printing This patch: - Adds print_mce_head() instead of first flag - Makes the header to be printed always - Stops double printing of corrected errors [ This portion originates from Huang Ying's patch ] Originally-From: Huang Ying <ying.huang@intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> LKML-Reference: <4A30AC83.5010708@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-11 11:42:17 +02:00
Linus Torvalds	8623661180	Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits) Revert "x86, bts: reenable ptrace branch trace support" tracing: do not translate event helper macros in print format ftrace/documentation: fix typo in function grapher name tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK tracing: add protection around module events unload tracing: add trace_seq_vprint interface tracing: fix the block trace points print size tracing/events: convert block trace points to TRACE_EVENT() ring-buffer: fix ret in rb_add_time_stamp ring-buffer: pass in lockdep class key for reader_lock tracing: add annotation to what type of stack trace is recorded tracing: fix multiple use of __print_flags and __print_symbolic tracing/events: fix output format of user stack tracing/events: fix output format of kernel stack tracing/trace_stack: fix the number of entries in the header ring-buffer: discard timestamps that are at the start of the buffer ring-buffer: try to discard unneeded timestamps ring-buffer: fix bug in ring_buffer_discard_commit ftrace: do not profile functions when disabled tracing: make trace pipe recognize latency format flag ...	2009-06-10 19:53:40 -07:00
Linus Torvalds	8f40642ad3	Merge branch 'signal-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'signal-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: hookup sys_rt_tgsigqueueinfo signals: implement sys_rt_tgsigqueueinfo signals: split do_tkill	2009-06-10 19:50:52 -07:00
Peter Zijlstra	9e350de37a	perf_counter: Accurate period data We currently log hw.sample_period for PERF_SAMPLE_PERIOD, however this is incorrect. When we adjust the period, it will only take effect the next cycle but report it for the current cycle. So when we adjust the period for every cycle, we're always wrong. Solve this by keeping track of the last_period. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-11 02:39:02 +02:00
Peter Zijlstra	df1a132bf3	perf_counter: Introduce struct for sample data For easy extension of the sample data, put it in a structure. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-11 02:39:02 +02:00
Linus Torvalds	3f6280ddf2	Merge branch 'iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (61 commits) amd-iommu: remove unnecessary "AMD IOMMU: " prefix amd-iommu: detach device explicitly before attaching it to a new domain amd-iommu: remove BUS_NOTIFY_BOUND_DRIVER handling dma-debug: simplify logic in driver_filter() dma-debug: disable/enable irqs only once in device_dma_allocations dma-debug: use pr_* instead of printk(KERN_* ...) dma-debug: code style fixes dma-debug: comment style fixes dma-debug: change hash_bucket_find from first-fit to best-fit x86: enable GART-IOMMU only after setting up protection methods amd_iommu: fix lock imbalance dma-debug: add documentation for the driver filter dma-debug: add dma_debug_driver kernel command line dma-debug: add debugfs file for driver filter dma-debug: add variables and checks for driver filter dma-debug: fix debug_dma_sync_sg_for_cpu and debug_dma_sync_sg_for_device dma-debug: use sg_dma_len accessor dma-debug: use sg_dma_address accessor instead of using dma_address directly amd-iommu: don't free dma adresses below 512MB with CONFIG_IOMMU_STRESS amd-iommu: don't preallocate page tables with CONFIG_IOMMU_STRESS ...	2009-06-10 16:19:14 -07:00
Linus Torvalds	be15f9d63b	Merge branch 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (42 commits) xen: cache cr0 value to avoid trap'n'emulate for read_cr0 xen/x86-64: clean up warnings about IST-using traps xen/x86-64: fix breakpoints and hardware watchpoints xen: reserve Xen start_info rather than e820 reserving xen: add FIX_TEXT_POKE to fixmap lguest: update lazy mmu changes to match lguest's use of kvm hypercalls xen: honour VCPU availability on boot xen: add "capabilities" file xen: drop kexec bits from /sys/hypervisor since kexec isn't implemented yet xen/sys/hypervisor: change writable_pt to features xen: add /sys/hypervisor support xen/xenbus: export xenbus_dev_changed xen: use device model for suspending xenbus devices xen: remove suspend_cancel hook xen/dev-evtchn: clean up locking in evtchn xen: export ioctl headers to userspace xen: add /dev/xen/evtchn driver xen: add irq_from_evtchn xen: clean up gate trap/interrupt constants xen: set _PAGE_NX in __supported_pte_mask before pagetable construction ...	2009-06-10 16:16:27 -07:00
Linus Torvalds	595dc54a1d	Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: move rdtsc_barrier() into the TSC vread method	2009-06-10 16:15:59 -07:00
Linus Torvalds	9b29e8228a	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Clear TS in irq_ts_save() when in an atomic section x86: Detect use of extended APIC ID for AMD CPUs x86: memtest: remove 64-bit division x86, UV: Fix macros for multiple coherency domains x86: Fix non-lazy GS handling in sys_vm86() x86: Add quirk for reboot stalls on a Dell Optiplex 360 x86: Fix UV BAU activation descriptor init	2009-06-10 16:15:14 -07:00
Linus Torvalds	bec706838e	Merge branch 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, setup: fix comment in the "glove box" code x86, setup: "glove box" BIOS interrupts in the video code x86, setup: "glove box" BIOS interrupts in the MCA code x86, setup: "glove box" BIOS interrupts in the EDD code x86, setup: "glove box" BIOS interrupts in the APM code x86, setup: "glove box" BIOS interrupts in the core boot code x86, setup: "glove box" BIOS calls -- infrastructure	2009-06-10 16:14:41 -07:00
Linus Torvalds	bb7762961d	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits) x86: fix system without memory on node0 x86, mm: Fix node_possible_map logic mm, x86: remove MEMORY_HOTPLUG_RESERVE related code x86: make sparse mem work in non-NUMA mode x86: process.c, remove useless headers x86: merge process.c a bit x86: use sparse_memory_present_with_active_regions() on UMA x86: unify 64-bit UMA and NUMA paging_init() x86: Allow 1MB of slack between the e820 map and SRAT, not 4GB x86: Sanity check the e820 against the SRAT table using e820 map only x86: clean up and and print out initial max_pfn_mapped x86/pci: remove rounding quirk from e820_setup_gap() x86, e820, pci: reserve extra free space near end of RAM x86: fix typo in address space documentation x86: 46 bit physical address support on 64 bits x86, mm: fault.c, use printk_once() in is_errata93() x86: move per-cpu mmu_gathers to mm/init.c x86: move max_pfn_mapped and max_low_pfn_mapped to setup.c x86: unify noexec handling x86: remove (null) in /sys kernel_page_tables ...	2009-06-10 16:13:20 -07:00
Linus Torvalds	48c72d1ab4	Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, microcode: Simplify vfree() use x86: microcode: use smp_call_function_single instead of set_cpus_allowed, cleanup of synchronization logic	2009-06-10 16:13:06 -07:00
Linus Torvalds	5301e0de34	Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86_64: fix incorrect comments x86: unify restore_fpu_checking x86_32: introduce restore_fpu_checking()	2009-06-10 15:55:04 -07:00
Linus Torvalds	c44e3ed539	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: cpu_debug: Remove model information to reduce encoding-decoding x86: fixup numa_node information for AMD CPU northbridge functions x86: k8 convert node_to_k8_nb_misc() from a macro to an inline function x86: cacheinfo: complete L2/L3 Cache and TLB associativity field definitions x86/docs: add description for cache_disable sysfs interface x86: cacheinfo: disable L3 ECC scrubbing when L3 cache index is disabled x86: cacheinfo: replace sysfs interface for cache_disable feature x86: cacheinfo: use cached K8 NB_MISC devices instead of scanning for it x86: cacheinfo: correct return value when cache_disable feature is not active x86: cacheinfo: use L3 cache index disable feature only for CPUs that support it	2009-06-10 15:51:15 -07:00
Linus Torvalds	7dc3ca39cb	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, nmi: Use predefined numbers instead of hardcoded one x86: asm/processor.h: remove double declaration x86, mtrr: replace MTRRdefType_MSR with msr-index's MSR_MTRRdefType x86, mtrr: replace MTRRfix4K_C0000_MSR with msr-index's MSR_MTRRfix4K_C0000 x86, mtrr: remove mtrr MSRs double declaration x86, mtrr: replace MTRRfix16K_80000_MSR with msr-index's MSR_MTRRfix16K_80000 x86, mtrr: replace MTRRfix64K_00000_MSR with msr-index's MSR_MTRRfix64K_00000 x86, mtrr: replace MTRRcap_MSR with msr-index's MSR_MTRRcap x86: mce: remove duplicated #include x86: msr-index.h remove duplicate MSR C001_0015 declaration x86: clean up arch/x86/kernel/tsc_sync.c a bit x86: use symbolic name for VM86_SIGNAL when used as vm86 default return x86: added 'ifndef _ASM_X86_IOMAP_H' to iomap.h x86: avoid multiple declaration of kstack_depth_to_print x86: vdso/vma.c declare vdso_enabled and arch_setup_additional_pages before they get used x86: clean up declarations and variables x86: apic/x2apic_cluster.c x86_cpu_to_logical_apicid should be static x86 early quirks: eliminate unused function	2009-06-10 15:49:36 -07:00
Linus Torvalds	aa98936e4f	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, 64-bit: ifdef out struct thread_struct::ip x86, 32-bit: ifdef out struct thread_struct::fs x86: clean up alternative.h	2009-06-10 15:49:10 -07:00
Linus Torvalds	82782ca77d	Merge branch 'x86-kbuild-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-kbuild-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (46 commits) x86, boot: add new generated files to the appropriate .gitignore files x86, boot: correct the calculation of ZO_INIT_SIZE x86-64: align __PHYSICAL_START, remove __KERNEL_ALIGN x86, boot: correct sanity checks in boot/compressed/misc.c x86: add extension fields for bootloader type and version x86, defconfig: update kernel position parameters x86, defconfig: update to current, no material changes x86: make CONFIG_RELOCATABLE the default x86: default CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN to 16 MB x86: document new bzImage fields x86, boot: make kernel_alignment adjustable; new bzImage fields x86, boot: remove dead code from boot/compressed/head_.S x86, boot: use LOAD_PHYSICAL_ADDR on 64 bits x86, boot: make symbols from the main vmlinux available x86, boot: determine compressed code offset at compile time x86, boot: use appropriate rep string for move and clear x86, boot: zero EFLAGS on 32 bits x86, boot: set up the decompression stack as early as possible x86, boot: straighten out ranges to copy/zero in compressed/head.S x86, boot: stylistic cleanups for boot/compressed/head_64.S ... Fixed trivial conflict in arch/x86/configs/x86_64_defconfig manually	2009-06-10 15:30:41 -07:00
Linus Torvalds	f0d5e12bd4	Merge branch 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (76 commits) x86, apic: Fix dummy apic read operation together with broken MP handling x86, apic: Restore irqs on fail paths x86: Print real IOAPIC version for x86-64 x86: enable_update_mptable should be a macro sparseirq: Allow early irq_desc allocation x86, io-apic: Don't mark pin_programmed early x86, irq: don't call mp_config_acpi_gsi() if update_mptable is not enabled x86, irq: update_mptable needs pci_routeirq x86: don't call read_apic_id if !cpu_has_apic x86, apic: introduce io_apic_irq_attr x86/pci: add 4 more return parameters to IO_APIC_get_PCI_irq_vector(), fix x86: read apic ID in the !acpi_lapic case x86: apic: Fixmap apic address even if apic disabled x86: display extended apic registers with print_local_APIC and cpu_debug code x86: read apic ID in the !acpi_lapic case x86: clean up and fix setup_clear/force_cpu_cap handling x86: apic: Check rev 3 fadt correctly for physical_apic bit x86/pci: update pirq_enable_irq() to setup io apic routing x86/acpi: move setup io apic routing out of CONFIG_ACPI scope x86/pci: add 4 more return parameters to IO_APIC_get_PCI_irq_vector() ...	2009-06-10 15:25:41 -07:00
Harald Welte	0fea615e52	CPUFREQ: Mark e_powersaver driver as EXPERIMENTAL and DANGEROUS The e_powersaver driver for VIA's C7 CPU's needs to be marked as DANGEROUS as it configures the CPU to power states that are out of specification. According to Centaur, all systems with C7 and Nano CPU's support the ACPI p-state method. Thus, the acpi-cpufreq driver should be used instead. Signed-off-by: Harald Welte <HaraldWelte@viatech.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-10 15:22:44 -07:00
Harald Welte	0de51088e6	CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs The VIA/Centaur C7, C7-M and Nano CPU's all support ACPI based cpu p-states using a MSR interface. The Linux driver just never made use of it, since in addition to the check for the EST flag it also checked if the vendor is Intel. Signed-off-by: Harald Welte <HaraldWelte@viatech.com> [ Removed the vendor checks entirely - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-10 15:22:44 -07:00
Peter Zijlstra	bd2b5b1284	perf_counter: More aggressive frequency adjustment Also employ the overflow handler to adjust the frequency, this results in a stable frequency in about 40~50 samples, instead of that many ticks. This also means we can start sampling at a sample period of 1 without running head-first into the throttle. It relies on sched_clock() to accurately measure the time difference between the overflow NMIs. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-10 16:55:26 +02:00
Yong Wang	dc81081b2d	perf_counter/x86: Fix the model number of Intel Core2 processors Fix the model number of Intel Core2 processors according to the documentation: Intel Processor Identification with the CPUID Instruction: http://www.intel.com/support/processors/sb/cs-009861.htm Signed-off-by: Yong Wang <yong.y.wang@intel.com> Also-Reported-by: Arnd Bergmann <arnd@arndb.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <20090610090612.GA26580@ywang-moblin2.bj.intel.com> [ Added two more model numbers suggested by Arnd Bergmann ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-10 13:04:43 +02:00
Andi Kleen	a0861c02a9	KVM: Add VT-x machine check support VT-x needs an explicit MC vector intercept to handle machine checks in the hyper visor. It also has a special option to catch machine checks that happen during VT entry. Do these interceptions and forward them to the Linux machine check handler. Make it always look like user space is interrupted because the machine check handler treats kernel/user space differently. Thanks to Jiang Yunhong for help and testing. Cc: stable@kernel.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 12:27:08 +03:00
Marcelo Tosatti	32f8840064	KVM: use smp_send_reschedule in kvm_vcpu_kick KVM uses a function call IPI to cause the exit of a guest running on a physical cpu. For virtual interrupt notification there is no need to wait on IPI receival, or to execute any function. This is exactly what the reschedule IPI does, without the overhead of function IPI. So use it instead of smp_call_function_single in kvm_vcpu_kick. Also change the "guest_mode" variable to a bit in vcpu->requests, and use that to collapse multiple IPI's that would be issued between the first one and zeroing of guest mode. This allows kvm_vcpu_kick to called with interrupts disabled. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:53 +03:00
Marcelo Tosatti	a90ede7b17	KVM: x86: paravirt skip pit-through-ioapic boot check Skip the test which checks if the PIT is properly routed when using the IOAPIC, aimed at buggy hardware. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-06-10 11:48:24 +03:00
Yong Wang	fecc8ac849	perf_counter, x86: Correct some event and umask values for Intel processors Correct some event and UMASK values according to Intel SDM, in the Nehalem and Atom tables. Signed-off-by: Yong Wang <yong.y.wang@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <20090609131553.GA12489@ywang-moblin2.bj.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-09 16:50:07 +02:00
Andreas Herrmann	42937e81a8	x86: Detect use of extended APIC ID for AMD CPUs Booting a 32-bit kernel on Magny-Cours results in the following panic: ... Using APIC driver default ... Overriding APIC driver with bigsmp ... Getting VERSION: 80050010 Getting VERSION: 80050010 Getting ID: 10000000 Getting ID: ef000000 Getting LVT0: 700 Getting LVT1: 10000 Kernel panic - not syncing: Boot APIC ID in local APIC unexpected (16 vs 0) Pid: 1, comm: swapper Not tainted 2.6.30-rcX #2 Call Trace: [<c05194da>] ? panic+0x38/0xd3 [<c0743102>] ? native_smp_prepare_cpus+0x259/0x31f [<c073b19d>] ? kernel_init+0x3e/0x141 [<c073b15f>] ? kernel_init+0x0/0x141 [<c020325f>] ? kernel_thread_helper+0x7/0x10 The reason is that default_get_apic_id handled extension of local APIC ID field just in case of XAPIC. Thus for this AMD CPU, default_get_apic_id() returns 0 and bigsmp_get_apic_id() returns 16 which leads to the respective kernel panic. This patch introduces a Linux specific feature flag to indicate support for extended APIC id (8 bits instead of 4 bits width) and sets the flag on AMD CPUs if applicable. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: <stable@kernel.org> LKML-Reference: <20090608135509.GA12431@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-09 15:28:46 +02:00
Yinghai Lu	eaa958402e	cpumask: alloc zeroed cpumask for static cpumask_var_ts These are defined as static cpumask_var_t so if MAXSMP is not used, they are cleared already. Avoid surprises when MAXSMP is enabled. Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-06-09 22:30:27 +09:30
Joerg Roedel	e9a22a13c7	amd-iommu: remove unnecessary "AMD IOMMU: " prefix That prefix is already included in the DUMP_printk macro. So there is no need to repeat it in the format string. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-06-09 12:01:58 +02:00
Joerg Roedel	71ff3bca2f	amd-iommu: detach device explicitly before attaching it to a new domain This fixes a bug with a device that could not be assigned to a KVM guest because it is still assigned to a dma_ops protection domain. [chrisw: simply remove WARN_ON(), will always fire since dev->driver will be pci-sub] Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-06-09 11:14:14 +02:00
Joerg Roedel	29150078d7	amd-iommu: remove BUS_NOTIFY_BOUND_DRIVER handling Handling this event causes device assignment in KVM to fail because the device gets re-attached as soon as the pci-stub registers as the driver for the device. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-06-09 10:54:18 +02:00
Joerg Roedel	d2dd01de99	Merge commit 'tip/core/iommu' into amd-iommu/fixes	2009-06-09 10:50:57 +02:00
Thomas Gleixner	820a644211	perf_counter, x86: Clean up hw_cache_event ids copies Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-08 23:10:42 +02:00
Thomas Gleixner	f86748e91a	perf_counter, x86: Implement generalized cache event types, add AMD support Fill in amd_hw_cache_event_id[] with the AMD CPU specific events, for family 0x0f, 0x10 and 0x11. There's apparently no distinction between load and store events, so we only fill in the load events. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-08 23:10:37 +02:00
Jack Steiner	c4ed3f04ba	x86, UV: Fix macros for multiple coherency domains Fix bug in the SGI UV macros that support systems with multiple coherency domains. The macros used for referencing global MMR (chipset registers) are failing to correctly "or" the NASID (node identifier) bits that reside above M+N. These high bits are supplied automatically by the chipset for memory accesses coming from the processor socket. However, the bits must be present for references to the special global MMR space used to map chipset registers. (See uv_hub.h for more details ...) The bug results in references to invalid/incorrect nodes. Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: <stable@kernel.org> LKML-Reference: <20090608154405.GA16395@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-08 18:57:47 +02:00
Ingo Molnar	1123e3ad73	perf_counter: Clean up x86 boot messages Standardize and tidy up all the messages we print during perfcounter initialization. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-08 12:29:30 +02:00
Thomas Gleixner	ad68922061	perf_counter, x86: Implement generalized cache event types, add Atom support Fill in core2_hw_cache_event_id[] with the Atom model specific events. The events can be used in all the tools via the -e (--event) parameter, for example "-e l1-misses" or -"-e l2-accesses" or "-e l2-write-misses". ( Note: these are straight from the Intel manuals - not tested yet.) Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-08 11:18:27 +02:00
Thomas Gleixner	0312af8416	perf_counter, x86: Implement generalized cache event types, add Core2 support Fill in core2_hw_cache_event_id[] with the Core2 model specific events. The events can be used in all the tools via the -e (--event) parameter, for example "-e l1-misses" or -"-e l2-accesses" or "-e l2-write-misses". Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-08 11:18:26 +02:00
Figo.zhang	aeef50bc04	x86, microcode: Simplify vfree() use vfree() does its own 'NULL' check, so no need for check before calling it. In v2, remove the stray newline. [ Impact: cleanup ] Signed-off-by: Figo.zhang <figo1802@gmail.com> Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com> LKML-Reference: <1244385036.3402.11.camel@myhost> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-07 16:35:11 +02:00
Lubomir Rintel	3aa6b186f8	x86: Fix non-lazy GS handling in sys_vm86() This fixes a stack corruption panic or null dereference oops due to a bad GS in resume_userspace() when returning from sys_vm86() and calling lockdep_sys_exit(). Only a problem when CONFIG_LOCKDEP and CONFIG_CC_STACKPROTECTOR enabled. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <1244384628.2323.4.camel@bimbo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-07 16:31:23 +02:00
Cyrill Gorcunov	103428e57b	x86, apic: Fix dummy apic read operation together with broken MP handling Ingo Molnar reported that read_apic is buggy novadays: [ 0.000000] Using APIC driver default [ 0.000000] SMP: Allowing 1 CPUs, 0 hotplug CPUs [ 0.000000] Local APIC disabled by BIOS -- you can enable it with "lapic" [ 0.000000] APIC: disable apic facility [ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: at arch/x86/kernel/apic/apic.c:254 native_apic_read_dummy+0x2d/0x3b() [ 0.000000] Hardware name: HP OmniBook PC Indeed we still rely on apic->read operation for SMP compiled kernel. And instead of disfigure the SMP code with #ifdef we allow to call apic->read. To capture any unexpected results we check for apic->read being called for sane reason via WARN_ON_ONCE but(!) instead of OR we should use AND logical operation (thanks Yinghai for spotting the root of the problem). Along with that we could be have bad MP table and we are to fix it that way no SMP started and no complains about BIOS bug if apic was just disabled via command line. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090607124840.GD4547@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-07 16:08:05 +02:00
Jean Delvare	4a4aca641b	x86: Add quirk for reboot stalls on a Dell Optiplex 360 The Dell Optiplex 360 hangs on reboot, just like the Optiplex 330, so the same quirk is needed. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Steve Conklin <steve.conklin@canonical.com> Cc: Leann Ogasawara <leann.ogasawara@canonical.com> Cc: <stable@kernel.org> LKML-Reference: <200906051202.38311.jdelvare@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-07 15:51:20 +02:00
Jaswinder Singh Rajput	5095f59bda	x86: cpu_debug: Remove model information to reduce encoding-decoding Remove model information, encoding/decoding and reduce bookkeeping. This, besides removing a lot of code and cleaning up the code, also enables these features on many more CPUs that were enumerated before. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> LKML-Reference: <1244224637.8212.6.camel@ht.satnam> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-07 12:22:56 +02:00
Ingo Molnar	5f4457a4f6	Merge branch 'linus' into x86/cpu	2009-06-07 12:22:15 +02:00
Ingo Molnar	56fdd18c7b	Merge branch 'linus' into core/iommu Merge reason: This branch was on an -rc5 base so pull almost-2.6.30 to resync with the latest upstream fixes and make sure the combination works fine. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-07 11:35:05 +02:00
Ingo Molnar	75b5032212	Merge branch 'linus' into perfcounters/core Merge reason: Pick up the latest fixes before the -v8 perfcounters release. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-06 20:21:28 +02:00
Ingo Molnar	8326f44da0	perf_counter: Implement generalized cache event types Extend generic event enumeration with the PERF_TYPE_HW_CACHE method. This is a 3-dimensional space: { L1-D, L1-I, L2, ITLB, DTLB, BPU } x { load, store, prefetch } x { accesses, misses } User-space passes in the 3 coordinates and the kernel provides a counter. (if the hardware supports that type and if the combination makes sense.) Combinations that make no sense produce a -EINVAL. Combinations that are not supported by the hardware produce -ENOTSUP. Extend the tools to deal with this, and rewrite the event symbol parsing code with various popular aliases for the units and access methods above. So 'l1-cache-miss' and 'l1d-read-ops' are both valid aliases. ( x86 is supported for now, with the Nehalem event table filled in, and with Core2 and Atom having placeholder tables. ) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-06 13:14:47 +02:00
Ingo Molnar	a21ca2cac5	perf_counter: Separate out attr->type from attr->config Counter type is a frequently used value and we do a lot of bit juggling by encoding and decoding it from attr->config. Clean this up by creating a separate attr->type field. Also clean up the various similarly complex user-space bits all around counter attribute management. The net improvement is significant, and it will be easier to add a new major type (which is what triggered this cleanup). (This changes the ABI, all tools are adapted.) (PowerPC build-tested.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-06 11:37:22 +02:00
Mark Langsdorf	fe2245c905	x86: enable GART-IOMMU only after setting up protection methods The current code to set up the GART as an IOMMU enables GART translations before it removes the aperture from the kernel memory map, sets the GART PTEs to UC, sets up the guard and scratch pages, or does a wbinvd(). This leaves the possibility of cache aliasing open and can cause system crashes. Re-order the code so as to enable the GART translations only after all safeguards are in place and the tlb has been flushed. AMD has tested this patch on both Istanbul systems and 1st generation Opteron systems with APG enabled and seen no adverse effects. Istanbul systems with HT Assist enabled sometimes see MCE errors due to cache artifacts with the unmodified code. Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> Cc: <stable@kernel.org> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: akpm@linux-foundation.org Cc: jbarnes@virtuousgeek.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-06 09:42:09 +02:00
Dave Jones	2c701b1028	[CPUFREQ] powernow-k8: check space_id of _PCT registers to be FFH The powernow-k8 driver checks to see that the Performance Control/Status Registers are declared as FFH (functional fixed hardware) by the BIOS. However, this check got broken in the commit: `0e64a0c982` [CPUFREQ] checkpatch cleanups for powernow-k8 Fix based on an original patch from Naga Chumbalkar. Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Cc: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Dave Jones <davej@redhat.com>	2009-06-05 13:25:25 -04:00
Hidetoshi Seto	8051dbd2df	x86, mce: fix for mce counters Make the MCE counters work on 32bit and add poll count in arch_irq_stat_cpu. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:48:59 -07:00
Andi Kleen	9b1beaf2b5	x86, mce: support action-optional machine checks Newer Intel CPUs support a new class of machine checks called recoverable action optional. Action Optional means that the CPU detected some form of corruption in the background and tells the OS about using a machine check exception. The OS can then take appropiate action, like killing the process with the corrupted data or logging the event properly to disk. This is done by the new generic high level memory failure handler added in a earlier patch. The high level handler takes the address with the failed memory and does the appropiate action, like killing the process. In this version of the patch the high level handler is stubbed out with a weak function to not create a direct dependency on the hwpoison branch. The high level handler cannot be directly called from the machine check exception though, because it has to run in a defined process context to be able to sleep when taking VM locks (it is not expected to sleep for a long time, just do so in some exceptional cases like lock contention) Thus the MCE handler has to queue a work item for process context, trigger process context and then call the high level handler from there. This patch adds two path to process context: through a per thread kernel exit notify_user() callback or through a high priority work item. The first runs when the process exits back to user space, the other when it goes to sleep and there is no higher priority process. The machine check handler will schedule both, and whoever runs first will grab the event. This is done because quick reaction to this event is critical to avoid a potential more fatal machine check when the corruption is consumed. There is a simple lock less ring buffer to queue the corrupted addresses between the exception handler and the process context handler. Then in process context it just calls the high level VM code with the corrupted PFNs. The code adds the required code to extract the failed address from the CPU's machine check registers. It doesn't try to handle all possible cases -- the specification has 6 different ways to specify memory address -- but only the linear address. Most of the required checking has been already done earlier in the mce_severity rule checking engine. Following the Intel recommendations Action Optional errors are only enabled for known situations (encoded in MCACODs). The errors are ignored otherwise, because they are action optional. v2: Improve comment, disable preemption while processing ring buffer (reported by Ying Huang) Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:48:59 -07:00
Andi Kleen	9ff36ee966	x86, mce: rename mce_notify_user to mce_notify_irq Rename the mce_notify_user function to mce_notify_irq. The next patch will split the wakeup handling of interrupt context and of process context and it's better to give it a clearer name for this. Contains a fix from Ying Huang [ Impact: cleanup ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:48:04 -07:00
Andi Kleen	4ef702c10b	x86: fix panic with interrupts off (needed for MCE) For some time each panic() called with interrupts disabled triggered the !irqs_disabled() WARN_ON in smp_call_function(), producing ugly backtraces and confusing users. This is a common situation with machine checks for example which tend to call panic with interrupts disabled, but will also hit in other situations e.g. panic during early boot. In fact it means that panic cannot be called in many circumstances, which would be bad. This all started with the new fancy queued smp_call_function, which is then used by the shutdown path to shut down the other CPUs. On closer examination it turned out that the fancy RCU smp_call_function() does lots of things not suitable in a panic situation anyways, like allocating memory and relying on complex system state. I originally tried to patch this over by checking for panic there, but it was quite complicated and the original patch was also not very popular. This also didn't fix some of the underlying complexity problems. The new code in post 2.6.29 tries to patch around this by checking for oops_in_progress, but that is not enough to make this fully safe and I don't think that's a real solution because panic has to be reliable. So instead use an own vector to reboot. This makes the reboot code extremly straight forward, which is definitely a big plus in a panic situation where it is important to avoid relying on too much kernel state. The new simple code is also safe to be called from interupts off region because it is very very simple. There can be situations where it is important that panic is reliable. For example on a fatal machine check the panic is needed to get the system up again and running as quickly as possible. So it's important that panic is reliable and all function it calls simple. This is why I came up with this simple vector scheme. It's very hard to beat in simplicity. Vectors are not particularly precious anymore since all big systems are using per CPU vectors. Another possibility would have been to use an NMI similar to kdump, but there is still the problem that NMIs don't work reliably on some systems due to BIOS issues. NMIs would have been able to stop CPUs running with interrupts off too. In the sake of universal reliability I opted for using a non NMI vector for now. I put the reboot vector into the highest priority bucket of the APIC vectors and moved the 64bit UV_BAU message down instead into the next lower priority. [ Impact: bug fix, fixes an old regression ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:45:35 -07:00
Huang Ying	4611a6fa4b	x86, mce: export MCE severities coverage via debugfs The MCE severity judgement code is data-driven, so code coverage tools such as gcov can not be used for measuring coverage. Instead a dedicated coverage mechanism is implemented. The kernel keeps track of rules executed and reports them in debugfs. This is useful for increasing coverage of the mce-test testsuite. Right now it's unconditionally enabled because it's very little code. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:45:34 -07:00
Andi Kleen	ed7290d0ee	x86, mce: implement new status bits The x86 architecture recently added some new machine check status bits: S(ignalled) and AR (Action-Required). Signalled allows to check if a specific event caused an exception or was just logged through CMCI. AR allows the kernel to decide if an event needs immediate action or can be delayed or ignored. Implement support for these new status bits. mce_severity() uses the new bits to grade the machine check correctly and decide what to do. The exception handler uses AR to decide to kill or not. The S bit is used to separate events between the poll/CMCI handler and the exception handler. Classical UC always leads to panic. That was true before anyways because the existing CPUs always passed a PCC with it. Also corrects the rules whether to kill in user or kernel context and how to handle missing RIPV. The machine check handler largely uses the mce-severity grading engine now instead of making its own decisions. This means the logic is centralized in one place. This is useful because it has to be evaluated multiple times. v2: Some rule fixes; Add AO events Fix RIPV, RIPV\|EIPV order (Ying Huang) Fix UCNA with AR=1 message (Ying Huang) Add comment about panicing in m_c_p. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:45:34 -07:00
Andi Kleen	86503560e4	x86, mce: print header/footer only once for multiple MCEs When multiple MCEs are printed print the "HARDWARE ERROR" header and "This is not a software error" footer only once. This makes the output much more compact with many CPUs. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:45:34 -07:00
Andi Kleen	29b0f591d6	x86, mce: default to panic timeout for machine checks Fatal machine checks can be logged to disk after boot, but only if the system did a warm reboot. That's unfortunately difficult with the default panic behaviour, which waits forever and the admin has to press the power button because modern systems usually miss a reset button. This clears the machine checks in the registers and make it impossible to log them. This patch changes the default for machine check panic to always reboot after 30s. Then the mce can be successfully logged after reboot. I believe this will improve machine check experience for any system running the X server. This is dependent on successfull boot logging of MCEs. This currently only works on Intel systems, on AMD there are quite a lot of systems around which leave junk in the machine check registers after boot, so it's disabled here. These systems will continue to default to endless waiting panic. v2: Only force panic timeout when it's shorter (H.Seto) v3: Only force timeout when there is no timeout (based on comment H.Seto) [ Fix changelog - HS ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:45:33 -07:00
Huang Ying	1b2797dcc9	x86, mce: improve mce_get_rip Assume IP on the stack is valid when either EIPV or RIPV are set. This influences whether the machine check exception handler decides to return or panic. This fixes a test case in the mce-test suite and is more compliant to the specification. This currently only makes a difference in a artificial testing scenario with the mce-test test suite. Also in addition do not force the EIPV to be valid with the exact register MSRs, and keep in trust the CS value on stack even if MSR is available. [AK: combination of patches from Huang Ying and Hidetoshi Seto, with new description by me] [add some description, no code changed - HS] Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:45:33 -07:00
Andi Kleen	ac9603754d	x86, mce: make non Monarch panic message "Fatal machine check" too ... instead of "Machine check". This is for consistency with the Monarch panic message. Based on a report from Ying Huang. v2: But add a descriptive postfix so that the test suite can distingush. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:45:12 -07:00
Andi Kleen	3c0797925f	x86, mce: switch x86 machine check handler to Monarch election. On Intel platforms machine check exceptions are always broadcast to all CPUs. This patch makes the machine check handler synchronize all these machine checks, elect a Monarch to handle the event and collect the worst event from all CPUs and then process it first. This has some advantages: - When there is a truly data corrupting error the system panics as quickly as possible. This improves containment of corrupted data and makes sure the corrupted data never hits stable storage. - The panics are synchronized and do not reenter the panic code on multiple CPUs (which currently does not handle this well). - All the errors are reported. Currently it often happens that another CPU happens to do the panic first, but reports useless information (empty machine check) because the real error happened on another CPU which came in later. This is a big advantage on Nehalem where the 8 threads per CPU lead to often the wrong CPU winning the race and dumping useless information on a machine check. The problem also occurs in a less severe form on older CPUs. - The system can detect when no CPUs detected a machine check and shut down the system. This can happen when one CPU is so badly hung that that it cannot process a machine check anymore or when some external agent wants to stop the system by asserting the machine check pin. This follows Intel hardware recommendations. - This matches the recommended error model by the CPU designers. - The events can be output in true severity order - When a panic happens on another CPU it makes sure to be actually be able to process the stop IPI by enabling interrupts. The code is extremly careful to handle timeouts while waiting for other CPUs. It can't rely on the normal timing mechanisms (jiffies, ktime_get) because of its asynchronous/lockless nature, so it uses own timeouts using ndelay() and a "SPINUNIT" The timeout is configurable. By default it waits for upto one second for the other CPUs. This can be also disabled. From some informal testing AMD systems do not see to broadcast machine checks, so right now it's always disabled by default on non Intel CPUs or also on very old Intel systems. Includes fixes from Ying Huang Fixed a "ecception" in a comment (H.Seto) Moved global_nwo reset later based on suggestion from H.Seto v2: Avoid duplicate messages [ Impact: feature, fixes long standing problems. ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:45:12 -07:00
Andi Kleen	f94b61c2c9	x86, mce: implement panic synchronization In some circumstances multiple CPUs can enter mce_panic() in parallel. This gives quite confused output because they will all dump the same machine check buffer. The other problem is that they would all panic in parallel, but not process each other's shutdown IPIs because interrupts are disabled. Detect this situation early on in mce_panic(). On the first CPU entering will do the panic, the others will just wait to be killed. For paranoia reasons in case the other CPU dies during the MCE I added a 5 seconds timeout. If it expires each CPU will panic on its own again. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:45:12 -07:00
Andi Kleen	ccc3c3192a	x86, mce: implement bootstrapping for machine check wakeups Machine checks support waking up the mcelog daemon quickly. The original wake up code for this was pretty ugly, relying on a idle notifier and a special process flag. The reason it did it this way is that the machine check handler is not subject to normal interrupt locking rules so it's not safe to call wake_up(). Instead it set a process flag and then either did the wakeup in the syscall return or in the idle notifier. This patch adds a new "bootstraping" method as replacement. The idea is that the handler checks if it's in a state where it is unsafe to call wake_up(). If it's safe it calls it directly. When it's not safe -- that is it interrupted in a critical section with interrupts disables -- it uses a new "self IPI" to trigger an IPI to its own CPU. This can be done safely because IPI triggers are atomic with some care. The IPI is raised once the interrupts are reenabled and can then safely call wake_up(). When APICs are disabled the event is just queued and will be picked up eventually by the next polling timer. I think that's a reasonable compromise, since it should only happen quite rarely. Contains fixes from Ying Huang. [ solve conflict on irqinit, make it work on 32bit (entry_arch.h) - HS ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:44:05 -07:00
Andi Kleen	bd19a5e6b7	x86, mce: check early in exception handler if panic is needed The exception handler should behave differently if the exception is fatal versus one that can be returned from. In the first case it should never clear any registers because these need to be preserved for logging after the next boot. Otherwise it should clear them on each CPU step by step so that other CPUs sharing the same bank don't see duplicate events. Otherwise we risk reporting events multiple times on any CPUs which have shared machine check banks, which is a common problem on Intel Nehalem which has both SMT (two CPU threads sharing banks) and shared machine check banks in the uncore. Determine early in a special pass if any event requires a panic. This uses the mce_severity() function added earlier. This is needed for the next patch. Also fixes a problem together with an earlier patch that corrected events weren't logged on a fatal MCE. [ Impact: Feature ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:39 -07:00
Andi Kleen	817f32d02a	x86, mce: add table driven machine check grading The machine check grading (as in deciding what should be done for a given register value) has to be done multiple times soon and it's also getting more complicated. So it makes sense to consolidate it into a single function. To get smaller and more straight forward and possibly more extensible code I opted towards a new table driven method. The various rules are put into a table when is then executed by a very simple interpreter. The grading engine is in a new file mce-severity.c. I also added a private include file mce-internal.h, because mce.h is already a bit too cluttered. This is dead code right now, but will be used in followon patches. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:39 -07:00
Andi Kleen	a0189c70e5	x86, mce: remove TSC print heuristic Previously mce_panic used a simple heuristic to avoid printing old so far unreported machine check events on a mce panic. This worked by comparing the TSC value at the start of the machine check handler with the event time stamp and only printing newer ones. This has a couple of issues, in particular on systems where the TSC is not fully synchronized between CPUs it could lose events or print old ones. It is also problematic with full system synchronization as it is added by the next patch. Remove the TSC heuristic and instead replace it with a simple heuristic to print corrected errors first and after that uncorrected errors and finally the worst machine check as determined by the machine check handler. This simplifies the code because there is no need to pass the original TSC value around. Contains fixes from Ying Huang [ Impact: bug fix, cleanup ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Ying Huang <ying.huang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:39 -07:00
Andi Kleen	de8a84d85a	x86, mce: log corrected errors when panicing Normally the machine check handler ignores corrected errors and leaves them to machine_check_poll(). But when panicing mcp won't run, so log all errors. Note: this can still miss some cases until the "early no way out" patch later is applied too. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:39 -07:00
Andi Kleen	8ee08347c1	x86, mce: extend struct mce user interface with more information. Experience has shown that struct mce which is used to pass an machine check to the user space daemon currently a few limitations. Also some data which is useful to print at panic level is also missing. This patch addresses most of them. The same information is also printed out together with mce panic. struct mce can be painlessly extended in a compatible way, the mcelog user space code just ignores additional fields with a warning. - It doesn't provide a wall time timestamp. There have been a few complaints about that. Fix that by adding a 64bit time_t - It doesn't provide the exact CPU identification. This makes it awkward for mcelog to decode the event correctly, especially when there are variations in the supported MCE codes on different CPU models or when mcelog is running on a different host after a panic. Previously the administrator had to specify the correct CPU when mcelog ran on a different host, but with the more variation in machine checks now it's better to auto detect that. It's also useful for more detailed analysis of CPU events. Pass CPUID 1.EAX and the cpu vendor (as encoded in processor.h) instead. - Socket ID and initial APIC ID are useful to report because they allow to identify the failing CPU in some (not all) cases. This is also especially useful for the panic situation. This addresses one of the complaints from Thomas Gleixner earlier. - The MCG capabilities MSR needs to be reported for some advanced error processing in mcelog Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:38 -07:00
Andi Kleen	d620c67fb9	x86, mce: support more than 256 CPUs in struct mce The old struct mce had a limitation to 256 CPUs. But x86 Linux supports more than that now with x2apic. Add a new field extcpu to report the extended number. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:38 -07:00
Andi Kleen	f6fb0ac086	x86, mce: store record length into memory struct mce anchor This makes it easier for tools who want to extract the mcelog out of crash images or memory dumps to adapt to changing struct mce size. The length field replaces padding, so it's fully compatible. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:38 -07:00
Andi Kleen	ca84f69697	x86, mce: add MCE poll count to /proc/interrupts Keep a count of the machine check polls (or CMCI events) in /proc/interrupts. Andi needs this for debugging, but it's also useful in general to see what's going in by the kernel. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:38 -07:00
Andi Kleen	01ca79f141	x86, mce: add machine check exception count in /proc/interrupts Useful for debugging, but it's also good general policy to have a counter for all special interrupts there. This makes it easier to diagnose where a CPU is spending its time. [ Impact: feature, debugging tool ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-03 14:40:38 -07:00
Ingo Molnar	128f048f0f	perf_counter: Fix throttling lock-up Throttling logic is broken and we can lock up with too small hw sampling intervals. Make the throttling code more robust: disable counters even if we already disabled them. ( Also clean up whitespace damage i noticed while reading various pieces of code related to throttling. ) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-03 23:39:51 +02:00
Cliff Wickman	0e2595cdfd	x86: Fix UV BAU activation descriptor init The UV tlb shootdown code has a serious initialization error. An array of structures [328] is initialized as if it were [32]. The array is indexed by (cpu number on the blade)8, so the short initialization works for up to 4 cpus on a blade. But above that, we provide an invalid opcode to the hub's broadcast assist unit. This patch changes the allocation of the array to use its symbolic dimensions for better clarity. And initializes all 32*8 entries. Shortened 'UV_ACTIVATION_DESCRIPTOR_SIZE' to 'UV_ADP_SIZE' per Ingo's recommendation. Tested on the UV simulator. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: <stable@kernel.org> LKML-Reference: <E1M6lZR-0007kV-Aq@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-03 13:07:31 +02:00
Jiri Slaby	367d04c4ec	amd_iommu: fix lock imbalance In alloc_coherent there is an omitted unlock on the path where mapping fails. Add the unlock. [ Impact: fix lock imbalance in alloc_coherent ] Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-06-03 10:34:55 +02:00
Yong Wang	a32881066e	perf_counter/x86: Remove the IRQ (non-NMI) handling bits Remove the IRQ (non-NMI) handling bits as NMI will be used always. Signed-off-by: Yong Wang <yong.y.wang@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <20090603051255.GA2791@ywang-moblin2.bj.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-03 09:53:34 +02:00
Peter Zijlstra	0d48696f87	perf_counter: Rename perf_counter_hw_event => perf_counter_attr The structure isn't hw only and when I read event, I think about those things that fall out the other end. Rename the thing. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Kacur <jkacur@redhat.com> Cc: Stephane Eranian <eranian@googlemail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-02 21:45:33 +02:00
Peter Zijlstra	e4abb5d4f7	perf_counter: x86: Emulate longer sample periods Do as Power already does, emulate sample periods up to 2^63-1 by composing them of smaller values limited by hardware capabilities. Only once we wrap the software period do we generate an overflow event. Just 10 lines of new code. Reported-by: Stephane Eranian <eranian@googlemail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-02 21:45:31 +02:00
Peter Zijlstra	8a016db386	perf_counter: Remove the last nmi/irq bits IRQ (non-NMI) sampling is not used anymore - remove the last few bits. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-02 21:45:31 +02:00
Peter Zijlstra	b23f3325ed	perf_counter: Rename various fields A few renames: s/irq_period/sample_period/ s/irq_freq/sample_freq/ s/PERF_RECORD_/PERF_SAMPLE_/ s/record_type/sample_type/ And change both the new sample_type and read_format to u64. Reported-by: Stephane Eranian <eranian@googlemail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-02 21:45:30 +02:00
Jiri Slaby	3d58829b05	x86, apic: Restore irqs on fail paths lapic_resume forgets to restore interrupts on fail paths. Fix that. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <1243497289-18591-1-git-send-email-jirislaby@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@zytor.com>	2009-06-02 02:48:59 +02:00
Naga Chumbalkar	58f892e022	x86: Print real IOAPIC version for x86-64 Fix the fact that the IOAPIC version number in the x86_64 code path always gets assigned to 0, instead of the correct value. Before the patch: (from "dmesg" output): ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 8, version 0, address 0xfec00000, GSI 0-23 <--- After the patch: ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23 <--- History: io_apic_get_version() was compiled out of the x86_64 code path in the commit `f2c2cca3ac`: Author: Andi Kleen <ak@suse.de> Date: Tue Sep 26 10:52:37 2006 +0200 [PATCH] Remove APIC version/cpu capability mpparse checking/printing ACPI went to great trouble to get the APIC version and CPU capabilities of different CPUs before passing them to the mpparser. But all that data was used was to print it out. Actually it even faked some data based on the boot cpu, not on the actual CPU being booted. Remove all this code because it's not needed. Cc: len.brown@intel.com At the time, the IOAPIC version number was deliberately not printed in the x86_64 code path. However, after the x86 and x86_64 files were merged, the net result is that the IOAPIC version is printed incorrectly in the x86_64 code path. The patch below provides a fix. I have tested it with acpi, and with acpi=off, and did not see any problems. Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Acked-by: Yinghai Lu <yhlu.kernel@gmail.com> LKML-Reference: <20090416014230.4885.94926.sendpatchset@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu> *************************	2009-06-02 02:03:18 +02:00
H. Peter Anvin	48b1fddbb1	Merge branch 'irq/numa' into x86/mce3 Merge reason: arch/x86/kernel/irqinit_{32,64}.c unified in irq/numa and modified in x86/mce3; this merge resolves the conflict. Conflicts: arch/x86/kernel/irqinit.c Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-06-01 15:25:31 -07:00
Ingo Molnar	ee4c24a5c9	Merge branch 'x86/cpufeature' into irq/numa Merge reason: irq/numa didnt build because this commit: `2759c32`: x86: don't call read_apic_id if !cpu_has_apic Had a dependency on x86/cpufeature changes. Pull in that (small) branch to fix the dependency. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-01 22:30:01 +02:00
Ingo Molnar	3d58f48ba0	Merge branch 'linus' into irq/numa Conflicts: arch/mips/sibyte/bcm1480/irq.c arch/mips/sibyte/sb1250/irq.c Merge reason: we gathered a few conflicts plus update to latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-01 21:06:21 +02:00
Ingo Molnar	23db9f430b	Merge branch 'linus' into perfcounters/core Merge reason: merge almost-rc8 into perfcounters/core, which was -rc6 based - to pick up the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-01 10:01:39 +02:00
Joe Perches	61c8c67e3a	acpi-cpufreq: fix printk typo and indentation Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>	2009-05-29 21:26:26 -04:00
Yong Wang	c323d95fa4	perf_counter/x86: Always use NMI for performance-monitoring interrupt Always use NMI for performance-monitoring interrupt as there could be racy situations if we switch between irq and nmi mode frequently. Signed-off-by: Yong Wang <yong.y.wang@intel.com> LKML-Reference: <20090529052835.GA13657@ywang-moblin2.bj.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-29 09:04:58 +02:00
Hidetoshi Seto	98a9c8c3ba	x86, mce: trivial clean up for mce-inject.c Fix for: WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h> +#include <asm/uaccess.h> WARNING: usage of NR_CPUS is often wrong - consider using cpu_possible(), num_possible_cpus(), for_each_possible_cpu(), etc + if (m.cpu >= NR_CPUS \|\| !cpu_online(m.cpu)) ERROR: trailing whitespace +/* $ Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:16 -07:00
Hidetoshi Seto	61a021a070	x86, mce: trivial clean up for mce_intel_64.c Fix for: WARNING: space prohibited between function name and open parenthesis '(' + for_each_online_cpu (cpu) { Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:16 -07:00
Hidetoshi Seto	34fa1967aa	x86, mce: trivial clean up for mce_amd_64.c Fix for followings: WARNING: Use #include <linux/percpu.h> instead of <asm/percpu.h> +#include <asm/percpu.h> ERROR: Macros with multiple statements should be enclosed in a do - while loop +#define THRESHOLD_ATTR(_name, _mode, _show, _store) \ +{ \ + .attr = {.name = __stringify(_name), .mode = _mode }, \ + .show = _show, \ + .store = _store, \ +}; WARNING: usage of NR_CPUS is often wrong - consider using cpu_possible(), num_possible_cpus(), for_each_possible_cpu(), etc + if (cpu >= NR_CPUS) Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:16 -07:00
Hidetoshi Seto	14a02530e2	x86, mce: trivial clean up for mce.c This fixs following checkpatch warnings: WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h> +#include <asm/uaccess.h> WARNING: Use #include <linux/smp.h> instead of <asm/smp.h> +#include <asm/smp.h> WARNING: line over 80 characters + set_bit(MCE_OVERFLOW, (unsigned long *)&mcelog.flags); WARNING: braces {} are not necessary for any arm of this statement + if (mce_notify_user()) { [...] + } else { [...] Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:16 -07:00
Hidetoshi Seto	cc3aec52ab	x86, mce: trivial clean up for therm_throt.c This patch removes following checkpatch warning: WARNING: Use #include <linux/cpu.h> instead of <asm/cpu.h> +#include <asm/cpu.h> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:15 -07:00
Hidetoshi Seto	9319cec8c1	x86, mce: use strict_strtoull Use strict_strtoull instead of simple_strtoull. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:15 -07:00
Andi Kleen	b170204ddb	x86, mce: drop BKL in mce_open BKL is not needed for anything in mce_open because it has an own spinlock. Remove it. [ Impact: cleanup ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:15 -07:00
Andi Kleen	32561696c2	x86, mce: rename and align out2 label There's only a single out path in do_machine_check now, so rename the label from out2 to out. Also align it at the first column. [ Impact: minor cleanup, no functional changes ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:15 -07:00
Thomas Gleixner	8be9110569	x86, mce: remove mce_init unused argument Remove unused mce_init argument. [ Impact: cleanup ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:15 -07:00
Andi Kleen	fc016a49c2	x86, mce: remove unused mce_events variable Remove unused mce_events static variable. [ Impact: cleanup ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:15 -07:00
Andi Kleen	b56f642d2b	x86, mce: use extended sysattrs for the check_interval attribute. Instead of using own callbacks use the generic ones provided by the sysdev later. This finally allows to get rid of the ugly ACCESSOR macros. Should also save some text size. [ Impact: cleanup ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:15 -07:00
Andi Kleen	88921be302	x86, mce: synchronize core after machine check handling The example code in the IA32 SDM recommends to synchronize the CPU after machine check handling. So do that here. [ Impact: Spec compliance ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:14 -07:00
H. Peter Anvin	5706001aac	x86, mce: fix comment style in mce-inject.c Fix style of winged comment in mce-inject.c. [ Impact: comment only ] Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:14 -07:00
H. Peter Anvin	a1ff41bfc1	x86, mce: add comment about mce_chrdev_ops being writable Add a comment explaining that mce_chrdev_ops is intentionally writable. [ Impact: comment only ] Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:14 -07:00
Andi Kleen	ea149b36c7	x86, mce: add basic error injection infrastructure Allow user programs to write mce records into /dev/mcelog. When they do that a fake machine check is triggered to test the machine check code. This uses the MCE MSR wrappers added earlier. The implementation is straight forward. There is a struct mce record per CPU and the MCE MSR accesses get data from there if there is valid data injected there. This allows to test the machine check code relatively realistically because only the lowest layer of hardware access is intercepted. The test suite and injector are available at git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:14 -07:00
Andi Kleen	5f8c1a54ca	x86, mce: add MSR read wrappers for easier error injection This will be used by future patches to allow machine check error injection. Right now it's a nop, except for adding some wrappers around the MSR reads. This is early in the sequence to avoid too many conflicts. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:14 -07:00
Andi Kleen	7856f6cce4	x86, mce: enable MCE_INTEL for 32bit new MCE Enable the 64bit MCE_INTEL code (CMCI, thermal interrupts) for 32bit NEW_MCE. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:13 -07:00
Andi Kleen	4efc0670ba	x86, mce: use 64bit machine check code on 32bit The 64bit machine check code is in many ways much better than the 32bit machine check code: it is more specification compliant, is cleaner, only has a single code base versus one per CPU, has better infrastructure for recovery, has a cleaner way to communicate with user space etc. etc. Use the 64bit code for 32bit too. This is the second attempt to do this. There was one a couple of years ago to unify this code for 32bit and 64bit. Back then this ran into some trouble with K7s and was reverted. I believe this time the K7 problems (and some others) are addressed. I went over the old handlers and was very careful to retain all quirks. But of course this needs a lot of testing on old systems. On newer 64bit capable systems I don't expect much problems because they have been already tested with the 64bit kernel. I made this a CONFIG for now that still allows to select the old machine check code. This is mostly to make testing easier, if someone runs into a problem we can ask them to try with the CONFIG switched. The new code is default y for more coverage. Once there is confidence the 64bit code works well on older hardware too the CONFIG_X86_OLD_MCE and the associated code can be easily removed. This causes a behaviour change for 32bit installations. They now have to install the mcelog package to be able to log corrected machine checks. The 64bit machine check code only handles CPUs which support the standard Intel machine check architecture described in the IA32 SDM. The 32bit code has special support for some older CPUs which have non standard machine check architectures, in particular WinChip C3 and Intel P5. I made those a separate CONFIG option and kept them for now. The WinChip variant could be probably removed without too much pain, it doesn't really do anything interesting. P5 is also disabled by default (like it was before) because many motherboards have it miswired, but according to Alan Cox a few embedded setups use that one. Forward ported/heavily changed version of old patch, original patch included review/fixes from Thomas Gleixner, Bert Wesarg. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:13 -07:00
Andi Kleen	d896a940ef	x86, mce: remove oops_begin() use in 64bit machine check First 32bit doesn't have oops_begin, so it's a barrier of using this code on 32bit. On closer examination it turns out oops_begin is not a good idea in a machine check panic anyways. All oops_begin does it so check for recursive/parallel oopses and implement the "wait on oops" heuristic. But there's actually no good reason to lock machine checks against oopses or prevent them from recursion. Also "wait on oops" does not really make sense for a machine check too. Replace it with a manual bust_spinlocks/console_verbose. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:13 -07:00
Andi Kleen	8e97aef5f4	x86, mce: remove machine check handler idle notify on 64bit i386 has no idle notifiers, but the 64bit machine check code uses them to wake up mcelog from a fatal machine check exception. For corrected machine checks found by the poller or threshold interrupts going through an idle notifier is not needed because the wake_up can is just done directly and doesn't need the idle notifier. It is only needed for logging exceptions. To be honest I never liked the idle notifier even though I signed off on it. On closer investigation the code actually turned out to be nearly. Right now machine check exceptions on x86 are always unrecoverable (lead to panic due to PCC), which means we never execute the idle notifier path. The only exception is the somewhat weird tolerant==3 case, which ignores PCC. I'll fix this in a future patch in a much cleaner way. So remove the "mcelog wakeup through idle notifier" code from 64bit. This allows to compile the 64bit machine check handler on 32bit which doesn't have idle notifiers. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:12 -07:00
Andi Kleen	d7c3c9a609	x86, mce: move mce_disabled option into common 32bit/64bit code It's the same function, so let's share it. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:12 -07:00
Andi Kleen	04b2b1a4df	x86, mce: rename 64bit mce_dont_init to mce_disabled Give it the same name as on 32bit. This makes further merging easier. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:12 -07:00
Andi Kleen	5d7279268b	x86, mce: use a call vector to call the 64bit mce handler Allows to call different machine check handlers from the low level machine check entry vector. This is needed for later when it will be used for 32bit too. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:12 -07:00
Andi Kleen	2e6f694fde	x86, mce: port K7 bank 0 quirk to 64bit mce code Various K7 have broken bank 0s. Don't enable it by default Port from the 32bit code. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:12 -07:00
Andi Kleen	06b7a7a5ec	x86, mce: implement the PPro bank 0 quirk in the 64bit machine check code Quoting the comment: * SDM documents that on family 6 bank 0 should not be written * because it aliases to another special BIOS controlled * register. * But it's not aliased anymore on model 0x1a+ * Don't ignore bank 0 completely because there could be a valid * event later, merely don't write CTL0. This is mostly a port on the 32bit code, except that 32bit always didn't write it and didn't have the 0x1a heuristic. I checked with the CPU designers that the quirk is not required starting with this model. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:12 -07:00
Andi Kleen	3cde5c8c83	x86, mce: initial steps to make 64bit mce code 32bit clean Replace unsigned long with u64s if they need to contain 64bit values. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:12 -07:00
Thomas Gleixner	01c6680a54	x86, mce: Cleanup MCG definitions Decode more magic constants and turn them into symbols. [ Sort definitions bitwise, introduce MCG_EXT_CNT - HS ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:12 -07:00
Thomas Gleixner	ba2d0f2b0c	x86, mce: Cleanup symbols in intel thermal codes Decode magic constants and turn them into symbols. [ Cleanup to use symbols already exists - HS ] [ Impact: cleanup ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:11 -07:00
Ingo Molnar	b659294b77	x86, mce: print number of MCE banks The number of MCE banks supported by a CPU is a useful number to know, so print it out during CPU initialization. [ Impact: add printout ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:11 -07:00
Ingo Molnar	cb491fca55	x86, mce: Rename sysfs variables Shorten variable names. This also compacts the code a bit. device_mce => mce_dev mce_device_initialized => mce_dev_initialized mce_attribute => mce_attrs [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:11 -07:00
Ingo Molnar	dba3725d44	x86, mce: unify move mce_64.c => mce.c and glue it up in the Makefile. Remove mce_32.c Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:11 -07:00
Ingo Molnar	711c2e481c	x86, mce: unify, prepare for 32-bit v2 Prepare the 64-bit mce_64.c code side to be built on 32-bit. [ includes ifdef relocation by Andi Kleen ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <ak@firstfloor.org> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:11 -07:00
Ingo Molnar	a988d334ae	x86, mce: unify, prepare codes Move current 32-bit mce_32.c code into mce_64.c. [ Remove unused artifact stop/restart_mce pointed by Andi Kleen ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <ak@firstfloor.org> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:11 -07:00
Thomas Gleixner	a65d086235	x86, mce: unify Intel thermal init Mechanic unification. No change in code. [ Impact: cleanup, 32-bit / 64-bit unification ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:10 -07:00
Thomas Gleixner	6cc6f3ebd1	x86, mce: unify Intel thermal init, prepare Prepare for unification, make two intel_init_thermal equal. [ Impact: cleanup ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:10 -07:00
Ingo Molnar	1cb2a8e176	x86, mce: clean up mce_amd_64.c Make the coding style match that of the rest of the x86 arch code. [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:10 -07:00
Ingo Molnar	cb6f3c155b	x86, mce: clean up therm_throt.c Make the coding style match that of the rest of the x86 arch code. [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:10 -07:00
Ingo Molnar	bdbfbdd5e8	x86, mce: clean up non-fatal.c Make the coding style match that of the rest of the x86 arch code. [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:10 -07:00
Ingo Molnar	91425084f7	x86, mce: clean up winchip.c Make the coding style match that of the rest of the x86 arch code. [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:10 -07:00
Ingo Molnar	efee4ca809	x86, mce: clean up k7.c Make the coding style match that of the rest of the x86 arch code. [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:10 -07:00
Ingo Molnar	ea2566ff80	x86, mce: clean up p6.c Make the coding style match that of the rest of the x86 arch code. [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:09 -07:00
Ingo Molnar	ed8bc7ed9a	x86, mce: clean up p5.c Make the coding style match that of the rest of the x86 arch code. [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:09 -07:00
Ingo Molnar	c5aaf0e070	x86, mce: clean up p4.c Make the coding style match that of the rest of the x86 arch code. [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:09 -07:00
Ingo Molnar	3b58dfd04b	x86, mce: clean up mce_32.c Make the coding style match that of the rest of the x86 arch code. [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:09 -07:00
Ingo Molnar	e9eee03e99	x86, mce: clean up mce_64.c This file has been modified many times along the years, by multiple authors, so the general style and structure has diverged in a number of areas making this file hard to read. So fix the coding style match that of the rest of the x86 arch code. [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:09 -07:00
Hidetoshi Seto	13503fa913	x86, mce: Cleanup param parser - Fix the comment formatting. - The error path does not return 0, and printk lacks level and "\n". - Move __setup("nomce") next to mcheck_disable(). - Improve readability etc. [ Impact: cleanup ] Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: Andi Kleen <ak@linux.intel.com> LKML-Reference: <49CB3F38.7090703@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-28 09:24:09 -07:00
Joerg Roedel	83cce2b69e	Merge branches 'amd-iommu/fixes', 'amd-iommu/debug', 'amd-iommu/suspend-resume' and 'amd-iommu/extended-allocator' into amd-iommu/2.6.31 Conflicts: arch/x86/kernel/amd_iommu.c arch/x86/kernel/amd_iommu_init.c	2009-05-28 18:23:56 +02:00
Joerg Roedel	47bccd6bb2	amd-iommu: don't free dma adresses below 512MB with CONFIG_IOMMU_STRESS This will test the automatic aperture enlargement code. This is important because only very few devices will ever trigger this code path. So force it under CONFIG_IOMMU_STRESS. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:18:33 +02:00
Joerg Roedel	f5e9705c64	amd-iommu: don't preallocate page tables with CONFIG_IOMMU_STRESS This forces testing of on-demand page table allocation code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:18:08 +02:00
Joerg Roedel	fe16f088a8	amd-iommu: disable round-robin allocator for CONFIG_IOMMU_STRESS Disabling the round-robin allocator results in reusing the same dma-addresses again very fast. This is a good test if the iotlb flushing is working correctly. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:17:13 +02:00
Joerg Roedel	d9cfed9254	amd-iommu: remove amd_iommu_size kernel parameter This parameter is not longer necessary when aperture increases dynamically. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:16:49 +02:00
Joerg Roedel	11b83888ae	amd-iommu: enlarge the aperture dynamically By dynamically increasing the aperture the extended allocator is now ready for use. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:15:57 +02:00
Joerg Roedel	00cd122ae5	amd-iommu: handle exlusion ranges and unity mappings in alloc_new_range This patch makes sure no reserved addresses are allocated in an dma_ops domain when the aperture is increased dynamically. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:15:19 +02:00
Joerg Roedel	9cabe89b99	amd-iommu: move aperture_range allocation code to seperate function This patch prepares the dynamic increasement of dma_ops domain apertures. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:14:35 +02:00
Joerg Roedel	803b8cb4d9	amd-iommu: change dma_dom->next_bit to dma_dom->next_address Simplify the code a little bit by using the same unit for all address space related state in the dma_ops domain structure. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:14:26 +02:00
Joerg Roedel	384de72910	amd-iommu: make address allocator aware of multiple aperture ranges This patch changes the AMD IOMMU address allocator to allow up to 32 aperture ranges per dma_ops domain. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:14:15 +02:00
Joerg Roedel	53812c115c	amd-iommu: handle page table allocation failures in dma_ops code The code will be required when the aperture size increases dynamically in the extended address allocator. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:13:43 +02:00
Joerg Roedel	8bda3092bc	amd-iommu: move page table allocation code to seperate function This patch makes page table allocation usable for dma_ops code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:13:20 +02:00
Joerg Roedel	c3239567a2	amd-iommu: introduce aperture_range structure This is a preperation for extended address allocator. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:12:52 +02:00
Joerg Roedel	736501ee00	amd-iommu: implement suspend/resume This patch puts everything together and enables suspend/resume support in the AMD IOMMU driver. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:11:39 +02:00
Joerg Roedel	05f92db9f4	amd_iommu: un __init functions required for suspend/resume This patch makes sure that no function required for suspend/resume of AMD IOMMU driver is thrown away after boot. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:10:56 +02:00
Joerg Roedel	7d7a110c61	amd-iommu: add function to flush tlb for all devices This function is required for suspend/resume support with AMD IOMMU enabled. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:10:43 +02:00
Joerg Roedel	bfd1be1857	amd-iommu: add function to flush tlb for all domains This function is required for suspend/resume support with AMD IOMMU enabled. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:10:12 +02:00
Joerg Roedel	92ac4320af	amd-iommu: add function to disable all iommus This function is required for suspend/resume support with AMD IOMMU enabled. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:09:26 +02:00
Joerg Roedel	d91cecdd79	amd-iommu: remove support for msi-x Current hardware uses msi instead of msi-x so this code it not necessary and can not be tested. The best thing is to drop this code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:09:18 +02:00
Joerg Roedel	fab6afa309	amd-iommu: drop pointless iommu-loop in msi setup code It is not necessary to loop again over all IOMMUs in this code. So drop the loop. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:09:08 +02:00
Joerg Roedel	58492e1288	amd-iommu: consolidate hardware initialization to one function This patch restructures the AMD IOMMU initialization code to initialize all hardware registers with one single function call. This is helpful for suspend/resume support. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:08:58 +02:00
Joerg Roedel	3bd221724a	amd-iommu: introduce for_each_iommu* macros This patch introduces the for_each_iommu and for_each_iommu_safe macros to simplify the developers life when having to iterate over all AMD IOMMUs in the system. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:08:50 +02:00
Chris Wright	c1eee67b2d	amd iommu: properly detach from protection domain on ->remove Some drivers may use the dma api during ->remove which will cause a protection domain to get reattached to a device. Delay the detach until after the driver is completely unbound. [ joro: added a little merge helper ] [ Impact: fix too early device<->domain removal ] Signed-off-by: Chris Wright <chrisw@redhat.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:06:54 +02:00
Joerg Roedel	0bc252f430	amd-iommu: make sure only ivmd entries are parsed The bug never triggered. But it should be fixed to protect against broken ACPI tables in the future. [ Impact: protect against broken ivrs acpi table ] Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:06:47 +02:00
Neil Turton	7455aab1f9	amd-iommu: fix the handling of device aliases in the AMD IOMMU driver. The devid parameter to set_dev_entry_from_acpi is the requester ID rather than the device ID since it is used to index the IOMMU device table. The handling of IVHD_DEV_ALIAS used to pass the device ID. This patch fixes it to pass the requester ID. [ Impact: fix setting the wrong req-id in acpi-table parsing ] Signed-off-by: Neil Turton <nturton@solarflare.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:06:38 +02:00
Neil Turton	421f909c80	amd-iommu: fix an off-by-one error in the AMD IOMMU driver. The variable amd_iommu_last_bdf holds the maximum bdf of any device controlled by an IOMMU, so the number of device entries needed is amd_iommu_last_bdf+1. The function tbl_size used amd_iommu_last_bdf instead. This would be a problem if the last device were a large enough power of 2. [ Impact: fix amd_iommu_last_bdf off-by-one error ] Signed-off-by: Neil Turton <nturton@solarflare.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 18:06:27 +02:00
Joerg Roedel	2e8b569614	amd-iommu: disable device isolation with CONFIG_IOMMU_STRESS With device isolation disabled we can test better for race conditions in dma_ops related code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 17:56:57 +02:00
Joerg Roedel	b3b99ef8b4	amd-iommu: move protection domain printk to dump code This information is only helpful for debugging. Don't print it anymore unless explicitly requested. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 17:55:08 +02:00
Joerg Roedel	02acc43a29	amd-iommu: print ivmd information to dmesg when requested Add information about device memory mapping requirements for the IOMMU as described in the IVRS ACPI table to the kernel log if amd_iommu_dump was specified on the kernel command line. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 17:53:30 +02:00
Joerg Roedel	42a698f40a	amd-iommu: print ivhd information to dmesg when requested Add information about devices belonging to an IOMMU as described in the IVRS ACPI table to the kernel log if amd_iommu_dump was specified on the kernel command line. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 17:52:04 +02:00
Joerg Roedel	9c72041f71	amd-iommu: add dump for iommus described in ivrs table Add information about IOMMU devices described in the IVRS ACPI table to the kernel log if amd_iommu_dump was specified on the kernel command line. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 17:50:56 +02:00
Joerg Roedel	fefda117dd	amd-iommu: add amd_iommu_dump parameter This kernel parameter will be useful to get some AMD IOMMU related information in dmesg that is not necessary for the default user but may be helpful in debug situations. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2009-05-28 17:49:56 +02:00
Petr Tesarik	7d96fd41ca	x86: move rdtsc_barrier() into the TSC vread method The *fence instructions were moved to vsyscall_64.c by commit `cb9e35dce9`. But this breaks the vDSO, because vread methods are also called from there. Besides, the synchronization might be unnecessary for other time sources than TSC. [ Impact: fix potential time warp in VDSO ] Signed-off-by: Petr Tesarik <ptesarik@suse.cz> LKML-Reference: <9d0ea9ea0f866bdc1f4d76831221ae117f11ea67.1243241859.git.ptesarik@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@kernel.org>	2009-05-28 14:15:54 +02:00
Pallipadi, Venkatesh	ee1ca48fae	ACPI: Disable ARB_DISABLE on platforms where it is not needed ARB_DISABLE is a NOP on all of the recent Intel platforms. For such platforms, reduce contention on c3_lock by skipping the fake ARB_DISABLE. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2009-05-27 21:57:30 -04:00
Andreas Herrmann	ca446d0635	[CPUFREQ] powernow-k8: determine exact CPU frequency for HW Pstates Slightly modified by trenn@suse.de -> only do this on fam 10h and fam 11h. Currently powernow-k8 determines CPU frequency from ACPI PSS objects, but according to AMD family 11h BKDG this frequency is just a rounded value: "CoreFreq (MHz) = The CPU COF specified by MSRC001_00[6B:64][CpuFid] rounded to the nearest 100 Mhz." As a consequnce powernow-k8 reports wrong CPU frequency on some systems, e.g. on Turion X2 Ultra: powernow-k8: Found 1 AMD Turion(tm)X2 Ultra DualCore Mobile ZM-82 processors (2 cpu cores) (version 2.20.00) powernow-k8: 0 : pstate 0 (2200 MHz) powernow-k8: 1 : pstate 1 (1100 MHz) powernow-k8: 2 : pstate 2 (600 MHz) But this is wrong as frequency for Pstate2 is 550 MHz. x86info reports it correctly: #x86info -a \|grep Pstate ... Pstate-0: fid=e, did=0, vid=24 (2200MHz) Pstate-1: fid=e, did=1, vid=30 (1100MHz) Pstate-2: fid=e, did=2, vid=3c (550MHz) (current) Solution is to determine the frequency directly from Pstate MSRs instead of using rounded values from ACPI table. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Dave Jones <davej@redhat.com>	2009-05-26 12:04:51 -04:00
Thomas Renninger	df1829770d	[CPUFREQ] powernow-k8 cleanup msg if BIOS does not export ACPI _PSS cpufreq data - Make the message shorter and easier to grep for - Use printk_once instead of WARN_ONCE (functionality of these was mixed) Signed-off-by: Thomas Renninger <trenn@suse.de> Cc: Langsdorf, Mark <mark.langsdorf@amd.com> Signed-off-by: Dave Jones <davej@redhat.com>	2009-05-26 12:04:51 -04:00
Dave Jones	d38e73e8da	[CPUFREQ] powernow-k7 build fix when ACPI=n arch/x86/kernel/cpu/cpufreq/powernow-k7.c:172: warning: 'invalidate_entry' defined but not used Reported-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Dave Jones <davej@redhat.com>	2009-05-26 12:04:50 -04:00
Jarod Wilson	4319503779	[CPUFREQ] add atom family to p4-clockmod Some atom procs don't do freq scaling (such as the atom 330 on my own littlefalls2 board). By adding the atom family here, we at least get the benefit of passive cooling in a thermal emergency. Not sure how to see that its actually helping any, but the driver does bind and claim its functioning on my atom 330. Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Dave Jones <davej@redhat.com>	2009-05-26 12:04:50 -04:00
Ingo Molnar	aaba98018b	perf_counter, x86: Make NMI lockups more robust We have a debug check that detects stuck NMIs and returns with the PMU disabled in the global ctrl MSR - but i managed to trigger a situation where this was not enough to deassert the NMI. So clear/reset the full PMU and keep the disable count balanced when exiting from here. This way the box produces a debug warning but stays up and is more debuggable. [ Impact: in case of PMU related bugs, recover more gracefully ] Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-26 09:52:03 +02:00
Ingo Molnar	79202ba9ff	perf_counter, x86: Fix APIC NMI programming My Nehalem box locks up in certain situations (with an always-asserted NMI causing a lockup) if the PMU LVT entry is programmed between NMI and IRQ mode with a high frequency. Standardize exlusively on NMIs instead. [ Impact: fix lockup ] Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-26 09:49:28 +02:00
Ingo Molnar	53b441a565	Revert "perf_counter, x86: speed up the scheduling fast-path" This reverts commit `b68f1d2e7a`. It is causing problems (stuck/stuttering profiling) - when mixed NMI and non-NMI counters are used. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <20090525153931.703093461@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-25 21:41:28 +02:00
Peter Zijlstra	a78ac32587	perf_counter: Generic per counter interrupt throttle Introduce a generic per counter interrupt throttle. This uses the perf_counter_overflow() quick disable to throttle a specific counter when its going too fast when a pmu->unthrottle() method is provided which can undo the quick disable. Power needs to implement both the quick disable and the unthrottle method. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <20090525153931.703093461@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-25 21:41:12 +02:00
Peter Zijlstra	48e22d56ec	perf_counter: x86: Remove interrupt throttle remove the x86 specific interrupt throttle Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <20090525153931.616671838@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-25 21:41:12 +02:00
Peter Zijlstra	ff99be573e	perf_counter: x86: Expose INV and EDGE bits Expose the INV and EDGE bits of the PMU to raw configs. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <20090525153931.494709027@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-25 21:41:11 +02:00
Tejun Heo	71c9d8b68b	x86: Remove remap percpu allocator for the time being Remap percpu allocator has subtle bug when combined with page attribute changing. Remap percpu allocator aliases PMD pages for the first chunk and as pageattr doesn't know about the alias it ends up updating page attributes of the original mapping thus leaving the alises in inconsistent state which might lead to subtle data corruption. Please read the following threads for more information: http://thread.gmane.org/gmane.linux.kernel/835783 The following is the proposed fix which teaches pageattr about percpu aliases. http://thread.gmane.org/gmane.linux.kernel/837157 However, the above changes are deemed too pervasive for upstream inclusion for 2.6.30 release, so this patch essentially disables the remap allocator for the time being. Signed-off-by: Tejun Heo <tj@kernel.org> LKML-Reference: <4A1A0A27.4050301@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-25 05:37:55 +02:00
H. Peter Anvin	ee0736627d	Merge branch 'x86/urgent' into x86/setup Resolved conflicts: arch/x86/boot/memory.c Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-23 16:42:19 -07:00
Suresh Siddha	0c752a9335	x86: introduce noxsave boot parameter Introduce "noxsave" boot parameter which will disable the cpu's xsave/xrstor capabilities. Useful for debugging and working around xsave related issues. [ Impact: make it possible to debug problems in the field ] Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-22 13:10:54 -07:00
Paul Mackerras	a63eaf34ae	perf_counter: Dynamically allocate tasks' perf_counter_context struct This replaces the struct perf_counter_context in the task_struct with a pointer to a dynamically allocated perf_counter_context struct. The main reason for doing is this is to allow us to transfer a perf_counter_context from one task to another when we do lazy PMU switching in a later patch. This has a few side-benefits: the task_struct becomes a little smaller, we save some memory because only tasks that have perf_counters attached get a perf_counter_context allocated for them, and we can remove the inclusion of <linux/perf_counter.h> in sched.h, meaning that we don't end up recompiling nearly everything whenever perf_counter.h changes. The perf_counter_context structures are reference-counted and freed when the last reference is dropped. A context can have references from its task and the counters on its task. Counters can outlive the task so it is possible that a context will be freed well after its task has exited. Contexts are allocated on fork if the parent had a context, or otherwise the first time that a per-task counter is created on a task. In the latter case, we set the context pointer in the task struct locklessly using an atomic compare-and-exchange operation in case we raced with some other task in creating a context for the subject task. This also removes the task pointer from the perf_counter struct. The task pointer was not used anywhere and would make it harder to move a context from one task to another. Anything that needed to know which task a counter was attached to was already using counter->ctx->task. The __perf_counter_init_context function moves up in perf_counter.c so that it can be called from find_get_context, and now initializes the refcount, but is otherwise unchanged. We were potentially calling list_del_counter twice: once from __perf_counter_exit_task when the task exits and once from __perf_counter_remove_from_context when the counter's fd gets closed. This adds a check in list_del_counter so it doesn't do anything if the counter has already been removed from the lists. Since perf_counter_task_sched_in doesn't do anything if the task doesn't have a context, and leaves cpuctx->task_ctx = NULL, this adds code to __perf_install_in_context to set cpuctx->task_ctx if necessary, i.e. in the case where the current task adds the first counter to itself and thus creates a context for itself. This also adds similar code to __perf_counter_enable to handle a similar situation which can arise when the counters have been disabled using prctl; that also leaves cpuctx->task_ctx = NULL. [ Impact: refactor counter context management to prepare for new feature ] Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <18966.10075.781053.231153@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-22 12:18:19 +02:00
Zhang Rui	88dff4936c	x86: DMI match for the Sony VGN-Z540N as it needs BIOS reboot x86: DMI match for the Sony VGN-Z540N as it needs BIOS reboot, see: http://bugzilla.kernel.org/show_bug.cgi?id=12901 [ Impact: fix hung reboot on certain systems ] Signed-off-by: Zhang Rui <rui.zhang@intel.com> Cc: Len Brown <lenb@kernel.org> LKML-Reference: <1242963350.32574.53.camel@rzhang-dt> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-22 09:11:30 +02:00
Ingo Molnar	34adc80622	perf_counter: Fix context removal deadlock Disable the PMU globally before removing a counter from a context. This fixes the following lockup: [22081.741922] ------------[ cut here ]------------ [22081.746668] WARNING: at arch/x86/kernel/cpu/perf_counter.c:803 intel_pmu_handle_irq+0x9b/0x24e() [22081.755624] Hardware name: X8DTN [22081.758903] perfcounters: irq loop stuck! [22081.762985] Modules linked in: [22081.766136] Pid: 11082, comm: perf Not tainted 2.6.30-rc6-tip #226 [22081.772432] Call Trace: [22081.774940] <NMI> [<ffffffff81019aed>] ? intel_pmu_handle_irq+0x9b/0x24e [22081.781993] [<ffffffff81019aed>] ? intel_pmu_handle_irq+0x9b/0x24e [22081.788368] [<ffffffff8104505c>] ? warn_slowpath_common+0x77/0xa3 [22081.794649] [<ffffffff810450d3>] ? warn_slowpath_fmt+0x40/0x45 [22081.800696] [<ffffffff81019aed>] ? intel_pmu_handle_irq+0x9b/0x24e [22081.807080] [<ffffffff814d1a72>] ? perf_counter_nmi_handler+0x3f/0x4a [22081.813751] [<ffffffff814d2d09>] ? notifier_call_chain+0x58/0x86 [22081.819951] [<ffffffff8105b250>] ? notify_die+0x2d/0x32 [22081.825392] [<ffffffff814d1414>] ? do_nmi+0x8e/0x242 [22081.830538] [<ffffffff814d0f0a>] ? nmi+0x1a/0x20 [22081.835342] [<ffffffff8117e102>] ? selinux_file_free_security+0x0/0x1a [22081.842105] [<ffffffff81018793>] ? x86_pmu_disable_counter+0x15/0x41 [22081.848673] <<EOE>> [<ffffffff81018f3d>] ? x86_pmu_disable+0x86/0x103 [22081.855512] [<ffffffff8108fedd>] ? __perf_counter_remove_from_context+0x0/0xfe [22081.862926] [<ffffffff8108fcbc>] ? counter_sched_out+0x30/0xce [22081.868909] [<ffffffff8108ff36>] ? __perf_counter_remove_from_context+0x59/0xfe [22081.876382] [<ffffffff8106808a>] ? smp_call_function_single+0x6c/0xe6 [22081.882955] [<ffffffff81091b96>] ? perf_release+0x86/0x14c [22081.888600] [<ffffffff810c4c84>] ? __fput+0xe7/0x195 [22081.893718] [<ffffffff810c213e>] ? filp_close+0x5b/0x62 [22081.899107] [<ffffffff81046a70>] ? put_files_struct+0x64/0xc2 [22081.905031] [<ffffffff8104841a>] ? do_exit+0x1e2/0x6ef [22081.910360] [<ffffffff814d0a60>] ? _spin_lock_irqsave+0x9/0xe [22081.916292] [<ffffffff8104898e>] ? do_group_exit+0x67/0x93 [22081.921953] [<ffffffff810489cc>] ? sys_exit_group+0x12/0x16 [22081.927759] [<ffffffff8100baab>] ? system_call_fastpath+0x16/0x1b [22081.934076] ---[ end trace 3a3936ce3e1b4505 ]--- And could potentially also fix the lockup reported by Marcelo Tosatti. Also, print more debug info in case of a detected lockup. [ Impact: fix lockup ] Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-20 20:12:54 +02:00
Yinghai Lu	4c6f18fc81	x86, io-apic: Don't mark pin_programmed early Peter bisected that: \| commit `b9c61b7007` \| Date: Wed May 6 10:10:06 2009 -0700 \| \| x86/pci: update pirq_enable_irq() to setup io apic routing \| \| So we can set io apic routing only when enabling the device irq. wrecked his opteron box, ata1 interrupts fail to get through. ata1 is using irq 11: [ 1.451839] sata_svw 0000:01:0e.0: version 2.3 [ 1.456333] sata_svw 0000:01:0e.0: PCI INT A -> GSI 11 (level, low) -> IRQ 11 [ 1.463639] scsi0 : sata_svw [ 1.466949] scsi1 : sata_svw [ 1.470022] scsi2 : sata_svw [ 1.473090] scsi3 : sata_svw [ 1.476112] ata1: SATA max UDMA/133 mmio m8192@0xff3fe000 port 0xff3fe000 irq 11 [ 1.483490] ata2: SATA max UDMA/133 mmio m8192@0xff3fe000 port 0xff3fe100 irq 11 [ 1.490870] ata3: SATA max UDMA/133 mmio m8192@0xff3fe000 port 0xff3fe200 irq 11 [ 1.498247] ata4: SATA max UDMA/133 mmio m8192@0xff3fe000 port 0xff3fe300 irq 11 that pin is overlapped with pin with legacy ones. We should not set bits in pin_programmed here, so that those bit could be set later via io_apic_set_pci_routing(). [ Impact: fix boot hang on certain systems ] Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org> Tested-by: Peter Zijlstra <peterz@infradead.org> Cc: Jack Steiner <steiner@sgi.com> LKML-Reference: <4A119990.9020606@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-19 14:26:51 +02:00
Linus Torvalds	13bba6fda9	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix performance regression caused by paravirt_ops on native kernels xen: use header for EXPORT_SYMBOL_GPL x86, 32-bit: fix kernel_trap_sp() x86: fix percpu_{to,from}_op() x86: mtrr: Fix high_width computation when phys-addr is >= 44bit x86: Fix false positive section mismatch warnings in the apic code	2009-05-18 09:17:37 -07:00
Linus Torvalds	0130b2d701	Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: Append prompt in /debug/tracing/README file x86/function-graph: fix constraint for recording old return value	2009-05-18 09:15:41 -07:00
Ingo Molnar	1079cac0f4	Merge commit 'v2.6.30-rc6' into tracing/core Merge reason: we were on an -rc4 base, sync up to -rc6 Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-18 10:15:35 +02:00
Ingo Molnar	b68f1d2e7a	perf_counter, x86: speed up the scheduling fast-path We have to set up the LVT entry only at counter init time, not at every switch-in time. There's friction between NMI and non-NMI use here - we'll probably remove the per counter configurability of it - but until then, dont slow down things ... [ Impact: micro-optimization ] Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-18 09:37:09 +02:00
Yinghai Lu	f1bdb52388	x86, irq: don't call mp_config_acpi_gsi() if update_mptable is not enabled Len expressed concern that the update_mptable feature has side-effects on the ACPI code. Make it sure explicitly that the code only ever gets called if the (default disabled) update_mptable boot quirk option is disabled. [ Impact: isolate the update_mptable feature from ACPI code more ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Len Brown <lenb@kernel.org> LKML-Reference: <4A0DC832.5090200@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-18 09:33:29 +02:00
Yinghai Lu	629e15d245	x86, irq: update_mptable needs pci_routeirq To get all device irq routing and to save them. This is basically an implicit pci=routeirq enablement if (and on if) the update_mptable boot option (which is off by default) has been specified. [ Impact: extend the update_mptable boot opion's scope ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> LKML-Reference: <4A0DB7B4.4060702@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-18 09:33:17 +02:00
Yinghai Lu	35d5a9a614	x86: fix system without memory on node0 Jack found a boot crash on a system which doesn't have memory on node0. It turns out with recent per_cpu changes, node_number for BSP will always be 0, and it is not consistent to cpu_to_node() that might set it to a different (nearer) node already. aka when numa_set_node() for node0 is called early before per_cpu area is setup: two places touched that per_cpu(node_number,): 1. in cpu/common.c::cpu_init() and it is not for BP \| #ifdef CONFIG_NUMA \| if (cpu != 0 && percpu_read(node_number) == 0 && \| cpu_to_node(cpu) != NUMA_NO_NODE) \| percpu_write(node_number, cpu_to_node(cpu)); \| #endif for BP: traps_init ==> cpu_init for AP: start_secondary ==> cpu_init 2. cpu/intel.c or amd.c::srat_detect_node via numa_set_node() for BP: check_bugs ==> identify_boot_cpu ==> identify_cpu() that is rather later before numa_node_id() is used for BP... for AP: start_secondary => smp_callin => smp_store_cpu_info() => => identify_secondary_cpu => identify_cpu() so try to set that for BP earlier in setup_per_cpu_areas(), and don't bother to set that for APs there (it will be updated later and will be used later) (and don't mess the 0 before the copying BP per_cpu data to APs) [ Impact: fix boot crash on memoryless node-0 ] Reported-and-tested-by: Jack Steiner <steiner@sgi.com> Cc: Tejun Heo <htejun@gmail.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4A0C4A02.7050401@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-18 09:27:09 +02:00
Ingo Molnar	b286e21868	Merge commit 'v2.6.30-rc6' into x86/mm Merge reason: sync up to -rc6 which has changes to mm/ which we are going to touch in the commits to follow as well. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-18 09:12:51 +02:00
Yinghai Lu	2759c3287d	x86: don't call read_apic_id if !cpu_has_apic should not call that if apic is disabled. [ Impact: fix crash on certain UP configs ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> LKML-Reference: <4A09CCBB.2000306@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-18 08:43:25 +02:00
Yinghai Lu	e5198075c6	x86, apic: introduce io_apic_irq_attr according to Ingo, io_apic irq-setup related functions have too many parameters with a repetitive signature. So reduce related funcs to get less params by passing a pointer to a newly defined io_apic_irq_attr structure. v2: io_apic_irq ==> irq_attr triggering ==> trigger v3: add set_io_apic_irq_attr [ Impact: cleanup ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Len Brown <lenb@kernel.org> LKML-Reference: <4A08ACD3.2070401@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-18 08:38:55 +02:00
Ingo Molnar	dc3f81b129	Merge commit 'v2.6.30-rc6' into perfcounters/core Merge reason: this branch was on an -rc4 base, merge it up to -rc6 to get the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-18 07:37:49 +02:00
Ingo Molnar	d2517a49d5	perf_counter, x86: fix zero irq_period counters The quirk to irq_period unearthed an unrobustness we had in the hw_counter initialization sequence: we left irq_period at 0, which was then quirked up to 2 ... which then generated a _lot_ of interrupts during 'perf stat' runs, slowed them down and skewed the counter results in general. Initialize irq_period to the maximum instead. [ Impact: fix perf stat results ] Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-17 12:27:37 +02:00
Jeremy Fitzhardinge	b4ecc12699	x86: Fix performance regression caused by paravirt_ops on native kernels Xiaohui Xin and some other folks at Intel have been looking into what's behind the performance hit of paravirt_ops when running native. It appears that the hit is entirely due to the paravirtualized spinlocks introduced by: \| commit `8efcbab674` \| Date: Mon Jul 7 12:07:51 2008 -0700 \| \| paravirt: introduce a "lock-byte" spinlock implementation The extra call/return in the spinlock path is somehow causing an increase in the cycles/instruction of somewhere around 2-7% (seems to vary quite a lot from test to test). The working theory is that the CPU's pipeline is getting upset about the call->call->locked-op->return->return, and seems to be failing to speculate (though I haven't seen anything definitive about the precise reasons). This doesn't entirely make sense, because the performance hit is also visible on unlock and other operations which don't involve locked instructions. But spinlock operations clearly swamp all the other pvops operations, even though I can't imagine that they're nearly as common (there's only a .05% increase in instructions executed). If I disable just the pv-spinlock calls, my tests show that pvops is identical to non-pvops performance on native (my measurements show that it is actually about .1% faster, but Xiaohui shows a .05% slowdown). Summary of results, averaging 10 runs of the "mmperf" test, using a no-pvops build as baseline: nopv Pv-nospin Pv-spin CPU cycles 100.00% 99.89% 102.18% instructions 100.00% 100.10% 100.15% CPI 100.00% 99.79% 102.03% cache ref 100.00% 100.84% 100.28% cache miss 100.00% 90.47% 88.56% cache miss rate 100.00% 89.72% 88.31% branches 100.00% 99.93% 100.04% branch miss 100.00% 103.66% 107.72% branch miss rt 100.00% 103.73% 107.67% wallclock 100.00% 99.90% 102.20% The clear effect here is that the 2% increase in CPI is directly reflected in the final wallclock time. (The other interesting effect is that the more ops are out of line calls via pvops, the lower the cache access and miss rates. Not too surprising, but it suggests that the non-pvops kernel is over-inlined. On the flipside, the branch misses go up correspondingly...) So, what's the fix? Paravirt patching turns all the pvops calls into direct calls, so _spin_lock etc do end up having direct calls. For example, the compiler generated code for paravirtualized _spin_lock is: <_spin_lock+0>: mov %gs:0xb4c8,%rax <_spin_lock+9>: incl 0xffffffffffffe044(%rax) <_spin_lock+15>: callq 0xffffffff805a5b30 <_spin_lock+22>: retq The indirect call will get patched to: <_spin_lock+0>: mov %gs:0xb4c8,%rax <_spin_lock+9>: incl 0xffffffffffffe044(%rax) <_spin_lock+15>: callq <__ticket_spin_lock> <_spin_lock+20>: nop; nop / or whatever 2-byte nop */ <_spin_lock+22>: retq One possibility is to inline _spin_lock, etc, when building an optimised kernel (ie, when there's no spinlock/preempt instrumentation/debugging enabled). That will remove the outer call/return pair, returning the instruction stream to a single call/return, which will presumably execute the same as the non-pvops case. The downsides arel 1) it will replicate the preempt_disable/enable code at eack lock/unlock callsite; this code is fairly small, but not nothing; and 2) the spinlock definitions are already a very heavily tangled mass of #ifdefs and other preprocessor magic, and making any changes will be non-trivial. The other obvious answer is to disable pv-spinlocks. Making them a separate config option is fairly easy, and it would be trivial to enable them only when Xen is enabled (as the only non-default user). But it doesn't really address the common case of a distro build which is going to have Xen support enabled, and leaves the open question of whether the native performance cost of pv-spinlocks is worth the performance improvement on a loaded Xen system (10% saving of overall system CPU when guests block rather than spin). Still it is a reasonable short-term workaround. [ Impact: fix pvops performance regression when running native ] Analysed-by: "Xin Xiaohui" <xiaohui.xin@intel.com> Analysed-by: "Li Xin" <xin.li@intel.com> Analysed-by: "Nakajima Jun" <jun.nakajima@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Xen-devel <xen-devel@lists.xensource.com> LKML-Reference: <4A0B62F7.5030802@goop.org> [ fixed the help text ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-15 20:07:42 +02:00
Jaswinder Singh Rajput	52650257ea	x86, mtrr: replace MTRRdefType_MSR with msr-index's MSR_MTRRdefType Use standard msr-index.h's MSR declaration and no need to declare again. [ Impact: cleanup, no object code change ] Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-15 07:49:01 -07:00
Jaswinder Singh Rajput	ba5673ff1f	x86, mtrr: replace MTRRfix4K_C0000_MSR with msr-index's MSR_MTRRfix4K_C0000 Use standard msr-index.h's MSR declaration and no need to declare again. [ Impact: cleanup, no object code change ] Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-15 07:49:01 -07:00
Jaswinder Singh Rajput	654ac05801	x86, mtrr: remove mtrr MSRs double declaration Removed MTRR MSR from mtrr/mtrr.h as these are already declared in msr-index.h and nobody is using them: MTRRfix16K_A0000_MSR MTRRfix4K_C8000_MSR MTRRfix4K_D0000_MSR MTRRfix4K_D8000_MSR MTRRfix4K_E0000_MSR MTRRfix4K_E8000_MSR MTRRfix4K_F0000_MSR MTRRfix4K_F8000_MSR Use standard msr-index.h's MSR declaration and no need to declare again [ Impact: cleanup, no object code change ] Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-15 07:49:01 -07:00
Jaswinder Singh Rajput	7d9d55e449	x86, mtrr: replace MTRRfix16K_80000_MSR with msr-index's MSR_MTRRfix16K_80000 Use standard msr-index.h's MSR declaration and no need to declare again [ Impact: cleanup, no object code change ] Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-15 07:49:00 -07:00
Jaswinder Singh Rajput	a036c7a358	x86, mtrr: replace MTRRfix64K_00000_MSR with msr-index's MSR_MTRRfix64K_00000 Use standard msr-index.h's MSR declaration and no need to declare again. [ Impact: cleanup, no object code change ] Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-15 07:49:00 -07:00
Jaswinder Singh Rajput	d9bcc01d58	x86, mtrr: replace MTRRcap_MSR with msr-index's MSR_MTRRcap Use standard msr-index.h's MSR declaration and no need to declare again. [ Impact: cleanup, no object code change ] Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-15 07:49:00 -07:00
Peter Zijlstra	60db5e09c1	perf_counter: frequency based adaptive irq_period Instead of specifying the irq_period for a counter, provide a target interrupt frequency and dynamically adapt the irq_period to match this frequency. [ Impact: new perf-counter attribute/feature ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <20090515132018.646195868@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-15 15:26:56 +02:00
Jason Wessel	33ab1979bc	kgdb,i386: use address that SP register points to in the exception frame The treatment of the SP register is different on x86_64 and i386. This is a regression fix that lived outside the mainline kernel from 2.6.27 to now. The regression was a result of the original merge consolidation of the i386 and x86_64 archs to x86. The incorrectly reported SP on i386 prevented stack tracebacks from working correctly in gdb. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2009-05-15 07:56:25 -05:00
Ingo Molnar	9029a5e380	perf_counter: x86: Protect against infinite loops in intel_pmu_handle_irq() intel_pmu_handle_irq() can lock up in an infinite loop if the hardware does not allow the acking of irqs. Alas, this happened in testing so make this robust and emit a warning if it happens in the future. Also, clean up the IRQ handlers a bit. [ Impact: improve perfcounter irq/nmi handling robustness ] Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-15 09:47:06 +02:00
Ingo Molnar	1c80f4b598	perf_counter: x86: Disallow interval of 1 On certain CPUs i have observed a stuck PMU if interval was set to 1 and NMIs were used. The PMU had PMC0 set in MSR_CORE_PERF_GLOBAL_STATUS, but it was not possible to ack it via MSR_CORE_PERF_GLOBAL_OVF_CTRL, and the NMI loop got stuck infinitely. [ Impact: fix rare hangs during high perfcounter load ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-15 09:47:05 +02:00
Peter Zijlstra	a4016a79fc	perf_counter: x86: Robustify interrupt handling Two consecutive NMIs could daze and confuse the machine when the first would handle the overflow of both counters. [ Impact: fix false-positive syslog messages under multi-session profiling ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-15 09:47:03 +02:00
Peter Zijlstra	9e35ad388b	perf_counter: Rework the perf counter disable/enable The current disable/enable mechanism is: token = hw_perf_save_disable(); ... /* do bits */ ... hw_perf_restore(token); This works well, provided that the use nests properly. Except we don't. x86 NMI/INT throttling has non-nested use of this, breaking things. Therefore provide a reference counter disable/enable interface, where the first disable disables the hardware, and the last enable enables the hardware again. [ Impact: refactor, simplify the PMU disable/enable logic ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-15 09:47:02 +02:00
Peter Zijlstra	962bf7a66e	perf_counter: x86: Fix up the amd NMI/INT throttle perf_counter_unthrottle() restores throttle_ctrl, buts its never set. Also, we fail to disable all counters when throttling. [ Impact: fix rare stuck perf-counters when they are throttled ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-15 09:47:01 +02:00
Peter Zijlstra	a026dfecc0	perf_counter: x86: Allow unpriviliged use of NMIs Apply sysctl_perf_counter_priv to NMIs. Also, fail the counter creation instead of silently down-grading to regular interrupts. [ Impact: allow wider perf-counter usage ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-15 09:46:57 +02:00
Ingo Molnar	f5a5a2f6e6	perf_counter: x86: Fix throttling If counters are disabled globally when a perfcounter IRQ/NMI hits, and if we throttle in that case, we'll promote the '0' value to the next lapic IRQ and disable all perfcounters at that point, permanently ... Fix it. [ Impact: fix hung perfcounters under load ] Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-15 09:46:56 +02:00
Peter Zijlstra	ec3232bdf8	perf_counter: x86: More accurate counter update Take the counter width into account instead of assuming 32 bits. In particular Nehalem has 44 bit wide counters, and all arithmetics should happen on a 44-bit signed integer basis. [ Impact: fix rare event imprecision, warning message on Nehalem ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-15 09:46:54 +02:00
Steven Rostedt	29a679754b	x86/stacktrace: return 0 instead of -1 for stack ops If we return -1 in the ops->stack for the stacktrace saving, we end up breaking out of the loop if the stack we are tracing is in the exception stack. This causes traces like: <idle>-0 [002] 34263.745825: raise_softirq_irqoff <-__blk_complete_request <idle>-0 [002] 34263.745826: <= 0 <= 0 <= 0 <= 0 <= 0 <= 0 <= 0 By returning "0" instead, the irq stack is saved as well, and we see: <idle>-0 [003] 883.280992: raise_softirq_irqoff <-__hrtimer_star t_range_ns <idle>-0 [003] 883.280992: <= hrtimer_start_range_ns <= tick_nohz_restart_sched_tick <= cpu_idle <= start_secondary <= <= 0 <= 0 [ Impact: record stacks from interrupts ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-05-14 23:19:09 -04:00
Steven Rostedt	aa512a27e9	x86/function-graph: fix constraint for recording old return value After upgrading from gcc 4.2.2 to 4.4.0, the function graph tracer broke. Investigating, I found that in the asm that replaces the return value, gcc was using the same register for the old value as it was for the new value. mov (addr), old mov new, (addr) But if old and new are the same register, we clobber new with old! I first thought this was a bug in gcc 4.4.0 and reported it: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40132 Andrew Pinski responded (quickly), saying that it was correct gcc behavior and the code needed to denote old as an "early clobber". Instead of "=r"(old), we need "=&r"(old). [Impact: keep function graph tracer from breaking with gcc 4.4.0 ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-05-13 13:52:19 -04:00
Arun R Bharadwaj	5c333864a6	timers: Identifying the existing pinned timers * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-16 12:11:36]: The following pinned hrtimers have been identified and marked: 1)sched_rt_period_timer 2)tick_sched_timer 3)stack_trace_timer_fn [ tglx: fixup the hrtimer pinned mode ] Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-05-13 16:52:42 +02:00
Peter Zijlstra	5bb9efe33e	perf_counter: fix print debug irq disable inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. bash/15802 [HC0[0]:SC0[0]:HE1:SE1] takes: (sysrq_key_table_lock){?.....}, Don't unconditionally enable interrupts in the perf_counter_print_debug() path. [ Impact: fix potential deadlock pointed out by lockdep ] LKML-Reference: <new-submission> Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2009-05-13 08:17:37 +02:00
Yinghai Lu	4797f6b021	x86: read apic ID in the !acpi_lapic case Ed found that on 32-bit, boot_cpu_physical_apicid is not read right, when the mptable is broken. Interestingly, actually three paths use/set it: 1. acpi: at that time that is already read from reg 2. mptable: only read from mptable 3. no madt, and no mptable, that use default apic id 0 for 64-bit, -1 for 32-bit so we could read the apic id for the 2/3 path. We trust the hardware register more than we trust a BIOS data structure (the mptable). We can also avoid the double set_fixmap() when acpi_lapic is used, and also need to move cpu_has_apic earlier and call apic_disable(). Also when need to update the apic id, we'd better read and set the apic version as well - so that quirks are applied precisely. v2: make path 3 with 64bit, use -1 as apic id, so could read it later. v3: fix whitespace problem pointed out by Ed Swierk v5: fix boot crash [ Impact: get correct apic id for bsp other than acpi path ] Reported-by: Ed Swierk <eswierk@aristanetworks.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <49FC85A9.2070702@kernel.org> [ v4: sanity-check in the ACPI case too ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-12 12:22:06 +02:00
Ingo Molnar	6cda3eb62e	Merge branch 'x86/apic' into irq/numa Merge reason: both topics modify the APIC code but were able to do it in parallel so far. An upcoming patch generates a conflict so merge them to avoid the conflict. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-12 12:17:36 +02:00
Amerigo Wang	bf78ad69cd	x86: process.c, remove useless headers <stdarg.h> is not needed by these files, remove them. [ Impact: cleanup ] Signed-off-by: WANG Cong <amwang@redhat.com> Cc: akpm@linux-foundation.org LKML-Reference: <20090512032956.5040.77055.sendpatchset@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-12 11:26:32 +02:00
Amerigo Wang	9d62dcdfa6	x86: merge process.c a bit Merge arch_align_stack() and arch_randomize_brk(), since they are the same. Tested on x86_64. [ Impact: cleanup ] Signed-off-by: Amerigo Wang <amwang@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-12 11:13:45 +02:00
Dmitry Adamushko	871b72dd1e	x86: microcode: use smp_call_function_single instead of set_cpus_allowed, cleanup of synchronization logic * Solve issues described in `6f66cbc630` in a way that doesn't resort to set_cpus_allowed(); * in fact, only collect_cpu_info and apply_microcode callbacks must run on a target cpu, others will do just fine on any other. smp_call_function_single() (as suggested by Ingo) is used to run these callbacks on a target cpu. * cleanup of synchronization logic of the 'microcode_core' part The generic 'microcode_core' part guarantees that only a single cpu (be it a full-fledged cpu, one of the cores or HT) is being updated at any particular moment of time. In general, there is no need for any additional sync. mechanism in arch-specific parts (the patch removes existing spinlocks). See also the "Synchronization" section in microcode_core.c. * return -EINVAL instead of -1 (which is translated into -EPERM) in microcode_write(), reload_cpu() and mc_sysdev_add(). Other suggestions for an error code? * use 'enum ucode_state' as return value of request_microcode_{fw, user} to gain more flexibility by distinguishing between real error cases and situations when an appropriate ucode was not found (which is not an error per-se). * some minor cleanups Thanks a lot to Hugh Dickins for review/suggestions/testing! Reference: http://marc.info/?l=linux-kernel&m=124025889012541&w=2 [ Impact: refactor and clean up microcode driver locking code ] Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Acked-by: Hugh Dickins <hugh@veritas.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Peter Oruba <peter.oruba@amd.com> Cc: Arjan van de Ven <arjan@infradead.org> LKML-Reference: <1242078507.5560.9.camel@earth> [ did some more cleanups ] Signed-off-by: Ingo Molnar <mingo@elte.hu> arch/x86/include/asm/microcode.h \| 25 ++ arch/x86/kernel/microcode_amd.c \| 58 ++---- arch/x86/kernel/microcode_core.c \| 326 +++++++++++++++++++++----------------- arch/x86/kernel/microcode_intel.c \| 92 +++------- 4 files changed, 261 insertions(+), 240 deletions(-) (~20 new comment lines)	2009-05-12 10:36:44 +02:00
H. Peter Anvin	5031296c57	x86: add extension fields for bootloader type and version A long ago, in days of yore, it all began with a god named Thor. There were vikings and boats and some plans for a Linux kernel header. Unfortunately, a single 8-bit field was used for bootloader type and version. This has generally worked without too much pain, but we're getting close to flat running out of ID fields. Add extension fields for both type and version. The type will be extended if it the old field is 0xE; the version is a simple MSB extension. Keep /proc/sys/kernel/bootloader_type containing (type << 4) + (ver & 0xf) for backwards compatiblity, but also add /proc/sys/kernel/bootloader_version which contains the full version number. [ Impact: new feature to support more bootloaders ] Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-11 17:45:06 -07:00
H. Peter Anvin	37ba7ab5e3	x86, boot: make kernel_alignment adjustable; new bzImage fields Make the kernel_alignment field adjustable; this allows us to set it to a large value (intended to be 16 MB to avoid ZONE_DMA contention, memory holes and other weirdness) while a smart bootloader can still force a loading at a lesser alignment if absolutely necessary. Also export pref_address (preferred loading address, corresponding to the link-time address) and init_size, the total amount of linear memory the kernel will require during initialization. [ Impact: allows better kernel placement, gives bootloader more info ] Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-11 17:44:39 -07:00
Alexey Dobriyan	0c23590f00	x86, 64-bit: ifdef out struct thread_struct::ip struct thread_struct::ip isn't used on x86_64, struct pt_regs::ip is used instead. kgdb should be reading 0 always, but I can't check it. [ Impact: (potentially) reduce thread_struct size on 64-bit ] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: containers@lists.linux-foundation.org LKML-Reference: <20090503233015.GJ16631@x200.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 16:23:54 +02:00
Cyrill Gorcunov	cec6be6d10	x86: apic: Fixmap apic address even if apic disabled In case if apic were disabled by boot option we still need read_apic operation. So fixmap a fake apic area if needed. [ Impact: fix boot crash ] Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: yinghai@kernel.org Cc: eswierk@aristanetworks.com LKML-Reference: <20090511134140.GH4624@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 15:50:58 +02:00
Ingo Molnar	41fb454ebe	Merge commit 'v2.6.30-rc5' into core/iommu Merge reason: core/iommu was on an .30-rc1 base, update it to .30-rc5 to refresh. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 14:44:31 +02:00
Andreas Herrmann	97a5271465	x86: display extended apic registers with print_local_APIC and cpu_debug code Both print_local_APIC (used when apic=debug kernel param is set) and cpu_debug code missed support for some extended APIC registers that I'd like to see. This adds support to show: - extended APIC feature register - extended APIC control register - extended LVT registers [ Impact: print more debug info ] Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Jaswinder Singh Rajput <jaswinder@kernel.org> Cc: Cyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <20090508162350.GO29045@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 14:37:36 +02:00
Mike Galbraith	8823392360	perf_counter, x86: clean up throttling printk s/PERFMON/perfcounters for perfcounter interrupt throttling warning. 'perfmon' is the CPU feature name that is Intel-only, while we do throttling in a generic way. [ Impact: cleanup ] Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Robert Richter <robert.richter@amd.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 12:04:30 +02:00
Yinghai Lu	917a015362	x86: mtrr: Fix high_width computation when phys-addr is >= 44bit found one system where cpu address line is 44bits, mtrr printout is not right: [ 0.000000] MTRR variable ranges enabled: [ 0.000000] 0 base 0 00000000 mask FF0 00000000 write-back [ 0.000000] 1 base 10 00000000 mask FFF 80000000 write-back [ 0.000000] 2 base 0 80000000 mask FFF 80000000 uncachable [ 0.000000] 3 base 0 7F800000 mask FFF FF800000 uncachable Li Zefan and Frederic pointed out the high_width could be -4 some how. It turns out when phys_addr is 44bit, size_or_mask will be ffffffff,00000000 so ffs(size_or_mask) will be 0. Try to check low 32 bit, to get correct high_width. Signed-off-by: Yinghai Lu <yinghai@kerne.org> Also-analyzed-by: Frederic Weisbecker <fweisbec@gmail.com> Also-analyzed-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Zhaolei <zhaolei@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <4A026540.8060504@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 11:40:43 +02:00
Yinghai Lu	4401da6111	x86: read apic ID in the !acpi_lapic case Ed found that on 32-bit, boot_cpu_physical_apicid is not read right, when the mptable is broken. Interestingly, actually three paths use/set it: 1. acpi: at that time that is already read from reg 2. mptable: only read from mptable 3. no madt, and no mptable, that use default apic id 0 for 64-bit, -1 for 32-bit so we could read the apic id for the 2/3 path. We trust the hardware register more than we trust a BIOS data structure (the mptable). We can also avoid the double set_fixmap() when acpi_lapic is used, and also need to move cpu_has_apic earlier and call apic_disable(). Also when need to update the apic id, we'd better read and set the apic version as well - so that quirks are applied precisely. v2: make path 3 with 64bit, use -1 as apic id, so could read it later. v3: fix whitespace problem pointed out by Ed Swierk [ Impact: get correct apic id for bsp other than acpi path ] Reported-by: Ed Swierk <eswierk@aristanetworks.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <49FC85A9.2070702@kernel.org> [ v4: sanity-check in the ACPI case too ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 11:29:23 +02:00
Yinghai Lu	80989ce064	x86: clean up and and print out initial max_pfn_mapped Do this so we can check the range that is mapped before init_memory_mapping(). To be able to print out meaningful info, we first have to fix 64-bit to have max_pfn_mapped assigned before that call. This also unifies the code-path a bit. [ Impact: print more debug info, cleanup ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <49BF0978.40605@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 11:11:12 +02:00
Yinghai Lu	3e0c373749	x86: clean up and fix setup_clear/force_cpu_cap handling setup_force_cpu_cap() only have one user (Xen guest code), but it should not reuse cleared_cpu_cpus, otherwise it will have problems on SMP. Need to have a separate cpu_cpus_set array too, for forced-on flags, beyond the forced-off flags. Also need to setup handling before all cpus caps are combined. [ Impact: fix the forced-set CPU feature flag logic ] Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 10:57:24 +02:00
Yinghai Lu	61fe91e131	x86: apic: Check rev 3 fadt correctly for physical_apic bit Impact: fix fadt version checking FADT2_REVISION_ID has value 3 aka rev 3 FADT. So need to use >= instead of >, as other places in the code do. [ Impact: extend scope of APIC boot quirk ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 10:52:40 +02:00
Yinghai Lu	b9c61b7007	x86/pci: update pirq_enable_irq() to setup io apic routing So we can set io apic routing only when enabling the device irq. This is advantageous for IRQ descriptor allocation affinity: if we set up the IO-APIC entry later, we have a chance to allocate the IRQ descriptor later and know which device it is on and can set affinity accordingly. [ Impact: standardize/enhance irq-enabling sequence for mptable irqs ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Len Brown <lenb@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <4A01C46E.8000501@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 10:35:10 +02:00
Yinghai Lu	5ef2183768	x86/acpi: move setup io apic routing out of CONFIG_ACPI scope So we could set io apic routing when ACPI is not enabled. [ Impact: prepare for new functionality ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Len Brown <lenb@kernel.org> LKML-Reference: <4A01C422.5070400@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 10:35:09 +02:00
Yinghai Lu	e20c06fd69	x86/pci: add 4 more return parameters to IO_APIC_get_PCI_irq_vector() To prepare those params for pcibios_irq_enable() to call setup_io_apic_routing(). [ Impact: extend function call API to prepare for new functionality ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Len Brown <lenb@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <4A01C406.2040303@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 10:35:09 +02:00
Yinghai Lu	bdfe8ac153	x86/acpi: move pin_programmed bit map to io_apic.c Prepare to call setup_io_apic_routing() in pcibios_irq_enable() also remove not needed member apic_id. [ Impact: clean up, prepare for future change ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Len Brown <lenb@kernel.org> LKML-Reference: <4A01C3DD.3050104@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 10:35:08 +02:00

... 7 8 9 10 11 ...

5462 Commits