linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-08 13:11:45 +00:00

Author	SHA1	Message	Date
Joerg Roedel	8061252ee0	KVM: SVM: Add intercept checks for remaining twobyte instructions This patch adds intercepts checks for the remaining twobyte instructions to the KVM instruction emulator. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:02 -04:00
Joerg Roedel	d7eb820306	KVM: SVM: Add intercept checks for remaining group7 instructions This patch implements the emulator intercept checks for the RDTSCP, MONITOR, and MWAIT instructions. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:02 -04:00
Joerg Roedel	01de8b09e6	KVM: SVM: Add intercept checks for SVM instructions This patch adds the necessary code changes in the instruction emulator and the extensions to svm.c to implement intercept checks for the svm instructions. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:02 -04:00
Joerg Roedel	dee6bb70e4	KVM: SVM: Add intercept checks for descriptor table accesses This patch add intercept checks into the KVM instruction emulator to check for the 8 instructions that access the descriptor table addresses. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:02 -04:00
Joerg Roedel	3b88e41a41	KVM: SVM: Add intercept check for accessing dr registers This patch adds the intercept checks for instruction accessing the debug registers. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:01 -04:00
Joerg Roedel	cfec82cb7d	KVM: SVM: Add intercept check for emulated cr accesses This patch adds all necessary intercept checks for instructions that access the crX registers. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:01 -04:00
Joerg Roedel	8a76d7f25f	KVM: x86: Add x86 callback for intercept check This patch adds a callback into kvm_x86_ops so that svm and vmx code can do intercept checks on emulated instructions. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:01 -04:00
Joerg Roedel	8ea7d6aef8	KVM: x86 emulator: Add flag to check for protected mode instructions This patch adds a flag for the opcoded to tag instruction which are only recognized in protected mode. The necessary check is added too. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:01 -04:00
Joerg Roedel	d09beabd7c	KVM: x86 emulator: Add check_perm callback This patch adds a check_perm callback for each opcode into the instruction emulator. This will be used to do all necessary permission checks on instructions before checking whether they are intercepted or not. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:01 -04:00
Joerg Roedel	775fde8648	KVM: x86 emulator: Don't write-back cpu-state on X86EMUL_INTERCEPTED This patch prevents the changed CPU state to be written back when the emulator detected that the instruction was intercepted by the guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:00 -04:00
Avi Kivity	3c6e276f22	KVM: x86 emulator: add SVM intercepts Add intercept codes for instructions defined by SVM as interceptable. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:00 -04:00
Avi Kivity	c4f035c60d	KVM: x86 emulator: add framework for instruction intercepts When running in guest mode, certain instructions can be intercepted by hardware. This also holds for nested guests running on emulated virtualization hardware, in particular instructions emulated by kvm itself. This patch adds a framework for intercepting instructions. If an instruction is marked for interception, and if we're running in guest mode, a callback is called to check whether an intercept is needed or not. The callback is called at three points in time: immediately after beginning execution, after checking privilge exceptions, and after checking memory exception. This suits the different interception points defined for different instructions and for the various virtualization instruction sets. In addition, a new X86EMUL_INTERCEPT is defined, which any callback or memory access may define, allowing the more complicated intercepts to be implemented in existing callbacks. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:00 -04:00
Avi Kivity	aa97bb4891	KVM: x86 emulator: implement movdqu instruction (f3 0f 6f, f3 0f 7f) Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:59 -04:00
Avi Kivity	1253791df9	KVM: x86 emulator: SSE support Add support for marking an instruction as SSE, switching registers used to the SSE register file. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:59 -04:00
Avi Kivity	0d7cdee83a	KVM: x86 emulator: Specialize decoding for insns with 66/f2/f3 prefixes Most SIMD instructions use the 66/f2/f3 prefixes to distinguish between different variants of the same instruction. Usually the encoding is quite regular, but in some cases (including non-SIMD instructions) the prefixes generate very different instructions. Examples include XCHG/PAUSE, MOVQ/MOVDQA/MOVDQU, and MOVBE/CRC32. Allow the emulator to handle these special cases by splitting such opcodes into groups, with different decode flags and execution functions for different prefixes. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:59 -04:00
Avi Kivity	5037f6f324	KVM: x86 emulator: define callbacks for using the guest fpu within the emulator Needed for emulating fpu instructions. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:58 -04:00
Avi Kivity	1d6b114f20	KVM: x86 emulator: do not munge rep prefix Currently we store a rep prefix as 1 or 2 depending on whether it is a REPE or REPNE. Since sse instructions depend on the prefix value, store it as the original opcode to simplify things further on. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:58 -04:00
Avi Kivity	cef4dea07f	KVM: 16-byte mmio support Since sse instructions can issue 16-byte mmios, we need to support them. We can't increase the kvm_run mmio buffer size to 16 bytes without breaking compatibility, so instead we break the large mmios into two smaller 8-byte ones. Since the bus is 64-bit we aren't breaking any atomicity guarantees. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:58 -04:00
Avi Kivity	5287f194bf	KVM: Split mmio completion into a function Make room for sse mmio completions. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:58 -04:00
Avi Kivity	70252a1053	KVM: extend in-kernel mmio to handle >8 byte transactions Needed for coalesced mmio using sse. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:58 -04:00
Gleb Natapov	1499e54af0	KVM: x86: better fix for race between nmi injection and enabling nmi window Fix race between nmi injection and enabling nmi window in a simpler way. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-05-11 07:56:57 -04:00
Marcelo Tosatti	c761e5868e	Revert "KVM: Fix race between nmi injection and enabling nmi window" This reverts commit `f86368493e`. Simpler fix to follow. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-05-11 07:56:57 -04:00
Glauber Costa	3291892450	KVM: expose async pf through our standard mechanism As Avi recently mentioned, the new standard mechanism for exposing features is KVM_GET_SUPPORTED_CPUID, not spamming CAPs. For some reason async pf missed that. So expose async_pf here. Signed-off-by: Glauber Costa <glommer@redhat.com> CC: Gleb Natapov <gleb@redhat.com> CC: Avi Kivity <avi@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:57 -04:00
Avi Kivity	654f06fc65	KVM: VMX: simplify NMI mask management Use vmx_set_nmi_mask() instead of open-coding management of the hardware bit and the software hint (nmi_known_unmasked). There's a slight change of behaviour when running without hardware virtual NMI support - we now clear the NMI mask if NMI delivery faulted in that case as well. This improves emulation accuracy. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:57 -04:00
Jan Kiszka	89a9fb78b5	KVM: SVM: Remove unused svm_features We use boot_cpu_has now. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:57 -04:00
Avi Kivity	8878647585	KVM: VMX: Use cached VM_EXIT_INTR_INFO in handle_exception vmx_complete_atomic_exit() cached it for us, so we can use it here. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:56 -04:00
Avi Kivity	c5ca8e572c	KVM: VMX: Don't VMREAD VM_EXIT_INTR_INFO unconditionally Only read it if we're going to use it later. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:56 -04:00
Avi Kivity	00eba012d5	KVM: VMX: Refactor vmx_complete_atomic_exit() Move the exit reason checks to the front of the function, for early exit in the common case. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:56 -04:00
Avi Kivity	f9902069c4	KVM: VMX: Qualify check for host NMI Check for the exit reason first; this allows us, later, to avoid a VMREAD for VM_EXIT_INTR_INFO_FIELD. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:56 -04:00
Avi Kivity	9d58b93192	KVM: VMX: Avoid vmx_recover_nmi_blocking() when unneeded When we haven't injected an interrupt, we don't need to recover the nmi blocking state (since the guest can't set it by itself). This allows us to avoid a VMREAD later on. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:56 -04:00
Avi Kivity	69c7302890	KVM: VMX: Cache cpl We may read the cpl quite often in the same vmexit (instruction privilege check, memory access checks for instruction and operands), so we gain a bit if we cache the value. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:54 -04:00
Avi Kivity	f4c63e5d5a	KVM: VMX: Optimize vmx_get_cpl() In long mode, vm86 mode is disallowed, so we need not check for it. Reading rflags.vm may require a VMREAD, so it is expensive. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:54 -04:00
Avi Kivity	6de12732c4	KVM: VMX: Optimize vmx_get_rflags() If called several times within the same exit, return cached results. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:54 -04:00
Avi Kivity	f6e7847589	KVM: Use kvm_get_rflags() and kvm_set_rflags() instead of the raw versions Some rflags bits are owned by the host, not guest, so we need to use kvm_get_rflags() to strip those bits away or kvm_set_rflags() to add them back. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:54 -04:00
Jiri Olsa	9bbeacf52f	kprobes, x86: Disable irqs during optimized callback Disable irqs during optimized callback, so we dont miss any in-irq kprobes. The following commands: # cd /debug/tracing/ # echo "p mutex_unlock" >> kprobe_events # echo "p _raw_spin_lock" >> kprobe_events # echo "p smp_apic_timer_interrupt" >> ./kprobe_events # echo 1 > events/enable Cause the optimized kprobes to be missed. None is missed with the fix applied. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Link: http://lkml.kernel.org/r/20110511110613.GB2390@jolsa.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-11 13:21:23 +02:00
Seth Heasley	c0a86a9bea	x86/PCI: irq and pci_ids patch for Intel Panther Point DeviceIDs This patch adds the LPC Controller DeviceIDs for the Intel Panther Point PCH. Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Seth Heasley <seth.heasley@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2011-05-10 15:43:36 -07:00
Yinghai Lu	5491ff511d	x86/PCI: Remove dma32_reserve_bootmem This workaround holds a dma32 buffer at early boot to prevent later bootmem allocations from stealing it in the case of large RAM configs. Now that x86 is using memblock, and the nobootmem wrapper does top-down allocation, it's no longer necessary, so remove it. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2011-05-10 15:43:32 -07:00
Julia Lawall	0e8ede5351	x86/PCI: Convert release_resource to release_region/release_mem_region Request_region should be used with release_region, not release_resource. The local variables region and region2 are dropped and the calls to release_resource are replaced with calls to release_region, using the first two arguments of the corresponding calls to request_region. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x,E; @@ ( x = request_region(...) \| x = request_mem_region(...) ) ... when != release_region(x) when != x = E * release_resource(x); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2011-05-10 15:43:30 -07:00
Joerg Roedel	fffcda1183	x86, gart: Rename pci-gart_64.c to amd_gart_64.c This file only contains code relevant for the northbridge gart in AMD processors. This patch renames the file to represent this fact in the filename. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-05-10 17:22:06 +02:00
Ingo Molnar	932fed4e2e	Merge commit 'v2.6.39-rc7' into perf/core Merge reason: pull in the latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-10 17:05:45 +02:00
Joerg Roedel	72fe00f01f	x86/amd-iommu: Use threaded interupt handler Move the interupt handling for the iommu into the interupt thread to reduce latencies and prepare interupt handling for pri handling. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-05-10 11:07:58 +02:00
Joerg Roedel	604c307bf4	Merge branches 'dma-debug/next', 'amd-iommu/command-cleanups', 'amd-iommu/ats' and 'amd-iommu/extended-features' into iommu/2.6.40 Conflicts: arch/x86/include/asm/amd_iommu_types.h arch/x86/kernel/amd_iommu.c arch/x86/kernel/amd_iommu_init.c	2011-05-10 10:25:23 +02:00
Joe Perches	e969687595	arch/x86/kernel/pci-iommu_table.c: Convert sprintf_symbol to %pS Coalesce format as well. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-05-10 10:21:35 +02:00
Justin P. Mattock	70f23fd66b	treewide: fix a few typos in comments - kenrel -> kernel - whetehr -> whether - ttt -> tt - sss -> ss Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-05-10 10:16:21 +02:00
Jack Steiner	1d44e8288a	x86, UV: Fix NMI handler for UV platforms This fixes problems seen on UV systems handling NMIs from the node controller. I isolated the "dazed..." messages that I saw earlier to a bug in the BMC on our platform. It was sending NMIs w/o properly setting a register that indicated the source of NMI. So rather than _assuming_ any unhandled NMI came from the UV system maintenance console (SMC), add a check to verify that the SMC actually sent the NMI. Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: gorcunov@gmail.com Cc: dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-10 09:26:55 +02:00
Matthew Garrett	935a638241	x86, efi: Ensure that the entirity of a region is mapped It's possible for init_memory_mapping() to fail to map the entire region if it crosses a boundary, so ensure that we complete the mapping. Signed-off-by: Matthew Garrett <mjg@redhat.com> Link: http://lkml.kernel.org/r/1304623186-18261-5-git-send-email-mjg@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-05-09 12:14:45 -07:00
Matthew Garrett	7cb00b7287	x86, efi: Pass a minimal map to SetVirtualAddressMap() Experimentation with various EFI implementations has shown that functions outside runtime services will still update their pointers if SetVirtualAddressMap() is called with memory descriptors outside the runtime area. This is obviously insane, and therefore is unsurprising. Evidence from instrumenting another EFI implementation suggests that it only passes the set of descriptors covering runtime regions, so let's avoid any problems by doing the same. Runtime descriptors are copied to a separate memory map, and only that map is passed back to the firmware. Signed-off-by: Matthew Garrett <mjg@redhat.com> Link: http://lkml.kernel.org/r/1304623186-18261-4-git-send-email-mjg@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-05-09 12:14:39 -07:00
Matthew Garrett	202f9d0a41	x86, efi: Merge contiguous memory regions of the same type and attribute Some firmware implementations assume that physically contiguous regions will be contiguous in virtual address space. This assumption is, obviously, entirely unjustifiable. Said firmware implementations lack the good grace to handle their failings in a measured and reasonable manner, instead tending to shit all over address space and oopsing the kernel. In an ideal universe these firmware implementations would simultaneously catch fire and cease to be a problem, but since some of them are present in attractively thin and shiny metal devices vanity wins out and some poor developer spends an extended period of time surrounded by a growing array of empty bottles until the underlying reason becomes apparent. Said developer presents this patch, which simply merges adjacent regions if they happen to be contiguous and have the same EFI memory type and caching attributes. Signed-off-by: Matthew Garrett <mjg@redhat.com> Link: http://lkml.kernel.org/r/1304623186-18261-3-git-send-email-mjg@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-05-09 12:14:34 -07:00
Matthew Garrett	9cd2b07c19	x86, efi: Consolidate EFI nx control The core EFI code and 64-bit EFI code currently have independent implementations of code for setting memory regions as executable or not. Let's consolidate them. Signed-off-by: Matthew Garrett <mjg@redhat.com> Link: http://lkml.kernel.org/r/1304623186-18261-2-git-send-email-mjg@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-05-09 12:14:29 -07:00
Matthew Garrett	2b5e8ef35b	x86, efi: Remove virtual-mode SetVirtualAddressMap call The spec says that SetVirtualAddressMap doesn't work once you're in virtual mode, so there's no point in having infrastructure for calling it from there. Signed-off-by: Matthew Garrett <mjg@redhat.com> Link: http://lkml.kernel.org/r/1304623186-18261-1-git-send-email-mjg@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-05-09 12:14:25 -07:00
Luis R. Rodriguez	8fab6af215	x86: Fix mrst sparse complaints Fix these Sparse complaints: CHECK arch/x86/platform/mrst/mrst.c arch/x86/platform/mrst/mrst.c:197:13: warning: symbol 'mrst_time_init' was not declared. Should it be static? arch/x86/platform/mrst/mrst.c:219:16: warning: symbol 'mrst_arch_setup' was not declared. Should it be static? Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Roman Gezikov <roman.gezikov@atheros.com> Cc: Joonas Viskari <joonas.viskari@atheros.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Allen Kao <allen.kao@atheros.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Link: http://lkml.kernel.org/r/1304719209-26913-1-git-send-email-lrodriguez@atheros.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-07 10:52:30 +02:00
Ingo Molnar	4cb1f43ce8	Merge commit 'v2.6.39-rc6' into x86/cleanups Merge reason: move to a (much) newer upstream base. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-07 10:51:48 +02:00
Ingo Molnar	57d524154f	Merge branch 'perf/stat' into perf/core Merge reason: the perf stat improvements are tested and ready now. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-06 21:07:38 +02:00
Rob Landley	6151658751	Correct occurrences of - Documentation/kvm/ to Documentation/virtual/kvm - Documentation/uml/ to Documentation/virtual/uml - Documentation/lguest/ to Documentation/virtual/lguest throughout the kernel source tree. Signed-off-by: Rob Landley <rob@landley.net> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>	2011-05-06 09:27:55 -07:00
Peter Zijlstra	63b6a6758e	perf events, x86: Fix Intel Nehalem and Westmere last level cache event definitions The Intel Nehalem offcore bits implemented in: `e994d7d23a`: perf: Fix LLC-* events on Intel Nehalem/Westmere ... are wrong: they implemented _ACCESS as _HIT and counted OTHER_CORE_HIT* as MISS even though its clearly documented as an L3 hit ... Fix them and the Westmere definitions as well. Cc: Andi Kleen <ak@linux.intel.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1299119690-13991-3-git-send-email-ming.m.lin@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-06 11:24:48 +02:00
Lin Ming	e04d1b23f9	perf events, x86: Add SandyBridge stalled-cycles-frontend/backend events Extend the Intel SandyBridge PMU driver with definitions for generic front-end and back-end stall events. ( As commit `3011203` "perf events, x86: Add Westmere stalled-cycles-frontend/backend events" says, these are only approximations. ) Signed-off-by: Lin Ming <ming.m.lin@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1304666042-17577-1-git-send-email-ming.m.lin@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-06 09:37:03 +02:00
Ingo Molnar	4d70230bb4	Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into perf/urgent	2011-05-06 08:11:28 +02:00
Anton Blanchard	228e548e60	net: Add sendmmsg socket system call This patch adds a multiple message send syscall and is the send version of the existing recvmmsg syscall. This is heavily based on the patch by Arnaldo that added recvmmsg. I wrote a microbenchmark to test the performance gains of using this new syscall: http://ozlabs.org/~anton/junkcode/sendmmsg_test.c The test was run on a ppc64 box with a 10 Gbit network card. The benchmark can send both UDP and RAW ethernet packets. 64B UDP batch pkts/sec 1 804570 2 872800 (+ 8 %) 4 916556 (+14 %) 8 939712 (+17 %) 16 952688 (+18 %) 32 956448 (+19 %) 64 964800 (+20 %) 64B raw socket batch pkts/sec 1 1201449 2 1350028 (+12 %) 4 1461416 (+22 %) 8 `1513080` (+26 %) 16 1541216 (+28 %) 32 1553440 (+29 %) 64 1557888 (+30 %) We see a 20% improvement in throughput on UDP send and 30% on raw socket send. [ Add sparc syscall entries. -DaveM ] Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-05 11:10:14 -07:00
Ingo Molnar	98bb318864	Merge branch 'perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/urgent	2011-05-04 20:33:42 +02:00
Randy Dunlap	9cddf15f18	x86/net: only select BPF_JIT when NET is enabled Fix kconfig unmet dependency warning: HAVE_BPF_JIT depends on NET, so make the "select" of it depend on NET also. warning: (X86) selects HAVE_BPF_JIT which has unmet direct dependencies (NET) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-04 11:06:05 -07:00
Dominik Brodowski	2d06d8c49a	[CPUFREQ] use dynamic debug instead of custom infrastructure With dynamic debug having gained the capability to report debug messages also during the boot process, it offers a far superior interface for debug messages than the custom cpufreq infrastructure. As a first step, remove the old cpufreq_debug_printk() function and replace it with a call to the generic pr_debug() function. How can dynamic debug be used on cpufreq? You need a kernel which has CONFIG_DYNAMIC_DEBUG enabled. To enabled debugging during runtime, mount debugfs and $ echo -n 'module cpufreq +p' > /sys/kernel/debug/dynamic_debug/control for debugging the complete "cpufreq" module. To achieve the same goal during boot, append ddebug_query="module cpufreq +p" as a boot parameter to the kernel of your choice. For more detailled instructions, please see Documentation/dynamic-debug-howto.txt Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Dave Jones <davej@redhat.com>	2011-05-04 11:50:57 -04:00
Naga Chumbalkar	904cc1e637	[CPUFREQ] Fix _OSC UUID in pcc-cpufreq UUID needs to be written out the way it is described in Sec 18.5.124 of ACPI 4.0a Specification. Platform firmware's use of this UUID/_OSC is optional, which is why we didn't notice this bug earlier. Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Signed-off-by: Dave Jones <davej@redhat.com> Cc: stable@kernel.org	2011-05-04 11:50:56 -04:00
Linus Torvalds	609cfda586	Merge branch 'stable/bug-fixes-for-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/bug-fixes-for-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: mask_rw_pte mark RO all pagetable pages up to pgt_buf_top xen/mmu: Add workaround "x86-64, mm: Put early page table high"	2011-05-03 09:25:42 -07:00
H. Peter Anvin	7806a49ab6	x86, reboot: Fix relocations in reboot_32.S The use of base for %ebx in this file is arbitrary, except that we also use it to compute the real-mode segment. Therefore, make it so that r_base really is the true address to which %ebx points. This resolves kernel bugzilla 33302. Reported-and-tested-by: Alexey Zaytsev <alexey.zaytsev@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/n/tip-08os5wi3yq1no0y4i5m4z7he@git.kernel.org	2011-05-02 14:44:46 -07:00
Stefano Stabellini	b9269dc7bf	xen: mask_rw_pte mark RO all pagetable pages up to pgt_buf_top mask_rw_pte is currently checking if a pfn is a pagetable page if it falls in the range pgt_buf_start - pgt_buf_end but that is incorrect because pgt_buf_end is a moving target: pgt_buf_top is the real boundary. Acked-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-05-02 16:33:52 -04:00
Konrad Rzeszutek Wilk	a38647837a	xen/mmu: Add workaround "x86-64, mm: Put early page table high" As a consequence of the commit: commit `4b239f458c` Author: Yinghai Lu <yinghai@kernel.org> Date: Fri Dec 17 16:58:28 2010 -0800 x86-64, mm: Put early page table high it causes the Linux kernel to crash under Xen: mapping kernel into physical memory Xen: setup ISA identity maps about to get started... (XEN) mm.c:2466:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn b1d89 (pfn bacf7) (XEN) mm.c:3027:d0 Error while pinning mfn b1d89 (XEN) traps.c:481:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000] (XEN) domain_crash_sync called from entry.S (XEN) Domain 0 (vcpu#0) crashed on cpu#0: ... The reason is that at some point init_memory_mapping is going to reach the pagetable pages area and map those pages too (mapping them as normal memory that falls in the range of addresses passed to init_memory_mapping as argument). Some of those pages are already pagetable pages (they are in the range pgt_buf_start-pgt_buf_end) therefore they are going to be mapped RO and everything is fine. Some of these pages are not pagetable pages yet (they fall in the range pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they are going to be mapped RW. When these pages become pagetable pages and are hooked into the pagetable, xen will find that the guest has already a RW mapping of them somewhere and fail the operation. The reason Xen requires pagetables to be RO is that the hypervisor needs to verify that the pagetables are valid before using them. The validation operations are called "pinning" (more details in arch/x86/xen/mmu.c). In order to fix the issue we mark all the pages in the entire range pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation is completed only the range pgt_buf_start-pgt_buf_end is reserved by init_memory_mapping. Hence the kernel is going to crash as soon as one of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those ranges are RO). For this reason, this function is introduced which is called _after_ the init_memory_mapping has completed (in a perfect world we would call this function from init_memory_mapping, but lets ignore that). Because we are called _after_ init_memory_mapping the pgt_buf_[start, end,top] have all changed to new values (b/c another init_memory_mapping is called). Hence, the first time we enter this function, we save away the pgt_buf_start value and update the pgt_buf_[end,top]. When we detect that the "old" pgt_buf_start through pgt_buf_end PFNs have been reserved (so memblock_x86_reserve_range has been called), we immediately set out to RW the "old" pgt_buf_end through pgt_buf_top. And then we update those "old" pgt_buf_[end\|top] with the new ones so that we can redo this on the next pagetable. Acked-by: "H. Peter Anvin" <hpa@zytor.com> Reviewed-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> [v1: Updated with Jeremy's comments] [v2: Added the crash output] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-05-02 16:33:34 -04:00
Yinghai Lu	e5a10c1bd1	x86, NUMA: Trim numa meminfo with max_pfn in a separate loop During testing 32bit numa unifying code from tj, found one system with more than 64g fails to use numa. It turns out we do not trim numa meminfo correctly against max_pfn in case start address of a node is higher than 64GiB. Bug fix made it to tip tree. This patch moves the checking and trimming to a separate loop. So we don't need to compare low/high in following merge loops. It makes the code more readable. Also it makes the node merge printouts less strange. On a 512GiB numa system with 32bit, before: > NUMA: Node 0 [0,a0000) + [100000,80000000) -> [0,80000000) > NUMA: Node 0 [0,80000000) + [100000000,1080000000) -> [0,1000000000) after: > NUMA: Node 0 [0,a0000) + [100000,80000000) -> [0,80000000) > NUMA: Node 0 [0,80000000) + [100000000,1000000000) -> [0,1000000000) Signed-off-by: Yinghai Lu <yinghai@kernel.org> [Updated patch description and comment slightly.] Signed-off-by: Tejun Heo <tj@kernel.org>	2011-05-02 17:24:49 +02:00
Yinghai Lu	a56bca80db	x86, NUMA: Rename setup_node_bootmem() to setup_node_data() After using memblock to replace bootmem, that function only sets up node_data now. Change the name to reflect what it actually does. tj: Minor adjustment to the patch description. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-05-02 17:24:49 +02:00
Tejun Heo	1b7e03ef75	x86, NUMA: Enable emulation on 32bit too Now that NUMA init path is unified, NUMA emulation can be enabled on 32bit. Make numa_emluation.c safe on 32bit by doing the followings. * Define MAX_DMA32_PFN on 32bit too. * Include bootmem.h for max_pfn declaration. * Use u64 explicitly and always use PFN_PHYS() when converting page number to address. * Avoid __udivdi3() generation on 32bit by doing number of pages calculation instead in split_nodes_interleave(). And drop X86_64 dependency from Kconfig. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 17:24:48 +02:00
Tejun Heo	2706a0bf7b	x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit too Now that NUMA init path is unified, amdtopology can be enabled on 32bit. Make amdtopology.c safe on 32bit by explicitly using u64 and drop X86_64 dependency from Kconfig. Inclusion of bootmem.h is added for max_pfn declaration. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 17:24:48 +02:00
Tejun Heo	c6f5887820	x86, NUMA: Rename amdtopology_64.c to amdtopology.c amdtopology is going to be used by 32bit too drop _64 suffix. This is pure rename. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 17:24:48 +02:00
Tejun Heo	752d4f372f	x86, NUMA: Make numa_init_array() static numa_init_array() no longer has users outside of numa.c. Make it static. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 17:24:48 +02:00
Tejun Heo	bd6709a91a	x86, NUMA: Make 32bit use common NUMA init path With both _numa_init() methods converted and the rest of init code adjusted, numa_32.c now can switch from the 32bit only init code to the common one in numa.c. * Shim get_memcfg_()'s are dropped and initmem_init() calls x86_numa_init(), which is updated to handle NUMAQ. All boilerplate operations including node range limiting, pgdat alloc/init are handled by numa_init(). 32bit only implementation is removed. * 32bit numa_add_memblk(), numa_set_distance() and memory_add_physaddr_to_nid() removed and common versions in numa_32.c enabled for 32bit. This change causes the following behavior changes. * NODE_DATA()->node_start_pfn/node_spanned_pages properly initialized for 32bit too. * Much more sanity checks and configuration cleanups. * Proper handling of node distances. * The same NUMA init messages as 64bit. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 17:24:48 +02:00
Tejun Heo	7888e96b26	x86, NUMA: Initialize and use remap allocator from setup_node_bootmem() setup_node_bootmem() is taken from 64bit and doesn't use remap allocator. It's about to be shared with 32bit so add support for it. If NODE_DATA is remapped, it's noted in the debug message and node locality check is skipped as the __pa() of the remapped address doesn't reflect the actual physical address. On 64bit, remap allocator becomes noop and doesn't affect the behavior. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:54 +02:00
Tejun Heo	99cca492ea	x86-32, NUMA: Add @start and @end to init_alloc_remap() Instead of dereferencing node_start/end_pfn[] directly, make init_alloc_remap() take @start and @end and let the caller be responsible for making sure the range is sane. This is to prepare for use from unified NUMA init code. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:54 +02:00
Tejun Heo	38f3e1ca24	x86, NUMA: Remove long 64bit assumption from numa.c Code moved from numa_64.c has assumption that long is 64bit in several places. This patch removes the assumption by using {s\|u}64_t explicity, using PFN_PHYS() for page number -> addr conversions and adjusting printf formats. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:53 +02:00
Tejun Heo	744baba0c4	x86, NUMA: Enable build of generic NUMA init code on 32bit Generic NUMA init code was moved to numa.c from numa_64.c but is still guaraded by CONFIG_X86_64. This patch removes the compile guard and enables compiling on 32bit. * numa_add_memblk() and numa_set_distance() clash with the shim implementation in numa_32.c and are left out. * memory_add_physaddr_to_nid() clashes with 32bit implementation and is left out. * MAX_DMA_PFN definition in dma.h moved out of !CONFIG_X86_32. * node_data definition in numa_32.c removed in favor of the one in numa.c. There are places where ulong is assumed to be 64bit. The next patch will fix them up. Note that although the code is compiled it isn't used yet and this patch doesn't cause any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:53 +02:00
Tejun Heo	a4106eae65	x86, NUMA: Move NUMA init logic from numa_64.c to numa.c Move the generic 64bit NUMA init machinery from numa_64.c to numa.c. * node_data[], numa_mem_info and numa_distance * numa_add_memblk[_to](), numa_remove_memblk[_from]() * numa_set_distance() and friends * numa_init() and all the numa_meminfo handling helpers called from it * dummy_numa_init() * memory_add_physaddr_to_nid() A new function x86_numa_init() is added and the content of numa_64.c::initmem_init() is moved into it. initmem_init() now simply calls x86_numa_init(). Constants and numa_off declaration are moved from numa_{32\|64}.h to numa.h. This is code reorganization and doesn't involve any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:53 +02:00
Tejun Heo	299a180aec	x86-32, NUMA: Update numaq to use new NUMA init protocol Update numaq such that it calls numa_add_memblk() and sets numa_nodes_parsed instead of directly diddling with NUMA states. The original get_memcfg_numaq() is renamed to numaq_numa_init() and new get_memcfg_numaq() is created in numa_32.c. The shim numa_add_memblk() implementation handles node_start/end_pfn[] and node_set_online() for nodes with memory. The new get_memcfg_numaq() exactly the same with get_memcfg_from_srat() other than calling the numaq init function. Things get_memcfgs_numaq() do are not strictly necessary for numaq but added for consistency and to help unifying NUMA init handling. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:53 +02:00
Tejun Heo	5acd91ab83	x86-32, NUMA: Replace srat_32.c with srat.c SRAT support implementation in srat_32.c and srat.c are generally similar; however, there are some differences. First of all, 64bit implementation supports more types of SRAT entries. 64bit supports x2apic, affinity, memory and SLIT. 32bit only supports processor and memory. Most other differences stem from different initialization protocols employed by 64bit and 32bit NUMA init paths. On 64bit, * Mappings among PXM, node and apicid are directly done in each SRAT entry callback. * Memory affinity information is passed to numa_add_memblk() which takes care of all interfacing with NUMA init. * Doesn't directly initialize NUMA configurations. All the information is recorded in numa_nodes_parsed and memblks. On 32bit, * Checks numa_off. * Things go through one more level of indirection via private tables but eventually end up initializing the same mappings. * node_start/end_pfn[] are initialized and memblock_x86_register_active_regions() is called for each memory chunk. * node_set_online() is called for each online node. * sort_node_map() is called. There are also other minor differences in sanity checking and messages but taking 64bit version should be good enough. This patch drops the 32bit specific implementation and makes the 64bit implementation common for both 32 and 64bit. The init protocol differences are dealt with in two places - the numa_add_memblk() shim added in the previous patch and new temporary numa_32.c:get_memcfg_from_srat() which wraps invocation of x86_acpi_numa_init(). The shim numa_add_memblk() handles the folowings. * node_start/end_pfn[] initialization. * node_set_online() for memory nodes. * Invocation of memblock_x86_register_active_regions(). The shim get_memcfg_from_srat() handles the followings. * numa_off check. * node_set_online() for CPU nodes. * sort_node_map() invocation. * Clearing of numa_nodes_parsed and active_ranges on failure. The shims are temporary and will be removed as the generic NUMA init path in 32bit is replaced with 64bit one. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:53 +02:00
Tejun Heo	b0d310801a	x86-32, NUMA: implement temporary NUMA init shims To help transition to common NUMA init, implement temporary 32bit shims for numa_add_memblk() and numa_set_distance(). numa_add_memblk() registers the memblk and adjusts node_start/end_pfn[]. numa_set_distance() is noop. These shims will allow using 64bit NUMA init functions on 32bit and gradual transition to common NUMA init path. For detailed description, please read description of commits which make use of the shim functions. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:53 +02:00
Tejun Heo	e6df595b37	x86, NUMA: Move numa_nodes_parsed to numa.[hc] Move numa_nodes_parsed from numa_64.[hc] to numa.[hc] to prepare for NUMA init path unification. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:53 +02:00
Tejun Heo	daf4f480ae	x86-32, NUMA: Move get_memcfg_numa() into numa_32.c There's no reason get_memcfg_numa() to be implemented inline in mmzone_32.h. Move it to numa_32.c and also make get_memcfg_numa_flag() static. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:53 +02:00
Tejun Heo	eca9ad3132	x86, NUMA: make srat.c 32bit safe Make srat.c 32bit safe by removing the assumption that unsigned long is 64bit. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:52 +02:00
Tejun Heo	7b2600f8ee	x86, NUMA: rename srat_64.c to srat.c Rename srat_64.c to srat.c. This is to prepare for unification of NUMA init paths between 32 and 64bit. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:52 +02:00
Tejun Heo	1201e10a09	x86, NUMA: trivial cleanups * Kill no longer used struct bootnode. * Kill dangling declaration of pxm_to_nid() in numa_32.h. * Make setup_node_bootmem() static. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:52 +02:00
Tejun Heo	797390d855	x86-32, NUMA: use sparse_memory_present_with_active_regions() Instead of calling memory_present() for each region from NUMA init, call sparse_memory_present_with_active_regions() from paging_init() similarly to x86-64. For flat and numaq, this results in exactly the same memory_present() calls. For srat, if there are multiple memory chunks for a node, after this change, memory_present() will be called separately for each chunk instead of being called once to encompass the whole range, which doesn't cause any harm and actually is the better behavior. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:52 +02:00
Tejun Heo	84914ed0ec	x86-32, NUMA: Make apic->x86_32_numa_cpu_node() optional NUMAQ is the only meaningful user of this callback and setup_local_APIC() the only callsite. Stop torturing everyone else by making the callback optional and removing all the boilerplate implementations and assignments. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:52 +02:00
Tejun Heo	6bd262731b	x86, NUMA: Unify 32/64bit numa_cpu_node() implementation Currently, the only meaningful user of apic->x86_32_numa_cpu_node() is NUMAQ which returns valid mapping only after CPU is initialized during SMP bringup; thus, the previous patch to set apicid -> node in setup_local_APIC() makes __apicid_to_node[] always contain the correct mapping whether custom apic->x86_32_numa_cpu_node() is used or not. So, there is no reason to keep separate 32bit implementation. We can always consult __apicid_to_node[]. Move 64bit implementation from numa_64.c to numa.c and remove 32bit implementation from numa_32.c. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:52 +02:00
Tejun Heo	c4b90c1199	x86-32, NUMA: Automatically set apicid -> node in setup_local_APIC() Some x86-32 NUMA implementations (NUMAQ) don't initialize apicid -> node mapping using set_apicid_to_node() during NUMA init but implement custom apic->x86_32_numa_cpu_node() instead. This patch automatically initializes the default apic -> node mapping table from apic->x86_32_numa_cpu_node() from setup_local_APIC() such that the mapping table is in sync with the actual mapping. As the table isn't used by custom implementations, this doesn't make any difference at this point. This is in preparation of unifying numa_cpu_node() between x86-32 and 64. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:52 +02:00
Tejun Heo	acd26d611e	x86-64, NUMA: simplify nodedata allocation With top-down memblock allocation, the allocation range limits in ealry_node_mem() can be simplified - try node-local first, then any node but in any case don't allocate below DMA limit. Remove early_node_mem() and implement simplified allocation directly in setup_node_bootmem(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:51 +02:00
Tejun Heo	ebe685f24e	x86-64, NUMA: trivial cleanups for setup_node_bootmem() Make the following trivial changes in preparation for further updates. * nodeid -> nid, nid -> tnid * use nd_ prefix for nodedata related variables * remove start/end_pfn and use start/end directly Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:51 +02:00
Tejun Heo	9688678a66	x86-64, NUMA: Simplify hotadd memory handling The only special handling NUMA needs to do for hotadd memory is determining the node for the hotadd memory given the address of it and there's nothing specific to specific config method used. srat_64.c does somewhat elaborate error checking on ACPI_SRAT_MEM_HOT_PLUGGABLE regions, remembers them and implements memory_add_physaddr_to_nid() which determines the node for given hotadd address. This is almost completely redundant. All the information is already available to the generic NUMA code which already performs all the sanity checking and merging. All that's necessary is not using __initdata from numa_meminfo and providing a function which uses it to map address to node. Drop the specific implementation from srat_64.c and add generic memory_add_physaddr_to_nid() in numa_64.c, which is enabled if CONFIG_MEMORY_HOTPLUG is set. Other than dropping the code, srat_64.c doesn't need any change as it already calls numa_add_memblk() for hot pluggable regions which is enough. While at it, change CONFIG_MEMORY_HOTPLUG_SPARSE in srat_64.c to CONFIG_MEMORY_HOTPLUG, for NUMA on x86-64, the two are always the same. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:51 +02:00
Tejun Heo	ba67cf5cf2	Merge branch 'x86/urgent' into x86-mm Merge reason: Pick up the following two fix commits. `2be19102b7`: x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo() `765af22da8`: x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change Scheduled NUMA init 32/64bit unification changes depend on these. Signed-off-by: Tejun Heo <tj@kernel.org>	2011-05-02 14:16:47 +02:00
Tejun Heo	aff364860a	Merge branch 'x86/numa' into x86-mm Merge reason: Pick up x86-32 remap allocator cleanup changes - 14 commits, 3fe14ab541^..993ba1585c. `3fe14ab541`: x86-32, numa: Fix failure condition check in alloc_remap() `993ba1585c`: x86-32, numa: Update remap allocator comments Scheduled NUMA init 32/64bit unification changes depend on them. Signed-off-by: Tejun Heo <tj@kernel.org>	2011-05-02 14:08:47 +02:00
Bart Van Assche	9de4966a4d	x86: Fix spelling error in the memcpy() source code comment Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: http://lkml.kernel.org/r/201105011409.21629.bvanassche@acm.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-01 19:16:18 +02:00
Yinghai Lu	2be19102b7	x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo() numa_cleanup_meminfo() trims each memblk between low (0) and high (max_pfn) limits and discards empty ones. However, the emptiness detection incorrectly used equality test. If the start of a memblk is higher than max_pfn, it is empty but fails the equality test and doesn't get discarded. The condition triggers when max_pfn is lower than start of a NUMA node and results in memory misconfiguration - leading to WARN_ON()s and other funnies. The bug was discovered in devel branch where 32bit too uses this code path for NUMA init. If a node is above the addressing limit, max_pfn ends up lower than the node triggering this problem. The failure hasn't been observed on x86-64 but is still possible with broken hardware e820/NUMA info. As the fix is very low risk, it would be better to apply it even for 64bit. Fix it by using >= instead of ==. Signed-off-by: Yinghai Lu <yinghai@kernel.org> [ Extracted the actual fix from the original patch and rewrote patch description. ] Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20110501171204.GO29280@htj.dyndns.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-01 19:15:11 +02:00
Ingo Molnar	809435ff4f	Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core	2011-05-01 19:09:39 +02:00
Boris Ostrovsky	e20a2d205c	x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors Older AMD K8 processors (Revisions A-E) are affected by erratum 400 (APIC timer interrupts don't occur in C states greater than C1). This, for example, means that X86_FEATURE_ARAT flag should not be set for these parts. This addresses regression introduced by commit `b87cf80af3` ("x86, AMD: Set ARAT feature on AMD processors") where the system may become unresponsive until external interrupt (such as keyboard input) occurs. This results, for example, in time not being reported correctly, lack of progress on the system and other lockups. Reported-by: Joerg-Volker Peetz <jvpeetz@web.de> Tested-by: Joerg-Volker Peetz <jvpeetz@web.de> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Boris Ostrovsky <Boris.Ostrovsky@amd.com> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1304113663-6586-1-git-send-email-ostr@amd64.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-01 18:55:51 +02:00
Linus Torvalds	40a963502c	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, x86, nmi: Move LVT un-masking into irq handlers perf events, x86: Work around the Nehalem AAJ80 erratum perf, x86: Fix BTS condition ftrace: Build without frame pointers on Microblaze	2011-04-29 15:08:53 -07:00
Mike Waychison	f548ccd47d	x86: Better comments for get_bios_ebda() Make the comments a bit clearer for get_bios_ebda so that it actually tells us what it is returning. Signed-off-by: Mike Waychison <mikew@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-04-29 14:13:15 -07:00
Mike Waychison	57d5f9f808	x86: get_bios_ebda_length() Add a wrapper routine that tells us the length of the EBDA if it is present. This guy also ensures that the returned length doesn't let the EBDA run past the 640KiB mark. Signed-off-by: Mike Waychison <mikew@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-04-29 14:13:15 -07:00
Ingo Molnar	301120396b	perf events, x86: Add Westmere stalled-cycles-frontend/backend events Extend the Intel Westmere PMU driver with definitions for generic front-end and back-end stall events. ( These are only approximations. ) Reported-by: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-7y40wib8n008io7hjpn1dsrm@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-29 16:22:37 +02:00
Ingo Molnar	91fc4cc000	perf, x86: Add new stalled cycles events for Intel and AMD CPUs Extend the Intel and AMD event definitions with generic front-end and back-end stall events. ( These are only approximations - suggestions are welcome for better events. ) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-7y40wib8n001io7hjpn1dsrm@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-29 14:24:15 +02:00
Ingo Molnar	8f62242246	perf events: Add generic front-end and back-end stalled cycle event definitions Add two generic hardware events: front-end and back-end stalled cycles. These events measure conditions when the CPU is executing code but its capabilities are not fully utilized. Understanding such situations and analyzing them is an important sub-task of code optimization workflows. Both events limit performance: most front end stalls tend to be caused by branch misprediction or instruction fetch cachemisses, backend stalls can be caused by various resource shortages or inefficient instruction scheduling. Front-end stalls are the more important ones: code cannot run fast if the instruction stream is not being kept up. An over-utilized back-end can cause front-end stalls and thus has to be kept an eye on as well. The exact composition is very program logic and instruction mix dependent. We use the terms 'stall', 'front-end' and 'back-end' loosely and try to use the best available events from specific CPUs that approximate these concepts. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-7y40wib8n000io7hjpn1dsrm@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-29 14:23:58 +02:00
Tim Gardner	c7a7b814c9	ioremap: Delay sanity check until after a successful mapping While tracking down the reason for an ioremap() failure I was distracted by the WARN_ONCE() in __ioremap_caller(). Performing a WARN_ONCE() sanity check before the mapping is successful seems pointless if the caller sends bad values. A case in point is when the BIOS provides erroneous screen_info values causing vesafb_probe() to request an outrageuous size. The WARN_ONCE is then wasted on bogosity. Move the warning to a point where the mapping has been successfully allocated. Addresses: http://bugs.launchpad.net/bugs/772042 Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Link: http://lkml.kernel.org/r/4DB99D2E.9080106@canonical.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-29 08:02:47 +02:00
Oleg Nesterov	e9bd3f0faa	x86: signal: sys_rt_sigreturn() should use set_current_blocked() Normally sys_rt_sigreturn() restores the old current->blocked which was changed by handle_signal(), and unblocking is always fine. But the debugger or application itself can change frame->uc_sigmask and thus we need set_current_blocked()->retarget_shared_pending(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com> Acked-by: Tejun Heo <tj@kernel.org>	2011-04-28 13:01:38 +02:00
Oleg Nesterov	e6a585801b	x86: signal: handle_signal() should use set_current_blocked() This is ugly, but if sigprocmask() needs retarget_shared_pending() then handle signal should follow this logic. In theory it is newer correct to add the new signals to current->blocked, the signal handler can sleep/etc so we should notify other threads in case we block the pending signal and nobody else has TIF_SIGPENDING. Of course, this change doesn't make signals faster :/ Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com> Acked-by: Tejun Heo <tj@kernel.org>	2011-04-28 13:01:37 +02:00
Sebastian Andrzej Siewior	1ff42c32c7	x86: ce4100: Configure IOAPIC pins for USB and SATA to level type The USB and SATA ioapic interrrupt pins are configured as edge type, but need to be level type interrupts to work correctly. [ tglx: Split out from the combo patch ] Cc: Torben Hohn <torbenh@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/%3C20110427143052.GA15211%40linutronix.de%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-04-28 11:38:30 +02:00
Sebastian Andrzej Siewior	20443598d9	x86: devicetree: Configure IOAPIC pin only once We use io_apic_setup_irq_pin() in order to configure pin's interrupt number polarity and type. This is done on every irq_create_of_mapping() which happens for instance during pci enable calls. Level typed interrupts are masked by default, edge are unmasked. On the first ->xlate() call the level interrupt is configured and masked. The driver calls request_irq() and the line is unmasked. Lets assume the interrupt line is shared with another device and we call pci_enable_device() for this device. The ->xlate() configures the pin again and it is masked. request_irq() does not unmask the line because it _is_ already unmasked according to its internal state. So the interrupt will never be unmasked again. This patch is based on an earlier work by Torben Hohn and solves the problem by configuring the pin only once. Since all devices must agree on the same type and polarity there is no point in configuring the pin more than once. [ tglx: Split out the ce4100 part into a separate patch ] Cc: Torben Hohn <torbenh@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/%3C20110427143052.GA15211%40linutronix.de%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-04-28 11:38:30 +02:00
Ingo Molnar	8a850cadca	perf event, x86: Use better stalled cycles metric Use the UOPS_EXECUTED.*,c=1,i=1 event on Intel CPUs - it is a rather good indicator of CPU execution stalls, more sensitive and more inclusive than the 0xa2 resource stalls event (which does not count nearly as many stall types). Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio7hjpn2dsrm@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-28 08:39:33 +02:00
Eric Dumazet	0a14842f5a	net: filter: Just In Time compiler for x86-64 In order to speedup packet filtering, here is an implementation of a JIT compiler for x86_64 It is disabled by default, and must be enabled by the admin. echo 1 >/proc/sys/net/core/bpf_jit_enable It uses module_alloc() and module_free() to get memory in the 2GB text kernel range since we call helpers functions from the generated code. EAX : BPF A accumulator EBX : BPF X accumulator RDI : pointer to skb (first argument given to JIT function) RBP : frame pointer (even if CONFIG_FRAME_POINTER=n) r9d : skb->len - skb->data_len (headlen) r8 : skb->data To get a trace of generated code, use : echo 2 >/proc/sys/net/core/bpf_jit_enable Example of generated code : # tcpdump -p -n -s 0 -i eth1 host 192.168.20.0/24 flen=18 proglen=147 pass=3 image=ffffffffa00b5000 JIT code: ffffffffa00b5000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 60 JIT code: ffffffffa00b5010: 44 2b 4f 64 4c 8b 87 b8 00 00 00 be 0c 00 00 00 JIT code: ffffffffa00b5020: e8 24 7b f7 e0 3d 00 08 00 00 75 28 be 1a 00 00 JIT code: ffffffffa00b5030: 00 e8 fe 7a f7 e0 24 00 3d 00 14 a8 c0 74 49 be JIT code: ffffffffa00b5040: 1e 00 00 00 e8 eb 7a f7 e0 24 00 3d 00 14 a8 c0 JIT code: ffffffffa00b5050: 74 36 eb 3b 3d 06 08 00 00 74 07 3d 35 80 00 00 JIT code: ffffffffa00b5060: 75 2d be 1c 00 00 00 e8 c8 7a f7 e0 24 00 3d 00 JIT code: ffffffffa00b5070: 14 a8 c0 74 13 be 26 00 00 00 e8 b5 7a f7 e0 24 JIT code: ffffffffa00b5080: 00 3d 00 14 a8 c0 75 07 b8 ff ff 00 00 eb 02 31 JIT code: ffffffffa00b5090: c0 c9 c3 BPF program is 144 bytes long, so native program is almost same size ;) (000) ldh [12] (001) jeq #0x800 jt 2 jf 8 (002) ld [26] (003) and #0xffffff00 (004) jeq #0xc0a81400 jt 16 jf 5 (005) ld [30] (006) and #0xffffff00 (007) jeq #0xc0a81400 jt 16 jf 17 (008) jeq #0x806 jt 10 jf 9 (009) jeq #0x8035 jt 10 jf 17 (010) ld [28] (011) and #0xffffff00 (012) jeq #0xc0a81400 jt 16 jf 13 (013) ld [38] (014) and #0xffffff00 (015) jeq #0xc0a81400 jt 16 jf 17 (016) ret #65535 (017) ret #0 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-27 23:05:08 -07:00
Don Zickus	2bce5daca2	perf, x86, nmi: Move LVT un-masking into irq handlers It was noticed that P4 machines were generating double NMIs for each perf event. These extra NMIs lead to 'Dazed and confused' messages on the screen. I tracked this down to a P4 quirk that said the overflow bit had to be cleared before re-enabling the apic LVT mask. My first attempt was to move the un-masking inside the perf nmi handler from before the chipset NMI handler to after. This broke Nehalem boxes that seem to like the unmasking before the counters themselves are re-enabled. In order to keep this change simple for 2.6.39, I decided to just simply move the apic LVT un-masking to the beginning of all the chipset NMI handlers, with the exception of Pentium4's to fix the double NMI issue. Later on we can move the un-masking to later in the handlers to save a number of 'extra' NMIs on those particular chipsets. I tested this change on a P4 machine, an AMD machine, a Nehalem box, and a core2quad box. 'perf top' worked correctly along with various other small 'perf record' runs. Anything high stress breaks all the machines but that is a different problem. Thanks to various people for testing different versions of this patch. Reported-and-tested-by: Shaun Ruffell <sruffell@digium.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Link: http://lkml.kernel.org/r/1303900353-10242-1-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu> CC: Cyrill Gorcunov <gorcunov@gmail.com>	2011-04-27 17:59:11 +02:00
Ingo Molnar	32673822e4	Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core Conflicts: include/linux/perf_event.h Merge reason: pick up the latest jump-label enhancements, they are cooked ready. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-27 10:40:21 +02:00
Ingo Molnar	5c543e3c44	perf events, x86: Mark constrant tables read mostly Various constraint tables were not marked read-mostly. Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-wpqwwvmhxucy5e718wnamjiv@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-26 20:04:53 +02:00
Ingo Molnar	94403f8863	perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES The new PERF_COUNT_HW_STALLED_CYCLES event tries to approximate cycles the CPU does nothing useful, because it is stalled on a cache-miss or some other condition. Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-fue11vymwqsoo5to72jxxjyl@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-26 20:04:53 +02:00
Ingo Molnar	7bd5fafeb4	Merge branch 'perf/urgent' into perf/stat Merge reason: We want to queue up dependent changes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-26 19:36:17 +02:00
Ingo Molnar	ec75a71634	perf events, x86: Work around the Nehalem AAJ80 erratum On Nehalem CPUs the retired branch-misses event can be completely bogus, when there are no branch-misses occuring. When there are a lot of branch misses then the count is pretty accurate. Still, this leaves us with an event that over-counts a lot. Detect this erratum and work it around by using BR_MISP_EXEC.ANY events. These will also count speculated branches but still it's a lot more precise in practice than the architectural event. Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-yyfg0bxo9jsqxd6a0ovfny27@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-26 19:34:34 +02:00
Peter Zijlstra	18a073a3ac	perf, x86: Fix BTS condition Currently the x86 backend incorrectly assumes that any BRANCH_INSN with sample_period==1 is a BTS request. This is not true when we do frequency driven profiling such as 'perf record -e branches'. Solves this error: $ perf record -e branches ./array Error: sys_perf_event_open() syscall returned with 95 (Operation not supported). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reported-by: Ingo Molnar <mingo@elte.hu> Cc: "Metzger, Markus T" <markus.t.metzger@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-rd2y4ct71hjawzz6fpvsy9hg@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-26 13:34:34 +02:00
H. Peter Anvin	39b68976ac	x86, setup: When probing memory with e801, use ax/bx as a pair When we use BIOS function e801 to probe memory, we should use ax/bx (or cx/dx) as a pair, not mix and match. This was a typo during the translation from assembly code, and breaks at least one set of machines in the field (which return cx = dx = 0). Reported-and-tested-by: Chris Samuel <chris@csamuel.org> Fix-proposed-by: Thomas Meyer <thomas@m3y3r.de> Link: http://lkml.kernel.org/r/1303566747.12067.10.camel@localhost.localdomain	2011-04-25 14:52:37 -07:00
Frederic Weisbecker	87dc669ba2	x86, hw_breakpoints: Fix racy access to ptrace breakpoints While the tracer accesses ptrace breakpoints, the child task may concurrently exit due to a SIGKILL and thus release its breakpoints at the same time. We can then dereference some freed pointers. To fix this, hold a reference on the child breakpoints before manipulating them. Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Will Deacon <will.deacon@arm.com> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: v2.6.33.. <stable@kernel.org> Link: http://lkml.kernel.org/r/1302284067-7860-3-git-send-email-fweisbec@gmail.com	2011-04-25 17:32:40 +02:00
Avi Kivity	15d6aba24d	x86: Demacro CONFIG_PARAVIRT cpu accessors Recently, we had a build failure on !CONFIG_PARAVIRT due to a callback ->wbinvd() clashing with a macro wbinvd(). While we worked around the issue, avoid it in the future by changing the macro (and a few surrounding ones) to an inline function. Signed-off-by: Avi Kivity <avi@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: http://lkml.kernel.org/r/1303632711-21662-1-git-send-email-avi@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-24 13:20:36 +02:00
Justin P. Mattock	fa7b69475a	perf events, x86, P4: Fix typo in comment Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Acked-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: trivial@kernel.org Link: http://lkml.kernel.org/r/1303492132-3004-1-git-send-email-justinmattock@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-24 13:16:04 +02:00
Linus Torvalds	686c4cbb10	Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM: Add missing syscore_suspend() and syscore_resume() calls PM: Fix error code paths executed after failing syscore_suspend()	2011-04-23 22:35:16 -07:00
Peter Zijlstra	f4929bd372	perf, x86: Update/fix Intel Nehalem cache events Change the Nehalem cache events to use retired memory instruction counters (similar to Westmere), this greatly improves the provided stats. Using: main () { int i; for (i = 0; i < 1000000000; i++) { asm("mov (%%rsp), %%rbx;" "mov %%rbx, (%%rsp);" : : : "rbx"); } } We find: $ perf stat --repeat 10 -e instructions:u -e l1-dcache-loads:u -e l1-dcache-stores:u ./loop_1b_loads+stores Performance counter stats for './loop_1b_loads+stores' (10 runs): 4,000,081,056 instructions:u # 0.000 IPC ( +- 0.000% ) 4,999,502,846 l1-dcache-loads:u ( +- 0.008% ) 1,000,034,832 l1-dcache-stores:u ( +- 0.000% ) 1.565184942 seconds time elapsed ( +- 0.005% ) The 5b is surprising - we'd expect 1b: $ perf stat --repeat 10 -e instructions:u -e r10b:u -e l1-dcache-stores:u ./loop_1b_loads+stores Performance counter stats for './loop_1b_loads+stores' (10 runs): 4,000,081,054 instructions:u # 0.000 IPC ( +- 0.000% ) 1,000,021,961 r10b:u ( +- 0.000% ) 1,000,030,951 l1-dcache-stores:u ( +- 0.000% ) 1.565055422 seconds time elapsed ( +- 0.003% ) Which this patch thus fixes. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Stephane Eranian <eranian@google.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Link: http://lkml.kernel.org/n/tip-q9rtru7b7840tws75xzboapv@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-22 13:50:27 +02:00
Cyrill Gorcunov	1ea5a6afd9	perf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflow It's not enough to simply disable event on overflow the cpuc->active_mask should be cleared as well otherwise counter may stall in "active" even in real being already disabled (which potentially may lead to the situation that user may not use this counter further). Don pointed out that: " I also noticed this patch fixed some unknown NMIs on a P4 when I stressed the box". Tested-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Link: http://lkml.kernel.org/r/1303398203-2918-3-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-22 10:21:34 +02:00
Cyrill Gorcunov	103b393481	perf, x86: P4 PMU -- Use perf_sample_data_init helper Instead of opencoded assignments better to use perf_sample_data_init helper. Tested-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Link: http://lkml.kernel.org/r/1303398203-2918-2-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-22 10:20:04 +02:00
Ingo Molnar	eff430de53	Merge branch 'linus' into perf/core Merge reason: Pick up upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-22 10:19:30 +02:00
Ingo Molnar	b52c55c6a2	x86, perf event: Turn off unstructured raw event access to offcore registers Andi Kleen pointed out that the Intel offcore support patches were merged without user-space tool support to the functionality: \| \| The offcore_msr perf kernel code was merged into 2.6.39-rc, but the \| user space bits were not. This made it impossible to set the extra mask \| and actually do the OFFCORE profiling \| Andi submitted a preliminary patch for user-space support, as an extension to perf's raw event syntax: \| \| Some raw events -- like the Intel OFFCORE events -- support additional \| parameters. These can be appended after a ':'. \| \| For example on a multi socket Intel Nehalem: \| \| perf stat -e r1b7:20ff -a sleep 1 \| \| Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0 \| that measures any access to DRAM on another socket. \| But this kind of usability is absolutely unacceptable - users should not be expected to type in magic, CPU and model specific incantations to get access to useful hardware functionality. The proper solution is to expose useful offcore functionality via generalized events - that way users do not have to care which specific CPU model they are using, they can use the conceptual event and not some model specific quirky hexa number. We already have such generalization in place for CPU cache events, and it's all very extensible. "Offcore" events measure general DRAM access patters along various parameters. They are particularly useful in NUMA systems. We want to support them via generalized DRAM events: either as the fourth level of cache (after the last-level cache), or as a separate generalization category. That way user-space support would be very obvious, memory access profiling could be done via self-explanatory commands like: perf record -e dram ./myapp perf record -e dram-remote ./myapp ... to measure DRAM accesses or more expensive cross-node NUMA DRAM accesses. These generalized events would work on all CPUs and architectures that have comparable PMU features. ( Note, these are just examples: actual implementation could have more sophistication and more parameter - as long as they center around similarly simple usecases. ) Now we do not want to revert all* of the current offcore bits, as they are still somewhat useful for generic last-level-cache events, implemented in this commit: `e994d7d23a`: perf: Fix LLC-* events on Intel Nehalem/Westmere But we definitely do not yet want to expose the unstructured raw events to user-space, until better generalization and usability is implemented for these hardware event features. ( Note: after generalization has been implemented raw offcore events can be supported as well: there can always be an odd event that is marginally useful but not useful enough to generalize. DRAM profiling is definitely not such a category so generalization must be done first. ) Furthermore, PERF_TYPE_RAW access to these registers was not intended to go upstream without proper support - it was a side-effect of the above `e994d7d23a` commit, not mentioned in the changelog. As v2.6.39 is nearing release we go for the simplest approach: disable the PERF_TYPE_RAW offcore hack for now, before it escapes into a released kernel and becomes an ABI. Once proper structure is implemented for these hardware events and users are offered usable solutions we can revisit this issue. Reported-by: Andi Kleen <ak@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-22 10:02:53 +02:00
Andi Kleen	b2508e828d	perf: Support Xeon E7's via the Westmere PMU driver There's a new model number public, 47, for Xeon E7 (aka Westmere EX). Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: a.p.zijlstra@chello.nl Link: http://lkml.kernel.org/r/1303429715-10202-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-22 08:27:29 +02:00
Ingo Molnar	42ac9e87fd	Merge commit 'v2.6.39-rc4' into sched/core Merge reason: Pick up upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-21 11:39:28 +02:00
Borislav Petkov	dffa4b2f62	x86, mce: Drop the default decoding notifier The default notifier doesn't make a lot of sense to call in the correctable errors case. Drop it and emit the mcelog decoding hint only in the uncorrectable errors case and when no notifier is registered. Also, limit issuing the "mcelog --ascii" message in the rare case when we dump unreported CEs before panicking. While at it, remove unused old x86_mce_decode_callback from the header. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Nagananda Chumbalkar <Nagananda.Chumbalkar@hp.com> Cc: Russ Anderson <rja@sgi.com> Link: http://lkml.kernel.org/r/20110420102349.GB1361@aftab Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-21 11:35:10 +02:00
David Rientjes	7a6c654782	x86, numa: Fix cpu nodemasks for NUMA emulation and CONFIG_DEBUG_PER_CPU_MAPS The cpu<->node mappings under CONFIG_DEBUG_PER_CPU_MAPS=y when NUMA emulation is enabled is currently broken because it does not iterate through every emulated node and bind cpus that have affinity to it. NUMA emulation should bind each cpu to every local node to accurately represent the true NUMA topology of the underlying machine. debug_cpumask_set_cpu() needs to be fixed at the same time so that the debugging information that it emits shows the new cpumask of the node being assigned when the cpu is being added or removed. It can now take responsibility of setting or clearing the cpu itself to remove the need for duplicate code. Also change its last parameter, "enable", to have the correct bool type since it can only be true or false. -v2: Fix the return statements, by Kosaki Motohiro Acked-and-Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Andreas Herrmann <herrmann.der.user@googlemail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1104201918470.12634@chino.kir.corp.google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-21 11:31:00 +02:00
David Rientjes	37f8527dbf	Revert "x86, NUMA: Fix fakenuma boot failure" Andreas Herrmann reported that `7d6b46707f` ("x86, NUMA: Fix fakenuma boot failure") causes certain physical NUMA topologies (for example AMD Magny-Cours) to move sibling cpus to a single node when in reality they are in separate domains. This may result in some nodes being completely void of cpus, which doesn't accurately represent the correct topology. The system will boot, but will have suboptimal NUMA performance. This commit was intended as a fix for NUMA emulation, but should not cause a regression for real NUMA machines as a side effect. ( There will be a separate fix for the numa-debug code, which will not affect physical topologies. ) Reported-by: Andreas Herrmann <herrmann.der.user@googlemail.com> Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1104201918110.12634@chino.kir.corp.google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-21 11:30:59 +02:00
Linus Torvalds	8653b3f1d5	Merge branch 'stable/bug-fixes-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/bug-fixes-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: mask_rw_pte: do not apply the early_ioremap checks on x86_32 xen: do not create the extra e820 region at an addr lower than 4G	2011-04-20 17:40:25 -07:00
Konrad Rzeszutek Wilk	8a91707d0a	xen/p2m: Add EXPORT_SYMBOL_GPL to the M2P override functions. If the backends, which use these two functions, are compiled as a module we need these two functions to be exported. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-04-20 11:56:57 -04:00
Stefano Stabellini	ee176455e2	xen: mask_rw_pte: do not apply the early_ioremap checks on x86_32 The two "is_early_ioremap_ptep" checks in mask_rw_pte are only used on x86_64, in fact early_ioremap is not used at all to setup the initial pagetable on x86_32. Moreover on x86_32 the two checks are wrong because the range pgt_buf_start..pgt_buf_end initially should be mapped RW because the pages in the range are not pagetable pages yet and haven't been cleared yet. Afterwards considering the pgt_buf_start..pgt_buf_end is part of the initial mapping, xen_alloc_pte is capable of turning the ptes RO when they become pagetable pages. Fix the issue and improve the readability of the code providing two different implementation of mask_rw_pte for x86_32 and x86_64. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-04-20 09:43:13 -04:00
Stefano Stabellini	24bdb0b62c	xen: do not create the extra e820 region at an addr lower than 4G Do not add the extra e820 region at a physical address lower than 4G because it breaks e820_end_of_low_ram_pfn(). It is OK for us to move the xen_extra_mem_start up and down because this is the index of the memory that can be ballooned in/out - it is memory not available to the kernel during bootup. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-04-20 09:04:40 -04:00
Rafael J. Wysocki	19234c0819	PM: Add missing syscore_suspend() and syscore_resume() calls Device suspend/resume infrastructure is used not only by the suspend and hibernate code in kernel/power, but also by APM, Xen and the kexec jump feature. However, commit `40dc166cb5` (PM / Core: Introduce struct syscore_ops for core subsystems PM) failed to add syscore_suspend() and syscore_resume() calls to that code, which generally leads to breakage when the features in question are used. To fix this problem, add the missing syscore_suspend() and syscore_resume() calls to arch/x86/kernel/apm_32.c, kernel/kexec.c and drivers/xen/manage.c. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Ian Campbell <ian.campbell@citrix.com>	2011-04-20 00:36:11 +02:00
Linus Torvalds	9d914b3ef3	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, gart: Make sure GART does not map physmem above 1TB x86, gart: Set DISTLBWALKPRB bit always x86, gart: Convert spaces to tabs in enable_gart_translation	2011-04-19 10:58:13 -07:00
Linus Torvalds	96ad999918	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, x86: Fix AMD family 15h FPU event constraints perf, x86: Fix pre-defined cache-misses event for AMD family 15h cpus perf evsel: Fix use of inherit perf hists browser: Fix seg fault when annotate null symbol	2011-04-19 10:56:02 -07:00
Borislav Petkov	7b70bd3441	x86, MCE: Do not taint when handling correctable errors Correctable errors are considered something rather normal on modern hardware these days. Even more importantly, correctable errors mean exactly that - they've been corrected by the hardware - and there's no need to taint the kernel since execution hasn't been compromised so far. Also, drop tainting in the thermal throttling code for a similar reason: crossing a thermal threshold does not mean corruption. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Tony Luck <tony.luck@intel.com> Acked-by: Nagananda Chumbalkar <Nagananda.Chumbalkar@hp.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Russ Anderson <rja@sgi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1303135222-17118-1-git-send-email-bp@amd64.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-19 19:14:13 +02:00
Youquan Song	2b398bd9f8	x86, apic: Print verbose error interrupt reason on apic=debug End users worry about the error interrupt printout we generate currently: pr_debug("APIC error on CPU%d: %02x(%02x)\n", smp_processor_id(), v , v1); ... and would like to know the reason why error interrupts are generated. This patch prints out more detailed debug information. Another practical problem is that dynamic debug is not initialized yet when the APIC initializes, so the pr_debug() will not output the error interrupt debug information on bootup. In this patch, we use apic_printk(APIC_DEBUG, ...), so the apic=debug boot option will print verbose error interupts during bootup. Signed-off-by: Youquan Song <youquan.song@intel.com> Cc: Joe Perches <joe@perches.com> Cc: hpa@linux.intel.com Cc: suresh.b.siddha@intel.com Cc: yong.y.wang@linux.intel.com Cc: jbaron@redhat.com Cc: trenn@suse.de Cc: kent.liu@intel.com Cc: chaohong.guo@intel.com Link: http://lkml.kernel.org/r/1302762968-24380-2-git-send-email-youquan.song@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-19 19:03:30 +02:00
Robert Richter	c8e5910edf	perf, x86: Use ALTERNATIVE() to check for X86_FEATURE_PERFCTR_CORE Using ALTERNATIVE() when checking for X86_FEATURE_PERFCTR_CORE avoids an extra pointer chase and data cache hit. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1302913676-14352-4-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-19 10:08:12 +02:00
Robert Richter	855357a217	perf, x86: Fix AMD family 15h FPU event constraints Depending on the unit mask settings some FPU events may be scheduled only on cpu counter #3. This patch fixes this. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@googlemail.com> Link: http://lkml.kernel.org/r/1302913676-14352-3-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-19 10:07:55 +02:00
Andre Przywara	83112e688f	perf, x86: Fix pre-defined cache-misses event for AMD family 15h cpus With AMD cpu family 15h a unit mask was introduced for the Data Cache Miss event (0x041/L1-dcache-load-misses). We need to enable bit 0 (first data cache miss or streaming store to a 64 B cache line) of this mask to proper count data cache misses. Now we set this bit for all families and models. In case a PMU does not implement a unit mask for event 0x041 the bit is ignored. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1302913676-14352-2-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-19 10:07:54 +02:00
Ingo Molnar	68d2cf25d3	Merge branch 'perf/urgent' into perf/core Merge reason: we'll be queueing up dependent changes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-19 07:56:17 +02:00
H. Peter Anvin	d8d9766c8c	x86, cpu: Change NOP selection for certain Intel CPUs Due to a decoder implementation quirk, some specific Intel CPUs actually perform better with the "k8_nops" than with the SDM-recommended NOPs. For runtime-selected NOPs, if we detect those specific CPUs then use the k8_nops instead of the ones we would normally use. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Tejun Heo <tj@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Link: http://lkml.kernel.org/r/1303166160-10315-4-git-send-email-hpa@linux.intel.com	2011-04-18 16:40:32 -07:00
H. Peter Anvin	dc326fca2b	x86, cpu: Clean up and unify the NOP selection infrastructure Clean up and unify the NOP selection infrastructure: - Make the atomic 5-byte NOP a part of the selection system. - Pick NOPs once during early boot and then be done with it. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Tejun Heo <tj@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Link: http://lkml.kernel.org/r/1303166160-10315-3-git-send-email-hpa@linux.intel.com	2011-04-18 16:40:21 -07:00
H. Peter Anvin	b1e7734f02	x86, percpu: Use ASM_NOP4 instead of hardcoding P6_NOP4 For use in assembly constants, use the ASM_NOP* defines. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1303166160-10315-2-git-send-email-hpa@linux.intel.com	2011-04-18 16:40:06 -07:00
Joerg Roedel	c387aa3a1a	x86, gart: Don't enforce GART aperture lower-bound by alignment This patch changes the allocation of the GART aperture to enforce only natural alignment instead of aligning it on 512MB. This big alignment was used to force the GART aperture to be over 512MB. This is enforced by using 512MB as the lower-bound address in the allocation range. [ hpa: The actual number 512 MiB needs to be revisited, too. ] Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/1303134346-5805-2-git-send-email-joerg.roedel@amd.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-18 10:31:38 -07:00
Joerg Roedel	665d3e2af8	x86, gart: Make sure GART does not map physmem above 1TB The GART can only map physical memory below 1TB. Make sure the gart driver in the kernel does not try to map memory above 1TB. Cc: <stable@kernel.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/1303134346-5805-5-git-send-email-joerg.roedel@amd.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-18 09:26:49 -07:00
Joerg Roedel	c34151a742	x86, gart: Set DISTLBWALKPRB bit always The DISTLBWALKPRB bit must be set for the GART because the gatt table is mapped UC. But the current code does not set the bit at boot when the BIOS setup the aperture correctly. Fix that by setting this bit when enabling the GART instead of the other places. Cc: <stable@kernel.org> Cc: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/1303134346-5805-4-git-send-email-joerg.roedel@amd.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-18 09:26:48 -07:00
Joerg Roedel	af289bfe15	x86, gart: Convert spaces to tabs in enable_gart_translation Probably by copy&paste this function was indented by spaces. Convert this to tabs. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/1303134346-5805-3-git-send-email-joerg.roedel@amd.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-18 09:26:48 -07:00
Konrad Rzeszutek Wilk	cf8d91633d	xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override. We only supported the M2P (and P2M) override only for the GNTMAP_contains_pte type mappings. Meaning that we grants operations would "contain the machine address of the PTE to update" If the flag is unset, then the grant operation is "contains a host virtual address". The latter case means that the Hypervisor takes care of updating our page table (specifically the PTE entry) with the guest's MFN. As such we should not try to do anything with the PTE. Previous to this patch we would try to clear the PTE which resulted in Xen hypervisor being upset with us: (XEN) mm.c:1066:d0 Attempt to implicitly unmap a granted PTE c0100000ccc59067 (XEN) domain_crash called from mm.c:1067 (XEN) Domain 0 (vcpu#0) crashed on cpu#3: (XEN) ----[ Xen-4.0-110228 x86_64 debug=y Not tainted ]---- and crashing us. This patch allows us to inhibit the PTE clearing in the PV guest if the GNTMAP_contains_pte is not set. On the m2p_remove_override path we provide the same parameter. Sadly in the grant-table driver we do not have a mechanism to tell m2p_remove_override whether to clear the PTE or not. Since the grant-table driver is used by user-space, we can safely assume that it operates only on PTE's. Hence the implementation for it to work on !GNTMAP_contains_pte returns -EOPNOTSUPP. In the future we can implement the support for this. It will require some extra accounting structure to keep track of the page[i], and the flag. [v1: Added documentation details, made it return -EOPNOTSUPP instead of trying to do a half-way implementation] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-04-18 11:10:27 -04:00
Linus Torvalds	fdfc552abe	Merge branches 'core-fixes-for-linus', 'perf-fixes-for-linus', 'sched-fixes-for-linus', 'timer-fixes-for-linus' and 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futex: Set FLAGS_HAS_TIMEOUT during futex_wait restart setup * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf_event: Fix cgrp event scheduling bug in perf_enable_on_exec() perf: Fix a build error with some GCC versions * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix erroneous all_pinned logic sched: Fix sched-domain avg_load calculation * 'timer-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: RTC: rtc-mrst: follow on to the change of rtc_device_register() RTC: add missing "return 0" in new alarm func for rtc-bfin.c RTC: Fix s3c compile error due to missing s3c_rtc_setpie RTC: Fix early irqs caused by calling rtc_set_alarm too early * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, amd: Disable GartTlbWlkErr when BIOS forgets it x86, NUMA: Fix fakenuma boot failure x86/mrst: Fix boot crash caused by incorrect pin to irq mapping x86/ce4100: Add reg property to bridges	2011-04-16 09:45:08 -07:00
Joerg Roedel	5bbc097d89	x86, amd: Disable GartTlbWlkErr when BIOS forgets it This patch disables GartTlbWlk errors on AMD Fam10h CPUs if the BIOS forgets to do is (or is just too old). Letting these errors enabled can cause a sync-flood on the CPU causing a reboot. The AMD BKDG recommends disabling GART TLB Wlk Error completely. This patch is the fix for https://bugzilla.kernel.org/show_bug.cgi?id=33012 on my machine. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/20110415131152.GJ18463@8bytes.org Tested-by: Alexandre Demers <alexandre.f.demers@gmail.com> Cc: <stable@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-04-15 16:03:16 -07:00
KOSAKI Motohiro	7d6b46707f	x86, NUMA: Fix fakenuma boot failure Currently, numa=fake boot parameter is broken. If it's used, kernel may panic due to devide by zero error depending on CPU configuration Call Trace: [<ffffffff8104ad4c>] find_busiest_group+0x38c/0xd30 [<ffffffff81086aff>] ? local_clock+0x6f/0x80 [<ffffffff81050533>] load_balance+0xa3/0x600 [<ffffffff81050f53>] idle_balance+0xf3/0x180 [<ffffffff81550092>] schedule+0x722/0x7d0 [<ffffffff81550538>] ? wait_for_common+0x128/0x190 [<ffffffff81550a65>] schedule_timeout+0x265/0x320 [<ffffffff81095815>] ? lock_release_holdtime+0x35/0x1a0 [<ffffffff81550538>] ? wait_for_common+0x128/0x190 [<ffffffff8109bb6c>] ? __lock_release+0x9c/0x1d0 [<ffffffff815534e0>] ? _raw_spin_unlock_irq+0x30/0x40 [<ffffffff815534e0>] ? _raw_spin_unlock_irq+0x30/0x40 [<ffffffff81550540>] wait_for_common+0x130/0x190 [<ffffffff81051920>] ? try_to_wake_up+0x510/0x510 [<ffffffff8155067d>] wait_for_completion+0x1d/0x20 [<ffffffff8107f36c>] kthread_create_on_node+0xac/0x150 [<ffffffff81077bb0>] ? process_scheduled_works+0x40/0x40 [<ffffffff8155045f>] ? wait_for_common+0x4f/0x190 [<ffffffff8107a283>] __alloc_workqueue_key+0x1a3/0x590 [<ffffffff81e0cce2>] cpuset_init_smp+0x6b/0x7b [<ffffffff81df3d07>] kernel_init+0xc3/0x182 [<ffffffff8155d5e4>] kernel_thread_helper+0x4/0x10 [<ffffffff81553cd4>] ? retint_restore_args+0x13/0x13 [<ffffffff81df3c44>] ? start_kernel+0x400/0x400 [<ffffffff8155d5e0>] ? gs_change+0x13/0x13 The divede by zero is caused by the following line, group->cpu_power==0: kernel/sched_fair.c::update_sg_lb_stats() /* Adjust by relative CPU power of the group / sgs->avg_load = (sgs->group_load SCHED_LOAD_SCALE) / group->cpu_power; This regression was caused by commit `e23bba6044` ("x86-64, NUMA: Unify emulated distance mapping") because it changes cpu -> node mapping in the process of dropping fake_physnodes(). old) all cpus are assinged node 0 now) cpus are assigned round robin (the logic is implemented by numa_init_array()) Note: The change in behavior only happens if the system doesn't have neither ACPI SRAT table nor AMD northbridge NUMA information. Round robin assignment doesn't work because init_numa_sched_groups_power() assumes all logical cpus in the same physical cpu share the same node (then it only accounts for group_first_cpu()), and the simple round robin breaks the above assumption. Thus, this patch implements a reassignment of node-ids if buggy firmware or numa emulation makes wrong cpu node map. Tt enforce all logical cpus in the same physical cpu share the same node. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/20110415203928.1303.A69D9226@jp.fujitsu.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-15 20:28:19 +02:00
Konrad Rzeszutek Wilk	beafbdc1df	xen/irq: Check if the PCI device is owned by a domain different than DOMID_SELF. We check if there is a domain owner for the PCI device. In case of failure (meaning no domain has registered for this device) we make DOMID_SELF the owner. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v2: deal with rebasing on v2.6.37-1] [v3: deal with rebasing on stable/irq.cleanup] [v4: deal with rebasing on stable/irq.ween_of_nr_irqs] [v5: deal with rebasing on v2.6.39-rc3] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>	2011-04-14 11:17:36 -04:00
Konrad Rzeszutek Wilk	c55fa78b13	xen/pci: Add xen_[find\|register\|unregister]_device_domain_owner functions. When the Xen PCI backend is told to enable or disable MSI/MSI-X functions, the initial domain performs these operations. The initial domain needs to know which domain (guest) is going to use the PCI device so when it makes the appropiate hypercall to retrieve the MSI/MSI-X vector it will also assign the PCI device to the appropiate domain (guest). This boils down to us needing a mechanism to find, set and unset the domain id that will be using the device. [v2: EXPORT_SYMBOL -> EXPORT_SYMBOL_GPL.] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-04-14 11:16:54 -04:00
Peter Zijlstra	184748cc50	sched: Provide scheduler_ipi() callback in response to smp_send_reschedule() For future rework of try_to_wake_up() we'd like to push part of that function onto the CPU the task is actually going to run on. In order to do so we need a generic callback from the existing scheduler IPI. This patch introduces such a generic callback: scheduler_ipi() and implements it as a NOP. BenH notes: PowerPC might use this IPI on offline CPUs under rare conditions! Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Reviewed-by: Frank Rowand <frank.rowand@am.sony.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110405152728.744338123@chello.nl	2011-04-14 08:52:32 +02:00
Linus Torvalds	aaa119a3d4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: fix XEN_SAVE_RESTORE Kconfig dependencies PM / Hibernate: Introduce CONFIG_HIBERNATE_CALLBACKS	2011-04-12 17:18:05 -07:00
Joerg Roedel	58fc7f1419	x86/amd-iommu: Add support for invalidate_all command This patch adds support for the invalidate_all command present in new versions of the AMD IOMMU. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-04-12 09:21:58 +02:00
Joerg Roedel	d99ddec3ee	x86/amd-iommu: Add extended feature detection This patch adds detection of the extended features of an AMD IOMMU. The available features are printed to dmesg on boot. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-04-12 09:21:51 +02:00
Jacob Pan	9d90e49da5	x86/mrst: Fix boot crash caused by incorrect pin to irq mapping Moorestown systems crash on boot because the secondary CPU clockevent (apbt1) will fail to request irq#1, which does not have ioapic chip in its irq_desc[] entry. Background: Moorestown platform does not have ISA bus nor legacy IRQs. It reuses the range of legacy IRQs for regular device interrupts. The routing information of early system device IRQs (timers) are obtained from firmware provided SFI tables. We reuse/fake MP configuration table to facilitate IRQ setup with IOAPIC. Maintaining a 1:1 mapping of IOAPIC pin (RTE entry) and IRQ# makes routing information clean and easy to understand on Moorestown. Though optional. This patch allows SFI timer and vRTC IRQ to be treated as ISA IRQ so that pin2irq mapping will be 1:1. Also fixed MP table type and use macros to clearly set MP IRQ entries. As a result, apbt timer and RTC interrupts on Moorestown are within legacy IRQ range: # cat /proc/interrupts CPU0 CPU1 0: 11249 0 IO-APIC-edge apbt0 1: 0 12271 IO-APIC-edge apbt1 8: 887 0 IO-APIC-fasteoi dw_spi 13: 0 0 IO-APIC-fasteoi INTEL_MID_DMAC2 14: 0 0 IO-APIC-fasteoi rtc0 Further discussion of this patch can be found at: https://lkml.org/lkml/2010/6/10/70 Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Alan Cox <alan@linux.intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Link: http://lkml.kernel.org/r/1302286980-21139-1-git-send-email-jacob.jun.pan@linux.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-12 08:38:52 +02:00
Linus Torvalds	16ad56972c	Merge branch 'stable/bug-fixes-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/bug-fixes-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: Allow PV-OPS kernel to detect whether XSAVE is supported xen: just completely disable XSAVE xen/debug: Don't be so verbose with WARN on 1-1 mapping errors. xen: events: fix error checks in bind_*_to_irqhandler()	2011-04-11 20:01:04 -07:00
Shriram Rajagopalan	d419e4c0f7	fix XEN_SAVE_RESTORE Kconfig dependencies Make XEN_SAVE_RESTORE select HIBERNATE_CALLBACKS. Remove XEN_SAVE_RESTORE dependency from PM_SLEEP. Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-04-11 22:54:48 +02:00
Sebastian Andrzej Siewior	30d746c680	x86/ce4100: Add reg property to bridges without the reg property Ben's new code won't find the PCI & ISA bridge and the devices won't get the DT-node attached. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: davem@davemloft.net Cc: monstr@monstr.eu Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Link: http://lkml.kernel.org/r/20110407121315.GA9204@linutronix.de Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-11 17:37:02 +02:00
Joerg Roedel	fd7b5535e1	x86/amd-iommu: Add ATS enable/disable code This patch adds the necessary code to the AMD IOMMU driver for enabling and disabling the ATS capability on a device and to setup the IOMMU data structures correctly. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-04-11 09:04:04 +02:00
Joerg Roedel	60f723b411	x86/amd-iommu: Add flag to indicate IOTLB support This patch adds a flag to the AMD IOMMU driver to indicate that all IOMMUs present in the system support device IOTLBs. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-04-11 09:04:03 +02:00
Joerg Roedel	cb41ed85ef	x86/amd-iommu: Flush device IOTLB if ATS is enabled This patch implements a function to flush the IOTLB on devices supporting ATS and makes sure that this TLB is also flushed if necessary. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-04-11 09:04:03 +02:00
Joerg Roedel	9844b4e5dd	x86/amd-iommu: Select PCI_IOV with AMD IOMMU driver In order to support ATS in the AMD IOMMU driver this patch makes sure that the generic support for ATS is compiled in. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-04-11 09:04:02 +02:00
Ian Campbell	ce9c99af8d	x86, cpu: Move AMD Elan Kconfig under "Processor family" Currently the option resides under X86_EXTENDED_PLATFORM due to historical nonstandard A20M# handling. However that is no longer the case and so Elan can be treated as part of the standard processor choice Kconfig option. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Link: http://lkml.kernel.org/r/1302245177.31620.47.camel@localhost.localdomain Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-04-08 13:01:25 -07:00
Linus Torvalds	8b9686ff4d	Merge branches 'x86-fixes-for-linus', 'sched-fixes-for-linus', 'timers-fixes-for-linus', 'irq-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32, fpu: Fix FPU exception handling on non-SSE systems x86, hibernate: Initialize mmu_cr4_features during boot x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change x86: visws: Fixup irq overhaul fallout * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Clean up rebalance_domains() load-balance interval calculation * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86/mrst/vrtc: Fix boot crash in mrst_rtc_init() rtc, x86/mrst/vrtc: Fix boot crash in rtc_read_alarm() * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq: Fix cpumask leak in __setup_irq() * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf probe: Fix listing incorrect line number with inline function perf probe: Fix to find recursively inlined function perf probe: Fix multiple --vars options behavior perf probe: Fix to remove redundant close perf probe: Fix to ensure function declared file	2011-04-07 12:12:58 -07:00
Feng Tang	09552b2696	x86/mrst/vrtc: Fix boot crash in mrst_rtc_init() The sfi_mrtc_array[] only gets initialized when the sfi mrtc table is parsed, so the vrtc_paddr should be initalized after it too. Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1302140389-27603-1-git-send-email-feng.tang@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-07 11:27:42 +02:00
Joerg Roedel	7d0c5cc5be	x86/amd-iommu: Flush all internal TLBs when IOMMUs are enabled The old code only flushed a DTE or a domain TLB before it is actually used by the IOMMU driver. While this is efficient and works when done right it is more likely to introduce new bugs when changing code (which happened in the past). This patch adds code to flush all DTEs and all domain TLBs in each IOMMU right after it is enabled (at boot and after resume). This reduces the complexity of the driver and makes it less likely to introduce stale-TLB bugs in the future. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-04-07 11:04:32 +02:00
Joerg Roedel	d8c1308577	x86/amd-iommu: Rename iommu_flush_device This function operates on a struct device, so give it a name that represents that. As a side effect a new function is introduced which operates on am iommu and a device-id. It will be used again in a later patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-04-07 10:46:07 +02:00
Joerg Roedel	ac0ea6e92b	x86/amd-iommu: Improve handling of full command buffer This patch improved the handling of commands when the IOMMU command buffer is nearly full. In this case it issues an completion wait command and waits until the IOMMU has processed it before continuing queuing new commands. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-04-07 10:46:07 +02:00
Joerg Roedel	17b124bf14	x86/amd-iommu: Rename iommu_flush* to domain_flush* These functions all operate on protection domains and not on singe IOMMUs. Represent that in their name. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-04-07 10:46:06 +02:00
Joerg Roedel	61985a040f	x86/amd-iommu: Remove command buffer resetting logic The logic to reset the command buffer caused more problems than it actually helped. The logic jumped in when the IOMMU hardware doesn't execute commands anymore but the reasons for this are usually not fixed by just resetting the command buffer. So the code can be removed to reduce complexity. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-04-07 10:46:05 +02:00
Joerg Roedel	815b33fdc2	x86/amd-iommu: Cleanup completion-wait handling This patch cleans up the implementation of completion-wait command sending. It also switches the completion indicator from the MMIO bit to a memory store which can be checked without IOMMU locking. As a side effect this patch makes the __iommu_queue_command function obsolete and so it is removed too. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-04-07 10:46:00 +02:00
Tejun Heo	993ba1585c	x86-32, numa: Update remap allocator comments Now that remap allocator is cleaned up, update comments such that they are in docbook function description format and reflect the actual implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-15-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:56 -07:00
Tejun Heo	198bd06bbf	x86-32, numa: Remove redundant node_remap_size[] Remap area size can be determined from node_remap_start_vaddr[] and node_remap_end_vaddr[] making node_remap_size[] redundant. Remove it. While at it, make resume_map_numa_kva() use @nr_pages for number of pages instead of @size. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-14-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:50 -07:00
Tejun Heo	1d85b61baf	x86-32, numa: Remove now useless node_remap_offset[] With lowmem address reservation moved into init_alloc_remap(), node_remap_offset[] is no longer useful. Remove it and related offset handling code. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-13-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:44 -07:00
Tejun Heo	b2e3e4fa3e	x86-32, numa: Make pgdat allocation use alloc_remap() pgdat allocation is handled differnetly from other remap allocations - it's reserved during initialization. There's no reason to handle this any differnetly. Remap allocator is initialized for every node and if init failed the allocation will fail and pgdat allocation can fall back to generic code like anyone else. Remove special init-time pgdat reservation and make allocate_pgdat() use alloc_remap() like everyone else. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-12-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:39 -07:00
Tejun Heo	2a286344f0	x86-32, numa: Move remapping for remap allocator into init_alloc_remap() There's no reason to perform the actual remapping separately. Collapse remap_numa_kva() into init_alloc_remap() and, while at it, make it less verbose. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-11-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:33 -07:00
Tejun Heo	0e9f93c1c0	x86-32, numa: Move lowmem address space reservation to init_alloc_remap() Remap alloc init is done in the following stages. 1. init_alloc_remap() calculates how much memory is necessary for each node and reserves node local memory. 2. initmem_init() collects how much each node needs and reserves a single contiguous lowmem area which can contain all. 3. init_remap_allocator() initializes allocator parameters from the determined lowmem address and per-node offsets. 4. Actual remap happens. There is no reason for the lowmem remap area to be reserved as a single contiguous area at one go. They don't interact with each other and the memblock allocator will put them side-by-side anyway. This patch breaks up the single lowmem address reservation and put per-node lowmem address reservation into init_alloc_remap() and initializes allocator parameters directly in the function as all the addresses are determined there. This merges steps 2 and 3 into 1. While at it, remove now largely irrelevant comments in init_alloc_remap(). This change causes the following behavior changes. * Remap lowmem areas are allocated in smaller per-node chunks. * Remap lowmem area reservation failure fail future remap allocations instead of panicking. * Remap allocator initialization is less verbose. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-10-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:27 -07:00
Tejun Heo	82044c328d	x86-32, numa: Make init_alloc_remap() less panicky Remap allocator failure isn't fatal. The callers are required to fall back to regular early memory allocation mechanisms on failure anyway, so there's no reason to panic on remap init failure. Whining and returning are enough. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-9-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:21 -07:00
Tejun Heo	7210cf9217	x86-32, numa: Calculate remap size in common code Only pgdat and memmap use remap area and there isn't much benefit in allowing per-node override. In addition, the use of node_remap_size[] is confusing in that it contains number of bytes before remap initialization and then number of pages afterwards. Move remap size calculation for memap from specific NUMA config implementations to init_alloc_remap() and make node_remap_size[] static. The only behavior difference is that, before this patch, numaq_32 didn't consider max_pfn when calculating the memmap size but it's enforced after this patch, which is the right thing to do. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-8-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:16 -07:00
Tejun Heo	af7c1a6e83	x86-32, numa: Make @size in init_aloc_remap() represent bytes @size variable in init_alloc_remap() is confusing in that it starts as number of bytes as its name implies and then becomes number of pages. Make it consistently represent bytes. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-7-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:11 -07:00
Tejun Heo	c4d4f577d4	x86-32, numa: Rename @node_kva to @node_pa in init_alloc_remap() init_alloc_remap() is about to do more and using _kva suffix for physical address becomes confusing because the function will be handling both physical and virtual addresses. Rename @node_kva to @node_pa. This is trivial rename and doesn't cause any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-6-git-send-email-tj@kernel.org Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:04 -07:00
Tejun Heo	5510db9c1b	x86-32, numa: Reorganize calculate_numa_remap_page() Separate the outer node walking loop and per-node logic from calculate_numa_remap_pages(). The outer loop is collapsed into initmem_init() and the per-node logic is moved into a new function - init_alloc_remap(). The new function name is confusing with the existing init_remap_allocator() and the behavior is the function isn't very clean either at this point, but this is to prepare for further cleanups and it will become prettier. This function doesn't introduce any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-5-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:01 -07:00
Tejun Heo	5b8443b25c	x86-32, numa: Remove redundant top-down alloc code from remap initialization memblock_find_in_range() now does top-down allocation by default, so there's no reason for its callers to explicitly implement it by gradually lowering the start address. Remove redundant top-down allocation logic from init_meminit() and calculate_numa_remap_pages(). Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-4-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:56:57 -07:00
Tejun Heo	a6c24f7a70	x86-32, numa: Align pgdat size while initializing alloc_remap When pgdat is reserved in init_remap_allocator(), PAGE_SIZE aligned size will be used. Match the size alignment in initialization to avoid allocation failure down the road. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-3-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:56:52 -07:00
Tejun Heo	3fe14ab541	x86-32, numa: Fix failure condition check in alloc_remap() node_remap_{start\|end}_vaddr[] describe [start, end) ranges; however, alloc_remap() incorrectly failed when the current allocation + size equaled the end but it should fail only when it goes over. Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-2-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:56:46 -07:00
Hans Rosenfeld	f994d99cf1	x86-32, fpu: Fix FPU exception handling on non-SSE systems On 32bit systems without SSE (that is, they use FSAVE/FRSTOR for FPU context switches), FPU exceptions in user mode cause Oopses, BUGs, recursive faults and other nasty things: fpu exception: 0000 [#1] last sysfs file: /sys/power/state Modules linked in: psmouse evdev pcspkr serio_raw [last unloaded: scsi_wait_scan] Pid: 1638, comm: fxsave-32-excep Not tainted 2.6.35-07798-g58a992b-dirty #633 VP3-596B-DD/VT82C597 EIP: 0060:[<c1003527>] EFLAGS: 00010202 CPU: 0 EIP is at math_error+0x1b4/0x1c8 EAX: 00000003 EBX: cf9be7e0 ECX: 00000000 EDX: cf9c5c00 ESI: cf9d9fb4 EDI: c1372db3 EBP: 00000010 ESP: cf9d9f1c DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 Process fxsave-32-excep (pid: 1638, ti=cf9d8000 task=cf9be7e0 task.ti=cf9d8000) Stack: 00000000 00000301 00000004 00000000 00000000 cf9d3000 cf9da8f0 00000001 <0> 00000004 cf9b6b60 c1019a6b c1019a79 00000020 00000242 000001b6 cf9c5380 <0> cf806b40 cf791880 00000000 00000282 00000282 c108a213 00000020 cf9c5380 Call Trace: [<c1019a6b>] ? need_resched+0x11/0x1a [<c1019a79>] ? should_resched+0x5/0x1f [<c108a213>] ? do_sys_open+0xbd/0xc7 [<c108a213>] ? do_sys_open+0xbd/0xc7 [<c100353b>] ? do_coprocessor_error+0x0/0x11 [<c12d5965>] ? error_code+0x65/0x70 Code: a8 20 74 30 c7 44 24 0c 06 00 03 00 8d 54 24 04 89 d9 b8 08 00 00 00 e8 9b 6d 02 00 eb 16 8b 93 5c 02 00 00 eb 05 e9 04 ff ff ff <9b> dd 32 9b e9 16 ff ff ff 81 c4 84 00 00 00 5b 5e 5f 5d c3 c6 EIP: [<c1003527>] math_error+0x1b4/0x1c8 SS:ESP 0068:cf9d9f1c This usually continues in slight variations until the system is reset. This bug was introduced by commit `58a992b9cb`: x86-32, fpu: Rewrite fpu_save_init() Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> Link: http://lkml.kernel.org/r/1302106003-366952-1-git-send-email-hans.rosenfeld@amd.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 16:53:01 -07:00
H. Peter Anvin	4da9484bde	x86, hibernate: Initialize mmu_cr4_features during boot Restore the initialization of mmu_cr4_features during boot, which was removed without comment in checkin `e5f15b45dd` x86: Cleanup highmap after brk is concluded thereby breaking resume from hibernate. This restores previous functionality in approximately the same place, and corrects the reading of %cr4 on pre-CPUID hardware (%cr4 exists if and only if CPUID is supported.) However, part of the problem is that the hibernate suspend/resume sequence should manage the save/restore of %cr4 explicitly. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <201104020154.57136.rjw@sisk.pl>	2011-04-06 13:10:02 -07:00
Shan Haitao	947ccf9c3c	xen: Allow PV-OPS kernel to detect whether XSAVE is supported Xen fails to mask XSAVE from the cpuid feature, despite not historically supporting guest use of XSAVE. However, now that XSAVE support has been added to Xen, we need to reliably detect its presence. The most reliable way to do this is to look at the OSXSAVE feature in cpuid which is set iff the OS (Xen, in this case), has set CR4.OSXSAVE. [ Cleaned up conditional a bit. - Jeremy ] Signed-off-by: Shan Haitao <haitao.shan@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-04-06 08:31:13 -04:00
Jeremy Fitzhardinge	61f4237d5b	xen: just completely disable XSAVE Some (old) versions of Xen just kill the domain if it tries to set any unknown bits in CR4, so we can't reliably probe for OSXSAVE in CR4. Since Xen doesn't support XSAVE for guests at the moment, and no such support is being worked on, there's no downside in just unconditionally masking XSAVE support. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-04-06 08:31:00 -04:00
Andre Przywara	bd22f5cfcf	KVM: move and fix substitue search for missing CPUID entries If KVM cannot find an exact match for a requested CPUID leaf, the code will try to find the closest match instead of simply confessing it's failure. The implementation was meant to satisfy the CPUID specification, but did not properly check for extended and standard leaves and also didn't account for the index subleaf. Beside that this rule only applies to CPUID intercepts, which is not the only user of the kvm_find_cpuid_entry() function. So fix this algorithm and call it from kvm_emulate_cpuid(). This fixes a crash of newer Linux kernels as KVM guests on AMD Bulldozer CPUs, where bogus values were returned in response to a CPUID intercept. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-04-06 13:15:56 +03:00
Andre Przywara	20800bc940	KVM: fix XSAVE bit scanning When KVM scans the 0xD CPUID leaf for propagating the XSAVE save area leaves, it assumes that the leaves are contigious and stops at the first zero one. On AMD hardware there is a gap, though, as LWP uses leaf 62 to announce it's state save area. So lets iterate through all 64 possible leaves and simply skip zero ones to also cover later features. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-04-06 13:15:55 +03:00
Joerg Roedel	11b6402c66	x86/amd-iommu: Cleanup inv_pages command handling This patch reworks the processing of invalidate-pages commands to the IOMMU. The function building the the command is extended so we can get rid of another function. It was also renamed to match with the other function names. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-04-06 11:49:28 +02:00
Joerg Roedel	94fe79e2f1	x86/amd-iommu: Move inv-dte command building to own function This patch moves command building for the invalidate-dte command into its own function. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-04-06 11:44:15 +02:00
Joerg Roedel	ded467374a	x86/amd-iommu: Move compl-wait command building to own function This patch introduces a seperate function for building completion-wait commands. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-04-06 10:53:48 +02:00
Matthew Garrett	660e34cebf	x86: Reorder reboot method preferences We have a never ending stream of 'reboot quirks' for new boxes that will not reboot properly under Linux (they will hang on reboot). The reason is widespread 'Windows compatible' assumption of modern x86 hardware, which expects the following reboot sequence: - hitting the ACPI reboot vector (if available) - trying the keyboard controller - hitting the ACPI reboot vector again - then giving the keyboard controller one last go This sequence expectation gets more and more embedded in modern hardware, which often lacks a keyboard controller and may even lock up if the legacy io ports are hit - and which hardware is often not tested with Linux during development. The end result is that reboot works under Windows-alike OSs but not under Linux. Rework our reboot process to meet this hardware externality a little better and match this assumption of newer x86 hardware. In addition to the ACPI,kbd,ACPI,kbd sequence we'll still fall through to attempting a legacy triple fault if nothing else works - and keep trying that and the kbd reset. Signed-off-by: Matthew Garrett <mjg@redhat.com> [ this commit will also save special casing Oaktrail boards ] Acked-by: Alan Cox <alan@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Leann Ogasawara <leann.ogasawara@canonical.com> Cc: Dave Jones <davej@redhat.com> Cc: Len Brown <len.brown@intel.com> LKML-Reference: <1301939705-2404-1-git-send-email-mjg@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-06 10:36:50 +02:00
Konrad Rzeszutek Wilk	d88885d092	xen/debug: Don't be so verbose with WARN on 1-1 mapping errors. There are valid situations in which this error is not a warning. Mainly when QEMU maps a guest memory and uses the VM_IO flag to set the MFNs. For right now make the WARN be WARN_ONCE. In the future we will: 1). Remove the VM_IO code handling.. 2). .. which will also remove this debug facility. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-04-04 14:48:20 -04:00
Jason Baron	ef64789413	jump label: Add _ASM_ALIGN for x86 and x86_64 The linker should not be adding holes to word size aligned pointers, but out of paranoia we are explicitly specifying that alignment. I have not seen any holes in the jump label section in practice. Signed-off-by: Jason Baron <jbaron@redhat.com> LKML-Reference: <e119fbd060c9452c56063ea6148ba1070e7434cc.1300299760.git.jbaron@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-04-04 13:42:51 -04:00
Jason Baron	d430d3d7e6	jump label: Introduce static_branch() interface Introduce: static __always_inline bool static_branch(struct jump_label_key key); instead of the old JUMP_LABEL(key, label) macro. In this way, jump labels become really easy to use: Define: struct jump_label_key jump_key; Can be used as: if (static_branch(&jump_key)) do unlikely code enable/disale via: jump_label_inc(&jump_key); jump_label_dec(&jump_key); that's it! For the jump labels disabled case, the static_branch() becomes an atomic_read(), and jump_label_inc()/dec() are simply atomic_inc(), atomic_dec() operations. We show testing results for this change below. Thanks to H. Peter Anvin for suggesting the 'static_branch()' construct. Since we now require a 'struct jump_label_key key', we can store a pointer into the jump table addresses. In this way, we can enable/disable jump labels, in basically constant time. This change allows us to completely remove the previous hashtable scheme. Thanks to Peter Zijlstra for this re-write. Testing: I ran a series of 'tbench 20' runs 5 times (with reboots) for 3 configurations, where tracepoints were disabled. jump label configured in avg: 815.6 jump label not configured in (using atomic reads) avg: 800.1 jump label not configured in (regular reads) avg: 803.4 Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20110316212947.GA8792@redhat.com> Signed-off-by: Jason Baron <jbaron@redhat.com> Suggested-by: H. Peter Anvin <hpa@linux.intel.com> Tested-by: David Daney <ddaney@caviumnetworks.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-04-04 12:48:08 -04:00
Linus Torvalds	d7c764c4c7	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, UV: Fix kdump reboot x86, amd-nb: Rename CPU PCI id define for F4 sound: Add delay.h to sound/soc/codecs/sn95031.c x86, mtrr, pat: Fix one cpu getting out of sync during resume x86, microcode: Unregister syscore_ops after microcode unloaded x86: Stop including <linux/delay.h> in two asm header files	2011-04-04 08:37:45 -07:00
Linus Torvalds	4da7e90e65	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Fix task_struct reference leak perf: Fix task context scheduling perf: mmap 512 kiB by default perf: Rebase max unprivileged mlock threshold on top of page size perf tools: Fix NO_NEWT=1 python build error perf symbols: Properly align symbol_conf.priv_size perf tools: Emit clearer message for sys_perf_event_open ENOENT return perf tools: Fixup exit path when not able to open events perf symbols: Fix vsyscall symbol lookup oprofile, x86: Allow setting EDGE/INV/CMASK for counter events	2011-04-04 08:36:40 -07:00
Linus Torvalds	fb9a7d76da	Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: create new rcu_access_index() and use in mce WARN_ON_SMP(): Add comment to explain ({0;})	2011-04-04 08:36:15 -07:00
Rakib Mullick	64d21fc194	x86, mpparse: Put check_slot() into .init section check_slot() is only called from replace_intsrc_all() - which is in the .init section. So, put check_slot into the .init section as well, so it can be freed after system boot. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> LKML-Reference: <AANLkTing52ntzRcHkODCWDKOfRF=0uhXw5-cCUhx6M54@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-04 17:17:19 +02:00
Tejun Heo	765af22da8	x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change Commit `d8fc3afc49` (x86, NUMA: Move *_numa_init() invocations into initmem_init()) moved acpi_numa_init() call into NUMA initmem_init() but forgot to update 32bit NUMA init breaking ACPI NUMA configuration for 32bit. acpi_numa_init() call was later moved again to srat_64.c. Match it by adding the call to get_memcfg_from_srat() in srat_32.c. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <20110404100645.GE1420@mtj.dyndns.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-04 16:56:33 +02:00
Thomas Gleixner	43a6246f9c	x86: visws: Fixup irq overhaul fallout Reported-by: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-04-04 16:51:15 +02:00
Ingo Molnar	9402efaa96	Merge branch 'x86-mm' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into x86/mm	2011-04-04 16:39:20 +02:00
Florian Mickler	711b8c87a5	x86-64, NUMA: Remove unused variable In case !CONFIG_ACPI_NUMA and !CONFIG_AMD_NUMA gcc emits a warning about the unused variable ret. As that variable is in fact not needed I choose to remove it. Signed-off-by: Florian Mickler <florian@mickler.org> LKML-Reference: <1301843624-22364-1-git-send-email-florian@mickler.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-04-04 01:21:00 +02:00
Paul E. McKenney	a4dd99250d	rcu: create new rcu_access_index() and use in mce The MCE subsystem needs to sample an RCU-protected index outside of any protection for that index. If this was a pointer, we would use rcu_access_pointer(), but there is no corresponding rcu_access_index(). This commit therefore creates an rcu_access_index() and applies it to MCE. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Zdenek Kabelac <zkabelac@redhat.com>	2011-04-01 07:27:31 -07:00
Tejun Heo	3b16651f80	x86: Clean up memory model related configs in arch/x86/Kconfig * Remove bogus dependency on ARCH_SELECT_MEMORY_MODEL from ARCH_FLATMEM_ENABLE. ENABLE configs don't interfere with SELECT_MEMORY_MODEL. They just need to indicate whether the specific memory model is supported. * Relocate HAVE_ARCH_ALLOC_REMAP, ARCH_PROC_KCORE_TEXT and ARCH_SPARSEMEM_DEFAULT so that memory model related configs are together in consistent order. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de>	2011-04-01 11:15:12 +02:00
Tejun Heo	052936080c	x86-64, NUMA: Remove custom phys_to_nid() implementation phys_to_nid() maps physical address to NUMA node id. This is implemented by building perfect hash in compute_hash_shift() during initialization. However, with SPARSE memory model, the nid is encoded in page flags. The perfect hash implementation was for DISCONTIG memory model which got removed years ago by `b263295dbf` (x86: 64-bit, make sparsemem vmemmap the only memory model). So, the perfect hash ends up being used only during initialization when the core SPARSE code already provides perfectly acceptable generic early_pfn_to_nid() implementation. Drop phys_to_nid() and use the generic ealry_pfn_to_nid() instead. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de>	2011-04-01 11:15:12 +02:00
Cliff Wickman	818987e9a1	x86, UV: Fix kdump reboot After a crash dump on an SGI Altix UV system the crash kernel fails to cause a reboot. EFI mode is disabled in the kdump kernel, so only the reboot_type of BOOT_ACPI works. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: rja@sgi.com LKML-Reference: <E1Q5Iuo-00013b-UK@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-31 18:44:03 +02:00
Borislav Petkov	cb6c8520f6	x86, amd-nb: Rename CPU PCI id define for F4 With increasing number of PCI function ids, add the PCI function id in the define name instead of its symbolic name in the BKDG for more clarity. This renames function 4 define. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> LKML-Reference: <20110330183447.GA3668@aftab> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-31 08:51:38 +02:00
Ingo Molnar	9f644c4ba8	Merge commit 'v2.6.39-rc1' into perf/urgent Merge reason: use the post-merge-window tree. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-30 09:07:43 +02:00
Ingo Molnar	ae18cbfe9f	Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core	2011-03-30 09:06:51 +02:00
Suresh Siddha	84ac7cdbdd	x86, mtrr, pat: Fix one cpu getting out of sync during resume On laptops with core i5/i7, there were reports that after resume graphics workloads were performing poorly on a specific AP, while the other cpu's were ok. This was observed on a 32bit kernel specifically. Debug showed that the PAT init was not happening on that AP during resume and hence it contributing to the poor workload performance on that cpu. On this system, resume flow looked like this: 1. BP starts the resume sequence and we reinit BP's MTRR's/PAT early on using mtrr_bp_restore() 2. Resume sequence brings all AP's online 3. Resume sequence now kicks off the MTRR reinit on all the AP's. 4. For some reason, between point 2 and 3, we moved from BP to one of the AP's. My guess is that printk() during resume sequence is contributing to this. We don't see similar behavior with the 64bit kernel but there is no guarantee that at this point the remaining resume sequence (after AP's bringup) has to happen on BP. 5. set_mtrr() was assuming that we are still on BP and skipped the MTRR/PAT init on that cpu (because of 1 above) 6. But we were on an AP and this led to not reprogramming PAT on this cpu leading to bad performance. Fix this by doing unconditional mtrr_if->set_all() in set_mtrr() during MTRR/PAT init. This might be unnecessary if we are still running on BP. But it is of no harm and will guarantee that after resume, all the cpu's will be in sync with respect to the MTRR/PAT registers. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1301438292-28370-1-git-send-email-eric@anholt.net> Signed-off-by: Eric Anholt <eric@anholt.net> Tested-by: Keith Packard <keithp@keithp.com> Cc: stable@kernel.org [v2.6.32+] Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-03-29 16:17:42 -07:00
Thomas Gleixner	86cc8dfc21	x86: apb_timer: Fixup genirq fallout The lonely user of the internal interface was not in the coccinelle script. Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-03-30 00:13:30 +02:00
Linus Torvalds	90f1e7481e	Merge branch 'stable/bug-fixes-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/bug-fixes-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: Use new irq_move functions xen: Convert genirq namespace xen: fix p2m section mismatches xen/p2m: Allocate p2m tracking pages on override xen-gntdev: unlock on error path in gntdev_mmap() xen-gntdev: return -EFAULT on copy_to_user failure	2011-03-29 11:36:52 -07:00
Randy Dunlap	b83c6e55ac	xen: fix p2m section mismatches Fix section mismatch warnings: set_phys_range_identity() is called by __init xen_set_identity(), so also mark set_phys_range_identity() as __init. then: __early_alloc_p2m() is called set_phys_range_identity(), so also mark __early_alloc_p2m() as __init. WARNING: arch/x86/built-in.o(.text+0x7856): Section mismatch in reference from the function __early_alloc_p2m() to the function .init.text:extend_brk() The function __early_alloc_p2m() references the function __init extend_brk(). This is often because __early_alloc_p2m lacks a __init annotation or the annotation of extend_brk is wrong. WARNING: arch/x86/built-in.o(.text+0x7967): Section mismatch in reference from the function set_phys_range_identity() to the function .init.text:extend_brk() The function set_phys_range_identity() references the function __init extend_brk(). This is often because set_phys_range_identity lacks a __init annotation or the annotation of extend_brk is wrong. [v2: Per Stephen Hemming recommonedation made __early_alloc_p2m static] Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-29 10:01:03 -04:00
Xiaotian Feng	4ac5fc6a3e	x86, microcode: Unregister syscore_ops after microcode unloaded Currently, microcode doesn't unregister syscore_ops after it's unloaded. So if we modprobe then rmmod microcode, the stale microcode syscore_ops info will stay on syscore_ops_list. Later when we're trying to reboot/halt/shutdown the machine, kernel will panic on syscore_shutdown(). With the patch applied, I can reboot/halt/shutdown my machine successfully. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Rafael J. Wysocki <rjw@sisk.pl> LKML-Reference: <1301387672-23661-1-git-send-email-dfeng@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-29 11:12:04 +02:00
Christoph Lameter	fe5042138b	x86: Use this_cpu_has for thermal_interrupt current cpu It is more effective to use a segment prefix instead of calculating the address of the current cpu area amd then testing flags. Signed-off-by: Christoph Lameter <cl@linux.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-03-29 10:18:30 +02:00
Christoph Lameter	349c004e3d	x86: A fast way to check capabilities of the current cpu Add this_cpu_has() which determines if the current cpu has a certain ability using a segment prefix and a bit test operation. For that we need to add bit operations to x86s percpu.h. Many uses of cpu_has use a pointer passed to a function to determine the current flags. That is no longer necessary after this patch. However, this patch only converts the straightforward cases where cpu_has is used with this_cpu_ptr. The rest is work for later. -tj: Rolled up patch to add x86_ prefix and use percpu_read() instead of percpu_read_stable(). Signed-off-by: Christoph Lameter <cl@linux.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-03-29 10:18:30 +02:00
Jean Delvare	ca444564a9	x86: Stop including <linux/delay.h> in two asm header files Stop including <linux/delay.h> in x86 header files which don't need it. This will let the compiler complain when this header is not included by source files when it should, so that contributors can fix the problem before building on other architectures starts to fail. Credits go to Geert for the idea. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: James E.J. Bottomley <James.Bottomley@suse.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> LKML-Reference: <20110325152014.297890ec@endymion.delvare> [ this also fixes an upstream build bug in drivers/media/rc/ite-cir.c ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-29 09:37:42 +02:00
Cyrill Gorcunov	1d321881af	perf, x86: P4 PMU - clean up the code a bit No change on the functional level, just align the table properly. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <4D8FA213.5050108@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-29 09:36:33 +02:00
Ingo Molnar	1dfd7b494b	Merge branch 'core' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/urgent	2011-03-29 09:32:28 +02:00
Linus Torvalds	036a98263a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: aesni-intel - fixed problem with packets that are not multiple of 64bytes	2011-03-28 13:07:49 -07:00
Linus Torvalds	551b0bda46	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6 * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: mfd: Clean up max8997 IRQ namespace mfd: Fold irq_set_chip/irq_set_handler mfd: Cleanup irq namespace mfd: twl6030: Cleanup interrupt handling mfd: twl4030: Cleanup interrupt handling mfd: mx8925: Remove irq_desc leftovers mfd: htc-i2cpld: Cleanup interrupt handling mfd: htc-egpio: Cleanup interrupt handling mfd: ezx-pcap: Remvove open coded irq handling mfd: 88pm860x: Remove unused irq_desc leftovers mfd: asic3: Cleanup irq handling mfd: Select MFD_CORE if TPS6105X driver is configured mfd: Add MODULE_DEVICE_TABLE to rdc321x-southbridge mfd: Add MAX8997/8966 IRQ control mfd: Constify i2c_device_id tables mfd: OLPC: Clean up names to match what OLPC actually uses mfd: Add mfd_clone_cell(), convert cs5535-mfd/olpc-xo1 to it	2011-03-27 20:07:01 -07:00
Christoph Lameter	d7c3f8cee8	percpu: Omit segment prefix in the UP case for cmpxchg_double Omit the segment prefix in the UP case. GS is not used then and we will generate segfaults if cmpxchg16b is used otherwise. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-27 19:25:36 -07:00
Tadeusz Struk	60af520cf2	crypto: aesni-intel - fixed problem with packets that are not multiple of 64bytes This patch fixes problem with packets that are not multiple of 64bytes. Signed-off-by: Adrian Hoban <adrian.hoban@intel.com> Signed-off-by: Aidan O'Mahony <aidan.o.mahony@intel.com> Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2011-03-27 10:29:39 +08:00
Daniel Drake	adfa4bd4a8	mfd: OLPC: Clean up names to match what OLPC actually uses The cs5535-pms cell doesn't actually need to be cloned, so we can drop that and simply have the olpc-xo1.c driver use "cs5535-pms" directly. Also, rename the cs5535-acpi clones to what we actually use for the (currently out-of-tree) SCI driver. In the process, that fixes a subtle bug in olpc-xo1.c which broke powerdown on XO-1s.. olpc-xo1-ac-acpi was a typo, not something that actually existed. Signed-off-by: Daniel Drake <dsd@laptop.org> Signed-off-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2011-03-27 00:09:31 +01:00
Andres Salomon	fa1df69168	mfd: Add mfd_clone_cell(), convert cs5535-mfd/olpc-xo1 to it Replace mfd_shared_platform_driver_register with mfd_clone_cell. The former was called by an mfd client, and registered both a platform driver and device. The latter is called by an mfd driver, and registers only a platform device. The downside of this is that mfd drivers need to be modified whenever new clients are added that share a cell; the upside is that it fits Linux's driver model better. It's also simpler. This also converts cs5535-mfd/olpc-xo1 from the old API. cs5535-mfd now creates the olpc-xo1-{acpi,pms} devices, while olpc-xo1 binds to them via platform drivers. Signed-off-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2011-03-27 00:09:30 +01:00
Linus Torvalds	16c29dafcc	Merge branch 'syscore' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'syscore' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: Introduce ARCH_NO_SYSDEV_OPS config option (v2) cpufreq: Use syscore_ops for boot CPU suspend/resume (v2) KVM: Use syscore_ops instead of sysdev class and sysdev PCI / Intel IOMMU: Use syscore_ops instead of sysdev class and sysdev timekeeping: Use syscore_ops instead of sysdev class and sysdev x86: Use syscore_ops instead of sysdev classes and sysdevs	2011-03-25 21:07:59 -07:00
Linus Torvalds	95e14ed7fc	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: kdb: add usage string of 'per_cpu' command kgdb,x86_64: fix compile warning found with sparse kdb: code cleanup to use macro instead of value kgdboc,kgdbts: strlen() doesn't count the terminator	2011-03-25 21:04:56 -07:00
Linus Torvalds	2a20b02c05	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, x86: Complain louder about BIOSen corrupting CPU/PMU state and continue perf, x86: P4 PMU - Read proper MSR register to catch unflagged overflows perf symbols: Look at .dynsym again if .symtab not found perf build-id: Add quirk to deal with perf.data file format breakage perf session: Pass evsel in event_ops->sample() perf: Better fit max unprivileged mlock pages for tools needs perf_events: Fix stale ->cgrp pointer in update_cgrp_time_from_cpuctx() perf top: Fix uninitialized 'counter' variable tracing: Fix set_ftrace_filter probe function display perf, x86: Fix Intel fixed counters base initialization	2011-03-25 17:53:09 -07:00
Linus Torvalds	94df491c4a	Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futex: Fix WARN_ON() test for UP WARN_ON_SMP(): Allow use in if() statements on UP x86, dumpstack: Use %pB format specifier for stack trace vsprintf: Introduce %pB format specifier lockdep: Remove unused 'factor' variable from lockdep_stats_show()	2011-03-25 17:52:22 -07:00
Linus Torvalds	26ff6801f7	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: DT: Cleanup namespace and call irq_set_irq_type() unconditional x86: DT: Fix return condition in irq_create_of_mapping() x86, mpparse: Move check_slot into CONFIG_X86_IO_APIC context	2011-03-25 17:51:51 -07:00
Jason Wessel	21431c2900	kgdb,x86_64: fix compile warning found with sparse Fix sparse warning: arch/x86/kernel/kgdb.c:123:9: warning: switch with no cases Reported-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2011-03-25 16:37:31 -05:00
Ingo Molnar	45daae575e	perf, x86: Complain louder about BIOSen corrupting CPU/PMU state and continue Eric Dumazet reported that hardware PMU events do not work on his system, due to the BIOS corrupting PMU state: Performance Events: PEBS fmt0+, Core2 events, Broken BIOS detected, using software events only. [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 186 is 43003c) Linus suggested that we continue in the face of such BIOS-induced CPU state corruption: http://lkml.org/lkml/2011/3/24/608 Such BIOSes will have to be fixed - Linux developers rely on a working and fully capable PMU and the BIOS interfering with the CPU's PMU state is simply not acceptable. So this patch changes perf to continue when it detects such BIOS interaction, some hardware events may be unreliable due to the BIOS writing and re-writing them - there's not much the kernel can do about that but to detect the corruption and report it. Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-25 11:23:41 +01:00
Thomas Gleixner	07611dbda5	x86: DT: Cleanup namespace and call irq_set_irq_type() unconditional That call escaped the name space cleanup. Fix it up. We really want to call there. The chip might have changed since the irq was setup initially. So let the core code and the chip decide what to do. The status is just an unreliable snapshot. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2011-03-24 23:17:56 +01:00
Thomas Gleixner	00a30b254b	x86: DT: Fix return condition in irq_create_of_mapping() The xlate() function returns 0 or a negative error code. Returning the error code blindly will be seen as an huge irq number by the calling function because irq_create_of_mapping() returns an unsigned value. Return 0 (NO_IRQ) as required. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2011-03-24 23:17:56 +01:00
Don Zickus	242214f9c1	perf, x86: P4 PMU - Read proper MSR register to catch unflagged overflows The read of a proper MSR register was missed and instead of counter the configration register was tested (it has ARCH_P4_UNFLAGGED_BIT always cleared) leading to unknown NMI hitting the system. As result the user may obtain "Dazed and confused, but trying to continue" message. Fix it by reading a proper MSR register. When an NMI happens on a P4, the perf nmi handler checks the configuration register to see if the overflow bit is set or not before taking appropriate action. Unfortunately, various P4 machines had a broken overflow bit, so a backup mechanism was implemented. This mechanism checked to see if the counter rolled over or not. A previous commit that implemented this backup mechanism was broken. Instead of reading the counter register, it used the configuration register to determine if the counter rolled over or not. Reading that bit would give incorrect results. This would lead to 'Dazed and confused' messages for the end user when using the perf tool (or if the nmi watchdog is running). The fix is to read the counter register before determining if the counter rolled over or not. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <4D8BAB49.3080701@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-24 21:40:01 +01:00
Andi Kleen	914a76ca5e	oprofile, x86: Allow setting EDGE/INV/CMASK for counter events For some performance events it's useful to set the EDGE and INV bits and the CMASK mask in the counter control register. The list of predefined events Intel releases for each CPU has some events which require these settings to get more "natural" to use higher level events. oprofile currently doesn't allow this. This patch adds new extra configuration fields for them, so that they can be specified in oprofilefs. An updated oprofile daemon can then make use of this to set them. v2: Write back masked extra value to variable. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Robert Richter <robert.richter@amd.com>	2011-03-24 18:45:44 +01:00

... 3 4 5 6 7 ...

13302 Commits