linux

Author	SHA1	Message	Date
Ingo Molnar	932fed4e2e	Merge commit 'v2.6.39-rc7' into perf/core Merge reason: pull in the latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-10 17:05:45 +02:00
Ingo Molnar	57d524154f	Merge branch 'perf/stat' into perf/core Merge reason: the perf stat improvements are tested and ready now. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-06 21:07:38 +02:00
Peter Zijlstra	63b6a6758e	perf events, x86: Fix Intel Nehalem and Westmere last level cache event definitions The Intel Nehalem offcore bits implemented in: `e994d7d23a`: perf: Fix LLC-* events on Intel Nehalem/Westmere ... are wrong: they implemented _ACCESS as _HIT and counted OTHER_CORE_HIT* as MISS even though its clearly documented as an L3 hit ... Fix them and the Westmere definitions as well. Cc: Andi Kleen <ak@linux.intel.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1299119690-13991-3-git-send-email-ming.m.lin@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-06 11:24:48 +02:00
Lin Ming	e04d1b23f9	perf events, x86: Add SandyBridge stalled-cycles-frontend/backend events Extend the Intel SandyBridge PMU driver with definitions for generic front-end and back-end stall events. ( As commit `3011203` "perf events, x86: Add Westmere stalled-cycles-frontend/backend events" says, these are only approximations. ) Signed-off-by: Lin Ming <ming.m.lin@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1304666042-17577-1-git-send-email-ming.m.lin@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-06 09:37:03 +02:00
Ingo Molnar	4d70230bb4	Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into perf/urgent	2011-05-06 08:11:28 +02:00
Ingo Molnar	98bb318864	Merge branch 'perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/urgent	2011-05-04 20:33:42 +02:00
Linus Torvalds	609cfda586	Merge branch 'stable/bug-fixes-for-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/bug-fixes-for-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: mask_rw_pte mark RO all pagetable pages up to pgt_buf_top xen/mmu: Add workaround "x86-64, mm: Put early page table high"	2011-05-03 09:25:42 -07:00
H. Peter Anvin	7806a49ab6	x86, reboot: Fix relocations in reboot_32.S The use of base for %ebx in this file is arbitrary, except that we also use it to compute the real-mode segment. Therefore, make it so that r_base really is the true address to which %ebx points. This resolves kernel bugzilla 33302. Reported-and-tested-by: Alexey Zaytsev <alexey.zaytsev@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/n/tip-08os5wi3yq1no0y4i5m4z7he@git.kernel.org	2011-05-02 14:44:46 -07:00
Stefano Stabellini	b9269dc7bf	xen: mask_rw_pte mark RO all pagetable pages up to pgt_buf_top mask_rw_pte is currently checking if a pfn is a pagetable page if it falls in the range pgt_buf_start - pgt_buf_end but that is incorrect because pgt_buf_end is a moving target: pgt_buf_top is the real boundary. Acked-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-05-02 16:33:52 -04:00
Konrad Rzeszutek Wilk	a38647837a	xen/mmu: Add workaround "x86-64, mm: Put early page table high" As a consequence of the commit: commit `4b239f458c` Author: Yinghai Lu <yinghai@kernel.org> Date: Fri Dec 17 16:58:28 2010 -0800 x86-64, mm: Put early page table high it causes the Linux kernel to crash under Xen: mapping kernel into physical memory Xen: setup ISA identity maps about to get started... (XEN) mm.c:2466:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn b1d89 (pfn bacf7) (XEN) mm.c:3027:d0 Error while pinning mfn b1d89 (XEN) traps.c:481:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000] (XEN) domain_crash_sync called from entry.S (XEN) Domain 0 (vcpu#0) crashed on cpu#0: ... The reason is that at some point init_memory_mapping is going to reach the pagetable pages area and map those pages too (mapping them as normal memory that falls in the range of addresses passed to init_memory_mapping as argument). Some of those pages are already pagetable pages (they are in the range pgt_buf_start-pgt_buf_end) therefore they are going to be mapped RO and everything is fine. Some of these pages are not pagetable pages yet (they fall in the range pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they are going to be mapped RW. When these pages become pagetable pages and are hooked into the pagetable, xen will find that the guest has already a RW mapping of them somewhere and fail the operation. The reason Xen requires pagetables to be RO is that the hypervisor needs to verify that the pagetables are valid before using them. The validation operations are called "pinning" (more details in arch/x86/xen/mmu.c). In order to fix the issue we mark all the pages in the entire range pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation is completed only the range pgt_buf_start-pgt_buf_end is reserved by init_memory_mapping. Hence the kernel is going to crash as soon as one of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those ranges are RO). For this reason, this function is introduced which is called _after_ the init_memory_mapping has completed (in a perfect world we would call this function from init_memory_mapping, but lets ignore that). Because we are called _after_ init_memory_mapping the pgt_buf_[start, end,top] have all changed to new values (b/c another init_memory_mapping is called). Hence, the first time we enter this function, we save away the pgt_buf_start value and update the pgt_buf_[end,top]. When we detect that the "old" pgt_buf_start through pgt_buf_end PFNs have been reserved (so memblock_x86_reserve_range has been called), we immediately set out to RW the "old" pgt_buf_end through pgt_buf_top. And then we update those "old" pgt_buf_[end\|top] with the new ones so that we can redo this on the next pagetable. Acked-by: "H. Peter Anvin" <hpa@zytor.com> Reviewed-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> [v1: Updated with Jeremy's comments] [v2: Added the crash output] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-05-02 16:33:34 -04:00
Yinghai Lu	2be19102b7	x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo() numa_cleanup_meminfo() trims each memblk between low (0) and high (max_pfn) limits and discards empty ones. However, the emptiness detection incorrectly used equality test. If the start of a memblk is higher than max_pfn, it is empty but fails the equality test and doesn't get discarded. The condition triggers when max_pfn is lower than start of a NUMA node and results in memory misconfiguration - leading to WARN_ON()s and other funnies. The bug was discovered in devel branch where 32bit too uses this code path for NUMA init. If a node is above the addressing limit, max_pfn ends up lower than the node triggering this problem. The failure hasn't been observed on x86-64 but is still possible with broken hardware e820/NUMA info. As the fix is very low risk, it would be better to apply it even for 64bit. Fix it by using >= instead of ==. Signed-off-by: Yinghai Lu <yinghai@kernel.org> [ Extracted the actual fix from the original patch and rewrote patch description. ] Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20110501171204.GO29280@htj.dyndns.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-01 19:15:11 +02:00
Ingo Molnar	809435ff4f	Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core	2011-05-01 19:09:39 +02:00
Boris Ostrovsky	e20a2d205c	x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors Older AMD K8 processors (Revisions A-E) are affected by erratum 400 (APIC timer interrupts don't occur in C states greater than C1). This, for example, means that X86_FEATURE_ARAT flag should not be set for these parts. This addresses regression introduced by commit `b87cf80af3` ("x86, AMD: Set ARAT feature on AMD processors") where the system may become unresponsive until external interrupt (such as keyboard input) occurs. This results, for example, in time not being reported correctly, lack of progress on the system and other lockups. Reported-by: Joerg-Volker Peetz <jvpeetz@web.de> Tested-by: Joerg-Volker Peetz <jvpeetz@web.de> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Boris Ostrovsky <Boris.Ostrovsky@amd.com> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1304113663-6586-1-git-send-email-ostr@amd64.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-01 18:55:51 +02:00
Linus Torvalds	40a963502c	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, x86, nmi: Move LVT un-masking into irq handlers perf events, x86: Work around the Nehalem AAJ80 erratum perf, x86: Fix BTS condition ftrace: Build without frame pointers on Microblaze	2011-04-29 15:08:53 -07:00
Ingo Molnar	301120396b	perf events, x86: Add Westmere stalled-cycles-frontend/backend events Extend the Intel Westmere PMU driver with definitions for generic front-end and back-end stall events. ( These are only approximations. ) Reported-by: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-7y40wib8n008io7hjpn1dsrm@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-29 16:22:37 +02:00
Ingo Molnar	91fc4cc000	perf, x86: Add new stalled cycles events for Intel and AMD CPUs Extend the Intel and AMD event definitions with generic front-end and back-end stall events. ( These are only approximations - suggestions are welcome for better events. ) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-7y40wib8n001io7hjpn1dsrm@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-29 14:24:15 +02:00
Ingo Molnar	8f62242246	perf events: Add generic front-end and back-end stalled cycle event definitions Add two generic hardware events: front-end and back-end stalled cycles. These events measure conditions when the CPU is executing code but its capabilities are not fully utilized. Understanding such situations and analyzing them is an important sub-task of code optimization workflows. Both events limit performance: most front end stalls tend to be caused by branch misprediction or instruction fetch cachemisses, backend stalls can be caused by various resource shortages or inefficient instruction scheduling. Front-end stalls are the more important ones: code cannot run fast if the instruction stream is not being kept up. An over-utilized back-end can cause front-end stalls and thus has to be kept an eye on as well. The exact composition is very program logic and instruction mix dependent. We use the terms 'stall', 'front-end' and 'back-end' loosely and try to use the best available events from specific CPUs that approximate these concepts. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-7y40wib8n000io7hjpn1dsrm@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-29 14:23:58 +02:00
Sebastian Andrzej Siewior	1ff42c32c7	x86: ce4100: Configure IOAPIC pins for USB and SATA to level type The USB and SATA ioapic interrrupt pins are configured as edge type, but need to be level type interrupts to work correctly. [ tglx: Split out from the combo patch ] Cc: Torben Hohn <torbenh@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/%3C20110427143052.GA15211%40linutronix.de%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-04-28 11:38:30 +02:00
Sebastian Andrzej Siewior	20443598d9	x86: devicetree: Configure IOAPIC pin only once We use io_apic_setup_irq_pin() in order to configure pin's interrupt number polarity and type. This is done on every irq_create_of_mapping() which happens for instance during pci enable calls. Level typed interrupts are masked by default, edge are unmasked. On the first ->xlate() call the level interrupt is configured and masked. The driver calls request_irq() and the line is unmasked. Lets assume the interrupt line is shared with another device and we call pci_enable_device() for this device. The ->xlate() configures the pin again and it is masked. request_irq() does not unmask the line because it _is_ already unmasked according to its internal state. So the interrupt will never be unmasked again. This patch is based on an earlier work by Torben Hohn and solves the problem by configuring the pin only once. Since all devices must agree on the same type and polarity there is no point in configuring the pin more than once. [ tglx: Split out the ce4100 part into a separate patch ] Cc: Torben Hohn <torbenh@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/%3C20110427143052.GA15211%40linutronix.de%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-04-28 11:38:30 +02:00
Ingo Molnar	8a850cadca	perf event, x86: Use better stalled cycles metric Use the UOPS_EXECUTED.*,c=1,i=1 event on Intel CPUs - it is a rather good indicator of CPU execution stalls, more sensitive and more inclusive than the 0xa2 resource stalls event (which does not count nearly as many stall types). Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio7hjpn2dsrm@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-28 08:39:33 +02:00
Don Zickus	2bce5daca2	perf, x86, nmi: Move LVT un-masking into irq handlers It was noticed that P4 machines were generating double NMIs for each perf event. These extra NMIs lead to 'Dazed and confused' messages on the screen. I tracked this down to a P4 quirk that said the overflow bit had to be cleared before re-enabling the apic LVT mask. My first attempt was to move the un-masking inside the perf nmi handler from before the chipset NMI handler to after. This broke Nehalem boxes that seem to like the unmasking before the counters themselves are re-enabled. In order to keep this change simple for 2.6.39, I decided to just simply move the apic LVT un-masking to the beginning of all the chipset NMI handlers, with the exception of Pentium4's to fix the double NMI issue. Later on we can move the un-masking to later in the handlers to save a number of 'extra' NMIs on those particular chipsets. I tested this change on a P4 machine, an AMD machine, a Nehalem box, and a core2quad box. 'perf top' worked correctly along with various other small 'perf record' runs. Anything high stress breaks all the machines but that is a different problem. Thanks to various people for testing different versions of this patch. Reported-and-tested-by: Shaun Ruffell <sruffell@digium.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Link: http://lkml.kernel.org/r/1303900353-10242-1-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu> CC: Cyrill Gorcunov <gorcunov@gmail.com>	2011-04-27 17:59:11 +02:00
Ingo Molnar	32673822e4	Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core Conflicts: include/linux/perf_event.h Merge reason: pick up the latest jump-label enhancements, they are cooked ready. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-27 10:40:21 +02:00
Ingo Molnar	5c543e3c44	perf events, x86: Mark constrant tables read mostly Various constraint tables were not marked read-mostly. Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-wpqwwvmhxucy5e718wnamjiv@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-26 20:04:53 +02:00
Ingo Molnar	94403f8863	perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES The new PERF_COUNT_HW_STALLED_CYCLES event tries to approximate cycles the CPU does nothing useful, because it is stalled on a cache-miss or some other condition. Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-fue11vymwqsoo5to72jxxjyl@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-26 20:04:53 +02:00
Ingo Molnar	7bd5fafeb4	Merge branch 'perf/urgent' into perf/stat Merge reason: We want to queue up dependent changes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-26 19:36:17 +02:00
Ingo Molnar	ec75a71634	perf events, x86: Work around the Nehalem AAJ80 erratum On Nehalem CPUs the retired branch-misses event can be completely bogus, when there are no branch-misses occuring. When there are a lot of branch misses then the count is pretty accurate. Still, this leaves us with an event that over-counts a lot. Detect this erratum and work it around by using BR_MISP_EXEC.ANY events. These will also count speculated branches but still it's a lot more precise in practice than the architectural event. Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-yyfg0bxo9jsqxd6a0ovfny27@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-26 19:34:34 +02:00
Peter Zijlstra	18a073a3ac	perf, x86: Fix BTS condition Currently the x86 backend incorrectly assumes that any BRANCH_INSN with sample_period==1 is a BTS request. This is not true when we do frequency driven profiling such as 'perf record -e branches'. Solves this error: $ perf record -e branches ./array Error: sys_perf_event_open() syscall returned with 95 (Operation not supported). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reported-by: Ingo Molnar <mingo@elte.hu> Cc: "Metzger, Markus T" <markus.t.metzger@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-rd2y4ct71hjawzz6fpvsy9hg@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-26 13:34:34 +02:00
H. Peter Anvin	39b68976ac	x86, setup: When probing memory with e801, use ax/bx as a pair When we use BIOS function e801 to probe memory, we should use ax/bx (or cx/dx) as a pair, not mix and match. This was a typo during the translation from assembly code, and breaks at least one set of machines in the field (which return cx = dx = 0). Reported-and-tested-by: Chris Samuel <chris@csamuel.org> Fix-proposed-by: Thomas Meyer <thomas@m3y3r.de> Link: http://lkml.kernel.org/r/1303566747.12067.10.camel@localhost.localdomain	2011-04-25 14:52:37 -07:00
Frederic Weisbecker	87dc669ba2	x86, hw_breakpoints: Fix racy access to ptrace breakpoints While the tracer accesses ptrace breakpoints, the child task may concurrently exit due to a SIGKILL and thus release its breakpoints at the same time. We can then dereference some freed pointers. To fix this, hold a reference on the child breakpoints before manipulating them. Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Will Deacon <will.deacon@arm.com> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: v2.6.33.. <stable@kernel.org> Link: http://lkml.kernel.org/r/1302284067-7860-3-git-send-email-fweisbec@gmail.com	2011-04-25 17:32:40 +02:00
Justin P. Mattock	fa7b69475a	perf events, x86, P4: Fix typo in comment Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Acked-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: trivial@kernel.org Link: http://lkml.kernel.org/r/1303492132-3004-1-git-send-email-justinmattock@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-24 13:16:04 +02:00
Linus Torvalds	686c4cbb10	Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM: Add missing syscore_suspend() and syscore_resume() calls PM: Fix error code paths executed after failing syscore_suspend()	2011-04-23 22:35:16 -07:00
Peter Zijlstra	f4929bd372	perf, x86: Update/fix Intel Nehalem cache events Change the Nehalem cache events to use retired memory instruction counters (similar to Westmere), this greatly improves the provided stats. Using: main () { int i; for (i = 0; i < 1000000000; i++) { asm("mov (%%rsp), %%rbx;" "mov %%rbx, (%%rsp);" : : : "rbx"); } } We find: $ perf stat --repeat 10 -e instructions:u -e l1-dcache-loads:u -e l1-dcache-stores:u ./loop_1b_loads+stores Performance counter stats for './loop_1b_loads+stores' (10 runs): 4,000,081,056 instructions:u # 0.000 IPC ( +- 0.000% ) 4,999,502,846 l1-dcache-loads:u ( +- 0.008% ) 1,000,034,832 l1-dcache-stores:u ( +- 0.000% ) 1.565184942 seconds time elapsed ( +- 0.005% ) The 5b is surprising - we'd expect 1b: $ perf stat --repeat 10 -e instructions:u -e r10b:u -e l1-dcache-stores:u ./loop_1b_loads+stores Performance counter stats for './loop_1b_loads+stores' (10 runs): 4,000,081,054 instructions:u # 0.000 IPC ( +- 0.000% ) 1,000,021,961 r10b:u ( +- 0.000% ) 1,000,030,951 l1-dcache-stores:u ( +- 0.000% ) 1.565055422 seconds time elapsed ( +- 0.003% ) Which this patch thus fixes. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Stephane Eranian <eranian@google.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Link: http://lkml.kernel.org/n/tip-q9rtru7b7840tws75xzboapv@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-22 13:50:27 +02:00
Cyrill Gorcunov	1ea5a6afd9	perf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflow It's not enough to simply disable event on overflow the cpuc->active_mask should be cleared as well otherwise counter may stall in "active" even in real being already disabled (which potentially may lead to the situation that user may not use this counter further). Don pointed out that: " I also noticed this patch fixed some unknown NMIs on a P4 when I stressed the box". Tested-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Link: http://lkml.kernel.org/r/1303398203-2918-3-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-22 10:21:34 +02:00
Cyrill Gorcunov	103b393481	perf, x86: P4 PMU -- Use perf_sample_data_init helper Instead of opencoded assignments better to use perf_sample_data_init helper. Tested-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Link: http://lkml.kernel.org/r/1303398203-2918-2-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-22 10:20:04 +02:00
Ingo Molnar	eff430de53	Merge branch 'linus' into perf/core Merge reason: Pick up upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-22 10:19:30 +02:00
Ingo Molnar	b52c55c6a2	x86, perf event: Turn off unstructured raw event access to offcore registers Andi Kleen pointed out that the Intel offcore support patches were merged without user-space tool support to the functionality: \| \| The offcore_msr perf kernel code was merged into 2.6.39-rc, but the \| user space bits were not. This made it impossible to set the extra mask \| and actually do the OFFCORE profiling \| Andi submitted a preliminary patch for user-space support, as an extension to perf's raw event syntax: \| \| Some raw events -- like the Intel OFFCORE events -- support additional \| parameters. These can be appended after a ':'. \| \| For example on a multi socket Intel Nehalem: \| \| perf stat -e r1b7:20ff -a sleep 1 \| \| Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0 \| that measures any access to DRAM on another socket. \| But this kind of usability is absolutely unacceptable - users should not be expected to type in magic, CPU and model specific incantations to get access to useful hardware functionality. The proper solution is to expose useful offcore functionality via generalized events - that way users do not have to care which specific CPU model they are using, they can use the conceptual event and not some model specific quirky hexa number. We already have such generalization in place for CPU cache events, and it's all very extensible. "Offcore" events measure general DRAM access patters along various parameters. They are particularly useful in NUMA systems. We want to support them via generalized DRAM events: either as the fourth level of cache (after the last-level cache), or as a separate generalization category. That way user-space support would be very obvious, memory access profiling could be done via self-explanatory commands like: perf record -e dram ./myapp perf record -e dram-remote ./myapp ... to measure DRAM accesses or more expensive cross-node NUMA DRAM accesses. These generalized events would work on all CPUs and architectures that have comparable PMU features. ( Note, these are just examples: actual implementation could have more sophistication and more parameter - as long as they center around similarly simple usecases. ) Now we do not want to revert all* of the current offcore bits, as they are still somewhat useful for generic last-level-cache events, implemented in this commit: `e994d7d23a`: perf: Fix LLC-* events on Intel Nehalem/Westmere But we definitely do not yet want to expose the unstructured raw events to user-space, until better generalization and usability is implemented for these hardware event features. ( Note: after generalization has been implemented raw offcore events can be supported as well: there can always be an odd event that is marginally useful but not useful enough to generalize. DRAM profiling is definitely not such a category so generalization must be done first. ) Furthermore, PERF_TYPE_RAW access to these registers was not intended to go upstream without proper support - it was a side-effect of the above `e994d7d23a` commit, not mentioned in the changelog. As v2.6.39 is nearing release we go for the simplest approach: disable the PERF_TYPE_RAW offcore hack for now, before it escapes into a released kernel and becomes an ABI. Once proper structure is implemented for these hardware events and users are offered usable solutions we can revisit this issue. Reported-by: Andi Kleen <ak@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-22 10:02:53 +02:00
Andi Kleen	b2508e828d	perf: Support Xeon E7's via the Westmere PMU driver There's a new model number public, 47, for Xeon E7 (aka Westmere EX). Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: a.p.zijlstra@chello.nl Link: http://lkml.kernel.org/r/1303429715-10202-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-22 08:27:29 +02:00
David Rientjes	7a6c654782	x86, numa: Fix cpu nodemasks for NUMA emulation and CONFIG_DEBUG_PER_CPU_MAPS The cpu<->node mappings under CONFIG_DEBUG_PER_CPU_MAPS=y when NUMA emulation is enabled is currently broken because it does not iterate through every emulated node and bind cpus that have affinity to it. NUMA emulation should bind each cpu to every local node to accurately represent the true NUMA topology of the underlying machine. debug_cpumask_set_cpu() needs to be fixed at the same time so that the debugging information that it emits shows the new cpumask of the node being assigned when the cpu is being added or removed. It can now take responsibility of setting or clearing the cpu itself to remove the need for duplicate code. Also change its last parameter, "enable", to have the correct bool type since it can only be true or false. -v2: Fix the return statements, by Kosaki Motohiro Acked-and-Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Andreas Herrmann <herrmann.der.user@googlemail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1104201918470.12634@chino.kir.corp.google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-21 11:31:00 +02:00
David Rientjes	37f8527dbf	Revert "x86, NUMA: Fix fakenuma boot failure" Andreas Herrmann reported that `7d6b46707f` ("x86, NUMA: Fix fakenuma boot failure") causes certain physical NUMA topologies (for example AMD Magny-Cours) to move sibling cpus to a single node when in reality they are in separate domains. This may result in some nodes being completely void of cpus, which doesn't accurately represent the correct topology. The system will boot, but will have suboptimal NUMA performance. This commit was intended as a fix for NUMA emulation, but should not cause a regression for real NUMA machines as a side effect. ( There will be a separate fix for the numa-debug code, which will not affect physical topologies. ) Reported-by: Andreas Herrmann <herrmann.der.user@googlemail.com> Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1104201918110.12634@chino.kir.corp.google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-21 11:30:59 +02:00
Linus Torvalds	8653b3f1d5	Merge branch 'stable/bug-fixes-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/bug-fixes-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: mask_rw_pte: do not apply the early_ioremap checks on x86_32 xen: do not create the extra e820 region at an addr lower than 4G	2011-04-20 17:40:25 -07:00
Stefano Stabellini	ee176455e2	xen: mask_rw_pte: do not apply the early_ioremap checks on x86_32 The two "is_early_ioremap_ptep" checks in mask_rw_pte are only used on x86_64, in fact early_ioremap is not used at all to setup the initial pagetable on x86_32. Moreover on x86_32 the two checks are wrong because the range pgt_buf_start..pgt_buf_end initially should be mapped RW because the pages in the range are not pagetable pages yet and haven't been cleared yet. Afterwards considering the pgt_buf_start..pgt_buf_end is part of the initial mapping, xen_alloc_pte is capable of turning the ptes RO when they become pagetable pages. Fix the issue and improve the readability of the code providing two different implementation of mask_rw_pte for x86_32 and x86_64. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-04-20 09:43:13 -04:00
Stefano Stabellini	24bdb0b62c	xen: do not create the extra e820 region at an addr lower than 4G Do not add the extra e820 region at a physical address lower than 4G because it breaks e820_end_of_low_ram_pfn(). It is OK for us to move the xen_extra_mem_start up and down because this is the index of the memory that can be ballooned in/out - it is memory not available to the kernel during bootup. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-04-20 09:04:40 -04:00
Rafael J. Wysocki	19234c0819	PM: Add missing syscore_suspend() and syscore_resume() calls Device suspend/resume infrastructure is used not only by the suspend and hibernate code in kernel/power, but also by APM, Xen and the kexec jump feature. However, commit `40dc166cb5` (PM / Core: Introduce struct syscore_ops for core subsystems PM) failed to add syscore_suspend() and syscore_resume() calls to that code, which generally leads to breakage when the features in question are used. To fix this problem, add the missing syscore_suspend() and syscore_resume() calls to arch/x86/kernel/apm_32.c, kernel/kexec.c and drivers/xen/manage.c. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Ian Campbell <ian.campbell@citrix.com>	2011-04-20 00:36:11 +02:00
Linus Torvalds	9d914b3ef3	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, gart: Make sure GART does not map physmem above 1TB x86, gart: Set DISTLBWALKPRB bit always x86, gart: Convert spaces to tabs in enable_gart_translation	2011-04-19 10:58:13 -07:00
Linus Torvalds	96ad999918	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, x86: Fix AMD family 15h FPU event constraints perf, x86: Fix pre-defined cache-misses event for AMD family 15h cpus perf evsel: Fix use of inherit perf hists browser: Fix seg fault when annotate null symbol	2011-04-19 10:56:02 -07:00
Robert Richter	c8e5910edf	perf, x86: Use ALTERNATIVE() to check for X86_FEATURE_PERFCTR_CORE Using ALTERNATIVE() when checking for X86_FEATURE_PERFCTR_CORE avoids an extra pointer chase and data cache hit. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1302913676-14352-4-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-19 10:08:12 +02:00
Robert Richter	855357a217	perf, x86: Fix AMD family 15h FPU event constraints Depending on the unit mask settings some FPU events may be scheduled only on cpu counter #3. This patch fixes this. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@googlemail.com> Link: http://lkml.kernel.org/r/1302913676-14352-3-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-19 10:07:55 +02:00
Andre Przywara	83112e688f	perf, x86: Fix pre-defined cache-misses event for AMD family 15h cpus With AMD cpu family 15h a unit mask was introduced for the Data Cache Miss event (0x041/L1-dcache-load-misses). We need to enable bit 0 (first data cache miss or streaming store to a 64 B cache line) of this mask to proper count data cache misses. Now we set this bit for all families and models. In case a PMU does not implement a unit mask for event 0x041 the bit is ignored. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1302913676-14352-2-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-19 10:07:54 +02:00
Ingo Molnar	68d2cf25d3	Merge branch 'perf/urgent' into perf/core Merge reason: we'll be queueing up dependent changes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-19 07:56:17 +02:00
Joerg Roedel	665d3e2af8	x86, gart: Make sure GART does not map physmem above 1TB The GART can only map physical memory below 1TB. Make sure the gart driver in the kernel does not try to map memory above 1TB. Cc: <stable@kernel.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/1303134346-5805-5-git-send-email-joerg.roedel@amd.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-18 09:26:49 -07:00

1 2 3 4 5 ...

12954 Commits