linux/arch
Peter Zijlstra 9536c8d2da perf/x86: Optimize intel_pmu_pebs_fixup_ip()
There's been reports of high NMI handler overhead, highlighted by
such kernel messages:

  [ 3697.380195] perf samples too long (10009 > 10000), lowering kernel.perf_event_max_sample_rate to 13000
  [ 3697.389509] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 9.331 msecs

Don Zickus analyzed the source of the overhead and reported:

 > While there are a few places that are causing latencies, for now I focused on
 > the longest one first.  It seems to be 'copy_user_from_nmi'
 >
 > intel_pmu_handle_irq ->
 >	intel_pmu_drain_pebs_nhm ->
 >		__intel_pmu_drain_pebs_nhm ->
 >			__intel_pmu_pebs_event ->
 >				intel_pmu_pebs_fixup_ip ->
 >					copy_from_user_nmi
 >
 > In intel_pmu_pebs_fixup_ip(), if the while-loop goes over 50, the sum of
 > all the copy_from_user_nmi latencies seems to go over 1,000,000 cycles
 > (there are some cases where only 10 iterations are needed to go that high
 > too, but in generall over 50 or so).  At this point copy_user_from_nmi
 > seems to account for over 90% of the nmi latency.

The solution to that is to avoid having to call copy_from_user_nmi() for
every instruction.

Since we already limit the max basic block size, we can easily
pre-allocate a piece of memory to copy the entire thing into in one
go.

Don reported this test result:

 > Your patch made a huge difference in improvement.  The
 > copy_from_user_nmi() no longer hits the million of cycles.  I still
 > have a batch of 100,000-300,000 cycles.  My longest NMI paths used
 > to be dominated by copy_from_user_nmi, now it is not (I have to dig
 > up the new hot path).

Reported-and-tested-by: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016105755.GX10651@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-16 15:44:00 +02:00
..
alpha Remove GENERIC_HARDIRQ config option 2013-09-13 15:09:52 +02:00
arc ARC: Ignore ptrace SETREGSET request for synthetic register "stop_pc" 2013-10-12 12:00:36 +05:30
arm ARM: SoC fixes for 3.12-rc 2013-10-13 09:59:10 -07:00
arm64 arm64: Remove duplicate DEBUG_STACK_USAGE config 2013-10-02 18:03:26 +01:00
avr32 avr32: cast syscall_return to silence compiler warning 2013-09-30 08:42:01 +02:00
blackfin Merge branch 'genirq' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2013-09-13 07:31:38 -07:00
c6x Remove GENERIC_HARDIRQ config option 2013-09-13 15:09:52 +02:00
cris Remove GENERIC_HARDIRQ config option 2013-09-13 15:09:52 +02:00
frv Remove GENERIC_HARDIRQ config option 2013-09-13 15:09:52 +02:00
h8300 Remove GENERIC_HARDIRQ config option 2013-09-13 15:09:52 +02:00
hexagon Remove GENERIC_HARDIRQ config option 2013-09-13 15:09:52 +02:00
ia64 Remove GENERIC_HARDIRQ config option 2013-09-13 15:09:52 +02:00
m32r Remove GENERIC_HARDIRQ config option 2013-09-13 15:09:52 +02:00
m68k Remove GENERIC_HARDIRQ config option 2013-09-13 15:09:52 +02:00
metag Remove GENERIC_HARDIRQ config option 2013-09-13 15:09:52 +02:00
microblaze Remove GENERIC_HARDIRQ config option 2013-09-13 15:09:52 +02:00
mips Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-10-12 11:06:18 -07:00
mn10300 Remove GENERIC_HARDIRQ config option 2013-09-13 15:09:52 +02:00
openrisc openrisc: clean-up prom.h 2013-09-24 21:12:27 -05:00
parisc parisc: let probe_kernel_read() capture access to page zero 2013-10-13 17:46:31 +02:00
powerpc compiler/gcc4: Add quirk for 'asm goto' miscompilation bug 2013-10-11 07:39:14 +02:00
s390 compiler/gcc4: Add quirk for 'asm goto' miscompilation bug 2013-10-11 07:39:14 +02:00
score Score: Modify the Makefile of Score, remove -mlong-calls for compiling 2013-09-26 03:46:03 +08:00
sh Remove GENERIC_HARDIRQ config option 2013-09-13 15:09:52 +02:00
sparc compiler/gcc4: Add quirk for 'asm goto' miscompilation bug 2013-10-11 07:39:14 +02:00
tile arch: tile: re-use kbasename() helper 2013-09-30 10:34:46 -04:00
um Remove GENERIC_HARDIRQ config option 2013-09-13 15:09:52 +02:00
unicore32 Remove GENERIC_HARDIRQ config option 2013-09-13 15:09:52 +02:00
x86 perf/x86: Optimize intel_pmu_pebs_fixup_ip() 2013-10-16 15:44:00 +02:00
xtensa Xtensa patchset for v3.12 2013-09-13 10:57:48 -07:00
.gitignore
Kconfig mutex: replace CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX with simple ifdef 2013-09-28 12:46:21 +02:00