linux/arch/x86
Don Zickus 13beacee81 perf/x86/p4: Fix counter corruption when using lots of perf groups
On a P4 box stressing perf with:

   ./perf record -o perf.data ./perf stat -v ./perf bench all

it was noticed that a slew of unknown NMIs would pop out rather quickly.

Painfully debugging this ancient platform, led me to notice cross cpu counter
corruption.

The P4 machine is special in that it has 18 counters, half are used for cpu0
and the other half is for cpu1 (or all 18 if hyperthreading is disabled).  But
the splitting of the counters has to be actively managed by the software.

In this particular bug, one of the cpu0 specific counters was being used by
cpu1 and caused all sorts of random unknown nmis.

I am not entirely sure on the corruption path, but what happens is:

 o perf schedules a group with p4_pmu_schedule_events()
 o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
   but for a different cpu, so it 'swaps' the config bits and returns the
   updated 'assign' array with a _new_ index.
 o perf schedules another group with p4_pmu_schedule_events()
 o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
   (the same one as above) but for the _same_ cpu [BUG!!], so it updates the
   'assign' array to use the _old_ (wrong cpu) index because the _new_ index is in
   an earlier part of the 'assign' array (and hasn't been committed yet).
 o perf commits the transaction using the wrong index and corrupts the other cpu

The [BUG!!] is because the 'hwc->config' is updated but not the 'hwc->idx'.  So
the check for 'p4_should_swap_ts()' is correct the first time around but
incorrect the second time around (because hwc->config was updated in between).

I think the spirit of perf was to not modify anything until all the
transactions had a chance to 'test' if they would succeed, and if so, commit
atomically.  However, P4 breaks this spirit by touching the hwc->config
element.

So my fix is to continue the un-perf like breakage, by assigning hwc->idx to -1
on swap to tell follow up group scheduling to find a new index.

Of course if the transaction fails rolling this back will be difficult, but
that is not different than how the current code works. :-)  And I wasn't sure
how much effort to cleanup the code I should do for a platform that is almost
10 years old by now.

Hence the lazy fix.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391024270-19469-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 13:17:23 +01:00
..
boot x86, boot: Fix word-size assumptions in has_eflag() inline asm 2014-01-30 08:04:32 -08:00
configs x86, defconfig: Add DEVTMPFS and DEVTMPFS_MOUNT to *86*_defconfig 2013-11-04 20:01:55 -08:00
crypto crypto: aesni - fix build on x86 (32bit) 2014-01-15 11:36:34 +08:00
ia32 constify copy_siginfo_to_user{,32}() 2013-11-09 00:16:29 -05:00
include x86/nmi: Push duration printk() to irq context 2014-02-09 13:17:22 +01:00
kernel perf/x86/p4: Fix counter corruption when using lots of perf groups 2014-02-09 13:17:23 +01:00
kvm kvm: x86: move KVM_CAP_HYPERV_TIME outside #ifdef 2014-01-29 18:10:45 +01:00
lguest x86, asmlinkage, lguest: Fix C functions used by inline assembler 2014-01-29 22:17:17 -08:00
lib Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-01-25 11:17:34 -08:00
math-emu x86: math-emu: Drop already-disabled print of build date 2014-01-27 23:14:12 +01:00
mm Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-02-08 11:54:43 -08:00
net bpf: do not use reciprocal divide 2014-01-15 17:02:08 -08:00
oprofile perf: Fix arch_perf_out_copy_user default 2013-11-06 12:34:25 +01:00
pci ACPI and power management updates for 3.14-rc1 2014-01-24 15:51:02 -08:00
platform x86/efi: Allow mapping BGRT on x86-32 2014-02-05 23:39:34 +00:00
power
realmode Merge commit 'f4bcd8ccddb02833340652e9f46f5127828eb79d' into x86/build 2014-01-29 09:07:00 -08:00
syscalls sched: Add new scheduler syscalls to support an extended scheduling parameters ABI 2014-01-13 13:41:04 +01:00
tools Merge commit 'f4bcd8ccddb02833340652e9f46f5127828eb79d' into x86/build 2014-01-29 09:07:00 -08:00
um um, x86: Fix vDSO build 2014-01-12 16:47:31 +01:00
vdso Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-01-20 12:03:57 -08:00
video
xen Bug-fixes: 2014-02-05 16:01:11 -08:00
.gitignore
Kbuild
Kconfig * Avoid WARN_ON() when mapping BGRT on Baytrail (EFI 32-bit). 2014-02-07 11:27:30 -08:00
Kconfig.cpu
Kconfig.debug x86: Disable CONFIG_X86_DECODER_SELFTEST in allmod/allyesconfigs 2014-02-05 14:10:30 -08:00
Makefile x86, build: Build 16-bit code with -m16 where possible 2014-01-30 08:05:36 -08:00
Makefile_32.cpu
Makefile.um