Pull perf updates from Ingo Molnar:
"Kernel improvements:
- watchdog driver improvements by Li Zefan
- Power7 CPI stack events related improvements by Sukadev Bhattiprolu
- event multiplexing via hrtimers and other improvements by Stephane
Eranian
- kernel stack use optimization by Andrew Hunter
- AMD IOMMU uncore PMU support by Suravee Suthikulpanit
- NMI handling rate-limits by Dave Hansen
- various hw_breakpoint fixes by Oleg Nesterov
- hw_breakpoint overflow period sampling and related signal handling
fixes by Jiri Olsa
- Intel Haswell PMU support by Andi Kleen
Tooling improvements:
- Reset SIGTERM handler in workload child process, fix from David
Ahern.
- Makefile reorganization, prep work for Kconfig patches, from Jiri
Olsa.
- Add automated make test suite, from Jiri Olsa.
- Add --percent-limit option to 'top' and 'report', from Namhyung
Kim.
- Sorting improvements, from Namhyung Kim.
- Expand definition of sysfs format attribute, from Michael Ellerman.
Tooling fixes:
- 'perf tests' fixes from Jiri Olsa.
- Make Power7 CPI stack events available in sysfs, from Sukadev
Bhattiprolu.
- Handle death by SIGTERM in 'perf record', fix from David Ahern.
- Fix printing of perf_event_paranoid message, from David Ahern.
- Handle realloc failures in 'perf kvm', from David Ahern.
- Fix divide by 0 in variance, from David Ahern.
- Save parent pid in thread struct, from David Ahern.
- Handle JITed code in shared memory, from Andi Kleen.
- Fixes for 'perf diff', from Jiri Olsa.
- Remove some unused struct members, from Jiri Olsa.
- Add missing liblk.a dependency for python/perf.so, fix from Jiri
Olsa.
- Respect CROSS_COMPILE in liblk.a, from Rabin Vincent.
- No need to do locking when adding hists in perf report, only 'top'
needs that, from Namhyung Kim.
- Fix alignment of symbol column in in the hists browser (top,
report) when -v is given, from NAmhyung Kim.
- Fix 'perf top' -E option behavior, from Namhyung Kim.
- Fix bug in isupper() and islower(), from Sukadev Bhattiprolu.
- Fix compile errors in bp_signal 'perf test', from Sukadev
Bhattiprolu.
... and more things"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (102 commits)
perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable()
perf/x86: Fix shared register mutual exclusion enforcement
perf/x86/intel: Support full width counting
x86: Add NMI duration tracepoints
perf: Drop sample rate when sampling is too slow
x86: Warn when NMI handlers take large amounts of time
hw_breakpoint: Introduce "struct bp_cpuinfo"
hw_breakpoint: Simplify *register_wide_hw_breakpoint()
hw_breakpoint: Introduce cpumask_of_bp()
hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths
hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths
perf/x86/intel: Add mem-loads/stores support for Haswell
perf/x86/intel: Support Haswell/v4 LBR format
perf/x86/intel: Move NMI clearing to end of PMI handler
perf/x86/intel: Add Haswell PEBS support
perf/x86/intel: Add simple Haswell PMU support
perf/x86/intel: Add Haswell PEBS record support
perf/x86/intel: Fix sparse warning
perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation
perf/x86/amd: Add IOMMU Performance Counter resource management
...
Pull WW mutex support from Ingo Molnar:
"This tree adds support for wound/wait style locks, which the graphics
guys would like to make use of in the TTM graphics subsystem.
Wound/wait mutexes are used when other multiple lock acquisitions of a
similar type can be done in an arbitrary order. The deadlock handling
used here is called wait/wound in the RDBMS literature: The older
tasks waits until it can acquire the contended lock. The younger
tasks needs to back off and drop all the locks it is currently
holding, ie the younger task is wounded.
See this LWN.net description of W/W mutexes:
https://lwn.net/Articles/548909/
The comments there outline specific usecases for this facility (which
have already been implemented for the DRM tree).
Also see Documentation/ww-mutex-design.txt for more details"
* 'core-mutexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking-selftests: Handle unexpected failures more strictly
mutex: Add more w/w tests to test EDEADLK path handling
mutex: Add more tests to lib/locking-selftest.c
mutex: Add w/w tests to lib/locking-selftest.c
mutex: Add w/w mutex slowpath debugging
mutex: Add support for wound/wait style locks
arch: Make __mutex_fastpath_lock_retval return whether fastpath succeeded or not
Here's the big driver core merge for 3.11-rc1
Lots of little things, and larger firmware subsystem updates, all
described in the shortlog. Nice thing here is that we finally get rid
of CONFIG_HOTPLUG, after 10+ years, thanks to Stephen Rohtwell (it had
been always on for a number of kernel releases, now it's just removed.)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlHRsGMACgkQMUfUDdst+ylIIACfW8lLxOPVK+iYG699TWEBAkp0
LFEAnjlpAMJ1JnoZCuWDZObNCev93zGB
=020+
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here's the big driver core merge for 3.11-rc1
Lots of little things, and larger firmware subsystem updates, all
described in the shortlog. Nice thing here is that we finally get rid
of CONFIG_HOTPLUG, after 10+ years, thanks to Stephen Rohtwell (it had
been always on for a number of kernel releases, now it's just
removed)"
* tag 'driver-core-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (27 commits)
driver core: device.h: fix doc compilation warnings
firmware loader: fix another compile warning with PM_SLEEP unset
build some drivers only when compile-testing
firmware loader: fix compile warning with PM_SLEEP set
kobject: sanitize argument for format string
sysfs_notify is only possible on file attributes
firmware loader: simplify holding module for request_firmware
firmware loader: don't export cache_firmware and uncache_firmware
drivers/base: Use attribute groups to create sysfs memory files
firmware loader: fix compile warning
firmware loader: fix build failure with !CONFIG_FW_LOADER_USER_HELPER
Documentation: Updated broken link in HOWTO
Finally eradicate CONFIG_HOTPLUG
driver core: firmware loader: kill FW_ACTION_NOHOTPLUG requests before suspend
driver core: firmware loader: don't cache FW_ACTION_NOHOTPLUG firmware
Documentation: Tidy up some drivers/base/core.c kerneldoc content.
platform_device: use a macro instead of platform_driver_register
firmware: move EXPORT_SYMBOL annotations
firmware: Avoid deadlock of usermodehelper lock at shutdown
dell_rbu: Select CONFIG_FW_LOADER_USER_HELPER explicitly
...
Pull VFS patches (part 1) from Al Viro:
"The major change in this pile is ->readdir() replacement with
->iterate(), dealing with ->f_pos races in ->readdir() instances for
good.
There's a lot more, but I'd prefer to split the pull request into
several stages and this is the first obvious cutoff point."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (67 commits)
[readdir] constify ->actor
[readdir] ->readdir() is gone
[readdir] convert ecryptfs
[readdir] convert coda
[readdir] convert ocfs2
[readdir] convert fatfs
[readdir] convert xfs
[readdir] convert btrfs
[readdir] convert hostfs
[readdir] convert afs
[readdir] convert ncpfs
[readdir] convert hfsplus
[readdir] convert hfs
[readdir] convert befs
[readdir] convert cifs
[readdir] convert freevxfs
[readdir] convert fuse
[readdir] convert hpfs
reiserfs: switch reiserfs_readdir_dentry to inode
reiserfs: is_privroot_deh() needs only directory inode, actually
...
This patch fixes a problem with the shared registers mutual
exclusion code and incremental event scheduling by the
generic perf_event code.
There was a bug whereby the mutual exclusion on the shared
registers was not enforced because of incremental scheduling
abort due to event constraints. As an example on Intel
Nehalem, consider the following events:
group1= L1D_CACHE_LD:E_STATE,OFFCORE_RESPONSE_0:PF_RFO,L1D_CACHE_LD:I_STATE
group2= L1D_CACHE_LD:I_STATE
The L1D_CACHE_LD event can only be measured by 2 counters. Yet, there
are 3 instances here. The first group can be scheduled and is committed.
Then, the generic code tries to schedule group2 and this fails (because
there is no more counter to support the 3rd instance of L1D_CACHE_LD).
But in x86_schedule_events() error path, put_event_contraints() is invoked
on ALL the events and not just the ones that just failed. That causes the
"lock" on the shared offcore_response MSR to be released. Yet the first group
is actually scheduled and is exposed to reprogramming of that shared msr by
the sibling HT thread. In other words, there is no guarantee on what is
measured.
This patch fixes the problem by tagging committed events with the
PERF_X86_EVENT_COMMITTED tag. In the error path of x86_schedule_events(),
only the events NOT tagged have their constraint released. The tag
is eventually removed when the event in descheduled.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130620164254.GA3556@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull perf fixes from Ingo Molnar:
"Three small fixlets"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot()
hw_breakpoint: Fix cpu check in task_bp_pinned(cpu)
kprobes: Fix arch_prepare_kprobe to handle copy insn failures
This will allow me to call functions that have multiple
arguments if fastpath fails. This is required to support ticket
mutexes, because they need to be able to pass an extra argument
to the fail function.
Originally I duplicated the functions, by adding
__mutex_fastpath_lock_retval_arg. This ended up being just a
duplication of the existing function, so a way to test if
fastpath was called ended up being better.
This also cleaned up the reservation mutex patch some by being
able to call an atomic_set instead of atomic_xchg, and making it
easier to detect if the wrong unlock function was previously
used.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: robclark@gmail.com
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130620113105.4001.83929.stgit@patser
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Recent Intel CPUs like Haswell and IvyBridge have a new
alternative MSR range for perfctrs that allows writing the full
counter width. Enable this range if the hardware reports it
using a new capability bit.
Currently the perf code queries CPUID to get the counter width,
and sign extends the counter values as needed. The traditional
PERFCTR MSRs always limit to 32bit, even though the counter
internally is larger (usually 48 bits on recent CPUs)
When the new capability is set use the alternative range which
do not have these restrictions.
This lowers the overhead of perf stat slightly because it has to
do less interrupts to accumulate the counter value. On Haswell
it also avoids some problems with TSX aborting when the end of
the counter range is reached.
( See the patch "perf/x86/intel: Avoid checkpointed counters
causing excessive TSX aborts" for more details. )
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1372173153-20215-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch has been invaluable in my adventures finding
issues in the perf NMI handler. I'm as big a fan of
printk() as anybody is, but using printk() in NMIs is
deadly when they're happening frequently.
Even hacking in trace_printk() ended up eating enough
CPU to throw off some of the measurements I was making.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch keeps track of how long perf's NMI handler is taking,
and also calculates how many samples perf can take a second. If
the sample length times the expected max number of samples
exceeds a configurable threshold, it drops the sample rate.
This way, we don't have a runaway sampling process eating up the
CPU.
This patch can tend to drop the sample rate down to level where
perf doesn't work very well. *BUT* the alternative is that my
system hangs because it spends all of its time handling NMIs.
I'll take a busted performance tool over an entire system that's
busted and undebuggable any day.
BTW, my suspicion is that there's still an underlying bug here.
Using the HPET instead of the TSC is definitely a contributing
factor, but I suspect there are some other things going on.
But, I can't go dig down on a bug like that with my machine
hanging all the time.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
[ Prettified it a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
I have a system which is causing all kinds of problems. It has
8 NUMA nodes, and lots of cores that can fight over cachelines.
If things are not working _perfectly_, then NMIs can take longer
than expected.
If we get too many of them backed up to each other, we can
easily end up in a situation where we are doing nothing *but*
running NMIs. The biggest problem, though, is that this happens
_silently_. You might be lucky to get an hrtimer warning, but
most of the time system simply hangs.
This patch should at least give us some warning before we fall
off the cliff. the warnings look like this:
nmi_handle: perf_event_nmi_handler() took: 26095071 ns
The message is triggered whenever we notice the longest NMI
we've seen to date. You can always view and reset this value
via the debugfs interface if you like.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull vfs fixes from Al Viro:
"Several fixes for bugs caught while looking through f_pos (ab)users"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
aout32 coredump compat fix
splice: don't pass the address of ->f_pos to methods
mconsole: we'd better initialize pos before passing it to vfs_read()...
dump_seek() does SEEK_CUR, not SEEK_SET; native binfmt_aout
handles it correctly (seeks by PAGE_SIZE - sizeof(struct user),
getting the current position to PAGE_SIZE), compat one seeks
by PAGE_SIZE and ends up at PAGE_SIZE + already written...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull x86 fixes from Peter Anvin:
"This series fixes a couple of build failures, and fixes MTRR cleanup
and memory setup on very specific memory maps.
Finally, it fixes triggering backtraces on all CPUs, which was
inadvertently disabled on x86."
* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Fix dummy variable buffer allocation
x86: Fix trigger_all_cpu_backtrace() implementation
x86: Fix section mismatch on load_ucode_ap
x86: fix build error and kconfig for ia32_emulation and binfmt
range: Do not add new blank slot with add_range_with_merge
x86, mtrr: Fix original mtrr range get for mtrr_cleanup
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJRwc4AAAoJEBvWZb6bTYbyT/wQAIAvvjM+jlqOsZCUjnutdWXf
VGvJB1jXPEa6cuUkaE3RYO8rLLJz/PC4Geg8I/awtjeSkPvldeu0FihSkdbrNkpJ
3dSAJ0cR8+ScUZB74Lt1pfwwEAdZk61MU1opD1qy3wWTy5Rk2xp4qzE2aCVJoLkh
K8gscLrCHR47DELcm+AWCtHt/0VOZSAVe9z/7Qf9eAMCNmH/efHgrZZTxF/1aBkI
7XaZdqT+d84D/Bc2isdRNvFYt9rkuII2GCeRciw2t3QsgVoBXn7FVK2OXbufIaYW
/XX0YiNpSnUpcpOtGw2MwzpKgDSRE9QbnOUVsWThTDeG7Q5d72ioCFuRDba8p3Lc
u146nQM495bAdJQz6IW3JCeiZlQ86/QjFquk9hJuCwQpuHmlRpVEqFbLVnZEnYvk
lc9dnK7BcaftbMgo/j6DMv6Bpq/EDp1JrPXzNT4BwG7ptArEXzmdoktwpONjrZzI
1oZV5xCcNhTjyqlndiZKxneKXoqz/3NOdR+EV0yh3+0l4PK6C52jd6PaKRv7qN3l
Lt35uFrmKqzOR12KVStYREyh1Sy43ADE4tIIPnOBbcXz23Df4/vVSb1Bqv0nPq4i
JhsRsGoR8ZdDBd7c40XlvpxOiMmubFLSklX0WaKn2yIZH9FMj63Ak0wet7qhaqb8
lbJRUH8gnK0RCVEoUFpB
=yGPS
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Three one-line fixes for my first pull request; one for x86 host, one
for x86 guest, one for PPC"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
x86: kvmclock: zero initialize pvclock shared memory area
kvm/ppc/booke: Delay kvmppc_lazy_ee_enable
KVM: x86: remove vcpu's CPL check in host-invoked XCR set
Pull crypto fix from Herbert Xu:
"This fixes an unaligned crash in XTS mode when using aseni_intel"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: aesni_intel - fix accessing of unaligned memory
to initiate garbage collection. Also, free the kernel memory when
we're done with it instead of leaking - Ben Hutchings
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJRxCKWAAoJEC84WcCNIz1VIDcP/icYJSdXw/fKUgCG9+zAk3FE
xivUa6aJKG+GBL4Cg2VYTJ/sBDA1v0xdFcf73amCdTcWuor6ha5LxrYFwFhhKtYk
3O05spiCZ+F6nveZTrrxqKce3cN5XOYwbF8bZOAuJYmjaTvZQ0Cb89zf/asC4+Fx
ui6HCRmCy6nUy5Fn4A7fBHlTyuOE4xZXvZjOUXWGiUwmZK/SbHYtN1+4nXBTnAWF
yDcB8fkl4v7cYLIl9U8dc7Dg8ita4bj6F8RRNJf2WXipsOpMoMRtiua8VLCZ0ECc
9cpONx7wMQS5nDEfNefWm4vdqS8PEG1CkZ8tN37nIU/AlTi6mTyTqZv0ICQfgVna
6nChmkz9uxvzTzlMlOR5eGBu68xQLiyHhauTJBpMkwOp4q4gg/Ui+bhV7Vo9ySF3
DlrSekT0Uo1PLgdAUYg7+hZhN98HGHoGSEx7FsymOjqlY3nF+4k7ZeTerEI4vI2D
zqyj5QVWqS/M9zj1ojUu3nFDNhfQHO4PvVt9O8ca4/IJS1pbb/J8qtwIOFTNdJHf
2Iyk6GnT9UaKvOs4trWXmPjiZkBcS8JdQH80fDSIgIxLhwOQksT+q90TY5h3mAED
Hwm9JywXlxpFncPixAkpoorqDWSZmBTmUw9JR0FQNTFGuENPA0SGn51P2gGbTA+e
82ECeWRXMMoYA/qNxMI2
=kfTf
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent' into x86/urgent
* Don't leak random kernel memory to EFI variable NVRAM when attempting
to initiate garbage collection. Also, free the kernel memory when
we're done with it instead of leaking - Ben Hutchings
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
1. Check for allocation failure
2. Clear the buffer contents, as they may actually be written to flash
3. Don't leak the buffer
Compile-tested only.
[ Tested successfully on my buggy ASUS machine - Matt ]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Pull scheduler fixes from Ingo Molnar:
"Two smaller fixes - plus a context tracking tracing fix that is a bit
bigger"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tracing/context-tracking: Add preempt_schedule_context() for tracing
sched: Fix clear NOHZ_BALANCE_KICK
sched/x86: Construct all sibling maps if smt
Pull perf fixes from Ingo Molnar:
"Four fixes. The mmap ones are unfortunately larger than desired -
fuzzing uncovered bugs that needed perf context life time management
changes to fix properly"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EP
perf: Fix mmap() accounting hole
perf: Fix perf mmap bugs
kprobes: Fix to free gone and unused optprobes
Pull cpu idle fixes from Thomas Gleixner:
- Add a missing irq enable. Fallout of the idle conversion
- Fix stackprotector wreckage caused by the idle conversion
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
idle: Enable interrupts in the weak arch_cpu_idle() implementation
idle: Add the stack canary init to cpu_startup_entry()
Fix arch_prepare_kprobe() to handle failures in copy instruction
correctly. This fix is related to the previous fix: 8101376
which made __copy_instruction return an error result if failed,
but caller site was not updated to handle it. Thus, this is the
other half of the bugfix.
This fix is also related to the following bug-report:
https://bugzilla.redhat.com/show_bug.cgi?id=910649
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Jonathan Lebon <jlebon@redhat.com>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: systemtap@sourceware.org
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20130605031216.15285.2001.stgit@mhiramat-M0-7522
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The following change fixes the x86 implementation of
trigger_all_cpu_backtrace(), which was previously (accidentally,
as far as I can tell) disabled to always return false as on
architectures that do not implement this function.
trigger_all_cpu_backtrace(), as defined in include/linux/nmi.h,
should call arch_trigger_all_cpu_backtrace() if available, or
return false if the underlying arch doesn't implement this
function.
x86 did provide a suitable arch_trigger_all_cpu_backtrace()
implementation, but it wasn't actually being used because it was
declared in asm/nmi.h, which linux/nmi.h doesn't include. Also,
linux/nmi.h couldn't easily be fixed by including asm/nmi.h,
because that file is not available on all architectures.
I am proposing to fix this by moving the x86 definition of
arch_trigger_all_cpu_backtrace() to asm/irq.h.
Tested via: echo l > /proc/sysrq-trigger
Before the change, this uses a fallback implementation which
shows backtraces on active CPUs (using
smp_call_function_interrupt() )
After the change, this shows NMI backtraces on all CPUs
Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1370518875-1346-1-git-send-email-walken@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are in the process of removing all the __cpuinit annotations.
While working on making that change, an existing problem was
made evident:
WARNING: arch/x86/kernel/built-in.o(.text+0x198f2): Section mismatch
in reference from the function cpu_init() to the function
.init.text:load_ucode_ap() The function cpu_init() references
the function __init load_ucode_ap(). This is often because cpu_init
lacks a __init annotation or the annotation of load_ucode_ap is wrong.
This now appears because in my working tree, cpu_init() is no longer
tagged as __cpuinit, and so the audit picks up the mismatch. The 2nd
hypothesis from the audit is the correct one, as there was an incorrect
__init tag on the prototype in the header (but __cpuinit was used on
the function itself.)
The audit is telling us that the prototype's __init annotation took
effect and the function did land in the .init.text section. Checking
with objdump on a mainline tree that still has __cpuinit shows that
the __cpuinit on the function takes precedence over the __init on the
prototype, but that won't be true once we make __cpuinit a no-op.
Even though we are removing __cpuinit, we temporarily align both
the function and the prototype on __cpuinit so that the changeset
can be applied to stable trees if desired.
[ hpa: build fix only, no object code change ]
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable <stable@vger.kernel.org> # 3.9+
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/1371654926-11729-1-git-send-email-paul.gortmaker@windriver.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
mem-loads is basically the same as Sandy Bridge,
but we use a separate string for changes later.
Haswell doesn't support the full precise store mode,
so we emulate it using the "DataLA" facility.
This allows to do everything, but for data sources we
can only detect L1 hit or not.
There is no explicit enable bit anymore, so we have
to tie it to a perf internal only flag.
The address is supported for all memory related PEBS
events with DataLA. Instead of only logging for the
load and store events we allow logging it for all
(it will be simply 0 if the current event does not
support it)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-7-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Haswell has two additional LBR from flags for TSX: in_tx and
abort_tx, implemented as a new "v4" version of the LBR format.
Handle those in and adjust the sign extension code to still
correctly extend. The flags are exported similarly in the LBR
record to the existing misprediction flag
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-6-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This avoids some problems with spurious PMIs on Haswell.
Haswell seems to behave more like P4 in this regard. Do
the same thing as the P4 perf handler by unmasking
the NMI only at the end. Shouldn't make any difference
for earlier family 6 cores.
(Tested on Haswell, IvyBridge, Westmere, Saltwell (Atom).)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-5-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add simple PEBS support for Haswell.
The constraints are similar to SandyBridge with a few new
events.
Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Similar to SandyBridge, but has a few new events and two
new counter bits.
There are some new counter flags that need to be prevented
from being set on fixed counters, and allowed to be set
for generic counters.
Also we add support for the counter 2 constraint to handle
all raw events.
(Contains fixes from Stephane Eranian.)
Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add support for the Haswell extended (fmt2) PEBS format.
It has a superset of the nhm (fmt1) PEBS fields, but has a
longer record so we need to adjust the code paths.
The main advantage is the new "EventingRip" support which
directly gives the instruction, not off-by-one instruction. So
with precise == 2 we use that directly and don't try to use LBRs
and walking basic blocks. This lowers the overhead of using
precise significantly.
Some other features are added in later patches.
Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Implement a perf PMU to handle IOMMU performance counters and events.
The PMU only supports counting mode (e.g. perf stat). Since the counters
are shared across all cores, the PMU is implemented as "system-wide" mode.
To invoke the AMD IOMMU PMU, issue a perf tool command such as:
./perf stat -a -e amd_iommu/<events>/ <command>
or:
./perf stat -a -e amd_iommu/config=<config-data>,config1=<config1-data>/ <command>
For example:
./perf stat -a -e amd_iommu/mem_trans_total/ <command>
The resulting count will be how many IOMMU total peripheral memory
operations were performed during the command execution window.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1370466709-3212-3-git-send-email-suravee.suthikulpanit@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
intel_pmu_handle_irq() has a warning in it if it does too many
loops. It is a WARN_ONCE(), but the perf_event_print_debug()
call beneath it is unconditional. For the first warning, you get
a nice backtrace and message, but subsequent ones just dump the
PMU state with no leading messages. I doubt this is what was
intended.
This patch will only print the PMU state when paired with the
WARN_ON() text. It effectively open-codes WARN_ONCE()'s
one-time-only logic.
My suspicion is that the code really just wants to make sure we
do not sit in the loop and spit out a warning for every loop
iteration after the 100th. From what I've seen, this is very
unlikely to happen since we also clear the PMU state.
After this patch, instead of seeing the PMU state dumped each
time, you will just see:
[57494.894540] perf_event_intel: clearing PMU state on CPU#129
[57579.539668] perf_event_intel: clearing PMU state on CPU#10
[57587.137762] perf_event_intel: clearing PMU state on CPU#134
[57623.039912] perf_event_intel: clearing PMU state on CPU#114
[57644.559943] perf_event_intel: clearing PMU state on CPU#118
...
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130530174559.0DB049F4@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
x86_schedule_events() caches event constraints on the stack during
scheduling. Given the number of possible events, this is 512 bytes of
stack; since it can be invoked under schedule() under god-knows-what,
this is causing stack blowouts.
Trade some space usage for stack safety: add a place to cache the
constraint pointer to struct perf_event. For 8 bytes per event (1% of
its size) we can save the giant stack frame.
This shouldn't change any aspect of scheduling whatsoever and while in
theory the locality's a tiny bit worse, I doubt we'll see any
performance impact either.
Tested: `perf stat whatever` does not blow up and produces
results that aren't hugely obviously wrong. I'm not sure how to run
particularly good tests of perf code, but this should not produce any
functional change whatsoever.
Signed-off-by: Andrew Hunter <ahh@google.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1369332423-4400-1-git-send-email-ahh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch fixes broken support of PEBS-LL on SNB-EP/IVB-EP.
For some reason, the LDLAT extra reg definition for snb_ep
showed up as duplicate in the snb table.
This patch moves the definition of LDLAT back into the
snb_ep table.
Thanks to Don Zickus for tracking this one down.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130607212210.GA11849@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
kernel might hung in pvclock_clocksource_read() due to
uninitialized memory might contain odd version value in
following cycle:
do {
version = __pvclock_read_cycles(src, &ret, &flags);
} while ((src->version & 1) || version != src->version);
if secondary kvmclock is accessed before it's registered with kvm.
Clear garbage in pvclock shared memory area right after it's
allocated to avoid this issue.
Ref: https://bugzilla.kernel.org/show_bug.cgi?id=59521
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
[See BZ for analysis. We may want a different fix for 3.11, but
this is the safest for now - Paolo]
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fix kconfig warning and build errors on x86_64 by selecting BINFMT_ELF
when COMPAT_BINFMT_ELF is being selected.
warning: (IA32_EMULATION) selects COMPAT_BINFMT_ELF which has unmet direct dependencies (COMPAT && BINFMT_ELF)
fs/built-in.o: In function `elf_core_dump':
compat_binfmt_elf.c:(.text+0x3e093): undefined reference to `elf_core_extra_phdrs'
compat_binfmt_elf.c:(.text+0x3ebcd): undefined reference to `elf_core_extra_data_size'
compat_binfmt_elf.c:(.text+0x3eddd): undefined reference to `elf_core_write_extra_phdrs'
compat_binfmt_elf.c:(.text+0x3f004): undefined reference to `elf_core_write_extra_data'
[ hpa: This was sent to me for -next but it is a low risk build fix ]
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: http://lkml.kernel.org/r/51C0B614.5000708@infradead.org
Cc: <stable@vger.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Joshua reported: Commit cd7b304dfaf1 (x86, range: fix missing merge
during add range) broke mtrr cleanup on his setup in 3.9.5.
corresponding commit in upstream is fbe06b7bae.
*BAD*gran_size: 64K chunk_size: 16M num_reg: 6 lose cover RAM: -0G
https://bugzilla.kernel.org/show_bug.cgi?id=59491
So it rejects new var mtrr layout.
It turns out we have some problem with initial mtrr range retrieval.
The current sequence is:
x86_get_mtrr_mem_range
==> bunchs of add_range_with_merge
==> bunchs of subract_range
==> clean_sort_range
add_range_with_merge for [0,1M)
sort_range()
add_range_with_merge could have blank slots, so we can not just
sort only, that will have final result have extra blank slot in head.
So move that calling add_range_with_merge for [0,1M), with that we
could avoid extra clean_sort_range calling.
Reported-by: Joshua Covington <joshuacov@googlemail.com>
Tested-by: Joshua Covington <joshuacov@googlemail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1371154622-8929-2-git-send-email-yinghai@kernel.org
Cc: <stable@vger.kernel.org> v3.9
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
__kvm_set_xcr function does the CPL check when set xcr. __kvm_set_xcr is
called in two flows, one is invoked by guest, call stack shown as below,
handle_xsetbv(or xsetbv_interception)
kvm_set_xcr
__kvm_set_xcr
the other one is invoked by host, for example during system reset:
kvm_arch_vcpu_ioctl
kvm_vcpu_ioctl_x86_set_xcrs
__kvm_set_xcr
The former does need the CPL check, but the latter does not.
Cc: stable@vger.kernel.org
Signed-off-by: Zhang Haoyu <haoyu.zhang@huawei.com>
[Tweaks to commit message. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull x86 fixes from Peter Anvin:
"Another set of fixes, the biggest bit of this is yet another tweak to
the UEFI anti-bricking code; apparently we finally got some feedback
from Samsung as to what makes at least their systems fail. This set
should actually fix the boot regressions that some other systems (e.g.
SGI) have exhibited.
Other than that, there is a patch to avoid a panic with particularly
unhappy memory layouts and two minor protocol fixes which may or may
not be manifest bugs"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Fix typo in kexec register clearing
x86, relocs: Move __vvar_page from S_ABS to S_REL
Modify UEFI anti-bricking code
x86: Fix adjust_range_size_mask calling position
few users were reporting boot regressions in v3.9. This has now been
fixed with a more accurate "minimum storage requirement to avoid
bricking" value from Samsung (5K instead of 50%) and code to trigger
garbage collection when we near our limit - Matthew Garrett.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJRtkY2AAoJEC84WcCNIz1VJOsP/00xwiY4VKh2RfqNkYKSl/w5
gEshIHFEAXHX5X8C4ReocZVywvdjTgbJoKBbBy3FePYRzLddrmavvjen17hk7BzS
/cO8/eXForkNWCGR1kLagA6HLpgKP5DPayKizoMb4Mg6muzfT1SCcN6Pzh8cDMWe
btcq/l9JZejXdJ4Wfoq1My+WdXs19OT/BNeD3y65K4x29vNUjop6oaIdDJWLlH/S
aeLHh8d4xbSHNWzK1fBP7CnFTYU27xxs1BFNAReU6McxeQCYZAIaRovYnjTZEvfJ
twd2tLrOn9HBVTbWa8T4XGNSr+QcT4XGMadLvdwuqltmKDfH6Onm8aWQM3IqA7gy
Qimbcv2B7HrITgXWTzp3DPkXF1LA8/8QHSBXVMUU9Rl6QOLy18vIdKiQy3M1Ng9Z
0q+Ow93JtnL11zf9wLDMdKaKcA9HOxbG/wRTK6XO4vGaWj9brFv3n5Ib7OreHH6D
GP58zDEnThFuj97K/NKREBZZFcFOMZpKk5MAipVkzltihUQmNeTF/dAtBJ3Ncu/A
PqQE6uuKVXjASJR8Gy0bI3WHtSTZK4L/sg9c2MF3bdJa9BswN+m8IEbls+S+iFOx
+sYPQx7Zw6SFENxDw8cDYNzC14yfr60qyOxTWfkHH7l/FnvhOgwHzqPsLcXx0ouR
C6k1yPYSTgiqFdWC2sjn
=TZuM
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent' into x86/urgent
* More tweaking to the EFI variable anti-bricking algorithm. Quite a
few users were reporting boot regressions in v3.9. This has now been
fixed with a more accurate "minimum storage requirement to avoid
bricking" value from Samsung (5K instead of 50%) and code to trigger
garbage collection when we near our limit - Matthew Garrett.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The new XTS code for aesni_intel uses input buffers directly as memory operands
for pxor instructions, which causes crash if those buffers are not aligned to
16 bytes.
Patch changes XTS code to handle unaligned memory correctly, by loading memory
with movdqu instead.
Reported-by: Dave Jones <davej@redhat.com>
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Fixes a typo in register clearing code. Thanks to PaX Team for fixing
this originally, and James Troup for pointing it out.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130605184718.GA8396@www.outflux.net
Cc: <stable@vger.kernel.org> v2.6.30+
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The __vvar_page relocation should actually be listed in S_REL instead
of S_ABS. Oddly, this didn't always cause things to break, presumably
because there are no users for relocation information on 64 bits yet.
[ hpa: Not for stable - new code in 3.10 ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130611185652.GA23674@www.outflux.net
Reported-by: Michael Davidson <md@google.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>