The test should be >= ARRAY_SIZE() instead of > ARRAY_SIZE().
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/20120905123126.GC6128@elgon.mountain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch adds a cpumask file to the uncore pmu sysfs directory. The
cpumask file contains one active cpu for every socket.
Signed-off-by: "Yan, Zheng" <zheng.z.yan@intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: "Yan, Zheng" <zheng.z.yan@intel.com>
Link: http://lkml.kernel.org/r/1347263631-23175-2-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
arch_uprobe_disable_step() should also take UTASK_SSTEP_TRAPPED into
account. In this case the probed insn was not executed, we need to
clear X86_EFLAGS_TF if it was set by us and that is all.
Again, this code will look more clean when we move it into
arch_uprobe_post_xol() and arch_uprobe_abort_xol().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
arch_uprobe_disable_step() correctly preserves X86_EFLAGS_TF and
returns to user-mode. But this means the application gets SIGTRAP
only after the next insn.
This means that UPROBE_CLEAR_TF logic is not really right. _enable
should only record the state of X86_EFLAGS_TF, and _disable should
check it separately from UPROBE_FIX_SETF.
Remove arch_uprobe_task->restore_flags, add ->saved_tf instead, and
change enable/disable accordingly. This assumes that the probed insn
was not trapped, see the next patch.
arch_uprobe_skip_sstep() logic has the same problem, change it to
check X86_EFLAGS_TF and send SIGTRAP as well. We will cleanup this
all after we fold enable/disable_step into pre/post_hol hooks.
Note: send_sig(SIGTRAP) is not actually right, we need send_sigtrap().
But this needs more changes, handle_swbp() does the same and this is
equally wrong.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
user_enable/disable_single_step() was designed for ptrace, it assumes
a single user and does unnecessary and wrong things for uprobes. For
example:
- arch_uprobe_enable_step() can't trust TIF_SINGLESTEP, an
application itself can set X86_EFLAGS_TF which must be
preserved after arch_uprobe_disable_step().
- we do not want to set TIF_SINGLESTEP/TIF_FORCED_TF in
arch_uprobe_enable_step(), this only makes sense for ptrace.
- otoh we leak TIF_SINGLESTEP if arch_uprobe_disable_step()
doesn't do user_disable_single_step(), the application will
be killed after the next syscall.
- arch_uprobe_enable_step() does access_process_vm() we do
not need/want.
Change arch_uprobe_enable/disable_step() to set/clear X86_EFLAGS_TF
directly, this is much simpler and more correct. However, we need to
clear TIF_BLOCKSTEP/DEBUGCTLMSR_BTF before executing the probed insn,
add set_task_blockstep(false).
Note: with or without this patch, there is another (hopefully minor)
problem. A probed "pushf" insn can see the wrong X86_EFLAGS_TF set by
uprobes. Perhaps we should change _disable to update the stack, or
teach arch_uprobe_skip_sstep() to emulate this insn.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Afaics the usage of update_debugctlmsr() and TIF_BLOCKSTEP in
step.c was always very wrong.
1. update_debugctlmsr() was simply unneeded. The child sleeps
TASK_TRACED, __switch_to_xtra(next_p => child) should notice
TIF_BLOCKSTEP and set/clear DEBUGCTLMSR_BTF after resume if
needed.
2. It is wrong. The state of DEBUGCTLMSR_BTF bit in CPU register
should always match the state of current's TIF_BLOCKSTEP bit.
3. Even get_debugctlmsr() + update_debugctlmsr() itself does not
look right. Irq can change other bits in MSR_IA32_DEBUGCTLMSR
register or the caller can be preempted in between.
4. It is not safe to play with TIF_BLOCKSTEP if task != current.
DEBUGCTLMSR_BTF and TIF_BLOCKSTEP should always match each
other if the task is running. The tracee is stopped but it
can be SIGKILL'ed right before set/clear_tsk_thread_flag().
However, now that uprobes uses user_enable_single_step(current)
we can't simply remove update_debugctlmsr(). So this patch adds
the additional "task == current" check and disables irqs to avoid
the race with interrupts/preemption.
Unfortunately this patch doesn't solve the last problem, we need
another fix. Probably we should teach ptrace_stop() to set/clear
single/block stepping after resume.
And afaics there is yet another problem: perf can play with
MSR_IA32_DEBUGCTLMSR from nmi, this obviously means that even
__switch_to_xtra() has problems.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
No functional changes, preparation for the next fix and for uprobes
single-step fixes.
Move the code playing with TIF_BLOCKSTEP/DEBUGCTLMSR_BTF into the
new helper, set_task_blockstep().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
The arch specific implementation behaves like user_enable_single_step()
except that it does not disable single stepping if it was already
enabled by ptrace. This allows the debugger to single step over an
uprobe. The state of block stepping is not restored. It makes only sense
together with TF and if that was enabled then the debugger is notified.
Note: this is still not correct. For example, TIF_SINGLESTEP check
is not right, the application itself can set X86_EFLAGS_TF. And otoh
we leak TIF_SINGLESTEP (set by enable) if the probed insn is "popf".
See the next patches, we need the changes in arch/x86/kernel/step.c
first.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Fix kprobes/x86 to support jprobes on ftrace-based kprobes.
Because of -mfentry support of ftrace, ftrace is now put
on the beginning of function where jprobes are put.
Originally ftrace-based kprobes doesn't support jprobe
because it will change regs->ip and ftrace doesn't support
changing IP and ftrace itself doesn't conflict jprobe.
However, ftrace -mfentry support moves mcount call on the
top of functions where jprobes are put. This means that
jprobe always conflicts with ftrace-based kprobe and fails.
This patch allows ftrace-based kprobes to support jprobes
by allowing to modify regs->ip and kprobes breakpoint
handler also allows to skip singlestepping because there
is a ftrace call (not an original instruction).
Link: http://lkml.kernel.org/r/20120905143125.10329.90836.stgit@localhost.localdomain
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Allow ftrace handlers to change RIP register (regs->ip)
in handlers. This will allow handlers to call another
function instead of original function.
Link: http://lkml.kernel.org/r/20120905143118.10329.5078.stgit@localhost.localdomain
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Current kprobe_ftrace_handler expects regs->ip == ip, but it is
incorrect (originally on x86-64). Actually, ftrace handler sets
regs->ip = ip + MCOUNT_INSN_SIZE.
kprobe_ftrace_handler must take care for that.
Link: http://lkml.kernel.org/r/20120905143112.10329.72069.stgit@localhost.localdomain
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Adjust x86 regs.ip to ip + MCOUNT_INSN_SIZE as like as
on x86-64. This helps us to consolidate codes which use
regs->ip on both of x86/x86-64.
Link: http://lkml.kernel.org/r/20120905143100.10329.60109.stgit@localhost.localdomain
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The following patch makes the microcode update code path
actually invoke the perf_check_microcode() function and
thus potentially renabling SNB PEBS.
By default, CONFIG_MICROCODE_OLD_INTERFACE is
forced to Y in arch/x86/Kconfig. There is no
way to disable this. That means that the code
path used in arch/x86/kernel/microcode_core.c
did not include the call to perf_check_microcode().
Thus, even though the microcode was updated to a
version that fixes the SNB PEBS problem, perf_event
would still return EOPNOTSUPP when enabling precise
sampling.
This patch simply adds a call to perf_check_microcode()
in the call path used when OLD_INTERFACE=y.
Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: peterz@infradead.org
Cc: andi@firstfloor.org
Link: http://lkml.kernel.org/r/20120824133434.GA8014@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If the kernel is compiled with gcc 4.6.0 which supports -mfentry,
then use that instead of mcount.
With mcount, frame pointers are forced with the -pg option and we
get something like:
<can_vma_merge_before>:
55 push %rbp
48 89 e5 mov %rsp,%rbp
53 push %rbx
41 51 push %r9
e8 fe 6a 39 00 callq ffffffff81483d00 <mcount>
31 c0 xor %eax,%eax
48 89 fb mov %rdi,%rbx
48 89 d7 mov %rdx,%rdi
48 33 73 30 xor 0x30(%rbx),%rsi
48 f7 c6 ff ff ff f7 test $0xfffffffff7ffffff,%rsi
With -mfentry, frame pointers are no longer forced and the call looks
like this:
<can_vma_merge_before>:
e8 33 af 37 00 callq ffffffff81461b40 <__fentry__>
53 push %rbx
48 89 fb mov %rdi,%rbx
31 c0 xor %eax,%eax
48 89 d7 mov %rdx,%rdi
41 51 push %r9
48 33 73 30 xor 0x30(%rbx),%rsi
48 f7 c6 ff ff ff f7 test $0xfffffffff7ffffff,%rsi
This adds the ftrace hook at the beginning of the function before a
frame is set up, and allows the function callbacks to be able to access
parameters. As kprobes now can use function tracing (at least on x86)
this speeds up the kprobe hooks that are at the beginning of the
function.
Link: http://lkml.kernel.org/r/20120807194100.130477900@goodmis.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This issue was recently observed on an AMD C-50 CPU where a patch of
maximum size was applied.
Commit be62adb492 ("x86, microcode, AMD: Simplify ucode verification")
added current_size in get_matching_microcode(). This is calculated as
size of the ucode patch + 8 (ie. size of the header). Later this is
compared against the maximum possible ucode patch size for a CPU family.
And of course this fails if the patch has already maximum size.
Cc: <stable@vger.kernel.org> [3.3+]
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1344361461-10076-1-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Probably a leftover from the early days of self-patching, p6nops
are marked __initconst_or_module, which causes them to be
discarded in a non-modular kernel. If something later triggers
patching, it will overwrite kernel code with garbage.
Reported-by: Tomas Racek <tracek@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Cc: Michael Tokarev <mjt@tls.msk.ru>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: qemu-devel@nongnu.org
Cc: Anthony Liguori <anthony@codemonkey.ws>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/r/5034AE84.90708@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When one CPU is going down and this CPU is the last one in irq
affinity, current code is setting cpu_all_mask as the new
affinity for that irq.
But for some systems (such as in Medfield Android mobile) the
firmware sends the interrupt to each CPU in the irq affinity
mask, averaged, and cpu_all_mask includes all potential CPUs,
i.e. offline ones as well.
So replace cpu_all_mask with cpu_online_mask.
Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
Acked-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/27240C0AC20F114CBF8149A2696CBE4A137286@SHSMSX101.ccr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
. Fix include order for bison/flex-generated C files, from Ben Hutchings
. Build fixes and documentation corrections from David Ahern
. Group parsing support, from Jiri Olsa
. UI/gtk refactorings and improvements from Namhyung Kim
. NULL deref fix for perf script, from Namhyung Kim
. Assorted cleanups from Robert Richter
. Let O= makes handle relative paths, from Steven Rostedt
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJQMkGhAAoJENZQFvNTUqpAqjsQAJE5iD1LFogC8o/WjvRHz0TY
Y0x+sR/XfW61KYpeq5g+UaKuFU3P44ijCoyks3y5sza97DkYgUwMpEHlLXFSM8Pp
sNOapqY57s24nq3MLrhH1V9w+cSE+m2u/Gi5fGLCQekio9gkOBwYxNGk7vpKri/n
LBRsMozBu/mZjMy20uWOb7Uk8xsAToh+TFaAtjyQ9Snn9nNJj49NUAp37uN888H/
ducMLq32HN5v/6Zd3q6IWdDWgZsHLkIa3R5FIs/GNe3Dih07gtYLmDol4ktPbTFm
yoaWpP5wbtu/62EZlJwE393vMuoeqN/96394ZZQGFafhHVxN4+rcBhXbejBs0T2b
wk/0CzntW8bbUAI/cl3SB9aui//FWOxcjG9aDQ7PsmHzPw1Q4VD0F9Mcod4p+dRX
PsA9q/tST1eAiwzWYthDtj81U7iChINcXKhoZn2xn6+0+aMH+6FFNBmCH8MR5aCU
BvrXhTJjvau/Ym/sILl4Tf4wfssTq49yMsn/YKCwLJ0hg0XlTObWfQRy2MOayXH9
NJvUE+9GSXoTEKhmr1AfTYEG9vObaXZyFwAI74xvPPwUYojCb4ZjEKmG0egW+VGk
IJKFCaJZwwVsGau4aIbFAMP12/L8Qs/Ox91ddCJ0j5TIlSGMaqW5lbV1N1crzlTT
a0GsN49NvhbFttBXrcNX
=0a2X
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
* Fix include order for bison/flex-generated C files, from Ben Hutchings
* Build fixes and documentation corrections from David Ahern
* Group parsing support, from Jiri Olsa
* UI/gtk refactorings and improvements from Namhyung Kim
* NULL deref fix for perf script, from Namhyung Kim
* Assorted cleanups from Robert Richter
* Let O= makes handle relative paths, from Steven Rostedt
* perf script python fixes, from Feng Tang.
* Improve 'perf lock' error message when the needed tracepoints
are not present, from David Ahern.
* Initial bash completion support, from Frederic Weisbecker
* Allow building without libelf, from Namhyung Kim.
* Support DWARF CFI based unwind to have callchains when %bp
based unwinding is not possible, from Jiri Olsa.
* Symbol resolution fixes, while fixing support PPC64 files with an .opt ELF
section was the end goal, several fixes for code that handles all
architectures and cleanups are included, from Cody Schafer.
* Add a description for the JIT interface, from Andi Kleen.
* Assorted fixes for Documentation and build in 32 bit, from Robert Richter
* Add support for non-tracepoint events in perf script python, from Feng Tang
* Cache the libtraceevent event_format associated to each evsel early, so that we
avoid relookups, i.e. calling pevent_find_event repeatedly when processing
tracepoint events.
[ This is to reduce the surface contact with libtraceevents and make clear what
is that the perf tools needs from that lib: so far parsing the common and per
event fields. ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull ftrace updates from Steve Rostedt:
" This patch series extends ftrace function tracing utility to be
more dynamic for its users. It allows for data passing to the callback
functions, as well as reading regs as if a breakpoint were to trigger
at function entry.
The main goal of this patch series was to allow kprobes to use ftrace
as an optimized probe point when a probe is placed on an ftrace nop.
With lots of help from Masami Hiramatsu, and going through lots of
iterations, we finally came up with a good solution. "
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fixes from Ingo Molnar.
A x32 socket ABI fix with a -stable backport tag among other fixes.
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x32: Use compat shims for {g,s}etsockopt
Revert "x86-64/efi: Use EFI to deal with platform wall clock"
x86, apic: fix broken legacy interrupts in the logical apic mode
x86, build: Globally set -fno-pic
x86, avx: don't use avx instructions with "noxsave" boot param
Recent commit 332afa656e cleaned up
a workaround that updates irq_cfg domain for legacy irq's that
are handled by the IO-APIC. This was assuming that the recent
changes in assign_irq_vector() were sufficient to remove the workaround.
But this broke couple of AMD platforms. One of them seems to be
sending interrupts to the offline cpu's, resulting in spurious
"No irq handler for vector xx (irq -1)" messages when those cpu's come online.
And the other platform seems to always send the interrupt to the last logical
CPU (cpu-7). Recent changes had an unintended side effect of using only logical
cpu-0 in the IO-APIC RTE (during boot for the legacy interrupts) and this
broke the legacy interrupts not getting routed to the cpu-7 on the AMD
platform, resulting in a boot hang.
For now, reintroduce the removed workaround, (essentially not allowing the
vector to change for legacy irq's when io-apic starts to handle the irq. Which
also addressed the uninteded sife effect of just specifying cpu-0 in the
IO-APIC RTE for those irq's during boot).
Reported-and-tested-by: Robert Richter <robert.richter@amd.com>
Reported-and-tested-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1344453412.29170.5.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
If PMU counter has PEBS enabled it is not enough to disable counter
on a guest entry since PEBS memory write can overshoot guest entry
and corrupt guest memory. Disabling PEBS during guest entry solves
the problem.
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120809085234.GI3341@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The Westmere-EX uncore is similar to the Nehalem-EX uncore. The
differences are:
- Westmere-EX uncore has 10 instances of Cbox. The MSRs for Cbox8
and Cbox9 in the Westmere-EX aren't contiguous with Cbox 0~7.
- The fvid field in the ZDP_CTL_FVC register in the Mbox is
different. It's 5 bits in the Nehalem-EX, 6 bits in the
Westmere-EX.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1344229882-3907-3-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch includes following fixes and update:
- Only some events in the Sbox and Mbox can use the match/mask
registers, add code to check this.
- The format definitions for xbr_mm_cfg and xbr_match registers
in the Rbox are wrong, xbr_mm_cfg should use 32 bits, xbr_match
should use 64 bits.
- Cleanup the Rbox code. Compute the addresses extra registers in
the enable_event function instead of the hw_config function.
This simplifies the code in nhmex_rbox_alter_er().
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1344229882-3907-2-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix the following section mismatch:
WARNING: arch/x86/kernel/cpu/built-in.o(.text+0x7ad9): Section mismatch in reference from the function uncore_types_exit() to the function .init.text:uncore_type_exit()
The function uncore_types_exit() references the function __init
uncore_type_exit(). This is often because uncore_types_exit lacks a
__init annotation or the annotation of uncore_type_exit is wrong.
caused by 14371cce03 ("perf: Add generic PCI uncore PMU device
support").
Cc: Zheng Yan <zheng.z.yan@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1339741902-8449-8-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Introducing PERF_SAMPLE_REGS_USER sample type bit to trigger the dump of
user level registers on sample. Registers we want to dump are specified
by sample_regs_user bitmask.
Only user level registers are dumped at the moment. Meaning the register
values of the user space context as it was before the user entered the
kernel for whatever reason (syscall, irq, exception, or a PMI happening
in userspace).
The layout of the sample_regs_user bitmap is described in
asm/perf_regs.h for archs that support register dump.
This is going to be useful to bring Dwarf CFI based stack unwinding on
top of samples.
Original-patch-by: Frederic Weisbecker <fweisbec@gmail.com>
[ Dump registers ABI specification. ]
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Suggested-by: Stephane Eranian <eranian@google.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1344345647-11536-3-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This brings a new API to help the selective dump of registers on event
sampling, and its implementation for x86 arch.
Added HAVE_PERF_REGS config option to determine if the architecture
provides perf registers ABI.
The information about desired registers will be passed in u64 mask.
It's up to the architecture to map the registers into the mask bits.
For the x86 arch implementation, both 32 and 64 bit registers bits are
defined within single enum to ensure 64 bit system can provide register
dump for compat task if needed in the future.
Original-patch-by: Frederic Weisbecker <fweisbec@gmail.com>
[ Added missing linux/errno.h include ]
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1344345647-11536-2-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Clear AVX, AVX2 features along with clearing XSAVE feature bits,
as part of the parsing "noxsave" parameter.
Fixes the kernel boot panic with "noxsave" boot parameter.
We could have checked cpu_has_osxsave along with cpu_has_avx etc, but Peter
mentioned clearing the feature bits will be better for uses like
static_cpu_has() etc.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1343755754.2041.2.camel@sbsiddha-desk.sc.intel.com
Cc: <stable@vger.kernel.org> # v3.5
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Pull ACPI and power management fixes from Len Brown:
"A 3.3 sleep regression fixed, numa bugfix, plus some minor cleanups"
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
ACPI processor: Fix tick_broadcast_mask online/offline regression
ACPI: Only count valid srat memory structures
ACPI: Untangle a return statement for better readability
ACPI / PCI: Do not try to acquire _OSC control if that is hopeless
ACPI: delete _GTS/_BFS support
ACPI/x86: revert 'x86, acpi: Call acpi_enter_sleep_state via an asmlinkage C function from assembler'
ACPI: replace strlen("string") with sizeof("string") -1
ACPI / PM: Fix build warning in sleep.c for CONFIG_ACPI_SLEEP unset
Pull x86 fixes from Ingo Molnar:
"Various fixes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86-64, kcmp: The kcmp system call can be common
arch/x86/kernel/kdebugfs.c: Ensure a consistent return value in error case
x86/mce: Add quirk for instruction recovery on Sandy Bridge processors
x86/mce: Move MCACOD defines from mce-severity.c to <asm/mce.h>
x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqs
x86, nops: Missing break resulting in incorrect selection on Intel
x86: CONFIG_CC_STACKPROTECTOR=y is no longer experimental
Pull perf fixes from Ingo Molnar:
"Fix merge window fallout and fix sleep profiling (this was always
broken, so it's not a fix for the merge window - we can skip this one
from the head of the tree)."
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/trace: Add ability to set a target task for events
perf/x86: Fix USER/KERNEL tagging of samples properly
perf/x86/intel/uncore: Make UNCORE_PMU_HRTIMER_INTERVAL 64-bit
Pull perf updates from Ingo Molnar:
"The biggest changes are Intel Nehalem-EX PMU uncore support, uprobes
updates/cleanups/fixes from Oleg and diverse tooling updates (mostly
fixes) now that Arnaldo is back from vacation."
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
uprobes: __replace_page() needs munlock_vma_page()
uprobes: Rename vma_address() and make it return "unsigned long"
uprobes: Fix register_for_each_vma()->vma_address() check
uprobes: Introduce vaddr_to_offset(vma, vaddr)
uprobes: Teach build_probe_list() to consider the range
uprobes: Remove insert_vm_struct()->uprobe_mmap()
uprobes: Remove copy_vma()->uprobe_mmap()
uprobes: Fix overflow in vma_address()/find_active_uprobe()
uprobes: Suppress uprobe_munmap() from mmput()
uprobes: Uprobe_mmap/munmap needs list_for_each_entry_safe()
uprobes: Clean up and document write_opcode()->lock_page(old_page)
uprobes: Kill write_opcode()->lock_page(new_page)
uprobes: __replace_page() should not use page_address_in_vma()
uprobes: Don't recheck vma/f_mapping in write_opcode()
perf/x86: Fix missing struct before structure name
perf/x86: Fix format definition of SNB-EP uncore QPI box
perf/x86: Make bitfield unsigned
perf/x86: Fix LLC-* and node-* events on Intel SandyBridge
perf/x86: Add Intel Nehalem-EX uncore support
perf/x86: Fix typo in format definition of uncore PCU filter
...
Some PMUs don't provide a full register set for their sample,
specifically 'advanced' PMUs like AMD IBS and Intel PEBS which provide
'better' than regular interrupt accuracy.
In this case we use the interrupt regs as basis and over-write some
fields (typically IP) with different information.
The perf core however uses user_mode() to distinguish user/kernel
samples, user_mode() relies on regs->cs. If the interrupt skid pushed
us over a boundary the new IP might not be in the same domain as the
interrupt.
Commit ce5c1fe9a9 ("perf/x86: Fix USER/KERNEL tagging of samples")
tried to fix this by making the perf core use kernel_ip(). This
however is wrong (TM), as pointed out by Linus, since it doesn't allow
for VM86 and non-zero based segments in IA32 mode.
Therefore, provide a new helper to set the regs->ip field,
set_linear_ip(), which massages the regs into a suitable state
assuming the provided IP is in fact a linear address.
Also modify perf_instruction_pointer() and perf_callchain_user() to
deal with segments base offsets.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1341910954.3462.102.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
i386 allmodconfig:
arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function 'uncore_pmu_hrtimer':
arch/x86/kernel/cpu/perf_event_intel_uncore.c:728: warning: integer overflow in expression
arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function 'uncore_pmu_start_hrtimer':
arch/x86/kernel/cpu/perf_event_intel_uncore.c:735: warning: integer overflow in expression
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-h84qlqj02zrojmxxybzmy9hi@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add function tracer based kprobe optimization support
handlers on x86. This allows kprobes to use function
tracer for probing on mcount call.
Link: http://lkml.kernel.org/r/20120605102838.27845.26317.stgit@localhost.localdomain
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
[ Updated to new port of ftrace save regs functions ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The graph caller is called by the mcount callers, which already does
the check against the function_trace_stop variable. No reason to
check it again.
Link: http://lkml.kernel.org/r/20120711195745.588538769@goodmis.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The final position of the stack after saving regs and setting up
the parameters for ftrace_regs_call, is the position of the pt_regs
needed for the 4th parameter. Instead of saving it into a temporary
reg and pushing the reg, simply push the stack pointer.
Link: http://lkml.kernel.org/r/1342702344.12353.16.camel@gandalf.stny.rr.com
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
cd74257b97
patched up GTS/BFS -- a feature we want to remove.
So revert it (by hand, due to conflict in sleep.h)
to prepare for GTS/BFS removal.
Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
There are two ways to create /sys/firmware/memmap/X sysfs:
- firmware_map_add_early
When the system starts, it is calledd from e820_reserve_resources()
- firmware_map_add_hotplug
When the memory is hot plugged, it is called from add_memory()
But these functions are called without unifying value of end argument as
below:
- end argument of firmware_map_add_early() : start + size - 1
- end argument of firmware_map_add_hogplug() : start + size
The patch unifies them to "start + size". Even if applying the patch,
/sys/firmware/memmap/X/end file content does not change.
[akpm@linux-foundation.org: clarify comments]
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Reviewed-by: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86/mm changes from Peter Anvin:
"The big change here is the patchset by Alex Shi to use INVLPG to flush
only the affected pages when we only need to flush a small page range.
It also removes the special INVALIDATE_TLB_VECTOR interrupts (32
vectors!) and replace it with an ordinary IPI function call."
Fix up trivial conflicts in arch/x86/include/asm/apic.h (added code next
to changed line)
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tlb: Fix build warning and crash when building for !SMP
x86/tlb: do flush_tlb_kernel_range by 'invlpg'
x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR
x86/tlb: enable tlb flush range support for x86
mm/mmu_gather: enable tlb flush range in generic mmu_gather
x86/tlb: add tlb_flushall_shift knob into debugfs
x86/tlb: add tlb_flushall_shift for specific CPU
x86/tlb: fall back to flush all when meet a THP large page
x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range
x86/tlb_info: get last level TLB entry number of CPU
x86: Add read_mostly declaration/definition to variables from smp.h
x86: Define early read-mostly per-cpu macros
Pull scheduler changes from Ingo Molnar:
"The biggest change is a performance improvement on SMP systems:
| 4 socket 40 core + SMT Westmere box, single 30 sec tbench
| runs, higher is better:
|
| clients 1 2 4 8 16 32 64 128
|..........................................................................
| pre 30 41 118 645 3769 6214 12233 14312
| post 299 603 1211 2418 4697 6847 11606 14557
|
| A nice increase in performance.
which speedup is particularly noticeable on heavily interacting
few-tasks workloads, so the changes should help desktop-style Xorg
workloads and interactivity as well, on multi-core CPUs.
There are also cpuset suspend behavior fixes/restructuring and various
smaller tweaks."
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix race in task_group()
sched: Improve balance_cpu() to consider other cpus in its group as target of (pinned) task
sched: Reset loop counters if all tasks are pinned and we need to redo load balance
sched: Reorder 'struct lb_env' members to reduce its size
sched: Improve scalability via 'CPU buddies', which withstand random perturbations
cpusets: Remove/update outdated comments
cpusets, hotplug: Restructure functions that are invoked during hotplug
cpusets, hotplug: Implement cpuset tree traversal in a helper function
CPU hotplug, cpusets, suspend: Don't modify cpusets during suspend/resume
sched/x86: Remove broken power estimation
Typically, the return value desired for the failure of a
function with an integer return value is a negative integer. In
these cases, the return value is sometimes a negative integer
and sometimes 0, due to a subsequent initialization of the
return variable within the loop.
A simplified version of the semantic match that finds this
problem is: (http://coccinelle.lip6.fr/)
//<smpl>
@r exists@
identifier ret;
position p;
constant C;
expression e1,e3,e4;
statement S;
@@
ret = -C
... when != ret = e3
when any
if@p (...) S
... when any
if (\(ret != 0\|ret < 0\|ret > 0\) || ...) { ... return ...; }
... when != ret = e3
when any
*if@p (...)
{
... when != ret = e4
return ret;
}
//</smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Link: http://lkml.kernel.org/r/1342284188-19176-7-git-send-email-Julia.Lawall@lip6.fr
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Sandy Bridge processors follow the SDM (Vol 3B, Table 15-20) and
set both the RIPV and EIPV bits in the MCG_STATUS register to
zero for machine checks during instruction fetch. This is more
than a little counter-intuitive and means that Linux cannot
recover from these errors. Rather than insert special case code
at several places in mce.c and mce-severity.c, we pretend the
EIPV bit was set for just this case early in processing the
machine check.
Acked-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Chen Gong <gong.chen@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Link: http://lkml.kernel.org/r/180a06f3f357cf9f78259ae443a082b14a29535b.1343078495.git.tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We will need some of these values in mce.c. Move them to the
appropriate header file so they are available.
Acked-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Chen Gong <gong.chen@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Link: http://lkml.kernel.org/r/0ccfb1af5fe35e537b7cd8e4d448bf7d851dbfb9.1343078495.git.tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In the current kernel, percpu variable `vector_irq' is not always
cleared when a CPU is offlined. If the CPU that has the disabled
irqs in vector_irq is hotplugged again, __setup_vector_irq()
hits invalid irq vector and may crash.
This bug can be reproduced as following;
# echo 0 > /sys/devices/system/cpu/cpu7/online
# modprobe -r some_driver_using_interrupts # vector_irq@cpu7 uncleared
# echo 1 > /sys/devices/system/cpu/cpu7/online # kernel may crash
To fix this problem, this patch clears vector_irq in
__fixup_irqs() when the CPU is offlined.
This also reverts commit f6175f5bfb, which partially fixes
this bug by clearing vector in __clear_irq_vector(). But in
environments with IOMMU IRQ remapper, it could fail because
cfg->domain doesn't contain offlined CPUs. With this patch, the
fix in __clear_irq_vector() can be reverted because every
vector_irq is already cleared in __fixup_irqs() on offlined CPUs.
Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/20120726104732.2889.19144.stgit@kvmdev
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The event control register of SNB-EP uncore QPI box has a one bit
extension at bit position 21.
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1343097850-4348-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>