Commit Graph

1127 Commits

Author SHA1 Message Date
Arnaldo Carvalho de Melo
64c6f0c7f8 perf tools: Make --no-asm-raw the default
And add the annotation output knobs to all the tools that have
integrated annotation (top, report).

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-gnlob67mke6sji2kf4nstp7m@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-10-07 17:01:32 -03:00
Stephane Eranian
fbe96f29ce perf tools: Make perf.data more self-descriptive (v8)
The goal of this patch is to include more information about the host
environment into the perf.data so it is more self-descriptive. Overtime,
profiles are captured on various machines and it becomes hard to track
what was recorded, on what machine and when.

This patch provides a way to solve this by extending the perf.data file
with basic information about the host machine. To add those extensions,
we leverage the feature bits capabilities of the perf.data format.  The
change is backward compatible with existing perf.data files.

We define the following useful new extensions:
 - HEADER_HOSTNAME: the hostname
 - HEADER_OSRELEASE: the kernel release number
 - HEADER_ARCH: the hw architecture
 - HEADER_CPUDESC: generic CPU description
 - HEADER_NRCPUS: number of online/avail cpus
 - HEADER_CMDLINE: perf command line
 - HEADER_VERSION: perf version
 - HEADER_TOPOLOGY: cpu topology
 - HEADER_EVENT_DESC: full event description (attrs)
 - HEADER_CPUID: easy-to-parse low level CPU identication

The small granularity for the entries is to make it easier to extend
without breaking backward compatiblity. Many entries are provided as
ASCII strings.

Perf report/script have been modified to print the basic information as
easy-to-parse ASCII strings. Extended information about CPU and NUMA
topology may be requested with the -I option.

Thanks to David Ahern for reviewing and testing the many versions of
this patch.

 $ perf report --stdio
 # ========
 # captured on : Mon Sep 26 15:22:14 2011
 # hostname : quad
 # os release : 3.1.0-rc4-tip
 # perf version : 3.1.0-rc4
 # arch : x86_64
 # nrcpus online : 4
 # nrcpus avail : 4
 # cpudesc : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
 # cpuid : GenuineIntel,6,15,11
 # total memory : 8105360 kB
 # cmdline : /home/eranian/perfmon/official/tip/build/tools/perf/perf record date
 # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 29, 30, 31,
 # HEADER_CPU_TOPOLOGY info available, use -I to display
 # HEADER_NUMA_TOPOLOGY info available, use -I to display
 # ========
 #
 ...

 $ perf report --stdio -I
 # ========
 # captured on : Mon Sep 26 15:22:14 2011
 # hostname : quad
 # os release : 3.1.0-rc4-tip
 # perf version : 3.1.0-rc4
 # arch : x86_64
 # nrcpus online : 4
 # nrcpus avail : 4
 # cpudesc : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
 # cpuid : GenuineIntel,6,15,11
 # total memory : 8105360 kB
 # cmdline : /home/eranian/perfmon/official/tip/build/tools/perf/perf record date
 # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 29, 30, 31,
 # sibling cores   : 0-3
 # sibling threads : 0
 # sibling threads : 1
 # sibling threads : 2
 # sibling threads : 3
 # node0 meminfo  : total = 8320608 kB, free = 7571024 kB
 # node0 cpu list : 0-3
 # ========
 #
 ...

Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20110930134040.GA5575@quad
Signed-off-by: Stephane Eranian <eranian@google.com>
[ committer notes: Use --show-info in the tools as was in the docs, rename
  perf_header_fprintf_info to perf_file_section__fprintf_info, fixup
  conflict with f69b64f7 "perf: Support setting the disassembler style" ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-10-07 17:01:24 -03:00
Arnaldo Carvalho de Melo
be83f5ed6b perf hists browser: Update the browser.nr_entries after the timer
Previously the hist_browser dealt with a static tree of entries, now it
needs to update the nr_entries in the browser after the timer runs.

A better solution will come when moving using another thread for the
collapse_resort, etc, but for now this is ok.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-9eno2iq55sjr4iyo899buzaw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-10-07 17:01:11 -03:00
Arnaldo Carvalho de Melo
7d16320e23 perf hists browser: Fix TAB/UNTAB use with multiple events
When requesting multiple events, say:

  # perf top -e instructions -e cycles -e cache-misses

The first screen lets the user chose what to see first, then to switch
one can either use the left key to get back to the event menu or simply
use TAB to go the next and shift+TAB to go the prev.

When using TAB/UNTAB the call to perf_evlist__set_selected(event) was
missing, fix it.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-3xqqh3fwmt914gg43frey14y@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-10-07 17:01:04 -03:00
Arnaldo Carvalho de Melo
724c9c9f20 perf hists browser: Don't offer symbol actions when symbols not on --sort
Removing all the entries that only apply to symbols from the menu.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-7bap0cy2fxtorlj5hgsp48m1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-10-07 17:00:52 -03:00
Arnaldo Carvalho de Melo
234a5375f6 perf annotate browser: Use -> to navigate on assembly lines
And add better explanations when the line isn't actionable, like non
assembly lines and on other instructions.

Suggested-by: Ingo Molnar <mingo@elte.hu>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-375n844b5wra7lgq08ou153j@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-10-07 17:00:42 -03:00
Stephane Eranian
e39622ceb1 perf tools: Fix broken number of samples for perf report -n
The perf report -n option was broken because it was not reporting the
correct number of samples depending on the sorting mode. By default,
samples are sorted by comm,dso,sym. That means that samples for the same
command (binary) get collapsed.

The hists__collapse_insert_entry() had a bug whereby it was aggregating
the number of events observed (periods) but not the number of samples.
Consequently, the number of samples reported could be below reality. The
percentage remained correct because based on the periods.

This patch fixes the problem by also aggregating the number of samples.
Here is an example:

$ perf report -n --stdio
    12.38%        842     pong  [kernel.kallsyms]     [k] __lock_acquire

Here pong (a ctxsw stress test), is the only program running
and thus it is the only one responsible for the lock_acquire samples.

If we change the sorting mode:

$ perf report -n --stdio --sort=sym
    12.38%       1732  [k] __lock_acquire

The actual number of samples is shown.

With the fix:

$ perf report -n --stdio
    12.38%       1732     pong  [kernel.kallsyms]     [k] __lock_acquire

Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20111003093815.GA6393@quad
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-10-07 17:00:31 -03:00
Arnaldo Carvalho de Melo
34958544b3 perf annotate browser: Allow navigation to called functions
I.e. when in the annotate TUI window, if Enter is pressed over an
assembly line with a 'callq' it will try to open another TUI window with
that symbol.

This is just a proof of concept and works only on x86_64, more work is
needed to support kernel modules, userland, other arches, etc, but
should already be useful as-is.

Suggested-by: Ingo Molnar <mingo@elte.hu>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-opyvskw5na3qdmkv8vxi3zbr@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-10-07 17:00:09 -03:00
Arnaldo Carvalho de Melo
ab81f3fd35 perf top: Reuse the 'report' hist_entry/hists classes
This actually fixes several problems we had in the old 'perf top':

1. Unresolved symbols not show, limitation that came from the old
   "KernelTop" codebase, to solve it we would need to do changes
   that would make sym_entry have most of the hist_entry fields.
2. It was using the number of samples, not the sum of sample->period.

And brings the --sort code that allows us to have all the views in
'perf report', for instance:

[root@emilia ~]# perf top --sort dso
PerfTop: 5903 irqs/sec kernel:77.5% exact: 0.0% [1000Hz cycles], (all, 8 CPUs)
------------------------------------------------------------------------------

    31.59%  libcrypto.so.1.0.0
    21.55%  [kernel]
    18.57%  libpython2.6.so.1.0
     7.04%  libc-2.12.so
     6.99%  _backend_agg.so
     4.72%  sshd
     1.48%  multiarray.so
     1.39%  libfreetype.so.6.3.22
     1.37%  perf
     0.71%  libgobject-2.0.so.0.2200.5
     0.53%  [tg3]
     0.48%  libglib-2.0.so.0.2200.5
     0.44%  libstdc++.so.6.0.13
     0.40%  libcairo.so.2.10800.8
     0.38%  libm-2.12.so
     0.34%  umath.so
     0.30%  libgdk-x11-2.0.so.0.1800.9
     0.22%  libpthread-2.12.so
     0.20%  libgtk-x11-2.0.so.0.1800.9
     0.20%  librt-2.12.so
     0.15%  _path.so
     0.13%  libpango-1.0.so.0.2800.1
     0.11%  libatlas.so.3.0
     0.09%  ft2font.so
     0.09%  libpangoft2-1.0.so.0.2800.1
     0.08%  libX11.so.6.3.0
     0.07%  [vdso]
     0.06%  cyclictest
^C

All the filter lists can be used as well: --dsos, --comms, --symbols,
etc.

The 'perf report' TUI is also reused, being possible to apply all the
zoom operations, do annotation, etc.

This change will allow multiple simplifications in the symbol system as
well, that will be detailed in upcoming changesets.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-xzaaldxq7zhqrrxdxjifk1mh@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-10-07 16:56:44 -03:00
Arnaldo Carvalho de Melo
81cce8de94 perf browsers: Add live mode to the hists, annotate browsers
This allows passing a timer to be run periodically, which will update
the hists tree that then gers refreshed on the screen, just like the
Live mode (symbol entries, annotation) we already have in 'perf top
--tui'.

Will be used by the new hist_entry/hists based 'top' tool.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-2r44qd8oe4sagzcgoikl8qzc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-10-07 12:12:51 -03:00
Arnaldo Carvalho de Melo
1980c2ebd7 perf hists: Threaded addition and sorting of entries
By using a mutex just for inserting and rotating two hist_entry rb
trees, so that when sorting we can get the last batch of entries created
from the ring buffer, merge it with whatever we have processed so far
and show the output while new entries are being added.

The 'report' tool continues, for now, to do it without threading, but
will use this in the future to allow visualization of results in long
perf.data sessions while the entries are being processed.

The new 'top' tool will be the first user.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-9b05atsn0q6m7fqgrug8fk2i@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-10-07 12:12:29 -03:00
Arnaldo Carvalho de Melo
3f2728bdb6 perf report: Add option to show total period
Just like --show-nr-samples, to help in diagnosing problems in the
tools.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-1lr7ejdjfvy2uwy2wkmatcpq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-10-07 12:12:13 -03:00
Arnaldo Carvalho de Melo
ef9dfe6ec3 perf hists: Allow limiting the number of rows and columns in fprintf
So that we can reuse hists__fprintf for in the new perf top tool.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-huazw48x05h8r9niz5cf63za@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-10-07 12:11:49 -03:00
Arnaldo Carvalho de Melo
42b28ac071 perf hists: Stop using 'self' for struct hists
Stop using this python/OOP convention, doesn't really helps. Will do
more from time to time till we get it cleaned up in all of /perf.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-91i56jwnzq9edhsj9y2y9l3b@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-10-07 12:11:36 -03:00
Ingo Molnar
9d01402023 Merge commit 'v3.1-rc9' into perf/core
Merge reason: pick up latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-06 12:49:21 +02:00
Jiri Olsa
87ffef79ab perf symbols: Treat all memory maps without dso file as loaded
The stack/vdso/heap memory maps dont have any dso file.  Setting the
perf dso objects as 'loaded' for these maps, we avoid unnecessary
warnings like:

  "Failed to open [stack], continuing without symbols"

All map__find_* functions still return NULL when searching for symbols
in these maps.

Link: http://lkml.kernel.org/r/20110824131834.GA2007@jolsa.brq.redhat.com
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-09-29 17:10:48 -03:00
Andi Kleen
f69b64f73e perf: Support setting the disassembler style
Add -M option to report/annotate to pass directly to objdump.  This
allows to use -M intel for intel style disassembler syntax, which is
useful for people who are very used to the Intel syntax.

Link: http://lkml.kernel.org/r/1316122302-24306-2-git-send-email-andi@firstfloor.org
[committer note: Add missing Documentation bits, fixup conflicts with 3e6a2a7]
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-09-29 17:10:00 -03:00
Arnaldo Carvalho de Melo
dcc101d1d0 perf top: Improve lost events warning
Now it warns everytime that new events are lost.

And the TUI also warns now.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-w1n168yrvrppnq6887s4u0wx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-09-29 16:41:38 -03:00
Arnaldo Carvalho de Melo
eb48900831 perf top browser: Fix up line width calculation
Fixing an artifact where the last 3 chars of a long DSO name would
remain on the screen sometimes.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-dkiakcl3z69dh1bt9uegaktv@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-09-29 16:41:37 -03:00
Arnaldo Carvalho de Melo
98dfd55d80 perf symbols: Stop using 'self' in map_groups__ methods
Stop using this python/OOP convention, doesn't really helps. Will do
more from time to time till we get it cleaned up in all of /perf.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-rl9e690y60vnuyng05yp1zd3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-09-29 16:41:36 -03:00
Jiri Olsa
8e303f20f4 perf tools: Fix raw sample reading
Wrong pointer is being passed for raw data sanity checking, when parsing
sample event.

This ends up with invalid event and perf record being stuck in
__perf_session__process_events function during processing build IDs
(process_buildids function).

Following command hangs up in my setup:
	./perf record -e raw_syscalls:sys_enter ls

The fix is to use proper pointer to the raw data instead of the 'u'
union.

Reviewed-by: David Ahern <dsahern@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1317308709-9474-2-git-send-email-jolsa@redhat.com
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-09-29 16:29:53 -03:00
Arnaldo Carvalho de Melo
2b022a82a0 perf python: Add missing perf_event__parse_sample 'swapped' parm
Problem introduced in 936be50, that missed one perf_event__parse_sample
user, the python binding.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-ja4phms9618ggi657plyuch2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-09-23 15:38:53 -03:00
Stephane Eranian
be96ea8ffa perf symbols: Fix issue with binaries using 16-bytes buildids (v2)
Buildid can vary in size. According to the man page of ld, buildid can
be 160 bits (sha1) or 128 bits (md5, uuid). Perf assumes buildid size of
20 bytes (160 bits) regardless. When dealing with md5 buildids, it would
thus read more than needed and that would cause mismatches and samples
without symbols.

This patch fixes this by taking into account the actual buildid size as
encoded int he section header. The leftover bytes are also cleared.

This second version fixes a minor issue with the memset() base position.

Cc: David S. Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@gmail.com>
Link: http://lkml.kernel.org/r/4cc1af3c.8ee7d80a.5a28.ffff868e@mx.google.com
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-09-23 14:37:41 -03:00
David Ahern
936be50306 perf tool: Fix endianness handling of u32 data in samples
Currently, analyzing PPC data files on x86 the cpu field is always 0 and
the tid and pid are backwards. For example, analyzing a PPC file on PPC
the pid/tid fields show:

        rsyslogd  1210/1212

and analyzing the same PPC file using an x86 perf binary shows:

        rsyslogd  1212/1210

The problem is that the swap_op method for samples is
perf_event__all64_swap which assumes all elements in the sample_data
struct are u64s. cpu, tid and pid are u32s and need to be handled
individually. Given that the swap is done before the sample is parsed,
the simplest solution is to undo the 64-bit swap of those elements when
the sample is parsed and do the proper swap.

The RAW data field is generic and perf cannot have programmatic knowledge
of how to treat that data. Instead a warning is given to the user.

Thanks to Anton Blanchard for providing a data file for a mult-CPU
PPC system so I could verify the fix for the CPU fields.

v3 -> v4:
- fixed use of WARN_ONCE

v2 -> v3:
- used WARN_ONCE for message regarding raw data
- removed struct wrapper around union
- fixed whitespace issues

v1 -> v2:
- added a union for undoing the byte-swap on u64 and redoing swap on
  u32's to address compiler errors (see git commit 65014ab3)

Cc: Anton Blanchard <anton@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1315321946-16993-1-git-send-email-dsahern@gmail.com
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-09-23 14:37:27 -03:00
Anton Blanchard
6bb8f311a8 perf sort: Fix symbol sort output by separating unresolved samples by type
I took a profile that suggested 60% of total CPU time was in the
hypervisor:

...
    60.20%  [H] 0x33d43c
     4.43%  [k] ._spin_lock_irqsave
     1.07%  [k] ._spin_lock

Using perf stat to get the user/kernel/hypervisor breakdown contradicted
this.

The problem is we merge all unresolved samples into the one unknown
bucket. If add a comparison by sample type to sort__sym_cmp we get the
real picture:

...
    57.11%  [.] 0x80fbf63c
     4.43%  [k] ._spin_lock_irqsave
     1.07%  [k] ._spin_lock
     0.65%  [H] 0x33d43c

So it was almost all userspace, not hypervisor as the initial profile
suggested.

I found another issue while adding this. Symbol sorting sometimes shows
multiple entries for the unknown bucket:

...
    16.65%  [.] 0x6cd3a8
     7.25%  [.] 0x422460
     5.37%  [.] yylex
     4.79%  [.] malloc
     4.78%  [.] _int_malloc
     4.03%  [.] _int_free
     3.95%  [.] hash_source_code_string
     2.82%  [.] 0x532908
     2.64%  [.] 0x36b538
     0.94%  [H] 0x8000000000e132a4
     0.82%  [H] 0x800000000000e8b0

This happens because we aren't consistent with our sorting. On
one hand we check to see if both symbols match and for two unresolved
samples sym is NULL so we match:

        if (left->ms.sym == right->ms.sym)
                return 0;

On the other hand we use sample IP for unresolved samples when
comparing against a symbol:

       ip_l = left->ms.sym ? left->ms.sym->start : left->ip;
       ip_r = right->ms.sym ? right->ms.sym->start : right->ip;

This means unresolved samples end up spread across the rbtree and we
can't merge them all.

If we use cmp_null all unresolved samples will end up in the one bucket
and the output makes more sense:

...
    39.12%  [.] 0x36b538
     5.37%  [.] yylex
     4.79%  [.] malloc
     4.78%  [.] _int_malloc
     4.03%  [.] _int_free
     3.95%  [.] hash_source_code_string
     2.26%  [H] 0x800000000000e8b0

Acked-by: Eric B Munson <emunson@mgebm.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ian Munsie <imunsie@au1.ibm.com>
Link: http://lkml.kernel.org/r/20110831115145.4f598ab2@kryten
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-09-23 14:37:17 -03:00
Anton Blanchard
6a0e55d85b perf symbols: Synthesize anonymous mmap events
perf_event__synthesize_mmap_events does not create anonymous mmap events
even though the kernel does. As a result an already running application
with dynamically created code will not get profiled - all samples end up
in the unknown bucket.

This patch skips any entries with '[' in the name to avoid adding events
for special regions (eg the vsyscall page). All other executable mmaps
are assumed to be anonymous and an event is synthesized.

Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Link: http://lkml.kernel.org/r/20110830091506.60b51fe8@kryten
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-09-23 14:37:06 -03:00
David Ahern
764e16a30a perf record: Create events initially disabled and enable after init
perf-record currently creates events enabled. When doing a system wide
collection (-a arg) this causes data collection for perf's
initialization activities -- eg., perf_event__synthesize_threads().

For some events (e.g., context switch S/W event or tracepoints like
syscalls) perf's initialization causes a lot of events to be captured
frequently generating "Check IO/CPU overload!" warnings on larger
systems (e.g., 2 socket, quad core, hyperthreading).

perf's initialization phase can be skipped by creating events
disabled and then enabling them once the initialization is done.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1314289075-14706-1-git-send-email-dsahern@gmail.com
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-09-23 14:36:53 -03:00
Anton Blanchard
694bf407b0 perf symbols: Add some heuristics for choosing the best duplicate symbol
Try and pick the best symbol based on a few heuristics:

-  Prefer a non weak symbol over a weak one
-  Prefer a global symbol over a non global one
-  Prefer a symbol with less underscores (idea taken from kallsyms.c)
-  If all else fails, choose the symbol with the longest name

Cc: Eric B Munson <emunson@mgebm.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110824065243.161953371@samba.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-09-23 14:36:36 -03:00
Anton Blanchard
3187790860 perf symbols: Preserve symbol scope when parsing /proc/kallsyms
kallsyms__parse capitalises the symbol type, so every symbol is marked
global. Remove this and fix symbol_type__is_a to handle both local and
global symbols.

Cc: Eric B Munson <emunson@mgebm.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110824065243.077125989@samba.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-09-23 14:36:25 -03:00
Anton Blanchard
3f5a42722b perf symbols: /proc/kallsyms does not sort module symbols
kallsyms__parse assumes that /proc/kallsyms is sorted and sets the end
of the previous symbol to the start of the current one.

Unfortunately module symbols are not sorted, eg:

ffffffffa0081f30 t e1000_clean_rx_irq   [e1000e]
ffffffffa00817a0 t e1000_alloc_rx_buffers       [e1000e]

Some symbols end up with a negative length and others have a length
larger than they should. This results in confusing perf output.

We already have a function to fixup the end of zero length symbols so
use that instead.

Cc: Eric B Munson <emunson@mgebm.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110824065242.969681349@samba.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-09-23 14:36:12 -03:00
Anton Blanchard
adb0918463 perf symbols: Fix ppc64 SEGV in dso__load_sym with debuginfo files
64bit PowerPC debuginfo files have an empty function descriptor section.
I hit a SEGV when perf tried to use this section for symbol resolution.

To fix this we need to check the section is valid and we can do this by
checking for type SHT_PROGBITS.

Cc: <stable@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Eric B Munson <emunson@mgebm.net>
Link: http://lkml.kernel.org/r/20110824065242.895239970@samba.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-09-23 14:35:57 -03:00
Masami Hiramatsu
f66fedcb72 perf probe: Fix regression of variable finder
Fix to call convert_variable() if previous call does not fail.

To call convert_variable, it ensures "ret" is 0. However, since
"ret" has the return value of synthesize_perf_probe_arg() which
always returns positive value if it succeeded, perf probe doesn't
call convert_variable(). This will cause a SEGV when we add an
event with arguments.

This has to be fixed as it ensures "ret" is greater than 0
(or not negative).

This regression has been introduced by my previous patch, f182e3e1.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20110820053922.3286.65805.stgit@fedora15
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-09-23 14:33:19 -03:00
Ingo Molnar
51887c8230 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Conflicts:
	tools/perf/builtin-stat.c
2011-08-18 21:58:46 +02:00
Stephane Eranian
4aa9015f8b perf stat: Add -o and --append options
This patch adds an option (-o) to save the output of perf stat into a
file. You could do this with perf record but not with perf stat.
Instead, you had to fiddle with stderr to save the counts into a
separate file.

The patch also adds the --append option so that results can be
concatenated into a single file across runs. Each run of the tool is
clearly separated by a comment line starting with a hash mark. The -A
option of perf record is already used by perf stat, so we only add a
long option.

$ perf stat -o res.txt date
$ cat res.txt

 Performance counter stats for 'date':

          0.791306 task-clock                #    0.668 CPUs utilized
                 2 context-switches          #    0.003 M/sec
                 0 CPU-migrations            #    0.000 M/sec
               197 page-faults               #    0.249 M/sec
           1878143 cycles                    #    2.373 GHz
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
           1083367 instructions              #    0.58  insns per cycle
            193027 branches                  #  243.935 M/sec
              9014 branch-misses             #    4.67% of all branches

       0.001184746 seconds time elapsed

The option can be combined with -x to make the output file much easier
to parse.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20110815202233.GA18535@quad
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-08-18 07:46:13 -03:00
Stephane Eranian
3e6a2a7f3b perf annotate: Make output more readable
This patch adds two new options to perf annotate:
	- --no-asm-raw : Do not display raw instruction encodings
	- --no-source  : Do not interleave source code with assembly code

We believe those options make the output of annotate more readable.

Systematically displaying source can make it hard to follow code and
especially optimized code.

Raw encodings are not useful in most cases.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20110517153207.GA9834@quad
Signed-off-by: Stephane Eranian <eranian@google.com>
[committer note: Use the 'no-' option inverting logic]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-08-18 07:38:21 -03:00
Josh Boyer
195bcbf507 perf tools: Fix build against newer glibc
Upstream glibc commit 295e904 added a definition for __attribute_const__
to cdefs.h.  This causes the following error when building perf:

util/include/linux/compiler.h:8:0: error: "__attribute_const__"
redefined [-Werror] /usr/include/sys/cdefs.h:226:0: note: this is the
location of the previous definition

Wrap __attribute_const__ in #ifndef as we do for __always_inline.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110818113720.GL2227@zod.bos.redhat.com
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-08-18 07:24:53 -03:00
Stephane Eranian
777d1d71db perf tools: Fix error handling of unknown events
There was a problem with the parse_events() code not printing the
correct event name when an event was unknown and starting with an 'r'.
The source of the problem was the way raw notation was parsed.

Without the patch:
	$ perf stat -e retired_foo
	invalid event modifier: 'tired_foo'

With the patch:
	$ perf stat -e retired_foo
	invalid or unsupported event: 'retired_foo'

This also covers the case where the name of the event was not printed at
all when perf was linked with libpfm4.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20110723021043.GA20178@quad
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-08-18 07:21:13 -03:00
Stephane Eranian
cc2d86b04d perf evlist: Fix missing event name init for default event
When no event is given to perf record, perf top, a default event is
initialized (cycles). However, perf_evlist__add_default() was not
setting the symbolic name for the event. Perf top worked simply because
it was reconstructing the name from the event code. But it should not
have to do this. This patch initializes the evsel->name field properly.

This second version improves the code flow on the non error path.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20110607161936.GA8163@quad
Signed-off-by: Stephane Eranian <eranian@google.com>
[committer note: Use perf_evsel__delete() instead of plain free()]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-08-18 07:20:31 -03:00
Stephane Eranian
77e57297b4 perf list: Fix exit value
This patch fixes an issue with the exit value of perf list:

$ perf list; echo $?
129

perf list returns an error exit code even though there is no error.

There was a stray exit(129) in print_events(). This patch removes this
exit().

$ perf list; echo $?
0

$ perf list hw sw
  cpu-cycles OR cycles                               [Hardware event]
  stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
  stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
  instructions                                       [Hardware event]
  cache-references                                   [Hardware event]
  cache-misses                                       [Hardware event]
  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  bus-cycles                                         [Hardware event]

  cpu-clock                                          [Software event]
  task-clock                                         [Software event]
  page-faults OR faults                              [Software event]
  minor-faults                                       [Software event]
  major-faults                                       [Software event]
  context-switches OR cs                             [Software event]
  cpu-migrations OR migrations                       [Software event]
  alignment-faults                                   [Software event]
  emulation-faults                                   [Software event]
$ echo $?
0

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20110523123917.GA31060@quad
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-08-18 07:19:15 -03:00
Masami Hiramatsu
3f4460a28f perf probe: Filter out redundant inline-instances
With gcc4.6, some instances of concrete inlined function looks redundant
and broken, because it appears inside of a concrete instance and its
call_file and call_line are same as the original abstruct's decl_file
and decl_line respectively.

e.g.
 [  d1aa]    subprogram
             external             (flag) Yes
             name                 (strp) "add_timer"
             decl_file            (data1) 2		;here is original
             decl_line            (data2) 847		;line and file
             prototyped           (flag) Yes
             inline               (data1) inlined (1)
             sibling              (ref4) [  d1c6]
...
 [ 11d84]    subprogram
             abstract_origin      (ref4) [  d1aa]	; concrete instance
             low_pc               (addr) .text+0x000000000000246f <add_timer>
             high_pc              (addr) .text+0x000000000000248b <mod_timer_pending>
             frame_base           (block1)               [   0] call_frame_cfa
             sibling              (ref4) [ 11dd9]
 [ 11d9f]      formal_parameter
               abstract_origin      (ref4) [  d1b9]
               location             (data4) location list [  701b]
 [ 11da8]      inlined_subroutine
               abstract_origin      (ref4) [  d1aa]	; redundant instance
               low_pc               (addr) .text+0x000000000000247e <add_timer+0xf>
               high_pc              (addr) .text+0x0000000000002480 <add_timer+0x11>
               call_file            (data1) 2		; call line and file
               call_line            (data2) 847		; are same as above

Those redundant instances leads unwilling results;

e.g. find probe points inside of functions even if we specify
a function entry as below;

$ perf probe -V add_timer
Available variables at add_timer
        @<add_timer+0>
                struct timer_list*      timer
        @<add_timer+15>
                (No matched variables)

So, this filters out those redundant instances based on call-site and
decl-site information.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20110811110317.19900.59525.stgit@fedora15
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-08-12 09:34:35 -03:00
Masami Hiramatsu
db0d2c6420 perf probe: Search concrete out-of-line instances
gcc 4.6 generates a concrete out-of-line instance when there is a
function which is implicitly inlined somewhere but also has its own
instance. The concrete out-of-line instance means that it has an
abstract origin of the function which is referred by not only
inlined-subroutines but also a concrete subprogram.

Since current dwarf_func_inline_instances() can find only instances of
inlined-subroutines, this introduces new die_walk_instances() to find
both of subprogram and inlined-subroutines.

e.g. without this,
Available variables at sched_group_rt_period
        @<cpu_rt_period_read_uint+9>
                struct task_group*      tg

perf probe failed to find actual subprogram instance of
sched_group_rt_period().

With this,

Available variables at sched_group_rt_period
        @<cpu_rt_period_read_uint+9>
                struct task_group*      tg
        @<sched_group_rt_period+0>
                struct task_group*      tg

Now it found the sched_group_rt_period() itself.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20110811110311.19900.63997.stgit@fedora15
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-08-12 09:32:10 -03:00
Masami Hiramatsu
f182e3e13c perf probe: Avoid searching variables in intermediate scopes
Fix variable searching logic to search one in inner than local scope or
global(CU) scope. In the other words, skip searching in intermediate
scopes.

e.g., in the following code,

int var1;

void inline infunc(int i)
{
    i++;   <--- [A]
}

void func(void)
{
   int var1, var2;
   infunc(var2);
}

At [A], "var1" should point the global variable "var1", however, if user
mis-typed as "var2", variable search should be failed. However, current
logic searches variable infunc() scope, global scope, and then func()
scope. Thus, it can find "var2" variable in func() scope. This may not
be what user expects.

So, it would better not search outer scopes except outermost (compile
unit) scope which contains only global variables, when it failed to find
given variable in local scope.

E.g.

Without this:
$ perf probe -V pre_schedule --externs > without.vars

With this:
$ perf probe -V pre_schedule --externs > with.vars

Check the diff:
$ diff without.vars with.vars
88d87
<               int     cpu
133d131
<               long unsigned int*      switch_count

These vars are actually in the scope of schedule(), the caller of
pre_schedule().

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20110811110305.19900.94374.stgit@fedora15
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-08-12 09:29:34 -03:00
Masami Hiramatsu
221d061182 perf probe: Fix to search local variables in appropriate scope
Fix perf probe to search local variables in appropriate local inlined
function scope. For example, pre_schedule() has only 2 local variables,
as below;

$ perf probe -L pre_schedule
<pre_schedule@/home/mhiramat/ksrc/linux-2.6/kernel/sched.c:0>
      0  static inline void pre_schedule(struct rq *rq, struct task_struct *prev)
         {
      2         if (prev->sched_class->pre_schedule)
      3                 prev->sched_class->pre_schedule(rq, prev);
         }

However, current perf probe shows 4 local variables on pre_schedule(),
because it searches variables in the caller(schedule()) scope.

$ perf probe -V pre_schedule
Available variables at pre_schedule
        @<schedule+445>
                int     cpu
                long unsigned int*      switch_count
                struct rq*      rq
                struct task_struct*     prev

This patch fixes this issue by searching variables in the local scope of
the instance of inlined function. Here is the result.

$ perf probe -V pre_schedule
Available variables at pre_schedule
        @<schedule+445>
                struct rq*      rq
                struct task_struct*     prev

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20110811110259.19900.85664.stgit@fedora15
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-08-12 09:28:45 -03:00
Masami Hiramatsu
36c0c588b9 perf probe: Fix to walk all inline instances
Fix line-range collector to walk all instances of inlined function,
because some execution paths can be optimized out depending on the
function argument of instances.

E.g.)
inline_func (arg) {
	if (arg)
		do_something;
	else
		do_another;
}

func_A() {
	inline_func(1)
}

func_B() {
	inline_func(0)
}

In this case, func_A may have only do_something code and func_B may have
only do_another.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <masami.hiramatsu@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20110811110247.19900.93702.stgit@fedora15
Signed-off-by: Masami Hiramatsu <masami.hiramatsu@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-08-12 09:25:38 -03:00
Masami Hiramatsu
b0e9cb2802 perf probe: Fix to search nested inlined functions in CU
Fix perf probe to walk through the lines of all nested inlined function
call sites and declared lines when a whole CU is passed to the line
walker.

The die_walk_lines() can have two different type of DIEs, subprogram (or
inlined-subroutine) DIE and CU DIE.

If a caller passes a subprogram DIE, this means that the walker walk on
lines of given subprogram. In this case, it just needs to search on
direct children of DIE tree for finding call-site information of inlined
function which directly called from given subprogram.

On the other hand, if a caller passes a CU DIE to the walker, this means
that the walker have to walk on all lines in the source files included
in given CU DIE. In this case, it has to search whole DIE trees of all
subprograms to find the call-site information of all nested inlined
functions.

Without this patch:

$ perf probe --line kernel/cpu.c:151-157
</home/mhiramat/ksrc/linux-2.6/kernel/cpu.c:151>

         static int cpu_notify(unsigned long val, void *v)
         {
    154         return __cpu_notify(val, v, -1, NULL);
         }

With this:
$ perf probe --line kernel/cpu.c:151-157
</home/mhiramat/ksrc/linux-2.6/kernel/cpu.c:151>

    152  static int cpu_notify(unsigned long val, void *v)
         {
    154         return __cpu_notify(val, v, -1, NULL);
         }

As you can see, --line option with source line range shows the declared
lines as probe-able.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20110811110241.19900.34994.stgit@fedora15
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-08-12 09:23:39 -03:00
Masami Hiramatsu
a128405c6b perf probe: Fix line walker to check CU correctly
Fix line walker to check whether a given DIE is CU or not.

Actually this function accepts CU, subprogram and inlined_subroutine
DIEs.

Without this fix, perf probe always fails to analyze lines on inlined
functions;

$ perf probe -L pre_schedule
Debuginfo analysis failed. (-2)
  Error: Failed to show lines. (-2)

This fixes that bug, as below.

$ perf probe -L pre_schedule
<pre_schedule@/home/mhiramat/ksrc/linux-2.6/kernel/sched.c:0>
      0  static inline void pre_schedule(struct rq *rq, struct task_struct *prev
         {
      2         if (prev->sched_class->pre_schedule)
      3                 prev->sched_class->pre_schedule(rq, prev);
         }

         /* rq->lock is NOT held, but preemption is disabled */

Changes from v1:
 - Update against current tip tree.(Fix dwarf-aux.c)

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <masami.hiramatsu@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20110811110235.19900.20614.stgit@fedora15
Signed-off-by: Masami Hiramatsu <masami.hiramatsu@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-08-12 09:22:46 -03:00
Masami Hiramatsu
8afa2a707d perf probe: Fix a memory leak for scopes array
Fix a memory leak for scopes array when it finds a variable in the
global scope.

Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20110811110229.19900.63019.stgit@fedora15
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-08-12 09:21:15 -03:00
Vasiliy Kulikov
e9b52ef222 perf: fix temporary file ownership check
A file in /tmp/ might be a symlink, so lstat() should be used instead of
stat().

Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20110811205537.GA22864@albatros
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-08-12 08:28:17 -03:00
Jiri Olsa
f57b05ed53 perf report: Use properly build_id kernel binaries
If we bring the recorded perf data together with kernel binary from another
machine using:

	on server A:
	perf archive

	on server B:
	tar xjvf perf.data.tar.bz2 -C ~/.debug

the build_id kernel dso is not properly recognized during the "perf report"
command on server B.

The reason is, that build_id dsos are added during the session initialization,
while the kernel maps are created during the sample event processing.

The machine__create_kernel_maps functions ends up creating new dso object for
kernel, but it does not check if we already have one added by build_id
processing.

Also the build_id reading ABI quirk added in commit:

 - commit b25114817a
   perf build-id: Add quirk to deal with perf.data file format breakage

populates the "struct build_id_event::pid" with 0, which
is later interpreted as DEFAULT_GUEST_KERNEL_ID.

This is not always correct, so it's better to guess the pid
value based on the "struct build_id_event::header::misc" value.

- Tested with data generated on x86 kernel version v2.6.34
  and reported back on x86_64 current kernel.
- Not tested for guest kernel case.

Note the problem stays for PERF_RECORD_MMAP events recorded by perf that
does not use proper pid (HOST_KERNEL_ID/DEFAULT_GUEST_KERNEL_ID). They are
misinterpreted within the current perf code. Probably there's not much we
can do about that.

Cc: Avi Kivity <avi@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Link: http://lkml.kernel.org/r/20110601194346.GB1934@jolsa.brq.redhat.com
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-08-11 08:58:03 -03:00
Arnaldo Carvalho de Melo
fc8ed7be73 perf top browser: Remove spurious helpline update
It will be immediately replaced in perf_top_browser__run.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-q7e2jzb44elqpkvdllk94x0i@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-08-10 12:42:26 -03:00