linux/tools/perf/util
Kan Liang 384b60557b perf tools: Construct LBR call chain
LBR call stack only has user-space callchains. It is output in the
PERF_SAMPLE_BRANCH_STACK data format. For kernel callchains, it's
still in the form of PERF_SAMPLE_CALLCHAIN.

The perf tool has to handle both data sources to construct a
complete callstack.

For the "perf report -D" option, both lbr and fp information will be
displayed.

A new call chain recording option "lbr" is introduced into the perf
tool for LBR call stack. The user can use --call-graph lbr to get
the call stack information from hardware.

Here are some examples.

When profiling bc(1) on Fedora 19:

  echo 'scale=2000; 4*a(1)' > cmd; perf record --call-graph lbr bc -l < cmd

If enabling LBR, perf report output looks like:

    50.36%       bc  bc                 [.] bc_divide
                 |
                 --- bc_divide
                     execute
                     run_code
                     yyparse
                     main
                     __libc_start_main
                     _start
    33.66%       bc  bc                 [.] _one_mult
                 |
                 --- _one_mult
                     bc_divide
                     execute
                     run_code
                     yyparse
                     main
                     __libc_start_main
                     _start
     7.62%       bc  bc                 [.] _bc_do_add
                 |
                 --- _bc_do_add
                    |
                    |--99.89%-- 0x2000186a8
                     --0.11%-- [...]
     6.83%       bc  bc                 [.] _bc_do_sub
                 |
                 --- _bc_do_sub
                    |
                    |--99.94%-- bc_add
                    |          execute
                    |          run_code
                    |          yyparse
                    |          main
                    |          __libc_start_main
                    |          _start
                     --0.06%-- [...]
     0.46%       bc  libc-2.17.so       [.] __memset_sse2
                 |
                 --- __memset_sse2
                    |
                    |--54.13%-- bc_new_num
                    |          |
                    |          |--51.00%-- bc_divide
                    |          |          execute
                    |          |          run_code
                    |          |          yyparse
                    |          |          main
                    |          |          __libc_start_main
                    |          |          _start
                    |          |
                    |          |--30.46%-- _bc_do_sub
                    |          |          bc_add
                    |          |          execute
                    |          |          run_code
                    |          |          yyparse
                    |          |          main
                    |          |          __libc_start_main
                    |          |          _start
                    |          |
                    |           --18.55%-- _bc_do_add
                    |                     bc_add
                    |                     execute
                    |                     run_code
                    |                     yyparse
                    |                     main
                    |                     __libc_start_main
                    |                     _start
                    |
                     --45.87%-- bc_divide
                               execute
                               run_code
                               yyparse
                               main
                               __libc_start_main
                               _start

If using FP, perf report output looks like:

  echo 'scale=2000; 4*a(1)' > cmd; perf record --call-graph fp bc -l < cmd

    50.49%       bc  bc                 [.] bc_divide
                 |
                 --- bc_divide
    33.57%       bc  bc                 [.] _one_mult
                 |
                 --- _one_mult
     7.61%       bc  bc                 [.] _bc_do_add
                 |
                 --- _bc_do_add
                     0x2000186a8
     6.88%       bc  bc                 [.] _bc_do_sub
                 |
                 --- _bc_do_sub
     0.42%       bc  libc-2.17.so       [.] __memcpy_ssse3_back
                 |
                 --- __memcpy_ssse3_back

If using LBR, perf report -D output looks like:

3458145275743 0x2fd750 [0xd8]: PERF_RECORD_SAMPLE(IP, 0x2): 9748/9748: 0x408ea8 period: 609644 addr: 0
... LBR call chain: nr:8
.....  0: fffffffffffffe00
.....  1: 0000000000408e50
.....  2: 000000000040a458
.....  3: 000000000040562e
.....  4: 0000000000408590
.....  5: 00000000004022c0
.....  6: 00000000004015dd
.....  7: 0000003d1cc21b43
... FP chain: nr:2
.....  0: fffffffffffffe00
.....  1: 0000000000408ea8
 ... thread: bc:9748
 ...... dso: /usr/bin/bc

The LBR call stack has the following known limitations:

 - Zero length calls are not filtered out by the hardware

 - Exception handing such as setjmp/longjmp will have calls/returns not
   match

 - Pushing different return address onto the stack will have
   calls/returns not match

 - If callstack is deeper than the LBR, only the last entries are
   captured

Tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Simon Que <sque@chromium.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1420482185-29830-3-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 17:16:18 +01:00
..
include tools: Remove bitops/hweight usage of bits in tools/perf 2015-01-16 17:49:29 -03:00
scripting-engines perf tools: Remove EOL whitespaces 2015-01-21 13:24:31 -03:00
abspath.c
alias.c perf tools: Introduce zfree 2013-12-27 15:17:00 -03:00
annotate.c perf tools: Remove EOL whitespaces 2015-01-21 13:24:31 -03:00
annotate.h perf tools: Fix segfault for symbol annotation on TUI 2015-01-16 17:49:29 -03:00
bitmap.c
build-id.c perf buildid-cache: Remove extra debugdir variables 2014-12-09 09:14:34 -03:00
build-id.h perf build-id: Move disable_buildid_cache() to util/build-id.c 2014-11-19 12:33:46 -03:00
cache.h perf tools: Elide strlcpy warning with uclibc 2015-01-16 17:49:29 -03:00
callchain.c perf tools: Enable LBR call stack support 2015-02-18 17:16:17 +01:00
callchain.h perf tools: Enable LBR call stack support 2015-02-18 17:16:17 +01:00
cgroup.c perf evlist: Introduce evlist__for_each() & friends 2014-01-13 10:06:25 -03:00
cgroup.h
cloexec.c perf util: Replace strerror with strerror_r for thread-safety 2014-08-15 10:58:35 -03:00
cloexec.h perf tools: Enable close-on-exec flag on perf file descriptor 2014-07-18 09:09:34 +02:00
color.c perf tools: Remove some unused functions from color.c 2015-01-21 13:24:32 -03:00
color.h perf tools: Remove some unused functions from color.c 2015-01-21 13:24:32 -03:00
comm.c perf tools: Identify which comms are from exec 2014-08-13 19:23:08 -03:00
comm.h perf tools: Add facility to export data in database-friendly way 2014-10-29 10:32:49 -02:00
config.c perf tools: Add --buildid-dir option to set cache directory 2014-12-09 09:14:35 -03:00
cpumap.c perf tools: Use cpu/possible instead of cpu/kernel_max 2014-04-22 17:39:16 +02:00
cpumap.h perf tools: Allow ability to map cpus to nodes easily 2014-04-22 17:39:12 +02:00
ctype.c
data.c perf util: Replace strerror with strerror_r for thread-safety 2014-08-15 10:58:35 -03:00
data.h perf tools: Add perf_data_file__write interface 2013-12-02 09:22:46 -03:00
db-export.c perf tools: Defer export of comms that were not 'set' 2014-11-03 18:11:59 -03:00
db-export.h perf tools: Defer export of comms that were not 'set' 2014-11-03 18:11:59 -03:00
debug.c perf tools: Allow to force redirect pr_debug to stderr. 2014-11-24 18:03:48 -03:00
debug.h perf: Use strerror_r instead of strerror 2014-08-15 10:54:29 -03:00
dso.c perf symbols: Convert lseek + read to pread 2015-01-29 17:02:01 -03:00
dso.h perf callchain: Cache eh/debug frame offset for dwarf unwind 2015-01-29 16:20:42 -03:00
dwarf-aux.c perf probe: Fix perf probe to find correct variable DIE 2014-06-04 14:49:20 +02:00
dwarf-aux.h perf probe: Fix to find line information for probe list 2013-10-04 15:16:05 -03:00
environment.c
event.c perf tools: Add id index 2014-10-29 11:24:47 -02:00
event.h Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-12-09 21:18:06 -08:00
evlist.c tools lib fs: Adopt debugfs open strerrno method 2015-01-22 10:34:22 -03:00
evlist.h tools lib fs: Adopt debugfs open strerrno method 2015-01-22 10:34:22 -03:00
evsel.c perf tools: Enable LBR call stack support 2015-02-18 17:16:17 +01:00
evsel.h perf tools: Construct LBR call chain 2015-02-18 17:16:18 +01:00
exec_cmd.c
exec_cmd.h
find-vdso-map.c perf tools: Build programs to copy 32-bit compatibility 2014-10-29 10:32:48 -02:00
generate-cmdlist.sh tools/perf: Standardize feature support define names to: HAVE_{FEATURE}_SUPPORT 2013-10-09 08:48:28 +02:00
header.c perf header: Set header version correctly 2015-01-29 16:53:11 -03:00
header.h perf build-id: Move build-id related functions to util/build-id.c 2014-11-05 10:14:07 -03:00
help.c perf tools: Use zfree to help detect use after free bugs 2013-12-27 17:08:19 -03:00
help.h
hist.c perf tools: Pass struct perf_hpp_fmt to its callbacks 2015-01-21 13:24:34 -03:00
hist.h perf tools: Pass struct perf_hpp_fmt to its callbacks 2015-01-21 13:24:34 -03:00
intlist.c perf util: Add findnew method to intlist 2013-10-14 10:28:48 -03:00
intlist.h perf util: Add findnew method to intlist 2013-10-14 10:28:48 -03:00
kvm-stat.h perf kvm stat report: Save pid string in opts.target.pid 2014-09-17 17:08:07 -03:00
levenshtein.c
levenshtein.h
machine.c perf tools: Construct LBR call chain 2015-02-18 17:16:18 +01:00
machine.h perf tools: Add facility to export data in database-friendly way 2014-10-29 10:32:49 -02:00
map.c perf callchain: Make get_srcline fall back to sym+offset 2014-11-24 18:03:47 -03:00
map.h perf symbols: Introduce 'for' method to iterate over the symbols with a given name 2015-01-21 10:06:15 -03:00
ordered-events.c perf session: Add option to copy events when queueing 2014-10-15 17:39:03 -03:00
ordered-events.h perf session: Add option to copy events when queueing 2014-10-15 17:39:03 -03:00
pager.c perf tools: Add cat as fallback pager 2014-05-21 11:48:33 +02:00
parse-events.c Merge branch 'perf/hw_breakpoints' into perf/core 2015-01-28 15:48:59 +01:00
parse-events.h Merge branch 'perf/hw_breakpoints' into perf/core 2015-01-28 15:48:59 +01:00
parse-events.l perf tools: allow user to specify hardware breakpoint bp_len 2014-12-03 15:14:29 +01:00
parse-events.y perf tools: allow user to specify hardware breakpoint bp_len 2014-12-03 15:14:29 +01:00
parse-options.c perf tools: Allow use of an exclusive option more than once 2015-01-21 13:24:33 -03:00
parse-options.h perf tools: Add support for exclusive option 2014-10-29 10:32:47 -02:00
path.c tools/perf: Turn strlcpy() into a __weak function 2013-10-09 08:48:49 +02:00
perf_regs.c perf tools: Cache register accesses for unwind processing 2014-06-12 16:53:19 +02:00
perf_regs.h perf tools: Cache register accesses for unwind processing 2014-06-12 16:53:19 +02:00
PERF-VERSION-GEN perf tools: Fix version when building out of tree 2013-11-07 10:40:47 -03:00
pmu.c perf tools: Extend format_alias() to include event parameters 2015-01-21 13:24:33 -03:00
pmu.h perf tools: Add snapshot format file parsing 2014-11-24 18:03:51 -03:00
pmu.l
pmu.y perf tools: Fix build with bison 2.3 and older. 2013-02-14 16:12:34 -03:00
probe-event.c perf probe: Fix probing kretprobes 2015-01-21 10:06:24 -03:00
probe-event.h perf probe: Do not access kallsyms when analyzing user binaries 2014-09-17 18:01:14 -03:00
probe-finder.c perf probe: Fix crash in dwarf_getcfi_elf 2015-01-02 12:44:01 -03:00
probe-finder.h perf probe: Support distro-style debuginfo for uprobe 2014-02-18 09:38:44 -03:00
pstack.c perf tools: Move pr_* debug macros into debug object 2014-07-17 12:58:39 -03:00
pstack.h perf tools: Finish the removal of 'self' arguments 2013-11-05 15:32:36 -03:00
python-ext-sources tools: Remove bitops/hweight usage of bits in tools/perf 2015-01-16 17:49:29 -03:00
python.c perf tools: Remove EOL whitespaces 2015-01-21 13:24:31 -03:00
quote.c
quote.h
rblist.c perf util: Add findnew method to intlist 2013-10-14 10:28:48 -03:00
rblist.h perf util: Add findnew method to intlist 2013-10-14 10:28:48 -03:00
record.c perf tools: Use sysctl__read_int instead of ad-hoc copies 2014-12-11 17:53:04 -03:00
run-command.c perf util: Replace strerror with strerror_r for thread-safety 2014-08-15 10:58:35 -03:00
run-command.h
session.c perf tools: Construct LBR call chain 2015-02-18 17:16:18 +01:00
session.h perf tools: Do not use __perf_session__process_events() directly 2015-01-29 16:36:32 -03:00
setup.py tools/: Convert to new topic libraries 2013-12-16 16:03:27 -03:00
sigchain.c
sigchain.h
sort.c perf tools: Pass struct perf_hpp_fmt to its callbacks 2015-01-21 13:24:34 -03:00
sort.h perf tools: Add +field argument support for --field option 2014-08-24 08:11:19 -03:00
srcline.c perf: Fix building warning on ARM 32 2014-12-19 13:09:43 +01:00
stat.c perf stats: Add max and min stats 2013-08-07 17:35:26 -03:00
stat.h tools: Consolidate types.h 2014-05-01 21:22:39 +02:00
strbuf.c perf tools: Use zfree to help detect use after free bugs 2013-12-27 17:08:19 -03:00
strbuf.h
strfilter.c perf tools: Use zfree to help detect use after free bugs 2013-12-27 17:08:19 -03:00
strfilter.h perf tools: Finish the removal of 'self' arguments 2013-11-05 15:32:36 -03:00
string.c Revert "perf tools: Default to cpu// for events v5" 2014-10-15 16:04:33 -03:00
strlist.c perf tools: Fix build error due to zfree() cast 2014-01-15 15:10:04 -03:00
strlist.h perf tools: Stop using 'self' in strlist 2013-01-25 12:49:28 -03:00
svghelper.c perf timechart: Implement IO mode 2014-07-10 00:22:54 +02:00
svghelper.h perf timechart: Implement IO mode 2014-07-10 00:22:54 +02:00
symbol-elf.c perf symbols: Support to read compressed module from build-id cache 2015-01-29 16:56:54 -03:00
symbol-minimal.c perf symbols: Fix use after free in filename__read_build_id 2014-12-17 11:58:17 -03:00
symbol.c perf tools: Remove EOL whitespaces 2015-01-21 13:24:31 -03:00
symbol.h perf symbols: Introduce method to iterate symbols ordered by name 2015-01-21 10:05:54 -03:00
target.c perf record: Make per-cpu mmaps the default. 2013-11-27 14:58:36 -03:00
target.h perf target: Move the checking of which map function to call into function. 2013-12-04 13:46:37 -03:00
thread_map.c perf thread_map: Create dummy constructor out of open coded equivalent 2014-10-14 17:32:52 -03:00
thread_map.h perf thread_map: Create dummy constructor out of open coded equivalent 2014-10-14 17:32:52 -03:00
thread-stack.c perf tools: Enhance the thread stack to output call/return data 2014-11-03 17:43:56 -03:00
thread-stack.h perf tools: Enhance the thread stack to output call/return data 2014-11-03 17:43:56 -03:00
thread.c perf tools: Only override the default :tid comm entry 2014-11-19 12:37:26 -03:00
thread.h perf tools: Add a thread stack for synthesizing call chains 2014-11-03 17:10:59 -03:00
tool.h perf tools: Add id index 2014-10-29 11:24:47 -02:00
top.c perf tools: Rename 'perf_record_opts' to 'record_opts 2013-12-19 14:43:45 -03:00
top.h tools: Consolidate types.h 2014-05-01 21:22:39 +02:00
trace-event-info.c perf tools: Move pr_* debug macros into debug object 2014-07-17 12:58:39 -03:00
trace-event-parse.c perf tools: Fix memory leak in event_format__print function 2014-02-18 09:34:47 -03:00
trace-event-read.c perf tools: Remove needless getopt.h includes 2014-07-17 12:59:00 -03:00
trace-event-scripting.c perf scripting: Add 'flush' callback to scripting API 2014-08-22 13:12:11 -03:00
trace-event.c tools lib traceevent: Make plugin unload function receive pevent 2014-01-15 15:10:40 -03:00
trace-event.h perf scripting: Add 'flush' callback to scripting API 2014-08-22 13:12:11 -03:00
tsc.c perf tools: Move rdtsc() function 2014-07-23 11:48:11 -03:00
tsc.h perf tools: Move rdtsc() function 2014-07-23 11:48:11 -03:00
unwind-libdw.c perf callchains: Use thread->mg->machine 2014-10-29 10:32:46 -02:00
unwind-libdw.h perf tools: Add libdw DWARF post unwind support 2014-02-24 09:29:36 -03:00
unwind-libunwind.c perf callchain: Cache eh/debug frame offset for dwarf unwind 2015-01-29 16:20:42 -03:00
unwind.h perf callchains: Use thread->mg->machine 2014-10-29 10:32:46 -02:00
usage.c
util.c perf tools: Use sysctl__read_int instead of ad-hoc copies 2014-12-11 17:53:04 -03:00
util.h perf: Fix building warning on ARM 32 2014-12-19 13:09:43 +01:00
values.c perf tools: Use zfree to help detect use after free bugs 2013-12-27 17:08:19 -03:00
values.h tools: Consolidate types.h 2014-05-01 21:22:39 +02:00
vdso.c perf tools: Do not attempt to run perf-read-vdso32 if it wasn't built 2014-10-29 10:32:48 -02:00
vdso.h perf tools: Add support for 32-bit compatibility VDSOs 2014-10-29 10:32:48 -02:00
wrapper.c perf tools: Use __maybe_used for unused variables 2012-09-11 12:19:15 -03:00
xyarray.c
xyarray.h
zlib.c perf tools: Add gzip decompression support for kernel module 2014-11-05 10:11:26 -03:00