linux/tools/perf/util
Andi Kleen 8b7bad58ef perf callchain: Support handling complete branch stacks as histograms
Currently branch stacks can be only shown as edge histograms for
individual branches. I never found this display particularly useful.

This implements an alternative mode that creates histograms over
complete branch traces, instead of individual branches, similar to how
normal callgraphs are handled. This is done by putting it in front of
the normal callgraph and then using the normal callgraph histogram
infrastructure to unify them.

This way in complex functions we can understand the control flow that
lead to a particular sample, and may even see some control flow in the
caller for short functions.

Example (simplified, of course for such simple code this is usually not
needed), please run this after the whole patchkit is in, as at this
point in the patch order there is no --branch-history, that will be
added in a patch after this one:

tcall.c:

volatile a = 10000, b = 100000, c;

__attribute__((noinline)) f2()
{
	c = a / b;
}

__attribute__((noinline)) f1()
{
	f2();
	f2();
}
main()
{
	int i;
	for (i = 0; i < 1000000; i++)
		f1();
}

% perf record -b -g ./tsrc/tcall
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
% perf report --no-children --branch-history
...
    54.91%  tcall.c:6  [.] f2                      tcall
            |
            |--65.53%-- f2 tcall.c:5
            |          |
            |          |--70.83%-- f1 tcall.c:11
            |          |          f1 tcall.c:10
            |          |          main tcall.c:18
            |          |          main tcall.c:18
            |          |          main tcall.c:17
            |          |          main tcall.c:17
            |          |          f1 tcall.c:13
            |          |          f1 tcall.c:13
            |          |          f2 tcall.c:7
            |          |          f2 tcall.c:5
            |          |          f1 tcall.c:12
            |          |          f1 tcall.c:12
            |          |          f2 tcall.c:7
            |          |          f2 tcall.c:5
            |          |          f1 tcall.c:11
            |          |
            |           --29.17%-- f1 tcall.c:12
            |                     f1 tcall.c:12
            |                     f2 tcall.c:7
            |                     f2 tcall.c:5
            |                     f1 tcall.c:11
            |                     f1 tcall.c:10
            |                     main tcall.c:18
            |                     main tcall.c:18
            |                     main tcall.c:17
            |                     main tcall.c:17
            |                     f1 tcall.c:13
            |                     f1 tcall.c:13
            |                     f2 tcall.c:7
            |                     f2 tcall.c:5
            |                     f1 tcall.c:12

The default output is unchanged.

This is only implemented in perf report, no change to record or anywhere
else.

This adds the basic code to report:

- add a new "branch" option to the -g option parser to enable this mode
- when the flag is set include the LBR into the callstack in machine.c.

The rest of the history code is unchanged and doesn't know the
difference between LBR entry and normal call entry.

- detect overlaps with the callchain
- remove small loop duplicates in the LBR

Current limitations:

- The LBR flags (mispredict etc.) are not shown in the history
and LBR entries have no special marker.
- It would be nice if annotate marked the LBR entries somehow
(e.g. with arrows)

v2: Various fixes.
v3: Merge further patches into this one. Fix white space.
v4: Improve manpage. Address review feedback.
v5: Rename functions. Better error message without -g. Fix crash without
    -b.
v6: Rebase
v7: Rebase. Use NO_ENTRY in memset.
v8: Port to latest tip. Move add_callchain_ip to separate
    patch. Skip initial entries in callchain. Minor cleanups.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-3-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-12-01 20:00:31 -03:00
..
include perf tools: Add test_and_set_bit function 2014-11-06 17:42:13 -03:00
scripting-engines perf script python: Removing event cache as it's no longer needed 2014-11-06 17:44:06 -03:00
abspath.c
alias.c perf tools: Introduce zfree 2013-12-27 15:17:00 -03:00
annotate.c perf callchain: Make get_srcline fall back to sym+offset 2014-11-24 18:03:47 -03:00
annotate.h perf annotate: Support source line numbers in annotate 2014-11-19 12:33:48 -03:00
bitmap.c
build-id.c perf build-id: Move disable_buildid_cache() to util/build-id.c 2014-11-19 12:33:46 -03:00
build-id.h perf build-id: Move disable_buildid_cache() to util/build-id.c 2014-11-19 12:33:46 -03:00
cache.h perf tools: Add perf_config_u64 function 2014-08-12 12:03:00 -03:00
callchain.c perf callchain: Support handling complete branch stacks as histograms 2014-12-01 20:00:31 -03:00
callchain.h perf callchain: Support handling complete branch stacks as histograms 2014-12-01 20:00:31 -03:00
cgroup.c perf evlist: Introduce evlist__for_each() & friends 2014-01-13 10:06:25 -03:00
cgroup.h
cloexec.c perf util: Replace strerror with strerror_r for thread-safety 2014-08-15 10:58:35 -03:00
cloexec.h perf tools: Enable close-on-exec flag on perf file descriptor 2014-07-18 09:09:34 +02:00
color.c perf tools: Make __hpp__fmt() receive an additional len argument 2014-08-12 12:03:05 -03:00
color.h perf tools: Make __hpp__fmt() receive an additional len argument 2014-08-12 12:03:05 -03:00
comm.c perf tools: Identify which comms are from exec 2014-08-13 19:23:08 -03:00
comm.h perf tools: Add facility to export data in database-friendly way 2014-10-29 10:32:49 -02:00
config.c perf tools: Fix line number in the config file error message 2014-09-26 12:45:23 -03:00
cpumap.c perf tools: Use cpu/possible instead of cpu/kernel_max 2014-04-22 17:39:16 +02:00
cpumap.h perf tools: Allow ability to map cpus to nodes easily 2014-04-22 17:39:12 +02:00
ctype.c
data.c perf util: Replace strerror with strerror_r for thread-safety 2014-08-15 10:58:35 -03:00
data.h perf tools: Add perf_data_file__write interface 2013-12-02 09:22:46 -03:00
db-export.c perf tools: Defer export of comms that were not 'set' 2014-11-03 18:11:59 -03:00
db-export.h perf tools: Defer export of comms that were not 'set' 2014-11-03 18:11:59 -03:00
debug.c perf tools: Allow to force redirect pr_debug to stderr. 2014-11-24 18:03:48 -03:00
debug.h perf: Use strerror_r instead of strerror 2014-08-15 10:54:29 -03:00
dso.c perf tools: Add gzip decompression support for kernel module 2014-11-05 10:11:26 -03:00
dso.h perf symbols: Preparation for compressed kernel module support 2014-11-04 10:15:53 -03:00
dwarf-aux.c perf probe: Fix perf probe to find correct variable DIE 2014-06-04 14:49:20 +02:00
dwarf-aux.h
environment.c
event.c perf tools: Add id index 2014-10-29 11:24:47 -02:00
event.h perf tools: Add core support for sampling intr machine state regs 2014-11-16 11:41:59 +01:00
evlist.c perf evlist: Do not poll events that use the system_wide flag 2014-11-19 12:33:48 -03:00
evlist.h perf evlist: Add missing 'struct option' forward declaration 2014-10-17 12:16:00 -03:00
evsel.c perf tools: Remove perf_evsel__read interface 2014-12-01 20:00:30 -03:00
evsel.h perf stat: Add support for per-pkg counters 2014-12-01 20:00:30 -03:00
exec_cmd.c
exec_cmd.h
find-vdso-map.c perf tools: Build programs to copy 32-bit compatibility 2014-10-29 10:32:48 -02:00
generate-cmdlist.sh
header.c perf build-id: Move disable_buildid_cache() to util/build-id.c 2014-11-19 12:33:46 -03:00
header.h perf build-id: Move build-id related functions to util/build-id.c 2014-11-05 10:14:07 -03:00
help.c perf tools: Use zfree to help detect use after free bugs 2013-12-27 17:08:19 -03:00
help.h
hist.c perf tools: Remove hists from evsel 2014-10-14 17:32:52 -03:00
hist.h perf tools: Remove hists from evsel 2014-10-14 17:32:52 -03:00
hweight.c
intlist.c
intlist.h
kvm-stat.h perf kvm stat report: Save pid string in opts.target.pid 2014-09-17 17:08:07 -03:00
levenshtein.c
levenshtein.h
machine.c perf callchain: Support handling complete branch stacks as histograms 2014-12-01 20:00:31 -03:00
machine.h perf tools: Add facility to export data in database-friendly way 2014-10-29 10:32:49 -02:00
map.c perf callchain: Make get_srcline fall back to sym+offset 2014-11-24 18:03:47 -03:00
map.h perf tools: Set thread->mg.machine in all places 2014-10-29 10:32:46 -02:00
ordered-events.c perf session: Add option to copy events when queueing 2014-10-15 17:39:03 -03:00
ordered-events.h perf session: Add option to copy events when queueing 2014-10-15 17:39:03 -03:00
pager.c perf tools: Add cat as fallback pager 2014-05-21 11:48:33 +02:00
parse-events.c perf tools: Add snapshot format file parsing 2014-11-24 18:03:51 -03:00
parse-events.h perf tools: Parse the pmu event prefix and suffix 2014-10-15 16:05:01 -03:00
parse-events.l perf tools: Add support to new style format of kernel PMU event 2014-10-15 16:05:45 -03:00
parse-events.y perf tools: Add support to new style format of kernel PMU event 2014-10-15 16:05:45 -03:00
parse-options.c perf tools: Add support for exclusive option 2014-10-29 10:32:47 -02:00
parse-options.h perf tools: Add support for exclusive option 2014-10-29 10:32:47 -02:00
path.c
perf_regs.c perf tools: Cache register accesses for unwind processing 2014-06-12 16:53:19 +02:00
perf_regs.h perf tools: Cache register accesses for unwind processing 2014-06-12 16:53:19 +02:00
PERF-VERSION-GEN perf tools: Fix version when building out of tree 2013-11-07 10:40:47 -03:00
pmu.c perf tools: Add snapshot format file parsing 2014-11-24 18:03:51 -03:00
pmu.h perf tools: Add snapshot format file parsing 2014-11-24 18:03:51 -03:00
pmu.l
pmu.y
probe-event.c perf probe: Add --quiet option to suppress output result message 2014-10-29 10:32:49 -02:00
probe-event.h perf probe: Do not access kallsyms when analyzing user binaries 2014-09-17 18:01:14 -03:00
probe-finder.c perf probe: Do not use dwfl_module_addrsym if dwarf_diename finds symbol name 2014-09-17 18:01:43 -03:00
probe-finder.h perf probe: Support distro-style debuginfo for uprobe 2014-02-18 09:38:44 -03:00
pstack.c perf tools: Move pr_* debug macros into debug object 2014-07-17 12:58:39 -03:00
pstack.h perf tools: Finish the removal of 'self' arguments 2013-11-05 15:32:36 -03:00
python-ext-sources perf tools: Move fs.* to lib/api/fs/ 2014-02-18 09:34:49 -03:00
python.c tools lib api: Adopt fdarray class from perf's evlist 2014-09-25 16:46:55 -03:00
quote.c
quote.h
rblist.c
rblist.h
record.c perf evlist: Add perf_evlist__set_tracking_event() 2014-08-13 19:21:32 -03:00
run-command.c perf util: Replace strerror with strerror_r for thread-safety 2014-08-15 10:58:35 -03:00
run-command.h
session.c perf tools: Add core support for sampling intr machine state regs 2014-11-16 11:41:59 +01:00
session.h perf session: Add perf_session__deliver_synth_event() 2014-10-29 11:36:15 -02:00
setup.py tools/: Convert to new topic libraries 2013-12-16 16:03:27 -03:00
sigchain.c
sigchain.h
sort.c perf callchain: Make get_srcline fall back to sym+offset 2014-11-24 18:03:47 -03:00
sort.h perf tools: Add +field argument support for --field option 2014-08-24 08:11:19 -03:00
srcline.c perf callchain: Make get_srcline fall back to sym+offset 2014-11-24 18:03:47 -03:00
stat.c
stat.h tools: Consolidate types.h 2014-05-01 21:22:39 +02:00
strbuf.c perf tools: Use zfree to help detect use after free bugs 2013-12-27 17:08:19 -03:00
strbuf.h
strfilter.c perf tools: Use zfree to help detect use after free bugs 2013-12-27 17:08:19 -03:00
strfilter.h perf tools: Finish the removal of 'self' arguments 2013-11-05 15:32:36 -03:00
string.c Revert "perf tools: Default to cpu// for events v5" 2014-10-15 16:04:33 -03:00
strlist.c perf tools: Fix build error due to zfree() cast 2014-01-15 15:10:04 -03:00
strlist.h
svghelper.c perf timechart: Implement IO mode 2014-07-10 00:22:54 +02:00
svghelper.h perf timechart: Implement IO mode 2014-07-10 00:22:54 +02:00
symbol-elf.c perf symbols: Move bfd_demangle stubbing to its only user 2014-11-24 18:03:47 -03:00
symbol-minimal.c perf symbols: Fallback to kallsyms when using the minimal 'ELF' loader 2014-11-19 12:33:46 -03:00
symbol.c perf record: Do not save pathname in ./debug/.build-id directory for vmlinux 2014-11-05 10:14:08 -03:00
symbol.h perf callchain: Support handling complete branch stacks as histograms 2014-12-01 20:00:31 -03:00
target.c perf record: Make per-cpu mmaps the default. 2013-11-27 14:58:36 -03:00
target.h perf target: Move the checking of which map function to call into function. 2013-12-04 13:46:37 -03:00
thread_map.c perf thread_map: Create dummy constructor out of open coded equivalent 2014-10-14 17:32:52 -03:00
thread_map.h perf thread_map: Create dummy constructor out of open coded equivalent 2014-10-14 17:32:52 -03:00
thread-stack.c perf tools: Enhance the thread stack to output call/return data 2014-11-03 17:43:56 -03:00
thread-stack.h perf tools: Enhance the thread stack to output call/return data 2014-11-03 17:43:56 -03:00
thread.c perf tools: Only override the default :tid comm entry 2014-11-19 12:37:26 -03:00
thread.h perf tools: Add a thread stack for synthesizing call chains 2014-11-03 17:10:59 -03:00
tool.h perf tools: Add id index 2014-10-29 11:24:47 -02:00
top.c perf tools: Rename 'perf_record_opts' to 'record_opts 2013-12-19 14:43:45 -03:00
top.h tools: Consolidate types.h 2014-05-01 21:22:39 +02:00
trace-event-info.c perf tools: Move pr_* debug macros into debug object 2014-07-17 12:58:39 -03:00
trace-event-parse.c perf tools: Fix memory leak in event_format__print function 2014-02-18 09:34:47 -03:00
trace-event-read.c perf tools: Remove needless getopt.h includes 2014-07-17 12:59:00 -03:00
trace-event-scripting.c perf scripting: Add 'flush' callback to scripting API 2014-08-22 13:12:11 -03:00
trace-event.c tools lib traceevent: Make plugin unload function receive pevent 2014-01-15 15:10:40 -03:00
trace-event.h perf scripting: Add 'flush' callback to scripting API 2014-08-22 13:12:11 -03:00
tsc.c perf tools: Move rdtsc() function 2014-07-23 11:48:11 -03:00
tsc.h perf tools: Move rdtsc() function 2014-07-23 11:48:11 -03:00
unwind-libdw.c perf callchains: Use thread->mg->machine 2014-10-29 10:32:46 -02:00
unwind-libdw.h perf tools: Add libdw DWARF post unwind support 2014-02-24 09:29:36 -03:00
unwind-libunwind.c perf callchains: Use thread->mg->machine 2014-10-29 10:32:46 -02:00
unwind.h perf callchains: Use thread->mg->machine 2014-10-29 10:32:46 -02:00
usage.c
util.c perf callchain: Move callchain_param to util object in to fix python test 2014-10-03 09:39:48 -03:00
util.h perf callchain: Make get_srcline fall back to sym+offset 2014-11-24 18:03:47 -03:00
values.c perf tools: Use zfree to help detect use after free bugs 2013-12-27 17:08:19 -03:00
values.h tools: Consolidate types.h 2014-05-01 21:22:39 +02:00
vdso.c perf tools: Do not attempt to run perf-read-vdso32 if it wasn't built 2014-10-29 10:32:48 -02:00
vdso.h perf tools: Add support for 32-bit compatibility VDSOs 2014-10-29 10:32:48 -02:00
wrapper.c
xyarray.c
xyarray.h
zlib.c perf tools: Add gzip decompression support for kernel module 2014-11-05 10:11:26 -03:00