Fix the debug print statements in these tests where they reference
si_codes and in particular __SI_FAULT. __SI_FAULT is a kernel
internal value and should never be seen by userspace.
While I am in there also fix si_code_str. si_codes are an enumeration
there are not a bitmap so == and not & is the apropriate operation to
test for an si_code.
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Fixes: 5f23f6d082 ("x86/pkeys: Add self-tests")
Fixes: e754aedc26 ("x86/mpx, selftests: Add MPX self test")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Show branch type in callchain entry. The branch type is printed
with other LBR information (such as cycles/abort/...).
For example:
perf record -g -j any,save_type
perf report --branch-history --stdio --no-children
38.50% div.c:45 [.] main div
|
---main div.c:42 (RET CROSS_2M cycles:2)
compute_flag div.c:28 (cycles:2)
compute_flag div.c:27 (RET CROSS_2M cycles:1)
rand rand.c:28 (cycles:1)
rand rand.c:28 (RET CROSS_2M cycles:1)
__random random.c:298 (cycles:1)
__random random.c:297 (COND_BWD CROSS_2M cycles:1)
__random random.c:295 (cycles:1)
__random random.c:295 (COND_BWD CROSS_2M cycles:1)
__random random.c:295 (cycles:1)
__random random.c:295 (RET CROSS_2M cycles:9)
Change log
v6: Remove the branch_type_str() since it's moved to branch.c.
v5: Rewrite the branch info print code in util/callchain.c.
v4: Comparing to previous version, the major changes are:
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1500379995-6449-8-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Show the branch type statistics at the end of perf report --stdio.
For example:
perf report --stdio
COND_FWD: 28.5%
COND_BWD: 9.4%
CROSS_4K: 0.7%
CROSS_2M: 14.1%
COND: 37.9%
UNCOND: 0.2%
IND: 6.7%
CALL: 26.5%
RET: 28.7%
SYSRET: 0.0%
The branch types are:
COND_FWD: conditional forward
COND_BWD: conditional backward
COND: conditional branch
UNCOND: unconditional branch
IND: indirect
CALL: function call
IND_CALL: indirect function call
RET: function return
SYSCALL: syscall
SYSRET: syscall return
COND_CALL: conditional function call
COND_RET: conditional function return
CROSS_4K and CROSS_2M:
They are the metrics checking for branches cross 4K or 2MB pages.
It's an approximate computing. We don't know if the area is 4K or
2MB, so always compute both.
To make the output simple, if a branch crosses 2M area, CROSS_4K
will not be incremented.
Change log
v7: Since the common branch type definitions are changed, some
tags/strings are updated accordingly.
v6: Remove branch_type_stat_display() since it's moved to branch.c.
v5: Remove the unnecessary sort__mode checking in
hist_iter__branch_callback().
v4: Comparing to previous version, the major changes are:
Add the computing of JCC forward/JCC backward and cross page checking
by using the from and to addresses.
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1500379995-6449-7-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Create new util/branch.c and util/branch.h to contain the common branch
functions. Such as:
branch_type_count(): Count the numbers of branch types
branch_type_name() : Return the name of branch type
branch_type_stat_display(): Display branch type statistics info
branch_type_str(): Construct the branch type string.
The branch type is saved in branch_flags.
Change log:
v8: Change PERF_BR_NONE to PERF_BR_UNKNOWN.
v7: Since the common branch type name is changed (e.g. JCC->COND),
this patch is performed the modification accordingly.
v6: Move that multiline conditional code inside {} brackets.
Move branch_type_stat_display() from builtin-report.c to
branch.c.
Move branch_type_str() from callchain.c to branch.c.
v5: It's a new patch in v5 patch series.
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1500379995-6449-6-git-send-email-yao.jin@linux.intel.com
[ Don't use 'index' and 'stat' as names for variables, it shadows global decls in older distros ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The branch info such as predicted/cycles/... are printed at the
callchain entries.
For example: perf report --branch-history --no-children --stdio
--1.07%--main div.c:39 (predicted:52.4% cycles:1 iterations:17)
main div.c:44 (predicted:52.4% cycles:1)
main div.c:42 (cycles:2)
compute_flag div.c:28 (cycles:2)
compute_flag div.c:27 (cycles:1)
rand rand.c:28 (cycles:1)
rand rand.c:28 (cycles:1)
__random random.c:298 (cycles:1)
__random random.c:297 (cycles:1)
__random random.c:295 (cycles:1)
__random random.c:295 (cycles:1)
__random random.c:295 (cycles:1)
But the current code is difficult to maintain and extend. This patch
refactors the code for easy maintenance.
Change log:
v6: 1. Put the multiline condition code into {} brackets in
counts_str_build()
2. Keep the original display order, that is:
predicted, abort, cycles, iterations
v5: It's a new patch in v5 patch series.
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1500379995-6449-5-git-send-email-yao.jin@linux.intel.com
[ Don't use 'index' as a name for a variable, it shadows a globa decl in older distros ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It is often useful to know the branch types while analyzing branch data.
For example, a call is very different from a conditional branch.
Currently we have to look it up in binary while the binary may later not
be available and even the binary is available but user has to take some
time. It is very useful for user to check it directly in perf report.
Perf already has support for disassembling the branch instruction to get
the x86 branch type.
To keep consistent on kernel and userspace and make the classification
more common, the patch adds the common branch type classification
in perf_event.h.
The patch only defines a minimum but most common set of branch types.
PERF_BR_UNKNOWN : unknown
PERF_BR_COND :conditional
PERF_BR_UNCOND : unconditional
PERF_BR_IND : indirect
PERF_BR_CALL : function call
PERF_BR_IND_CALL : indirect function call
PERF_BR_RET : function return
PERF_BR_SYSCALL : syscall
PERF_BR_SYSRET : syscall return
PERF_BR_COND_CALL : conditional function call
PERF_BR_COND_RET : conditional function return
The patch also adds a new field type (4 bits) in perf_branch_entry
to record the branch type.
Since the disassembling of branch instruction needs some overhead,
a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
needs to disassemble the branch instruction and record the branch
type.
Change log:
v10: Not changed.
v9: Not changed.
v8: Change PERF_BR_NONE to PERF_BR_UNKNOWN.
No other change.
v7: Just keep the most common branch types.
Others are removed.
v6: Not changed.
v5: Not changed. The v5 patch series just change the userspace.
v4: Comparing to previous version, the major changes are:
1. Remove the PERF_BR_JCC_FWD/PERF_BR_JCC_BWD, they will be
computed later in userspace.
2. Remove the "cross" field in perf_branch_entry. The cross page
computing will be done later in userspace.
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1500379995-6449-2-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add header record types to pipe-mode, reusing the functions
used in file-mode and leveraging the new struct feat_fd.
For alignment, check that synthesized events don't exceed
pagesize.
Add the perf_event__synthesize_feature event call back to
process the new header records.
Before this patch:
$ perf record -o - -e cycles sleep 1 | perf report --stdio --header
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
...
After this patch:
$ perf record -o - -e cycles sleep 1 | perf report --stdio --header
# ========
# captured on: Mon May 22 16:33:43 2017
# ========
#
# hostname : my_hostname
# os release : 4.11.0-dbx-up_perf
# perf version : 4.11.rc6.g6277c80
# arch : x86_64
# nrcpus online : 72
# nrcpus avail : 72
# cpudesc : Intel(R) Xeon(R) CPU E5-2696 v3 @ 2.30GHz
# cpuid : GenuineIntel,6,63,2
# total memory : 263457192 kB
# cmdline : /root/perf record -o - -e cycles -c 100000 sleep 1
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: intel_bts = 6, uncore_imc_4 = 22, uncore_sbox_1 = 47, uncore_cbox_5 = 33, uncore_ha_0 = 16, uncore_cbox
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
...
Support added for the subcommands: report, inject, annotate and script.
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Que <sque@chromium.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20170718042549.145161-16-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There are three FEAT_OP* macros:
- FEAT_OPA: for features without process record.
- FEAT_OPP: for features with process record.
- FEAT_OPF: like FEAT_OPP but to show only if show_full_info flags
is set.
To add pipe-mode headers we need yet another variation of the macros
(one to specify whether a feature generates an auxiliar record).
Instead, we redefine macros so that:
- show_full_info is specified as an argument (to remove the
FEAT_OPF variation) and,
- it always sets "process" handler (to remove the FEAT_OPA variation).
Individual process handlers can be NULLed individually.
This allows to define two variations only:
- FEAT_OPR: synthesizes auxiliar event record.
- FEAT_OPN: doesn't synthesize an auxiliar event record.
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Que <sque@chromium.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20170718042549.145161-14-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'perf trace' tool is suppressing args set to zero, with the
exception of string tables (strarrays), which are kinda like enums, i.e.
we have maps to go from numbers to strings.
But the 'cmd' fcntl arg requires more specialized treatment, as its
value will regulate if the next fcntl syscall arg, 'arg', should be
ignored (not used) and also how to format the syscall return (fd, file
flags, etc), so add a 'show_zero" bool to struct syscall_arg_fmt, to
regulate this more explicitely.
Will be used in a following patch with fcntl, here is just the
mechanism.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-all738jctxets8ffyizp5lzo@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Instead of having syscall_fmt.{arg_scnprintf,arg_parm}, introduce
struct syscall_arg_fmt and have these two, paving the way for more
state to change the formatting algorithms.
For instance, in the 'fcntl' 'cmd' case it is better not to suppress
it when being zero, showing instead its name "DUPFD".
We had that in an ad-hoc way just for strarrays, but with more involved
cases like fcntl, that can't be done with just a strarray, we'll need
a ".show_zero" arg in the 'cmd' syscall_arg_fmt.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ch06o2j72zbjx5xww4qp67au@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It can return NULL, in which case we should bail out and remove the
directory created with mkdtemp(), which is stored in the "__tempdir"
variable, not in "tempdir".
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 8e5dc84835 ("perf test: Add a test case for SDT event")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When the user doesn't specify an event with -e/--event, 'perf record'
will use as a default the "cycles" event with the highest level of
precision in perf_event_attr.precise_ip, but --no-samples, if present,
is incompatible with precise_ip != 0, so use the newly introduced
__perf_event__add_default(precise = false) to fix that:
Before:
# perf record -n usleep 1
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cycles:ppp).
/bin/dmesg may provide additional information.
No CONFIG_PERF_EVENTS=y kernel support configured?
#
After:
# perf record -n usleep 1
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.018 MB perf.data ]
[root@jouet /]# perf evlist -v
cycles: size: 112, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
[root@jouet /]#
Reported-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-q991fw6s6rhjvrd5ye4t7qom@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>