d20e834e13
6730 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
|
2c46f54249 |
perf metric: Rename expr__add_id() to expr__add_val()
Rename expr__add_id() to expr__add_val() so we can use expr__add_id() to actually add just the id without any value in following changes. There's no functional change. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200712132634.138901-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
3de2bf9dfb |
perf probe: Warn if the target function is a GNU indirect function
Warn if the probe target function is a GNU indirect function (GNU_IFUNC) because it may not be what the user wants to probe. The GNU indirect function ( https://sourceware.org/glibc/wiki/GNU_IFUNC ) is the dynamic symbol solved at runtime. An IFUNC function is a selector which is invoked from the ELF loader, but the symbol address of the function which will be modified by the IFUNC is the same as the IFUNC in the symbol table. This can confuse users trying to probe such functions. For example, memcpy is an IFUNC. probe_libc:memcpy (on __new_memcpy_ifunc@x86_64/multiarch/memcpy.c in /usr/lib64/libc-2.30.so) the probe is put on an IFUNC. perf 1742 [000] 26201.715632: probe_libc:memcpy: (7fdaa53824c0) 7fdaa53824c0 __new_memcpy_ifunc+0x0 (inlined) 7fdaa5d4a980 elf_machine_rela+0x6c0 (inlined) 7fdaa5d4a980 elf_dynamic_do_Rela+0x6c0 (inlined) 7fdaa5d4a980 _dl_relocate_object+0x6c0 (/usr/lib64/ld-2.30.so) 7fdaa5d42155 dl_main+0x1cc5 (/usr/lib64/ld-2.30.so) 7fdaa5d5831a _dl_sysdep_start+0x54a (/usr/lib64/ld-2.30.so) 7fdaa5d3ffeb _dl_start_final+0x25b (inlined) 7fdaa5d3ffeb _dl_start+0x25b (/usr/lib64/ld-2.30.so) 7fdaa5d3f117 .annobin_rtld.c+0x7 (inlined) And the event is invoked from the ELF loader instead of the target program's main code. Moreover, at this moment, we can not probe on the function which will be selected by the IFUNC, because it is determined at runtime. But uprobe will be prepared before running the target binary. Thus, I decided to warn user when 'perf probe' detects that the probe point is on an GNU IFUNC symbol. Someone who wants to probe an IFUNC symbol to debug the IFUNC function can ignore this warning. Committer notes: I.e., this warning will be emitted if the probe point is an IFUNC: "Warning: The probe function (%s) is a GNU indirect function.\n" "Consider identifying the final function used at run time and set the probe directly on that.\n" Complete set of steps: # readelf -sW /lib64/libc-2.29.so | grep IFUNC | tail 22196: 0000000000109a80 183 IFUNC GLOBAL DEFAULT 14 __memcpy_chk 22214: 00000000000b7d90 191 IFUNC GLOBAL DEFAULT 14 __gettimeofday 22336: 000000000008b690 60 IFUNC GLOBAL DEFAULT 14 memchr 22350: 000000000008b9b0 89 IFUNC GLOBAL DEFAULT 14 __stpcpy 22420: 000000000008bb10 76 IFUNC GLOBAL DEFAULT 14 __strcasecmp_l 22582: 000000000008a970 60 IFUNC GLOBAL DEFAULT 14 strlen 22585: 00000000000a54d0 92 IFUNC WEAK DEFAULT 14 wmemset 22600: 000000000010b030 92 IFUNC GLOBAL DEFAULT 14 __wmemset_chk 22618: 000000000008b8a0 183 IFUNC GLOBAL DEFAULT 14 __mempcpy 22675: 000000000008ba70 76 IFUNC WEAK DEFAULT 14 strcasecmp # # perf probe -x /lib64/libc-2.29.so strlen Warning: The probe function (strlen) is a GNU indirect function. Consider identifying the final function used at run time and set the probe directly on that. Added new event: probe_libc:strlen (on strlen in /usr/lib64/libc-2.29.so) You can now use it in all perf tools, such as: perf record -e probe_libc:strlen -aR sleep 1 # Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lore.kernel.org/lkml/159438669349.62703.5978345670436126948.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
12d572e785 |
perf probe: Fix memory leakage when the probe point is not found
Fix the memory leakage in debuginfo__find_trace_events() when the probe
point is not found in the debuginfo. If there is no probe point found in
the debuginfo, debuginfo__find_probes() will NOT return -ENOENT, but 0.
Thus the caller of debuginfo__find_probes() must check the tf.ntevs and
release the allocated memory for the array of struct probe_trace_event.
The current code releases the memory only if the debuginfo__find_probes()
hits an error but not checks tf.ntevs. In the result, the memory allocated
on *tevs are not released if tf.ntevs == 0.
This fixes the memory leakage by checking tf.ntevs == 0 in addition to
ret < 0.
Fixes:
|
||
|
11fd3eb874 |
perf probe: Fix wrong variable warning when the probe point is not found
Fix a wrong "variable not found" warning when the probe point is not
found in the debuginfo.
Since the debuginfo__find_probes() can return 0 even if it does not find
given probe point in the debuginfo, fill_empty_trace_arg() can be called
with tf.ntevs == 0 and it can emit a wrong warning. To fix this, reject
ntevs == 0 in fill_empty_trace_arg().
E.g. without this patch;
# perf probe -x /lib64/libc-2.30.so -a "memcpy arg1=%di"
Failed to find the location of the '%di' variable at this address.
Perhaps it has been optimized out.
Use -V with the --range option to show '%di' location range.
Added new events:
probe_libc:memcpy (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di)
probe_libc:memcpy (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di)
You can now use it in all perf tools, such as:
perf record -e probe_libc:memcpy -aR sleep 1
With this;
# perf probe -x /lib64/libc-2.30.so -a "memcpy arg1=%di"
Added new events:
probe_libc:memcpy (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di)
probe_libc:memcpy (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di)
You can now use it in all perf tools, such as:
perf record -e probe_libc:memcpy -aR sleep 1
Fixes:
|
||
|
26bbf45fc8 |
perf probe: Avoid setting probes on the same address for the same event
There is a case that several same-name symbols points to the same address. In that case, 'perf probe' returns an error. E.g. # perf probe -x /lib64/libc-2.30.so -v -a "memcpy arg1=%di" probe-definition(0): memcpy arg1=%di symbol:memcpy file:(null) line:0 offset:0 return:0 lazy:(null) parsing arg: arg1=%di into name:arg1 %di 1 arguments symbol:setjmp file:(null) line:0 offset:0 return:0 lazy:(null) symbol:longjmp file:(null) line:0 offset:0 return:0 lazy:(null) symbol:longjmp_target file:(null) line:0 offset:0 return:0 lazy:(null) symbol:lll_lock_wait_private file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_mallopt_arena_max file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_mallopt_arena_test file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_tunable_tcache_max_bytes file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_tunable_tcache_count file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_tunable_tcache_unsorted_limit file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_mallopt_trim_threshold file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_mallopt_top_pad file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_mallopt_mmap_threshold file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_mallopt_mmap_max file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_mallopt_perturb file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_mallopt_mxfast file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_heap_new file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_arena_reuse_free_list file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_arena_reuse file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_arena_reuse_wait file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_arena_new file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_arena_retry file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_sbrk_less file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_heap_free file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_heap_less file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_tcache_double_free file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_heap_more file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_sbrk_more file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_malloc_retry file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_memalign_retry file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_mallopt_free_dyn_thresholds file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_realloc_retry file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_calloc_retry file:(null) line:0 offset:0 return:0 lazy:(null) symbol:memory_mallopt file:(null) line:0 offset:0 return:0 lazy:(null) Open Debuginfo file: /usr/lib/debug/usr/lib64/libc-2.30.so.debug Try to find probe point from debuginfo. Opening /sys/kernel/debug/tracing//README write=0 Failed to find the location of the '%di' variable at this address. Perhaps it has been optimized out. Use -V with the --range option to show '%di' location range. An error occurred in debuginfo analysis (-2). Trying to use symbols. Opening /sys/kernel/debug/tracing//uprobe_events write=1 Writing event: p:probe_libc/memcpy /usr/lib64/libc-2.30.so:0x914c0 arg1=%di Writing event: p:probe_libc/memcpy /usr/lib64/libc-2.30.so:0x914c0 arg1=%di Failed to write event: File exists Error: Failed to add events. Reason: File exists (Code: -17) You can see that perf tried to write completely the same probe definition twice, which caused an error. To fix this issue, check the symbol list and drop duplicated symbols (which has the same symbol name and address) from it. With this patch: # perf probe -x /lib64/libc-2.30.so -a "memcpy arg1=%di" Failed to find the location of the '%di' variable at this address. Perhaps it has been optimized out. Use -V with the --range option to show '%di' location range. Added new events: probe_libc:memcpy (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di) probe_libc:memcpy (on memcpy in /usr/lib64/libc-2.30.so with arg1=%di) You can now use it in all perf tools, such as: perf record -e probe_libc:memcpy -aR sleep 1 Committer notes: Fix this build error on 32-bit arches by using PRIx64 for symbol->start, that is an u64: In file included from util/probe-event.c:27: util/probe-event.c: In function 'find_probe_trace_events_from_map': util/probe-event.c:2978:14: error: format '%lx' expects argument of type 'long unsigned int', but argument 5 has type 'u64' {aka 'long long unsigned int'} [-Werror=format=] pr_debug("Found duplicated symbol %s @ %lx\n", ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ util/debug.h:17:21: note: in definition of macro 'pr_fmt' #define pr_fmt(fmt) fmt ^~~ util/probe-event.c:2978:5: note: in expansion of macro 'pr_debug' pr_debug("Found duplicated symbol %s @ %lx\n", ^~~~~~~~ Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lore.kernel.org/lkml/159438666401.62703.15196394835032087840.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
5f634c8e40 |
perf parse-events: Report BPF errors
Setting the parse_events_error directly doesn't increment num_errors causing the error message not to be displayed. Use the parse_events__handle_error function that sets num_errors and handle multiple errors. Committer notes: Ian provided a before/after upon request: Before: $ /tmp/perf/perf record -e /tmp/perf/util/parse-events.o Run 'perf list' for a list of valid events Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -e, --event <event> event selector. use 'perf list' to list available event After: $ /tmp/perf/perf record -e /tmp/perf/util/parse-events.o event syntax error: '/tmp/perf/util/parse-events.o' \___ Failed to load /tmp/perf/util/parse-events.o: BPF object format invalid (add -v to see detail) Run 'perf list' for a list of valid events Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -e, --event <event> event selector. use 'perf list' to list available events Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andriin@fb.com> Cc: bpf@vger.kernel.org Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@chromium.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: netdev@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Link: http://lore.kernel.org/lkml/20200707211449.3868944-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
7eeb9855c1 |
perf script: Show text poke address symbol
It is generally more useful to show the symbol with an address. In this case, the print function requires the 'machine' which means changing callers to provide it as a parameter. It is optional because most events do not need it and the callers that matter can provide it. Committer notes: Made 'union perf_event' continue to be the first parameter to the perf_event__fprintf() and perf_event__fprintf_text_poke() events. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20200512121922.8997-16-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
b22f90aaea |
perf intel-pt: Add support for text poke events
Select text poke events when available and the kernel is being traced. Process text poke events to invalidate entries in Intel PT's instruction cache. Example: The example requires kernel config: CONFIG_PROC_SYSCTL=y CONFIG_SCHED_DEBUG=y CONFIG_SCHEDSTATS=y Before: # perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M & # cat /proc/sys/kernel/sched_schedstats 0 # echo 1 > /proc/sys/kernel/sched_schedstats # cat /proc/sys/kernel/sched_schedstats 1 # echo 0 > /proc/sys/kernel/sched_schedstats # cat /proc/sys/kernel/sched_schedstats 0 # kill %1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 3.341 MB perf.data.before ] [1]+ Terminated perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M # perf script -i perf.data.before --itrace=e >/dev/null Warning: 474 instruction trace errors After: # perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M & # cat /proc/sys/kernel/sched_schedstats 0 # echo 1 > /proc/sys/kernel/sched_schedstats # cat /proc/sys/kernel/sched_schedstats 1 # echo 0 > /proc/sys/kernel/sched_schedstats # cat /proc/sys/kernel/sched_schedstats 0 # kill %1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.646 MB perf.data.after ] [1]+ Terminated perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M # perf script -i perf.data.after --itrace=e >/dev/null Example: The example requires kernel config: # CONFIG_FUNCTION_TRACER is not set Before: # perf record --kcore -m,64M -o t1 -a -e intel_pt//k & # perf probe __schedule Added new event: probe:__schedule (on __schedule) You can now use it in all perf tools, such as: perf record -e probe:__schedule -aR sleep 1 # perf record -e probe:__schedule -aR sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.026 MB perf.data (68 samples) ] # perf probe -d probe:__schedule Removed event: probe:__schedule # kill %1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 41.268 MB t1 ] [1]+ Terminated perf record --kcore -m,64M -o t1 -a -e intel_pt//k # perf script -i t1 --itrace=e >/dev/null Warning: 207 instruction trace errors After: # perf record --kcore -m,64M -o t1 -a -e intel_pt//k & # perf probe __schedule Added new event: probe:__schedule (on __schedule) You can now use it in all perf tools, such as: perf record -e probe:__schedule -aR sleep 1 # perf record -e probe:__schedule -aR sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.028 MB perf.data (107 samples) ] # perf probe -d probe:__schedule Removed event: probe:__schedule # kill %1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 39.978 MB t1 ] [1]+ Terminated perf record --kcore -m,64M -o t1 -a -e intel_pt//k # perf script -i t1 --itrace=e >/dev/null # perf script -i t1 --no-itrace -D | grep 'POKE\|KSYMBOL' 6 565303693547 0x291f18 [0x50]: PERF_RECORD_KSYMBOL addr ffffffffc027a000 len 4096 type 2 flags 0x0 name kprobe_insn_page 6 565303697010 0x291f68 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027a000 old len 0 new len 6 6 565303838278 0x291fa8 [0x50]: PERF_RECORD_KSYMBOL addr ffffffffc027c000 len 4096 type 2 flags 0x0 name kprobe_optinsn_page 6 565303848286 0x291ff8 [0xa0]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027c000 old len 0 new len 106 6 565369336743 0x292af8 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffff88ab8890 old len 5 new len 5 7 566434327704 0x217c208 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffff88ab8890 old len 5 new len 5 6 566456313475 0x293198 [0xa0]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027c000 old len 106 new len 0 6 566456314935 0x293238 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027a000 old len 6 new len 0 Example: The example requires kernel config: CONFIG_FUNCTION_TRACER=y Before: # perf record --kcore -m,64M -o t1 -a -e intel_pt//k & # perf probe __kmalloc Added new event: probe:__kmalloc (on __kmalloc) You can now use it in all perf tools, such as: perf record -e probe:__kmalloc -aR sleep 1 # perf record -e probe:__kmalloc -aR sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.022 MB perf.data (6 samples) ] # perf probe -d probe:__kmalloc Removed event: probe:__kmalloc # kill %1 [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 43.850 MB t1 ] [1]+ Terminated perf record --kcore -m,64M -o t1 -a -e intel_pt//k # perf script -i t1 --itrace=e >/dev/null Warning: 8 instruction trace errors After: # perf record --kcore -m,64M -o t1 -a -e intel_pt//k & # perf probe __kmalloc Added new event: probe:__kmalloc (on __kmalloc) You can now use it in all perf tools, such as: perf record -e probe:__kmalloc -aR sleep 1 # perf record -e probe:__kmalloc -aR sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.037 MB perf.data (206 samples) ] # perf probe -d probe:__kmalloc Removed event: probe:__kmalloc # kill %1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 41.442 MB t1 ] [1]+ Terminated perf record --kcore -m,64M -o t1 -a -e intel_pt//k # perf script -i t1 --itrace=e >/dev/null # perf script -i t1 --no-itrace -D | grep 'POKE\|KSYMBOL' 5 312216133258 0x8bafe0 [0x50]: PERF_RECORD_KSYMBOL addr ffffffffc0360000 len 415 type 2 flags 0x0 name ftrace_trampoline 5 312216133494 0x8bb030 [0x1d8]: PERF_RECORD_TEXT_POKE addr 0xffffffffc0360000 old len 0 new len 415 5 312216229563 0x8bb208 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5 5 312216239063 0x8bb248 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5 5 312216727230 0x8bb288 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffabbea190 old len 5 new len 5 5 312216739322 0x8bb2c8 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5 5 312216748321 0x8bb308 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5 7 313287163462 0x2817430 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5 7 313287174890 0x2817470 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5 7 313287818979 0x28174b0 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffabbea190 old len 5 new len 5 7 313287829357 0x28174f0 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5 7 313287841246 0x2817530 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5 Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20200512121922.8997-14-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
789e241998 |
perf tools: Add support for PERF_RECORD_KSYMBOL_TYPE_OOL
PERF_RECORD_KSYMBOL_TYPE_OOL marks an executable page. Create a map backed only by memory, which will be populated as necessary by text poke events. Committer notes: From the patch: OOL stands for "Out of line" code such as kprobe-replaced instructions or optimized kprobes or ftrace trampolines. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20200512121922.8997-13-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
246eba8e90 |
perf tools: Add support for PERF_RECORD_TEXT_POKE
Add processing for PERF_RECORD_TEXT_POKE events. When a text poke event is processed, then the kernel dso data cache is updated with the poked bytes. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20200512121922.8997-12-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
b39730a663 |
perf annotate: Fix non-null terminated buffer returned by readlink()
Our local MSAN (Memory Sanitizer) build of perf throws a warning that comes from the "dso__disassemble_filename" function in "tools/perf/util/annotate.c" when running perf record. The warning stems from the call to readlink, in which "build_id_path" was being read into "linkname". Since readlink does not null terminate, an uninitialized memory access would later occur when "linkname" is passed into the strstr function. This is simply fixed by null-terminating "linkname" after the call to readlink. To reproduce this warning, build perf by running: $ make -C tools/perf CLANG=1 CC=clang EXTRA_CFLAGS="-fsanitize=memory -fsanitize-memory-track-origins" (Additionally, llvm might have to be installed and clang might have to be specified as the compiler - export CC=/usr/bin/clang) Then running: tools/perf/perf record -o - ls / | tools/perf/perf --no-pager annotate -i - --stdio Please see the cover letter for why false positive warnings may be generated. Signed-off-by: Numfor Mbiziwo-Tiapo <nums@google.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Drayton <mbd@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20190729205750.193289-1-nums@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
c8f6ae1fb2 |
perf inject jit: Remove //anon mmap events
**perf-<pid>.map and jit-<pid>.dump designs: When a JIT generates code to be executed, it must allocate memory and mark it executable using an mmap call. *** perf-<pid>.map design The perf-<pid>.map assumes that any sample recorded in an anonymous memory page is JIT code. It then tries to resolve the symbol name by looking at the process' perf-<pid>.map. *** jit-<pid>.dump design The jit-<pid>.dump mechanism takes a different approach. It requires a JIT to write a `<path>/jit-<pid>.dump` file. This file must also be mmapped so that perf inject -jit can find the file. The JIT must also add JIT_CODE_LOAD records for any functions it generates. The records are timestamped using a clock which can be correlated to the perf record clock. After perf record, the `perf inject -jit` pass parses the recording looking for a `<path>/jit-<pid>.dump` file. When it finds the file, it parses it and for each JIT_CODE_LOAD record: * creates an elf file `<path>/jitted-<pid>-<code_index>.so * injects a new mmap record mapping the new elf file into the process. *** Coexistence design The kernel and perf support both of these mechanisms. We need to make sure perf works on an app supporting either or both of these mechanisms. Both designs rely on mmap records to determine how to resolve an ip address. The mmap records of both techniques by definition overlap. When the JIT compiles a method, it must: * allocate memory (mmap) * add execution privilege (mprotect or mmap. either will generate an mmap event form the kernel to perf) * compile code into memory * add a function record to perf-<pid>.map and/or jit-<pid>.dump Because the jit-<pid>.dump mechanism supports greater capabilities, perf prefers the symbols from jit-<pid>.dump. It implements this based on timestamp ordering of events. There is an implicit ASSUMPTION that the JIT_CODE_LOAD record timestamp will be after the // anon mmap event that was generated during memory allocation or adding the execution privilege setting. *** Problems with the ASSUMPTION The ASSUMPTION made in the Coexistence design section above is violated in the following scenario. *** Scenario While a JIT is jitting code it will eventually need to commit more pages and change these pages to executable permissions. Typically the JIT will want these collocated to minimize branch displacements. The kernel will coalesce these anonymous mapping with identical permissions before sending an MMAP event for the new pages. The address range of the new mmap will not be just the most recently mmap pages. It will include the entire coalesced mmap region. See mm/mmap.c unsigned long mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, struct list_head *uf) { ... /* * Can we just expand an old mapping? */ ... perf_event_mmap(vma); ... } *** Symptoms The coalesced // anon mmap event will be timestamped after the JIT_CODE_LOAD records. This means it will be used as the most recent mapping for that entire address range. For remaining events it will look at the inferior perf-<pid>.map for symbols. If both mechanisms are supported, the symbol will appear twice with different module names. This causes weird behavior in reporting. If only jit-<pid>.dump is supported, the symbol will no longer be resolved. ** Implemented solution This patch solves the issue by removing // anon mmap events for any process which has a valid jit-<pid>.dump file. It tracks on a per process basis to handle the case where some running apps support jit-<pid>.dump, but some only support perf-<pid>.map. It adds new assumptions: * // anon mmap events are only required for perf-<pid>.map support. * An app that uses jit-<pid>.dump, no longer needs perf-<pid>.map support. It assumes that any perf-<pid>.map info is inferior. *** Details Use thread->priv to store whether a jitdump file has been processed During "perf inject --jit", discard "//anon*" mmap events for any pid which has sucessfully processed a jitdump file. ** Testing: // jitdump case perf record <app with jitdump> perf inject --jit --input perf.data --output perfjit.data // verify mmap "//anon" events present initially perf script --input perf.data --show-mmap-events | grep '//anon' // verify mmap "//anon" events removed perf script --input perfjit.data --show-mmap-events | grep '//anon' // no jitdump case perf record <app without jitdump> perf inject --jit --input perf.data --output perfjit.data // verify mmap "//anon" events present initially perf script --input perf.data --show-mmap-events | grep '//anon' // verify mmap "//anon" events not removed perf script --input perfjit.data --show-mmap-events | grep '//anon' ** Repro: This issue was discovered while testing the initial CoreCLR jitdump implementation. https://github.com/dotnet/coreclr/pull/26897. ** Alternate solutions considered These were also briefly considered: * Change kernel to not coalesce mmap regions. * Change kernel reporting of coalesced mmap regions to perf. Only include newly mapped memory. * Only strip parts of // anon mmap events overlapping existing jitted-<pid>-<code_index>.so mmap events. Signed-off-by: Steve MacLean <Steve.MacLean@Microsoft.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/1590544271-125795-1-git-send-email-steve.maclean@linux.microsoft.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
facbf0b982 |
Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes and move perf/core forward, minor conflict as perf_evlist__add_dummy() lost its 'perf_' prefix as it operates on a 'struct evlist', not on a 'struct perf_evlist', i.e. its tools/perf/ specific, it is not in libperf. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
19bf119ccf |
perf symbols: Add s390 idle functions 'psw_idle' and 'psw_idle_exit' to list of idle symbols
Add the s390 idle functions so they don't show up in top when using software sampling. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Link: http://lore.kernel.org/lkml/20200707171457.85707-1-svens@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
4c95ad261c |
perf intel-pt: Fix PEBS sample for XMM registers
The condition to add XMM registers was missing, the regs array needed to
be in the outer scope, and the size of the regs array was too small.
Fixes:
|
||
|
75bcb8776d |
perf intel-pt: Fix recording PEBS-via-PT with registers
When recording PEBS-via-PT, the kernel will not accept the intel_pt
event with register sampling e.g.
# perf record --kcore -c 10000 -e '{intel_pt/branch=0/,branch-loads/aux-output/ppp}' -I -- ls -l
Error:
intel_pt/branch=0/: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
Fix by suppressing register sampling on the intel_pt evsel.
Committer notes:
Adrian informed that this is only available from Tremont onwards, so on
older processors the error continues the same as before.
Fixes:
|
||
|
442ad2254a |
perf record: Fix duplicated sideband events with Intel PT system wide tracing
Commit
|
||
|
1f16fcad68 |
perf parse-events: Disable a subset of bison warnings
Rather than disable all warnings with -w, disable specific warnings. Predicate enabling the warnings on a recent version of bison. Tested with GCC 9.3.0 and clang 9.0.1. Committer testing: The full set of compilers, gcc and clang that this will be tested on will be on the signed tag when this change goes upstream. Had to add -Wno-switch-enum to build on opensuse tumbleweed: /tmp/build/perf/util/parse-events-bison.c: In function 'yydestruct': /tmp/build/perf/util/parse-events-bison.c:1200:3: error: enumeration value 'YYSYMBOL_YYEMPTY' not handled in switch [-Werror=switch-enum] 1200 | switch (yykind) | ^~~~~~ /tmp/build/perf/util/parse-events-bison.c:1200:3: error: enumeration value 'YYSYMBOL_YYEOF' not handled in switch [-Werror=switch-enum] Also replace -Wno-error=implicit-function-declaration with -Wno-implicit-function-declaration. Also needed to check just the first two levels of the bison version, as the patch was assuming that all versions were of the form x.y.z, and there are several cases where it is just x.y, breaking the build. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200619043356.90024-11-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
304d7a90c4 |
perf parse-events: Disable a subset of flex warnings
Rather than disable all warnings with -w, disable specific warnings. Predicate enabling the warnings on more recent flex versions. Tested with GCC 9.3.0 and clang 9.0.1. Committer notes: The full set of compilers, gcc and clang that this will be tested on will be on the signed tag when this change goes upstream. Added -Wno-misleading-indentation to the flex_flags to overcome this on opensuse tumbleweed when building with clang: CC /tmp/build/perf/util/parse-events-flex.o CC /tmp/build/perf/util/pmu.o /tmp/build/perf/util/parse-events-flex.c:5038:13: error: misleading indentation; statement is not part of the previous 'if' [-Werror,-Wmisleading-indentation] if ( ! yyg->yy_state_buf ) ^ /tmp/build/perf/util/parse-events-flex.c:5036:9: note: previous statement is here if ( ! yyg->yy_state_buf ) ^ And we need to use this to redirect stderr to stdin and then grep in a way that is acceptable for BusyBox shell: 2>&1 | Previously I was using: |& Which seems to be bash specific. Added -Wno-sign-compare to overcome this on systems such as centos:7: CC /tmp/build/perf/util/parse-events-flex.o CC /tmp/build/perf/util/pmu.o CC /tmp/build/perf/util/pmu-flex.o util/parse-events.l: In function 'parse_events_lex': /tmp/build/perf/util/parse-events-flex.c:193:36: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare] for ( yyl = n; yyl < yyleng; ++yyl )\ ^ /tmp/build/perf/util/parse-events-flex.c:204:9: note: in expansion of macro 'YY_LESS_LINENO' Added -Wno-unused-parameter to overcome this in systems such as centos:7: CC /tmp/build/perf/util/parse-events-flex.o CC /tmp/build/perf/util/pmu.o /tmp/build/perf/util/parse-events-flex.c: In function 'yy_fatal_error': /tmp/build/perf/util/parse-events-flex.c:6265:58: error: unused parameter 'yyscanner' [-Werror=unused-parameter] static void yy_fatal_error (yyconst char* msg , yyscan_t yyscanner) ^ Added -Wno-missing-declarations to build in systems such as centos:6: /tmp/build/perf/util/parse-events-flex.c:6313: error: no previous prototype for 'parse_events_get_column' /tmp/build/perf/util/parse-events-flex.c:6389: error: no previous prototype for 'parse_events_set_column' And -Wno-missing-prototypes to cover older compilers: -Wmissing-prototypes (C only) Warn if a global function is defined without a previous prototype declaration. This warning is issued even if the definition itself provides a prototype. The aim is to detect global functions that fail to be declared in header files. -Wmissing-declarations (C only) Warn if a global function is defined without a previous declaration. Do so even if the definition itself provides a prototype. Use this option to detect global functions that are not declared in header files. Older C compilers lack -Wno-misleading-indentation, check if it is available before using it. Also needed to check just the first two levels of the flex version, as the patch was assuming that all versions were of the form x.y.z, and there are several cases where it is just x.y, breaking the build. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200619043356.90024-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
ef9894d966 |
perf parse-events: Declare bison header file output
Declare bison header file output so that C files can depend upon them. As there are multiple output targets $@ is replaced by the target name. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200619043356.90024-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
3744ca1e67 |
perf expr: Add missing headers noticed when building with NO_LIBBPF=1
These will break the build as soon as we stop disabling all warnings when building flex and bison generated files, so add them before we do that to keep the tree bisectable. Noticed when building on centos:7 with NO_LIBBPF=1: util/expr.c: In function 'key_equal': util/expr.c:29:2: error: implicit declaration of function 'strcmp' [-Werror=implicit-function-declaration] return !strcmp((const char *)key1, (const char *)key2); ^ util/expr.c: In function 'expr__add_id': util/expr.c:40:3: error: implicit declaration of function 'malloc' [-Werror=implicit-function-declaration] val_ptr = malloc(sizeof(double)); ^ util/expr.c:40:13: error: incompatible implicit declaration of built-in function 'malloc' [-Werror] val_ptr = malloc(sizeof(double)); ^ util/expr.c:42:12: error: 'ENOMEM' undeclared (first use in this function) return -ENOMEM; ^ Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
4b971df992 |
perf parse-events: Declare flex header file output
Declare flex header file output so that bison C files can depend upon them. As there are multiple output targets $@ is replaced by the target name. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200619043356.90024-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
970a4a3418 |
perf pmu: Add flex debug build flag
Allow pmu parser's flex to be debugged as the parse-events and expr currently are. Enabling this requires the C code to call perf_pmu__flex_debug. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200619043356.90024-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
5011a52fc5 |
perf pmu: Add bison debug build flag
Allow pmu parser to be debugged as the parse-events and expr currently are. Enabling this requires the C code to set perf_pmu_debug. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200619043356.90024-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
da77a14db3 |
perf parse-events: Use automatic variable for yacc input
This reduces the command line size slightly. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200619043356.90024-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
8d54c308c8 |
perf parse-events: Use automatic variable for flex input
This reduces the command line size slightly. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200619043356.90024-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
92c7d7cdf4 |
perf evlist: Fix the class prefix for 'struct evlist' branch_type methods
To differentiate from libperf's 'struct perf_evlist' methods. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
8cedf3a5c1 |
perf evlist: Fix the class prefix for 'struct evlist' sample_id_all methods
To differentiate from libperf's 'struct perf_evlist' methods. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
b3c2cc2bd2 |
perf evlist: Fix the class prefix for 'struct evlist' sample_type methods
To differentiate from libperf's 'struct perf_evlist' methods. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
d1f249ecbd |
perf evlist: Fix the class prefix for 'struct evlist' strerror methods
To differentiate from libperf's 'struct perf_evlist' methods. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
e251abee87 |
perf evlist: Fix the class prefix for 'struct evlist' 'add' evsel methods
To differentiate from libperf's 'struct perf_evlist' methods. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
ce0dc7d222 |
perf pmu: Improve CPU core PMU HW event list ordering
For perf list, the CPU core PMU HW event ordering is such that not all
events may will be listed adjacent - consider this example:
$ tools/perf/perf list
List of pre-defined events (to be used in -e):
duration_time [Tool event]
branch-instructions OR cpu/branch-instructions/ [Kernel PMU event]
branch-misses OR cpu/branch-misses/ [Kernel PMU event]
bus-cycles OR cpu/bus-cycles/ [Kernel PMU event]
cache-misses OR cpu/cache-misses/ [Kernel PMU event]
cache-references OR cpu/cache-references/ [Kernel PMU event]
cpu-cycles OR cpu/cpu-cycles/ [Kernel PMU event]
cstate_core/c3-residency/ [Kernel PMU event]
cstate_core/c6-residency/ [Kernel PMU event]
cstate_core/c7-residency/ [Kernel PMU event]
cstate_pkg/c2-residency/ [Kernel PMU event]
cstate_pkg/c3-residency/ [Kernel PMU event]
cstate_pkg/c6-residency/ [Kernel PMU event]
cstate_pkg/c7-residency/ [Kernel PMU event]
cycles-ct OR cpu/cycles-ct/ [Kernel PMU event]
cycles-t OR cpu/cycles-t/ [Kernel PMU event]
el-abort OR cpu/el-abort/ [Kernel PMU event]
el-capacity OR cpu/el-capacity/ [Kernel PMU event]
Notice in the above example how the cstate_core PMU events are mixed in
the middle of the CPU core events.
For my arm64 platform, all the uncore events get mixed in, making the list
very disorganised:
page-faults OR faults [Software event]
task-clock [Software event]
duration_time [Tool event]
L1-dcache-load-misses [Hardware cache event]
L1-dcache-loads [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
L1-icache-loads [Hardware cache event]
branch-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
dTLB-load-misses [Hardware cache event]
dTLB-loads [Hardware cache event]
iTLB-load-misses [Hardware cache event]
iTLB-loads [Hardware cache event]
br_mis_pred OR armv8_pmuv3_0/br_mis_pred/ [Kernel PMU event]
br_mis_pred_retired OR armv8_pmuv3_0/br_mis_pred_retired/ [Kernel PMU event]
br_pred OR armv8_pmuv3_0/br_pred/ [Kernel PMU event]
br_retired OR armv8_pmuv3_0/br_retired/ [Kernel PMU event]
br_return_retired OR armv8_pmuv3_0/br_return_retired/ [Kernel PMU event]
bus_access OR armv8_pmuv3_0/bus_access/ [Kernel PMU event]
bus_cycles OR armv8_pmuv3_0/bus_cycles/ [Kernel PMU event]
cid_write_retired OR armv8_pmuv3_0/cid_write_retired/ [Kernel PMU event]
cpu_cycles OR armv8_pmuv3_0/cpu_cycles/ [Kernel PMU event]
dtlb_walk OR armv8_pmuv3_0/dtlb_walk/ [Kernel PMU event]
exc_return OR armv8_pmuv3_0/exc_return/ [Kernel PMU event]
exc_taken OR armv8_pmuv3_0/exc_taken/ [Kernel PMU event]
hisi_sccl1_ddrc0/act_cmd/ [Kernel PMU event]
hisi_sccl1_ddrc0/flux_rcmd/ [Kernel PMU event]
hisi_sccl1_ddrc0/flux_rd/ [Kernel PMU event]
hisi_sccl1_ddrc0/flux_wcmd/ [Kernel PMU event]
hisi_sccl1_ddrc0/flux_wr/ [Kernel PMU event]
hisi_sccl1_ddrc0/pre_cmd/ [Kernel PMU event]
hisi_sccl1_ddrc0/rnk_chg/ [Kernel PMU event]
...
hisi_sccl7_l3c21/wr_hit_cpipe/ [Kernel PMU event]
hisi_sccl7_l3c21/wr_hit_spipe/ [Kernel PMU event]
hisi_sccl7_l3c21/wr_spipe/ [Kernel PMU event]
inst_retired OR armv8_pmuv3_0/inst_retired/ [Kernel PMU event]
inst_spec OR armv8_pmuv3_0/inst_spec/ [Kernel PMU event]
itlb_walk OR armv8_pmuv3_0/itlb_walk/ [Kernel PMU event]
l1d_cache OR armv8_pmuv3_0/l1d_cache/ [Kernel PMU event]
l1d_cache_refill OR armv8_pmuv3_0/l1d_cache_refill/ [Kernel PMU event]
l1d_cache_wb OR armv8_pmuv3_0/l1d_cache_wb/ [Kernel PMU event]
l1d_tlb OR armv8_pmuv3_0/l1d_tlb/ [Kernel PMU event]
l1d_tlb_refill OR armv8_pmuv3_0/l1d_tlb_refill/ [Kernel PMU event]
So the events are list alphabetically. However, CPU core event listing is
special from commit
|
||
|
c1b4745b48 |
perf pmu: List kernel supplied event aliases for arm64
In commit
|
||
|
ff1a12f962 |
perf expr: Add < and > operators
These are broadly useful but required to handle TMA metrics. For example encoding Ports_Utilization from: https://download.01.org/perfmon/TMA_Metrics.csv requires '<'. { "BriefDescription": "This metric estimates fraction of cycles the CPU performance was potentially limited due to Core computation issues (non divider-related). Two distinct categories can be attributed into this metric: (1) heavy data-dependency among contiguous instructions would manifest in this metric - such cases are often referred to as low Instruction Level Parallelism (ILP). (2) Contention on some hardware execution unit other than Divider. For example; when there are too many multiply operations.", "MetricExpr": "( ( cpu@EXE_ACTIVITY.EXE_BOUND_0_PORTS@ + cpu@EXE_ACTIVITY.1_PORTS_UTIL@ + ( cpu@EXE_ACTIVITY.2_PORTS_UTIL@ * ( ( ( cpu@UOPS_RETIRED.RETIRE_SLOTS@ ) / ( cpu@CPU_CLK_UNHALTED.THREAD@ ) ) / ( ( 4.000000 ) + 1.000000 ) ) ) ) / ( cpu@CPU_CLK_UNHALTED.THREAD@ ) if ( cpu@ARITH.DIVIDER_ACTIVE\\,cmask\\=1@ < cpu@EXE_ACTIVITY.EXE_BOUND_0_PORTS@ ) else ( ( cpu@EXE_ACTIVITY.EXE_BOUND_0_PORTS@ + cpu@EXE_ACTIVITY.1_PORTS_UTIL@ + ( cpu@EXE_ACTIVITY.2_PORTS_UTIL@ * ( ( ( cpu@UOPS_RETIRED.RETIRE_SLOTS@ ) / ( cpu@CPU_CLK_UNHALTED.THREAD@ ) ) / ( ( 4.000000 ) + 1.000000 ) ) ) ) - cpu@EXE_ACTIVITY.EXE_BOUND_0_PORTS@ ) / ( cpu@CPU_CLK_UNHALTED.THREAD@ ) )", "MetricGroup": "Topdown_Group_Ports_Utilization", "MetricName": "Topdown_Metric_Ports_Utilization" }, Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200610235823.52557-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
3e21a28a01 |
perf expr: Add d_ratio operation
d_ratio avoids division by 0 yielding infinity, such as when a counter doesn't get scheduled. An example usage is: { "BriefDescription": "DCache L1 misses", "MetricExpr": "d_ratio(MEM_LOAD_RETIRED.L1_MISS, MEM_LOAD_RETIRED.L1_HIT + MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT)", "MetricGroup": "DCache;DCache_L1", "MetricName": "DCache_L1_Miss", "ScaleUnit": "100%", } Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200610235823.52557-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
6d432c4c8a |
perf tools: Add test_generic_metric function
Adding test_generic_metric that prepares and runs given metric over the data from struct runtime_stat object. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200602214741.1218986-12-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
9afe5658a6 |
perf tools: Release metric_events rblist
We don't release metric_events rblist, add the missing delete hook and call the release before leaving cmd_stat. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200602214741.1218986-11-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
2cfaa853d8 |
perf tools: Factor out prepare_metric function
Factoring out prepare_metric function so it can be used in test interface coming in following changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200602214741.1218986-10-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
f78ac00a8c |
perf tools: Add metricgroup__parse_groups_test function
Add the metricgroup__parse_groups_test function. It will be used as test's interface to metric parsing in following changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200602214741.1218986-9-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
1381396b0b |
perf tools: Add map to parse_groups() function
For testing purposes we need to pass our own map of events from parse_groups() through metricgroup__add_metric. Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200602214741.1218986-8-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
68173bda6a |
perf tools: Add fake_pmu to parse_group() function
Allow to pass fake_pmu in parse_groups function so it can be used in parse_events call. It's will be passed by the upcoming metricgroup__parse_groups_test function. Committer notes: Made it a 'struct perf_pmu' pointer, in line with the changes at the start of this patchkit to avoid statics deep down in library code. Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200602214741.1218986-6-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
8b4468a210 |
perf parse: Factor out parse_groups() function
Factor out the parse_groups function, it will be used for new test interface coming in following changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200602214741.1218986-6-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
e46fc8d9dd |
perf pmu: Add a perf_pmu__fake object to use with __parse_events()
When wanting to use the support in __parse_events() for fake pmus, just pass it. Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
3bf91aa5aa |
perf parse: Provide a way to pass a fake_pmu to parse_events()
This is an alternative patch to what Jiri sent that instead of changing all callers to parse_events() for allowing to pass a fake_pmu, provide another function specifically for that. From Jiri's patch: This way it's possible to parse events from PMUs which are not present in the system. It's available only for testing purposes coming in following changes, so all the current users set fake_pmu argument as false. Based-on-a-patch-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20200602214741.1218986-3-jolsa@kernel.org Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
387ad33fe7 |
perf tools: Add fake pmu support
Add a way to create a pmu event without the actual PMU being in place. This way we can test metrics defined for any processor. The interface is to define fake_pmu in struct parse_events_state data. It will be used only in tests via special interface function added in following changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200602214741.1218986-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
85d0f9ad82 |
perf pmu: Remove unused declaration
This avoids multiple declarations if the flex header is included. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200609234344.3795-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
ffaecd7d1f |
perf parse-events: Fix an old style declaration
Fixes:
|
||
|
c2412fae3f |
perf parse-events: Fix an incompatible pointer
Arrays are pointer types and don't need their address taking.
Fixes:
|
||
|
d38c692f16 |
perf bpf: Fix bpf prologue generation
Issue:
bpf_probe_read() is no longer available for architecture which has
overlapping address space. Hence bpf prologue generation fails
Fix:
Use bpf_probe_read_kernel for kernel member access. For user attribute
access in kprobes, use bpf_probe_read_user.
Other:
@user attribute was introduced in commit
|
||
|
9256c3031e |
perf probe: Fix user attribute access in kprobes
Issue: # perf probe -a 'do_sched_setscheduler pid policy param->sched_priority@user' did not work before. Fix: Make: # perf probe -a 'do_sched_setscheduler pid policy param->sched_priority@user' output equivalent to ftrace: # echo 'p:probe/do_sched_setscheduler _text+517384 pid=%r2:s32 policy=%r3:s32 sched_priority=+u0(%r4):s32' > /sys/kernel/debug/tracing/kprobe_events Other: 1. Right now, __match_glob() does not handle [u]<offset>. For now, use *u]<offset>. 2. @user attribute was introduced in commit |
||
|
c0c652fc70 |
perf stat: Fix NULL pointer dereference
If config->aggr_map is NULL and config->aggr_get_id is not NULL,
the function print_aggr() will still calling arrg_update_shadow(),
which can result in accessing the invalid pointer.
Fixes:
|
||
|
3e9b26dc22 |
perf tools: Remove some duplicated includes
There exists some duplicated includes in tools/perf, remove them. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: xuefeng li <lixuefeng@loongson.cn> Link: http://lore.kernel.org/lkml/1591071304-19338-2-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
0affd0e526 |
perf symbols: Fix kernel maps for kcore and eBPF
Adjust 'map->pgoff' also when moving a map's start address.
Example with v5.4.34 based kernel:
Before:
$ sudo tools/perf/perf record -a --kcore -e intel_pt//k sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.958 MB perf.data ]
$ sudo tools/perf/perf script --itrace=e >/dev/null
Warning:
961 instruction trace errors
After:
$ sudo tools/perf/perf script --itrace=e >/dev/null
$
Committer testing:
# uname -a
Linux seventh 5.6.10-100.fc30.x86_64 #1 SMP Mon May 4 15:36:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
#
Before:
# perf record -a --kcore -e intel_pt//k sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.923 MB perf.data ]
# perf script --itrace=e >/dev/null
Warning:
295 instruction trace errors
#
After:
# perf record -a --kcore -e intel_pt//k sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.919 MB perf.data ]
# perf script --itrace=e >/dev/null
#
Fixes:
|
||
|
a54ca19498 |
perf arm-spe: Support synthetic events
After the commit
|
||
|
9f74d77018 |
perf auxtrace: Add four itrace options
This patch is to add four options to synthesize events which are described as below: 'f': synthesize first level cache events 'm': synthesize last level cache events 't': synthesize TLB events 'a': synthesize remote access events This four options will be used by ARM SPE as their first consumer. Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Tested-by: James Clark <james.clark@arm.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20200530122442.490-3-leo.yan@linaro.org Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
4db25f6693 |
perf tools: Move arm-spe-pkt-decoder.h/c to the new dir
Create a new arm-spe-decoder directory for subsequent extensions and move arm-spe-pkt-decoder.h/c to this directory. No code changes. Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Tested-by: James Clark <james.clark@arm.com> Tested-by: Qi Liu <liuqi115@hisilicon.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20200530122442.490-2-leo.yan@linaro.org Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
7094349078 |
perf tools: Add optional support for libpfm4
This patch links perf with the libpfm4 library if it is available and LIBPFM4 is passed to the build. The libpfm4 library contains hardware event tables for all processors supported by perf_events. It is a helper library that helps convert from a symbolic event name to the event encoding required by the underlying kernel interface. This library is open-source and available from: http://perfmon2.sf.net. With this patch, it is possible to specify full hardware events by name. Hardware filters are also supported. Events must be specified via the --pfm-events and not -e option. Both options are active at the same time and it is possible to mix and match: $ perf stat --pfm-events inst_retired:any_p:c=1:i -e cycles .... One needs to explicitely ask for its inclusion by using the LIBPFM4 make command line option, ie its opt-in rather than opt-out of feature detection and build support. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jiwei Sun <jiwei.sun@windriver.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Cc: yuzhoujian <yuzhoujian@didichuxing.com> Link: http://lore.kernel.org/lkml/20200505182943.218248-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
1e4bd2ae45 |
perf jit: Fix inaccurate DWARF line table
Fix an issue where addresses in the DWARF line table are offset by -0x40 (GEN_ELF_TEXT_OFFSET). This can be seen with `objdump -S` on the ELF files after perf inject. Committer notes: Ian added this in his Acked-by reply: --- Without too much knowledge this looks good to me. The original code came from oprofile's jit support: https://sourceforge.net/p/oprofile/oprofile/ci/master/tree/opjitconv/debug_line.c#l325 --- Signed-off-by: Nick Gasson <nick.gasson@arm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200528051916.6722-1-nick.gasson@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
a9e8c1f856 |
perf trace: Use zalloc() to make sure all fields are zeroed in the syscalltbl constructor
In the past this wasn't needed as the libaudit based code would use just one field, and the alternative constructor would fill in all the fields, but now that even when using the libaudit based method we need the other fields, switch to zalloc() to make sure the other fields are zeroed at instantiation time. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
db6b8cc891 |
perf trace: Remove union from syscalltbl, all the fields are needed
When we moved to a syscalltbl generated from the kernel syscall tables (arch/..../syscall*.tbl) the idea was to either use it, when having the generator (e.g. tools/perf/arch/x86/entry/syscalls/syscalltbl.sh), or falling back to the previous audit-libs based way of mapping syscall ids to strings and the other way around. At first we just needed the audit_detect_machine() return to then use it to the str->id/id->str, or the other fields for the now used by default in the most well developed arches method of using the syscall table generator. The problem is that then the libaudit code fell into disrepair, and architectures where it is the method used are not working. Now, with NO_SYSCALL_TABLE=1 being possible to pass on the make command line we can automate the testing of that method even on x86-64, arm64, etc. And doing it I noted that we actually use fields in both entries in the union, oops, so ditch the union, as we need all those fields at the same time. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
16b4b4e1a0 |
perf record: Respect --no-switch-events
Context switch events are added automatically by Intel PT and Coresight. Make it possible to suppress them. That is useful for tracing the scheduler without the disturbance that the switch event processing creates. Example: Prerequisites: $ which perf ~/bin/perf $ sudo setcap "cap_sys_rawio,cap_sys_admin,cap_sys_ptrace,cap_syslog,cap_ipc_lock=ep" ~/bin/perf $ sudo chmod +r /proc/kcore Before: $ perf record --no-switch-events --kcore -a -e intel_pt//k -- sleep 0.001 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.938 MB perf.data ] $ perf script -D | grep PERF_RECORD_SWITCH | wc -l 572 After: $ perf record --no-switch-events --kcore -a -e intel_pt//k -- sleep 0.001 Warning: Intel Processor Trace decoding will not be possible except for kernel tracing! [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.838 MB perf.data ] $ perf script -D | grep PERF_RECORD_SWITCH | wc -l 0 $ sudo chmod go-r /proc/kcore $ sudo setcap -r ~/bin/perf Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Link: http://lore.kernel.org/lkml/20200528120859.21604-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
87cf836073 |
perf evlist: Disable 'immediate' events last
Events marked as 'immediate' are started before other events to ensure that there is context at the start of the main tracing events. The same is true at the end of tracing, so disable 'immediate' events after other events. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20200512121922.8997-11-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
61f82e3fb6 |
perf kcore_copy: Fix module map when there are no modules loaded
In the absence of any modules, no "modules" map is created, but there are other executable pages to map, due to eBPF JIT, kprobe or ftrace. Map them by recognizing that the first "module" symbol is not necessarily from a module, and adjust the map accordingly. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20200512121922.8997-10-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
0bdf31811b |
perf jvmti: Fix demangling Java symbols
For a Java method signature like: Ljava/lang/AbstractStringBuilder;appendChars(Ljava/lang/String;II)V The demangler produces: void class java.lang.AbstractStringBuilder.appendChars(class java.lang., shorttring., int, int) The arguments should be (java.lang.String, int, int) but the demangler interprets the "S" in String as the type code for "short". Correct this and two other minor things: - There is no "bool" type in Java, should be "boolean". - The demangler prepends "class" to every Java class name. This is not standard Java syntax and it wastes a lot of horizontal space if the signature is long. Remove this as there isn't any ambiguity between class names and primitives. Committer notes: This was split from a larger patch that also added a java demangler 'perf test' entry, that, before this patch shows the error being fixed by it: $ perf test java 65: Demangle Java : FAILED! $ perf test -v java Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc 65: Demangle Java : --- start --- test child forked, pid 307264 FAILED: Ljava/lang/StringLatin1;equals([B[B)Z: bool class java.lang.StringLatin1.equals(byte[], byte[]) != boolean java.lang.StringLatin1.equals(byte[], byte[]) FAILED: Ljava/util/zip/ZipUtils;CENSIZ([BI)J: long class java.util.zip.ZipUtils.CENSIZ(byte[], int) != long java.util.zip.ZipUtils.CENSIZ(byte[], int) FAILED: Ljava/util/regex/Pattern$BmpCharProperty;match(Ljava/util/regex/Matcher;ILjava/lang/CharSequence;)Z: bool class java.util.regex.Pattern$BmpCharProperty.match(class java.util.regex.Matcher., int, class java.lang., charhar, shortequence) != boolean java.util.regex.Pattern$BmpCharProperty.match(java.util.regex.Matcher, int, java.lang.CharSequence) FAILED: Ljava/lang/AbstractStringBuilder;appendChars(Ljava/lang/String;II)V: void class java.lang.AbstractStringBuilder.appendChars(class java.lang., shorttring., int, int) != void java.lang.AbstractStringBuilder.appendChars(java.lang.String, int, int) FAILED: Ljava/lang/Object;<init>()V: void class java.lang.Object<init>() != void java.lang.Object<init>() test child finished with -1 ---- end ---- Demangle Java: FAILED! $ After applying this patch: $ perf test java 65: Demangle Java : Ok $ Signed-off-by: Nick Gasson <nick.gasson@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200427061520.24905-4-nick.gasson@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
85afd35575 |
perf symbols: Fix debuginfo search for Ubuntu
Reportedly, from 19.10 Ubuntu has begun mixing up the location of some debug symbol files, putting files expected to be in /usr/lib/debug/usr/lib into /usr/lib/debug/lib instead. Fix by adding another dso_binary_type. Example on Ubuntu 20.04 Before: $ perf record -e intel_pt//u uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.030 MB perf.data ] $ perf script --call-trace | head -5 uname 14003 [005] 15321.764958566: cbr: 42 freq: 4219 MHz (156%) uname 14003 [005] 15321.764958566: (/usr/lib/x86_64-linux-gnu/ld-2.31.so ) 7f1e71cc4100 uname 14003 [005] 15321.764961566: (/usr/lib/x86_64-linux-gnu/ld-2.31.so ) 7f1e71cc4df0 uname 14003 [005] 15321.764961900: (/usr/lib/x86_64-linux-gnu/ld-2.31.so ) 7f1e71cc4e18 uname 14003 [005] 15321.764963233: (/usr/lib/x86_64-linux-gnu/ld-2.31.so ) 7f1e71cc5128 After: $ perf script --call-trace | head -5 uname 14003 [005] 15321.764958566: cbr: 42 freq: 4219 MHz (156%) uname 14003 [005] 15321.764958566: (/usr/lib/x86_64-linux-gnu/ld-2.31.so ) _start uname 14003 [005] 15321.764961566: (/usr/lib/x86_64-linux-gnu/ld-2.31.so ) _dl_start uname 14003 [005] 15321.764961900: (/usr/lib/x86_64-linux-gnu/ld-2.31.so ) _dl_start uname 14003 [005] 15321.764963233: (/usr/lib/x86_64-linux-gnu/ld-2.31.so ) _dl_start Reported-by: Travis Downs <travis.downs@gmail.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20200526155207.9172-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
1244a32736 |
perf parse: Add 'struct parse_events_state' pointer to scanner
We need to pass more data to the scanner so let's start with having it to take pointer to 'struct parse_events_state' object instead of just start token. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200524224219.234847-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
5f09ca5a14 |
perf stat: Do not pass avg to generic_metric
There's no need to pass the given evsel's count to metric data, because it will be pushed again within the following metric_events loop. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200524224219.234847-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
e2ce1059b0 |
perf metricgroup: Remove unnecessary ',' from events
Remove unnecessary commas from events before they are parsed. This avoids ',' being echoed by parse-events.l. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20200520182011.32236-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
05530a7921 |
perf metricgroup: Add options to not group or merge
Add --metric-no-group that causes all events within metrics to not be grouped. This can allow the event to get more time when multiplexed, but may also lower accuracy. Add --metric-no-merge option. By default events in different metrics may be shared if the group of events for one metric is the same or larger than that of the second. Sharing may increase or lower accuracy and so is now configurable. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20200520182011.32236-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
2440689d62 |
perf metricgroup: Remove duped metric group events
A metric group contains multiple metrics. These metrics may use the same events. If metrics use separate events then it leads to more multiplexing and overall metric counts fail to sum to 100%. Modify how metrics are associated with events so that if the events in an earlier group satisfy the current metric, the same events are used. A record of used events is kept and at the end of processing unnecessary events are eliminated. Before: $ perf stat -a -M TopDownL1 sleep 1 Performance counter stats for 'system wide': 920,211,343 uops_issued.any # 0.5 Backend_Bound (16.56%) 1,977,733,128 idq_uops_not_delivered.core (16.56%) 51,668,510 int_misc.recovery_cycles (16.56%) 732,305,692 uops_retired.retire_slots (16.56%) 1,497,621,849 cycles (16.56%) 721,098,274 uops_issued.any # 0.1 Bad_Speculation (16.79%) 1,332,681,791 cycles (16.79%) 552,475,482 uops_retired.retire_slots (16.79%) 47,708,340 int_misc.recovery_cycles (16.79%) 1,383,713,292 cycles # 0.4 Frontend_Bound (16.76%) 2,013,757,701 idq_uops_not_delivered.core (16.76%) 1,373,363,790 cycles # 0.1 Retiring (33.54%) 577,302,589 uops_retired.retire_slots (33.54%) 392,766,987 inst_retired.any # 0.3 IPC (50.24%) 1,351,873,350 cpu_clk_unhalted.thread (50.24%) 1,332,510,318 cycles # 5330041272.0 SLOTS (49.90%) 1.006336145 seconds time elapsed After: $ perf stat -a -M TopDownL1 sleep 1 Performance counter stats for 'system wide': 765,949,145 uops_issued.any # 0.1 Bad_Speculation # 0.5 Backend_Bound (50.09%) 1,883,830,591 idq_uops_not_delivered.core # 0.3 Frontend_Bound (50.09%) 48,237,080 int_misc.recovery_cycles (50.09%) 581,798,385 uops_retired.retire_slots # 0.1 Retiring (50.09%) 1,361,628,527 cycles # 5446514108.0 SLOTS (50.09%) 391,415,714 inst_retired.any # 0.3 IPC (49.91%) 1,336,486,781 cpu_clk_unhalted.thread (49.91%) 1.005469298 seconds time elapsed Note: Bad_Speculation + Backend_Bound + Frontend_Bound + Retiring = 100% after, where as before it is 110%. After there are 2 groups, whereas before there are 6. After the cycles event appears once, before it appeared 5 times. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20200520182011.32236-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
6bf2102bec |
perf metricgroup: Order event groups by size
When adding event groups to the group list, insert them in size order. This performs an insertion sort on the group list. By placing the largest groups at the front of the group list it is possible to see if a larger group contains the same events as a later group. This can make the later group redundant - it can reuse the events from the large group. A later patch will add this sharing. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20200520182011.32236-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
7f9eca51c1 |
perf metricgroup: Delay events string creation
Currently event groups are placed into groups_list at the same time as the events string containing the events is built. Separate these two operations and build the groups_list first, then the event string from the groups_list. This adds an ability to reorder the groups_list that will be used in a later patch. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20200520182011.32236-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
908103991a |
perf metricgroup: Use early return in add_metric
Use early return in metricgroup__add_metric and try to make the intent of the returns more intention revealing. Suggested-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20200520182011.32236-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
4e21c13aca |
perf metricgroup: Always place duration_time last
If a metric contains the duration_time event then the event is placed outside of the metric's group of events. Rather than split the group, make it so the duration_time is immediately after the group. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20200520182011.32236-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
a159e2fe89 |
perf metricgroup: Free metric_events on error
Avoid a simple memory leak. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: kp singh <kpsingh@chromium.org> Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20200508053629.210324-10-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
fa99ce8282 |
perf util: Fix potential SEGFAULT in put_tracepoints_path error path
This patch fix potential segment fault triggered in put_tracepoints_path() when the address of the local variable 'path' be freed in error path of record_saved_cmdline. Signed-off-by: Li Bin <huawei.libin@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hongbo Yao <yaohongbo@huawei.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Xie XiuQi <xiexiuqi@huawei.com> Link: http://lore.kernel.org/lkml/20200521133218.30150-5-liwei391@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
07e9a6f538 |
perf util: Fix memory leak of prefix_if_not_in
Need to free "str" before return when asprintf() failed to avoid memory leak. Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hongbo Yao <yaohongbo@huawei.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Li Bin <huawei.libin@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200521133218.30150-4-liwei391@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
ffe7428e6d |
perf branch: Replace zero-length array with flexible-array
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array
member[1][2], introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning in
case the flexible array does not occur last in the structure, which will
help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by this
change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.
This issue was found with the help of Coccinelle and audited _manually_.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit
|
||
|
d778a778a8 |
perf config: Add stat.big-num support
Add support for new "stat.big-num" boolean option. This allows a user to set a default for "--no-big-num" for "perf stat" commands. -- $ perf config stat.big-num $ perf stat --event cycles /bin/true Performance counter stats for '/bin/true': 778,849 cycles [...] $ perf config stat.big-num=false $ perf config stat.big-num stat.big-num=false $ perf stat --event cycles /bin/true Performance counter stats for '/bin/true': 769622 cycles [...] -- There is an interaction with "--field-separator" that must be accommodated, such that specifying "--big-num --field-separator={x}" still reports an invalid combination of options. Documentation for perf-config and perf-stat updated. Signed-off-by: Paul Clarke <pc@us.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lore.kernel.org/lkml/1589991815-17951-1-git-send-email-pc@us.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
04f9bf2bac |
perf bpf-loader: Add missing '*' for key_scan_pos
key_scan_pos is a pointer for getting scan position in
bpf__obj_config_map() for each BPF map configuration term,
but it's misused when error not happened.
Committer notes:
The point is that the only user of this is:
tools/perf/util/parse-events.c
err = bpf__config_obj(obj, term, parse_state->evlist, &error_pos);
if (err) bpf__strerror_config_obj(obj, term, parse_state->evlist, &error_pos, err, errbuf, sizeof(errbuf));
And then:
int bpf__strerror_config_obj(struct bpf_object *obj __maybe_unused,
struct parse_events_term *term __maybe_unused,
struct evlist *evlist __maybe_unused,
int *error_pos __maybe_unused, int err,
char *buf, size_t size)
{
bpf__strerror_head(err, buf, size);
bpf__strerror_entry(BPF_LOADER_ERRNO__OBJCONF_MAP_TYPE,
"Can't use this config term with this map type");
bpf__strerror_end(buf, size);
return 0;
}
So this is infrastructure that Wang Nan put in place for providing
better error messages but that he ended up not using, so I'll apply the
fix, its correct even not fixing any real problem at this time.
Fixes:
|
||
|
c7e5b328a8 |
perf stat: Report summary for interval mode
Currently 'perf stat' supports to print counts at regular interval (-I), but it's not very easy for user to get the overall statistics. The patch uses 'evsel->prev_raw_counts' to get counts for summary. Copy the counts to 'evsel->counts' after printing the interval results. Next, we just follow the non-interval processing. Let's see some examples, root@kbl-ppc:~# perf stat -e cycles -I1000 --interval-count 2 # time counts unit events 1.000412064 2,281,114 cycles 2.001383658 2,547,880 cycles Performance counter stats for 'system wide': 4,828,994 cycles 2.002860349 seconds time elapsed root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2 # time counts unit events 1.000389902 1,536,093 cycles 1.000389902 420,226 instructions # 0.27 insn per cycle 2.001433453 2,213,952 cycles 2.001433453 735,465 instructions # 0.33 insn per cycle Performance counter stats for 'system wide': 3,750,045 cycles 1,155,691 instructions # 0.31 insn per cycle 2.003023361 seconds time elapsed root@kbl-ppc:~# perf stat -M CPI,IPC -I1000 --interval-count 2 # time counts unit events 1.000435121 905,303 inst_retired.any # 2.9 CPI 1.000435121 2,663,333 cycles 1.000435121 914,702 inst_retired.any # 0.3 IPC 1.000435121 2,676,559 cpu_clk_unhalted.thread 2.001615941 1,951,092 inst_retired.any # 1.8 CPI 2.001615941 3,551,357 cycles 2.001615941 1,950,837 inst_retired.any # 0.5 IPC 2.001615941 3,551,044 cpu_clk_unhalted.thread Performance counter stats for 'system wide': 2,856,395 inst_retired.any # 2.2 CPI 6,214,690 cycles 2,865,539 inst_retired.any # 0.5 IPC 6,227,603 cpu_clk_unhalted.thread 2.003403078 seconds time elapsed Committer testing: Before: # perf stat -e cycles -I1000 --interval-count 2 # time counts unit events 1.000618627 26,877,408 cycles 2.001417968 233,672,829 cycles # After: # perf stat -e cycles -I1000 --interval-count 2 # time counts unit events 1.001531815 5,341,388,792 cycles 2.002936530 100,073,912 cycles Performance counter stats for 'system wide': 5,441,462,704 cycles 2.004893794 seconds time elapsed # Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200520042737.24160-6-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
905365f493 |
perf stat: Save aggr value to first member of prev_raw_counts
To collect the overall statistics for interval mode, we copy the counts from evsel->prev_raw_counts to evsel->counts. For AGGR_GLOBAL mode, because the perf_stat_process_counter creates aggr values from per cpu values, but the per cpu values are 0, so the calculated aggr values will be always 0. This patch uses a trick that saves the previous aggr value to the first member of perf_counts, then aggr calculation in process_counter_values can work correctly for AGGR_GLOBAL. v6: --- Add comments in perf_evlist__save_aggr_prev_raw_counts. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200520042737.24160-5-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
297767ac0c |
perf stat: Copy counts from prev_raw_counts to evsel->counts
It would be useful to support the overall statistics for perf-stat interval mode. For example, report the summary at the end of "perf-stat -I" output. But since perf-stat can support many aggregation modes, such as --per-thread, --per-socket, -M and etc, we need a solution which doesn't bring much complexity. The idea is to use 'evsel->prev_raw_counts' which is updated in each interval and it's saved with the latest counts. Before reporting the summary, we copy the counts from evsel->prev_raw_counts to evsel->counts, and next we just follow non-interval processing. v5: --- Don't save the previous aggr value to the member of [cpu0,thread0] in perf_counts. Originally that was a trick because the perf_stat_process_counter would create aggr values from per cpu values. But we don't need to do that all the time. We will handle it in next patch. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200520042737.24160-4-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
cf4d9bd67c |
perf counts: Reset prev_raw_counts counts
When we want to reset the evsel->prev_raw_counts, zeroing the aggr is not enough, we need to reset the perf_counts too. The perf_counts__reset zeros the perf_counts, and it should zero the aggr too. This patch changes perf_counts__reset to non-static, and calls it in evsel__reset_prev_raw_counts to reset the prev_raw_counts. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200520042737.24160-3-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
a45badc739 |
perf expr: Allow numbers to be followed by a dot
Metrics like UNC_M_POWER_SELF_REFRESH encode 100 as "100." and
consequently the 100 is treated as a symbol. Alter the regular
expression to allow the dot to be before or after the number.
Note, this passed the pmu-events test as that tests the validity of a
number using strtod rather than lex code. strtod allows the dot after.
Add a test for this behavior.
Fixes:
|
||
|
45db55f2ef |
perf metricgroup: Make 'evlist_used' variable a bitmap instead of array of bools
Use a bitmap rather than an array of bools. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20200520072814.128267-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
ae7626418d |
perf stat: Fail on extra comma while parsing events
Ian reported that we allow to parse following: $ perf stat -e ,cycles true which is wrong and we should fail, like we do with this fix: $ perf stat -e ,cycles true event syntax error: ',cycles' \___ parser error The reason is that we don't have rule for ',' in 'event' start condition and it's matched and accepted by default rule. Add scanner debug support (that Ian already added for expr code), which was really useful for finding this. It's enabled together with bison debug via 'make PARSER_DEBUG=1'. Reported-by: Ian Rogers <irogers@google.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200520074050.156988-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
498ef715a0 |
perf script: Better align register values in dump
Before: $ perf script --dump-raw-trace [...] 2492031077254920 0x1e08 [0x308]: PERF_RECORD_SAMPLE(IP, 0x1): 47557/47557: 0xc00000000012eeb0 period: 1 addr: 0 ... user regs: mask 0x1fffffffffff ABI 64-bit .... r0 0xb .... r1 0x7ffff3b90fa0 .... r2 0x7fffbabf7300 .... r3 0x7ffff3b9ed60 .... r4 0x7ffff3b95cc0 .... r5 0x1000c5a2940 .... r6 0xfefefefefefefeff .... r7 0x7f7f7f7f7f7f7f7f .... r8 0x7ffff3b9ed60 .... r9 0x0 [...] After: [...] 2492031077254920 0x1e08 [0x308]: PERF_RECORD_SAMPLE(IP, 0x1): 47557/47557: 0xc00000000012eeb0 period: 1 addr: 0 ... user regs: mask 0x1fffffffffff ABI 64-bit .... r0 0x000000000000000b .... r1 0x00007ffff3b90fa0 .... r2 0x00007fffbabf7300 .... r3 0x00007ffff3b9ed60 .... r4 0x00007ffff3b95cc0 .... r5 0x000001000c5a2940 .... r6 0xfefefefefefefeff .... r7 0x7f7f7f7f7f7f7f7f .... r8 0x00007ffff3b9ed60 .... r9 0x0000000000000000 [...] Committer testing: Full set of instructions, testing on x86_64: # perf record -I ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.855 MB perf.data (4902 samples) ] # perf evlist -v cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD|REGS_INTR, read_format: ID, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, sample_regs_intr: 0xff0fff dummy:HG: type: 1, size: 120, config: 0x9, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD|REGS_INTR, read_format: ID, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, sample_regs_intr: 0xff0fff # Before: # perf script --dump-raw-trace [...] 0 1542674658099675 0x1cb700 [0xe0]: PERF_RECORD_SAMPLE(IP, 0x4001): 1825/1825: 0xffffffff9506e544 period: 1 addr: 0 ... intr regs: mask 0xff0fff ABI 64-bit .... AX 0xf .... BX 0xffff96e1064125a0 .... CX 0x38f .... DX 0x7 .... SI 0xf .... DI 0x38f .... BP 0x1 .... SP 0xfffffe000000bdf0 .... IP 0xffffffff9506e544 .... FLAGS 0xa .... CS 0x10 .... SS 0x18 .... R8 0x0 .... R9 0x0 .... R10 0xfffffe00000260c8 .... R11 0xfffffe000000bef8 .... R12 0x1 .... R13 0x64 .... R14 0x390 .... R15 0xffff96e1064125a0 ... thread: perf:1825 ...... dso: /proc/kcore perf 1825 [000] 1542674.658099: 1 cycles: ffffffff9506e544 native_write_msr+0x4 (vmlinux [...] After: # perf script --dump-raw-trace [...] 0 1542674658096068 0x1cb620 [0xe0]: PERF_RECORD_SAMPLE(IP, 0x4001): 1825/1825: 0xffffffff9506e544 period: 1 addr: 0 ... intr regs: mask 0xff0fff ABI 64-bit .... AX 0x000000000000000f .... BX 0xffff96e1064125a0 .... CX 0x000000000000038f .... DX 0x0000000000000007 .... SI 0x000000000000000f .... DI 0x000000000000038f .... BP 0x0000000000000000 .... SP 0xffffb3e788fb7c20 .... IP 0xffffffff9506e544 .... FLAGS 0x000000000000000a .... CS 0x0000000000000010 .... SS 0x0000000000000018 .... R8 0x00057b0deeffdfe3 .... R9 0xffff96e106432480 .... R10 0x0000000000000000 .... R11 0xffff96e106412cc0 .... R12 0xffffb3e788fb7d00 .... R13 0xffff96e106432408 .... R14 0xffff96e106432400 .... R15 0xffff96e0e09a4800 ... thread: perf:1825 ...... dso: /proc/kcore perf 1825 [000] 1542674.658096: 1 cycles: ffffffff9506e544 native_write_msr+0x4 (vmlinux) [...] Signed-off-by: Paul Clarke <pc@us.ibm.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> LPU-Reference: 1589911102-9460-1-git-send-email-pc@us.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
6549a8c0c3 |
perf tools: Replace zero-length array with flexible-array
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array
member[1][2], introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning in
case the flexible array does not occur last in the structure, which will
help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by this
change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit
|
||
|
961224db04 |
perf intel-pt: Use allocated branch stack for PEBS sample
To avoid having struct branch_stack as a non-last structure member, use allocated branch stack for PEBS sample. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/2540ed9a-89f1-6d59-10c9-a66cc90db5d2@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
c1034eb069 |
perf tool: Make perf tool aware of SELinux access control
Implement selinux sysfs check to see the system is in enforcing mode and print warning message with pointer to check audit logs. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/819338ce-d160-4a2f-f1aa-d756a2e7c6fc@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
ded80bda8b |
perf expr: Migrate expr ids table to a hashmap
Use a hashmap between a char* string and a double* value. While bpf's hashmap entries are size_t in size, we can't guarantee sizeof(size_t) >= sizeof(double). Avoid a memory allocation when gathering ids by making 0.0 a special value encoded as NULL. Original map suggestion by Andi Kleen: https://lore.kernel.org/lkml/20200224210308.GQ160988@tassilo.jf.intel.com/ and seconded by Jiri Olsa: https://lore.kernel.org/lkml/20200423112915.GH1136647@krava/ Committer notes: There are fixes that need to land upstream before we can use libbpf's headers, for now use our copy unconditionally, since the data structures at this point are exactly the same, no problem. When the fixes for libbpf's hashmap land upstream, we can fix this up. Testing it: Building with LIBBPF=1, i.e. the default: $ perf -vv | grep -i bpf bpf: [ on ] # HAVE_LIBBPF_SUPPORT $ nm ~/bin/perf | grep -i libbpf_ | wc -l 39 $ nm ~/bin/perf | grep -i hashmap_ | wc -l 17 $ Explicitely building without LIBBPF: $ perf -vv | grep -i bpf bpf: [ OFF ] # HAVE_LIBBPF_SUPPORT $ $ nm ~/bin/perf | grep -i libbpf_ | wc -l 0 $ nm ~/bin/perf | grep -i hashmap_ | wc -l 9 $ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: kp singh <kpsingh@chromium.org> Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20200515221732.44078-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
eee1950192 |
perf tools: Grab a copy of libbpf's hashmap
Allow use of hashmap in perf. Modify perf's check-headers.sh script to check that the files are kept in sync, in the same way kernel headers are checked. This will warn if they are out of sync at the start of a perf build. Committer note: This starts out of synch as a fix went thru the bpf tree, namely the one removing the needless libbpf_internal.h include in hashmap.h. There is also another change related to __WORDSIZE, that as is in tools/lib/bpf/hashmap.h causes the tools/perf/ build to fail in systems such as Alpine Linus, that uses the Musl libc, so we need an alternative way of having __WORDSIZE available, use the one used by tools/include/linux/bitops.h, that builds in all the systems I have build containers for. These differences will be resolved at some point, so keep the warning in check-headers.sh as a reminder. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: kp singh <kpsingh@chromium.org> Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20200515221732.44078-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
4ac22b484d |
perf parse-events: Make add PMU verbose output clearer
On a CPU like skylakex an uncore_iio_0 PMU may alias with uncore_iio_free_running_0. The latter PMU doesn't support fc_mask as a parameter and so pmu_config_term fails. Typically parse_events_add_pmu is called in a loop where if one alias succeeds errors are ignored, however, if multiple errors occur parse_events__handle_error will currently give a WARN_ONCE. This change removes the WARN_ONCE in parse_events__handle_error and makes it a pr_debug. It adds verbose messages to parse_events_add_pmu warning that non-fatal errors may occur, while giving details on the pmu and config terms for useful context. pmu_config_term is altered so the failing term and pmu are present in the case of the 'unknown term' error which makes spotting the free_running case more straightforward. Before: $ perf --debug verbose=3 stat -M llc_misses.pcie_read sleep 1 Using CPUID GenuineIntel-6-55-4 metric expr unc_iio_data_req_of_cpu.mem_read.part0 + unc_iio_data_req_of_cpu.mem_read.part1 + unc_iio_data_req_of_cpu.mem_read.part2 + unc_iio_data_req_of_cpu.mem_read.part3 for LLC_MISSES.PCIE_READ found event unc_iio_data_req_of_cpu.mem_read.part0 found event unc_iio_data_req_of_cpu.mem_read.part1 found event unc_iio_data_req_of_cpu.mem_read.part2 found event unc_iio_data_req_of_cpu.mem_read.part3 metric expr unc_iio_data_req_of_cpu.mem_read.part0 + unc_iio_data_req_of_cpu.mem_read.part1 + unc_iio_data_req_of_cpu.mem_read.part2 + unc_iio_data_req_of_cpu.mem_read.part3 for LLC_MISSES.PCIE_READ found event unc_iio_data_req_of_cpu.mem_read.part0 found event unc_iio_data_req_of_cpu.mem_read.part1 found event unc_iio_data_req_of_cpu.mem_read.part2 found event unc_iio_data_req_of_cpu.mem_read.part3 adding {unc_iio_data_req_of_cpu.mem_read.part0,unc_iio_data_req_of_cpu.mem_read.part1,unc_iio_data_req_of_cpu.mem_read.part2,unc_iio_data_req_of_cpu.mem_read.part3}:W,{unc_iio_data_req_of_cpu.mem_read.part0,unc_iio_data_req_of_cpu.mem_read.part1,unc_iio_data_req_of_cpu.mem_read.part2,unc_iio_data_req_of_cpu.mem_read.part3}:W intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch WARNING: multiple event parsing errors ... Invalid event/parameter 'fc_mask' ... After: $ perf --debug verbose=3 stat -M llc_misses.pcie_read sleep 1 Using CPUID GenuineIntel-6-55-4 metric expr unc_iio_data_req_of_cpu.mem_read.part0 + unc_iio_data_req_of_cpu.mem_read.part1 + unc_iio_data_req_of_cpu.mem_read.part2 + unc_iio_data_req_of_cpu.mem_read.part3 for LLC_MISSES.PCIE_READ found event unc_iio_data_req_of_cpu.mem_read.part0 found event unc_iio_data_req_of_cpu.mem_read.part1 found event unc_iio_data_req_of_cpu.mem_read.part2 found event unc_iio_data_req_of_cpu.mem_read.part3 metric expr unc_iio_data_req_of_cpu.mem_read.part0 + unc_iio_data_req_of_cpu.mem_read.part1 + unc_iio_data_req_of_cpu.mem_read.part2 + unc_iio_data_req_of_cpu.mem_read.part3 for LLC_MISSES.PCIE_READ found event unc_iio_data_req_of_cpu.mem_read.part0 found event unc_iio_data_req_of_cpu.mem_read.part1 found event unc_iio_data_req_of_cpu.mem_read.part2 found event unc_iio_data_req_of_cpu.mem_read.part3 adding {unc_iio_data_req_of_cpu.mem_read.part0,unc_iio_data_req_of_cpu.mem_read.part1,unc_iio_data_req_of_cpu.mem_read.part2,unc_iio_data_req_of_cpu.mem_read.part3}:W,{unc_iio_data_req_of_cpu.mem_read.part0,unc_iio_data_req_of_cpu.mem_read.part1,unc_iio_data_req_of_cpu.mem_read.part2,unc_iio_data_req_of_cpu.mem_read.part3}:W intel_pt default config: tsc,mtc,mtc_period=3,psb_period=3,pt,branch Attempting to add event pmu 'uncore_iio_free_running_5' with 'unc_iio_data_req_of_cpu.mem_read.part0,' that may result in non-fatal errors After aliases, add event pmu 'uncore_iio_free_running_5' with 'fc_mask,ch_mask,umask,event,' that may result in non-fatal errors Attempting to add event pmu 'uncore_iio_free_running_3' with 'unc_iio_data_req_of_cpu.mem_read.part0,' that may result in non-fatal errors After aliases, add event pmu 'uncore_iio_free_running_3' with 'fc_mask,ch_mask,umask,event,' that may result in non-fatal errors Attempting to add event pmu 'uncore_iio_free_running_1' with 'unc_iio_data_req_of_cpu.mem_read.part0,' that may result in non-fatal errors After aliases, add event pmu 'uncore_iio_free_running_1' with 'fc_mask,ch_mask,umask,event,' that may result in non-fatal errors Multiple errors dropping message: unknown term 'fc_mask' for pmu 'uncore_iio_free_running_3' (valid terms: event,umask,config,config1,config2,name,period,percore) ... So before you see a 'WARNING: multiple event parsing errors' and 'Invalid event/parameter'. After you see 'Attempting... that may result in non-fatal errors' then 'Multiple errors...' with details that 'fc_mask' wasn't known to a free running counter. While not completely clean, this makes it clearer that an error hasn't really occurred. v2. addresses review feedback from Jiri Olsa <jolsa@redhat.com>. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200513220635.54700-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
6365757894 |
perf expr: Fix memory leaks in metric bison
Add a destructor for strings to reclaim memory in the event of errors. Free the ID given for a lookup, it was previously strdup-ed in the lex code. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200513000318.15166-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
f0aef4759b |
perf evsel: Initialize evsel->per_pkg_mask to NULL in evsel__init()
Just like with the other fields, this probably isn't fixing anything observable as evsel__new() uses zalloc() for the whole 'struct evsel', but since evsels can be embedded in larger structures and maybe those larger structures don't use zalloc() for some reason, init it to NULL just in case. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
3efc899d9a |
perf evsel: Fix 2 memory leaks
If allocated, perf_pkg_mask and metric_events need freeing. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200512235918.10732-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
7fcdccd423 |
perf parse-events: Fix incorrect conversion of 'if () free()' to 'zfree()'
When applying a patch by Ian I incorrectly converted to zfree() an
expression that involved testing some other struct member, not the one
being freed, which lead to bugs reproduceable by:
$ perf stat -e i/bs,tsc,L2/o sleep 1
WARNING: multiple event parsing errors
Segmentation fault (core dumped)
$
Fix it by restoring the test for pos->free_str before freeing
pos->val.str, but continue using zfree(&pos->val.str) to set that member
to NULL after freeing it.
Reported-by: Ian Rogers <irogers@google.com>
Fixes:
|
||
|
e12a89ef73 |
perf tools: Fix is_bpf_image function logic
Adrian reported that is_bpf_image is not working the way it was intended
- passing on trampolines and dispatcher names. Instead it returned true
for all the bpf names.
The reason even this logic worked properly is that all bpf objects, even
trampolines and dispatcher, were assigned DSO_BINARY_TYPE__BPF_IMAGE
binary_type.
The later for bpf_prog objects, the binary_type was fixed in bpf load
event processing, which is executed after the ksymbol code.
Fixing the is_bpf_image logic, so it properly recognizes trampoline and
dispatcher objects.
Fixes:
|
||
|
b027cc6fdf |
perf c2c: Fix 'perf c2c record -e list' to show the default events used
When the event is passed as list, the default events should be listed as per 'perf mem record -e list'. Previous behavior is: $ perf c2c record -e list failed: event 'list' not found, use '-e list' to get list of available events Usage: perf c2c record [<options>] [<command>] or: perf c2c record [<options>] -- <command> [<options>] -e, --event <event> event selector. Use 'perf mem record -e list' to list available events $ New behavior: $ perf c2c record -e list ldlat-loads : available ldlat-stores : available v3: is a rebase. v2: addresses review comments by Jiri Olsa. https://lore.kernel.org/lkml/20191127081844.GH32367@krava/ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200507220604.3391-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |