Commit Graph

7531 Commits

Author SHA1 Message Date
Adrian Hunter
057ae59f5a perf intel-pt: Fix some PGE (packet generation enable/control flow packets) usage
Packet generation enable (PGE) refers to whether control flow (COFI)
packets are being produced.

PGE may be false even when branch-tracing is enabled, due to being
out-of-context, or outside a filter address range.  Fix some missing PGE
usage.

Fixes: 7c1b16ba0e ("perf intel-pt: Add support for decoding FUP/TIP only")
Fixes: 839598176b ("perf intel-pt: Allow decoding with branch tracing disabled")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org # v5.15+
Link: https://lore.kernel.org/r/20211210162303.2288710-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-11 08:19:47 -03:00
German Gomez
c897899752 perf tools: Prevent out-of-bounds access to registers
The size of the cache of register values is arch-dependant
(PERF_REGS_MAX). This has the potential of causing an out-of-bounds
access in the function "perf_reg_value" if the local architecture
contains less registers than the one the perf.data file was recorded on.

Since the maximum number of registers is bound by the bitmask "u64
cache_mask", and the size of the cache when running under x86 systems is
64 already, fix the size to 64 and add a range-check to the function
"perf_reg_value" to prevent out-of-bounds access.

Reported-by: Alexandre Truong <alexandre.truong@arm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: German Gomez <german.gomez@arm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-csky@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Link: https://lore.kernel.org/r/20211201123334.679131-2-german.gomez@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-11 08:19:46 -03:00
Jakub Kicinski
be3158290d Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Andrii Nakryiko says:

====================
bpf-next 2021-12-10 v2

We've added 115 non-merge commits during the last 26 day(s) which contain
a total of 182 files changed, 5747 insertions(+), 2564 deletions(-).

The main changes are:

1) Various samples fixes, from Alexander Lobakin.

2) BPF CO-RE support in kernel and light skeleton, from Alexei Starovoitov.

3) A batch of new unified APIs for libbpf, logging improvements, version
   querying, etc. Also a batch of old deprecations for old APIs and various
   bug fixes, in preparation for libbpf 1.0, from Andrii Nakryiko.

4) BPF documentation reorganization and improvements, from Christoph Hellwig
   and Dave Tucker.

5) Support for declarative initialization of BPF_MAP_TYPE_PROG_ARRAY in
   libbpf, from Hengqi Chen.

6) Verifier log fixes, from Hou Tao.

7) Runtime-bounded loops support with bpf_loop() helper, from Joanne Koong.

8) Extend branch record capturing to all platforms that support it,
   from Kajol Jain.

9) Light skeleton codegen improvements, from Kumar Kartikeya Dwivedi.

10) bpftool doc-generating script improvements, from Quentin Monnet.

11) Two libbpf v0.6 bug fixes, from Shuyi Cheng and Vincent Minet.

12) Deprecation warning fix for perf/bpf_counter, from Song Liu.

13) MAX_TAIL_CALL_CNT unification and MIPS build fix for libbpf,
    from Tiezhu Yang.

14) BTF_KING_TYPE_TAG follow-up fixes, from Yonghong Song.

15) Selftests fixes and improvements, from Ilya Leoshkevich, Jean-Philippe
    Brucker, Jiri Olsa, Maxim Mikityanskiy, Tirthendu Sarkar, Yucong Sun,
    and others.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (115 commits)
  libbpf: Add "bool skipped" to struct bpf_map
  libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition
  bpftool: Switch bpf_object__load_xattr() to bpf_object__load()
  selftests/bpf: Remove the only use of deprecated bpf_object__load_xattr()
  selftests/bpf: Add test for libbpf's custom log_buf behavior
  selftests/bpf: Replace all uses of bpf_load_btf() with bpf_btf_load()
  libbpf: Deprecate bpf_object__load_xattr()
  libbpf: Add per-program log buffer setter and getter
  libbpf: Preserve kernel error code and remove kprobe prog type guessing
  libbpf: Improve logging around BPF program loading
  libbpf: Allow passing user log setting through bpf_object_open_opts
  libbpf: Allow passing preallocated log_buf when loading BTF into kernel
  libbpf: Add OPTS-based bpf_btf_load() API
  libbpf: Fix bpf_prog_load() log_buf logic for log_level 0
  samples/bpf: Remove unneeded variable
  bpf: Remove redundant assignment to pointer t
  selftests/bpf: Fix a compilation warning
  perf/bpf_counter: Use bpf_map_create instead of bpf_create_map
  samples: bpf: Fix 'unknown warning group' build warning on Clang
  samples: bpf: Fix xdp_sample_user.o linking with Clang
  ...
====================

Link: https://lore.kernel.org/r/20211210234746.2100561-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-10 15:56:13 -08:00
Song Liu
8d0f9e73ef perf/bpf_counter: Use bpf_map_create instead of bpf_create_map
bpf_create_map is deprecated. Replace it with bpf_map_create. Also add a
__weak bpf_map_create() so that when older version of libbpf is linked as
a shared library, it falls back to bpf_create_map().

Fixes: 992c422541 ("libbpf: Unify low-level map creation APIs w/ new bpf_map_create()")
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211207232340.2561471-1-song@kernel.org
2021-12-08 11:55:45 -08:00
Jin Yao
e69dc84282 perf stat: Support --cputype option for hybrid events
In previous patch, we have supported the syntax which enables
the event on a specified pmu, such as:

cpu_core/<event>/
cpu_atom/<event>/

While this syntax is not very easy for applying on a set of
events or applying on a group. In following example, we have to
explicitly assign the pmu prefix.

  # ./perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}' -- sleep 1

   Performance counter stats for 'sleep 1':

           1,158,545      cpu_core/cycles/
           1,003,113      cpu_core/instructions/

         1.002428712 seconds time elapsed

A much easier way is:

  # ./perf stat --cputype core -e '{cycles,instructions}' -- sleep 1

   Performance counter stats for 'sleep 1':

           1,101,071      cpu_core/cycles/
             939,892      cpu_core/instructions/

         1.002363142 seconds time elapsed

For this example, the '--cputype' enables the events from specified
pmu (cpu_core).

If '--cputype' conflicts with pmu prefix, '--cputype' is ignored.

  # ./perf stat --cputype core -e cycles,cpu_atom/instructions/ -a -- sleep 1

   Performance counter stats for 'system wide':

          21,003,407      cpu_core/cycles/
             367,886      cpu_atom/instructions/

         1.002203520 seconds time elapsed

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210909062215.10278-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-07 22:18:25 -03:00
Ian Rogers
94dbfd6781 perf parse-events: Architecture specific leader override
Currently topdown events must appear after a slots event:

  $ perf stat -e '{slots,topdown-fe-bound}' /bin/true

   Performance counter stats for '/bin/true':

         3,183,090      slots
           986,133      topdown-fe-bound

Reversing the events yields:

  $ perf stat -e '{topdown-fe-bound,slots}' /bin/true
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (topdown-fe-bound).

For metrics the order of events is determined by iterating over a
hashmap, and so slots isn't guaranteed to be first which can yield this
error.

Change the set_leader in parse-events, called when a group is closed, so
that rather than always making the first event the leader, if the slots
event exists then it is made the leader. It is then moved to the head of
the evlist otherwise it won't be opened in the correct order.

The result is:

  $ perf stat -e '{topdown-fe-bound,slots}' /bin/true

   Performance counter stats for '/bin/true':

         3,274,795      slots
         1,001,702      topdown-fe-bound

A problem with this approach is the slots event is identified by name,
names can be overwritten like 'cpu/slots,name=foo/' and this causes the
leader change to fail.

The change also modifies and fixes mixed groups like, with the change:

  $ perf stat -e '{instructions,slots,topdown-fe-bound}' -a -- sleep 2

   Performance counter stats for 'system wide':

        5574985410      slots
         971981616      instructions
        1348461887      topdown-fe-bound

       2.001263120 seconds time elapsed

Without the change:

  $ perf stat -e '{instructions,slots,topdown-fe-bound}' -a -- sleep 2

   Performance counter stats for 'system wide':

     <not counted>      instructions
     <not counted>      slots
   <not supported>      topdown-fe-bound

       2.006247990 seconds time elapsed

Something that may be undesirable here is that the events are reordered
in the output.

Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vineet Singh <vineet.singh@intel.com>
Link: http://lore.kernel.org/lkml/20211130174945.247604-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-07 22:18:24 -03:00
Ian Rogers
ecdcf630d7 perf evlist: Allow setting arbitrary leader
The leader of a group is the first, but allow it to be an arbitrary list
member so that for Intel topdown events slots may always be the group
leader.

Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vineet Singh <vineet.singh@intel.com>
Link: http://lore.kernel.org/lkml/20211130174945.247604-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-07 22:18:24 -03:00
Ian Rogers
6b6b16b3bb perf metric: Reduce multiplexing with duration_time
It is common to use the same counters with and without duration_time.
The ID sharing code treats duration_time as if it were a hardware event
placed in the same group. This causes unnecessary multiplexing such as
in the following example where l3_cache_access isn't shared:

  $ perf stat -M l3 -a sleep 1

   Performance counter stats for 'system wide':

         3,117,007      l3_cache_miss         #    199.5 MB/s  l3_rd_bw
                                              #     43.6 %  l3_hits
                                              #     56.4 %  l3_miss                 (50.00%)
         5,526,447      l3_cache_access                                             (50.00%)
         5,392,435      l3_cache_access       # 5389191.2 access/s  l3_access_rate  (50.00%)
     1,000,601,901 ns   duration_time

       1.000601901 seconds time elapsed

Fix this by placing duration_time in all groups unless metric
sharing has been disabled on the command line:

  $ perf stat -M l3 -a sleep 1

   Performance counter stats for 'system wide':

         3,597,972      l3_cache_miss         #    230.3 MB/s  l3_rd_bw
                                              #     48.0 %  l3_hits
                                              #     52.0 %  l3_miss
         6,914,459      l3_cache_access       # 6909935.9 access/s  l3_access_rate
     1,000,654,579 ns   duration_time

       1.000654579 seconds time elapsed

  $ perf stat --metric-no-merge -M l3 -a sleep 1

   Performance counter stats for 'system wide':

         3,501,834      l3_cache_miss         #     53.5 %  l3_miss                (24.99%)
         6,548,173      l3_cache_access                                            (24.99%)
         3,417,622      l3_cache_miss         #     45.7 %  l3_hits                (25.04%)
         6,294,062      l3_cache_access                                            (25.04%)
         5,923,238      l3_cache_access       # 5919688.1 access/s  l3_access_rate (24.99%)
     1,000,599,683 ns   duration_time
         3,607,486      l3_cache_miss         #    230.9 MB/s  l3_rd_bw            (49.97%)

       1.000599683 seconds time elapsed

v2. Doesn't count duration_time in the metric_list_cmp function that
    sorts larger metrics first. Without this a metric with duration_time
    and an event is sorted the same as a metric with two events,
    possibly not allowing the first metric to share with the second.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211124015226.3317994-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-07 22:18:24 -03:00
Shunsuke Nakamura
9a5b2d1afa libperf: Adopt perf_counts_values__scale() from tools/perf/util
Move perf_counts_values__scale() from tools/perf/util to tools/lib/perf
so that it can be used with libperf.

Committer notes:

As noted by Jiri, use __s8 instead of s8 on the exported function.

Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20211109085831.3770594-2-nakamura.shun@fujitsu.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-07 22:18:23 -03:00
Song Liu
5a897531e0 perf bpf_skel: Do not use typedef to avoid error on old clang
When building bpf_skel with clang-10, typedef causes confusions like:

  libbpf: map 'prev_readings': unexpected def kind var.

Fix this by removing the typedef.

Fixes: 7fac83aaf2 ("perf stat: Introduce 'bperf' to share hardware PMCs with BPF")
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/BEF5C312-4331-4A60-AEC0-AD7617CB2BC4@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-06 21:57:53 -03:00
Song Liu
f7c4e85bcc perf bpf: Fix building perf with BUILD_BPF_SKEL=1 by default in more distros
Arnaldo reported that building all his containers with BUILD_BPF_SKEL=1
to then make this the default he found problems in some distros where
the system linux/bpf.h file was being used and lacked this:

   util/bpf_skel/bperf_leader.bpf.c:13:20: error: use of undeclared identifier 'BPF_F_PRESERVE_ELEMS'
           __uint(map_flags, BPF_F_PRESERVE_ELEMS);

So use instead the vmlinux.h file generated by bpftool from BTF info.

This fixed these as well, getting the build back working on debian:11,
debian:experimental and ubuntu:21.10:

  In file included from In file included from util/bpf_skel/bperf_leader.bpf.cutil/bpf_skel/bpf_prog_profiler.bpf.c::33:
  :
  In file included from In file included from /usr/include/linux/bpf.h/usr/include/linux/bpf.h::1111:
  :
  /usr/include/linux/types.h/usr/include/linux/types.h::55::1010:: In file included from  util/bpf_skel/bperf_follower.bpf.c:3fatal errorfatal error:
  : : In file included from /usr/include/linux/bpf.h:'asm/types.h' file not found11'asm/types.h' file not found:

  /usr/include/linux/types.h:5:10: fatal error: 'asm/types.h' file not found
  #include <asm/types.h>#include <asm/types.h>

           ^~~~~~~~~~~~~         ^~~~~~~~~~~~~

  #include <asm/types.h>
           ^~~~~~~~~~~~~
  1 error generated.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/CF175681-8101-43D1-ABDB-449E644BE986@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-06 21:57:53 -03:00
Ian Rogers
4747395082 perf header: Fix memory leaks when processing feature headers
These leaks were found with leak sanitizer running "perf pipe recording
and injection test".

In pipe mode feat_fd may hold onto an events struct that needs freeing.

When string features are processed they may overwrite an already created
string, so free this before the overwrite.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211118201730.2302927-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-06 21:57:53 -03:00
Ian Rogers
4ffbe87e2d perf tools: Fix SMT detection fast read path
sysfs__read_int() returns 0 on success, and so the fast read path was
always failing.

Fixes: bb629484d9 ("perf tools: Simplify checking if SMT is active.")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211124001231.3277836-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-06 21:57:53 -03:00
Masami Hiramatsu
7c689c8397 tools/perf: Add '__rel_loc' event field parsing support
Add new '__rel_loc' dynamic data location attribute support.
This type attribute is similar to the '__data_loc' but records the
offset from the field itself.
The libtraceevent adds TEP_FIELD_IS_RELATIVE to the
'tep_format_field::flags' with TEP_FIELD_IS_DYNAMIC for'__rel_loc'.

Link: https://lkml.kernel.org/r/163757344810.510314.12449413842136229871.stgit@devnote2

Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-12-06 15:37:22 -05:00
Andrii Nakryiko
0bf40542c0 perf: Mute libbpf API deprecations temporarily
Libbpf development version was bumped to 0.7 in c93faaaf2f
("libbpf: Deprecate bpf_prog_load_xattr() API"), activating a bunch of
previously scheduled deprecations. Most APIs are pretty straightforward
to replace with newer APIs, but perf has a complicated mixed setup with
libbpf used both as static and shared configurations, which makes it
non-trivial to migrate the APIs.

Further, bpf_program__set_prep() needs more involved refactoring, which
will require help from Arnaldo and/or Jiri.

So for now, mute deprecation warnings and work on migrating perf off of
deprecated APIs separately with the input from owners of the perf tool.

Fixes: c93faaaf2f ("libbpf: Deprecate bpf_prog_load_xattr() API")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211203004640.2455717-1-andrii@kernel.org
2021-12-03 11:54:51 -08:00
Ian Rogers
b194c9cd09 perf evsel: Fix memory leaks relating to unit
unit may have a strdup pointer or be to a literal, consequently memory
assocciated with it isn't freed. Change it so the unit is always strdup
and so the memory can be safely freed.

Fix related issue in perf_event__process_event_update() for name and
own_cpus. Leaks were spotted by leak sanitizer.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211118084749.2191447-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-18 10:19:14 -03:00
Ian Rogers
d9fc706108 perf report: Fix memory leaks around perf_tip()
perf_tip() may allocate memory or use a literal, this means memory
wasn't freed if allocated. Change the API so that literals aren't used.

At the same time add missing frees for system_path. These issues were
spotted using leak sanitizer.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211118073804.2149974-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-18 10:18:03 -03:00
Ian Rogers
0ca1f534a7 perf hist: Fix memory leak of a perf_hpp_fmt
perf_hpp__column_unregister() removes an entry from a list but doesn't
free the memory causing a memory leak spotted by leak sanitizer.

Add the free while at the same time reducing the scope of the function
to static.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211118071247.2140392-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-18 10:16:56 -03:00
German Gomez
9e1a8d9f68 perf inject: Fix ARM SPE handling
'perf inject' is currently not working for Arm SPE. When you try to run
'perf inject' and 'perf report' with a perf.data file that contains SPE
traces, the tool reports a "Bad address" error:

  # ./perf record -e arm_spe_0/ts_enable=1,store_filter=1,branch_filter=1,load_filter=1/ -a -- sleep 1
  # ./perf inject -i perf.data -o perf.inject.data --itrace
  # ./perf report -i perf.inject.data --stdio

  0x42c00 [0x8]: failed to process type: 9 [Bad address]
  Error:
  failed to process sample

As far as I know, the issue was first spotted in [1], but 'perf inject'
was not yet injecting the samples. This patch does something similar to
what cs_etm does for injecting the samples [2], but for SPE.

[1] https://patchwork.kernel.org/project/linux-arm-kernel/cover/20210412091006.468557-1-leo.yan@linaro.org/#24117339
[2] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/cs-etm.c?h=perf/core&id=133fe2e617e48ca0948983329f43877064ffda3e#n1196

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: German Gomez <german.gomez@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20211105104130.28186-2-german.gomez@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-18 10:08:07 -03:00
Namhyung Kim
db4b284029 perf sort: Fix the 'p_stage_cyc' sort key behavior
andle 'p_stage_cyc' (for pipeline stage cycles) sort key with the same
rationale as for the 'weight' and 'local_weight', see the fix in this
series for a full explanation.

Not sure it also needs the local and global variants.

But I couldn't test it actually because I don't have the machine.

Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211105225617.151364-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-18 10:08:07 -03:00
Namhyung Kim
4d03c75363 perf sort: Fix the 'ins_lat' sort key behavior
Handle 'ins_lat' (for instruction latency) and 'local_ins_lat' sort keys
with the same rationale as for the 'weight' and 'local_weight', see the
previous fix in this series for a full explanation.

But I couldn't test it actually, so only build tested.

Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211105225617.151364-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-18 10:08:07 -03:00
Namhyung Kim
784e8adda4 perf sort: Fix the 'weight' sort key behavior
Currently, the 'weight' field in the perf sample has latency information
for some instructions like in memory accesses.  And perf tool has 'weight'
and 'local_weight' sort keys to display the info.

But it's somewhat confusing what it shows exactly.  In my understanding,
'local_weight' shows a weight in a single sample, and (global) 'weight'
shows a sum of the weights in the hist_entry.

For example:

  $ perf mem record -t load dd if=/dev/zero of=/dev/null bs=4k count=1M

  $ perf report --stdio -n -s +local_weight
  ...
  #
  # Overhead  Samples  Command  Shared Object     Symbol                     Local Weight
  # ........  .......  .......  ................  .........................  ............
  #
      21.23%      313  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   32
      12.43%      183  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   35
      11.97%      159  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   36
      10.40%      141  dd       [kernel.vmlinux]  [k] lockref_put_return     32
       7.63%      113  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   33
       6.37%       92  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   34
       6.15%       90  dd       [kernel.vmlinux]  [k] lockref_put_return     33
  ...

So let's look at the 'lockref_get_not_zero' symbols.  The top entry
shows that 313 samples were captured with 'local_weight' 32, so the
total weight should be 313 x 32 = 10016.  But it's not the case:

  $ perf report --stdio -n -s +local_weight,weight -S lockref_get_not_zero
  ...
  #
  # Overhead  Samples  Command  Shared Object     Local Weight  Weight
  # ........  .......  .......  ................  ............  ......
  #
       1.36%        4  dd       [kernel.vmlinux]  36            144
       0.47%        4  dd       [kernel.vmlinux]  37            148
       0.42%        4  dd       [kernel.vmlinux]  32            128
       0.40%        4  dd       [kernel.vmlinux]  34            136
       0.35%        4  dd       [kernel.vmlinux]  36            144
       0.34%        4  dd       [kernel.vmlinux]  35            140
       0.30%        4  dd       [kernel.vmlinux]  36            144
       0.30%        4  dd       [kernel.vmlinux]  34            136
       0.30%        4  dd       [kernel.vmlinux]  32            128
       0.30%        4  dd       [kernel.vmlinux]  32            128
  ...

With the 'weight' sort key, it's divided to 4 samples even with the same
info ('comm', 'dso', 'sym' and 'local_weight').  I don't think this is
what we want.

I found this because of the way it aggregates the 'weight' value.  Since
it's not a period, we should not add them in the he->stat.  Otherwise,
two 32 'weight' entries will create a 64 'weight' entry.

After that, new 32 'weight' samples don't have a matching entry so it'd
create a new entry and make it a 64 'weight' entry again and again.
Later, they will be merged into 128 'weight' entries during the
hists__collapse_resort() with 4 samples, multiple times like above.

Let's keep the weight and display it differently.  For 'local_weight',
it can show the weight as is, and for (global) 'weight' it can display
the number multiplied by the number of samples.

With this change, I can see the expected numbers.

  $ perf report --stdio -n -s +local_weight,weight -S lockref_get_not_zero
  ...
  #
  # Overhead  Samples  Command  Shared Object     Local Weight  Weight
  # ........  .......  .......  ................  ............  .....
  #
      21.23%      313  dd       [kernel.vmlinux]  32            10016
      12.43%      183  dd       [kernel.vmlinux]  35            6405
      11.97%      159  dd       [kernel.vmlinux]  36            5724
       7.63%      113  dd       [kernel.vmlinux]  33            3729
       6.37%       92  dd       [kernel.vmlinux]  34            3128
       4.17%       59  dd       [kernel.vmlinux]  37            2183
       0.08%        1  dd       [kernel.vmlinux]  269           269
       0.08%        1  dd       [kernel.vmlinux]  38            38

Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211105225617.151364-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-18 10:08:07 -03:00
Jiri Olsa
2a4898fc26 perf tools: Add more weak libbpf functions
We hit the window where perf uses libbpf functions, that did not make it
to the official libbpf release yet and it's breaking perf build with
dynamicly linked libbpf.

Fixing this by providing the new interface as weak functions which calls
the original libbpf functions. Fortunatelly the changes were just
renames.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20211109140707.1689940-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:51 -03:00
Ian Rogers
4924b1f7c4 perf bpf: Avoid memory leak from perf_env__insert_btf()
perf_env__insert_btf() doesn't insert if a duplicate BTF id is
encountered and this causes a memory leak. Modify the function to return
a success/error value and then free the memory if insertion didn't
happen.

v2. Adds a return -1 when the insertion error occurs in
    perf_env__fetch_btf. This doesn't affect anything as the result is
    never checked.

Fixes: 3792cb2ff4 ("perf bpf: Save BTF in a rbtree in perf_env")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20211112074525.121633-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:51 -03:00
Ian Rogers
4f74f18789 perf symbols: Factor out annotation init/exit
The exit function fixes a memory leak with the src field as detected by
leak sanitizer. An example of which is:

Indirect leak of 25133184 byte(s) in 207 object(s) allocated from:
    #0 0x7f199ecfe987 in __interceptor_calloc libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x55defe638224 in annotated_source__alloc_histograms util/annotate.c:803
    #2 0x55defe6397e4 in symbol__hists util/annotate.c:952
    #3 0x55defe639908 in symbol__inc_addr_samples util/annotate.c:968
    #4 0x55defe63aa29 in hist_entry__inc_addr_samples util/annotate.c:1119
    #5 0x55defe499a79 in hist_iter__report_callback tools/perf/builtin-report.c:182
    #6 0x55defe7a859d in hist_entry_iter__add util/hist.c:1236
    #7 0x55defe49aa63 in process_sample_event tools/perf/builtin-report.c:315
    #8 0x55defe731bc8 in evlist__deliver_sample util/session.c:1473
    #9 0x55defe731e38 in machines__deliver_event util/session.c:1510
    #10 0x55defe732a23 in perf_session__deliver_event util/session.c:1590
    #11 0x55defe72951e in ordered_events__deliver_event util/session.c:183
    #12 0x55defe740082 in do_flush util/ordered-events.c:244
    #13 0x55defe7407cb in __ordered_events__flush util/ordered-events.c:323
    #14 0x55defe740a61 in ordered_events__flush util/ordered-events.c:341
    #15 0x55defe73837f in __perf_session__process_events util/session.c:2390
    #16 0x55defe7385ff in perf_session__process_events util/session.c:2420
    ...

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211112035124.94327-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:51 -03:00
Ian Rogers
4270456704 perf symbols: Bit pack to save a byte
Use a bit field alongside the earlier bit fields.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211112035124.94327-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:51 -03:00
Ian Rogers
bd9acd9cc6 perf symbols: Add documentation to 'struct symbol'
Refactor some existing comments and then infer the rest.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211112035124.94327-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:51 -03:00
German Gomez
27d113cfe8 perf arm-spe: Support hardware-based PID tracing
If ARM SPE traces contains CONTEXT packets with TID info, use these
values for tracking the TID of samples. Otherwise fall back to using
context switch events and display a message warning to the user of
possible timing inaccuracies [1].

[1] https://lore.kernel.org/lkml/f877cfa6-9b25-6445-3806-ca44a4042eaf@arm.com/

Signed-off-by: German Gomez <german.gomez@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20211111133625.193568-5-german.gomez@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:51 -03:00
German Gomez
169de64f5d perf arm-spe: Save context ID in record
This patch is to save context ID in record, this will be used to set TID
for samples.

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: German Gomez <german.gomez@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20211111133625.193568-4-german.gomez@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:51 -03:00
Namhyung Kim
9dc9855f18 perf arm-spe: Track task context switch for cpu-mode events
When perf report synthesize events from ARM SPE data, it refers to
current cpu, pid and tid in the machine.  But there's no place to set
them in the ARM SPE decoder.  I'm seeing all pid/tid is set to -1 and
user symbols are not resolved in the output.

  # perf record -a -e arm_spe_0/ts_enable=1/ sleep 1

  # perf report -q | head
     8.77%     8.77%  :-1      [kernel.kallsyms]  [k] format_decode
     7.02%     7.02%  :-1      [kernel.kallsyms]  [k] seq_printf
     7.02%     7.02%  :-1      [unknown]          [.] 0x0000ffff9f687c34
     5.26%     5.26%  :-1      [kernel.kallsyms]  [k] vsnprintf
     3.51%     3.51%  :-1      [kernel.kallsyms]  [k] string
     3.51%     3.51%  :-1      [unknown]          [.] 0x0000ffff9f66ae20
     3.51%     3.51%  :-1      [unknown]          [.] 0x0000ffff9f670b3c
     3.51%     3.51%  :-1      [unknown]          [.] 0x0000ffff9f67c040
     1.75%     1.75%  :-1      [kernel.kallsyms]  [k] ___cache_free
     1.75%     1.75%  :-1      [kernel.kallsyms]  [k] __count_memcg_events

Like Intel PT, add context switch records to track task info.  As ARM
SPE support was added later than PERF_RECORD_SWITCH_CPU_WIDE, I think
we can safely set the attr.context_switch bit and use it.

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: German Gomez <german.gomez@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20211111133625.193568-2-german.gomez@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:50 -03:00
Andrew Kilroy
09e9afac8c perf arm-spe: Print size using consistent format
Since the size is already printed earlier in hex, print the same data
using the same format, in hex.

Reviewed-by: James Clark <james.clark@arm.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Andrew Kilroy <andrew.kilroy@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20211109142153.56546-3-german.gomez@arm.com
Signed-off-by: German Gomez <german.gomez@arm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:50 -03:00
Andrew Kilroy
d54e50b7c9 perf cs-etm: Print size using consistent format
Since the size is already printed earlier in hex, print the same data
using the same format, in hex.

Reviewed-by: James Clark <james.clark@arm.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Andrew Kilroy <andrew.kilroy@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20211109142153.56546-2-german.gomez@arm.com
Signed-off-by: German Gomez <german.gomez@arm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:50 -03:00
Ian Rogers
9aba0adae8 perf expr: Add source_count for aggregating events
Events like uncore_imc/cas_count_read/ on Skylake open multiple events
and then aggregate in the metric leader. To determine the average value
per event the number of these events is needed. Add a source_count
function that returns this value by counting the number of events with
the given metric leader. For most events the value is 1 but for
uncore_imc/cas_count_read/ it can yield values like 6.

Add a generic test, but manually tested with a test metric that uses
the function.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul A . Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <song@kernel.org>
Cc: Wan Jiabing <wanjiabing@vivo.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20211111002109.194172-9-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:50 -03:00
Ian Rogers
1e7ab82975 perf expr: Move ID handling to its own function
This will facilitate sharing in a follow-on change.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul A . Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <song@kernel.org>
Cc: Wan Jiabing <wanjiabing@vivo.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20211111002109.194172-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:50 -03:00
Ian Rogers
fdf1e29b61 perf expr: Add metric literals for topology.
Allow the number of cpus, cores, dies and packages to be queried by a
metric expression.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul A . Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <song@kernel.org>
Cc: Wan Jiabing <wanjiabing@vivo.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20211111002109.194172-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:50 -03:00
Ian Rogers
3613f6c118 perf expr: Add literal values starting with #
It is useful to have literal values for constants relating to
topologies, SMT, etc. Make the parsing of literals shared code and add a
lookup function. Move #smt_on to this function.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul A . Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <song@kernel.org>
Cc: Wan Jiabing <wanjiabing@vivo.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20211111002109.194172-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:50 -03:00
Ian Rogers
0b6b84cca6 perf cputopo: Match thread_siblings to topology ABI name
The topology name for thread_siblings is core_cpus_list, use this for
consistency and add documentation.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul A . Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <song@kernel.org>
Cc: Wan Jiabing <wanjiabing@vivo.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20211111002109.194172-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:50 -03:00
Ian Rogers
406018dcc1 perf cputopo: Match die_siblings to topology ABI name
The topology name for die_siblings is die_cpus_list, use this for
consistency and add documentation.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul A . Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <song@kernel.org>
Cc: Wan Jiabing <wanjiabing@vivo.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20211111002109.194172-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:50 -03:00
Ian Rogers
48f07b0b2a perf cputopo: Update to use pakage_cpus
core_siblings_list is the deprecated topology name for
package_cpus_list, update the code to try the non-deprecated path first.
Adjust variable names to match topology name.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul A . Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <song@kernel.org>
Cc: Wan Jiabing <wanjiabing@vivo.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20211111002109.194172-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:50 -03:00
Ian Rogers
44a8528c24 perf test: Convert clang tests to test cases.
Use null terminated array of test cases rather than the previous sub
test functions.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Sohaib Mohamed <sohaib.amhmd@gmail.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: David Gow <davidgow@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211104064208.3156807-13-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:49 -03:00
Ian Rogers
aba8c5e380 perf metric: Fix memory leaks
Certain error paths may leak memory as caught by address sanitizer.
Ensure this is cleaned up to make sure address/leak sanitizer is happy.

Fixes: 5ecd5a0c7d ("perf metrics: Modify setup and deduplication")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211107090002.3784612-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-07 15:39:28 -03:00
Ian Rogers
07eafd4e05 perf parse-event: Add init and exit to parse_event_error
parse_events() may succeed but leave string memory allocations reachable
in the error.

Add an init/exit that must be called to initialize and clean up the
error. This fixes a leak in metricgroup parse_ids.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211107090002.3784612-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-07 15:39:25 -03:00
Ian Rogers
6c1912898e perf parse-events: Rename parse_events_error functions
Group error functions and name after the data type they manipulate.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211107090002.3784612-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-07 15:38:54 -03:00
Ian Rogers
e4e290791d perf stat: Fix memory leak on error path
strdup() is used to deduplicate, ensure it isn't leaking an already
created string by freeing first.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211107085444.3781604-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-07 12:44:53 -03:00
Ilya Leoshkevich
4e88118c20 perf tools: Use __BYTE_ORDER__
Switch from the libc-defined __BYTE_ORDER to the compiler-defined
__BYTE_ORDER__ in order to make endianness detection more robust, like
it was done for libbpf.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20211104132311.984703-1-iii@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-07 12:27:38 -03:00
James Clark
a3df50abeb perf tools: Refactor out kernel symbol argument sanity checking
User supplied values for vmlinux and kallsyms are checked before
continuing. Refactor this into a function so that it can be used
elsewhere.

Reviewed-by: Denis Nikitin <denik@chromium.org>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20211018134844.2627174-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-07 12:27:38 -03:00
Lexi Shao
1a86f4ba5c perf symbols: Ignore $a/$d symbols for ARM modules
On anARM machine, kernel symbols from modules can be resolved to $a
instead of printing the actual symbol name. Ignore symbols starting with
"$" when building kallsyms rbtree.

A sample stacktrace is shown as follows:

  c0f2e39c schedule_hrtimeout+0x14 ([kernel.kallsyms])
  bf4a66d8 $a+0x78 ([test_module])
  c0a4f5f4 kthread+0x15c ([kernel.kallsyms])
  c0a001f8 ret_from_fork+0x14 ([kernel.kallsyms])

On an ARM machine, $a/$d symbols are used by the compiler to mark the
beginning of code/data part in code section. These symbols are filtered
out when linking vmlinux(see scripts/kallsyms.c ignored_prefixes), but
are left on modules. So there are $a symbols in /proc/kallsyms which
share the same addresses with the actual module symbols and confuses
perf when resolving symbols.

After this patch, the module symbol name is printed:

  c0f2e39c schedule_hrtimeout+0x14 ([kernel.kallsyms])
  bf4a66d8 test_func+0x78 ([test_module])
  c0a4f5f4 kthread+0x15c ([kernel.kallsyms])
  c0a001f8 ret_from_fork+0x14 ([kernel.kallsyms])

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Lexi Shao <shaolexi@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: QiuXi <qiuxi1@huawei.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Wangbing <wangbing6@huawei.com>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: clang-built-linux@googlegroups.com
Link: https://lore.kernel.org/r/20211029065038.39449-2-shaolexi@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-07 12:27:38 -03:00
Ravi Bangoria
eb39bf3256 perf evsel: Don't set exclude_guest by default
Perf tool sets exclude_guest by default while calling perf_event_open().
Because IBS does not have filtering capability, it always gets rejected
by IBS PMU driver and thus perf falls back to non-precise sampling. Fix
it by not setting exclude_guest by default on AMD.

Before:
  $ sudo ./perf record -C 0 -vvv true |& grep precise
    precise_ip                       3
  decreasing precise_ip by one (2)
    precise_ip                       2
  decreasing precise_ip by one (1)
    precise_ip                       1
  decreasing precise_ip by one (0)

After:
  $ sudo ./perf record -C 0 -vvv true |& grep precise
    precise_ip                       3
  decreasing precise_ip by one (2)
    precise_ip                       2

Committer notes:

Fixup init to zero for perf_env in older compilers:

  arch/x86/util/evsel.c:15:26: error: missing field 'os_release' initializer [-Werror,-Wmissing-field-initializers]
          struct perf_env env = {0};
                                  ^

Committer notes:

Namhyung remarked:

  It'd be nice if it can cover explicit "-e cycles:pp" as well.

Ravi clarified:

  For explicit :pp modifier, evsel->precise_max does not get set and thus perf
  does not try with different attr->precise_ip values while exclude_guest set.
  So no issue with explicit :pp:

    $ sudo ./perf record -C 0 -e cycles:pp -vvv |& grep "precise_ip\|exclude_guest"
      precise_ip                       2
      exclude_guest                    1
      precise_ip                       2
      exclude_guest                    1
    switching off exclude_guest, exclude_host
      precise_ip                       2
    ^C

  Also, with :P modifier, evsel->precise_max gets set but exclude_guest does
  not and thus :P also works fine:

    $ sudo ./perf record -C 0 -e cycles:P -vvv |& grep "precise_ip\|exclude_guest"
      precise_ip                       3
    decreasing precise_ip by one (2)
      precise_ip                       2
    ^C

Reported-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211103072112.32312-1-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-07 12:26:24 -03:00
Namhyung Kim
3500eeebed perf evsel: Fix missing exclude_{host,guest} setting
The current logic for the perf missing feature has a bug that it can
wrongly clear some modifiers like G or H.  Actually some PMUs don't
support any filtering or exclusion while others do.  But we check it as
a global feature.

For example, the cycles event can have 'G' modifier to enable it only in
the guest mode on x86.  When you don't run any VMs it'll return 0.

  # perf stat -a -e cycles:G sleep 1

    Performance counter stats for 'system wide':

                    0      cycles:G

          1.000721670 seconds time elapsed

But when it's used with other pmu events that don't support G modifier,
it'll be reset and return non-zero values.

  # perf stat -a -e cycles:G,msr/tsc/ sleep 1

    Performance counter stats for 'system wide':

          538,029,960      cycles:G
       16,924,010,738      msr/tsc/

          1.001815327 seconds time elapsed

This is because of the missing feature detection logic being global.
Add a hashmap to set pmu-specific exclude_host/guest features.

Committer notes:

Fix 'perf test python' by adding a stub for evsel__find_pmu() in
tools/perf/util/python.c, document that it is used so far only for the
above reasons so that if anybody needs this in the python binding
usecases, we can revisit this.

Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: http://lore.kernel.org/lkml/20211105205847.120950-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-06 17:54:42 -03:00
Ian Rogers
88c42f4d6c perf bpf: Add missing free to bpf_event__print_bpf_prog_info()
If btf__new() is called then there needs to be a corresponding btf__free().

Fixes: f8dfeae009 ("perf bpf: Show more BPF program info in print_bpf_prog_info()")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20211106053733.3580931-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-06 17:54:42 -03:00