Commit Graph

1215316 Commits

Author SHA1 Message Date
James Clark
8a55c1e2c9 perf util: Add a function for replacing characters in a string
It finds all occurrences of a single character and replaces them with
a multi character string. This will be used in a test in a following
commit.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chen Zhongjin <chenzhongjin@huawei.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230904095104.1162928-4-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:32:00 -03:00
James Clark
f561fc78c5 perf jevents: Remove unused keyword
'cpuid_not_more_than' was the working title of the new
'strcmp_cpuid_str' keyword and was accidentally left in. It was never
used so tidying it up has no effect.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chen Zhongjin <chenzhongjin@huawei.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230904095104.1162928-3-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:32:00 -03:00
James Clark
d19a353cdd perf test: Check result of has_event(cycles) test
Currently the function always returns 0, so even when the has_event()
test fails, the test still passes. Fix it by returning ret instead.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chen Zhongjin <chenzhongjin@huawei.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230904095104.1162928-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:32:00 -03:00
Ian Rogers
6bd8c2ea6b perf list pfm: Retry supported test with exclude_kernel
With paranoia set at 2 evsel__open will fail with EACCES for non-root
users. To avoid this stopping libpfm4 events from being printed, retry
with exclude_kernel enabled - copying the regular is_event_supported
test.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230906234416.3472339-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:32:00 -03:00
Ian Rogers
4f19fc1839 perf list: Avoid a hardcoded cpu PMU name
Use the first core PMU instead.

On a Raspberry Pi, before:

  $ perf list
  ...
    cpu/t1=v1[,t2=v2,t3 ...]/modifier                  [Raw hardware event descriptor]
         [(see 'man perf-list' on how to encode it)]
  ...

After:

  $ perf list
  ...
    armv8_cortex_a72/t1=v1[,t2=v2,t3 ...]/modifier     [Raw hardware event descriptor]
         [(see 'man perf-list' on how to encode it)]
  ...
  ```

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230906234416.3472339-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:32:00 -03:00
Namhyung Kim
e44b47b931 perf test shell lock_contention: Add cgroup aggregation and filter tests
Add cgroup aggregation and filter tests.

  $ sudo ./perf test -v contention
   84: kernel lock contention analysis test                            :
  --- start ---
  test child forked, pid 222423
  Testing perf lock record and perf lock contention
  Testing perf lock contention --use-bpf
  Testing perf lock record and perf lock contention at the same time
  Testing perf lock contention --threads
  Testing perf lock contention --lock-addr
  Testing perf lock contention --lock-cgroup
  Testing perf lock contention --type-filter (w/ spinlock)
  Testing perf lock contention --lock-filter (w/ tasklist_lock)
  Testing perf lock contention --callstack-filter (w/ unix_stream)
  Testing perf lock contention --callstack-filter with task aggregation
  Testing perf lock contention --cgroup-filter
  Testing perf lock contention CSV output
  test child finished with 0
  ---- end ----
  kernel lock contention analysis test: Ok

Committer testing:

  [root@quaco ~]# uname -a
  Linux quaco 6.4.10-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Aug 11 12:20:29 UTC 2023 x86_64 GNU/Linux
  [root@quaco ~]# perf test -v contention
   84: kernel lock contention analysis test                            :
  --- start ---
  test child forked, pid 452625
  Testing perf lock record and perf lock contention
  Testing perf lock contention --use-bpf
  Testing perf lock record and perf lock contention at the same time
  Testing perf lock contention --threads
  Testing perf lock contention --lock-addr
  Testing perf lock contention --lock-cgroup
  Testing perf lock contention --type-filter (w/ spinlock)
  Testing perf lock contention --lock-filter (w/ tasklist_lock)
  Testing perf lock contention --callstack-filter (w/ unix_stream)
  Testing perf lock contention --callstack-filter with task aggregation
  Testing perf lock contention --cgroup-filter
  Testing perf lock contention CSV output
  test child finished with 0
  ---- end ----
  kernel lock contention analysis test: Ok
  [root@quaco ~]#

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230906174903.346486-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:32:00 -03:00
Namhyung Kim
4fd06bd2dc perf lock contention: Add -G/--cgroup-filter option
The -G/--cgroup-filter is to limit lock contention collection on the
tasks in the specific cgroups only.

  $ sudo ./perf lock con -abt -G /user.slice/.../vte-spawn-52221fb8-b33f-4a52-b5c3-e35d1e6fc0e0.scope \
    ./perf bench sched messaging
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 10 groups == 400 processes run

       Total time: 0.174 [sec]
   contended   total wait     max wait     avg wait          pid   comm

           4    114.45 us     60.06 us     28.61 us       214847   sched-messaging
           2    111.40 us     60.84 us     55.70 us       214848   sched-messaging
           2    106.09 us     59.42 us     53.04 us       214837   sched-messaging
           1     81.70 us     81.70 us     81.70 us       214709   sched-messaging
          68     78.44 us      6.83 us      1.15 us       214633   sched-messaging
          69     73.71 us      2.69 us      1.07 us       214632   sched-messaging
           4     72.62 us     60.83 us     18.15 us       214850   sched-messaging
           2     71.75 us     67.60 us     35.88 us       214840   sched-messaging
           2     69.29 us     67.53 us     34.65 us       214804   sched-messaging
           2     69.00 us     68.23 us     34.50 us       214826   sched-messaging
  ...

Export cgroup__new() function as it's needed from outside.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230906174903.346486-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:32:00 -03:00
Namhyung Kim
4d1792d0a2 perf lock contention: Add --lock-cgroup option
The --lock-cgroup option shows lock contention stats break down by
cgroups.

Add LOCK_AGGR_CGROUP mode and use it instead of use_cgroup field.

  $ sudo ./perf lock con -ab --lock-cgroup sleep 1
   contended   total wait     max wait     avg wait   cgroup

           8     15.70 us      6.34 us      1.96 us   /
           2      1.48 us       747 ns       738 ns   /user.slice/.../app.slice/app-gnome-google\x2dchrome-6442.scope
           1       848 ns       848 ns       848 ns   /user.slice/.../session.slice/org.gnome.Shell@x11.service
           1       220 ns       220 ns       220 ns   /user.slice/.../session.slice/pipewire-pulse.service

For now, the cgroup mode only works with BPF (-b).

Committer notes:

Remove -g as it is used in the other tools with a clear meaning of
collect/show callchains. As agreed with Namhyung off list.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230906174903.346486-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:32:00 -03:00
Namhyung Kim
d0c502e46e perf lock contention: Prepare to handle cgroups
Save cgroup info and display cgroup names if requested.  This is a
preparation for the next patch.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230906174903.346486-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:32:00 -03:00
Namhyung Kim
2bc12abce8 perf tools: Add read_all_cgroups() and __cgroup_find()
The read_all_cgroups() is to build a tree of cgroups in the system and
users can look up a cgroup using __cgroup_find().

Committer notes:

Had to do this to cover that #else block:

  -static inline u64 __read_cgroup_id(const char *path) { return -1ULL; }
  +static inline u64 __read_cgroup_id(const char *path __maybe_unused) { return -1ULL; }

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230906174903.346486-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:32:00 -03:00
Yang Jihong
36019dff30 perf kwork top: Add BPF-based statistics on softirq event support
Use BPF to collect statistics on softirq events based on perf BPF skeletons.

Example usage:

  # perf kwork top -b
  Starting trace, Hit <Ctrl+C> to stop and report
  ^C
  Total  : 135445.704 ms, 8 cpus
  %Cpu(s):  28.35% id,   0.00% hi,   0.25% si
  %Cpu0   [||||||||||||||||||||            69.85%]
  %Cpu1   [||||||||||||||||||||||          74.10%]
  %Cpu2   [|||||||||||||||||||||           71.18%]
  %Cpu3   [||||||||||||||||||||            69.61%]
  %Cpu4   [||||||||||||||||||||||          74.05%]
  %Cpu5   [||||||||||||||||||||            69.33%]
  %Cpu6   [||||||||||||||||||||            69.71%]
  %Cpu7   [||||||||||||||||||||||          73.77%]

        PID     SPID    %CPU           RUNTIME  COMMMAND
    -------------------------------------------------------------
          0        0   30.43       5271.005 ms  [swapper/5]
          0        0   30.17       5226.644 ms  [swapper/3]
          0        0   30.08       5210.257 ms  [swapper/6]
          0        0   29.89       5177.177 ms  [swapper/0]
          0        0   28.51       4938.672 ms  [swapper/2]
          0        0   25.93       4223.464 ms  [swapper/7]
          0        0   25.69       4181.411 ms  [swapper/4]
          0        0   25.63       4173.804 ms  [swapper/1]
      16665    16265    2.16        360.600 ms  sched-messaging
      16537    16265    2.05        356.275 ms  sched-messaging
      16503    16265    2.01        343.063 ms  sched-messaging
      16424    16265    1.97        336.876 ms  sched-messaging
      16580    16265    1.94        323.658 ms  sched-messaging
      16515    16265    1.92        321.616 ms  sched-messaging
      16659    16265    1.91        325.538 ms  sched-messaging
      16634    16265    1.88        327.766 ms  sched-messaging
      16454    16265    1.87        326.843 ms  sched-messaging
      16382    16265    1.87        322.591 ms  sched-messaging
      16642    16265    1.86        320.506 ms  sched-messaging
      16582    16265    1.86        320.164 ms  sched-messaging
      16315    16265    1.86        326.872 ms  sched-messaging
      16637    16265    1.85        323.766 ms  sched-messaging
      16506    16265    1.82        311.688 ms  sched-messaging
      16512    16265    1.81        304.643 ms  sched-messaging
      16560    16265    1.80        314.751 ms  sched-messaging
      16320    16265    1.80        313.405 ms  sched-messaging
      16442    16265    1.80        314.403 ms  sched-messaging
      16626    16265    1.78        295.380 ms  sched-messaging
      16600    16265    1.77        309.444 ms  sched-messaging
      16550    16265    1.76        301.161 ms  sched-messaging
      16525    16265    1.75        296.560 ms  sched-messaging
      16314    16265    1.75        298.338 ms  sched-messaging
      16595    16265    1.74        304.390 ms  sched-messaging
      16555    16265    1.74        287.564 ms  sched-messaging
      16520    16265    1.74        295.734 ms  sched-messaging
      16507    16265    1.73        293.956 ms  sched-messaging
      16593    16265    1.72        296.443 ms  sched-messaging
      16531    16265    1.72        299.950 ms  sched-messaging
      16281    16265    1.72        301.339 ms  sched-messaging
  <SNIP>

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20230812084917.169338-17-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:59 -03:00
Yang Jihong
d2956b3acf perf kwork top: Add BPF-based statistics on hardirq event support
Use BPF to collect statistics on hardirq events based on perf BPF skeletons.

Example usage:

  # perf kwork top -k sched,irq -b
  Starting trace, Hit <Ctrl+C> to stop and report
  ^C
  Total  : 136717.945 ms, 8 cpus
  %Cpu(s):  17.10% id,   0.01% hi,   0.00% si
  %Cpu0   [|||||||||||||||||||||||||       84.26%]
  %Cpu1   [|||||||||||||||||||||||||       84.77%]
  %Cpu2   [||||||||||||||||||||||||        83.22%]
  %Cpu3   [||||||||||||||||||||||||        80.37%]
  %Cpu4   [||||||||||||||||||||||||        81.49%]
  %Cpu5   [|||||||||||||||||||||||||       84.68%]
  %Cpu6   [|||||||||||||||||||||||||       84.48%]
  %Cpu7   [||||||||||||||||||||||||        80.21%]

        PID     SPID    %CPU           RUNTIME  COMMMAND
    -------------------------------------------------------------
          0        0   19.78       3482.833 ms  [swapper/7]
          0        0   19.62       3454.219 ms  [swapper/3]
          0        0   18.50       3258.339 ms  [swapper/4]
          0        0   16.76       2842.749 ms  [swapper/2]
          0        0   15.71       2627.905 ms  [swapper/0]
          0        0   15.51       2598.206 ms  [swapper/6]
          0        0   15.31       2561.820 ms  [swapper/5]
          0        0   15.22       2548.708 ms  [swapper/1]
      13253    13018    2.95        513.108 ms  sched-messaging
      13092    13018    2.67        454.167 ms  sched-messaging
      13401    13018    2.66        454.790 ms  sched-messaging
      13240    13018    2.64        454.587 ms  sched-messaging
      13251    13018    2.61        442.273 ms  sched-messaging
      13075    13018    2.61        438.932 ms  sched-messaging
      13220    13018    2.60        443.245 ms  sched-messaging
      13235    13018    2.59        443.268 ms  sched-messaging
      13222    13018    2.50        426.344 ms  sched-messaging
      13410    13018    2.49        426.191 ms  sched-messaging
      13228    13018    2.46        425.121 ms  sched-messaging
      13379    13018    2.38        409.950 ms  sched-messaging
      13236    13018    2.37        413.159 ms  sched-messaging
      13095    13018    2.36        396.572 ms  sched-messaging
      13325    13018    2.35        408.089 ms  sched-messaging
      13242    13018    2.32        394.750 ms  sched-messaging
      13386    13018    2.31        396.997 ms  sched-messaging
      13046    13018    2.29        383.833 ms  sched-messaging
      13109    13018    2.28        388.482 ms  sched-messaging
      13388    13018    2.28        393.576 ms  sched-messaging
      13238    13018    2.26        388.487 ms  sched-messaging
  <SNIP>

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20230812084917.169338-16-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:59 -03:00
Yang Jihong
8c98420987 perf kwork top: Implements BPF-based cpu usage statistics
Use BPF to collect statistics on the CPU usage based on perf BPF skeletons.

Example usage:

  # perf kwork top -h

   Usage: perf kwork top [<options>]

      -b, --use-bpf         Use BPF to measure task cpu usage
      -C, --cpu <cpu>       list of cpus to profile
      -i, --input <file>    input file name
      -n, --name <name>     event name to profile
      -s, --sort <key[,key2...]>
                            sort by key(s): rate, runtime, tid
          --time <str>      Time span for analysis (start,stop)

  #
  # perf kwork -k sched top -b
  Starting trace, Hit <Ctrl+C> to stop and report
  ^C
  Total  : 160702.425 ms, 8 cpus
  %Cpu(s):  36.00% id,   0.00% hi,   0.00% si
  %Cpu0   [||||||||||||||||||              61.66%]
  %Cpu1   [||||||||||||||||||              61.27%]
  %Cpu2   [|||||||||||||||||||             66.40%]
  %Cpu3   [||||||||||||||||||              61.28%]
  %Cpu4   [||||||||||||||||||              61.82%]
  %Cpu5   [|||||||||||||||||||||||         77.41%]
  %Cpu6   [||||||||||||||||||              61.73%]
  %Cpu7   [||||||||||||||||||              63.25%]

        PID     SPID    %CPU           RUNTIME  COMMMAND
    -------------------------------------------------------------
          0        0   38.72       8089.463 ms  [swapper/1]
          0        0   38.71       8084.547 ms  [swapper/3]
          0        0   38.33       8007.532 ms  [swapper/0]
          0        0   38.26       7992.985 ms  [swapper/6]
          0        0   38.17       7971.865 ms  [swapper/4]
          0        0   36.74       7447.765 ms  [swapper/7]
          0        0   33.59       6486.942 ms  [swapper/2]
          0        0   22.58       3771.268 ms  [swapper/5]
       9545     9351    2.48        447.136 ms  sched-messaging
       9574     9351    2.09        418.583 ms  sched-messaging
       9724     9351    2.05        372.407 ms  sched-messaging
       9531     9351    2.01        368.804 ms  sched-messaging
       9512     9351    2.00        362.250 ms  sched-messaging
       9514     9351    1.95        357.767 ms  sched-messaging
       9538     9351    1.86        384.476 ms  sched-messaging
       9712     9351    1.84        386.490 ms  sched-messaging
       9723     9351    1.83        380.021 ms  sched-messaging
       9722     9351    1.82        382.738 ms  sched-messaging
       9517     9351    1.81        354.794 ms  sched-messaging
       9559     9351    1.79        344.305 ms  sched-messaging
       9725     9351    1.77        365.315 ms  sched-messaging
  <SNIP>

  # perf kwork -k sched top -b -n perf
  Starting trace, Hit <Ctrl+C> to stop and report
  ^C
  Total  : 151563.332 ms, 8 cpus
  %Cpu(s):  26.49% id,   0.00% hi,   0.00% si
  %Cpu0   [                                 0.01%]
  %Cpu1   [                                 0.00%]
  %Cpu2   [                                 0.00%]
  %Cpu3   [                                 0.00%]
  %Cpu4   [                                 0.00%]
  %Cpu5   [                                 0.00%]
  %Cpu6   [                                 0.00%]
  %Cpu7   [                                 0.00%]

        PID     SPID    %CPU           RUNTIME  COMMMAND
    -------------------------------------------------------------
       9754     9754    0.01          2.303 ms  perf

  #
  # perf kwork -k sched top -b -C 2,3,4
  Starting trace, Hit <Ctrl+C> to stop and report
  ^C
  Total  :  48016.721 ms, 3 cpus
  %Cpu(s):  27.82% id,   0.00% hi,   0.00% si
  %Cpu2   [||||||||||||||||||||||          74.68%]
  %Cpu3   [|||||||||||||||||||||           71.06%]
  %Cpu4   [|||||||||||||||||||||           70.91%]

        PID     SPID    %CPU           RUNTIME  COMMMAND
    -------------------------------------------------------------
          0        0   29.08       4734.998 ms  [swapper/4]
          0        0   28.93       4710.029 ms  [swapper/3]
          0        0   25.31       3912.363 ms  [swapper/2]
      10248    10158    1.62        264.931 ms  sched-messaging
      10253    10158    1.62        265.136 ms  sched-messaging
      10158    10158    1.60        263.013 ms  bash
      10360    10158    1.49        243.639 ms  sched-messaging
      10413    10158    1.48        238.604 ms  sched-messaging
      10531    10158    1.47        234.067 ms  sched-messaging
      10400    10158    1.47        240.631 ms  sched-messaging
      10355    10158    1.47        230.586 ms  sched-messaging
      10377    10158    1.43        234.835 ms  sched-messaging
      10526    10158    1.42        232.045 ms  sched-messaging
      10298    10158    1.41        222.396 ms  sched-messaging
      10410    10158    1.38        221.853 ms  sched-messaging
      10364    10158    1.38        226.042 ms  sched-messaging
      10480    10158    1.36        213.633 ms  sched-messaging
      10370    10158    1.36        223.620 ms  sched-messaging
      10553    10158    1.34        217.169 ms  sched-messaging
      10291    10158    1.34        211.516 ms  sched-messaging
      10251    10158    1.34        218.813 ms  sched-messaging
      10522    10158    1.33        218.498 ms  sched-messaging
      10288    10158    1.33        216.787 ms  sched-messaging
  <SNIP>

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20230812084917.169338-15-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:59 -03:00
Yang Jihong
aa172a5ad3 perf kwork top: Add -C/--cpu -i/--input -n/--name -s/--sort --time options
Provide the following options for perf kwork top:

1. -C, --cpu <cpu>		list of cpus to profile
2. -i, --input <file>		input file name
3. -n, --name <name>		event name to profile
4. -s, --sort <key[,key2...]>	sort by key(s): rate, runtime, tid
5. --time <str>		Time span for analysis (start,stop)

Example usage:

  # perf kwork top -h

   Usage: perf kwork top [<options>]

      -C, --cpu <cpu>       list of cpus to profile
      -i, --input <file>    input file name
      -n, --name <name>     event name to profile
      -s, --sort <key[,key2...]>
                            sort by key(s): rate, runtime, tid
          --time <str>      Time span for analysis (start,stop)

  # perf kwork top -C 2,4,5

  Total  :  51226.940 ms, 3 cpus
  %Cpu(s):  92.59% id,   0.00% hi,   0.09% si
  %Cpu2   [|                                4.61%]
  %Cpu4   [                                 0.01%]
  %Cpu5   [|||||                           17.31%]

        PID    %CPU           RUNTIME  COMMMAND
    ----------------------------------------------------
          0   99.98      17073.515 ms  swapper/4
          0   95.17      16250.874 ms  swapper/2
          0   82.62      14108.577 ms  swapper/5
       4342   21.70       3708.358 ms  perf
         16    0.13         22.296 ms  rcu_preempt
         75    0.02          4.261 ms  kworker/2:1
         98    0.01          2.540 ms  jbd2/sda-8
         61    0.01          3.404 ms  kcompactd0
         87    0.00          0.145 ms  kworker/5:1H
         73    0.00          0.596 ms  kworker/5:1
         41    0.00          0.041 ms  ksoftirqd/5
         40    0.00          0.718 ms  migration/5
         64    0.00          0.115 ms  kworker/4:1
         35    0.00          0.556 ms  migration/4
        353    0.00          1.143 ms  sshd
         26    0.00          1.665 ms  ksoftirqd/2
         25    0.00          0.662 ms  migration/2

  # perf kwork top -i perf.data

  Total  : 136601.588 ms, 8 cpus
  %Cpu(s):  95.66% id,   0.04% hi,   0.05% si
  %Cpu0   [                                 0.02%]
  %Cpu1   [                                 0.01%]
  %Cpu2   [|                                4.61%]
  %Cpu3   [                                 0.04%]
  %Cpu4   [                                 0.01%]
  %Cpu5   [|||||                           17.31%]
  %Cpu6   [                                 0.51%]
  %Cpu7   [|||                             11.42%]

        PID    %CPU           RUNTIME  COMMMAND
    ----------------------------------------------------
          0   99.98      17073.515 ms  swapper/4
          0   99.98      17072.173 ms  swapper/1
          0   99.93      17064.229 ms  swapper/3
          0   99.62      17011.013 ms  swapper/0
          0   99.47      16985.180 ms  swapper/6
          0   95.17      16250.874 ms  swapper/2
          0   88.51      15111.684 ms  swapper/7
          0   82.62      14108.577 ms  swapper/5
       4342   33.00       5644.045 ms  perf
       4344    0.43         74.351 ms  perf
         16    0.13         22.296 ms  rcu_preempt
       4345    0.05         10.093 ms  perf
       4343    0.05          8.769 ms  perf
       4341    0.02          4.882 ms  perf
       4095    0.02          4.605 ms  kworker/7:1
         75    0.02          4.261 ms  kworker/2:1
        120    0.01          1.909 ms  systemd-journal
         98    0.01          2.540 ms  jbd2/sda-8
         61    0.01          3.404 ms  kcompactd0
        667    0.01          2.542 ms  kworker/u16:2
       4340    0.00          1.052 ms  kworker/7:2
         97    0.00          0.489 ms  kworker/7:1H
         51    0.00          0.209 ms  ksoftirqd/7
         50    0.00          0.646 ms  migration/7
         76    0.00          0.753 ms  kworker/6:1
         45    0.00          0.572 ms  migration/6
         87    0.00          0.145 ms  kworker/5:1H
         73    0.00          0.596 ms  kworker/5:1
         41    0.00          0.041 ms  ksoftirqd/5
         40    0.00          0.718 ms  migration/5
         64    0.00          0.115 ms  kworker/4:1
         35    0.00          0.556 ms  migration/4
        353    0.00          2.600 ms  sshd
         74    0.00          0.205 ms  kworker/3:1
         33    0.00          1.576 ms  kworker/3:0H
         30    0.00          0.996 ms  migration/3
         26    0.00          1.665 ms  ksoftirqd/2
         25    0.00          0.662 ms  migration/2
        397    0.00          0.057 ms  kworker/1:1
         20    0.00          1.005 ms  migration/1
       2909    0.00          1.053 ms  kworker/0:2
         17    0.00          0.720 ms  migration/0
         15    0.00          0.039 ms  ksoftirqd/0

  # perf kwork top -n perf

  Total  : 136601.588 ms, 8 cpus
  %Cpu(s):  95.66% id,   0.04% hi,   0.05% si
  %Cpu0   [                                 0.01%]
  %Cpu1   [                                 0.00%]
  %Cpu2   [|                                4.44%]
  %Cpu3   [                                 0.00%]
  %Cpu4   [                                 0.00%]
  %Cpu5   [                                 0.00%]
  %Cpu6   [                                 0.49%]
  %Cpu7   [|||                             11.38%]

        PID    %CPU           RUNTIME  COMMMAND
    ----------------------------------------------------
       4342   15.74       2695.516 ms  perf
       4344    0.43         74.351 ms  perf
       4345    0.05         10.093 ms  perf
       4343    0.05          8.769 ms  perf
       4341    0.02          4.882 ms  perf

  # perf kwork top -s tid

  Total  : 136601.588 ms, 8 cpus
  %Cpu(s):  95.66% id,   0.04% hi,   0.05% si
  %Cpu0   [                                 0.02%]
  %Cpu1   [                                 0.01%]
  %Cpu2   [|                                4.61%]
  %Cpu3   [                                 0.04%]
  %Cpu4   [                                 0.01%]
  %Cpu5   [|||||                           17.31%]
  %Cpu6   [                                 0.51%]
  %Cpu7   [|||                             11.42%]

        PID    %CPU           RUNTIME  COMMMAND
    ----------------------------------------------------
          0   99.62      17011.013 ms  swapper/0
          0   99.98      17072.173 ms  swapper/1
          0   95.17      16250.874 ms  swapper/2
          0   99.93      17064.229 ms  swapper/3
          0   99.98      17073.515 ms  swapper/4
          0   82.62      14108.577 ms  swapper/5
          0   99.47      16985.180 ms  swapper/6
          0   88.51      15111.684 ms  swapper/7
         15    0.00          0.039 ms  ksoftirqd/0
         16    0.13         22.296 ms  rcu_preempt
         17    0.00          0.720 ms  migration/0
         20    0.00          1.005 ms  migration/1
         25    0.00          0.662 ms  migration/2
         26    0.00          1.665 ms  ksoftirqd/2
         30    0.00          0.996 ms  migration/3
         33    0.00          1.576 ms  kworker/3:0H
         35    0.00          0.556 ms  migration/4
         40    0.00          0.718 ms  migration/5
         41    0.00          0.041 ms  ksoftirqd/5
         45    0.00          0.572 ms  migration/6
         50    0.00          0.646 ms  migration/7
         51    0.00          0.209 ms  ksoftirqd/7
         61    0.01          3.404 ms  kcompactd0
         64    0.00          0.115 ms  kworker/4:1
         73    0.00          0.596 ms  kworker/5:1
         74    0.00          0.205 ms  kworker/3:1
         75    0.02          4.261 ms  kworker/2:1
         76    0.00          0.753 ms  kworker/6:1
         87    0.00          0.145 ms  kworker/5:1H
         97    0.00          0.489 ms  kworker/7:1H
         98    0.01          2.540 ms  jbd2/sda-8
        120    0.01          1.909 ms  systemd-journal
        353    0.00          2.600 ms  sshd
        397    0.00          0.057 ms  kworker/1:1
        667    0.01          2.542 ms  kworker/u16:2
       2909    0.00          1.053 ms  kworker/0:2
       4095    0.02          4.605 ms  kworker/7:1
       4340    0.00          1.052 ms  kworker/7:2
       4341    0.02          4.882 ms  perf
       4342   33.00       5644.045 ms  perf
       4343    0.05          8.769 ms  perf
       4344    0.43         74.351 ms  perf
       4345    0.05         10.093 ms  perf

  # perf kwork top --time 128800,

  Total  :  53495.122 ms, 8 cpus
  %Cpu(s):  94.71% id,   0.09% hi,   0.09% si
  %Cpu0   [                                 0.07%]
  %Cpu1   [                                 0.04%]
  %Cpu2   [||                               8.49%]
  %Cpu3   [                                 0.09%]
  %Cpu4   [                                 0.02%]
  %Cpu5   [                                 0.06%]
  %Cpu6   [                                 0.12%]
  %Cpu7   [||||||                          21.24%]

        PID    %CPU           RUNTIME  COMMMAND
    ----------------------------------------------------
          0   99.96       3981.363 ms  swapper/4
          0   99.94       3978.955 ms  swapper/1
          0   99.91       9329.375 ms  swapper/5
          0   99.87       4906.829 ms  swapper/3
          0   99.86       9028.064 ms  swapper/6
          0   98.67       3928.161 ms  swapper/0
          0   91.17       8388.432 ms  swapper/2
          0   78.65       7125.602 ms  swapper/7
       4342   29.42       2675.198 ms  perf
         16    0.18         16.817 ms  rcu_preempt
       4345    0.09          8.183 ms  perf
       4344    0.04          4.290 ms  perf
       4343    0.03          2.844 ms  perf
        353    0.03          2.600 ms  sshd
       4095    0.02          2.702 ms  kworker/7:1
        120    0.02          1.909 ms  systemd-journal
         98    0.02          2.540 ms  jbd2/sda-8
         61    0.02          1.886 ms  kcompactd0
        667    0.02          1.011 ms  kworker/u16:2
         75    0.02          2.693 ms  kworker/2:1
       4341    0.01          1.838 ms  perf
         30    0.01          0.788 ms  migration/3
         26    0.01          1.665 ms  ksoftirqd/2
         20    0.01          0.752 ms  migration/1
       2909    0.01          0.604 ms  kworker/0:2
       4340    0.00          0.635 ms  kworker/7:2
         97    0.00          0.214 ms  kworker/7:1H
         51    0.00          0.209 ms  ksoftirqd/7
         50    0.00          0.646 ms  migration/7
         76    0.00          0.602 ms  kworker/6:1
         45    0.00          0.366 ms  migration/6
         87    0.00          0.145 ms  kworker/5:1H
         40    0.00          0.446 ms  migration/5
         35    0.00          0.318 ms  migration/4
         74    0.00          0.205 ms  kworker/3:1
         33    0.00          0.080 ms  kworker/3:0H
         25    0.00          0.448 ms  migration/2
        397    0.00          0.057 ms  kworker/1:1
         17    0.00          0.365 ms  migration/0

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20230812084917.169338-14-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:59 -03:00
Yang Jihong
e29090d28c perf kwork top: Add statistics on softirq event support
Calculate the runtime of the softirq events and subtract it from
the corresponding task runtime to improve the precision.

Example usage:

  # perf kwork -k sched,irq,softirq record -- perf record -e cpu-clock -o perf_record.data -a sleep 10
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.467 MB perf_record.data (7154 samples) ]
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 2.152 MB perf.data (22846 samples) ]
  # perf kwork top

  Total  : 136601.588 ms, 8 cpus
  %Cpu(s):  95.66% id,   0.04% hi,   0.05% si
  %Cpu0   [                                 0.02%]
  %Cpu1   [                                 0.01%]
  %Cpu2   [|                                4.61%]
  %Cpu3   [                                 0.04%]
  %Cpu4   [                                 0.01%]
  %Cpu5   [|||||                           17.31%]
  %Cpu6   [                                 0.51%]
  %Cpu7   [|||                             11.42%]

        PID    %CPU           RUNTIME  COMMMAND
    ----------------------------------------------------
          0   99.98      17073.515 ms  swapper/4
          0   99.98      17072.173 ms  swapper/1
          0   99.93      17064.229 ms  swapper/3
          0   99.62      17011.013 ms  swapper/0
          0   99.47      16985.180 ms  swapper/6
          0   95.17      16250.874 ms  swapper/2
          0   88.51      15111.684 ms  swapper/7
          0   82.62      14108.577 ms  swapper/5
       4342   33.00       5644.045 ms  perf
       4344    0.43         74.351 ms  perf
         16    0.13         22.296 ms  rcu_preempt
       4345    0.05         10.093 ms  perf
       4343    0.05          8.769 ms  perf
       4341    0.02          4.882 ms  perf
       4095    0.02          4.605 ms  kworker/7:1
         75    0.02          4.261 ms  kworker/2:1
        120    0.01          1.909 ms  systemd-journal
         98    0.01          2.540 ms  jbd2/sda-8
         61    0.01          3.404 ms  kcompactd0
        667    0.01          2.542 ms  kworker/u16:2
       4340    0.00          1.052 ms  kworker/7:2
         97    0.00          0.489 ms  kworker/7:1H
         51    0.00          0.209 ms  ksoftirqd/7
         50    0.00          0.646 ms  migration/7
         76    0.00          0.753 ms  kworker/6:1
         45    0.00          0.572 ms  migration/6
         87    0.00          0.145 ms  kworker/5:1H
         73    0.00          0.596 ms  kworker/5:1
         41    0.00          0.041 ms  ksoftirqd/5
         40    0.00          0.718 ms  migration/5
         64    0.00          0.115 ms  kworker/4:1
         35    0.00          0.556 ms  migration/4
        353    0.00          2.600 ms  sshd
         74    0.00          0.205 ms  kworker/3:1
         33    0.00          1.576 ms  kworker/3:0H
         30    0.00          0.996 ms  migration/3
         26    0.00          1.665 ms  ksoftirqd/2
         25    0.00          0.662 ms  migration/2
        397    0.00          0.057 ms  kworker/1:1
         20    0.00          1.005 ms  migration/1
       2909    0.00          1.053 ms  kworker/0:2
         17    0.00          0.720 ms  migration/0
         15    0.00          0.039 ms  ksoftirqd/0

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20230812084917.169338-13-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:59 -03:00
Yang Jihong
2f21f5e4b4 perf kwork top: Add statistics on hardirq event support
Calculate the runtime of the hardirq events and subtract it from
the corresponding task runtime to improve the precision.

Example usage:

  # perf kwork -k sched,irq record -- perf record -o perf_record.data -a sleep 10
  [ perf record: Woken up 2 times to write data ]
  [ perf record: Captured and wrote 1.054 MB perf_record.data (18019 samples) ]
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 1.798 MB perf.data (16334 samples) ]
  #
  # perf kwork top

  Total  : 139240.869 ms, 8 cpus
  %Cpu(s):  94.91% id,   0.05% hi
  %Cpu0   [                                 0.05%]
  %Cpu1   [|                                5.00%]
  %Cpu2   [                                 0.43%]
  %Cpu3   [                                 0.57%]
  %Cpu4   [                                 1.19%]
  %Cpu5   [||||||                          20.46%]
  %Cpu6   [                                 0.48%]
  %Cpu7   [|||                             12.10%]

        PID    %CPU           RUNTIME  COMMMAND
    ----------------------------------------------------
          0   99.54      17325.622 ms  swapper/2
          0   99.54      17327.527 ms  swapper/0
          0   99.51      17319.909 ms  swapper/6
          0   99.42      17304.934 ms  swapper/3
          0   98.80      17197.385 ms  swapper/4
          0   94.99      16534.991 ms  swapper/1
          0   87.89      15295.264 ms  swapper/7
          0   79.53      13843.182 ms  swapper/5
       4252   36.50       6361.768 ms  perf
       4256    1.17        205.215 ms  bash
        151    0.53         93.298 ms  systemd-resolve
       4254    0.39         69.468 ms  perf
        423    0.34         59.368 ms  bash
        412    0.29         51.204 ms  sshd
        249    0.20         35.288 ms  sd-resolve
         16    0.17         30.287 ms  rcu_preempt
        153    0.09         17.266 ms  systemd-timesyn
          1    0.09         17.078 ms  systemd
       4253    0.07         12.457 ms  perf
       4255    0.06         11.559 ms  perf
       4234    0.03          6.105 ms  kworker/u16:1
         69    0.03          6.259 ms  kworker/1:1H
       4251    0.02          4.615 ms  perf
       4095    0.02          4.890 ms  kworker/7:1
         61    0.02          4.005 ms  kcompactd0
         75    0.02          3.546 ms  kworker/2:1
         97    0.01          3.106 ms  kworker/7:1H
         98    0.01          1.995 ms  jbd2/sda-8
       4088    0.01          1.779 ms  kworker/u16:3
       2909    0.01          1.795 ms  kworker/0:2
       4246    0.00          1.117 ms  kworker/7:2
         51    0.00          0.327 ms  ksoftirqd/7
         50    0.00          0.369 ms  migration/7
        102    0.00          0.160 ms  kworker/6:1H
         76    0.00          0.609 ms  kworker/6:1
         45    0.00          0.779 ms  migration/6
         87    0.00          0.504 ms  kworker/5:1H
         73    0.00          1.130 ms  kworker/5:1
         41    0.00          0.152 ms  ksoftirqd/5
         40    0.00          0.702 ms  migration/5
         64    0.00          0.316 ms  kworker/4:1
         35    0.00          0.791 ms  migration/4
        353    0.00          2.211 ms  sshd
         74    0.00          0.272 ms  kworker/3:1
         30    0.00          0.819 ms  migration/3
         25    0.00          0.784 ms  migration/2
        397    0.00          0.539 ms  kworker/1:1
         21    0.00          1.600 ms  ksoftirqd/1
         20    0.00          0.773 ms  migration/1
         17    0.00          1.682 ms  migration/0
         15    0.00          0.076 ms  ksoftirqd/0

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20230812084917.169338-12-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:59 -03:00
Yang Jihong
a8792242e4 perf evsel: Add evsel__intval_common() helper
Add evsel__intval_common() helper to search for common_field in
tracepoint format.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20230812084917.169338-11-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:59 -03:00
Yang Jihong
55c40e5052 perf kwork top: Introduce new top utility
Some common tools for collecting statistics on CPU usage, such as top,
obtain statistics from timer interrupt sampling, and then periodically
read statistics from /proc/stat.

This method has some deviations:

1. In the tick interrupt, the time between the last tick and the current
   tick is counted in the current task. However, the task may be running
   only part of the time.
2. For each task, the top tool periodically reads the /proc/{PID}/status
   information. For tasks with a short life cycle, it may be missed.

In conclusion, the top tool cannot accurately collect statistics on the
CPU usage and running time of tasks.

The statistical method based on sched_switch tracepoint can accurately
calculate the CPU usage of all tasks. This method is applicable to
scenarios where performance comparison data is of high precision.

Example usage:

  # perf kwork

   Usage: perf kwork [<options>] {record|report|latency|timehist|top}

      -D, --dump-raw-trace  dump raw trace in ASCII
      -f, --force           don't complain, do it
      -k, --kwork <kwork>   list of kwork to profile (irq, softirq, workqueue, sched, etc)
      -v, --verbose         be more verbose (show symbol address, etc)

  # perf kwork -k sched record -- perf bench sched messaging -g 1 -l 10000
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 1 groups == 40 processes run

       Total time: 14.074 [sec]
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 15.886 MB perf.data (129472 samples) ]
  # perf kwork top

  Total  : 115708.178 ms, 8 cpus
  %Cpu(s):   9.78% id
  %Cpu0   [|||||||||||||||||||||||||||     90.55%]
  %Cpu1   [|||||||||||||||||||||||||||     90.51%]
  %Cpu2   [||||||||||||||||||||||||||      88.57%]
  %Cpu3   [|||||||||||||||||||||||||||     91.18%]
  %Cpu4   [|||||||||||||||||||||||||||     91.09%]
  %Cpu5   [|||||||||||||||||||||||||||     90.88%]
  %Cpu6   [||||||||||||||||||||||||||      88.64%]
  %Cpu7   [|||||||||||||||||||||||||||     90.28%]

        PID    %CPU           RUNTIME  COMMMAND
    ----------------------------------------------------
       4113   22.23       3221.547 ms  sched-messaging
       4105   21.61       3131.495 ms  sched-messaging
       4119   21.53       3120.937 ms  sched-messaging
       4103   21.39       3101.614 ms  sched-messaging
       4106   21.37       3095.209 ms  sched-messaging
       4104   21.25       3077.269 ms  sched-messaging
       4115   21.21       3073.188 ms  sched-messaging
       4109   21.18       3069.022 ms  sched-messaging
       4111   20.78       3010.033 ms  sched-messaging
       4114   20.74       3007.073 ms  sched-messaging
       4108   20.73       3002.137 ms  sched-messaging
       4107   20.47       2967.292 ms  sched-messaging
       4117   20.39       2955.335 ms  sched-messaging
       4112   20.34       2947.080 ms  sched-messaging
       4118   20.32       2942.519 ms  sched-messaging
       4121   20.23       2929.865 ms  sched-messaging
       4110   20.22       2930.078 ms  sched-messaging
       4122   20.15       2919.542 ms  sched-messaging
       4120   19.77       2866.032 ms  sched-messaging
       4116   19.72       2857.660 ms  sched-messaging
       4127   16.19       2346.334 ms  sched-messaging
       4142   15.86       2297.600 ms  sched-messaging
       4141   15.62       2262.646 ms  sched-messaging
       4136   15.41       2231.408 ms  sched-messaging
       4130   15.38       2227.008 ms  sched-messaging
       4129   15.31       2217.692 ms  sched-messaging
       4126   15.21       2201.711 ms  sched-messaging
       4139   15.19       2200.722 ms  sched-messaging
       4137   15.10       2188.633 ms  sched-messaging
       4134   15.06       2182.082 ms  sched-messaging
       4132   15.02       2177.530 ms  sched-messaging
       4131   14.73       2131.973 ms  sched-messaging
       4125   14.68       2125.439 ms  sched-messaging
       4128   14.66       2122.255 ms  sched-messaging
       4123   14.65       2122.113 ms  sched-messaging
       4135   14.56       2107.144 ms  sched-messaging
       4133   14.51       2103.549 ms  sched-messaging
       4124   14.27       2066.671 ms  sched-messaging
       4140   14.17       2052.251 ms  sched-messaging
       4138   13.81       2000.361 ms  sched-messaging
          0   11.42       1652.009 ms  swapper/2
          0   11.35       1641.694 ms  swapper/6
          0    9.71       1405.108 ms  swapper/7
          0    9.48       1372.338 ms  swapper/1
          0    9.44       1366.013 ms  swapper/0
          0    9.11       1318.382 ms  swapper/5
          0    8.90       1287.582 ms  swapper/4
          0    8.81       1274.356 ms  swapper/3
       4100    2.61        379.328 ms  perf
       4101    1.16        169.487 ms  perf-exec
        151    0.65         94.741 ms  systemd-resolve
        249    0.36         53.030 ms  sd-resolve
        153    0.14         21.405 ms  systemd-timesyn
          1    0.10         16.200 ms  systemd
         16    0.09         15.785 ms  rcu_preempt
       4102    0.06          9.727 ms  perf
       4095    0.03          5.464 ms  kworker/7:1
         98    0.02          3.231 ms  jbd2/sda-8
        353    0.02          4.115 ms  sshd
         75    0.02          3.889 ms  kworker/2:1
         73    0.01          1.552 ms  kworker/5:1
         64    0.01          1.591 ms  kworker/4:1
         74    0.01          1.952 ms  kworker/3:1
         61    0.01          2.608 ms  kcompactd0
        397    0.01          1.602 ms  kworker/1:1
         69    0.01          1.817 ms  kworker/1:1H
         10    0.01          2.553 ms  kworker/u16:0
       2909    0.01          2.684 ms  kworker/0:2
       1211    0.00          0.426 ms  kworker/7:0
         97    0.00          0.153 ms  kworker/7:1H
         51    0.00          0.100 ms  ksoftirqd/7
        120    0.00          0.856 ms  systemd-journal
         76    0.00          1.414 ms  kworker/6:1
         46    0.00          0.246 ms  ksoftirqd/6
         45    0.00          0.164 ms  migration/6
         41    0.00          0.098 ms  ksoftirqd/5
         40    0.00          0.207 ms  migration/5
         86    0.00          1.339 ms  kworker/4:1H
         36    0.00          0.252 ms  ksoftirqd/4
         35    0.00          0.090 ms  migration/4
         31    0.00          0.156 ms  ksoftirqd/3
         30    0.00          0.073 ms  migration/3
         26    0.00          0.180 ms  ksoftirqd/2
         25    0.00          0.085 ms  migration/2
         21    0.00          0.106 ms  ksoftirqd/1
         20    0.00          0.118 ms  migration/1
        302    0.00          1.440 ms  systemd-logind
         17    0.00          0.132 ms  migration/0
         15    0.00          0.255 ms  ksoftirqd/0

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20230812084917.169338-10-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:59 -03:00
Yang Jihong
b83b5071c0 perf kwork: Add root parameter to work_sort()
Add a `struct rb_root_cached *root` parameter to work_sort() to sort the
specified rb tree elements.

No functional change.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20230812084917.169338-9-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:59 -03:00
Yang Jihong
38d8d013a5 perf kwork: Add sched record support
The kwork_class type of sched is added to support recording and parsing of
sched_switch events.

As follows:

  # perf kwork -h

   Usage: perf kwork [<options>] {record|report|latency|timehist}

      -D, --dump-raw-trace  dump raw trace in ASCII
      -f, --force           don't complain, do it
      -k, --kwork <kwork>   list of kwork to profile (irq, softirq, workqueue, sched, etc)
      -v, --verbose         be more verbose (show symbol address, etc)

  # perf kwork -k sched record true
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.083 MB perf.data (47 samples) ]
  # perf evlist
  sched:sched_switch
  dummy:HG
  # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20230812084917.169338-8-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:59 -03:00
Yang Jihong
26b7254ff1 perf kwork: Set default events list if not specified in setup_event_list()
Currently when no kwork event is specified, all events are configured by
default. Now set to default event list string, which is more flexible and
supports subsequent function extension.

Also put setup_event_list() into each subcommand for different settings.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20230812084917.169338-7-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:59 -03:00
Yang Jihong
86c67c8af4 perf kwork: Overwrite original atom in the list when a new atom is pushed.
work_push_atom() supports nesting. Currently, all supported kworks are not
nested. A `overwrite` parameter is added to overwrite the original atom in
the list.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20230812084917.169338-6-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:59 -03:00
Yang Jihong
95064b3352 perf kwork: Add kwork and src_type to work_init() for 'struct kwork_class'
To support different types of reports, two parameters `struct perf_kwork
* kwork` and `enum kwork_trace_type src_type` are added to work_init()
of struct kwork_class for initialization in different scenarios.

No functional change intended.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20230812084917.169338-5-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:59 -03:00
Yang Jihong
0c526579a4 perf kwork: Set ordered_events to true in 'struct perf_tool'
'perf kwork' processes data based on timestamps and needs to sort events.

Fixes: f98919ec4f ("perf kwork: Implement 'report' subcommand")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230812084917.169338-4-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:59 -03:00
Yang Jihong
76e0d8c821 perf kwork: Add the supported subcommands to the document
Add missing report, latency and timehist subcommands to the document.

Fixes: f98919ec4f ("perf kwork: Implement 'report' subcommand")
Fixes: ad3d9f7a92 ("perf kwork: Implement perf kwork latency")
Fixes: bcc8b3e88d ("perf kwork: Implement perf kwork timehist")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20230812084917.169338-3-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:59 -03:00
Yang Jihong
d39710088d perf kwork: Fix incorrect and missing free atom in work_push_atom()
1. Atoms are managed in page mode and should be released using atom_free()
   instead of free().
2. When the event does not match, the atom needs to free.

Fixes: f98919ec4f ("perf kwork: Implement 'report' subcommand")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230812084917.169338-2-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:59 -03:00
Yang Jihong
d50ad02cb3 perf test: Add perf_event_attr test for record dummy event
If only dummy event is recorded, tracking event is not needed.
Add this test scenario.

Test result:

  # ./perf test list 2>&1 | grep 'Setup struct perf_event_attr'
   17: Setup struct perf_event_attr
  # ./perf test 17 -v
   17: Setup struct perf_event_attr                                    :
  --- start ---
  test child forked, pid 720198
  <SNIP>
  running './tests/attr/test-record-dummy-C0'
  <SNIP>
  test child finished with 0
  ---- end ----
  Setup struct perf_event_attr: Ok

Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20230904023340.12707-7-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:54 -03:00
Yang Jihong
23b97c7ee9 perf test: Add test case for record sideband events
Add a new test case to record sideband events for all CPUs when tracing
selected CPUs

Test result:

  # ./perf test list 2>&1 | grep 'perf record sideband tests'
   95: perf record sideband tests
  # ./perf test 95
   95: perf record sideband tests                                      : Ok

Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20230904023340.12707-6-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:49 -03:00
Yang Jihong
74b4f3ecdf perf record: Track sideband events for all CPUs when tracing selected CPUs
User space tasks can migrate between CPUs, we need to track side-band
events for all CPUs.

The specific scenarios are as follows:

         CPU0                                 CPU1
  perf record -C 0 start
                              taskA starts to be created and executed
                                -> PERF_RECORD_COMM and PERF_RECORD_MMAP
                                   events only deliver to CPU1
                              ......
                                |
                          migrate to CPU0
                                |
  Running on CPU0    <----------/
  ...

  perf record -C 0 stop

Now perf samples the PC of taskA. However, perf does not record the
PERF_RECORD_COMM and PERF_RECORD_MMAP events of taskA.
Therefore, the comm and symbols of taskA cannot be parsed.

The solution is to record sideband events for all CPUs when tracing
selected CPUs. Because this modifies the default behavior, add related
comments to the perf record man page.

The sys_perf_event_open invoked is as follows:

  # perf --debug verbose=3 record -e cpu-clock -C 1 true
  <SNIP>
  Opening: cpu-clock
  ------------------------------------------------------------
  perf_event_attr:
    type                             1 (PERF_TYPE_SOFTWARE)
    size                             136
    config                           0 (PERF_COUNT_SW_CPU_CLOCK)
    { sample_period, sample_freq }   4000
    sample_type                      IP|TID|TIME|CPU|PERIOD|IDENTIFIER
    read_format                      ID|LOST
    disabled                         1
    inherit                          1
    freq                             1
    sample_id_all                    1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 5
  Opening: dummy:u
  ------------------------------------------------------------
  perf_event_attr:
    type                             1 (PERF_TYPE_SOFTWARE)
    size                             136
    config                           0x9 (PERF_COUNT_SW_DUMMY)
    { sample_period, sample_freq }   1
    sample_type                      IP|TID|TIME|CPU|IDENTIFIER
    read_format                      ID|LOST
    inherit                          1
    exclude_kernel                   1
    exclude_hv                       1
    mmap                             1
    comm                             1
    task                             1
    sample_id_all                    1
    exclude_guest                    1
    mmap2                            1
    comm_exec                        1
    ksymbol                          1
    bpf_event                        1
  ------------------------------------------------------------
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 6
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 13
  sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 14
  <SNIP>

Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20230904023340.12707-5-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:43 -03:00
Yang Jihong
1285ab300d perf record: Move setting tracking events before record__init_thread_masks()
User space tasks can migrate between CPUs, so when tracing selected CPUs,
sideband for all CPUs is needed. In this case set the cpu map of the evsel
to all online CPUs. This may modify the original cpu map of the evlist.

Therefore, need to check whether the preceding scenario exists before
record__init_thread_masks().

Dummy tracking has been set in record__open(), move it before
record__init_thread_masks() and add a helper for unified processing.

The sys_perf_event_open invoked is as follows:

  # perf --debug verbose=3 record -e cpu-clock -D 100 true
  <SNIP>
  Opening: cpu-clock
  ------------------------------------------------------------
  perf_event_attr:
    type                             1 (PERF_TYPE_SOFTWARE)
    size                             136
    config                           0 (PERF_COUNT_SW_CPU_CLOCK)
    { sample_period, sample_freq }   4000
    sample_type                      IP|TID|TIME|PERIOD|IDENTIFIER
    read_format                      ID|LOST
    disabled                         1
    inherit                          1
    freq                             1
    sample_id_all                    1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid 10318  cpu 0  group_fd -1  flags 0x8 = 5
  sys_perf_event_open: pid 10318  cpu 1  group_fd -1  flags 0x8 = 6
  sys_perf_event_open: pid 10318  cpu 2  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid 10318  cpu 3  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid 10318  cpu 4  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid 10318  cpu 5  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid 10318  cpu 6  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid 10318  cpu 7  group_fd -1  flags 0x8 = 13
  Opening: dummy:u
  ------------------------------------------------------------
  perf_event_attr:
    type                             1 (PERF_TYPE_SOFTWARE)
    size                             136
    config                           0x9 (PERF_COUNT_SW_DUMMY)
    { sample_period, sample_freq }   1
    sample_type                      IP|TID|TIME|IDENTIFIER
    read_format                      ID|LOST
    disabled                         1
    inherit                          1
    exclude_kernel                   1
    exclude_hv                       1
    mmap                             1
    comm                             1
    enable_on_exec                   1
    task                             1
    sample_id_all                    1
    exclude_guest                    1
    mmap2                            1
    comm_exec                        1
    ksymbol                          1
    bpf_event                        1
  ------------------------------------------------------------
  sys_perf_event_open: pid 10318  cpu 0  group_fd -1  flags 0x8 = 14
  sys_perf_event_open: pid 10318  cpu 1  group_fd -1  flags 0x8 = 15
  sys_perf_event_open: pid 10318  cpu 2  group_fd -1  flags 0x8 = 16
  sys_perf_event_open: pid 10318  cpu 3  group_fd -1  flags 0x8 = 17
  sys_perf_event_open: pid 10318  cpu 4  group_fd -1  flags 0x8 = 18
  sys_perf_event_open: pid 10318  cpu 5  group_fd -1  flags 0x8 = 19
  sys_perf_event_open: pid 10318  cpu 6  group_fd -1  flags 0x8 = 20
  sys_perf_event_open: pid 10318  cpu 7  group_fd -1  flags 0x8 = 21
  <SNIP>

'perf test' needs to update base-record & system-wide-dummy attr expected values
for test-record-C0:

1. Because a dummy sideband event is added to the sampling of specified
   CPUs. When evlist contains evsel of different sample_type,
   evlist__config() will change the default PERF_SAMPLE_ID bit to
   PERF_SAMPLE_IDENTIFICATION bit.
   The attr sample_type expected value of base-record and system-wide-dummy
   in test-record-C0 needs to be updated.

2. The perf record uses evlist__add_aux_dummy() instead of
   evlist__add_dummy() to add a dummy event.
   The expected value of system-wide-dummy attr needs to be updated.

The 'perf test' result is as follows:

  # ./perf test list  2>&1 | grep 'Setup struct perf_event_attr'
   17: Setup struct perf_event_attr
  # ./perf test 17
   17: Setup struct perf_event_attr                                    : Ok

Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20230904023340.12707-4-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:37 -03:00
Yang Jihong
9c95e4ef06 perf evlist: Add evlist__findnew_tracking_event() helper
Currently, intel-bts, intel-pt, and arm-spe may add tracking event to the
evlist. We may need to search for the tracking event for some settings.

Therefore, add evlist__findnew_tracking_event() helper.

If system_wide is true, evlist__findnew_tracking_event() set the cpu map
of the evsel to all online CPUs.

Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20230904023340.12707-3-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:32 -03:00
Yang Jihong
f6ff1c7604 perf evlist: Add perf_evlist__go_system_wide() helper
For dummy events that keep tracking, we may need to modify its cpu_maps.

For example, change the cpu_maps to record sideband events for all CPUS.

Add perf_evlist__go_system_wide() helper to support this scenario.

Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20230904023340.12707-2-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12 17:31:22 -03:00
Arnaldo Carvalho de Melo
a6e414a4cb perf tools: Update copy of libbpf's hashmap.c
To pick the changes in:

  a3e7e6b179 ("libbpf: Remove HASHMAP_INIT static initialization helper")

That don't entail any changes in tools/perf.

This addresses this perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/perf/util/hashmap.h tools/lib/bpf/hashmap.h

Not a kernel ABI, its just that this uses the mechanism in place for
checking kernel ABI files drift.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-11 10:31:02 -03:00
Ian Rogers
b333067ff3 perf vendor events intel: Fix spelling mistakes
Update perf JSON files with spelling fixes by Colin Ian King
<colin.i.king@gmail.com> contributed in:

https://github.com/intel/perfmon/pull/96 "Fix various spelling mistakes and typos as found using codespell #96"

This is added on top of the spelling mistakes and release number
updates in:

  https://github.com/intel/perfmon/pull/98 "EMR, SPR, CLX, SKX, BDX, HSX, BDW-DE, WSM-EP*, NHM-*, JKT, IVT : Release event updates"

Some additional spelling fixes reported by Edward Baker
<edward.baker@intel.com> are added on top of this.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230829001730.1352769-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-11 10:26:36 -03:00
Ian Rogers
8c994eff8f perf vendor events intel: Add emeraldrapids, update sapphirerapids to v1.16
Add emeraldrapids events that were added at intel's perfmon site in:

  https://github.com/intel/perfmon/pull/98

  "EMR, SPR, CLX, SKX, BDX, HSX, BDW-DE, WSM-EP*, NHM-*, JKT, IVT : Release event
   updates"

  "Emerald Rapids (0xCF) was previously pointing to SPR core. In this
   pull request dedicated EMR files are introduced."

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230829001730.1352769-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-11 10:26:36 -03:00
Ian Rogers
5d6151531a perf vendor events intel: Add lunarlake v1.0
Add lunarlake events that were added at intel's perfmon site in:

  https://github.com/intel/perfmon/pull/97 "LNL: Release initial events"

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230829001730.1352769-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-11 10:26:36 -03:00
Ian Rogers
0d3f0e6f94 perf parse-events: Introduce 'struct parse_events_terms'
parse_events_terms() existed in function names but was passed a
'struct list_head'.

As many parse_events functions take an evsel_config list as well as a
parse_event_term list, and the naming head_terms and head_config is
inconsistent, there's a potential to switch the lists and get errors.

Introduce a 'struct parse_events_terms', that just wraps a list_head, to
avoid this. Add the regular init/exit functions and transition the code
to use them.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230901233949.2930562-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-11 10:26:36 -03:00
Ian Rogers
727adeed06 perf parse-events: Copy fewer term lists
When trying to add events to multiple PMUs the term list is copied first
as adding the event will rewrite the event's name term into the sysfs
and/or json encoding terms (see perf_pmu__check_alias).

Change the parse events add API so the passed in term list is const,
then copy the list when modification is necessary.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230901233949.2930562-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-11 10:26:36 -03:00
Ian Rogers
4163644818 perf parse-events: Avoid enum casts
Add term_type to union of values returned by the lexer to avoid casts
to and from an integer.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230901233949.2930562-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-11 10:26:36 -03:00
Ian Rogers
8f91662ef8 perf parse-events: Tidy up str parameter
Add a const and rename str to event_name.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230901233949.2930562-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-11 10:26:36 -03:00
Ian Rogers
6fcfe54d2c perf parse-events: Remove unnecessary __maybe_unused
The parameter head_terms is always used in get_config_terms.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230901233949.2930562-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-11 10:26:36 -03:00
Ian Rogers
fa88095856 perf shell completion: Support completion of metrics/metricgroups
Allow metrics to expand for -M or --metrics options.

Committer testing:

  # grep -m1 'model name' /proc/cpuinfo
  model name	: AMD Ryzen 9 5950X 16-Core Processor
  #

Before:

  Just expansion of files/directories in the pwd are expanded:

  # . tools/perf/perf-completion.sh
  # perf stat -M b
  block/ build/
  # perf stat -M b

After:

  # . tools/perf/perf-completion.sh
  # perf stat -M
  all_l2_cache_accesses  all_remote_links_outbound   data_fabric          l1_itlb_misses                  l2_cache_misses_from_l2_hwpf  macro_ops_dispatched       tlb
  all_l2_cache_hits      branch_misprediction_ratio  decoder              l2_cache                        l3_cache                      nps1_die_to_dram
  all_l2_cache_misses    branch_prediction           ic_fetch_miss_ratio  l2_cache_accesses_from_l2_hwpf  l3_read_miss_latency          op_cache_fetch_miss_ratio
  # perf stat -M branch_
  branch_misprediction_ratio  branch_prediction
  # perf stat -M branch_prediction -a sleep 1

   Performance counter stats for 'system wide':

       115,079,765      ex_ret_brn                       #      4.0 %  branch_misprediction_ratio
         4,561,456      ex_ret_brn_misp

       1.015925106 seconds time elapsed

  #

Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230905181554.3202873-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-11 10:26:36 -03:00
Ian Rogers
493902fcbd perf completion: Support completion of libpfm4 events
Use `perf list --raw-dump pfm` to support completion of libpfm4 events.

Committer testing:

  # grep -m1 'model name' /proc/cpuinfo
  model name	: AMD Ryzen 9 5950X 16-Core Processor

Before:

  Files in the current directory are expanded when <tab>

After:

  Only the PFM events are:

  # . tools/perf/perf-completion.sh
  # perf stat --pfm-events <tab>

Becomes:

  # perf stat --pfm-events perf_raw::r0000

As apparently there are no other PFM events for this Ryzen 9 5950X
machine.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230905181554.3202873-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-11 10:26:35 -03:00
Ian Rogers
10864594d8 perf shell completion: Restrict completion of events to events
'perf list' will list libpfm4 events and metrics which aren't valid
options to the '-e' option. Restrict the events gathered so that invalid
ones aren't shown.

Before:

  $ perf stat -e <tab><tab>
  Display all 633 possibilities? (y or n)

After:

  $ perf stat -e <tab><tab>
  Display all 375 possibilities? (y or n)

Committer testing:

  # grep -m1 'model name' /proc/cpuinfo
  model name	: AMD Ryzen 9 5950X 16-Core Processor
  #

Before:

  # . tools/perf/perf-completion.sh
  # perf stat -e
  Display all 2672 possibilities? (y or n)

After:

  # . tools/perf/perf-completion.sh
  # perf stat -e
  Display all 2648 possibilities? (y or n)

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230905181554.3202873-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-11 10:26:35 -03:00
Ian Rogers
a84fbf2056 perf stat: Fix aggr mode initialization
Generating metrics llc_code_read_mpi_demand_plus_prefetch,
llc_data_read_mpi_demand_plus_prefetch,
llc_miss_local_memory_bandwidth_read,
llc_miss_local_memory_bandwidth_write,
nllc_miss_remote_memory_bandwidth_read, memory_bandwidth_read,
memory_bandwidth_write, uncore_frequency, upi_data_transmit_bw,
C2_Pkg_Residency, C3_Core_Residency, C3_Pkg_Residency,
C6_Core_Residency, C6_Pkg_Residency, C7_Core_Residency,
C7_Pkg_Residency, UNCORE_FREQ and tma_info_system_socket_clks would
trigger an address sanitizer heap-buffer-overflows on a SkylakeX.

```
==2567752==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x5020003ed098 at pc 0x5621a816654e bp 0x7fffb55d4da0 sp 0x7fffb55d4d98
READ of size 4 at 0x5020003eee78 thread T0
    #0 0x558265d6654d in aggr_cpu_id__is_empty tools/perf/util/cpumap.c:694:12
    #1 0x558265c914da in perf_stat__get_aggr tools/perf/builtin-stat.c:1490:6
    #2 0x558265c914da in perf_stat__get_global_cached tools/perf/builtin-stat.c:1530:9
    #3 0x558265e53290 in should_skip_zero_counter tools/perf/util/stat-display.c:947:31
    #4 0x558265e53290 in print_counter_aggrdata tools/perf/util/stat-display.c:985:18
    #5 0x558265e51931 in print_counter tools/perf/util/stat-display.c:1110:3
    #6 0x558265e51931 in evlist__print_counters tools/perf/util/stat-display.c:1571:5
    #7 0x558265c8ec87 in print_counters tools/perf/builtin-stat.c:981:2
    #8 0x558265c8cc71 in cmd_stat tools/perf/builtin-stat.c:2837:3
    #9 0x558265bb9bd4 in run_builtin tools/perf/perf.c:323:11
    #10 0x558265bb98eb in handle_internal_command tools/perf/perf.c:377:8
    #11 0x558265bb9389 in run_argv tools/perf/perf.c:421:2
    #12 0x558265bb9389 in main tools/perf/perf.c:537:3
```

The issue was the use of testing a cpumap with NULL rather than using
empty, as a map containing the dummy value isn't NULL and the -1
results in an empty aggr map being allocated which legitimately
overflows when any member is accessed.

Fixes: 8a96f454f5 ("perf stat: Avoid SEGV if core.cpus isn't set")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230906003912.3317462-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-11 10:26:35 -03:00
Kajol Jain
1bd69b4bf1 perf vendor events: Update metric events for power10 platform
Update JSON/events for power10 platform with additional metrics.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Disha Goel <disgoel@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230905114039.176645-3-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-11 10:26:35 -03:00
Kajol Jain
23ba30b23b perf vendor events power10: Add extra data-source events
Update JSON/Events list with additional data-source events for power10
platform.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Disha Goel <disgoel@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230905114039.176645-2-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-11 10:26:35 -03:00
Kajol Jain
fc14358075 perf vendor events power10: Update JSON/events
Update JSON/Events list with data-source events for power10 platform.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Disha Goel <disgoel@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20230905114039.176645-1-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-11 10:26:35 -03:00
Jiapeng Chong
6066622c97 perf machine: Use true and false for bool variable
Fix the following coccicheck warnings:

./tools/perf/util/machine.c:2000:9-10: WARNING: return of 0/1 in
function 'symbol__match_regex' with return type bool.

Committer notes:

Found this in the pile, it was already returning bool, but this patch
simplifies it further, from 3 lines to just 1.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Suggested-by: David Laight <David.Laight@ACULAB.COM>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/1614247483-102665-1-git-send-email-jiapeng.chong@linux.alibaba.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-11 10:26:35 -03:00
Linus Torvalds
0bb80ecc33 Linux 6.6-rc1 2023-09-10 16:28:41 -07:00