linux/tools/perf
K Prateek Nayak aab667ca88 perf stat: Add "--per-cache" aggregation option and document it
This patch adds support for "--per-cache" option for aggregation at a
particular cache level and documents the same.

Following is the output of 'perf stat' with aggregation at L3 for the
event "ls_dmnd_fills_from_sys.ext_cache_remote" on a dual socket 3rd
Generation EPYC Processor (2 x 64C/128T - 16 LLCs) when running
hackbench pinned to 4 LLCs:

  $ sudo perf stat --per-cache=L3 -a -e ls_dmnd_fills_from_sys.ext_cache_remote -- \
    taskset -c 0-15,64-79,128-143,192-207 \
    perf bench sched messaging -p -t -l 100000 -g 8

  ...

   Performance counter stats for 'system wide':

  S0-D0-L3-ID0             16          9,500,803      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID8             16          6,338,099      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID16            16            355,005      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID24            16             22,067      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID32            16             16,321      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID40            16             11,619      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID48            16              4,238      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID56            16             31,158      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID64            16         28,242,452      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID72            16         22,906,973      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID80            16             72,898      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID88            16             56,907      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID96            16             20,456      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID104           16             40,913      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID112           16             78,113      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID120           16             37,897      ls_dmnd_fills_from_sys.ext_cache_remote

Also support 'perf stat record' and 'perf stat report' with the ability
to specify a different cache level to aggregate data at when running
'perf stat report'.

  $ sudo perf stat record --per-cache=L2 -a -e ls_dmnd_fills_from_sys.ext_cache_remote -- \
    taskset -c 0-15,64-79,128-143,192-207 \
    perf bench sched messaging -p -t -l 100000 -g 8

  ...

   Performance counter stats for 'system wide':

  S0-D0-L2-ID0              2          1,442,061      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L2-ID1              2          1,548,994      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L2-ID2              2          1,553,557      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L2-ID3              2          1,420,122      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L2-ID4              2          1,465,461      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L2-ID5              2          1,455,153      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L2-ID6              2          1,595,237      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L2-ID7              2          1,499,321      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L2-ID8              2          1,919,025      ls_dmnd_fills_from_sys.ext_cache_remote
  ...
  S1-D1-L2-ID127            2             21,295      ls_dmnd_fills_from_sys.ext_cache_remote

  $ sudo perf stat report --per-cache=L3

   Performance counter stats for 'perf stat record --per-cache=L2 -a -e ls_dmnd_fills_from_sys.ext_cache_remote --\
                                  taskset -c 0-15,64-79,128-143,192-207 \
                                  perf bench sched messaging -p -t -l 100000 -g 8':

  S0-D0-L3-ID0             16         11,979,906      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID8             16         14,257,202      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID16            16            377,484      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID24            16             27,224      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID32            16             26,816      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID40            16             14,461      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID48            16             10,499      ls_dmnd_fills_from_sys.ext_cache_remote
  S0-D0-L3-ID56            16             53,817      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID64            16         27,361,987      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID72            16         37,299,024      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID80            16             84,125      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID88            16             64,561      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID96            16             13,403      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID104           16             20,138      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID112           16             93,220      ls_dmnd_fills_from_sys.ext_cache_remote
  S1-D1-L3-ID120           16             35,465      ls_dmnd_fills_from_sys.ext_cache_remote

On the above system, the domain covered by S0-D0-L3-ID0 contains
S0-D0-L2-ID0 to S0-D0-L2-ID7, the corresponding count for L3-ID0 is
equal to the sum of counts for L2-ID0 to L2-ID7.

Add documentation for the newly introduced "--per-cache" option.

Suggested-by: Gautham Shenoy <gautham.shenoy@amd.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wen Pu <puwen@hygon.cn>
Link: https://lore.kernel.org/r/20230517172745.5833-5-kprateek.nayak@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-23 16:10:13 -03:00
..
arch Merge remote-tracking branch 'acme/perf-tools' into perf-tools-next 2023-05-22 15:22:46 -03:00
bench tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench' 2023-05-17 10:42:19 -03:00
dlfilters perf tools: Fix usage of the verbose variable 2022-12-20 15:16:33 -03:00
Documentation perf stat: Add "--per-cache" aggregation option and document it 2023-05-23 16:10:13 -03:00
examples/bpf perf trace: Remove unused bpf map 'syscalls' 2022-11-23 10:30:00 -03:00
include/perf perf bpf: Remove now unused BPF headers 2022-11-04 11:41:48 -03:00
jvmti
pmu-events perf vendor events intel: Update tigerlake events/metrics 2023-05-23 12:14:13 -03:00
python
scripts perf script: Add new parameter in kfree_skb tracepoint to the python scripts using it 2023-05-02 08:36:14 -03:00
tests perf test: Add test validating JSON generated by 'perf data convert --to-json' 2023-05-22 15:47:48 -03:00
trace tools headers UAPI: Sync arch prctl headers with the kernel sources 2023-05-17 11:23:43 -03:00
ui perf annotate browser: Add '<' and '>' keys for navigation 2023-05-15 17:52:13 -03:00
util perf stat record: Save cache level information 2023-05-23 16:10:13 -03:00
.gitignore perf jevents: Run metric_test.py at compile-time 2023-02-03 17:11:39 -03:00
Build perf script: Fix Python support when no libtraceevent 2023-03-15 10:27:07 -03:00
builtin-annotate.c perf annotate browser: Add '<' and '>' keys for navigation 2023-05-15 17:52:13 -03:00
builtin-bench.c perf bench: Add pmu-scan benchmark 2023-04-04 13:23:58 -03:00
builtin-buildid-cache.c
builtin-buildid-list.c perf util: Move input_name to util 2023-04-10 19:21:31 -03:00
builtin-c2c.c perf c2c: Use zfree() to reduce chances of use after free 2023-04-12 09:55:10 -03:00
builtin-config.c
builtin-daemon.c perf daemon: Use zfree() to reduce chances of use after free 2023-04-12 09:52:29 -03:00
builtin-data.c perf util: Move input_name to util 2023-04-10 19:21:31 -03:00
builtin-diff.c perf util: Move perf_guest/host declarations 2023-04-10 19:22:05 -03:00
builtin-evlist.c perf util: Move input_name to util 2023-04-10 19:21:31 -03:00
builtin-ftrace.c perf ftrace: Flush output after each writing 2023-05-15 18:27:33 -03:00
builtin-help.c perf usage: Move usage strings 2023-04-10 19:20:53 -03:00
builtin-inject.c perf dso: Add dso__filename_with_chroot() to reduce number of accesses to dso->nsinfo members 2023-04-17 18:47:55 -03:00
builtin-kallsyms.c perf map: Add helper for ->map_ip() and ->unmap_ip() 2023-04-06 22:10:17 -03:00
builtin-kmem.c perf util: Move input_name to util 2023-04-10 19:21:31 -03:00
builtin-kvm.c perf evsel: Introduce evsel__name_is() method to check if the evsel name is equal to a given string 2023-04-24 14:28:11 -03:00
builtin-kwork.c perf util: Move input_name to util 2023-04-10 19:21:31 -03:00
builtin-list.c perf stat: Make cputype filter generic 2023-05-15 09:12:14 -03:00
builtin-lock.c Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL" 2023-05-06 18:07:37 -03:00
builtin-mem.c perf util: Move input_name to util 2023-04-10 19:21:31 -03:00
builtin-probe.c tools: Rename __fallthrough to fallthrough 2023-04-06 21:41:00 -03:00
builtin-record.c perf parse-events: Add pmu filter 2023-05-15 09:12:14 -03:00
builtin-report.c perf map: Add accessors for ->pgoff and ->reloc 2023-04-06 22:12:40 -03:00
builtin-sched.c perf sched: Fix sched latency analysis incorrection when using 'sched:sched_wakeup' 2023-04-12 19:30:39 -03:00
builtin-script.c perf script: Add new output field 'dsoff' to print dso offset 2023-05-12 15:21:49 -03:00
builtin-stat.c perf stat: Add "--per-cache" aggregation option and document it 2023-05-23 16:10:13 -03:00
builtin-timechart.c perf util: Move input_name to util 2023-04-10 19:21:31 -03:00
builtin-top.c perf parse-events: Add pmu filter 2023-05-15 09:12:14 -03:00
builtin-trace.c perf parse-events: Add pmu filter 2023-05-15 09:12:14 -03:00
builtin-version.c Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL" 2023-05-06 18:07:37 -03:00
builtin.h perf usage: Move usage strings 2023-04-10 19:20:53 -03:00
check-headers.sh Disable building BPF based features by default for v6.4. 2023-05-07 11:32:18 -07:00
command-list.txt perf help: Use HAVE_LIBTRACEEVENT to filter out unsupported commands 2023-01-02 11:51:53 -03:00
CREDITS
design.txt
Makefile perf tools: Use "grep -E" instead of "egrep" 2022-12-14 15:28:19 -03:00
Makefile.config perf build: Gracefully fail the build if BUILD_BPF_SKEL=1 is specified and clang isn't available 2023-05-10 12:54:56 -03:00
Makefile.perf perf build: Add system include paths to BPF builds 2023-05-10 14:23:52 -03:00
MANIFEST tools lib traceevent: Remove libtraceevent 2022-12-14 11:16:12 -03:00
perf-archive.sh
perf-completion.sh perf tools: Fix auto-complete on aarch64 2023-02-08 10:38:10 -03:00
perf-iostat.sh
perf-read-vdso.c
perf-sys.h
perf.c perf util: Move input_name to util 2023-04-10 19:21:31 -03:00
perf.h perf util: Move perf_guest/host declarations 2023-04-10 19:22:05 -03:00