linux/tools/perf
German Gomez ea15483e7c perf report: Add 'simd' sort field
Add 'simd' sort field to visualize SIMD ops in 'perf report'.

Rows are labeled with the SIMD ISA, and the type of predicate (if any):

  - [p] partial predicate
  - [e] empty predicate (no elements in the vector being used)

Example with Arm SPE and SVE (Scalable Vector Extension):

  #include <arm_sve.h>

  double src[1025], dst[1025];

  int main(void) {
    svfloat64_t vc = svdup_f64(1);
    for(;;)
      for(int i = 0; i < 1025; i += svcntd())
      {
        svbool_t pg = svwhilelt_b64(i, 1025);
        svfloat64_t vsrc = svld1(pg, &src[i]);
        svfloat64_t vdst = svadd_x(pg, vsrc, vc);
        svst1(pg, &dst[i], vdst);
      }
    return 0;
  }

  ... compiled using "gcc-11 -march=armv8-a+sve -O3"

Profiling on a platform that implements FEAT_SVE and FEAT_SPEv1p1:

  $ perf record -e arm_spe_0// -- ./a.out
  $ perf report --itrace=i1i -s overhead,pid,simd,sym

  Overhead      Pid:Command   Simd     Symbol
  ........  ................  .......  ......................

    53.76%    10758:program            [.] main
    46.14%    10758:program   [.] SVE  [.] main
     0.09%    10758:program   [p] SVE  [.] main

The report shows 0.09% of the sampled SVE operations use partial
predicates due to src and dst arrays not being multiples of the vector
register lengths.

Signed-off-by: German Gomez <german.gomez@arm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman.Khandual@arm.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230320151509.1137462-2-james.clark@arm.com
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-20 19:28:21 -03:00
..
arch perf intel-pt: Add support for new branch instructions ERETS and ERETU 2023-03-20 19:25:40 -03:00
bench perf bench syscall: Add execve syscall benchmark 2023-02-02 16:32:19 -03:00
dlfilters perf tools: Fix usage of the verbose variable 2022-12-20 15:16:33 -03:00
Documentation perf report: Add 'simd' sort field 2023-03-20 19:28:21 -03:00
examples/bpf perf trace: Remove unused bpf map 'syscalls' 2022-11-23 10:30:00 -03:00
include/perf perf bpf: Remove now unused BPF headers 2022-11-04 11:41:48 -03:00
jvmti
pmu-events perf vendor events s390: Add metric for TLB and cache 2023-03-14 18:36:08 -03:00
python
scripts perf script: Fix Python support when no libtraceevent 2023-03-15 10:27:07 -03:00
tests perf test: Fix memory leak in symbols 2023-03-20 12:51:08 -03:00
trace perf beauty: Update copy of linux/socket.h with the kernel sources 2023-01-18 10:12:23 -03:00
ui perf tools: Fix "kernel lock contention analysis" test by not printing warnings in quiet mode 2022-10-27 16:37:26 -03:00
util perf report: Add 'simd' sort field 2023-03-20 19:28:21 -03:00
.gitignore perf jevents: Run metric_test.py at compile-time 2023-02-03 17:11:39 -03:00
Build perf script: Fix Python support when no libtraceevent 2023-03-15 10:27:07 -03:00
builtin-annotate.c perf hist: Add 'kvm_info' field in histograms entry 2023-03-15 16:47:20 -03:00
builtin-bench.c perf bench syscall: Add execve syscall benchmark 2023-02-02 16:32:19 -03:00
builtin-buildid-cache.c
builtin-buildid-list.c perf buildid-list: Add a "-m" option to show kernel and modules build-ids 2022-07-18 16:35:34 -03:00
builtin-c2c.c perf hist: Add 'kvm_info' field in histograms entry 2023-03-15 16:47:20 -03:00
builtin-config.c
builtin-daemon.c perf daemon: Use sig_atomic_t to avoid UB 2022-11-03 09:35:44 -03:00
builtin-data.c perf build: Use libtraceevent from the system 2022-12-14 11:16:12 -03:00
builtin-diff.c perf hist: Add 'kvm_info' field in histograms entry 2023-03-15 16:47:20 -03:00
builtin-evlist.c
builtin-ftrace.c perf ftrace: Reuse target::initial_delay 2023-03-13 14:52:14 -03:00
builtin-help.c
builtin-inject.c perf inject: Fix --buildid-all not to eat up MMAP2 2023-02-23 10:03:00 -03:00
builtin-kallsyms.c
builtin-kmem.c mm: discard __GFP_ATOMIC 2023-02-02 22:33:13 -08:00
builtin-kvm.c perf kvm: Add TUI mode for stat report 2023-03-15 16:53:35 -03:00
builtin-kwork.c perf build: Use libtraceevent from the system 2022-12-14 11:16:12 -03:00
builtin-list.c perf list: Support for printing metric thresholds 2023-02-19 08:06:49 -03:00
builtin-lock.c perf lock contention: Show lock type with address 2023-03-14 08:33:48 -03:00
builtin-mem.c perf tools: Move 'struct perf_sample' to a separate header file to disentangle headers 2022-10-31 11:06:41 -03:00
builtin-probe.c perf probe: Fix usage when libtraceevent is missing 2023-02-02 16:32:19 -03:00
builtin-record.c perf bpf filter: Show warning for missing sample flags 2023-03-15 11:08:36 -03:00
builtin-report.c perf evlist: Remove nr_groups 2023-03-13 17:42:27 -03:00
builtin-sched.c perf tools: Use dedicated non-atomic clear/set bit helpers 2022-12-05 09:29:06 -03:00
builtin-script.c perf intel-pt: Add event type names UINTR and UIRET 2023-03-20 19:25:27 -03:00
builtin-stat.c perf stat: Don't remove all grouped events when CPU maps disagree 2023-03-13 15:11:51 -03:00
builtin-timechart.c perf build: Use libtraceevent from the system 2022-12-14 11:16:12 -03:00
builtin-top.c perf evlist: Remove group option. 2022-12-14 15:28:18 -03:00
builtin-trace.c perf record: Reuse target::initial_delay 2023-03-13 14:52:14 -03:00
builtin-version.c perf build: Remove unused HAVE_GLIBC_SUPPORT 2023-03-14 08:29:46 -03:00
builtin.h perf kwork: New tool to trace time properties of kernel work (such as softirq, and workqueue) 2022-07-26 16:01:24 -03:00
check-headers.sh tools headers: Update the copy of x86's memcpy_64.S used in 'perf bench' 2022-10-25 17:40:48 -03:00
command-list.txt perf help: Use HAVE_LIBTRACEEVENT to filter out unsupported commands 2023-01-02 11:51:53 -03:00
CREDITS
design.txt
Makefile perf tools: Use "grep -E" instead of "egrep" 2022-12-14 15:28:19 -03:00
Makefile.config perf build: Error if no libelf and NO_LIBELF isn't set 2023-03-14 08:29:46 -03:00
Makefile.perf perf bpf filter: Implement event sample filtering 2023-03-15 10:34:33 -03:00
MANIFEST tools lib traceevent: Remove libtraceevent 2022-12-14 11:16:12 -03:00
perf-archive.sh
perf-completion.sh perf tools: Fix auto-complete on aarch64 2023-02-08 10:38:10 -03:00
perf-iostat.sh
perf-read-vdso.c
perf-sys.h
perf.c perf build: Use libtraceevent from the system 2022-12-14 11:16:12 -03:00
perf.h