mirror of
https://github.com/torvalds/linux.git
synced 2024-12-12 14:12:51 +00:00
628eaa4e87
8519 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Ian Rogers
|
628eaa4e87 |
perf pmus: Add placeholder core PMU
If loading a core PMU fails, legacy hardware/cache events may segv due to there being no PMU. Create a placeholder empty PMU for this case. This was discussed in: https://lore.kernel.org/lkml/20230614151625.2077-1-yangjihong1@huawei.com/ Reported-by: Yang Jihong <yangjihong1@huawei.com> Tested-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230627182834.117565-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Fangrui Song
|
78987bb02a |
perf: Replace deprecated -target with --target= for Clang
-target has been deprecated since Clang 3.4 in 2013. Use the preferred
--target=bpf form instead. This matches how we use --target= in
scripts/Makefile.clang.
Signed-off-by: Fangrui Song <maskray@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: llvm@lists.linux.dev
Cc: Ingo Molnar <mingo@redhat.com>
Cc: bpf@vger.kernel.org
Link:
|
||
Ian Rogers
|
710dffc969 |
perf pmu: Correct auto_merge_stats test
The original logic was to check is_pmu_hybrid() like in the below.
It just checks the name of PMU specifically for Intel hybrid systems
which means uncore PMU events should return false.
https://lore.kernel.org/all/20230527072210.2900565-35-irogers@google.com/
The is_pmu_hybrid() was replaced by arch-agnostic way but with the
incorrect condition which was fixed for core PMUs but not uncore.
This change fixes both.
Fixes:
|
||
Yang Jihong
|
929ff679b6 |
perf tools: Add printing perf_event_attr config symbol in perf_event_attr__fprintf()
When printing perf_event_attr, always display perf_event_attr config and its symbol to improve the readability of debugging information. Before: # perf --debug verbose=2 record -e cycles,cpu-clock,sched:sched_switch,branch-load-misses,r101,mem:0x0 -C 0 true <SNIP> ------------------------------------------------------------ perf_event_attr: size 136 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5 ------------------------------------------------------------ perf_event_attr: type 1 size 136 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6 ------------------------------------------------------------ perf_event_attr: type 2 size 136 config 0x143 { sample_period, sample_freq } 1 sample_type IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER read_format ID disabled 1 inherit 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 7 ------------------------------------------------------------ perf_event_attr: type 3 size 136 config 0x10005 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 9 ------------------------------------------------------------ perf_event_attr: type 4 size 136 config 0x101 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 10 ------------------------------------------------------------ perf_event_attr: type 5 size 136 { sample_period, sample_freq } 1 sample_type IP|TID|TIME|CPU|IDENTIFIER read_format ID disabled 1 inherit 1 sample_id_all 1 exclude_guest 1 bp_type 3 { bp_len, config2 } 0x4 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 11 <SNIP> After: # perf --debug verbose=2 record -e cycles,cpu-clock,sched:sched_switch,branch-load-misses,r101,mem:0x0 -C 0 true <SNIP> ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0 (PERF_COUNT_HW_CPU_CYCLES) { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5 ------------------------------------------------------------ perf_event_attr: type 1 (PERF_TYPE_SOFTWARE) size 136 config 0 (PERF_COUNT_SW_CPU_CLOCK) { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6 ------------------------------------------------------------ perf_event_attr: type 2 (PERF_TYPE_TRACEPOINT) size 136 config 0x143 (sched:sched_switch) { sample_period, sample_freq } 1 sample_type IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER read_format ID disabled 1 inherit 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 7 ------------------------------------------------------------ perf_event_attr: type 3 (PERF_TYPE_HW_CACHE) size 136 config 0x10005 (PERF_COUNT_HW_CACHE_RESULT_MISS | PERF_COUNT_HW_CACHE_OP_READ | PERF_COUNT_HW_CACHE_BPU) { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 9 ------------------------------------------------------------ perf_event_attr: type 4 (PERF_TYPE_RAW) size 136 config 0x101 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 10 ------------------------------------------------------------ perf_event_attr: type 5 (PERF_TYPE_BREAKPOINT) size 136 config 0 { sample_period, sample_freq } 1 sample_type IP|TID|TIME|CPU|IDENTIFIER read_format ID disabled 1 inherit 1 sample_id_all 1 exclude_guest 1 bp_type 3 { bp_len, config2 } 0x4 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 11 ------------------------------------------------------------ perf_event_attr: type 1 (PERF_TYPE_SOFTWARE) size 136 config 0x9 (PERF_COUNT_SW_DUMMY) { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID inherit 1 mmap 1 comm 1 freq 1 task 1 sample_id_all 1 mmap2 1 comm_exec 1 ksymbol 1 bpf_event 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 12 <SNIP> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: anshuman.khandual@arm.com Cc: mark.rutland@arm.com Cc: irogers@google.com Cc: jesussanp@google.com Cc: peterz@infradead.org Cc: acme@kernel.org Cc: jolsa@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: mingo@redhat.com Link: https://lore.kernel.org/r/20230623054416.160858-5-yangjihong1@huawei.com [ fix perf import test by adding a dummy tracepoint_id__to_name() ] Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Yang Jihong
|
2e4dd65d7d |
perf tools: Add printing perf_event_attr type symbol in perf_event_attr__fprintf()
When printing perf_event_attr, always display perf_event_attr type and its symbol to improve the readability of debugging information. Before: # perf --debug verbose=2 record -e cycles,cpu-clock,sched:sched_switch,branch-load-misses,r101,mem:0x0 -C 0 true <SNIP> ------------------------------------------------------------ perf_event_attr: size 136 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5 ------------------------------------------------------------ perf_event_attr: type 1 size 136 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6 ------------------------------------------------------------ perf_event_attr: type 2 size 136 config 0x143 { sample_period, sample_freq } 1 sample_type IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER read_format ID disabled 1 inherit 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 7 ------------------------------------------------------------ perf_event_attr: type 3 size 136 config 0x10005 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 9 ------------------------------------------------------------ perf_event_attr: type 4 size 136 config 0x101 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 10 ------------------------------------------------------------ perf_event_attr: type 5 size 136 { sample_period, sample_freq } 1 sample_type IP|TID|TIME|CPU|IDENTIFIER read_format ID disabled 1 inherit 1 sample_id_all 1 exclude_guest 1 bp_type 3 { bp_len, config2 } 0x4 ------------------------------------------------------------ <SNIP> After: # perf --debug verbose=2 record -e cycles,cpu-clock,sched:sched_switch,branch-load-misses,r101,mem:0x0 -C 0 true <SNIP> ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5 ------------------------------------------------------------ perf_event_attr: type 1 (PERF_TYPE_SOFTWARE) size 136 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 6 ------------------------------------------------------------ perf_event_attr: type 2 (PERF_TYPE_TRACEPOINT) size 136 config 0x143 { sample_period, sample_freq } 1 sample_type IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER read_format ID disabled 1 inherit 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 7 ------------------------------------------------------------ perf_event_attr: type 3 (PERF_TYPE_HW_CACHE) size 136 config 0x10005 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 9 ------------------------------------------------------------ perf_event_attr: type 4 (PERF_TYPE_RAW) size 136 config 0x101 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER read_format ID disabled 1 inherit 1 freq 1 sample_id_all 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 10 ------------------------------------------------------------ perf_event_attr: type 5 (PERF_TYPE_BREAKPOINT) size 136 { sample_period, sample_freq } 1 sample_type IP|TID|TIME|CPU|IDENTIFIER read_format ID disabled 1 inherit 1 sample_id_all 1 exclude_guest 1 bp_type 3 { bp_len, config2 } 0x4 ------------------------------------------------------------ <SNIP> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: anshuman.khandual@arm.com Cc: mark.rutland@arm.com Cc: irogers@google.com Cc: jesussanp@google.com Cc: peterz@infradead.org Cc: acme@kernel.org Cc: jolsa@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: mingo@redhat.com Link: https://lore.kernel.org/r/20230623054416.160858-4-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Yang Jihong
|
5492e72500 |
perf tools: Extend PRINT_ATTRf to support printing of members with a value of 0
When printing attr, members whose value is 0 will not be printed, we want to print the case where attr->type is 0(PERF_TYPE_HARDWARE), add `_a` param to PRINT_ATTRf macro to always print member when it is true No functional change. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: anshuman.khandual@arm.com Cc: mark.rutland@arm.com Cc: irogers@google.com Cc: jesussanp@google.com Cc: peterz@infradead.org Cc: acme@kernel.org Cc: jolsa@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: mingo@redhat.com Link: https://lore.kernel.org/r/20230623054416.160858-3-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Yang Jihong
|
220f88b5a1 |
perf trace-event-info: Add tracepoint_id_to_name() helper
Add tracepoint_id_to_name() helper to search for the trace events directory by given event id and return the corresponding tracepoint. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: anshuman.khandual@arm.com Cc: mark.rutland@arm.com Cc: irogers@google.com Cc: jesussanp@google.com Cc: peterz@infradead.org Cc: acme@kernel.org Cc: jolsa@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: mingo@redhat.com Link: https://lore.kernel.org/r/20230623054416.160858-2-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Ian Rogers
|
d82257d7f6 |
perf symbol: Remove now unused symbol_conf.sort_by_name
Previously used to specify symbol_name_rb_node was in use. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Jason Wang <wangborong@cdjrlc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/20230623054520.4118442-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Ian Rogers
|
259dce914e |
perf symbol: Remove symbol_name_rb_node
Most perf commands want to sort symbols by name and this is done via an invasive rbtree that on 64-bit systems costs 24 bytes. Sorting the symbols in a DSO by name is optional and not done by default, however, if sorting is requested the 24 bytes is allocated for every symbol. This change removes the rbtree and uses a sorted array of symbol pointers instead (costing 8 bytes per symbol). As the array is created on demand then there are further memory savings. The complexity of sorting the array and using the rbtree are the same. To support going to the next symbol, the index of the current symbol needs to be passed around as a pair with the current symbol. This requires some API changes. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Jason Wang <wangborong@cdjrlc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/20230623054520.4118442-3-irogers@google.com [ minimize change in symbols__sort_by_name() ] Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Ian Rogers
|
ce5b293405 |
perf dso: Sort symbols under lock
Determine if symbols are sorted, set the sorted flag and sort under the dso lock. Done in the interest of thread safety. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Jason Wang <wangborong@cdjrlc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/20230623054520.4118442-2-irogers@google.com [ handle the similar code in util/probe-event.c ] Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Ian Rogers
|
5c45b21047 |
perf bpf: Move the declaration of struct rq
struct rq is defined in vmlinux.h when the vmlinux.h is generated,
this causes a redefinition failure if it is declared in
lock_contention.bpf.c. Move the definition to vmlinux.h for
consistency with the generated version.
Fixes:
|
||
Ian Rogers
|
b7a2d774c9 |
perf build: Add ability to build with a generated vmlinux.h
Commit |
||
Ian Rogers
|
d06593aa00 |
perf pmu: Remove a hard coded cpu PMU assumption
The property of "cpu" when it has no cpu map is true on S390 with the PMU cpum_cf. Rather than maintain a list of such PMUs, reuse the is_core test result from the caller. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Link: https://lore.kernel.org/r/20230623043843.4080180-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Ian Rogers
|
d685819b40 |
perf pmus: Add notion of default PMU for JSON events
JSON events created in pmu-events.c by jevents.py may not specify a PMU they are associated with, in which case it is implied that it is the first core PMU. Care is needed to select this for regular 'cpu', s390 'cpum_cf' and ARMs many names as at the point the name is first needed the core PMUs list hasn't been initialized. Add a helper in perf_pmus to create this value, in the worst case by scanning sysfs. v2. Add missing close if fdopendir fails. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230623043843.4080180-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Ian Rogers
|
33941dbd14 |
perf unwind: Fix map reference counts
The result of thread__find_map is the map in the passed in addr_location. Calling addr_location__exit puts that map and so copies need to do a map__get. Add in the corresponding map__puts. v2. Add missing map__put when dso is missing. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230623043107.4077510-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Namhyung Kim
|
2d7f5540b8 |
perf script: Initialize buffer for regs_map()
The buffer is used to save register mapping in a sample. Normally
perf samples don't have any register so the string should be empty.
But it missed to initialize the buffer when the size is 0. And it's
passed to PyUnicode_FromString() with a garbage data.
So it returns NULL due to invalid input (instead of an empty unicode
string object) which causes a segfault like below:
Thread 2.1 "perf" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7c83780 (LWP 193775)]
0x00007ffff6dbca2e in PyDict_SetItem () from /lib/x86_64-linux-gnu/libpython3.11.so.1.0
(gdb) bt
#0 0x00007ffff6dbca2e in PyDict_SetItem () from /lib/x86_64-linux-gnu/libpython3.11.so.1.0
#1 0x00007ffff6dbf848 in PyDict_SetItemString () from /lib/x86_64-linux-gnu/libpython3.11.so.1.0
#2 0x000055555575824d in pydict_set_item_string_decref (val=0x0, key=0x5555557f96e3 "iregs", dict=0x7ffff5f7f780)
at util/scripting-engines/trace-event-python.c:145
#3 set_regs_in_dict (evsel=0x555555efc370, sample=0x7fffffffb870, dict=0x7ffff5f7f780)
at util/scripting-engines/trace-event-python.c:776
#4 get_perf_sample_dict (sample=sample@entry=0x7fffffffb870, evsel=evsel@entry=0x555555efc370, al=al@entry=0x7fffffffb2e0,
addr_al=addr_al@entry=0x0, callchain=callchain@entry=0x7ffff63ef440) at util/scripting-engines/trace-event-python.c:923
#5 0x0000555555758ec1 in python_process_tracepoint (sample=0x7fffffffb870, evsel=0x555555efc370, al=0x7fffffffb2e0, addr_al=0x0)
at util/scripting-engines/trace-event-python.c:1044
#6 0x00005555555c5db8 in process_sample_event (tool=<optimized out>, event=<optimized out>, sample=<optimized out>,
evsel=0x555555efc370, machine=0x555555ef4d68) at builtin-script.c:2421
#7 0x00005555556b7793 in perf_session__deliver_event (session=0x555555ef4b60, event=0x7ffff62ff7d0, tool=0x7fffffffc150,
file_offset=30672, file_path=0x555555efb8a0 "perf.data") at util/session.c:1639
#8 0x00005555556bc864 in do_flush (show_progress=true, oe=0x555555efb700) at util/ordered-events.c:245
#9 __ordered_events__flush (oe=oe@entry=0x555555efb700, how=how@entry=OE_FLUSH__FINAL, timestamp=timestamp@entry=0)
at util/ordered-events.c:324
#10 0x00005555556bd06e in ordered_events__flush (oe=oe@entry=0x555555efb700, how=how@entry=OE_FLUSH__FINAL)
at util/ordered-events.c:342
#11 0x00005555556b9d63 in __perf_session__process_events (session=0x555555ef4b60) at util/session.c:2465
#12 perf_session__process_events (session=0x555555ef4b60) at util/session.c:2627
#13 0x00005555555cb1d0 in __cmd_script (script=0x7fffffffc150) at builtin-script.c:2839
#14 cmd_script (argc=<optimized out>, argv=<optimized out>) at builtin-script.c:4365
#15 0x0000555555650811 in run_builtin (p=p@entry=0x555555ed8948 <commands+456>, argc=argc@entry=4, argv=argv@entry=0x7fffffffe240)
at perf.c:323
#16 0x0000555555597eb3 in handle_internal_command (argv=0x7fffffffe240, argc=4) at perf.c:377
#17 run_argv (argv=<synthetic pointer>, argcp=<synthetic pointer>) at perf.c:421
#18 main (argc=4, argv=0x7fffffffe240) at perf.c:537
Fixes:
|
||
Tiezhu Yang
|
765be32b97 |
perf symbol: Add LoongArch case in get_plt_sizes()
We can see the following definitions in bfd/elfnn-loongarch.c: #define PLT_HEADER_INSNS 8 #define PLT_HEADER_SIZE (PLT_HEADER_INSNS * 4) #define PLT_ENTRY_INSNS 4 #define PLT_ENTRY_SIZE (PLT_ENTRY_INSNS * 4) so plt header size is 32 and plt entry size is 16 on LoongArch, let us add LoongArch case in get_plt_sizes(). Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Reviewed-by: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: loongarch@lists.linux.dev Cc: loongson-kernel@lists.loongnix.cn Cc: Ingo Molnar <mingo@redhat.com> Link: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elfnn-loongarch.c Link: https://lore.kernel.org/r/1684835873-15956-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
elisabeth
|
362f9c907f |
perf jit: Fix incorrect file name in DWARF line table
Fixes an issue where an incorrect filename was added in the DWARF line table of an ELF object file when calling 'perf inject --jit' due to not checking the filename of a debug entry against the repeated name marker (/xff/0). The marker is mentioned in the tools/perf/util/jitdump.h header, which describes the jitdump binary format, and indicitates that the filename in a debug entry is the same as the previous enrty. In the function emit_lineno_info(), in the file tools/perf/util/genelf-debug.c, the debug entry filename gets compared to the previous entry filename. If they are not the same, a new filename is added to the DWARF line table. However, since there is no check against '\xff\0', in some cases '\xff\0' is inserted as the filename into the DWARF line table. This can be seen with `objdump --dwarf=line` on the ELF file after `perf inject --jit`. It also makes no source code information show up in 'perf annotate'. Signed-off-by: Elisabeth Panholzer <elisabeth@leaningtech.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20230602123815.255001-1-paniii94@gmail.com [ Fixed a trailing white space, removed a subject prefix ] Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
WANG Rui
|
4ca0d340ce |
perf annotate: Fix instruction association and parsing for LoongArch
In the perf annotate view for LoongArch, there is no arrowed line pointing to the target from the branch instruction. This issue is caused by incorrect instruction association and parsing. $ perf record alloc-6276705c94ad1398 # rust benchmark $ perf report 0.28 │ ori $a1, $zero, 0x63 │ move $a2, $zero 10.55 │ addi.d $a3, $a2, 1(0x1) │ sltu $a4, $a3, $s7 9.53 │ masknez $a4, $s7, $a4 │ sub.d $a3, $a3, $a4 12.12 │ st.d $a1, $fp, 24(0x18) │ st.d $a3, $fp, 16(0x10) 16.29 │ slli.d $a2, $a2, 0x2 │ ldx.w $a2, $s8, $a2 12.77 │ st.w $a2, $sp, 724(0x2d4) │ st.w $s0, $sp, 720(0x2d0) 7.03 │ addi.d $a2, $sp, 720(0x2d0) │ addi.d $a1, $a1, -1(0xfff) 12.03 │ move $a2, $a3 │ → bne $a1, $s3, -52(0x3ffcc) # 82ce8 <test::bench::Bencher::iter+0x3f4> 2.50 │ addi.d $a0, $a0, 1(0x1) This patch fixes instruction association issues, such as associating branch instructions with jump_ops instead of call_ops, and corrects false instruction matches. It also implements branch instruction parsing specifically for LoongArch. With this patch, we will be able to see the arrowed line. 0.79 │3ec: ori $a1, $zero, 0x63 │ move $a2, $zero 10.32 │3f4:┌─→addi.d $a3, $a2, 1(0x1) │ │ sltu $a4, $a3, $s7 10.44 │ │ masknez $a4, $s7, $a4 │ │ sub.d $a3, $a3, $a4 14.17 │ │ st.d $a1, $fp, 24(0x18) │ │ st.d $a3, $fp, 16(0x10) 13.15 │ │ slli.d $a2, $a2, 0x2 │ │ ldx.w $a2, $s8, $a2 11.00 │ │ st.w $a2, $sp, 724(0x2d4) │ │ st.w $s0, $sp, 720(0x2d0) 8.00 │ │ addi.d $a2, $sp, 720(0x2d0) │ │ addi.d $a1, $a1, -1(0xfff) 11.99 │ │ move $a2, $a3 │ └──bne $a1, $s3, 3f4 3.17 │ addi.d $a0, $a0, 1(0x1) Signed-off-by: WANG Rui <wangrui@loongson.cn> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: loongarch@lists.linux.dev Cc: loongson-kernel@lists.loongnix.cn Cc: Huacai Chen <chenhuacai@loongson.cn> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: WANG Xuerui <kernel@xen0n.name> Link: https://lore.kernel.org/r/20230620132025.105563-1-wangrui@loongson.cn Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Ian Rogers
|
2e9f9d4a72 |
perf annotation: Switch lock from a mutex to a sharded_mutex
Remove the "struct mutex lock" variable from annotation that is allocated per symbol. This removes in the region of 40 bytes per symbol allocation. Use a sharded mutex where the number of shards is set to the number of CPUs. Assuming good hashing of the annotation (done based on the pointer), this means in order to contend there needs to be more threads than CPUs, which is not currently true in any perf command. Were contention an issue it is straightforward to increase the number of shards in the mutex. On my Debian/glibc based machine, this reduces the size of struct annotation from 136 bytes to 96 bytes, or nearly 30%. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Andres Freund <andres@anarazel.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Yuan Can <yuancan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230615040715.2064350-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Ian Rogers
|
0650b2b2e6 |
perf sharded_mutex: Introduce sharded_mutex
Per object mutexes may come with significant memory cost while a global mutex can suffer from unnecessary contention. A sharded mutex is a compromise where objects are hashed and then a particular mutex for the hash of the object used. Contention can be controlled by the number of shards. v2. Use hashmap.h's hash_bits in case of contention from alignment of objects. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Andres Freund <andres@anarazel.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Yuan Can <yuancan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230615040715.2064350-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Li Dong
|
5e37ef5c2a |
tools: Fix incorrect calculation of object size by sizeof
What we need to calculate is the size of the object, not the size of the
pointer.
Fixed:
|
||
baomingtong001@208suo.com
|
240de691dd |
perf parse-events: Remove unneeded semicolon
./tools/perf/util/parse-events.c:1466:2-3: Unneeded semicolon Signed-off-by: Mingtong Bao <baomingtong001@208suo.com> Link: https://lore.kernel.org/r/2c733a91717eae93119ba2226420fd8f@208suo.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Yang Jihong
|
bc06026d14 |
perf parse: Add missing newline to pr_debug message in evsel__compute_group_pmu_name()
The newline is missing for pr_debug message in evsel__compute_group_pmu_name(), fix it. Before: # perf --debug verbose=2 record -e cpu-clock true <SNIP> No PMU found for 'cycles:u'No PMU found for 'instructions:u'------------------------------------------------------------ perf_event_attr: type 1 size 136 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|PERIOD read_format ID|LOST disabled 1 inherit 1 mmap 1 comm 1 freq 1 enable_on_exec 1 task 1 sample_id_all 1 exclude_guest 1 mmap2 1 comm_exec 1 ksymbol 1 bpf_event 1 ------------------------------------------------------------ <SNIP> After: # perf --debug verbose=2 record -e cpu-clock true <SNIP> No PMU found for 'cycles:u' No PMU found for 'instructions:u' ------------------------------------------------------------ perf_event_attr: type 1 size 136 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|PERIOD read_format ID|LOST disabled 1 inherit 1 mmap 1 comm 1 freq 1 enable_on_exec 1 task 1 sample_id_all 1 exclude_guest 1 mmap2 1 comm_exec 1 ksymbol 1 bpf_event 1 ------------------------------------------------------------ <SNIP> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: mark.rutland@arm.com Cc: irogers@google.com Cc: peterz@infradead.org Cc: adrian.hunter@intel.com Cc: acme@kernel.org Cc: jolsa@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: kan.liang@linux.intel.com Cc: mingo@redhat.com Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Link: https://lore.kernel.org/r/20230616024515.80814-1-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> |
||
Arnaldo Carvalho de Melo
|
82fe2e45cd |
perf pmus: Check if we can encode the PMU number in perf_event_attr.type
In some architectures we can't encode the PMU number in perf_event_attr.type and thus can't just ask for the same event in multiple CPUs (and thus PMUs), that is what we want in hybrid systems but we can't when that encoding isn't understood by the kernel, such as in ARM64's big.LITTLE. If that is the case, fallback to the previous behaviour till we find a better solution to have consistent output accross architectures with hybrid CPU configurations. Co-developed-with: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/linux-perf-users/ZIzYgImv61OGK1wA@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
e2be06662c |
perf print-events: Export is_event_supported()
Will be used when checking if we can encode the PMU number in perf_event_attr.type, part of the logic to use in hybrid systems (multiple types of CPUs, such as Intel's (Alder Lake, etc) or ARM's big.LITTLE). Co-developed-with: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/linux-perf-users/ZIzYgImv61OGK1wA@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ravi Bangoria
|
5752c20f37 |
perf mem: Scan all PMUs instead of just core ones
Scanning only core PMUs is not sufficient on platforms like AMD since perf mem on AMD uses IBS OP PMU, which is independent of core PMU. Scan all PMUs instead of just core PMUs. There should be negligible performance overhead because of scanning all PMUs, so we should be okay. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20230615051700.1833-4-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ravi Bangoria
|
f0dc208267 |
perf mem amd: Fix perf_pmus__num_mem_pmus()
perf mem/c2c on AMD internally uses IBS OP PMU, not the core PMU. Also, AMD platforms does not have heterogeneous PMUs. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20230615051700.1833-3-ravi.bangoria@amd.com [ Added the improved comment for perf_pmus__num_mem_pmus() as b4 didn't from the per-patch (not series) newer version ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ravi Bangoria
|
cddfc5fb3f |
perf pmus: Describe semantics of 'core_pmus' and 'other_pmus'
Notion of 'core_pmus' and 'other_pmus' are independent of hw core and uncore pmus. For example, AMD IBS PMUs are present in each SMT-thread but they belongs to 'other_pmus'. Add a comment describing what these list contains and how they are treated. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20230615051700.1833-2-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
dada1a1f5f |
perf stat: Show average value on multiple runs
When -r option is used, perf stat runs the command multiple times and update stats in the evsel->stats.res_stats for global aggregation. But the value is never used and the value it prints at the end is just the value from the last run. I think we should print the average number of multiple runs. Add evlist__copy_res_stats() to update the aggr counter (for display) using the values in the evsel->stats.res_stats. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230616073211.1057936-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Thomas Richter
|
6fbd67b0f0 |
perf test: fix failing test cases on linux-next for s390
In linux-next tree the many test cases fail on s390x when running the
perf test suite, sometime the perf tool dumps core.
Output before:
6.1: Test event parsing : FAILED!
10.3: Parsing of PMU event table metrics : FAILED!
10.4: Parsing of PMU event table metrics with fake PMUs: FAILED!
17: Setup struct perf_event_attr : FAILED!
24: Number of exit events of a simple workload : FAILED!
26: Object code reading : FAILED!
28: Use a dummy software event to keep tracking : FAILED!
35: Track with sched_switch : FAILED!
42.3: BPF prologue generation : FAILED!
66: Parse and process metrics : FAILED!
68: Event expansion for cgroups : FAILED!
69.2: Perf time to TSC : FAILED!
74: build id cache operations : FAILED!
86: Zstd perf.data compression/decompression : FAILED!
87: perf record tests : FAILED!
106: Test java symbol : FAILED!
The reason for all these failure is a missing PMU. On s390x the PMU is
named cpum_cf which is not detected as core PMU. A similar patch was
added before, see commit
|
||
Vincent Whitchurch
|
66dc1920f6 |
perf annotate: Work with vmlinux outside symfs
It is currently possible to use --symfs along with a vmlinux which lies outside of the symfs by passing an absolute path to --vmlinux, thanks to the check in dso__load_vmlinux() which handles this explicitly. However, the annotate code lacks this check and thus 'perf annotate' does not work ("Internal error: Invalid -1 error code") for kernel functions with this combination. Add the missing handling. Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel@axis.com Link: https://lore.kernel.org/r/20221125114210.2353820-1-vincent.whitchurch@axis.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Kan Liang
|
6a80d794d7 |
perf stat: New metricgroup output for the default mode
In the default mode, the current output of the metricgroup include both events and metrics, which is not necessary and just makes the output hard to read. Since different ARCHs (even different generations in the same ARCH) may use different events. The output also vary on different platforms. For a metricgroup, only outputting the value of each metric is good enough. Add a new field default_metricgroup in evsel to indicate an event of the default metricgroup. For those events, printout() should print the metricgroup name rather than each event. Add perf_stat__skip_metric_event() to skip the evsel in the Default metricgroup, if it's not running or not the metric event. Add print_metricgroup_header_t to pass the functions which print the display name of each metricgroup in the Default metricgroup. Support all three output methods. Factor out perf_stat__print_shadow_stats_metricgroup() to print out each metrics. On SPR: Before: ./perf_old stat sleep 1 Performance counter stats for 'sleep 1': 0.54 msec task-clock:u # 0.001 CPUs utilized 0 context-switches:u # 0.000 /sec 0 cpu-migrations:u # 0.000 /sec 68 page-faults:u # 125.445 K/sec 540,970 cycles:u # 0.998 GHz 556,325 instructions:u # 1.03 insn per cycle 123,602 branches:u # 228.018 M/sec 6,889 branch-misses:u # 5.57% of all branches 3,245,820 TOPDOWN.SLOTS:u # 18.4 % tma_backend_bound # 17.2 % tma_retiring # 23.1 % tma_bad_speculation # 41.4 % tma_frontend_bound 564,859 topdown-retiring:u 1,370,999 topdown-fe-bound:u 603,271 topdown-be-bound:u 744,874 topdown-bad-spec:u 12,661 INT_MISC.UOP_DROPPING:u # 23.357 M/sec 1.001798215 seconds time elapsed 0.000193000 seconds user 0.001700000 seconds sys After: $ ./perf stat sleep 1 Performance counter stats for 'sleep 1': 0.51 msec task-clock:u # 0.001 CPUs utilized 0 context-switches:u # 0.000 /sec 0 cpu-migrations:u # 0.000 /sec 68 page-faults:u # 132.683 K/sec 545,228 cycles:u # 1.064 GHz 555,509 instructions:u # 1.02 insn per cycle 123,574 branches:u # 241.120 M/sec 6,957 branch-misses:u # 5.63% of all branches TopdownL1 # 17.5 % tma_backend_bound # 22.6 % tma_bad_speculation # 42.7 % tma_frontend_bound # 17.1 % tma_retiring TopdownL2 # 21.8 % tma_branch_mispredicts # 11.5 % tma_core_bound # 13.4 % tma_fetch_bandwidth # 29.3 % tma_fetch_latency # 2.7 % tma_heavy_operations # 14.5 % tma_light_operations # 0.8 % tma_machine_clears # 6.1 % tma_memory_bound 1.001712086 seconds time elapsed 0.000151000 seconds user 0.001618000 seconds sys Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230616031420.3751973-3-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Kan Liang
|
1c0e47956a |
perf metrics: Sort the Default metricgroup
The new default mode will print the metrics as a metric group. The metrics from the same metric group must be adjacent to each other in the metric list. But the metric_list_cmp() sorts metrics by the number of events. Add a new sort for the Default metricgroup, which sorts by default_metricgroup_name and metric_name. Add is_default in the struct metric_event to indicate that it's from the Default metricgroup. Store the displayed metricgroup name of the Default metricgroup into the metric expr for output. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230616031420.3751973-2-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Kan Liang
|
b0a9e8f81f |
perf stat,jevents: Introduce Default tags for the default mode
Introduce a new metricgroup, Default, to tag all the metric groups which will be collected in the default mode. Add a new field, DefaultMetricgroupName, in the JSON file to indicate the real metric group name. It will be printed in the default output to replace the event names. There is nothing changed for the output format. On SPR, both TopdownL1 and TopdownL2 are displayed in the default output. On ARM, Intel ICL and later platforms (before SPR), only TopdownL1 is displayed in the default output. Suggested-by: Stephane Eranian <eranian@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230615135315.3662428-4-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Kan Liang
|
e15e4a3d7d |
perf evsel: Fix the annotation for hardware events on hybrid
The annotation for hardware events is wrong on hybrid. For example, # ./perf stat -a sleep 1 Performance counter stats for 'system wide': 32,148.85 msec cpu-clock # 32.000 CPUs utilized 374 context-switches # 11.633 /sec 33 cpu-migrations # 1.026 /sec 295 page-faults # 9.176 /sec 18,979,960 cpu_core/cycles/ # 590.378 K/sec 261,230,783 cpu_atom/cycles/ # 8.126 M/sec (54.21%) 17,019,732 cpu_core/instructions/ # 529.404 K/sec 38,020,470 cpu_atom/instructions/ # 1.183 M/sec (63.36%) 3,296,743 cpu_core/branches/ # 102.546 K/sec 6,692,338 cpu_atom/branches/ # 208.167 K/sec (63.40%) 96,421 cpu_core/branch-misses/ # 2.999 K/sec 1,016,336 cpu_atom/branch-misses/ # 31.613 K/sec (63.38%) The hardware events have extended type on hybrid, but the evsel__match() doesn't take it into account. Filter the config on hybrid before checking. With the patch, # ./perf stat -a sleep 1 Performance counter stats for 'system wide': 32,139.90 msec cpu-clock # 32.003 CPUs utilized 343 context-switches # 10.672 /sec 32 cpu-migrations # 0.996 /sec 73 page-faults # 2.271 /sec 13,712,841 cpu_core/cycles/ # 0.000 GHz 258,301,691 cpu_atom/cycles/ # 0.008 GHz (54.20%) 12,428,163 cpu_core/instructions/ # 0.91 insn per cycle 37,786,557 cpu_atom/instructions/ # 2.76 insn per cycle (63.35%) 2,418,826 cpu_core/branches/ # 75.259 K/sec 6,965,962 cpu_atom/branches/ # 216.739 K/sec (63.38%) 72,150 cpu_core/branch-misses/ # 2.98% of all branches 1,032,746 cpu_atom/branch-misses/ # 42.70% of all branches (63.35%) Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230615135315.3662428-2-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
e90208e9ff |
perf srcline: Fix handling of inline functions
We write an address then a ',' to addr2line. With inline data we
generally get back (// are my comments):
0x1234 // address
foo // function name
foo.c:123 // filename:line
bar // function name
bar.c:123 // filename:line
0x000000000000000 // sentinel address created by ','
?? // unknown function name
??:0 // unknown filename:line
The code was assuming the inline data also had the address, which is
incorrect. This means the first inline function name (bar above) needs
to be checked to see if it is the sentinel, otherwise to be treated as
a function name. The regression was caused by the addition of
addresses as the kernel is reporting a symbol at address 0 (used by
GNU binutils when it interprets ',').
Committer testing:
Using:
# perf trace --call-graph=dwarf -e lock:contention_*
<SNIP>
1244.615 TaskCon~ller #/2645281 lock:contention_begin(lock_addr: 0xffff8e6748da5ab0, flags: 2)
__preempt_count_dec_and_test (inlined)
trace_contention_begin (inlined)
trace_contention_begin (inlined)
rwsem_down_read_slowpath ([kernel.kallsyms])
__preempt_count_dec_and_test (inlined)
trace_contention_begin (inlined)
trace_contention_begin (inlined)
rwsem_down_read_slowpath ([kernel.kallsyms])
__down_read_common (inlined)
__down_read (inlined)
down_read ([kernel.kallsyms])
arch_static_branch (inlined)
static_key_false (inlined)
__mmap_lock_trace_acquire_returned (inlined)
mmap_read_lock (inlined)
do_user_addr_fault ([kernel.kallsyms])
arch_local_irq_disable (inlined)
handle_page_fault (inlined)
exc_page_fault ([kernel.kallsyms])
asm_exc_page_fault ([kernel.kallsyms])
[0x4def008] (/usr/lib64/firefox/libxul.so)
1244.619 TaskCon~ller #/2645281 lock:contention_end(lock_addr: 0xffff8e6748da5ab0)
__preempt_count_dec_and_test (inlined)
trace_contention_end (inlined)
trace_contention_end (inlined)
rwsem_down_read_slowpath ([kernel.kallsyms])
__preempt_count_dec_and_test (inlined)
trace_contention_end (inlined)
trace_contention_end (inlined)
rwsem_down_read_slowpath ([kernel.kallsyms])
__down_read_common (inlined)
__down_read (inlined)
down_read ([kernel.kallsyms])
arch_static_branch (inlined)
static_key_false (inlined)
__mmap_lock_trace_acquire_returned (inlined)
mmap_read_lock (inlined)
do_user_addr_fault ([kernel.kallsyms])
arch_local_irq_disable (inlined)
handle_page_fault (inlined)
exc_page_fault ([kernel.kallsyms])
asm_exc_page_fault ([kernel.kallsyms])
<SNIP>
Fixes:
|
||
Ian Rogers
|
701677b957 |
perf srcline: Add a timeout to reading from addr2line
addr2line may fail to send expected values causing perf to wait indefinitely. Add a 1 second timeout (twice the timeout for reading from /proc/pid/maps) so that such reads don't cause perf to appear to lock up. There are already checks that the file for addr2line contains a debug section but this isn't always sufficient. The problem was observed when a valid elf file would set the configuration for binutils addr2line, then a later read of vmlinux with ELF debug sections would cause a failing write/read which would block indefinitely. As a service to future readers, if the io hits eof or an error, cleanup the addr2line process. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20230608061812.3715566-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
6ec9503f45 |
perf parse-events: Avoid string for PE_BP_COLON, PE_BP_SLASH
There's no need to read the string ':' or '/' for PE_BP_COLON or
PE_BP_SLASH and doing so causes parse-events.y to leak memory.
The original patch has a committer note about not using these tokens
presumably as yacc spotted they were a memory leak because no
%destructor could be run. Remove the unused token workaround as there
is now no value associated with these tokens.
Fixes:
|
||
Kan Liang
|
e4c4e8a538 |
perf metric: Fix no group check
The no group check fails if there is more than one meticgroup in the
metricgroup_no_group.
The first parameter of the match_metric() should be the string, while
the substring should be the second parameter.
Fixes:
|
||
Ian Rogers
|
8dc26b6f71 |
perf srcline: Make sentinel reading for binutils addr2line more robust
The addr2line process is sent an address then multiple function, filename:line "records" are read. To detect the end of output a ',' is sent and for llvm-addr2line a ',' is then read back showing the end of addrline's output. For binutils addr2line the ',' translates to address 0 and we expect the bogus filename marker "??:0" (see filename_split) to be sent from addr2line. For some kernels address 0 may have a mapping and so a seemingly valid inline output is given and breaking the sentinel discovery: ``` $ addr2line -e vmlinux -f -i , __per_cpu_start ./arch/x86/kernel/cpu/common.c:1850 ``` To avoid this problem enable the address dumping for addr2line (the -a option). If an address of 0x0000000000000000 is read then this is the sentinel value working around the problem above. The filename_split still needs to check for "??:0" as bogus non-zero addresses also need handling. Reported-by: Changbin Du <changbin.du@huawei.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Changbin Du <changbin.du@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Rix <trix@redhat.com> Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230613034817.1356114-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
c7a0023a14 |
perf srcline: Make addr2line configuration failure more verbose
To aid debugging why it fails. Also, combine the loops for reading a line for the llvm/binutils cases. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Changbin Du <changbin.du@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Rix <trix@redhat.com> Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230613034817.1356114-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
7f911905ff |
perf dwarf-aux: Allow unnamed struct/union/enum
It's possible some struct/union/enum type don't have type name. Allow the empty name after "struct"/"union"/"enum" string rather than fail. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230612234102.3909116-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
3abfcfd847 |
perf dwarf-aux: Fix off-by-one in die_get_varname()
The die_get_varname() returns "(unknown_type)" string if it failed to
find a type for the variable. But it had a space before the opening
parenthesis and it made the closing parenthesis cut off due to the
off-by-one in the string length (14).
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Fixes:
|
||
Arnaldo Carvalho de Melo
|
d15b8c76c9 |
perf pfm: Remove duplicate util/cpumap.h include
Fixes:
|
||
Namhyung Kim
|
103b3d2f94 |
perf annotate: Allow whitespace between insn operands
The llvm-objdump adds a space between the operands while GNU objdump does not. Allow a space to handle the both. In GNU objdump: Disassembly of section .text: here | ffffffff81000000 <_stext>: v ffffffff81000000: 48 8d 25 51 1f 40 01 lea 0x1401f51(%rip),%rsp ffffffff81000007: e8 d4 00 00 00 call ffffffff810000e0 <verify_cpu> ffffffff8100000c: 48 8d 3d ed ff ff ff lea -0x13(%rip),%rdi In llvm-objdump: Disassembly of section .text: here | ffffffff81000000 <startup_64>: v ffffffff81000000: 48 8d 25 51 1f 40 01 leaq 20979537(%rip), %rsp ffffffff81000007: e8 d4 00 00 00 callq 0xffffffff810000e0 <verify_cpu> ffffffff8100000c: 48 8d 3d ed ff ff ff leaq -19(%rip), %rdi Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230612230026.3887586-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
0f0d1354a5 |
perf help: Ensure clean_cmds is called on all paths
Avoid potential memory leaks. Committer notes: This is right before calling exit(1), so just to clean up memory leak checker detection. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: André Almeida <andrealmeid@igalia.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230611233610.953456-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
d927ef5004 |
perf cs-etm: Add exception level consistency check
Assert that our own tracking of the exception level matches what OpenCSD provides. OpenCSD doesn't distinguish between EL0 and EL1 in the memory access callback so the extra tracking was required. But a rough assert can still be done. Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230612111403.100613-6-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
8d3031d39f |
perf cs-etm: Track exception level
Currently we assume all trace belongs to the host machine so when the decoder should be looking at the guest kernel maps it can crash because it looks at the host ones instead. Avoid one scenario (guest kernel running at EL1) by assigning the default guest machine to this trace. For userspace trace it's still not possible to determine guest vs host, but the PIDs should help in this case. Committer notes: Fixed up conflict with: perf addr_location: Add init/exit/copy functions That was only on tmp.perf-tools-next. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230612111403.100613-5-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
5414b53261 |
perf cs-etm: Make PID format accessible from struct cs_etm_auxtrace
To avoid every user of PID format having to use their own static local variable, cache it on initialisation and change the accessor to take struct cs_etm_auxtrace. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230612111403.100613-4-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |