Commit Graph

16437 Commits

Author SHA1 Message Date
Ian Rogers
071b117e75 perf stat: Use perf_tool__init()
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and no longer rely on perf_tool__fill_defaults().

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-17-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:10:52 -03:00
Ian Rogers
113f614c6d perf report: Use perf_tool__init()
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and no longer rely on perf_tool__fill_defaults().

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-16-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:10:27 -03:00
Ian Rogers
a37c0436f3 perf inject: Use perf_tool__init()
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and no longer rely on perf_tool__fill_defaults().

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-15-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:10:10 -03:00
Ian Rogers
2fa28ccb17 perf script: Use perf_tool__init()
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and no longer rely on perf_tool__fill_defaults().

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-14-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:09:49 -03:00
Ian Rogers
6bfb6df866 perf c2c: Use perf_tool__init()
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and no longer rely on perf_tool__fill_defaults().

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-13-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:09:32 -03:00
Ian Rogers
cecb1cf154 perf record: Use perf_tool__init()
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and no longer rely on perf_tool__fill_defaults().

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-12-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:09:05 -03:00
Ian Rogers
419cbc44f5 perf evlist: Use perf_tool__init()
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and no longer rely on perf_tool__fill_defaults().

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-11-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:08:35 -03:00
Ian Rogers
b4fd4d00f9 perf lock: Use perf_tool__init()
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and no longer rely on perf_tool__fill_defaults().

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-10-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:08:08 -03:00
Ian Rogers
a01a5ef988 perf kvm: Use perf_tool__init()
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and no longer rely on perf_tool__fill_defaults().

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-9-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:07:40 -03:00
Ian Rogers
584a268f50 perf buildid-list: Use perf_tool__init
Reduce the scope of build_id__mark_dso_hit_ops() to that of the function
perf_session__list_build_ids(), its only user, and use perf_tool__init()
for the default values. Move perf_event__exit_del_thread() to event.[ch]
so it can be used in builtin-buildid-list.c.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:07:10 -03:00
Ian Rogers
f32b37cc78 perf kmem: Use perf_tool__init
Reduce the scope of the tool from global/static to just that of the
cmd_kmem function where the session is scoped. Use perf_tool__init()
to initialize default values.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:06:48 -03:00
Ian Rogers
ae737b6102 perf tool: Add perf_tool__init()
Add an init function that behaves like perf_tool__fill_defaults() but
assumes that no values have been initialized yet.
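
The difference between the two styles can be sketched as follows. This
is a minimal illustration, not the perf code: 'struct demo_tool',
demo_stub() and both demo_tool__*() functions are hypothetical
stand-ins, and the 'ordered_events' parameter is an assumption about
what an init function might take.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for 'struct perf_tool'. */
struct demo_tool {
	int (*sample)(const struct demo_tool *tool, int value);
	int (*comm)(const struct demo_tool *tool, int value);
	int ordered_events;
};

/* Default stub: ignore the event and report success. */
static int demo_stub(const struct demo_tool *tool, int value)
{
	(void)tool;
	(void)value;
	return 0;
}

/*
 * fill_defaults style: the struct must already hold valid contents
 * (for example, be zero-initialized); only callbacks that are still
 * NULL are pointed at the stub.
 */
static void demo_tool__fill_defaults(struct demo_tool *tool)
{
	assert(tool != NULL);
	if (tool->sample == NULL)
		tool->sample = demo_stub;
	if (tool->comm == NULL)
		tool->comm = demo_stub;
}

/*
 * init style: assume nothing about the previous contents and set
 * every field unconditionally.
 */
static void demo_tool__init(struct demo_tool *tool, int ordered_events)
{
	assert(tool != NULL);
	tool->ordered_events = ordered_events;
	tool->sample = demo_stub;
	tool->comm = demo_stub;
}
```

With the init style, a caller would initialize the struct first and
then override only the callbacks it cares about, after which the struct
no longer needs to be written to.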

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:06:26 -03:00
Ian Rogers
564e5cbcfd perf tool: Move fill defaults into tool.c
The aim here is to eventually make perf_tool__fill_defaults() an init
function so that the tool struct can be const in more places.

Create a tool.c to go along with tool.h. Move perf_tool__fill_defaults()
out of session.c into tool.c along with the default stub values. Add
perf_tool__compressed_is_stub() for a test in
perf_session__process_user_event().
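
A check of this kind can be sketched as a function pointer comparison
against the default stub. This is only an illustration under assumed
names: 'struct zdemo_tool', zdemo_compressed_stub(), zdemo_decompress()
and zdemo_tool__compressed_is_stub() are hypothetical and do not
reproduce the perf implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tool with a single 'compressed' callback. */
struct zdemo_tool {
	int (*compressed)(const struct zdemo_tool *tool,
			  const void *data, size_t len);
};

/* Default stub, private to this file: does nothing. */
static int zdemo_compressed_stub(const struct zdemo_tool *tool,
				 const void *data, size_t len)
{
	(void)tool;
	(void)data;
	(void)len;
	return 0;
}

/* A "real" handler, here just reporting how many bytes it was given. */
static int zdemo_decompress(const struct zdemo_tool *tool,
			    const void *data, size_t len)
{
	(void)tool;
	(void)data;
	return (int)len;
}

/*
 * Because the stub is static, code in other files cannot compare
 * against it directly; a small helper exposes the answer instead.
 */
static bool zdemo_tool__compressed_is_stub(const struct zdemo_tool *tool)
{
	assert(tool != NULL);
	return tool->compressed == zdemo_compressed_stub;
}
```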

perf_session__process_compressed_event() is only used as a default
initialized callback, so migrate it into tool.c.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:05:39 -03:00
Ian Rogers
30f29bae91 perf tool: Constify tool pointers
The tool pointer (to a struct largely of function pointers) is passed
around but is unchanged except at initialization. Change parameter and
variable types to be const to limit what can happen to a tool after it
has been set up.
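
What the const qualifier buys can be sketched like this. The names
'struct cdemo_tool', cdemo_double() and cdemo_deliver() are
hypothetical and only illustrate the pattern of passing a callback
table by pointer to const.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback table, standing in for a tool struct. */
struct cdemo_tool {
	int (*sample)(const struct cdemo_tool *tool, int value);
};

/* Example callback: returns twice its input. */
static int cdemo_double(const struct cdemo_tool *tool, int value)
{
	(void)tool;
	return 2 * value;
}

/*
 * Taking 'const struct cdemo_tool *' documents, and lets the compiler
 * enforce, that delivering an event never rewrites the callbacks.
 */
static int cdemo_deliver(const struct cdemo_tool *tool, int value)
{
	assert(tool != NULL && tool->sample != NULL);
	/* tool->sample = NULL;  would not compile: 'tool' points to const */
	return tool->sample(tool, value);
}
```

A tool that is fully set up at its definition can then itself be
declared const and passed to such functions unchanged.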

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:05:14 -03:00
Ian Rogers
1816dc4bc5 perf s390-cpumsf: Remove unused struct
struct s390_cpumsf_synth was likely cargo-culted from other auxtrace
examples. It has no users, so remove it.

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:04:56 -03:00
Ian Rogers
4e322c7855 perf auxtrace: Remove dummy tools
Add perf_session__deliver_synth_attr_event(), which synthesizes a
perf_record_header_attr event with one id. Remove the use of
perf_event__synthesize_attr(), which necessitated the dummy tool in
order to pass the session.

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:04:36 -03:00
Ian Rogers
79bcd34e0f perf inject: Fix leader sampling inserting additional samples
The processing of leader samples would turn an individual sample with
a group of read values into multiple samples. 'perf inject' would pass
through the additional samples increasing the output data file size:

  $ perf record -g -e "{instructions,cycles}:S" -o perf.orig.data true
  $ perf script -D -i perf.orig.data | sed -e 's/perf.orig.data/perf.data/g' > orig.txt
  $ perf inject -i perf.orig.data -o perf.new.data
  $ perf script -D -i perf.new.data | sed -e 's/perf.new.data/perf.data/g' > new.txt
  $ diff -u orig.txt new.txt
  --- orig.txt    2024-07-29 14:29:40.606576769 -0700
  +++ new.txt     2024-07-29 14:30:04.142737434 -0700
  ...
  -0xc550@perf.data [0x30]: event: 3
  +0xc550@perf.data [0xd0]: event: 9
  +.
  +. ... raw event: size 208 bytes
  +.  0000:  09 00 00 00 01 00 d0 00 fc 72 01 86 ff ff ff ff  .........r......
  +.  0010:  74 7d 2c 00 74 7d 2c 00 fb c3 79 f9 ba d5 05 00  t},.t},...y.....
  +.  0020:  e6 cb 1a 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
  +.  0030:  02 00 00 00 00 00 00 00 76 01 00 00 00 00 00 00  ........v.......
  +.  0040:  e6 cb 1a 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  +.  0050:  62 18 00 00 00 00 00 00 f6 cb 1a 00 00 00 00 00  b...............
  +.  0060:  00 00 00 00 00 00 00 00 0c 00 00 00 00 00 00 00  ................
  +.  0070:  80 ff ff ff ff ff ff ff fc 72 01 86 ff ff ff ff  .........r......
  +.  0080:  f3 0e 6e 85 ff ff ff ff 0c cb 7f 85 ff ff ff ff  ..n.............
  +.  0090:  bc f2 87 85 ff ff ff ff 44 af 7f 85 ff ff ff ff  ........D.......
  +.  00a0:  bd be 7f 85 ff ff ff ff 26 d0 7f 85 ff ff ff ff  ........&.......
  +.  00b0:  6d a4 ff 85 ff ff ff ff ea 00 20 86 ff ff ff ff  m......... .....
  +.  00c0:  00 fe ff ff ff ff ff ff 57 14 4f 43 fc 7e 00 00  ........W.OC.~..
  +
  +1642373909693435 0xc550 [0xd0]: PERF_RECORD_SAMPLE(IP, 0x1): 2915700/2915700: 0xffffffff860172fc period: 1 addr: 0
  +... FP chain: nr:12
  +.....  0: ffffffffffffff80
  +.....  1: ffffffff860172fc
  +.....  2: ffffffff856e0ef3
  +.....  3: ffffffff857fcb0c
  +.....  4: ffffffff8587f2bc
  +.....  5: ffffffff857faf44
  +.....  6: ffffffff857fbebd
  +.....  7: ffffffff857fd026
  +.....  8: ffffffff85ffa46d
  +.....  9: ffffffff862000ea
  +..... 10: fffffffffffffe00
  +..... 11: 00007efc434f1457
  +... sample_read:
  +.... group nr 2
  +..... id 00000000001acbe6, value 0000000000000176, lost 0
  +..... id 00000000001acbf6, value 0000000000001862, lost 0
  +
  +0xc620@perf.data [0x30]: event: 3
  ...

This behavior is incorrect: in the case above, 'perf inject' should
have done nothing. Fix it by not separating samples for a tool that
requests this. Only 'perf inject' requests it, so other perf tools are
not affected. With the patch, the test above shows no differences
between orig.txt and new.txt.
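
The opt-out can be pictured with a small model. This is a sketch under
assumed names, not the perf code: 'struct group_sample',
'struct split_tool', its 'keep_group_samples' flag and
samples_to_deliver() are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one sample carrying a group of read values. */
struct group_sample {
	size_t nr;	/* number of counters read in the group */
};

/* Hypothetical tool with an opt-out flag for splitting group samples. */
struct split_tool {
	bool keep_group_samples;
};

/*
 * Return how many samples reach the tool's callback for one input
 * sample: one per group member by default, or exactly one when the
 * tool asked for group samples to be passed through untouched.
 */
static size_t samples_to_deliver(const struct split_tool *tool,
				 const struct group_sample *sample)
{
	assert(tool != NULL && sample != NULL);
	return tool->keep_group_samples ? 1 : sample->nr;
}
```

In this model a tool that copies events to an output file would set
the flag, so one input sample yields one output sample, while an
analysis tool would leave it unset and see one sample per counter.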

Fixes: e4caec0d1a ("perf evsel: Add PERF_SAMPLE_READ sample related processing")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240729220620.2957754-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:04:35 -03:00
Namhyung Kim
7f3c8f13ad perf annotate-data: Show first-level children by default in TUI
The default is now to fold everything, but that only shows the name of
the top-level data type, which is not very useful. Instead, expand just
the top-level entry so that it shows the layout at a higher level.

  Annotate type: 'struct task_struct' (4 samples)
        Percent     Offset       Size  Field
  -      100.00          0       9792  struct task_struct {                           ◆
  +        0.50          0         24      struct thread_info     thread_info;        ▒
           0.00         24          4      unsigned int   __state;                    ▒
           0.00         32          8      void*  stack;                              ▒
  +        0.00         40          4      refcount_t     usage;                      ▒
           0.00         44          4      unsigned int   flags;                      ▒
           0.00         48          4      unsigned int   ptrace;                     ▒
           0.00         52          4      int    on_cpu;                             ▒
  +        0.00         56         16      struct __call_single_node      wake_entry; ▒
           0.00         72          4      unsigned int   wakee_flips;                ▒
           0.00         80          8      long unsigned int      wakee_flip_decay_ts;▒
           0.00         88          8      struct task_struct*    last_wakee;         ▒
           0.00         96          4      int    recent_used_cpu;                    ▒
           0.00        100          4      int    wake_cpu;                           ▒
           0.00        104          4      int    on_rq;                              ▒
           0.00        108          4      int    prio;                               ▒
           0.00        112          4      int    static_prio;                        ▒
           0.00        116          4      int    normal_prio;                        ▒
           0.00        120          4      unsigned int   rt_priority;                ▒
  +        0.00        128        256      struct sched_entity    se;                 ▒
  +        0.00        384         48      struct sched_rt_entity rt;                 ▒
  +        0.00        432        224      struct sched_dl_entity dl;                 ▒
           0.00        656          8      struct sched_class*    sched_class;        ▒
  ...

Committer testing:

  # perf mem record -a sleep 5s
  # perf annotate --group --data-type=pthread_mutex_t

 Annotate type: 'pthread_mutex_t' (13 samples)
      Percent     Offset       Size  Field
-      100.00          0         40  pthread_mutex_t {                                ▒
-      100.00          0         40      struct __pthread_mutex_s       __data {      ▒
        39.45          0          4          int        __lock;                       ▒
         0.00          4          4          unsigned int       __count;              ▒
         7.80          8          4          int        __owner;                      ▒
         6.88         12          4          unsigned int       __nusers;             ▒
        45.87         16          4          int        __kind;                       ▒
         0.00         20          2          short int  __spins;                      ▒
         0.00         22          2          short int  __elision;                    ▒
+        0.00         24         16          __pthread_list_t   __list;               ▒
                                         };                                           ▒
         0.00          0          0      char[] __size;                               ▒
        39.45          0          8      long int       __align;

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240812194447.2049187-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:04:35 -03:00
Namhyung Kim
af73856e9a perf annotate-data: Implement folding in TUI browser
Like 'perf report', use 'e' or 'E' key to toggle folding the current
entry so that it can control displaying child entries.

Note that I didn't add the 'c' and 'C' keys to collapse the entry, because
that is already handled by 'e'/'E', which toggle the folding state.
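
A minimal sketch of why a single toggle key is enough: pressing it twice returns the entry to its previous state, so no separate collapse key is needed. The struct fold_entry and toggle_fold() names are hypothetical and are not taken from the perf TUI code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified browser entry (not perf's real struct). */
struct fold_entry {
	int nr_children;
	bool folded;
};

/* Flip the folded state; returns false when there is nothing to fold. */
bool toggle_fold(struct fold_entry *e)
{
	assert(e != NULL);
	if (e->nr_children == 0)
		return false; /* leaf entries have no children to hide */
	e->folded = !e->folded;
	return true;
}
```

Calling toggle_fold() once unfolds a folded entry and calling it again folds it back, which is the behavior the 'e'/'E' keys are described as having.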

Committer testing:

Do some 'perf mem record' for some workload of the whole system, using
the target options, as usual (--pid/-p, -C/--cpu, -a for the system wide
profiling, etc) and then:

  # perf annotate --skip-empty --data-type=pthread_mutex_t

That, by default, will start as --tui, then press 'E' to see the whole
struct unfolded, etc.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240812194447.2049187-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:04:35 -03:00
Namhyung Kim
05fc5b7de3 perf annotate-data: Support folding in TUI browser
As in the hists browser, it should support folding the current entry so
that unwanted details in some data structures can be hidden.

The folded entries will be displayed with the '+' sign, while unfolded
entries will have the '-' sign.

Entries that have no children will not show any signs.

  Annotate type: 'struct socket' (1 samples)
        Percent     Offset       Size  Field
  -      100.00          0        128  struct socket {                                  ◆
           0.00          0          4      socket_state   state;                        ▒
           0.00          4          2      short int      type;                         ▒
           0.00          8          8      long unsigned int      flags;                ▒
           0.00         16          8      struct file*   file;                         ▒
         100.00         24          8      struct sock*   sk;                           ▒
           0.00         32          8      struct proto_ops*      ops;                  ▒
  -        0.00         64         64      struct socket_wq       wq {                  ▒
  -        0.00         64         24          wait_queue_head_t  wait {                ▒
  +        0.00         64          4              spinlock_t     lock;                 ▒
  -        0.00         72         16              struct list_head       head {        ▒
           0.00         72          8                  struct list_head*  next;         ▒
           0.00         80          8                  struct list_head*  prev;         ▒
                                                   };                                   ▒
                                               };                                       ▒
           0.00         88          8          struct fasync_struct*      fasync_list;  ▒
           0.00         96          8          long unsigned int  flags;                ▒
  +        0.00        104         16          struct callback_head       rcu;          ▒
                                           };                                           ▒
                                       };                                               ▒
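
The sign rule shown in the output above can be sketched as a small helper. This is only an illustration under assumed names (struct sign_entry, fold_sign()); the real browser code may be organized differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified browser entry (not perf's real struct). */
struct sign_entry {
	int nr_children;
	bool folded;
};

/* '+' for folded, '-' for unfolded, ' ' for entries without children. */
char fold_sign(const struct sign_entry *e)
{
	assert(e != NULL);
	if (e->nr_children == 0)
		return ' ';
	return e->folded ? '+' : '-';
}
```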

This just adds the display logic for folding; the actual folding action
will be implemented in the next patch.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240812194447.2049187-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:04:35 -03:00
Ian Rogers
7a75c6c23a perf vendor events: SKX, CLX, SNR uncore cache event fixes
Cache home agent (CHA) events were setting the low rather than high
config1 bits. SNR was using CLX CHA events, however its CHA is similar
to ICX so remove the events.

Incorporate the updates in:

  https://github.com/intel/perfmon/pull/215
  https://github.com/intel/perfmon/pull/216

Fixes: 4cc4994244 ("perf vendor events: Update cascadelakex events/metrics")
Closes: https://lore.kernel.org/linux-perf-users/CAPhsuW4nem9XZP+b=sJJ7kqXG-cafz0djZf51HsgjCiwkGBA+A@mail.gmail.com/
Reported-by: Song Liu <song@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Co-authored-by: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240811042004.421869-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:04:35 -03:00
Namhyung Kim
040c0f887f perf lock contention: Change stack_id type to s32
The bpf_get_stackid() helper returns a signed type to check whether it
failed to get a stacktrace or not.  But it saved the result in u32 and
checked if the value is negative.

      376         if (needs_callstack) {
      377                 pelem->stack_id = bpf_get_stackid(ctx, &stacks,
      378                                                   BPF_F_FAST_STACK_CMP | stack_skip);
  --> 379                 if (pelem->stack_id < 0)

  ./tools/perf/util/bpf_skel/lock_contention.bpf.c:379 contention_begin()
  warn: unsigned 'pelem->stack_id' is never less than zero.

Let's change the type to s32 instead.
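
The pitfall can be reproduced in plain C outside of BPF. In this sketch the two structs and the stored_as_*() helpers are hypothetical; the 'ret' argument stands in for the value returned by bpf_get_stackid(), with -14 (-EFAULT on Linux) used as an example error.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;
typedef int32_t s32;

/* Hypothetical element layouts: before and after the type change. */
struct elem_u32 { u32 stack_id; };
struct elem_s32 { s32 stack_id; };

/* Store a helper return value in a u32 field and read it back. */
int64_t stored_as_u32(long ret)
{
	struct elem_u32 e = { .stack_id = (u32)ret };

	return (int64_t)e.stack_id;
}

/* Store the same value in an s32 field and read it back. */
int64_t stored_as_s32(long ret)
{
	struct elem_s32 e = { .stack_id = (s32)ret };

	return (int64_t)e.stack_id;
}

/* Self-check of the behavior described in the commit message. */
void check_stack_id_storage(void)
{
	assert(stored_as_u32(-14) >= 0); /* the error sign is lost */
	assert(stored_as_s32(-14) < 0);  /* the error stays visible */
}
```

With the unsigned field a failed lookup turns into a large positive number, so a "< 0" test on it can never fire, which is what the static checker warning points out.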

Fixes: 6d499a6b3d ("perf lock: Print the number of lost entries for BPF")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240812172533.2015291-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:04:35 -03:00
Namhyung Kim
00b0424268 perf annotate-data: Fix a buffer overflow in TUI browser
In get_member_overhead(), k is updated when it has an entry in the
histogram.  But the entry->hists array is allocated with the number of
evsels in the group.  So k should be reset each time it iterates over the
events using for_each_group_evsel(); otherwise it would crash due to a
buffer overflow.
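
A simplified model of the indexing problem, assuming a flat samples array laid out as [offset][event] and a per-event "empty" flag; the function name and parameters are hypothetical and this is not the actual get_member_overhead() code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Accumulate per-event sample counts for a member spanning 'size' byte
 * offsets into out[], skipping events flagged as empty.
 */
size_t sum_member_samples(const unsigned *samples, size_t size,
			  const int *is_empty, size_t nr_events,
			  unsigned *out, size_t nr_out)
{
	size_t k = 0;

	for (size_t off = 0; off < size; off++) {
		k = 0; /* restart at the first output slot for every offset */
		for (size_t ev = 0; ev < nr_events; ev++) {
			if (is_empty[ev])
				continue;
			/* Without the reset above, k would run past nr_out. */
			assert(k < nr_out);
			out[k++] += samples[off * nr_events + ev];
		}
	}
	return k; /* number of output slots in use */
}
```

If the "k = 0" line were hoisted out of the outer loop, the second offset would already write to out[nr_out] and beyond, which matches the overflow described above.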

Fixes: cb1898f58e ("perf annotate-data: Support --skip-empty option")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240810191502.1947959-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 18:01:59 -03:00
Leo Yan
043da846c2 perf docs: Refine the description for the buffer size
The current description of the AUX trace buffer size is misleading. When a
user specifies the option '-m,512M', it represents a size value in bytes
(512MiB), not 512M pages (512M x 4KiB, assuming a 4KiB page).

Make the documents clear that the normal buffer and the AUX tracing
buffer share the same semantics. Sync the documents for consistent
text.
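
The arithmetic behind the two readings, as a small sketch. bytes_to_pages() is a hypothetical helper that only rounds up to whole pages; perf's own option parsing may apply additional rounding that is not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a size in bytes into a page count, rounding up. */
uint64_t bytes_to_pages(uint64_t bytes, uint64_t page_size)
{
	assert(page_size != 0);
	return (bytes + page_size - 1) / page_size;
}

/* Self-check: 512MiB with 4KiB pages is 131072 pages, not 512M pages. */
void check_512m(void)
{
	assert(bytes_to_pages(512ULL << 20, 4096) == 131072);
	/* The misreading (512M pages of 4KiB each) would mean 2TiB. */
	assert((512ULL << 20) * 4096 == (2ULL << 40));
}
```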

Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240812093459.2575278-1-leo.yan@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 13:59:22 -03:00
Martin Liška
e6b56ae7c2 perf script: add --addr2line option
Similarly to other subcommands (like report, top), it would be handy to
provide a path for the addr2line command.

Signed-off-by: Martin Liska <martin.liska@hey.com>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/eadc3e36-029d-4848-9d69-272fe5a83a26@foxlink.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 13:59:22 -03:00
Arnaldo Carvalho de Melo
4f21bfed69 perf tests pmu: Initialize all fields of test_pmu variable
Instead of explicitly initializing just the .name and .alias_name,
use struct member named initialization of just the non-null .name field;
the compiler will initialize all the other non-explicitly initialized
fields to NULL.

This makes the code more robust, avoiding the error recently fixed when
the .alias_name was used and contained a random value.
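
A minimal sketch of the language rule being relied on: with a designated initializer, members that are not named are zero-initialized. struct fake_pmu is a hypothetical, cut-down stand-in and not perf's real 'struct perf_pmu'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in with only a few members. */
struct fake_pmu {
	const char *name;
	const char *alias_name;
	int type;
};

/* Only .name is given; the remaining members become NULL / 0. */
struct fake_pmu make_test_pmu(const char *name)
{
	assert(name != NULL);

	struct fake_pmu pmu = { .name = name };

	return pmu;
}
```

Code that later reads alias_name then sees NULL rather than whatever happened to be on the stack.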

Reviewed-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Radostin Stoyanov <rstoyano@redhat.com>
Link: https://lore.kernel.org/lkml/e26941f9-f86c-4f2e-b812-20c49fb2c0d3@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12 13:42:56 -03:00
Namhyung Kim
cb1898f58e perf annotate-data: Support --skip-empty option
The --skip-empty option is to hide dummy events in a group.  As with the
other output modes in 'perf report' and 'perf annotate', the data-type
profiling output should support the option.

Committer testing:

With dummy:

  root@number:~# perf annotate --stdio --group --data-type --skip-empty | head -24
  Annotate type: 'pthread_mutex_t' in /usr/lib64/libc.so.6 (50 samples):
   event[0] = cpu_atom/mem-loads,ldlat=30/P
   event[1] = cpu_atom/mem-stores/P
   event[2] = dummy:u
  ============================================================================
                   Percent     offset       size  field
    100.00  100.00    0.00          0         40  pthread_mutex_t	 {
    100.00  100.00    0.00          0         40      struct __pthread_mutex_s	__data {
     45.21   84.54    0.00          0          4          int	__lock;
      0.00    0.00    0.00          4          4          unsigned int	__count;
      0.00    1.83    0.00          8          4          int	__owner;
      5.19   10.65    0.00         12          4          unsigned int	__nusers;
     49.61    2.97    0.00         16          4          int	__kind;
      0.00    0.00    0.00         20          2          short int	__spins;
      0.00    0.00    0.00         22          2          short int	__elision;
      0.00    0.00    0.00         24         16          __pthread_list_t	__list {
      0.00    0.00    0.00         24          8              struct __pthread_internal_list*	__prev;
      0.00    0.00    0.00         32          8              struct __pthread_internal_list*	__next;
                                                          };
                                                      };
      0.00    0.00    0.00          0          0      char[]	__size;
     45.21   84.54    0.00          0          8      long int	__align;
                                                };
Skipping it:

  root@number:~# perf annotate --stdio --group --data-type --skip-empty | head -24
  Annotate type: 'pthread_mutex_t' in /usr/lib64/libc.so.6 (50 samples):
   event[0] = cpu_atom/mem-loads,ldlat=30/P
   event[1] = cpu_atom/mem-stores/P
  ============================================================================
           Percent     offset       size  field
    100.00  100.00          0         40  pthread_mutex_t	 {
    100.00  100.00          0         40      struct __pthread_mutex_s	__data {
     45.21   84.54          0          4          int	__lock;
      0.00    0.00          4          4          unsigned int	__count;
      0.00    1.83          8          4          int	__owner;
      5.19   10.65         12          4          unsigned int	__nusers;
     49.61    2.97         16          4          int	__kind;
      0.00    0.00         20          2          short int	__spins;
      0.00    0.00         22          2          short int	__elision;
      0.00    0.00         24         16          __pthread_list_t	__list {
      0.00    0.00         24          8              struct __pthread_internal_list*	__prev;
      0.00    0.00         32          8              struct __pthread_internal_list*	__next;
                                                  };
                                              };
      0.00    0.00          0          0      char[]	__size;
     45.21   84.54          0          8      long int	__align;
                                          };

  Annotate type: 'pthread_mutexattr_t' in /usr/lib64/libc.so.6 (1 samples):
  root@number:~#

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240807061713.1642924-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-09 18:32:51 -03:00
Namhyung Kim
336989d00f perf annotate: Fix --group behavior when leader has no samples
When --group option is used, it should display all events together.  But
the current logic only checks if the first (leader) event has samples or
not.  Let's check the member events as well.
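
The difference between the old and new checks can be sketched over a plain array of per-event sample counts, with index 0 as the group leader. Both function names are hypothetical and only illustrate the logic described here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Old behavior: only the leader (index 0) decides. */
bool leader_has_samples(const unsigned long *nr_samples, int nr_members)
{
	assert(nr_samples != NULL && nr_members > 0);
	return nr_samples[0] != 0;
}

/* New behavior: any member with samples is enough. */
bool group_has_samples(const unsigned long *nr_samples, int nr_members)
{
	assert(nr_samples != NULL && nr_members > 0);
	for (int i = 0; i < nr_members; i++) {
		if (nr_samples[i] != 0)
			return true;
	}
	return false;
}
```

For counts such as { 0, 2, 0 } the leader-only check reports nothing to show, while the group-wide check does.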

It also failed to put the linked samples from member evsels into the
output RB-tree so that they can be displayed in the output.

For example, consider the following event list.

  $ ./perf evlist
  cpu/mem-loads,ldlat=30/P
  cpu/mem-stores/P
  dummy:u

It has three events but 'path_put' function has samples only for
mem-stores (second) event.

  $ sudo ./perf annotate --stdio -f path_put
   Percent |      Source code & Disassembly of kcore for cpu/mem-stores/P (2 samples, percent: local period)
  ----------------------------------------------------------------------------------------------------------
           : 0                0xffffffffae600020 <path_put>:
      0.00 :   ffffffffae600020:       endbr64
      0.00 :   ffffffffae600024:       nopl    (%rax, %rax)
     91.22 :   ffffffffae600029:       pushq   %rbx
      0.00 :   ffffffffae60002a:       movq    %rdi, %rbx
      0.00 :   ffffffffae60002d:       movq    8(%rdi), %rdi
      8.78 :   ffffffffae600031:       callq   0xffffffffae614aa0
      0.00 :   ffffffffae600036:       movq    (%rbx), %rdi
      0.00 :   ffffffffae600039:       popq    %rbx
      0.00 :   ffffffffae60003a:       jmp     0xffffffffae620670
      0.00 :   ffffffffae60003f:       nop

Therefore, it didn't show up when --group option is used since the
leader ("mem-loads") event has no samples.  But now it checks both
events.

Before:
  $ sudo ./perf annotate --stdio -f --group path_put
  (no output)

After:
  $ sudo ./perf annotate --stdio -f --group path_put
   Percent                 |      Source code & Disassembly of kcore for cpu/mem-loads,ldlat=30/P, cpu/mem-stores/P, dummy:u (0 samples, percent: local period)
  -------------------------------------------------------------------------------------------------------------------------------------------------------------
                           : 0                0xffffffffae600020 <path_put>:
      0.00    0.00    0.00 :   ffffffffae600020:       endbr64
      0.00    0.00    0.00 :   ffffffffae600024:       nopl    (%rax, %rax)
      0.00   91.22    0.00 :   ffffffffae600029:       pushq   %rbx
      0.00    0.00    0.00 :   ffffffffae60002a:       movq    %rdi, %rbx
      0.00    0.00    0.00 :   ffffffffae60002d:       movq    8(%rdi), %rdi
      0.00    8.78    0.00 :   ffffffffae600031:       callq   0xffffffffae614aa0
      0.00    0.00    0.00 :   ffffffffae600036:       movq    (%rbx), %rdi
      0.00    0.00    0.00 :   ffffffffae600039:       popq    %rbx
      0.00    0.00    0.00 :   ffffffffae60003a:       jmp     0xffffffffae620670
      0.00    0.00    0.00 :   ffffffffae60003f:       nop

Committer testing:

Before:

  root@number:~# perf annotate --group --stdio2 clear_page_erms
  root@number:~#

After:

  root@number:~# perf annotate --group --stdio2 clear_page_erms
  Samples: 125  of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 13198416, [percent: local period]
  clear_page_erms() /proc/kcore
  Percent                      0xffffffff990c6cc0 <clear_page_erms>:
                                 endbr64
                                 movl    $0x1000,%ecx
                                 xorl    %eax,%eax
     0.00  100.00    0.00        rep     stosb %al, (%rdi)
                               ← retq
                                 int3
                                 int3
                                 int3
                                 int3
                                 nop
                                 nop
  root@number:~#

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20240807061555.1642669-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-09 18:12:29 -03:00
Andi Kleen
890a1961c8 perf tools: Create source symlink in perf object dir
Create a source symlink to the original source in the objdir.

This is similar to what the main kernel build script does.

Committer testing:

  ⬢[acme@toolbox perf-tools-next]$ make O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin
  <SNIP>
  ⬢[acme@toolbox perf-tools-next]$ ls -la /tmp/build/perf-tools-next/source
  lrwxrwxrwx. 1 acme acme 41 Aug  9 16:26 /tmp/build/perf-tools-next/source -> /home/acme/git/perf-tools-next/tools/perf
  ⬢[acme@toolbox perf-tools-next]$

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240807231823.898979-1-ak@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-09 17:37:24 -03:00
Arnaldo Carvalho de Melo
13d675aea6 perf debuginfo: Fix the build with !HAVE_DWARF_SUPPORT
In that case we have a set of placeholder functions, one of them uses a
'Dwarf_Addr' type that is not present as it is defined in the missing
DWARF libraries, so provide a placeholder typedef for that as well.
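
A sketch of the placeholder pattern, assuming a 64-bit integer for the stand-in type and a made-up stub name (stub_get_text_offset) with a made-up return value; the typedef and stub actually used in perf may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#ifdef HAVE_DWARF_SUPPORT
#include <elfutils/libdw.h> /* provides the real Dwarf_Addr */
#else
/* Placeholder so prototypes using Dwarf_Addr still compile. */
typedef uint64_t Dwarf_Addr;
#endif

/* Hypothetical stub: reports "not supported" and leaves *offs alone. */
int stub_get_text_offset(Dwarf_Addr *offs)
{
	(void)offs;
	return -ENOTSUP;
}

/* Self-check: the stub compiles and fails cleanly without DWARF. */
void check_stub(void)
{
	Dwarf_Addr offs = 7;

	assert(stub_get_text_offset(&offs) == -ENOTSUP);
	assert(offs == 7);
}
```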

The build error before this patch:

  In file included from util/annotate.c:28:
  util/debuginfo.h:44:46: error: unknown type name ‘Dwarf_Addr’
     44 |                                              Dwarf_Addr *offs __maybe_unused,
        |                                              ^~~~~~~~~~
  make[6]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:106: util/annotate.o] Error 1
  make[6]: *** Waiting for unfinished jobs....

Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/lkml/CAM9d7ciushSwEfj7yW4rtDEJBTcCB991V4cswwFEL+cv6QF2pg@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-09 17:37:03 -03:00
Zixian Cai
05673c42f7 perf script python: Add the 'ins_lat' field to event handler
For example, when using the Alder Lake PMU memory load event, the
instruction latency is stored in 'ins_lat', while the cache latency
is stored in 'weight'.

This patch reports the 'ins_lat' field for Python scripting.

Committer testing:

On a Raptor Lake Refresh Intel machine (14th gen):

  root@number:~# grep -m1 'model name' /proc/cpuinfo
  model name	: Intel(R) Core(TM) i7-14700K
  root@number:~# perf mem record -a sleep 5
  Memory events are enabled on a subset of CPUs: 16-27
  [ perf record: Woken up 85 times to write data ]
  [ perf record: Captured and wrote 41.236 MB perf.data (191390 samples) ]
  root@number:~# perf evlist -v
  cpu_atom/mem-loads,ldlat=30/P: type: 10 (cpu_atom), size: 136, config: 0x5d0 (mem-loads), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1, { bp_addr, config1 }: 0x1f
  cpu_atom/mem-stores/P: type: 10 (cpu_atom), size: 136, config: 0x6d0 (mem-stores), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1
  dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|CPU|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
  root@number:~#

Now generate a Python script and use it to dump the dictionary, which
should now contain the 'ins_lat' field:

  root@number:~# perf script --gen python
  generated Python script: perf-script.py
  root@number:~# vim perf-script.py
  root@number:~# perf script -s perf-script.py | head -40
  in trace_begin
  in trace_end
  root@number:~# vim perf-script.py

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Zixian Cai <fzczx123@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240809080137.3590148-1-fzczx123@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-09 10:25:07 -03:00
Arnaldo Carvalho de Melo
9e9d0a79d3 perf test shell lbr: Support hybrid x86 systems too
Running on a:

  root@x1:~# grep 'model name' -m1 /proc/cpuinfo
  model name	: 13th Gen Intel(R) Core(TM) i7-1365U
  root@x1:~#

It skips all the tests with:

  root@x1:~# perf test -vvvv LBR
   97: perf record LBR tests:
  --- start ---
  test child forked, pid 2033388
  Skip: only x86 CPUs support LBR
  ---- end(-2) ----
   97: perf record LBR tests                                           : Skip
  root@x1:~#

Because the test checks for the /sys/devices/cpu/caps/branches file,
that isn't present as we have instead:

  root@x1:~# ls -la /sys/devices/cpu*/caps/branches
  -r--r--r--. 1 root root 4096 Aug  8 11:22 /sys/devices/cpu_atom/caps/branches
  -r--r--r--. 1 root root 4096 Aug  8 11:21 /sys/devices/cpu_core/caps/branches
  root@x1:~#

If we check as well for one of those,
/sys/devices/cpu_core/caps/branches, then we don't skip the tests and
all are run on these x86 Intel Hybrid systems as well, passing all of
them:

  root@x1:~# perf test -vvvv LBR
   97: perf record LBR tests:
  --- start ---
  test child forked, pid 2034956
  LBR callgraph
  [ perf record: Woken up 5 times to write data ]
  [ perf record: Captured and wrote 1.812 MB /tmp/__perf_test.perf.data.B2HvQ (8114 samples) ]
  LBR callgraph [Success]
  LBR any branch test
  [ perf record: Woken up 25 times to write data ]
  [ perf record: Captured and wrote 6.382 MB /tmp/__perf_test.perf.data.B2HvQ (8071 samples) ]
  LBR any branch test: 8071 samples
  LBR any branch test [Success]
  LBR any call test
  [ perf record: Woken up 23 times to write data ]
  [ perf record: Captured and wrote 6.208 MB /tmp/__perf_test.perf.data.B2HvQ (8092 samples) ]
  LBR any call test: 8092 samples
  LBR any call test [Success]
  LBR any ret test
  [ perf record: Woken up 24 times to write data ]
  [ perf record: Captured and wrote 6.396 MB /tmp/__perf_test.perf.data.B2HvQ (8093 samples) ]
  LBR any ret test: 8093 samples
  LBR any ret test [Success]
  LBR any indirect call test
  [ perf record: Woken up 25 times to write data ]
  [ perf record: Captured and wrote 6.344 MB /tmp/__perf_test.perf.data.B2HvQ (8067 samples) ]
  LBR any indirect call test: 8067 samples
  LBR any indirect call test [Success]
  LBR any indirect jump test
  [ perf record: Woken up 12 times to write data ]
  [ perf record: Captured and wrote 3.073 MB /tmp/__perf_test.perf.data.B2HvQ (8061 samples) ]
  LBR any indirect jump test: 8061 samples
  LBR any indirect jump test [Success]
  LBR direct calls test
  [ perf record: Woken up 25 times to write data ]
  [ perf record: Captured and wrote 6.380 MB /tmp/__perf_test.perf.data.B2HvQ (8076 samples) ]
  LBR direct calls test: 8076 samples
  LBR direct calls test [Success]
  LBR any indirect user call test
  [ perf record: Woken up 5 times to write data ]
  [ perf record: Captured and wrote 1.597 MB /tmp/__perf_test.perf.data.B2HvQ (8079 samples) ]
  LBR any indirect user call test: 8079 samples
  LBR any indirect user call test [Success]
  LBR system wide any branch test
  [ perf record: Woken up 26 times to write data ]
  [ perf record: Captured and wrote 9.088 MB /tmp/__perf_test.perf.data.B2HvQ (9209 samples) ]
  LBR system wide any branch test: 9209 samples
  LBR system wide any branch test [Success]
  LBR system wide any call test
  [ perf record: Woken up 25 times to write data ]
  [ perf record: Captured and wrote 8.945 MB /tmp/__perf_test.perf.data.B2HvQ (9333 samples) ]
  LBR system wide any call test: 9333 samples
  LBR system wide any call test [Success]
  LBR parallel any branch test
  LBR parallel any call test
  LBR parallel any ret test
  LBR parallel any indirect call test
  LBR parallel any indirect jump test
  LBR parallel direct calls test
  LBR parallel system wide any branch test
  LBR parallel any indirect user call test
  LBR parallel system wide any call test
  [ perf record: Woken up 9 times to write data ]
  [ perf record: Woken up 51 times to write data ]
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Woken up 5 times to write data ]
  [ perf record: Woken up 559 times to write data ]
  [ perf record: Woken up 14 times to write data ]
  [ perf record: Woken up 17 times to write data ]
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Woken up 11 times to write data ]
  [ perf record: Captured and wrote 0.150 MB /tmp/__perf_test.perf.data.lANpR (1909 samples) ]
  [ perf record: Captured and wrote 2.371 MB /tmp/__perf_test.perf.data.Olum8 (3033 samples) ]
  [ perf record: Captured and wrote 1.230 MB /tmp/__perf_test.perf.data.njfJ8 (1742 samples) ]
  [ perf record: Captured and wrote 5.554 MB /tmp/__perf_test.perf.data.4ZTrj (29662 samples) ]
  [ perf record: Captured and wrote 19.906 MB /tmp/__perf_test.perf.data.dlGQt (29576 samples) ]
  [ perf record: Captured and wrote 0.289 MB /tmp/__perf_test.perf.data.CAT7y (4311 samples) ]
  [ perf record: Captured and wrote 3.129 MB /tmp/__perf_test.perf.data.diuKG (3971 samples) ]
  LBR parallel any indirect user call test: 1909 samples
  [ perf record: Captured and wrote 4.858 MB /tmp/__perf_test.perf.data.sVjtN (6130 samples) ]
  LBR parallel any indirect user call test [Success]
  [ perf record: Captured and wrote 3.669 MB /tmp/__perf_test.perf.data.AJtNI (4827 samples) ]
  LBR parallel any indirect jump test: 4311 samples
  LBR parallel any indirect jump test [Success]
  LBR parallel direct calls test: 3033 samples
  LBR parallel direct calls test [Success]
  LBR parallel any indirect call test: 1742 samples
  LBR parallel any indirect call test [Success]
  LBR parallel any call test: 4827 samples
  LBR parallel any call test [Success]
  LBR parallel any branch test: 6130 samples
  LBR parallel any branch test [Success]
  LBR parallel system wide any branch test: 29662 samples
  LBR parallel any ret test: 3971 samples
  LBR parallel any ret test [Success]
  LBR parallel system wide any branch test [Success]
  LBR parallel system wide any call test: 29576 samples
  LBR parallel system wide any call test [Success]
  ---- end(0) ----
   97: perf record LBR tests                                           : Ok
  root@x1:~#
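
The check described above (accept either the non-hybrid 'cpu' path or the hybrid 'cpu_core' one) can be sketched in C. The real test is a shell script; any_path_exists() and lbr_caps_present() are names made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Returns true if at least one of the given paths exists. */
bool any_path_exists(const char *const *paths, int n)
{
	assert(paths != NULL);
	for (int i = 0; i < n; i++) {
		if (access(paths[i], F_OK) == 0)
			return true;
	}
	return false;
}

/* LBR capability file on non-hybrid and on hybrid x86 systems. */
bool lbr_caps_present(void)
{
	static const char *const candidates[] = {
		"/sys/devices/cpu/caps/branches",
		"/sys/devices/cpu_core/caps/branches",
	};

	return any_path_exists(candidates, 2);
}
```

On a machine that exposes neither file lbr_caps_present() returns false, which corresponds to the test being skipped.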

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZrTXftup0H46R8WK@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-08 17:30:39 -03:00
Ian Rogers
32559b99e0 perf test: Add set of perf record LBR tests
Adds coverage for LBR operations and LBR callgraph.

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240808054644.1286065-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-08 17:30:38 -03:00
Ian Rogers
599c19397b perf callchain: Fix stitch LBR memory leaks
The 'struct callchain_cursor_node' has a 'struct map_symbol' whose maps
and map members are reference counted. Ensure these values use a _get
routine to increment the reference counts and use map_symbol__exit() to
release the reference counts.

Do the same for 'struct thread's prev_lbr_cursor, but save the size of
the prev_lbr_cursor array so that it may be iterated.

Ensure that when stitch_nodes are placed on the free list the
map_symbols are exited.

Fix resolve_lbr_callchain_sample() by replacing list_replace_init() with
list_splice_init(), so the whole list is moved and nodes aren't leaked.
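
One way nodes can be lost with a replace operation is when the destination list is not empty. The sketch below re-implements simplified versions of the two list helpers in user space (modeled on the kernel's list.h semantics, not copied from it) and counts what remains reachable; free_list_len_after_move() is a hypothetical driver.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

void list_init(struct list_head *h) { h->next = h; h->prev = h; }

void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

size_t list_count(const struct list_head *h)
{
	size_t n = 0;

	for (const struct list_head *p = h->next; p != h; p = p->next)
		n++;
	return n;
}

/* new_head takes over old's nodes; whatever new_head held is dropped. */
void list_replace_init(struct list_head *old, struct list_head *new_head)
{
	assert(old->next != old); /* empty 'old' is not handled here */
	new_head->next = old->next;
	new_head->next->prev = new_head;
	new_head->prev = old->prev;
	new_head->prev->next = new_head;
	list_init(old);
}

/* Join list's nodes in front of head's existing nodes, keeping both. */
void list_splice_init(struct list_head *list, struct list_head *head)
{
	if (list->next == list)
		return;

	struct list_head *first = list->next, *last = list->prev;
	struct list_head *at = head->next;

	first->prev = head;
	head->next = first;
	last->next = at;
	at->prev = last;
	list_init(list);
}

/* Move a 3-node list onto a 2-node free list; return free list length. */
size_t free_list_len_after_move(int use_splice)
{
	struct list_head lists, free_list, nodes[5];

	list_init(&lists);
	list_init(&free_list);
	for (int i = 0; i < 3; i++)
		list_add_tail(&nodes[i], &lists);
	for (int i = 3; i < 5; i++)
		list_add_tail(&nodes[i], &free_list);

	if (use_splice)
		list_splice_init(&lists, &free_list);
	else
		list_replace_init(&lists, &free_list);
	return list_count(&free_list);
}
```

With the replace variant only the three moved nodes are reachable from the free list afterwards; the two nodes that were already on it are orphaned. With the splice variant all five remain reachable.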

A reproduction of the memory leaks is possible with a leak sanitizer
build in the perf report command of:

  ```
  $ perf record -e cycles --call-graph lbr perf test -w thloop
  $ perf report --stitch-lbr
  ```

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Fixes: ff165628d7 ("perf callchain: Stitch LBR call stack")
Signed-off-by: Ian Rogers <irogers@google.com>
[ Basic tests after applying the patch, repeating the example above ]
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anne Macedo <retpolanne@posteo.net>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240808054644.1286065-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-08 17:30:27 -03:00
Veronika Molnarova
37e2a19c98 perf test pmu: Set uninitialized PMU alias to null
Commit 3e0bf9fde2 ("perf pmu: Restore full PMU name wildcard
support") adds a test case "PMU cmdline match" that covers PMU name
wildcard support provided by function perf_pmu__match().

The test covers a wide range of supported combinations of PMU name
matching but overlooks that when perf_pmu__match() cannot match the PMU
name to the wildcard, it tries to match the PMU's alias. That alias is
not set up in the test, causing the test case to fail when run with
subprocesses or to segfault if run as a single process.

  ./perf test -vv 9
    9: Sysfs PMU tests                                :
    9.1: Parsing with PMU format directory            : Ok
    9.2: Parsing with PMU event                       : Ok
    9.3: PMU event names                              : Ok
    9.4: PMU name combining                           : Ok
    9.5: PMU name comparison                          : Ok
    9.6: PMU cmdline match                            : FAILED!

  ./perf test -F 9
    9.1: Parsing with PMU format directory            : Ok
    9.2: Parsing with PMU event                       : Ok
    9.3: PMU event names                              : Ok
    9.4: PMU name combining                           : Ok
    9.5: PMU name comparison                          : Ok
  Segmentation fault (core dumped)

Initialize the PMU alias to null for all tests of perf_pmu__match()
as this functionality is not being tested and the alias matching works
exactly the same as the matching of the PMU name.

./perf test -F 9
  9.1: Parsing with PMU format directory                             : Ok
  9.2: Parsing with PMU event                                        : Ok
  9.3: PMU event names                                               : Ok
  9.4: PMU name combining                                            : Ok
  9.5: PMU name comparison                                           : Ok
  9.6: PMU cmdline match                                             : Ok

Fixes: 3e0bf9fde2 ("perf pmu: Restore full PMU name wildcard support")
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Radostin Stoyanov <rstoyano@redhat.com>
Link: https://lore.kernel.org/r/20240808103749.9356-1-vmolnaro@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-08 11:01:54 -03:00
Arnaldo Carvalho de Melo
2df5484bbf perf tests ftrace: Add pattern check for time, count
In 'perf ftrace profile sleep 0.1' we know that we'll have a specific
kernel function that will take a bit more than 0.1 seconds and will take
place just one time, so we can add a check for that so that we validate
more than just the presence of some functions in the profile.
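
The kind of check described above could look roughly like the following Python sketch. It is not the shell test itself: the helper name and the 0.1 to 0.2 second window are assumptions, and the sample lines are taken from 'perf ftrace profile' output shown elsewhere in this log.

```python
import re

# One line of 'perf ftrace profile' output: total, avg, max (all in us),
# call count and function name.
PROFILE_LINE = re.compile(
    r"^\s*(?P<total>[\d.]+)\s+(?P<avg>[\d.]+)\s+(?P<max>[\d.]+)"
    r"\s+(?P<count>\d+)\s+(?P<func>\S+)\s*$"
)

def has_single_long_call(profile_text, func, min_us=100_000.0, max_us=200_000.0):
    """Return True if 'func' was called once with a total time in the window."""
    for line in profile_text.splitlines():
        m = PROFILE_LINE.match(line)
        if not m or m.group("func") != func:
            continue
        total = float(m.group("total"))
        return int(m.group("count")) == 1 and min_us <= total <= max_us
    return False  # function not present in the profile at all

sample = """\
# Total (us)   Avg (us)   Max (us)      Count   Function
  100136.400 100136.400 100136.400          1   __x64_sys_clock_nanosleep
     166.868     55.623     80.299          3   scheduler_tick
"""

ok = has_single_long_call(sample, "__x64_sys_clock_nanosleep")
```

Checking both the count and the duration validates more than the mere presence of a function name in the profile.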

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/lkml/ZrTBo7KACZeuCyLj@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-08 09:59:40 -03:00
Namhyung Kim
ed5bb548cc perf test: Add a new shell test for perf ftrace
$ sudo ./perf test ftrace -vv
   86: perf ftrace tests:
  --- start ---
  test child forked, pid 1772223
  perf ftrace list test
  syscalls for sleep:
  __x64_sys_nanosleep
  __ia32_sys_nanosleep
  __x64_sys_clock_nanosleep
  __ia32_sys_clock_nanosleep
  perf ftrace list test  [Success]
  perf ftrace trace test
  # tracer: function_graph
  #
  # CPU  DURATION                  FUNCTION CALLS
  # |     |   |                     |   |   |   |
   0)               |  __x64_sys_clock_nanosleep() {
   0)               |    common_nsleep() {
   0)               |      hrtimer_nanosleep() {
   0)               |        do_nanosleep() {
  perf ftrace trace test  [Success]
  perf ftrace latency test
  target function: __x64_sys_clock_nanosleep
  #   DURATION     |      COUNT | GRAPH                                          |
      32 - 64   ms |          1 | ############################################## |
  perf ftrace latency test  [Success]
  perf ftrace profile test
  # Total (us)   Avg (us)   Max (us)      Count   Function
    100136.400 100136.400 100136.400          1   __x64_sys_clock_nanosleep
    100135.200 100135.200 100135.200          1   common_nsleep
    100134.700 100134.700 100134.700          1   hrtimer_nanosleep
    100133.700 100133.700 100133.700          1   do_nanosleep
    100130.600 100130.600 100130.600          1   schedule
       166.868     55.623     80.299          3   scheduler_tick
         5.926      5.926      5.926          1   native_smp_send_reschedule
       301.941    301.941    301.941          1   __x64_sys_execve
       295.786    295.786    295.786          1   do_execveat_common.isra.0
        71.397     35.699     46.403          2   bprm_execve
         2.519      1.260      1.547          2   sched_mm_cid_before_execve
         1.098      0.549      0.686          2   sched_mm_cid_after_execve
  perf ftrace profile test  [Success]
  ---- end(0) ----
   86: perf ftrace tests                                               : Ok

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20240808044954.1775333-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-08 09:41:35 -03:00
Namhyung Kim
90d78e7b8e perf annotate-data: Show typedef names properly
The die_get_typename() would resolve typedef and get to the original
type.  But sometimes the original type is a struct without name and it
makes the output confusing and hard to read.

This is a diff of perf report -s type before and after the change.
New types such as atomic{,64}_t and sigset_t appeared and the portion
of unnamed struct was reduced.  Also u32, u64 and size_t were split
from the base types.

  --- b   2024-08-01 17:02:34.307809952 -0700
  +++ a   2024-08-07 14:17:05.245853999 -0700
  -     2.40%  long unsigned int
  +     2.26%  long unsigned int
  -     1.56%  unsigned int
  +     1.27%  unsigned int
  -     0.98%  struct
  -     0.79%  long long unsigned int
  +     0.58%  long long unsigned int
  +     0.36%  struct
  +     0.27%  atomic64_t
  +     0.22%  u32
  +     0.21%  u64
  +     0.19%  atomic_t
  +     0.13%  size_t
  -     0.08%  struct seqcount_spinlock
  +     0.08%  seqcount_spinlock_t
  +     0.08%  sigset_t
  +     0.08%  __poll_t

Let's use the typedef name directly and the resolved type to get the
size of the type.

Committer testing:

  root@x1:~# diff -u before after | head -30
  --- before	2024-08-08 09:35:13.917325041 -0300
  +++ after	2024-08-08 09:37:35.312257905 -0300
  @@ -10,25 +10,27 @@
   # ........  .........
   #
       79.40%  (unknown)
  -     2.28%  union
        1.96%  (stack operation)
  -     1.24%  struct
  +     1.87%  pthread_mutex_t
        0.99%  u32[]
  -     0.92%  unsigned int
        0.77%  struct task_struct
  +     0.75%  U32
        0.75%  struct pcpu_hot
        0.63%  struct qspinlock
  +     0.61%  atomic_t
        0.59%  struct list_head
  -     0.58%  int
        0.53%  struct cfs_rq
        0.51%  BYTE*
  -     0.48%  unsigned char
  +     0.48%  BYTE
        0.48%  long unsigned int
        0.46%  struct rq
        0.41%  struct worker
        0.41%  struct memcg_vmstats_percpu
  +     0.41%  pthread_cond_t
        0.37%  _Bool
  +     0.36%  int
  root@x1:~#

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240807223129.1738004-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-08 09:36:52 -03:00
Namhyung Kim
037f1b67e8 perf annotate: Cache debuginfo for data type profiling
In find_data_type(), it creates and deletes a debug info whenever it
tries to find data type for a sample.  This is inefficient and it most
likely accesses the same binary again and again.

Let's add a single entry cache for the debug info structure of the last DSO.
Depending on sample data, it usually gives me 2~3x (and sometimes more)
speed ups.

Note that this will introduce a little difference in the output due to
the order of checking stack operations.  It used to check the stack ops
before checking the availability of debug info but I moved it after the
symbol check.  So it'll report stack operations in DSOs without debug
info as unknown.  But I think it's ok and better to have the checking
near the caching logic.
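
The single entry cache idea can be sketched in Python as follows. This is an illustration under assumptions, not the code in find_data_type(): the class name, the 'open_debuginfo' callback and the DSO names are hypothetical.

```python
# Single-entry cache sketch: remember the debug info for the last DSO only.
# 'open_debuginfo' stands in for the expensive open/parse step.

class DebuginfoCache:
    def __init__(self, open_debuginfo):
        self._open = open_debuginfo
        self._last_dso = None
        self._last_dbg = None
        self.opens = 0           # how many times the expensive step ran

    def get(self, dso):
        if self._last_dbg is None or dso != self._last_dso:
            # Different DSO than last time: replace the cached entry.
            self._last_dbg = self._open(dso)
            self._last_dso = dso
            self.opens += 1
        return self._last_dbg


cache = DebuginfoCache(lambda dso: {"dso": dso})
for sample_dso in ["vmlinux", "vmlinux", "vmlinux", "libc.so.6", "libc.so.6"]:
    cache.get(sample_dso)
```

With five samples spread over two DSOs, the expensive step runs twice instead of five times; the benefit depends on consecutive samples hitting the same binary.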

Committer testing:

  root@x1:~# perf mem record -a sleep 5s
  root@x1:~# perf evlist
  cpu_atom/mem-loads,ldlat=30/P
  cpu_atom/mem-stores/P
  dummy:u
  root@x1:~# diff -u before after
  --- before	2024-08-08 09:33:53.880780784 -0300
  +++ after	2024-08-08 09:35:13.917325041 -0300
  @@ -81,8 +81,8 @@
   # Overhead  Data Type
   # ........  .........
   #
  -    55.43%  (unknown)
  -    11.61%  (stack operation)
  +    55.56%  (unknown)
  +    11.48%  (stack operation)
        4.93%  struct pcpu_hot
        3.26%  unsigned int
        2.48%  struct

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240805234648.1453689-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-08 09:34:43 -03:00
Ian Rogers
b2f70c99ed perf hist: Fix reference counting of branch_info
iter_finish_branch_entry() doesn't put the branch_info from/to map
elements creating memory leaks. This can be seen with:

```
$ perf record -e cycles -b perf test -w noploop
$ perf report -D
...
Direct leak of 984344 byte(s) in 123043 object(s) allocated from:
    #0 0x7fb2654f3bd7 in malloc libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x564d3400d10b in map__get util/map.h:186
    #2 0x564d3400d10b in ip__resolve_ams util/machine.c:1981
    #3 0x564d34014d81 in sample__resolve_bstack util/machine.c:2151
    #4 0x564d34094790 in iter_prepare_branch_entry util/hist.c:898
    #5 0x564d34098fa4 in hist_entry_iter__add util/hist.c:1238
    #6 0x564d33d1f0c7 in process_sample_event tools/perf/builtin-report.c:334
    #7 0x564d34031eb7 in perf_session__deliver_event util/session.c:1655
    #8 0x564d3403ba52 in do_flush util/ordered-events.c:245
    #9 0x564d3403ba52 in __ordered_events__flush util/ordered-events.c:324
    #10 0x564d3402d32e in perf_session__process_user_event util/session.c:1708
    #11 0x564d34032480 in perf_session__process_event util/session.c:1877
    #12 0x564d340336ad in reader__read_event util/session.c:2399
    #13 0x564d34033fdc in reader__process_events util/session.c:2448
    #14 0x564d34033fdc in __perf_session__process_events util/session.c:2495
    #15 0x564d34033fdc in perf_session__process_events util/session.c:2661
    #16 0x564d33d27113 in __cmd_report tools/perf/builtin-report.c:1065
    #17 0x564d33d27113 in cmd_report tools/perf/builtin-report.c:1805
    #18 0x564d33e0ccb7 in run_builtin tools/perf/perf.c:350
    #19 0x564d33e0d45e in handle_internal_command tools/perf/perf.c:403
    #20 0x564d33cdd827 in run_argv tools/perf/perf.c:447
    #21 0x564d33cdd827 in main tools/perf/perf.c:561
...
```

Clearing up the map_symbols properly creates maps reference count
issues so resolve those. Resolving this issue doesn't improve peak
heap consumption for the test above.

Committer testing:

  $ sudo dnf install libasan
  $ make -k CORESIGHT=1 EXTRA_CFLAGS="-fsanitize=address" CC=clang O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Yanteng Si <siyanteng@loongson.cn>
Link: https://lore.kernel.org/r/20240807065136.1039977-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-08 09:32:02 -03:00
Arnaldo Carvalho de Melo
37ce8a562a Merge remote-tracking branch 'torvalds/master' into perf-tools-next
To pick up a patch that, albeit being for the tools/perf/ directory, went
through a different tree and ended up breaking some recent tests
introduced in the perf-tools-next tree to validate duplicate events in
the JSON performance event files.

Link: https://lore.kernel.org/lkml/ZrIqDMg7cBVhstYU@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-06 14:01:06 -03:00
Ian Rogers
4bd380390f perf jevents.py: Ensure event names aren't duplicated
Duplicate event names break invariants in 'perf list'. Assert that an
event name isn't duplicated so that broken JSON won't build.
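
A duplicate check of this kind might look like the sketch below. It is not the code added to jevents.py: the helper name is hypothetical, and the sketch assumes each JSON file is a list of objects that carry an "EventName" key.

```python
import json

def duplicated_event_names(json_text):
    """Return the sorted event names that occur more than once."""
    seen = set()
    dups = set()
    for event in json.loads(json_text):
        name = event.get("EventName")
        if name is None:
            continue             # entries without a name are skipped
        if name in seen:
            dups.add(name)
        seen.add(name)
    return sorted(dups)

good = '[{"EventName": "EVENT_A"}, {"EventName": "EVENT_B"}]'
bad = '[{"EventName": "EVENT_A"}, {"EventName": "EVENT_A"}]'

# A build step could fail on broken input with an assertion such as:
assert not duplicated_event_names(good), "duplicated event names found"
```

Running the same assertion on 'bad' would fail, which is the behavior the commit describes: broken JSON stops the build instead of reaching 'perf list'.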

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Charles Ci-Jyun Wu <dminus@andestech.com>
Cc: Eric Lin <eric.lin@sifive.com>
Cc: Greentime Hu <greentime.hu@sifive.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Inochi Amaoto <inochiama@outlook.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Locus Wei-Han Chen <locus84@andestech.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Vincent Chen <vincent.chen@sifive.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240805194424.597244-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-06 10:37:12 -03:00
Ian Rogers
c4f74bb61a perf pmu-events: Remove duplicated ampereone event
OP_SPEC is repeated twice in the file which will break invariants in
'perf list' as discussed in this thread:

  https://lore.kernel.org/linux-perf-users/20240719081651.24853-1-eric.lin@sifive.com/

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Charles Ci-Jyun Wu <dminus@andestech.com>
Cc: Eric Lin <eric.lin@sifive.com>
Cc: Greentime Hu <greentime.hu@sifive.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Inochi Amaoto <inochiama@outlook.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Locus Wei-Han Chen <locus84@andestech.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Vincent Chen <vincent.chen@sifive.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240805194424.597244-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-06 10:35:27 -03:00
Ian Rogers
b79f9a437a perf pmu-events: Change dependencies for empty-pmu-events.c test
Switch from $? (all the prerequisites that are newer than the target)
to $^ (all the prerequisites) as touching jevents.py will mean that
empty-pmu-events.c won't be passed to the diff command breaking the
build.
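
The effect of the two automatic variables can be modelled in Python. This is a simplified sketch: the timestamps are hypothetical, and the case of a missing target (where make treats every prerequisite as newer) is ignored.

```python
def newer_prereqs(target_mtime, prereq_mtimes):
    """Model of $?: prerequisites that are newer than the target."""
    return [name for name, mtime in prereq_mtimes.items() if mtime > target_mtime]

def all_prereqs(prereq_mtimes):
    """Model of $^: every prerequisite, regardless of timestamps."""
    return list(prereq_mtimes)

# Hypothetical timestamps: only jevents.py was touched after the target.
prereqs = {"empty-pmu-events.c": 100, "jevents.py": 300}
target_mtime = 200

only_newer = newer_prereqs(target_mtime, prereqs)
everything = all_prereqs(prereqs)
```

With $? the recipe sees only jevents.py, so empty-pmu-events.c never reaches the diff command; with $^ both files are always passed.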

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Charles Ci-Jyun Wu <dminus@andestech.com>
Cc: Eric Lin <eric.lin@sifive.com>
Cc: Greentime Hu <greentime.hu@sifive.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Inochi Amaoto <inochiama@outlook.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Locus Wei-Han Chen <locus84@andestech.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Vincent Chen <vincent.chen@sifive.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240805194424.597244-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-06 10:35:27 -03:00
Ian Rogers
2576b20abd perf test: Add build test for JEVENTS_ARCH=all
Building with JEVENTS_ARCH=all builds all CPU types and allows things
like assertions to check the validity of the input JSON.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Charles Ci-Jyun Wu <dminus@andestech.com>
Cc: Eric Lin <eric.lin@sifive.com>
Cc: Greentime Hu <greentime.hu@sifive.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Inochi Amaoto <inochiama@outlook.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Locus Wei-Han Chen <locus84@andestech.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Vincent Chen <vincent.chen@sifive.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240805194424.597244-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-06 10:35:27 -03:00
Namhyung Kim
ce533c9bc6 perf annotate: Add --skip-empty option
Like in 'perf report', we want to hide empty events in the 'perf annotate'
output.  This keeps the behavior consistent with the option in perf report.

For example, the following command would use 3 events including dummy.

  $ perf mem record -a -- perf test -w noploop

  $ perf evlist
  cpu/mem-loads,ldlat=30/P
  cpu/mem-stores/P
  dummy:u

Just using perf annotate with --group will show all 3 events.

  $ perf annotate --group --stdio | head
   Percent                 |	Source code & Disassembly of ...
  --------------------------------------------------------------
                           : 0     0xe060 <_dl_relocate_object>:
      0.00    0.00    0.00 :    e060:       pushq   %rbp
      0.00    0.00    0.00 :    e061:       movq    %rsp, %rbp
      0.00    0.00    0.00 :    e064:       pushq   %r15
      0.00    0.00    0.00 :    e066:       movq    %rdi, %r15
      0.00    0.00    0.00 :    e069:       pushq   %r14
      0.00    0.00    0.00 :    e06b:       pushq   %r13
      0.00    0.00    0.00 :    e06d:       movl    %edx, %r13d

Now with --skip-empty, it'll hide the last dummy event.

  $ perf annotate --group --stdio --skip-empty | head
   Percent         |	Source code & Disassembly of ...
  ------------------------------------------------------
                   : 0     0xe060 <_dl_relocate_object>:
      0.00    0.00 :    e060:       pushq   %rbp
      0.00    0.00 :    e061:       movq    %rsp, %rbp
      0.00    0.00 :    e064:       pushq   %r15
      0.00    0.00 :    e066:       movq    %rdi, %r15
      0.00    0.00 :    e069:       pushq   %r14
      0.00    0.00 :    e06b:       pushq   %r13
      0.00    0.00 :    e06d:       movl    %edx, %r13d

Committer testing:

  root@x1:~# perf evlist
  cpu_atom/mem-loads,ldlat=30/P
  cpu_atom/mem-stores/P
  dummy:u
  root@x1:~#

Before:

  root@x1:~# perf annotate --group --stdio2 do_lookup_x | head -25
  Samples: 20  of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 769079, [percent: local period]
  do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2
  Percent                       0x9900 <do_lookup_x>:
                                  pushq      %rbp
                                  movq       %rsp,%rbp
                                  pushq      %r15
                                  pushq      %r14
                                  pushq      %r13
                                  pushq      %r12
                                  pushq      %rbx
                                  subq       $0x88,%rsp
                                  movq       %rdi,-0x50(%rbp)
                                  movl       8(%r9),%edi
                                  movq       0x10(%rbp),%r12
                                  movq       0x28(%rbp),%r10
                                  movq       %rdx,-0x70(%rbp)
                                  movq       %rcx,-0x58(%rbp)
                                  movq       %rdi,%r11
     0.00    5.73    0.00         movq       %r8,-0x68(%rbp)
                                  movq       (%r9),%r8
                                  movl       %esi,%eax
     8.30    0.00    0.00         movl       0x30(%rbp),%r9d
                                  movl       %esi,%r15d
                                  shrl       $6, %eax
                                  movq       %r8,%r13
  root@x1:~#

After:

  root@x1:~# perf annotate --group --skip-empty --stdio2 do_lookup_x | head -25
  Samples: 20  of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 769079, [percent: local period]
  do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2
  Percent               0x9900 <do_lookup_x>:
                          pushq      %rbp
                          movq       %rsp,%rbp
                          pushq      %r15
                          pushq      %r14
                          pushq      %r13
                          pushq      %r12
                          pushq      %rbx
                          subq       $0x88,%rsp
                          movq       %rdi,-0x50(%rbp)
                          movl       8(%r9),%edi
                          movq       0x10(%rbp),%r12
                          movq       0x28(%rbp),%r10
                          movq       %rdx,-0x70(%rbp)
                          movq       %rcx,-0x58(%rbp)
                          movq       %rdi,%r11
     0.00    5.73         movq       %r8,-0x68(%rbp)
                          movq       (%r9),%r8
                          movl       %esi,%eax
     8.30    0.00         movl       0x30(%rbp),%r9d
                          movl       %esi,%r15d
                          shrl       $6, %eax
                          movq       %r8,%r13
  root@x1:~#

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-05 16:14:01 -03:00
Namhyung Kim
bb588e3829 perf annotate: Set al->data_nr using the notes->src->nr_events
This is a preparation to support skipping empty events.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-05 16:13:18 -03:00
Namhyung Kim
b00e4d0d93 perf annotate: Use annotation__pcnt_width() consistently
The annotation__pcnt_width() calculates the screen width for the
overhead (percent) area considering event groups properly.  Use this
function consistently so that we can make sure it has similar output
in different modes.  But there's a difference in stdio and tui output:
stdio uses 8 and tui uses 7 for a percent.

Let's use 8 and adjust the print width in __annotation_line__write()
properly.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-05 16:11:42 -03:00
Namhyung Kim
cb1e8bfc79 perf annotate: Set notes->src->nr_events early
We want to use it in different places so make sure it is set properly
in symbol__annotate() before creating the disasm lines.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-05 16:11:03 -03:00
Namhyung Kim
2dc02c2641 perf annotate: Use al->data_nr if possible
The data_nr keeps the number of entries in al->data[] so it should use
it when it iterates the array.  The notes->src->nr_events should have
the same number but it'd be natural to use al->data_nr.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-05 16:07:02 -03:00