perf tools changes for v5.16:

perf annotate: - Add riscv64 support. - Add fusion logic for AMD microarchs. perf record: - Add an option to control the synthesizing behavior: --synth <no|all|task|mmap|cgroup> Fine-tune event synthesis: default=all core: - Allow controlling synthesizing PERF_RECORD_ metadata events during record. - perf.data reader prep work for multithreaded processing. - Fix missing exclude_{host,guest} setting in PMUs that don't support it and that were causing the feature detection code to disable it for all events, even the ones in PMUs that support it. - Fix the default use of precise events on AMD, that were always falling back to non-precise because perf_event_attr.exclude_guest=1 was set and IBS does not have filtering capability, refusing precise + exclude_guest. - Add bitfield_swap() to handle branch_stack endian issue. perf script: - Show binary offsets for userspace addresses in callchains. - Support instruction latency via new "ins_lat" selectable field. - Add dlfilter-show-cycles perf inject: - Add vmlinux and ignore-vmlinux arguments, similar to other tools. perf list: - Display PMU prefix for partially supported hybrid cache events. - Display hybrid PMU events with cpu type. perf stat: - Improve metrics documentation of data structures. - Fix memory leaks in the metric code. - Use NAN for missing event IDs. - Don't compute unused events. - Fix memory leak on error path. - Encode and use metric-id as a metric qualifier. - Allow metrics with no events. - Avoid events for an 'if' constant result. - Only add a referenced metric once. - Simplify metric_refs calculation. - Allow modifiers on metrics. perf test: - Add workload test of metric and metric groups. - Workload test of all PMUs. - vmlinux-kallsyms: Ignore hidden symbols. - Add pmu-event test for event described as "config=". - Verify more event members in pmu-events test. - Add endian test for struct branch_flags on the sample-parsing test. - Improve temp file cleanup in several tests. perf daemon: - Address MSAN warnings on send_cmd(). perf kmem: - Improve man page for record options perf srcline: - Use long-running addr2line per DSO, greatly speeding up the 'srcline' sort order. perf symbols: - Ignore $a/$d symbols for ARM modules. - Fix /proc/kcore access on 32 bit systems. Kernel UAPI copies: - Update copy of linux/socket.h with the kernel sources, no change in tooling output. libbpf: - Pull in bpf_program__get_prog_info_linear() from libbpf, too much specific to perf. - Deprecate bpf_map__resize() in favor of bpf_map_set_max_entries() - Install libbpf headers locally when building. - Bump minimum LLVM C++ std to GNU++14. libperf: - Use binary search in perf_cpu_map__idx() as array are sorted. libtracefs: - Enable libtracefs dynamic linking. libtraceevent: - Increase logging when verbose. Arch specific: PowerPC: - Add support to expose instruction and data address registers as part of extended regs. Vendor events: JSON parser: - Support ConfigCode to set the config= in PMUs like: $ cat /sys/bus/event_source/devices/hisi_sccl1_ddrc3/events/act_cmd config=0x5 - Make the JSON parser more conformant when in strict mode. All JSON files: - Fix all remaining invalid JSON files. ARM: - Syntax corrections in Neoverse N1 json. - Categorise the Neoverse V1 counters. - Add new armv8 PMU events. - Revise hip08 uncore events. Hardware tracing: auxtrace: - Add missing Z option to ITRACE_HELP. - Add itrace A option to approximate IPC. - Add itrace d+o option to direct debug log to stdout. Intel PT: - Add support for PERF_RECORD_AUX_OUTPUT_HW_ID - Support itrace A option to approximate IPC - Support itrace d+o option to direct debug log to stdout. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYYg7RwAKCRCyPKLppCJ+ J1KgAQCGC3802gsI38/xwli1SLBNHsZ9DEy2nX5Ikw/f64K+0QEA6DsU0hpRTR0D i8ZFa2zNX748n+pU2WX6E7AjO01hrQo= =sXHg -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v5.16-2021-11-07-without-bpftool-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools updates from Arnaldo Carvalho de Melo: "perf annotate: - Add riscv64 support. - Add fusion logic for AMD microarchs. perf record: - Add an option to control the synthesizing behavior: --synth <no|all|task|mmap|cgroup> core: - Allow controlling synthesizing PERF_RECORD_ metadata events during record. - perf.data reader prep work for multithreaded processing. - Fix missing exclude_{host,guest} setting in PMUs that don't support it and that were causing the feature detection code to disable it for all events, even the ones in PMUs that support it. - Fix the default use of precise events on AMD, that were always falling back to non-precise because perf_event_attr.exclude_guest=1 was set and IBS does not have filtering capability, refusing precise + exclude_guest. - Add bitfield_swap() to handle branch_stack endian issue. perf script: - Show binary offsets for userspace addresses in callchains. - Support instruction latency via new "ins_lat" selectable field. - Add dlfilter-show-cycles perf inject: - Add vmlinux and ignore-vmlinux arguments, similar to other tools. perf list: - Display PMU prefix for partially supported hybrid cache events. - Display hybrid PMU events with cpu type. perf stat: - Improve metrics documentation of data structures. - Fix memory leaks in the metric code. - Use NAN for missing event IDs. - Don't compute unused events. - Fix memory leak on error path. - Encode and use metric-id as a metric qualifier. - Allow metrics with no events. - Avoid events for an 'if' constant result. - Only add a referenced metric once. - Simplify metric_refs calculation. - Allow modifiers on metrics. perf test: - Add workload test of metric and metric groups. - Workload test of all PMUs. - vmlinux-kallsyms: Ignore hidden symbols. - Add pmu-event test for event described as "config=". - Verify more event members in pmu-events test. - Add endian test for struct branch_flags on the sample-parsing test. - Improve temp file cleanup in several tests. perf daemon: - Address MSAN warnings on send_cmd(). perf kmem: - Improve man page for record options perf srcline: - Use long-running addr2line per DSO, greatly speeding up the 'srcline' sort order. perf symbols: - Ignore $a/$d symbols for ARM modules. - Fix /proc/kcore access on 32 bit systems. Kernel UAPI copies: - Update copy of linux/socket.h with the kernel sources, no change in tooling output. libbpf: - Pull in bpf_program__get_prog_info_linear() from libbpf, too much specific to perf. - Deprecate bpf_map__resize() in favor of bpf_map_set_max_entries() - Install libbpf headers locally when building. - Bump minimum LLVM C++ std to GNU++14. libperf: - Use binary search in perf_cpu_map__idx() as array are sorted. libtracefs: - Enable libtracefs dynamic linking. libtraceevent: - Increase logging when verbose. Arch specific: * PowerPC: - Add support to expose instruction and data address registers as part of extended regs. Vendor events: * JSON parser: - Support ConfigCode to set the config= in PMUs - Make the JSON parser more conformant when in strict mode. * All JSON files: - Fix all remaining invalid JSON files. * ARM: - Syntax corrections in Neoverse N1 json. - Categorise the Neoverse V1 counters. - Add new armv8 PMU events. - Revise hip08 uncore events. Hardware tracing: * auxtrace: - Add missing Z option to ITRACE_HELP. - Add itrace A option to approximate IPC. - Add itrace d+o option to direct debug log to stdout. * Intel PT: - Add support for PERF_RECORD_AUX_OUTPUT_HW_ID - Support itrace A option to approximate IPC - Support itrace d+o option to direct debug log to stdout" * tag 'perf-tools-for-v5.16-2021-11-07-without-bpftool-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (120 commits) perf build: Install libbpf headers locally when building perf MANIFEST: Add bpftool files to allow building with BUILD_BPF_SKEL=1 perf metric: Fix memory leaks perf parse-event: Add init and exit to parse_event_error perf parse-events: Rename parse_events_error functions perf stat: Fix memory leak on error path perf tools: Use __BYTE_ORDER__ perf inject: Add vmlinux and ignore-vmlinux arguments perf tools: Check vmlinux/kallsyms arguments in all tools perf tools: Refactor out kernel symbol argument sanity checking perf symbols: Ignore $a/$d symbols for ARM modules perf evsel: Don't set exclude_guest by default perf evsel: Fix missing exclude_{host,guest} setting perf bpf: Add missing free to bpf_event__print_bpf_prog_info() perf beauty: Update copy of linux/socket.h with the kernel sources perf clang: Fixes for more recent LLVM/clang tools: Bump minimum LLVM C++ std to GNU++14 perf bpf: Pull in bpf_program__get_prog_info_linear() Revert "perf bench futex: Add support for 32-bit systems with 64-bit time_t" perf test sample-parsing: Add endian test for struct branch_flags ...
2021-11-08 09:25:26 -08:00 · 2021-11-08 09:25:26 -08:00 · bbdbeb0048
commit bbdbeb0048
parent 1e9ed9360f 6b491a86b7
184 changed files with 5105 additions and 1727 deletions
--- a/tools/arch/powerpc/include/uapi/asm/perf_regs.h
+++ b/tools/arch/powerpc/include/uapi/asm/perf_regs.h
@ -61,27 +61,35 @@ enum perf_event_powerpc_regs {
 	PERF_REG_POWERPC_PMC4,
 	PERF_REG_POWERPC_PMC5,
 	PERF_REG_POWERPC_PMC6,
-	/* Max regs without the extended regs */
+	PERF_REG_POWERPC_SDAR,
+	PERF_REG_POWERPC_SIAR,
+	/* Max mask value for interrupt regs w/o extended regs */
 	PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1,
+	/* Max mask value for interrupt regs including extended regs */
+	PERF_REG_EXTENDED_MAX = PERF_REG_POWERPC_SIAR + 1,
 };

 #define PERF_REG_PMU_MASK	((1ULL << PERF_REG_POWERPC_MAX) - 1)

-/* Exclude MMCR3, SIER2, SIER3 for CPU_FTR_ARCH_300 */
-#define	PERF_EXCLUDE_REG_EXT_300	(7ULL << PERF_REG_POWERPC_MMCR3)
-
 /*
 * PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300
- * includes 9 SPRS from MMCR0 to PMC6 excluding the
- * unsupported SPRS in PERF_EXCLUDE_REG_EXT_300.
+ * includes 11 SPRS from MMCR0 to SIAR excluding the
+ * unsupported SPRS MMCR3, SIER2 and SIER3.
 */
-#define PERF_REG_PMU_MASK_300   ((0xfffULL << PERF_REG_POWERPC_MMCR0) - PERF_EXCLUDE_REG_EXT_300)
+#define PERF_REG_PMU_MASK_300	\
+	((1ULL << PERF_REG_POWERPC_MMCR0) | (1ULL << PERF_REG_POWERPC_MMCR1) | \
+	(1ULL << PERF_REG_POWERPC_MMCR2) | (1ULL << PERF_REG_POWERPC_PMC1) | \
+	(1ULL << PERF_REG_POWERPC_PMC2) | (1ULL << PERF_REG_POWERPC_PMC3) | \
+	(1ULL << PERF_REG_POWERPC_PMC4) | (1ULL << PERF_REG_POWERPC_PMC5) | \
+	(1ULL << PERF_REG_POWERPC_PMC6) | (1ULL << PERF_REG_POWERPC_SDAR) | \
+	(1ULL << PERF_REG_POWERPC_SIAR))

 /*
 * PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_31
- * includes 12 SPRs from MMCR0 to PMC6.
+ * includes 14 SPRs from MMCR0 to SIAR.
 */
-#define PERF_REG_PMU_MASK_31   (0xfffULL << PERF_REG_POWERPC_MMCR0)
+#define PERF_REG_PMU_MASK_31	\
+	(PERF_REG_PMU_MASK_300 | (1ULL << PERF_REG_POWERPC_MMCR3) | \
+	(1ULL << PERF_REG_POWERPC_SIER2) | (1ULL << PERF_REG_POWERPC_SIER3))

-#define PERF_REG_EXTENDED_MAX  (PERF_REG_POWERPC_PMC6 + 1)
 #endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@ -52,6 +52,7 @@ FEATURE_TESTS_BASIC :=                  \
        libslang                        \
        libslang-include-subdir         \
        libtraceevent                   \
+        libtracefs                      \
        libcrypto                       \
        libunwind                       \
        pthread-attr-setaffinity-np     \
--- a/tools/build/feature/Makefile
+++ b/tools/build/feature/Makefile
@ -36,6 +36,7 @@ FILES=                                          \
         test-libslang.bin                      \
         test-libslang-include-subdir.bin       \
         test-libtraceevent.bin                 \
+         test-libtracefs.bin                    \
         test-libcrypto.bin                     \
         test-libunwind.bin                     \
         test-libunwind-debug-frame.bin         \
@ -90,7 +91,7 @@ __BUILDXX = $(CXX) $(CXXFLAGS) -MD -Wall -Werror -o $@ $(patsubst %.bin,%.cpp,$(
 ###############################

 $(OUTPUT)test-all.bin:
-	$(BUILD) -fstack-protector-all -O2 -D_FORTIFY_SOURCE=2 -ldw -lelf -lnuma -lelf -I/usr/include/slang -lslang $(FLAGS_PERL_EMBED) $(FLAGS_PYTHON_EMBED) -DPACKAGE='"perf"' -lbfd -ldl -lz -llzma -lzstd -lcap
+	$(BUILD) -fstack-protector-all -O2 -D_FORTIFY_SOURCE=2 -ldw -lelf -lnuma -lelf -lslang $(FLAGS_PERL_EMBED) $(FLAGS_PYTHON_EMBED) -DPACKAGE='"perf"' -lbfd -ldl -lz -llzma -lzstd -lcap

 $(OUTPUT)test-hello.bin:
 	$(BUILD)
@ -199,6 +200,9 @@ $(OUTPUT)test-libslang-include-subdir.bin:
 $(OUTPUT)test-libtraceevent.bin:
 	$(BUILD) -ltraceevent

+$(OUTPUT)test-libtracefs.bin:
+	$(BUILD) -ltracefs
+
 $(OUTPUT)test-libcrypto.bin:
 	$(BUILD) -lcrypto

@ -296,7 +300,7 @@ $(OUTPUT)test-jvmti-cmlr.bin:
 	$(BUILD)

 $(OUTPUT)test-llvm.bin:
-	$(BUILDXX) -std=gnu++11 				\
+	$(BUILDXX) -std=gnu++14 				\
 		-I$(shell $(LLVM_CONFIG) --includedir) 		\
 		-L$(shell $(LLVM_CONFIG) --libdir)		\
 		$(shell $(LLVM_CONFIG) --libs Core BPF)		\
@ -304,12 +308,12 @@ $(OUTPUT)test-llvm.bin:
 		> $(@:.bin=.make.output) 2>&1

 $(OUTPUT)test-llvm-version.bin:
-	$(BUILDXX) -std=gnu++11 				\
+	$(BUILDXX) -std=gnu++14					\
 		-I$(shell $(LLVM_CONFIG) --includedir)		\
 		> $(@:.bin=.make.output) 2>&1

 $(OUTPUT)test-clang.bin:
-	$(BUILDXX) -std=gnu++11 				\
+	$(BUILDXX) -std=gnu++14					\
 		-I$(shell $(LLVM_CONFIG) --includedir) 		\
 		-L$(shell $(LLVM_CONFIG) --libdir)		\
 		-Wl,--start-group -lclangBasic -lclangDriver	\
--- a/tools/build/feature/test-libtracefs.c
+++ b/tools/build/feature/test-libtracefs.c
@ -0,0 +1,10 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <tracefs/tracefs.h>
+
+int main(void)
+{
+	struct tracefs_instance *inst = tracefs_instance_create("dummy");
+
+	tracefs_instance_destroy(inst);
+	return 0;
+}
--- a/tools/include/linux/list_sort.h
+++ b/tools/include/linux/list_sort.h
@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_LIST_SORT_H
+#define _LINUX_LIST_SORT_H
+
+#include <linux/types.h>
+
+struct list_head;
+
+typedef int __attribute__((nonnull(2,3))) (*list_cmp_func_t)(void *,
+		const struct list_head *, const struct list_head *);
+
+__attribute__((nonnull(2,3)))
+void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp);
+#endif
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@ -1141,6 +1141,21 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_TEXT_POKE			= 20,

+	/*
+	 * Data written to the AUX area by hardware due to aux_output, may need
+	 * to be matched to the event by an architecture-specific hardware ID.
+	 * This records the hardware ID, but requires sample_id to provide the
+	 * event ID. e.g. Intel PT uses this record to disambiguate PEBS-via-PT
+	 * records from multiple events.
+	 *
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				hw_id;
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_AUX_OUTPUT_HW_ID		= 21,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };

--- a/tools/lib/list_sort.c
+++ b/tools/lib/list_sort.c
@ -0,0 +1,252 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/kernel.h>
+#include <linux/compiler.h>
+#include <linux/export.h>
+#include <linux/string.h>
+#include <linux/list_sort.h>
+#include <linux/list.h>
+
+/*
+ * Returns a list organized in an intermediate format suited
+ * to chaining of merge() calls: null-terminated, no reserved or
+ * sentinel head node, "prev" links not maintained.
+ */
+__attribute__((nonnull(2,3,4)))
+static struct list_head *merge(void *priv, list_cmp_func_t cmp,
+				struct list_head *a, struct list_head *b)
+{
+	struct list_head *head, **tail = &head;
+
+	for (;;) {
+		/* if equal, take 'a' -- important for sort stability */
+		if (cmp(priv, a, b) <= 0) {
+			*tail = a;
+			tail = &a->next;
+			a = a->next;
+			if (!a) {
+				*tail = b;
+				break;
+			}
+		} else {
+			*tail = b;
+			tail = &b->next;
+			b = b->next;
+			if (!b) {
+				*tail = a;
+				break;
+			}
+		}
+	}
+	return head;
+}
+
+/*
+ * Combine final list merge with restoration of standard doubly-linked
+ * list structure.  This approach duplicates code from merge(), but
+ * runs faster than the tidier alternatives of either a separate final
+ * prev-link restoration pass, or maintaining the prev links
+ * throughout.
+ */
+__attribute__((nonnull(2,3,4,5)))
+static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head,
+			struct list_head *a, struct list_head *b)
+{
+	struct list_head *tail = head;
+	u8 count = 0;
+
+	for (;;) {
+		/* if equal, take 'a' -- important for sort stability */
+		if (cmp(priv, a, b) <= 0) {
+			tail->next = a;
+			a->prev = tail;
+			tail = a;
+			a = a->next;
+			if (!a)
+				break;
+		} else {
+			tail->next = b;
+			b->prev = tail;
+			tail = b;
+			b = b->next;
+			if (!b) {
+				b = a;
+				break;
+			}
+		}
+	}
+
+	/* Finish linking remainder of list b on to tail */
+	tail->next = b;
+	do {
+		/*
+		 * If the merge is highly unbalanced (e.g. the input is
+		 * already sorted), this loop may run many iterations.
+		 * Continue callbacks to the client even though no
+		 * element comparison is needed, so the client's cmp()
+		 * routine can invoke cond_resched() periodically.
+		 */
+		if (unlikely(!++count))
+			cmp(priv, b, b);
+		b->prev = tail;
+		tail = b;
+		b = b->next;
+	} while (b);
+
+	/* And the final links to make a circular doubly-linked list */
+	tail->next = head;
+	head->prev = tail;
+}
+
+/**
+ * list_sort - sort a list
+ * @priv: private data, opaque to list_sort(), passed to @cmp
+ * @head: the list to sort
+ * @cmp: the elements comparison function
+ *
+ * The comparison function @cmp must return > 0 if @a should sort after
+ * @b ("@a > @b" if you want an ascending sort), and <= 0 if @a should
+ * sort before @b *or* their original order should be preserved.  It is
+ * always called with the element that came first in the input in @a,
+ * and list_sort is a stable sort, so it is not necessary to distinguish
+ * the @a < @b and @a == @b cases.
+ *
+ * This is compatible with two styles of @cmp function:
+ * - The traditional style which returns <0 / =0 / >0, or
+ * - Returning a boolean 0/1.
+ * The latter offers a chance to save a few cycles in the comparison
+ * (which is used by e.g. plug_ctx_cmp() in block/blk-mq.c).
+ *
+ * A good way to write a multi-word comparison is::
+ *
+ *	if (a->high != b->high)
+ *		return a->high > b->high;
+ *	if (a->middle != b->middle)
+ *		return a->middle > b->middle;
+ *	return a->low > b->low;
+ *
+ *
+ * This mergesort is as eager as possible while always performing at least
+ * 2:1 balanced merges.  Given two pending sublists of size 2^k, they are
+ * merged to a size-2^(k+1) list as soon as we have 2^k following elements.
+ *
+ * Thus, it will avoid cache thrashing as long as 3*2^k elements can
+ * fit into the cache.  Not quite as good as a fully-eager bottom-up
+ * mergesort, but it does use 0.2*n fewer comparisons, so is faster in
+ * the common case that everything fits into L1.
+ *
+ *
+ * The merging is controlled by "count", the number of elements in the
+ * pending lists.  This is beautifully simple code, but rather subtle.
+ *
+ * Each time we increment "count", we set one bit (bit k) and clear
+ * bits k-1 .. 0.  Each time this happens (except the very first time
+ * for each bit, when count increments to 2^k), we merge two lists of
+ * size 2^k into one list of size 2^(k+1).
+ *
+ * This merge happens exactly when the count reaches an odd multiple of
+ * 2^k, which is when we have 2^k elements pending in smaller lists,
+ * so it's safe to merge away two lists of size 2^k.
+ *
+ * After this happens twice, we have created two lists of size 2^(k+1),
+ * which will be merged into a list of size 2^(k+2) before we create
+ * a third list of size 2^(k+1), so there are never more than two pending.
+ *
+ * The number of pending lists of size 2^k is determined by the
+ * state of bit k of "count" plus two extra pieces of information:
+ *
+ * - The state of bit k-1 (when k == 0, consider bit -1 always set), and
+ * - Whether the higher-order bits are zero or non-zero (i.e.
+ *   is count >= 2^(k+1)).
+ *
+ * There are six states we distinguish.  "x" represents some arbitrary
+ * bits, and "y" represents some arbitrary non-zero bits:
+ * 0:  00x: 0 pending of size 2^k;           x pending of sizes < 2^k
+ * 1:  01x: 0 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
+ * 2: x10x: 0 pending of size 2^k; 2^k     + x pending of sizes < 2^k
+ * 3: x11x: 1 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
+ * 4: y00x: 1 pending of size 2^k; 2^k     + x pending of sizes < 2^k
+ * 5: y01x: 2 pending of size 2^k; 2^(k-1) + x pending of sizes < 2^k
+ * (merge and loop back to state 2)
+ *
+ * We gain lists of size 2^k in the 2->3 and 4->5 transitions (because
+ * bit k-1 is set while the more significant bits are non-zero) and
+ * merge them away in the 5->2 transition.  Note in particular that just
+ * before the 5->2 transition, all lower-order bits are 11 (state 3),
+ * so there is one list of each smaller size.
+ *
+ * When we reach the end of the input, we merge all the pending
+ * lists, from smallest to largest.  If you work through cases 2 to
+ * 5 above, you can see that the number of elements we merge with a list
+ * of size 2^k varies from 2^(k-1) (cases 3 and 5 when x == 0) to
+ * 2^(k+1) - 1 (second merge of case 5 when x == 2^(k-1) - 1).
+ */
+__attribute__((nonnull(2,3)))
+void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp)
+{
+	struct list_head *list = head->next, *pending = NULL;
+	size_t count = 0;	/* Count of pending */
+
+	if (list == head->prev)	/* Zero or one elements */
+		return;
+
+	/* Convert to a null-terminated singly-linked list. */
+	head->prev->next = NULL;
+
+	/*
+	 * Data structure invariants:
+	 * - All lists are singly linked and null-terminated; prev
+	 *   pointers are not maintained.
+	 * - pending is a prev-linked "list of lists" of sorted
+	 *   sublists awaiting further merging.
+	 * - Each of the sorted sublists is power-of-two in size.
+	 * - Sublists are sorted by size and age, smallest & newest at front.
+	 * - There are zero to two sublists of each size.
+	 * - A pair of pending sublists are merged as soon as the number
+	 *   of following pending elements equals their size (i.e.
+	 *   each time count reaches an odd multiple of that size).
+	 *   That ensures each later final merge will be at worst 2:1.
+	 * - Each round consists of:
+	 *   - Merging the two sublists selected by the highest bit
+	 *     which flips when count is incremented, and
+	 *   - Adding an element from the input as a size-1 sublist.
+	 */
+	do {
+		size_t bits;
+		struct list_head **tail = &pending;
+
+		/* Find the least-significant clear bit in count */
+		for (bits = count; bits & 1; bits >>= 1)
+			tail = &(*tail)->prev;
+		/* Do the indicated merge */
+		if (likely(bits)) {
+			struct list_head *a = *tail, *b = a->prev;
+
+			a = merge(priv, cmp, b, a);
+			/* Install the merged result in place of the inputs */
+			a->prev = b->prev;
+			*tail = a;
+		}
+
+		/* Move one element from input list to pending */
+		list->prev = pending;
+		pending = list;
+		list = list->next;
+		pending->next = NULL;
+		count++;
+	} while (list);
+
+	/* End of input; merge together all the pending lists. */
+	list = pending;
+	pending = pending->prev;
+	for (;;) {
+		struct list_head *next = pending->prev;
+
+		if (!next)
+			break;
+		list = merge(priv, cmp, pending, list);
+		pending = next;
+	}
+	/* The final merge, rebuilding prev links */
+	merge_final(priv, cmp, head, pending, list);
+}
+EXPORT_SYMBOL(list_sort);
--- a/tools/lib/perf/cpumap.c
+++ b/tools/lib/perf/cpumap.c
@ -270,11 +270,19 @@ bool perf_cpu_map__empty(const struct perf_cpu_map *map)

 int perf_cpu_map__idx(struct perf_cpu_map *cpus, int cpu)
 {
-	int i;
+	int low = 0, high = cpus->nr;

-	for (i = 0; i < cpus->nr; ++i) {
-		if (cpus->map[i] == cpu)
-			return i;
+	while (low < high) {
+		int idx = (low + high) / 2,
+		    cpu_at_idx = cpus->map[idx];
+
+		if (cpu_at_idx == cpu)
+			return idx;
+
+		if (cpu_at_idx > cpu)
+			high = idx;
+		else
+			low = idx + 1;
 	}

 	return -1;
--- a/tools/lib/perf/include/perf/event.h
+++ b/tools/lib/perf/include/perf/event.h
@ -289,6 +289,11 @@ struct perf_record_itrace_start {
 	__u32			 tid;
 };

+struct perf_record_aux_output_hw_id {
+	struct perf_event_header header;
+	__u64			hw_id;
+};
+
 struct perf_record_thread_map_entry {
 	__u64			 pid;
 	char			 comm[16];
@ -414,6 +419,7 @@ union perf_event {
 	struct perf_record_auxtrace_error	auxtrace_error;
 	struct perf_record_aux			aux;
 	struct perf_record_itrace_start		itrace_start;
+	struct perf_record_aux_output_hw_id	aux_output_hw_id;
 	struct perf_record_switch		context_switch;
 	struct perf_record_thread_map		thread_map;
 	struct perf_record_cpu_map		cpu_map;
--- a/tools/perf/Documentation/itrace.txt
+++ b/tools/perf/Documentation/itrace.txt
@ -20,6 +20,7 @@
 		L	synthesize last branch entries on existing event records
 		s       skip initial number of events
 		q	quicker (less detailed) decoding
+		A	approximate IPC
 		Z	prefer to ignore timestamps (so-called "timeless" decoding)

 	The default is all events i.e. the same as --itrace=ibxwpe,
@ -61,5 +62,6 @@
 	debug messages will or will not be logged. Each flag must be preceded
 	by either '+' or '-'. The flags are:
 		a	all perf events
+		o	output to stdout

 	If supported, the 'q' option may be repeated to increase the effect.
--- a/tools/perf/Documentation/perf-inject.txt
+++ b/tools/perf/Documentation/perf-inject.txt
@ -45,6 +45,13 @@ OPTIONS
 	tasks slept. sched_switch contains a callchain where a task slept and
 	sched_stat contains a timeslice how long a task slept.

+-k::
+--vmlinux=<file>::
+        vmlinux pathname
+
+--ignore-vmlinux::
+	Ignore vmlinux files.
+
 --kallsyms=<file>::
 	kallsyms pathname

--- a/tools/perf/Documentation/perf-intel-pt.txt
+++ b/tools/perf/Documentation/perf-intel-pt.txt
@ -157,6 +157,17 @@ of instructions and number of cycles since the last update, and thus represent
 the average IPC since the last IPC for that event type.  Note IPC for "branches"
 events is calculated separately from IPC for "instructions" events.

+Even with the 'cyc' config term, it is possible to produce IPC information for
+every change of timestamp, but at the expense of accuracy.  That is selected by
+specifying the itrace 'A' option.  Due to the granularity of timestamps, the
+actual number of cycles increases even though the cycles reported does not.
+The number of instructions is known, but if IPC is reported, cycles can be too
+low and so IPC is too high.  Note that inaccuracy decreases as the period of
+sampling increases i.e. if the number of cycles is too low by a small amount,
+that becomes less significant if the number of cycles is large.  It may also be
+useful to use the 'A' option in conjunction with dlfilter-show-cycles.so to
+provide higher granularity cycle information.
+
 Also note that the IPC instruction count may or may not include the current
 instruction.  If the cycle count is associated with an asynchronous branch
 (e.g. page fault or interrupt), then the instruction count does not include the
@ -873,6 +884,7 @@ The letters are:
 	L	synthesize last branch entries on existing event records
 	s	skip initial number of events
 	q	quicker (less detailed) decoding
+	A	approximate IPC
 	Z	prefer to ignore timestamps (so-called "timeless" decoding)

 "Instructions" events look like they were recorded by "perf record -e
@ -941,6 +953,7 @@ by flags which affect what debug messages will or will not be logged. Each flag
 must be preceded by either '+' or '-'. The flags support by Intel PT are:
 		-a	Suppress logging of perf events
 		+a	Log all perf events
+		+o	Output to stdout instead of "intel_pt.log"
 By default, logged perf events are filtered by any specified time ranges, but
 flag +a overrides that.

@ -1072,6 +1085,21 @@ The Z option is equivalent to having recorded a trace without TSC
 decoding a trace of a virtual machine.


+dlfilter-show-cycles.so
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Cycles can be displayed using dlfilter-show-cycles.so in which case the itrace A
+option can be useful to provide higher granularity cycle information:
+
+	perf script --itrace=A --call-trace --dlfilter dlfilter-show-cycles.so
+
+To see a list of dlfilters:
+
+	perf script -v --list-dlfilters
+
+See also linkperf:perf-dlfilters[1]
+
+
 dump option
 ~~~~~~~~~~~

@ -1144,7 +1172,12 @@ Recording is selected by using the aux-output config term e.g.

 	perf record -c 10000 -e '{intel_pt/branch=0/,cycles/aux-output/ppp}' uname

-Note that currently, software only supports redirecting at most one PEBS event.
+Originally, software only supported redirecting at most one PEBS event because it
+was not able to differentiate one event from another. To overcome that, more recent
+kernels and perf tools add support for the PERF_RECORD_AUX_OUTPUT_HW_ID side-band event.
+To check for the presence of that event in a PEBS-via-PT trace:
+
+	perf script -D --no-itrace | grep PERF_RECORD_AUX_OUTPUT_HW_ID

 To display PEBS events from the Intel PT trace, use the itrace 'o' option e.g.

--- a/tools/perf/Documentation/perf-kmem.txt
+++ b/tools/perf/Documentation/perf-kmem.txt
@ -8,22 +8,25 @@ perf-kmem - Tool to trace/measure kernel memory properties
 SYNOPSIS
 --------
 [verse]
-'perf kmem' {record|stat} [<options>]
+'perf kmem' [<options>] {record|stat}

 DESCRIPTION
 -----------
 There are two variants of perf kmem:

-  'perf kmem record <command>' to record the kmem events
-  of an arbitrary workload.
+  'perf kmem [<options>] record [<perf-record-options>] <command>' to
+  record the kmem events of an arbitrary workload. Additional 'perf
+  record' options may be specified after record, such as '-o' to
+  change the output file name.

-  'perf kmem stat' to report kernel memory statistics.
+  'perf kmem [<options>] stat' to report kernel memory statistics.

 OPTIONS
 -------
 -i <file>::
 --input=<file>::
-	Select the input file (default: perf.data unless stdin is a fifo)
+	For stat, select the input file (default: perf.data unless stdin is a
+	fifo)

 -f::
 --force::
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@ -39,6 +39,10 @@ any extra expressions computed by perf stat.
 --deprecated::
 Print deprecated events. By default the deprecated events are hidden.

+--cputype::
+Print events applying cpu with this type for hybrid platform
+(e.g. --cputype core or --cputype atom)
+
 [[EVENT_MODIFIERS]]
 EVENT MODIFIERS
 ---------------
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@ -596,6 +596,22 @@ options.
 'perf record --dry-run -e' can act as a BPF script compiler if llvm.dump-obj
 in config file is set to true.

+--synth=TYPE::
+Collect and synthesize given type of events (comma separated).  Note that
+this option controls the synthesis from the /proc filesystem which represent
+task status for pre-existing threads.
+
+Kernel (and some other) events are recorded regardless of the
+choice in this option.  For example, --synth=no would have MMAP events for
+kernel and modules.
+
+Available types are:
+  'task'    - synthesize FORK and COMM events for each task
+  'mmap'    - synthesize MMAP events for each process (implies 'task')
+  'cgroup'  - synthesize CGROUP events for each cgroup
+  'all'     - synthesize all events (default)
+  'no'      - do not synthesize any of the above events
+
 --tail-synthesize::
 Instead of collecting non-sample events (for example, fork, comm, mmap) at
 the beginning of record, collect them during finalizing an output file.
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@ -130,7 +130,7 @@ OPTIONS
        comm, tid, pid, time, cpu, event, trace, ip, sym, dso, addr, symoff,
        srcline, period, iregs, uregs, brstack, brstacksym, flags, bpf-output,
        brstackinsn, brstackoff, callindent, insn, insnlen, synth, phys_addr,
-        metric, misc, srccode, ipc, data_page_size, code_page_size.
+        metric, misc, srccode, ipc, data_page_size, code_page_size, ins_lat.
        Field list can be prepended with the type, trace, sw or hw,
        to indicate to which event type the field list applies.
        e.g., -F sw:comm,tid,time,ip,sym  and -F trace:time,cpu,trace
--- a/tools/perf/Documentation/perf.data-file-format.txt
+++ b/tools/perf/Documentation/perf.data-file-format.txt
@ -346,7 +346,7 @@ to special needs.

        HEADER_BPF_PROG_INFO = 25,

-struct bpf_prog_info_linear, which contains detailed information about
+struct perf_bpil, which contains detailed information about
 a BPF program, including type, id, tag, jited/xlated instructions, etc.

        HEADER_BPF_BTF = 26,
--- a/tools/perf/MANIFEST
+++ b/tools/perf/MANIFEST
@ -17,7 +17,11 @@ tools/lib/symbol/kallsyms.c
 tools/lib/symbol/kallsyms.h
 tools/lib/find_bit.c
 tools/lib/bitmap.c
+tools/lib/list_sort.c
 tools/lib/str_error_r.c
 tools/lib/vsprintf.c
 tools/lib/zalloc.c
 scripts/bpf_doc.py
+tools/bpf/bpftool
+kernel/bpf/disasm.c
+kernel/bpf/disasm.h
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@ -285,7 +285,7 @@ CORE_CFLAGS += -Wall
 CORE_CFLAGS += -Wextra
 CORE_CFLAGS += -std=gnu99

-CXXFLAGS += -std=gnu++11 -fno-exceptions -fno-rtti
+CXXFLAGS += -std=gnu++14 -fno-exceptions -fno-rtti
 CXXFLAGS += -Wall
 CXXFLAGS += -fno-omit-frame-pointer
 CXXFLAGS += -ggdb3
@ -1093,11 +1093,32 @@ ifdef LIBTRACEEVENT_DYNAMIC
  $(call feature_check,libtraceevent)
  ifeq ($(feature-libtraceevent), 1)
    EXTLIBS += -ltraceevent
+    LIBTRACEEVENT_VERSION := $(shell $(PKG_CONFIG) --modversion libtraceevent)
+    LIBTRACEEVENT_VERSION_1 := $(word 1, $(subst ., ,$(LIBTRACEEVENT_VERSION)))
+    LIBTRACEEVENT_VERSION_2 := $(word 2, $(subst ., ,$(LIBTRACEEVENT_VERSION)))
+    LIBTRACEEVENT_VERSION_3 := $(word 3, $(subst ., ,$(LIBTRACEEVENT_VERSION)))
+    LIBTRACEEVENT_VERSION_CPP := $(shell expr $(LIBTRACEEVENT_VERSION_1) \* 255 \* 255 + $(LIBTRACEEVENT_VERSION_2) \* 255 + $(LIBTRACEEVENT_VERSION_3))
+    CFLAGS += -DLIBTRACEEVENT_VERSION=$(LIBTRACEEVENT_VERSION_CPP)
  else
    dummy := $(error Error: No libtraceevent devel library found, please install libtraceevent-devel);
  endif
 endif

+ifdef LIBTRACEFS_DYNAMIC
+  $(call feature_check,libtracefs)
+  ifeq ($(feature-libtracefs), 1)
+    EXTLIBS += -ltracefs
+    LIBTRACEFS_VERSION := $(shell $(PKG_CONFIG) --modversion libtracefs)
+    LIBTRACEFS_VERSION_1 := $(word 1, $(subst ., ,$(LIBTRACEFS_VERSION)))
+    LIBTRACEFS_VERSION_2 := $(word 2, $(subst ., ,$(LIBTRACEFS_VERSION)))
+    LIBTRACEFS_VERSION_3 := $(word 3, $(subst ., ,$(LIBTRACEFS_VERSION)))
+    LIBTRACEFS_VERSION_CPP := $(shell expr $(LIBTRACEFS_VERSION_1) \* 255 \* 255 + $(LIBTRACEFS_VERSION_2) \* 255 + $(LIBTRACEFS_VERSION_3))
+    CFLAGS += -DLIBTRACEFS_VERSION=$(LIBTRACEFS_VERSION_CPP)
+  else
+    dummy := $(error Error: No libtracefs devel library found, please install libtracefs-dev);
+  endif
+endif
+
 # Among the variables below, these:
 #   perfexecdir
 #   perf_include_dir
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@ -130,6 +130,8 @@ include ../scripts/utilities.mak
 #
 # Define LIBTRACEEVENT_DYNAMIC to enable libtraceevent dynamic linking
 #
+# Define LIBTRACEFS_DYNAMIC to enable libtracefs dynamic linking
+#

 # As per kernel Makefile, avoid funny character set dependencies
 unexport LC_ALL
@ -241,7 +243,7 @@ else # force_fixdep

 LIB_DIR         = $(srctree)/tools/lib/api/
 TRACE_EVENT_DIR = $(srctree)/tools/lib/traceevent/
-BPF_DIR         = $(srctree)/tools/lib/bpf/
+LIBBPF_DIR      = $(srctree)/tools/lib/bpf/
 SUBCMD_DIR      = $(srctree)/tools/lib/subcmd/
 LIBPERF_DIR     = $(srctree)/tools/lib/perf/
 DOC_DIR         = $(srctree)/tools/perf/Documentation/
@ -293,7 +295,6 @@ strip-libs = $(filter-out -l%,$(1))
 ifneq ($(OUTPUT),)
  TE_PATH=$(OUTPUT)
  PLUGINS_PATH=$(OUTPUT)
-  BPF_PATH=$(OUTPUT)
  SUBCMD_PATH=$(OUTPUT)
  LIBPERF_PATH=$(OUTPUT)
 ifneq ($(subdir),)
@ -305,7 +306,6 @@ else
  TE_PATH=$(TRACE_EVENT_DIR)
  PLUGINS_PATH=$(TRACE_EVENT_DIR)plugins/
  API_PATH=$(LIB_DIR)
-  BPF_PATH=$(BPF_DIR)
  SUBCMD_PATH=$(SUBCMD_DIR)
  LIBPERF_PATH=$(LIBPERF_DIR)
 endif
@ -324,7 +324,14 @@ LIBTRACEEVENT_DYNAMIC_LIST_LDFLAGS = $(if $(findstring -static,$(LDFLAGS)),,$(DY
 LIBAPI = $(API_PATH)libapi.a
 export LIBAPI

-LIBBPF = $(BPF_PATH)libbpf.a
+ifneq ($(OUTPUT),)
+  LIBBPF_OUTPUT = $(abspath $(OUTPUT))/libbpf
+else
+  LIBBPF_OUTPUT = $(CURDIR)/libbpf
+endif
+LIBBPF_DESTDIR = $(LIBBPF_OUTPUT)
+LIBBPF_INCLUDE = $(LIBBPF_DESTDIR)/include
+LIBBPF = $(LIBBPF_OUTPUT)/libbpf.a

 LIBSUBCMD = $(SUBCMD_PATH)libsubcmd.a

@ -360,7 +367,7 @@ ifndef NO_JVMTI
 PROGRAMS += $(OUTPUT)$(LIBJVMTI)
 endif

-DLFILTERS := dlfilter-test-api-v0.so
+DLFILTERS := dlfilter-test-api-v0.so dlfilter-show-cycles.so
 DLFILTERS := $(patsubst %,$(OUTPUT)dlfilters/%,$(DLFILTERS))

 # what 'all' will build and 'install' will install, in perfexecdir
@ -829,12 +836,14 @@ $(LIBAPI)-clean:
 	$(call QUIET_CLEAN, libapi)
 	$(Q)$(MAKE) -C $(LIB_DIR) O=$(OUTPUT) clean >/dev/null

-$(LIBBPF): FORCE
-	$(Q)$(MAKE) -C $(BPF_DIR) O=$(OUTPUT) $(OUTPUT)libbpf.a FEATURES_DUMP=$(FEATURE_DUMP_EXPORT)
+$(LIBBPF): FORCE | $(LIBBPF_OUTPUT)
+	$(Q)$(MAKE) -C $(LIBBPF_DIR) FEATURES_DUMP=$(FEATURE_DUMP_EXPORT) \
+		O= OUTPUT=$(LIBBPF_OUTPUT)/ DESTDIR=$(LIBBPF_DESTDIR) prefix= \
+		$@ install_headers

 $(LIBBPF)-clean:
 	$(call QUIET_CLEAN, libbpf)
-	$(Q)$(MAKE) -C $(BPF_DIR) O=$(OUTPUT) clean >/dev/null
+	$(Q)$(RM) -r -- $(LIBBPF_OUTPUT)

 $(LIBPERF): FORCE
 	$(Q)$(MAKE) -C $(LIBPERF_DIR) EXTRA_CFLAGS="$(LIBPERF_CFLAGS)" O=$(OUTPUT) $(OUTPUT)libperf.a
@ -1034,16 +1043,15 @@ SKELETONS := $(SKEL_OUT)/bpf_prog_profiler.skel.h
 SKELETONS += $(SKEL_OUT)/bperf_leader.skel.h $(SKEL_OUT)/bperf_follower.skel.h
 SKELETONS += $(SKEL_OUT)/bperf_cgroup.skel.h

-ifdef BUILD_BPF_SKEL
-BPFTOOL := $(SKEL_TMP_OUT)/bootstrap/bpftool
-LIBBPF_SRC := $(abspath ../lib/bpf)
-BPF_INCLUDE := -I$(SKEL_TMP_OUT)/.. -I$(BPF_PATH) -I$(LIBBPF_SRC)/..
-
-$(SKEL_TMP_OUT):
+$(SKEL_TMP_OUT) $(LIBBPF_OUTPUT):
 	$(Q)$(MKDIR) -p $@

+ifdef BUILD_BPF_SKEL
+BPFTOOL := $(SKEL_TMP_OUT)/bootstrap/bpftool
+BPF_INCLUDE := -I$(SKEL_TMP_OUT)/.. -I$(LIBBPF_INCLUDE)
+
 $(BPFTOOL): | $(SKEL_TMP_OUT)
-	CFLAGS= $(MAKE) -C ../bpf/bpftool \
+	$(Q)CFLAGS= $(MAKE) -C ../bpf/bpftool \
 		OUTPUT=$(SKEL_TMP_OUT)/ bootstrap

 VMLINUX_BTF_PATHS ?= $(if $(O),$(O)/vmlinux)				\
--- a/tools/perf/arch/arm64/util/pmu.c
+++ b/tools/perf/arch/arm64/util/pmu.c
@ -3,7 +3,7 @@
 #include "../../../util/cpumap.h"
 #include "../../../util/pmu.h"

-struct pmu_events_map *pmu_events_map__find(void)
+const struct pmu_events_map *pmu_events_map__find(void)
 {
 	struct perf_pmu *pmu = NULL;

--- a/tools/perf/arch/powerpc/include/perf_regs.h
+++ b/tools/perf/arch/powerpc/include/perf_regs.h
@ -77,6 +77,8 @@ static const char *reg_names[] = {
 	[PERF_REG_POWERPC_PMC4] = "pmc4",
 	[PERF_REG_POWERPC_PMC5] = "pmc5",
 	[PERF_REG_POWERPC_PMC6] = "pmc6",
+	[PERF_REG_POWERPC_SDAR] = "sdar",
+	[PERF_REG_POWERPC_SIAR] = "siar",
 };

 static inline const char *__perf_reg_name(int id)
--- a/tools/perf/arch/powerpc/util/header.c
+++ b/tools/perf/arch/powerpc/util/header.c
@ -40,7 +40,7 @@ get_cpuid_str(struct perf_pmu *pmu __maybe_unused)
 	return bufp;
 }

-int arch_get_runtimeparam(struct pmu_event *pe)
+int arch_get_runtimeparam(const struct pmu_event *pe)
 {
 	int count;
 	char path[PATH_MAX] = "/devices/hv_24x7/interface/";
--- a/tools/perf/arch/powerpc/util/kvm-stat.c
+++ b/tools/perf/arch/powerpc/util/kvm-stat.c
@ -113,10 +113,11 @@ static int is_tracepoint_available(const char *str, struct evlist *evlist)
 	struct parse_events_error err;
 	int ret;

-	bzero(&err, sizeof(err));
+	parse_events_error__init(&err);
 	ret = parse_events(evlist, str, &err);
 	if (err.str)
-		parse_events_print_error(&err, "tracepoint");
+		parse_events_error__print(&err, "tracepoint");
+	parse_events_error__exit(&err);
 	return ret;
 }

--- a/tools/perf/arch/powerpc/util/perf_regs.c
+++ b/tools/perf/arch/powerpc/util/perf_regs.c
@ -74,6 +74,8 @@ const struct sample_reg sample_reg_masks[] = {
 	SMPL_REG(pmc4, PERF_REG_POWERPC_PMC4),
 	SMPL_REG(pmc5, PERF_REG_POWERPC_PMC5),
 	SMPL_REG(pmc6, PERF_REG_POWERPC_PMC6),
+	SMPL_REG(sdar, PERF_REG_POWERPC_SDAR),
+	SMPL_REG(siar, PERF_REG_POWERPC_SIAR),
 	SMPL_REG_END
 };

--- a/tools/perf/arch/riscv64/annotate/instructions.c
+++ b/tools/perf/arch/riscv64/annotate/instructions.c
@ -0,0 +1,34 @@
+// SPDX-License-Identifier: GPL-2.0
+
+static
+struct ins_ops *riscv64__associate_ins_ops(struct arch *arch, const char *name)
+{
+	struct ins_ops *ops = NULL;
+
+	if (!strncmp(name, "jal", 3) ||
+	    !strncmp(name, "jr", 2) ||
+	    !strncmp(name, "call", 4))
+		ops = &call_ops;
+	else if (!strncmp(name, "ret", 3))
+		ops = &ret_ops;
+	else if (name[0] == 'j' || name[0] == 'b')
+		ops = &jump_ops;
+	else
+		return NULL;
+
+	arch__associate_ins_ops(arch, name, ops);
+
+	return ops;
+}
+
+static
+int riscv64__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
+{
+	if (!arch->initialized) {
+		arch->associate_instruction_ops = riscv64__associate_ins_ops;
+		arch->initialized = true;
+		arch->objdump.comment_char = '#';
+	}
+
+	return 0;
+}
--- a/tools/perf/arch/x86/annotate/instructions.c
+++ b/tools/perf/arch/x86/annotate/instructions.c
@ -144,8 +144,31 @@ static struct ins x86__instructions[] = {
 	{ .name = "xorps",	.ops = &mov_ops, },
 };

-static bool x86__ins_is_fused(struct arch *arch, const char *ins1,
+static bool amd__ins_is_fused(struct arch *arch, const char *ins1,
 			      const char *ins2)
+{
+	if (strstr(ins2, "jmp"))
+		return false;
+
+	/* Family >= 15h supports cmp/test + branch fusion */
+	if (arch->family >= 0x15 && (strstarts(ins1, "test") ||
+	    (strstarts(ins1, "cmp") && !strstr(ins1, "xchg")))) {
+		return true;
+	}
+
+	/* Family >= 19h supports some ALU + branch fusion */
+	if (arch->family >= 0x19 && (strstarts(ins1, "add") ||
+	    strstarts(ins1, "sub") || strstarts(ins1, "and") ||
+	    strstarts(ins1, "inc") || strstarts(ins1, "dec") ||
+	    strstarts(ins1, "or") || strstarts(ins1, "xor"))) {
+		return true;
+	}
+
+	return false;
+}
+
+static bool intel__ins_is_fused(struct arch *arch, const char *ins1,
+				const char *ins2)
 {
 	if (arch->family != 6 || arch->model < 0x1e || strstr(ins2, "jmp"))
 		return false;
@ -184,6 +207,9 @@ static int x86__cpuid_parse(struct arch *arch, char *cpuid)
 	if (ret == 3) {
 		arch->family = family;
 		arch->model = model;
+		arch->ins_is_fused = strstarts(cpuid, "AuthenticAMD") ?
+					amd__ins_is_fused :
+					intel__ins_is_fused;
 		return 0;
 	}

--- a/tools/perf/arch/x86/util/evsel.c
+++ b/tools/perf/arch/x86/util/evsel.c
@ -1,8 +1,31 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <stdio.h>
+#include <stdlib.h>
 #include "util/evsel.h"
+#include "util/env.h"
+#include "linux/string.h"

 void arch_evsel__set_sample_weight(struct evsel *evsel)
 {
 	evsel__set_sample_bit(evsel, WEIGHT_STRUCT);
 }
+
+void arch_evsel__fixup_new_cycles(struct perf_event_attr *attr)
+{
+	struct perf_env env = { .total_mem = 0, } ;
+
+	if (!perf_env__cpuid(&env))
+		return;
+
+	/*
+	 * On AMD, precise cycles event sampling internally uses IBS pmu.
+	 * But IBS does not have filtering capabilities and perf by default
+	 * sets exclude_guest = 1. This makes IBS pmu event init fail and
+	 * thus perf ends up doing non-precise sampling. Avoid it by clearing
+	 * exclude_guest.
+	 */
+	if (env.cpuid && strstarts(env.cpuid, "AuthenticAMD"))
+		attr->exclude_guest = 0;
+
+	free(env.cpuid);
+}
--- a/tools/perf/bench/evlist-open-close.c
+++ b/tools/perf/bench/evlist-open-close.c
@ -25,6 +25,11 @@ static int iterations = 100;
 static int nr_events = 1;
 static const char *event_string = "dummy";

+static inline u64 timeval2usec(struct timeval *tv)
+{
+	return tv->tv_sec * USEC_PER_SEC + tv->tv_usec;
+}
+
 static struct record_opts opts = {
 	.sample_time	     = true,
 	.mmap_pages	     = UINT_MAX,
@ -73,7 +78,7 @@ static int evlist__count_evsel_fds(struct evlist *evlist)

 static struct evlist *bench__create_evlist(char *evstr)
 {
-	struct parse_events_error err = { .idx = 0, };
+	struct parse_events_error err;
 	struct evlist *evlist = evlist__new();
 	int ret;

@ -82,14 +87,16 @@ static struct evlist *bench__create_evlist(char *evstr)
 		return NULL;
 	}

+	parse_events_error__init(&err);
 	ret = parse_events(evlist, evstr, &err);
 	if (ret) {
-		parse_events_print_error(&err, evstr);
+		parse_events_error__print(&err, evstr);
+		parse_events_error__exit(&err);
 		pr_err("Run 'perf list' for a list of valid events\n");
 		ret = 1;
 		goto out_delete_evlist;
 	}
-
+	parse_events_error__exit(&err);
 	ret = evlist__create_maps(evlist, &opts.target);
 	if (ret < 0) {
 		pr_err("Not enough memory to create thread/cpu maps\n");
@ -167,7 +174,7 @@ static int bench_evlist_open_close__run(char *evstr)

 		gettimeofday(&end, NULL);
 		timersub(&end, &start, &diff);
-		runtime_us = diff.tv_sec * USEC_PER_SEC + diff.tv_usec;
+		runtime_us = timeval2usec(&diff);
 		update_stats(&time_stats, runtime_us);

 		evlist__delete(evlist);
--- a/tools/perf/bench/futex.h
+++ b/tools/perf/bench/futex.h
@ -28,7 +28,7 @@ struct bench_futex_parameters {
 };

 /**
- * futex() - SYS_futex syscall wrapper
+ * futex_syscall() - SYS_futex syscall wrapper
 * @uaddr:	address of first futex
 * @op:		futex op code
 * @val:	typically expected value of uaddr, but varies by op
@ -38,17 +38,26 @@ struct bench_futex_parameters {
 * @val3:	varies by op
 * @opflags:	flags to be bitwise OR'd with op, such as FUTEX_PRIVATE_FLAG
 *
- * futex() is used by all the following futex op wrappers. It can also be
+ * futex_syscall() is used by all the following futex op wrappers. It can also be
 * used for misuse and abuse testing. Generally, the specific op wrappers
- * should be used instead. It is a macro instead of an static inline function as
- * some of the types over overloaded (timeout is used for nr_requeue for
- * example).
+ * should be used instead.
 *
 * These argument descriptions are the defaults for all
 * like-named arguments in the following wrappers except where noted below.
 */
-#define futex(uaddr, op, val, timeout, uaddr2, val3, opflags) \
-	syscall(SYS_futex, uaddr, op | opflags, val, timeout, uaddr2, val3)
+static inline int
+futex_syscall(volatile u_int32_t *uaddr, int op, u_int32_t val, struct timespec *timeout,
+	      volatile u_int32_t *uaddr2, int val3, int opflags)
+{
+	return syscall(SYS_futex, uaddr, op | opflags, val, timeout, uaddr2, val3);
+}
+
+static inline int
+futex_syscall_nr_requeue(volatile u_int32_t *uaddr, int op, u_int32_t val, int nr_requeue,
+			 volatile u_int32_t *uaddr2, int val3, int opflags)
+{
+	return syscall(SYS_futex, uaddr, op | opflags, val, nr_requeue, uaddr2, val3);
+}

 /**
 * futex_wait() - block on uaddr with optional timeout
@ -57,7 +66,7 @@ struct bench_futex_parameters {
 static inline int
 futex_wait(u_int32_t *uaddr, u_int32_t val, struct timespec *timeout, int opflags)
 {
-	return futex(uaddr, FUTEX_WAIT, val, timeout, NULL, 0, opflags);
+	return futex_syscall(uaddr, FUTEX_WAIT, val, timeout, NULL, 0, opflags);
 }

 /**
@ -67,7 +76,7 @@ futex_wait(u_int32_t *uaddr, u_int32_t val, struct timespec *timeout, int opflag
 static inline int
 futex_wake(u_int32_t *uaddr, int nr_wake, int opflags)
 {
-	return futex(uaddr, FUTEX_WAKE, nr_wake, NULL, NULL, 0, opflags);
+	return futex_syscall(uaddr, FUTEX_WAKE, nr_wake, NULL, NULL, 0, opflags);
 }

 /**
@ -76,7 +85,7 @@ futex_wake(u_int32_t *uaddr, int nr_wake, int opflags)
 static inline int
 futex_lock_pi(u_int32_t *uaddr, struct timespec *timeout, int opflags)
 {
-	return futex(uaddr, FUTEX_LOCK_PI, 0, timeout, NULL, 0, opflags);
+	return futex_syscall(uaddr, FUTEX_LOCK_PI, 0, timeout, NULL, 0, opflags);
 }

 /**
@ -85,7 +94,7 @@ futex_lock_pi(u_int32_t *uaddr, struct timespec *timeout, int opflags)
 static inline int
 futex_unlock_pi(u_int32_t *uaddr, int opflags)
 {
-	return futex(uaddr, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0, opflags);
+	return futex_syscall(uaddr, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0, opflags);
 }

 /**
@ -97,8 +106,8 @@ static inline int
 futex_cmp_requeue(u_int32_t *uaddr, u_int32_t val, u_int32_t *uaddr2, int nr_wake,
 		 int nr_requeue, int opflags)
 {
-	return futex(uaddr, FUTEX_CMP_REQUEUE, nr_wake, nr_requeue, uaddr2,
-		 val, opflags);
+	return futex_syscall_nr_requeue(uaddr, FUTEX_CMP_REQUEUE, nr_wake, nr_requeue, uaddr2,
+					val, opflags);
 }

 /**
@ -113,8 +122,8 @@ static inline int
 futex_wait_requeue_pi(u_int32_t *uaddr, u_int32_t val, u_int32_t *uaddr2,
 		      struct timespec *timeout, int opflags)
 {
-	return futex(uaddr, FUTEX_WAIT_REQUEUE_PI, val, timeout, uaddr2, 0,
-		     opflags);
+	return futex_syscall(uaddr, FUTEX_WAIT_REQUEUE_PI, val, timeout, uaddr2, 0,
+			     opflags);
 }

 /**
@ -130,8 +139,8 @@ static inline int
 futex_cmp_requeue_pi(u_int32_t *uaddr, u_int32_t val, u_int32_t *uaddr2,
 		     int nr_requeue, int opflags)
 {
-	return futex(uaddr, FUTEX_CMP_REQUEUE_PI, 1, nr_requeue, uaddr2,
-		     val, opflags);
+	return futex_syscall_nr_requeue(uaddr, FUTEX_CMP_REQUEUE_PI, 1, nr_requeue, uaddr2,
+					val, opflags);
 }

 #endif /* _FUTEX_H */
--- a/tools/perf/bench/synthesize.c
+++ b/tools/perf/bench/synthesize.c
@ -80,7 +80,7 @@ static int do_run_single_threaded(struct perf_session *session,
 						NULL,
 						target, threads,
 						process_synthesized_event,
-						data_mmap,
+						true, data_mmap,
 						nr_threads_synthesize);
 		if (err)
 			return err;
@ -171,7 +171,7 @@ static int do_run_multi_threaded(struct target *target,
 						NULL,
 						target, NULL,
 						process_synthesized_event,
-						false,
+						true, false,
 						nr_threads_synthesize);
 		if (err) {
 			perf_session__delete(session);
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@ -591,6 +591,10 @@ int cmd_annotate(int argc, const char **argv)
 		return ret;
 	}

+	ret = symbol__validate_sym_arguments();
+	if (ret)
+		return ret;
+
 	if (quiet)
 		perf_quiet_option();

--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@ -2768,6 +2768,10 @@ static int perf_c2c__report(int argc, const char **argv)
 	if (c2c.stats_only)
 		c2c.use_stdio = true;

+	err = symbol__validate_sym_arguments();
+	if (err)
+		goto out;
+
 	if (!input_name || !strlen(input_name))
 		input_name = "perf.data";

--- a/tools/perf/builtin-daemon.c
+++ b/tools/perf/builtin-daemon.c
@ -1121,8 +1121,6 @@ static int setup_config(struct daemon *daemon)
 #ifndef F_TLOCK
 #define F_TLOCK 2

-#include <sys/file.h>
-
 static int lockf(int fd, int cmd, off_t len)
 {
 	if (cmd != F_TLOCK || len != 0)
@ -1403,8 +1401,10 @@ out:

 static int send_cmd_list(struct daemon *daemon)
 {
-	union cmd cmd = { .cmd = CMD_LIST, };
+	union cmd cmd;

+	memset(&cmd, 0, sizeof(cmd));
+	cmd.list.cmd = CMD_LIST;
 	cmd.list.verbose = verbose;
 	cmd.list.csv_sep = daemon->csv_sep ? *daemon->csv_sep : 0;

@ -1432,6 +1432,7 @@ static int __cmd_signal(struct daemon *daemon, struct option parent_options[],
 		return -1;
 	}

+	memset(&cmd, 0, sizeof(cmd));
 	cmd.signal.cmd = CMD_SIGNAL,
 	cmd.signal.sig = SIGUSR2;
 	strncpy(cmd.signal.name, name, sizeof(cmd.signal.name) - 1);
@ -1446,7 +1447,7 @@ static int __cmd_stop(struct daemon *daemon, struct option parent_options[],
 		OPT_PARENT(parent_options),
 		OPT_END()
 	};
-	union cmd cmd = { .cmd = CMD_STOP, };
+	union cmd cmd;

 	argc = parse_options(argc, argv, start_options, daemon_usage, 0);
 	if (argc)
@ -1457,6 +1458,8 @@ static int __cmd_stop(struct daemon *daemon, struct option parent_options[],
 		return -1;
 	}

+	memset(&cmd, 0, sizeof(cmd));
+	cmd.cmd = CMD_STOP;
 	return send_cmd(daemon, &cmd);
 }

@ -1470,7 +1473,7 @@ static int __cmd_ping(struct daemon *daemon, struct option parent_options[],
 		OPT_PARENT(parent_options),
 		OPT_END()
 	};
-	union cmd cmd = { .cmd = CMD_PING, };
+	union cmd cmd;

 	argc = parse_options(argc, argv, ping_options, daemon_usage, 0);
 	if (argc)
@ -1481,6 +1484,8 @@ static int __cmd_ping(struct daemon *daemon, struct option parent_options[],
 		return -1;
 	}

+	memset(&cmd, 0, sizeof(cmd));
+	cmd.cmd = CMD_PING;
 	scnprintf(cmd.ping.name, sizeof(cmd.ping.name), "%s", name);
 	return send_cmd(daemon, &cmd);
 }
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@ -815,7 +815,8 @@ static int __cmd_inject(struct perf_inject *inject)
 		inject->tool.auxtrace_info  = perf_event__process_auxtrace_info;
 		inject->tool.auxtrace	    = perf_event__process_auxtrace;
 		inject->tool.aux	    = perf_event__drop_aux;
-		inject->tool.itrace_start   = perf_event__drop_aux,
+		inject->tool.itrace_start   = perf_event__drop_aux;
+		inject->tool.aux_output_hw_id = perf_event__drop_aux;
 		inject->tool.ordered_events = true;
 		inject->tool.ordering_requires_timestamps = true;
 		/* Allow space in the header for new attributes */
@ -882,6 +883,7 @@ int cmd_inject(int argc, const char **argv)
 			.lost_samples	= perf_event__repipe,
 			.aux		= perf_event__repipe,
 			.itrace_start	= perf_event__repipe,
+			.aux_output_hw_id = perf_event__repipe,
 			.context_switch	= perf_event__repipe,
 			.throttle	= perf_event__repipe,
 			.unthrottle	= perf_event__repipe,
@ -938,6 +940,10 @@ int cmd_inject(int argc, const char **argv)
 #endif
 		OPT_INCR('v', "verbose", &verbose,
 			 "be more verbose (show build ids, etc)"),
+		OPT_STRING('k', "vmlinux", &symbol_conf.vmlinux_name,
+			   "file", "vmlinux pathname"),
+		OPT_BOOLEAN(0, "ignore-vmlinux", &symbol_conf.ignore_vmlinux,
+			    "don't load vmlinux even if found"),
 		OPT_STRING(0, "kallsyms", &symbol_conf.kallsyms_name, "file",
 			   "kallsyms pathname"),
 		OPT_BOOLEAN('f', "force", &data.force, "don't complain, do it"),
@ -972,6 +978,9 @@ int cmd_inject(int argc, const char **argv)
 		return -1;
 	}

+	if (symbol__validate_sym_arguments())
+		return -1;
+
 	if (inject.in_place_update) {
 		if (!strcmp(inject.input_name, "-")) {
 			pr_err("Input file name required for in-place updating\n");
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@ -1456,7 +1456,7 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
 	perf_session__set_id_hdr_size(kvm->session);
 	ordered_events__set_copy_on_queue(&kvm->session->ordered_events, true);
 	machine__synthesize_threads(&kvm->session->machines.host, &kvm->opts.target,
-				    kvm->evlist->core.threads, false, 1);
+				    kvm->evlist->core.threads, true, false, 1);
 	err = kvm_live_open_events(kvm);
 	if (err)
 		goto out;
--- a/tools/perf/builtin-list.c
+++ b/tools/perf/builtin-list.c
@ -12,6 +12,7 @@

 #include "util/parse-events.h"
 #include "util/pmu.h"
+#include "util/pmu-hybrid.h"
 #include "util/debug.h"
 #include "util/metricgroup.h"
 #include <subcmd/pager.h>
@ -20,13 +21,15 @@

 static bool desc_flag = true;
 static bool details_flag;
+static const char *hybrid_type;

 int cmd_list(int argc, const char **argv)
 {
-	int i;
+	int i, ret = 0;
 	bool raw_dump = false;
 	bool long_desc_flag = false;
 	bool deprecated = false;
+	char *pmu_name = NULL;
 	struct option list_options[] = {
 		OPT_BOOLEAN(0, "raw-dump", &raw_dump, "Dump raw events"),
 		OPT_BOOLEAN('d', "desc", &desc_flag,
@ -37,6 +40,9 @@ int cmd_list(int argc, const char **argv)
 			    "Print information on the perf event names and expressions used internally by events."),
 		OPT_BOOLEAN(0, "deprecated", &deprecated,
 			    "Print deprecated events."),
+		OPT_STRING(0, "cputype", &hybrid_type, "hybrid cpu type",
+			   "Print events applying cpu with this type for hybrid platform "
+			   "(e.g. core or atom)"),
 		OPT_INCR(0, "debug", &verbose,
 			     "Enable debugging output"),
 		OPT_END()
@ -56,10 +62,16 @@ int cmd_list(int argc, const char **argv)
 	if (!raw_dump && pager_in_use())
 		printf("\nList of pre-defined events (to be used in -e):\n\n");

+	if (hybrid_type) {
+		pmu_name = perf_pmu__hybrid_type_to_pmu(hybrid_type);
+		if (!pmu_name)
+			pr_warning("WARNING: hybrid cputype is not supported!\n");
+	}
+
 	if (argc == 0) {
 		print_events(NULL, raw_dump, !desc_flag, long_desc_flag,
-				details_flag, deprecated);
-		return 0;
+				details_flag, deprecated, pmu_name);
+		goto out;
 	}

 	for (i = 0; i < argc; ++i) {
@ -82,25 +94,27 @@ int cmd_list(int argc, const char **argv)
 		else if (strcmp(argv[i], "pmu") == 0)
 			print_pmu_events(NULL, raw_dump, !desc_flag,
 						long_desc_flag, details_flag,
-						deprecated);
+						deprecated, pmu_name);
 		else if (strcmp(argv[i], "sdt") == 0)
 			print_sdt_events(NULL, NULL, raw_dump);
 		else if (strcmp(argv[i], "metric") == 0 || strcmp(argv[i], "metrics") == 0)
-			metricgroup__print(true, false, NULL, raw_dump, details_flag);
+			metricgroup__print(true, false, NULL, raw_dump, details_flag, pmu_name);
 		else if (strcmp(argv[i], "metricgroup") == 0 || strcmp(argv[i], "metricgroups") == 0)
-			metricgroup__print(false, true, NULL, raw_dump, details_flag);
+			metricgroup__print(false, true, NULL, raw_dump, details_flag, pmu_name);
 		else if ((sep = strchr(argv[i], ':')) != NULL) {
 			int sep_idx;

 			sep_idx = sep - argv[i];
 			s = strdup(argv[i]);
-			if (s == NULL)
-				return -1;
+			if (s == NULL) {
+				ret = -1;
+				goto out;
+			}

 			s[sep_idx] = '\0';
 			print_tracepoint_events(s, s + sep_idx + 1, raw_dump);
 			print_sdt_events(s, s + sep_idx + 1, raw_dump);
-			metricgroup__print(true, true, s, raw_dump, details_flag);
+			metricgroup__print(true, true, s, raw_dump, details_flag, pmu_name);
 			free(s);
 		} else {
 			if (asprintf(&s, "*%s*", argv[i]) < 0) {
@ -116,12 +130,16 @@ int cmd_list(int argc, const char **argv)
 			print_pmu_events(s, raw_dump, !desc_flag,
 						long_desc_flag,
 						details_flag,
-						deprecated);
+						deprecated,
+						pmu_name);
 			print_tracepoint_events(NULL, s, raw_dump);
 			print_sdt_events(NULL, s, raw_dump);
-			metricgroup__print(true, true, s, raw_dump, details_flag);
+			metricgroup__print(true, true, s, raw_dump, details_flag, pmu_name);
 			free(s);
 		}
 	}
-	return 0;
+
+out:
+	free(pmu_name);
+	return ret;
 }
--- a/tools/perf/builtin-probe.c
+++ b/tools/perf/builtin-probe.c
@ -21,6 +21,7 @@
 #include "util/build-id.h"
 #include "util/strlist.h"
 #include "util/strfilter.h"
+#include "util/symbol.h"
 #include "util/symbol_conf.h"
 #include "util/debug.h"
 #include <subcmd/parse-options.h>
@ -629,6 +630,10 @@ __cmd_probe(int argc, const char **argv)
 		params.command = 'a';
 	}

+	ret = symbol__validate_sym_arguments();
+	if (ret)
+		return ret;
+
 	if (params.quiet) {
 		if (verbose != 0) {
 			pr_err("  Error: -v and -q are exclusive.\n");
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@ -1255,6 +1255,7 @@ static int record__synthesize_workload(struct record *rec, bool tail)
 {
 	int err;
 	struct perf_thread_map *thread_map;
+	bool needs_mmap = rec->opts.synth & PERF_SYNTH_MMAP;

 	if (rec->opts.tail_synthesize != tail)
 		return 0;
@ -1266,6 +1267,7 @@ static int record__synthesize_workload(struct record *rec, bool tail)
 	err = perf_event__synthesize_thread_map(&rec->tool, thread_map,
 						 process_synthesized_event,
 						 &rec->session->machines.host,
+						 needs_mmap,
 						 rec->opts.sample_address);
 	perf_thread_map__put(thread_map);
 	return err;
@ -1409,7 +1411,7 @@ static int record__synthesize(struct record *rec, bool tail)
 		goto out;

 	/* Synthesize id_index before auxtrace_info */
-	if (rec->opts.auxtrace_sample_mode) {
+	if (rec->opts.auxtrace_sample_mode || rec->opts.full_auxtrace) {
 		err = perf_event__synthesize_id_index(tool,
 						      process_synthesized_event,
 						      session->evlist, machine);
@ -1470,19 +1472,26 @@ static int record__synthesize(struct record *rec, bool tail)
 	if (err < 0)
 		pr_warning("Couldn't synthesize bpf events.\n");

-	err = perf_event__synthesize_cgroups(tool, process_synthesized_event,
-					     machine);
-	if (err < 0)
-		pr_warning("Couldn't synthesize cgroup events.\n");
+	if (rec->opts.synth & PERF_SYNTH_CGROUP) {
+		err = perf_event__synthesize_cgroups(tool, process_synthesized_event,
+						     machine);
+		if (err < 0)
+			pr_warning("Couldn't synthesize cgroup events.\n");
+	}

 	if (rec->opts.nr_threads_synthesize > 1) {
 		perf_set_multithreaded();
 		f = process_locked_synthesized_event;
 	}

-	err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->core.threads,
-					    f, opts->sample_address,
-					    rec->opts.nr_threads_synthesize);
+	if (rec->opts.synth & PERF_SYNTH_TASK) {
+		bool needs_mmap = rec->opts.synth & PERF_SYNTH_MMAP;
+
+		err = __machine__synthesize_threads(machine, tool, &opts->target,
+						    rec->evlist->core.threads,
+						    f, needs_mmap, opts->sample_address,
+						    rec->opts.nr_threads_synthesize);
+	}

 	if (rec->opts.nr_threads_synthesize > 1)
 		perf_set_singlethreaded();
@ -2391,6 +2400,26 @@ static int process_timestamp_boundary(struct perf_tool *tool,
 	return 0;
 }

+static int parse_record_synth_option(const struct option *opt,
+				     const char *str,
+				     int unset __maybe_unused)
+{
+	struct record_opts *opts = opt->value;
+	char *p = strdup(str);
+
+	if (p == NULL)
+		return -1;
+
+	opts->synth = parse_synth_opt(p);
+	free(p);
+
+	if (opts->synth < 0) {
+		pr_err("Invalid synth option: %s\n", str);
+		return -1;
+	}
+	return 0;
+}
+
 /*
 * XXX Ideally would be local to cmd_record() and passed to a record__new
 * because we need to have access to it in record__exit, that is called
@ -2416,6 +2445,7 @@ static struct record record = {
 		.nr_threads_synthesize = 1,
 		.ctl_fd              = -1,
 		.ctl_fd_ack          = -1,
+		.synth               = PERF_SYNTH_ALL,
 	},
 	.tool = {
 		.sample		= process_sample_event,
@ -2631,6 +2661,8 @@ static struct option __record_options[] = {
 		     "\t\t\t  Optionally send control command completion ('ack\\n') to ack-fd descriptor.\n"
 		     "\t\t\t  Alternatively, ctl-fifo / ack-fifo will be opened and used as ctl-fd / ack-fd.",
 		      parse_control_option),
+	OPT_CALLBACK(0, "synth", &record.opts, "no|all|task|mmap|cgroup",
+		     "Fine-tune event synthesis: default=all", parse_record_synth_option),
 	OPT_END()
 };

@ -2680,6 +2712,10 @@ int cmd_record(int argc, const char **argv)
 	if (quiet)
 		perf_quiet_option();

+	err = symbol__validate_sym_arguments();
+	if (err)
+		return err;
+
 	/* Make system wide (-a) the default target. */
 	if (!argc && target__none(&rec->opts.target))
 		rec->opts.target.system_wide = true;
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@ -1378,18 +1378,9 @@ int cmd_report(int argc, const char **argv)
 	if (quiet)
 		perf_quiet_option();

-	if (symbol_conf.vmlinux_name &&
-	    access(symbol_conf.vmlinux_name, R_OK)) {
-		pr_err("Invalid file: %s\n", symbol_conf.vmlinux_name);
-		ret = -EINVAL;
+	ret = symbol__validate_sym_arguments();
+	if (ret)
 		goto exit;
-	}
-	if (symbol_conf.kallsyms_name &&
-	    access(symbol_conf.kallsyms_name, R_OK)) {
-		pr_err("Invalid file: %s\n", symbol_conf.kallsyms_name);
-		ret = -EINVAL;
-		goto exit;
-	}

 	if (report.inverted_callchain)
 		callchain_param.order = ORDER_CALLER;
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@ -3538,6 +3538,7 @@ int cmd_sched(int argc, const char **argv)
 		.fork_event	    = replay_fork_event,
 	};
 	unsigned int i;
+	int ret;

 	for (i = 0; i < ARRAY_SIZE(sched.curr_pid); i++)
 		sched.curr_pid[i] = -1;
@ -3598,6 +3599,9 @@ int cmd_sched(int argc, const char **argv)
 				parse_options_usage(NULL, timehist_options, "n", true);
 			return -EINVAL;
 		}
+		ret = symbol__validate_sym_arguments();
+		if (ret)
+			return ret;

 		return perf_sched__timehist(&sched);
 	} else {
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@ -122,6 +122,7 @@ enum perf_output_field {
 	PERF_OUTPUT_TOD             = 1ULL << 32,
 	PERF_OUTPUT_DATA_PAGE_SIZE  = 1ULL << 33,
 	PERF_OUTPUT_CODE_PAGE_SIZE  = 1ULL << 34,
+	PERF_OUTPUT_INS_LAT         = 1ULL << 35,
 };

 struct perf_script {
@ -188,6 +189,7 @@ struct output_option {
 	{.str = "tod", .field = PERF_OUTPUT_TOD},
 	{.str = "data_page_size", .field = PERF_OUTPUT_DATA_PAGE_SIZE},
 	{.str = "code_page_size", .field = PERF_OUTPUT_CODE_PAGE_SIZE},
+	{.str = "ins_lat", .field = PERF_OUTPUT_INS_LAT},
 };

 enum {
@ -262,7 +264,8 @@ static struct {
 			      PERF_OUTPUT_DSO | PERF_OUTPUT_PERIOD |
 			      PERF_OUTPUT_ADDR | PERF_OUTPUT_DATA_SRC |
 			      PERF_OUTPUT_WEIGHT | PERF_OUTPUT_PHYS_ADDR |
-			      PERF_OUTPUT_DATA_PAGE_SIZE | PERF_OUTPUT_CODE_PAGE_SIZE,
+			      PERF_OUTPUT_DATA_PAGE_SIZE | PERF_OUTPUT_CODE_PAGE_SIZE |
+			      PERF_OUTPUT_INS_LAT,

 		.invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT,
 	},
@ -522,6 +525,10 @@ static int evsel__check_attr(struct evsel *evsel, struct perf_session *session)
 	    evsel__check_stype(evsel, PERF_SAMPLE_CODE_PAGE_SIZE, "CODE_PAGE_SIZE", PERF_OUTPUT_CODE_PAGE_SIZE))
 		return -EINVAL;

+	if (PRINT_FIELD(INS_LAT) &&
+	    evsel__check_stype(evsel, PERF_SAMPLE_WEIGHT_STRUCT, "WEIGHT_STRUCT", PERF_OUTPUT_INS_LAT))
+		return -EINVAL;
+
 	return 0;
 }

@ -2039,6 +2046,9 @@ static void process_event(struct perf_script *script,
 	if (PRINT_FIELD(WEIGHT))
 		fprintf(fp, "%16" PRIu64, sample->weight);

+	if (PRINT_FIELD(INS_LAT))
+		fprintf(fp, "%16" PRIu16, sample->ins_lat);
+
 	if (PRINT_FIELD(IP)) {
 		struct callchain_cursor *cursor = NULL;

@ -3715,7 +3725,7 @@ int cmd_script(int argc, const char **argv)
 		     "addr,symoff,srcline,period,iregs,uregs,brstack,"
 		     "brstacksym,flags,bpf-output,brstackinsn,brstackoff,"
 		     "callindent,insn,insnlen,synth,phys_addr,metric,misc,ipc,tod,"
-		     "data_page_size,code_page_size",
+		     "data_page_size,code_page_size,ins_lat",
 		     parse_output_fields),
 	OPT_BOOLEAN('a', "all-cpus", &system_wide,
 		    "system-wide collection from all CPUs"),
@ -3836,6 +3846,9 @@ int cmd_script(int argc, const char **argv)
 	data.path  = input_name;
 	data.force = symbol_conf.force;

+	if (symbol__validate_sym_arguments())
+		return -1;
+
 	if (argc > 1 && !strncmp(argv[0], "rec", strlen("rec"))) {
 		rec_script_path = get_script_path(argv[1], RECORD_SUFFIX);
 		if (!rec_script_path)
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@ -1750,14 +1750,12 @@ static int add_default_attributes(void)
 	(PERF_COUNT_HW_CACHE_OP_PREFETCH	<<  8) |
 	(PERF_COUNT_HW_CACHE_RESULT_MISS	<< 16)				},
 };
-	struct parse_events_error errinfo;
-
 	/* Set attrs if no event is selected and !null_run: */
 	if (stat_config.null_run)
 		return 0;

-	bzero(&errinfo, sizeof(errinfo));
 	if (transaction_run) {
+		struct parse_events_error errinfo;
 		/* Handle -T as -M transaction. Once platform specific metrics
 		 * support has been added to the json files, all architectures
 		 * will use this approach. To determine transaction support
@ -1772,6 +1770,7 @@ static int add_default_attributes(void)
 							 &stat_config.metric_events);
 		}

+		parse_events_error__init(&errinfo);
 		if (pmu_have_event("cpu", "cycles-ct") &&
 		    pmu_have_event("cpu", "el-start"))
 			err = parse_events(evsel_list, transaction_attrs,
@ -1782,13 +1781,14 @@ static int add_default_attributes(void)
 					   &errinfo);
 		if (err) {
 			fprintf(stderr, "Cannot set up transaction events\n");
-			parse_events_print_error(&errinfo, transaction_attrs);
-			return -1;
+			parse_events_error__print(&errinfo, transaction_attrs);
 		}
-		return 0;
+		parse_events_error__exit(&errinfo);
+		return err ? -1 : 0;
 	}

 	if (smi_cost) {
+		struct parse_events_error errinfo;
 		int smi;

 		if (sysfs__read_int(FREEZE_ON_SMI_PATH, &smi) < 0) {
@ -1804,23 +1804,23 @@ static int add_default_attributes(void)
 			smi_reset = true;
 		}

-		if (pmu_have_event("msr", "aperf") &&
-		    pmu_have_event("msr", "smi")) {
-			if (!force_metric_only)
-				stat_config.metric_only = true;
-			err = parse_events(evsel_list, smi_cost_attrs, &errinfo);
-		} else {
+		if (!pmu_have_event("msr", "aperf") ||
+		    !pmu_have_event("msr", "smi")) {
 			fprintf(stderr, "To measure SMI cost, it needs "
 				"msr/aperf/, msr/smi/ and cpu/cycles/ support\n");
-			parse_events_print_error(&errinfo, smi_cost_attrs);
 			return -1;
 		}
+		if (!force_metric_only)
+			stat_config.metric_only = true;
+
+		parse_events_error__init(&errinfo);
+		err = parse_events(evsel_list, smi_cost_attrs, &errinfo);
 		if (err) {
-			parse_events_print_error(&errinfo, smi_cost_attrs);
+			parse_events_error__print(&errinfo, smi_cost_attrs);
 			fprintf(stderr, "Cannot set up SMI cost events\n");
-			return -1;
 		}
-		return 0;
+		parse_events_error__exit(&errinfo);
+		return err ? -1 : 0;
 	}

 	if (topdown_run) {
@ -1875,18 +1875,22 @@ static int add_default_attributes(void)
 			return -1;
 		}
 		if (topdown_attrs[0] && str) {
+			struct parse_events_error errinfo;
 			if (warn)
 				arch_topdown_group_warn();
 setup_metrics:
+			parse_events_error__init(&errinfo);
 			err = parse_events(evsel_list, str, &errinfo);
 			if (err) {
 				fprintf(stderr,
 					"Cannot set up top down events %s: %d\n",
 					str, err);
-				parse_events_print_error(&errinfo, str);
+				parse_events_error__print(&errinfo, str);
+				parse_events_error__exit(&errinfo);
 				free(str);
 				return -1;
 			}
+			parse_events_error__exit(&errinfo);
 		} else {
 			fprintf(stderr, "System does not support topdown\n");
 			return -1;
@ -1896,6 +1900,7 @@ setup_metrics:

 	if (!evsel_list->core.nr_entries) {
 		if (perf_pmu__has_hybrid()) {
+			struct parse_events_error errinfo;
 			const char *hybrid_str = "cycles,instructions,branches,branch-misses";

 			if (target__has_cpu(&target))
@ -1906,15 +1911,16 @@ setup_metrics:
 				return -1;
 			}

+			parse_events_error__init(&errinfo);
 			err = parse_events(evsel_list, hybrid_str, &errinfo);
 			if (err) {
 				fprintf(stderr,
 					"Cannot set up hybrid events %s: %d\n",
 					hybrid_str, err);
-				parse_events_print_error(&errinfo, hybrid_str);
-				return -1;
+				parse_events_error__print(&errinfo, hybrid_str);
 			}
-			return err;
+			parse_events_error__exit(&errinfo);
+			return err ? -1 : 0;
 		}

 		if (target__has_cpu(&target))
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@ -1271,7 +1271,7 @@ static int __cmd_top(struct perf_top *top)
 		pr_debug("Couldn't synthesize cgroup events.\n");

 	machine__synthesize_threads(&top->session->machines.host, &opts->target,
-				    top->evlist->core.threads, false,
+				    top->evlist->core.threads, true, false,
 				    top->nr_threads_synthesize);

 	if (top->nr_threads_synthesize > 1)
@ -1618,6 +1618,10 @@ int cmd_top(int argc, const char **argv)
 	if (argc)
 		usage_with_options(top_usage, options);

+	status = symbol__validate_sym_arguments();
+	if (status)
+		goto out_delete_evlist;
+
 	if (annotate_check_args(&top.annotation_opts) < 0)
 		goto out_delete_evlist;

--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@ -1628,8 +1628,8 @@ static int trace__symbols_init(struct trace *trace, struct evlist *evlist)
 		goto out;

 	err = __machine__synthesize_threads(trace->host, &trace->tool, &trace->opts.target,
-					    evlist->core.threads, trace__tool_process, false,
-					    1);
+					    evlist->core.threads, trace__tool_process,
+					    true, false, 1);
 out:
 	if (err)
 		symbol__exit();
@ -3063,15 +3063,11 @@ static bool evlist__add_vfs_getname(struct evlist *evlist)
 	struct parse_events_error err;
 	int ret;

-	bzero(&err, sizeof(err));
+	parse_events_error__init(&err);
 	ret = parse_events(evlist, "probe:vfs_getname*", &err);
-	if (ret) {
-		free(err.str);
-		free(err.help);
-		free(err.first_str);
-		free(err.first_help);
+	parse_events_error__exit(&err);
+	if (ret)
 		return false;
-	}

 	evlist__for_each_entry_safe(evlist, evsel, tmp) {
 		if (!strstarts(evsel__name(evsel), "probe:vfs_getname"))
@ -4925,12 +4921,13 @@ int cmd_trace(int argc, const char **argv)
 	if (trace.perfconfig_events != NULL) {
 		struct parse_events_error parse_err;

-		bzero(&parse_err, sizeof(parse_err));
+		parse_events_error__init(&parse_err);
 		err = parse_events(trace.evlist, trace.perfconfig_events, &parse_err);
-		if (err) {
-			parse_events_print_error(&parse_err, trace.perfconfig_events);
+		if (err)
+			parse_events_error__print(&parse_err, trace.perfconfig_events);
+		parse_events_error__exit(&parse_err);
+		if (err)
 			goto out;
-		}
 	}

 	if ((nr_cgroups || trace.cgroup) && !trace.opts.target.system_wide) {
--- a/tools/perf/check-headers.sh
+++ b/tools/perf/check-headers.sh
@ -26,6 +26,7 @@ include/vdso/bits.h
 include/linux/const.h
 include/vdso/const.h
 include/linux/hash.h
+include/linux/list-sort.h
 include/uapi/linux/hw_breakpoint.h
 arch/x86/include/asm/disabled-features.h
 arch/x86/include/asm/required-features.h
@ -150,6 +151,7 @@ check include/uapi/linux/mman.h       '-I "^#include <\(uapi/\)*asm/mman.h>"'
 check include/linux/build_bug.h       '-I "^#\(ifndef\|endif\)\( \/\/\)* static_assert$"'
 check include/linux/ctype.h	      '-I "isdigit("'
 check lib/ctype.c		      '-I "^EXPORT_SYMBOL" -I "^#include <linux/export.h>" -B'
+check lib/list_sort.c		      '-I "^#include <linux/bug.h>"'

 # diff non-symmetric files
 check_2 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
--- a/tools/perf/dlfilters/dlfilter-show-cycles.c
+++ b/tools/perf/dlfilters/dlfilter-show-cycles.c
@ -0,0 +1,144 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * dlfilter-show-cycles.c: Print the number of cycles at the start of each line
+ * Copyright (c) 2021, Intel Corporation.
+ */
+#include <perf/perf_dlfilter.h>
+#include <string.h>
+#include <stdio.h>
+
+#define MAX_CPU 4096
+
+enum {
+	INSTR_CYC,
+	BRNCH_CYC,
+	OTHER_CYC,
+	MAX_ENTRY
+};
+
+static __u64 cycles[MAX_CPU][MAX_ENTRY];
+static __u64 cycles_rpt[MAX_CPU][MAX_ENTRY];
+
+#define BITS		16
+#define TABLESZ		(1 << BITS)
+#define TABLEMAX	(TABLESZ / 2)
+#define MASK		(TABLESZ - 1)
+
+static struct entry {
+	__u32 used;
+	__s32 tid;
+	__u64 cycles[MAX_ENTRY];
+	__u64 cycles_rpt[MAX_ENTRY];
+} table[TABLESZ];
+
+static int tid_cnt;
+
+static int event_entry(const char *event)
+{
+	if (!event)
+		return OTHER_CYC;
+	if (!strncmp(event, "instructions", 12))
+		return INSTR_CYC;
+	if (!strncmp(event, "branches", 8))
+		return BRNCH_CYC;
+	return OTHER_CYC;
+}
+
+static struct entry *find_entry(__s32 tid)
+{
+	__u32 pos = tid & MASK;
+	struct entry *e;
+
+	e = &table[pos];
+	while (e->used) {
+		if (e->tid == tid)
+			return e;
+		if (++pos == TABLESZ)
+			pos = 0;
+		e = &table[pos];
+	}
+
+	if (tid_cnt >= TABLEMAX) {
+		fprintf(stderr, "Too many threads\n");
+		return NULL;
+	}
+
+	tid_cnt += 1;
+	e->used = 1;
+	e->tid = tid;
+	return e;
+}
+
+static void add_entry(__s32 tid, int pos, __u64 cnt)
+{
+	struct entry *e = find_entry(tid);
+
+	if (e)
+		e->cycles[pos] += cnt;
+}
+
+int filter_event_early(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
+{
+	__s32 cpu = sample->cpu;
+	__s32 tid = sample->tid;
+	int pos;
+
+	if (!sample->cyc_cnt)
+		return 0;
+
+	pos = event_entry(sample->event);
+
+	if (cpu >= 0 && cpu < MAX_CPU)
+		cycles[cpu][pos] += sample->cyc_cnt;
+	else if (tid != -1)
+		add_entry(tid, pos, sample->cyc_cnt);
+	return 0;
+}
+
+static void print_vals(__u64 cycles, __u64 delta)
+{
+	if (delta)
+		printf("%10llu %10llu ", cycles, delta);
+	else
+		printf("%10llu %10s ", cycles, "");
+}
+
+int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
+{
+	__s32 cpu = sample->cpu;
+	__s32 tid = sample->tid;
+	int pos;
+
+	pos = event_entry(sample->event);
+
+	if (cpu >= 0 && cpu < MAX_CPU) {
+		print_vals(cycles[cpu][pos], cycles[cpu][pos] - cycles_rpt[cpu][pos]);
+		cycles_rpt[cpu][pos] = cycles[cpu][pos];
+		return 0;
+	}
+
+	if (tid != -1) {
+		struct entry *e = find_entry(tid);
+
+		if (e) {
+			print_vals(e->cycles[pos], e->cycles[pos] - e->cycles_rpt[pos]);
+			e->cycles_rpt[pos] = e->cycles[pos];
+			return 0;
+		}
+	}
+
+	printf("%22s", "");
+	return 0;
+}
+
+const char *filter_description(const char **long_description)
+{
+	static char *long_desc = "Cycle counts are accumulated per CPU (or "
+		"per thread if CPU is not recorded) from IPC information, and "
+		"printed together with the change since the last print, at the "
+		"start of each line. Separate counts are kept for branches, "
+		"instructions or other events.";
+
+	*long_description = long_desc;
+	return "Print the number of cycles at the start of each line";
+}
--- a/tools/perf/pmu-events/arch/arm64/ampere/emag/bus.json
+++ b/tools/perf/pmu-events/arch/arm64/ampere/emag/bus.json
@ -18,6 +18,6 @@
        "ArchStdEvent": "BUS_ACCESS_PERIPH"
    },
    {
-        "ArchStdEvent": "BUS_ACCESS",
+        "ArchStdEvent": "BUS_ACCESS"
    }
 ]
--- a/tools/perf/pmu-events/arch/arm64/ampere/emag/cache.json
+++ b/tools/perf/pmu-events/arch/arm64/ampere/emag/cache.json
@ -39,31 +39,31 @@
        "ArchStdEvent": "L2D_CACHE_INVAL"
    },
    {
-        "ArchStdEvent": "L1I_CACHE_REFILL",
+        "ArchStdEvent": "L1I_CACHE_REFILL"
    },
    {
-        "ArchStdEvent": "L1I_TLB_REFILL",
+        "ArchStdEvent": "L1I_TLB_REFILL"
    },
    {
-        "ArchStdEvent": "L1D_CACHE_REFILL",
+        "ArchStdEvent": "L1D_CACHE_REFILL"
    },
    {
-        "ArchStdEvent": "L1D_CACHE",
+        "ArchStdEvent": "L1D_CACHE"
    },
    {
-        "ArchStdEvent": "L1D_TLB_REFILL",
+        "ArchStdEvent": "L1D_TLB_REFILL"
    },
    {
-        "ArchStdEvent": "L1I_CACHE",
+        "ArchStdEvent": "L1I_CACHE"
    },
    {
-        "ArchStdEvent": "L2D_CACHE",
+        "ArchStdEvent": "L2D_CACHE"
    },
    {
-        "ArchStdEvent": "L2D_CACHE_REFILL",
+        "ArchStdEvent": "L2D_CACHE_REFILL"
    },
    {
-        "ArchStdEvent": "L2D_CACHE_WB",
+        "ArchStdEvent": "L2D_CACHE_WB"
    },
    {
        "PublicDescription": "This event counts any load or store operation which accesses the data L1 TLB",
@ -72,7 +72,7 @@
    },
    {
        "PublicDescription": "This event counts any instruction fetch which accesses the instruction L1 TLB",
-        "ArchStdEvent": "L1I_TLB",
+        "ArchStdEvent": "L1I_TLB"
    },
    {
        "PublicDescription": "Level 2 access to data TLB that caused a page table walk. This event counts on any data access which causes L2D_TLB_REFILL to count",
--- a/tools/perf/pmu-events/arch/arm64/ampere/emag/clock.json
+++ b/tools/perf/pmu-events/arch/arm64/ampere/emag/clock.json
@ -1,7 +1,7 @@
 [
    {
        "PublicDescription": "The number of core clock cycles",
-        "ArchStdEvent": "CPU_CYCLES",
+        "ArchStdEvent": "CPU_CYCLES"
    },
    {
        "PublicDescription": "FSU clocking gated off cycle",
--- a/tools/perf/pmu-events/arch/arm64/ampere/emag/exception.json
+++ b/tools/perf/pmu-events/arch/arm64/ampere/emag/exception.json
@ -36,9 +36,9 @@
        "ArchStdEvent": "EXC_TRAP_FIQ"
    },
    {
-        "ArchStdEvent": "EXC_TAKEN",
+        "ArchStdEvent": "EXC_TAKEN"
    },
    {
-        "ArchStdEvent": "EXC_RETURN",
+        "ArchStdEvent": "EXC_RETURN"
    }
 ]
--- a/tools/perf/pmu-events/arch/arm64/ampere/emag/instruction.json
+++ b/tools/perf/pmu-events/arch/arm64/ampere/emag/instruction.json
@ -44,25 +44,25 @@
        "BriefDescription": "Software increment"
    },
    {
-        "ArchStdEvent": "INST_RETIRED",
+        "ArchStdEvent": "INST_RETIRED"
    },
    {
        "ArchStdEvent": "CID_WRITE_RETIRED",
        "BriefDescription": "Write to CONTEXTIDR"
    },
    {
-        "ArchStdEvent": "INST_SPEC",
+        "ArchStdEvent": "INST_SPEC"
    },
    {
-        "ArchStdEvent": "TTBR_WRITE_RETIRED",
+        "ArchStdEvent": "TTBR_WRITE_RETIRED"
    },
    {
        "PublicDescription": "This event counts all branches, taken or not. This excludes exception entries, debug entries and CCFAIL branches",
-        "ArchStdEvent": "BR_RETIRED",
+        "ArchStdEvent": "BR_RETIRED"
    },
    {
        "PublicDescription": "This event counts any branch counted by BR_RETIRED which is not correctly predicted and causes a pipeline flush",
-        "ArchStdEvent": "BR_MIS_PRED_RETIRED",
+        "ArchStdEvent": "BR_MIS_PRED_RETIRED"
    },
    {
        "PublicDescription": "Operation speculatively executed, NOP",
--- a/tools/perf/pmu-events/arch/arm64/ampere/emag/memory.json
+++ b/tools/perf/pmu-events/arch/arm64/ampere/emag/memory.json
@ -15,10 +15,10 @@
        "ArchStdEvent": "UNALIGNED_LDST_SPEC"
    },
    {
-        "ArchStdEvent": "MEM_ACCESS",
+        "ArchStdEvent": "MEM_ACCESS"
    },
    {
        "PublicDescription": "This event counts any correctable or uncorrectable memory error (ECC or parity) in the protected core RAMs",
-        "ArchStdEvent": "MEMORY_ERROR",
+        "ArchStdEvent": "MEMORY_ERROR"
    }
 ]
--- a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/branch.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/branch.json
@ -1,10 +1,10 @@
 [
    {
        "PublicDescription": "This event counts any predictable branch instruction which is mispredicted either due to dynamic misprediction or because the MMU is off and the branches are statically predicted not taken",
-        "ArchStdEvent": "BR_MIS_PRED",
+        "ArchStdEvent": "BR_MIS_PRED"
    },
    {
        "PublicDescription": "This event counts all predictable branches.",
-        "ArchStdEvent": "BR_PRED",
+        "ArchStdEvent": "BR_PRED"
    }
 ]
--- a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/bus.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/bus.json
@ -1,21 +1,21 @@
 [
    {
-        "PublicDescription": "The number of core clock cycles"
+        "PublicDescription": "The number of core clock cycles",
        "ArchStdEvent": "CPU_CYCLES",
        "BriefDescription": "The number of core clock cycles."
    },
    {
        "PublicDescription": "This event counts for every beat of data transferred over the data channels between the core and the SCU. If both read and write data beats are transferred on a given cycle, this event is counted twice on that cycle. This event counts the sum of BUS_ACCESS_RD and BUS_ACCESS_WR.",
-        "ArchStdEvent": "BUS_ACCESS",
+        "ArchStdEvent": "BUS_ACCESS"
    },
    {
-        "PublicDescription": "This event duplicates CPU_CYCLES."
-        "ArchStdEvent": "BUS_CYCLES",
+        "PublicDescription": "This event duplicates CPU_CYCLES.",
+        "ArchStdEvent": "BUS_CYCLES"
    },
    {
-        "ArchStdEvent":  "BUS_ACCESS_RD",
+        "ArchStdEvent":  "BUS_ACCESS_RD"
    },
    {
-        "ArchStdEvent":  "BUS_ACCESS_WR",
+        "ArchStdEvent":  "BUS_ACCESS_WR"
    }
 ]
--- a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/cache.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/cache.json
@ -1,47 +1,47 @@
 [
    {
        "PublicDescription": "This event counts any instruction fetch which misses in the cache.",
-        "ArchStdEvent": "L1I_CACHE_REFILL",
+        "ArchStdEvent": "L1I_CACHE_REFILL"
    },
    {
        "PublicDescription": "This event counts any refill of the instruction L1 TLB from the L2 TLB. This includes refills that result in a translation fault.",
-        "ArchStdEvent": "L1I_TLB_REFILL",
+        "ArchStdEvent": "L1I_TLB_REFILL"
    },
    {
        "PublicDescription": "This event counts any load or store operation or page table walk access which causes data to be read from outside the L1, including accesses which do not allocate into L1.",
-        "ArchStdEvent": "L1D_CACHE_REFILL",
+        "ArchStdEvent": "L1D_CACHE_REFILL"
    },
    {
        "PublicDescription": "This event counts any load or store operation or page table walk access which looks up in the L1 data cache. In particular, any access which could count the L1D_CACHE_REFILL event causes this event to count.",
-        "ArchStdEvent": "L1D_CACHE",
+        "ArchStdEvent": "L1D_CACHE"
    },
    {
        "PublicDescription": "This event counts any refill of the data L1 TLB from the L2 TLB. This includes refills that result in a translation fault.",
-        "ArchStdEvent": "L1D_TLB_REFILL",
+        "ArchStdEvent": "L1D_TLB_REFILL"
    },
-    {,
+    {
        "PublicDescription": "Level 1 instruction cache access or Level 0 Macro-op cache access. This event counts any instruction fetch which accesses the L1 instruction cache or L0 Macro-op cache.",
-        "ArchStdEvent": "L1I_CACHE",
+        "ArchStdEvent": "L1I_CACHE"
    },
    {
        "PublicDescription": "This event counts any write-back of data from the L1 data cache to L2 or L3. This counts both victim line evictions and snoops, including cache maintenance operations.",
-        "ArchStdEvent": "L1D_CACHE_WB",
+        "ArchStdEvent": "L1D_CACHE_WB"
    },
    {
        "PublicDescription": "This event counts any transaction from L1 which looks up in the L2 cache, and any write-back from the L1 to the L2. Snoops from outside the core and cache maintenance operations are not counted.",
-        "ArchStdEvent": "L2D_CACHE",
+        "ArchStdEvent": "L2D_CACHE"
    },
    {
        "PublicDescription": "L2 data cache refill. This event counts any cacheable transaction from L1 which causes data to be read from outside the core. L2 refills caused by stashes into L2 should not be counted",
-        "ArchStdEvent": "L2D_CACHE_REFILL",
+        "ArchStdEvent": "L2D_CACHE_REFILL"
    },
    {
        "PublicDescription": "This event counts any write-back of data from the L2 cache to outside the core. This includes snoops to the L2 which return data, regardless of whether they cause an invalidation. Invalidations from the L2 which do not write data outside of the core and snoops which return data from the L1 are not counted",
-        "ArchStdEvent": "L2D_CACHE_WB",
+        "ArchStdEvent": "L2D_CACHE_WB"
    },
    {
        "PublicDescription": "This event counts any full cache line write into the L2 cache which does not cause a linefill, including write-backs from L1 to L2 and full-line writes which do not allocate into L1.",
-        "ArchStdEvent": "L2D_CACHE_ALLOCATE",
+        "ArchStdEvent": "L2D_CACHE_ALLOCATE"
    },
    {
        "PublicDescription": "This event counts any load or store operation which accesses the data L1 TLB. If both a load and a store are executed on a cycle, this event counts twice. This event counts regardless of whether the MMU is enabled.",
@ -75,21 +75,21 @@
    },
    {
        "PublicDescription": "This event counts on any access to the L2 TLB (caused by a refill of any of the L1 TLBs). This event does not count if the MMU is disabled.",
-        "ArchStdEvent": "L2D_TLB",
+        "ArchStdEvent": "L2D_TLB"
    },
    {
        "PublicDescription": "This event counts on any data access which causes L2D_TLB_REFILL to count.",
-        "ArchStdEvent": "DTLB_WALK",
+        "ArchStdEvent": "DTLB_WALK"
    },
    {
        "PublicDescription": "This event counts on any instruction access which causes L2D_TLB_REFILL to count.",
-        "ArchStdEvent": "ITLB_WALK",
+        "ArchStdEvent": "ITLB_WALK"
    },
    {
-        "ArchStdEvent": "LL_CACHE_RD",
+        "ArchStdEvent": "LL_CACHE_RD"
    },
    {
-        "ArchStdEvent": "LL_CACHE_MISS_RD",
+        "ArchStdEvent": "LL_CACHE_MISS_RD"
    },
    {
        "ArchStdEvent": "L1D_CACHE_INVAL"
--- a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/exception.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/exception.json
@ -1,10 +1,10 @@
 [
    {
-        "ArchStdEvent": "EXC_TAKEN",
+        "ArchStdEvent": "EXC_TAKEN"
    },
    {
        "PublicDescription": "This event counts any correctable or uncorrectable memory error (ECC or parity) in the protected core RAMs",
-        "ArchStdEvent": "MEMORY_ERROR",
+        "ArchStdEvent": "MEMORY_ERROR"
    },
    {
        "ArchStdEvent": "EXC_DABORT"
--- a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/instruction.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/instruction.json
@ -1,32 +1,32 @@
 [
    {
-        "ArchStdEvent": "SW_INCR",
+        "ArchStdEvent": "SW_INCR"
    },
    {
        "PublicDescription": "This event counts all retired instructions, including those that fail their condition check.",
-        "ArchStdEvent": "INST_RETIRED",
+        "ArchStdEvent": "INST_RETIRED"
    },
    {
-        "ArchStdEvent": "EXC_RETURN",
+        "ArchStdEvent": "EXC_RETURN"
    },
    {
        "PublicDescription": "This event only counts writes to CONTEXTIDR in AArch32 state, and via the CONTEXTIDR_EL1 mnemonic in AArch64 state.",
-        "ArchStdEvent": "CID_WRITE_RETIRED",
+        "ArchStdEvent": "CID_WRITE_RETIRED"
    },
    {
-        "ArchStdEvent": "INST_SPEC",
+        "ArchStdEvent": "INST_SPEC"
    },
    {
        "PublicDescription": "This event only counts writes to TTBR0/TTBR1 in AArch32 state and TTBR0_EL1/TTBR1_EL1 in AArch64 state.",
-        "ArchStdEvent": "TTBR_WRITE_RETIRED",
+        "ArchStdEvent": "TTBR_WRITE_RETIRED"
    },
-    {,
+    {
        "PublicDescription": "This event counts all branches, taken or not. This excludes exception entries, debug entries and CCFAIL branches.",
-        "ArchStdEvent": "BR_RETIRED",
+        "ArchStdEvent": "BR_RETIRED"
    },
    {
        "PublicDescription": "This event counts any branch counted by BR_RETIRED which is not correctly predicted and causes a pipeline flush.",
-        "ArchStdEvent": "BR_MIS_PRED_RETIRED",
+        "ArchStdEvent": "BR_MIS_PRED_RETIRED"
    },
    {
        "ArchStdEvent": "ASE_SPEC"
--- a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/memory.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/memory.json
@ -1,7 +1,7 @@
 [
    {
        "PublicDescription": "This event counts memory accesses due to load or store instructions. This event counts the sum of MEM_ACCESS_RD and MEM_ACCESS_WR.",
-        "ArchStdEvent": "MEM_ACCESS",
+        "ArchStdEvent": "MEM_ACCESS"
    },
    {
         "ArchStdEvent": "MEM_ACCESS_RD"
--- a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/other.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/other.json
@ -1,5 +1,5 @@
 [
    {
-        "ArchStdEvent": "REMOTE_ACCESS",
+        "ArchStdEvent": "REMOTE_ACCESS"
    }
 ]
--- a/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/pipeline.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a76-n1/pipeline.json
@ -1,10 +1,10 @@
 [
    {
        "PublicDescription": "The counter counts on any cycle when there are no fetched instructions available to dispatch.",
-        "ArchStdEvent": "STALL_FRONTEND",
+        "ArchStdEvent": "STALL_FRONTEND"
    },
    {
        "PublicDescription": "The counter counts on any cycle fetched instructions are not dispatched due to resource constraints.",
-        "ArchStdEvent": "STALL_BACKEND",
+        "ArchStdEvent": "STALL_BACKEND"
    }
 ]
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-v1/branch.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-v1/branch.json
@ -0,0 +1,8 @@
+[
+    {
+        "ArchStdEvent": "BR_MIS_PRED"
+    },
+    {
+        "ArchStdEvent": "BR_PRED"
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-v1/bus.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-v1/bus.json
@ -0,0 +1,20 @@
+[
+    {
+        "ArchStdEvent": "CPU_CYCLES"
+    },
+    {
+        "ArchStdEvent": "BUS_ACCESS"
+    },
+    {
+        "ArchStdEvent": "BUS_CYCLES"
+    },
+    {
+        "ArchStdEvent": "BUS_ACCESS_RD"
+    },
+    {
+        "ArchStdEvent": "BUS_ACCESS_WR"
+    },
+    {
+        "ArchStdEvent": "CNT_CYCLES"
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-v1/cache.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-v1/cache.json
@ -0,0 +1,155 @@
+[
+    {
+        "ArchStdEvent": "L1I_CACHE_REFILL"
+    },
+    {
+        "ArchStdEvent": "L1I_TLB_REFILL"
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL"
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE"
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_REFILL"
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE"
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WB"
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE"
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL"
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WB"
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_ALLOCATE"
+    },
+    {
+        "ArchStdEvent": "L1D_TLB"
+    },
+    {
+        "ArchStdEvent": "L1I_TLB"
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_ALLOCATE"
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_REFILL"
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE"
+    },
+    {
+        "ArchStdEvent": "L2D_TLB_REFILL"
+    },
+    {
+        "ArchStdEvent": "L2D_TLB"
+    },
+    {
+        "ArchStdEvent": "DTLB_WALK"
+    },
+    {
+        "ArchStdEvent": "ITLB_WALK"
+    },
+    {
+        "ArchStdEvent": "LL_CACHE_RD"
+    },
+    {
+        "ArchStdEvent": "LL_CACHE_MISS_RD"
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_LMISS_RD"
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_RD"
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WR"
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_RD"
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_WR"
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_INNER"
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_REFILL_OUTER"
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WB_VICTIM"
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_WB_CLEAN"
+    },
+    {
+        "ArchStdEvent": "L1D_CACHE_INVAL"
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_REFILL_RD"
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_REFILL_WR"
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_RD"
+    },
+    {
+        "ArchStdEvent": "L1D_TLB_WR"
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_RD"
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WR"
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL_RD"
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_REFILL_WR"
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WB_VICTIM"
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_WB_CLEAN"
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_INVAL"
+    },
+    {
+        "ArchStdEvent": "L2D_TLB_REFILL_RD"
+    },
+    {
+        "ArchStdEvent": "L2D_TLB_REFILL_WR"
+    },
+    {
+        "ArchStdEvent": "L2D_TLB_RD"
+    },
+    {
+        "ArchStdEvent": "L2D_TLB_WR"
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_RD"
+    },
+    {
+        "ArchStdEvent": "L1I_CACHE_LMISS"
+    },
+    {
+        "ArchStdEvent": "L2D_CACHE_LMISS_RD"
+    },
+    {
+        "ArchStdEvent": "L3D_CACHE_LMISS_RD"
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-v1/exception.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-v1/exception.json
@ -0,0 +1,47 @@
+[
+    {
+        "ArchStdEvent": "EXC_TAKEN"
+    },
+    {
+        "ArchStdEvent": "MEMORY_ERROR"
+    },
+    {
+        "ArchStdEvent": "EXC_UNDEF"
+    },
+    {
+        "ArchStdEvent": "EXC_SVC"
+    },
+    {
+        "ArchStdEvent": "EXC_PABORT"
+    },
+    {
+        "ArchStdEvent": "EXC_DABORT"
+    },
+    {
+        "ArchStdEvent": "EXC_IRQ"
+    },
+    {
+        "ArchStdEvent": "EXC_FIQ"
+    },
+    {
+        "ArchStdEvent": "EXC_SMC"
+    },
+    {
+        "ArchStdEvent": "EXC_HVC"
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_PABORT"
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_DABORT"
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_OTHER"
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_IRQ"
+    },
+    {
+        "ArchStdEvent": "EXC_TRAP_FIQ"
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-v1/instruction.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-v1/instruction.json
@ -0,0 +1,89 @@
+[
+    {
+        "ArchStdEvent": "SW_INCR"
+    },
+    {
+        "ArchStdEvent": "INST_RETIRED"
+    },
+    {
+        "ArchStdEvent": "EXC_RETURN"
+    },
+    {
+        "ArchStdEvent": "CID_WRITE_RETIRED"
+    },
+    {
+        "ArchStdEvent": "INST_SPEC"
+    },
+    {
+        "ArchStdEvent": "TTBR_WRITE_RETIRED"
+    },
+    {
+        "ArchStdEvent": "BR_RETIRED"
+    },
+    {
+        "ArchStdEvent": "BR_MIS_PRED_RETIRED"
+    },
+    {
+        "ArchStdEvent": "OP_RETIRED"
+    },
+    {
+        "ArchStdEvent": "OP_SPEC"
+    },
+    {
+        "ArchStdEvent": "LDREX_SPEC"
+    },
+    {
+        "ArchStdEvent": "STREX_PASS_SPEC"
+    },
+    {
+        "ArchStdEvent": "STREX_FAIL_SPEC"
+    },
+    {
+        "ArchStdEvent": "STREX_SPEC"
+    },
+    {
+        "ArchStdEvent": "LD_SPEC"
+    },
+    {
+        "ArchStdEvent": "ST_SPEC"
+    },
+    {
+        "ArchStdEvent": "DP_SPEC"
+    },
+    {
+        "ArchStdEvent": "ASE_SPEC"
+    },
+    {
+        "ArchStdEvent": "VFP_SPEC"
+    },
+    {
+        "ArchStdEvent": "PC_WRITE_SPEC"
+    },
+    {
+        "ArchStdEvent": "CRYPTO_SPEC"
+    },
+    {
+        "ArchStdEvent": "BR_IMMED_SPEC"
+    },
+    {
+        "ArchStdEvent": "BR_RETURN_SPEC"
+    },
+    {
+        "ArchStdEvent": "BR_INDIRECT_SPEC"
+    },
+    {
+        "ArchStdEvent": "ISB_SPEC"
+    },
+    {
+        "ArchStdEvent": "DSB_SPEC"
+    },
+    {
+        "ArchStdEvent": "DMB_SPEC"
+    },
+    {
+        "ArchStdEvent": "RC_LD_SPEC"
+    },
+    {
+        "ArchStdEvent": "RC_ST_SPEC"
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-v1/memory.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-v1/memory.json
@ -0,0 +1,20 @@
+[
+    {
+        "ArchStdEvent": "MEM_ACCESS"
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_RD"
+    },
+    {
+        "ArchStdEvent": "MEM_ACCESS_WR"
+    },
+    {
+        "ArchStdEvent": "UNALIGNED_LD_SPEC"
+    },
+    {
+        "ArchStdEvent": "UNALIGNED_ST_SPEC"
+    },
+    {
+        "ArchStdEvent": "UNALIGNED_LDST_SPEC"
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-v1/other.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-v1/other.json
@ -0,0 +1,5 @@
+[
+    {
+        "ArchStdEvent": "REMOTE_ACCESS"
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/arm/neoverse-v1/pipeline.json
+++ b/tools/perf/pmu-events/arch/arm64/arm/neoverse-v1/pipeline.json
@ -0,0 +1,23 @@
+[
+    {
+        "ArchStdEvent": "STALL_FRONTEND"
+    },
+    {
+        "ArchStdEvent": "STALL_BACKEND"
+    },
+    {
+        "ArchStdEvent": "STALL"
+    },
+    {
+        "ArchStdEvent": "STALL_SLOT_BACKEND"
+    },
+    {
+        "ArchStdEvent": "STALL_SLOT_FRONTEND"
+    },
+    {
+        "ArchStdEvent": "STALL_SLOT"
+    },
+    {
+        "ArchStdEvent": "STALL_BACKEND_MEM"
+    }
+]
--- a/tools/perf/pmu-events/arch/arm64/armv8-common-and-microarch.json
+++ b/tools/perf/pmu-events/arch/arm64/armv8-common-and-microarch.json
@ -257,6 +257,78 @@
        "EventName": "LL_CACHE_MISS_RD",
        "BriefDescription": "Last level cache miss, read"
    },
+    {
+        "PublicDescription": "Level 1 data cache long-latency read miss.  The counter counts each memory read access counted by L1D_CACHE that incurs additional latency because it returns data from outside the Level 1 data or unified cache of this processing element.",
+        "EventCode": "0x39",
+        "EventName": "L1D_CACHE_LMISS_RD",
+        "BriefDescription": "Level 1 data cache long-latency read miss"
+    },
+    {
+        "PublicDescription": "Micro-operation architecturally executed.  The counter counts each operation counted by OP_SPEC that would be executed in a simple sequential execution of the program.",
+        "EventCode": "0x3A",
+        "EventName": "OP_RETIRED",
+        "BriefDescription": "Micro-operation architecturally executed"
+    },
+    {
+        "PublicDescription": "Micro-operation speculatively executed.  The counter counts the number of operations executed by the processing element, including those that are executed speculatively and would not be executed in a simple sequential execution of the program.",
+        "EventCode": "0x3B",
+        "EventName": "OP_SPEC",
+        "BriefDescription": "Micro-operation speculatively executed"
+    },
+    {
+        "PublicDescription": "No operation sent for execution.  The counter counts every attributable cycle on which no attributable instruction or operation was sent for execution on this processing element.",
+        "EventCode": "0x3C",
+        "EventName": "STALL",
+        "BriefDescription": "No operation sent for execution"
+    },
+    {
+        "PublicDescription": "No operation sent for execution on a slot due to the backend.  Counts each slot counted by STALL_SLOT where no attributable instruction or operation was sent for execution because the backend is unable to accept it.",
+        "EventCode": "0x3D",
+        "EventName": "STALL_SLOT_BACKEND",
+        "BriefDescription": "No operation sent for execution on a slot due to the backend"
+    },
+    {
+        "PublicDescription": "No operation sent for execution on a slot due to the frontend.  Counts each slot counted by STALL_SLOT where no attributable instruction or operation was sent for execution because there was no attributable instruction or operation available to issue from the processing element from the frontend for the slot.",
+        "EventCode": "0x3E",
+        "EventName": "STALL_SLOT_FRONTEND",
+        "BriefDescription": "No operation sent for execution on a slot due to the frontend"
+    },
+    {
+        "PublicDescription": "No operation sent for execution on a slot.  The counter counts on each attributable cycle the number of instruction or operation slots that were not occupied by an instruction or operation attributable to the processing element.",
+        "EventCode": "0x3F",
+        "EventName": "STALL_SLOT",
+        "BriefDescription": "No operation sent for execution on a slot"
+    },
+    {
+        "PublicDescription": "Constant frequency cycles.  The counter increments at a constant frequency equal to the rate of increment of the system counter, CNTPCT_EL0.",
+        "EventCode": "0x4004",
+        "EventName": "CNT_CYCLES",
+        "BriefDescription": "Constant frequency cycles"
+    },
+    {
+        "PublicDescription": "Memory stall cycles.  The counter counts each cycle counted by STALL_BACKEND where there is a cache miss in the last level of cache within the processing element clock domain",
+        "EventCode": "0x4005",
+        "EventName": "STALL_BACKEND_MEM",
+        "BriefDescription": "Memory stall cycles"
+    },
+    {
+        "PublicDescription": "Level 1 instruction cache long-latency read miss.  If the L1I_CACHE_RD event is implemented, the counter counts each access counted by L1I_CACHE_RD that incurs additional latency because it returns instructions from outside of the Level 1 instruction cache of this PE.  If the L1I_CACHE_RD event is not implemented, the counter counts each access counted by L1I_CACHE that incurs additional latency because it returns instructions from outside the Level 1 instruction cache of this PE.  The event indicates to software that the access missed in the Level 1 instruction cache and might have a significant performance impact due to the additional latency, compared to the latency of an access that hits in the Level 1 instruction cache.",
+        "EventCode": "0x4006",
+        "EventName": "L1I_CACHE_LMISS",
+        "BriefDescription": "Level 1 instruction cache long-latency read miss"
+    },
+    {
+        "PublicDescription": "Level 2 data cache long-latency read miss.  The counter counts each memory read access counted by L2D_CACHE that incurs additional latency because it returns data from outside the Level 2 data or unified cache of this processing element.  The event indicates to software that the access missed in the Level 2 data or unified cache and might have a significant performance impact compared to the latency of an access that hits in the Level 2 data or unified cache.",
+        "EventCode": "0x4009",
+        "EventName": "L2D_CACHE_LMISS_RD",
+        "BriefDescription": "Level 2 data cache long-latency read miss"
+    },
+    {
+        "PublicDescription": "Level 3 data cache long-latency read miss.  The counter counts each memory read access counted by L3D_CACHE that incurs additional latency because it returns data from outside the Level 3 data or unified cache of this processing element.  The event indicates to software that the access missed in the Level 3 data or unified cache and might have a significant performance impact compared to the latency of an access that hits in the Level 3 data or unified cache.",
+        "EventCode": "0x400B",
+        "EventName": "L3D_CACHE_LMISS_RD",
+        "BriefDescription": "Level 3 data cache long-latency read miss"
+    },
    {
        "PublicDescription": "SIMD Instruction architecturally executed.",
        "EventCode": "0x8000",
--- a/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
+++ b/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
@ -229,5 +229,5 @@
        "BriefDescription": "Store bound L3 topdown metric",
        "MetricGroup": "TopDownL3",
        "MetricName": "store_bound"
-    },
+    }
 ]
--- a/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/uncore-ddrc.json
+++ b/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/uncore-ddrc.json
@ -1,56 +1,56 @@
 [
   {
-	    "EventCode": "0x00",
-	    "EventName": "uncore_hisi_ddrc.flux_wr",
+	    "ConfigCode": "0x00",
+	    "EventName": "flux_wr",
 	    "BriefDescription": "DDRC total write operations",
 	    "PublicDescription": "DDRC total write operations",
 	    "Unit": "hisi_sccl,ddrc"
   },
   {
-	    "EventCode": "0x01",
-	    "EventName": "uncore_hisi_ddrc.flux_rd",
+	    "ConfigCode": "0x01",
+	    "EventName": "flux_rd",
 	    "BriefDescription": "DDRC total read operations",
 	    "PublicDescription": "DDRC total read operations",
 	    "Unit": "hisi_sccl,ddrc"
   },
   {
-	    "EventCode": "0x02",
-	    "EventName": "uncore_hisi_ddrc.flux_wcmd",
+	    "ConfigCode": "0x02",
+	    "EventName": "flux_wcmd",
 	    "BriefDescription": "DDRC write commands",
 	    "PublicDescription": "DDRC write commands",
 	    "Unit": "hisi_sccl,ddrc"
   },
   {
-	    "EventCode": "0x03",
-	    "EventName": "uncore_hisi_ddrc.flux_rcmd",
+	    "ConfigCode": "0x03",
+	    "EventName": "flux_rcmd",
 	    "BriefDescription": "DDRC read commands",
 	    "PublicDescription": "DDRC read commands",
 	    "Unit": "hisi_sccl,ddrc"
   },
   {
-	    "EventCode": "0x04",
-	    "EventName": "uncore_hisi_ddrc.pre_cmd",
+	    "ConfigCode": "0x04",
+	    "EventName": "pre_cmd",
 	    "BriefDescription": "DDRC precharge commands",
 	    "PublicDescription": "DDRC precharge commands",
 	    "Unit": "hisi_sccl,ddrc"
   },
   {
-	    "EventCode": "0x05",
-	    "EventName": "uncore_hisi_ddrc.act_cmd",
+	    "ConfigCode": "0x05",
+	    "EventName": "act_cmd",
 	    "BriefDescription": "DDRC active commands",
 	    "PublicDescription": "DDRC active commands",
 	    "Unit": "hisi_sccl,ddrc"
   },
   {
-	    "EventCode": "0x06",
-	    "EventName": "uncore_hisi_ddrc.rnk_chg",
+	    "ConfigCode": "0x06",
+	    "EventName": "rnk_chg",
 	    "BriefDescription": "DDRC rank commands",
 	    "PublicDescription": "DDRC rank commands",
 	    "Unit": "hisi_sccl,ddrc"
   },
   {
-	    "EventCode": "0x07",
-	    "EventName": "uncore_hisi_ddrc.rw_chg",
+	    "ConfigCode": "0x07",
+	    "EventName": "rw_chg",
 	    "BriefDescription": "DDRC read and write changes",
 	    "PublicDescription": "DDRC read and write changes",
 	    "Unit": "hisi_sccl,ddrc"
--- a/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/uncore-hha.json
+++ b/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/uncore-hha.json
@ -1,72 +1,152 @@
 [
   {
-	    "EventCode": "0x00",
-	    "EventName": "uncore_hisi_hha.rx_ops_num",
+	    "ConfigCode": "0x00",
+	    "EventName": "rx_ops_num",
 	    "BriefDescription": "The number of all operations received by the HHA",
 	    "PublicDescription": "The number of all operations received by the HHA",
 	    "Unit": "hisi_sccl,hha"
   },
   {
-	    "EventCode": "0x01",
-	    "EventName": "uncore_hisi_hha.rx_outer",
+	    "ConfigCode": "0x01",
+	    "EventName": "rx_outer",
 	    "BriefDescription": "The number of all operations received by the HHA from another socket",
 	    "PublicDescription": "The number of all operations received by the HHA from another socket",
 	    "Unit": "hisi_sccl,hha"
   },
   {
-	    "EventCode": "0x02",
-	    "EventName": "uncore_hisi_hha.rx_sccl",
+	    "ConfigCode": "0x02",
+	    "EventName": "rx_sccl",
 	    "BriefDescription": "The number of all operations received by the HHA from another SCCL in this socket",
 	    "PublicDescription": "The number of all operations received by the HHA from another SCCL in this socket",
 	    "Unit": "hisi_sccl,hha"
   },
   {
-	    "EventCode": "0x03",
-	    "EventName": "uncore_hisi_hha.rx_ccix",
+	    "ConfigCode": "0x03",
+	    "EventName": "rx_ccix",
 	    "BriefDescription": "Count of the number of operations that HHA has received from CCIX",
 	    "PublicDescription": "Count of the number of operations that HHA has received from CCIX",
 	    "Unit": "hisi_sccl,hha"
   },
   {
-	    "EventCode": "0x1c",
-	    "EventName": "uncore_hisi_hha.rd_ddr_64b",
+	    "ConfigCode": "0x4",
+	    "EventName": "rx_wbi",
+	    "Unit": "hisi_sccl,hha"
+   },
+   {
+	    "ConfigCode": "0x5",
+	    "EventName": "rx_wbip",
+	    "Unit": "hisi_sccl,hha"
+   },
+   {
+	    "ConfigCode": "0x11",
+	    "EventName": "rx_wtistash",
+	    "Unit": "hisi_sccl,hha"
+   },
+   {
+	    "ConfigCode": "0x1c",
+	    "EventName": "rd_ddr_64b",
 	    "BriefDescription": "The number of read operations sent by HHA to DDRC which size is 64 bytes",
 	    "PublicDescription": "The number of read operations sent by HHA to DDRC which size is 64bytes",
 	    "Unit": "hisi_sccl,hha"
   },
   {
-	    "EventCode": "0x1d",
-	    "EventName": "uncore_hisi_hha.wr_ddr_64b",
+	    "ConfigCode": "0x1d",
+	    "EventName": "wr_ddr_64b",
 	    "BriefDescription": "The number of write operations sent by HHA to DDRC which size is 64 bytes",
 	    "PublicDescription": "The number of write operations sent by HHA to DDRC which size is 64 bytes",
 	    "Unit": "hisi_sccl,hha"
   },
   {
-	    "EventCode": "0x1e",
-	    "EventName": "uncore_hisi_hha.rd_ddr_128b",
+	    "ConfigCode": "0x1e",
+	    "EventName": "rd_ddr_128b",
 	    "BriefDescription": "The number of read operations sent by HHA to DDRC which size is 128 bytes",
 	    "PublicDescription": "The number of read operations sent by HHA to DDRC which size is 128 bytes",
 	    "Unit": "hisi_sccl,hha"
   },
   {
-	    "EventCode": "0x1f",
-	    "EventName": "uncore_hisi_hha.wr_ddr_128b",
+	    "ConfigCode": "0x1f",
+	    "EventName": "wr_ddr_128b",
 	    "BriefDescription": "The number of write operations sent by HHA to DDRC which size is 128 bytes",
 	    "PublicDescription": "The number of write operations sent by HHA to DDRC which size is 128 bytes",
 	    "Unit": "hisi_sccl,hha"
   },
   {
-	    "EventCode": "0x20",
-	    "EventName": "uncore_hisi_hha.spill_num",
+	    "ConfigCode": "0x20",
+	    "EventName": "spill_num",
 	    "BriefDescription": "Count of the number of spill operations that the HHA has sent",
 	    "PublicDescription": "Count of the number of spill operations that the HHA has sent",
 	    "Unit": "hisi_sccl,hha"
   },
   {
-	    "EventCode": "0x21",
-	    "EventName": "uncore_hisi_hha.spill_success",
+	    "ConfigCode": "0x21",
+	    "EventName": "spill_success",
 	    "BriefDescription": "Count of the number of successful spill operations that the HHA has sent",
 	    "PublicDescription": "Count of the number of successful spill operations that the HHA has sent",
 	    "Unit": "hisi_sccl,hha"
+   },
+   {
+	    "ConfigCode": "0x23",
+	    "EventName": "bi_num",
+	    "Unit": "hisi_sccl,hha"
+   },
+   {
+	    "ConfigCode": "0x32",
+	    "EventName": "mediated_num",
+	    "Unit": "hisi_sccl,hha"
+   },
+   {
+	    "ConfigCode": "0x33",
+	    "EventName": "tx_snp_num",
+	    "Unit": "hisi_sccl,hha"
+   },
+   {
+	    "ConfigCode": "0x34",
+	    "EventName": "tx_snp_outer",
+	    "Unit": "hisi_sccl,hha"
+   },
+   {
+	    "ConfigCode": "0x35",
+	    "EventName": "tx_snp_ccix",
+	    "Unit": "hisi_sccl,hha"
+   },
+   {
+	    "ConfigCode": "0x38",
+	    "EventName": "rx_snprspdata",
+	    "Unit": "hisi_sccl,hha"
+   },
+   {
+	    "ConfigCode": "0x3c",
+	    "EventName": "rx_snprsp_outer",
+	    "Unit": "hisi_sccl,hha"
+   },
+   {
+	    "ConfigCode": "0x40",
+	    "EventName": "sdir-lookup",
+	    "Unit": "hisi_sccl,hha"
+   },
+   {
+	    "ConfigCode": "0x41",
+	    "EventName": "edir-lookup",
+	    "Unit": "hisi_sccl,hha"
+   },
+   {
+	    "ConfigCode": "0x42",
+	    "EventName": "sdir-hit",
+	    "Unit": "hisi_sccl,hha"
+   },
+   {
+	    "ConfigCode": "0x43",
+	    "EventName": "edir-hit",
+	    "Unit": "hisi_sccl,hha"
+   },
+   {
+	    "ConfigCode": "0x4c",
+	    "EventName": "sdir-home-migrate",
+	    "Unit": "hisi_sccl,hha"
+   },
+   {
+	    "ConfigCode": "0x4d",
+	    "EventName": "edir-home-migrate",
+	    "Unit": "hisi_sccl,hha"
   }
 ]
--- a/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/uncore-l3c.json
+++ b/tools/perf/pmu-events/arch/arm64/hisilicon/hip08/uncore-l3c.json
@ -1,91 +1,91 @@
 [
   {
-	    "EventCode": "0x00",
-	    "EventName": "uncore_hisi_l3c.rd_cpipe",
+	    "ConfigCode": "0x00",
+	    "EventName": "rd_cpipe",
 	    "BriefDescription": "Total read accesses",
 	    "PublicDescription": "Total read accesses",
 	    "Unit": "hisi_sccl,l3c"
   },
   {
-	    "EventCode": "0x01",
-	    "EventName": "uncore_hisi_l3c.wr_cpipe",
+	    "ConfigCode": "0x01",
+	    "EventName": "wr_cpipe",
 	    "BriefDescription": "Total write accesses",
 	    "PublicDescription": "Total write accesses",
 	    "Unit": "hisi_sccl,l3c"
   },
   {
-	    "EventCode": "0x02",
-	    "EventName": "uncore_hisi_l3c.rd_hit_cpipe",
+	    "ConfigCode": "0x02",
+	    "EventName": "rd_hit_cpipe",
 	    "BriefDescription": "Total read hits",
 	    "PublicDescription": "Total read hits",
 	    "Unit": "hisi_sccl,l3c"
   },
   {
-	    "EventCode": "0x03",
-	    "EventName": "uncore_hisi_l3c.wr_hit_cpipe",
+	    "ConfigCode": "0x03",
+	    "EventName": "wr_hit_cpipe",
 	    "BriefDescription": "Total write hits",
 	    "PublicDescription": "Total write hits",
 	    "Unit": "hisi_sccl,l3c"
   },
   {
-	    "EventCode": "0x04",
-	    "EventName": "uncore_hisi_l3c.victim_num",
+	    "ConfigCode": "0x04",
+	    "EventName": "victim_num",
 	    "BriefDescription": "l3c precharge commands",
 	    "PublicDescription": "l3c precharge commands",
 	    "Unit": "hisi_sccl,l3c"
   },
   {
-	    "EventCode": "0x20",
-	    "EventName": "uncore_hisi_l3c.rd_spipe",
+	    "ConfigCode": "0x20",
+	    "EventName": "rd_spipe",
 	    "BriefDescription": "Count of the number of read lines that come from this cluster of CPU core in spipe",
 	    "PublicDescription": "Count of the number of read lines that come from this cluster of CPU core in spipe",
 	    "Unit": "hisi_sccl,l3c"
   },
   {
-	    "EventCode": "0x21",
-	    "EventName": "uncore_hisi_l3c.wr_spipe",
+	    "ConfigCode": "0x21",
+	    "EventName": "wr_spipe",
 	    "BriefDescription": "Count of the number of write lines that come from this cluster of CPU core in spipe",
 	    "PublicDescription": "Count of the number of write lines that come from this cluster of CPU core in spipe",
 	    "Unit": "hisi_sccl,l3c"
   },
   {
-	    "EventCode": "0x22",
-	    "EventName": "uncore_hisi_l3c.rd_hit_spipe",
+	    "ConfigCode": "0x22",
+	    "EventName": "rd_hit_spipe",
 	    "BriefDescription": "Count of the number of read lines that hits in spipe of this L3C",
 	    "PublicDescription": "Count of the number of read lines that hits in spipe of this L3C",
 	    "Unit": "hisi_sccl,l3c"
   },
   {
-	    "EventCode": "0x23",
-	    "EventName": "uncore_hisi_l3c.wr_hit_spipe",
+	    "ConfigCode": "0x23",
+	    "EventName": "wr_hit_spipe",
 	    "BriefDescription": "Count of the number of write lines that hits in spipe of this L3C",
 	    "PublicDescription": "Count of the number of write lines that hits in spipe of this L3C",
 	    "Unit": "hisi_sccl,l3c"
   },
   {
-	    "EventCode": "0x29",
-	    "EventName": "uncore_hisi_l3c.back_invalid",
+	    "ConfigCode": "0x29",
+	    "EventName": "back_invalid",
 	    "BriefDescription": "Count of the number of L3C back invalid operations",
 	    "PublicDescription": "Count of the number of L3C back invalid operations",
 	    "Unit": "hisi_sccl,l3c"
   },
   {
-	    "EventCode": "0x40",
-	    "EventName": "uncore_hisi_l3c.retry_cpu",
+	    "ConfigCode": "0x40",
+	    "EventName": "retry_cpu",
 	    "BriefDescription": "Count of the number of retry that L3C suppresses the CPU operations",
 	    "PublicDescription": "Count of the number of retry that L3C suppresses the CPU operations",
 	    "Unit": "hisi_sccl,l3c"
   },
   {
-	    "EventCode": "0x41",
-	    "EventName": "uncore_hisi_l3c.retry_ring",
+	    "ConfigCode": "0x41",
+	    "EventName": "retry_ring",
 	    "BriefDescription": "Count of the number of retry that L3C suppresses the ring operations",
 	    "PublicDescription": "Count of the number of retry that L3C suppresses the ring operations",
 	    "Unit": "hisi_sccl,l3c"
   },
   {
-	    "EventCode": "0x42",
-	    "EventName": "uncore_hisi_l3c.prefetch_drop",
+	    "ConfigCode": "0x42",
+	    "EventName": "prefetch_drop",
 	    "BriefDescription": "Count of the number of prefetch drops from this L3C",
 	    "PublicDescription": "Count of the number of prefetch drops from this L3C",
 	    "Unit": "hisi_sccl,l3c"
--- a/tools/perf/pmu-events/arch/arm64/mapfile.csv
+++ b/tools/perf/pmu-events/arch/arm64/mapfile.csv
@ -18,6 +18,7 @@
 0x00000000410fd080,v1,arm/cortex-a57-a72,core
 0x00000000410fd0b0,v1,arm/cortex-a76-n1,core
 0x00000000410fd0c0,v1,arm/cortex-a76-n1,core
+0x00000000410fd400,v1,arm/neoverse-v1,core
 0x00000000420f5160,v1,cavium/thunderx2,core
 0x00000000430f0af0,v1,cavium/thunderx2,core
 0x00000000460f0010,v1,fujitsu/a64fx,core
--- a/tools/perf/pmu-events/arch/nds32/n13/atcpmu.json
+++ b/tools/perf/pmu-events/arch/nds32/n13/atcpmu.json
@ -286,5 +286,5 @@
    "EventCode": "0x21e",
    "EventName": "pop25_inst",
    "BriefDescription": "V3 POP25 instructions"
-  },
+  }
 ]
--- a/tools/perf/pmu-events/arch/s390/cf_z10/basic.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z10/basic.json
@ -82,5 +82,5 @@
 		"EventName": "PROBLEM_STATE_L1D_PENALTY_CYCLES",
 		"BriefDescription": "Problem-State L1D Penalty Cycles",
 		"PublicDescription": "Problem-State Level-1 D-Cache Penalty Cycle Count"
-	},
+	}
 ]
--- a/tools/perf/pmu-events/arch/s390/cf_z10/crypto.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z10/crypto.json
@ -110,5 +110,5 @@
 		"EventName": "AES_BLOCKED_CYCLES",
 		"BriefDescription": "AES Blocked Cycles",
 		"PublicDescription": "Total number of CPU cycles blocked for the AES functions issued by the CPU because the DEA/AES coprocessor is busy performing a function issued by another CPU"
-	},
+	}
 ]
--- a/tools/perf/pmu-events/arch/s390/cf_z10/extended.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z10/extended.json
@ -124,5 +124,5 @@
 		"EventName": "L2C_STORES_SENT",
 		"BriefDescription": "L2C Stores Sent",
 		"PublicDescription": "Incremented by one for every store sent to Level-2 (L1.5) cache"
-	},
+	}
 ]
--- a/tools/perf/pmu-events/arch/s390/cf_z13/basic.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z13/basic.json
@ -82,5 +82,5 @@
 		"EventName": "PROBLEM_STATE_L1D_PENALTY_CYCLES",
 		"BriefDescription": "Problem-State L1D Penalty Cycles",
 		"PublicDescription": "Problem-State Level-1 D-Cache Penalty Cycle Count"
-	},
+	}
 ]
--- a/tools/perf/pmu-events/arch/s390/cf_z13/crypto.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z13/crypto.json
@ -110,5 +110,5 @@
 		"EventName": "AES_BLOCKED_CYCLES",
 		"BriefDescription": "AES Blocked Cycles",
 		"PublicDescription": "Total number of CPU cycles blocked for the AES functions issued by the CPU because the DEA/AES coprocessor is busy performing a function issued by another CPU"
-	},
+	}
 ]
--- a/tools/perf/pmu-events/arch/s390/cf_z13/extended.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z13/extended.json
@ -390,5 +390,5 @@
 		"EventName": "MT_DIAG_CYCLES_TWO_THR_ACTIVE",
 		"BriefDescription": "Cycle count with two threads active",
 		"PublicDescription": "Cycle count with two threads active"
-	},
+	}
 ]
--- a/tools/perf/pmu-events/arch/s390/cf_z14/basic.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z14/basic.json
@ -54,5 +54,5 @@
 		"EventName": "PROBLEM_STATE_INSTRUCTIONS",
 		"BriefDescription": "Problem-State Instructions",
 		"PublicDescription": "Problem-State Instruction Count"
-	},
+	}
 ]
--- a/tools/perf/pmu-events/arch/s390/cf_z14/crypto.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z14/crypto.json
@ -110,5 +110,5 @@
 		"EventName": "AES_BLOCKED_CYCLES",
 		"BriefDescription": "AES Blocked Cycles",
 		"PublicDescription": "Total number of CPU cycles blocked for the AES functions issued by the CPU because the DEA/AES coprocessor is busy performing a function issued by another CPU"
-	},
+	}
 ]
--- a/tools/perf/pmu-events/arch/s390/cf_z14/extended.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z14/extended.json
@ -369,5 +369,5 @@
 		"EventName": "MT_DIAG_CYCLES_TWO_THR_ACTIVE",
 		"BriefDescription": "Cycle count with two threads active",
 		"PublicDescription": "Cycle count with two threads active"
-	},
+	}
 ]
--- a/tools/perf/pmu-events/arch/s390/cf_z15/basic.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z15/basic.json
@ -54,5 +54,5 @@
 		"EventName": "PROBLEM_STATE_INSTRUCTIONS",
 		"BriefDescription": "Problem-State Instructions",
 		"PublicDescription": "Problem-State Instruction Count"
-	},
+	}
 ]
--- a/tools/perf/pmu-events/arch/s390/cf_z15/crypto.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z15/crypto.json
@ -110,5 +110,5 @@
 		"EventName": "AES_BLOCKED_CYCLES",
 		"BriefDescription": "AES Blocked Cycles",
 		"PublicDescription": "Total number of CPU cycles blocked for the AES functions issued by the CPU because the DEA/AES coprocessor is busy performing a function issued by another CPU"
-	},
+	}
 ]
--- a/tools/perf/pmu-events/arch/s390/cf_z15/crypto6.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z15/crypto6.json
@ -26,5 +26,5 @@
 		"EventName": "ECC_BLOCKED_CYCLES_COUNT",
 		"BriefDescription": "ECC Blocked Cycles Count",
 		"PublicDescription": "This counter counts the total number of CPU cycles blocked for the elliptic-curve cryptography (ECC) functions issued by the CPU because the ECC coprocessor is busy performing a function issued by another CPU."
-	},
+	}
 ]
--- a/tools/perf/pmu-events/arch/s390/cf_z15/extended.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z15/extended.json
@ -397,5 +397,5 @@
 		"EventName": "MT_DIAG_CYCLES_TWO_THR_ACTIVE",
 		"BriefDescription": "Cycle count with two threads active",
 		"PublicDescription": "Cycle count with two threads active"
-	},
+	}
 ]
--- a/tools/perf/pmu-events/arch/s390/cf_z196/basic.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z196/basic.json
@ -82,5 +82,5 @@
 		"EventName": "PROBLEM_STATE_L1D_PENALTY_CYCLES",
 		"BriefDescription": "Problem-State L1D Penalty Cycles",
 		"PublicDescription": "Problem-State Level-1 D-Cache Penalty Cycle Count"
-	},
+	}
 ]
--- a/tools/perf/pmu-events/arch/s390/cf_z196/crypto.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z196/crypto.json
@ -110,5 +110,5 @@
 		"EventName": "AES_BLOCKED_CYCLES",
 		"BriefDescription": "AES Blocked Cycles",
 		"PublicDescription": "Total number of CPU cycles blocked for the AES functions issued by the CPU because the DEA/AES coprocessor is busy performing a function issued by another CPU"
-	},
+	}
 ]
--- a/tools/perf/pmu-events/arch/s390/cf_z196/extended.json
+++ b/tools/perf/pmu-events/arch/s390/cf_z196/extended.json
@ -166,5 +166,5 @@
 		"EventName": "L1I_OFFCHIP_L3_SOURCED_WRITES",
 		"BriefDescription": "L1I Off-Chip L3 Sourced Writes",
 		"PublicDescription": "A directory write to the Level-1 I-Cache directory where the returned cache line was sourced from an Off Chip/On Book Level-3 cache"
-	},
+	}
 ]
--- a/tools/perf/pmu-events/arch/s390/cf_zec12/basic.json
+++ b/tools/perf/pmu-events/arch/s390/cf_zec12/basic.json
@ -82,5 +82,5 @@
 		"EventName": "PROBLEM_STATE_L1D_PENALTY_CYCLES",
 		"BriefDescription": "Problem-State L1D Penalty Cycles",
 		"PublicDescription": "Problem-State Level-1 D-Cache Penalty Cycle Count"
-	},
+	}
 ]
--- a/tools/perf/pmu-events/arch/s390/cf_zec12/crypto.json
+++ b/tools/perf/pmu-events/arch/s390/cf_zec12/crypto.json
@ -110,5 +110,5 @@
 		"EventName": "AES_BLOCKED_CYCLES",
 		"BriefDescription": "AES Blocked Cycles",
 		"PublicDescription": "Total number of CPU cycles blocked for the AES functions issued by the CPU because the DEA/AES coprocessor is busy performing a function issued by another CPU"
-	},
+	}
 ]
--- a/tools/perf/pmu-events/arch/s390/cf_zec12/extended.json
+++ b/tools/perf/pmu-events/arch/s390/cf_zec12/extended.json
@ -243,5 +243,5 @@
 		"EventName": "TX_C_TABORT_SPECIAL",
 		"BriefDescription": "Aborted transactions in constrained TX mode using special completion logic",
 		"PublicDescription": "A transaction abort has occurred in a constrained transactional-execution mode and the CPU is using special logic to allow the transaction to complete"
-	},
+	}
 ]
--- a/tools/perf/pmu-events/arch/test/test_soc/cpu/uncore.json
+++ b/tools/perf/pmu-events/arch/test/test_soc/cpu/uncore.json
@ -38,5 +38,5 @@
 	    "BriefDescription": "Total cache hits",
 	    "PublicDescription": "Total cache hits",
 	    "Unit": "imc"
-  },
+  }
 ]
--- a/tools/perf/pmu-events/arch/test/test_soc/sys/uncore.json
+++ b/tools/perf/pmu-events/arch/test/test_soc/sys/uncore.json
@ -6,4 +6,11 @@
           "Unit": "sys_ddr_pmu",
           "Compat": "v8"
   },
+   {
+           "BriefDescription": "ccn read-cycles event",
+           "ConfigCode": "0x2c",
+           "EventName": "sys_ccn_pmu.read_cycles",
+           "Unit": "sys_ccn_pmu",
+           "Compat": "0x01"
+   }
 ]
--- a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
@ -311,5 +311,5 @@
        "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100",
        "MetricGroup": "Power",
        "MetricName": "C6_Pkg_Residency"
-    },
+    }
 ]
--- a/tools/perf/pmu-events/jevents.c
+++ b/tools/perf/pmu-events/jevents.c
@ -45,6 +45,7 @@
 #include <sys/resource.h>		/* getrlimit */
 #include <ftw.h>
 #include <sys/stat.h>
+#include <linux/compiler.h>
 #include <linux/list.h>
 #include "jsmn.h"
 #include "json.h"
@ -70,7 +71,7 @@ struct json_event {
 	char *metric_constraint;
 };

-enum aggr_mode_class convert(const char *aggr_mode)
+static enum aggr_mode_class convert(const char *aggr_mode)
 {
 	if (!strcmp(aggr_mode, "PerCore"))
 		return PerCore;
@ -81,8 +82,6 @@ enum aggr_mode_class convert(const char *aggr_mode)
 	return -1;
 }

-typedef int (*func)(void *data, struct json_event *je);
-
 static LIST_HEAD(sys_event_tables);

 struct sys_event_table {
@ -361,7 +360,7 @@ static int close_table;

 static void print_events_table_prefix(FILE *fp, const char *tblname)
 {
-	fprintf(fp, "struct pmu_event %s[] = {\n", tblname);
+	fprintf(fp, "static const struct pmu_event %s[] = {\n", tblname);
 	close_table = 1;
 }

@ -369,7 +368,7 @@ static int print_events_table_entry(void *data, struct json_event *je)
 {
 	struct perf_entry_data *pd = data;
 	FILE *outfp = pd->outfp;
-	char *topic = pd->topic;
+	char *topic_local = pd->topic;

 	/*
 	 * TODO: Remove formatting chars after debugging to reduce
@ -384,7 +383,7 @@ static int print_events_table_entry(void *data, struct json_event *je)
 	fprintf(outfp, "\t.desc = \"%s\",\n", je->desc);
 	if (je->compat)
 		fprintf(outfp, "\t.compat = \"%s\",\n", je->compat);
-	fprintf(outfp, "\t.topic = \"%s\",\n", topic);
+	fprintf(outfp, "\t.topic = \"%s\",\n", topic_local);
 	if (je->long_desc && je->long_desc[0])
 		fprintf(outfp, "\t.long_desc = \"%s\",\n", je->long_desc);
 	if (je->pmu)
@ -470,7 +469,7 @@ static void free_arch_std_events(void)
 	}
 }

-static int save_arch_std_events(void *data, struct json_event *je)
+static int save_arch_std_events(void *data __maybe_unused, struct json_event *je)
 {
 	struct event_struct *es;

@ -575,10 +574,12 @@ static int json_events(const char *fn,
 		struct json_event je = {};
 		char *arch_std = NULL;
 		unsigned long long eventcode = 0;
+		unsigned long long configcode = 0;
 		struct msrmap *msr = NULL;
 		jsmntok_t *msrval = NULL;
 		jsmntok_t *precise = NULL;
 		jsmntok_t *obj = tok++;
+		bool configcode_present = false;

 		EXPECT(obj->type == JSMN_OBJECT, obj, "expected object");
 		for (j = 0; j < obj->size; j += 2) {
@ -601,6 +602,12 @@ static int json_events(const char *fn,
 				addfield(map, &code, "", "", val);
 				eventcode |= strtoul(code, NULL, 0);
 				free(code);
+			} else if (json_streq(map, field, "ConfigCode")) {
+				char *code = NULL;
+				addfield(map, &code, "", "", val);
+				configcode |= strtoul(code, NULL, 0);
+				free(code);
+				configcode_present = true;
 			} else if (json_streq(map, field, "ExtSel")) {
 				char *code = NULL;
 				addfield(map, &code, "", "", val);
@ -682,7 +689,10 @@ static int json_events(const char *fn,
 				addfield(map, &extra_desc, " ",
 						"(Precise event)", NULL);
 		}
-		snprintf(buf, sizeof buf, "event=%#llx", eventcode);
+		if (configcode_present)
+			snprintf(buf, sizeof buf, "config=%#llx", configcode);
+		else
+			snprintf(buf, sizeof buf, "event=%#llx", eventcode);
 		addfield(map, &event, ",", buf, NULL);
 		if (je.desc && extra_desc)
 			addfield(map, &je.desc, " ", extra_desc, NULL);
@ -786,7 +796,7 @@ static bool is_sys_dir(char *fname)

 static void print_mapping_table_prefix(FILE *outfp)
 {
-	fprintf(outfp, "struct pmu_events_map pmu_events_map[] = {\n");
+	fprintf(outfp, "const struct pmu_events_map pmu_events_map[] = {\n");
 }

 static void print_mapping_table_suffix(FILE *outfp)
@ -820,7 +830,7 @@ static void print_mapping_test_table(FILE *outfp)

 static void print_system_event_mapping_table_prefix(FILE *outfp)
 {
-	fprintf(outfp, "\nstruct pmu_sys_events pmu_sys_event_tables[] = {");
+	fprintf(outfp, "\nconst struct pmu_sys_events pmu_sys_event_tables[] = {");
 }

 static void print_system_event_mapping_table_suffix(FILE *outfp)
@ -1196,7 +1206,7 @@ int main(int argc, char *argv[])
 	const char *arch;
 	const char *output_file;
 	const char *start_dirname;
-	char *err_string_ext = "";
+	const char *err_string_ext = "";
 	struct stat stbuf;

 	prog = basename(argv[0]);
--- a/tools/perf/pmu-events/jsmn.c
+++ b/tools/perf/pmu-events/jsmn.c
@ -24,6 +24,7 @@

 #include <stdlib.h>
 #include "jsmn.h"
+#define JSMN_STRICT

 /*
 * Allocates a fresh unused token from the token pool.
@ -176,6 +177,14 @@ jsmnerr_t jsmn_parse(jsmn_parser *parser, const char *js, size_t len,
 	jsmnerr_t r;
 	int i;
 	jsmntok_t *token;
+#ifdef JSMN_STRICT
+	/*
+	 * Keeps track of whether a new object/list/primitive is expected. New items are only
+	 * allowed after an opening brace, comma or colon. A closing brace after a comma is not
+	 * valid JSON.
+	 */
+	int expecting_item = 1;
+#endif

 	for (; parser->pos < len; parser->pos++) {
 		char c;
@ -185,6 +194,10 @@ jsmnerr_t jsmn_parse(jsmn_parser *parser, const char *js, size_t len,
 		switch (c) {
 		case '{':
 		case '[':
+#ifdef JSMN_STRICT
+			if (!expecting_item)
+				return JSMN_ERROR_INVAL;
+#endif
 			token = jsmn_alloc_token(parser, tokens, num_tokens);
 			if (token == NULL)
 				return JSMN_ERROR_NOMEM;
@ -196,6 +209,10 @@ jsmnerr_t jsmn_parse(jsmn_parser *parser, const char *js, size_t len,
 			break;
 		case '}':
 		case ']':
+#ifdef JSMN_STRICT
+			if (expecting_item)
+				return JSMN_ERROR_INVAL;
+#endif
 			type = (c == '}' ? JSMN_OBJECT : JSMN_ARRAY);
 			for (i = parser->toknext - 1; i >= 0; i--) {
 				token = &tokens[i];
@ -219,6 +236,11 @@ jsmnerr_t jsmn_parse(jsmn_parser *parser, const char *js, size_t len,
 			}
 			break;
 		case '\"':
+#ifdef JSMN_STRICT
+			if (!expecting_item)
+				return JSMN_ERROR_INVAL;
+			expecting_item = 0;
+#endif
 			r = jsmn_parse_string(parser, js, len, tokens,
 					      num_tokens);
 			if (r < 0)
@ -229,11 +251,15 @@ jsmnerr_t jsmn_parse(jsmn_parser *parser, const char *js, size_t len,
 		case '\t':
 		case '\r':
 		case '\n':
-		case ':':
-		case ',':
 		case ' ':
 			break;
 #ifdef JSMN_STRICT
+		case ':':
+		case ',':
+			if (expecting_item)
+				return JSMN_ERROR_INVAL;
+			expecting_item = 1;
+			break;
 			/*
 			 * In strict mode primitives are:
 			 * numbers and booleans.
@ -253,6 +279,9 @@ jsmnerr_t jsmn_parse(jsmn_parser *parser, const char *js, size_t len,
 		case 'f':
 		case 'n':
 #else
+		case ':':
+		case ',':
+			break;
 			/*
 			 * In non-strict mode every unquoted value
 			 * is a primitive.
@ -260,6 +289,12 @@ jsmnerr_t jsmn_parse(jsmn_parser *parser, const char *js, size_t len,
 			/*FALL THROUGH */
 		default:
 #endif
+
+#ifdef JSMN_STRICT
+			if (!expecting_item)
+				return JSMN_ERROR_INVAL;
+			expecting_item = 0;
+#endif
 			r = jsmn_parse_primitive(parser, js, len, tokens,
 						 num_tokens);
 			if (r < 0)
@ -282,7 +317,11 @@ jsmnerr_t jsmn_parse(jsmn_parser *parser, const char *js, size_t len,
 			return JSMN_ERROR_PART;
 	}

+#ifdef JSMN_STRICT
+	return expecting_item ? JSMN_ERROR_INVAL : JSMN_SUCCESS;
+#else
 	return JSMN_SUCCESS;
+#endif
 }

 /*
--- a/Show More
+++ b/Show More