linux

Author	SHA1	Message	Date
Andi Kleen	1de3152494	perf vendor events: Update JSON metrics for IvyBridge Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20170914200748.GA13837@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-23 11:20:52 -03:00
Andi Kleen	9cd6d86466	perf vendor events: Update JSON metrics for Haswell Server Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20170914200748.GA13837@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-23 11:20:52 -03:00
Andi Kleen	0fba08e249	perf vendor events: Update JSON metrics for Haswell Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20170914200748.GA13837@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-23 11:20:52 -03:00
Andi Kleen	663ad44564	perf vendor events: Update JSON metrics for Broadwell Server Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20170914200748.GA13837@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-23 11:20:52 -03:00
Andi Kleen	008de6c69c	perf vendor events: Update JSON metrics for Broadwell Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20170914200748.GA13837@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-23 11:20:51 -03:00
Ingo Molnar	ca4b9c3b74	Merge branch 'perf/urgent' into perf/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-10-20 11:02:05 +02:00
Li Zhijian	74f8e22c15	perf test shell trace+probe_libc_inet_pton.sh: Be compatible with Debian/Ubuntu In debian/ubuntu, libc.so is located at a different place, /lib/x86_64-linux-gnu/libc-2.23.so, so it outputs like this when testing: PING ::1(::1) 56 data bytes 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.040 ms --- ::1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.040/0.040/0.040/0.000 ms 0.000 probe_libc:inet_pton:(7f0e2db741c0)) __GI___inet_pton (/lib/x86_64-linux-gnu/libc-2.23.so) getaddrinfo (/lib/x86_64-linux-gnu/libc-2.23.so) [0xffffa9d40f34ff4d] (/bin/ping) Fix up the libc path to make sure this test works in more OSes. Committer testing: When this test fails one can use 'perf test -v', i.e. in verbose mode, where it'll show the expected backtrace, so, after applying this test: On Fedora 26: # perf test -v ping 62: probe libc's inet_pton & backtrace it with ping : --- start --- test child forked, pid 23322 PING ::1(::1) 56 data bytes 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.058 ms --- ::1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.058/0.058/0.058/0.000 ms 0.000 probe_libc:inet_pton:(7fe344310d80)) __GI___inet_pton (/usr/lib64/libc-2.25.so) getaddrinfo (/usr/lib64/libc-2.25.so) _init (/usr/bin/ping) test child finished with 0 ---- end ---- probe libc's inet_pton & backtrace it with ping: Ok # Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Li Zhijian <lizhijian@cn.fujitsu.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philip Li <philip.li@intel.com> Link: http://lkml.kernel.org/r/1508315649-18836-1-git-send-email-lizhijian@cn.fujitsu.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-18 09:14:18 -03:00
Jin Yao	3d8bba9535	perf xyarray: Fix wrong processing when closing evsel fd In current xyarray code, xyarray__max_x() returns max_y, and xyarray__max_y() returns max_x. It's confusing and for code logic it looks not correct. Error happens when closing evsel fd. Let's see this scenario: 1. Allocate an fd (pseudo-code) perf_evsel__alloc_fd(struct perf_evsel evsel, int ncpus, int nthreads) { evsel->fd = xyarray__new(ncpus, nthreads, sizeof(int)); } xyarray__new(int xlen, int ylen, size_t entry_size) { size_t row_size = ylen entry_size; struct xyarray xy = zalloc(sizeof(xy) + xlen * row_size); xy->entry_size = entry_size; xy->row_size = row_size; xy->entries = xlen * ylen; xy->max_x = xlen; xy->max_y = ylen; ...... } So max_x is ncpus, max_y is nthreads and row_size = nthreads * 4. 2. Use perf syscall and get the fd int perf_evsel__open(struct perf_evsel evsel, struct cpu_map cpus, struct thread_map threads) { for (cpu = 0; cpu < cpus->nr; cpu++) { for (thread = 0; thread < nthreads; thread++) { int fd, group_fd; fd = sys_perf_event_open(&evsel->attr, pid, cpus->map[cpu], group_fd, flags); FD(evsel, cpu, thread) = fd; } } static inline void xyarray__entry(struct xyarray xy, int x, int y) { return &xy->contents[x xy->row_size + y * xy->entry_size]; } These codes don't have issues. The issue happens in the closing of fd. 3. Close fd. void perf_evsel__close_fd(struct perf_evsel evsel) { int cpu, thread; for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++) for (thread = 0; thread < xyarray__max_y(evsel->fd); ++thread) { close(FD(evsel, cpu, thread)); FD(evsel, cpu, thread) = -1; } } Since xyarray__max_x() returns max_y (nthreads) and xyarry__max_y() returns max_x (ncpus), so above code is actually to be: for (cpu = 0; cpu < nthreads; cpu++) for (thread = 0; thread < ncpus; ++thread) { close(FD(evsel, cpu, thread)); FD(evsel, cpu, thread) = -1; } It's not correct! This change is introduced by "475fb533fb7d" ("perf evsel: Fix buffer overflow while freeing events") This fix is to let xyarray__max_x() return max_x (ncpus) and let xyarry__max_y() return max_y (nthreads) Committer note: This was also fixed by Ravi Bangoria, who provided the same patch, noticing the problem with 'perf record': <quote Ravi> I see 'perf record -p <pid>' crashes with following log: Error in `./perf': free(): invalid next size (normal): 0x000000000298b340 * ======= Backtrace: ========= /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f7fd85c87e5] /lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f7fd85d137a] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f7fd85d553c] ./perf(perf_evsel__close+0xb4)[0x4b7614] ./perf(perf_evlist__delete+0x100)[0x4ab180] ./perf(cmd_record+0x1d9)[0x43a5a9] ./perf[0x49aa2f] ./perf(main+0x631)[0x427841] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f7fd8571830] ./perf(_start+0x29)[0x427a59] </> Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Fixes: `d74be47673` ("perf xyarray: Save max_x, max_y") Link: http://lkml.kernel.org/r/1508339478-26674-1-git-send-email-yao.jin@linux.intel.com Link: http://lkml.kernel.org/r/1508327446-15302-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-18 09:09:36 -03:00
Namhyung Kim	7f0cd23615	perf buildid-list: Fix crash when processing PERF_RECORD_NAMESPACE Thomas reported that 'perf buildid-list' gets a SEGFAULT due to NULL pointer deref when he ran it on a data with namespace events. It was because the buildid_id__mark_dso_hit_ops lacks the namespace event handler and perf_too__fill_default() didn't set it. Program received signal SIGSEGV, Segmentation fault. 0x0000000000000000 in ?? () Missing separate debuginfos, use: dnf debuginfo-install audit-libs-2.7.7-1.fc25.s390x bzip2-libs-1.0.6-21.fc25.s390x elfutils-libelf-0.169-1.fc25.s390x +elfutils-libs-0.169-1.fc25.s390x libcap-ng-0.7.8-1.fc25.s390x numactl-libs-2.0.11-2.ibm.fc25.s390x openssl-libs-1.1.0e-1.1.ibm.fc25.s390x perl-libs-5.24.1-386.fc25.s390x +python-libs-2.7.13-2.fc25.s390x slang-2.3.0-7.fc25.s390x xz-libs-5.2.3-2.fc25.s390x zlib-1.2.8-10.fc25.s390x (gdb) where #0 0x0000000000000000 in ?? () #1 0x00000000010fad6a in machines__deliver_event (machines=<optimized out>, machines@entry=0x2c6fd18, evlist=<optimized out>, event=event@entry=0x3fffdf00470, sample=0x3ffffffe880, sample@entry=0x3ffffffe888, tool=tool@entry=0x1312968 <build_id.mark_dso_hit_ops>, file_offset=1136) at util/session.c:1287 #2 0x00000000010fbf4e in perf_session__deliver_event (file_offset=1136, tool=0x1312968 <build_id.mark_dso_hit_ops>, sample=0x3ffffffe888, event=0x3fffdf00470, session=0x2c6fc30) at util/session.c:1340 #3 perf_session__process_event (session=0x2c6fc30, session@entry=0x0, event=event@entry=0x3fffdf00470, file_offset=file_offset@entry=1136) at util/session.c:1522 #4 0x00000000010fddde in __perf_session__process_events (file_size=11880, data_size=<optimized out>, data_offset=<optimized out>, session=0x0) at util/session.c:1899 #5 perf_session__process_events (session=0x0, session@entry=0x2c6fc30) at util/session.c:1953 #6 0x000000000103b2ac in perf_session__list_build_ids (with_hits=<optimized out>, force=<optimized out>) at builtin-buildid-list.c:83 #7 cmd_buildid_list (argc=<optimized out>, argv=<optimized out>) at builtin-buildid-list.c:115 #8 0x00000000010a026c in run_builtin (p=0x1311f78 <commands+24>, argc=argc@entry=2, argv=argv@entry=0x3fffffff3c0) at perf.c:296 #9 0x000000000102bc00 in handle_internal_command (argv=<optimized out>, argc=2) at perf.c:348 #10 run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:392 #11 main (argc=<optimized out>, argv=0x3fffffff3c0) at perf.c:536 (gdb) Fix it by adding a stub event handler for namespace event. Committer testing: Further clarifying, plain using 'perf buildid-list' will not end up in a SEGFAULT when processing a perf.data file with namespace info: # perf record -a --namespaces sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.024 MB perf.data (1058 samples) ] # perf buildid-list \| wc -l 38 # perf buildid-list \| head -5 e2a171c7b905826fc8494f0711ba76ab6abbd604 /lib/modules/4.14.0-rc3+/build/vmlinux 874840a02d8f8a31cedd605d0b8653145472ced3 /lib/modules/4.14.0-rc3+/kernel/arch/x86/kvm/kvm-intel.ko ea7223776730cd8a22f320040aae4d54312984bc /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/i915/i915.ko 5961535e6732a8edb7f22b3f148bb2fa2e0be4b9 /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/drm.ko f045f54aa78cf1931cc893f78b6cbc52c72a8cb1 /usr/lib64/libc-2.25.so # It is only when one asks for checking what of those entries actually had samples, i.e. when we use either -H or --with-hits, that we will process all the PERF_RECORD_ events, and since tools/perf/builtin-buildid-list.c neither explicitely set a perf_tool.namespaces() callback nor the default stub was set that we end up, when processing a PERF_RECORD_NAMESPACE record, causing a SEGFAULT: # perf buildid-list -H Segmentation fault (core dumped) ^C # Reported-and-Tested-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Fixes: `f3b3614a28` ("perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info") Link: http://lkml.kernel.org/r/20171017132900.11043-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-17 11:09:19 -03:00
Taeung Song	3f50f614d6	perf record: Fix documentation for a inexistent option '-l' 'perf record' had a '-l' option that meant "scale counter values" a very long time ago, but it currently belongs to 'perf stat' as '-c'. So remove it. I found this problem in the below case. $ perf record -e cycles -l sleep 3 Error: unknown switch `l Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1507907412-19813-1-git-send-email-treeze.taeung@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-17 09:05:36 -03:00
Jiri Olsa	29479bfe83	perf tools: Check wether the eBPF file exists in event parsing Adding the check wether the eBPF file exists, to consider it as eBPF input file. This way we can differentiate eBPF events from events that end up with same suffix as eBPF file. Before: $ perf stat -e 'cpu/uops_executed.core/' true bpf: builtin compilation failed: -95, try external compiler WARNING: unable to get correct kernel building directory. Hint: Set correct kbuild directory using 'kbuild-dir' option in [llvm] section of ~/.perfconfig or set it to "" to suppress kbuild detection. event syntax error: 'cpu/uops_executed.core/' \___ Failed to load cpu/uops_executed.c from source: 'version' section incorrect or lost After: $ perf stat -e 'cpu/uops_executed.core/' true Performance counter stats for 'true': 181,533 cpu/uops_executed.core/:u 0.002795447 seconds time elapsed If user makes type in the eBPF file, we prioritize the event syntax and show following warning: $ perf stat -e 'krava.c//' true event syntax error: 'krava.c//' \___ Cannot find PMU `krava.c'. Missing kernel support? Reported-and-Tested-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Changbin Du <changbin.du@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20171013083736.15037-9-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-13 16:45:04 -03:00
Jiri Olsa	d0e35234f6	perf hists: Add extra integrity checks to fmt_free() Make sure the struct perf_hpp_fmt is properly unhooked before we free it. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Changbin Du <changbin.du@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20171013083736.15037-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-13 16:43:42 -03:00
Jiri Olsa	70b01dfd76	perf hists: Fix crash in perf_hpp__reset_output_field() Du Changbin reported crash [1] when calling perf_hpp__reset_output_field() after unregistering field via perf_hpp__column_unregister(). This ends up in calling following list_del* sequence on the same format: perf_hpp__column_unregister: list_del(&format->list); perf_hpp__reset_output_field: list_del_init(&fmt->list); where the later list_del_init might touch already freed formats. Fixing this by replacing list_del() with list_del_init() in perf_hpp__column_unregister(). [1] http://marc.info/?l=linux-kernel&m=149059595826019&w=2 Reported-by: Changbin Du <changbin.du@intel.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20171013083736.15037-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-13 16:43:33 -03:00
Mark Rutland	66ec11919a	perf pmu: Unbreak perf record for arm/arm64 with events with explicit PMU Currently, perf record is broken on arm/arm64 systems when the PMU is specified explicitly as part of the event, e.g. $ ./perf record -e armv8_cortex_a53/cpu_cycles/u true In such cases, perf record fails to open events unless perf_event_paranoid is set to -1, even if the PMU in question supports mode exclusion. Further, even when perf_event_paranoid is toggled, no samples are recorded. This is an unintended side effect of commit: `e3ba76deef` ("perf tools: Force uncore events to system wide monitoring) ... which assumes that if a PMU has an associated cpu_map, it is an uncore PMU, and forces events for such PMUs to be system-wide. This is not true for arm/arm64 systems, which can have heterogeneous CPUs. To account for this, multiple CPU PMUs are exposed, each with a "cpus" field under sysfs, which the perf tool parses into a cpu_map. ARM PMUs do not have a "cpumask" file, and only have a "cpus" file. For the gory details as to why, see commit: `7e3fcffe95` ("perf pmu: Support alternative sysfs cpumask") Given all of this, we can instead identify uncore PMUs by explicitly checking for a "cpumask" file, and restore arm/arm64 PMU support back to a working state. This patch does so, adding a new perf_pmu::is_uncore field, and splitting the existing cpumask parsing so that it can be reused. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by Will Deacon <will.deacon@arm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: 4.12+ <stable@vger.kernel.org> Fixes: `e3ba76deef` ("perf tools: Force uncore events to system wide monitoring) Link: http://lkml.kernel.org/r/1507315102-5942-1-git-send-email-mark.rutland@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-09 15:48:46 -03:00
Mark Santaniello	e9516c0813	perf script: Add missing separator for "-F ip,brstack" (and brstackoff) Prior to commit `55b9b50811` ("perf script: Support -F brstack,dso and brstacksym,dso"), we were printing a space before the brstack data. It seems that this space was important. Without it, parsing is difficult. Very sorry for the mistake. Notice here how the "ip" and "brstack" run together: $ perf script -F ip,brstack \| head -n 1 22e18c40x22e19e2/0x22e190b/P/-/-/0 0x22e19a1/0x22e19d0/P/-/-/0 0x22e195d/0x22e1990/P/-/-/0 0x22e18e9/0x22e1943/P/-/-/0 0x22e1a69/0x22e18c0/P/-/-/0 0x22e19f7/0x22e1a20/P/-/-/0 0x22e1910/0x22e19ee/P/-/-/0 0x22e19e2/0x22e190b/P/-/-/0 0x22e19a1/0x22e19d0/P/-/-/0 0x22e195d/0x22e1990/P/-/-/0 0x22e18e9/0x22e1943/P/-/-/0 0x22e1a69/0x22e18c0/P/-/-/0 0x22e19f7/0x22e1a20/P/-/-/0 0x22e1910/0x22e19ee/P/-/-/0 0x22e19e2/0x22e190b/P/-/-/0 0x22e19a1/0x22e19d0/P/-/-/0 After this diff, sanity is restored: $ perf script -F ip,brstack \| head -n 1 22e18c4 0x22e19e2/0x22e190b/P/-/-/0 0x22e19a1/0x22e19d0/P/-/-/0 0x22e195d/0x22e1990/P/-/-/0 0x22e18e9/0x22e1943/P/-/-/0 0x22e1a69/0x22e18c0/P/-/-/0 0x22e19f7/0x22e1a20/P/-/-/0 0x22e1910/0x22e19ee/P/-/-/0 0x22e19e2/0x22e190b/P/-/-/0 0x22e19a1/0x22e19d0/P/-/-/0 0x22e195d/0x22e1990/P/-/-/0 0x22e18e9/0x22e1943/P/-/-/0 0x22e1a69/0x22e18c0/P/-/-/0 0x22e19f7/0x22e1a20/P/-/-/0 0x22e1910/0x22e19ee/P/-/-/0 0x22e19e2/0x22e190b/P/-/-/0 0x22e19a1/0x22e19d0/P/-/-/0 Signed-off-by: Mark Santaniello <marksan@fb.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: 4.13+ <stable@vger.kernel.org> Fixes: `55b9b50811` ("perf script: Support -F brstack,dso and brstacksym,dso") Link: http://lkml.kernel.org/r/20171006080722.3442046-1-marksan@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-06 09:48:32 -03:00
Ravi Bangoria	c1fbc0cf81	perf callchain: Compare dsos (as well) for CCKEY_FUNCTION Two functions from different binaries can have same start address. Thus, comparing only start address in match_chain() leads to inconsistent callchains. Fix this by adding a check for dsos as well. Ex, https://www.spinics.net/lists/linux-perf-users/msg04067.html Reported-by: Alexander Pozdneev <pozdneyev@gmail.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Krister Johansen <kjlx@templeofstupid.com> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Cc: zhangmengting@huawei.com Link: http://lkml.kernel.org/r/20171005091234.5874-1-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-05 10:52:54 -03:00
Jiri Olsa	f6a9820d57	perf tests attr: Fix group stat tests We started to use group read whenever it's possible: `82bf311e15` perf stat: Use group read for event groups That breaks some of attr tests, this change adds the new possible read_format value. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> LPU-Reference: 20170928160633.GA26973@krava Link: http://lkml.kernel.org/n/tip-1ko2zc4nph93d8lfwjyk9ivz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-03 09:41:45 -03:00
Kan Liang	0c6b499495	perf top: Add option to set the number of thread for event synthesize Using UINT_MAX to indicate the default thread#, which is the max number of online CPU. Committer testing: # perf trace --no-inherit -e clone -o /tmp/output perf top --num-thread-synthesize 9 # cat /tmp/output ? ( ? ): ... [continued]: clone()) = 26651 (perf) 0.059 ( 0.010 ms): clone(flags: VM\|FS\|FILES\|SIGHAND\|THREAD\|SYSVSEM\|SETTLS\|PARENT_SETTID\|CHILD_CLEARTID, child_stack: 0x7f5bfac44f30, parent_tidptr: 0x7f5bfac459d0, child_tidptr: 0x7f5bfac459d0, tls: 0x7f5bfac45700) = 26652 (perf) 0.116 ( 0.014 ms): clone(flags: VM\|FS\|FILES\|SIGHAND\|THREAD\|SYSVSEM\|SETTLS\|PARENT_SETTID\|CHILD_CLEARTID, child_stack: 0x7f5bfa443f30, parent_tidptr: 0x7f5bfa4449d0, child_tidptr: 0x7f5bfa4449d0, tls: 0x7f5bfa444700) = 26653 (perf) 0.141 ( 0.009 ms): clone(flags: VM\|FS\|FILES\|SIGHAND\|THREAD\|SYSVSEM\|SETTLS\|PARENT_SETTID\|CHILD_CLEARTID, child_stack: 0x7f5bf9c42f30, parent_tidptr: 0x7f5bf9c439d0, child_tidptr: 0x7f5bf9c439d0, tls: 0x7f5bf9c43700) = 26654 (perf) 0.160 ( 0.012 ms): clone(flags: VM\|FS\|FILES\|SIGHAND\|THREAD\|SYSVSEM\|SETTLS\|PARENT_SETTID\|CHILD_CLEARTID, child_stack: 0x7f5bf9441f30, parent_tidptr: 0x7f5bf94429d0, child_tidptr: 0x7f5bf94429d0, tls: 0x7f5bf9442700) = 26655 (perf) 0.232 ( 0.013 ms): clone(flags: VM\|FS\|FILES\|SIGHAND\|THREAD\|SYSVSEM\|SETTLS\|PARENT_SETTID\|CHILD_CLEARTID, child_stack: 0x7f5bf8c40f30, parent_tidptr: 0x7f5bf8c419d0, child_tidptr: 0x7f5bf8c419d0, tls: 0x7f5bf8c41700) = 26656 (perf) 0.393 ( 0.011 ms): clone(flags: VM\|FS\|FILES\|SIGHAND\|THREAD\|SYSVSEM\|SETTLS\|PARENT_SETTID\|CHILD_CLEARTID, child_stack: 0x7f5be3ffef30, parent_tidptr: 0x7f5be3fff9d0, child_tidptr: 0x7f5be3fff9d0, tls: 0x7f5be3fff700) = 26657 (perf) 0.802 ( 0.012 ms): clone(flags: VM\|FS\|FILES\|SIGHAND\|THREAD\|SYSVSEM\|SETTLS\|PARENT_SETTID\|CHILD_CLEARTID, child_stack: 0x7f5be37fdf30, parent_tidptr: 0x7f5be37fe9d0, child_tidptr: 0x7f5be37fe9d0, tls: 0x7f5be37fe700) = 26658 (perf) 1.411 ( 0.022 ms): clone(flags: VM\|FS\|FILES\|SIGHAND\|THREAD\|SYSVSEM\|SETTLS\|PARENT_SETTID\|CHILD_CLEARTID, child_stack: 0x7f5be2ffcf30, parent_tidptr: 0x7f5be2ffd9d0, child_tidptr: 0x7f5be2ffd9d0, tls: 0x7f5be2ffd700) = 26659 (perf) 246.422 ( 0.042 ms): clone(flags: VM\|FS\|FILES\|SIGHAND\|THREAD\|SYSVSEM\|SETTLS\|PARENT_SETTID\|CHILD_CLEARTID, child_stack: 0x7f5be2ffcf30, parent_tidptr: 0x7f5be2ffd9d0, child_tidptr: 0x7f5be2ffd9d0, tls: 0x7f5be2ffd700) = 26660 (perf) # Signed-off-by: Kan Liang <kan.liang@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1506696477-146932-5-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-03 09:27:54 -03:00
Kan Liang	340b47f510	perf top: Implement multithreading for perf_event__synthesize_threads The proc files which is sorted with alphabetical order are evenly assigned to several synthesize threads to be processed in parallel. For 'perf top', the threads number hard code to online CPU number. The following patch will introduce an option to set it. For other perf tools, the thread number is 1. Because the process function is not ready for multithreading, e.g. process_synthesized_event. This patch series only support event synthesize multithreading for 'perf top'. For other tools, it can be done separately later. With multithread applied, the total processing time can get up to 1.56x speedup on Knights Mill for 'perf top'. For specific single event processing, the processing time could increase because of the lock contention. So proc_map_timeout may need to be increased. Otherwise some proc maps will be truncated. Based on my test, increasing the proc_map_timeout has small impact on the total processing time. The total processing time still get 1.49x speedup on Knights Mill after increasing the proc_map_timeout. The patch itself doesn't increase the proc_map_timeout. Doesn't need to implement multithreading for per task monitoring, perf_event__synthesize_thread_map. It doesn't have performance issue. Committer testing: # getconf _NPROCESSORS_ONLN 4 # perf trace --no-inherit -e clone -o /tmp/output perf top # tail -4 /tmp/bla 0.124 ( 0.041 ms): clone(flags: VM\|FS\|FILES\|SIGHAND\|THREAD\|SYSVSEM\|SETTLS\|PARENT_SETTID\|CHILD_CLEARTID, child_stack: 0x7fc3eb3a8f30, parent_tidptr: 0x7fc3eb3a99d0, child_tidptr: 0x7fc3eb3a99d0, tls: 0x7fc3eb3a9700) = 9548 (perf) 0.246 ( 0.023 ms): clone(flags: VM\|FS\|FILES\|SIGHAND\|THREAD\|SYSVSEM\|SETTLS\|PARENT_SETTID\|CHILD_CLEARTID, child_stack: 0x7fc3eaba7f30, parent_tidptr: 0x7fc3eaba89d0, child_tidptr: 0x7fc3eaba89d0, tls: 0x7fc3eaba8700) = 9549 (perf) 0.286 ( 0.019 ms): clone(flags: VM\|FS\|FILES\|SIGHAND\|THREAD\|SYSVSEM\|SETTLS\|PARENT_SETTID\|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9550 (perf) 246.540 ( 0.047 ms): clone(flags: VM\|FS\|FILES\|SIGHAND\|THREAD\|SYSVSEM\|SETTLS\|PARENT_SETTID\|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9551 (perf) # Signed-off-by: Kan Liang <kan.liang@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1506696477-146932-4-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-03 09:27:46 -03:00
Kan Liang	f988e71bc6	perf tools: Lock to protect comm_str rb tree Add comm_str_lock to protect comm_str rb tree. The lock is only needed for multithreaded code, so using mutex wrappers provided by perf tool. Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1506696477-146932-3-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-03 09:27:36 -03:00
Kan Liang	b32ee9e522	perf tools: Lock to protect namespaces and comm list Add two locks to protect namespaces_list and comm_list. The lock is only needed for multithreaded code, so using mutex wrappers provided by perf tool. Not all the comm_list/namespaces_list accessing are protected, e.g. thread__exec_comm. Because the multithread code for perf top event synthesizing does not touch them. They don't need a lock. Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1506696477-146932-2-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-03 09:27:27 -03:00
Thomas Richter	22905582f6	perf test attr: Fix ignored test case result Command perf test -v 16 (Setup struct perf_event_attr test) always reports success even if the test case fails. It works correctly if you also specify -F (for don't fork). root@s35lp76 perf]# ./perf test -v 16 15: Setup struct perf_event_attr : --- start --- running './tests/attr/test-record-no-delay' [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.002 MB /tmp/tmp4E1h7R/perf.data (1 samples) ] expected task=0, got 1 expected precise_ip=0, got 3 expected wakeup_events=1, got 0 FAILED './tests/attr/test-record-no-delay' - match failure test child finished with 0 ---- end ---- Setup struct perf_event_attr: Ok The reason for the wrong error reporting is the return value of the system() library call. It is called in run_dir() file tests/attr.c and returns the exit status, in above case 0xff00. This value is given as parameter to the exit() function which can only handle values 0-0xff. The child process terminates with exit value of 0 and the parent does not detect any error. This patch corrects the error reporting and prints the correct test result. Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> LPU-Reference: 20170913081209.39570-2-tmricht@linux.vnet.ibm.com Link: http://lkml.kernel.org/n/tip-rdube6rfcjsr1nzue72c7lqn@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-02 14:00:57 -03:00
Thomas Richter	3440fe2790	perf test attr: Fix python error on empty result Commit `d78ada4a76` ("perf tests attr: Do not store failed events") does not create an event file in the /tmp directory when the perf_open_event() system call failed. This can lead to a situation where not /tmp/event-xx-yy-zz result file exists at all (for example on a s390x virtual machine environment) where no CPUMF hardware is available. The following command then fails with a python call back chain instead of printing failure: [root@s8360046 perf]# /usr/bin/python2 ./tests/attr.py -d ./tests/attr/ \ -p ./perf -v -ttest-stat-basic running './tests/attr//test-stat-basic' Traceback (most recent call last): File "./tests/attr.py", line 379, in <module> main() File "./tests/attr.py", line 370, in main run_tests(options) File "./tests/attr.py", line 311, in run_tests Test(f, options).run() File "./tests/attr.py", line 300, in run self.compare(self.expect, self.result) File "./tests/attr.py", line 248, in compare exp_event.diff(res_event) UnboundLocalError: local variable 'res_event' referenced before assignment [root@s8360046 perf]# This patch catches this pitfall and prints an error message instead: [root@s8360047 perf]# /usr/bin/python2 ./tests/attr.py -d ./tests/attr/ \ -p ./perf -vvv -ttest-stat-basic running './tests/attr//test-stat-basic' loading expected events Event event:base-stat fd = 1 group_fd = -1 flags = 0\|8 [....] sample_regs_user = 0 sample_stack_user = 0 'PERF_TEST_ATTR=/tmp/tmpJbMQMP ./perf stat -o /tmp/tmpJbMQMP/perf.data -e cycles kill >/dev/null 2>&1' ret '1', expected '1' loading result events compare matching [event:base-stat] match: [event:base-stat] matches [] res_event is empty FAILED './tests/attr//test-stat-basic' - match failure [root@s8360047 perf]# Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> LPU-Reference: 20170913081209.39570-1-tmricht@linux.vnet.ibm.com Link: http://lkml.kernel.org/n/tip-04d63nn7svfgxdhi60gq2mlm@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-02 14:00:20 -03:00
Jiri Olsa	10836d9f9a	perf tests attr: Fix task term values The perf_event_attr::task is 1 by default for first (tracking) event in the session. Setting task=1 as default and adding task=0 for cases that need it. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20170703145030.12903-16-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-02 13:59:18 -03:00
Arnaldo Carvalho de Melo	c976a7d6db	Merge remote-tracking branch 'tip/perf/urgent' into perf/core, to pick up fixes Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-10-02 13:58:12 -03:00
Ingo Molnar	1addcd55bc	perf/urgent fixes: - Fix syscalltbl build failure (Akemi Yagi) - Fix attr.exclude_kernel setting for default cycles:p, this time for !root with kernel.perf_event_paranoid = -1 (Arnaldo Carvalho de Melo) - Sync kernel ABI headers with tooling headers (Ingo Molnar) - Remove misleading debug messages with --call-graph option (Mengting Zhang) - Revert vmlinux symbol resolution patches for s390x (Thomas Richter) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEELb9bqkb7Te0zijNb1lAW81NSqkAFAlnNSQgACgkQ1lAW81NS qkAijg//THaO1ErOm7Ha/Xyh6gs2h4OfAQ8t+FmvEjQKrd6zIjG3/WKzfzoD7AY2 Sy8VfiCM3mno/VnrH/9Ty/6jtIn06aV4Ljm3UgBYQJXHSHUmiVaK/M+ddzZahH2j 3/MBs0vK1kOjzv4s9LWs5a9nwCJrCbsdpZs2HmdoX90/NhEq41T6VsFK5zA+bgqr 4Wjm3livbxCihpAhEhB31IED1YLn8Yhoas/GxC/o4bIUFtnSe8sWS8pZi4Mu9cwI /OUo6iOCR7DDsLW5GSJkzvJnUSGEEDFX/brgibZjuDL9rLSm5tG1gmiIue4VUU12 OLUltRCOP3C3y3rZLeB0W4S4fB2R1vV56/dOIZcXThXR4etpS+cMTv+lmxFdvFkx cxyXM2KJDr313q43b49zYZMjKvsRbt4zEjJgis8OIQUBKHE9tu7MQ9nkAOZE1F1c VE49F2Q4u+LmoN5KNf6d44h12FO3o+8XBr9B4yOTghZmk3rp6fWh8I++UxfuICXO xHD6nN1oDmziB4T3+SfbGjbScTwyYScn4RAWS4bVUkMJZ57Y44cdikr12f/IPKO1 MS20DWwkXaJrB2PEJxu7F7T+6bUJLMGLJH3CRnFBkxIHV8y/oEc+gLYFj1h4k/59 3dAmXpfhcQ+B6MIqHoWsVEXbpH5iBaJR6Jywp57xyrcY0TATRuk= =Tg+2 -----END PGP SIGNATURE----- Merge tag 'perf-urgent-for-mingo-4.14-20170928' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: - Fix syscalltbl build failure (Akemi Yagi) - Fix attr.exclude_kernel setting for default cycles:p, this time for !root with kernel.perf_event_paranoid = -1 (Arnaldo Carvalho de Melo) - Sync kernel ABI headers with tooling headers (Ingo Molnar) - Remove misleading debug messages with --call-graph option (Mengting Zhang) - Revert vmlinux symbol resolution patches for s390x (Thomas Richter) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-29 19:31:46 +02:00
Thomas Richter	5357413f5c	perf test: Fix vmlinux failure on s390x part 2 On s390x perf test 1 failed. It turned out that commit `cf6383f73c` ("perf report: Fix kernel symbol adjustment for s390x") was incorrect. The previous implementation in dso__load_sym() is also suitable for s390x. Therefore this patch undoes commit `cf6383f73c` Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Cc: Zvonko Kosic <zvonko.kosic@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Fixes: `cf6383f73c` ("perf report: Fix kernel symbol adjustment for s390x") LPU-Reference: 20170915071404.58398-2-tmricht@linux.vnet.ibm.com Link: http://lkml.kernel.org/n/tip-v101o8k25vuja2ogosgf15yy@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-28 13:01:42 -03:00
Thomas Richter	b28503a3fe	perf test: Fix vmlinux failure on s390x On s390x perf test 1 failed. It turned out that commit `4a084ecfc8` ("perf report: Fix module symbol adjustment for s390x") was incorrect. The previous implementation in dso__load_sym() is also suitable for s390x. Therefore this patch undoes commit `4a084ecfc8`. Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Zvonko Kosic <zvonko.kosic@de.ibm.com> Fixes: `4a084ecfc8` ("perf report: Fix module symbol adjustment for s390x") LPU-Reference: 20170915071404.58398-1-tmricht@linux.vnet.ibm.com Link: http://lkml.kernel.org/n/tip-5ani7ly57zji7s0hmzkx416l@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-28 13:01:42 -03:00
Akemi Yagi	090657c9fb	perf tools: Fix syscalltbl build failure The build of kernel v4.14-rc1 for i686 fails on RHEL 6 with the error in tools/perf: util/syscalltbl.c:157: error: expected ';', ',' or ')' before '__maybe_unused' mv: cannot stat `util/.syscalltbl.o.tmp': No such file or directory Fix it by placing/moving: #include <linux/compiler.h> outside of #ifdef HAVE_SYSCALL_TABLE block. Signed-off-by: Akemi Yagi <toracat@elrepo.org> Cc: Alan Bartlett <ajb@elrepo.org> Link: http://lkml.kernel.org/r/oq41r8$1v9$1@blaine.gmane.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-25 12:21:05 -03:00
Mengting Zhang	9789e7e93f	perf report: Fix debug messages with --call-graph option With --call-graph option, perf report can display call chains using type, min percent threshold, optional print limit and order. And the default call-graph parameter is 'graph,0.5,caller,function,percent'. Before this patch, 'perf report --call-graph' shows incorrect debug messages as below: # perf report --call-graph Invalid callchain mode: 0.5 Invalid callchain order: 0.5 Invalid callchain sort key: 0.5 Invalid callchain config key: 0.5 Invalid callchain mode: caller Invalid callchain mode: function Invalid callchain order: function Invalid callchain mode: percent Invalid callchain order: percent Invalid callchain sort key: percent That is because in function __parse_callchain_report_opt(),each field of the call-graph parameter is passed to parse_callchain_{mode,order, sort_key,value} in turn until it meets the matching value. For example, the order field "caller" is passed to parse_callchain_mode() firstly and obviously it doesn't match any mode field. Therefore parse_callchain_mode() will shows the debug message "Invalid callchain mode: caller", which could confuse users. The patch fixes this issue by moving the warning out of the function parse_callchain_{mode,order,sort_key,value}. Signed-off-by: Mengting Zhang <zhangmengting@huawei.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Krister Johansen <kjlx@templeofstupid.com> Cc: Li Bin <huawei.libin@huawei.com> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/1506154694-39691-1-git-send-email-zhangmengting@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-25 12:20:12 -03:00
Arnaldo Carvalho de Melo	f1e52f14a6	perf evsel: Fix attr.exclude_kernel setting for default cycles:p Yet another fix for probing the max attr.precise_ip setting: it is not enough settting attr.exclude_kernel for !root users, as they _can_ profile the kernel if the kernel.perf_event_paranoid sysctl is set to -1, so check that as well. Testing it: As non root: $ sysctl kernel.perf_event_paranoid kernel.perf_event_paranoid = 2 $ perf record sleep 1 $ perf evlist -v cycles:uppp: ..., exclude_kernel: 1, ... precise_ip: 3, ... Now as non-root, but with kernel.perf_event_paranoid set set to the most permissive value, -1: $ sysctl kernel.perf_event_paranoid kernel.perf_event_paranoid = -1 $ perf record sleep 1 $ perf evlist -v cycles:ppp: ..., exclude_kernel: 0, ... precise_ip: 3, ... $ I.e. non-root, default kernel.perf_event_paranoid: :uppp modifier = not allowed to sample the kernel, non-root, most permissible kernel.perf_event_paranoid: :ppp = allowed to sample the kernel. In both cases, use the highest available precision: attr.precise_ip = 3. Reported-and-Tested-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: `d37a369790` ("perf evsel: Fix attr.exclude_kernel setting for default cycles:p") Link: http://lkml.kernel.org/n/tip-nj2qkf75xsd6pw6hhjzfqqdx@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-25 10:39:45 -03:00
Arnaldo Carvalho de Melo	89975bd335	perf tools: Get all of tools/{arch,include}/ in the MANIFEST Now that I'm switching the container builds from using a local volume pointing to the kernel repository with the perf sources, instead getting a detached tarball to be able to use a container cluster, some places broke because I forgot to put some of the required files in tools/perf/MANIFEST, namely some bitsperlong.h files. So, to fix it do the same as for tools/build/ and pack the whole tools/arch/ directory. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-wmenpjfjsobwdnfde30qqncj@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-25 10:39:43 -03:00
Ingo Molnar	aa469aafdd	perf/core improvements and fixes: - Support direct --user-regs arguments in 'perf record', previously the only way to sample PERF_SAMPLE_REGS_USER was implicitly selecting it when recording callchains (Andi Kleen) - Support showing sampled user regs in 'perf script' (Andi Kleen) - Introduce the concept of weak groups in 'perf stat': try to set up a group, but if it's not schedulable fallback to not using a group. That gives us the best of both worlds: groups if they work, but still a usable fallback if they don't. E.g: (Andi Kleen) % perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}:W' -a sleep 1 125,366,055 branches (80.02%) 9,208,402 branch-misses # 7.35% of all branches (80.01%) 24,560,249 l1d.replacement (80.00%) 43,174,971 l2_lines_in.all (80.05%) 31,891,457 l2_rqsts.all_code_rd (79.92%) - Support metrics in 'stat' and 'list'. A metric is a formula that uses multiple events to compute a higher level result (e.g. IPC). (Andi Kleen) - Add Intel processors vendor event metrics JSON files (Andi Kleen) - Add 'pid' and 'tid' options to 'perf sched timehist' (David Ahern) - Generate 'behavior' string table from kernel headers, helps getting new parameters when synchronizing kernel headers, like MADV_WIPEONFORK and MADV_KEEPONFORK, that are now beautied (Arnaldo Carvalho de Melo) - Improve TUI progress bar by showing how many bytes from a total were processed (Jiri Olsa) - Use scandir() to replace readdir(), prep work to have the synthesizing of PERF_RECORD_ entries for existing threads be multithreaded, making 'perf top' bearable on high core count systems such as Intel's Knights Landing/Mill (Kan Liang) - Allow creating a ~/.perfconfig file when setting a variable to its default value, previously it would bail out and not write such a file (Taeung Song) - Introduce wrapper for allowing purely single threaded apps to avoid the costs of locking (Arnaldo Carvalho de Melo) - Introduce hashtable to reduce the cost of thread lookup - Fix build C++ build wrt poison.h using void pointer arithmetic, affects only the embedded clang/llvm case, that is disabled by default (Arnaldo Carvalho de Melo) - Fix leaking rec_argv in error cases (Martin Kepplinger) - Remove Intel CQM perf test, that infrastructure was nuked (Xiaochen Shen) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEELb9bqkb7Te0zijNb1lAW81NSqkAFAlnFH1MACgkQ1lAW81NS qkCPxQ/+LsWTQVdPoLjodivSxELn19zAkf8z6j1frLiFniHq2WxY+galhoLHAGlh F4j0g3P61+7Pspa0RlC1kSEqrk0yHmzCMbodZS+8I4K8qfCA3D1lXGUnJjmMBVkj kYIMxcvotvN0r5Bwzv4Bd8niZHKp4APQyQN6vXZZY3zGwJSNbV88L4qgQhTBvyLV hJ5PhfUkxVpSlJ2Muf0jbp97DhIH2owUFTO51ZV39t40eOeTmp/fJxq2tppbYrKm puWmfMM2KLm01gTcHTw9s5IrHqWq7FAB8lMIXxJN/HPQwR5cO8KJ9Ddo+BOaRbwY OelU6W4VgTX/Wx3oSBd6SSpicNuTyipASQKOSa711ck6EKhd5QnjvrHF4A781v20 zpLYMbk04vdOXRdjOAmnV73INAgC7+3C1L6gfIgT9uAfUpJQRQJu0wfTA4734Rh8 DcrIc6SkQX8s6E5lOW2mzla4yyQxlzm42tFGr1N0ASzgHu623IKkXP/UdRxNo2ep vFNH4DPqZr5hbQkNL2md7u8KL2i/4UQhG+1Uf0jfNYg6O5HcJToLZKc462G4LmVP ASOTyUAGyDFYseAUTLtcM+2W+iTCjFNN/LnHnsOXF8ESpyHJCEXcAOy8v04RMXrP 4z3xP8OrNubBL/WkTuMGRmanFe8ZrASFTddVH/XZXsSDoC13KCk= =oNFx -----END PGP SIGNATURE----- Merge tag 'perf-core-for-mingo-4.15-20170922' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: - Support direct --user-regs arguments in 'perf record', previously the only way to sample PERF_SAMPLE_REGS_USER was implicitly selecting it when recording callchains (Andi Kleen) - Support showing sampled user regs in 'perf script' (Andi Kleen) - Introduce the concept of weak groups in 'perf stat': try to set up a group, but if it's not schedulable fallback to not using a group. That gives us the best of both worlds: groups if they work, but still a usable fallback if they don't. E.g: (Andi Kleen) % perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}:W' -a sleep 1 125,366,055 branches (80.02%) 9,208,402 branch-misses # 7.35% of all branches (80.01%) 24,560,249 l1d.replacement (80.00%) 43,174,971 l2_lines_in.all (80.05%) 31,891,457 l2_rqsts.all_code_rd (79.92%) - Support metrics in 'stat' and 'list'. A metric is a formula that uses multiple events to compute a higher level result (e.g. IPC). (Andi Kleen) - Add Intel processors vendor event metrics JSON files (Andi Kleen) - Add 'pid' and 'tid' options to 'perf sched timehist' (David Ahern) - Generate 'behavior' string table from kernel headers, helps getting new parameters when synchronizing kernel headers, like MADV_WIPEONFORK and MADV_KEEPONFORK, that are now beautied (Arnaldo Carvalho de Melo) - Improve TUI progress bar by showing how many bytes from a total were processed (Jiri Olsa) - Use scandir() to replace readdir(), prep work to have the synthesizing of PERF_RECORD_ entries for existing threads be multithreaded, making 'perf top' bearable on high core count systems such as Intel's Knights Landing/Mill (Kan Liang) - Allow creating a ~/.perfconfig file when setting a variable to its default value, previously it would bail out and not write such a file (Taeung Song) - Introduce wrapper for allowing purely single threaded apps to avoid the costs of locking (Arnaldo Carvalho de Melo) - Introduce hashtable to reduce the cost of thread lookup - Fix build C++ build wrt poison.h using void pointer arithmetic, affects only the embedded clang/llvm case, that is disabled by default (Arnaldo Carvalho de Melo) - Fix leaking rec_argv in error cases (Martin Kepplinger) - Remove Intel CQM perf test, that infrastructure was nuked (Xiaochen Shen) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-22 18:05:48 +02:00
Arnaldo Carvalho de Melo	0a7c74eae3	perf tools: Provide mutex wrappers for pthreads rwlocks Andi reported a performance drop in single threaded perf tools such as 'perf script' due to the growing number of locks being put in place to allow for multithreaded tools, so wrap the POSIX threads rwlock routines with the names used for such kinds of locks in the Linux kernel and then allow for tools to ask for those locks to be used or not. I.e. a tool may have a multithreaded phase and then switch to single threaded, like the upcoming patches for the synthesizing of PERF_RECORD_{FORK,MMAP,etc} for pre-existing processes to then switch to single threaded mode in 'perf top'. The init routines will not be conditional, this way starting as single threaded to then move to multi threaded mode should be possible. Reported-by: Andi Kleen <ak@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20170404161739.GH12903@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-21 13:28:06 -03:00
Arnaldo Carvalho de Melo	0e1eed8088	perf tools: Get all of tools/{arch,include}/ in the MANIFEST Now that I'm switching the container builds from using a local volume pointing to the kernel repository with the perf sources, instead getting a detached tarball to be able to use a container cluster, some places broke because I forgot to put some of the required files in tools/perf/MANIFEST, namely some bitsperlong.h files. So, to fix it do the same as for tools/build/ and pack the whole tools/arch/ directory. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-wmenpjfjsobwdnfde30qqncj@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-21 13:13:00 -03:00
Arnaldo Carvalho de Melo	5a54c2f5e1	perf trace beauty madvise: Generate 'behavior' string table from kernel headers This is one more case where the way that syscall parameter values are defined in kernel headers are easy to parse using a shell script that will then generate the string table that gets used by the madvise 'behaviour' argument beautifier. This way as soon as the header syncronization mechanism in perf's build system detects a change in a copy of a kernel ABI header and that file is syncronized, we get 'perf trace' updated automagically. So, when we syncronize this: Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h' We'll get these: #define MADV_WIPEONFORK 18 /* Zero memory on fork, child only / #define MADV_KEEPONFORK 19 / Undo MADV_WIPEONFORK */ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-dolb0ghds4ui7wc1npgkchvc@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-21 13:12:59 -03:00
Xiaochen Shen	5c9295bfe6	perf tests: Remove Intel CQM perf test Intel CQM perf test is obsolete for perf PMU code has been removed in commit `c39a0e2c88` ("x86/perf/cqm: Wipe out perf based cqm"). Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Pei P Jia <pei.p.jia@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Link: http://lkml.kernel.org/r/1505797057-16300-1-git-send-email-xiaochen.shen@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-21 13:12:58 -03:00
Andi Kleen	411bc316f3	perf stat: Fix adding multiple event groups The -M metric group parser threw away the events of earlier groups when multiple groups were specified. Fix this here by not overwriting the string incorrectly. Now this works correctly: % perf stat -M Summary,SMT --metric-only -a sleep 1 Performance counter stats for 'system wide': Instructions CPI CLKS CPU_Utilization GFLOPs SMT_2T_Utilization SMT_2T_Utilization Kernel_Utilization CoreIPC CORE_CLKS 900907376.0 2.7 2398954144.0 0.1 0.0 0.2 0.2 0.1 0.4 2080822855.5 while previously it would only show the SMT metrics. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170914205735.18431-1-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-21 13:12:58 -03:00
Martin Kepplinger	c896f85a7c	perf tools: Fix leaking rec_argv in error cases Let's free the allocated rec_argv in case we return early, in order to avoid leaking memory. This adds free() at a few very similar places across the tree where it was missing. Signed-off-by: Martin Kepplinger <martink@posteo.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Martin kepplinger <martink@posteo.de> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170913191419.29806-1-martink@posteo.de Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-18 09:40:21 -03:00
Andi Kleen	333b566559	perf pmu: Improve error messages for missing PMUs When a PMU is missing print a better error message mentioning the missing PMU. % mkdir empty % mount --bind empty /sys/devices/msr % perf stat -M Summary true event syntax error: '{inst_retired.any,cycles}:W,{cpu_clk_unhalted.thread}:W,{inst_retired.any}:W,{cpu_clk_unhalted.ref_tsc,msr/tsc/}:W,{fp_comp_ops_exe.sse_scalar..' \___ Cannot find PMU `msr'. Missing kernel support? It still cannot find the right column for aliases, but it's already a vast improvement. v2: Check asprintf Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170913215006.32222-1-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-18 09:40:20 -03:00
Arnaldo Carvalho de Melo	75e45e4320	perf machine: Optimize a bit the machine__findnew_thread() methods In some cases we already have calculated the hash bucket, so reuse it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-800zehjsyy03er4s4jf0e99v@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-18 09:40:19 -03:00
Kan Liang	91e467bc56	perf machine: Use hashtable for machine threads To process any events, it needs to find the thread in the machine first. The machine maintains a rb tree to store all threads. The rb tree is protected by a rw lock. It is not a problem for current perf which serially processing events. However, it will have scalability performance issue to process events in parallel, especially on a heavy load system which have many threads. Introduce a hashtable to divide the big rb tree into many samll rb tree for threads. The index is thread id % hashtable size. It can reduce the lock contention. Committer notes: Renamed some variables and function names to reduce semantic confusion: 'struct threads' pointers: thread -> threads threads hastable index: tid -> hash_bucket struct threads *machine__thread() -> machine__threads() Cast tid to (unsigned int) to handle -1 in machine__threads() (Kan Liang) Signed-off-by: Kan Liang <kan.liang@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1505096603-215017-2-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-18 09:40:19 -03:00
Michal Hocko	0ee931c4e3	mm: treewide: remove GFP_TEMPORARY allocation flag GFP_TEMPORARY was introduced by commit `e12ba74d8f` ("Group short-lived and reclaimable kernel allocations") along with __GFP_RECLAIMABLE. It's primary motivation was to allow users to tell that an allocation is short lived and so the allocator can try to place such allocations close together and prevent long term fragmentation. As much as this sounds like a reasonable semantic it becomes much less clear when to use the highlevel GFP_TEMPORARY allocation flag. How long is temporary? Can the context holding that memory sleep? Can it take locks? It seems there is no good answer for those questions. The current implementation of GFP_TEMPORARY is basically GFP_KERNEL \| __GFP_RECLAIMABLE which in itself is tricky because basically none of the existing caller provide a way to reclaim the allocated memory. So this is rather misleading and hard to evaluate for any benefits. I have checked some random users and none of them has added the flag with a specific justification. I suspect most of them just copied from other existing users and others just thought it might be a good idea to use without any measuring. This suggests that GFP_TEMPORARY just motivates for cargo cult usage without any reasoning. I believe that our gfp flags are quite complex already and especially those with highlevel semantic should be clearly defined to prevent from confusion and abuse. Therefore I propose dropping GFP_TEMPORARY and replace all existing users to simply use GFP_KERNEL. Please note that SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and so they will be placed properly for memory fragmentation prevention. I can see reasons we might want some gfp flag to reflect shorterm allocations but I propose starting from a clear semantic definition and only then add users with proper justification. This was been brought up before LSF this year by Matthew [1] and it turned out that GFP_TEMPORARY really doesn't have a clear semantic. It seems to be a heuristic without any measured advantage for most (if not all) its current users. The follow up discussion has revealed that opinions on what might be temporary allocation differ a lot between developers. So rather than trying to tweak existing users into a semantic which they haven't expected I propose to simply remove the flag and start from scratch if we really need a semantic for short term allocations. [1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org [akpm@linux-foundation.org: fix typo] [akpm@linux-foundation.org: coding-style fixes] [sfr@canb.auug.org.au: drm/i915: fix up] Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Neil Brown <neilb@suse.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-09-13 18:53:16 -07:00
Andi Kleen	56de5b63ff	perf vendor events: Add JSON metrics for Skylake server Add JSON metrics for Skylake server Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:18 -03:00
Andi Kleen	69e932139d	perf vendor events: Add JSON metrics for Broadwell DE Add JSON metrics for Broadwell DE Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:18 -03:00
Andi Kleen	6d75abd3e8	perf vendor events: Add JSON metrics for Broadwell Server Add JSON metrics for Broadwell Server. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:18 -03:00
Andi Kleen	5e49f7321b	perf vendor events: Add JSON metrics for Haswell EP Add JSON metrics for Haswell EP. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:18 -03:00
Andi Kleen	43fd36a19d	perf vendor events: Add JSON metrics for Ivy Town Add JSON metrics for Ivy Town. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:18 -03:00
Andi Kleen	2099f51d18	perf vendor events: Add JSON metrics for Haswell Add JSON metrics for Haswell. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:17 -03:00
Andi Kleen	8853d2de0e	perf vendor events: Add JSON metrics for Ivy Bridge Add JSON metrics for Ivy Bridge. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:17 -03:00
Andi Kleen	28bc0ddb3a	perf vendor events: Add JSON metrics for Sandy Bridge EP Add JSON metrics for Sandy Bridge EP. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:17 -03:00
Andi Kleen	97dca6715d	perf vendor events: Add JSON metrics for Sandy Bridge Add JSON metrics for Sandy Bridge. Committer testing: # grep "model name" /proc/cpuinfo \| head -1 model name : Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz # perf list metricgroup List of pre-defined events (to be used in -e): Metric Groups: DSB FLOPS Frontend Frontend_Bandwidth Pipeline Ports_Utilization Power SMT Summary TopDownL1 # perf stat -M Power --metric-only -a sleep 1 Performance counter stats for 'system wide': Turbo_Utilization C3_Core_Residency C6_Core_Residency C7_Core_Residency C2_Pkg_Residency C3_Pkg_Residency C6_Pkg_Residency C7_Pkg_Residency 0.8 0.0 98.1 0.0 0.0 0.0 23.4 0.0 1.001153658 seconds time elapsed # perf stat -v -M Power --metric-only -a sleep 1 Using CPUID GenuineIntel-6-2A metric expr cpu_clk_unhalted.thread / cpu_clk_unhalted.ref_tsc for Turbo_Utilization found event cpu_clk_unhalted.thread found event cpu_clk_unhalted.ref_tsc metric expr (cstate_core@c3\-residency@ / msr@tsc@) * 100 for C3_Core_Residency found event cstate_core/c3-residency/ found event msr/tsc/ metric expr (cstate_core@c6\-residency@ / msr@tsc@) * 100 for C6_Core_Residency found event cstate_core/c6-residency/ found event msr/tsc/ metric expr (cstate_core@c7\-residency@ / msr@tsc@) * 100 for C7_Core_Residency found event cstate_core/c7-residency/ found event msr/tsc/ metric expr (cstate_pkg@c2\-residency@ / msr@tsc@) * 100 for C2_Pkg_Residency found event cstate_pkg/c2-residency/ found event msr/tsc/ metric expr (cstate_pkg@c3\-residency@ / msr@tsc@) * 100 for C3_Pkg_Residency found event cstate_pkg/c3-residency/ found event msr/tsc/ metric expr (cstate_pkg@c6\-residency@ / msr@tsc@) * 100 for C6_Pkg_Residency found event cstate_pkg/c6-residency/ found event msr/tsc/ metric expr (cstate_pkg@c7\-residency@ / msr@tsc@) * 100 for C7_Pkg_Residency found event cstate_pkg/c7-residency/ found event msr/tsc/ adding {cpu_clk_unhalted.thread,cpu_clk_unhalted.ref_tsc}:W,{cstate_core/c3-residency/,msr/tsc/}:W,{cstate_core/c6-residency/,msr/tsc/}:W,{cstate_core/c7-residency/,msr/tsc/}:W,{cstate_pkg/c2-residency/,msr/tsc/}:W,{cstate_pkg/c3-residency/,msr/tsc/}:W,{cstate_pkg/c6-residency/,msr/tsc/}:W,{cstate_pkg/c7-residency/,msr/tsc/}:W cpu_clk_unhalted.thread -> cpu/event=0x3c/ cpu_clk_unhalted.ref_tsc -> cpu/umask=0x3,period=2000003,event=0/ Weak group for cstate_pkg/c2-residency//2 failed Weak group for cstate_pkg/c3-residency//2 failed Weak group for cstate_pkg/c6-residency//2 failed Weak group for cstate_pkg/c7-residency//2 failed cpu_clk_unhalted.thread: 5564185 4002833569 4002833569 cpu_clk_unhalted.ref_tsc: 7325424 4002833569 4002833569 cstate_core/c3-residency/: 68293 4003027101 4003027101 msr/tsc/: 12451294472 4003027101 4003027101 cstate_core/c6-residency/: 12238830163 4003260984 4003260984 msr/tsc/: 12452017806 4003260984 4003260984 cstate_core/c7-residency/: 0 4003489648 4003489648 msr/tsc/: 12452725162 4003489648 4003489648 cstate_pkg/c2-residency/: 1830054 1000913138 1000913138 msr/tsc/: 12453441079 4003717513 4003717513 cstate_pkg/c3-residency/: 0 1000973570 1000973570 msr/tsc/: 12454177865 4003954758 4003954758 cstate_pkg/c6-residency/: 2940448859 1001032370 1001032370 msr/tsc/: 12454833890 4004166118 4004166118 cstate_pkg/c7-residency/: 0 1001049818 1001049818 msr/tsc/: 12454919470 4004194204 4004194204 Performance counter stats for 'system wide': Turbo_Utilization C3_Core_Residency C6_Core_Residency C7_Core_Residency C2_Pkg_Residency C3_Pkg_Residency C6_Pkg_Residency C7_Pkg_Residency 0.8 0.0 98.3 0.0 0.0 0.0 23.6 0.0 1.001126519 seconds time elapsed # Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170905195235.GW2482@two.firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:17 -03:00
Andi Kleen	2e006a2412	perf vendor events: Add JSON metrics for Skylake Add JSON metrics for Skylake. Committer testing: # grep "model name" /proc/cpuinfo \| head -1 model name : Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz # uname -a Linux seventh 4.12.0-rc6+ #1 SMP Fri Jun 30 16:40:55 -03 2017 x86_64 x86_64 x86_64 GNU/Linux # perf stat --metric-only -M Summary -a sleep 1 Performance counter stats for 'system wide': Instructions CPI CLKS CPU_Utilization GFLOPs SMT_2T_Utilization Kernel_Utilization 34021097.0 0.0 119424171.0 0.0 0.0 0.0 0.0 1.001001793 seconds time elapsed # perf list metricgroup List of pre-defined events (to be used in -e): Metric Groups: DSB FLOPS Frontend Frontend_Bandwidth Memory_BW Memory_Bound Memory_Lat Pipeline Ports_Utilization Power SMT Summary TLB TopDownL1 Unknown_Branches # perf stat --metric-only -M Ports_Utilization -a sleep 1 Performance counter stats for 'system wide': ILP 1475828.0 1.000688547 seconds time elapsed # perf stat -v --metric-only -M Ports_Utilization -a sleep 1 Using CPUID GenuineIntel-6-9E metric expr uops_executed.thread / ( uops_executed.core_cycles_ge_1 / 2) if #smt_on else uops_executed.core_cycles_ge_1 for ILP found event uops_executed.thread found event uops_executed.core_cycles_ge_1 adding {uops_executed.thread,uops_executed.core_cycles_ge_1}:W uops_executed.thread -> cpu/umask=0x1,period=2000003,event=0xb1/ uops_executed.core_cycles_ge_1 -> cpu/umask=0x2,period=2000003,cmask=1,event=0xb1/ uops_executed.thread: 8115271 4002547654 4002547654 uops_executed.core_cycles_ge_1: 3282969 4002547654 4002547654 Performance counter stats for 'system wide': ILP 3282969.0 1.000719870 seconds time elapsed # Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170905195235.GW2482@two.firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:17 -03:00
Andi Kleen	cf97962308	perf vendor events: Add JSON metrics for Broadwell Add JSON metrics for Broadwell. Commiter testing: # uname -a Linux jouet 4.13.0-rc7+ #3 SMP Sat Sep 2 09:04:44 -03 2017 x86_64 x86_64 x86_64 GNU/Linux # grep "model name" /proc/cpuinfo \| head -1 model name : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz # perf list metricgroup List of pre-defined events (to be used in -e): Metric Groups: DSB FLOPS Frontend Frontend_Bandwidth Memory_BW Memory_Bound Memory_Lat Pipeline Ports_Utilization Power SMT Summary TLB TopDownL1 Unknown_Branches # perf stat -M Power --metric-only -a sleep 1 Performance counter stats for 'system wide': Turbo_Utilization C3_Core_Residency C6_Core_Residency C7_Core_Residency C2_Pkg_Residency C3_Pkg_Residency C6_Pkg_Residency C7_Pkg_Residency 1.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.003502904 seconds time elapsed # # perf stat -M Memory_BW --metric-only -a sleep 1 Performance counter stats for 'system wide': MLP 1.7 1.001364525 seconds time elapsed # # perf stat -M TLB --metric-only -a sleep 1 Performance counter stats for 'system wide': Page_Walks_Utilization 0.1 1.005962198 seconds time elapsed # # perf stat -M Summary --metric-only -a sleep 1 Performance counter stats for 'system wide': Instructions CPI CLKS CPU_Utilization GFLOPs SMT_2T_Utilization Kernel_Utilization 7281856697.0 0.0 11150898087.0 1.0 0.0 1.0 0.7 1.012134025 seconds time elapsed # Running in verbose mode shows which counters and expressions are being used: # perf stat -v -M Summary --metric-only -a sleep 1 Using CPUID GenuineIntel-6-3D metric expr 1 / inst_retired.any / cycles for CPI found event inst_retired.any found event cycles metric expr cpu_clk_unhalted.thread for CLKS found event cpu_clk_unhalted.thread metric expr inst_retired.any for Instructions found event inst_retired.any metric expr cpu_clk_unhalted.ref_tsc / msr@tsc@ for CPU_Utilization found event cpu_clk_unhalted.ref_tsc found event msr/tsc/ metric expr ( 1( fp_arith_inst_retired.scalar_single + fp_arith_inst_retired.scalar_double ) + 2 fp_arith_inst_retired.128b_packed_double + 4( fp_arith_inst_retired.128b_packed_single + fp_arith_inst_retired.256b_packed_double ) + 8 fp_arith_inst_retired.256b_packed_single ) / 1000000000 / duration_time for GFLOPs found event fp_arith_inst_retired.scalar_single found event fp_arith_inst_retired.scalar_double found event fp_arith_inst_retired.128b_packed_double found event fp_arith_inst_retired.128b_packed_single found event fp_arith_inst_retired.256b_packed_double found event fp_arith_inst_retired.256b_packed_single found event duration_time metric expr 1 - cpu_clk_thread_unhalted.one_thread_active / ( cpu_clk_thread_unhalted.ref_xclk_any / 2 ) if #smt_on else 0 for SMT_2T_Utilization found event cpu_clk_thread_unhalted.one_thread_active found event cpu_clk_thread_unhalted.ref_xclk_any metric expr cpu_clk_unhalted.ref_tsc:u / cpu_clk_unhalted.ref_tsc for Kernel_Utilization found event cpu_clk_unhalted.ref_tsc:u found event cpu_clk_unhalted.ref_tsc adding {inst_retired.any,cycles}:W,{cpu_clk_unhalted.thread}:W,{inst_retired.any}:W,{cpu_clk_unhalted.ref_tsc,msr/tsc/}:W,{fp_arith_inst_retired.scalar_single,fp_arith_inst_retired.scalar_double,fp_arith_inst_retired.128b_packed_double,fp_arith_inst_retired.128b_packed_single,fp_arith_inst_retired.256b_packed_double,fp_arith_inst_retired.256b_packed_single,duration_time}:W,{cpu_clk_thread_unhalted.one_thread_active,cpu_clk_thread_unhalted.ref_xclk_any}:W,{cpu_clk_unhalted.ref_tsc:u,cpu_clk_unhalted.ref_tsc}:W inst_retired.any -> cpu/event=0xc0/ cpu_clk_unhalted.thread -> cpu/event=0x3c/ inst_retired.any -> cpu/event=0xc0/ cpu_clk_unhalted.ref_tsc -> cpu/umask=0x3,period=2000003,event=0/ fp_arith_inst_retired.scalar_single -> cpu/umask=0x2,period=2000003,event=0xc7/ fp_arith_inst_retired.scalar_double -> cpu/umask=0x1,period=2000003,event=0xc7/ fp_arith_inst_retired.128b_packed_double -> cpu/umask=0x4,period=2000003,event=0xc7/ fp_arith_inst_retired.128b_packed_single -> cpu/umask=0x8,period=2000003,event=0xc7/ fp_arith_inst_retired.256b_packed_double -> cpu/umask=0x10,period=2000003,event=0xc7/ fp_arith_inst_retired.256b_packed_single -> cpu/umask=0x20,period=2000003,event=0xc7/ cpu_clk_thread_unhalted.one_thread_active -> cpu/umask=0x2,period=2000003,event=0x3c/ cpu_clk_thread_unhalted.ref_xclk_any -> cpu/umask=0x1,any=1,period=2000003,event=0x3c/ cpu_clk_unhalted.ref_tsc -> cpu/umask=0x3,period=2000003,event=0/ cpu_clk_unhalted.ref_tsc -> cpu/umask=0x3,period=2000003,event=0/ Weak group for fp_arith_inst_retired.scalar_single/7 failed Weak group for cpu_clk_unhalted.ref_tsc:u/2 failed inst_retired.any: 8704146437 4026374016 619883741 cycles: 11180800018 4026374016 619883741 cpu_clk_unhalted.thread: 11140030295 4026323772 931621933 inst_retired.any: 8643115117 4026260510 1243595906 cpu_clk_unhalted.ref_tsc: 10201638510 4026184297 1247351077 msr/tsc/: 10378022785 4026184297 1247351077 fp_arith_inst_retired.scalar_single: 134697 4026102728 1559210545 fp_arith_inst_retired.scalar_double: 274339 4026007348 1870014984 fp_arith_inst_retired.128b_packed_double: 1639 4025886054 1866736918 fp_arith_inst_retired.128b_packed_single: 0 4025776614 2175106569 fp_arith_inst_retired.256b_packed_double: 0 4025681734 1235551129 fp_arith_inst_retired.256b_packed_single: 0 4025582962 1232398454 duration_time: 0 4025552913 4025552913 cpu_clk_thread_unhalted.one_thread_active: 10505 4025474649 923893076 cpu_clk_thread_unhalted.ref_xclk_any: 394992110 4025474649 923893076 cpu_clk_unhalted.ref_tsc:u: 5341421014 4025360315 1231634198 cpu_clk_unhalted.ref_tsc: 10258278508 4025252611 307909362 Performance counter stats for 'system wide': Instructions CPI CLKS CPU_Utilization GFLOPs SMT_2T_Utilization Kernel_Utilization 8704146437.0 0.0 11140030295.0 1.0 0.0 1.0 0.5 1.006783654 seconds time elapsed # Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170905195235.GW2482@two.firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:16 -03:00
Andi Kleen	35c1980eb3	perf stat: Fall weak group back even for EBADF It's not possible to run a package event and a per cpu event in the same group. This is used by some of the power metrics. They work correctly when not using a group. Normally weak groups should handle that, but in this case EBADF is returned instead of the normal EINVAL. $ strace -e perf_event_open ./perf stat -v -e '{cstate_pkg/c2-residency/,msr/tsc/}:W' -a sleep 1 Using CPUID GenuineIntel-6-3E perf_event_open({type=0x17 /* PERF_TYPE_??? /, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, PERF_FLAG_FD_CLOEXEC) = -1 EINVAL (Invalid argument) perf_event_open({type=0x17 / PERF_TYPE_??? /, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument) perf_event_open({type=0x17 / PERF_TYPE_??? /, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument) perf_event_open({type=0x17 / PERF_TYPE_??? /, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument) perf_event_open({type=0x17 / PERF_TYPE_??? /, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = 3 perf_event_open({type=0x7 / PERF_TYPE_??? /, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, 3, 0) = 4 perf_event_open({type=0x7 / PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 1, 0, 0) = -1 EBADF (Bad file descriptor) and perf errors out. Make weak groups trigger a fall back for EBADF too. Then this case works correctly: $ perf stat -v -e '{cstate_pkg/c2-residency/,msr/tsc/}:W' -a sleep 1 Using CPUID GenuineIntel-6-3E Weak group for cstate_pkg/c2-residency//2 failed cstate_pkg/c2-residency/: 476709882 1000598460 1000598460 msr/tsc/: 39625837911 12007369110 12007369110 Performance counter stats for 'system wide': 476,709,882 cstate_pkg/c2-residency/ 39,625,837,911 msr/tsc/ 1.000697588 seconds time elapsed This fixes perf stat -M Power ... $ perf stat -M Power --metric-only -a sleep 1 Performance counter stats for 'system wide': Turbo_Utilization C3_Core_Residency C6_Core_Residency C7_Core_Residency C2_Pkg_Residency C3_Pkg_Residency C6_Pkg_Residency C7_Pkg_Residency 1.0 0.7 30.0 0.0 0.9 0.1 0.4 0.0 1.001240740 seconds time elapsed Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170905211324.32427-1-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:16 -03:00
Arnaldo Carvalho de Melo	c23c2a0f23	perf tools: Make copyfile_offset() static There are no usage outside util.c and this is the only remaining reason for fcntl.h to be included in util.h, to get the loff_t definition in Alpine Linux, so make it static. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-2dzlsao7k6ihozs5karw6kpx@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:16 -03:00
Taeung Song	55421b4fb7	perf config: Allow creating empty config set for config file autogeneration When there isn't a config file (e.g. ~/.perfconfig) or it has nothing, the config set wasn't created. If the config set does not exist, a config file can't be autogenerated. So allow creating a empty config set in the above case, then we can support the config file autogeneration. Before: $ rm -f ~/.perfconfig $ perf config --user report.children=false $ cat ~/.perfconfig cat: /root/.perfconfig: No such file or directory But I think it should work even if there isn't a config file. After: $ rm -f ~/.perfconfig $ perf config --user report.children=false $ cat ~/.perfconfig # this file is auto-generated. [report] children = false NOTE: As a result, if perf_config_set__init() fails, it looks as if the config set isn't freed. But it isn't a problem. Because the config set will be freed by perf_config_set__delete() at the end of cmd_config(). Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1504754336-9824-1-git-send-email-treeze.taeung@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:16 -03:00
Taeung Song	5c2615556d	perf config: Write a config file just once Currently set_config() can be repeatedly called for each input config on the below case: $ perf config kmem.default=slab report.children=false ... But it's a waste, so only once write a config file gathering all given config key=value pairs. Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1504754331-9776-1-git-send-email-treeze.taeung@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:15 -03:00
Kan Liang	ecdad24d7a	perf tools: Use scandir() to replace readdir() In perf_event__synthesize_threads() perf goes through all proc files serially by readdir. scandir() does a snapshoot of /proc, which is multithreading friendly. It's possible that some threads which are added during event synthesize. But the number of lost threads should be small. They should not impact the final analysis. Signed-off-by: Kan Liang <kan.liang@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1504806954-150842-3-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:15 -03:00
Jiri Olsa	8233822f40	perf ui progress: Add size info into progress bar Adding the size values '[current/total]' into progress bar, to show more detailed progress of data reading. Adding new ui_progress__init_size function to specify we want to display the size. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170908120510.22515-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:15 -03:00
Jiri Olsa	25cc4eb44b	perf ui progress: Add ui specific init function Adding ui specific init function allowing to setup the progress bar width based on current screen scales. Adding TUI init function to get more grained update of the progress bar. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170908120510.22515-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:15 -03:00
Jiri Olsa	80f8735571	perf tools: Add python-clean target To be able to cleanup only python related binaries. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170908084621.31595-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:15 -03:00
Andi Kleen	b1491ace8e	perf script: Support user regs Teach perf script to print user regs. % perf record --user-regs=ip,sp ... % perf script -F ip,sym,uregs ... ffffffff9e060c24 native_write_msr ABI:2 SP:0x7ffd0ea06c38 IP:0x7fe77f55b637 ffffffff9e060c24 native_write_msr ABI:2 SP:0x7ffd0ea06c38 IP:0x7fe77f55b637 ffffffff9e060c24 native_write_msr ABI:2 SP:0x7ffd0ea06c38 IP:0x7fe77f55b637 ffffffff9e060c24 native_write_msr ABI:2 SP:0x7ffd0ea06c38 IP:0x7fe77f55b637 ffffffff9e00cc12 intel_pmu_handle_irq ABI:2 SP:0x7ffd0ea06c38 IP:0x7fe77f55b637 v2: Rebased on top of phys-addr patches Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20170905184057.26135-1-andi@firstfloor.org [ Use PRIu64 for regs->abi in print_sample_uregs() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:14 -03:00
Andi Kleen	84c4174227	perf record: Support direct --user-regs arguments USER_REGS can currently only collected implicitely with call graph recording. Sometimes it is useful to see them separately, and filter them. Add a new --user-regs option to record that is similar to --intr-regs, but acts on user regs. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170905170029.19722-1-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:14 -03:00
Andi Kleen	b90f1333ef	perf stat: Update walltime_nsecs_stats in interval mode Some metrics (like GFLOPs) need walltime_nsecs_stats for each interval. Compute it for each interval instead of only at the end. Pointed out by Jiri. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-12-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:14 -03:00
Andi Kleen	e864c5ca14	perf stat: Hide internal duration_time counter Some perf stat metrics use an internal "duration_time" metric. It is not correctly printed however. So hide it during output to avoid confusing users with 0 counts. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-11-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:14 -03:00
Andi Kleen	fd48aad9b0	perf stat: Support duration_time for metrics Some of the metrics formulas (like GFLOPs) need to know how long the measurement period is. Support an internal event called duration_time, which reports time in second. It maps to the dummy event, but is special cased for statistics to report the walltime duration. So far it is not printed, but only used internally for metrics. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-10-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:14 -03:00
Andi Kleen	4e1a096380	perf stat: Don't use ctx for saved values lookup We don't need to use ctx to look up events for saved values. The context is already part of the evsel pointer, which is the primary key. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-9-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:13 -03:00
Andi Kleen	71b0acce78	perf list: Add metric groups to perf list Add code to perf list to print metric groups, and metrics that don't have an event name. The metricgroup code collects the eventgroups and events into a rblist, and then prints them according to the configured filters. The metricgroups are printed by default, but can be limited by perf list metric or perf list metricgroup % perf list metricgroup .. Metric Groups: DSB: DSB_Coverage [Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)] FLOPS: GFLOPs [Giga Floating Point Operations Per Second] Frontend: IFetch_Line_Utilization [Rough Estimation of fraction of fetched lines bytes that were likely consumed by program instructions] Frontend_Bandwidth: DSB_Coverage [Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)] Memory_BW: MLP [Memory-Level-Parallelism (average number of L1 miss demand load when there is at least 1 such miss)] v2: Check return value of asprintf to fix warning on FC26 Fix key in lookup/addition for the groups list Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-8-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:13 -03:00
Andi Kleen	b18f3e3650	perf stat: Support JSON metrics in perf stat Add generic support for standalone metrics specified in JSON files to perf stat. A metric is a formula that uses multiple events to compute a higher level result (e.g. IPC). Previously metrics were always tied to an event and automatically enabled with that event. But now change it that we can have standalone metrics. They are in the same JSON data structure as events, but don't have an event name. We also allow to organize the metrics in metric groups, which allows a short cut to select several related metrics at once. Add a new -M / --metrics option to perf stat that adds the metrics or metric groups specified. Add the core code to manage and parse the metric groups. They are collected from the JSON data structures into a separate rblist. When computing shadow values look for metrics in that list. Then they are computed using the existing saved values infrastructure in stat-shadow.c The actual JSON metrics are in a separate pull request. % perf stat -M Summary --metric-only -a sleep 1 Performance counter stats for 'system wide': Instructions CLKS CPU_Utilization GFLOPs SMT_2T_Utilization Kernel_Utilization 317614222.0 1392930775.0 0.0 0.0 0.2 0.1 1.001497549 seconds time elapsed % perf stat -M GFLOPs flops Performance counter stats for 'flops': 3,999,541,471 fp_comp_ops_exe.sse_scalar_single # 1.2 GFLOPs (66.65%) 14 fp_comp_ops_exe.sse_scalar_double (66.65%) 0 fp_comp_ops_exe.sse_packed_double (66.67%) 0 fp_comp_ops_exe.sse_packed_single (66.70%) 0 simd_fp_256.packed_double (66.70%) 0 simd_fp_256.packed_single (66.67%) 0 duration_time 3.238372845 seconds time elapsed v2: Add missing header file v3: Move find_map to pmu.c Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-7-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:13 -03:00
Andi Kleen	d77ade9f41	perf pmu: Extract function to get JSON alias map Extract the code to get the per cpu JSON alias into a separate function for reuse. No behavior changes. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-6-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:13 -03:00
Andi Kleen	4ed962eb38	perf stat: Print generic metric header even for failed expressions Print the generic metric header even when the expression evaluation failed. Otherwise an expression that fails on the first collections due to division by zero may suddenly reappear later without an header. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-5-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:13 -03:00
Andi Kleen	bba49af873	perf stat: Factor out generic metric printing The 'perf stat' shadow metric printing already supports generic metrics. Factor out the code doing that into a separate function that can be re-used in a later patch. No behavior changes. v2: Fix indentation Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-4-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:12 -03:00
Andi Kleen	3ba36d3620	perf vendor events: Support metric_group and no event name in JSON parser Some enhancements to the JSON parser to prepare for metrics support - Parse the new MetricGroup field - Support JSON events with no event name, that have only MetricName. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170831194036.30146-3-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:12 -03:00
Andi Kleen	5a5dfe4b85	perf tools: Support weak groups in 'perf stat' Setting up groups can be complicated due to the complicated scheduling restrictions of different PMUs. User tools usually don't understand all these restrictions. Still in many cases it is useful to set up groups and they work most of the time. However if the group is set up wrong some members will not report any value because they never get scheduled. Add a concept of a 'weak group': try to set up a group, but if it's not schedulable fallback to not using a group. That gives us the best of both worlds: groups if they work, but still a usable fallback if they don't. In theory it would be possible to have more complex fallback strategies (e.g. try to split the group in half), but the simple fallback of not using a group seems to work for now. So far the weak group is only implemented for perf stat, not for record. Here's an unschedulable group (on IvyBridge with SMT on) % perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1 73,806,067 branches 4,848,144 branch-misses # 6.57% of all branches 14,754,458 l1d.replacement 24,905,558 l2_lines_in.all <not supported> l2_rqsts.all_code_rd <------- will never report anything With the weak group: % perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}:W' -a sleep 1 125,366,055 branches (80.02%) 9,208,402 branch-misses # 7.35% of all branches (80.01%) 24,560,249 l1d.replacement (80.00%) 43,174,971 l2_lines_in.all (80.05%) 31,891,457 l2_rqsts.all_code_rd (79.92%) The extra event scheduled with some extra multiplexing v2: Move fallback code to separate function. Add comment on for_each_group_member Adjust to new perf_evsel__close interface v3: Fix debug print out. Committer testing: Before: # perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1 Performance counter stats for 'system wide': <not counted> branches <not counted> branch-misses <not counted> l1d.replacement <not counted> l2_lines_in.all <not supported> l2_rqsts.all_code_rd 1.002147212 seconds time elapsed # perf stat -e '{branches,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1 Performance counter stats for 'system wide': 83,207,892 branches 11,065,444 l1d.replacement 28,484,024 l2_lines_in.all 12,186,179 l2_rqsts.all_code_rd 1.001739493 seconds time elapsed After: # perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}':W -a sleep 1 Performance counter stats for 'system wide': 543,323,909 branches (80.01%) 27,100,512 branch-misses # 4.99% of all branches (80.02%) 50,402,905 l1d.replacement (80.03%) 67,385,892 l2_lines_in.all (80.01%) 21,352,885 l2_rqsts.all_code_rd (79.94%) 1.001086658 seconds time elapsed # Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/r/20170831194036.30146-2-andi@firstfloor.org [ Add a "'perf stat' only, for now" comment in the man page, suggested by Jiri ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:12 -03:00
David Ahern	0f59d7a352	perf sched timehist: Add pid and tid options Add options to only show event for specific pid(s) and tid(s). Signed-off-by: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1504288152-19690-1-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-13 09:49:12 -03:00
Ingo Molnar	b130a699c0	perf/urgent fixes: - Fix TUI progress bar when delta from new total from that of the previous update is greater than the progress "step" (screen width progress bar block)) (Jiri Olsa) - Make tools/lib/api make DEBUG=1 build use -D_FORTIFY_SOURCE=2 not to cripple debuginfo, just like tools/perf/ does (Jiri Olsa) - Avoid leaking the 'perf.data' file to workloads started from the 'perf record' command line by using the O_CLOEXEC open flag (Jiri Olsa) - Fix building when libunwind's 'unwind.h' file is present in the include path, clashing with tools/perf/util/unwind.h (Milian Wolff) - Check per .perfconfig section entry flag, not just per section (Taeung Song) - Support running perf binaries with a dash in their name, needed to run perf as an AppImage (Milian Wolff) - Wait for the right child by using waitpid() when running workloads from 'perf stat', also to fix using perf as an AppImage (Milian Wolff) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEELb9bqkb7Te0zijNb1lAW81NSqkAFAlm4HVkACgkQ1lAW81NS qkByEhAAsNfQRKGQIeudLdEWx63wyZviU0KQ2zeNurbpEMHsttcHgQciYvqQmyCn FZ+zm21vcNBKd0pqFwPL0WPJzYSnudRT23cG2NFSLlFX7RNZhzgp0X1a75kACXCH oJKpu/D4YDDS8J+xLjApJUWaVOYW39yAG3Cdzq2IBYHvbvPg/ovrBkxrwKLhJJTE ZdQIjt+DbGbEUvgOMAji/BmpgjnV5/goz736KoOIiWso3LpAsEb5kiLBMghnSTyR M4Hxl7NHS+3f7J8QpTTVlcL4oxI7RgYSQbjnqdwhff4LRrTfS3txRcit20KCMwF/ u+n1JBgR7I3ogUoO1jXyi0IaDdi77Vr7EckO5Yd+8shGIiXICwx1pMl88NvxBNHN 6YB8sL8/Fbw6q7d/o5iHwbrkuLZDWaE7fU3kj31l+W5jSIM1orCwcUG+IPU8e2UQ M40vtebGEVWKkYl/UqGNGh9d6zlN8du8HW+uT0l9Hebon6YQFt/qDnP0PRQ5QlFi p8AozBXbCUxjenUknuaHN4QtIJhshOhzw2YRKxghWeVHqP/yJ8lrPy646NosKXMj qPHezIvrEarERDxn8sMlHic2edOfcbiol6GjZmxeDutvLJzc3i54+tfwVuIUG3TV o123i3Xt7vld6xaENxPhCvfEqo/IRRuvqAxAIvLRq+OLwc3H2ec= =d3gz -----END PGP SIGNATURE----- Merge tag 'perf-urgent-for-mingo-4.14-20170912' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: - Fix TUI progress bar when delta from new total from that of the previous update is greater than the progress "step" (screen width progress bar block)) (Jiri Olsa) - Make tools/lib/api make DEBUG=1 build use -D_FORTIFY_SOURCE=2 not to cripple debuginfo, just like tools/perf/ does (Jiri Olsa) - Avoid leaking the 'perf.data' file to workloads started from the 'perf record' command line by using the O_CLOEXEC open flag (Jiri Olsa) - Fix building when libunwind's 'unwind.h' file is present in the include path, clashing with tools/perf/util/unwind.h (Milian Wolff) - Check per .perfconfig section entry flag, not just per section (Taeung Song) - Support running perf binaries with a dash in their name, needed to run perf as an AppImage (Milian Wolff) - Wait for the right child by using waitpid() when running workloads from 'perf stat', also to fix using perf as an AppImage (Milian Wolff) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-09-13 09:25:10 +02:00
Linus Torvalds	e6328a7abe	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf tooling updates from Ingo Molnar: "Perf tooling updates and fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf annotate browser: Help for cycling thru hottest instructions with TAB/shift+TAB perf stat: Only auto-merge events that are PMU aliases perf test: Add test case for PERF_SAMPLE_PHYS_ADDR perf script: Support physical address perf mem: Support physical address perf sort: Add sort option for physical address perf tools: Support new sample type for physical address perf vendor events powerpc: Remove duplicate events perf intel-pt: Fix syntax in documentation of config option perf test powerpc: Fix 'Object code reading' test perf trace: Support syscall name globbing perf syscalltbl: Support glob matching on syscall names perf report: Calculate the average cycles of iterations	2017-09-12 11:28:13 -07:00
Milian Wolff	dfc9eec771	perf stat: Wait for the correct child When packaging the perf userland application into an AppImage, the wait() call in perf stat returned too early. It turned out that some other child process exited, but not the one perf stat launched: $ sudo strace -e fork,execve,clone,wait4 -f ./perf-x86_64.AppImage stat sleep 1 execve("./perf-git.3a73b7f9-x86_64.AppImage", ["./perf-git.3a73b7f9-x86_64.AppIm"..., "stat", "sleep", "1"], 0x7ffec1bbf050 /* 18 vars /) = 0 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID\|CLONE_CHILD_SETTID\|SIGCHLD, child_tidptr=0x7f6a6e7efe50) = 3912 strace: Process 3912 attached [pid 3912] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID\|CLONE_CHILD_SETTID\|SIGCHLD, child_tidptr=0x7f6a6e7efe50) = 3914 strace: Process 3914 attached [pid 3912] +++ exited with 0 +++ [pid 3911] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3912, si_uid=0, si_status=0, si_utime=0, si_stime=0} --- [pid 3914] clone(strace: Process 3915 attached child_stack=0x7f6a6d9fefb0, flags=CLONE_VM\|CLONE_FS\|CLONE_FILES\|CLONE_SIGHAND\|CLONE_THREAD\|CLONE_SYSVSEM\|CLONE_SETTLS\|CLONE_PARENT_SETTID\|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f6a6d9ff9d0, tls=0x7f6a6d9ff700, child_tidptr=0x7f6a6d9ff9d0) = 3915 [pid 3911] execve("/tmp/.mount_perf-g6VYMpl/AppRun", ["./perf-git.3a73b7f9-x86_64.AppIm"..., "stat", "sleep", "1"], 0x14aab70 / 21 vars /) = 0 [pid 3911] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID\|CLONE_CHILD_SETTID\|SIGCHLD, child_tidptr=0x7f4ae113c4d0) = 3916 strace: Process 3916 attached [pid 3911] wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3912 [pid 3916] execve("/usr/libexec/perf-core/sleep", ["sleep", "1"], 0x27d3650 / 22 vars /) = -1 ENOENT (No such file or directory) [pid 3916] execve("/tmp/./sleep", ["sleep", "1"], 0x27d3650 / 22 vars /) = -1 ENOENT (No such file or directory) [pid 3916] execve("/home/milian/.bin/sleep", ["sleep", "1"], 0x27d3650 / 22 vars /) = -1 ENOENT (No such file or directory) [pid 3916] execve("/usr/lib/icecream/libexec/icecc/bin/sleep", ["sleep", "1"], 0x27d3650 / 22 vars /) = -1 ENOENT (No such file or directory) [pid 3916] execve("/ssd2/milian/projects/compiled/other/bin/sleep", ["sleep", "1"], 0x27d3650 / 22 vars /) = -1 ENOENT (No such file or directory) [pid 3916] execve("/home/milian/.bin/kf5/sleep", ["sleep", "1"], 0x27d3650 / 22 vars /) = -1 ENOENT (No such file or directory) [pid 3916] execve("/ssd2/milian/projects/compiled/kf5/bin/sleep", ["sleep", "1"], 0x27d3650 / 22 vars /) = -1 ENOENT (No such file or directory) [pid 3916] execve("/home/milian/projects/compiled/other/bin/sleep", ["sleep", "1"], 0x27d3650 / 22 vars /) = -1 ENOENT (No such file or directory) [pid 3916] execve("/home/milian/projects/compiled/kf5/bin/sleep", ["sleep", "1"], 0x27d3650 / 22 vars /) = -1 ENOENT (No such file or directory) [pid 3916] execve("/usr/local/sbin/sleep", ["sleep", "1"], 0x27d3650 / 22 vars /) = -1 ENOENT (No such file or directory) [pid 3916] execve("/usr/local/bin/sleep", ["sleep", "1"], 0x27d3650 / 22 vars /) = -1 ENOENT (No such file or directory) [pid 3916] execve("/usr/bin/sleep", ["sleep", "1"], 0x27d3650 / 22 vars */ Performance counter stats for 'sleep 1': <not counted> task-clock <not counted> context-switches <not counted> cpu-migrations <not counted> page-faults <not counted> cycles <not counted> instructions <not counted> branches <not counted> branch-misses 0.000047194 seconds time elapsed [pid 3916] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=3911, si_uid=0} --- [pid 3916] +++ killed by SIGTERM +++ [pid 3911] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=3916, si_uid=0, si_status=SIGTERM, si_utime=0, si_stime=0} --- [pid 3915] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3914, si_uid=0} --- [pid 3911] +++ exited with 0 +++ [pid 3915] --- SIGHUP {si_signo=SIGHUP, si_code=SI_USER, si_pid=3914, si_uid=0} --- [pid 3915] +++ exited with 0 +++ +++ exited with 0 +++ This patch uses waitpid instead to ensure the call waits for the debuggee application launched by 'perf stat'. This fixes 'perf stat' when launched from an AppImage: $ ./perf-x86_64.AppImage stat sleep 1 Performance counter stats for 'sleep 1': 0.357235 task-clock (msec) # 0.000 CPUs utilized 1 context-switches # 0.003 M/sec 0 cpu-migrations # 0.000 K/sec 50 page-faults # 0.140 M/sec 1269602 cycles # 3.554 GHz 654278 instructions # 0.52 insn per cycle 129963 branches # 363.803 M/sec 7082 branch-misses # 5.45% of all branches 1.000633420 seconds time elapsed Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170912152523.4497-1-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-12 12:49:13 -03:00
Milian Wolff	3192f1ed3d	perf tools: Support running perf binaries with a dash in their name Previously the part behind "perf-" was interpreted as an internal perf command. If the suffix could not be handled, the execution was stopped. This makes it impossible to launch perf binaries that got renamed to have the `perf-` prefix. This is e.g. the case for appimages (e.g. "perf-x86_64.AppImage"), but would also apply to all other scenarios where users symlink or rename perf themselves: Status quo with the broken behavior: $ ln -s ./perf ./perf-custom-suffix $ ./perf-custom-suffix list cannot handle custom-suffix internally$ Also note the missing newline at the end of the error message. With this patch applied, the above works properly: $ ./perf-custom-suffix list List of pre-defined events (to be used in -e): ... Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Acked-by: David Ahern <dsahern@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20170911111422.31903-1-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-12 12:48:54 -03:00
Taeung Song	cba225d6ee	perf config: Check not only section->from_system_config but also item's Currently section->from_system_config is being checked multiple times. item->from_system_config should be checked instead, when iterating thru the items in a section. Fix it. Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1504754325-9724-1-git-send-email-treeze.taeung@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-12 12:35:11 -03:00
Jiri Olsa	a82bfd041d	perf ui progress: Fix progress update We currently update the 'next' variable only with a single step value. But it's possible the 'adv' update is bigger than single 'step' value. This would leave 'next' value under counted and force unnecessary ui_progress__ops->update calls. Calculate the amount of steps we need for 'adv' update and increase the 'next' with that amounts of steps. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170908120510.22515-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-12 12:34:54 -03:00
Jiri Olsa	4d286c89e4	perf ui progress: Make sure we always define step value Unlikely, but we could have ui_progress__init being called with total < 16, which would set the next and step variables to 0. That would force unnecessary ui_progress__ops->update calls because 'next' would never raise. Forcing the next and step values to be always > 0. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170908120510.22515-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-12 12:34:46 -03:00
Jiri Olsa	cd6379ebb5	perf tools: Open perf.data with O_CLOEXEC flag Do not carry the perf.data file descriptor into the workload process and close it when perf executes the workload. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170908084621.31595-2-jolsa@kernel.org [ Add definitions for O_CLOEXEC for older systems ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-12 12:34:23 -03:00
Milian Wolff	df90cc41d6	perf tests: Fix compile when libunwind's unwind.h is available When cross compiling perf and I want to link against a self-compiled libunwind, I usually make the custom path where the libunwind headers exist visible by adding the libunwind prefix to the include path when compiling perf, i.e.: ~~~~~ $ ls $HOME/projects/compiled/other/include/ libunwind-coredump.h libunwind.h libunwind-x86_64.h libunwind-common.h libunwind-dynamic.h libunwind-ptrace.h unwind.h $ make EXTRA_CFLAGS="-I$HOME/projects/compiled/other/include/ ~~~~~~ Note the `unwind.h` header from libunwind which leads to compile errors when compiling tests/dwarf-unwind.c, since it shadows perf's util/unwind.h: ~~~~~ tests/dwarf-unwind.c:41:32: error: ‘struct unwind_entry’ declared inside parameter list will not be visible outside of this definition or declaration [-Werror] static int unwind_entry(struct unwind_entry entry, void arg) ^~~~~~~~~~~~ tests/dwarf-unwind.c: In function ‘unwind_entry’: tests/dwarf-unwind.c:44:22: error: dereferencing pointer to incomplete type ‘struct unwind_entry’ char *symbol = entry->sym ? entry->sym->name : NULL; ^~ tests/dwarf-unwind.c: In function ‘unwind_thread’: tests/dwarf-unwind.c:92:8: error: implicit declaration of function ‘unwind__get_entries’; did you mean ‘unwind_entry’? [-Werror=implicit-function-declaration] err = unwind__get_entries(unwind_entry, &cnt, thread, ^~~~~~~~~~~~~~~~~~~ unwind_entry tests/dwarf-unwind.c:92:8: error: nested extern declaration of ‘unwind__get_entries’ [-Werror=nested-externs] ~~~~~~ Fix this compile error by specificing an explicit include of perf's unwind.h in the util folder. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20170906150209.12579-1-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-12 12:34:02 -03:00
Linus Torvalds	bafb0762cb	Char/Misc drivers for 4.14-rc1 Here is the big char/misc driver update for 4.14-rc1. Lots of different stuff in here, it's been an active development cycle for some reason. Highlights are: - updated binder driver, this brings binder up to date with what shipped in the Android O release, plus some more changes that happened since then that are in the Android development trees. - coresight updates and fixes - mux driver file renames to be a bit "nicer" - intel_th driver updates - normal set of hyper-v updates and changes - small fpga subsystem and driver updates - lots of const code changes all over the driver trees - extcon driver updates - fmc driver subsystem upadates - w1 subsystem minor reworks and new features and drivers added - spmi driver updates Plus a smattering of other minor driver updates and fixes. All of these have been in linux-next with no reported issues for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWa1+Ew8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+yl26wCgquufNylfhxr65NbJrovduJYzRnUAniCivXg8 bePIh/JI5WxWoHK+wEbY =hYWx -----END PGP SIGNATURE----- Merge tag 'char-misc-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big char/misc driver update for 4.14-rc1. Lots of different stuff in here, it's been an active development cycle for some reason. Highlights are: - updated binder driver, this brings binder up to date with what shipped in the Android O release, plus some more changes that happened since then that are in the Android development trees. - coresight updates and fixes - mux driver file renames to be a bit "nicer" - intel_th driver updates - normal set of hyper-v updates and changes - small fpga subsystem and driver updates - lots of const code changes all over the driver trees - extcon driver updates - fmc driver subsystem upadates - w1 subsystem minor reworks and new features and drivers added - spmi driver updates Plus a smattering of other minor driver updates and fixes. All of these have been in linux-next with no reported issues for a while" * tag 'char-misc-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (244 commits) ANDROID: binder: don't queue async transactions to thread. ANDROID: binder: don't enqueue death notifications to thread todo. ANDROID: binder: Don't BUG_ON(!spin_is_locked()). ANDROID: binder: Add BINDER_GET_NODE_DEBUG_INFO ioctl ANDROID: binder: push new transactions to waiting threads. ANDROID: binder: remove proc waitqueue android: binder: Add page usage in binder stats android: binder: fixup crash introduced by moving buffer hdr drivers: w1: add hwmon temp support for w1_therm drivers: w1: refactor w1_slave_show to make the temp reading functionality separate drivers: w1: add hwmon support structures eeprom: idt_89hpesx: Support both ACPI and OF probing mcb: Fix an error handling path in 'chameleon_parse_cells()' MCB: add support for SC31 to mcb-lpc mux: make device_type const char: virtio: constify attribute_group structures. Documentation/ABI: document the nvmem sysfs files lkdtm: fix spelling mistake: "incremeted" -> "incremented" perf: cs-etm: Fix ETMv4 CONFIGR entry in perf.data file nvmem: include linux/err.h from header ...	2017-09-05 11:08:17 -07:00
Arnaldo Carvalho de Melo	eba9fac017	perf annotate browser: Help for cycling thru hottest instructions with TAB/shift+TAB The popup help accessed via 'h' wasn't mentioning about TAB and shift-TAB, just about 'H', which goes to the hottest line, while the former two are the hotkeys for actually cycling thru the hottest lines. Reported-by: Flavio Bruno Leitner <fbl@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-5ppym6odizfj1ifa4t7neiku@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-01 14:55:40 -03:00
Arnaldo Carvalho de Melo	63ce8449bc	perf stat: Only auto-merge events that are PMU aliases Peter reported that when he explicitely asked for multiple events with the same name on the command line it got coalesced into just one line, i.e.: # perf stat -e cycles -e cycles -e cycles usleep 1 Performance counter stats for 'usleep 1': 3,269,652 cycles 0.000884123 seconds time elapsed # And while there is the --no-merges option to disable that auto-merging, this is a blunt change in behaviour for such explicit request, so change the code so that this auto merging is done only when handling the multi PMU aliases with the same name that introduced this coalescing, restoring the previous behaviour for the explicit case: # perf stat -e cycles -e cycles -e cycles usleep 1 Performance counter stats for 'usleep 1': 1,472,837 cycles 1,472,837 cycles 1,472,837 cycles 0.001764870 seconds time elapsed # Reported-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: `430daf2dc7` ("perf stat: Collapse identically named events") Link: http://lkml.kernel.org/r/20170831184122.GK4831@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-01 14:48:59 -03:00
Kan Liang	fc33dccba3	perf test: Add test case for PERF_SAMPLE_PHYS_ADDR Extend sample-parsing test cases to support new sample type PERF_SAMPLE_PHYS_ADDR. Signed-off-by: Kan Liang <kan.liang@intel.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1504026672-7304-6-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-01 14:46:34 -03:00
Kan Liang	49d58f04eb	perf script: Support physical address Display the physical address at the tail if it is available. Signed-off-by: Kan Liang <kan.liang@intel.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1504026672-7304-5-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-01 14:46:29 -03:00
Kan Liang	c35aeb9dfe	perf mem: Support physical address Add option phys-data in "perf mem" to record/report physical address. The default mem sort order for physical address is changed accordingly. Signed-off-by: Kan Liang <kan.liang@intel.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1504026672-7304-4-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-01 14:46:23 -03:00
Kan Liang	8780fb25ab	perf sort: Add sort option for physical address Add a new sort option "phys_daddr" for --mem-mode sort. With this option applied, perf can sort and report by sample's physical address. Signed-off-by: Kan Liang <kan.liang@intel.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1504026672-7304-3-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-01 14:46:11 -03:00
Kan Liang	3b0a5daa06	perf tools: Support new sample type for physical address Support new sample type PERF_SAMPLE_PHYS_ADDR for physical address. Add new option --phys-data to record sample physical address. Signed-off-by: Kan Liang <kan.liang@intel.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1504026672-7304-2-git-send-email-kan.liang@intel.com [ Added missing printing in evsel.c patch sent by Jiri Olsa ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-01 14:46:00 -03:00
Sukadev Bhattiprolu	2a118e1bd2	perf vendor events powerpc: Remove duplicate events Some POWER PMU event names have multiple/alternate event codes. These alternate event codes were listed in the POWER9 JSON files for reference. But the perf tool does not seem to handle duplicates cleanly. 'perf list' shows such duplicate events only once, but 'perf stat' ends up counting the first event code twice, multiplexing if necessary and we end up with double the event counts. Remove the duplicate event codes from the JSON files for now. Reported-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Anton Blanchard <anton@au1.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Link: http://lkml.kernel.org/r/20170830231506.GB20351@us.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-01 14:46:00 -03:00
Jack Henschel	4fb2053920	perf intel-pt: Fix syntax in documentation of config option As specified in tools/perf/Documentation/perf-config.txt, perf configuration items must be in 'key = value' format, otherwise the following error message occurs: $ perf record -e intel_pt//u -- ls bad config file line 2 in ~/.perfconfig $ cat .perfconfig [intel-pt] mispred-all Changing to assigning a value to the key 'mispred-all' fixes the issue: $ perf record -e intel_pt//u -- ls [ perf record: Woken up 1 times to write data ] [ perf record: Capured and wrote 0.031 MB perf.data] $ cat .perfconfig [intel-pt] mispred-all = true Signed-off-by: Jack Henschel <jackdev@mailbox.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170831080535.2157-1-jackdev@mailbox.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-01 14:45:59 -03:00
Ravi Bangoria	9a805d8648	perf test powerpc: Fix 'Object code reading' test 'Object code reading' test always fails on powerpc guest. Two reasons for the failure are: 1. When elf section is too big (size beyond 'unsigned int' max value). objdump fails to disassemble from such section. This was fixed with commit 0f6329bd7fc ("binutils/objdump: Fix disassemble for huge elf sections") in binutils. 2. When the sample is from hypervisor. Hypervisor symbols can not be resolved within guest and thus thread__find_addr_map() fails for such symbols. Fix this by ignoring hypervisor symbols in the test. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/1504170896-7876-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-01 14:45:59 -03:00
Arnaldo Carvalho de Melo	27702bcfe8	perf trace: Support syscall name globbing So now we can use: # perf trace -e pkey_* 532.784 ( 0.006 ms): pkey/16018 pkey_alloc(init_val: DISABLE_WRITE) = -1 EINVAL Invalid argument 532.795 ( 0.004 ms): pkey/16018 pkey_mprotect(start: 0x7f380d0a6000, len: 4096, prot: READ\|WRITE, pkey: -1) = 0 532.801 ( 0.002 ms): pkey/16018 pkey_free(pkey: -1 ) = -1 EINVAL Invalid argument ^C[root@jouet ~]# Or '-e epoll', '-e msg', etc. Combining syscall names with perf events, tracepoints, etc, continues to be valid, i.e. this is possible: # perf probe -L sys_nanosleep <SyS_nanosleep@/home/acme/git/linux/kernel/time/hrtimer.c:0> 0 SYSCALL_DEFINE2(nanosleep, struct timespec __user , rqtp, struct timespec __user , rmtp) { struct timespec64 tu; 5 if (get_timespec64(&tu, rqtp)) 6 return -EFAULT; if (!timespec64_valid(&tu)) 9 return -EINVAL; 11 current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE; 12 current->restart_block.nanosleep.rmtp = rmtp; 13 return hrtimer_nanosleep(&tu, HRTIMER_MODE_REL, CLOCK_MONOTONIC); } # perf probe my_probe="sys_nanosleep:12 rmtp" Added new event: probe:my_probe (on sys_nanosleep:12 with rmtp) You can now use it in all perf tools, such as: perf record -e probe:my_probe -aR sleep 1 # # perf trace -e probe:my_probe/max-stack=5/,sleep sleep 1 0.427 ( 0.003 ms): sleep/16690 nanosleep(rqtp: 0x7ffefc245090) ... 0.430 ( ): probe:my_probe:(ffffffffbd112923) rmtp=0) sys_nanosleep ([kernel.kallsyms]) do_syscall_64 ([kernel.kallsyms]) return_from_SYSCALL_64 ([kernel.kallsyms]) __nanosleep_nocancel (/usr/lib64/libc-2.25.so) 0.427 (1000.208 ms): sleep/16690 ... [continued]: nanosleep()) = 0 # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-elycoi8wy6y0w9dkj7ox1mzz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-01 14:45:58 -03:00
Arnaldo Carvalho de Melo	89be3f8ab7	perf syscalltbl: Support glob matching on syscall names With two new methods, one to find the first match, returning its syscall id and its index in whatever internal database it keeps the syscall into, then one to find the next match, if any. Implemented only on arches where we actually read the syscall table from the kernel sources, i.e. x86-64 for now, all the others use the libaudit method for which this returns -1, i.e. just stubs were added, with the actual implementation using whatever libaudit functions for matching that may be available. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-i0sj4rxk1a63pfe9gl8z8irs@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-09-01 14:45:48 -03:00
Jin Yao	c4ee06251d	perf report: Calculate the average cycles of iterations The branch history code has a loop detection function. With this, we can get the number of iterations by calculating the removed loops. While it would be nice for knowing the average cycles of iterations. This patch adds up the cycles in branch entries of removed loops and save the result to the next branch entry (e.g. branch entry A). Finally it will display the iteration number and average cycles at the "from" of branch entry A. For example: perf record -g -j any,save_type ./div perf report --branch-history --no-children --stdio --22.63%--main div.c:42 (RET CROSS_2M) compute_flag div.c:28 (cycles:2 iter:173115 avg_cycles:2) \| --10.73%--compute_flag div.c:27 (RET CROSS_2M) rand rand.c:28 (cycles:1) rand rand.c:28 (RET CROSS_2M) __random random.c:298 (cycles:1) __random random.c:297 (COND_BWD CROSS_2M) __random random.c:295 (cycles:1) __random random.c:295 (COND_BWD CROSS_2M) __random random.c:295 (cycles:1) __random random.c:295 (RET CROSS_2M) Signed-off-by: Yao Jin <yao.jin@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1502111115-18305-1-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-30 10:03:27 -03:00
Li Bin	b2f7605076	perf symbols: Fix plt entry calculation for ARM and AARCH64 On x86, the plt header size is as same as the plt entry size, and can be identified from shdr's sh_entsize of the plt. But we can't assume that the sh_entsize of the plt shdr is always the plt entry size in all architecture, and the plt header size may be not as same as the plt entry size in some architecure. On ARM, the plt header size is 20 bytes and the plt entry size is 12 bytes (don't consider the FOUR_WORD_PLT case) that refer to the binutils implementation. The plt section is as follows: Disassembly of section .plt: 000004a0 <__cxa_finalize@plt-0x14>: 4a0: e52de004 push {lr} ; (str lr, [sp, #-4]!) 4a4: e59fe004 ldr lr, [pc, #4] ; 4b0 <_init+0x1c> 4a8: e08fe00e add lr, pc, lr 4ac: e5bef008 ldr pc, [lr, #8]! 4b0: 00008424 .word 0x00008424 000004b4 <__cxa_finalize@plt>: 4b4: e28fc600 add ip, pc, #0, 12 4b8: e28cca08 add ip, ip, #8, 20 ; 0x8000 4bc: e5bcf424 ldr pc, [ip, #1060]! ; 0x424 000004c0 <printf@plt>: 4c0: e28fc600 add ip, pc, #0, 12 4c4: e28cca08 add ip, ip, #8, 20 ; 0x8000 4c8: e5bcf41c ldr pc, [ip, #1052]! ; 0x41c On AARCH64, the plt header size is 32 bytes and the plt entry size is 16 bytes. The plt section is as follows: Disassembly of section .plt: 0000000000000560 <__cxa_finalize@plt-0x20>: 560: a9bf7bf0 stp x16, x30, [sp,#-16]! 564: 90000090 adrp x16, 10000 <__FRAME_END__+0xf8a8> 568: f944be11 ldr x17, [x16,#2424] 56c: 9125e210 add x16, x16, #0x978 570: d61f0220 br x17 574: d503201f nop 578: d503201f nop 57c: d503201f nop 0000000000000580 <__cxa_finalize@plt>: 580: 90000090 adrp x16, 10000 <__FRAME_END__+0xf8a8> 584: f944c211 ldr x17, [x16,#2432] 588: 91260210 add x16, x16, #0x980 58c: d61f0220 br x17 0000000000000590 <__gmon_start__@plt>: 590: 90000090 adrp x16, 10000 <__FRAME_END__+0xf8a8> 594: f944c611 ldr x17, [x16,#2440] 598: 91262210 add x16, x16, #0x988 59c: d61f0220 br x17 NOTES: In addition to ARM and AARCH64, other architectures, such as s390/alpha/mips/parisc/poperpc/sh/sparc/xtensa also need to consider this issue. Signed-off-by: Li Bin <huawei.libin@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexis Berlemont <alexis.berlemont@gmail.com> Cc: David Tolnay <dtolnay@gmail.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: zhangmengting@huawei.com Link: http://lkml.kernel.org/r/1496622849-21877-1-git-send-email-huawei.libin@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-29 11:41:27 -03:00
Li Bin	2c29461e27	perf probe: Fix kprobe blacklist checking condition The commit `9aaf5a5f47` ("perf probe: Check kprobes blacklist when adding new events"), 'perf probe' supports checking the blacklist of the fuctions which can not be probed. But the checking condition is wrong, that the end_addr of the symbol which is the start_addr of the next symbol can't be included. Committer notes: IOW make it match its kernel counterpart in kernel/kprobes.c: bool within_kprobe_blacklist(unsigned long addr) Each entry have as its end address not its end address, but the first address _outside_ that symbol, which for related functions, is the first address of the next symbol, like these from kernel/trace/trace_probe.c: 0xffffffffbd198df0-0xffffffffbd198e40 print_type_u8 0xffffffffbd198e40-0xffffffffbd198e90 print_type_u16 0xffffffffbd198e90-0xffffffffbd198ee0 print_type_u32 0xffffffffbd198ee0-0xffffffffbd198f30 print_type_u64 0xffffffffbd198f30-0xffffffffbd198f80 print_type_s8 0xffffffffbd198f80-0xffffffffbd198fd0 print_type_s16 0xffffffffbd198fd0-0xffffffffbd199020 print_type_s32 0xffffffffbd199020-0xffffffffbd199070 print_type_s64 0xffffffffbd199070-0xffffffffbd1990c0 print_type_x8 0xffffffffbd1990c0-0xffffffffbd199110 print_type_x16 0xffffffffbd199110-0xffffffffbd199160 print_type_x32 0xffffffffbd199160-0xffffffffbd1991b0 print_type_x64 But not always: 0xffffffffbd1997b0-0xffffffffbd1997c0 fetch_kernel_stack_address (kernel/trace/trace_probe.c) 0xffffffffbd1c57f0-0xffffffffbd1c58b0 __context_tracking_enter (kernel/context_tracking.c) Signed-off-by: Li Bin <huawei.libin@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: zhangmengting@huawei.com Fixes: `9aaf5a5f47` ("perf probe: Check kprobes blacklist when adding new events") Link: http://lkml.kernel.org/r/1504011443-7269-1-git-send-email-huawei.libin@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-29 11:14:12 -03:00
Arnaldo Carvalho de Melo	83bc9c371e	perf trace beauty: Beautify pkey_{alloc,free,mprotect} arguments Reuse 'mprotect' beautifiers for 'pkey_mprotect'. System wide tracing pkey_alloc, pkey_free and pkey_mprotect calls, with backtraces: # perf trace -e pkey_alloc,pkey_mprotect,pkey_free --max-stack=5 0.000 ( 0.011 ms): pkey/7818 pkey_alloc(init_val: DISABLE_ACCESS\|DISABLE_WRITE) = -1 EINVAL Invalid argument syscall (/usr/lib64/libc-2.25.so) pkey_alloc (/home/acme/c/pkey) 0.022 ( 0.003 ms): pkey/7818 pkey_mprotect(start: 0x7f28c3890000, len: 4096, prot: READ\|WRITE, pkey: -1) = 0 syscall (/usr/lib64/libc-2.25.so) pkey_mprotect (/home/acme/c/pkey) 0.030 ( 0.002 ms): pkey/7818 pkey_free(pkey: -1 ) = -1 EINVAL Invalid argument syscall (/usr/lib64/libc-2.25.so) pkey_free (/home/acme/c/pkey) The tools/include/uapi/asm-generic/mman-common.h file is used to find the access rights defines for the pkey_alloc syscall second argument. Since we have the detector of changes for the tools/include header files versus its kernel origin (include/uapi/asm-generic/mman-common.h), we'll get whatever new flag appears for that argument automatically. This method should be used in other cases where it is easy to generate those flags tables because the header has properly namespaced defines like PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-3xq5312qlks7wtfzv2sk3nct@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-28 16:44:47 -03:00
David Carrillo-Cisneros	70ff7c6caa	perf tools: Pass full path of FEATURES_DUMP When building with an external FEATURES_DUMP, bpf complains that features dump file is not found. Fix it by passing full file path. Signed-off-by: David Carrillo-Cisneros <davidcc@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Paul Turner <pjt@google.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20170827075442.108534-7-davidcc@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-28 16:44:46 -03:00
David Carrillo-Cisneros	3866058ef1	perf tools: Robustify detection of clang binary Prior to this patch, make scripts tested for CLANG with ifeq ($(CC), clang), failing to detect CLANG binaries with different names. Fix it by testing for the existence of __clang__ macro in the list of compiler defined macros. Signed-off-by: David Carrillo-Cisneros <davidcc@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Paul Turner <pjt@google.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20170827075442.108534-5-davidcc@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-28 16:44:46 -03:00
David Carrillo-Cisneros	39a59f1e3e	perf tools: Allow external definition of flex and bison binary names Allow user to define flex and bison binary names by passing FLEX and BISON variables. Signed-off-by: David Carrillo-Cisneros <davidcc@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Paul Turner <pjt@google.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20170827075442.108534-3-davidcc@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-28 16:44:45 -03:00
Jiri Olsa	9933183e36	perf report: Group stat values on global event id There's no big value on displaying counts for every event ID, which is one per every CPU. Rather than that, displaying the whole sum for the event. $ perf record -c 100000 -e cycles:u -s test $ perf report -T Before: # PID TID cycles:u cycles:u cycles:u cycles:u ... [20 more columns of 'cycles:u'] 3339 3339 0 0 0 0 3340 3340 0 0 0 0 3341 3341 0 0 0 0 3342 3342 0 0 0 0 Now: # PID TID cycles:u 3339 3339 19678 3340 3340 18744 3341 3341 17335 3342 3342 26414 Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170824162737.7813-10-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-28 16:44:44 -03:00
Jiri Olsa	a1834fc938	perf values: Zero value buffers We need to make sure the array of value pointers are zero initialized, because we use them in realloc later on and uninitialized non zero value will cause allocation error and aborted execution. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170824162737.7813-9-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-28 16:44:43 -03:00
Jiri Olsa	f4ef3b7c18	perf values: Fix allocation check Bailing out in case the allocation failed, not the other way round. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170824162737.7813-8-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-28 16:44:43 -03:00
Jiri Olsa	64eed1deb6	perf values: Fix thread index bug We are taking wrong index (+1) for first thread, which leaves thread with index 0 unused and uninitialized. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170824162737.7813-7-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-28 16:44:42 -03:00
Jiri Olsa	dac7f6b7ed	perf report: Add dump_read function Adding dump_read function to gather all the dump output of read function. Adding output of enabled and running times and id if enabled (3 new lines with '...' prefix below). $ perf record -s ... $ perf report -D 958358311769 0x91f8 [0x40]: PERF_RECORD_READ: 3339 3339 cycles:u 0 ... time enabled : 958358313731 ... time running : 958358313731 ... id : 80 Committer note: Do not use 'read' as a variable name as it breaks the build on older systems, such as RHEL6: CC /tmp/build/perf/util/session.o cc1: warnings being treated as errors util/session.c: In function 'dump_read': util/session.c:1132: error: declaration of 'read' shadows a global declaration /usr/include/bits/unistd.h:35: error: shadowed declaration is here mv: cannot stat `/tmp/build/perf/util/.session.o.tmp': No such file or directory Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170824162737.7813-6-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-28 16:43:50 -03:00
Mike Leach	df770ff058	perf: cs-etm: Fix ETMv4 CONFIGR entry in perf.data file The value passed into the perf.data file for the CONFIGR register in ETMv4 was incorrectly being set to the command line options/ETMv3 value. Adds bit definitions and function to remap this value to the correct ETMv4 CONFIGR bit values for all selected options. Signed-off-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-28 17:35:43 +02:00
Jiri Olsa	a17f069787	perf record: Set read_format for inherit_stat Set read_format for what we expect to get from read event generated by perf_event_attr::inherit_stat. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170824162737.7813-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-28 11:05:10 -03:00
Jiri Olsa	12c15302dd	perf c2c: Fix remote HITM detection for Skylake Skylake introduced new mem_remote bit in union perf_mem_data_src [1]. It applies to any other memory level to express Remote unknown level, as is reported by Skylake. Adding this extra check to c2c_decode_stats to properly decode remote HITMs on Skylake. [1] http://lkml.kernel.org/r/20170816222156.19953-4-andi@firstfloor.org Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20170824085732.28481-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-28 11:05:10 -03:00
Jiri Olsa	6bd76b8fab	perf tools: Fix static build with newer toolchains We can't pass --dynamic-list list into static build anymore, because compilers starts to scream about that. Fedora 26 started to fail build with following error: $ make LDFLAGS=-static ... /usr/bin/ld: dynamic STT_GNU_IFUNC symbol `strcmp' with pointer equality in `/usr/lib/gcc/x86_64-redhat-linux/7/../../../../lib64/libc.a(strcmp.o +)' can not be used when making an executable; recompile with -fPIE and relink with -pie There's no sense for --dynamic-list in static build, because there's no .dynsym table in static binary. Consequently the traceevent plugins have never worked with static build, but it was quietly passed by. To fix this in future I think we should add support to compile plugins within the perf binary directly for static build. Reported-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/n/tip-jeg6a7ff9j9hlqn8k4gllzvv@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-28 11:05:09 -03:00
Jack Henschel	726647d052	perf stat: Fix path to PMU formats in documentation As defined in tools/perf/util/pmu.c, the EVENT_SOURCE_DEVICE_PATH is /sys/bus/event_source/devices/ (no traling 's' in event_source) This patch corrects the path in the perf stat documentation Signed-off-by: Jack Henschel <jackdev@mailbox.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jack Henschel <jackdev@mailbox.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: trivial@kernel.org Link: http://lkml.kernel.org/r/20170824132022.10934-1-jackdev@mailbox.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-28 11:05:09 -03:00
Konstantin Khlebnikov	60913e005c	perf tools: Fix static linking with libunwind * libunwind-x86_64 must be linked before libunwind * libunwind requires liblzma * static libunwind conflicts with static libgcc_eh Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/150322917247.129799.14247751517961953155.stgit@buzz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-22 13:24:55 -03:00
Konstantin Khlebnikov	ba335df4ea	perf tools: Fix static linking with libdw from elfutils Fix feature test for static libdw: link required dependencies. Backends of libebl are not statically linked thus libdl is required. In Debian/Ubuntu libdw-dev includes libebl.a starting from 0.166-1. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/150322916720.129772.7959925864494283854.stgit@buzz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-22 13:24:54 -03:00
Konstantin Khlebnikov	ac0bb6b72f	perf: Fix documentation for sysctls perf_event_paranoid and perf_event_mlock_kb Fix misprint CAP_IOC_LOCK -> CAP_IPC_LOCK. This capability have nothing to do with raw tracepoints. This part is about bypassing mlock limits. Sysctl kernel.perf_event_paranoid = -1 allows raw and ftrace function tracepoints without CAP_SYS_ADMIN. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/150322916080.129746.11285255474738558340.stgit@buzz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-22 13:24:54 -03:00
Konstantin Khlebnikov	2826478a66	perf tools: Really install manpages via 'make install-man' Target install-man builds them but forget to install. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: `af3df2cf17` ("perf tools: Try to build Documentation when installing") Link: http://lkml.kernel.org/r/150322915300.129715.13645857235229756834.stgit@buzz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-22 13:24:53 -03:00
Andi Kleen	3067eaa7ce	perf test: Add test cases for new data source encoding Add some simple tests to perf test to test data source printing. v2: Make the tests actually checked for the correct name of Forward v3: Adjust to new encoding Committer notes: Avoid the in place declaration to make this build with older compilers, for instance, in Debian 7 we get: tests/mem.c: In function 'test__mem': tests/mem.c:30:5: error: missing initializer [-Werror=missing-field-initializers] tests/mem.c:30:5: error: (near initialization for '(anonymous).<anonymous>.mem_snoop') [-Werror=missing-field-initializers] So just zero a struct, then go on building the unions as needed, reusing settings from the previous test, i.e. local -> remote, etc. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170816222156.19953-5-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-22 13:23:10 -03:00
Andi Kleen	52839e653b	perf tools: Add support for printing new mem_info encodings Add decoding for the new "lvlx" and "snoopx" meminfo fields added earlier to the kernel so that "perf mem report" and other tools can print it properly. v2: Merge with persistent memory patch. Switch to new bit encoding for each combination. v3: Switch to generic lvlnum field. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170816222156.19953-4-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-22 12:30:25 -03:00
Andi Kleen	41d3d6db17	perf vendor events: Add Skylake server uncore event list Add JSON uncore events for Skylake Server to perf. Based on JSON list V1.01 This is a much fuller list than with earlier uncores, including more low level (but also harder to understand) events. It does not include the "experimential" events. The previous high level metric (LLC_* etc.) are still available when applicable. C state power events are not included at this point. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20170816220553.GA19463@tassilo.jf.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-22 12:25:11 -03:00
Andi Kleen	630171d415	perf vendor events: Add core event list for Skylake Server Based on JSON list version v1.01 Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/3269ae458a883139110ec82bc895423bd8843d65 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-22 12:23:52 -03:00
Andi Kleen	d66dccdb13	perf tools: Dedup events in expression parsing Avoid adding redundant events while parsing an expression. When we add an "other" event check first if it already exists. v2: Fix perf test failure. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170811232634.30465-10-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-22 12:19:08 -03:00
Andi Kleen	8d3db2b97f	perf tools: Increase maximum number of events in expressions Some of the upcoming metrics need more than 8 events. Increase the maximum number the parser supports. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170811232634.30465-9-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-22 12:19:05 -03:00
Andi Kleen	d73bad0685	perf tools: Expression parser enhancements for metrics Enhance the expression parser for more complex metric formulas. - Support python style IF ELSE operators - Add an #SMT_On magic variable for formulas that depend on the SMT status. Example: 4 ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else cycles - Support MIN/MAX operations Example: min(1 , IDQ.MITE_UOPS / ( UPI 16 * ( ICACHE.HIT + ICACHE.MISSES ) / 4.0 ) ) This is useful to fix up problems caused by multiplexing. - Support \| & ^ operators - Minor cleanups and fixes - Support an \ escape for operators. This allows to specify event names like c2-residency - Support @ as an alternative for / to be able to specify pmus without conflicts with operators (like msr/tsc/ as msr@tsc@) Example: (cstate_core@c3\\-residency@ / msr@tsc@) * 100 Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170811232634.30465-8-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-22 12:15:53 -03:00
Andi Kleen	de5077c4e3	perf tools: Add utility function to detect SMT status Add an smt_on() function to return if SMT is enabled or disabled. Used in the next patch. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170811232634.30465-7-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-22 12:09:04 -03:00
Andi Kleen	77d0871c76	perf bpf: Tighten detection of BPF events perf stat -e cpu/uops_executed.core,cmask=1/ would be detected as a BPF source event because the .c matches the .c source BPF pattern. v2: Originally I tried to use lex lookahead, but it doesn't seem to work. This now extends the BPF pattern to match longer events, but then does an extra check in the C code to reject BPF matches that do not end with .c/.o/.obj This uses REJECT, which makes the flex scanner slower, but that shouldn't be a big problem for the perf events. Committer testing: # perf trace -e write -e /home/acme/bpf/tracepoint.c cat /etc/passwd > /dev/null 0.000 ( 0.006 ms): cat/18485 write(fd: 1, buf: 0x7f59eebe1000, count: 3494 ) ... 0.006 ( ): raw_syscalls:sys_enter:NR 1 (1, 7f59eebe1000, da6, 22, 7f59eebe0010, 0)) 0.008 ( ): perf_bpf_probe:_write:(ffffffff9626b2c0)) 0.000 ( 0.010 ms): cat/18485 ... [continued]: write()) = 3494 # It continues doing what was expected, i.e. identifying /home/acme/bpf/tracepoint.c as a BPF event and activates the clang machinery to build an eBPF object and then uses sys_bpf() to hook it up to the raw_syscalls:sys_enter tracepoint, etc. Andi forgot to add Wang to the CC list, fix it. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20170811232634.30465-4-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-22 11:56:22 -03:00
Andi Kleen	475fb533fb	perf evsel: Fix buffer overflow while freeing events Fix buffer overflow for: % perf stat -e msr/tsc/,cstate_core/c7-residency/ true that causes glibc free list corruption. For some reason it doesn't trigger in valgrind, but it is visible in AS: ================================================================= ==32681==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000003f5c at pc 0x0000005671ef bp 0x7ffdaaac9ac0 sp 0x7ffdaaac9ab0 READ of size 4 at 0x603000003f5c thread T0 #0 0x5671ee in perf_evsel__close_fd util/evsel.c:1196 #1 0x56c57a in perf_evsel__close util/evsel.c:1717 #2 0x55ed5f in perf_evlist__close util/evlist.c:1631 #3 0x4647e1 in __run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:749 #4 0x4648e3 in run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:767 #5 0x46e1bc in cmd_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:2785 #6 0x52f83d in run_builtin /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:296 #7 0x52fd49 in handle_internal_command /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:348 #8 0x5300de in run_argv /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:392 #9 0x5308f3 in main /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:530 #10 0x7f0672d13400 in __libc_start_main (/lib64/libc.so.6+0x20400) #11 0x428419 in _start (/home/ak/hle/obj-perf/perf+0x428419) 0x603000003f5c is located 0 bytes to the right of 28-byte region [0x603000003f40,0x603000003f5c) allocated by thread T0 here: #0 0x7f0675139020 in calloc (/lib64/libasan.so.3+0xc7020) #1 0x648a2d in zalloc util/util.h:23 #2 0x648a88 in xyarray__new util/xyarray.c:9 #3 0x566419 in perf_evsel__alloc_fd util/evsel.c:1039 #4 0x56b427 in perf_evsel__open util/evsel.c:1529 #5 0x56c620 in perf_evsel__open_per_thread util/evsel.c:1730 #6 0x461dea in create_perf_stat_counter /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:263 #7 0x4637d7 in __run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:600 #8 0x4648e3 in run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:767 #9 0x46e1bc in cmd_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:2785 #10 0x52f83d in run_builtin /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:296 #11 0x52fd49 in handle_internal_command /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:348 #12 0x5300de in run_argv /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:392 #13 0x5308f3 in main /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:530 #14 0x7f0672d13400 in __libc_start_main (/lib64/libc.so.6+0x20400) The event is allocated with cpus == 1, but freed with cpus == real number When the evsel close function walks the file descriptors it exceeds the fd xyarray boundaries and reads random memory. v2: Now that xyarrays save their original dimensions we can use these to iterate the two dimensional fd arrays. Fix some users (close, ioctl) in evsel.c to use these fields directly. This allows simplifying the code and dropping quite a few function arguments. Adjust all callers by removing the unneeded arguments. The actual perf event reading still uses the original values from the evsel list. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170811232634.30465-2-andi@firstfloor.org [ Fix up xy_max_[xy]() -> xyarray__max_[xy]() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-22 11:51:31 -03:00
Andi Kleen	d74be47673	perf xyarray: Save max_x, max_y Save the original array dimensions in xyarrays, so that users can retrieve them later. Add some inline functions to access these fields. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170811232634.30465-1-andi@firstfloor.org [ As noticed by Jiri, fix up namespacing: xy__method() -> xyarray__method() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-22 11:51:28 -03:00
Taeung Song	3a555c7799	perf annotate browser: Circulate percent, total-period and nr-samples view Using the existing 't' hotkey, support the three views: percent, total period and number of samples on the annotate TUI browser, circulating them like below: Percent -> Total Period -> Nr Samples -> Percent ... Committer notes: Removed new 'e' hotkey, should be resubmitted as a separate patch, with proper justification for its inclusion. Suggested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Milian Wolff <milian.wolff@kdab.com> Link: http://lkml.kernel.org/r/1503046028-5691-1-git-send-email-treeze.taeung@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-18 11:23:20 -03:00
Taeung Song	9cef4b0b5b	perf annotate browser: Support --show-nr-samples option Support the --show-nr-samples in the TUI browser. Committer notes: Lift the restriction about --tui but leave it for --gtk: $ export LD_LIBRARY_PATH=~/lib64 $ perf annotate --gtk --show-nr-samples --show-nr-samples is not available in --gtk mode at this time $ Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1503046023-5646-1-git-send-email-treeze.taeung@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-18 11:15:09 -03:00
Taeung Song	01c85629f5	perf annotate: Document --show-total-period option When the --show-total-period option was introduced we forgot to add an entry in the man page, fix it. Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Martin Liška <mliska@suse.cz> Fixes: `0c4a5bcea4` ("perf annotate: Display total number of samples with --show-total-period") Link: http://lkml.kernel.org/r/1503046013-5555-1-git-send-email-treeze.taeung@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-18 10:34:08 -03:00
Taeung Song	1ac39372e0	perf annotate stdio: Support --show-nr-samples option Add --show-nr-samples option to "perf annotate" so that it matches "perf report". Committer note: Note that it can't be used together with --show-total-period, which seems like a silly limitation, that can be lifted at some point. Made it bail out if not on --stdio. Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1503046008-5511-1-git-send-email-treeze.taeung@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-18 10:31:53 -03:00
Arnaldo Carvalho de Melo	9a57eaf1d2	perf tools: Use default CPUINFO_PROC where it fits Several architectures don't need to define it since the string is the same as the default one, so nuke them. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-v1e1jr1u474w9xcelpaoxamu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-17 16:58:21 -03:00
Arnaldo Carvalho de Melo	4717e03cc7	perf tools: Remove unused cpu_relax() macros Since `1955643902` ("perf_counter: kerneltop: simplify data_head read") we do not use it, and this was way back in 2009, remove it before some other arch maintainer adds its implementation, like so many did, needlessly :-) Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-3l2su9c58eaq4twjzrf9uu08@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-17 16:51:25 -03:00
Arnaldo Carvalho de Melo	5d9cdc1181	perf events parse: Rename parse_events_parse arguments Calling them just "data" is too vague, call it 'perf_state', to make it clearer, for instance, when looking at patch hunks. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-rnhk5yb05wem77rjpclrh7so@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-17 16:39:15 -03:00
Arnaldo Carvalho de Melo	d17d0878f4	perf events parse: Use just one parse events state struct Andi reported problems when parse errors were detected with vendor events (json), because in the yyparse/parse_events_parse function we dereferenced the _data parameter to two different structs, with different layouts, which ended up making parse_events_evlist->error to point to random stack addresses. Fix it by making _data to always be struct parse_events_state, changing the only place where 'struct parse_events_term' was used in parse_events.y. Reported-by: Andi Kleen <ak@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-bc27lshz823hxl8n9nkelcgh@git.kernel.org Fixes: `90e2b22dee` ("perf/tool: Add support to reuse event grammar to parse out terms") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-17 16:39:15 -03:00
Arnaldo Carvalho de Melo	5d369a75ed	perf events parse: Rename parsing state struct to clearer name Rename it from 'parse_events_evlist' to 'parse_events_state' to better state that this is parsing state that has to be passed around. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-dursqtg2h2w98ztaa297u43x@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-17 16:39:15 -03:00
Arnaldo Carvalho de Melo	07806a1df1	perf events parse: Remove some needless local variables Those are just casting a void pointer to a struct to then pass them to functions, i.e. remove the local variables and pass the void pointer directly, the casting will be done and the code will be shorter. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-bzfodzr3mb46gy7u7v0mqad6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-17 16:39:15 -03:00
Arnaldo Carvalho de Melo	d6d4fc6fef	perf trace: Fix off by one string allocation problem We need to consider the null terminator, oops, fix it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: `017037ff3d` ("perf trace: Allow specifying list of syscalls and events in -e/--expr/--event") Link: http://lkml.kernel.org/n/tip-j79jpqqe91gvxqmsgxgfn2ni@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-17 16:39:14 -03:00
Andi Kleen	c73881eeb1	perf jevents: Support FCMask and PortMask Skylake server uncore IIO events need new FCMask/PortMask fields. Support those in the json parser and pass it through as a filter. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170816220201.19182-2-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-17 16:39:14 -03:00
Kim Phillips	35435cd060	perf test shell: Replace '\|&' with '2>&1 \|' to work with more shells Since we do not specify bash (and/or zsh) as a requirement, use the standard error redirection that is more widely supported. Signed-off-by: Kim Phillips <kim.phillips@arm.com> Link: http://lkml.kernel.org/n/tip-ji5mhn3iilgch3eaay6csr6z@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-16 16:23:26 -03:00
Wang Nan	db26984a36	perf bpf: Fix endianness problem when loading parameters in prologue Perf's BPF prologue generator unconditionally fetches 8 bytes for function parameters, which causes problems on big endian machines. Thomas gives a detailed analysis for this problem: http://lkml.kernel.org/r/968ebda5-abe4-8830-8d69-49f62529d151@linux.vnet.ibm.com ---- 8< ---- I investigated perf test BPF for s390x and have a question regarding the 38.3 subtest (bpf-prologue test) which fails on s390x. When I turn on trace_printk in tests/bpf-script-test-prologue.c I see this output in /sys/kernel/debug/tracing/trace: [root@s8360047 perf]# cat /sys/kernel/debug/tracing/trace perf-30229 [000] d..2 170161.535791: : f_mode 2001d00000000 offset:0 orig:0 perf-30229 [000] d..2 170161.535809: : f_mode 6001f00000000 offset:0 orig:0 perf-30229 [000] d..2 170161.535815: : f_mode 6001f00000000 offset:1 orig:0 perf-30229 [000] d..2 170161.535819: : f_mode 2001d00000000 offset:1 orig:0 perf-30229 [000] d..2 170161.535822: : f_mode 2001d00000000 offset:2 orig:1 perf-30229 [000] d..2 170161.535825: : f_mode 6001f00000000 offset:2 orig:1 perf-30229 [000] d..2 170161.535828: : f_mode 6001f00000000 offset:3 orig:1 perf-30229 [000] d..2 170161.535832: : f_mode 2001d00000000 offset:3 orig:1 perf-30229 [000] d..2 170161.535835: : f_mode 2001d00000000 offset:4 orig:0 perf-30229 [000] d..2 170161.535841: : f_mode 6001f00000000 offset:4 orig:0 [...] There are 3 parameters the eBPF program tests/bpf-script-test-prologue.c accesses: f_mode (member of struct file at offset 140) offset and orig. They are parameters of the lseek() system call triggered in this test case in function llseek_loop(). What is really strange is the value of f_mode. It is an 8 byte value, whereas in the probe event it is defined as a 4 byte value. The lower 4 bytes are all zero and do not belong to member f_mode. The correct value should be 2001d for read-only and 6001f for read-write open mode. Here is the output of the 'perf test -vv bpf' trace: Try to find probe point from debuginfo. Matched function: null_lseek [2d9310d] Probe point found: null_lseek+0 Searching 'file' variable in context. Converting variable file into trace event. converting f_mode in file f_mode type is unsigned int. Opening /sys/kernel/debug/tracing//README write=0 Searching 'offset' variable in context. Converting variable offset into trace event. offset type is long long int. Searching 'orig' variable in context. Converting variable orig into trace event. orig type is int. Found 1 probe_trace_events. Opening /sys/kernel/debug/tracing//kprobe_events write=1 Writing event: p:perf_bpf_probe/func _text+8794224 f_mode=+140(%r2):x32 ---- 8< ---- This patch parses the type of each argument and converts data from memory to expected type. Now the test runs successfully on 4.13.0-rc5: [root@s8360046 perf]# ./perf test bpf 38: BPF filter : 38.1: Basic BPF filtering : Ok 38.2: BPF pinning : Ok 38.3: BPF prologue generation : Ok 38.4: BPF relocation checker : Ok [root@s8360046 perf]# Signed-off-by: Wang Nan <wangnan0@huawei.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20170815092159.31912-1-tmricht@linux.vnet.ibm.com Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-16 10:31:11 -03:00
Adrian Hunter	1fe03b5f2d	perf script python: Add support for sqlite3 to call-graph-from-sql.py Add support for SQLite 3 to the call-graph-from-sql.py script. The SQL statements work as is, so just detect the database type by checking if the SQLite 3 file exists. Committer notes: Tested collecting the PT data on a RHEL7.4, generating the SQLite3 database there and then moving it to a Fedora 26 system where the call-graph-from-sql.py script was run, using python-pyside version 1.2.2-7fc26 to see the callgraphs using Qt4. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/r/1501749090-20357-6-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-15 17:03:38 -03:00
Adrian Hunter	69e6e410f1	perf script python: Rename call-graph-from-postgresql.py to call-graph-from-sql.py Rename call-graph-from-postgresql.py to call-graph-from-sql.py in preparation for adding support to it for SQLite 3. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Link: http://lkml.kernel.org/r/1501749090-20357-5-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-15 16:38:06 -03:00
Adrian Hunter	564b9527d1	perf script python: Add support for exporting to sqlite3 Add support for exporting to SQLite 3 the same data as the PostgreSQL export. Committer note: Tested on RHEL 7.4 using the 1.2.2-4el python-pyside packages from EPEL. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/r/1501749090-20357-4-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-15 16:37:55 -03:00
Adrian Hunter	2295e9f850	perf scripts python: Fix query in call-graph-from-postgresql.py Add a missing space which seemed not to affect PostgreSQL but upsets SQLite. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Link: http://lkml.kernel.org/r/1501749090-20357-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-15 16:06:20 -03:00
Adrian Hunter	c8a827285c	perf scripts python: Fix missing call_path_id in export-to-postgresql script The export does not work if only branches are exported because of a missing column in the samples table. Fix by adding the missing call_path_id. Fixes: `3521f3bc9d` ("perf script: Update export-to-postgresql to support callchain export") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Link: http://lkml.kernel.org/r/1501749090-20357-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-15 16:05:36 -03:00
Arnaldo Carvalho de Melo	2b728861a6	perf test shell vfs_getname: Skip for tools built with NO_LIBDWARF=1 If that is the case, or if the required lib is not present, e.g. elfutils-devel in Fedora systems, then just skip the tests requiring DWARF analysis. Before: # rpm -e elfutils-devel # perf test ping vfs_getname 60: Use vfs_getname probe to get syscall args filenames : FAILED! 61: probe libc's inet_pton & backtrace it with ping : Ok 62: Check open filename arg using perf trace + vfs_getname: FAILED! 63: Add vfs_getname probe to get syscall args filenames : FAILED! # After: # perf test vfs_getname 60: Use vfs_getname probe to get syscall args filenames : Skip 62: Check open filename arg using perf trace + vfs_getname: Skip 63: Add vfs_getname probe to get syscall args filenames : Skip # Then, reinstalling elfutils-devel, rebuilding the tool and running again: # perf test vfs_getname 60: Use vfs_getname probe to get syscall args filenames : Ok 62: Check open filename arg using perf trace + vfs_getname: Ok 63: Add vfs_getname probe to get syscall args filenames : Ok # Reported-by: Kim Phillips <kim.phillips@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-d67tvn401fxrwr97pu5ihfb1@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2017-08-15 10:54:25 -03:00

1 2 3 4 5 ...

8166 Commits