License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 14:07:57 +00:00
|
|
|
/* SPDX-License-Identifier: GPL-2.0 */
|
2009-09-24 16:02:49 +00:00
|
|
|
#ifndef __PERF_SORT_H
|
|
|
|
#define __PERF_SORT_H
|
2017-04-18 15:33:30 +00:00
|
|
|
#include <regex.h>
|
2019-08-22 20:11:39 +00:00
|
|
|
#include <stdbool.h>
|
2009-09-24 16:02:49 +00:00
|
|
|
#include <linux/list.h>
|
|
|
|
#include <linux/rbtree.h>
|
2019-01-27 11:02:41 +00:00
|
|
|
#include "map_symbol.h"
|
|
|
|
#include "symbol_conf.h"
|
2009-09-24 16:02:49 +00:00
|
|
|
#include "callchain.h"
|
|
|
|
#include "values.h"
|
2013-10-31 01:17:39 +00:00
|
|
|
#include "hist.h"
|
2019-09-25 01:14:46 +00:00
|
|
|
#include "stat.h"
|
|
|
|
#include "spark.h"
|
2017-04-20 00:34:35 +00:00
|
|
|
|
2019-08-22 20:11:39 +00:00
|
|
|
struct option;
|
2017-04-20 00:34:35 +00:00
|
|
|
struct thread;
|
2009-09-24 16:02:49 +00:00
|
|
|
|
|
|
|
extern regex_t parent_regex;
|
2010-05-17 19:22:41 +00:00
|
|
|
extern const char *sort_order;
|
2014-03-04 01:46:34 +00:00
|
|
|
extern const char *field_order;
|
2010-05-17 19:22:41 +00:00
|
|
|
extern const char default_parent_pattern[];
|
|
|
|
extern const char *parent_pattern;
|
2016-08-12 23:41:01 +00:00
|
|
|
extern const char *default_sort_order;
|
2012-12-07 05:48:05 +00:00
|
|
|
extern regex_t ignore_callees_regex;
|
|
|
|
extern int have_ignore_callees;
|
2013-04-01 11:35:20 +00:00
|
|
|
extern enum sort_mode sort__mode;
|
2009-09-24 16:02:49 +00:00
|
|
|
extern struct sort_entry sort_comm;
|
|
|
|
extern struct sort_entry sort_dso;
|
|
|
|
extern struct sort_entry sort_sym;
|
|
|
|
extern struct sort_entry sort_parent;
|
2012-03-08 22:47:48 +00:00
|
|
|
extern struct sort_entry sort_dso_from;
|
|
|
|
extern struct sort_entry sort_dso_to;
|
|
|
|
extern struct sort_entry sort_sym_from;
|
|
|
|
extern struct sort_entry sort_sym_to;
|
2016-09-19 13:10:10 +00:00
|
|
|
extern struct sort_entry sort_srcline;
|
perf tools: Remove (null) value of "Sort order" for perf mem report
When '--sort' is not set, 'perf mem report" will print a null pointer as
the output value of sort order, so fix it.
Example:
Before this patch:
$ perf mem report
# To display the perf.data header info, please use --header/--header-only options.
#
# Samples: 18 of event 'cpu/mem-loads/pp'
# Total weight : 188
# Sort order : (null)
#
...
After this patch:
$ perf mem report
# To display the perf.data header info, please use --header/--header-only options.
#
# Samples: 18 of event 'cpu/mem-loads/pp'
# Total weight : 188
# Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
...
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427082605-12881-1-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-03-23 03:50:05 +00:00
|
|
|
extern const char default_mem_sort_order[];
|
perf c2c: Add report option to show false sharing in adjacent cachelines
Many platforms have feature of adjacent cachelines prefetch, when it is
enabled, for data in RAM of 2 cachelines (2N and 2N+1) granularity, if
one is fetched to cache, the other one could likely be fetched too,
which sort of extends the cacheline size to double, thus the false
sharing could happens in adjacent cachelines.
0Day has captured performance changed related with this [1], and some
commercial software explicitly makes its hot global variables 128 bytes
aligned (2 cache lines) to avoid this kind of extended false sharing.
So add an option "--double-cl" for 'perf c2c report' to show false
sharing in double cache line granularity, which acts just like the
cacheline size is doubled. There is no change to c2c record. The
hardware events of shared cacheline are still per cacheline, and this
option just changes the granularity of how events are grouped and
displayed.
In the 'perf c2c report' output below (will-it-scale's 'pagefault2' case
on old kernel):
----------------------------------------------------------------------
26 31 2 0 0 0 0xffff888103ec6000
----------------------------------------------------------------------
35.48% 50.00% 0.00% 0.00% 0.00% 0x10 0 1 0xffffffff8133148b 1153 66 971 3748 74 [k] get_mem_cgroup_from_mm
6.45% 0.00% 0.00% 0.00% 0.00% 0x10 0 1 0xffffffff813396e4 570 0 1531 879 75 [k] mem_cgroup_charge
25.81% 50.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff81331472 949 70 593 3359 74 [k] get_mem_cgroup_from_mm
19.35% 0.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff81339686 1352 0 1073 1022 74 [k] mem_cgroup_charge
9.68% 0.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff813396d6 1401 0 863 768 74 [k] mem_cgroup_charge
3.23% 0.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff81333106 618 0 804 11 9 [k] uncharge_batch
The offset 0x10 and 0x54 used to displayed in 2 groups, and now they are
listed together to give users a hint of extended false sharing.
[1]. https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/
Committer notes:
Link: https://lore.kernel.org/r/Y+wvVNWqXb70l4uy@feng-clx
Removed -a, leaving just as --double-cl, as this probably is not used so
frequently and perhaps will be even auto-detected if we manage to record
the MSR where this is configured.
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Joe Mario <jmario@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20230214075823.246414-1-feng.tang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-14 07:58:23 +00:00
|
|
|
extern bool chk_double_cl;
|
2009-09-24 16:02:49 +00:00
|
|
|
|
2019-03-11 14:44:58 +00:00
|
|
|
struct res_sample {
|
|
|
|
u64 time;
|
|
|
|
int cpu;
|
|
|
|
int tid;
|
|
|
|
};
|
|
|
|
|
2012-10-04 12:49:41 +00:00
|
|
|
struct he_stat {
|
|
|
|
u64 period;
|
|
|
|
u64 period_sys;
|
|
|
|
u64 period_us;
|
|
|
|
u64 period_guest_sys;
|
|
|
|
u64 period_guest_us;
|
|
|
|
u32 nr_events;
|
|
|
|
};
|
|
|
|
|
perf tools: Add 'cgroup_id' sort order keyword
This patch introduces a cgroup identifier entry field in perf report to
identify or distinguish data of different cgroups. It uses the device
number and inode number of cgroup namespace, included in perf data with
the new PERF_RECORD_NAMESPACES event, as cgroup identifier.
With the assumption that each container is created with it's own cgroup
namespace, this allows assessment/analysis of multiple containers at
once.
A simple test for this would be to clone a few processes passing
SIGCHILD & CLONE_NEWCROUP flags to each of them, execute shell and run
different workloads on each of those contexts, while running perf
record command with --namespaces option.
Shown below is the output of perf report, sorted with cgroup identifier,
on perf.data generated with the above test scenario, clearly indicating
one context's considerable use of kernel memory in comparison with
others:
$ perf report -s cgroup_id,sample --stdio
#
# Total Lost Samples: 0
#
# Samples: 5K of event 'kmem:kmalloc'
# Event count (approx.): 5965
#
# Overhead cgroup id (dev/inode) Samples
# ........ ..................... ............
#
81.27% 3/0xeffffffb 4848
16.24% 3/0xf00000d0 969
1.16% 3/0xf00000ce 69
0.82% 3/0xf00000cf 49
0.50% 0/0x0 30
While this is a start, there is further scope of improving this. For
example, instead of cgroup namespace's device and inode numbers, dev
and inode numbers of some or all namespaces may be used to distinguish
which processes are running in a given container context.
Also, scripts to map device and inode info to containers sounds
plausible for better tracing of containers.
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/148891933338.25309.756882900782042645.stgit@hbathini.in.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-07 20:42:13 +00:00
|
|
|
struct namespace_id {
|
|
|
|
u64 dev;
|
|
|
|
u64 ino;
|
|
|
|
};
|
|
|
|
|
2012-10-05 14:44:42 +00:00
|
|
|
struct hist_entry_diff {
|
|
|
|
bool computed;
|
2015-04-19 04:04:10 +00:00
|
|
|
union {
|
|
|
|
/* PERF_HPP__DELTA */
|
|
|
|
double period_ratio_delta;
|
2012-10-05 14:44:42 +00:00
|
|
|
|
2015-04-19 04:04:10 +00:00
|
|
|
/* PERF_HPP__RATIO */
|
|
|
|
double period_ratio;
|
2012-10-05 14:44:43 +00:00
|
|
|
|
2015-04-19 04:04:10 +00:00
|
|
|
/* HISTC_WEIGHTED_DIFF */
|
|
|
|
s64 wdiff;
|
2019-06-28 09:23:01 +00:00
|
|
|
|
|
|
|
/* PERF_HPP_DIFF__CYCLES */
|
|
|
|
s64 cycles;
|
2015-04-19 04:04:10 +00:00
|
|
|
};
|
2019-09-25 01:14:46 +00:00
|
|
|
struct stats stats;
|
|
|
|
unsigned long svals[NUM_SPARKS];
|
2012-10-05 14:44:42 +00:00
|
|
|
};
|
|
|
|
|
2016-07-05 06:56:04 +00:00
|
|
|
struct hist_entry_ops {
|
|
|
|
void *(*new)(size_t size);
|
|
|
|
void (*free)(void *ptr);
|
|
|
|
};
|
|
|
|
|
2010-07-26 20:13:40 +00:00
|
|
|
/**
|
|
|
|
* struct hist_entry - histogram entry
|
|
|
|
*
|
|
|
|
* @row_offset - offset from the first callchain expanded to appear on screen
|
|
|
|
* @nr_rows - rows expanded in callchain, recalculated on folding/unfolding
|
|
|
|
*/
|
2009-09-24 16:02:49 +00:00
|
|
|
struct hist_entry {
|
2011-10-05 20:50:23 +00:00
|
|
|
struct rb_node rb_node_in;
|
2009-09-24 16:02:49 +00:00
|
|
|
struct rb_node rb_node;
|
2012-10-25 16:42:45 +00:00
|
|
|
union {
|
|
|
|
struct list_head node;
|
|
|
|
struct list_head head;
|
|
|
|
} pairs;
|
2012-10-04 12:49:41 +00:00
|
|
|
struct he_stat stat;
|
2012-09-11 04:15:07 +00:00
|
|
|
struct he_stat *stat_acc;
|
2010-03-24 19:40:17 +00:00
|
|
|
struct map_symbol ms;
|
2010-04-04 01:44:37 +00:00
|
|
|
struct thread *thread;
|
2013-09-13 07:28:57 +00:00
|
|
|
struct comm *comm;
|
perf tools: Add 'cgroup_id' sort order keyword
This patch introduces a cgroup identifier entry field in perf report to
identify or distinguish data of different cgroups. It uses the device
number and inode number of cgroup namespace, included in perf data with
the new PERF_RECORD_NAMESPACES event, as cgroup identifier.
With the assumption that each container is created with it's own cgroup
namespace, this allows assessment/analysis of multiple containers at
once.
A simple test for this would be to clone a few processes passing
SIGCHILD & CLONE_NEWCROUP flags to each of them, execute shell and run
different workloads on each of those contexts, while running perf
record command with --namespaces option.
Shown below is the output of perf report, sorted with cgroup identifier,
on perf.data generated with the above test scenario, clearly indicating
one context's considerable use of kernel memory in comparison with
others:
$ perf report -s cgroup_id,sample --stdio
#
# Total Lost Samples: 0
#
# Samples: 5K of event 'kmem:kmalloc'
# Event count (approx.): 5965
#
# Overhead cgroup id (dev/inode) Samples
# ........ ..................... ............
#
81.27% 3/0xeffffffb 4848
16.24% 3/0xf00000d0 969
1.16% 3/0xf00000ce 69
0.82% 3/0xf00000cf 49
0.50% 0/0x0 30
While this is a start, there is further scope of improving this. For
example, instead of cgroup namespace's device and inode numbers, dev
and inode numbers of some or all namespaces may be used to distinguish
which processes are running in a given container context.
Also, scripts to map device and inode info to containers sounds
plausible for better tracing of containers.
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/148891933338.25309.756882900782042645.stgit@hbathini.in.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-07 20:42:13 +00:00
|
|
|
struct namespace_id cgroup_id;
|
2020-03-25 12:45:32 +00:00
|
|
|
u64 cgroup;
|
2009-09-24 16:02:49 +00:00
|
|
|
u64 ip;
|
2013-09-20 14:40:43 +00:00
|
|
|
u64 transaction;
|
2015-09-04 14:45:42 +00:00
|
|
|
s32 socket;
|
2010-06-04 14:27:10 +00:00
|
|
|
s32 cpu;
|
2021-01-05 19:57:51 +00:00
|
|
|
u64 code_page_size;
|
perf sort: Fix the 'weight' sort key behavior
Currently, the 'weight' field in the perf sample has latency information
for some instructions like in memory accesses. And perf tool has 'weight'
and 'local_weight' sort keys to display the info.
But it's somewhat confusing what it shows exactly. In my understanding,
'local_weight' shows a weight in a single sample, and (global) 'weight'
shows a sum of the weights in the hist_entry.
For example:
$ perf mem record -t load dd if=/dev/zero of=/dev/null bs=4k count=1M
$ perf report --stdio -n -s +local_weight
...
#
# Overhead Samples Command Shared Object Symbol Local Weight
# ........ ....... ....... ................ ......................... ............
#
21.23% 313 dd [kernel.vmlinux] [k] lockref_get_not_zero 32
12.43% 183 dd [kernel.vmlinux] [k] lockref_get_not_zero 35
11.97% 159 dd [kernel.vmlinux] [k] lockref_get_not_zero 36
10.40% 141 dd [kernel.vmlinux] [k] lockref_put_return 32
7.63% 113 dd [kernel.vmlinux] [k] lockref_get_not_zero 33
6.37% 92 dd [kernel.vmlinux] [k] lockref_get_not_zero 34
6.15% 90 dd [kernel.vmlinux] [k] lockref_put_return 33
...
So let's look at the 'lockref_get_not_zero' symbols. The top entry
shows that 313 samples were captured with 'local_weight' 32, so the
total weight should be 313 x 32 = 10016. But it's not the case:
$ perf report --stdio -n -s +local_weight,weight -S lockref_get_not_zero
...
#
# Overhead Samples Command Shared Object Local Weight Weight
# ........ ....... ....... ................ ............ ......
#
1.36% 4 dd [kernel.vmlinux] 36 144
0.47% 4 dd [kernel.vmlinux] 37 148
0.42% 4 dd [kernel.vmlinux] 32 128
0.40% 4 dd [kernel.vmlinux] 34 136
0.35% 4 dd [kernel.vmlinux] 36 144
0.34% 4 dd [kernel.vmlinux] 35 140
0.30% 4 dd [kernel.vmlinux] 36 144
0.30% 4 dd [kernel.vmlinux] 34 136
0.30% 4 dd [kernel.vmlinux] 32 128
0.30% 4 dd [kernel.vmlinux] 32 128
...
With the 'weight' sort key, it's divided to 4 samples even with the same
info ('comm', 'dso', 'sym' and 'local_weight'). I don't think this is
what we want.
I found this because of the way it aggregates the 'weight' value. Since
it's not a period, we should not add them in the he->stat. Otherwise,
two 32 'weight' entries will create a 64 'weight' entry.
After that, new 32 'weight' samples don't have a matching entry so it'd
create a new entry and make it a 64 'weight' entry again and again.
Later, they will be merged into 128 'weight' entries during the
hists__collapse_resort() with 4 samples, multiple times like above.
Let's keep the weight and display it differently. For 'local_weight',
it can show the weight as is, and for (global) 'weight' it can display
the number multiplied by the number of samples.
With this change, I can see the expected numbers.
$ perf report --stdio -n -s +local_weight,weight -S lockref_get_not_zero
...
#
# Overhead Samples Command Shared Object Local Weight Weight
# ........ ....... ....... ................ ............ .....
#
21.23% 313 dd [kernel.vmlinux] 32 10016
12.43% 183 dd [kernel.vmlinux] 35 6405
11.97% 159 dd [kernel.vmlinux] 36 5724
7.63% 113 dd [kernel.vmlinux] 33 3729
6.37% 92 dd [kernel.vmlinux] 34 3128
4.17% 59 dd [kernel.vmlinux] 37 2183
0.08% 1 dd [kernel.vmlinux] 269 269
0.08% 1 dd [kernel.vmlinux] 38 38
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211105225617.151364-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-05 22:56:15 +00:00
|
|
|
u64 weight;
|
2021-11-05 22:56:16 +00:00
|
|
|
u64 ins_lat;
|
2021-11-05 22:56:17 +00:00
|
|
|
u64 p_stage_cyc;
|
2014-05-27 16:28:05 +00:00
|
|
|
u8 cpumode;
|
2016-02-24 15:13:34 +00:00
|
|
|
u8 depth;
|
perf report: Add 'simd' sort field
Add 'simd' sort field to visualize SIMD ops in 'perf report'.
Rows are labeled with the SIMD ISA, and the type of predicate (if any):
- [p] partial predicate
- [e] empty predicate (no elements in the vector being used)
Example with Arm SPE and SVE (Scalable Vector Extension):
#include <arm_sve.h>
double src[1025], dst[1025];
int main(void) {
svfloat64_t vc = svdup_f64(1);
for(;;)
for(int i = 0; i < 1025; i += svcntd())
{
svbool_t pg = svwhilelt_b64(i, 1025);
svfloat64_t vsrc = svld1(pg, &src[i]);
svfloat64_t vdst = svadd_x(pg, vsrc, vc);
svst1(pg, &dst[i], vdst);
}
return 0;
}
... compiled using "gcc-11 -march=armv8-a+sve -O3"
Profiling on a platform that implements FEAT_SVE and FEAT_SPEv1p1:
$ perf record -e arm_spe_0// -- ./a.out
$ perf report --itrace=i1i -s overhead,pid,simd,sym
Overhead Pid:Command Simd Symbol
........ ................ ....... ......................
53.76% 10758:program [.] main
46.14% 10758:program [.] SVE [.] main
0.09% 10758:program [p] SVE [.] main
The report shows 0.09% of the sampled SVE operations use partial
predicates due to src and dst arrays not being multiples of the vector
register lengths.
Signed-off-by: German Gomez <german.gomez@arm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman.Khandual@arm.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230320151509.1137462-2-james.clark@arm.com
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-20 15:15:08 +00:00
|
|
|
struct simd_flags simd_flags;
|
2010-07-26 20:13:40 +00:00
|
|
|
|
2012-12-01 20:18:20 +00:00
|
|
|
/* We are added by hists__add_dummy_entry. */
|
|
|
|
bool dummy;
|
2016-02-24 15:13:34 +00:00
|
|
|
bool leaf;
|
2012-12-01 20:18:20 +00:00
|
|
|
|
2009-09-24 16:02:49 +00:00
|
|
|
char level;
|
2010-04-04 01:44:37 +00:00
|
|
|
u8 filtered;
|
2018-06-07 17:19:54 +00:00
|
|
|
|
|
|
|
u16 callchain_size;
|
2015-04-22 07:18:12 +00:00
|
|
|
union {
|
|
|
|
/*
|
|
|
|
* Since perf diff only supports the stdio output, TUI
|
|
|
|
* fields are only accessed from perf report (or perf
|
2017-02-27 22:28:49 +00:00
|
|
|
* top). So make it a union to reduce memory usage.
|
2015-04-22 07:18:12 +00:00
|
|
|
*/
|
|
|
|
struct hist_entry_diff diff;
|
|
|
|
struct /* for TUI */ {
|
|
|
|
u16 row_offset;
|
|
|
|
u16 nr_rows;
|
2015-04-22 07:18:13 +00:00
|
|
|
bool init_have_children;
|
2015-05-05 14:55:46 +00:00
|
|
|
bool unfolded;
|
|
|
|
bool has_children;
|
2016-02-26 12:13:19 +00:00
|
|
|
bool has_no_entry;
|
2015-04-22 07:18:12 +00:00
|
|
|
};
|
|
|
|
};
|
2012-05-30 13:33:24 +00:00
|
|
|
char *srcline;
|
2015-08-07 22:54:24 +00:00
|
|
|
char *srcfile;
|
2010-04-03 19:30:44 +00:00
|
|
|
struct symbol *parent;
|
2012-02-09 22:21:01 +00:00
|
|
|
struct branch_info *branch_info;
|
2019-03-11 14:44:54 +00:00
|
|
|
long time;
|
2012-10-04 12:49:35 +00:00
|
|
|
struct hists *hists;
|
2013-01-24 15:10:35 +00:00
|
|
|
struct mem_info *mem_info;
|
2019-06-28 09:22:59 +00:00
|
|
|
struct block_info *block_info;
|
2023-03-15 14:51:05 +00:00
|
|
|
struct kvm_info *kvm_info;
|
2015-12-24 02:16:17 +00:00
|
|
|
void *raw_data;
|
|
|
|
u32 raw_size;
|
2019-03-11 14:44:58 +00:00
|
|
|
int num_res;
|
|
|
|
struct res_sample *res_samples;
|
2015-12-22 17:07:03 +00:00
|
|
|
void *trace_output;
|
2016-03-07 19:44:46 +00:00
|
|
|
struct perf_hpp_list *hpp_list;
|
2016-02-24 15:13:34 +00:00
|
|
|
struct hist_entry *parent_he;
|
2016-07-05 06:56:04 +00:00
|
|
|
struct hist_entry_ops *ops;
|
2016-02-24 15:13:34 +00:00
|
|
|
union {
|
|
|
|
/* this is for hierarchical entry structure */
|
|
|
|
struct {
|
2018-12-06 19:18:18 +00:00
|
|
|
struct rb_root_cached hroot_in;
|
|
|
|
struct rb_root_cached hroot_out;
|
2016-02-24 15:13:34 +00:00
|
|
|
}; /* non-leaf entries */
|
|
|
|
struct rb_root sorted_chain; /* leaf entry has callchains */
|
|
|
|
};
|
2013-01-24 15:10:35 +00:00
|
|
|
struct callchain_root callchain[0]; /* must be last member */
|
2009-09-24 16:02:49 +00:00
|
|
|
};
|
|
|
|
|
2018-05-29 16:28:24 +00:00
|
|
|
static __pure inline bool hist_entry__has_callchains(struct hist_entry *he)
|
|
|
|
{
|
2018-06-07 17:27:19 +00:00
|
|
|
return he->callchain_size != 0;
|
2018-05-29 16:28:24 +00:00
|
|
|
}
|
|
|
|
|
2019-12-12 14:48:23 +00:00
|
|
|
int hist_entry__sym_snprintf(struct hist_entry *he, char *bf, size_t size, unsigned int width);
|
|
|
|
|
2012-10-25 16:42:45 +00:00
|
|
|
static inline bool hist_entry__has_pairs(struct hist_entry *he)
|
|
|
|
{
|
|
|
|
return !list_empty(&he->pairs.node);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline struct hist_entry *hist_entry__next_pair(struct hist_entry *he)
|
|
|
|
{
|
|
|
|
if (hist_entry__has_pairs(he))
|
|
|
|
return list_entry(he->pairs.node.next, struct hist_entry, pairs.node);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2012-12-13 13:09:00 +00:00
|
|
|
static inline void hist_entry__add_pair(struct hist_entry *pair,
|
|
|
|
struct hist_entry *he)
|
2012-10-25 16:42:45 +00:00
|
|
|
{
|
2012-12-13 13:09:00 +00:00
|
|
|
list_add_tail(&pair->pairs.node, &he->pairs.head);
|
2012-10-25 16:42:45 +00:00
|
|
|
}
|
|
|
|
|
2013-10-31 01:17:39 +00:00
|
|
|
static inline float hist_entry__get_percent_limit(struct hist_entry *he)
|
|
|
|
{
|
|
|
|
u64 period = he->stat.period;
|
|
|
|
u64 total_period = hists__total_period(he->hists);
|
|
|
|
|
|
|
|
if (unlikely(total_period == 0))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (symbol_conf.cumulate_callchain)
|
|
|
|
period = he->stat_acc->period;
|
|
|
|
|
|
|
|
return period * 100.0 / total_period;
|
|
|
|
}
|
|
|
|
|
2013-04-01 11:35:20 +00:00
|
|
|
enum sort_mode {
|
|
|
|
SORT_MODE__NORMAL,
|
|
|
|
SORT_MODE__BRANCH,
|
|
|
|
SORT_MODE__MEMORY,
|
2014-03-18 02:31:39 +00:00
|
|
|
SORT_MODE__TOP,
|
|
|
|
SORT_MODE__DIFF,
|
2015-12-22 17:07:10 +00:00
|
|
|
SORT_MODE__TRACEPOINT,
|
2013-04-01 11:35:20 +00:00
|
|
|
};
|
|
|
|
|
perf tools: Bind callchains to the first sort dimension column
Currently, the callchains are displayed using a constant left
margin. So depending on the current sort dimension
configuration, callchains may appear to be well attached to the
first sort dimension column field which is mostly the case,
except when the first dimension of sorting is done by comm,
because these are right aligned.
This patch binds the callchain to the first letter in the first
column, whatever type of column it is (dso, comm, symbol).
Before:
0.80% perf [k] __lock_acquire
__lock_acquire
lock_acquire
|
|--58.33%-- _spin_lock
| |
| |--28.57%-- inotify_should_send_event
| | fsnotify
| | __fsnotify_parent
After:
0.80% perf [k] __lock_acquire
__lock_acquire
lock_acquire
|
|--58.33%-- _spin_lock
| |
| |--28.57%-- inotify_should_send_event
| | fsnotify
| | __fsnotify_parent
Also, for clarity, we don't put anymore the callchain as is but:
- If we have a top level ancestor in the callchain, start it
with a first ascii hook.
Before:
0.80% perf [kernel] [k] __lock_acquire
__lock_acquire
lock_acquire
|
|--58.33%-- _spin_lock
| |
| |--28.57%-- inotify_should_send_event
| | fsnotify
[..] [..]
After:
0.80% perf [kernel] [k] __lock_acquire
|
--- __lock_acquire
lock_acquire
|
|--58.33%-- _spin_lock
| |
| |--28.57%-- inotify_should_send_event
| | fsnotify
[..] [..]
- Otherwise, if we have several top level ancestors, then
display these like we did before:
1.69% Xorg
|
|--21.21%-- vread_hpet
| 0x7fffd85b46fc
| 0x7fffd85b494d
| 0x7f4fafb4e54d
|
|--15.15%-- exaOffscreenAlloc
|
|--9.09%-- I830WaitLpRing
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
LKML-Reference: <1256246604-17156-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-22 21:23:23 +00:00
|
|
|
enum sort_type {
|
2012-12-27 09:11:46 +00:00
|
|
|
/* common sort keys */
|
perf tools: Bind callchains to the first sort dimension column
Currently, the callchains are displayed using a constant left
margin. So depending on the current sort dimension
configuration, callchains may appear to be well attached to the
first sort dimension column field which is mostly the case,
except when the first dimension of sorting is done by comm,
because these are right aligned.
This patch binds the callchain to the first letter in the first
column, whatever type of column it is (dso, comm, symbol).
Before:
0.80% perf [k] __lock_acquire
__lock_acquire
lock_acquire
|
|--58.33%-- _spin_lock
| |
| |--28.57%-- inotify_should_send_event
| | fsnotify
| | __fsnotify_parent
After:
0.80% perf [k] __lock_acquire
__lock_acquire
lock_acquire
|
|--58.33%-- _spin_lock
| |
| |--28.57%-- inotify_should_send_event
| | fsnotify
| | __fsnotify_parent
Also, for clarity, we don't put anymore the callchain as is but:
- If we have a top level ancestor in the callchain, start it
with a first ascii hook.
Before:
0.80% perf [kernel] [k] __lock_acquire
__lock_acquire
lock_acquire
|
|--58.33%-- _spin_lock
| |
| |--28.57%-- inotify_should_send_event
| | fsnotify
[..] [..]
After:
0.80% perf [kernel] [k] __lock_acquire
|
--- __lock_acquire
lock_acquire
|
|--58.33%-- _spin_lock
| |
| |--28.57%-- inotify_should_send_event
| | fsnotify
[..] [..]
- Otherwise, if we have several top level ancestors, then
display these like we did before:
1.69% Xorg
|
|--21.21%-- vread_hpet
| 0x7fffd85b46fc
| 0x7fffd85b494d
| 0x7f4fafb4e54d
|
|--15.15%-- exaOffscreenAlloc
|
|--9.09%-- I830WaitLpRing
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
LKML-Reference: <1256246604-17156-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-22 21:23:23 +00:00
|
|
|
SORT_PID,
|
|
|
|
SORT_COMM,
|
|
|
|
SORT_DSO,
|
|
|
|
SORT_SYM,
|
2010-06-04 14:27:10 +00:00
|
|
|
SORT_PARENT,
|
|
|
|
SORT_CPU,
|
2015-09-04 14:45:43 +00:00
|
|
|
SORT_SOCKET,
|
2012-12-27 09:11:46 +00:00
|
|
|
SORT_SRCLINE,
|
2015-08-07 22:54:24 +00:00
|
|
|
SORT_SRCFILE,
|
2013-07-18 22:58:53 +00:00
|
|
|
SORT_LOCAL_WEIGHT,
|
|
|
|
SORT_GLOBAL_WEIGHT,
|
2013-09-20 14:40:43 +00:00
|
|
|
SORT_TRANSACTION,
|
2015-12-22 17:07:04 +00:00
|
|
|
SORT_TRACE,
|
2017-02-24 13:32:56 +00:00
|
|
|
SORT_SYM_SIZE,
|
2018-03-27 11:09:56 +00:00
|
|
|
SORT_DSO_SIZE,
|
2020-03-25 12:45:32 +00:00
|
|
|
SORT_CGROUP,
|
perf tools: Add 'cgroup_id' sort order keyword
This patch introduces a cgroup identifier entry field in perf report to
identify or distinguish data of different cgroups. It uses the device
number and inode number of cgroup namespace, included in perf data with
the new PERF_RECORD_NAMESPACES event, as cgroup identifier.
With the assumption that each container is created with it's own cgroup
namespace, this allows assessment/analysis of multiple containers at
once.
A simple test for this would be to clone a few processes passing
SIGCHILD & CLONE_NEWCROUP flags to each of them, execute shell and run
different workloads on each of those contexts, while running perf
record command with --namespaces option.
Shown below is the output of perf report, sorted with cgroup identifier,
on perf.data generated with the above test scenario, clearly indicating
one context's considerable use of kernel memory in comparison with
others:
$ perf report -s cgroup_id,sample --stdio
#
# Total Lost Samples: 0
#
# Samples: 5K of event 'kmem:kmalloc'
# Event count (approx.): 5965
#
# Overhead cgroup id (dev/inode) Samples
# ........ ..................... ............
#
81.27% 3/0xeffffffb 4848
16.24% 3/0xf00000d0 969
1.16% 3/0xf00000ce 69
0.82% 3/0xf00000cf 49
0.50% 0/0x0 30
While this is a start, there is further scope of improving this. For
example, instead of cgroup namespace's device and inode numbers, dev
and inode numbers of some or all namespaces may be used to distinguish
which processes are running in a given container context.
Also, scripts to map device and inode info to containers sounds
plausible for better tracing of containers.
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/148891933338.25309.756882900782042645.stgit@hbathini.in.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-07 20:42:13 +00:00
|
|
|
SORT_CGROUP_ID,
|
2018-11-30 13:54:56 +00:00
|
|
|
SORT_SYM_IPC_NULL,
|
2019-03-11 14:44:54 +00:00
|
|
|
SORT_TIME,
|
2021-01-05 19:57:51 +00:00
|
|
|
SORT_CODE_PAGE_SIZE,
|
2021-02-02 20:09:10 +00:00
|
|
|
SORT_LOCAL_INS_LAT,
|
|
|
|
SORT_GLOBAL_INS_LAT,
|
2021-12-03 02:20:37 +00:00
|
|
|
SORT_LOCAL_PIPELINE_STAGE_CYC,
|
|
|
|
SORT_GLOBAL_PIPELINE_STAGE_CYC,
|
2022-09-23 17:31:41 +00:00
|
|
|
SORT_ADDR,
|
2023-01-04 20:13:48 +00:00
|
|
|
SORT_LOCAL_RETIRE_LAT,
|
|
|
|
SORT_GLOBAL_RETIRE_LAT,
|
perf report: Add 'simd' sort field
Add 'simd' sort field to visualize SIMD ops in 'perf report'.
Rows are labeled with the SIMD ISA, and the type of predicate (if any):
- [p] partial predicate
- [e] empty predicate (no elements in the vector being used)
Example with Arm SPE and SVE (Scalable Vector Extension):
#include <arm_sve.h>
double src[1025], dst[1025];
int main(void) {
svfloat64_t vc = svdup_f64(1);
for(;;)
for(int i = 0; i < 1025; i += svcntd())
{
svbool_t pg = svwhilelt_b64(i, 1025);
svfloat64_t vsrc = svld1(pg, &src[i]);
svfloat64_t vdst = svadd_x(pg, vsrc, vc);
svst1(pg, &dst[i], vdst);
}
return 0;
}
... compiled using "gcc-11 -march=armv8-a+sve -O3"
Profiling on a platform that implements FEAT_SVE and FEAT_SPEv1p1:
$ perf record -e arm_spe_0// -- ./a.out
$ perf report --itrace=i1i -s overhead,pid,simd,sym
Overhead Pid:Command Simd Symbol
........ ................ ....... ......................
53.76% 10758:program [.] main
46.14% 10758:program [.] SVE [.] main
0.09% 10758:program [p] SVE [.] main
The report shows 0.09% of the sampled SVE operations use partial
predicates due to src and dst arrays not being multiples of the vector
register lengths.
Signed-off-by: German Gomez <german.gomez@arm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman.Khandual@arm.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230320151509.1137462-2-james.clark@arm.com
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-20 15:15:08 +00:00
|
|
|
SORT_SIMD,
|
2012-12-27 09:11:46 +00:00
|
|
|
|
|
|
|
/* branch stack specific sort keys */
|
|
|
|
__SORT_BRANCH_STACK,
|
|
|
|
SORT_DSO_FROM = __SORT_BRANCH_STACK,
|
2012-02-09 22:21:01 +00:00
|
|
|
SORT_DSO_TO,
|
|
|
|
SORT_SYM_FROM,
|
|
|
|
SORT_SYM_TO,
|
|
|
|
SORT_MISPREDICT,
|
2013-09-20 14:40:41 +00:00
|
|
|
SORT_ABORT,
|
|
|
|
SORT_IN_TX,
|
2015-07-18 15:24:46 +00:00
|
|
|
SORT_CYCLES,
|
2016-05-20 20:15:08 +00:00
|
|
|
SORT_SRCLINE_FROM,
|
|
|
|
SORT_SRCLINE_TO,
|
2018-11-30 13:54:56 +00:00
|
|
|
SORT_SYM_IPC,
|
perf report: Add "addr_from" and "addr_to" sort dimensions
With the existing symbol_from/symbol_to, branches captured in the same
function would be collapsed into a single function if the latencies
associated with the each branch (cycles) were all the same. That is the
case on Intel Broadwell, for instance. Since Intel Skylake, the latency
is captured by hardware and therefore is used to disambiguate branches.
Add addr_from/addr_to sort dimensions to sort branches based on their
addresses and not the function there are in. The output is still the
function name but the offset within the function is provided to uniquely
identify each branch. These new sort dimensions also help with annotate
because they create different entries in the histogram which, in turn,
generates proper branch annotations.
Here is an example using AMD's branch sampling:
$ perf record -a -b -c 1000037 -e cpu/branch-brs/ test_prg
$ perf report
Samples: 6M of event 'cpu/branch-brs/', Event count (approx.): 6901276
Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycle
99.65% test_prg test_prg [.] test_thread [.] test_thread -
0.02% test_prg [kernel.vmlinux] [k] asm_sysvec_apic_timer_interrupt [k] error_entry -
$ perf report -F overhead,comm,dso,addr_from,addr_to
Samples: 6M of event 'cpu/branch-brs/', Event count (approx.): 6901276
Overhead Command Shared Object Source Address Target Address
4.22% test_prg test_prg [.] test_thread+0x3c [.] test_thread+0x4
4.13% test_prg test_prg [.] test_thread+0x4 [.] test_thread+0x3a
4.09% test_prg test_prg [.] test_thread+0x3a [.] test_thread+0x6
4.08% test_prg test_prg [.] test_thread+0x2 [.] test_thread+0x3c
4.06% test_prg test_prg [.] test_thread+0x3e [.] test_thread+0x2
3.87% test_prg test_prg [.] test_thread+0x6 [.] test_thread+0x38
3.84% test_prg test_prg [.] test_thread [.] test_thread+0x3e
3.76% test_prg test_prg [.] test_thread+0x1e [.] test_thread
3.76% test_prg test_prg [.] test_thread+0x38 [.] test_thread+0x8
3.56% test_prg test_prg [.] test_thread+0x22 [.] test_thread+0x1e
3.54% test_prg test_prg [.] test_thread+0x8 [.] test_thread+0x36
3.47% test_prg test_prg [.] test_thread+0x1c [.] test_thread+0x22
3.45% test_prg test_prg [.] test_thread+0x36 [.] test_thread+0xa
3.28% test_prg test_prg [.] test_thread+0x24 [.] test_thread+0x1c
3.25% test_prg test_prg [.] test_thread+0xa [.] test_thread+0x34
3.24% test_prg test_prg [.] test_thread+0x1a [.] test_thread+0x24
3.20% test_prg test_prg [.] test_thread+0x34 [.] test_thread+0xc
3.04% test_prg test_prg [.] test_thread+0x26 [.] test_thread+0x1a
3.01% test_prg test_prg [.] test_thread+0xc [.] test_thread+0x32
2.98% test_prg test_prg [.] test_thread+0x18 [.] test_thread+0x26
2.94% test_prg test_prg [.] test_thread+0x32 [.] test_thread+0xe
2.76% test_prg test_prg [.] test_thread+0x28 [.] test_thread+0x18
2.73% test_prg test_prg [.] test_thread+0xe [.] test_thread+0x30
2.67% test_prg test_prg [.] test_thread+0x30 [.] test_thread+0x10
2.67% test_prg test_prg [.] test_thread+0x16 [.] test_thread+0x28
2.46% test_prg test_prg [.] test_thread+0x10 [.] test_thread+0x2e
2.44% test_prg test_prg [.] test_thread+0x2a [.] test_thread+0x16
2.38% test_prg test_prg [.] test_thread+0x14 [.] test_thread+0x2a
2.32% test_prg test_prg [.] test_thread+0x2e [.] test_thread+0x12
2.28% test_prg test_prg [.] test_thread+0x12 [.] test_thread+0x2c
2.16% test_prg test_prg [.] test_thread+0x2c [.] test_thread+0x14
0.02% test_prg [kernel.vmlinux] [k] asm_sysvec_apic_ti+0x5 [k] error_entry
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20220208211637.2221872-13-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-08 21:16:37 +00:00
|
|
|
SORT_ADDR_FROM,
|
|
|
|
SORT_ADDR_TO,
|
2013-04-03 12:26:11 +00:00
|
|
|
|
|
|
|
/* memory mode specific sort keys */
|
|
|
|
__SORT_MEMORY_MODE,
|
2013-07-18 22:58:53 +00:00
|
|
|
SORT_MEM_DADDR_SYMBOL = __SORT_MEMORY_MODE,
|
2013-04-03 12:26:11 +00:00
|
|
|
SORT_MEM_DADDR_DSO,
|
|
|
|
SORT_MEM_LOCKED,
|
|
|
|
SORT_MEM_TLB,
|
|
|
|
SORT_MEM_LVL,
|
|
|
|
SORT_MEM_SNOOP,
|
2014-06-01 13:38:29 +00:00
|
|
|
SORT_MEM_DCACHELINE,
|
2015-10-05 18:06:07 +00:00
|
|
|
SORT_MEM_IADDR_SYMBOL,
|
2017-08-29 17:11:09 +00:00
|
|
|
SORT_MEM_PHYS_DADDR,
|
2020-12-16 18:57:58 +00:00
|
|
|
SORT_MEM_DATA_PAGE_SIZE,
|
perf tools: Support data block and addr block
Two new data source fields, to indicate the block reasons of a load
instruction, are introduced on the Intel Sapphire Rapids server. The
fields can be used by the memory profiling.
Add a new sort function, SORT_MEM_BLOCKED, for the two fields.
For the previous platforms or the block reason is unknown, print "N/A"
for the block reason.
Add blocked as a default mem sort key for perf report and perf mem
report.
Committer testing:
So in machines without this capability we get a "N/A" filling the new "Blocked"
column:
$ perf mem record ls
arch certs CREDITS Documentation include ipc Kconfig lib MAINTAINERS mm samples security usr block
COPYING crypto drivers fs init Kbuild kernel LICENSES Makefile net README scripts sound tools
virt
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.008 MB perf.data (17 samples) ]
$
$ perf mem report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
# Total Lost Samples: 0
#
# Samples: 6 of event 'cpu/mem-loads,ldlat=30/Pu'
# Total weight : 1381
# Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked
#
# Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access Locked Blocked
# ........ ....... ............ .................... ....................... ............. ...................... ............ ..... ............ ...... .......
#
32.87% 1 454 Local RAM or RAM hit [.] _dl_relocate_object ld-2.31.so [.] 0x00007fe91cef3078 libc-2.31.so Hit L1 or L2 hit No N/A
25.56% 1 353 LFB or LFB hit [.] strcmp ld-2.31.so [.] 0x00005586973855ca ls None L1 or L2 hit No N/A
22.59% 1 312 LFB or LFB hit [.] _dl_cache_libcmp ld-2.31.so [.] 0x00007fe91d0e3b18 ld.so.cache None L1 or L2 hit No N/A
8.47% 1 117 LFB or LFB hit [.] _dl_relocate_object ld-2.31.so [.] 0x00007fe91ceee570 libc-2.31.so None L1 or L2 hit No N/A
6.88% 1 95 LFB or LFB hit [.] _dl_relocate_object ld-2.31.so [.] 0x00007fe91ceed490 libc-2.31.so None L1 or L2 hit No N/A
3.62% 1 50 LFB or LFB hit [.] _dl_cache_libcmp ld-2.31.so [.] 0x00007fe91d0ebe60 ld.so.cache None L1 or L2 hit No N/A
# Samples: 11 of event 'cpu/mem-stores/Pu'
# Total weight : 11
# Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked
#
# Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access Locked Blocked
# ........ ....... ............ ............. ....................... ............. ...................... ........... ..... .......... ...... .......
#
9.09% 1 0 L1 hit [.] __strcoll_l libc-2.31.so [.] 0x00007fffe5648fc8 [stack] N/A N/A N/A N/A
9.09% 1 0 L1 hit [.] _dl_lookup_symbol_x ld-2.31.so [.] 0x00007fffe56490b8 [stack] N/A N/A N/A N/A
9.09% 1 0 L1 hit [.] _dl_name_match_p ld-2.31.so [.] 0x00007fffe56487d8 [stack] N/A N/A N/A N/A
9.09% 1 0 L1 hit [.] _dl_start ld-2.31.so [.] start_time+0x0 ld-2.31.so N/A N/A N/A N/A
9.09% 1 0 L1 hit [.] _dl_sysdep_start ld-2.31.so [.] 0x00007fffe56494b8 [stack] N/A N/A N/A N/A
9.09% 1 0 L1 hit [.] do_lookup_x ld-2.31.so [.] 0x00007fffe5648ff8 [stack] N/A N/A N/A N/A
9.09% 1 0 L1 hit [.] do_lookup_x ld-2.31.so [.] 0x00007fffe5649064 [stack] N/A N/A N/A N/A
9.09% 1 0 L1 hit [.] do_lookup_x ld-2.31.so [.] 0x00007fffe5649130 [stack] N/A N/A N/A N/A
9.09% 1 0 L1 miss [.] _dl_start ld-2.31.so [.] _rtld_global+0xaf8 ld-2.31.so N/A N/A N/A N/A
9.09% 1 0 L1 miss [.] _dl_start ld-2.31.so [.] _rtld_global+0xc28 ld-2.31.so N/A N/A N/A N/A
9.09% 1 0 L1 miss [.] _dl_start ld-2.31.so [.] 0x00007fffe56495b8 [stack] N/A N/A N/A N/A
# (Tip: Show user configuration overrides: perf config --user --list)
$
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/1612296553-21962-4-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-02 20:09:07 +00:00
|
|
|
SORT_MEM_BLOCKED,
|
perf tools: Bind callchains to the first sort dimension column
Currently, the callchains are displayed using a constant left
margin. So depending on the current sort dimension
configuration, callchains may appear to be well attached to the
first sort dimension column field which is mostly the case,
except when the first dimension of sorting is done by comm,
because these are right aligned.
This patch binds the callchain to the first letter in the first
column, whatever type of column it is (dso, comm, symbol).
Before:
0.80% perf [k] __lock_acquire
__lock_acquire
lock_acquire
|
|--58.33%-- _spin_lock
| |
| |--28.57%-- inotify_should_send_event
| | fsnotify
| | __fsnotify_parent
After:
0.80% perf [k] __lock_acquire
__lock_acquire
lock_acquire
|
|--58.33%-- _spin_lock
| |
| |--28.57%-- inotify_should_send_event
| | fsnotify
| | __fsnotify_parent
Also, for clarity, we don't put anymore the callchain as is but:
- If we have a top level ancestor in the callchain, start it
with a first ascii hook.
Before:
0.80% perf [kernel] [k] __lock_acquire
__lock_acquire
lock_acquire
|
|--58.33%-- _spin_lock
| |
| |--28.57%-- inotify_should_send_event
| | fsnotify
[..] [..]
After:
0.80% perf [kernel] [k] __lock_acquire
|
--- __lock_acquire
lock_acquire
|
|--58.33%-- _spin_lock
| |
| |--28.57%-- inotify_should_send_event
| | fsnotify
[..] [..]
- Otherwise, if we have several top level ancestors, then
display these like we did before:
1.69% Xorg
|
|--21.21%-- vread_hpet
| 0x7fffd85b46fc
| 0x7fffd85b494d
| 0x7f4fafb4e54d
|
|--15.15%-- exaOffscreenAlloc
|
|--9.09%-- I830WaitLpRing
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
LKML-Reference: <1256246604-17156-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-22 21:23:23 +00:00
|
|
|
};
|
|
|
|
|
2009-09-24 16:02:49 +00:00
|
|
|
/*
|
|
|
|
* configurable sorting bits
|
|
|
|
*/
|
|
|
|
|
|
|
|
struct sort_entry {
|
2010-04-14 17:11:29 +00:00
|
|
|
const char *se_header;
|
2009-09-24 16:02:49 +00:00
|
|
|
|
2010-04-14 17:11:29 +00:00
|
|
|
int64_t (*se_cmp)(struct hist_entry *, struct hist_entry *);
|
|
|
|
int64_t (*se_collapse)(struct hist_entry *, struct hist_entry *);
|
2014-03-04 02:01:41 +00:00
|
|
|
int64_t (*se_sort)(struct hist_entry *, struct hist_entry *);
|
2013-11-05 18:32:36 +00:00
|
|
|
int (*se_snprintf)(struct hist_entry *he, char *bf, size_t size,
|
2010-04-14 17:11:29 +00:00
|
|
|
unsigned int width);
|
2016-02-24 15:13:37 +00:00
|
|
|
int (*se_filter)(struct hist_entry *he, int type, const void *arg);
|
2022-12-15 19:28:14 +00:00
|
|
|
void (*se_init)(struct hist_entry *he);
|
2010-07-20 17:42:52 +00:00
|
|
|
u8 se_width_idx;
|
2009-09-24 16:02:49 +00:00
|
|
|
};
|
|
|
|
|
2019-06-28 09:23:01 +00:00
|
|
|
struct block_hist {
|
|
|
|
struct hists block_hists;
|
|
|
|
struct perf_hpp_list block_list;
|
|
|
|
struct perf_hpp_fmt block_fmt;
|
|
|
|
int block_idx;
|
|
|
|
bool valid;
|
|
|
|
struct hist_entry he;
|
|
|
|
};
|
|
|
|
|
2009-09-24 16:02:49 +00:00
|
|
|
extern struct sort_entry sort_thread;
|
|
|
|
|
2019-07-21 11:23:52 +00:00
|
|
|
struct evlist;
|
2018-08-08 18:02:46 +00:00
|
|
|
struct tep_handle;
|
2019-07-21 11:23:52 +00:00
|
|
|
int setup_sorting(struct evlist *evlist);
|
2014-03-04 01:46:34 +00:00
|
|
|
int setup_output_field(void);
|
2014-05-07 09:42:24 +00:00
|
|
|
void reset_output_field(void);
|
2013-04-03 12:26:19 +00:00
|
|
|
void sort__setup_elide(FILE *fp);
|
perf tools: Move elide bool into perf_hpp_fmt struct
After output/sort fields refactoring, it's expensive
to check the elide bool in its current location inside
the 'struct sort_entry'.
The perf_hpp__should_skip function gets highly noticable in
workloads with high number of output/sort fields, like for:
$ perf report -i perf-test.data -F overhead,sample,period,comm,pid,dso,symbol,cpu --stdio
Performance report:
9.70% perf [.] perf_hpp__should_skip
Moving the elide bool into the 'struct perf_hpp_fmt', which
makes the perf_hpp__should_skip just single struct read.
Got speedup of around 22% for my test perf.data workload.
The change should not harm any other workload types.
Performance counter stats for (10 runs):
before:
358,319,732,626 cycles ( +- 0.55% )
467,129,581,515 instructions # 1.30 insns per cycle ( +- 0.00% )
150.943975206 seconds time elapsed ( +- 0.62% )
now:
278,785,972,990 cycles ( +- 0.12% )
370,146,797,640 instructions # 1.33 insns per cycle ( +- 0.00% )
116.416670507 seconds time elapsed ( +- 0.31% )
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20140601142622.GA9131@krava.brq.redhat.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
2014-05-23 15:15:47 +00:00
|
|
|
void perf_hpp__set_elide(int idx, bool elide);
|
2009-09-24 16:02:49 +00:00
|
|
|
|
2021-07-15 16:07:14 +00:00
|
|
|
char *sort_help(const char *prefix);
|
perf report: Show all sort keys in help output
Show all the supported sort keys in the command line help output, so
that it's not needed to refer to the manpage.
Before:
% perf report -h
...
-s, --sort <key[,key2...]>
sort by key(s): pid, comm, dso, symbol, parent, cpu, srcline, ... Please refer the man page for the complete list.
After:
% perf report -h
...
-s, --sort <key[,key2...]>
sort by key(s): overhead overhead_sys overhead_us overhead_guest_sys overhead_guest_us overhead_children sample period pid comm dso symbol parent cpu ...
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
LPU-Reference: 20190314225002.30108-5-andi@firstfloor.org
Link: https://lkml.kernel.org/n/tip-9r3uz2ch4izoi1uln3f889co@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-14 22:49:57 +00:00
|
|
|
|
2012-12-07 05:48:05 +00:00
|
|
|
int report_parse_ignore_callees_opt(const struct option *opt, const char *arg, int unset);
|
|
|
|
|
2014-08-22 13:58:38 +00:00
|
|
|
bool is_strict_order(const char *order);
|
2015-10-06 12:25:11 +00:00
|
|
|
|
|
|
|
int hpp_dimension__add_output(unsigned col);
|
2016-09-22 15:36:32 +00:00
|
|
|
void reset_dimensions(void);
|
2016-09-22 15:36:33 +00:00
|
|
|
int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
|
2019-07-21 11:23:52 +00:00
|
|
|
struct evlist *evlist,
|
2016-09-22 15:36:33 +00:00
|
|
|
int level);
|
|
|
|
int output_field_add(struct perf_hpp_list *list, char *tok);
|
2016-09-22 15:36:34 +00:00
|
|
|
int64_t
|
|
|
|
sort__iaddr_cmp(struct hist_entry *left, struct hist_entry *right);
|
|
|
|
int64_t
|
|
|
|
sort__daddr_cmp(struct hist_entry *left, struct hist_entry *right);
|
|
|
|
int64_t
|
|
|
|
sort__dcacheline_cmp(struct hist_entry *left, struct hist_entry *right);
|
2020-03-19 20:25:17 +00:00
|
|
|
int64_t
|
|
|
|
_sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r);
|
2018-05-28 14:06:58 +00:00
|
|
|
char *hist_entry__srcline(struct hist_entry *he);
|
2009-09-24 16:02:49 +00:00
|
|
|
#endif /* __PERF_SORT_H */
|