perf-diff(1)
============

NAME
----
perf-diff - Read perf.data files and display the differential profile

SYNOPSIS
--------
[verse]
'perf diff' [baseline file] [data file1] [[data file2] ... ]

DESCRIPTION
-----------
This command displays the performance difference amongst two or more perf.data
files captured via perf record.

If no parameters are passed it will assume perf.data.old and perf.data.

The differential profile is displayed only for events matching both
specified perf.data files.

perf diff: Support for different binaries
Currently, the perf diff only works with same binaries. That's because
it compares the symbol start address. It doesn't work if the perf.data
comes from different binaries. This patch matches the symbol names.
Actually, perf diff once intended to compare the symbol names. The
commit as below can look for a pair by name.
604c5c92972d (perf diff: Change the default sort order to "dso,symbol")
However, at that time, perf diff used a global list of dsos. That means
the binaries which has same name can only be loaded once. That's a
problem for comparing different binaries.
For example, we have an old binary and an updated binary. They very
likely have same name and most of the functions, so only dsos from old
binary will be loaded. When processing the data from updated binary,
perf still use the symbol information from old binary. That's wrong.
Then the commit as below used IP to replace symbol name.
9c443dfdd31e ("perf diff: Fix support for all --sort combinations")
From that time, perf diff started to compare symbol addresses.
The global dsos is discarded from a patch in 2010.
a1645ce12adb ("perf: 'perf kvm' tool for monitoring guest performance
from host")
However, at that time, perf diff already compared by address. So perf
diff cannot work for different binaries as well.
This patch actually rolls back the perf diff to original design. The
document is also changed, so everybody knows the original design is to
compare the symbol names.
Here are some examples:
The only difference between example_v1.c and example_v2.c is the
location of f2 and f3. There is no change in behavior, but the previous
perf diff display the wrong differential profile.
example_v1.c
noinline void f3(void)
{
volatile int i;
for (i = 0; i < 10000;) {
if(i%2)
i++;
else
i++;
}
}
noinline void f2(void)
{
volatile int a = 100, b, c;
for (b = 0; b < 10000; b++)
c = a * b;
}
noinline void f1(void)
{
f2();
f3();
}
int main()
{
int i;
for (i = 0; i < 100000; i++)
f1();
}
example_v2.c
noinline void f2(void)
{
volatile int a = 100, b, c;
for (b = 0; b < 10000; b++)
c = a * b;
}
noinline void f3(void)
{
volatile int i;
for (i = 0; i < 10000;) {
if(i%2)
i++;
else
i++;
}
}
noinline void f1(void)
{
f2();
f3();
}
int main()
{
int i;
for (i = 0; i < 100000; i++)
f1();
}
[lk@localhost perf_diff]$ gcc example_v1.c -o example
[lk@localhost perf_diff]$ perf record -o example_v1.data ./example
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.813 MB example_v1.data (~35522 samples) ]
[lk@localhost perf_diff]$ gcc example_v2.c -o example
[lk@localhost perf_diff]$ perf record -o example_v2.data ./example
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.824 MB example_v2.data (~36015 samples) ]
Old perf diff result:
[lk@localhost perf_diff]$ perf diff example_v1.data example_v2.data
Event 'cycles'
Baseline Delta Shared Object Symbol
........ ....... ................ ...............................
[kernel.vmlinux] [k] __perf_event_task_sched_out
0.00% [kernel.vmlinux] [k] apic_timer_interrupt
[kernel.vmlinux] [k] idle_cpu
[kernel.vmlinux] [k] intel_pstate_timer_func
[kernel.vmlinux] [k] native_read_msr_safe
0.00% [kernel.vmlinux] [k] native_read_tsc
0.00% [kernel.vmlinux] [k] native_write_msr_safe
[kernel.vmlinux] [k] ntp_tick_length
0.00% [kernel.vmlinux] [k] rb_erase
0.00% [kernel.vmlinux] [k] tick_sched_timer
0.00% [kernel.vmlinux] [k] unmap_single_vma
0.00% [kernel.vmlinux] [k] update_wall_time
0.00% example [.] f1
46.24% example [.] f2
53.71% -7.55% example [.] f3
+53.81% example [.] f3
0.02% example [.] main
New perf diff result:
[lk@localhost perf_diff]$ perf diff example_v1.data example_v2.data
[kernel.vmlinux] [k] __perf_event_task_sched_out
0.00% [kernel.vmlinux] [k] apic_timer_interrupt
[kernel.vmlinux] [k] idle_cpu
[kernel.vmlinux] [k] intel_pstate_timer_func
[kernel.vmlinux] [k] native_read_msr_safe
0.00% [kernel.vmlinux] [k] native_read_tsc
0.00% [kernel.vmlinux] [k] native_write_msr_safe
[kernel.vmlinux] [k] ntp_tick_length
0.00% [kernel.vmlinux] [k] rb_erase
0.00% [kernel.vmlinux] [k] tick_sched_timer
0.00% [kernel.vmlinux] [k] unmap_single_vma
0.00% [kernel.vmlinux] [k] update_wall_time
0.00% example [.] f1
46.24% -0.08% example [.] f2
53.71% +0.11% example [.] f3
0.02% example [.] main
Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1423460384-11645-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-09 05:39:44 +00:00

If no parameters are passed the samples will be sorted by dso and symbol.
As the perf.data files could come from different binaries, symbol addresses
can vary between them. perf diff therefore matches entries by file (dso)
name and symbol name rather than by address.
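
The name-based matching described above can be sketched in a few lines of Python. This is only an illustration, not perf's implementation: the function name `diff_by_name` and the dictionary layout are hypothetical, and the percentages are illustrative values chosen to resemble the f2/f3 example. Entries present only in the new file get an empty baseline, and entries present only in the baseline get an empty delta.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (dso name, symbol name); addresses are deliberately not used


def diff_by_name(baseline: Dict[Key, float],
                 new: Dict[Key, float]) -> Dict[Key, Tuple[Optional[float], Optional[float]]]:
    """Pair entries by (dso, symbol) name and return (baseline %, delta) per key."""
    rows = {}
    for key in sorted(set(baseline) | set(new)):
        base = baseline.get(key)
        cur = new.get(key)
        if cur is None:
            delta = None                    # only in baseline: delta column stays empty
        elif base is None:
            delta = round(cur, 2)           # only in new file: baseline column stays empty
        else:
            delta = round(cur - base, 2)    # matched pair: new % minus baseline %
        rows[key] = (base, delta)
    return rows


# Illustrative numbers only (hypothetical profiles of two builds of "example").
old_profile = {("example", "f2"): 46.24, ("example", "f3"): 53.71}
new_profile = {("example", "f2"): 46.16, ("example", "f3"): 53.82}
rows = diff_by_name(old_profile, new_profile)
```

Because the key contains no address, moving f2 and f3 around inside the binary does not change which entries are paired.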

OPTIONS
-------
-D::
--dump-raw-trace::
	Dump raw trace in ASCII.

--kallsyms=<file>::
	kallsyms pathname

-m::
--modules::
	Load module symbols. WARNING: use only with -k and LIVE kernel

perf diff: Use perf_session__fprintf_hists just like 'perf record'
That means that almost everything you can do with 'perf report'
can be done with 'perf diff', for instance:
$ perf record -f find / > /dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.062 MB perf.data (~2699 samples) ]
$ perf record -f find / > /dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.062 MB perf.data (~2687 samples) ]
$ perf diff | head -8
9.02% +1.00% find libc-2.10.1.so [.] _IO_vfprintf_internal
2.91% -1.00% find [kernel] [k] __kmalloc
2.85% -1.00% find [kernel] [k] ext4_htree_store_dirent
1.99% -1.00% find [kernel] [k] _atomic_dec_and_lock
2.44% find [kernel] [k] half_md4_transform
$
So if you want to zoom into libc:
$ perf diff --dsos libc-2.10.1.so | head -8
37.34% find [.] _IO_vfprintf_internal
10.34% find [.] __GI_memmove
8.25% +2.00% find [.] _int_malloc
5.07% -1.00% find [.] __GI_mempcpy
7.62% +2.00% find [.] _int_free
$
And if there were multiple commands using libc, it is also
possible to aggregate them all by using --sort symbol:
$ perf diff --dsos libc-2.10.1.so --sort symbol | head -8
37.34% [.] _IO_vfprintf_internal
10.34% [.] __GI_memmove
8.25% +2.00% [.] _int_malloc
5.07% -1.00% [.] __GI_mempcpy
7.62% +2.00% [.] _int_free
$
The displacement column now is off by default, to use it:
perf diff -m --dsos libc-2.10.1.so --sort symbol | head -8
37.34% [.] _IO_vfprintf_internal
10.34% [.] __GI_memmove
8.25% +2.00% [.] _int_malloc
5.07% -1.00% +2 [.] __GI_mempcpy
7.62% +2.00% -1 [.] _int_free
$
Using -t/--field-separator can be used for scripting:
$ perf diff -t, -m --dsos libc-2.10.1.so --sort symbol | head -8
37.34, , ,[.] _IO_vfprintf_internal
10.34, , ,[.] __GI_memmove
8.25,+2.00%, ,[.] _int_malloc
5.07,-1.00%, +2,[.] __GI_mempcpy
7.62,+2.00%, -1,[.] _int_free
6.99,+1.00%, -1,[.] _IO_new_file_xsputn
1.89,-2.00%, +4,[.] __readdir64
$
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1260978567-550-1-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-16 15:49:27 +00:00

-d::
--dsos=::
	Only consider symbols in these dsos. CSV that understands
	file://filename entries.  This option will affect the percentage
	of the Baseline/Delta column.  See --percentage for more info.

-C::
--comms=::
	Only consider symbols in these comms. CSV that understands
	file://filename entries.  This option will affect the percentage
	of the Baseline/Delta column.  See --percentage for more info.

-S::
--symbols=::
	Only consider these symbols. CSV that understands
	file://filename entries.  This option will affect the percentage
	of the Baseline/Delta column.  See --percentage for more info.

-s::
--sort=::
	Sort by key(s): pid, comm, dso, symbol, cpu, parent, srcline.
	Please see description of --sort in the perf-report man page.

-t::
--field-separator=::
	Use a special separator character and don't pad with spaces, replacing
	all occurrences of this separator in symbol names (and other output)
	with a '.' character, which is therefore the only invalid separator.
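
Since any occurrence of the separator inside a field is replaced with '.', a plain split on the separator is enough to recover the columns. The following Python sketch assumes the four-column layout (baseline, delta, displacement, symbol) of the sample output quoted in the commit message earlier on this page; the column set may differ between perf versions, and the helper names `split_fields` and `to_percent` are hypothetical.

```python
def split_fields(line, sep=","):
    """Split one 'perf diff -t<sep>' output line into whitespace-stripped fields."""
    return [field.strip() for field in line.split(sep)]


def to_percent(field):
    """Convert '8.25', '+2.00%' or an empty field into a float or None."""
    text = field.rstrip("%")
    return float(text) if text else None


# Sample line copied from the quoted output; the layout is an assumption.
sample = "8.25,+2.00%, ,[.] _int_malloc"
baseline, delta, displacement, symbol = split_fields(sample)
```

Here `to_percent(baseline)` and `to_percent(delta)` give numeric values that a script can sort or threshold, while an empty column maps to `None`.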

-v::
--verbose::
	Be verbose, for instance, show the raw counts in addition to the
	diff.

-q::
--quiet::
	Do not show any message.  (Suppress -v)

-f::
--force::
	Don't do ownership validation.

--symfs=<directory>::
	Look for files with symbols relative to this directory.

-b::
--baseline-only::
	Show only items with match in baseline.

-c::
--compute::
	Differential computation selection - delta, ratio, wdiff, cycles,
	delta-abs (default is delta-abs). Default can be changed using
	diff.compute config option.  See COMPARISON METHODS section for
	more info.

--cycles-hist::
	Report a histogram and the standard deviation for cycles data.
	It can help us to judge if the reported cycles data is noisy or
	not. This option should be used with '-c cycles'.

-p::
--period::
	Show period values for both compared hist entries.

-F::
--formula::
	Show formula for given computation.

-o::
--order::
	Specify compute sorting column number.  0 means sorting by baseline
	overhead and 1 (default) means sorting by computed value of column 1
	(data from the first file other than the baseline). Values more than 1
	can be used only if enough data files are provided.
	The default value can be set using the diff.order config option.

--percentage::
	Determine how to display the overhead percentage of filtered entries.
	Filters can be applied by --comms, --dsos and/or --symbols options.

	"relative" means it's relative to filtered entries only so that the
	sum of shown entries will be always 100%.  "absolute" means it retains
	the original value before and after the filter is applied.
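
The difference between the two modes comes down to which total is used as the denominator. A minimal Python sketch, assuming each entry carries a raw period count; the function `percentages` and the sample periods are hypothetical and only illustrate the arithmetic:

```python
def percentages(periods, keep, mode="relative"):
    """Return the overhead percentage of the entries that pass the filter.

    periods: mapping of entry name -> raw period count (all entries)
    keep:    names that survive the --comms/--dsos/--symbols filter
    mode:    "relative" divides by the filtered total,
             "absolute" divides by the unfiltered total
    """
    shown = {name: p for name, p in periods.items() if name in keep}
    denom = sum(shown.values()) if mode == "relative" else sum(periods.values())
    if denom == 0:
        return {}
    return {name: 100.0 * p / denom for name, p in shown.items()}


# Made-up period counts for three dsos, filtered down to two of them.
periods = {"libc": 50, "kernel": 30, "find": 20}
relative = percentages(periods, {"libc", "find"}, "relative")
absolute = percentages(periods, {"libc", "find"}, "absolute")
```

With these numbers the relative values add up to 100%, while the absolute values stay at 50% and 20%, exactly what they were before filtering.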
perf diff: Support --time filter option
To improve 'perf diff', implement a --time filter option to diff the
samples within given time window.
It supports time percent with multiple time ranges. The time string
format is 'a%/n,b%/m,...' or 'a%-b%,c%-d%,...'.
For example:
Select the second 10% time slice to diff:
perf diff --time 10%/2
Select from 0% to 10% time slice to diff:
perf diff --time 0%-10%
Select the first and the second 10% time slices to diff:
perf diff --time 10%/1,10%/2
Select from 0% to 10% and 30% to 40% slices to diff:
perf diff --time 0%-10%,30%-40%
It also supports analysing samples within a given time window
<start>,<stop>.
Times have the format seconds.microseconds.
If 'start' is not given (i.e., time string is ',x.y') then analysis starts at
the beginning of the file.
If the stop time is not given (i.e, time string is 'x.y,') then analysis
goes to end of file.
Time string is 'a1.b1,c1.d1:a2.b2,c2.d2'. Use ':' to separate timestamps for
different perf.data files.
For example, we get the timestamp information from perf script.
perf script -i perf.data.old
mgen 13940 [000] 3946.361400: ...
perf script -i perf.data
mgen 13940 [000] 3971.150589 ...
perf diff --time 3946.361400,:3971.150589,
It analyzes the perf.data.old from the timestamp 3946.361400 to the end of
perf.data.old and analyzes the perf.data from the timestamp 3971.150589 to the
end of perf.data.
v4:
---
Update abstime_str_dup(), let it return error if strdup
is failed, and update __cmd_diff() accordingly.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1551791143-10334-2-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-05 13:05:41 +00:00

--time::
	Analyze samples within given time window. It supports time
	percent with multiple time ranges. Time string is 'a%/n,b%/m,...'
	or 'a%-b%,c%-d%,...'.

	For example:

	Select the second 10% time slice to diff:

	  perf diff --time 10%/2

	Select from 0% to 10% time slice to diff:

	  perf diff --time 0%-10%

	Select the first and the second 10% time slices to diff:

	  perf diff --time 10%/1,10%/2

	Select from 0% to 10% and 30% to 40% slices to diff:

	  perf diff --time 0%-10%,30%-40%
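
To make the two percent notations concrete, the Python sketch below converts a time string into (start, end) percentages of the recording. It reflects one reading of the examples above ('a%/n' as the n-th slice of size a%) and is not perf's parser; the function name `percent_ranges` is hypothetical.

```python
def percent_ranges(spec):
    """Translate 'a%/n,...' or 'a%-b%,...' into a list of (start %, end %) tuples."""
    ranges = []
    for part in spec.split(","):
        if "/" in part:
            # 'a%/n': the n-th slice of width a%, counting from 1
            size_text, index_text = part.split("/")
            size = float(size_text.rstrip("%"))
            index = int(index_text)
            ranges.append(((index - 1) * size, index * size))
        else:
            # 'a%-b%': an explicit range from a% to b%
            start_text, end_text = part.split("-")
            ranges.append((float(start_text.rstrip("%")),
                           float(end_text.rstrip("%"))))
    return ranges
```

For instance, `percent_ranges("10%/2")` yields the 10% to 20% window, and `percent_ranges("0%-10%,30%-40%")` yields two separate windows.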

	It also supports analyzing samples within a given time window
	<start>,<stop>. Times have the format seconds.nanoseconds. If 'start'
	is not given (i.e. time string is ',x.y') then analysis starts at
	the beginning of the file. If stop time is not given (i.e. time
	string is 'x.y,') then analysis goes to the end of the file.
	Multiple ranges can be separated by spaces, which requires the argument
	to be quoted e.g. --time "1234.567,1234.789 1235,"
	Time string is 'a1.b1,c1.d1:a2.b2,c2.d2'. Use ':' to separate timestamps
	for different perf.data files.

	For example, we get the timestamp information from 'perf script'.

	  perf script -i perf.data.old
	    mgen 13940 [000]  3946.361400: ...

	  perf script -i perf.data
	    mgen 13940 [000]  3971.150589 ...

	  perf diff --time 3946.361400,:3971.150589,

	It analyzes the perf.data.old from the timestamp 3946.361400 to
	the end of perf.data.old and analyzes the perf.data from the
	timestamp 3971.150589 to the end of perf.data.
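
The per-file form of the time string can be taken apart as follows. This Python sketch is an illustration only: `split_time_spec` is a hypothetical helper, it handles a single <start>,<stop> window per file, and it does not cover several space-separated ranges for one file.

```python
def split_time_spec(spec):
    """Split 'start,stop:start,stop' into one (start, stop) pair per perf.data file.

    A missing start or stop is returned as None, meaning "from the beginning
    of the file" or "to the end of the file" respectively.
    """
    windows = []
    for per_file in spec.split(":"):
        start, _, stop = per_file.partition(",")
        windows.append((float(start) if start else None,
                        float(stop) if stop else None))
    return windows


# The time string from the example above: one open-ended window per file.
windows = split_time_spec("3946.361400,:3971.150589,")
```

The first tuple applies to the baseline file and the second to the next data file, in command-line order.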

--cpu::
	Only diff samples for the list of CPUs provided. Multiple CPUs can
	be provided as a comma-separated list with no space: 0,1. Ranges of
	CPUs are specified with -: 0-2. Default is to report samples on all
	CPUs.
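
The CPU list syntax is simple enough to expand by hand. A small Python sketch, assuming only the two forms named above (single CPUs and 'first-last' ranges); `parse_cpu_list` is a hypothetical name, not a perf function:

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as '0,1' or '0-2,5' into a sorted list of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return sorted(cpus)
```

For example, `parse_cpu_list("0-2")` expands to CPUs 0, 1 and 2.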

--pid=::
	Only diff samples for given process ID (comma separated list).

--tid=::
	Only diff samples for given thread ID (comma separated list).

perf diff: Support hot streams comparison
This patch enables perf-diff with "--stream" option.
"--stream": Enable hot streams comparison
Now let's see example.
perf record -b ... Generate perf.data.old with branch data
perf record -b ... Generate perf.data with branch data
perf diff --stream
[ Matched hot streams ]
hot chain pair 1:
cycles: 1, hits: 27.77% cycles: 1, hits: 9.24%
--------------------------- --------------------------
main div.c:39 main div.c:39
main div.c:44 main div.c:44
hot chain pair 2:
cycles: 34, hits: 20.06% cycles: 27, hits: 16.98%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
__random_r random_r.c:357 __random_r random_r.c:357
__random random.c:293 __random random.c:293
__random random.c:293 __random random.c:293
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:288 __random random.c:288
rand rand.c:27 rand rand.c:27
rand rand.c:26 rand rand.c:26
rand@plt rand@plt
rand@plt rand@plt
compute_flag div.c:25 compute_flag div.c:25
compute_flag div.c:22 compute_flag div.c:22
main div.c:40 main div.c:40
main div.c:40 main div.c:40
main div.c:39 main div.c:39
hot chain pair 3:
cycles: 9, hits: 4.48% cycles: 6, hits: 4.51%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
[ Hot streams in old perf data only ]
hot chain 1:
cycles: 18, hits: 6.75%
--------------------------
__random_r random_r.c:360
__random_r random_r.c:388
__random_r random_r.c:388
__random_r random_r.c:380
__random_r random_r.c:357
__random random.c:293
__random random.c:293
__random random.c:291
__random random.c:291
__random random.c:291
__random random.c:288
rand rand.c:27
rand rand.c:26
rand@plt
rand@plt
compute_flag div.c:25
compute_flag div.c:22
main div.c:40
hot chain 2:
cycles: 29, hits: 2.78%
--------------------------
compute_flag div.c:22
main div.c:40
main div.c:40
main div.c:39
[ Hot streams in new perf data only ]
hot chain 1:
cycles: 4, hits: 4.54%
--------------------------
main div.c:42
compute_flag div.c:28
hot chain 2:
cycles: 5, hits: 3.51%
--------------------------
main div.c:39
main div.c:44
main div.c:42
compute_flag div.c:28
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20201009022845.13141-8-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-09 02:28:45 +00:00
|
|
|
--stream::
|
|
|
|
Enable hot streams comparison. A stream can be a callchain which is
|
|
|
|
aggregated from the branch records in samples.
|
|
|
|
|
2012-10-24 12:56:51 +00:00
|
|
|
COMPARISON
|
|
|
|
----------
|
|
|
|
The comparison is governed by the baseline file. The baseline perf.data
|
|
|
|
file is iterated for samples. All other perf.data files specified on
|
|
|
|
the command line are searched for the baseline sample pair. If the pair
|
|
|
|
is found, the specified computation is made and the result is displayed.
|
|
|
|
|
|
|
|
All samples from non-baseline perf.data files, that do not match any
|
|
|
|
baseline entry, are displayed with empty space within the baseline column
|
|
|
|
and possible computation results (delta) in their related column.
|
|
|
|
|
|
|
|
Example file samples:
|
|
|
|
- file A with samples f1, f2, f3, f4, f6
|
|
|
|
- file B with samples f2, f4, f5
|
|
|
|
- file C with samples f1, f2, f5
|
|
|
|
|
|
|
|
Example output:
|
|
|
|
x - computation takes place for pair
|
|
|
|
b - baseline sample percentage
|
|
|
|
|
|
|
|
- perf diff A B C
|
|
|
|
|
|
|
|
baseline/A compute/B compute/C samples
|
|
|
|
---------------------------------------
|
|
|
|
b x f1
|
|
|
|
b x x f2
|
|
|
|
b f3
|
|
|
|
b x f4
|
|
|
|
b f6
|
|
|
|
x x f5
|
|
|
|
|
|
|
|
- perf diff B A C
|
|
|
|
|
|
|
|
baseline/B compute/A compute/C samples
|
|
|
|
---------------------------------------
|
|
|
|
b x x f2
|
|
|
|
b x f4
|
|
|
|
b x f5
|
|
|
|
x x f1
|
|
|
|
x f3
|
|
|
|
x f6
|
|
|
|
|
|
|
|
- perf diff C B A
|
|
|
|
|
|
|
|
baseline/C compute/B compute/A samples
|
|
|
|
---------------------------------------
|
|
|
|
b x f1
|
|
|
|
b x x f2
|
|
|
|
b x f5
|
|
|
|
x f3
|
|
|
|
x x f4
|
|
|
|
x f6
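The matching rule illustrated by the three tables can be sketched in Python. This is an illustration only, not perf's implementation: the function name `diff_matrix` and the use of plain lists of sample names are assumptions made for the example. Baseline entries are listed first with a `b` mark, and samples that exist only in non-baseline files follow with an empty baseline column.

```python
def diff_matrix(baseline, others):
    """Return rows of (marks, sample) mimicking the example tables.

    baseline: list of sample names found in the baseline file
    others:   list of lists, one per non-baseline file, in command-line order
    """
    rows = []
    # Baseline entries come first: 'b' plus 'x' wherever a pair is found.
    for sample in baseline:
        marks = ["b"] + ["x" if sample in other else "" for other in others]
        rows.append((marks, sample))
    # Samples present only in non-baseline files get an empty baseline column.
    extra = sorted({s for other in others for s in other} - set(baseline))
    for sample in extra:
        marks = [""] + ["x" if sample in other else "" for other in others]
        rows.append((marks, sample))
    return rows


A = ["f1", "f2", "f3", "f4", "f6"]
B = ["f2", "f4", "f5"]
C = ["f1", "f2", "f5"]

# Equivalent of 'perf diff A B C': A is the baseline.
for marks, sample in diff_matrix(A, [B, C]):
    print("  ".join(m.ljust(1) for m in marks), sample)
```

Swapping the argument order, for example `diff_matrix(B, [A, C])`, corresponds to choosing a different baseline file on the command line.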
|
|
|
|
|
2012-10-05 14:44:41 +00:00
|
|
|
COMPARISON METHODS
|
|
|
|
------------------
|
|
|
|
delta
|
|
|
|
~~~~~
|
|
|
|
If specified, the 'Delta' column is displayed with value 'd' computed as:
|
|
|
|
|
|
|
|
d = A->period_percent - B->period_percent
|
|
|
|
|
|
|
|
with:
|
2012-10-24 12:56:51 +00:00
|
|
|
- A/B being the matching hist entries from the data/baseline files specified
|
2012-10-05 14:44:41 +00:00
|
|
|
(or perf.data/perf.data.old) respectively.
|
|
|
|
|
|
|
|
- period_percent being the % of the hist entry period value within
|
|
|
|
single data file
|
|
|
|
|
2014-02-07 03:06:07 +00:00
|
|
|
- with filtering by -C, -d and/or -S, period_percent might be changed
|
|
|
|
relative to how entries are filtered. Use --percentage=absolute to
|
|
|
|
prevent such fluctuation.
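As a rough sketch of the delta computation, the Python below derives `period_percent` for each file and subtracts the baseline value from the data-file value. The function names and the period values are hypothetical and chosen only for illustration; this is not perf's code.

```python
def period_percent(periods):
    """Map each hist entry to its share (in %) of the file's total period."""
    total = sum(periods.values())
    return {name: 100.0 * p / total for name, p in periods.items()}


def delta(data, baseline):
    """d = A->period_percent - B->period_percent for entries in both files."""
    a = period_percent(data)      # A: data file (perf.data)
    b = period_percent(baseline)  # B: baseline file (perf.data.old)
    return {name: a[name] - b[name] for name in b if name in a}


# Hypothetical period values, invented for illustration.
baseline = {"main": 600, "rand": 400}
data = {"main": 500, "rand": 500}

for name, d in delta(data, baseline).items():
    print(f"{d:+7.2f}%  {name}")
```

Because each percentage is relative to the total of its own file, dropping entries through filtering changes the totals and therefore the deltas, which is the fluctuation that `--percentage=absolute` is meant to avoid.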
|
|
|
|
|
2017-02-10 07:36:11 +00:00
|
|
|
delta-abs
|
|
|
|
~~~~~~~~~
|
|
|
|
Same as the 'delta' method, but sorts the result by absolute value.
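A minimal Python sketch of sorting by absolute value follows. The delta values are invented for illustration, and the largest-first order is an assumption of this example rather than something stated above.

```python
# Hypothetical delta values (percentage points), invented for illustration.
deltas = {"main": -10.0, "rand": 2.5, "compute_flag": 7.5}

# Sort by absolute value, largest change first; the sign is kept for display.
by_abs = sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)

for name, d in by_abs:
    print(f"{d:+7.2f}%  {name}")
```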
|
|
|
|
|
2012-10-05 14:44:41 +00:00
|
|
|
ratio
|
|
|
|
~~~~~
|
|
|
|
If specified, the 'Ratio' column is displayed with value 'r' computed as:
|
|
|
|
|
|
|
|
r = A->period / B->period
|
|
|
|
|
|
|
|
with:
|
2012-10-24 12:56:51 +00:00
|
|
|
- A/B being the matching hist entries from the data/baseline files specified
|
2012-10-05 14:44:41 +00:00
|
|
|
(or perf.data/perf.data.old) respectively.
|
|
|
|
|
|
|
|
- period being the hist entry period value
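The ratio can be sketched in Python as below. The function name and the period values are hypothetical, and skipping entries with a zero baseline period is a choice made for this example only.

```python
def ratio(data, baseline):
    """r = A->period / B->period for entries present in both files."""
    return {
        name: data[name] / baseline[name]
        for name in baseline
        # Skip unmatched entries and avoid dividing by a zero period.
        if name in data and baseline[name] != 0
    }


# Hypothetical period values, invented for illustration.
baseline = {"main": 1000, "rand": 250, "old_only": 10}
data = {"main": 500, "rand": 500}

print(ratio(data, baseline))
```

Unlike delta, the ratio uses raw period values, so a value below 1 means the entry's period is smaller in the data file than in the baseline.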
|
|
|
|
|
2012-10-24 12:56:51 +00:00
|
|
|
wdiff:WEIGHT-B,WEIGHT-A
|
|
|
|
~~~~~~~~~~~~~~~~~~~~~~~
|
2012-10-05 14:44:43 +00:00
|
|
|
If specified, the 'Weighted diff' column is displayed with value 'd' computed as:
|
|
|
|
|
|
|
|
d = B->period * WEIGHT-A - A->period * WEIGHT-B
|
|
|
|
|
2012-10-24 12:56:51 +00:00
|
|
|
- A/B being the matching hist entries from the data/baseline files specified
|
2012-10-05 14:44:43 +00:00
|
|
|
(or perf.data/perf.data.old) respectively.
|
|
|
|
|
|
|
|
- period being the hist entry period value
|
|
|
|
|
2014-09-09 15:18:50 +00:00
|
|
|
- WEIGHT-A/WEIGHT-B being user supplied weights in the '-c' option
|
2012-10-05 14:44:43 +00:00
|
|
|
behind ':' separator like '-c wdiff:1,2'.
|
2014-09-09 15:18:50 +00:00
|
|
|
- WEIGHT-A being the weight of the data file
|
|
|
|
- WEIGHT-B being the weight of the baseline data file
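The weighted diff can be sketched in Python as follows. The function transcribes the formula exactly as it is written in this section; the function name, parameter names, and period values are hypothetical.

```python
def wdiff(a_period, b_period, weight_b, weight_a):
    """Weighted diff, transcribed from the formula in this section:

    d = B->period * WEIGHT-A - A->period * WEIGHT-B
    """
    return b_period * weight_a - a_period * weight_b


# '-c wdiff:1,2' supplies WEIGHT-B = 1 and WEIGHT-A = 2, in that order.
# The period values are invented for illustration.
d = wdiff(a_period=500, b_period=600, weight_b=1, weight_a=2)
print(d)
```

Note that the weights are given as `WEIGHT-B,WEIGHT-A`, so the first number after `wdiff:` applies to the baseline file.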
|
2012-10-05 14:44:41 +00:00
|
|
|
|
2019-06-28 09:23:04 +00:00
|
|
|
cycles
|
|
|
|
~~~~~~
|
|
|
|
If specified, the '[Program Block Range] Cycles Diff' column is displayed.
|
|
|
|
It displays the cycles difference of the same program basic block between
|
|
|
|
two perf.data files. The program basic block is the code between two branches.
|
|
|
|
|
|
|
|
'[Program Block Range]' indicates the range of a program basic block.
|
|
|
|
The source line is reported if it can be found; otherwise symbol+offset is used
|
|
|
|
instead.
|
|
|
|
|
2009-12-14 22:09:31 +00:00
|
|
|
SEE ALSO
|
|
|
|
--------
|
2014-03-04 00:06:42 +00:00
|
|
|
linkperf:perf-record[1], linkperf:perf-report[1]
|