Pull perf updates from Ingo Molnar:
"Main kernel side changes:
- Big reorganization of the x86 perf support code. The old code grew
organically deep inside arch/x86/kernel/cpu/perf* and its naming
became somewhat messy.
The new location is under arch/x86/events/, using the following
cleaner hierarchy of source code files:
perf/x86: Move perf_event.c .................. => x86/events/core.c
perf/x86: Move perf_event_amd.c .............. => x86/events/amd/core.c
perf/x86: Move perf_event_amd_ibs.c .......... => x86/events/amd/ibs.c
perf/x86: Move perf_event_amd_iommu.[ch] ..... => x86/events/amd/iommu.[ch]
perf/x86: Move perf_event_amd_uncore.c ....... => x86/events/amd/uncore.c
perf/x86: Move perf_event_intel_bts.c ........ => x86/events/intel/bts.c
perf/x86: Move perf_event_intel.c ............ => x86/events/intel/core.c
perf/x86: Move perf_event_intel_cqm.c ........ => x86/events/intel/cqm.c
perf/x86: Move perf_event_intel_cstate.c ..... => x86/events/intel/cstate.c
perf/x86: Move perf_event_intel_ds.c ......... => x86/events/intel/ds.c
perf/x86: Move perf_event_intel_lbr.c ........ => x86/events/intel/lbr.c
perf/x86: Move perf_event_intel_pt.[ch] ...... => x86/events/intel/pt.[ch]
perf/x86: Move perf_event_intel_rapl.c ....... => x86/events/intel/rapl.c
perf/x86: Move perf_event_intel_uncore.[ch] .. => x86/events/intel/uncore.[ch]
perf/x86: Move perf_event_intel_uncore_nhmex.c => x86/events/intel/uncore_nmhex.c
perf/x86: Move perf_event_intel_uncore_snb.c => x86/events/intel/uncore_snb.c
perf/x86: Move perf_event_intel_uncore_snbep.c => x86/events/intel/uncore_snbep.c
perf/x86: Move perf_event_knc.c .............. => x86/events/intel/knc.c
perf/x86: Move perf_event_p4.c ............... => x86/events/intel/p4.c
perf/x86: Move perf_event_p6.c ............... => x86/events/intel/p6.c
perf/x86: Move perf_event_msr.c .............. => x86/events/msr.c
(Borislav Petkov)
- Update various x86 PMU constraint and hw support details (Stephane
Eranian)
- Optimize kprobes for BPF execution (Martin KaFai Lau)
- Rewrite, refactor and fix the Intel uncore PMU driver code (Thomas
Gleixner)
- Rewrite, refactor and fix the Intel RAPL PMU code (Thomas Gleixner)
- Various fixes and smaller cleanups.
There are lots of perf tooling updates as well. A few highlights:
perf report/top:
- Hierarchy histogram mode for 'perf top' and 'perf report',
showing multiple levels, one per --sort entry: (Namhyung Kim)
On a mostly idle system:
# perf top --hierarchy -s comm,dso
Then expand some levels and use 'P' to take a snapshot:
# cat perf.hist.0
- 92.32% perf
58.20% perf
22.29% libc-2.22.so
5.97% [kernel]
4.18% libelf-0.165.so
1.69% [unknown]
- 4.71% qemu-system-x86
3.10% [kernel]
1.60% qemu-system-x86_64 (deleted)
+ 2.97% swapper
#
- Add 'L' hotkey to dynamicly set the percent threshold for
histogram entries and callchains, i.e. dynamicly do what the
--percent-limit command line option to 'top' and 'report' does.
(Namhyung Kim)
perf mem:
- Allow specifying events via -e in 'perf mem record', also listing
what events can be specified via 'perf mem record -e list' (Jiri
Olsa)
perf record:
- Add 'perf record' --all-user/--all-kernel options, so that one
can tell that all the events in the command line should be
restricted to the user or kernel levels (Jiri Olsa), i.e.:
perf record -e cycles:u,instructions:u
is equivalent to:
perf record --all-user -e cycles,instructions
- Make 'perf record' collect CPU cache info in the perf.data file header:
$ perf record usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
$ perf report --header-only -I | tail -10 | head -8
# CPU cache info:
# L1 Data 32K [0-1]
# L1 Instruction 32K [0-1]
# L1 Data 32K [2-3]
# L1 Instruction 32K [2-3]
# L2 Unified 256K [0-1]
# L2 Unified 256K [2-3]
# L3 Unified 4096K [0-3]
Will be used in 'perf c2c' and eventually in 'perf diff' to
allow, for instance running the same workload in multiple
machines and then when using 'diff' show the hardware difference.
(Jiri Olsa)
- Improved support for Java, using the JVMTI agent library to do
jitdumps that then will be inserted in synthesized
PERF_RECORD_MMAP2 events via 'perf inject' pointed to synthesized
ELF files stored in ~/.debug and keyed with build-ids, to allow
symbol resolution and even annotation with source line info, see
the changeset comments to see how to use it (Stephane Eranian)
perf script/trace:
- Decode data_src values (e.g. perf.data files generated by 'perf
mem record') in 'perf script': (Jiri Olsa)
# perf script
perf 693 [1] 4.088652: 1 cpu/mem-loads,ldlat=30/P: ffff88007d0b0f40 68100142 L1 hit|SNP None|TLB L1 or L2 hit|LCK No <SNIP>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Improve support to 'data_src', 'weight' and 'addr' fields in
'perf script' (Jiri Olsa)
- Handle empty print fmts in 'perf script -s' i.e. when running
python or perl scripts (Taeung Song)
perf stat:
- 'perf stat' now shows shadow metrics (insn per cycle, etc) in
interval mode too. E.g:
# perf stat -I 1000 -e instructions,cycles sleep 1
# time counts unit events
1.000215928 519,620 instructions # 0.69 insn per cycle
1.000215928 752,003 cycles
<SNIP>
- Port 'perf kvm stat' to PowerPC (Hemant Kumar)
- Implement CSV metrics output in 'perf stat' (Andi Kleen)
perf BPF support:
- Support converting data from bpf events in 'perf data' (Wang Nan)
- Print bpf-output events in 'perf script': (Wang Nan).
# perf record -e bpf-output/no-inherit,name=evt/ -e ./test_bpf_output_3.c/map:channel.event=evt/ usleep 1000
# perf script
usleep 4882 21384.532523: evt: ffffffff810e97d1 sys_nanosleep ([kernel.kallsyms])
BPF output: 0000: 52 61 69 73 65 20 61 20 Raise a
0008: 42 50 46 20 65 76 65 6e BPF even
0010: 74 21 00 00 t!..
BPF string: "Raise a BPF event!"
#
- Add API to set values of map entries in a BPF object, be it
individual map slots or ranges (Wang Nan)
- Introduce support for the 'bpf-output' event (Wang Nan)
- Add glue to read perf events in a BPF program (Wang Nan)
- Improve support for bpf-output events in 'perf trace' (Wang Nan)
... and tons of other changes as well - see the shortlog and git log
for details!"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (342 commits)
perf stat: Add --metric-only support for -A
perf stat: Implement --metric-only mode
perf stat: Document CSV format in manpage
perf hists browser: Check sort keys before hot key actions
perf hists browser: Allow thread filtering for comm sort key
perf tools: Add sort__has_comm variable
perf tools: Recalc total periods using top-level entries in hierarchy
perf tools: Remove nr_sort_keys field
perf hists browser: Cleanup hist_browser__fprintf_hierarchy_entry()
perf tools: Remove hist_entry->fmt field
perf tools: Fix command line filters in hierarchy mode
perf tools: Add more sort entry check functions
perf tools: Fix hist_entry__filter() for hierarchy
perf jitdump: Build only on supported archs
tools lib traceevent: Add '~' operation within arg_num_eval()
perf tools: Omit unnecessary cast in perf_pmu__parse_scale
perf tools: Pass perf_hpp_list all the way through setup_sort_list
perf tools: Fix perf script python database export crash
perf jitdump: DWARF is also needed
perf bench mem: Prepare the x86-64 build for upstream memcpy_mcsafe() changes
...
Pull turbostat updates for 4.6 from Len Brown.
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: bugfix: TDP MSRs print bits fixing
tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
tools/power turbostat: call __cpuid() instead of __get_cpuid()
tools/power turbostat: indicate SMX and SGX support
tools/power turbostat: detect and work around syscall jitter
tools/power turbostat: show GFX%rc6
tools/power turbostat: show GFXMHz
tools/power turbostat: show IRQs per CPU
tools/power turbostat: make fewer systems calls
tools/power turbostat: fix compiler warnings
tools/power turbostat: add --out option for saving output in a file
tools/power turbostat: re-name "%Busy" field to "Busy%"
tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
tools/power turbostat: allow sub-sec intervals
tools/power turbostat: Decode MSR_MISC_PWR_MGMT
tools/power turbostat: decode HWP registers
x86 msr-index: Simplify syntax for HWP fields
tools/power turbostat: CPUID(0x16) leaf shows base, max, and bus frequency
tools/power turbostat: decode more CPUID fields
MSR_CONFIG_TDP_NOMINAL:
should print all 8 bits of base_ratio (bit 0:7) 0xFF
MSR_CONFIG_TDP_LEVEL_1:
should print all 15 bits of PKG_MIN_PWR_LVL1 (bit 48:62) 0x7FFF
should print all 15 bits of PKG_MAX_PWR_LVL1 (bit 32:46) 0x7FFF
should print all 8 bits of LVL1_RATIO (bit 16:23) 0xFF
should print all 15 bits of PKG_TDP_LVL1 (bit 0:14) 0x7FFF
And the same modification to MSR_CONFIG_TDP_LEVEL_2.
MSR_TURBO_ACTIVATION_RATIO:
should print all 8 bits of MAX_NON_TURBO_RATIO (bit 0:7) 0xFF
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=0: unlimited)
should print as
MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=8: unlimited)
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat already checks whether calling each cpuid leavf is legal,
and it doesn't look at the function return value,
so call the simpler gcc intrinsic __cpuid() instead of __get_cpuid().
syntax only, no functional change
Signed-off-by: Len Brown <len.brown@intel.com>
The accuracy of Bzy_Mhz and Busy% depend on reading
the TSC, APERF, and MPERF close together in time.
When there is a very short measurement interval,
or a large system is profoundly idle, the changes
in APERF and MPERF may be very small.
They can be small enough that an expensive interrupt
between reading APERF and MPERF can cause the APERF/MPERF
ratio to become inaccurate, resulting in invalid
calculation and display of Bzy_MHz.
A dummy APERF read of APERF makes this problem
much more rare. Apparently this 1st systemn call
after exiting a long stretch of idle is when we
typically see expensive timer interrupts that cause
large jitter.
For the cases that dummy APERF read fails to prevent,
we compare the latency of the APERF and MPERF reads.
If they differ by more than 2x, we re-issue them.
Signed-off-by: Len Brown <len.brown@intel.com>
The column "GFX%c6" show the percentage of time the GPU
is in the "render C6" state, rc6. Deep package C-states on several
systems depend on the GPU being in RC6.
This information comes from the counter
/sys/class/drm/card0/power/rc6_residency_ms,
as read before and after the measurement interval.
Signed-off-by: Len Brown <len.brown@intel.com>
Under the column "GFXMHz", show a snapshot of this attribute:
/sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz
This is an instantaneous snapshot of what sysfs presents
at the end of the measurement interval. turbostat does
not average or otherwise perform any math on this value.
Signed-off-by: Len Brown <len.brown@intel.com>
The new IRQ column shows how many interrupts have occurred on each CPU
during the measurement inteval. This information comes from
the difference between /proc/interrupts shapshots made before
and after the measurement interval.
The first row, the system summary, shows the sum of the IRQS
for all CPUs during that interval.
Signed-off-by: Len Brown <len.brown@intel.com>
skip the open(2)/close(2) on each msr read
by keeping the /dev/cpu/*/msr files open.
The remaining read(2) is generally far fewer cycles
than the removed open(2) system call.
Signed-off-by: Len Brown <len.brown@intel.com>
By default...
Turbostat --debug gconfiguration info goes to stderr.
In FORK mode, turbostat statistics go to stderr.
In PERIODIC mode, turbostat statistics go to stdout.
These defaults do not change, but an option "--out file"
will send all output above only to the specified file.
Signed-off-by: Len Brown <len.brown@intel.com>
some tools processing turbostat output
have difficulty with items that begin with %...
Reported-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Following changes have been made:
- changed MSR_NHM_TURBO_RATIO_LIMIT to MSR_TURBO_RATIO_LIMIT in debug print
for consistency with Developer Manual
- updated definition of bitfields in MSR_TURBO_RATIO_LIMIT and appropriate
parsing code
- added x200 to list of architectures that do not support Nahlem compatible
definition of MSR_TURBO_RATIO_LIMIT register (x200 has the register but
bits definition is custom)
- fixed typo in code that parses MSR_TURBO_RATIO_LIMIT
(logical instead of bitwise operator)
- changed MSR_TURBO_RATIO_LIMIT parsing algorithm so the print out had the
same order as implementations for other platforms
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
x200 does not enable any way to programmatically obtain bus clock
speed. Bclk for the architecture has a fixed value of 100 MHz.
At the same time x200 cannot be included in has_snb_msrs since
it does not support C7 idle state.
prior to this patch, MHz values reported on this chip
were erroneously calculated using bclk of 133MHz,
causing MHz values to be reported 33% higher than actual.
Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat -i interval_sec
will sample and display statistics every interval_sec.
interval_sec used to be a whole number of seconds,
but now we accept a decimal, as small as 0.001 sec (1 ms).
Signed-off-by: Len Brown <len.brown@intel.com>
The topology test case of 'perf test' seems to be broken on my x86
system - due to the comparison of a "core-id" with # of CPUs online.
There are 8 online CPUs:
$ cat /sys/devices/system/cpu/online
0-7
but core-ids are not sequential and some core-ids exceed the number
of online CPUs.
$ cat /sys/devices/system/cpu/cpu?/topology/core_id
0
1
9
10
0
1
9
10
Looks like we can safely remove the check. Output before:
$ perf --version
perf version 4.4.rc1.g34258a
$ perf test -v topo
36: Test topology in session :
--- start ---
test child forked, pid 5906
templ file: /tmp/perf-test-vCwWG3
core_id number is too big.You may need to upgrade the perf tool.
test child interrupted
---- end ----
Test topology in session: FAILED!
and after:
$ perf test -v topo
36: Test topology in session :
--- start ---
test child forked, pid 6532
templ file: /tmp/perf-test-y10wFJ
CPU 0, core 0, socket 0
CPU 1, core 1, socket 0
CPU 2, core 9, socket 0
CPU 3, core 10, socket 0
CPU 4, core 0, socket 1
CPU 5, core 1, socket 1
CPU 6, core 9, socket 1
CPU 7, core 10, socket 1
test child finished with 0
---- end ----
Test topology in session: Ok
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Jan Stancek <jstancek@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/20151203233219.GA27696@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add metric only support for -A too. This requires a new print function
that prints the metrics in the right order.
v2: Fix manpage
v3: Simplify nrcpus computation
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1457049458-28956-7-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a new mode to only print metrics. Sometimes we don't care about the
raw values, just want the computed metrics. This allows more compact
printing, so with -I each sample is only a single line. This also
allows easier plotting and processing with other tools.
The main target is with using --topdown, but it also works with -T and
standard perf stat. A few metrics are not supported.
To avoiding having to hardcode all the metrics in the code it uses a two
pass approach: first compute dummy metrics and only print the headers in
the print_metric callback. Then use the callback to print the actual
values.
There are some additional changes in the stat printout code to handle
all metrics being on a single line.
One issue is that the column code doesn't know in advance what events
are not supported by the CPU, and it would be hard to find out as this
could change based on dynamic conditions. That causes empty columns in
some cases.
The output can be fairly wide, often you may need more than 80 columns.
Example:
% perf stat -a -I 1000 --metric-only
1.001452803 frontend cycles idle insn per cycle stalled cycles per insn branch-misses of all branches
1.001452803 158.91% 0.66 2.39 2.92%
2.002192321 180.63% 0.76 2.08 2.96%
3.003088282 150.59% 0.62 2.57 2.84%
4.004369835 196.20% 0.98 1.62 3.79%
5.005227314 231.98% 0.84 1.90 4.71%
v2: Lots of updates.
v3: Use slightly narrower columns
v4: Add comment
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1457049458-28956-6-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With all the recently added fields in the perf stat CSV output we should
finally document them in the man page. Do this here.
v2: Fix fields in documentation (Jiri)
v3: fix order of fields again (Jiri)
v4: Change order again.
v5: Document more fields (Jiri)
v6: Move time stamp first
v7: More fixes (Jiri)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1457049458-28956-5-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The context menu in TUI hists browser checks corresponding sort keys
when creating the menu item. But hotkey actions lacks these checks so
it can filter using incorrect info.
For example, default sort key of 'perf top' doesn't contain 'comm' or
'pid' sort key so each hist entry's thread info is not reliable. Thus
it should prohibit using thread filter on 't' key.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1457533253-21419-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The commit 2eafd410e6 ("perf hists browser: Only 'Zoom into thread'
only when sort order has 'pid'") disabled thread filtering in hist
browser for the default sort key. However the he->thread is still valid
even if 'pid' sort key is not given. Only thing it should not use is
the pid (or tid) of the thread. So allow to filter by thread when
'comm' sort key is given and show pid only if 'pid' sort key is given.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1457536490-24084-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The sort__has_comm variable is to check whether the comm sort key is
given. This is necessary to support thread filtering in the TUI hists
browser later.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1457533253-21419-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When hierarchy mode is enabled, each entry in a hierarchy level shares
the period. IOW an upper level entry's period is the sum of lower level
entries. Thus perf uses only one of them to calculate the total period
of hists. It was lowest-level (leaf) entries but it has a problem when
it comes to filters.
If a filter is applied, entries in the same level will be filtered or
not. But upper level entries still have period of their sum including
filtered one. So total sum of upper level entries will not be same as
sum of lower level entries.
This resulted in entries having more than 100% of overhead and it can be
produced using perf top with filter(s).
Reported-and-Tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1457531222-18130-8-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The nr_sort_keys field is to carry the number of sort entries in a
hpp_list or hists to determine the depth of indentation of a hist entry.
As it's only used in hierarchy mode and now we have used nr_hpp_node for
this reason, there's no need to keep it anymore. Let's get rid of it.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1457531222-18130-7-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The hist_browser__fprintf_hierarchy_entry() if to dump current output
into a file so it needs to be sync-ed with the corresponding function
hist_browser__show_hierarchy_entry(). So use hists->nr_hpp_node to
indent width and use first fmt_node to print overhead columns instead of
checking whether it's a sort entry (or dynamic entry).
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1457531222-18130-6-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It's not used anymore and the output format is accessed by the hpp_list
pointer instead when hierarchy is enabled. Let's get rid of it.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1457531222-18130-5-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When a command-line filter is applied in hierarchy mode, output is
broken especially when filtering on lower level. The higher level
entries doesn't show up so it's hard to see the results.
Also it needs to handle multi sort keys in a single hierarchy level.
Before:
$ perf report --hierarchy -s 'cpu,{dso,comm}' --comms swapper --stdio
...
# Overhead CPU / Shared Object+Command
# ........... ...........................
#
13.79% [kernel.vmlinux] swapper
31.71% 000
13.80% [kernel.vmlinux] swapper
0.43% [e1000e] swapper
11.89% [kernel.vmlinux] swapper
9.18% [kernel.vmlinux] swapper
After:
# Overhead CPU / Shared Object+Command
# ........... ...............................
#
33.09% 003
13.79% [kernel.vmlinux] swapper
31.71% 000
13.80% [kernel.vmlinux] swapper
0.43% [e1000e] swapper
21.90% 002
11.89% [kernel.vmlinux] swapper
13.30% 001
9.18% [kernel.vmlinux] swapper
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1457531222-18130-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Those functions are for checkinf if a given perf_hpp_fmt is a
filter-related sort entry. With hierarchy mode, it needs to check
filters on the hist entries with its own hpp format list.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1457531222-18130-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When hierarchy mode is enabled each output format is in a separate hpp
list. So when applying a filter it should check all formats in the
list. Currently it only checks a single ->fmt field which was not set
properly.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1457531222-18130-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Build jitdump only on architectures defined in util/genelf.h file, to avoid
breaking the build on such arches.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Davidlohr Bueso <dbueso@suse.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Mel Gorman <mgorman@suse.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20160310164113.GA11357@krava.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Setting TF prevents fastpath returns in most cases, which causes the
test to fail on 32-bit kernels because 32-bit kernels do not, in
fact, handle NT correctly on SYSENTER entries.
The next patch will fix 32-bit kernels.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/bd4bb48af6b10c0dc84aec6dbcf487ed25683495.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There's no need to use a const char pointer, we can used char pointer
from the beginning and omit the unnecessary cast.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160308184230.GB7897@krava.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pass perf_hpp_list all the way through setup_sort_list so that the sort
entry can be added on the arbitrary list.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20160309100417.GA30910@krava.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Remove the union in evsel so that the database id and priv pointer can
be used simultainously without conflicting and crashing.
Detailed Description for the fixed bug follows:
perf script crashes with a segmentation fault on user space tool version
4.5.rc7.ge2857b when using the python database export API. It works
properly in 4.4 and prior versions.
the crash fist appeared in:
cfc8874a48 ("perf script: Process cpu/threads maps")
How to reproduce the bug:
Remove any temporary files left over from a previous crash (if you have
already attemped to reproduce the bug):
$ rm -r test_db-perf-data
$ dropdb test_db
$ perf record timeout 1 yes >/dev/null
$ perf script -s scripts/python/export-to-postgresql.py test_db
Stack Trace:
Program received signal SIGSEGV, Segmentation fault.
__GI___libc_free (mem=0x1) at malloc.c:2929
2929 malloc.c: No such file or directory.
(gdb) bt
at util/stat.c:122
argv=<optimized out>, prefix=<optimized out>) at builtin-script.c:2231
argc=argc@entry=4, argv=argv@entry=0x7fffffffdf70) at perf.c:390
at perf.c:451
Signed-off-by: Chris Phlipot <cphlipot0@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: cfc8874a48 ("perf script: Process cpu/threads maps")
Link: http://lkml.kernel.org/r/1457500314-8912-1-git-send-email-cphlipot0@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
While building on a Docker container for ubuntu and installing package
by package one ends up with:
MKDIR /tmp/build/util/
CC /tmp/build/util/genelf.o
util/genelf.c:22:19: fatal error: dwarf.h: No such file or directory
#include <dwarf.h>
^
compilation terminated.
mv: cannot stat '/tmp/build/util/.genelf.o.tmp': No such file or directory
Because the jitdump code needs the DWARF related development packages to
be installed. So make it dependent on that so that the build can succeed
without jitdump support.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-le498robnmxd40237wej3w62@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When objtool discovers an issue, it's very common for it to flood the
terminal with a lot of duplicate warnings. For example:
warning: objtool: rtlwifi_rate_mapping()+0x2e7: frame pointer state mismatch
warning: objtool: rtlwifi_rate_mapping()+0x2f3: frame pointer state mismatch
warning: objtool: rtlwifi_rate_mapping()+0x2ff: frame pointer state mismatch
warning: objtool: rtlwifi_rate_mapping()+0x30b: frame pointer state mismatch
...
The first warning is usually all you need. Change it to only warn once
per function.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/c47f3ca38aa01e2a9b6601f9e38efd414c3f3c18.1457502970.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use hash tables for instruction and rela lookups (and keep the linked
lists around for sequential access).
Also cache the section struct for the "__func_stack_frame_non_standard"
section.
With this change, "objtool check net/wireless/nl80211.o" goes from:
real 0m1.168s
user 0m1.163s
sys 0m0.005s
to:
real 0m0.059s
user 0m0.042s
sys 0m0.017s
for a 20x speedup.
With the same object, it should be noted that the memory heap usage grew
from 8MB to 62MB. Reducing the memory usage is on the TODO list.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/dd0d8e1449506cfa7701b4e7ba73577077c44253.1457502970.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Copy hashtable.h from include/linux/tools.h. It's needed by objtool in
the next patch in the series.
Add some includes that it needs, and remove references to
kernel-specific features like RCU and __read_mostly.
Also change some if its dependency headers' includes to use quotes
instead of brackets so gcc can find them.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/be3bef72f6540d8a510515408119d968a0e18179.1457502970.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo reported [1] some false positive objtool warnings:
drivers/net/wireless/realtek/rtlwifi/base.o: warning: objtool: rtlwifi_rate_mapping()+0x2e7: frame pointer state mismatch
drivers/net/wireless/realtek/rtlwifi/base.o: warning: objtool: rtlwifi_rate_mapping()+0x2f3: frame pointer state mismatch
...
And so did the 0-day bot [2]:
drivers/gpu/drm/radeon/cik.o: warning: objtool: cik_tiling_mode_table_init()+0x6ce: call without frame pointer save/setup
drivers/gpu/drm/radeon/cik.o: warning: objtool: cik_tiling_mode_table_init()+0x72b: call without frame pointer save/setup
...
Both sets of warnings involve functions which have multiple switch
statements. When there's more than one switch statement in a function,
objtool interprets all the switch jump tables as a single table. If the
targets of one jump table assume a stack frame and the targets of
another one don't, it prints false positive warnings.
Fix the bug by detecting the size of each switch jump table. For
multiple tables, each one ends where the next one begins.
[1] https://lkml.kernel.org/r/20160308103716.GA9618@gmail.com
[2] https://lists.01.org/pipermail/kbuild-all/2016-March/018124.html
Reported-by: Ingo Molnar <mingo@kernel.org>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/2d7eecc6bc52d301f494b80f5fd62c2b6c895658.1457502970.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rename some list heads to distinguish them from hash node heads, which
are added later in the patch series.
Also rename the get_*() functions to add_*(), which is more descriptive:
they "add" data to the objtool_file struct.
Also rename rodata_rela and text_rela to be clearer:
- text_rela refers to a rela entry in .rela.text.
- rodata_rela refers to a rela entry in .rela.rodata.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/ee0eca2bba8482aa45758958c5586c00a7b71e62.1457502970.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The insns list is initialized twice, in cmd_check() and in
decode_instructions(). Remove the latter.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/be6e21d7eec1f072095d22a1cbe144057135e097.1457502970.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add some helper macros to make it easier to traverse instructions, and
to abstract the details of the instruction list implementation in
preparation for creating a hash structure.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/8e1715d5035bc02b4db28d0fccef6bb1170d1f12.1457502970.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With some configs [1], objtool prints a bunch of false positive warnings
like:
arch/x86/events/core.o: warning: objtool: x86_del_exclusive()+0x0: frame pointer state mismatch
For some reason this config has a bunch of sibling calls. When objtool
follows a sibling call jump, it attempts to compare the frame pointer
state. But it also accidentally compares the FENTRY state, resulting in
a false positive warning.
[1] https://lkml.kernel.org/r/20160308154909.GA20956@gmail.com
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/382de77ccaaa8cd79b27a155c3d109ebd4ce0219.1457502970.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
I don't _think_ dead_end_function() can get into a recursive loop, but
just in case, stop the loop and print a warning.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/ff489a63e6feb88abb192cfb361d81626dcf3e89.1457502970.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo reported an infinite loop in objtool with a certain randconfig [1].
With the given config, two functions in crypto/ablkcipher.o contained
sibling calls to each other, which threw the recursive call in
dead_end_function() for a loop (literally!).
Split the noreturn detection into two passes. In the first pass, check
for return instructions. In the second pass, do the potentially
recursive sibling call check. In most cases, the first pass will be
good enough. In the rare case where a second pass is needed, recursion
should hopefully no longer be possible.
[1] https://lkml.kernel.org/r/20160308154909.GA20956@gmail.com
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/16afb602640ef43b7782087d6cca17bf6fc13603.1457502970.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The following upcoming upstream commit:
92b0729c34 ("x86/mm, x86/mce: Add memcpy_mcsafe()")
Adds _ASM_EXTABLE_FAULT(), which is not available in user-space
and breaks the build.
We don't really need _ASM_EXTABLE_FAULT() in user-space, so simply
wrap it to nothing.
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Several cases of overlapping changes, as well as one instance
(vxlan) of a bug fix in 'net' overlapping with code movement
in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Now hpp formats are linked using perf_hpp_list_node when hierarchy is
enabled. Like in stdio, use this info to print entries with multiple
sort keys in a single hierarchy properly.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1457361308-514-8-git-send-email-namhyung@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now hpp formats are linked using perf_hpp_list_node when hierarchy is
enabled. Like in stdio, use this info to print entries with multiple
sort keys in a single hierarchy properly.
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1457361308-514-7-git-send-email-namhyung@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now hpp formats are linked using perf_hpp_list_node when hierarchy is
enabled. Use this info to print entries with multiple sort keys in a
single hierarchy properly.
For example, the below example shows using 4 sort keys with 2 levels.
$ perf report --hierarchy -s '{prev_pid,prev_comm},{next_pid,next_comm}' \
--percent-limit 1 -i perf.data.sched
...
# Overhead prev_pid+prev_comm / next_pid+next_comm
# ........... .......................................
#
22.36% 0 swapper/0
9.48% 17773 transmission-gt
5.25% 109 kworker/0:1H
1.53% 6524 Xephyr
21.39% 17773 transmission-gt
9.52% 0 swapper/0
9.04% 0 swapper/2
1.78% 0 swapper/3
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1457361308-514-6-git-send-email-namhyung@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When multiple sort keys are used in a single hierarchy, it should indent
using number of hierarchy levels instead of number of sort keys.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1457361308-514-5-git-send-email-namhyung@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This implements having multiple sort keys in a single hierarchy level.
Originally only single sort key is supported for each level, but now
using the group syntax with '{ }', it can set more than one sort key in
one level. Note that now it needs to quote in order to prevent shell
interpretation.
For example:
$ perf report --hierarchy -s '{comm,dso},sym'
...
# Overhead Command / Shared Object / Symbol
# .............. ..........................................
#
48.67% swapper [kernel.vmlinux]
34.42% [k] intel_idle
1.30% [k] __tick_nohz_idle_enter
1.03% [k] cpuidle_reflect
8.87% firefox libpthread-2.22.so
6.60% [.] __GI___libc_recvmsg
1.18% [.] pthread_cond_signal@@GLIBC_2.3.2
1.09% [.] 0x000000000000ff4b
6.11% Xorg libc-2.22.so
5.27% [.] __memcpy_sse2_unaligned
In the above example, the command name and the shared object name are
shown on the same line but the symbol name is on the different line.
Since the first two are grouped by '{}', they are in the same level.
Suggested-and-Tested=by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1457361308-514-4-git-send-email-namhyung@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now each hists has its own hpp lists in hierarchy. So instead of having
a pointer to a single perf_hpp_fmt in a hist entry, make it point the
hpp_list for its level. This will be used to support multiple sort keys
in a single hierarchy level.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1457361308-514-3-git-send-email-namhyung@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The perf_hpp__setup_hists_formats() is to build hists-specific output
formats (and sort keys). Currently it's only used in order to build the
output format in a hierarchy with same sort keys, but it could be used
with different sort keys in non-hierarchy mode later.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1457361308-514-2-git-send-email-namhyung@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
I'm surprised this remained undocumented since at least 2011. And it is
actually a very useful switch, as Steve and I came to realize recently.
Add the text from
2cba3ffb9a ("perf stat: Add -d -d and -d -d -d options to show more CPU events")
which added the incrementing aspect to -d.
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Davidlohr Bueso <dbueso@suse.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mel Gorman <mgorman@suse.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 2cba3ffb9a ("perf stat: Add -d -d and -d -d -d options to show more CPU events")
Link: http://lkml.kernel.org/r/1457347294-32546-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The level field is to distinguish levels in the hierarchy mode.
Currently each column (perf_hpp_fmt) has a different level.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1457103582-28396-2-git-send-email-namhyung@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit b9511cd761 ("perf/x86: Fix time_shift in perf_event_mmap_page")
altered the time conversion algorithms documented in the perf_event.h
header file, to use 64-bit shifts. That was done to make the code more
future-proof (i.e. some time in the future a 32-bit shift could be
allowed). Reflect those changes in perf tools.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1457005856-6143-9-git-send-email-adrian.hunter@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move clockid validation into jit_process() so it can later be made
conditional.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1457005856-6143-6-git-send-email-adrian.hunter@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation for moving clockid validation into jit_process().
Previously a return value of zero meant the processing had been done and
non-zero meant either the processing was not done (i.e. not the jitdump
file mmap event) or an error occurred.
Change it so that zero means the processing was not done, one means the
processing was done and successful, and negative values are an error.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1457005856-6143-5-git-send-email-adrian.hunter@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Some of the stubs are identical so just have one function for them.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1457005856-6143-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently, when injecting build ids, if there is AUX data then 'perf
inject' hits all DSOs because it is not known which DSOs the trace data
would hit.
That needs to be done for JIT injection also, and in fact there is no
reason to distinguish what kind of injection is being done. That is,
any time there is AUX data and the HEADER_BUID_ID feature flag is set,
and the AUX data is not being processed, then hit all DSOs. This patch
does that.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1457005856-6143-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The return type is not defined, so it defaults to int, however, the
function is not returning anything, so this is clearly not correct. Make
it a void function.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1457008214-14393-1-git-send-email-colin.king@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add the boiler-plate for a 'clear error' command based on section
9.20.7.6 "Function Index 4 - Clear Uncorrectable Error" from the ACPI
6.1 specification, and add a reference implementation in nfit_test.
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Simulate platform-firmware-initiated and asynchronous scrub results.
This injects poison in the middle of all nfit_test pmem address ranges.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The nvdimm unit test infrastructure performs its own initialization of
an acpi_nfit_desc to specify test overrides over the native
implementation. Make it clear which attributes and operations it is
overriding by re-using acpi_nfit_init_desc() as a common starting point.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The return value from an 'ndctl_fn' reports the command execution
status, i.e. was the command properly formatted and was it successfully
submitted to the bus provider. The new 'cmd_rc' parameter allows the bus
provider to communicate command specific results, translated into
common error codes.
Convert the ARS commands to this scheme to:
1/ Consolidate status reporting
2/ Prepare for for expanding ars unit test cases
3/ Make the implementation more generic
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
ACPI 6.1 clarifies that "The system shall include an NVDIMM Control
Region Structure for every Function Interface in the NVDIMM."
Implement this clarification in nfit_test.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
ACPI 6.1 and JEDEC Annex L Release 3 formalize the format interface
code. Add definitions and update their usage in the unit test.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When building with CONFIG_STACK_VALIDATION on a ppc64le host with an x86
cross-compiler, Stephen Rothwell saw the following objtool build errors:
DESCEND objtool
CC /home/sfr/next/x86_64_allmodconfig/tools/objtool/builtin-check.o
CC /home/sfr/next/x86_64_allmodconfig/tools/objtool/special.o
CC /home/sfr/next/x86_64_allmodconfig/tools/objtool/elf.o
CC /home/sfr/next/x86_64_allmodconfig/tools/objtool/objtool.o
MKDIR /home/sfr/next/x86_64_allmodconfig/tools/objtool/arch/x86/insn/
CC /home/sfr/next/x86_64_allmodconfig/tools/objtool/libstring.o
elf.c:22:23: fatal error: sys/types.h: No such file or directory
compilation terminated.
CC /home/sfr/next/x86_64_allmodconfig/tools/objtool/exec-cmd.o
CC /home/sfr/next/x86_64_allmodconfig/tools/objtool/help.o
builtin-check.c:28:20: fatal error: string.h: No such file or directory
compilation terminated.
objtool.c:28:19: fatal error: stdio.h: No such file or directory
compilation terminated.
It fails to build because it tries to compile objtool with the
cross-compiler instead of the host compiler.
Ensure that it always uses the host compiler by ignoring CROSS_COMPILE.
In order to do that properly, the libsubcmd.a library needs to be built
in tools/objtool/ rather than tools/lib/subcmd/. The latter directory
contains the cross-compiled version which is needed for perf and
possibly other tools.
Note that cross-compiling for x86 on a _big_ endian system would result
in a bunch of false positive objtool warnings during the kernel build
because it isn't endian-aware. But that's generally a rare edge case
and there haven't been any reports of anybody needing that.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/55b63eefc347f1bb28573f972d8d1adbf1f1c31d.1456962210.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When running objtool on a ppc64le host to analyze x86 binaries, it
reports a lot of false warnings like:
ipc/compat_mq.o: warning: objtool: compat_SyS_mq_open()+0x91: can't find jump dest instruction at .text+0x3a5
The warnings are caused by the x86 instruction decoder setting the wrong
value for the jump instruction's immediate field because it assumes that
"char == signed char", which isn't true for all architectures. When
converting char to int, gcc sign-extends on x86 but doesn't sign-extend
on ppc64le.
According to the gcc man page, that's a feature, not a bug:
> Each kind of machine has a default for what "char" should be. It is
> either like "unsigned char" by default or like "signed char" by
> default.
>
> Ideally, a portable program should always use "signed char" or
> "unsigned char" when it depends on the signedness of an object.
Conform to the "standards" by changing the "char" casts to "signed
char". This results in no actual changes to the object code on x86.
Note: the x86 decoder now lives in three different locations in the
kernel tree, which are all kept in sync via makefile checks and
warnings: in-kernel, perf, and objtool. This fixes all three locations.
Eventually we should probably try to at least converge the two separate
"tools" locations into a single shared location.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/9dd4161719b20e6def9564646d68bfbe498c549f.1456962210.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add an extra check for frontend stalled in the metrics. This avoids an
extra column for the --metric-only case when the CPU does not support
frontend stalled.
v2: Add separate init function
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1456858672-21594-8-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When building with gcc 6 we're getting various build warnings that just
require some trivial function declaration and call fixes:
turbostat.c: In function ‘dump_cstate_pstate_config_info’:
turbostat.c:1973:1: warning: type of ‘family’ defaults to ‘int’
dump_cstate_pstate_config_info(family, model)
turbostat.c:1973:1: warning: type of ‘model’ defaults to ‘int’
turbostat.c: In function ‘get_tdp’:
turbostat.c:2145:8: warning: type of ‘model’ defaults to ‘int’
double get_tdp(model)
turbostat.c: In function ‘perf_limit_reasons_probe’:
turbostat.c:2259:6: warning: type of ‘family’ defaults to ‘int’
void perf_limit_reasons_probe(family, model)
turbostat.c:2259:6: warning: type of ‘model’ defaults to ‘int’
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-wbicer8n0s9qe6ql8h9x478e@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The sa_flags field is not being initialized, so a garbage value is being
passed to sigaction. Initialize it to zero.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1456923322-29697-1-git-send-email-colin.king@canonical.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
That got broken by d3a72fd818 ("perf report: Fix indentation of
dynamic entries in hierarchy"), by using the evlist in setup_sorting()
without checking if it is NULL, as done in some 'perf test' entries:
$ find tools/ -name "*.c" | xargs grep 'setup_sorting(NULL);'
tools/perf/tests/hists_output.c: setup_sorting(NULL);
tools/perf/tests/hists_output.c: setup_sorting(NULL);
tools/perf/tests/hists_output.c: setup_sorting(NULL);
tools/perf/tests/hists_output.c: setup_sorting(NULL);
tools/perf/tests/hists_output.c: setup_sorting(NULL);
tools/perf/tests/hists_cumulate.c: setup_sorting(NULL);
tools/perf/tests/hists_cumulate.c: setup_sorting(NULL);
tools/perf/tests/hists_cumulate.c: setup_sorting(NULL);
tools/perf/tests/hists_cumulate.c: setup_sorting(NULL);
$
Fix it.
Before:
[root@jouet ~]# perf test
<SNIP>
15: Test matching and linking multiple hists : FAILED!
16: Try 'import perf' in python, checking link problems : Ok
17: Test breakpoint overflow signal handler : Ok
18: Test breakpoint overflow sampling : Ok
19: Test number of exit event of a simple workload : Ok
20: Test software clock events have valid period values : Ok
21: Test object code reading : Ok
22: Test sample parsing : Ok
23: Test using a dummy software event to keep tracking : Ok
24: Test parsing with no sample_id_all bit set : Ok
25: Test filtering hist entries : FAILED!
26: Test mmap thread lookup : Ok
27: Test thread mg sharing : Ok
28: Test output sorting of hist entries : FAILED!
29: Test cumulation of child hist entries : FAILED!
<SNIP>
After the patch the above failed tests complete successfully.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: d3a72fd818 ("perf report: Fix indentation of dynamic entries in hierarchy")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When a long value is read on 32 bit machines for 64 bit output, the
parsing needs to change "%lu" into "%llu", as the value is read
natively.
Unfortunately, if "%llu" is already there, the code will add another "l"
to it and fail to parse it properly.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20160209204237.337024613@goodmis.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Had a bug where on error of parsing __print_array() where the fields are
freed after they were allocated, but since they were not set to NULL,
the freeing of the arg also tried to free the already freed fields
causing a double free.
Fix process_hex() while at it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20160209204237.188327674@goodmis.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When rounding to microseconds, if the timestamp subsecond is between
.999999500 and .999999999, it is rounded to .1000000, when it should
instead increment the second counter due to the overflow.
For example, if the timestamp is 1234.999999501 instead of seeing:
1235.000000
we see:
1234.1000000
Signed-off-by: Chaos.Chen <rainboy1215@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20160209204236.824426460@goodmis.org
[ fixed incrementing "secs" instead of decrementing it ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'command_line' variable is free'd twice if db_export__branch_types()
fails. To avoid this, defer the free'ing of 'command_line' to after this
call so that the error return path will just free 'command_line' once.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1456875980-25606-1-git-send-email-colin.king@canonical.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The "man gcc" says .i extension represents the file is C source code
that should not be preprocessed. Here, .s should be used.
For clarification,
.c ---(preprocess)---> .i
.S ---(preprocess)---> .s
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Aaro Koskinen <aaro.koskinen@nokia.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Lukas Wunner <lukas@wunner.de>
Link: http://lkml.kernel.org/r/1454263140-19670-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now support CSV output for metrics. With the new output callbacks this
is relatively straight forward by creating new callbacks.
This allows to easily plot metrics from CSV files.
The new line callback needs to know the number of fields to skip them
correctly
Example output before:
% perf stat -x, true
0.200687,,task-clock,200687,100.00
0,,context-switches,200687,100.00
0,,cpu-migrations,200687,100.00
40,,page-faults,200687,100.00
730871,,cycles,203601,100.00
551056,,stalled-cycles-frontend,203601,100.00
<not supported>,,stalled-cycles-backend,0,100.00
385523,,instructions,203601,100.00
78028,,branches,203601,100.00
3946,,branch-misses,203601,100.00
After:
% perf stat -x, true
.502457,,task-clock,502457,100.00,0.485,CPUs utilized
0,,context-switches,502457,100.00,0.000,K/sec
0,,cpu-migrations,502457,100.00,0.000,K/sec
45,,page-faults,502457,100.00,0.090,M/sec
644692,,cycles,509102,100.00,1.283,GHz
423470,,stalled-cycles-frontend,509102,100.00,65.69,frontend cycles idle
<not supported>,,stalled-cycles-backend,0,100.00,,,,
492701,,instructions,509102,100.00,0.76,insn per cycle
,,,,,0.86,stalled cycles per insn
97767,,branches,509102,100.00,194.578,M/sec
4788,,branch-misses,509102,100.00,4.90,of all branches
or easier readable
$ perf stat -x, -o x.csv true
$ column -s, -t x.csv
0.490635 task-clock 490635 100.00 0.489 CPUs utilized
0 context-switches 490635 100.00 0.000 K/sec
0 cpu-migrations 490635 100.00 0.000 K/sec
45 page-faults 490635 100.00 0.092 M/sec
629080 cycles 497698 100.00 1.282 GHz
409498 stalled-cycles-frontend 497698 100.00 65.09 frontend cycles idle
<not supported> stalled-cycles-backend 0 100.00
491424 instructions 497698 100.00 0.78 insn per cycle
0.83 stalled cycles per insn
97278 branches 497698 100.00 198.270 M/sec
4569 branch-misses 497698 100.00 4.70 of all branches
Two new fields are added: metric value and metric name.
v2: Split out function argument changes
v3: Reenable metrics for real.
v4: Fix wrong hunk from refactoring.
v5: Remove extra "noise" printing (Jiri), but add it to the not counted case.
Print empty metrics for not counted.
v6: Avoid outputting metric on empty format.
v7: Print metric at the end
v8: Remove extra run, ena fields
v9: Avoid extra new line for unsupported counters
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1456785386-19481-3-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf_evlist__mmap_ex() can fail without setting errno (for example, fail
in condition checking. In this case all syscall is success).
If this happen, record__open() incorrectly returns 0. Force setting rc
is a quick way to avoid this problem, or we have to follow all possible
code path in perf_evlist__mmap_ex() to make sure there's at least one
system call before returning an error.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456479154-136027-30-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang <hekuang@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Move code for finalizing 'perf.data' to record__finish_output(). It will
be used by following commits to split output to multiple files.
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456479154-136027-23-git-send-email-wangnan0@huawei.com
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Create record__synthesize(). It can be used to create tracking events
for each perf.data after perf supporting splitting into multiple
outputs.
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456479154-136027-20-git-send-email-wangnan0@huawei.com
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commits in a BPF patchkit will extract kernel and module synthesizing
code into a separated function and call it multiple times. This patch
replace 'if (err < 0)' using WARN_ONCE, makes sure the error message
show one time.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456479154-136027-19-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
After babeltrace commit 5cec03e402aa ("ir: copy variants and sequences
when setting a field path"), 'perf data convert' gets incorrect result
if there's bpf output data. For example:
# perf data convert --to-ctf ./out.ctf
# babeltrace ./out.ctf
[10:44:31.186045346] (+?.?????????) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810E7DD1, perf_tid = 23819, perf_pid = 23819, perf_id = 518, raw_len = 3, raw_data = [ [0] = 0xC028E32F, [1] = 0x815D0100, [2] = 0x1000000 ] }
[10:44:31.286101003] (+0.100055657) evt: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF8105B609, perf_tid = 23819, perf_pid = 23819, perf_id = 518, raw_len = 3, raw_data = [ [0] = 0x35D9F1EB, [1] = 0x15D81, [2] = 0x2 ] }
The expected result of the first sample should be:
raw_data = [ [0] = 0x2FE328C0, [1] = 0x15D81, [2] = 0x1 ] }
however, 'perf data convert' output big endian value to resuling CTF
file.
The reason is a internal change (or a bug?) of babeltrace.
Before this patch, at the first add_bpf_output_values(), byte order of
all integer type is uncertain (is 0, neither 1234 (le) nor 4321 (be)).
It would be fixed by:
perf_evlist__deliver_sample
-> process_sample_event
-> ctf_stream
...
->bt_ctf_trace_add_stream_class
->bt_ctf_field_type_structure_set_byte_order
->bt_ctf_field_type_integer_set_byte_order
during creating the stream.
However, the babeltrace commit mentioned above duplicates types in
sequence to prevent potential conflict in following call stack and link
the newly allocated type into the 'raw_data' sequence:
perf_evlist__deliver_sample
-> process_sample_event
-> ctf_stream
...
-> bt_ctf_trace_add_stream_class
-> bt_ctf_stream_class_resolve_types
...
-> bt_ctf_field_type_sequence_copy
->bt_ctf_field_type_integer_copy
This happens before byte order setting, so only the newly allocated
type is initialized, the byte order of original type perf choose to
create the first raw_data is still uncertain.
Byte order in CTF output is not related to byte order in perf.data.
Setting it to anything other than BT_CTF_BYTE_ORDER_NATIVE solves this
problem (only BT_CTF_BYTE_ORDER_NATIVE needs to be fixed). To reduce
behavior changing, set byte order according to compiling options.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456479154-136027-10-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ingo reported regression on display format of big numbers, which is
missing separators (in default perf stat output).
triton:~/tip> perf stat -a sleep 1
...
127008602 cycles # 0.011 GHz
279538533 stalled-cycles-frontend # 220.09% frontend cycles idle
119213269 instructions # 0.94 insn per cycle
This is caused by recent change:
perf stat: Check existence of frontend/backed stalled cycles
that added call to pmu_have_event, that subsequently calls
perf_pmu__parse_scale, which has a bug in locale handling.
The lc string returned from setlocale, that we use to store old locale
value, may be allocated in static storage. Getting a dynamic copy to
make it survive another setlocale call.
$ perf stat ls
...
2,360,602 cycles # 3.080 GHz
2,703,090 instructions # 1.15 insn per cycle
546,031 branches # 712.511 M/sec
Committer note:
Since the patch introducing the regression didn't made to perf/core,
move it to just before where the regression was introduced, so that we
don't break bisection for this feature.
Reported-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20160303095348.GA24511@krava.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
virtio_ring currently sends the device (usually a hypervisor)
physical addresses of its I/O buffers. This is okay when DMA
addresses and physical addresses are the same thing, but this isn't
always the case. For example, this never works on Xen guests, and
it is likely to fail if a physical "virtio" device ever ends up
behind an IOMMU or swiotlb.
The immediate use case for me is to enable virtio on Xen guests.
For that to work, we need vring to support DMA address translation
as well as a corresponding change to virtio_pci or to another
driver.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Load up the non volatile FPU and VMX regs and ensure that they are the
expected value in a signal handler
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Loop in assembly checking the registers with many threads.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Test that the non volatile floating point and Altivec registers get
correctly preserved across the fork() syscall.
fork() works nicely for this purpose, the registers should be the same for
both parent and child
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
[mpe: Add include guards to basic_asm.h, minor formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>