Commit Graph

21709 Commits

Author SHA1 Message Date
Yonghong Song
de4e05cac4 bpf: Support bpf tracing/iter programs for BPF_LINK_CREATE
Given a bpf program, the step to create an anonymous bpf iterator is:
  - create a bpf_iter_link, which combines bpf program and the target.
    In the future, there could be more information recorded in the link.
    A link_fd will be returned to the user space.
  - create an anonymous bpf iterator with the given link_fd.

The bpf_iter_link can be pinned to bpffs mount file system to
create a file based bpf iterator as well.

The benefit to use of bpf_iter_link:
  - using bpf link simplifies design and implementation as bpf link
    is used for other tracing bpf programs.
  - for file based bpf iterator, bpf_iter_link provides a standard
    way to replace underlying bpf programs.
  - for both anonymous and free based iterators, bpf link query
    capability can be leveraged.

The patch added support of tracing/iter programs for BPF_LINK_CREATE.
A new link type BPF_LINK_TYPE_ITER is added to facilitate link
querying. Currently, only prog_id is needed, so there is no
additional in-kernel show_fdinfo() and fill_link_info() hook
is needed for BPF_LINK_TYPE_ITER link.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175901.2475084-1-yhs@fb.com
2020-05-09 17:05:26 -07:00
Yonghong Song
15d83c4d7c bpf: Allow loading of a bpf_iter program
A bpf_iter program is a tracing program with attach type
BPF_TRACE_ITER. The load attribute
  attach_btf_id
is used by the verifier against a particular kernel function,
which represents a target, e.g., __bpf_iter__bpf_map
for target bpf_map which is implemented later.

The program return value must be 0 or 1 for now.
  0 : successful, except potential seq_file buffer overflow
      which is handled by seq_file reader.
  1 : request to restart the same object

In the future, other return values may be used for filtering or
teminating the iterator.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175900.2474947-1-yhs@fb.com
2020-05-09 17:05:26 -07:00
Jiri Pirko
aa7431123f selftests: mlxsw: tc_restrictions: add couple of test for the correct matchall-flower ordering
Make sure that the drive restricts incorrect order of inserted matchall
vs. flower rules.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09 16:02:43 -07:00
Jiri Pirko
240fe73457 selftests: mlxsw: tc_restrictions: add test to check sample action restrictions
Check that matchall rules with sample actions are not possible to be
inserted to egress.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09 16:02:43 -07:00
Jiri Pirko
b886dea37b selftests: mlxsw: rename tc_flower_restrictions.sh to tc_restrictions.sh
The file is about to contain matchall restrictions too, so change the
name to make it more generic.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09 16:02:43 -07:00
Jens Axboe
873f1c8df7 Merge branch 'block-5.7' into for-5.8/block
Pull in block-5.7 fixes for 5.8. Mostly to resolve a conflict with
the blk-iocost changes, but we also need the base of the bdi
use-after-free as well as we build on top of it.

* block-5.7:
  nvme: fix possible hang when ns scanning fails during error recovery
  nvme-pci: fix "slimmer CQ head update"
  bdi: add a ->dev_name field to struct backing_dev_info
  bdi: use bdi_dev_name() to get device name
  bdi: move bdi_dev_name out of line
  vboxsf: don't use the source name in the bdi name
  iocost: protect iocg->abs_vdebt with iocg->waitq.lock
  block: remove the bd_openers checks in blk_drop_partitions
  nvme: prevent double free in nvme_alloc_ns() error handling
  null_blk: Cleanup zoned device initialization
  null_blk: Fix zoned command handling
  block: remove unused header
  blk-iocost: Fix error on iocost_ioc_vrate_adj
  bdev: Reduce time holding bd_mutex in sync in blkdev_close()
  buffer: remove useless comment and WB_REASON_FREE_MORE_MEM, reason.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:13:58 -06:00
Stanislav Fomichev
8086fbaf49 bpf: Allow any port in bpf_bind helper
We want to have a tighter control on what ports we bind to in
the BPF_CGROUP_INET{4,6}_CONNECT hooks even if it means
connect() becomes slightly more expensive. The expensive part
comes from the fact that we now need to call inet_csk_get_port()
that verifies that the port is not used and allocates an entry
in the hash table for it.

Since we can't rely on "snum || !bind_address_no_port" to prevent
us from calling POST_BIND hook anymore, let's add another bind flag
to indicate that the call site is BPF program.

v5:
* fix wrong AF_INET (should be AF_INET6) in the bpf program for v6

v3:
* More bpf_bind documentation refinements (Martin KaFai Lau)
* Add UDP tests as well (Martin KaFai Lau)
* Don't start the thread, just do socket+bind+listen (Martin KaFai Lau)

v2:
* Update documentation (Andrey Ignatov)
* Pass BIND_FORCE_ADDRESS_NO_PORT conditionally (Andrey Ignatov)

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200508174611.228805-5-sdf@google.com
2020-05-09 00:48:20 +02:00
Stanislav Fomichev
488a23b89d selftests/bpf: Move existing common networking parts into network_helpers
1. Move pkt_v4 and pkt_v6 into network_helpers and adjust the users.
2. Copy-paste spin_lock_thread into two tests that use it.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrey Ignatov <rdna@fb.com>
Link: https://lore.kernel.org/bpf/20200508174611.228805-3-sdf@google.com
2020-05-09 00:48:20 +02:00
Stanislav Fomichev
33181bb8e8 selftests/bpf: Generalize helpers to control background listener
Move the following routines that let us start a background listener
thread and connect to a server by fd to the test_prog:
* start_server - socket+bind+listen
* connect_to_fd - connect to the server identified by fd

These will be used in the next commit.

Also, extend these helpers to support AF_INET6 and accept the family
as an argument.

v5:
* drop pthread.h (Martin KaFai Lau)
* add SO_SNDTIMEO (Martin KaFai Lau)

v4:
* export extra helper to start server without a thread (Martin KaFai Lau)
* tcp_rtt is no longer starting background thread (Martin KaFai Lau)

v2:
* put helpers into network_helpers.c (Andrii Nakryiko)

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200508174611.228805-2-sdf@google.com
2020-05-09 00:48:20 +02:00
Zou Wei
7b0bf99b9e cpupower: Remove unneeded semicolon
Fixes coccicheck warnings:

tools/power/cpupower/utils/cpupower-info.c:65:2-3: Unneeded semicolon
tools/power/cpupower/utils/cpupower-set.c:75:2-3: Unneeded semicolon
tools/power/cpupower/utils/idle_monitor/amd_fam14h_idle.c:120:2-3: Unneeded semicolon
tools/power/cpupower/utils/idle_monitor/cpuidle_sysfs.c:175:2-3: Unneeded semicolon
tools/power/cpupower/utils/idle_monitor/cpuidle_sysfs.c:56:2-3: Unneeded semicolon
tools/power/cpupower/utils/idle_monitor/cpuidle_sysfs.c:75:2-3: Unneeded semicolon
tools/power/cpupower/utils/idle_monitor/hsw_ext_idle.c:82:2-3: Unneeded semicolon
tools/power/cpupower/utils/idle_monitor/nhm_idle.c:94:2-3: Unneeded semicolon
tools/power/cpupower/utils/idle_monitor/snb_idle.c:80:2-3: Unneeded semicolon

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-05-08 10:13:26 -06:00
Michael Ellerman
851c4df54d selftests/lkdtm: Use grep -E instead of egrep
shellcheck complains that egrep is deprecated, and the grep man page
agrees. Use grep -E instead.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-05-08 09:46:17 -06:00
Michael Ellerman
f131d9edc2 selftests/lkdtm: Don't clear dmesg when running tests
It is Very Rude to clear dmesg in test scripts. That's because the
script may be part of a larger test run, and clearing dmesg
potentially destroys the output of other tests.

We can avoid using dmesg -c by saving the content of dmesg before the
test, and then using diff to compare that to the dmesg afterward,
producing a log with just the added lines.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-05-08 09:45:50 -06:00
Po-Hsu Lin
adb571649c selftests/ftrace: mark irqsoff_tracer.tc test as unresolved if the test module does not exist
The UNRESOLVED state is much more apporiate than the UNSUPPORTED state
for the absence of the test module, as it matches "test was set up
incorrectly" situation in the README file.

A possible scenario is that the function was enabled (supported by the
kernel) but the module was not installed properly, in this case we
cannot call this as UNSUPPORTED.

This change also make it consistent with other module-related tests
in ftrace.

Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-05-08 09:43:30 -06:00
Gustavo A. R. Silva
d8238f9eb6 tools/testing: Replace zero-length array with flexible-array
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-05-08 09:42:14 -06:00
Linus Torvalds
af38553c66 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "14 fixes and one selftest to verify the ipc fixes herein"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: limit boost_watermark on small zones
  ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST
  mm/vmscan: remove unnecessary argument description of isolate_lru_pages()
  epoll: atomically remove wait entry on wake up
  kselftests: introduce new epoll60 testcase for catching lost wakeups
  percpu: make pcpu_alloc() aware of current gfp context
  mm/slub: fix incorrect interpretation of s->offset
  scripts/gdb: repair rb_first() and rb_last()
  eventpoll: fix missing wakeup for ovflist in ep_poll_callback
  arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()
  scripts/decodecode: fix trapping instruction formatting
  kernel/kcov.c: fix typos in kcov_remote_start documentation
  mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
  mm, memcg: fix error return value of mem_cgroup_css_alloc()
  ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()
2020-05-08 08:41:09 -07:00
John Stultz
4bb9d46d47 kselftests: dmabuf-heaps: Fix confused return value on expected error testing
When I added the expected error testing, I forgot I need to set
the return to zero when we successfully see an error.

Without this change we only end up testing a single heap
before the test quits.

Cc: Shuah Khan <shuah@kernel.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Cc: Brian Starkey <brian.starkey@arm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: "Andrew F. Davis" <afd@ti.com>
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-05-08 09:40:58 -06:00
Paolo Bonzini
45981dedf5 KVM: VMX: pass correct DR6 for GD userspace exit
When KVM_EXIT_DEBUG is raised for the disabled-breakpoints case (DR7.GD),
DR6 was incorrectly copied from the value in the VM.  Instead,
DR6.BD should be set in order to catch this case.

On AMD this does not need any special code because the processor triggers
a #DB exception that is intercepted.  However, the testcase would fail
without the previous patch because both DR6.BS and DR6.BD would be set.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-08 07:44:31 -04:00
Kees Cook
8d58f222e8 ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST
The documentation for UBSAN_ALIGNMENT already mentions that it should
not be used on all*config builds (and for efficient-unaligned-access
architectures), so just refactor the Kconfig to correctly implement this
so randconfigs will stop creating insane images that freak out objtool
under CONFIG_UBSAN_TRAP (due to the false positives producing functions
that never return, etc).

Link: http://lkml.kernel.org/r/202005011433.C42EA3E2D@keescook
Fixes: 0887a7ebc9 ("ubsan: add trap instrumentation option")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
  Link: https://lore.kernel.org/linux-next/202004231224.D6B3B650@keescook/
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07 19:27:21 -07:00
Roman Penyaev
474328c06e kselftests: introduce new epoll60 testcase for catching lost wakeups
This test case catches lost wake up introduced by commit 339ddb53d3
("fs/epoll: remove unnecessary wakeups of nested epoll")

The test is simple: we have 10 threads and 10 event fds.  Each thread
can harvest only 1 event.  1 producer fires all 10 events at once and
waits that all 10 events will be observed by 10 threads.

In case of lost wakeup epoll_wait() will timeout and 0 will be returned.

Test case catches two sort of problems: forgotten wakeup on event, which
hits the ->ovflist list, this problem was fixed by:

  5a2513239750 ("eventpoll: fix missing wakeup for ovflist in ep_poll_callback")

the other problem is when several sequential events hit the same waiting
thread, thus other waiters get no wakeups.  Problem is fixed in the
following patch.

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Khazhismel Kumykov <khazhy@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Heiher <r@hev.cc>
Cc: Jason Baron <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/20200430130326.1368509-1-rpenyaev@suse.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07 19:27:21 -07:00
Linus Torvalds
192ffb7515 This includes the following tracing fixes:
- Fix bootconfig causing kernels to fail with CONFIG_BLK_DEV_RAM enabled
  - Fix allocation leaks in bootconfig tool
  - Fix a double initialization of a variable
  - Fix API bootconfig usage from kprobe boot time events
  - Reject NULL location for kprobes
  - Fix crash caused by preempt delay module not cleaning up kthread
    correctly
  - Add vmalloc_sync_mappings() to prevent x86_64 page faults from
    recursively faulting from tracing page faults
  - Fix comment in gpu/trace kerneldoc header
  - Fix documentation of how to create a trace event class
  - Make the local tracing_snapshot_instance_cond() function static
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXrRUBhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qveTAP4iNCnMeS/Isb+MXQx2Pnu7OP+0BeRP
 2ahlKG2sBgEdnwEAoUzxQoYWtfC6xoM38YwLuZPRlcScRea/5CRHyW8BFQc=
 =o3pV
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix bootconfig causing kernels to fail with CONFIG_BLK_DEV_RAM
   enabled

 - Fix allocation leaks in bootconfig tool

 - Fix a double initialization of a variable

 - Fix API bootconfig usage from kprobe boot time events

 - Reject NULL location for kprobes

 - Fix crash caused by preempt delay module not cleaning up kthread
   correctly

 - Add vmalloc_sync_mappings() to prevent x86_64 page faults from
   recursively faulting from tracing page faults

 - Fix comment in gpu/trace kerneldoc header

 - Fix documentation of how to create a trace event class

 - Make the local tracing_snapshot_instance_cond() function static

* tag 'trace-v5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tools/bootconfig: Fix resource leak in apply_xbc()
  tracing: Make tracing_snapshot_instance_cond() static
  tracing: Fix doc mistakes in trace sample
  gpu/trace: Minor comment updates for gpu_mem_total tracepoint
  tracing: Add a vmalloc_sync_mappings() for safe measure
  tracing: Wait for preempt irq delay thread to finish
  tracing/kprobes: Reject new event if loc is NULL
  tracing/boottime: Fix kprobe event API usage
  tracing/kprobes: Fix a double initialization typo
  bootconfig: Fix to remove bootconfig data from initrd while boot
2020-05-07 15:27:11 -07:00
Linus Torvalds
9ecc4d775f linux-kselftest-5.7-rc5
This Kselftest update for Linux 5.7-rc5 consists of ftrace test fixes
 and fix to kvm Makefile for relocatable native/cross builds and installs.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl60SCYACgkQCwJExA0N
 QxzlTQ/+LWXBGbHJPPOT2UPjoa+f5JQgYePxTVbnnG+nG4SQcbHMBGMORwbz7S8e
 o/dc2H+jhfYNnTc2L/JoruJVT7E2CwehBKTZzqCu6AbBWdPNgQCmxFIh5Gos8JMb
 cj9J/7CEtnOAQF7IArGYlyd9gblRc8ACwvgVLRJbKcNFHV0EBRV2Kg5JOE9RaKPK
 xIA2GNb30cA3P4IFykc199zrOsfwf2Dr0Bf1wf3HkO1UnhrFIDsT7PRNBLGWWB+n
 c9Mi9fFmLEJr3yDL6B87Mt7+sUGRb7Jvvz6h8gIhJaE8IpbkD9nq0PhuuQNawFy4
 KS/FYFtRShKU/m+g+Nudg9PJR+vzpL/wR6kp/5Z1W5Us9ba200GZBtFT0x+NUP3n
 fNczO61iwAi4qesfCpKgDfGco754mw1+hUZAdRJkYY6tyZmjrPVpVKgOR8kN4qjP
 anUl1trQsDM5kaScG/6RmBhoAtaU91Mdhcw1/8YLBpoQUQEMxWtwlXdvKkJ3zjfE
 Gs0jBoThuCHAj3IMZmL3dPaa88IA904aelTW6OLnq8BmHEW2938Y/T1L47bHO5C+
 WZnFVzNgEqFTLHqX/MYzmoJ1y+A09fA3Owi6OmsvJ5ZY+LjEiKGXuBHj+VhbwJ4G
 COphD9CcIVx0gQ3Zct8feajloQcGu5v4/El3/3kPQkuZjzhxRq0=
 =jUcW
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
 "ftrace test fixes and a fix to kvm Makefile for relocatable
  native/cross builds and installs"

* tag 'linux-kselftest-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: fix kvm relocatable native/cross builds and installs
  selftests/ftrace: Make XFAIL green color
  ftrace/selftest: make unresolved cases cause failure if --fail-unresolved set
  ftrace/selftests: workaround cgroup RT scheduling issues
2020-05-07 15:22:08 -07:00
Yunfeng Ye
8842604446 tools/bootconfig: Fix resource leak in apply_xbc()
Fix the @data and @fd allocations that are leaked in the error path of
apply_xbc().

Link: http://lkml.kernel.org/r/583a49c9-c27a-931d-e6c2-6f63a4b18bea@huawei.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-07 14:18:27 -04:00
Paul E. McKenney
f736e0f1a5 Merge branches 'fixes.2020.04.27a', 'kfree_rcu.2020.04.27a', 'rcu-tasks.2020.04.27a', 'stall.2020.04.27a' and 'torture.2020.05.07a' into HEAD
fixes.2020.04.27a:  Miscellaneous fixes.
kfree_rcu.2020.04.27a:  Changes related to kfree_rcu().
rcu-tasks.2020.04.27a:  Addition of new RCU-tasks flavors.
stall.2020.04.27a:  RCU CPU stall-warning updates.
torture.2020.05.07a:  Torture-test updates.
2020-05-07 10:18:32 -07:00
Paul E. McKenney
04dbcdb42f torture: Add a --kasan argument
Make it a bit easier to apply KASAN to rcutorture runs with a new --kasan
argument, again leveraging the config_override_param() bash function.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-05-07 10:15:29 -07:00
Paul E. McKenney
409670aa26 torture: Save a few lines by using config_override_param initially
This commit saves a few lines of code by also using the bash
config_override_param() to set the initial list of Kconfig options from
the CFcommon file.  While in the area, it makes this function capable of
update-in-place on the file containing the cumulative Kconfig options,
thus avoiding annoying changes when adding another source of options.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-05-07 10:15:29 -07:00
Paul E. McKenney
5b6b4b69ad torture: Allow scenario-specific Kconfig options to override CFcommon
This commit applies config_override_param() to allow scenario-specific
Kconfig options to override those in CFcommon.  This in turn will allow
additional Kconfig options to be placed in CFcommon, for example, an
option common to all but a few scenario can be placed in CFcommon and
then overridden in those few scenarios.  Plus this change saves one
whole line of code.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-05-07 10:15:29 -07:00
Paul E. McKenney
3d17ded902 torture: Allow --kconfig options to override --kcsan defaults
Currently, attempting to override a --kcsan default with a --kconfig
option might or might not work.  However, it would be good to allow the
user to adjust the --kcsan defaults, for example, to specify a different
time for CONFIG_KCSAN_REPORT_ONCE_IN_MS.  This commit therefore uses the
new config_override_param() bash function to apply the --kcsan defaults
and then apply the --kconfig options, which allows this overriding
to occur.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-05-07 10:15:29 -07:00
Paul E. McKenney
6be63d7d9c torture: Abstract application of additional Kconfig options
This commit introduces a config_override_param() bash function that
folds in an additional set of Kconfig options.  This is initially applied
to fold in the --kconfig kvm.sh parameter, but later commits will also
apply it to the Kconfig options added by the --kcsan kvm.sh parameter.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-05-07 10:15:29 -07:00
Paul E. McKenney
b5744d3c6c torture: Eliminate duplicate #CHECK# from ConfigFragment
The #CHECK# directives that can be present in CFcommon and in the
rcutorture scenario Kconfig files are both copied to ConfigFragment
and grepped out of the two directive files and added to ConfigFragment.
This commit therefore removes the redundant "grep" commands and takes
advantage of the consequent opportunity to simplify redirection.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-05-07 10:15:29 -07:00
Paul E. McKenney
10cec0de11 torture: Make --kcsan argument also create a summary
The KCSAN tool emits a great many warnings for current kernels, for
example, a one-hour run of the full set of rcutorture scenarios results
in no fewer than 3252 such warnings, many of which are duplicates
or are otherwise closely related.  This commit therefore introduces
a kcsan-collapse.sh script that maps these warnings down to a set of
function pairs (22 of them given the 3252 individual warnings), placing
the resulting list in decreasing order of frequency of occurrence into
a kcsan.sum file.  If any KCSAN warnings were produced, the pathname of
this file is emitted at the end of the summary of the rcutorture runs.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-05-07 10:15:29 -07:00
Paul E. McKenney
7226c5cbaa torture: Add --kcsan argument to top-level kvm.sh script
Although the existing --kconfig argument can be used to run KCSAN for
an rcutorture test, it is not as straightforward as one might like:

	--kconfig "CONFIG_DEBUG_INFO=y CONFIG_KCSAN=y \
		   CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n \
		   CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n \
		   CONFIG_KCSAN_REPORT_ONCE_IN_MS=100000 \
		   CONFIG_KCSAN_VERBOSE=y CONFIG_KCSAN_INTERRUPT_WATCHER=y"

This commit therefore adds a "--kcsan" argument that emulates the above
--kconfig command.  Note that if you specify a Kconfig option using
-kconfig that conflicts with one that --kcsan adds, you get whatever
the script and the build system decide to give you.

Cc: Marco Elver <elver@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-05-07 10:15:28 -07:00
Paul E. McKenney
df5916845d rcutorture: Right-size TREE10 CPU consumption
The number of CPUs is tuned to allow "4*CFLIST TREE10" on a large system,
up from "3*CFLIST TREE10" previously.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-05-07 10:15:28 -07:00
Linus Torvalds
8c16ec94dc Bugfixes, mostly for ARM and AMD, and more documentation.
-----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl6yqbIUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroObBQf+NH9DCs6X92YggAoNpJl6uSIOX35X
 ErdWqYj80Xx95QU73aMukjs3Zqxe6WfYI9jPEOD8SDUZzZlVfIA35D8BYlqt1c5R
 A2K2ebTQbZ+j487QTUPbEvEivyxyVSozwvOdKBfL5kv0D9Cn2STyjVjmguUoCp9n
 VztmwbwpSZdOnexRSolwAWuyOriYbvpV12cIZpcMGrjL67yZPv8UyCxxJplDCLlB
 1c8tvGI2Md8apE/YZDqlCFh3H4YBQsact8uOoyY8cXKO/xIAsZOI+Dhm/cQAhGDk
 QIQqv/hkM4HPvOXQluwIau4Cx+Fl05xY/ggtQt4z/8yml2pOw8PKmwziZA==
 =60QX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Bugfixes, mostly for ARM and AMD, and more documentation.

  Slightly bigger than usual because I couldn't send out what was
  pending for rc4, but there is nothing worrisome going on. I have more
  fixes pending for guest debugging support (gdbstub) but I will send
  them next week"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
  KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly
  KVM: selftests: Fix build for evmcs.h
  kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bits
  KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path
  docs/virt/kvm: Document configuring and running nested guests
  KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction
  kvm: ioapic: Restrict lazy EOI update to edge-triggered interrupts
  KVM: x86: Fixes posted interrupt check for IRQs delivery modes
  KVM: SVM: fill in kvm_run->debug.arch.dr[67]
  KVM: nVMX: Replace a BUG_ON(1) with BUG() to squash clang warning
  KVM: arm64: Fix 32bit PC wrap-around
  KVM: arm64: vgic-v4: Initialize GICv4.1 even in the absence of a virtual ITS
  KVM: arm64: Save/restore sp_el0 as part of __guest_enter
  KVM: arm64: Delete duplicated label in invalid_vector
  KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi()
  KVM: arm64: vgic-v3: Retire all pending LPIs on vcpu destroy
  KVM: arm: vgic-v2: Only use the virtual state when userspace accesses pending bits
  KVM: arm: vgic: Only use the virtual state when userspace accesses enable bits
  KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read
  KVM: arm64: PSCI: Forbid 64bit functions for 32bit guests
  ...
2020-05-07 09:50:59 -07:00
Josh Poimboeuf
1119d265bc objtool: Fix infinite loop in find_jump_table()
Kristen found a hang in objtool when building with -ffunction-sections.

It was caused by evergreen_pcie_gen2_enable.cold() being laid out
immediately before evergreen_pcie_gen2_enable().  Since their "pfunc" is
always the same, find_jump_table() got into an infinite loop because it
didn't recognize the boundary between the two functions.

Fix that with a new prev_insn_same_sym() helper, which doesn't cross
subfunction boundaries.

Reported-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/378b51c9d9c894dc3294bc460b4b0869e950b7c5.1588110291.git.jpoimboe@redhat.com
2020-05-07 17:22:31 +02:00
Qais Yousef
5655585589 cpu/hotplug: Remove disable_nonboot_cpus()
The single user could have called freeze_secondary_cpus() directly.

Since this function was a source of confusion, remove it as it's
just a pointless wrapper.

While at it, rename enable_nonboot_cpus() to thaw_secondary_cpus() to
preserve the naming symmetry.

Done automatically via:

	git grep -l enable_nonboot_cpus | xargs sed -i 's/enable_nonboot_cpus/thaw_secondary_cpus/g'

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: https://lkml.kernel.org/r/20200430114004.17477-1-qais.yousef@arm.com
2020-05-07 15:18:40 +02:00
Peter Xu
449aa906e6 KVM: selftests: Add KVM_SET_GUEST_DEBUG test
Covers fundamental tests for KVM_SET_GUEST_DEBUG. It is very close to the debug
test in kvm-unit-test, but doing it from outside the guest.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200505205000.188252-4-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-07 06:13:42 -04:00
David S. Miller
3793faad7b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts were all overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 22:10:13 -07:00
Linus Torvalds
a811c1fa0a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix reference count leaks in various parts of batman-adv, from Xiyu
    Yang.

 2) Update NAT checksum even when it is zero, from Guillaume Nault.

 3) sk_psock reference count leak in tls code, also from Xiyu Yang.

 4) Sanity check TCA_FQ_CODEL_DROP_BATCH_SIZE netlink attribute in
    fq_codel, from Eric Dumazet.

 5) Fix panic in choke_reset(), also from Eric Dumazet.

 6) Fix VLAN accel handling in bnxt_fix_features(), from Michael Chan.

 7) Disallow out of range quantum values in sch_sfq, from Eric Dumazet.

 8) Fix crash in x25_disconnect(), from Yue Haibing.

 9) Don't pass pointer to local variable back to the caller in
    nf_osf_hdr_ctx_init(), from Arnd Bergmann.

10) Wireguard should use the ECN decap helper functions, from Toke
    Høiland-Jørgensen.

11) Fix command entry leak in mlx5 driver, from Moshe Shemesh.

12) Fix uninitialized variable access in mptcp's
    subflow_syn_recv_sock(), from Paolo Abeni.

13) Fix unnecessary out-of-order ingress frame ordering in macsec, from
    Scott Dial.

14) IPv6 needs to use a global serial number for dst validation just
    like ipv4, from David Ahern.

15) Fix up PTP_1588_CLOCK deps, from Clay McClure.

16) Missing NLM_F_MULTI flag in gtp driver netlink messages, from
    Yoshiyuki Kurauchi.

17) Fix a regression in that dsa user port errors should not be fatal,
    from Florian Fainelli.

18) Fix iomap leak in enetc driver, from Dejin Zheng.

19) Fix use after free in lec_arp_clear_vccs(), from Cong Wang.

20) Initialize protocol value earlier in neigh code paths when
    generating events, from Roman Mashak.

21) netdev_update_features() must be called with RTNL mutex in macsec
    driver, from Antoine Tenart.

22) Validate untrusted GSO packets even more strictly, from Willem de
    Bruijn.

23) Wireguard decrypt worker needs a cond_resched(), from Jason
    Donenfeld.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits)
  net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CARE
  MAINTAINERS: put DYNAMIC INTERRUPT MODERATION in proper order
  wireguard: send/receive: use explicit unlikely branch instead of implicit coalescing
  wireguard: selftests: initalize ipv6 members to NULL to squelch clang warning
  wireguard: send/receive: cond_resched() when processing worker ringbuffers
  wireguard: socket: remove errant restriction on looping to self
  wireguard: selftests: use normal kernel stack size on ppc64
  net: ethernet: ti: am65-cpsw-nuss: fix irqs type
  ionic: Use debugfs_create_bool() to export bool
  net: dsa: Do not leave DSA master with NULL netdev_ops
  net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirred
  net: stricter validation of untrusted gso packets
  seg6: fix SRH processing to comply with RFC8754
  net: mscc: ocelot: ANA_AUTOAGE_AGE_PERIOD holds a value in seconds, not ms
  net: dsa: ocelot: the MAC table on Felix is twice as large
  net: dsa: sja1105: the PTP_CLK extts input reacts on both edges
  selftests: net: tcp_mmap: fix SO_RCVLOWAT setting
  net: hsr: fix incorrect type usage for protocol variable
  net: macsec: fix rtnl locking issue
  net: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del()
  ...
2020-05-06 20:53:22 -07:00
Jason A. Donenfeld
b673e24aad wireguard: socket: remove errant restriction on looping to self
It's already possible to create two different interfaces and loop
packets between them. This has always been possible with tunnels in the
kernel, and isn't specific to wireguard. Therefore, the networking stack
already needs to deal with that. At the very least, the packet winds up
exceeding the MTU and is discarded at that point. So, since this is
already something that happens, there's no need to forbid the not very
exceptional case of routing a packet back to the same interface; this
loop is no different than others, and we shouldn't special case it, but
rather rely on generic handling of loops in general. This also makes it
easier to do interesting things with wireguard such as onion routing.

At the same time, we add a selftest for this, ensuring that both onion
routing works and infinite routing loops do not crash the kernel. We
also add a test case for wireguard interfaces nesting packets and
sending traffic between each other, as well as the loop in this case
too. We make sure to send some throughput-heavy traffic for this use
case, to stress out any possible recursion issues with the locks around
workqueues.

Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 20:03:47 -07:00
Jason A. Donenfeld
a0fd7cc87a wireguard: selftests: use normal kernel stack size on ppc64
While at some point it might have made sense to be running these tests
on ppc64 with 4k stacks, the kernel hasn't actually used 4k stacks on
64-bit powerpc in a long time, and more interesting things that we test
don't really work when we deviate from the default (16k). So, we stop
pushing our luck in this commit, and return to the default instead of
the minimum.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 20:03:47 -07:00
Alexei Starovoitov
f87b87a1c9 CAP_PERFMON for BPF
-----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6zMIUTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoTEOEACWUbZlswquZqSAr6pESTjAbjSPZ58s
 QfmiY218aXvyvwJeINfJduYIjtVjPL9F0qbqmm5Sh4CFflgF9QRUZLxsyuu6K7uY
 Bsy7hjHWUCO79anxgjie1rqhCxT/orW39E0nlDW72AlrebVwPRc4PmERsV9bl/9z
 Z7M7aJzza60938K8qN24A63Q4wb3uCfygUYDGkeaN46jmwlHLj8Qwu/L2pVv2/3O
 7FHEHYFK0UuvVw6byRpbPoHDbBEGApYszRlEUMRtkk/7zsvdIQJGHQpPMD5PkGY1
 kS6x1a7sdnA7++A7Oin7Uq+0y7sgVNeJyPO9o+u9wZgePZL4t87YZlwIL5NlyXIL
 JHDrz0DAckSYEAYuPnFrXtYW2AQ9TZxVuIHGsJdKW8KOdkHkYUqtXwLuj+0xf8v/
 szeJR+l6mOJzDHML7W7KzZdl+AJB/+GE55cLaRPs0bBFyLpUs/vL8BjkoDiDm3/i
 okGIWQkh8PwpujiS/mHDHqoXuVpVHYAcHD0X+zuLIUVCzKf71Kq7y2fIiHyTR4Wo
 +x5aeOFWHYOC2DwFdUQ1EiOUCtLbORqq1CDsnIE4KCtsfx1K6IHmqI8X77D8CTEp
 oSPz0kd8kgBfHwPhCepQ7DnBA1cJTiRbs6/++frU0S5R2HhIOGeLN6hrhrMjNhYD
 07jorSqL6hjjog==
 =+J1X
 -----END PGP SIGNATURE-----

Merge tag 'perf-for-bpf-2020-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into bpf-next

CAP_PERFMON for BPF
2020-05-06 17:12:44 -07:00
Eric Dumazet
a84724178b selftests: net: tcp_mmap: fix SO_RCVLOWAT setting
Since chunk_size is no longer an integer, we can not
use it directly as an argument of setsockopt().

This patch should fix tcp_mmap for Big Endian kernels.

Fixes: 597b01edaf ("selftests: net: avoid ptl lock contention in tcp_mmap")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 15:01:30 -07:00
Eric Dumazet
bf5525f3a8 selftests: net: tcp_mmap: clear whole tcp_zerocopy_receive struct
We added fields in tcp_zerocopy_receive structure,
so make sure to clear all fields to not pass garbage to the kernel.

We were lucky because recent additions added 'out' parameters,
still we need to clean our reference implementation, before folks
copy/paste it.

Fixes: c8856c0514 ("tcp-zerocopy: Return inq along with tcp receive zerocopy.")
Fixes: 33946518d4 ("tcp-zerocopy: Return sk_err (if set) along with tcp receive zerocopy.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 13:45:09 -07:00
Peter Xu
8ffdaf9155 KVM: selftests: Fix build for evmcs.h
I got this error when building kvm selftests:

/usr/bin/ld: /home/xz/git/linux/tools/testing/selftests/kvm/libkvm.a(vmx.o):/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:222: multiple definition of `current_evmcs'; /tmp/cco1G48P.o:/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:222: first defined here
/usr/bin/ld: /home/xz/git/linux/tools/testing/selftests/kvm/libkvm.a(vmx.o):/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:223: multiple definition of `current_vp_assist'; /tmp/cco1G48P.o:/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:223: first defined here

I think it's because evmcs.h is included both in a test file and a lib file so
the structs have multiple declarations when linking.  After all it's not a good
habit to declare structs in the header files.

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200504220607.99627-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-06 06:51:36 -04:00
Arnaldo Carvalho de Melo
19ce232173 perf flamegraph: Use /bin/bash for report and record scripts
As all the other tools/perf/scripts/python/bin/*-{report,record}
scripts, fixing the this problem reported by Daniel Diaz:

  Our OpenEmbedded builds detected an issue with 5287f92692 ("perf
  script: Add flamegraph.py script"):
    ERROR: perf-1.0-r9 do_package_qa: QA Issue:
  /usr/libexec/perf-core/scripts/python/bin/flamegraph-report contained
  in package perf-python requires /usr/bin/sh, but no providers found in
  RDEPENDS_perf-python? [file-rdeps]

  This means that there is a new binary pulled in in the shebang line
  which was unaccounted for: `/usr/bin/sh`. I don't see any other usage
  of /usr/bin/sh in the kernel tree (does not even exist on my Ubuntu
  dev machine) but plenty of /bin/sh. This patch is needed:
  -----8<----------8<----------8<-----
  diff --git a/tools/perf/scripts/python/bin/flamegraph-record
  b/tools/perf/scripts/python/bin/flamegraph-record
  index 725d66e71570..a2f3fa25ef81 100755
  --- a/tools/perf/scripts/python/bin/flamegraph-record
  +++ b/tools/perf/scripts/python/bin/flamegraph-record
  @@ -1,2 +1,2 @@
  -#!/usr/bin/sh
  +#!/bin/sh
   perf record -g "$@"
  diff --git a/tools/perf/scripts/python/bin/flamegraph-report
  b/tools/perf/scripts/python/bin/flamegraph-report
  index b1a79afd903b..b0177355619b 100755
  --- a/tools/perf/scripts/python/bin/flamegraph-report
  +++ b/tools/perf/scripts/python/bin/flamegraph-report
  @@ -1,3 +1,3 @@
  -#!/usr/bin/sh
  +#!/bin/sh
   # description: create flame graphs
   perf script -s "$PERF_EXEC_PATH"/scripts/python/flamegraph.py -- "$@"
  ----->8---------->8---------->8-----

Fixes: 5287f92692 ("perf script: Add flamegraph.py script")
Reported-by: Daniel Díaz <daniel.diaz@linaro.org>
Acked-by: Andreas Gerstmayr <agerstmayr@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: lkft-triage@lists.linaro.org
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/CAEUSe7_wmKS361mKLTB1eYbzYXcKkXdU26BX5BojdKRz8MfPCw@mail.gmail.com
Link: http://lore.kernel.org/lkml/20200505170320.GZ30487@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:32 -03:00
Leo Yan
168200b6d6 perf cs-etm: Move definition of 'traceid_list' global variable from header file
The variable 'traceid_list' is defined in the header file cs-etm.h,
if multiple C files include cs-etm.h the compiler might complaint for
multiple definition of 'traceid_list'.

To fix multiple definition error, move the definition of 'traceid_list'
into cs-etm.c.

Fixes: cd8bfd8c97 ("perf tools: Add processing of coresight metadata")
Reported-by: Thomas Backlund <tmb@mageia.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Tested-by: Mike Leach <mike.leach@linaro.org>
Tested-by: Thomas Backlund <tmb@mageia.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Tor Jeremiassen <tor@ti.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20200505133642.4756-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:32 -03:00
Ian Rogers
32add10f95 libsymbols kallsyms: Move hex2u64 out of header
hex2u64 is a helper that's out of place in kallsyms.h as not being
kallsyms related. Move from kallsyms.h to the only user.

Committer notes:

Move it out of tools/lib/symbol/kallsyms.c as well, as we had to leave
it there in the previous patch lest we break the build.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200501221315.54715-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:32 -03:00
Ian Rogers
53df2b9344 libsymbols kallsyms: Parse using io api
'perf record' will call kallsyms__parse 4 times during startup and
process megabytes of data. This changes kallsyms__parse to use the io
library rather than fgets to improve performance of the user code by
over 8%.

Before:

  Running 'internals/kallsyms-parse' benchmark:
  Average kallsyms__parse took: 103.988 ms (+- 0.203 ms)

After:

  Running 'internals/kallsyms-parse' benchmark:
  Average kallsyms__parse took: 95.571 ms (+- 0.006 ms)

For a workload like:

  $ perf record /bin/true
  Run under 'perf record -e cycles:u -g' the time goes from:
  Before
  30.10%     1.67%  perf     perf                [.] kallsyms__parse
  After
  25.55%    20.04%  perf     perf                [.] kallsyms__parse

So a little under 5% of the start-up time is removed. A lot of what
remains is on the kernel side, but caching kallsyms within perf would at
least impact memory footprint.

Committer notes:

The internal/kallsyms-parse bench is run using:

  [root@five ~]# perf bench internals kallsyms-parse
  # Running 'internals/kallsyms-parse' benchmark:
    Average kallsyms__parse took: 80.381 ms (+- 0.115 ms)
  [root@five ~]#

And this pre-existing test uses these routines to parse kallsyms and
then compare with the info obtained from the matching ELF symtab:

  [root@five ~]# perf test vmlinux
   1: vmlinux symtab matches kallsyms                       : Ok
  [root@five ~]#

Also we can't remove hex2u64() in this patch as this breaks the build:

  /usr/bin/ld: /tmp/build/perf/perf-in.o: in function `modules__parse':
  /home/acme/git/perf/tools/perf/util/symbol.c:607: undefined reference to `hex2u64'
  /usr/bin/ld: /home/acme/git/perf/tools/perf/util/symbol.c:607: undefined reference to `hex2u64'
  /usr/bin/ld: /tmp/build/perf/perf-in.o: in function `dso__load_perf_map':
  /home/acme/git/perf/tools/perf/util/symbol.c:1477: undefined reference to `hex2u64'
  /usr/bin/ld: /home/acme/git/perf/tools/perf/util/symbol.c:1483: undefined reference to `hex2u64'
  collect2: error: ld returned 1 exit status

Leave it there, move it in the next patch.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200501221315.54715-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:32 -03:00
Ian Rogers
51876bd452 perf bench: Add kallsyms parsing
Add a benchmark for kallsyms parsing. Example output:

  Running 'internals/kallsyms-parse' benchmark:
  Average kallsyms__parse took: 103.971 ms (+- 0.121 ms)

Committer testing:

Test Machine: AMD Ryzen 5 3600X 6-Core Processor

  [root@five ~]# perf bench internals kallsyms-parse
  # Running 'internals/kallsyms-parse' benchmark:
    Average kallsyms__parse took: 79.692 ms (+- 0.101 ms)
  [root@five ~]# perf stat -r5 perf bench internals kallsyms-parse
  # Running 'internals/kallsyms-parse' benchmark:
    Average kallsyms__parse took: 80.563 ms (+- 0.079 ms)
  # Running 'internals/kallsyms-parse' benchmark:
    Average kallsyms__parse took: 81.046 ms (+- 0.155 ms)
  # Running 'internals/kallsyms-parse' benchmark:
    Average kallsyms__parse took: 80.874 ms (+- 0.104 ms)
  # Running 'internals/kallsyms-parse' benchmark:
    Average kallsyms__parse took: 81.173 ms (+- 0.133 ms)
  # Running 'internals/kallsyms-parse' benchmark:
    Average kallsyms__parse took: 81.169 ms (+- 0.074 ms)

   Performance counter stats for 'perf bench internals kallsyms-parse' (5 runs):

            8,093.54 msec task-clock                #    0.999 CPUs utilized            ( +-  0.14% )
               3,165      context-switches          #    0.391 K/sec                    ( +-  0.18% )
                  10      cpu-migrations            #    0.001 K/sec                    ( +- 23.13% )
                 744      page-faults               #    0.092 K/sec                    ( +-  0.21% )
      34,551,564,954      cycles                    #    4.269 GHz                      ( +-  0.05% )  (83.33%)
       1,160,584,308      stalled-cycles-frontend   #    3.36% frontend cycles idle     ( +-  1.60% )  (83.33%)
      14,974,323,985      stalled-cycles-backend    #   43.34% backend cycles idle      ( +-  0.24% )  (83.33%)
      58,712,905,705      instructions              #    1.70  insn per cycle
                                                    #    0.26  stalled cycles per insn  ( +-  0.01% )  (83.34%)
      14,136,433,778      branches                  # 1746.632 M/sec                    ( +-  0.01% )  (83.33%)
         141,943,217      branch-misses             #    1.00% of all branches          ( +-  0.04% )  (83.33%)

              8.1040 +- 0.0115 seconds time elapsed  ( +-  0.14% )

  [root@five ~]#

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200501221315.54715-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:32 -03:00
Mike Leach
29e2eb2a9e perf: cs-etm: Update to build with latest opencsd version.
OpenCSD version v0.14.0 adds in a new output element. This is represented
by a new value in the generic element type enum, which must be added to
the handling code in perf cs-etm-decoder to prevent build errors due to
build options on the perf project.

This element is not currently used by the perf decoder.

Perf build feature test updated to require a minimum of 0.14.0

Tested on Linux 5.7-rc3.

Signed-off-by: Mike Leach <mike.leach@linaro.org>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20200501143615.1180-1-mike.leach@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:32 -03:00
Thomas Richter
51d9635582 perf symbol: Fix kernel symbol address display
Running commands

   ./perf record -e rb0000 -- find .
   ./perf report -v

reveals symbol names and its addresses. There is a mismatch between
kernel symbol and address. Here is an example for kernel symbol
check_chain_key:

 3.55%  find /lib/modules/.../build/vmlinux  0xf11ec  v [k] check_chain_key

This address is off by 0xff000 as can be seen with:

[root@t35lp46 ~]# fgrep check_chain_key /proc/kallsyms
00000000001f00d0 t check_chain_key
[root@t35lp46 ~]# objdump -t ~/linux/vmlinux| fgrep check_chain_key
00000000001f00d0 l     F .text	00000000000001e8 check_chain_key
[root@t35lp46 ~]#

This function is located in main memory 0x1f00d0 - 0x1f02b4. It has
several entries in the perf data file with the correct address:

[root@t35lp46 perf]# ./perf report -D -i perf.data.find-bad | \
				fgrep SAMPLE| fgrep 0x1f01ec
PERF_RECORD_SAMPLE(IP, 0x1): 22228/22228: 0x1f01ec period: 1300000 addr: 0
PERF_RECORD_SAMPLE(IP, 0x1): 22228/22228: 0x1f01ec period: 1300000 addr: 0

The root cause happens when reading symbol tables during perf report.
A long gdb call chain leads to

   machine__deliver_events
     perf_evlist__deliver_event
       perf_evlist__deliver_sample
         build_id__mark_dso_hits
	   thread__find_map(1)      Read correct address from sample entry
	     map__load
	       dso__load            Some more functions to end up in
	         ....
		 dso__load_sym.

Function dso__load_syms  checks for kernel relocation and symbol
adjustment for the kernel and results in kernel map adjustment of
	 kernel .text segment address (0x100000 on s390)
	 kernel .text segment offset in file (0x1000 on s390).
This results in all kernel symbol addresses to be changed by subtracting
0xff000 (on s390). For the symbol check_chain_key we end up with

    0x1f00d0 - 0x100000 + 0x1000 = 0xf11d0

and this address is saved in the perf symbol table. This calculation is
also applied by the mapping functions map__mapip() and map__unmapip()
to map IP addresses to dso mappings.

During perf report processing functions

   process_sample_event    (builtin-report.c)
     machine__resolve
       thread__find_map
     hist_entry_iter_add

are called. Function thread__find_map(1)
takes the correct sample address and applies the mapping function
map__mapip() from the kernel dso and saves the modified address
in struct addr_location for further reference. From now on this address
is used.

Funktion process_sample_event() then calls hist_entry_iter_add() to save
the address in member ip of struct hist_entry.

When samples are displayed using

    perf_evlist__tty_browse_hists
      hists__fprintf
        hist_entry__fprintf
	  hist_entry__snprintf
	    __hist_entry__snprintf
	      _hist_entry__sym_snprintf()

This simply displays the address of the symbol and ignores the dso <-> map
mappings done in function thread__find_map. This leads to the address
mismatch.

Output before:

ot@t35lp46 perf]# ./perf report -v | fgrep check_chain_key
     3.55%  find     /lib/modules/5.6.0d-perf+/build/vmlinux
     						0xf11ec v [k] check_chain_key
[root@t35lp46 perf]#

Output after:

[root@t35lp46 perf]# ./perf report -v | fgrep check_chain_key
     3.55%  find     /lib/modules/5.6.0d-perf+/build/vmlinux
     						0x1f01ec v [k] check_chain_key
[root@t35lp46 perf]#

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20200415070744.59919-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:32 -03:00
Arnaldo Carvalho de Melo
b14b36d020 perf inject: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
As those is a 'struct evsel' methods, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Arnaldo Carvalho de Melo
74aa90e865 perf annotate: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
As those is a 'struct evsel' methods, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Arnaldo Carvalho de Melo
794bca26e5 perf trace: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
As those is a 'struct evsel' methods, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Arnaldo Carvalho de Melo
ec98b6df37 perf script: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
As those is a 'struct evsel' methods, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Arnaldo Carvalho de Melo
3b7313f2d7 perf sched: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
As those is a 'struct evsel' methods, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Arnaldo Carvalho de Melo
3d65581301 perf lock: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
As those is a 'struct evsel' methods, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Arnaldo Carvalho de Melo
8cf5d0e09d perf kmem: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
As those is a 'struct evsel' methods, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Arnaldo Carvalho de Melo
ddc6999eaf perf stat: Rename perf_evsel__*() operating on 'struct evsel *' to evsel__*()
As those is a 'struct evsel' methods, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Arnaldo Carvalho de Melo
343977534c perf evsel: Rename perf_evsel__store_ids() to evsel__store_id()
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Arnaldo Carvalho de Melo
6e6d1d654e perf evsel: Rename perf_evsel__env() to evsel__env()
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Arnaldo Carvalho de Melo
2bb72dbb82 perf evsel: Rename perf_evsel__group_idx() to evsel__group_idx()
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Arnaldo Carvalho de Melo
ae4308927e perf evsel: Rename perf_evsel__fallback() to evsel__fallback()
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Arnaldo Carvalho de Melo
4f138a9e08 perf evsel: Rename perf_evsel__has*() to evsel__has*()
As those are 'struct evsel' methods, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Arnaldo Carvalho de Melo
e470daeaa3 perf evsel: Rename perf_evsel__{prev,next}() to evsel__{prev,next}()
As those are 'struct evsel' methods, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Arnaldo Carvalho de Melo
6b6017a206 perf evsel: Rename perf_evsel__parse_sample*() to evsel__parse_sample*()
As these are 'struct evsel' methods, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Arnaldo Carvalho de Melo
ea08969273 perf evsel: Rename *perf_evsel__read*() to *evsel__read()
As those are 'struct evsel' methods, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Arnaldo Carvalho de Melo
53fcfa6b8e perf evsel: Ditch perf_evsel__cmp(), not used for quite a while
In 4c358d5cf3 ("perf stat: Replace transaction event possition check
with id check") all its uses were removed, so ditch it.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Arnaldo Carvalho de Melo
c754c382c9 perf evsel: Rename perf_evsel__is_*() to evsel__is*()
As those are 'struct evsel' methods, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Stephane Eranian
3a50dc7605 perf pmu: Add perf_pmu__find_by_type helper
This is used by libpfm4 during event parsing to locate the pmu for an
event.

Signed-off-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Igor Lubashev <ilubashe@akamai.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiwei Sun <jiwei.sun@windriver.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: yuzhoujian <yuzhoujian@didichuxing.com>
Link: http://lore.kernel.org/lkml/20200429231443.207201-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Stephane Eranian
5ef86146de tools feature: Add support for detecting libpfm4
libpfm4 provides an alternate command line encoding of perf events.

Signed-off-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Igor Lubashev <ilubashe@akamai.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiwei Sun <jiwei.sun@windriver.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: yuzhoujian <yuzhoujian@didichuxing.com>
Link: http://lore.kernel.org/lkml/20200429231443.207201-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Ian Rogers
4b1984491e perf doc: Pass ASCIIDOC_EXTRA as an argument
commit e9cfa47e68 ("perf doc: allow ASCIIDOC_EXTRA to be an argument")
allowed ASCIIDOC_EXTRA to be passed as an option to the Documentation
Makefile. This change passes ASCIIDOC_EXTRA, set by detected features or
command line options, prior to doing a Documentation build. This is
necessary to allow conditional compilation, based on configuration
variables, in asciidoc code.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Igor Lubashev <ilubashe@akamai.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiwei Sun <jiwei.sun@windriver.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: yuzhoujian <yuzhoujian@didichuxing.com>
Link: http://lore.kernel.org/lkml/20200429231443.207201-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Ian Rogers
266150c94c perf mem2node: Avoid double free related to realloc
Realloc of size zero is a free not an error, avoid this causing a double
free. Caught by clang's address sanitizer:

==2634==ERROR: AddressSanitizer: attempting double-free on 0x6020000015f0 in thread T0:
    #0 0x5649659297fd in free llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:123:3
    #1 0x5649659e9251 in __zfree tools/lib/zalloc.c:13:2
    #2 0x564965c0f92c in mem2node__exit tools/perf/util/mem2node.c:114:2
    #3 0x564965a08b4c in perf_c2c__report tools/perf/builtin-c2c.c:2867:2
    #4 0x564965a0616a in cmd_c2c tools/perf/builtin-c2c.c:2989:10
    #5 0x564965944348 in run_builtin tools/perf/perf.c:312:11
    #6 0x564965943235 in handle_internal_command tools/perf/perf.c:364:8
    #7 0x5649659440c4 in run_argv tools/perf/perf.c:408:2
    #8 0x564965942e41 in main tools/perf/perf.c:538:3

0x6020000015f0 is located 0 bytes inside of 1-byte region [0x6020000015f0,0x6020000015f1)
freed by thread T0 here:
    #0 0x564965929da3 in realloc third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0x564965c0f55e in mem2node__init tools/perf/util/mem2node.c:97:16
    #2 0x564965a08956 in perf_c2c__report tools/perf/builtin-c2c.c:2803:8
    #3 0x564965a0616a in cmd_c2c tools/perf/builtin-c2c.c:2989:10
    #4 0x564965944348 in run_builtin tools/perf/perf.c:312:11
    #5 0x564965943235 in handle_internal_command tools/perf/perf.c:364:8
    #6 0x5649659440c4 in run_argv tools/perf/perf.c:408:2
    #7 0x564965942e41 in main tools/perf/perf.c:538:3

previously allocated by thread T0 here:
    #0 0x564965929c42 in calloc third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
    #1 0x5649659e9220 in zalloc tools/lib/zalloc.c:8:9
    #2 0x564965c0f32d in mem2node__init tools/perf/util/mem2node.c:61:12
    #3 0x564965a08956 in perf_c2c__report tools/perf/builtin-c2c.c:2803:8
    #4 0x564965a0616a in cmd_c2c tools/perf/builtin-c2c.c:2989:10
    #5 0x564965944348 in run_builtin tools/perf/perf.c:312:11
    #6 0x564965943235 in handle_internal_command tools/perf/perf.c:364:8
    #7 0x5649659440c4 in run_argv tools/perf/perf.c:408:2
    #8 0x564965942e41 in main tools/perf/perf.c:538:3

v2: add a WARN_ON_ONCE when the free condition arises.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: clang-built-linux@googlegroups.com
Link: http://lore.kernel.org/lkml/20200320182347.87675-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Arnaldo Carvalho de Melo
efc0cdc9ed perf evsel: Rename perf_evsel__{str,int}val() and other tracepoint field metehods to to evsel__*()
As those are not 'struct evsel' methods, not part of tools/lib/perf/,
aka libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Arnaldo Carvalho de Melo
aa8c406b0a perf evsel: Rename perf_evsel__open_per_*() to evsel__open_per_*()
As those are not 'struct evsel' methods, not part of tools/lib/perf/,
aka libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Arnaldo Carvalho de Melo
ad681adf1d perf evsel: Rename perf_evsel__*filter*() to evsel__*filter*()
As those are not 'struct evsel' methods, not part of tools/lib/perf/,
aka libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Arnaldo Carvalho de Melo
862b2f8fbc perf evsel: Rename *perf_evsel__*set_sample_*() to *evsel__*set_sample_*()
As they are not 'struct evsel' methods, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Arnaldo Carvalho de Melo
347c751a64 perf evsel: Rename perf_evsel__group_desc() to evsel__group_desc()
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Arnaldo Carvalho de Melo
8ab2e96d8f perf evsel: Rename *perf_evsel__*name() to *evsel__*name()
As they are 'struct evsel' methods or related routines, not part of
tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Arnaldo Carvalho de Melo
2aaefde4d9 perf evsel: Rename __perf_evsel__sample_size() to __evsel__sample_size()
As it is a 'struct evsel' related method, not part of tools/lib/perf/,
aka libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Arnaldo Carvalho de Melo
4b5e87b741 perf evsel: Rename perf_evsel__calc_id_pos() to evsel__calc_id_pos()
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Arnaldo Carvalho de Melo
6ec17b4e25 perf evsel: Rename perf_evsel__config*() to evsel__config*()
As they are all 'struct evsel' methods, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Arnaldo Carvalho de Melo
30f7c59124 perf evsel: Rename perf_evsel__exit() to evsel__exit()
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Arnaldo Carvalho de Melo
39453ed559 perf evsel: Rename perf_evsel__is_aux_event() to evsel__is_aux_event()
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Arnaldo Carvalho de Melo
e76026bdd5 perf evsel: Rename perf_evsel__find_pmu() to evsel__find_pmu()
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Arnaldo Carvalho de Melo
12f5261dac perf evsel: Rename perf_evsel__compute_deltas() to evsel__compute_deltas()
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Arnaldo Carvalho de Melo
5eb88f0476 perf evsel: Rename perf_evsel__nr_cpus() to evsel__nr_cpus()
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Arnaldo Carvalho de Melo
65ddce3fd8 perf evsel: Rename 'struct perf_evsel__sb_cb_t' to 'struct evsel__sb_cb_t'
As the "perf_" prefix should be restricted to functions and types in
tools/lib/perf/, aka libperf, this way we reduce a bit the confusion for
types only in libperf or the ones in the more contained tools/perf/
project.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Adrian Hunter
6dd912cbad perf intel-pt: Update documentation about using /proc/kcore
Update documentation to reflect the advent of the --kcore option for
'perf record'.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20200429150751.12570-10-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Adrian Hunter
43358d9dfb perf intel-pt: Update documentation about itrace G and L options
Provide a little more information about the new G and L options,
particularly the issue with large PEBs.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20200429150751.12570-9-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Adrian Hunter
f0a0251cee perf intel-pt: Add support for synthesizing branch stacks for regular events
Use the new thread_stack__br_sample_late() function to create a thread
stack for regular events.

Example:

 # perf record --kcore --aux-sample -e '{intel_pt//,cycles:ppp}' -c 10000 uname
 Linux
 [ perf record: Woken up 2 times to write data ]
 [ perf record: Captured and wrote 0.743 MB perf.data ]
 # perf report --itrace=Le --stdio | head -30 | tail -18

 # Samples: 11K of event 'cycles:ppp'
 # Event count (approx.): 11648
 #
 # Overhead  Command  Source Shared Object  Source Symbol                 Target Symbol                 Basic Block Cycles
 # ........  .......  ....................  ............................  ............................  ..................
 #
      5.49%  uname    libc-2.30.so          [.] _dl_addr                  [.] _dl_addr                  -
      2.41%  uname    ld-2.30.so            [.] _dl_relocate_object       [.] _dl_relocate_object       -
      2.31%  uname    ld-2.30.so            [.] do_lookup_x               [.] do_lookup_x               -
      2.17%  uname    [kernel.kallsyms]     [k] unmap_page_range          [k] unmap_page_range          -
      2.05%  uname    ld-2.30.so            [k] _dl_start                 [k] _dl_start                 -
      1.97%  uname    ld-2.30.so            [.] _dl_lookup_symbol_x       [.] _dl_lookup_symbol_x       -
      1.94%  uname    [kernel.kallsyms]     [k] filemap_map_pages         [k] filemap_map_pages         -
      1.60%  uname    [kernel.kallsyms]     [k] __handle_mm_fault         [k] __handle_mm_fault         -
      1.44%  uname    [kernel.kallsyms]     [k] page_add_file_rmap        [k] page_add_file_rmap        -
      1.12%  uname    [kernel.kallsyms]     [k] vma_interval_tree_insert  [k] vma_interval_tree_insert  -
      0.94%  uname    [kernel.kallsyms]     [k] perf_iterate_ctx          [k] perf_iterate_ctx          -

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20200429150751.12570-8-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Adrian Hunter
3749e0bbde perf thread-stack: Add thread_stack__br_sample_late()
Add a thread stack function to create a branch stack for hardware events
where the sample records get created some time after the event occurred.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20200429150751.12570-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:29 -03:00
Adrian Hunter
6cd2cbfc68 perf evsel: Add support for synthesized branch stack sample type
Allow for a synthesized branch stack to be added to samples. As with
synthesized call chains, the sample type cannot be changed because it is
needed to continue to parse events. So add and use helper function
evsel__has_br_stack() to indicate a branch stack, whether original or
synthesized.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20200429150751.12570-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:29 -03:00
Adrian Hunter
ec90e42ce5 perf auxtrace: Add option to synthesize branch stack for regular events
There is an existing option to synthesize branch stacks for synthesized
events. Add a new option to synthesize branch stacks for regular events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20200429150751.12570-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:29 -03:00
Adrian Hunter
cf888e08a0 perf intel-pt: Change branch stack support to use thread-stacks
Change Intel PT's branch stack support to use thread stacks. The
advantages of using branch stack support from the thread-stack are:

1. the branches are accumulated separately for each thread
2. the branch stack is cleared only in between continuous traces

This helps pave the way for adding branch stacks to regular events, not
just synthesized events as at present.

While the 2 approaches are not identical, in simple cases the results
can be identical e.g.

  Before:

    # perf record --kcore -e intel_pt// uname
    # perf script --itrace=i10usl -F+brstacksym,+addr,+flags > cmp1.txt

  After:

    # perf script --itrace=i10usl -F+brstacksym,+addr,+flags > cmp2.txt
    # diff -s cmp1.txt cmp2.txt
    Files cmp1.txt and cmp2.txt are identical

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20200429150751.12570-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:29 -03:00
Adrian Hunter
1ef998ff18 perf intel-pt: Consolidate thread-stack use condition
The components of the condition do not change, so consolidate them in
one variable.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20200429150751.12570-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:29 -03:00
Adrian Hunter
86d67180b9 perf thread-stack: Add branch stack support
Intel PT already has support for creating branch stacks for each context
(per-cpu or per-thread). In the more common per-cpu case, the branch stack
is not separated for different threads, instead being cleared in between
each sample.

That approach will not work very well for adding branch stacks to
regular events. The branch stacks really need to be accumulated
separately for each thread.

As a start to accomplishing that, this patch adds support for putting
branch stack support into the thread-stack. The advantages are:

1. the branches are accumulated separately for each thread
2. the branch stack is cleared only in between continuous traces

This helps pave the way for adding branch stacks to regular events, not
just synthesized events as at present.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20200429150751.12570-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:29 -03:00
Konstantin Khlebnikov
bb629484d9 perf tools: Simplify checking if SMT is active.
SMT now could be disabled via "/sys/devices/system/cpu/smt/control".

Status is shown in "/sys/devices/system/cpu/smt/active" simply as "0" / "1".

If this knob isn't here then fallback to checking topology as before.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/158817741394.748034.9273604089138009552.stgit@buzz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:29 -03:00
Konstantin Khlebnikov
846de4371f perf tools: Fix reading new topology attribute "core_cpus"
Check if access("devices/system/cpu/cpu%d/topology/core_cpus", F_OK)
fails, which will happen unless the current directory is "/sys".

Simply try to read this file first.

Fixes: 0ccdb8407a ("perf tools: Apply new CPU topology sysfs attributes")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/158817718710.747528.11009278875028211991.stgit@buzz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:29 -03:00
Ian Rogers
4599d29212 libperf evlist: Fix a refcount leak
Memory leaks found by applying LLVM's libfuzzer on the tools/perf
parse_events function.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: clang-built-linux@googlegroups.com
Link: http://lore.kernel.org/lkml/20200319023101.82458-2-irogers@google.com
[ Did a minor adjustment due to some other previous patch having already set evlist->all_cpus to NULL at perf_evlist__exit() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:29 -03:00
Ian Rogers
ba08829aac perf parse-events: Fix another memory leaks found on parse_events()
Fix another memory leak found by applying LLVM's libfuzzer on parse_events().

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: clang-built-linux@googlegroups.com
Link: http://lore.kernel.org/lkml/20200319023101.82458-1-irogers@google.com
[ split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:29 -03:00
Ian Rogers
672f707ef5 perf parse-events: Fix memory leaks found on parse_events
free_list_evsel() deals with tools/perf/ evsels, not with libperf
perf_evsels, use the right destructor and avoid a leak, as
evsel__delete() will delete something perf_evsel__delete() doesn't.

Signed-off-by: Ian Rogers <irogers@google.com>

Cc: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: clang-built-linux@googlegroups.com
Link: http://lore.kernel.org/lkml/20200319023101.82458-1-irogers@google.com
[ split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:29 -03:00
Ian Rogers
e8dfb81838 perf parse-events: Fix memory leaks found on parse_events
Fix a memory leak found by applying LLVM's libfuzzer on parse_events().

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: clang-built-linux@googlegroups.com
Link: http://lore.kernel.org/lkml/20200319023101.82458-1-irogers@google.com
[ split from a larger patch, use zfree() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:29 -03:00
He Zhe
44d041b7b2 libperf: Add NULL pointer check for cpu_map iteration and NULL assignment for all_cpus.
A NULL pointer may be passed to perf_cpu_map__cpu and then cause a
crash, such as the one commit cb71f7d43e ("libperf: Setup initial
evlist::all_cpus value") fix.

Signed-off-by: He Zhe <zhe.he@windriver.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kyle Meyer <meyerk@hpe.com>
Link: http://lore.kernel.org/lkml/1583665157-349023-1-git-send-email-zhe.he@windriver.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:29 -03:00
Arnaldo Carvalho de Melo
23cbb41c93 perf record: Move side band evlist setup to separate routine
It is quite big by now, move that code to a separate
record__setup_sb_evlist() routine.

Suggested-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20200429131106.27974-9-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:29 -03:00
Arnaldo Carvalho de Melo
899e5ffbf2 perf record: Introduce --switch-output-event
Now we can use it with --overwrite to have a flight recorder mode that
gets snapshot requests from arbitrary events that are processed in the
side band thread together with the PERF_RECORD_BPF_EVENT processing.

Example:

To collect scheduler events until a recvmmsg syscall happens, system
wide:

  [root@five a]# rm -f perf.data.2020042717*
  [root@five a]# perf record --overwrite -e sched:*switch,syscalls:*recvmmsg --switch-output-event syscalls:sys_enter_recvmmsg
  [ perf record: dump data: Woken up 1 times ]
  [ perf record: Dump perf.data.2020042717585458 ]
  [ perf record: dump data: Woken up 1 times ]
  [ perf record: Dump perf.data.2020042717590235 ]
  [ perf record: dump data: Woken up 1 times ]
  [ perf record: Dump perf.data.2020042717590398 ]
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Dump perf.data.2020042717590511 ]
  [ perf record: Captured and wrote 7.244 MB perf.data.<timestamp> ]

So in the above case we had 3 snapshots, the fourth was forced by
control+C:

  [root@five a]# ls -la
  total 20440
  drwxr-xr-x.  2 root root    4096 Apr 27 17:59 .
  dr-xr-x---. 12 root root    4096 Apr 27 17:46 ..
  -rw-------.  1 root root 3936125 Apr 27 17:58 perf.data.2020042717585458
  -rw-------.  1 root root 5074869 Apr 27 17:59 perf.data.2020042717590235
  -rw-------.  1 root root 4291037 Apr 27 17:59 perf.data.2020042717590398
  -rw-------.  1 root root 7617037 Apr 27 17:59 perf.data.2020042717590511
  [root@five a]#

One can make this more precise by adding the switch output event to the
main -e events list, as since this is done asynchronously, a few events
after the signal event will appear in the snapshots, as can be seen
with:

  [root@five a]# rm -f perf.data.20200427175*
  [root@five a]# perf record --overwrite -e sched:*switch,syscalls:*recvmmsg --switch-output-event syscalls:sys_enter_recvmmsg
  [ perf record: dump data: Woken up 1 times ]
  [ perf record: Dump perf.data.2020042718024203 ]
  [ perf record: dump data: Woken up 1 times ]
  [ perf record: Dump perf.data.2020042718024301 ]
  [ perf record: dump data: Woken up 1 times ]
  [ perf record: Dump perf.data.2020042718024484 ]
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Dump perf.data.2020042718024562 ]
  [ perf record: Captured and wrote 7.337 MB perf.data.<timestamp> ]
  [root@five a]# perf script -i perf.data.2020042718024203 | tail -15
       PacerThread 148586 [005] 122.830729: sched:sched_switch: prev_comm=PacerThread prev_pid=148586...
           swapper      0 [000] 122.833588: sched:sched_switch: prev_comm=swapper/0 prev_pid=...
    NetworkManager   1251 [000] 122.833619: syscalls:sys_enter_recvmmsg: fd: 0x0000001c, mmsg: 0x7ffe83054a1...
           swapper      0 [002] 122.833624: sched:sched_switch: prev_comm=swapper/2 prev_pid=...
           swapper      0 [003] 122.833624: sched:sched_switch: prev_comm=swapper/3 prev_pid=...
    NetworkManager   1251 [000] 122.833626: syscalls:sys_exit_recvmmsg: 0x1
   kworker/3:3-eve 158946 [003] 122.833628: sched:sched_switch: prev_comm=kworker/3:3 prev_pid=15894...
           swapper      0 [004] 122.833641: sched:sched_switch: prev_comm=swapper/4 prev_pid=...
    NetworkManager   1251 [000] 122.833642: sched:sched_switch: prev_comm=NetworkManage...
              perf 228273 [002] 122.833645: sched:sched_switch: prev_comm=perf prev_pid=22827...
           swapper      0 [011] 122.833646: sched:sched_switch: prev_comm=swapper/1...
           swapper      0 [002] 122.833648: sched:sched_switch: prev_comm=swapper/...
   kworker/0:2-eve 207387 [000] 122.833648: sched:sched_switch: prev_comm=kworker/0:2 prev_pid=20738...
   kworker/2:3-eve 232038 [002] 122.833652: sched:sched_switch: prev_comm=kworker/2:3 prev_pid=23203...
              perf 235825 [003] 122.833653: sched:sched_switch: prev_comm=perf prev_pid=23582...
  [root@five a]#

Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lore.kernel.org/lkml/20200429131106.27974-8-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:29 -03:00
Arnaldo Carvalho de Melo
636eb4d001 libsubcmd: Introduce OPT_CALLBACK_SET()
To register that an option was set, like with the upcoming 'perf record
--switch-output-option' one.

Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20200429131106.27974-7-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:29 -03:00
Arnaldo Carvalho de Melo
976be84504 perf evlist: Allow reusing the side band thread for more purposes
I.e. so far we had just one event in that side band thread, a dummy one
with attr.bpf_event set, so that 'perf record' can go ahead and ask the
kernel for further information about BPF programs being loaded.

Allow for more than one event to be there, so that we can use it as
well for the upcoming --switch-output-event feature.

Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20200429131106.27974-6-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:29 -03:00
Arnaldo Carvalho de Melo
9a39994467 perf evlist: Move the sideband thread routines to separate object
To avoid dragging more stuff into the perf python binding in the
following csets.

Reported-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:29 -03:00
Arnaldo Carvalho de Melo
d0abbc3ce6 perf parse-events: Add parse_events_option() variant that creates evlist
For the upcoming --switch-output-event option we want to create the side
band event, populate it with the specified events and then, if it is
present multiple times, go on adding to it, then, if the BPF tracking is
required, use the first event to set its attr.bpf_event to get those
PERF_RECORD_BPF_EVENT metadata events too.

Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20200429131106.27974-5-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:29 -03:00
Arnaldo Carvalho de Melo
b38d85ef49 perf bpf: Decouple creating the evlist from adding the SB event
Renaming bpf_event__add_sb_event() to evlist__add_sb_event() and
requiring that the evlist be allocated beforehand.

This will allow using the same side band thread and evlist to be used
for multiple purposes in addition to react to PERF_RECORD_BPF_EVENT soon
after they are generated.

Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20200429131106.27974-4-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:29 -03:00
Arnaldo Carvalho de Melo
ca6c9c8b10 perf top: Move sb_evlist to 'struct perf_top'
Where state related to a 'perf top' session is grouped.

Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20200429131106.27974-3-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:29 -03:00
Arnaldo Carvalho de Melo
bc477d7983 perf record: Move sb_evlist to 'struct record'
Where state related to a 'perf record' session is grouped.

Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20200429131106.27974-2-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:28 -03:00
Arnaldo Carvalho de Melo
40c7d2460e perf tools: Move routines that probe for perf API features to separate file
Trying to disentangle this a bit further, unfortunately it uses
parse_events(), its interesting to have it separated anyway, so do it.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:26 -03:00
Tejun Heo
0b80f9866e iocost: protect iocg->abs_vdebt with iocg->waitq.lock
abs_vdebt is an atomic_64 which tracks how much over budget a given cgroup
is and controls the activation of use_delay mechanism. Once a cgroup goes
over budget from forced IOs, it has to pay it back with its future budget.
The progress guarantee on debt paying comes from the iocg being active -
active iocgs are processed by the periodic timer, which ensures that as time
passes the debts dissipate and the iocg returns to normal operation.

However, both iocg activation and vdebt handling are asynchronous and a
sequence like the following may happen.

1. The iocg is in the process of being deactivated by the periodic timer.

2. A bio enters ioc_rqos_throttle(), calls iocg_activate() which returns
   without anything because it still sees that the iocg is already active.

3. The iocg is deactivated.

4. The bio from #2 is over budget but needs to be forced. It increases
   abs_vdebt and goes over the threshold and enables use_delay.

5. IO control is enabled for the iocg's subtree and now IOs are attributed
   to the descendant cgroups and the iocg itself no longer issues IOs.

This leaves the iocg with stuck abs_vdebt - it has debt but inactive and no
further IOs which can activate it. This can end up unduly punishing all the
descendants cgroups.

The usual throttling path has the same issue - the iocg must be active while
throttled to ensure that future event will wake it up - and solves the
problem by synchronizing the throttling path with a spinlock. abs_vdebt
handling is another form of overage handling and shares a lot of
characteristics including the fact that it isn't in the hottest path.

This patch fixes the above and other possible races by strictly
synchronizing abs_vdebt and use_delay handling with iocg->waitq.lock.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Vlad Dmitriev <vvd@fb.com>
Cc: stable@vger.kernel.org # v5.4+
Fixes: e1518f63f2 ("blk-iocost: Don't let merges push vtime into the future")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-05 09:23:18 -06:00
Jakub Kicinski
043b3e2276 devlink: let kernel allocate region snapshot id
Currently users have to choose a free snapshot id before
calling DEVLINK_CMD_REGION_NEW. This is potentially racy
and inconvenient.

Make the DEVLINK_ATTR_REGION_SNAPSHOT_ID optional and try
to allocate id automatically. Send a message back to the
caller with the snapshot info.

Example use:
$ devlink region new netdevsim/netdevsim1/dummy
netdevsim/netdevsim1/dummy: snapshot 1

$ id=$(devlink -j region new netdevsim/netdevsim1/dummy | \
       jq '.[][][][]')
$ devlink region dump netdevsim/netdevsim1/dummy snapshot $id
[...]
$ devlink region del netdevsim/netdevsim1/dummy snapshot $id

v4:
 - inline the notification code
v3:
 - send the notification only once snapshot creation completed.
v2:
 - don't wrap the line containing extack;
 - add a few sentences to the docs.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04 11:58:31 -07:00
Vincent Cheng
d3f1cbd29f ptp: Add adjust_phase to ptp_clock_caps capability.
Add adjust_phase to ptp_clock_caps capability to allow
user to query if a PHC driver supports adjust phase with
ioctl PTP_CLOCK_GETCAPS command.

Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Reviewed-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-02 16:31:45 -07:00
David S. Miller
115506fea4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-05-01 (v2)

The following pull-request contains BPF updates for your *net-next* tree.

We've added 61 non-merge commits during the last 6 day(s) which contain
a total of 153 files changed, 6739 insertions(+), 3367 deletions(-).

The main changes are:

1) pulled work.sysctl from vfs tree with sysctl bpf changes.

2) bpf_link observability, from Andrii.

3) BTF-defined map in map, from Andrii.

4) asan fixes for selftests, from Andrii.

5) Allow bpf_map_lookup_elem for SOCKMAP and SOCKHASH, from Jakub.

6) production cloudflare classifier as a selftes, from Lorenz.

7) bpf_ktime_get_*_ns() helper improvements, from Maciej.

8) unprivileged bpftool feature probe, from Quentin.

9) BPF_ENABLE_STATS command, from Song.

10) enable bpf_[gs]etsockopt() helpers for sock_ops progs, from Stanislav.

11) enable a bunch of common helpers for cg-device, sysctl, sockopt progs,
 from Stanislav.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01 17:02:27 -07:00
Stanislav Fomichev
57dc6f3b41 selftests/bpf: Use reno instead of dctcp
Andrey pointed out that we can use reno instead of dctcp for CC
tests and drop CONFIG_TCP_CONG_DCTCP=y requirement.

Fixes: beecf11bc2 ("bpf: Bpf_{g,s}etsockopt for struct bpf_sock_addr")
Suggested-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200501224320.28441-1-sdf@google.com
2020-05-01 16:51:07 -07:00
Stanislav Fomichev
beecf11bc2 bpf: Bpf_{g,s}etsockopt for struct bpf_sock_addr
Currently, bpf_getsockopt and bpf_setsockopt helpers operate on the
'struct bpf_sock_ops' context in BPF_PROG_TYPE_SOCK_OPS program.
Let's generalize them and make them available for 'struct bpf_sock_addr'.
That way, in the future, we can allow those helpers in more places.

As an example, let's expose those 'struct bpf_sock_addr' based helpers to
BPF_CGROUP_INET{4,6}_CONNECT hooks. That way we can override CC before the
connection is made.

v3:
* Expose custom helpers for bpf_sock_addr context instead of doing
  generic bpf_sock argument (as suggested by Daniel). Even with
  try_socket_lock that doesn't sleep we have a problem where context sk
  is already locked and socket lock is non-nestable.

v2:
* s/BPF_PROG_TYPE_CGROUP_SOCKOPT/BPF_PROG_TYPE_SOCK_OPS/

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200430233152.199403-1-sdf@google.com
2020-05-01 12:44:28 -07:00
Song Liu
31a9f7fe93 bpf: Add selftest for BPF_ENABLE_STATS
Add test for BPF_ENABLE_STATS, which should enable run_time_ns stats.

~/selftests/bpf# ./test_progs -t enable_stats  -v
test_enable_stats:PASS:skel_open_and_load 0 nsec
test_enable_stats:PASS:get_stats_fd 0 nsec
test_enable_stats:PASS:attach_raw_tp 0 nsec
test_enable_stats:PASS:get_prog_info 0 nsec
test_enable_stats:PASS:check_stats_enabled 0 nsec
test_enable_stats:PASS:check_run_cnt_valid 0 nsec
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200430071506.1408910-4-songliubraving@fb.com
2020-05-01 10:36:32 -07:00
Song Liu
0bee106716 libbpf: Add support for command BPF_ENABLE_STATS
bpf_enable_stats() is added to enable given stats.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200430071506.1408910-3-songliubraving@fb.com
2020-05-01 10:36:32 -07:00
Song Liu
d46edd671a bpf: Sharing bpf runtime stats with BPF_ENABLE_STATS
Currently, sysctl kernel.bpf_stats_enabled controls BPF runtime stats.
Typical userspace tools use kernel.bpf_stats_enabled as follows:

  1. Enable kernel.bpf_stats_enabled;
  2. Check program run_time_ns;
  3. Sleep for the monitoring period;
  4. Check program run_time_ns again, calculate the difference;
  5. Disable kernel.bpf_stats_enabled.

The problem with this approach is that only one userspace tool can toggle
this sysctl. If multiple tools toggle the sysctl at the same time, the
measurement may be inaccurate.

To fix this problem while keep backward compatibility, introduce a new
bpf command BPF_ENABLE_STATS. On success, this command enables stats and
returns a valid fd. BPF_ENABLE_STATS takes argument "type". Currently,
only one type, BPF_STATS_RUN_TIME, is supported. We can extend the
command to support other types of stats in the future.

With BPF_ENABLE_STATS, user space tool would have the following flow:

  1. Get a fd with BPF_ENABLE_STATS, and make sure it is valid;
  2. Check program run_time_ns;
  3. Sleep for the monitoring period;
  4. Check program run_time_ns again, calculate the difference;
  5. Close the fd.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200430071506.1408910-2-songliubraving@fb.com
2020-05-01 10:36:32 -07:00
Shuah Khan
66d69e081b selftests: fix kvm relocatable native/cross builds and installs
kvm test Makefile doesn't fully support cross-builds and installs.
UNAME_M = $(shell uname -m) variable is used to define the target
programs and libraries to be built from arch specific sources in
sub-directories.

For cross-builds to work, UNAME_M has to map to ARCH and arch specific
directories and targets in this Makefile.

UNAME_M variable to used to run the compiles pointing to the right arch
directories and build the right targets for these supported architectures.

TEST_GEN_PROGS and LIBKVM are set using UNAME_M variable.
LINUX_TOOL_ARCH_INCLUDE is set using ARCH variable.

x86_64 targets are named to include x86_64 as a suffix and directories
for includes are in x86_64 sub-directory. s390x and aarch64 follow the
same convention. "uname -m" doesn't result in the correct mapping for
s390x and aarch64. Fix it to set UNAME_M correctly for s390x and aarch64
cross-builds.

In addition, Makefile doesn't create arch sub-directories in the case of
relocatable builds and test programs under s390x and x86_64 directories
fail to build. This is a problem for native and cross-builds. Fix it to
create all necessary directories keying off of TEST_GEN_PROGS.

The following use-cases work with this change:

Native x86_64:
make O=/tmp/kselftest -C tools/testing/selftests TARGETS=kvm install \
 INSTALL_PATH=$HOME/x86_64

arm64 cross-build:
make O=$HOME/arm64_build/ ARCH=arm64 HOSTCC=gcc \
	CROSS_COMPILE=aarch64-linux-gnu- defconfig

make O=$HOME/arm64_build/ ARCH=arm64 HOSTCC=gcc \
	CROSS_COMPILE=aarch64-linux-gnu- all

make kselftest-install TARGETS=kvm O=$HOME/arm64_build ARCH=arm64 \
	HOSTCC=gcc CROSS_COMPILE=aarch64-linux-gnu-

s390x cross-build:
make O=$HOME/s390x_build/ ARCH=s390 HOSTCC=gcc \
	CROSS_COMPILE=s390x-linux-gnu- defconfig

make O=$HOME/s390x_build/ ARCH=s390 HOSTCC=gcc \
	CROSS_COMPILE=s390x-linux-gnu- all

make kselftest-install TARGETS=kvm O=$HOME/s390x_build/ ARCH=s390 \
	HOSTCC=gcc CROSS_COMPILE=s390x-linux-gnu- all

No regressions in the following use-cases:
make -C tools/testing/selftests TARGETS=kvm
make kselftest-all TARGETS=kvm

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-05-01 09:47:55 -06:00
Masami Hiramatsu
6734d211fe selftests/ftrace: Make XFAIL green color
Since XFAIL (Expected Failure) is expected to fail the test, which
means that test case works as we expected. IOW, XFAIL is same as
PASS. So make it green.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-05-01 09:31:38 -06:00
Alan Maguire
b730d66813 ftrace/selftest: make unresolved cases cause failure if --fail-unresolved set
Currently, ftracetest will return 1 (failure) if any unresolved cases
are encountered.  The unresolved status results from modules and
programs not being available, and as such does not indicate any
issues with ftrace itself.  As such, change the behaviour of
ftracetest in line with unsupported cases; if unsupported cases
happen, ftracetest still returns 0 unless --fail-unsupported.  Here
--fail-unresolved is added and the default is to return 0 if
unresolved results occur.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-05-01 09:26:51 -06:00
Alan Maguire
57c4cfd4a2 ftrace/selftests: workaround cgroup RT scheduling issues
wakeup_rt.tc and wakeup.tc tests in tracers/ subdirectory
fail due to the chrt command returning:

 chrt: failed to set pid 0's policy: Operation not permitted.

To work around this, temporarily disable grout RT scheduling
during ftracetest execution.  Restore original value on
test run completion.  With these changes in place, both
tests consistently pass.

Fixes: c575dea2c1 ("selftests/ftrace: Add wakeup_rt tracer testcase")
Fixes: c1edd060b4 ("selftests/ftrace: Add wakeup tracer testcase")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-05-01 09:25:39 -06:00
Linus Torvalds
75ec0ba2ac linux-kselftest-5.7-rc4
This kselftest update for Linux 5.7-rc4 consists of:
 
 - ftrace test fixes to check for required filter files and kprobe args.
 
 - Kselftest build/cross-build dependency check script to make it easier
   for test ring admins/users to configure build systems correctly for
   build/cross-build kselftests. Currently checks library dependencies.
 
     - Checks if Kselftests can be built/cross-built on a system running
       compile test on a trivial C file with LDLIBS specified for each
       individual test in their Makefiles.
     - Prints suggested target list for a system filtering out tests
       failed the build dependency check from the TARGETS in Selftests
       the main Makefile when optional -p is specified.
     - Prints pass/fail dependency check for each tests/sub-test.
     - Prints pass/fail targets and libraries.
     - Default: runs dependency checks on all tests.
     - Optional test name can be specified to check dependencies for it.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl6rC9sACgkQCwJExA0N
 QxwOqBAAuDgdAMKgks2ei8gc1A3NEeOCXmq5NtCeM5H80o+v5hALY4XeEHUAYpdI
 /RC4XJ5sK8l/9AMhwoTxHdOjxGWGb7mznsNjUwqEicwpVxzpMRkcCfUavJdIWw3v
 WkICPFuz5qj2ixdTvrYl1a3sHZr0yXAhoroI+Nbl4N2fhB80T9aZeE/mI6tgc4af
 7pfSR6C4S+e0VocQeqof8uAMuUI4hJXg5nkSRLNLI6RHoTh4dE2Ozxhp+1KaW2P2
 JivGD7mnqEn91r0Ai14uhvGQqJBFdLWhjqVhuSFf8I1cHBbTHu7ZEG+uARKYd8lP
 SKF/AOkWxduBVQCY03gcynlhSjP6wN4/XIg/HMcS7KTyEQHB4nNavQY9XiHB2ARI
 6f/1cKwA2VJMVbSlCMfC+E3/sIG05ht0XwiMhGapDy+B8pWKRiwsKn2GptG7R5th
 b3y78Mkz4jK9OzC5rWY/tzOH9UVdI8sxSa3KXr1X2UDxX21R/XNzEVquj+zTEawo
 jx63We3qLpzoh7d0K7hOCpAdZa6x7HpYGgc4ntUAttQLiALg7R+hEyltnsEd5WUs
 w1hncEwcCGDw4QzIpOAvqlePZcqi+3C6ezo8bHhVJ/X5B6ikzzaB1pUiqpeeJfEj
 J9Pus0+QQUdf19wJQwOjYcrxt8fmKRLPTHPQOAxOWzEnpFeGYzE=
 =/gZD
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest updates from Shuah Khan:

 - ftrace test fixes to check for required filter files and kprobe args.

 - Kselftest build/cross-build dependency check script to make it easier
   for test ring admins/users to configure build systems correctly for
   build/cross-build kselftests. Currently checks library dependencies.

    - Checks if Kselftests can be built/cross-built on a system running
      compile test on a trivial C file with LDLIBS specified for each
      individual test in their Makefiles.

    - Prints suggested target list for a system filtering out tests
      failed the build dependency check from the TARGETS in Selftests
      the main Makefile when optional -p is specified.

    - Prints pass/fail dependency check for each tests/sub-test.

    - Prints pass/fail targets and libraries.

    - Default: runs dependency checks on all tests.

    - Optional test name can be specified to check dependencies for it.

* tag 'linux-kselftest-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/ftrace: Check the first record for kprobe_args_type.tc
  selftests: add build/cross-build dependency check script
  selftests/ftrace: Check required filter files before running test
2020-04-30 16:28:49 -07:00
Tejun Heo
21f3cfeab3 iocost_monitor: drop string wrap around numbers when outputting json
Wrapping numbers in strings is used by some to work around bit-width issues in
some enviroments. The problem isn't innate to json and the workaround seems to
cause more integration problems than help. Let's drop the string wrapping.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-30 15:54:45 -06:00
Tejun Heo
f4fe3ea636 iocost_monitor: exit successfully if interval is zero
This is to help external tools to decide whether iocost_monitor has all its
requirements met or not based on the exit status of an -i0 run.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-30 15:54:45 -06:00
Alexandre Chartre
8aa8eb2a8f objtool: Add support for intra-function calls
Change objtool to support intra-function calls. On x86, an intra-function
call is represented in objtool as a push onto the stack (of the return
address), and a jump to the destination address. That way the stack
information is correctly updated and the call flow is still accurate.

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200414103618.12657-4-alexandre.chartre@oracle.com
2020-04-30 20:14:33 +02:00
Miroslav Benes
b490f45362 objtool: Move the IRET hack into the arch decoder
Quoting Julien:

  "And the other suggestion is my other email was that you don't even
  need to add INSN_EXCEPTION_RETURN. You can keep IRET as
  INSN_CONTEXT_SWITCH by default and x86 decoder lookups the symbol
  conaining an iret. If it's a function symbol, it can just set the type
  to INSN_OTHER so that it caries on to the next instruction after
  having handled the stack_op."

Suggested-by: Julien Thierry <jthierry@redhat.com>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200428191659.913283807@infradead.org
2020-04-30 20:14:33 +02:00
Peter Zijlstra
b09fb65e86 objtool: Remove INSN_STACK
With the unconditional use of handle_insn_ops(), INSN_STACK has lost
its purpose. Remove it.

Suggested-by: Julien Thierry <jthierry@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200428191659.854203028@infradead.org
2020-04-30 20:14:33 +02:00
Peter Zijlstra
60041bcd8f objtool: Make handle_insn_ops() unconditional
Now that every instruction has a list of stack_ops; we can trivially
distinquish those instructions that do not have stack_ops, their list
is empty.

This means we can now call handle_insn_ops() unconditionally.

Suggested-by: Julien Thierry <jthierry@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200428191659.795115188@infradead.org
2020-04-30 20:14:32 +02:00
Peter Zijlstra
7d989fcadd objtool: Rework allocating stack_ops on decode
Wrap each stack_op in a macro that allocates and adds it to the list.
This simplifies trying to figure out what to do with the pre-allocated
stack_op at the end.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200428191659.736151601@infradead.org
2020-04-30 20:14:32 +02:00
Alexandre Chartre
c721b3f80f objtool: UNWIND_HINT_RET_OFFSET should not check registers
UNWIND_HINT_RET_OFFSET will adjust a modified stack. However if a
callee-saved register was pushed on the stack then the stack frame
will still appear modified. So stop checking registers when
UNWIND_HINT_RET_OFFSET is used.

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200407073142.20659-3-alexandre.chartre@oracle.com
2020-04-30 20:14:32 +02:00
Alexandre Chartre
87cf61fe84 objtool: is_fentry_call() crashes if call has no destination
Fix is_fentry_call() so that it works if a call has no destination
set (call_dest). This needs to be done in order to support intra-
function calls.

Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200414103618.12657-2-alexandre.chartre@oracle.com
2020-04-30 20:14:32 +02:00
Peter Zijlstra
7117f16bf4 objtool: Fix ORC vs alternatives
Jann reported that (for instance) entry_64.o:general_protection has
very odd ORC data:

  0000000000000f40 <general_protection>:
  #######sp:sp+8 bp:(und) type:iret end:0
    f40:       90                      nop
  #######sp:(und) bp:(und) type:call end:0
    f41:       90                      nop
    f42:       90                      nop
  #######sp:sp+8 bp:(und) type:iret end:0
    f43:       e8 a8 01 00 00          callq  10f0 <error_entry>
  #######sp:sp+0 bp:(und) type:regs end:0
    f48:       f6 84 24 88 00 00 00    testb  $0x3,0x88(%rsp)
    f4f:       03
    f50:       74 00                   je     f52 <general_protection+0x12>
    f52:       48 89 e7                mov    %rsp,%rdi
    f55:       48 8b 74 24 78          mov    0x78(%rsp),%rsi
    f5a:       48 c7 44 24 78 ff ff    movq   $0xffffffffffffffff,0x78(%rsp)
    f61:       ff ff
    f63:       e8 00 00 00 00          callq  f68 <general_protection+0x28>
    f68:       e9 73 02 00 00          jmpq   11e0 <error_exit>
  #######sp:(und) bp:(und) type:call end:0
    f6d:       0f 1f 00                nopl   (%rax)

Note the entry at 0xf41. Josh found this was the result of commit:

  764eef4b10 ("objtool: Rewrite alt->skip_orig")

Due to the early return in validate_branch() we no longer set
insn->cfi of the original instruction stream (the NOPs at 0xf41 and
0xf42) and we'll end up with the above weirdness.

In other discussions we realized alternatives should be ORC invariant;
that is, due to there being only a single ORC table, it must be valid
for all alternatives. The easiest way to ensure this is to not allow
any stack modifications in alternatives.

When we enforce this latter observation, we get the property that the
whole alternative must have the same CFI, which we can employ to fix
the former report.

Fixes: 764eef4b10 ("objtool: Rewrite alt->skip_orig")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200428191659.499074346@infradead.org
2020-04-30 20:14:31 +02:00
Alexandre Chartre
13fab06d9a objtool: Uniquely identify alternative instruction groups
Assign a unique identifier to every alternative instruction group in
order to be able to tell which instructions belong to what
alternative.

[peterz: extracted from a larger patch]
Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
2020-04-30 20:14:31 +02:00
Julien Thierry
9e98d62aa7 objtool: Remove check preventing branches within alternative
While jumping from outside an alternative region to the middle of an
alternative region is very likely wrong, jumping from an alternative
region into the same region is valid. It is a common pattern on arm64.

The first pattern is unlikely to happen in practice and checking only
for this adds a lot of complexity.

Just remove the current check.

Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Julien Thierry <jthierry@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/20200327152847.15294-6-jthierry@redhat.com
2020-04-30 20:14:31 +02:00
Will Deacon
bf60333977 Merge branch 'x86/asm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into for-next/asm
As agreed with Boris, merge in the 'x86/asm' branch from -tip so that we
can select the new 'ARCH_USE_SYM_ANNOTATIONS' Kconfig symbol, which is
required by the BTI kernel patches.

* 'x86/asm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm: Provide a Kconfig symbol for disabling old assembly annotations
  x86/32: Remove CONFIG_DOUBLEFAULT
2020-04-30 17:39:42 +01:00
Jakub Sitnicki
c321022244 selftests/bpf: Test allowed maps for bpf_sk_select_reuseport
Check that verifier allows passing a map of type:

 BPF_MAP_TYPE_REUSEPORT_SOCKARRARY, or
 BPF_MAP_TYPE_SOCKMAP, or
 BPF_MAP_TYPE_SOCKHASH

... to bpf_sk_select_reuseport helper.

Suggested-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200430104738.494180-1-jakub@cloudflare.com
2020-04-30 16:21:14 +02:00
Andrii Nakryiko
063e688133 libbpf: Fix false uninitialized variable warning
Some versions of GCC falsely detect that vi might not be initialized. That's
not true, but let's silence it with NULL initialization.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200430021436.1522502-1-andriin@fb.com
2020-04-30 16:16:01 +02:00
Kajol Jain
354575c00d perf vendor events power9: Add hv_24x7 socket/chip level metric events
The hv_24×7 feature in IBM® POWER9™ processor-based servers provide the
facility to continuously collect large numbers of hardware performance
metrics efficiently and accurately.

This patch adds hv_24x7  metric file for different Socket/chip
resources.

Result:

power9 platform:

  command:# ./perf stat --metric-only -M Memory_RD_BW_Chip -C 0 -I 1000

     1.000096188          0.9           0.3
     2.000285720          0.5           0.1
     3.000424990          0.4           0.1

  command:# ./perf stat --metric-only -M PowerBUS_Frequency -C 0 -I 1000

     1.000097981          2.3           2.3
     2.000291713          2.3           2.3
     3.000421719          2.3           2.3
     4.000550912          2.3           2.3

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20200401203340.31402-8-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30 10:48:33 -03:00
Kajol Jain
3351c6da89 perf tools: Enable Hz/hz prinitg for --metric-only option
Commit 54b5091606 ("perf stat: Implement --metric-only mode") added
function 'valid_only_metric()' which drops "Hz" or "hz", if it is part
of "ScaleUnit". This patch enable it since hv_24x7 supports couple of
frequency events.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20200401203340.31402-7-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30 10:48:33 -03:00
Kajol Jain
9022608ec5 perf tests expr: Added test for runtime param in metric expression
Added test case for parsing  "?" in metric expression.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20200401203340.31402-6-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30 10:48:33 -03:00
Kajol Jain
1e1a873dc6 perf metricgroups: Enhance JSON/metric infrastructure to handle "?"
Patch enhances current metric infrastructure to handle "?" in the metric
expression. The "?" can be use for parameters whose value not known
while creating metric events and which can be replace later at runtime
to the proper value. It also add flexibility to create multiple events
out of single metric event added in JSON file.

Patch adds function 'arch_get_runtimeparam' which is a arch specific
function, returns the count of metric events need to be created.  By
default it return 1.

This infrastructure needed for hv_24x7 socket/chip level events.
"hv_24x7" chip level events needs specific chip-id to which the data is
requested. Function 'arch_get_runtimeparam' implemented in header.c
which extract number of sockets from sysfs file "sockets" under
"/sys/devices/hv_24x7/interface/".

With this patch basically we are trying to create as many metric events
as define by runtime_param.

For that one loop is added in function 'metricgroup__add_metric', which
create multiple events at run time depend on return value of
'arch_get_runtimeparam' and merge that event in 'group_list'.

To achieve that we are actually passing this parameter value as part of
`expr__find_other` function and changing "?" present in metric
expression with this value.

As in our JSON file, there gonna be single metric event, and out of
which we are creating multiple events.

To understand which data count belongs to which parameter value,
we also printing param value in generic_metric function.

For example,

  command:# ./perf stat  -M PowerBUS_Frequency -C 0 -I 1000
    1.000101867  9,356,933  hv_24x7/pm_pb_cyc,chip=0/ #  2.3 GHz  PowerBUS_Frequency_0
    1.000101867  9,366,134  hv_24x7/pm_pb_cyc,chip=1/ #  2.3 GHz  PowerBUS_Frequency_1
    2.000314878  9,365,868  hv_24x7/pm_pb_cyc,chip=0/ #  2.3 GHz  PowerBUS_Frequency_0
    2.000314878  9,366,092  hv_24x7/pm_pb_cyc,chip=1/ #  2.3 GHz  PowerBUS_Frequency_1

So, here _0 and _1 after PowerBUS_Frequency specify parameter value.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20200401203340.31402-5-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30 10:48:33 -03:00
Shaokun Zhang
454a8be0cf perf pmu: Fix function name in comment, its get_cpuid_str(), not get_cpustr()
get_cpuid_str() is used in tools/perf/arch/xxx/util/header.c,
fix the name in comment.

Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lore.kernel.org/lkml/1588141992-48382-1-git-send-email-zhangshaokun@hisilicon.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30 10:48:33 -03:00
Zou Wei
6fa9c3e779 perf report: Fix warning assignment of 0/1 to bool variable
Fixes coccicheck warning:

  tools/perf/builtin-report.c:1403:2-34: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/1587904683-3510-1-git-send-email-zou_wei@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30 10:48:33 -03:00
Zou Wei
8284bbeab7 perf tools: Remove unneeded semicolons
Fixes coccicheck warnings:

  tools/perf/builtin-diff.c:1565:2-3: Unneeded semicolon
  tools/perf/builtin-lock.c:778:2-3: Unneeded semicolon
  tools/perf/builtin-mem.c:126:2-3: Unneeded semicolon
  tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c:555:2-3: Unneeded semicolon
  tools/perf/util/ordered-events.c:317:2-3: Unneeded semicolon
  tools/perf/util/synthetic-events.c:1131:2-3: Unneeded semicolon
  tools/perf/util/trace-event-read.c:78:2-3: Unneeded semicolon

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/1588065523-71423-1-git-send-email-zou_wei@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30 10:48:32 -03:00
Zou Wei
2cca512ad2 perf c2c: Remove unneeded semicolon
Fixes coccicheck warnings:

 tools/perf/builtin-c2c.c:1712:2-3: Unneeded semicolon
 tools/perf/builtin-c2c.c:1928:2-3: Unneeded semicolon
 tools/perf/builtin-c2c.c:2962:2-3: Unneeded semicolon

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/1588064336-70456-1-git-send-email-zou_wei@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30 10:48:32 -03:00
Zou Wei
eebe80c982 libtraceevent: Remove unneeded semicolon
Fixes coccicheck warning:

 tools/lib/traceevent/kbuffer-parse.c:441:2-3: Unneeded semicolon

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: http://lore.kernel.org/lkml/1588065121-71236-1-git-send-email-zou_wei@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30 10:48:32 -03:00
Stephane Eranian
fad1f1e7de perf script: Remove extraneous newline in perf_sample__fprintf_regs()
When printing iregs, there was a double newline printed because
perf_sample__fprintf_regs() was printing its own and then at the end of
all fields, perf script was adding one.  This was causing blank line in
the output:

Before:

  $ perf script -Fip,iregs
             401b8d ABI:2    DX:0x100    SI:0x4a8340    DI:0x4a9340

             401b8d ABI:2    DX:0x100    SI:0x4a9340    DI:0x4a8340

             401b8d ABI:2    DX:0x100    SI:0x4a8340    DI:0x4a9340

             401b8d ABI:2    DX:0x100    SI:0x4a9340    DI:0x4a8340

After:

  $ perf script -Fip,iregs
             401b8d ABI:2    DX:0x100    SI:0x4a8340    DI:0x4a9340
             401b8d ABI:2    DX:0x100    SI:0x4a9340    DI:0x4a8340
             401b8d ABI:2    DX:0x100    SI:0x4a8340    DI:0x4a9340

Committer testing:

First we need to figure out how to request that registers be recorded,
so we use:

  # perf record -h reg

   Usage: perf record [<options>] [<command>]
      or: perf record [<options>] -- <command> [<options>]

      -I, --intr-regs[=<any register>]
                            sample selected machine registers on interrupt, use '-I?' to list register names
          --buildid-all     Record build-id of all DSOs regardless of hits
          --user-regs[=<any register>]
                            sample selected machine registers on interrupt, use '--user-regs=?' to list register names

  #

Ok, now lets ask for them all:

  # perf record -a --intr-regs --user-regs sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 4.105 MB perf.data (2760 samples) ]
  #

Lets look at the first 6 output lines:

  # perf script -Fip,iregs | head -6
   ffffffff8a06f2f4 ABI:2    AX:0xffffd168fee0a980    BX:0xffff8a23b087f000    CX:0xfffeb69aaeb25d73    DX:0xffff8a253e8310f0    SI:0xfffffff9bafe7359    DI:0xffffb1690204fb10    BP:0xffffd168fee0a950    SP:0xffffb1690204fb88    IP:0xffffffff8a06f2f4 FLAGS:0x4e    CS:0x10    SS:0x18    R8:0x1495f0a91129a    R9:0xffff8a23b087f000   R10:0x1   R11:0xffffffff   R12:0x0   R13:0xffff8a253e827e00   R14:0xffffd168fee0aa5c   R15:0xffffd168fee0a980

   ffffffff8a06f2f4 ABI:2    AX:0x0    BX:0xffffd168fee0a950    CX:0x5684cc1118491900    DX:0x0    SI:0xffffd168fee0a9d0    DI:0x202    BP:0xffffb1690204fd70    SP:0xffffb1690204fd20    IP:0xffffffff8a06f2f4 FLAGS:0x24e    CS:0x10    SS:0x18    R8:0x0    R9:0xffffd168fee0a9d0   R10:0x1   R11:0xffffffff   R12:0xffffffff8a23e480   R13:0xffff8a23b087f240   R14:0xffff8a23b087f000   R15:0xffffd168fee0a950

   ffffffff8a06f2f4 ABI:2    AX:0x0    BX:0x0    CX:0x7f25f334335b    DX:0x0    SI:0x2400    DI:0x4    BP:0x7fff5f264570    SP:0x7fff5f264538    IP:0xffffffff8a06f2f4 FLAGS:0x24e    CS:0x10    SS:0x2b    R8:0x0    R9:0x2312d20   R10:0x0   R11:0x246   R12:0x22cc0e0   R13:0x0   R14:0x0   R15:0x22d0780

  #

Reproduced, apply the patch and:

[root@five ~]# perf script -Fip,iregs | head -6
 ffffffff8a06f2f4 ABI:2    AX:0xffffd168fee0a980    BX:0xffff8a23b087f000    CX:0xfffeb69aaeb25d73    DX:0xffff8a253e8310f0    SI:0xfffffff9bafe7359    DI:0xffffb1690204fb10    BP:0xffffd168fee0a950    SP:0xffffb1690204fb88    IP:0xffffffff8a06f2f4 FLAGS:0x4e    CS:0x10    SS:0x18    R8:0x1495f0a91129a    R9:0xffff8a23b087f000   R10:0x1   R11:0xffffffff   R12:0x0   R13:0xffff8a253e827e00   R14:0xffffd168fee0aa5c   R15:0xffffd168fee0a980
 ffffffff8a06f2f4 ABI:2    AX:0x0    BX:0xffffd168fee0a950    CX:0x5684cc1118491900    DX:0x0    SI:0xffffd168fee0a9d0    DI:0x202    BP:0xffffb1690204fd70    SP:0xffffb1690204fd20    IP:0xffffffff8a06f2f4 FLAGS:0x24e    CS:0x10    SS:0x18    R8:0x0    R9:0xffffd168fee0a9d0   R10:0x1   R11:0xffffffff   R12:0xffffffff8a23e480   R13:0xffff8a23b087f240   R14:0xffff8a23b087f000   R15:0xffffd168fee0a950
 ffffffff8a06f2f4 ABI:2    AX:0x0    BX:0x0    CX:0x7f25f334335b    DX:0x0    SI:0x2400    DI:0x4    BP:0x7fff5f264570    SP:0x7fff5f264538    IP:0xffffffff8a06f2f4 FLAGS:0x24e    CS:0x10    SS:0x2b    R8:0x0    R9:0x2312d20   R10:0x0   R11:0x246   R12:0x22cc0e0   R13:0x0   R14:0x0   R15:0x22d0780
 ffffffff8a24074b ABI:2    AX:0xcb    BX:0xcb    CX:0x0    DX:0x0    SI:0xffffb1690204ff58    DI:0xcb    BP:0xffffb1690204ff58    SP:0xffffb1690204ff40    IP:0xffffffff8a24074b FLAGS:0x24e    CS:0x10    SS:0x18    R8:0x0    R9:0x0   R10:0x0   R11:0x0   R12:0x0   R13:0x0   R14:0x0   R15:0x0
 ffffffff8a310600 ABI:2    AX:0x0    BX:0xffffffff8b8c39a0    CX:0x0    DX:0xffff8a2503890300    SI:0xffffb1690204ff20    DI:0xffff8a23e4080000    BP:0xffff8a23e4080000    SP:0xffffb1690204fec0    IP:0xffffffff8a310600 FLAGS:0x28e    CS:0x10    SS:0x18    R8:0x0    R9:0x0   R10:0x0   R11:0x0   R12:0xffffffffffffffea   R13:0xffff8a23e4080020   R14:0x0   R15:0x0
 ffffffff8a11b688 ABI:2    AX:0x0    BX:0xffff8a237b7c8800    CX:0xffffb1690204fae0    DX:0x78    SI:0xffff8a237b7c8800    DI:0xffffb1690204fa10    BP:0xffffb1690204fb00    SP:0xffffb1690204fa00    IP:0xffffffff8a11b688 FLAGS:0x8a    CS:0x10    SS:0x18    R8:0x1495f0a917eba    R9:0xffffd168fde19a48   R10:0xffffb1690204fd98   R11:0xffff8a253e82afb0   R12:0xffff8a237b7c8800   R13:0xffffb1690204fb00   R14:0x0   R15:0xffff8a237b7c8800
[root@five ~]#

To see it more clearly, lets get just two of those registers by sample:

  # perf record -a --intr-regs=ax,bx --user-regs=cx,dx sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 3.502 MB perf.data (1653 samples) ]
  #

Extra info, lets see what gets setup in that 'struct perf_event_attr':

  # perf evlist -v
  cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|REGS_USER|REGS_INTR, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, sample_regs_user: 0xc, sample_regs_intr: 0x3
  #

Cook, some PERF_SAMPLE_REGS_USER|PERF_SAMPLE_REGS_INTR +
attr.sample_regs_user and attr.sample_regs_intr register masks, now lets
see if those newlines are gone in a more compact fashion:

  # perf script -Fip,iregs,uregs
   ffffffff8a56df78 ABI:2    AX:0xffff8a25137b6028    BX:0xffff8a2502f18000  ABI:2    CX:0x7f204460e49b    DX:0xf42920
   ffffffff8a56df78 ABI:2    AX:0xffff8a25137b6028    BX:0xffff8a2502f18000  ABI:2    CX:0x7f204460e49b    DX:0xf42920
   ffffffff8a56df78 ABI:2    AX:0xffff8a25137b6028    BX:0xffff8a2502f18000  ABI:2    CX:0x7f204460e49b    DX:0xf42920
   ffffffff8a56df78 ABI:2    AX:0xffff8a25137b6028    BX:0xffff8a2502f18000  ABI:2    CX:0x7f204460e49b    DX:0xf42920
   ffffffff8a56df78 ABI:2    AX:0xffff8a25137b6028    BX:0xffff8a2502f18000  ABI:2    CX:0x7f204460e49b    DX:0xf42920
   ffffffff8a56df78 ABI:2    AX:0xffff8a25137b6028    BX:0xffff8a2502f18000  ABI:2    CX:0x7f204460e49b    DX:0xf42920
   ffffffff8a29b78d ABI:2    AX:0x2a20ffcd6000    BX:0x2ec7d9000  ABI:2    CX:0x7f204460e49b    DX:0xf42920
  #

And where was that?

  # perf script -Fip,iregs,uregs,sym,dso
   ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2    AX:0xffff8a25137b6028    BX:0xffff8a2502f18000  ABI:2    CX:0x7f204460e49b    DX:0xf42920
   ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2    AX:0xffff8a25137b6028    BX:0xffff8a2502f18000  ABI:2    CX:0x7f204460e49b    DX:0xf42920
   ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2    AX:0xffff8a25137b6028    BX:0xffff8a2502f18000  ABI:2    CX:0x7f204460e49b    DX:0xf42920
   ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2    AX:0xffff8a25137b6028    BX:0xffff8a2502f18000  ABI:2    CX:0x7f204460e49b    DX:0xf42920
   ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2    AX:0xffff8a25137b6028    BX:0xffff8a2502f18000  ABI:2    CX:0x7f204460e49b    DX:0xf42920
   ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2    AX:0xffff8a25137b6028    BX:0xffff8a2502f18000  ABI:2    CX:0x7f204460e49b    DX:0xf42920
   ffffffff8a29b78d __vma_link_rb (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2    AX:0x2a20ffcd6000    BX:0x2ec7d9000  ABI:2    CX:0x7f204460e49b    DX:0xf42920
  #

Signed-off-by: Stephane Eranian <eranian@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200418231908.152212-1-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30 10:48:32 -03:00
Ian Rogers
2069425eb3 perf synthetic events: Remove use of sscanf from /proc reading
The synthesize benchmark, run on a single process and thread, shows
perf_event__synthesize_mmap_events as the hottest function with fgets
and sscanf taking the majority of execution time.

fscanf performs similarly well. Replace the scanf call with manual
reading of each field of the /proc/pid/maps line, and remove some
unnecessary buffering.

This change also addresses potential, but unlikely, buffer overruns for
the string values read by scanf.

Performance before is:

  $ sudo perf bench internals synthesize -m 16 -M 16 -s -t
  \# Running 'internals/synthesize' benchmark:
  Computing performance of single threaded perf event synthesis by
  synthesizing events on the perf process itself:
    Average synthesis took: 102.810 usec (+- 0.027 usec)
    Average num. events: 17.000 (+- 0.000)
    Average time per event 6.048 usec
    Average data synthesis took: 106.325 usec (+- 0.018 usec)
    Average num. events: 89.000 (+- 0.000)
    Average time per event 1.195 usec
  Computing performance of multi threaded perf event synthesis by
  synthesizing events on CPU 0:
    Number of synthesis threads: 16
      Average synthesis took: 68103.100 usec (+- 441.234 usec)
      Average num. events: 30703.000 (+- 0.730)
      Average time per event 2.218 usec

And after is:

  $ sudo perf bench internals synthesize -m 16 -M 16 -s -t
  \# Running 'internals/synthesize' benchmark:
  Computing performance of single threaded perf event synthesis by
  synthesizing events on the perf process itself:
    Average synthesis took: 50.388 usec (+- 0.031 usec)
    Average num. events: 17.000 (+- 0.000)
    Average time per event 2.964 usec
    Average data synthesis took: 52.693 usec (+- 0.020 usec)
    Average num. events: 89.000 (+- 0.000)
    Average time per event 0.592 usec
  Computing performance of multi threaded perf event synthesis by
  synthesizing events on CPU 0:
    Number of synthesis threads: 16
      Average synthesis took: 45022.400 usec (+- 552.740 usec)
      Average num. events: 30624.200 (+- 10.037)
      Average time per event 1.470 usec

On a Intel Xeon 6154 compiling with Debian gcc 9.2.1.

Committer testing:

On a AMD Ryzen 5 3600X 6-Core Processor:

Before:

  # perf bench internals synthesize --min-threads 12 --max-threads 12 --st --mt
  # Running 'internals/synthesize' benchmark:
  Computing performance of single threaded perf event synthesis by
  synthesizing events on the perf process itself:
    Average synthesis took: 267.491 usec (+- 0.176 usec)
    Average num. events: 56.000 (+- 0.000)
    Average time per event 4.777 usec
    Average data synthesis took: 277.257 usec (+- 0.169 usec)
    Average num. events: 287.000 (+- 0.000)
    Average time per event 0.966 usec
  Computing performance of multi threaded perf event synthesis by
  synthesizing events on CPU 0:
    Number of synthesis threads: 12
      Average synthesis took: 81599.500 usec (+- 346.315 usec)
      Average num. events: 36096.100 (+- 2.523)
      Average time per event 2.261 usec
  #

After:

  # perf bench internals synthesize --min-threads 12 --max-threads 12 --st --mt
  # Running 'internals/synthesize' benchmark:
  Computing performance of single threaded perf event synthesis by
  synthesizing events on the perf process itself:
    Average synthesis took: 110.125 usec (+- 0.080 usec)
    Average num. events: 56.000 (+- 0.000)
    Average time per event 1.967 usec
    Average data synthesis took: 118.518 usec (+- 0.057 usec)
    Average num. events: 287.000 (+- 0.000)
    Average time per event 0.413 usec
  Computing performance of multi threaded perf event synthesis by
  synthesizing events on CPU 0:
    Number of synthesis threads: 12
      Average synthesis took: 43490.700 usec (+- 284.527 usec)
      Average num. events: 37028.500 (+- 0.563)
      Average time per event 1.175 usec
  #

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrey Zhizhikin <andrey.z@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200415054050.31645-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30 10:48:29 -03:00
Ian Rogers
e95770af4c tools api: Add a lightweight buffered reading api
The synthesize benchmark shows the majority of execution time going to
fgets and sscanf, necessary to parse /proc/pid/maps. Add a new buffered
reading library that will be used to replace these calls in a follow-up
CL. Add tests for the library to perf test.

Committer tests:

  $ perf test api
  63: Test api io                                           : Ok
  $

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrey Zhizhikin <andrey.z@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200415054050.31645-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30 10:48:28 -03:00
Ian Rogers
13edc23720 perf bench: Add a multi-threaded synthesize benchmark
By default this isn't run as it reads /proc and may not have access.
For consistency, modify the single threaded benchmark to compute an
average time per event.

Committer testing:

  $ grep -m1 "model name" /proc/cpuinfo
  model name	: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
  $ grep "model name" /proc/cpuinfo  | wc -l
  8
  $
  $ perf bench internals synthesize -h
  # Running 'internals/synthesize' benchmark:

   Usage: perf bench internals synthesize <options>

      -I, --multi-iterations <n>
                            Number of iterations used to compute multi-threaded average
      -i, --single-iterations <n>
                            Number of iterations used to compute single-threaded average
      -M, --max-threads <n>
                            Maximum number of threads in multithreaded bench
      -m, --min-threads <n>
                            Minimum number of threads in multithreaded bench
      -s, --st              Run single threaded benchmark
      -t, --mt              Run multi-threaded benchmark

  $
  $ perf bench internals synthesize -t
  # Running 'internals/synthesize' benchmark:
  Computing performance of multi threaded perf event synthesis by
  synthesizing events on CPU 0:
    Number of synthesis threads: 1
      Average synthesis took: 65449.000 usec (+- 586.442 usec)
      Average num. events: 9405.400 (+- 0.306)
      Average time per event 6.959 usec
    Number of synthesis threads: 2
      Average synthesis took: 37838.300 usec (+- 130.259 usec)
      Average num. events: 9501.800 (+- 20.469)
      Average time per event 3.982 usec
    Number of synthesis threads: 3
      Average synthesis took: 48551.400 usec (+- 225.686 usec)
      Average num. events: 9544.000 (+- 0.000)
      Average time per event 5.087 usec
    Number of synthesis threads: 4
      Average synthesis took: 29632.500 usec (+- 50.808 usec)
      Average num. events: 9544.000 (+- 0.000)
      Average time per event 3.105 usec
    Number of synthesis threads: 5
      Average synthesis took: 33920.400 usec (+- 284.509 usec)
      Average num. events: 9544.000 (+- 0.000)
      Average time per event 3.554 usec
    Number of synthesis threads: 6
      Average synthesis took: 27604.100 usec (+- 72.344 usec)
      Average num. events: 9548.000 (+- 0.000)
      Average time per event 2.891 usec
    Number of synthesis threads: 7
      Average synthesis took: 25406.300 usec (+- 933.371 usec)
      Average num. events: 9545.500 (+- 0.167)
      Average time per event 2.662 usec
    Number of synthesis threads: 8
      Average synthesis took: 24110.400 usec (+- 73.229 usec)
      Average num. events: 9551.000 (+- 0.000)
      Average time per event 2.524 usec
  $

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrey Zhizhikin <andrey.z@gmail.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200415054050.31645-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30 10:48:25 -03:00
Jakub Sitnicki
0b9ad56b1e selftests/bpf: Use SOCKMAP for server sockets in bpf_sk_assign test
Update bpf_sk_assign test to fetch the server socket from SOCKMAP, now that
map lookup from BPF in SOCKMAP is enabled. This way the test TC BPF program
doesn't need to know what address server socket is bound to.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200429181154.479310-4-jakub@cloudflare.com
2020-04-29 23:31:00 +02:00
Jakub Sitnicki
34a2cc6eee selftests/bpf: Test that lookup on SOCKMAP/SOCKHASH is allowed
Now that bpf_map_lookup_elem() is white-listed for SOCKMAP/SOCKHASH,
replace the tests which check that verifier prevents lookup on these map
types with ones that ensure that lookup operation is permitted, but only
with a release of acquired socket reference.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200429181154.479310-3-jakub@cloudflare.com
2020-04-29 23:30:59 +02:00
Quentin Monnet
0b3b9ca3d1 tools: bpftool: Make libcap dependency optional
The new libcap dependency is not used for an essential feature of
bpftool, and we could imagine building the tool without checks on
CAP_SYS_ADMIN by disabling probing features as an unprivileged users.

Make it so, in order to avoid a hard dependency on libcap, and to ease
packaging/embedding of bpftool.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200429144506.8999-4-quentin@isovalent.com
2020-04-29 23:25:11 +02:00
Quentin Monnet
cf9bf71452 tools: bpftool: Allow unprivileged users to probe features
There is demand for a way to identify what BPF helper functions are
available to unprivileged users. To do so, allow unprivileged users to
run "bpftool feature probe" to list BPF-related features. This will only
show features accessible to those users, and may not reflect the full
list of features available (to administrators) on the system.

To avoid the case where bpftool is inadvertently run as non-root and
would list only a subset of the features supported by the system when it
would be expected to list all of them, running as unprivileged is gated
behind the "unprivileged" keyword passed to the command line. When used
by a privileged user, this keyword allows to drop the CAP_SYS_ADMIN and
to list the features available to unprivileged users. Note that this
addsd a dependency on libpcap for compiling bpftool.

Note that there is no particular reason why the probes were restricted
to root, other than the fact I did not need them for unprivileged and
did not bother with the additional checks at the time probes were added.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200429144506.8999-3-quentin@isovalent.com
2020-04-29 23:25:11 +02:00
Quentin Monnet
e3450b79df tools: bpftool: For "feature probe" define "full_mode" bool as global
The "full_mode" variable used for switching between full or partial
feature probing (i.e. with or without probing helpers that will log
warnings in kernel logs) was piped from the main do_probe() function
down to probe_helpers_for_progtype(), where it is needed.

Define it as a global variable: the calls will be more readable, and if
other similar flags were to be used in the future, we could use global
variables as well instead of extending again the list of arguments with
new flags.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200429144506.8999-2-quentin@isovalent.com
2020-04-29 23:25:11 +02:00
Andrii Nakryiko
e4e8f4d047 selftests/bpf: Add runqslower binary to .gitignore
With recent changes, runqslower is being copied into selftests/bpf root
directory. So add it into .gitignore.

Fixes: b26d1e2b60 ("selftests/bpf: Copy runqslower to OUTPUT directory")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Veronika Kabatova <vkabatov@redhat.com>
Link: https://lore.kernel.org/bpf/20200429012111.277390-12-andriin@fb.com
2020-04-28 19:48:05 -07:00
Andrii Nakryiko
8d30e80a04 selftests/bpf: Fix bpf_link leak in ns_current_pid_tgid selftest
If condition is inverted, but it's also just not necessary.

Fixes: 1c1052e014 ("tools/testing/selftests/bpf: Add self-tests for new helper bpf_get_ns_current_pid_tgid.")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Carlos Neira <cneirabustos@gmail.com>
Link: https://lore.kernel.org/bpf/20200429012111.277390-11-andriin@fb.com
2020-04-28 19:48:05 -07:00
Andrii Nakryiko
36d0b6159f selftests/bpf: Disable ASAN instrumentation for mmap()'ed memory read
AddressSanitizer assumes that all memory dereferences are done against memory
allocated by sanitizer's malloc()/free() code and not touched by anyone else.
Seems like this doesn't hold for perf buffer memory. Disable instrumentation
on perf buffer callback function.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200429012111.277390-10-andriin@fb.com
2020-04-28 19:48:05 -07:00
Andrii Nakryiko
3521ffa2ee libbpf: Fix huge memory leak in libbpf_find_vmlinux_btf_id()
BTF object wasn't freed.

Fixes: a6ed02cac6 ("libbpf: Load btf_vmlinux only once per object.")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20200429012111.277390-9-andriin@fb.com
2020-04-28 19:48:05 -07:00
Andrii Nakryiko
13c908495e selftests/bpf: Fix invalid memory reads in core_relo selftest
Another one found by AddressSanitizer. input_len is bigger than actually
initialized data size.

Fixes: c7566a6969 ("selftests/bpf: Add field existence CO-RE relocs tests")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200429012111.277390-8-andriin@fb.com
2020-04-28 19:48:05 -07:00
Andrii Nakryiko
9f56bb531a selftests/bpf: Fix memory leak in extract_build_id()
getline() allocates string, which has to be freed.

Fixes: 81f77fd0de ("bpf: add selftest for stackmap with BPF_F_STACK_BUILD_ID")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200429012111.277390-7-andriin@fb.com
2020-04-28 19:48:05 -07:00
Andrii Nakryiko
f25d5416d6 selftests/bpf: Fix memory leak in test selector
Free test selector substrings, which were strdup()'ed.

Fixes: b65053cd94 ("selftests/bpf: Add whitelist/blacklist of test names to test_progs")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200429012111.277390-6-andriin@fb.com
2020-04-28 19:48:05 -07:00
Andrii Nakryiko
229bf8bf4d libbpf: Fix memory leak and possible double-free in hashmap__clear
Fix memory leak in hashmap_clear() not freeing hashmap_entry structs for each
of the remaining entries. Also NULL-out bucket list to prevent possible
double-free between hashmap__clear() and hashmap__free().

Running test_progs-asan flavor clearly showed this problem.

Reported-by: Alston Tang <alston64@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200429012111.277390-5-andriin@fb.com
2020-04-28 19:48:05 -07:00
Andrii Nakryiko
42fce2cfb4 selftests/bpf: Convert test_hashmap into test_progs test
Fold stand-alone test_hashmap test into test_progs.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200429012111.277390-4-andriin@fb.com
2020-04-28 19:48:05 -07:00
Andrii Nakryiko
02995dd4bb selftests/bpf: Add SAN_CFLAGS param to selftests build to allow sanitizers
Add ability to specify extra compiler flags with SAN_CFLAGS for compilation of
all user-space C files.  This allows to build all of selftest programs with,
e.g., custom sanitizer flags, without requiring support for such sanitizers
from anyone compiling selftest/bpf.

As an example, to compile everything with AddressSanitizer, one would do:

  $ make clean && make SAN_CFLAGS="-fsanitize=address"

For AddressSanitizer to work, one needs appropriate libasan shared library
installed in the system, with version of libasan matching what GCC links
against. E.g., GCC8 needs libasan5, while GCC7 uses libasan4.

For CentOS 7, to build everything successfully one would need to:
  $ sudo yum install devtoolset-8-gcc devtoolset-libasan-devel
  $ scl enable devtoolset-8 bash # set up environment

For Arch Linux to run selftests, one would need to install gcc-libs package to
get libasan.so.5:
  $ sudo pacman -S gcc-libs

N.B. EXTRA_CFLAGS name wasn't used, because it's also used by libbpf's
Makefile and this causes few issues:
1. default "-g -Wall" flags are overriden;
2. compiling shared library with AddressSanitizer generates a bunch of symbols
   like: "_GLOBAL__sub_D_00099_0_btf_dump.c", "_GLOBAL__sub_D_00099_0_bpf.c",
   etc, which screws up versioned symbols check.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Julia Kartseva <hex@fb.com>
Link: https://lore.kernel.org/bpf/20200429012111.277390-3-andriin@fb.com
2020-04-28 19:48:05 -07:00
Andrii Nakryiko
76148faa16 selftests/bpf: Ensure test flavors use correct skeletons
Ensure that test runner flavors include their own skeletons from <flavor>/
directory. Previously, skeletons generated for no-flavor test_progs were used.
Apart from fixing correctness, this also makes it possible to compile only
flavors individually:

  $ make clean && make test_progs-no_alu32
  ... now succeeds ...

Fixes: 74b5a5968f ("selftests/bpf: Replace test_progs and test_maps w/ general rule")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200429012111.277390-2-andriin@fb.com
2020-04-28 19:48:04 -07:00
Andrii Nakryiko
646f02ffdd libbpf: Add BTF-defined map-in-map support
As discussed at LPC 2019 ([0]), this patch brings (a quite belated) support
for declarative BTF-defined map-in-map support in libbpf. It allows to define
ARRAY_OF_MAPS and HASH_OF_MAPS BPF maps without any user-space initialization
code involved.

Additionally, it allows to initialize outer map's slots with references to
respective inner maps at load time, also completely declaratively.

Despite a weak type system of C, the way BTF-defined map-in-map definition
works, it's actually quite hard to accidentally initialize outer map with
incompatible inner maps. This being C, of course, it's still possible, but
even that would be caught at load time and error returned with helpful debug
log pointing exactly to the slot that failed to be initialized.

As an example, here's a rather advanced HASH_OF_MAPS declaration and
initialization example, filling slots #0 and #4 with two inner maps:

  #include <bpf/bpf_helpers.h>

  struct inner_map {
          __uint(type, BPF_MAP_TYPE_ARRAY);
          __uint(max_entries, 1);
          __type(key, int);
          __type(value, int);
  } inner_map1 SEC(".maps"),
    inner_map2 SEC(".maps");

  struct outer_hash {
          __uint(type, BPF_MAP_TYPE_HASH_OF_MAPS);
          __uint(max_entries, 5);
          __uint(key_size, sizeof(int));
          __array(values, struct inner_map);
  } outer_hash SEC(".maps") = {
          .values = {
                  [0] = &inner_map2,
                  [4] = &inner_map1,
          },
  };

Here's the relevant part of libbpf debug log showing pretty clearly of what's
going on with map-in-map initialization:

  libbpf: .maps relo #0: for 6 value 0 rel.r_offset 96 name 260 ('inner_map1')
  libbpf: .maps relo #0: map 'outer_arr' slot [0] points to map 'inner_map1'
  libbpf: .maps relo #1: for 7 value 32 rel.r_offset 112 name 249 ('inner_map2')
  libbpf: .maps relo #1: map 'outer_arr' slot [2] points to map 'inner_map2'
  libbpf: .maps relo #2: for 7 value 32 rel.r_offset 144 name 249 ('inner_map2')
  libbpf: .maps relo #2: map 'outer_hash' slot [0] points to map 'inner_map2'
  libbpf: .maps relo #3: for 6 value 0 rel.r_offset 176 name 260 ('inner_map1')
  libbpf: .maps relo #3: map 'outer_hash' slot [4] points to map 'inner_map1'
  libbpf: map 'inner_map1': created successfully, fd=4
  libbpf: map 'inner_map2': created successfully, fd=5
  libbpf: map 'outer_hash': created successfully, fd=7
  libbpf: map 'outer_hash': slot [0] set to map 'inner_map2' fd=5
  libbpf: map 'outer_hash': slot [4] set to map 'inner_map1' fd=4

Notice from the log above that fd=6 (not logged explicitly) is used for inner
"prototype" map, necessary for creation of outer map. It is destroyed
immediately after outer map is created.

See also included selftest with some extra comments explaining extra details
of usage. Additionally, similar initialization syntax and libbpf functionality
can be used to do initialization of BPF_PROG_ARRAY with references to BPF
sub-programs. This can be done in follow up patches, if there will be a demand
for this.

  [0] https://linuxplumbersconf.org/event/4/contributions/448/

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200429002739.48006-4-andriin@fb.com
2020-04-28 17:35:03 -07:00
Andrii Nakryiko
2d39d7c56f libbpf: Refactor map creation logic and fix cleanup leak
Factor out map creation and destruction logic to simplify code and especially
error handling. Also fix map FD leak in case of partially successful map
creation during bpf_object load operation.

Fixes: 57a00f4164 ("libbpf: Add auto-pinning of maps when loading BPF objects")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200429002739.48006-3-andriin@fb.com
2020-04-28 17:35:03 -07:00
Andrii Nakryiko
41017e56af libbpf: Refactor BTF-defined map definition parsing logic
Factor out BTF map definition logic into stand-alone routine for easier reuse
for map-in-map case.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200429002739.48006-2-andriin@fb.com
2020-04-28 17:35:03 -07:00
Andrii Nakryiko
5d085ad2e6 bpftool: Add link bash completions
Extend bpftool's bash-completion script to handle new link command and its
sub-commands.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200429001614.1544-11-andriin@fb.com
2020-04-28 17:27:08 -07:00
Andrii Nakryiko
7464d013cc bpftool: Add bpftool-link manpage
Add bpftool-link manpage with information and examples of link-related
commands.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200429001614.1544-10-andriin@fb.com
2020-04-28 17:27:08 -07:00
Andrii Nakryiko
c5481f9a95 bpftool: Add bpf_link show and pin support
Add `bpftool link show` and `bpftool link pin` commands.

Example plain output for `link show` (with showing pinned paths):

[vmuser@archvm bpf]$ sudo ~/local/linux/tools/bpf/bpftool/bpftool -f link
1: tracing  prog 12
        prog_type tracing  attach_type fentry
        pinned /sys/fs/bpf/my_test_link
        pinned /sys/fs/bpf/my_test_link2
2: tracing  prog 13
        prog_type tracing  attach_type fentry
3: tracing  prog 14
        prog_type tracing  attach_type fentry
4: tracing  prog 15
        prog_type tracing  attach_type fentry
5: tracing  prog 16
        prog_type tracing  attach_type fentry
6: tracing  prog 17
        prog_type tracing  attach_type fentry
7: raw_tracepoint  prog 21
        tp 'sys_enter'
8: cgroup  prog 25
        cgroup_id 584  attach_type egress
9: cgroup  prog 25
        cgroup_id 599  attach_type egress
10: cgroup  prog 25
        cgroup_id 614  attach_type egress
11: cgroup  prog 25
        cgroup_id 629  attach_type egress

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200429001614.1544-9-andriin@fb.com
2020-04-28 17:27:08 -07:00
Andrii Nakryiko
50325b1761 bpftool: Expose attach_type-to-string array to non-cgroup code
Move attach_type_strings into main.h for access in non-cgroup code.
bpf_attach_type is used for non-cgroup attach types quite widely now. So also
complete missing string translations for non-cgroup attach types.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200429001614.1544-8-andriin@fb.com
2020-04-28 17:27:08 -07:00
Andrii Nakryiko
2c2837b09e selftests/bpf: Test bpf_link's get_next_id, get_fd_by_id, and get_obj_info
Extend bpf_obj_id selftest to verify bpf_link's observability APIs.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200429001614.1544-7-andriin@fb.com
2020-04-28 17:27:08 -07:00
Andrii Nakryiko
0dbc866832 libbpf: Add low-level APIs for new bpf_link commands
Add low-level API calls for bpf_link_get_next_id() and
bpf_link_get_fd_by_id().

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200429001614.1544-6-andriin@fb.com
2020-04-28 17:27:08 -07:00
Andrii Nakryiko
f2e10bff16 bpf: Add support for BPF_OBJ_GET_INFO_BY_FD for bpf_link
Add ability to fetch bpf_link details through BPF_OBJ_GET_INFO_BY_FD command.
Also enhance show_fdinfo to potentially include bpf_link type-specific
information (similarly to obj_info).

Also introduce enum bpf_link_type stored in bpf_link itself and expose it in
UAPI. bpf_link_tracing also now will store and return bpf_attach_type.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200429001614.1544-5-andriin@fb.com
2020-04-28 17:27:08 -07:00
Alexei Starovoitov
9b329d0dbe selftests/bpf: fix test_sysctl_prog with alu32
Similar to commit b7a0d65d80 ("bpf, testing: Workaround a verifier failure for test_progs")
fix test_sysctl_prog.c as well.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-04-28 15:31:59 -07:00
Mauro Carvalho Chehab
cb3f0d56e1 docs: networking: convert filter.txt to ReST
- add SPDX header;
- adjust title markup;
- mark code blocks and literals as such;
- use footnote markup;
- mark tables as such;
- adjust identation, whitespaces and blank lines;
- add to networking/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-28 14:39:46 -07:00
Jakub Kicinski
0feba2219b selftests: tls: run all tests for TLS 1.2 and TLS 1.3
TLS 1.2 and TLS 1.3 differ in the implementation.
Use fixture parameters to run all tests for both
versions, and remove the one-off TLS 1.2 test.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-28 13:30:44 -07:00
Jakub Kicinski
74bc7c97fa kselftest: add fixture variants
Allow users to build parameterized variants of fixtures.

If fixtures want variants, they call FIXTURE_VARIANT() to declare
the structure to fill for each variant. Each fixture will be re-run
for each of the variants defined by calling FIXTURE_VARIANT_ADD()
with the differing parameters initializing the structure.

Since tests are being re-run, additional initialization (steps,
no_print) is also added.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-28 13:30:44 -07:00
Jakub Kicinski
e7f3046077 kselftest: run tests by fixture
Now that all tests have a fixture object move from a global
list of tests to a list of tests per fixture.

Order of tests may change as we will now group and run test
fixture by fixture, rather than in declaration order.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-28 13:30:44 -07:00
Jakub Kicinski
142aca6b38 kselftest: create fixture objects
Grouping tests by fixture will allow us to parametrize
test runs. Create full objects for fixtures.

Add a "global" fixture for tests without a fixture.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-28 13:30:43 -07:00
Jakub Kicinski
1a89595c22 kselftest: factor out list manipulation to a helper
Kees suggest to factor out the list append code to a macro,
since following commits need it, which leads to code duplication.

Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-28 13:30:43 -07:00
Roopa Prabhu
4dddb5be13 selftests: net: add new testcases for nexthop API compat mode sysctl
New tests to check route dump and notifications with
net.ipv4.nexthop_compat_mode on and off.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-28 12:50:37 -07:00
Zou Wei
a6bbdf2e75 libbpf: Remove unneeded semicolon in btf_dump_emit_type
Fixes the following coccicheck warning:

 tools/lib/bpf/btf_dump.c:661:4-5: Unneeded semicolon

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1588064829-70613-1-git-send-email-zou_wei@huawei.com
2020-04-28 21:47:47 +02:00
Veronika Kabatova
b26d1e2b60 selftests/bpf: Copy runqslower to OUTPUT directory
$(OUTPUT)/runqslower makefile target doesn't actually create runqslower
binary in the $(OUTPUT) directory. As lib.mk expects all
TEST_GEN_PROGS_EXTENDED (which runqslower is a part of) to be present in
the OUTPUT directory, this results in an error when running e.g. `make
install`:

rsync: link_stat "tools/testing/selftests/bpf/runqslower" failed: No
       such file or directory (2)

Copy the binary into the OUTPUT directory after building it to fix the
error.

Fixes: 3a0d3092a4 ("selftests/bpf: Build runqslower from selftests")
Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200428173742.2988395-1-vkabatov@redhat.com
2020-04-28 21:27:20 +02:00
Jiri Pirko
075c8aa79d selftests: forwarding: tc_actions.sh: add matchall mirror test
Add test for matchall classifier with mirred egress mirror action.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 12:43:30 -07:00
Horatiu Vultur
3e54442c93 net: bridge: Add port attribute IFLA_BRPORT_MRP_RING_OPEN
This patch adds a new port attribute, IFLA_BRPORT_MRP_RING_OPEN, which allows
to notify the userspace when the port lost the continuite of MRP frames.

This attribute is set by kernel whenever the SW or HW detects that the ring is
being open or closed.

Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27 11:40:25 -07:00
Paul E. McKenney
b3578186b2 rcutorture: Make kvm-recheck-rcu.sh handle truncated lines
System hangs or killed rcutorture guest OSes can result in truncated
"Reader Pipe:" lines, which can in turn result in false-positive
reader-batch near-miss warnings.  This commit therefore adjusts the
reader-batch checks to account for possible line truncation.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:05:13 -07:00
Paul E. McKenney
039f3cc93a rcutorture: Add TRACE02 scenario enabling RCU Tasks Trace IPIs
This commit adds a TRACE02 scenario which enables preemption and RCU
Tasks Trace IPIs, more specifically, disabling heavyweight readers.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:53 -07:00
Paul E. McKenney
c1a76c0b6a rcutorture: Add torture tests for RCU Tasks Trace
This commit adds the definitions required to torture the tracing flavor
of RCU tasks.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:51 -07:00
Paul E. McKenney
3d6e43c75d rcutorture: Add torture tests for RCU Tasks Rude
This commit adds the definitions required to torture the rude flavor of
RCU tasks.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:51 -07:00
Mark Brown
e5c9a223da Linux 5.7-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl6l9D0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGthEH/jEMvU7Hc6zIGNmG
 Akrjf7q6NX+wfmqKIsmSmvvoE1c2OioBYmCzlVz4sQFRj0Yy5WYJcI4Bzh+y8cOA
 0GQ6eQNhfVGyj7uiTClkccK8G20M59HQ1C34Oa/u3Ofoy4S89DiNa5aEY9TxWx9B
 jNV3rCfPgwKaPfwsO5oaIWZd1Ah5mwwwqxICnw7WQfdplQ76eqi/lL7jArncPjmN
 01yyAwsCZyfaeO2NqmHrCOlZkFJcP8Ftj0XeFK94XKdl6VrXuKtEX0JBa3RRWGA+
 KSWBhx4Ml6Q1hnYAIA6T78XKawhoeF+MErlmdBpez4EDd7vCOqz2HXnip6DYh2v7
 wkcvNtg=
 =2hir
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAl6m5oQTHGJyb29uaWVA
 a2VybmVsLm9yZwAKCRAk1otyXVSH0JxvB/0UX1QwRfvkWh/ZNrsZRFx2zQ7mdSX+
 oXH5gNMgyC7nUicPVu2Haks3ZJ1rPR8xjaz/iRa9qpWLDNy0Tkez34mnToYYGFvR
 mnOUMin6e5paqFqumNIwUhhtZu0PhzMGvKW7OnXfVRILAQvvuhJbPTAPsBpp53gX
 UHIAUTQRAjSLz7Q8MR2JbrsX/IQLouEgtVKIWGZie5r1ofWWPihVLopfBjDZrOkM
 3fstAoMFaEVbt72bT5uR0adaU/g/7BUUcUmpEqaHfe3H5+7EZWelzTxyTDEUYqEM
 8J4HaOA7vB6IMsjpYxGemp6W03b7l4SgibBNiquVtDDb8Trp44mEiikG
 =f5Qj
 -----END PGP SIGNATURE-----

Merge tag 'v5.7-rc3' into spi-5.8

Linux 5.7-rc3
2020-04-27 15:04:50 +01:00
Mao Wenan
e411eb257b libbpf: Return err if bpf_object__load failed
bpf_object__load() has various return code, when it failed to load
object, it must return err instead of -EINVAL.

Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200426063635.130680-3-maowenan@huawei.com
2020-04-27 14:43:20 +02:00
Lorenz Bauer
234589012b selftests/bpf: Add cls_redirect classifier
cls_redirect is a TC clsact based replacement for the glb-redirect iptables
module available at [1]. It enables what GitHub calls "second chance"
flows [2], similarly proposed by the Beamer paper [3]. In contrast to
glb-redirect, it also supports migrating UDP flows as long as connected
sockets are used. cls_redirect is in production at Cloudflare, as part of
our own L4 load balancer.

We have modified the encapsulation format slightly from glb-redirect:
glbgue_chained_routing.private_data_type has been repurposed to form a
version field and several flags. Both have been arranged in a way that
a private_data_type value of zero matches the current glb-redirect
behaviour. This means that cls_redirect will understand packets in
glb-redirect format, but not vice versa.

The test suite only covers basic features. For example, cls_redirect will
correctly forward path MTU discovery packets, but this is not exercised.
It is also possible to switch the encapsulation format to GRE on the last
hop, which is also not tested.

There are two major distinctions from glb-redirect: first, cls_redirect
relies on receiving encapsulated packets directly from a router. This is
because we don't have access to the neighbour tables from BPF, yet. See
forward_to_next_hop for details. Second, cls_redirect performs decapsulation
instead of using separate ipip and sit tunnel devices. This
avoids issues with the sit tunnel [4] and makes deploying the classifier
easier: decapsulated packets appear on the same interface, so existing
firewall rules continue to work as expected.

The code base started it's life on v4.19, so there are most likely still
hold overs from old workarounds. In no particular order:

- The function buf_off is required to defeat a clang optimization
  that leads to the verifier rejecting the program due to pointer
  arithmetic in the wrong order.

- The function pkt_parse_ipv6 is force inlined, because it would
  otherwise be rejected due to returning a pointer to stack memory.

- The functions fill_tuple and classify_tcp contain kludges, because
  we've run out of function arguments.

- The logic in general is rather nested, due to verifier restrictions.
  I think this is either because the verifier loses track of constants
  on the stack, or because it can't track enum like variables.

1: https://github.com/github/glb-director/tree/master/src/glb-redirect
2: https://github.com/github/glb-director/blob/master/docs/development/second-chance-design.md
3: https://www.usenix.org/conference/nsdi18/presentation/olteanu
4: https://github.com/github/glb-director/issues/64

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200424185556.7358-2-lmb@cloudflare.com
2020-04-26 10:00:36 -07:00
Andrii Nakryiko
6f8a57ccf8 bpf: Make verifier log more relevant by default
To make BPF verifier verbose log more releavant and easier to use to debug
verification failures, "pop" parts of log that were successfully verified.
This has effect of leaving only verifier logs that correspond to code branches
that lead to verification failure, which in practice should result in much
shorter and more relevant verifier log dumps. This behavior is made the
default behavior and can be overriden to do exhaustive logging by specifying
BPF_LOG_LEVEL2 log level.

Using BPF_LOG_LEVEL2 to disable this behavior is not ideal, because in some
cases it's good to have BPF_LOG_LEVEL2 per-instruction register dump
verbosity, but still have only relevant verifier branches logged. But for this
patch, I didn't want to add any new flags. It might be worth-while to just
rethink how BPF verifier logging is performed and requested and streamline it
a bit. But this trimming of successfully verified branches seems to be useful
and a good default behavior.

To test this, I modified runqslower slightly to introduce read of
uninitialized stack variable. Log (**truncated in the middle** to save many
lines out of this commit message) BEFORE this change:

; int handle__sched_switch(u64 *ctx)
0: (bf) r6 = r1
; struct task_struct *prev = (struct task_struct *)ctx[1];
1: (79) r1 = *(u64 *)(r6 +8)
func 'sched_switch' arg1 has btf_id 151 type STRUCT 'task_struct'
2: (b7) r2 = 0
; struct event event = {};
3: (7b) *(u64 *)(r10 -24) = r2
last_idx 3 first_idx 0
regs=4 stack=0 before 2: (b7) r2 = 0
4: (7b) *(u64 *)(r10 -32) = r2
5: (7b) *(u64 *)(r10 -40) = r2
6: (7b) *(u64 *)(r10 -48) = r2
; if (prev->state == TASK_RUNNING)

[ ... instruction dump from insn #7 through #50 are cut out ... ]

51: (b7) r2 = 16
52: (85) call bpf_get_current_comm#16
last_idx 52 first_idx 42
regs=4 stack=0 before 51: (b7) r2 = 16
; bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
53: (bf) r1 = r6
54: (18) r2 = 0xffff8881f3868800
56: (18) r3 = 0xffffffff
58: (bf) r4 = r7
59: (b7) r5 = 32
60: (85) call bpf_perf_event_output#25
last_idx 60 first_idx 53
regs=20 stack=0 before 59: (b7) r5 = 32
61: (bf) r2 = r10
; event.pid = pid;
62: (07) r2 += -16
; bpf_map_delete_elem(&start, &pid);
63: (18) r1 = 0xffff8881f3868000
65: (85) call bpf_map_delete_elem#3
; }
66: (b7) r0 = 0
67: (95) exit

from 44 to 66: safe

from 34 to 66: safe

from 11 to 28: R1_w=inv0 R2_w=inv0 R6_w=ctx(id=0,off=0,imm=0) R10=fp0 fp-8=mmmm???? fp-24_w=00000000 fp-32_w=00000000 fp-40_w=00000000 fp-48_w=00000000
; bpf_map_update_elem(&start, &pid, &ts, 0);
28: (bf) r2 = r10
;
29: (07) r2 += -16
; tsp = bpf_map_lookup_elem(&start, &pid);
30: (18) r1 = 0xffff8881f3868000
32: (85) call bpf_map_lookup_elem#1
invalid indirect read from stack off -16+0 size 4
processed 65 insns (limit 1000000) max_states_per_insn 1 total_states 5 peak_states 5 mark_read 4

Notice how there is a successful code path from instruction 0 through 67, few
successfully verified jumps (44->66, 34->66), and only after that 11->28 jump
plus error on instruction #32.

AFTER this change (full verifier log, **no truncation**):

; int handle__sched_switch(u64 *ctx)
0: (bf) r6 = r1
; struct task_struct *prev = (struct task_struct *)ctx[1];
1: (79) r1 = *(u64 *)(r6 +8)
func 'sched_switch' arg1 has btf_id 151 type STRUCT 'task_struct'
2: (b7) r2 = 0
; struct event event = {};
3: (7b) *(u64 *)(r10 -24) = r2
last_idx 3 first_idx 0
regs=4 stack=0 before 2: (b7) r2 = 0
4: (7b) *(u64 *)(r10 -32) = r2
5: (7b) *(u64 *)(r10 -40) = r2
6: (7b) *(u64 *)(r10 -48) = r2
; if (prev->state == TASK_RUNNING)
7: (79) r2 = *(u64 *)(r1 +16)
; if (prev->state == TASK_RUNNING)
8: (55) if r2 != 0x0 goto pc+19
 R1_w=ptr_task_struct(id=0,off=0,imm=0) R2_w=inv0 R6_w=ctx(id=0,off=0,imm=0) R10=fp0 fp-24_w=00000000 fp-32_w=00000000 fp-40_w=00000000 fp-48_w=00000000
; trace_enqueue(prev->tgid, prev->pid);
9: (61) r1 = *(u32 *)(r1 +1184)
10: (63) *(u32 *)(r10 -4) = r1
; if (!pid || (targ_pid && targ_pid != pid))
11: (15) if r1 == 0x0 goto pc+16

from 11 to 28: R1_w=inv0 R2_w=inv0 R6_w=ctx(id=0,off=0,imm=0) R10=fp0 fp-8=mmmm???? fp-24_w=00000000 fp-32_w=00000000 fp-40_w=00000000 fp-48_w=00000000
; bpf_map_update_elem(&start, &pid, &ts, 0);
28: (bf) r2 = r10
;
29: (07) r2 += -16
; tsp = bpf_map_lookup_elem(&start, &pid);
30: (18) r1 = 0xffff8881db3ce800
32: (85) call bpf_map_lookup_elem#1
invalid indirect read from stack off -16+0 size 4
processed 65 insns (limit 1000000) max_states_per_insn 1 total_states 5 peak_states 5 mark_read 4

Notice how in this case, there are 0-11 instructions + jump from 11 to
28 is recorded + 28-32 instructions with error on insn #32.

test_verifier test runner was updated to specify BPF_LOG_LEVEL2 for
VERBOSE_ACCEPT expected result due to potentially "incomplete" success verbose
log at BPF_LOG_LEVEL1.

On success, verbose log will only have a summary of number of processed
instructions, etc, but no branch tracing log. Having just a last succesful
branch tracing seemed weird and confusing. Having small and clean summary log
in success case seems quite logical and nice, though.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200423195850.1259827-1-andriin@fb.com
2020-04-26 09:47:37 -07:00
Maciej Żenczykowski
71d1921477 bpf: add bpf_ktime_get_boot_ns()
On a device like a cellphone which is constantly suspending
and resuming CLOCK_MONOTONIC is not particularly useful for
keeping track of or reacting to external network events.
Instead you want to use CLOCK_BOOTTIME.

Hence add bpf_ktime_get_boot_ns() as a mirror of bpf_ktime_get_ns()
based around CLOCK_BOOTTIME instead of CLOCK_MONOTONIC.

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-04-26 09:43:05 -07:00
Yoshiki Komachi
ae460c0224 bpf_helpers.h: Add note for building with vmlinux.h or linux/types.h
The following error was shown when a bpf program was compiled without
vmlinux.h auto-generated from BTF:

 # clang -I./linux/tools/lib/ -I/lib/modules/$(uname -r)/build/include/ \
   -O2 -Wall -target bpf -emit-llvm -c bpf_prog.c -o bpf_prog.bc
 ...
 In file included from linux/tools/lib/bpf/bpf_helpers.h:5:
 linux/tools/lib/bpf/bpf_helper_defs.h:56:82: error: unknown type name '__u64'
 ...

It seems that bpf programs are intended for being built together with
the vmlinux.h (which will have all the __u64 and other typedefs). But
users may mistakenly think "include <linux/types.h>" is missing
because the vmlinux.h is not common for non-bpf developers. IMO, an
explicit comment therefore should be added to bpf_helpers.h as this
patch shows.

Signed-off-by: Yoshiki Komachi <komachi.yoshiki@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/1587427527-29399-1-git-send-email-komachi.yoshiki@gmail.com
2020-04-26 08:40:01 -07:00
Stanislav Fomichev
0456ea170c bpf: Enable more helpers for BPF_PROG_TYPE_CGROUP_{DEVICE,SYSCTL,SOCKOPT}
Currently the following prog types don't fall back to bpf_base_func_proto()
(instead they have cgroup_base_func_proto which has a limited set of
helpers from bpf_base_func_proto):
* BPF_PROG_TYPE_CGROUP_DEVICE
* BPF_PROG_TYPE_CGROUP_SYSCTL
* BPF_PROG_TYPE_CGROUP_SOCKOPT

I don't see any specific reason why we shouldn't use bpf_base_func_proto(),
every other type of program (except bpf-lirc and, understandably, tracing)
use it, so let's fall back to bpf_base_func_proto for those prog types
as well.

This basically boils down to adding access to the following helpers:
* BPF_FUNC_get_prandom_u32
* BPF_FUNC_get_smp_processor_id
* BPF_FUNC_get_numa_node_id
* BPF_FUNC_tail_call
* BPF_FUNC_ktime_get_ns
* BPF_FUNC_spin_lock (CAP_SYS_ADMIN)
* BPF_FUNC_spin_unlock (CAP_SYS_ADMIN)
* BPF_FUNC_jiffies64 (CAP_SYS_ADMIN)

I've also added bpf_perf_event_output() because it's really handy for
logging and debugging.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200420174610.77494-1-sdf@google.com
2020-04-26 08:40:01 -07:00
Jagadeesh Pagadala
93e5168947 tools/bpf/bpftool: Remove duplicate headers
Code cleanup: Remove duplicate headers which are included twice.

Signed-off-by: Jagadeesh Pagadala <jagdsh.linux@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/1587274757-14101-1-git-send-email-jagdsh.linux@gmail.com
2020-04-26 08:40:01 -07:00
Josh Poimboeuf
53fb6e990d objtool: Fix infinite loop in for_offset_range()
Randy reported that objtool got stuck in an infinite loop when
processing drivers/i2c/busses/i2c-parport.o.  It was caused by the
following code:

  00000000000001fd <line_set>:
   1fd:	48 b8 00 00 00 00 00	movabs $0x0,%rax
   204:	00 00 00
			1ff: R_X86_64_64	.rodata-0x8
   207:	41 55                	push   %r13
   209:	41 89 f5             	mov    %esi,%r13d
   20c:	41 54                	push   %r12
   20e:	49 89 fc             	mov    %rdi,%r12
   211:	55                   	push   %rbp
   212:	48 89 d5             	mov    %rdx,%rbp
   215:	53                   	push   %rbx
   216:	0f b6 5a 01          	movzbl 0x1(%rdx),%ebx
   21a:	48 8d 34 dd 00 00 00 	lea    0x0(,%rbx,8),%rsi
   221:	00
			21e: R_X86_64_32S	.rodata
   222:	48 89 f1             	mov    %rsi,%rcx
   225:	48 29 c1             	sub    %rax,%rcx

find_jump_table() saw the .rodata reference and tried to find a jump
table associated with it (though there wasn't one).  The -0x8 rela
addend is unusual.  It caused find_jump_table() to send a negative
table_offset (unsigned 0xfffffffffffffff8) to find_rela_by_dest().

The negative offset should have been harmless, but it actually threw
for_offset_range() for a loop... literally.  When the mask value got
incremented past the end value, it also wrapped to zero, causing the
loop exit condition to remain true forever.

Prevent this scenario from happening by ensuring the incremented value
is always >= the starting value.

Fixes: 74b873e49d ("objtool: Optimize find_rela_by_dest_range()")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Julien Thierry <jthierry@redhat.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/02b719674b031800b61e33c30b2e823183627c19.1587842122.git.jpoimboe@redhat.com
2020-04-26 09:28:14 +02:00
David S. Miller
d483389678 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Simple overlapping changes to linux/vermagic.h

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-25 20:18:53 -07:00
Linus Torvalds
9b3e59e3de Two fixes: fix an off-by-one bug, and fix 32-bit builds on 64-bit systems.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl6j+9ERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1j3Sw/+K7EWwSqtNfF03ebs7YFqb2L3rifS8nnL
 wB1G/kK2rErvFMt3+IDI2zPn8E+LXEysI3O6K4LbFpBYP5g5dl0fz5+CZ2AzEfyn
 i1iJ9vh/MU76T3jv8oMdnU56WoMTBjY/gBBvcJhbCLXzn/m7jxXgjzC4tZiVjDNP
 sXuKGR3iZiwjlPA3CMFdQFymc1MelfAWQIVNxPIKWgDM3KnxcxY/awox6t+xs/cv
 pnH7pb9xieBmgzaizT1IlgGy8tV+l5Lisycv7Xd+MiPGi1QFrCqjtEXS+joPe921
 wVbksIzN7qYqHHYzMCeWQlQeiRrydhJDfFbc67DNCGx9Qg1bttuHJ2EW3WhHba/v
 +XXJgxvZpopfjOYoaJsOO5w4q6eGWcjYuHA7oQ3/8FAt5aoUItiLLm0492zNKIQ4
 PmTks++4bu7GGwgWwVKcZz7+uDX+NLHUSMZfb/w+3Fa1O6Xeg0IRNzgGfvVW2VtL
 W/iUsCw80IVOiKrC0ZIv7I9nwc1xpJq1PnwpaVTz85gs4K/ghoZtYyyIlMiH3mQu
 FBbnCPWk8/Rw/RhxqTc7zLwTyKzH9Bw8+Uj9OU/FoFERBZ1BTS/39vog1Wn9xthQ
 qqsWGAxmtjuKyWogt4SxVNXdH8s9hq42tR1XvIeDf3HaAIyQWSmMCLRQXpRPn/XZ
 1yMzNFFiyD8=
 =mIzT
 -----END PGP SIGNATURE-----

Merge tag 'objtool-urgent-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fixes from Ingo Molnar:
 "Two fixes: fix an off-by-one bug, and fix 32-bit builds on 64-bit
  systems"

* tag 'objtool-urgent-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix off-by-one in symbol_by_offset()
  objtool: Fix 32bit cross builds
2020-04-25 11:52:02 -07:00
Josh Poimboeuf
d8dd25a461 objtool: Fix stack offset tracking for indirect CFAs
When the current frame address (CFA) is stored on the stack (i.e.,
cfa->base == CFI_SP_INDIRECT), objtool neglects to adjust the stack
offset when there are subsequent pushes or pops.  This results in bad
ORC data at the end of the ENTER_IRQ_STACK macro, when it puts the
previous stack pointer on the stack and does a subsequent push.

This fixes the following unwinder warning:

  WARNING: can't dereference registers at 00000000f0a6bdba for ip interrupt_entry+0x9f/0xa0

Fixes: 627fce1480 ("objtool: Add ORC unwind table generation")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Reported-by: Dave Jones <dsj@fb.com>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Joe Mario <jmario@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/853d5d691b29e250333332f09b8e27410b2d9924.1587808742.git.jpoimboe@redhat.com
2020-04-25 12:22:27 +02:00
Linus Torvalds
ab51cac00e Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix memory leak in netfilter flowtable, from Roi Dayan.

 2) Ref-count leaks in netrom and tipc, from Xiyu Yang.

 3) Fix warning when mptcp socket is never accepted before close, from
    Florian Westphal.

 4) Missed locking in ovs_ct_exit(), from Tonghao Zhang.

 5) Fix large delays during PTP synchornization in cxgb4, from Rahul
    Lakkireddy.

 6) team_mode_get() can hang, from Taehee Yoo.

 7) Need to use kvzalloc() when allocating fw tracer in mlx5 driver,
    from Niklas Schnelle.

 8) Fix handling of bpf XADD on BTF memory, from Jann Horn.

 9) Fix BPF_STX/BPF_B encoding in x86 bpf jit, from Luke Nelson.

10) Missing queue memory release in iwlwifi pcie code, from Johannes
    Berg.

11) Fix NULL deref in macvlan device event, from Taehee Yoo.

12) Initialize lan87xx phy correctly, from Yuiko Oshino.

13) Fix looping between VRF and XFRM lookups, from David Ahern.

14) etf packet scheduler assumes all sockets are full sockets, which is
    not necessarily true. From Eric Dumazet.

15) Fix mptcp data_fin handling in RX path, from Paolo Abeni.

16) fib_select_default() needs to handle nexthop objects, from David
    Ahern.

17) Use GFP_ATOMIC under spinlock in mac80211_hwsim, from Wei Yongjun.

18) vxlan and geneve use wrong nlattr array, from Sabrina Dubroca.

19) Correct rx/tx stats in bcmgenet driver, from Doug Berger.

20) BPF_LDX zero-extension is encoded improperly in x86_32 bpf jit, fix
    from Luke Nelson.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (100 commits)
  selftests/bpf: Fix a couple of broken test_btf cases
  tools/runqslower: Ensure own vmlinux.h is picked up first
  bpf: Make bpf_link_fops static
  bpftool: Respect the -d option in struct_ops cmd
  selftests/bpf: Add test for freplace program with expected_attach_type
  bpf: Propagate expected_attach_type when verifying freplace programs
  bpf: Fix leak in LINK_UPDATE and enforce empty old_prog_fd
  bpf, x86_32: Fix logic error in BPF_LDX zero-extension
  bpf, x86_32: Fix clobbering of dst for BPF_JSET
  bpf, x86_32: Fix incorrect encoding in BPF_LDX zero-extension
  bpf: Fix reStructuredText markup
  net: systemport: suppress warnings on failed Rx SKB allocations
  net: bcmgenet: suppress warnings on failed Rx SKB allocations
  macsec: avoid to set wrong mtu
  mac80211: sta_info: Add lockdep condition for RCU list usage
  mac80211: populate debugfs only after cfg80211 init
  net: bcmgenet: correct per TX/RX ring statistics
  net: meth: remove spurious copyright text
  net: phy: bcm84881: clear settings on link down
  chcr: Fix CPU hard lockup
  ...
2020-04-24 19:17:30 -07:00
David S. Miller
167ff131cb Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2020-04-24

The following pull-request contains BPF updates for your *net* tree.

We've added 17 non-merge commits during the last 5 day(s) which contain
a total of 19 files changed, 203 insertions(+), 85 deletions(-).

The main changes are:

1) link_update fix, from Andrii.

2) libbpf get_xdp_id fix, from David.

3) xadd verifier fix, from Jann.

4) x86-32 JIT fixes, from Luke and Wang.

5) test_btf fix, from Stanislav.

6) freplace verifier fix, from Toke.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-24 18:26:14 -07:00
Stanislav Fomichev
e1cebd841b selftests/bpf: Fix a couple of broken test_btf cases
Commit 51c39bb1d5 ("bpf: Introduce function-by-function verification")
introduced function linkage flag and changed the error message from
"vlen != 0" to "Invalid func linkage" and broke some fake BPF programs.

Adjust the test accordingly.

AFACT, the programs don't really need any arguments and only look
at BTF for maps, so let's drop the args altogether.

Before:
BTF raw test[103] (func (Non zero vlen)): do_test_raw:3703:FAIL expected
err_str:vlen != 0
magic: 0xeb9f
version: 1
flags: 0x0
hdr_len: 24
type_off: 0
type_len: 72
str_off: 72
str_len: 10
btf_total_size: 106
[1] INT (anon) size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[2] INT (anon) size=4 bits_offset=0 nr_bits=32 encoding=(none)
[3] FUNC_PROTO (anon) return=0 args=(1 a, 2 b)
[4] FUNC func type_id=3 Invalid func linkage

BTF libbpf test[1] (test_btf_haskv.o): libbpf: load bpf program failed:
Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
Validating test_long_fname_2() func#1...
Arg#0 type PTR in test_long_fname_2() is not supported yet.
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0
peak_states 0 mark_read 0

libbpf: -- END LOG --
libbpf: failed to load program 'dummy_tracepoint'
libbpf: failed to load object 'test_btf_haskv.o'
do_test_file:4201:FAIL bpf_object__load: -4007
BTF libbpf test[2] (test_btf_newkv.o): libbpf: load bpf program failed:
Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
Validating test_long_fname_2() func#1...
Arg#0 type PTR in test_long_fname_2() is not supported yet.
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0
peak_states 0 mark_read 0

libbpf: -- END LOG --
libbpf: failed to load program 'dummy_tracepoint'
libbpf: failed to load object 'test_btf_newkv.o'
do_test_file:4201:FAIL bpf_object__load: -4007
BTF libbpf test[3] (test_btf_nokv.o): libbpf: load bpf program failed:
Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
Validating test_long_fname_2() func#1...
Arg#0 type PTR in test_long_fname_2() is not supported yet.
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0
peak_states 0 mark_read 0

libbpf: -- END LOG --
libbpf: failed to load program 'dummy_tracepoint'
libbpf: failed to load object 'test_btf_nokv.o'
do_test_file:4201:FAIL bpf_object__load: -4007

Fixes: 51c39bb1d5 ("bpf: Introduce function-by-function verification")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200422003753.124921-1-sdf@google.com
2020-04-24 17:47:40 -07:00
Andrii Nakryiko
dfc55ace99 tools/runqslower: Ensure own vmlinux.h is picked up first
Reorder include paths to ensure that runqslower sources are picking up
vmlinux.h, generated by runqslower's own Makefile. When runqslower is built
from selftests/bpf, due to current -I$(BPF_INCLUDE) -I$(OUTPUT) ordering, it
might pick up not-yet-complete vmlinux.h, generated by selftests Makefile,
which could lead to compilation errors like [0]. So ensure that -I$(OUTPUT)
goes first and rely on runqslower's Makefile own dependency chain to ensure
vmlinux.h is properly completed before source code relying on it is compiled.

  [0] https://travis-ci.org/github/libbpf/libbpf/jobs/677905925

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200422012407.176303-1-andriin@fb.com
2020-04-24 17:45:20 -07:00
Martin KaFai Lau
32e4c6f4bc bpftool: Respect the -d option in struct_ops cmd
In the prog cmd, the "-d" option turns on the verifier log.
This is missed in the "struct_ops" cmd and this patch fixes it.

Fixes: 65c9362859 ("bpftool: Add struct_ops support")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200424182911.1259355-1-kafai@fb.com
2020-04-24 17:40:54 -07:00
Toke Høiland-Jørgensen
1d8a0af5ee selftests/bpf: Add test for freplace program with expected_attach_type
This adds a new selftest that tests the ability to attach an freplace
program to a program type that relies on the expected_attach_type of the
target program to pass verification.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/158773526831.293902.16011743438619684815.stgit@toke.dk
2020-04-24 17:34:30 -07:00
Jakub Wilk
a33d314794 bpf: Fix reStructuredText markup
The patch fixes:
$ scripts/bpf_helpers_doc.py > bpf-helpers.rst
$ rst2man bpf-helpers.rst > bpf-helpers.7
bpf-helpers.rst:1105: (WARNING/2) Inline strong start-string without end-string.

Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200422082324.2030-1-jwilk@jwilk.net
2020-04-24 17:01:26 -07:00
Linus Torvalds
8e9ccd0f26 Power management updates for 5.7-rc3
Restore an optimization related to asynchronous suspend and resume of
 devices during system-wide power transitions that was disabled by
 mistake (Kai-Heng Feng) and update the pm-graph suite of power
 management utilities (Todd Brandt).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl6jMsgSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxUQUP/j85sXCtGmiwd0WLvHEP5dh6bztiji5m
 wIaUQVqGE1BmGupQle8Zhv6KRq1LYAZ9F6pCZigIzXSX70t5twla9Dac/bsueRz6
 2F29+13YIQFJtw8MI7t8Rhh+ztdjh7UwWjF2T3KJ37/LtNNMkm0x1siBSqQpwCsL
 rEAVmqPEaYuHdVemXpNIvo7Aj62ZTAqdvm/vr2w4yw1OGqFryNkqbRMQUc+5wxhV
 A5WOdUcTReOxo/g2mNwFdsSQKPvuogHRs7g+kCHiUM4g+/pO0bZTouzvAuPaz6b8
 uBfYpPFMaGqS4aqi2Ahx6z0qKHbUlxSLVadV3Y7HXBPHdGfyMh2KCniHyVx0tsl2
 C7sLH3Wq6pG1BjFA2lcWQzyjT9rhp8b4rquxcQ3s2KD+U4XnaWCJ0eBargQ/ORas
 E3ZPvhM/yg5t84EZShW1XFDg3zME7baiFTGWbgtDr5+CP2Q1eDHMzmkAh1EIIuar
 nl7a6BiWS+OnObcGJUype+SDEdtSqYDj4gN4Qg/q8D4XCX/gT9CaxDo0y1IqsCYh
 adp4Xr3YrSBjlzcpj80zMGN8dqpimo7BDLvrrrv0I0uONbSxvio3CKv4/g+anPVz
 l57+cbP/K7mImBV00uSlzK+jSu8KrxtQFgLkE3QtlDQo8Yd3JB9BqQ1amdHLDY50
 gtK2nQSvUISv
 =pHmC
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "Restore an optimization related to asynchronous suspend and resume of
  devices during system-wide power transitions that was disabled by
  mistake (Kai-Heng Feng) and update the pm-graph suite of power
  management utilities (Todd Brandt)"

* tag 'pm-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: sleep: core: Switch back to async_schedule_dev()
  pm-graph v5.6
2020-04-24 13:43:37 -07:00
Xiao Yang
f0c0d0cf59 selftests/ftrace: Check the first record for kprobe_args_type.tc
It is possible to get multiple records from trace during test and then more
than 4 arguments are assigned to ARGS.  This situation results in the failure
of kprobe_args_type.tc.  For example:
-----------------------------------------------------------
grep testprobe trace
   ftracetest-5902  [001] d... 111195.682227: testprobe: (_do_fork+0x0/0x460) arg1=334823024 arg2=334823024 arg3=0x13f4fe70 arg4=7
     pmlogger-5949  [000] d... 111195.709898: testprobe: (_do_fork+0x0/0x460) arg1=345308784 arg2=345308784 arg3=0x1494fe70 arg4=7
 grep testprobe trace
 sed -e 's/.* arg1=\(.*\) arg2=\(.*\) arg3=\(.*\) arg4=\(.*\)/\1 \2 \3 \4/'
ARGS='334823024 334823024 0x13f4fe70 7
345308784 345308784 0x1494fe70 7'
-----------------------------------------------------------

We don't care which process calls do_fork so just check the first record to
fix the issue.

Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-04-24 09:39:26 -06:00
Shuah Khan
93a4388b76 selftests: add build/cross-build dependency check script
Add build/cross-build dependency check script kselftest_deps.sh
This script does the following:

Usage: ./kselftest_deps.sh -[p] <compiler> [test_name]

	kselftest_deps.sh [-p] gcc
	kselftest_deps.sh [-p] gcc vm
	kselftest_deps.sh [-p] aarch64-linux-gnu-gcc
	kselftest_deps.sh [-p] aarch64-linux-gnu-gcc vm

- Should be run in selftests directory in the kernel repo.
- Checks if Kselftests can be built/cross-built on a system.
- Parses all test/sub-test Makefile to find library dependencies.
- Runs compile test on a trivial C file with LDLIBS specified
  in the test Makefiles to identify missing library dependencies.
- Prints suggested target list for a system filtering out tests
  failed the build dependency check from the TARGETS in Selftests
  the main Makefile when optional -p is specified.
- Prints pass/fail dependency check for each tests/sub-test.
- Prints pass/fail targets and libraries.
- Default: runs dependency checks on all tests.
- Optional test name can be specified to check dependencies for it.

To make LDLIBS parsing easier
- change gpio and memfd Makefiles to use the same temporary variable used
  to find and add libraries to LDLIBS.
- simlify LDLIBS append logic in intel_pstate/Makefile.

Results from run on x86_64 system (trimmed detailed pass/fail list):
========================================================
Kselftest Dependency Check for [./kselftest_deps.sh gcc ] results...
========================================================
Checked tests defining LDLIBS dependencies
--------------------------------------------------------
Total tests with Dependencies:
55 Pass: 53 Fail: 2
--------------------------------------------------------
Targets passed build dependency check on system:
bpf capabilities filesystems futex gpio intel_pstate membarrier memfd
mqueue net powerpc ptp rseq rtc safesetid timens timers vDSO vm
--------------------------------------------------------
FAIL: netfilter/Makefile dependency check: -lmnl
FAIL: gpio/Makefile dependency check: -lmount
--------------------------------------------------------
Targets failed build dependency check on system:
gpio netfilter
--------------------------------------------------------
Missing libraries system
-lmnl -lmount
--------------------------------------------------------
========================================================

Results from run on x86_64 system with aarch64-linux-gnu-gcc:
(trimmed detailed pass/fail list):
========================================================
Kselftest Dependency Check for [./kselftest_deps.sh aarch64-linux-gnu-gcc ]
results...
========================================================
Checked tests defining LDLIBS dependencies
--------------------------------------------------------
Total tests with Dependencies:
55 Pass: 41 Fail: 14
--------------------------------------------------------
Targets failed build dependency check on system:
bpf capabilities filesystems futex gpio intel_pstate membarrier memfd
mqueue net powerpc ptp rseq rtc timens timers vDSO vm
--------------------------------------------------------
--------------------------------------------------------
Targets failed build dependency check on system:
bpf capabilities gpio memfd mqueue net netfilter safesetid vm
--------------------------------------------------------
Missing libraries system
-lcap -lcap-ng -lelf -lfuse -lmnl -lmount -lnuma -lpopt -lz
--------------------------------------------------------
========================================================

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-04-23 17:23:20 -06:00
Xiao Yang
16bcd0f509 selftests/ftrace: Check required filter files before running test
Without CONFIG_DYNAMIC_FTRACE, some tests get failure because required
filter files(set_ftrace_filter/available_filter_functions/stack_trace_filter)
are missing.  So implement check_filter_file() and make all related tests
check required filter files by it.

BTW: set_ftrace_filter and available_filter_functions are introduced together
so just check either of them.

Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-04-23 17:11:37 -06:00
Stephane Eranian
d99c22eabe perf record: Add num-synthesize-threads option
To control degree of parallelism of the synthesize_mmap() code which
is scanning /proc/PID/task/PID/maps and can be time consuming.
Mimic perf top way of handling the option.
If not specified will default to 1 thread, i.e. default behavior before
this option.

On a desktop computer the processing of /proc/PID/task/PID/maps isn't
slow enough to warrant parallel processing and the thread creation has
some cost - hence the default of 1. On a loaded server with
>100 cores it is possible to see synthesis times in the order of
seconds and in this case having the option is desirable.

As the processing is a synchronization point, it is legitimate to worry if
Amdahl's law will apply to this patch. Profiling with this patch in
place:
https://lore.kernel.org/lkml/20200415054050.31645-4-irogers@google.com/
shows:
...
      - 32.59% __perf_event__synthesize_threads
         - 32.54% __event__synthesize_thread
            + 22.13% perf_event__synthesize_mmap_events
            + 6.68% perf_event__get_comm_ids.constprop.0
            + 1.49% process_synthesized_event
            + 1.29% __GI___readdir64
            + 0.60% __opendir
...
That is the processing is 1.49% of execution time and there is plenty to
make parallel. This is shown in the benchmark in this patch:

https://lore.kernel.org/lkml/20200415054050.31645-2-irogers@google.com/

  Computing performance of multi threaded perf event synthesis by
  synthesizing events on CPU 0:
   Number of synthesis threads: 1
     Average synthesis took: 127729.000 usec (+- 3372.880 usec)
     Average num. events: 21548.600 (+- 0.306)
     Average time per event 5.927 usec
   Number of synthesis threads: 2
     Average synthesis took: 88863.500 usec (+- 385.168 usec)
     Average num. events: 21552.800 (+- 0.327)
     Average time per event 4.123 usec
   Number of synthesis threads: 3
     Average synthesis took: 83257.400 usec (+- 348.617 usec)
     Average num. events: 21553.200 (+- 0.327)
     Average time per event 3.863 usec
   Number of synthesis threads: 4
     Average synthesis took: 75093.000 usec (+- 422.978 usec)
     Average num. events: 21554.200 (+- 0.200)
     Average time per event 3.484 usec
   Number of synthesis threads: 5
     Average synthesis took: 64896.600 usec (+- 353.348 usec)
     Average num. events: 21558.000 (+- 0.000)
     Average time per event 3.010 usec
   Number of synthesis threads: 6
     Average synthesis took: 59210.200 usec (+- 342.890 usec)
     Average num. events: 21560.000 (+- 0.000)
     Average time per event 2.746 usec
   Number of synthesis threads: 7
     Average synthesis took: 54093.900 usec (+- 306.247 usec)
     Average num. events: 21562.000 (+- 0.000)
     Average time per event 2.509 usec
   Number of synthesis threads: 8
     Average synthesis took: 48938.700 usec (+- 341.732 usec)
     Average num. events: 21564.000 (+- 0.000)
     Average time per event 2.269 usec

Where average time per synthesized event goes from 5.927 usec with 1
thread to 2.269 usec with 8. This isn't a linear speed up as not all of
synthesize code has been made parallel. If the synthesis time was about
10 seconds then using 8 threads may bring this down to less than 4.

Signed-off-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Jones <tonyj@suse.de>
Cc: yuzhoujian <yuzhoujian@didichuxing.com>
Link: http://lore.kernel.org/lkml/20200422155038.9380-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-23 11:10:41 -03:00
Tommi Rantala
dbd660e6b2 perf test session topology: Fix data path
Commit 2d4f27999b ("perf data: Add global path holder") missed path
conversion in tests/topology.c, causing the "Session topology" testcase
to "hang" (waits forever for input from stdin) when doing "ssh $VM perf
test".

Can be reproduced by running "cat | perf test topo", and crashed by
replacing cat with true:

  $ true | perf test -v topo
  40: Session topology                                      :
  --- start ---
  test child forked, pid 3638
  templ file: /tmp/perf-test-QPvAch
  incompatible file format
  incompatible file format (rerun with -v to learn more)
  free(): invalid pointer
  test child interrupted
  ---- end ----
  Session topology: FAILED!

Committer testing:

Reproduced the above result before the patch and after it is back
working:

  # true | perf test -v topo
  41: Session topology                                      :
  --- start ---
  test child forked, pid 19374
  templ file: /tmp/perf-test-YOTEQg
  CPU 0, core 0, socket 0
  CPU 1, core 1, socket 0
  CPU 2, core 2, socket 0
  CPU 3, core 3, socket 0
  CPU 4, core 0, socket 0
  CPU 5, core 1, socket 0
  CPU 6, core 2, socket 0
  CPU 7, core 3, socket 0
  test child finished with 0
  ---- end ----
  Session topology: Ok
  #

Fixes: 2d4f27999b ("perf data: Add global path holder")
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20200423115341.562782-1-tommi.t.rantala@nokia.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-23 11:08:24 -03:00
Jin Yao
197ba86fdc perf stat: Improve runtime stat for interval mode
For interval mode, the metric is printed after the '#' character if it
exists. But it's not calculated by the counts generated in this
interval.

See the following examples:

  root@kbl-ppc:~# perf stat -M CPI -I1000 --interval-count 2
  #           time             counts unit events
       1.000422803            764,809      inst_retired.any          #      2.9 CPI
       1.000422803          2,234,932      cycles
       2.001464585          1,960,061      inst_retired.any          #      1.6 CPI
       2.001464585          4,022,591      cycles

The second CPI should not be 1.6 (4,022,591/1,960,061 is 2.1)

  root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
  #           time             counts unit events
       1.000429493          2,869,311      cycles
       1.000429493            816,875      instructions              #    0.28  insn per cycle
       2.001516426          9,260,973      cycles
       2.001516426          5,250,634      instructions              #    0.87  insn per cycle

The second 'insn per cycle' should not be 0.87 (5,250,634/9,260,973 is
0.57).

The current code uses a global variable 'rt_stat' for tracking and
updating the std dev of runtime stat. Unlike the counts, 'rt_stat' is not
reset for interval. While the counts are reset for interval.

  perf_stat_process_counter()
  {
          if (config->interval)
                  init_stats(ps->res_stats);
  }

So for interval mode, the 'rt_stat' variable should be reset too.

This patch resets 'rt_stat' before read_counters(), so the runtime stat
is only calculated by the counts generated in this interval.

With this patch:

  root@kbl-ppc:~# perf stat -M CPI -I1000 --interval-count 2
  #           time             counts unit events
       1.000420924          2,408,818      inst_retired.any          #      2.1 CPI
       1.000420924          5,010,111      cycles
       2.001448579          2,798,407      inst_retired.any          #      1.6 CPI
       2.001448579          4,599,861      cycles

  root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
  #           time             counts unit events
       1.000428555          2,769,714      cycles
       1.000428555            774,462      instructions              #    0.28  insn per cycle
       2.001471562          3,595,904      cycles
       2.001471562          1,243,703      instructions              #    0.35  insn per cycle

Now the second 'insn per cycle' and CPI are calculated by the counts
generated in this interval.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-By: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200420145417.6864-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-23 11:03:46 -03:00
Ingo Molnar
0c98be8118 objtool: Constify arch_decode_instruction()
Mostly straightforward constification, except that WARN_FUNC()
needs a writable pointer while we have read-only pointers,
so deflect this to WARN().

Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200422103205.61900-4-mingo@kernel.org
2020-04-23 08:34:18 +02:00
Ingo Molnar
bc359ff2f6 objtool: Rename elf_read() to elf_open_read()
'struct elf *' handling is an open/close paradigm, make sure the naming
matches that:

   elf_open_read()
   elf_write()
   elf_close()

Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200422103205.61900-3-mingo@kernel.org
2020-04-23 08:34:18 +02:00
Ingo Molnar
894e48cada objtool: Constify 'struct elf *' parameters
In preparation to parallelize certain parts of objtool, map out which uses
of various data structures are read-only vs. read-write.

As a first step constify 'struct elf' pointer passing, most of the secondary
uses of it in find_symbol_*() methods are read-only.

Also, while at it, better group the 'struct elf' handling methods in elf.h.

Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200422103205.61900-2-mingo@kernel.org
2020-04-23 08:34:18 +02:00
David Ahern
257d7d4f0e libbpf: Only check mode flags in get_xdp_id
The commit in the Fixes tag changed get_xdp_id to only return prog_id
if flags is 0, but there are other XDP flags than the modes - e.g.,
XDP_FLAGS_UPDATE_IF_NOEXIST. Since the intention was only to look at
MODE flags, clear other ones before checking if flags is 0.

Fixes: f07cbad297 ("libbpf: Fix bpf_get_link_xdp_id flags handling")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrey Ignatov <rdna@fb.com>
2020-04-22 22:07:22 -07:00
David Ahern
493f3cc7ee selftests: A few improvements to fib_nexthops.sh
Add nodad when adding IPv6 addresses and remove the sleep.

A recent change to iproute2 moved the 'pref medium' to the prefix
(where it belongs). Change the expected route check to strip
'pref medium' to be compatible with old and new iproute2.

Add IPv4 runtime test with an IPv6 address as the gateway in
the default route.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:59:57 -07:00
David Ahern
7c74b0bec9 ipv4: Update fib_select_default to handle nexthop objects
A user reported [0] hitting the WARN_ON in fib_info_nh:

    [ 8633.839816] ------------[ cut here ]------------
    [ 8633.839819] WARNING: CPU: 0 PID: 1719 at include/net/nexthop.h:251 fib_select_path+0x303/0x381
    ...
    [ 8633.839846] RIP: 0010:fib_select_path+0x303/0x381
    ...
    [ 8633.839848] RSP: 0018:ffffb04d407f7d00 EFLAGS: 00010286
    [ 8633.839850] RAX: 0000000000000000 RBX: ffff9460b9897ee8 RCX: 00000000000000fe
    [ 8633.839851] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000000
    [ 8633.839852] RBP: ffff946076049850 R08: 0000000059263a83 R09: ffff9460840e4000
    [ 8633.839853] R10: 0000000000000014 R11: 0000000000000000 R12: ffffb04d407f7dc0
    [ 8633.839854] R13: ffffffffa4ce3240 R14: 0000000000000000 R15: ffff9460b7681f60
    [ 8633.839857] FS:  00007fcac2e02700(0000) GS:ffff9460bdc00000(0000) knlGS:0000000000000000
    [ 8633.839858] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 8633.839859] CR2: 00007f27beb77e28 CR3: 0000000077734000 CR4: 00000000000006f0
    [ 8633.839867] Call Trace:
    [ 8633.839871]  ip_route_output_key_hash_rcu+0x421/0x890
    [ 8633.839873]  ip_route_output_key_hash+0x5e/0x80
    [ 8633.839876]  ip_route_output_flow+0x1a/0x50
    [ 8633.839878]  __ip4_datagram_connect+0x154/0x310
    [ 8633.839880]  ip4_datagram_connect+0x28/0x40
    [ 8633.839882]  __sys_connect+0xd6/0x100
    ...

The WARN_ON is triggered in fib_select_default which is invoked when
there are multiple default routes. Update the function to use
fib_info_nhc and convert the nexthop checks to use fib_nh_common.

Add test case that covers the affected code path.

[0] https://github.com/FRRouting/frr/issues/6089

Fixes: 493ced1ac4 ("ipv4: Allow routes to use nexthop objects")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:57:39 -07:00
Petr Machata
f132ccc56e selftests: tc-testing: Add a TDC test for pedit munge ip6 dsfield
Add a self-test for the IPv6 dsfield munge that iproute2 will support.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:48:57 -07:00
Petr Machata
93e106da6a selftests: forwarding: pedit_dsfield: Add pedit munge ip6 dsfield
Extend the pedit_dsfield forwarding selftest with coverage of "pedit ex
munge ip6 dsfield set".

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:48:57 -07:00
David Ahern
3f251d7411 selftests: Add tests for vrf and xfrms
Add tests for vrf and xfrms with a second round after adding a
qdisc. There are a few known problems documented with the test
cases that fail. The fix is non-trivial; will come back to it
when time allows.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 19:31:40 -07:00
Julien Thierry
7f9b34f36c objtool: Fix off-by-one in symbol_by_offset()
Sometimes, WARN_FUNC() and other users of symbol_by_offset() will
associate the first instruction of a symbol with the symbol preceding
it.  This is because symbol->offset + symbol->len is already outside of
the symbol's range.

Fixes: 2a362ecc3e ("objtool: Optimize find_symbol_*() and read_symbols()")
Signed-off-by: Julien Thierry <jthierry@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
2020-04-22 23:14:46 +02:00
Peter Zijlstra
df2b384366 objtool: Fix 32bit cross builds
Apparently there's people doing 64bit builds on 32bit machines.

Fixes: 74b873e49d ("objtool: Optimize find_rela_by_dest_range()")
Reported-by: youling257@gmail.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2020-04-22 23:09:50 +02:00
David Ahern
2c1dd4c110 selftests: Fix suppress test in fib_tests.sh
fib_tests is spewing errors:
    ...
    Cannot open network namespace "ns1": No such file or directory
    Cannot open network namespace "ns1": No such file or directory
    Cannot open network namespace "ns1": No such file or directory
    Cannot open network namespace "ns1": No such file or directory
    ping: connect: Network is unreachable
    Cannot open network namespace "ns1": No such file or directory
    Cannot open network namespace "ns1": No such file or directory
    ...

Each test entry in fib_tests is supposed to do its own setup and
cleanup. Right now the $IP commands in fib_suppress_test are
failing because there is no ns1. Add the setup/cleanup and logging
expected for each test.

Fixes: ca7a03c417 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule")
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 13:02:52 -07:00
Jin Yao
0e0bf1ea11 perf stat: Zero all the 'ena' and 'run' array slot stats for interval mode
As the code comments in perf_stat_process_counter() say, we calculate
counter's data every interval, and the display code shows ps->res_stats
avg value. We need to zero the stats for interval mode.

But the current code only zeros the res_stats[0], it doesn't zero the
res_stats[1] and res_stats[2], which are for ena and run of counter.

This patch zeros the whole res_stats[] for interval mode.

Fixes: 51fd2df1e8 ("perf stat: Fix interval output values")
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200409070755.17261-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-22 15:51:01 -03:00
Linus Torvalds
c578ddb39e linux-kselftest-5.7-rc3
This kselftest update for Linux 5.7-rc3 consists of fixes to runner
 scripts and individual test run-time bugs. Includes fixes to tpm2
 and memfd test run-time regressions.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl6gcfsACgkQCwJExA0N
 QxxmFQ/+NCoDIatTBqn6sFWC17RuwYJQDkcuOF1NM45IXUGt0URUIxRbmM0DlDJ6
 Y3HUCFjXdTTzS4cC/8lIFIH7Gf4NTFswviODC9qfeE9zi4BXwG9CyJZpOsPhlo5T
 aZsjTVgJfeN4EgjXKn/9MWCPE1u3pgMPUfZRSsnQyEjT0tgGYChq/rADc8Y1Mgax
 9GnOf9QoZ21gZhWRJVxwPVDDik4OU6GhmILee2GULtl2O/tecnsbSisnQKS/WHoD
 qCnNfxuaZAlpO7Ma+ywtxURuXspE9nWnsJmckCQ018WvpN26LymyRmAsDul0R2oU
 nTk3YhiMo8TB6nFq1BEE+ti3oi7dN3DndAM25++PjxMyFKh4UVvaWw6V38ITVvNd
 e5zS57CmYEA6gOdN05xf0Uk/mSfdbsdIPGsiql49YoFjrXNzaWxEE+kkgxul+wht
 GjBcJwxDPwfL7GQ+SgcxANffIwn2N4/welPAW2dc5Nhx5mGZZMBEfKZWhq/vwXn4
 mmmr2oX3V9tj0Hlo8g8jgSRknRFwMOjlyvtNdjDOrqDsumPDueWUuJMX0rGiDJU/
 W8w3mjrI5WdPbH7qKG/xWARhKxVhSkSh+TtNjz5Rj4oQ2A70/mxlCp9R+PmZlEqJ
 OjorLur3J88J4MdSSiQJvInJPQ5Bd80v0oz/ZXlQVs1S6zzQHyk=
 =6l5V
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
 "This consists of fixes to runner scripts and individual test run-time
  bugs. Includes fixes to tpm2 and memfd test run-time regressions"

* tag 'linux-kselftest-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/ipc: Fix test failure seen after initial test run
  Revert "Kernel selftests: tpm2: check for tpm support"
  selftests/ftrace: Add CONFIG_SAMPLE_FTRACE_DIRECT=m kconfig
  selftests/seccomp: allow clock_nanosleep instead of nanosleep
  kselftest/runner: allow to properly deliver signals to tests
  selftests/harness: fix spelling mistake "SIGARLM" -> "SIGALRM"
  selftests: Fix memfd test run-time regression
  selftests: vm: Fix 64-bit test builds for powerpc64le
  selftests: vm: Do not override definition of ARCH
2020-04-22 10:47:49 -07:00
Ian Rogers
1e76b171b7 perf script: Avoid NULL dereference on symbol
al->sym may be NULL given current if conditions and may cause a segv.

Fixes: d2bedb7863 ("perf script: Allow --symbol to accept hexadecimal addresses")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200421004329.43109-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-22 10:59:02 -03:00
Jagadeesh Pagadala
8fbd301bf2 perf evlist: Remove duplicate headers
Code cleanup: Remove duplicate headers which are included twice.

Signed-off-by: Jagadeesh Pagadala <jagdsh.linux@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/1587276836-17088-1-git-send-email-jagdsh.linux@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-22 10:01:33 -03:00
Tommi Rantala
41e7c32b97 perf bench: Fix div-by-zero if runtime is zero
Fix div-by-zero if runtime is zero:

  $ perf bench futex hash --runtime=0
  # Running 'futex/hash' benchmark:
  Run summary [PID 12090]: 4 threads, each operating on 1024 [private] futexes for 0 secs.
  Floating point exception (core dumped)

Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200417132330.119407-4-tommi.t.rantala@nokia.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-22 10:01:33 -03:00
Tommi Rantala
d2e7d8636f perf cgroup: Avoid needless closing of unopened fd
Do not bother with close() if fd is not valid, just to silence valgrind:

    $ valgrind ./perf script
    ==59169== Memcheck, a memory error detector
    ==59169== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
    ==59169== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
    ==59169== Command: ./perf script
    ==59169==
    ==59169== Warning: invalid file descriptor -1 in syscall close()
    ==59169== Warning: invalid file descriptor -1 in syscall close()
    ==59169== Warning: invalid file descriptor -1 in syscall close()
    ==59169== Warning: invalid file descriptor -1 in syscall close()
    ==59169== Warning: invalid file descriptor -1 in syscall close()
    ==59169== Warning: invalid file descriptor -1 in syscall close()
    ==59169== Warning: invalid file descriptor -1 in syscall close()
    ==59169== Warning: invalid file descriptor -1 in syscall close()

Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200417132330.119407-1-tommi.t.rantala@nokia.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-22 10:01:33 -03:00
Ingo Molnar
87cfeb1920 perf/core fixes and improvements:
kernel + tools/perf:
 
   Alexey Budankov:
 
   - Introduce CAP_PERFMON to kernel and user space.
 
 callchains:
 
   Adrian Hunter:
 
   - Allow using Intel PT to synthesize callchains for regular events.
 
   Kan Liang:
 
   - Stitch LBR records from multiple samples to get deeper backtraces,
     there are caveats, see the csets for details.
 
 perf script:
 
   Andreas Gerstmayr:
 
   - Add flamegraph.py script
 
 BPF:
 
   Jiri Olsa:
 
   - Synthesize bpf_trampoline/dispatcher ksymbol events.
 
 perf stat:
 
   Arnaldo Carvalho de Melo:
 
   - Honour --timeout for forked workloads.
 
   Stephane Eranian:
 
   - Force error in fallback on :k events, to avoid counting nothing when
     the user asks for kernel events but is not allowed to.
 
 perf bench:
 
   Ian Rogers:
 
   - Add event synthesis benchmark.
 
 tools api fs:
 
   Stephane Eranian:
 
  - Make xxx__mountpoint() more scalable
 
 libtraceevent:
 
   He Zhe:
 
   - Handle return value of asprintf.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXp2LlQAKCRCyPKLppCJ+
 J95oAP0ZihVUhESv/gdeX0IDE5g6Rd2V6LNcRj+jb7gX9NlQkwD/UfS454WV1ftQ
 qTwrkKPzY/5Tm2cLuVE7r7fJ6naDHgU=
 =FHm4
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-5.8-20200420' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core fixes and improvements from Arnaldo Carvalho de Melo:

kernel + tools/perf:

  Alexey Budankov:

  - Introduce CAP_PERFMON to kernel and user space.

callchains:

  Adrian Hunter:

  - Allow using Intel PT to synthesize callchains for regular events.

  Kan Liang:

  - Stitch LBR records from multiple samples to get deeper backtraces,
    there are caveats, see the csets for details.

perf script:

  Andreas Gerstmayr:

  - Add flamegraph.py script

BPF:

  Jiri Olsa:

  - Synthesize bpf_trampoline/dispatcher ksymbol events.

perf stat:

  Arnaldo Carvalho de Melo:

  - Honour --timeout for forked workloads.

  Stephane Eranian:

  - Force error in fallback on :k events, to avoid counting nothing when
    the user asks for kernel events but is not allowed to.

perf bench:

  Ian Rogers:

  - Add event synthesis benchmark.

tools api fs:

  Stephane Eranian:

 - Make xxx__mountpoint() more scalable

libtraceevent:

  He Zhe:

  - Handle return value of asprintf.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-22 14:08:28 +02:00
Thomas Gleixner
0cc9ac8db0 objtool: Also consider .entry.text as noinstr
Consider all of .entry.text as noinstr. This gets us coverage across
the PTI boundary. While we could add everything .noinstr.text into
.entry.text that would bloat the amount of code in the user mapping.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115119.525037514@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-22 10:53:51 +02:00
Peter Zijlstra
932f8e987b objtool: Add STT_NOTYPE noinstr validation
Make sure to also check STT_NOTYPE symbols for noinstr violations.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115119.465335884@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-22 10:53:51 +02:00
Peter Zijlstra
4b5e2e7ffe objtool: Rearrange validate_section()
In preparation of further changes, once again break out the loop body.
No functional changes intended.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115119.405863817@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-22 10:53:51 +02:00
Peter Zijlstra
da837bd6f1 objtool: Avoid iterating !text section symbols
validate_functions() iterates all sections their symbols; this is
pointless to do for !text sections as they won't have instructions
anyway.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115119.346582716@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-22 10:53:51 +02:00
Peter Zijlstra
87ecb582f0 objtool: Use sec_offset_hash() for insn_hash
In preparation for find_insn_containing(), change insn_hash to use
sec_offset_hash().

This actually reduces runtime; probably because mixing in the section
index reduces the collisions due to text sections all starting their
instructions at offset 0.

Runtime on vmlinux.o from 3.1 to 2.5 seconds.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115119.227240432@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-22 10:53:50 +02:00
Peter Zijlstra
34f7c96d96 objtool: Optimize !vmlinux.o again
When doing kbuild tests to see if the objtool changes affected those I
found that there was a measurable regression:

          pre		  post

  real    1m13.594        1m16.488s
  user    34m58.246s      35m23.947s
  sys     4m0.393s        4m27.312s

Perf showed that for small files the increased hash-table sizes were a
measurable difference. Since we already have -l "vmlinux" to
distinguish between the modes, make it also use a smaller portion of
the hash-tables.

This flips it into a small win:

  real    1m14.143s
  user    34m49.292s
  sys     3m44.746s

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115119.167588731@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-22 10:53:50 +02:00
Peter Zijlstra
c4a33939a7 objtool: Implement noinstr validation
Validate that any call out of .noinstr.text is in between
instr_begin() and instr_end() annotations.

This annotation is useful to ensure correct behaviour wrt tracing
sensitive code like entry/exit and idle code. When we run code in a
sensitive context we want a guarantee no unknown code is ran.

Since this validation relies on knowing the section of call
destination symbols, we must run it on vmlinux.o instead of on
individual object files.

Add two options:

 -d/--duplicate "duplicate validation for vmlinux"
 -l/--vmlinux "vmlinux.o validation"

Where the latter auto-detects when objname ends with "vmlinux.o" and
the former will force all validations, also those already done on
!vmlinux object files.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115119.106268040@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-22 10:53:50 +02:00