Commit Graph

30456 Commits

Author SHA1 Message Date
Song Liu
29c77550ee perf script: Check session->header.env.arch before using it
When perf.data is not written cleanly, we would like to process existing
data as much as possible (please see f_header.data.size == 0 condition
in perf_session__read_header). However, perf.data with partial data may
crash perf. Specifically, we see crash in 'perf script' for NULL
session->header.env.arch.

Fix this by checking session->header.env.arch before using it to determine
native_arch. Also split the if condition so it is easier to read.

Committer notes:

If it is a pipe, we already assume is a native arch, so no need to check
session->header.env.arch.

Signed-off-by: Song Liu <songliubraving@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Cc: stable@vger.kernel.org
Link: http://lore.kernel.org/lkml/20211004053238.514936-1-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-31 12:51:41 -03:00
Adrian Hunter
095729484e perf build: Suppress 'rm dlfilter' build message
The following build message:

	rm dlfilters/dlfilter-test-api-v0.o

is unwanted.

The object file is being treated as an intermediate file and being
automatically removed. Mark the object file as .SECONDARY to prevent
removal and hence the message.

Requested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20210930062849.110416-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-31 12:51:41 -03:00
Paolo Bonzini
4e33868433 Merge tag 'kvmarm-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for Linux 5.16

- More progress on the protected VM front, now with the full
  fixed feature set as well as the limitation of some hypercalls
  after initialisation.

- Cleanup of the RAZ/WI sysreg handling, which was pointlessly
  complicated

- Fixes for the vgic placement in the IPA space, together with a
  bunch of selftests

- More memcg accounting of the memory allocated on behalf of a guest

- Timer and vgic selftests

- Workarounds for the Apple M1 broken vgic implementation

- KConfig cleanups

- New kvmarm.mode=none option, for those who really dislike us
2021-10-31 02:28:48 -04:00
Borislav Petkov
a72fdfd21e selftests/x86/iopl: Adjust to the faked iopl CLI/STI usage
Commit in Fixes changed the iopl emulation to not #GP on CLI and STI
because it would break some insane luserspace tools which would toggle
interrupts.

The corresponding selftest would rely on the fact that executing CLI/STI
would trigger a #GP and thus detect it this way but since that #GP is
not happening anymore, the detection is now wrong too.

Extend the test to actually look at the IF flag and whether executing
those insns had any effect on it. The STI detection needs to have the
fact that interrupts were previously disabled, passed in so do that from
the previous CLI test, i.e., STI test needs to follow a previous CLI one
for it to make sense.

Fixes: b968e84b50 ("x86/iopl: Fake iopl(3) CLI/STI usage")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20211030083939.13073-1-bp@alien8.de
2021-10-30 23:18:04 +02:00
Shuah Khan
f35dcaa0a8 selftests/core: fix conflicting types compile error for close_range()
close_range() test type conflicts with close_range() library call in
x86_64-linux-gnu/bits/unistd_ext.h. Fix it by changing the name to
core_close_range().

gcc -g -I../../../../usr/include/    close_range_test.c  -o ../tools/testing/selftests/core/close_range_test
In file included from close_range_test.c:16:
close_range_test.c:57:6: error: conflicting types for ‘close_range’; have ‘void(struct __test_metadata *)’
   57 | TEST(close_range)
      |      ^~~~~~~~~~~
../kselftest_harness.h:181:21: note: in definition of macro ‘__TEST_IMPL’
  181 |         static void test_name(struct __test_metadata *_metadata); \
      |                     ^~~~~~~~~
close_range_test.c:57:1: note: in expansion of macro ‘TEST’
   57 | TEST(close_range)
      | ^~~~
In file included from /usr/include/unistd.h:1204,
                 from close_range_test.c:13:
/usr/include/x86_64-linux-gnu/bits/unistd_ext.h:56:12: note: previous declaration of ‘close_range’ with type ‘int(unsigned int,  unsigned int,  int)’
   56 | extern int close_range (unsigned int __fd, unsigned int __max_fd,
      |            ^~~~~~~~~~~

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-10-29 13:09:42 -06:00
Daniel Latypov
52a5d80a22 kunit: tool: fix typecheck errors about loading qemu configs
Currently, we have these errors:
$ mypy ./tools/testing/kunit/*.py
tools/testing/kunit/kunit_kernel.py:213: error: Item "_Loader" of "Optional[_Loader]" has no attribute "exec_module"
tools/testing/kunit/kunit_kernel.py:213: error: Item "None" of "Optional[_Loader]" has no attribute "exec_module"
tools/testing/kunit/kunit_kernel.py:214: error: Module has no attribute "QEMU_ARCH"
tools/testing/kunit/kunit_kernel.py:215: error: Module has no attribute "QEMU_ARCH"

exec_module
===========

pytype currently reports no errors, but that's because there's a comment
disabling it on 213.

This is due to https://github.com/python/typeshed/pull/2626.
The fix is to assert the loaded module implements the ABC
(abstract base class) we want which has exec_module support.

QEMU_ARCH
=========

pytype is fine with this, but mypy is not:
https://github.com/python/mypy/issues/5059

Add a check that the loaded module does indeed have QEMU_ARCH.
Note: this is not enough to appease mypy, so we also add a comment to
squash the warning.

Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-10-29 13:05:47 -06:00
Andrea Righi
f48ad69097 selftests/bpf: Fix fclose/pclose mismatch in test_progs
Make sure to use pclose() to properly close the pipe opened by popen().

Fixes: 81f77fd0de ("bpf: add selftest for stackmap with BPF_F_STACK_BUILD_ID")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20211026143409.42666-1-andrea.righi@canonical.com
2021-10-29 17:48:43 +02:00
Nikolay Aleksandrov
34d7ecb3d4 selftests: net: bridge: update IGMP/MLD membership interval value
When I fixed IGMPv3/MLDv2 to use the bridge's multicast_membership_interval
value which is chosen by user-space instead of calculating it based on
multicast_query_interval and multicast_query_response_interval I forgot
to update the selftests relying on that behaviour. Now we have to
manually set the expected GMI value to perform the tests correctly and get
proper results (similar to IGMPv2 behaviour).

Fixes: fac3cb82a5 ("net: bridge: mcast: use multicast_membership_interval for IGMPv3")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29 13:58:21 +01:00
Shuah Khan
e300a85db1 selftests/net: update .gitignore with newly added tests
Update .gitignore with newly added tests:
	tools/testing/selftests/net/af_unix/test_unix_oob
	tools/testing/selftests/net/gro
	tools/testing/selftests/net/ioam6_parser
	tools/testing/selftests/net/toeplitz

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29 13:14:58 +01:00
Petr Machata
2b11e24eba selftests: mlxsw: Test port shaper
TBF can be used as a root qdisc, in which case it is supposed to configure
port shaper. Add a test that verifies that this is so by installing a root
TBF with a ETS or PRIO below it, and then expecting individual bands to all
be shaped according to the root TBF configuration.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28 19:47:50 -07:00
Petr Machata
3d5290ea1d selftests: mlxsw: Test offloadability of root TBF
TBF can be used as a root qdisc, with the usual ETS/RED/TBF hierarchy below
it. This use should now be offloaded. Add a test that verifies that it is.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28 19:47:49 -07:00
David Yang
9c7516d669 tools/testing/selftests/vm/split_huge_page_test.c: fix application of sizeof to pointer
The coccinelle check report:

  ./tools/testing/selftests/vm/split_huge_page_test.c:344:36-42:
  ERROR: application of sizeof to pointer

Use "strlen" to fix it.

Link: https://lkml.kernel.org/r/20211012030116.184027-1-davidcomponentone@gmail.com
Signed-off-by: David Yang <davidcomponentone@gmail.com>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-28 17:18:55 -07:00
Kumar Kartikeya Dwivedi
efadf2ad17 selftests/bpf: Fix memory leak in test_ima
The allocated ring buffer is never freed, do so in the cleanup path.

Fixes: f446b570ac ("bpf/selftests: Update the IMA test to use BPF ring buffer")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211028063501.2239335-9-memxor@gmail.com
2021-10-28 16:30:07 -07:00
Kumar Kartikeya Dwivedi
c3fc706e94 selftests/bpf: Fix fd cleanup in sk_lookup test
Similar to the fix in commit:
e31eec77e4 ("bpf: selftests: Fix fd cleanup in get_branch_snapshot")

We use designated initializer to set fds to -1 without breaking on
future changes to MAX_SERVER constant denoting the array size.

The particular close(0) occurs on non-reuseport tests, so it can be seen
with -n 115/{2,3} but not 115/4. This can cause problems with future
tests if they depend on BTF fd never being acquired as fd 0, breaking
internal libbpf assumptions.

Fixes: 0ab5539f85 ("selftests/bpf: Tests for BPF_SK_LOOKUP attach point")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211028063501.2239335-8-memxor@gmail.com
2021-10-28 16:30:07 -07:00
Kumar Kartikeya Dwivedi
087cba799c selftests/bpf: Add weak/typeless ksym test for light skeleton
Also, avoid using CO-RE features, as lskel doesn't support CO-RE, yet.
Include both light and libbpf skeleton in same file to test both of them
together.

In c48e51c8b0 ("bpf: selftests: Add selftests for module kfunc support"),
I added support for generating both lskel and libbpf skel for a BPF
object, however the name parameter for bpftool caused collisions when
included in same file together. This meant that every test needed a
separate file for a libbpf/light skeleton separation instead of
subtests.

Change that by appending a "_lskel" suffix to the name for files using
light skeleton, and convert all existing users.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211028063501.2239335-7-memxor@gmail.com
2021-10-28 16:30:07 -07:00
Kumar Kartikeya Dwivedi
92274e24b0 libbpf: Use O_CLOEXEC uniformly when opening fds
There are some instances where we don't use O_CLOEXEC when opening an
fd, fix these up. Otherwise, it is possible that a parallel fork causes
these fds to leak into a child process on execve.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211028063501.2239335-6-memxor@gmail.com
2021-10-28 16:30:07 -07:00
Kumar Kartikeya Dwivedi
549a632386 libbpf: Ensure that BPF syscall fds are never 0, 1, or 2
Add a simple wrapper for passing an fd and getting a new one >= 3 if it
is one of 0, 1, or 2. There are two primary reasons to make this change:
First, libbpf relies on the assumption a certain BPF fd is never 0 (e.g.
most recently noticed in [0]). Second, Alexei pointed out in [1] that
some environments reset stdin, stdout, and stderr if they notice an
invalid fd at these numbers. To protect against both these cases, switch
all internal BPF syscall wrappers in libbpf to always return an fd >= 3.
We only need to modify the syscall wrappers and not other code that
assumes a valid fd by doing >= 0, to avoid pointless churn, and because
it is still a valid assumption. The cost paid is two additional syscalls
if fd is in range [0, 2].

  [0]: e31eec77e4 ("bpf: selftests: Fix fd cleanup in get_branch_snapshot")
  [1]: https://lore.kernel.org/bpf/CAADnVQKVKY8o_3aU8Gzke443+uHa-eGoM0h7W4srChMXU1S4Bg@mail.gmail.com

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211028063501.2239335-5-memxor@gmail.com
2021-10-28 16:30:07 -07:00
Kumar Kartikeya Dwivedi
585a357198 libbpf: Add weak ksym support to gen_loader
This extends existing ksym relocation code to also support relocating
weak ksyms. Care needs to be taken to zero out the src_reg (currently
BPF_PSEUOD_BTF_ID, always set for gen_loader by bpf_object__relocate_data)
when the BTF ID lookup fails at runtime.  This is not a problem for
libbpf as it only sets ext->is_set when BTF ID lookup succeeds (and only
proceeds in case of failure if ext->is_weak, leading to src_reg
remaining as 0 for weak unresolved ksym).

A pattern similar to emit_relo_kfunc_btf is followed of first storing
the default values and then jumping over actual stores in case of an
error. For src_reg adjustment, we also need to perform it when copying
the populated instruction, so depending on if copied insn[0].imm is 0 or
not, we decide to jump over the adjustment.

We cannot reach that point unless the ksym was weak and resolved and
zeroed out, as the emit_check_err will cause us to jump to cleanup
label, so we do not need to recheck whether the ksym is weak before
doing the adjustment after copying BTF ID and BTF FD.

This is consistent with how libbpf relocates weak ksym. Logging
statements are added to show the relocation result and aid debugging.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211028063501.2239335-4-memxor@gmail.com
2021-10-28 16:30:06 -07:00
Kumar Kartikeya Dwivedi
c24941cd37 libbpf: Add typeless ksym support to gen_loader
This uses the bpf_kallsyms_lookup_name helper added in previous patches
to relocate typeless ksyms. The return value ENOENT can be ignored, and
the value written to 'res' can be directly stored to the insn, as it is
overwritten to 0 on lookup failure. For repeating symbols, we can simply
copy the previously populated bpf_insn.

Also, we need to take care to not close fds for typeless ksym_desc, so
reuse the 'off' member's space to add a marker for typeless ksym and use
that to skip them in cleanup_relos.

We add a emit_ksym_relo_log helper that avoids duplicating common
logging instructions between typeless and weak ksym (for future commit).

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211028063501.2239335-3-memxor@gmail.com
2021-10-28 16:30:06 -07:00
Kumar Kartikeya Dwivedi
d6aef08a87 bpf: Add bpf_kallsyms_lookup_name helper
This helper allows us to get the address of a kernel symbol from inside
a BPF_PROG_TYPE_SYSCALL prog (used by gen_loader), so that we can
relocate typeless ksym vars.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211028063501.2239335-2-memxor@gmail.com
2021-10-28 16:30:06 -07:00
Peter Zijlstra
134ab5bd18 objtool,x86: Replace alternatives with .retpoline_sites
Instead of writing complete alternatives, simply provide a list of all
the retpoline thunk calls. Then the kernel is free to do with them as
it pleases. Simpler code all-round.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120309.850007165@infradead.org
2021-10-28 23:25:25 +02:00
Peter Zijlstra
c509331b41 objtool: Shrink struct instruction
Any one instruction can only ever call a single function, therefore
insn->mcount_loc_node is superfluous and can use insn->call_node.

This shrinks struct instruction, which is by far the most numerous
structure objtool creates.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120309.785456706@infradead.org
2021-10-28 23:25:25 +02:00
Peter Zijlstra
dd003edeff objtool: Explicitly avoid self modifying code in .altinstr_replacement
Assume ALTERNATIVE()s know what they're doing and do not change, or
cause to change, instructions in .altinstr_replacement sections.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120309.722511775@infradead.org
2021-10-28 23:25:25 +02:00
Peter Zijlstra
1739c66eb7 objtool: Classify symbols
In order to avoid calling str*cmp() on symbol names, over and over, do
them all once upfront and store the result.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120309.658539311@infradead.org
2021-10-28 23:25:24 +02:00
Joanne Koong
f44bc543a0 bpf/benchs: Add benchmarks for comparing hashmap lookups w/ vs. w/out bloom filter
This patch adds benchmark tests for comparing the performance of hashmap
lookups without the bloom filter vs. hashmap lookups with the bloom filter.

Checking the bloom filter first for whether the element exists should
overall enable a higher throughput for hashmap lookups, since if the
element does not exist in the bloom filter, we can avoid a costly lookup in
the hashmap.

On average, using 5 hash functions in the bloom filter tended to perform
the best across the widest range of different entry sizes. The benchmark
results using 5 hash functions (running on 8 threads on a machine with one
numa node, and taking the average of 3 runs) were roughly as follows:

value_size = 4 bytes -
	10k entries: 30% faster
	50k entries: 40% faster
	100k entries: 40% faster
	500k entres: 70% faster
	1 million entries: 90% faster
	5 million entries: 140% faster

value_size = 8 bytes -
	10k entries: 30% faster
	50k entries: 40% faster
	100k entries: 50% faster
	500k entres: 80% faster
	1 million entries: 100% faster
	5 million entries: 150% faster

value_size = 16 bytes -
	10k entries: 20% faster
	50k entries: 30% faster
	100k entries: 35% faster
	500k entres: 65% faster
	1 million entries: 85% faster
	5 million entries: 110% faster

value_size = 40 bytes -
	10k entries: 5% faster
	50k entries: 15% faster
	100k entries: 20% faster
	500k entres: 65% faster
	1 million entries: 75% faster
	5 million entries: 120% faster

Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211027234504.30744-6-joannekoong@fb.com
2021-10-28 13:22:49 -07:00
Joanne Koong
57fd1c63c9 bpf/benchs: Add benchmark tests for bloom filter throughput + false positive
This patch adds benchmark tests for the throughput (for lookups + updates)
and the false positive rate of bloom filter lookups, as well as some
minor refactoring of the bash script for running the benchmarks.

These benchmarks show that as the number of hash functions increases,
the throughput and the false positive rate of the bloom filter decreases.
>From the benchmark data, the approximate average false-positive rates
are roughly as follows:

1 hash function = ~30%
2 hash functions = ~15%
3 hash functions = ~5%
4 hash functions = ~2.5%
5 hash functions = ~1%
6 hash functions = ~0.5%
7 hash functions  = ~0.35%
8 hash functions = ~0.15%
9 hash functions = ~0.1%
10 hash functions = ~0%

For reference data, the benchmarks run on one thread on a machine
with one numa node for 1 to 5 hash functions for 8-byte and 64-byte
values are as follows:

1 hash function:
  50k entries
	8-byte value
	    Lookups - 51.1 M/s operations
	    Updates - 33.6 M/s operations
	    False positive rate: 24.15%
	64-byte value
	    Lookups - 15.7 M/s operations
	    Updates - 15.1 M/s operations
	    False positive rate: 24.2%
  100k entries
	8-byte value
	    Lookups - 51.0 M/s operations
	    Updates - 33.4 M/s operations
	    False positive rate: 24.04%
	64-byte value
	    Lookups - 15.6 M/s operations
	    Updates - 14.6 M/s operations
	    False positive rate: 24.06%
  500k entries
	8-byte value
	    Lookups - 50.5 M/s operations
	    Updates - 33.1 M/s operations
	    False positive rate: 27.45%
	64-byte value
	    Lookups - 15.6 M/s operations
	    Updates - 14.2 M/s operations
	    False positive rate: 27.42%
  1 mil entries
	8-byte value
	    Lookups - 49.7 M/s operations
	    Updates - 32.9 M/s operations
	    False positive rate: 27.45%
	64-byte value
	    Lookups - 15.4 M/s operations
	    Updates - 13.7 M/s operations
	    False positive rate: 27.58%
  2.5 mil entries
	8-byte value
	    Lookups - 47.2 M/s operations
	    Updates - 31.8 M/s operations
	    False positive rate: 30.94%
	64-byte value
	    Lookups - 15.3 M/s operations
	    Updates - 13.2 M/s operations
	    False positive rate: 30.95%
  5 mil entries
	8-byte value
	    Lookups - 41.1 M/s operations
	    Updates - 28.1 M/s operations
	    False positive rate: 31.01%
	64-byte value
	    Lookups - 13.3 M/s operations
	    Updates - 11.4 M/s operations
	    False positive rate: 30.98%

2 hash functions:
  50k entries
	8-byte value
	    Lookups - 34.1 M/s operations
	    Updates - 20.1 M/s operations
	    False positive rate: 9.13%
	64-byte value
	    Lookups - 8.4 M/s operations
	    Updates - 7.9 M/s operations
	    False positive rate: 9.21%
  100k entries
	8-byte value
	    Lookups - 33.7 M/s operations
	    Updates - 18.9 M/s operations
	    False positive rate: 9.13%
	64-byte value
	    Lookups - 8.4 M/s operations
	    Updates - 7.7 M/s operations
	    False positive rate: 9.19%
  500k entries
	8-byte value
	    Lookups - 32.7 M/s operations
	    Updates - 18.1 M/s operations
	    False positive rate: 12.61%
	64-byte value
	    Lookups - 8.4 M/s operations
	    Updates - 7.5 M/s operations
	    False positive rate: 12.61%
  1 mil entries
	8-byte value
	    Lookups - 30.6 M/s operations
	    Updates - 18.9 M/s operations
	    False positive rate: 12.54%
	64-byte value
	    Lookups - 8.0 M/s operations
	    Updates - 7.0 M/s operations
	    False positive rate: 12.52%
  2.5 mil entries
	8-byte value
	    Lookups - 25.3 M/s operations
	    Updates - 16.7 M/s operations
	    False positive rate: 16.77%
	64-byte value
	    Lookups - 7.9 M/s operations
	    Updates - 6.5 M/s operations
	    False positive rate: 16.88%
  5 mil entries
	8-byte value
	    Lookups - 20.8 M/s operations
	    Updates - 14.7 M/s operations
	    False positive rate: 16.78%
	64-byte value
	    Lookups - 7.0 M/s operations
	    Updates - 6.0 M/s operations
	    False positive rate: 16.78%

3 hash functions:
  50k entries
	8-byte value
	    Lookups - 25.1 M/s operations
	    Updates - 14.6 M/s operations
	    False positive rate: 7.65%
	64-byte value
	    Lookups - 5.8 M/s operations
	    Updates - 5.5 M/s operations
	    False positive rate: 7.58%
  100k entries
	8-byte value
	    Lookups - 24.7 M/s operations
	    Updates - 14.1 M/s operations
	    False positive rate: 7.71%
	64-byte value
	    Lookups - 5.8 M/s operations
	    Updates - 5.3 M/s operations
	    False positive rate: 7.62%
  500k entries
	8-byte value
	    Lookups - 22.9 M/s operations
	    Updates - 13.9 M/s operations
	    False positive rate: 2.62%
	64-byte value
	    Lookups - 5.6 M/s operations
	    Updates - 4.8 M/s operations
	    False positive rate: 2.7%
  1 mil entries
	8-byte value
	    Lookups - 19.8 M/s operations
	    Updates - 12.6 M/s operations
	    False positive rate: 2.60%
	64-byte value
	    Lookups - 5.3 M/s operations
	    Updates - 4.4 M/s operations
	    False positive rate: 2.69%
  2.5 mil entries
	8-byte value
	    Lookups - 16.2 M/s operations
	    Updates - 10.7 M/s operations
	    False positive rate: 4.49%
	64-byte value
	    Lookups - 4.9 M/s operations
	    Updates - 4.1 M/s operations
	    False positive rate: 4.41%
  5 mil entries
	8-byte value
	    Lookups - 18.8 M/s operations
	    Updates - 9.2 M/s operations
	    False positive rate: 4.45%
	64-byte value
	    Lookups - 5.2 M/s operations
	    Updates - 3.9 M/s operations
	    False positive rate: 4.54%

4 hash functions:
  50k entries
	8-byte value
	    Lookups - 19.7 M/s operations
	    Updates - 11.1 M/s operations
	    False positive rate: 1.01%
	64-byte value
	    Lookups - 4.4 M/s operations
	    Updates - 4.0 M/s operations
	    False positive rate: 1.00%
  100k entries
	8-byte value
	    Lookups - 19.5 M/s operations
	    Updates - 10.9 M/s operations
	    False positive rate: 1.00%
	64-byte value
	    Lookups - 4.3 M/s operations
	    Updates - 3.9 M/s operations
	    False positive rate: 0.97%
  500k entries
	8-byte value
	    Lookups - 18.2 M/s operations
	    Updates - 10.6 M/s operations
	    False positive rate: 2.05%
	64-byte value
	    Lookups - 4.3 M/s operations
	    Updates - 3.7 M/s operations
	    False positive rate: 2.05%
  1 mil entries
	8-byte value
	    Lookups - 15.5 M/s operations
	    Updates - 9.6 M/s operations
	    False positive rate: 1.99%
	64-byte value
	    Lookups - 4.0 M/s operations
	    Updates - 3.4 M/s operations
	    False positive rate: 1.99%
  2.5 mil entries
	8-byte value
	    Lookups - 13.8 M/s operations
	    Updates - 7.7 M/s operations
	    False positive rate: 3.91%
	64-byte value
	    Lookups - 3.7 M/s operations
	    Updates - 3.6 M/s operations
	    False positive rate: 3.78%
  5 mil entries
	8-byte value
	    Lookups - 13.0 M/s operations
	    Updates - 6.9 M/s operations
	    False positive rate: 3.93%
	64-byte value
	    Lookups - 3.5 M/s operations
	    Updates - 3.7 M/s operations
	    False positive rate: 3.39%

5 hash functions:
  50k entries
	8-byte value
	    Lookups - 16.4 M/s operations
	    Updates - 9.1 M/s operations
	    False positive rate: 0.78%
	64-byte value
	    Lookups - 3.5 M/s operations
	    Updates - 3.2 M/s operations
	    False positive rate: 0.77%
  100k entries
	8-byte value
	    Lookups - 16.3 M/s operations
	    Updates - 9.0 M/s operations
	    False positive rate: 0.79%
	64-byte value
	    Lookups - 3.5 M/s operations
	    Updates - 3.2 M/s operations
	    False positive rate: 0.78%
  500k entries
	8-byte value
	    Lookups - 15.1 M/s operations
	    Updates - 8.8 M/s operations
	    False positive rate: 1.82%
	64-byte value
	    Lookups - 3.4 M/s operations
	    Updates - 3.0 M/s operations
	    False positive rate: 1.78%
  1 mil entries
	8-byte value
	    Lookups - 13.2 M/s operations
	    Updates - 7.8 M/s operations
	    False positive rate: 1.81%
	64-byte value
	    Lookups - 3.2 M/s operations
	    Updates - 2.8 M/s operations
	    False positive rate: 1.80%
  2.5 mil entries
	8-byte value
	    Lookups - 10.5 M/s operations
	    Updates - 5.9 M/s operations
	    False positive rate: 0.29%
	64-byte value
	    Lookups - 3.2 M/s operations
	    Updates - 2.4 M/s operations
	    False positive rate: 0.28%
  5 mil entries
	8-byte value
	    Lookups - 9.6 M/s operations
	    Updates - 5.7 M/s operations
	    False positive rate: 0.30%
	64-byte value
	    Lookups - 3.2 M/s operations
	    Updates - 2.7 M/s operations
	    False positive rate: 0.30%

Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211027234504.30744-5-joannekoong@fb.com
2021-10-28 13:22:49 -07:00
Joanne Koong
ed9109ad64 selftests/bpf: Add bloom filter map test cases
This patch adds test cases for bpf bloom filter maps. They include tests
checking against invalid operations by userspace, tests for using the
bloom filter map as an inner map, and a bpf program that queries the
bloom filter map for values added by a userspace program.

Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211027234504.30744-4-joannekoong@fb.com
2021-10-28 13:22:49 -07:00
Joanne Koong
47512102cd libbpf: Add "map_extra" as a per-map-type extra flag
This patch adds the libbpf infrastructure for supporting a
per-map-type "map_extra" field, whose definition will be
idiosyncratic depending on map type.

For example, for the bloom filter map, the lower 4 bits of
map_extra is used to denote the number of hash functions.

Please note that until libbpf 1.0 is here, the
"bpf_create_map_params" struct is used as a temporary
means for propagating the map_extra field to the kernel.

Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211027234504.30744-3-joannekoong@fb.com
2021-10-28 13:22:49 -07:00
Joanne Koong
9330986c03 bpf: Add bloom filter map implementation
This patch adds the kernel-side changes for the implementation of
a bpf bloom filter map.

The bloom filter map supports peek (determining whether an element
is present in the map) and push (adding an element to the map)
operations.These operations are exposed to userspace applications
through the already existing syscalls in the following way:

BPF_MAP_LOOKUP_ELEM -> peek
BPF_MAP_UPDATE_ELEM -> push

The bloom filter map does not have keys, only values. In light of
this, the bloom filter map's API matches that of queue stack maps:
user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
APIs to query or add an element to the bloom filter map. When the
bloom filter map is created, it must be created with a key_size of 0.

For updates, the user will pass in the element to add to the map
as the value, with a NULL key. For lookups, the user will pass in the
element to query in the map as the value, with a NULL key. In the
verifier layer, this requires us to modify the argument type of
a bloom filter's BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE;
as well, in the syscall layer, we need to copy over the user value
so that in bpf_map_peek_elem, we know which specific value to query.

A few things to please take note of:
 * If there are any concurrent lookups + updates, the user is
responsible for synchronizing this to ensure no false negative lookups
occur.
 * The number of hashes to use for the bloom filter is configurable from
userspace. If no number is specified, the default used will be 5 hash
functions. The benchmarks later in this patchset can help compare the
performance of using different number of hashes on different entry
sizes. In general, using more hashes decreases both the false positive
rate and the speed of a lookup.
 * Deleting an element in the bloom filter map is not supported.
 * The bloom filter map may be used as an inner map.
 * The "max_entries" size that is specified at map creation time is used
to approximate a reasonable bitmap size for the bloom filter, and is not
otherwise strictly enforced. If the user wishes to insert more entries
into the bloom filter than "max_entries", they may do so but they should
be aware that this may lead to a higher false positive rate.

Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211027234504.30744-2-joannekoong@fb.com
2021-10-28 13:22:49 -07:00
Jakub Kicinski
7df621a3ee Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/net/sock.h
  7b50ecfcc6 ("net: Rename ->stream_memory_read to ->sock_is_readable")
  4c1e34c0db ("vsock: Enable y2038 safe timeval for timeout")

drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
  0daa55d033 ("octeontx2-af: cn10k: debugfs for dumping LMTST map table")
  e77bcdd1f6 ("octeontx2-af: Display all enabled PF VF rsrc_alloc entries.")

Adjacent code addition in both cases, keep both.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28 10:43:58 -07:00
Chang S. Bae
101c669d16 selftests/x86/amx: Add context switch test
XSAVE state is thread-local.  The kernel switches between thread
state at context switch time.  Generally, running a selftest for
a while will naturally expose it to some context switching and
and will test the XSAVE code.

Instead of just hoping that the tests get context-switched at
random times, force context-switches on purpose.  Spawn off a few
userspace threads and force context-switches between them.
Ensure that the kernel correctly context switches each thread's
unique AMX state.

 [ dhansen: bunches of cleanups ]

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211026122525.6EFD5758@davehans-spike.ostc.intel.com
2021-10-28 14:35:27 +02:00
Chang S. Bae
6a3e0651b4 selftests/x86/amx: Add test cases for AMX state management
AMX TILEDATA is a very large XSAVE feature.  It could have caused
nasty XSAVE buffer space waste in two places:

 * Signal stacks
 * Kernel task_struct->fpu buffers

To avoid this waste, neither of these buffers have AMX state by
default.  The non-default features are called "dynamic" features.

There is an arch_prctl(ARCH_REQ_XCOMP_PERM) which allows a task
to declare that it wants to use AMX or other "dynamic" XSAVE
features.  This arch_prctl() ensures that sufficient sigaltstack
space is available before it will succeed.  It also expands the
task_struct buffer.

Functions of this test:
 * Test arch_prctl(ARCH_REQ_XCOMP_PERM).  Ensure that it checks for
   proper sigaltstack sizing and that the sizing is enforced for
   future sigaltstack calls.
 * Ensure that ARCH_REQ_XCOMP_PERM is inherited across fork()
 * Ensure that TILEDATA use before the prctl() is fatal
 * Ensure that TILEDATA is cleared across fork()

Note: Generally, compiler support is needed to do something with
AMX.  Instead, directly load AMX state from userspace with a
plain XSAVE.  Do not depend on the compiler.

 [ dhansen: bunches of cleanups ]

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211026122524.7BEDAA95@davehans-spike.ostc.intel.com
2021-10-28 14:35:27 +02:00
Madhavan Srinivasan
10269a2ca2 perf test sample-parsing: Add endian test for struct branch_flags
Extend the sample-parsing test to include a branch_flag bitfield-endian
swap test.

This patch adds a include for "util/trace-event.h" in the sample-parsing
test for importing tep_is_bigendian() and extends samples_same() to
include "needs_swap" to detect/enable check for bitfield-endian swap.

Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211028113714.600549-2-maddy@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-28 09:34:01 -03:00
Madhavan Srinivasan
63c12ae2f2 perf evsel: Add bitfield_swap() to handle branch_stack endian issue
The branch_stack struct has bit field definition which produces
different bit ordering for big/little endian.

Because of this, when branch_stack sample is collected in a BE system
and viewed/reported in a LE system, bit fields of the branch stack are
not presented properly.

To address this issue, a evsel__bitfield_swap_branch_stack() is defined
and introduced in evsel__parse_sample.

Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211028113714.600549-1-maddy@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-28 09:33:02 -03:00
Kan Liang
6ea5d1a3e3 perf script: Support instruction latency
The instruction latency information can be recorded on
some platforms, e.g., the Intel Sapphire Rapids server. With both memory
latency (weight) and the new instruction latency information, users can
easily locate the expensive load instructions, and also understand the time
spent in different stages. The users can optimize their applications in
different pipeline stages.

Add a new field "ins_lat" to filter the instruction latency information,
which is available with sample type PERF_SAMPLE_WEIGHT_STRUCT.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Link: https://lore.kernel.org/r/1632929894-102778-2-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-28 09:28:03 -03:00
Lexi Shao
57d7ecfd11 perf script: Show binary offsets for userspace addr
Show binary offsets for userspace addr with map in perf script output
with callchain.

In commit 19610184693c("perf script: Show virtual addresses instead of
offsets"), the addr shown in perf script output with callchain is changed
from binary offsets to virtual address to fix the incorrectness when
displaying symbol offset.

This is inconvenient in scenario that the binary is stripped and
symbol cannot be resolved. If someone wants to further resolve symbols for
specific binaries later, he would need an extra step to translate virtual
address to binary offset with mapping information recorded in perf.data,
which can be difficult for people not familiar with perf.

This patch modifies function sample__fprintf_callchain to print binary
offset for userspace addr with dsos, and virtual address otherwise. It
does not affect symbol offset calculation so symoff remains correct.

Before applying this patch:

  test  1512    78.711307:     533129 cycles:
  	aaaae0da07f4 [unknown] (/tmp/test)
  	aaaae0da0704 [unknown] (/tmp/test)
  	ffffbe9f7ef4 __libc_start_main+0xe4 (/lib64/libc-2.31.so)

After this patch:

  test  1519   111.330127:     406953 cycles:
  	7f4 [unknown] (/tmp/test)
  	704 [unknown] (/tmp/test)
  	20ef4 __libc_start_main+0xe4 (/lib64/libc-2.31.so)

Fixes: 19610184693c("perf script: Show virtual addresses instead of offsets")

Signed-off-by: Lexi Shao <shaolexi@huawei.com>

Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: QiuXi <qiuxi1@huawei.com>
Cc: Wangbing <wangbing6@huawei.com>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Link: http://lore.kernel.org/lkml/20211019072417.122576-1-shaolexi@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-27 20:56:36 -03:00
Alistair Francis
c1ff12dac4 perf bench futex: Add support for 32-bit systems with 64-bit time_t
Some 32-bit architectures (such are 32-bit RISC-V) only have a 64-bit
time_t and as such don't have the SYS_futex syscall. This patch will
allow us to use the SYS_futex_time64 syscall on those platforms.

This also converts the futex calls to be y2038 safe (when built for a
5.1+ kernel).

Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alistair Francis <alistair23@gmail.com>
Cc: Atish Patra <atish.patra@wdc.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-riscv@lists.infradead.org
Link: http://lore.kernel.org/lkml/20211022013343.2262938-2-alistair.francis@opensource.wdc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-27 20:55:17 -03:00
Alistair Francis
fec5c3a515 perf bench futex: Call the futex syscall from a function
In preparation for a more complex futex() function let's convert the
current macro into two functions. We need two functions to avoid
compiler failures as the macro is overloaded.

This will allow us to include pre-processor conditionals in the futex
syscall functions.

Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alistair Francis <alistair23@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Atish Patra <atish.patra@wdc.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-riscv@lists.infradead.org
Link: http://lore.kernel.org/lkml/20211022013343.2262938-1-alistair.francis@opensource.wdc.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-27 20:54:00 -03:00
Adrian Hunter
624ff63abf perf intel-pt: Support itrace d+o option to direct debug log to stdout
It can be useful to see debug output in between normal output.

Add support for AUXTRACE_LOG_FLG_USE_STDOUT to Intel PT.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20211027080334.365596-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-27 16:21:01 -03:00
Adrian Hunter
4b2b2c6a7d perf auxtrace: Add itrace d+o option to direct debug log to stdout
It can be useful to see debug output in between normal output.

Add 'o' to the flags of debug option 'd', so that '--itrace=d+o' can
specify output of the debug log to stdout.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20211027080334.365596-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-27 16:20:45 -03:00
Adrian Hunter
c3afd6e50f perf dlfilter: Add dlfilter-show-cycles
Add a new dlfilter to show cycles.

Cycle counts are accumulated per CPU (or per thread if CPU is not recorded)
from IPC information, and printed together with the change since the last
print, at the start of each line. Separate counts are kept for branches,
instructions or other events.

Note also, the itrace A option can be useful to provide higher granularity
cycle information.

Example:

  $ perf record -e intel_pt/cyc/u uname
  Linux
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.044 MB perf.data ]
  $ perf script --itrace=A --call-trace --dlfilter dlfilter-show-cycles.so --deltatime | head
         0                   perf-exec  8509 [001]     0.000000000:  psb offs: 0
         0                   perf-exec  8509 [001]     0.000000000:  cbr: 42 freq: 4219 MHz (156%)
       833        833            uname  8509 [001]     0.000047689: (/usr/lib/x86_64-linux-gnu/ld-2.31.so              )        _start
       833                       uname  8509 [001]     0.000003261: (/usr/lib/x86_64-linux-gnu/ld-2.31.so              )            _dl_start
      2015       1182            uname  8509 [001]     0.000000282: (/usr/lib/x86_64-linux-gnu/ld-2.31.so              )            _dl_start
      2676        661            uname  8509 [001]     0.000002629: (/usr/lib/x86_64-linux-gnu/ld-2.31.so              )            _dl_start
      3612        936            uname  8509 [001]     0.000001232: (/usr/lib/x86_64-linux-gnu/ld-2.31.so              )            _dl_start
      4579        967            uname  8509 [001]     0.000002519: (/usr/lib/x86_64-linux-gnu/ld-2.31.so              )            _dl_start
      6145       1566            uname  8509 [001]     0.000001050: (/usr/lib/x86_64-linux-gnu/ld-2.31.so              )                _dl_setup_hash
      6239         94            uname  8509 [001]     0.000000023: (/usr/lib/x86_64-linux-gnu/ld-2.31.so              )                _dl_sysdep_start

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20211027080334.365596-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-27 16:20:30 -03:00
Adrian Hunter
f2b91386ff perf intel-pt: Support itrace A option to approximate IPC
Normally, for cycle-acccurate mode, IPC values are an exact number of
instructions and cycles. Due to the granularity of timestamps, that happens
only when a CYC packet correlates to the event.

Support the itrace 'A' option, to use instead, the number of cycles
associated with the current timestamp. This provides IPC information for
every change of timestamp, but at the expense of accuracy. Due to the
granularity of timestamps, the actual number of cycles increases even
though the cycles reported does not. The number of instructions is known,
but if IPC is reported, cycles can be too low and so IPC is too high. Note
that inaccuracy decreases as the period of sampling increases i.e. if the
number of cycles is too low by a small amount, that becomes less
significant if the number of cycles is large.

Furthermore, it can be used in conjunction with dlfilter-show-cycles.so
to provide higher granularity cycle information.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20211027080334.365596-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-27 16:20:18 -03:00
Adrian Hunter
b6778fe1bb perf auxtrace: Add itrace A option to approximate IPC
Add an option to specify that synthesized IPC can be approximate, rather
than completely accurate.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20211027080334.365596-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-27 16:20:08 -03:00
Adrian Hunter
cf14013b6c perf auxtrace: Add missing Z option to ITRACE_HELP
ITRACE_HELP is used by perf commands to display help text for the --itrace
option. Add missing Z option.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20211027080334.365596-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-27 16:19:58 -03:00
Yucong Sun
e1ef62a4dd selftests/bpf: Adding a namespace reset for tc_redirect
This patch delete ns_src/ns_dst/ns_redir namespaces before recreating
them, making the test more robust.

Signed-off-by: Yucong Sun <sunyucong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211025223345.2136168-5-fallentree@fb.com
2021-10-27 11:59:02 -07:00
Yucong Sun
9e7240fb2d selftests/bpf: Fix attach_probe in parallel mode
This patch makes attach_probe uses its own method as attach point,
avoiding conflict with other tests like bpf_cookie.

Signed-off-by: Yucong Sun <sunyucong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211025223345.2136168-4-fallentree@fb.com
2021-10-27 11:59:02 -07:00
Yucong Sun
547208a386 selfetests/bpf: Update vmtest.sh defaults
Increase memory to 4G, 8 SMP core with host cpu passthrough. This
make it run faster in parallel mode and more likely to succeed.

Signed-off-by: Yucong Sun <sunyucong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211025223345.2136168-2-fallentree@fb.com
2021-10-27 11:58:44 -07:00
Joe Burton
689624f037 libbpf: Deprecate bpf_objects_list
Add a flag to `enum libbpf_strict_mode' to disable the global
`bpf_objects_list', preventing race conditions when concurrent threads
call bpf_object__open() or bpf_object__close().

bpf_object__next() will return NULL if this option is set.

Callers may achieve the same workflow by tracking bpf_objects in
application code.

  [0] Closes: https://github.com/libbpf/libbpf/issues/293

Signed-off-by: Joe Burton <jevburton@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211026223528.413950-1-jevburton.kernel@gmail.com
2021-10-27 11:00:12 -07:00
Russell Currey
cb662608e5 selftests/powerpc: Use date instead of EPOCHSECONDS in mitigation-patching.sh
The EPOCHSECONDS environment variable was added in bash 5.0 (released
2019).  Some distributions of the "stable" and "long-term" variety ship
older versions of bash than this, so swap to using the date command
instead.

"%s" was added to coreutils `date` in 1993 so we should be good, but who
knows, it is a GNU extension and not part of the POSIX spec for `date`.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211025102436.19177-1-ruscur@russell.cc
2021-10-27 22:34:02 +11:00
Masami Hiramatsu
25b9513872 selftests/ftrace: Stop tracing while reading the trace file by default
Stop tracing while reading the trace file by default, to prevent
the test results while checking it and to avoid taking a long time
to check the result.
If there is any testcase which wants to test the tracing while reading
the trace file, please override this setting inside the test case.

This also recovers the pause-on-trace when clean it up.

Link: https://lkml.kernel.org/r/163529053143.690749.15365238954175942026.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26 20:27:19 -04:00