Currently the test of BPF STRUCT_OPS depends on the specific bpf
implementation of tcp_congestion_ops, but it cannot cover all
basic functionalities (e.g., return value handling), so introduce
a dummy BPF STRUCT_OPS for test purposes.
Loading a bpf_dummy_ops implementation from userspace is prohibited,
and its only purpose is to run BPF_PROG_TYPE_STRUCT_OPS program
through bpf(BPF_PROG_TEST_RUN). Now programs for test_1() & test_2()
are supported. The following three cases are exercised in
bpf_dummy_struct_ops_test_run():
(1) test and check the value returned from state arg in test_1(state)
The content of state is copied from userspace pointer and copied back
after calling test_1(state). The user pointer is saved in a u64 array
and the array address is passed through ctx_in.
(2) test and check the return value of test_1(NULL)
Just simulate the case in which an invalid input argument is passed in.
(3) test multiple arguments passing in test_2(state, ...)
5 arguments are passed through ctx_in in the form of a u64 array. The first
element of the array is the userspace pointer of state and the other 4
arguments follow.
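As a rough illustration of case (3), the userspace side could pack its
arguments as shown here. This is a minimal sketch, not the selftest code:
struct dummy_state, its field and pack_test_2_args() are hypothetical names,
and the four trailing arguments are assumed to be plain u64 values.

```c
#include <stdint.h>

/* Hypothetical stand-in for the state object passed to test_2(). */
struct dummy_state {
	int val;
};

/*
 * Fill a 5-element u64 array in the layout described above: the userspace
 * pointer to state first, then the four remaining arguments.
 */
static void pack_test_2_args(uint64_t args[5], struct dummy_state *state,
			     const uint64_t rest[4])
{
	int i;

	args[0] = (uint64_t)(uintptr_t)state;
	for (i = 0; i < 4; i++)
		args[i + 1] = rest[i];
}
```

The resulting array would then be passed as ctx_in to
bpf(BPF_PROG_TEST_RUN), with ctx_size_in set to the size of the array.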
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20211025064025.2567443-4-houtao1@huawei.com
Factor out two helpers to check the read access of ctx for raw tp
and BTF function. bpf_tracing_ctx_access() is used to check that
the read access to an argument is valid, and bpf_tracing_btf_ctx_access()
additionally checks whether the BTF type of the argument is valid.
bpf_tracing_btf_ctx_access() will be used by the following patch.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20211025064025.2567443-3-houtao1@huawei.com
Factor out a helper bpf_struct_ops_prepare_trampoline() to prepare
the trampoline for a BPF_PROG_TYPE_STRUCT_OPS prog. It will be used by
the .test_run callback in the following patch.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20211025064025.2567443-2-houtao1@huawei.com
This patch is closely related to commit 6016df8fe8 ("selftests/bpf:
Fix broken riscv build"). When clang includes the system include
directories while targeting a BPF program, __BITS_PER_LONG defaults to
32 unless explicitly set. Work around this problem by explicitly
setting __BITS_PER_LONG to __riscv_xlen.
Signed-off-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211028161057.520552-5-bjorn@kernel.org
Add RISC-V to the HOSTARCH parsing, so that ARCH is "riscv", and not
"riscv32" or "riscv64".
This affects the perf and libbpf builds, so that arch specific
includes are correctly picked up for RISC-V.
Signed-off-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211028161057.520552-3-bjorn@kernel.org
Now that BPF programs can be up to 1M instructions, it is not uncommon
that a program requires more than the current 16 iterations to
converge.
Bump it to 32, which is enough for selftests/bpf, and test_bpf.ko.
Signed-off-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211028161057.520552-2-bjorn@kernel.org
Add a test to check that sockmap with strparser works correctly.
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20211029141216.211899-3-liujian56@huawei.com
After "skmsg: lose offset info in sk_psock_skb_ingress", the test case
with ktls failed. This is because the ktls parser (tls_read_size) return
value is 285, not 256.
The case is like this:
tls_sk1 --> redir_sk --> tls_sk2
tls_sk1 sent out 512 bytes of data; after TLS-related processing, redir_sk
received 570 bytes of data and redirected 512 (skb_use_parser) bytes of data
to tls_sk2; but tls_sk2 needs 285 * 2 bytes of data, so a receive timeout
occurred.
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20211029141216.211899-2-liujian56@huawei.com
If sockmap enables strparser, the offset info is lost in
sk_psock_skb_ingress(). If the length determined by the parse_msg function is
not skb->len, the skb will be converted to sk_msg multiple times, and the
userspace app will get the data multiple times.
Fix this by getting the offset and length from strp_msg. And as Cong
suggested, add one bit in skb->_sk_redir to distinguish whether strparser is
enabled or disabled.
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20211029141216.211899-1-liujian56@huawei.com
Disabling unprivileged BPF would help prevent unprivileged users from
creating certain conditions required for potential speculative execution
side-channel attacks on unmitigated affected hardware.
A deep dive on such attacks and current mitigations is available here [0].
Sync with what many distros are currently applying already, and disable
unprivileged BPF by default. An admin can enable this at runtime, if
necessary, as described in 08389d8882 ("bpf: Add kconfig knob for
disabling unpriv bpf by default").
[0] "BPF and Spectre: Mitigating transient execution attacks", Daniel Borkmann, eBPF Summit '21
https://ebpf.io/summit-2021-slides/eBPF_Summit_2021-Keynote-Daniel_Borkmann-BPF_and_Spectre.pdf
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/bpf/0ace9ce3f97656d5f62d11093ad7ee81190c3c25.1635535215.git.pawan.kumar.gupta@linux.intel.com
Make sure to use pclose() to properly close the pipe opened by popen().
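For illustration, a minimal sketch of the popen()/pclose() pairing follows.
read_first_line() is a hypothetical helper written for this example, not the
selftest code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/*
 * Run 'cmd' and copy the first line of its output into 'buf'.
 * Returns 0 on success and -1 on failure.
 */
static int read_first_line(const char *cmd, char *buf, size_t len)
{
	FILE *fp = popen(cmd, "r");
	int ok;

	if (!fp)
		return -1;

	ok = fgets(buf, (int)len, fp) != NULL;

	/* A stream opened by popen() must be closed with pclose(), which
	 * also waits for the child process; fclose() is not sufficient.
	 */
	if (pclose(fp) == -1)
		return -1;

	return ok ? 0 : -1;
}
```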
Fixes: 81f77fd0de ("bpf: add selftest for stackmap with BPF_F_STACK_BUILD_ID")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20211026143409.42666-1-andrea.righi@canonical.com
Kumar Kartikeya says:
====================
Patches (1,2,3,6) add typeless and weak ksym support to gen_loader. It is a
follow-up to the recent kfunc from modules series.
The later patches (7,8) are misc fixes for selftests, and patch 4 for libbpf
where we try to be careful to not end up with fds == 0, as libbpf assumes in
various places that they are greater than 0. Patch 5 fixes up missing O_CLOEXEC
in libbpf.
Changelog:
----------
v4 -> v5
v4: https://lore.kernel.org/bpf/20211020191526.2306852-1-memxor@gmail.com
* Address feedback from Andrii
* Drop use of ensure_good_fd in unneeded call sites
* Add sys_bpf_fd
* Add _lskel suffix to all light skeletons and change all current selftests
* Drop early break in close loop for sk_lookup
* Fix other nits
v3 -> v4
v3: https://lore.kernel.org/bpf/20211014205644.1837280-1-memxor@gmail.com
* Remove gpl_only = true from bpf_kallsyms_lookup_name (Alexei)
* Add bpf_dump_raw_ok check to ensure kptr_restrict isn't bypassed (Alexei)
v2 -> v3
v2: https://lore.kernel.org/bpf/20211013073348.1611155-1-memxor@gmail.com
* Address feedback from Song
* Move ksym logging to separate helper to avoid code duplication
* Move src_reg mask stuff to separate helper
* Fix various other nits, add acks
* __builtin_expect is used instead of likely, as skel_internal.h is
included in isolation.
v1 -> v2
v1: https://lore.kernel.org/bpf/20211006002853.308945-1-memxor@gmail.com
* Remove redundant OOM checks in emit_bpf_kallsyms_lookup_name
* Use designated initializer for sk_lookup fd array (Jakub)
* Do fd check for all fd returning low level APIs (Andrii, Alexei)
* Make Fixes: tag quote commit message, use selftests/bpf prefix (Song, Andrii)
* Split typeless and weak ksym support into separate patches, expand commit
message (Song)
* Fix duplication in selftests stemming from use of LSKELS_EXTRA (Song)
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The allocated ring buffer is never freed, do so in the cleanup path.
Fixes: f446b570ac ("bpf/selftests: Update the IMA test to use BPF ring buffer")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211028063501.2239335-9-memxor@gmail.com
Similar to the fix in commit:
e31eec77e4 ("bpf: selftests: Fix fd cleanup in get_branch_snapshot")
We use a designated initializer to set fds to -1 without breaking on
future changes to MAX_SERVER constant denoting the array size.
The particular close(0) occurs on non-reuseport tests, so it can be seen
with -n 115/{2,3} but not 115/4. This can cause problems with future
tests if they depend on BTF fd never being acquired as fd 0, breaking
internal libbpf assumptions.
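A minimal sketch of the idea, assuming the GNU C range designator extension
(supported by gcc and clang). The value of MAX_SERVERS and both helper names
are illustrative only.

```c
#include <string.h>
#include <unistd.h>

#define MAX_SERVERS 4 /* illustrative value */

/* Copy a freshly initialized fd array into 'out'. */
static void init_server_fds(int out[MAX_SERVERS])
{
	/* Every slot starts as -1, however large MAX_SERVERS becomes. */
	int fds[MAX_SERVERS] = { [0 ... MAX_SERVERS - 1] = -1 };

	memcpy(out, fds, sizeof(fds));
}

/* Close only the slots that hold a real fd; returns how many were closed. */
static int close_server_fds(const int fds[MAX_SERVERS])
{
	int i, closed = 0;

	for (i = 0; i < MAX_SERVERS; i++) {
		if (fds[i] >= 0) {
			close(fds[i]);
			closed++;
		}
	}
	return closed;
}
```

With a zero-initialized array, the cleanup loop would instead call close(0)
for every unused slot, which is the problem described above.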
Fixes: 0ab5539f85 ("selftests/bpf: Tests for BPF_SK_LOOKUP attach point")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211028063501.2239335-8-memxor@gmail.com
Also, avoid using CO-RE features, as lskel doesn't support CO-RE, yet.
Include both light and libbpf skeleton in same file to test both of them
together.
In c48e51c8b0 ("bpf: selftests: Add selftests for module kfunc support"),
I added support for generating both lskel and libbpf skel for a BPF
object, however the name parameter for bpftool caused collisions when
included in same file together. This meant that every test needed a
separate file for a libbpf/light skeleton separation instead of
subtests.
Change that by appending a "_lskel" suffix to the name for files using
light skeleton, and convert all existing users.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211028063501.2239335-7-memxor@gmail.com
There are some instances where we don't use O_CLOEXEC when opening an
fd, fix these up. Otherwise, it is possible that a parallel fork causes
these fds to leak into a child process on execve.
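As a small illustration (not the libbpf code itself), the sketch below opens
a file with O_CLOEXEC and checks that the close-on-exec flag is set. Both
helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>

/* Open 'path' read-only with close-on-exec set atomically at open time. */
static int open_cloexec(const char *path)
{
	return open(path, O_RDONLY | O_CLOEXEC);
}

/* Return non-zero if 'fd' is valid and has FD_CLOEXEC set. */
static int is_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	return flags >= 0 && (flags & FD_CLOEXEC);
}
```

Setting the flag in the open() call itself avoids the window between open()
and a later fcntl(F_SETFD) in which a parallel fork and execve could inherit
the fd.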
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211028063501.2239335-6-memxor@gmail.com
Add a simple wrapper for passing an fd and getting a new one >= 3 if it
is one of 0, 1, or 2. There are two primary reasons to make this change:
First, libbpf relies on the assumption a certain BPF fd is never 0 (e.g.
most recently noticed in [0]). Second, Alexei pointed out in [1] that
some environments reset stdin, stdout, and stderr if they notice an
invalid fd at these numbers. To protect against both these cases, switch
all internal BPF syscall wrappers in libbpf to always return an fd >= 3.
We only need to modify the syscall wrappers and not other code that
assumes a valid fd by doing >= 0, to avoid pointless churn, and because
it is still a valid assumption. The cost paid is two additional syscalls
if fd is in range [0, 2].
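The wrapper described above could look roughly like the sketch below.
ensure_fd_above_stdio() is a hypothetical name and this is not necessarily
how libbpf implements it; the sketch only shows the two extra syscalls
(fcntl() and close()) taken when the fd is in [0, 2].

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Return 'fd' unchanged if it is negative (an error) or already >= 3.
 * Otherwise duplicate it to the lowest free fd >= 3 and close the original.
 */
static int ensure_fd_above_stdio(int fd)
{
	int old_fd = fd, saved_errno;

	if (fd < 0 || fd >= 3)
		return fd;

	fd = fcntl(old_fd, F_DUPFD_CLOEXEC, 3);
	saved_errno = errno;
	close(old_fd);
	errno = saved_errno; /* keep the fcntl() error, if any */

	return fd;
}
```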
[0]: e31eec77e4 ("bpf: selftests: Fix fd cleanup in get_branch_snapshot")
[1]: https://lore.kernel.org/bpf/CAADnVQKVKY8o_3aU8Gzke443+uHa-eGoM0h7W4srChMXU1S4Bg@mail.gmail.com
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211028063501.2239335-5-memxor@gmail.com
This extends existing ksym relocation code to also support relocating
weak ksyms. Care needs to be taken to zero out the src_reg (currently
BPF_PSEUDO_BTF_ID, always set for gen_loader by bpf_object__relocate_data)
when the BTF ID lookup fails at runtime. This is not a problem for
libbpf as it only sets ext->is_set when BTF ID lookup succeeds (and only
proceeds in case of failure if ext->is_weak, leading to src_reg
remaining as 0 for weak unresolved ksym).
A pattern similar to emit_relo_kfunc_btf is followed of first storing
the default values and then jumping over actual stores in case of an
error. For src_reg adjustment, we also need to perform it when copying
the populated instruction, so depending on if copied insn[0].imm is 0 or
not, we decide to jump over the adjustment.
We cannot reach that point unless the ksym was weak and resolved and
zeroed out, as the emit_check_err will cause us to jump to cleanup
label, so we do not need to recheck whether the ksym is weak before
doing the adjustment after copying BTF ID and BTF FD.
This is consistent with how libbpf relocates weak ksym. Logging
statements are added to show the relocation result and aid debugging.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211028063501.2239335-4-memxor@gmail.com
This uses the bpf_kallsyms_lookup_name helper added in previous patches
to relocate typeless ksyms. The return value ENOENT can be ignored, and
the value written to 'res' can be directly stored to the insn, as it is
overwritten to 0 on lookup failure. For repeating symbols, we can simply
copy the previously populated bpf_insn.
Also, we need to take care to not close fds for typeless ksym_desc, so
reuse the 'off' member's space to add a marker for typeless ksym and use
that to skip them in cleanup_relos.
We add an emit_ksym_relo_log helper that avoids duplicating common
logging instructions between typeless and weak ksym (for future commit).
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211028063501.2239335-3-memxor@gmail.com
This helper allows us to get the address of a kernel symbol from inside
a BPF_PROG_TYPE_SYSCALL prog (used by gen_loader), so that we can
relocate typeless ksym vars.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211028063501.2239335-2-memxor@gmail.com
Joanne Koong says:
====================
This patchset adds a new kind of bpf map: the bloom filter map.
Bloom filters are a space-efficient probabilistic data structure
used to quickly test whether an element exists in a set.
For a brief overview about how bloom filters work,
https://en.wikipedia.org/wiki/Bloom_filter
may be helpful.
One example use-case is an application leveraging a bloom filter
map to determine whether a computationally expensive hashmap
lookup can be avoided. If the element was not found in the bloom
filter map, the hashmap lookup can be skipped.
This patchset includes benchmarks for testing the performance of
the bloom filter for different entry sizes and different number of
hash functions used, as well as comparisons for hashmap lookups
with vs. without the bloom filter.
A high level overview of this patchset is as follows:
1/5 - kernel changes for adding bloom filter map
2/5 - libbpf changes for adding map_extra flags
3/5 - tests for the bloom filter map
4/5 - benchmarks for bloom filter lookup/update throughput and false positive
rate
5/5 - benchmarks for how hashmap lookups perform with vs. without the bloom
filter
v5 -> v6:
* in 1/5: remove "inline" from the hash function, add check in syscall to
fail out in cases where map_extra is not 0 for non-bloom-filter maps,
fix alignment matching issues, move "map_extra flags" comments to inside
the bpf_attr struct, add bpf_map_info map_extra changes here, add map_extra
assignment in bpf_map_get_info_by_fd, change hash value_size to u32 instead of
a u64
* in 2/5: remove bpf_map_info map_extra changes, remove TODO comment about
extending BTF arrays to cover u64s, cast to unsigned long long for %llx when
printing out map_extra flags
* in 3/5: use __type(value, ...) instead of __uint(value_size, ...) for values
and keys
* in 4/5: fix wrong bounds for the index when iterating through random values,
update commit message to include update+lookup benchmark results for 8 byte
and 64-byte value sizes, remove explicit global bool initialization to false
for hashmap_use_bloom and count_false_hits variables
v4 -> v5:
* Change the "bitset map with bloom filter capabilities" to a bloom filter map
with max_entries signifying the number of unique entries expected in the bloom
filter, remove bitset tests
* Reduce verbiage by changing "bloom_filter" to "bloom", and renaming progs to
more concise names.
* in 2/5: remove "map_extra" from struct definitions that are frozen, create a
"bpf_create_map_params" struct to propagate map_extra to the kernel at map
creation time, change map_extra to __u64
* in 4/5: check pthread condition variable in a loop when generating initial
map data, remove "err" checks where not pragmatic, generate random values
for the hashmap in the setup() instead of in the bpf program, add check_args()
for checking that there aren't more requested entries than possible unique
entries for the specified value size
* in 5/5: Update commit message with updated benchmark data
v3 -> v4:
* Generalize the bloom filter map to be a bitset map with bloom filter
capabilities
* Add map_extra flags; pass in nr_hash_funcs through lower 4 bits of map_extra
for the bitset map
* Add tests for the bitset map (non-bloom filter) functionality
* In the benchmarks, stats are computed only as monotonic increases, and place
stats in a struct instead of as a percpu_array bpf map
v2 -> v3:
* Add libbpf changes for supporting nr_hash_funcs, instead of passing the
number of hash functions through map_flags.
* Separate the hashing logic in kernel/bpf/bloom_filter.c into a helper
function
v1 -> v2:
* Remove libbpf changes, and pass the number of hash functions through
map_flags instead.
* Default to using 5 hash functions if no number of hash functions
is specified.
* Use set_bit instead of spinlocks in the bloom filter bitmap. This
improved the speed significantly. For example, using 5 hash functions
with 100k entries, there was roughly a 35% speed increase.
* Use jhash2 (instead of jhash) for u32-aligned value sizes. This
increased the speed by roughly 5 to 15%. When using jhash2 on value
sizes non-u32 aligned (truncating any remainder bits), there was not
a noticeable difference.
* Add test for using the bloom filter as an inner map.
* Reran the benchmarks, updated the commit messages to correspond to
the new results.
====================
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds benchmark tests for comparing the performance of hashmap
lookups without the bloom filter vs. hashmap lookups with the bloom filter.
Checking the bloom filter first for whether the element exists should
overall enable a higher throughput for hashmap lookups, since if the
element does not exist in the bloom filter, we can avoid a costly lookup in
the hashmap.
On average, using 5 hash functions in the bloom filter tended to perform
the best across the widest range of different entry sizes. The benchmark
results using 5 hash functions (running on 8 threads on a machine with one
numa node, and taking the average of 3 runs) were roughly as follows:
value_size = 4 bytes -
    10k entries: 30% faster
    50k entries: 40% faster
    100k entries: 40% faster
    500k entries: 70% faster
    1 million entries: 90% faster
    5 million entries: 140% faster
value_size = 8 bytes -
    10k entries: 30% faster
    50k entries: 40% faster
    100k entries: 50% faster
    500k entries: 80% faster
    1 million entries: 100% faster
    5 million entries: 150% faster
value_size = 16 bytes -
    10k entries: 20% faster
    50k entries: 30% faster
    100k entries: 35% faster
    500k entries: 65% faster
    1 million entries: 85% faster
    5 million entries: 110% faster
value_size = 40 bytes -
    10k entries: 5% faster
    50k entries: 15% faster
    100k entries: 20% faster
    500k entries: 65% faster
    1 million entries: 75% faster
    5 million entries: 120% faster
Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211027234504.30744-6-joannekoong@fb.com
This patch adds benchmark tests for the throughput (for lookups + updates)
and the false positive rate of bloom filter lookups, as well as some
minor refactoring of the bash script for running the benchmarks.
These benchmarks show that as the number of hash functions increases,
the throughput and the false positive rate of the bloom filter decrease.
From the benchmark data, the approximate average false-positive rates
are roughly as follows:
1 hash function = ~30%
2 hash functions = ~15%
3 hash functions = ~5%
4 hash functions = ~2.5%
5 hash functions = ~1%
6 hash functions = ~0.5%
7 hash functions = ~0.35%
8 hash functions = ~0.15%
9 hash functions = ~0.1%
10 hash functions = ~0%
For reference data, the benchmarks run on one thread on a machine
with one numa node for 1 to 5 hash functions for 8-byte and 64-byte
values are as follows:
(Lookups and updates are in M/s operations; 8B/64B = 8-byte/64-byte value;
FP rate = false positive rate.)
1 hash function:
  Entries    8B lookups  8B updates  8B FP rate  64B lookups  64B updates  64B FP rate
  50k        51.1        33.6        24.15%      15.7         15.1         24.2%
  100k       51.0        33.4        24.04%      15.6         14.6         24.06%
  500k       50.5        33.1        27.45%      15.6         14.2         27.42%
  1 mil      49.7        32.9        27.45%      15.4         13.7         27.58%
  2.5 mil    47.2        31.8        30.94%      15.3         13.2         30.95%
  5 mil      41.1        28.1        31.01%      13.3         11.4         30.98%
2 hash functions:
  Entries    8B lookups  8B updates  8B FP rate  64B lookups  64B updates  64B FP rate
  50k        34.1        20.1        9.13%       8.4          7.9          9.21%
  100k       33.7        18.9        9.13%       8.4          7.7          9.19%
  500k       32.7        18.1        12.61%      8.4          7.5          12.61%
  1 mil      30.6        18.9        12.54%      8.0          7.0          12.52%
  2.5 mil    25.3        16.7        16.77%      7.9          6.5          16.88%
  5 mil      20.8        14.7        16.78%      7.0          6.0          16.78%
3 hash functions:
  Entries    8B lookups  8B updates  8B FP rate  64B lookups  64B updates  64B FP rate
  50k        25.1        14.6        7.65%       5.8          5.5          7.58%
  100k       24.7        14.1        7.71%       5.8          5.3          7.62%
  500k       22.9        13.9        2.62%       5.6          4.8          2.7%
  1 mil      19.8        12.6        2.60%       5.3          4.4          2.69%
  2.5 mil    16.2        10.7        4.49%       4.9          4.1          4.41%
  5 mil      18.8        9.2         4.45%       5.2          3.9          4.54%
4 hash functions:
  Entries    8B lookups  8B updates  8B FP rate  64B lookups  64B updates  64B FP rate
  50k        19.7        11.1        1.01%       4.4          4.0          1.00%
  100k       19.5        10.9        1.00%       4.3          3.9          0.97%
  500k       18.2        10.6        2.05%       4.3          3.7          2.05%
  1 mil      15.5        9.6         1.99%       4.0          3.4          1.99%
  2.5 mil    13.8        7.7         3.91%       3.7          3.6          3.78%
  5 mil      13.0        6.9         3.93%       3.5          3.7          3.39%
5 hash functions:
  Entries    8B lookups  8B updates  8B FP rate  64B lookups  64B updates  64B FP rate
  50k        16.4        9.1         0.78%       3.5          3.2          0.77%
  100k       16.3        9.0         0.79%       3.5          3.2          0.78%
  500k       15.1        8.8         1.82%       3.4          3.0          1.78%
  1 mil      13.2        7.8         1.81%       3.2          2.8          1.80%
  2.5 mil    10.5        5.9         0.29%       3.2          2.4          0.28%
  5 mil      9.6         5.7         0.30%       3.2          2.7          0.30%
Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211027234504.30744-5-joannekoong@fb.com
This patch adds test cases for bpf bloom filter maps. They include tests
checking against invalid operations by userspace, tests for using the
bloom filter map as an inner map, and a bpf program that queries the
bloom filter map for values added by a userspace program.
Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211027234504.30744-4-joannekoong@fb.com
This patch adds the libbpf infrastructure for supporting a
per-map-type "map_extra" field, whose definition will be
idiosyncratic depending on map type.
For example, for the bloom filter map, the lower 4 bits of
map_extra is used to denote the number of hash functions.
Please note that until libbpf 1.0 is here, the
"bpf_create_map_params" struct is used as a temporary
means for propagating the map_extra field to the kernel.
Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211027234504.30744-3-joannekoong@fb.com
This patch adds the kernel-side changes for the implementation of
a bpf bloom filter map.
The bloom filter map supports peek (determining whether an element
is present in the map) and push (adding an element to the map)
operations. These operations are exposed to userspace applications
through the already existing syscalls in the following way:
BPF_MAP_LOOKUP_ELEM -> peek
BPF_MAP_UPDATE_ELEM -> push
The bloom filter map does not have keys, only values. In light of
this, the bloom filter map's API matches that of queue stack maps:
user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
APIs to query or add an element to the bloom filter map. When the
bloom filter map is created, it must be created with a key_size of 0.
For updates, the user will pass in the element to add to the map
as the value, with a NULL key. For lookups, the user will pass in the
element to query in the map as the value, with a NULL key. In the
verifier layer, this requires us to modify the argument type of
a bloom filter's BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE;
as well, in the syscall layer, we need to copy over the user value
so that in bpf_map_peek_elem, we know which specific value to query.
A few things to please take note of:
* If there are any concurrent lookups + updates, the user is
responsible for synchronizing this to ensure no false negative lookups
occur.
* The number of hashes to use for the bloom filter is configurable from
userspace. If no number is specified, the default used will be 5 hash
functions. The benchmarks later in this patchset can help compare the
performance of using different number of hashes on different entry
sizes. In general, using more hashes decreases both the false positive
rate and the speed of a lookup.
* Deleting an element in the bloom filter map is not supported.
* The bloom filter map may be used as an inner map.
* The "max_entries" size that is specified at map creation time is used
to approximate a reasonable bitmap size for the bloom filter, and is not
otherwise strictly enforced. If the user wishes to insert more entries
into the bloom filter than "max_entries", they may do so but they should
be aware that this may lead to a higher false positive rate.
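To make the push/peek semantics concrete, here is a small userspace sketch of
a bloom filter. It is not the kernel implementation: the bitmap size, the
seeded FNV-1a style hash and all names are assumptions chosen for brevity;
only the default of 5 hash functions comes from the text above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOOM_BITS    1024 /* illustrative bitmap size */
#define BLOOM_NR_HASH 5    /* default number of hash functions */

struct bloom {
	uint64_t bits[BLOOM_BITS / 64];
};

/* Seeded FNV-1a style hash, used here only as a simple stand-in. */
static uint32_t bloom_hash(const void *data, size_t len, uint32_t seed)
{
	const unsigned char *p = data;
	uint32_t h = 2166136261u ^ (seed * 0x9e3779b1u);
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h % BLOOM_BITS;
}

/* push: set one bit per hash function. */
static void bloom_push(struct bloom *b, const void *value, size_t len)
{
	uint32_t i, idx;

	for (i = 0; i < BLOOM_NR_HASH; i++) {
		idx = bloom_hash(value, len, i);
		b->bits[idx / 64] |= 1ULL << (idx % 64);
	}
}

/* peek: false means definitely absent, true means possibly present. */
static bool bloom_peek(const struct bloom *b, const void *value, size_t len)
{
	uint32_t i, idx;

	for (i = 0; i < BLOOM_NR_HASH; i++) {
		idx = bloom_hash(value, len, i);
		if (!(b->bits[idx / 64] & (1ULL << (idx % 64))))
			return false;
	}
	return true;
}
```

Because bits are only ever set, there is no way to remove a single element
without possibly clearing bits shared with other elements, which is why
deletion is not supported.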
Signed-off-by: Joanne Koong <joannekoong@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211027234504.30744-2-joannekoong@fb.com
After commit 9298e63eaf ("bpf/tests: Add exhaustive tests of ALU
operand magnitudes"), loading test_bpf.ko via modprobe with the JIT
enabled on mips64 triggers a segmentation fault:
[...]
ALU64_MOV_X: all register value magnitudes jited:1
Break instruction in kernel code[#1]
[...]
It seems that the JIT implementations exercised by some test cases
in test_bpf() have problems. At this moment, I am not concerned with
the segmentation fault; I just want to verify the test cases of
tail calls.
Based on the above background and motivation, add the following
module parameter test_suite to the test_bpf.ko:
test_suite=<string>: only the specified test suite will be run, the
string can be "test_bpf", "test_tail_calls" or "test_skb_segment".
If test_suite is not specified, but test_id, test_name or test_range
is specified, 'test_bpf' is used as the default test suite. When a valid
test_suite string is specified, only the corresponding test suite is
run.
Any invalid test suite will result in -EINVAL being returned and no
tests being run. If the test_suite is not specified or specified as
empty string, it does not change the current logic, all of the test
cases will be run.
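The selection rules above can be summarized as a small decision function. This is a userspace sketch only: the enum, the function name select_suites(), and the has_filter flag (standing for "test_id, test_name or test_range was given") are hypothetical and do not mirror the module's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum suite_mask {
	SUITE_BPF         = 1 << 0,
	SUITE_TAIL_CALLS  = 1 << 1,
	SUITE_SKB_SEGMENT = 1 << 2,
	SUITE_ALL         = SUITE_BPF | SUITE_TAIL_CALLS | SUITE_SKB_SEGMENT,
};

/*
 * Returns a mask of the suites to run, or -EINVAL for an unknown name.
 * A NULL or empty test_suite means "not specified".
 */
int select_suites(const char *test_suite, bool has_filter)
{
	if (!test_suite || test_suite[0] == '\0')
		return has_filter ? SUITE_BPF : SUITE_ALL;
	if (!strcmp(test_suite, "test_bpf"))
		return SUITE_BPF;
	if (!strcmp(test_suite, "test_tail_calls"))
		return SUITE_TAIL_CALLS;
	if (!strcmp(test_suite, "test_skb_segment"))
		return SUITE_SKB_SEGMENT;
	return -EINVAL;
}

/* Usage: one check per rule described in the text. */
void select_suites_demo(void)
{
	assert(select_suites(NULL, false) == SUITE_ALL);	/* current behavior kept */
	assert(select_suites("", true) == SUITE_BPF);		/* default with a filter */
	assert(select_suites("test_tail_calls", true) == SUITE_TAIL_CALLS);
	assert(select_suites("bogus", false) == -EINVAL);	/* nothing is run */
}
```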
Here are some test results:
# dmesg -c
# modprobe test_bpf
# dmesg | grep Summary
test_bpf: Summary: 1009 PASSED, 0 FAILED, [0/997 JIT'ed]
test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [0/8 JIT'ed]
test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED
# rmmod test_bpf
# dmesg -c
# modprobe test_bpf test_suite=test_bpf
# dmesg | tail -1
test_bpf: Summary: 1009 PASSED, 0 FAILED, [0/997 JIT'ed]
# rmmod test_bpf
# dmesg -c
# modprobe test_bpf test_suite=test_tail_calls
# dmesg
test_bpf: #0 Tail call leaf jited:0 21 PASS
[...]
test_bpf: #7 Tail call error path, index out of range jited:0 32 PASS
test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [0/8 JIT'ed]
# rmmod test_bpf
# dmesg -c
# modprobe test_bpf test_suite=test_skb_segment
# dmesg
test_bpf: #0 gso_with_rx_frags PASS
test_bpf: #1 gso_linear_no_head_frag PASS
test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED
# rmmod test_bpf
# dmesg -c
# modprobe test_bpf test_id=1
# dmesg
test_bpf: test_bpf: set 'test_bpf' as the default test_suite.
test_bpf: #1 TXA jited:0 54 51 50 PASS
test_bpf: Summary: 1 PASSED, 0 FAILED, [0/1 JIT'ed]
# rmmod test_bpf
# dmesg -c
# modprobe test_bpf test_suite=test_bpf test_name=TXA
# dmesg
test_bpf: #1 TXA jited:0 54 50 51 PASS
test_bpf: Summary: 1 PASSED, 0 FAILED, [0/1 JIT'ed]
# rmmod test_bpf
# dmesg -c
# modprobe test_bpf test_suite=test_tail_calls test_range=6,7
# dmesg
test_bpf: #6 Tail call error path, NULL target jited:0 41 PASS
test_bpf: #7 Tail call error path, index out of range jited:0 32 PASS
test_bpf: test_tail_calls: Summary: 2 PASSED, 0 FAILED, [0/2 JIT'ed]
# rmmod test_bpf
# dmesg -c
# modprobe test_bpf test_suite=test_skb_segment test_id=1
# dmesg
test_bpf: #1 gso_linear_no_head_frag PASS
test_bpf: test_skb_segment: Summary: 1 PASSED, 0 FAILED
By the way, the above segmentation fault has been fixed in the latest
bpf-next tree, which contains the mips64 JIT rework.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
Link: https://lore.kernel.org/bpf/1635384321-28128-1-git-send-email-yangtiezhu@loongson.cn
When a tracing BPF program attempts to read memory without using the
bpf_probe_read() helper, the verifier marks the load instruction with
the BPF_PROBE_MEM flag. Since the riscv JIT does not currently recognize
this flag, it falls back to the interpreter.
Add support for BPF_PROBE_MEM, by appending an exception table to the
BPF program. If the load instruction causes a data abort, the fixup
infrastructure finds the exception table and fixes up the fault, by
clearing the destination register and jumping over the faulting
instruction.
A more generic solution would add a "handler" field to the table entry,
like on x86 and s390. The same issue in ARM64 is fixed in 8008342853
("bpf, arm64: Add BPF exception tables").
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Pu Lehui <pulehui@huawei.com>
Tested-by: Björn Töpel <bjorn@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20211027111822.3801679-1-tongtiangen@huawei.com
Yucong Sun says:
====================
Several patches to improve parallel execution mode, update vmtest.sh,
and fix two previously dropped patches according to feedback.
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
This patch deletes the ns_src/ns_dst/ns_redir namespaces before
recreating them, making the test more robust.
Signed-off-by: Yucong Sun <sunyucong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211025223345.2136168-5-fallentree@fb.com
This patch makes attach_probe use its own method as the attach point,
avoiding conflicts with other tests like bpf_cookie.
Signed-off-by: Yucong Sun <sunyucong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211025223345.2136168-4-fallentree@fb.com
Increase memory to 4G and use 8 SMP cores with host CPU passthrough.
This makes it run faster in parallel mode and more likely to succeed.
Signed-off-by: Yucong Sun <sunyucong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211025223345.2136168-2-fallentree@fb.com
Eric Dumazet says:
====================
From: Eric Dumazet <edumazet@google.com>
The first two patches fix bugs added in 5.1 and 5.5.
The third patch replaces the u64 fields in struct bpf_prog_stats
with u64_stats_t ones to avoid possible sampling errors
in case of load/store tearing.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit 316580b69d ("u64_stats: provide u64_stats_t type")
fixed possible load/store tearing on 64bit arches.
For instance the following C code
stats->nsecs += sched_clock() - start;
Could be rightfully implemented like this by a compiler,
confusing concurrent readers a lot:
stats->nsecs += sched_clock();
// arbitrary delay
stats->nsecs -= start;
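To make the hazard concrete, the following single-threaded sketch models what a concurrent reader could observe if it sampled the counter at the "arbitrary delay" point. It does not reproduce a real race, and the struct and function names are hypothetical; the second function only models "publish the final value with one store", which is roughly the property the u64_stats_t conversion is after.

```c
#include <assert.h>
#include <stdint.h>

struct stats_sketch {
	uint64_t nsecs;
};

/*
 * Models the split update a compiler may emit. Returns the value a
 * reader would see if it sampled between the two partial updates.
 */
uint64_t update_split(struct stats_sketch *s, uint64_t now, uint64_t start)
{
	uint64_t seen;

	s->nsecs += now;
	seen = s->nsecs;	/* arbitrary delay: a reader samples here */
	s->nsecs -= start;
	return seen;
}

/* Models a single store of the final value; no intermediate state exists. */
uint64_t update_single(struct stats_sketch *s, uint64_t now, uint64_t start)
{
	uint64_t next = s->nsecs + (now - start);

	s->nsecs = next;
	return next;
}

void tearing_demo(void)
{
	struct stats_sketch a = { .nsecs = 100 };
	struct stats_sketch b = { .nsecs = 100 };

	/* 10 ns elapsed, yet the reader briefly sees 1100 instead of 100 or 110. */
	assert(update_split(&a, 1000, 990) == 1100);
	assert(a.nsecs == 110);

	assert(update_single(&b, 1000, 990) == 110);
	assert(b.nsecs == 110);
}
```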
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211026214133.3114279-4-eric.dumazet@gmail.com
It seems update_prog_stats() suffers from the same issue fixed
in the prior patch:
As it can run while interrupts are enabled, it could
be re-entered and the u64_stats syncp could be mangled.
Fixes: fec56f5890 ("bpf: Introduce BPF trampoline")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211026214133.3114279-3-eric.dumazet@gmail.com
Add a flag to `enum libbpf_strict_mode' to disable the global
`bpf_objects_list', preventing race conditions when concurrent threads
call bpf_object__open() or bpf_object__close().
bpf_object__next() will return NULL if this option is set.
Callers may achieve the same workflow by tracking bpf_objects in
application code.
[0] Closes: https://github.com/libbpf/libbpf/issues/293
Signed-off-by: Joe Burton <jevburton@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211026223528.413950-1-jevburton.kernel@gmail.com
Functions in modules can appear in /proc/kallsyms in random order.
ffffffffa02608a0 t bpf_testmod_loop_test
ffffffffa02600c0 t __traceiter_bpf_testmod_test_writable_bare
ffffffffa0263b60 d __tracepoint_bpf_testmod_test_write_bare
ffffffffa02608c0 T bpf_testmod_test_read
ffffffffa0260d08 t __SCT__tp_func_bpf_testmod_test_writable_bare
ffffffffa0263300 d __SCK__tp_func_bpf_testmod_test_read
ffffffffa0260680 T bpf_testmod_test_write
ffffffffa0260860 t bpf_testmod_test_mod_kfunc
Therefore, we cannot reliably use kallsyms_find_next() to find the end of
a function. Replace it with a simple guess (start + 128). This is good
enough for this test.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211022234814.318457-1-songliubraving@fb.com
Skipping the second half of the test is not enough to silence the warning
in dmesg. Skip the whole test until we can either properly silence the
warning in the kernel, or fix LBR snapshot for VM.
Fixes: 025bd7c753 ("selftests/bpf: Add test for bpf_get_branch_snapshot")
Fixes: aa67fdb464 ("selftests/bpf: Skip the second half of get_branch_snapshot in vm")
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211026000733.477714-1-songliubraving@fb.com
Ilya Leoshkevich says:
====================
v2: https://lore.kernel.org/bpf/20211025131214.731972-1-iii@linux.ibm.com/
v2 -> v3: Split the fix from the cleanup (Daniel).
v1: https://lore.kernel.org/bpf/20211021234653.643302-1-iii@linux.ibm.com/
v1 -> v2: Drop bpf_core_calc_field_relo() restructuring, split the
__BYTE_ORDER__ change (Andrii).
Hi,
this series fixes test failures in core_reloc on s390.
Patch 1 fixes an endianness bug with __BYTE_ORDER vs __BYTE_ORDER__.
Patches 2-5 make the rest of the code consistent in that respect.
Patch 6 fixes an endianness issue in test_core_reloc_mods.
Best regards,
Ilya
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Use the compiler-defined __BYTE_ORDER__ instead of the libc-defined
__BYTE_ORDER for consistency.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211026010831.748682-6-iii@linux.ibm.com
Use the compiler-defined __BYTE_ORDER__ instead of the libc-defined
__BYTE_ORDER for consistency.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211026010831.748682-5-iii@linux.ibm.com
Use the compiler-defined __BYTE_ORDER__ instead of the libc-defined
__BYTE_ORDER for consistency.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211026010831.748682-4-iii@linux.ibm.com
Use the compiler-defined __BYTE_ORDER__ instead of the libc-defined
__BYTE_ORDER for consistency.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211026010831.748682-3-iii@linux.ibm.com
__BYTE_ORDER is supposed to be defined by a libc, and __BYTE_ORDER__ -
by a compiler. bpf_core_read.h checks __BYTE_ORDER == __LITTLE_ENDIAN,
which is true if neither are defined, leading to incorrect behavior on
big-endian hosts if libc headers are not included, which is often the
case.
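The failure mode follows from a preprocessor rule: identifiers that are not defined evaluate to 0 inside #if, so comparing two undefined macros yields 0 == 0. The sketch below uses hypothetical stand-in macro names (SKETCH_BYTE_ORDER, SKETCH_LITTLE_ENDIAN) that are deliberately left undefined, to mimic the situation without libc headers. It contrasts them with __BYTE_ORDER__, which GCC and Clang predefine.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-ins for the libc macros when no libc header defines them.
 * Both are undefined on purpose, so both evaluate to 0 in #if.
 */
#if SKETCH_BYTE_ORDER == SKETCH_LITTLE_ENDIAN
#define NAIVE_SAYS_LITTLE 1	/* taken on every host, even big-endian */
#else
#define NAIVE_SAYS_LITTLE 0
#endif

/* Compiler-defined macros do not depend on any header being included. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define COMPILER_SAYS_LITTLE 1
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define COMPILER_SAYS_LITTLE 0
#else
#error "this sketch assumes a compiler that predefines __BYTE_ORDER__"
#endif

/* Runtime probe: the first byte of a 16-bit 1 is 1 only on little-endian. */
int runtime_is_little(void)
{
	const uint16_t probe = 1;

	return *(const uint8_t *)&probe == 1;
}

void byte_order_demo(void)
{
	assert(NAIVE_SAYS_LITTLE == 1);			   /* regardless of host */
	assert(COMPILER_SAYS_LITTLE == runtime_is_little());  /* matches reality */
}
```

On a big-endian host the first check still passes, which is exactly the bug: the naive test claims little-endian while the runtime probe says otherwise.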
Fixes: ee26dade0e ("libbpf: Add support for relocatable bitfields")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211026010831.748682-2-iii@linux.ibm.com
Andrii Nakryiko says:
====================
Add libbpf APIs to access BPF program instructions, both before and after
libbpf processing (before and after bpf_object__load()). This allows
inspecting what's going on with BPF program assembly instructions as libbpf
performs its processing magic.
But in more practical terms, this allows straightforward BPF program
cloning, which is something you need when working with fentry/fexit BPF
programs to be able to attach the same BPF program code to multiple kernel
functions. Currently, the kernel needs multiple copies of BPF programs, each
loaded with its own target BTF ID. retsnoop is one such example that
previously had to rely on the bpf_program__set_prep() API to hijack program
instructions ([0] for before and after).
Speaking of bpf_program__set_prep() API and the whole concept of
multiple-instance BPF programs in libbpf, all that is scheduled for
deprecation in v0.7. It doesn't work well, it's cumbersome, and it will become
more broken as libbpf adds more functionality. So deprecate and remove it in
libbpf 1.0. It doesn't seem to be used by anyone anyways (except for that
retsnoop hack, which is now much cleaner with new APIs as can be seen in [0]).
[0] https://github.com/anakryiko/retsnoop/pull/1
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The name of the API doesn't convey clearly that this size is in number
of bytes (there needed to be a separate comment to make this clear in
libbpf.h). Further, measuring the size of BPF program in bytes is not
exactly the best fit, because BPF programs always consist of 8-byte
instructions. As such, bpf_program__insn_cnt() is a better alternative
in pretty much any imaginable case.
So schedule bpf_program__size() deprecation starting from v0.7 and it
will be removed in libbpf 1.0.
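Because every BPF instruction is 8 bytes, a byte size and an instruction count carry the same information, and the count is the more natural unit. The sketch below shows the conversion using a local stand-in struct with an 8-byte layout; struct insn_sketch and insn_cnt_from_size() are hypothetical names, not libbpf or UAPI definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in with an 8-byte layout; not the UAPI struct bpf_insn. */
struct insn_sketch {
	uint8_t code;	/* opcode */
	uint8_t regs;	/* destination and source register, 4 bits each */
	int16_t off;	/* signed offset */
	int32_t imm;	/* signed immediate */
};

/* Converts a program size in bytes into a number of instructions. */
size_t insn_cnt_from_size(size_t size_bytes)
{
	/* A valid program size is always a whole number of instructions. */
	assert(size_bytes % sizeof(struct insn_sketch) == 0);
	return size_bytes / sizeof(struct insn_sketch);
}
```

For example, a 16-byte program is 2 instructions. Code that previously divided the byte size by 8 can use the instruction count directly.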
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211025224531.1088894-5-andrii@kernel.org