libbpf 1.0 is not going to support passing ifindex to BPF
prog/map/helper feature probing APIs. Remove the support for BPF offload
feature probing.
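For reference, a minimal sketch of probing with the ifindex-free APIs that
libbpf 1.0 keeps (per libbpf's convention these return 1 if supported, 0 if
not, and a negative error code on failure):

  #include <stdio.h>
  #include <bpf/libbpf.h>

  int main(void)
  {
          /* no ifindex parameter anywhere; opts must currently be NULL */
          int ret = libbpf_probe_bpf_prog_type(BPF_PROG_TYPE_XDP, NULL);

          printf("XDP prog type: %s\n",
                 ret == 1 ? "supported" :
                 ret == 0 ? "not supported" : "probe error");
          return 0;
  }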
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220202225916.3313522-3-andrii@kernel.org
Open-code bpf_map__is_offload_neutral() logic in one place in
to-be-deprecated bpf_prog_load_xattr2.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220202225916.3313522-2-andrii@kernel.org
Delyan Kratunov says:
====================
Fairly straightforward mechanical transformation from bpf_prog_test_run
and bpf_prog_test_run_xattr to the bpf_prog_test_run_opts goodness.
I did a fair amount of drive-by CHECK/CHECK_ATTR cleanups as well, though
certainly not everything possible. Primarily, I did not want to just change
arguments to CHECK calls, though I had to do a bit more than that
in some cases (overall, -119 CHECK calls and all CHECK_ATTR calls).
v2 -> v3:
Don't introduce CHECK_OPTS, replace CHECK/CHECK_ATTR usages we need to touch
with ASSERT_* calls instead.
Don't be prescriptive about the opts var name and keep old names where that would
minimize unnecessary code churn.
Drop _xattr-specific checks in prog_run_xattr and rename accordingly.
v1 -> v2:
Split selftests/bpf changes into two commits to appease the mailing list.
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
bpf_prog_test_run_xattr is being deprecated in favor of the OPTS-based
bpf_prog_test_run_opts.
We end up unable to use CHECK_ATTR, so replace usages with ASSERT_* calls.
Also, prog_run_xattr is now prog_run_opts.
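For illustration, a sketch of the migrated pattern, assuming a prog_fd and
the selftests' pkt_v4 packet template are at hand:

  LIBBPF_OPTS(bpf_test_run_opts, topts,
          .data_in = &pkt_v4,
          .data_size_in = sizeof(pkt_v4),
          .repeat = 1,
  );
  int err = bpf_prog_test_run_opts(prog_fd, &topts);

  ASSERT_OK(err, "test_run_opts err");
  ASSERT_EQ(topts.retval, 0, "test_run_opts retval");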
Signed-off-by: Delyan Kratunov <delyank@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220202235423.1097270-3-delyank@fb.com
bpf_prog_test_run is being deprecated in favor of the OPTS-based
bpf_prog_test_run_opts.
We end up unable to use CHECK in most cases, so replace usages with
ASSERT_* calls.
Signed-off-by: Delyan Kratunov <delyank@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220202235423.1097270-2-delyank@fb.com
Nathan Chancellor says:
====================
This series allows CONFIG_DEBUG_INFO_DWARF5 to be selected with
CONFIG_DEBUG_INFO_BTF=y by checking the pahole version.
The first four patches add CONFIG_PAHOLE_VERSION and
scripts/pahole-version.sh to clean up all the places that pahole's
version is transformed into a 3-digit form.
The fifth patch adds a PAHOLE_VERSION dependency to DEBUG_INFO_DWARF5
so that there are no build errors when it is selected with
DEBUG_INFO_BTF.
I build tested Fedora's aarch64 and x86_64 config with ToT clang 14.0.0
and GCC 11 with CONFIG_DEBUG_INFO_DWARF5 enabled with both pahole 1.21
and 1.23.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Now that CONFIG_PAHOLE_VERSION exists, use it in the definition of
CONFIG_PAHOLE_HAS_SPLIT_BTF and CONFIG_PAHOLE_HAS_BTF_TAG to reduce the
amount of duplication across the tree.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220201205624.652313-5-nathan@kernel.org
Use pahole-version.sh to get pahole's version code to reduce the amount
of duplication across the tree.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220201205624.652313-4-nathan@kernel.org
There are a few different places where pahole's version is turned into a
three-digit form with the exact same command. Move this command into
scripts/pahole-version.sh to reduce the amount of duplication across the
tree.
Create CONFIG_PAHOLE_VERSION so the version code can be used in Kconfig
to enable and disable configuration options based on the pahole version,
which is already done in a couple of places.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220201205624.652313-3-nathan@kernel.org
Currently, scripts/pahole-flags.sh has no formal maintainer. Add it to
the BPF section so that patches to it can be properly reviewed and
picked up.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220201205624.652313-2-nathan@kernel.org
Alexei Starovoitov says:
====================
CO-RE in the kernel support allows bpf preload to switch to light
skeleton and remove libbpf dependency.
This reduces the size of bpf_preload_umd from 300kbyte to 19kbyte
and eventually will make "kernel skeleton" possible.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Drop libbpf, libelf, libz dependency from bpf preload.
This reduces bpf_preload_umd binary size
from 1.7M to 30k unstripped with debug info
and from 300k to 19k stripped.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220131220528.98088-8-alexei.starovoitov@gmail.com
Open code obj_get_info_by_fd in bpf preload.
It's the last part of libbpf that preload/iterators were using.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220131220528.98088-7-alexei.starovoitov@gmail.com
BPF programs and maps are memcg accounted. setrlimit is obsolete.
Remove its use from bpf preload.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220131220528.98088-5-alexei.starovoitov@gmail.com
Open code raw_tracepoint_open and link_create used by light skeleton
to be able to avoid full libbpf eventually.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220131220528.98088-4-alexei.starovoitov@gmail.com
Open code low level bpf commands used by light skeleton to
be able to avoid full libbpf eventually.
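The open-coded wrappers boil down to issuing the bpf() syscall directly;
a sketch of the shape (names illustrative):

  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  static inline int skel_sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
                                 unsigned int size)
  {
          /* talk to the kernel directly, no libbpf involved */
          return syscall(__NR_bpf, cmd, attr, size);
  }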
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220131220528.98088-3-alexei.starovoitov@gmail.com
bpf iterator programs should use bpf_link_create to attach instead of
bpf_raw_tracepoint_open like other tracing programs.
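A sketch of attaching an iterator via BPF_LINK_CREATE with raw syscall
plumbing, reusing the skel_sys_bpf() sketch above (names illustrative):

  #include <string.h>
  #include <linux/bpf.h>

  static int attach_iter(int prog_fd)
  {
          union bpf_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.link_create.prog_fd = prog_fd;
          attr.link_create.attach_type = BPF_TRACE_ITER;

          /* returns a link fd on success */
          return skel_sys_bpf(BPF_LINK_CREATE, &attr, sizeof(attr));
  }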
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220131220528.98088-2-alexei.starovoitov@gmail.com
Deprecate xdp_cpumap, xdp_devmap and classifier sec definitions.
Introduce xdp/devmap and xdp/cpumap definitions according to the
standard for SEC("") in libbpf:
- prog_type.prog_flags/attach_place
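A minimal sketch of programs using the new section names, assuming the
usual libbpf headers:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("xdp/devmap")
  int xdp_devmap_prog(struct xdp_md *ctx)
  {
          return XDP_PASS;
  }

  SEC("xdp/cpumap")
  int xdp_cpumap_prog(struct xdp_md *ctx)
  {
          return XDP_PASS;
  }

  char _license[] SEC("license") = "GPL";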
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/5c7bd9426b3ce6a31d9a4b1f97eb299e1467fc52.1643727185.git.lorenzo@kernel.org
btf_ext__{func,line}_info_rec_size functions are used in conjunction
with already-deprecated btf_ext__reloc_{func,line}_info functions. Since
struct btf_ext is opaque to the user it was necessary to expose rec_size
getters in the past.
btf_ext__reloc_{func,line}_info were deprecated in commit 8505e8709b
("libbpf: Implement generalized .BTF.ext func/line info adjustment")
as they're not compatible with support for multiple programs per
section. It was decided[0] that users of these APIs should implement their
own .BTF.ext parsing to access this data, in which case the rec_size
getters are unnecessary. So deprecate them from libbpf 0.7.0 onwards.
[0] Closes: https://github.com/libbpf/libbpf/issues/277
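For users rolling their own parsing, a rough sketch of recovering the
func_info record size straight from raw .BTF.ext data, assuming the
documented layout where each info sub-section starts with a u32 record
size (struct layout simplified and illustrative):

  #include <linux/types.h>

  struct btf_ext_hdr {
          __u16 magic;
          __u8  version;
          __u8  flags;
          __u32 hdr_len;
          __u32 func_info_off;    /* relative to the end of the header */
          __u32 func_info_len;
          __u32 line_info_off;
          __u32 line_info_len;
  };

  static __u32 func_info_rec_size(const void *btf_ext_data)
  {
          const struct btf_ext_hdr *hdr = btf_ext_data;
          const char *base = (const char *)btf_ext_data + hdr->hdr_len;

          /* the first u32 of the func_info sub-section is rec_size */
          return *(const __u32 *)(base + hdr->func_info_off);
  }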
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220201014610.3522985-1-davemarchevsky@fb.com
Jakub Sitnicki says:
====================
This is a follow-up to the discussion around the idea of making dst_port
in struct bpf_sock a 16-bit field that happened in [1].
v2:
- use an anonymous field for zero padding (Alexei)
v1:
- keep dst_port field offset unchanged to prevent existing BPF program breakage
(Martin)
- allow 8-bit loads from dst_port[0] and [1]
- add test coverage for the verifier and the context access converter
[1] https://lore.kernel.org/bpf/87sftbobys.fsf@cloudflare.com/
====================
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add coverage to the verifier tests and tests for reading bpf_sock fields to
ensure that 32-bit, 16-bit, and 8-bit loads from the dst_port field are
allowed only at intended offsets and produce expected values.
While 16-bit and 8-bit access to the dst_port field is straightforward,
32-bit wide loads need to be allowed and produce a zero-padded 16-bit value
for backward compatibility.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20220130115518.213259-3-jakub@cloudflare.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Menglong Dong reports that the documentation for the dst_port field in
struct bpf_sock is inaccurate and confusing. From the BPF program PoV, the
field is a zero-padded 16-bit integer in network byte order. The value
appears to the BPF user as if laid out in memory as follows:
  offsetof(struct bpf_sock, dst_port) + 0  <port MSB>
                                      + 8  <port LSB>
                                      +16  0x00
                                      +24  0x00
32-, 16-, and 8-bit wide loads from the field are all allowed, but only if
the offset into the field is 0.
32-bit wide loads from dst_port are especially confusing. The loaded value,
after converting to host byte order with bpf_ntohl(dst_port), contains the
port number in the upper 16 bits.
Remove the confusion by splitting the field into two 16-bit fields. For
backward compatibility, allow 32-bit wide loads from offsetof(struct
bpf_sock, dst_port).
While at it, also allow 8-bit loads at offsets [0] and [1] from dst_port.
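The resulting uapi layout and access patterns, roughly:

  struct bpf_sock {
          ...
          __be16 dst_port;        /* network byte order */
          __u16 :16;              /* zero padding */
          ...
  };

  /* inside a BPF program holding a struct bpf_sock *sk: */
  __u16 port = sk->dst_port;               /* 16-bit load           */
  __u8  msb = ((__u8 *)&sk->dst_port)[0];  /* 8-bit loads at [0]/[1] */
  /* a legacy 32-bit load at the same offset still verifies and
   * yields the port in the upper 16 bits after bpf_ntohl()
   */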
Reported-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20220130115518.213259-2-jakub@cloudflare.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Hangbin Liu says:
====================
Some bpf tests use hard-coded netns names like ns0, ns1, etc. Such names
are easily taken by other tests or by the system itself, and if a netns
with the same name already exists, all the related tests will fail. So
let's use temporary netns names for testing.
The first patch not only switches to a temporary netns, but also fixes an
interface index issue, so it carries a Fixes tag. The later patches are
updates rather than fixes, so no Fixes tag is added for them.
====================
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Use a temporary netns instead of a hard-coded name for testing, in case
the netns already exists.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Link: https://lore.kernel.org/r/20220125081717.1260849-8-liuhangbin@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Use a temporary netns instead of a hard-coded name for testing, in case
the netns already exists.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/r/20220125081717.1260849-6-liuhangbin@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Use a temporary netns instead of a hard-coded name for testing, in case
the netns already exists.
Remove the hard-coded interface index when creating the veth interfaces,
because when the system loads some virtual interface modules, e.g. tunnels,
ifindex 2 may already be taken and the command will fail.
Since the netns is not created if the environment check fails, trap the
cleanup function only after the environment check passes.
Fixes: 8955c1a329 ("selftests/bpf/xdp_redirect_multi: Limit the tests in netns")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Link: https://lore.kernel.org/r/20220125081717.1260849-2-liuhangbin@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
According to the LLVM commit (https://reviews.llvm.org/D72184),
__sync_fetch_and_sub() is implemented as a negation followed by
__sync_fetch_and_add(), so there will be no BPF_SUB op; thus just
remove it. BPF_SUB is also rejected by the verifier anyway.
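In other words, for the BPF target clang treats the two builtins as
equivalent; roughly:

  /* what the program says */
  long dec(long *p)
  {
          return __sync_fetch_and_sub(p, 1);
  }

  /* what clang effectively emits it as */
  long dec_lowered(long *p)
  {
          return __sync_fetch_and_add(p, -1);
  }

so no BPF_SUB atomic opcode is ever generated.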
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Brendan Jackman <jackmanb@google.com>
Link: https://lore.kernel.org/bpf/20220127083240.1425481-1-houtao1@huawei.com
Yonghong Song says:
====================
The __user attribute is currently mainly used by sparse for type checking.
The attribute indicates whether a memory access is in user memory address
space or not. Such information is important during tracing kernel
internal functions or data structures as accessing user memory often
has different mechanisms compared to accessing kernel memory. For example,
the perf-probe needs explicit command line specification to indicate a
particular argument or string in user-space memory ([1], [2], [3]).
Currently, vmlinux BTF is available in kernel with many distributions.
If __user attribute information is available in vmlinux BTF, the explicit
user memory access information from users will not be necessary as
the kernel can figure it out by itself with vmlinux BTF.
Besides the above possible use for perf/probe, another use case is
for bpf verifier. Currently, for bpf BPF_PROG_TYPE_TRACING type of bpf
programs, users can write direct code like
p->m1->m2
and "p" could be a function parameter. Without __user information in BTF,
the verifier will assume p->m1 accessing kernel memory and will generate
normal loads. Let us say "p" actually tagged with __user in the source
code. In such cases, p->m1 is actually accessing user memory and direct
load is not right and may produce incorrect result. For such cases,
bpf_probe_read_user() will be the correct way to read p->m1.
To support encoding __user information in BTF, a new attribute
__attribute__((btf_type_tag("<arbitrary_string>")))
is implemented in clang ([4]). For example, if we have
#define __user __attribute__((btf_type_tag("user")))
during kernel compilation, the attribute "user" information will
be preserved in dwarf. After pahole converting dwarf to BTF, __user
information will be available in vmlinux BTF and such information
can be used by bpf verifier, perf/probe or other use cases.
Currently btf_type_tag is only supported in clang (>= clang14) and
pahole (>= 1.23). gcc support is also proposed and under development ([5]).
In the rest of the patch set, Patch 1 added support for the __user
btf_type_tag during compilation. Patch 2 added bpf verifier support to
utilize __user tag information to reject bpf programs not using the proper
helper to access user memory. Patches 3-5 are bpf selftests which
demonstrate that the verifier can reject direct user memory accesses.
[1] http://lkml.kernel.org/r/155789874562.26965.10836126971405890891.stgit@devnote2
[2] http://lkml.kernel.org/r/155789872187.26965.4468456816590888687.stgit@devnote2
[3] http://lkml.kernel.org/r/155789871009.26965.14167558859557329331.stgit@devnote2
[4] https://reviews.llvm.org/D111199
[5] https://lore.kernel.org/bpf/0cbeb2fb-1a18-f690-e360-24b1c90c2a91@fb.com/
Changelog:
v2 -> v3:
- remove FLAG_DONTCARE enumerator and just use 0 as dontcare flag.
- explain how btf type_tag is encoded in btf type chain.
v1 -> v2:
- use MEM_USER flag for PTR_TO_BTF_ID reg type instead of a separate
field to encode __user tag.
- add a test with kernel function __sys_getsockname which has __user tagged
argument.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Clarify where the BTF_KIND_TYPE_TAG gets encoded in the type chain,
so applications and the kernel can properly parse them.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220127154627.665163-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The uapi btf.h contains the following declaration:
struct btf_decl_tag {
__s32 component_idx;
};
The skeleton will also generate a struct with name
"btf_decl_tag" for bpf program btf_decl_tag.c.
Rename btf_decl_tag.c to test_btf_decl_tag.c so
the corresponding skeleton struct name becomes
"test_btf_decl_tag". This way, we could include
uapi btf.h in prog_tests/btf_tag.c.
There is no functionality change for this patch.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220127154611.656699-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The BPF verifier supports direct memory access for BPF_PROG_TYPE_TRACING
type of bpf programs, e.g., a->b. If "a" is a pointer
pointing to kernel memory, the bpf verifier will allow the user to write
code in C like a->b and the verifier will translate it to a kernel
load properly. If "a" is a pointer to user memory, it is expected
that the bpf developer should use the bpf_probe_read_user() helper to
get the value a->b. Without utilizing BTF __user tagging information,
the current verifier will assume that a->b is a kernel memory access
and this may generate an incorrect result.
Now that BTF contains __user information, the verifier can check whether
the pointer points to user memory or not. If it does, the verifier
can reject the program and force users to use the bpf_probe_read_user()
helper explicitly.
In the future, we can easily extend btf_add_space for other
address space tagging, for example, rcu/percpu etc.
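A sketch of the pattern the verifier now enforces, assuming a tracing
program attached to __sys_getsockname (whose sockaddr argument is
__user-tagged, per the selftest added in this series):

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  SEC("fentry/__sys_getsockname")
  int BPF_PROG(probe, int fd, struct sockaddr *usockaddr,
               int *usockaddr_len)
  {
          struct sockaddr sa;

          /* direct dereference of the __user-tagged pointer, e.g.
           *   sa_family_t f = usockaddr->sa_family;
           * is now rejected; copy it out explicitly instead:
           */
          bpf_probe_read_user(&sa, sizeof(sa), usockaddr);
          return 0;
  }

  char _license[] SEC("license") = "GPL";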
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220127154606.654961-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The __user attribute is currently mainly used by sparse for type checking.
The attribute indicates whether a memory access is in user memory address
space or not. Such information is important during tracing kernel
internal functions or data structures as accessing user memory often
has different mechanisms compared to accessing kernel memory. For example,
the perf-probe needs explicit command line specification to indicate a
particular argument or string in user-space memory ([1], [2], [3]).
Currently, vmlinux BTF is available in kernel with many distributions.
If __user attribute information is available in vmlinux BTF, the explicit
user memory access information from users will not be necessary as
the kernel can figure it out by itself with vmlinux BTF.
Besides the above possible use for perf/probe, another use case is
for bpf verifier. Currently, for bpf BPF_PROG_TYPE_TRACING type of bpf
programs, users can write direct code like
p->m1->m2
and "p" could be a function parameter. Without __user information in BTF,
the verifier will assume p->m1 accessing kernel memory and will generate
normal loads. Let us say "p" actually tagged with __user in the source
code. In such cases, p->m1 is actually accessing user memory and direct
load is not right and may produce incorrect result. For such cases,
bpf_probe_read_user() will be the correct way to read p->m1.
To support encoding __user information in BTF, a new attribute
__attribute__((btf_type_tag("<arbitrary_string>")))
is implemented in clang ([4]). For example, if we have
#define __user __attribute__((btf_type_tag("user")))
during kernel compilation, the attribute "user" information will
be preserved in dwarf. After pahole converting dwarf to BTF, __user
information will be available in vmlinux BTF.
The following is an example with latest upstream clang (clang14) and
pahole 1.23:
[$ ~] cat test.c
#define __user __attribute__((btf_type_tag("user")))
int foo(int __user *arg) {
return *arg;
}
[$ ~] clang -O2 -g -c test.c
[$ ~] pahole -JV test.o
...
[1] INT int size=4 nr_bits=32 encoding=SIGNED
[2] TYPE_TAG user type_id=1
[3] PTR (anon) type_id=2
[4] FUNC_PROTO (anon) return=1 args=(3 arg)
[5] FUNC foo type_id=4
[$ ~]
You can see for the function argument "int __user *arg", its type is
described as
PTR -> TYPE_TAG(user) -> INT
The kernel can use this information for bpf verification or other
use cases.
Currently, btf_type_tag is only supported in clang (>= clang14) and
pahole (>= 1.23). gcc support is also proposed and under development ([5]).
[1] http://lkml.kernel.org/r/155789874562.26965.10836126971405890891.stgit@devnote2
[2] http://lkml.kernel.org/r/155789872187.26965.4468456816590888687.stgit@devnote2
[3] http://lkml.kernel.org/r/155789871009.26965.14167558859557329331.stgit@devnote2
[4] https://reviews.llvm.org/D111199
[5] https://lore.kernel.org/bpf/0cbeb2fb-1a18-f690-e360-24b1c90c2a91@fb.com/
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220127154600.652613-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Even though there is a static key protecting against cgroup-bpf skb
filtering overhead when nothing is attached, in many cases it's not
enough, as registering a filter for one type will ruin the fast path
for all others. This is observed in production servers I've looked
at but also in laptops, where registration is done during init by
systemd or something else.
Add a per-socket fast path check guarding against such overhead. This
affects both receive and transmit paths of TCP, UDP and other
protocols. It showed a ~1% tx/s improvement in small payload UDP
send benchmarks using a real NIC in a server environment, and the
number jumps to 2-3% for preemptible kernels.
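A sketch of the shape of such a per-socket guard (kernel-side, names
illustrative): bail out early when the cgroup's effective prog array for
the attach type is empty.

  static inline bool
  cgroup_bpf_sock_enabled(struct sock *sk, enum cgroup_bpf_attach_type type)
  {
          struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
          struct bpf_prog_array *array;

          /* empty effective array == nothing to run for this socket */
          array = rcu_access_pointer(cgrp->bpf.effective[type]);
          return array != &bpf_empty_prog_array.hdr;
  }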
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/d8c58857113185a764927a46f4b5a058d36d3ec3.1643292455.git.asml.silence@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When building selftests/bpf with clang
make -j LLVM=1
make -C tools/testing/selftests/bpf -j LLVM=1
I hit the following compilation error:
trace_helpers.c:152:9: error: variable 'found' is used uninitialized whenever 'while' loop exits because its condition is false [-Werror,-Wsometimes-uninitialized]
while (fscanf(f, "%zx-%zx %s %zx %*[^\n]\n", &start, &end, buf, &base) == 4) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
trace_helpers.c:161:7: note: uninitialized use occurs here
if (!found)
^~~~~
trace_helpers.c:152:9: note: remove the condition if it is always true
while (fscanf(f, "%zx-%zx %s %zx %*[^\n]\n", &start, &end, buf, &base) == 4) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
trace_helpers.c:145:12: note: initialize the variable 'found' to silence this warning
bool found;
^
= false
It is possible that for a sane /proc/self/maps we may never hit the above
issue in practice. But let us initialize the variable 'found' properly to
silence the compilation error.
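The fix is a one-liner:

  bool found = false;     /* was uninitialized */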
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220127163726.1442032-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
After commit 710ad98c36 ("veth: Do not record rx queue hint in veth_xmit"),
veth no longer receives traffic on the same queue as it was sent on. This
breaks the bpf_res test for the AF_XDP selftests as the socket tied to
queue 1 will not receive traffic anymore.
Modify the test so that two sockets are tied to queue id 0 using a shared
umem instead. When killing the first socket enter the second socket into
the xskmap so that traffic will flow to it. This will still test that the
resources are not cleaned up until after the second socket dies, without
having to rely on veth supporting rx_queue hints.
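A sketch of the shared-umem setup, assuming libbpf's xsk API and an
xskmap fd at hand (names illustrative):

  /* two sockets share one umem, both tied to queue 0 */
  xsk_socket__create_shared(&xsk1, ifname, 0, umem,
                            &rx1, &tx1, &fq1, &cq1, &cfg);
  xsk_socket__create_shared(&xsk2, ifname, 0, umem,
                            &rx2, &tx2, &fq2, &cq2, &cfg);

  /* after killing the first socket, steer queue 0 to the second */
  int fd = xsk_socket__fd(xsk2);
  __u32 key = 0;
  bpf_map_update_elem(xskmap_fd, &key, &fd, BPF_ANY);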
Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220125082945.26179-1-magnus.karlsson@gmail.com
Maciej Fijalkowski says:
====================
Unfortunately, scalability issues similar to those that were addressed for
XDP processing in ice also exist for XDP in the zero-copy driver used by
AF_XDP. Let's resolve them in mostly the same way as we did in [0] and
utilize the Tx batching API from the XSK buffer pool.
Move the array of Tx descriptors that is used with the batching approach
to the XSK buffer pool. This means that future users of this API will not
have to carry the array on their own side; they can simply refer to the
pool's tx_desc array.
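With that in place, a driver's zero-copy Tx path can look roughly like
this (a sketch; signature per the batching API this set extends, names
illustrative):

  static void my_xmit_zc_batch(struct xsk_buff_pool *pool, u32 budget)
  {
          u32 i, nb_pkts;

          /* descriptors land in the pool-resident tx_descs array */
          nb_pkts = xsk_tx_peek_release_desc_batch(pool, budget);
          for (i = 0; i < nb_pkts; i++) {
                  struct xdp_desc *desc = &pool->tx_descs[i];

                  /* DMA-map desc->addr / desc->len into a HW Tx descriptor */
          }
  }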
We also improve the Rx side, where we extend ice_alloc_rx_buf_zc() to
handle the ring wrap and bump the Rx tail more frequently. By doing so,
the Rx side keeps pace with Tx, which was needed for the l2fwd scenario.
Here are the performance improvements this set brings, measured with the
xdpsock app in busy poll mode for 1 and 2 core modes. Both Tx and Rx rings
were sized to 1k length and the busy poll budget was 256.
----------------------------------------------------------------
     |  txonly  |  l2fwd  |  rxdrop
----------------------------------------------------------------
  1C |   149%   |   14%   |    3%
----------------------------------------------------------------
  2C |   134%   |   20%   |    5%
----------------------------------------------------------------
The next step will be to introduce batching on the Rx side.
v5:
* collect acks
* fix typos
* correct comments showing cache line boundaries in ice_tx_ring struct
v4 - address Alexandr's review:
* new patch (2) for making sure ring size is pow(2) when attaching
xsk socket
* don't open code ALIGN_DOWN (patch 3)
* resign from storing tx_thresh in ice_tx_ring (patch 4)
* scope variables in a better way for Tx batching (patch 7)
v3:
* drop likely() that was wrapping napi_complete_done (patch 1)
* introduce configurable Tx threshold (patch 2)
* handle ring wrap on Rx side when allocating buffers (patch 3)
* respect NAPI budget when cleaning Tx descriptors in ZC (patch 6)
v2:
* introduce new patch that resets @next_dd and @next_rs fields
* use batching API for AF_XDP Tx on ice side
[0]: https://lore.kernel.org/bpf/20211015162908.145341-8-anthony.l.nguyen@intel.com/
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>