Stephen Rothwell reported htmldocs warning when merging bpf-next tree,
which was the same warning as reported by kernel test robot:
Documentation/bpf/graph_ds_impl.rst:62: ERROR: Error in "code-block" directive:
maximum 1 argument(s) allowed, 12 supplied.
The error is due to Sphinx confuses node_data struct declaration with
code-block directive option.
Fix the warning by separating the code-block marker with node_data struct
declaration.
Link: https://lore.kernel.org/linux-next/20230215144505.4751d823@canb.auug.org.au/
Link: https://lore.kernel.org/linux-doc/202302151123.wUE5FYFx-lkp@intel.com/
Fixes: c31315c3aa ("bpf, documentation: Add graph documentation for non-owning refs")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20230215123253.41552-3-bagasdotme@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
It is difficult to intuit the semantics of owning and non-owning
references from verifier code. In order to keep the high-level details
from being lost in the mailing list, this patch adds documentation
explaining semantics and details.
The target audience of doc added in this patch is folks working on BPF
internals, as there's focus on "what should the verifier do here". Via
reorganization or copy-and-paste, much of the content can probably be
repurposed for BPF program writer audience as well.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230214004017.2534011-9-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add explanation about use of "u64", "u32", etc. as
the type convention used in BPF documentation.
Signed-off-by: Dave Thaler <dthaler@microsoft.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230127014706.1005-1-dthaler1968@googlemail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cong pointed out that there are some inconsistencies between the BPF design
QA and the lifecycle expectations documentation we added for kfuncs. Let's
update the QA file to be consistent with the kfunc docs, and add references
where it makes sense. Also document that modules may export kfuncs now.
v3:
- Grammar nit + ack from David
v2:
- Fix repeated word (s/defined defined/defined/)
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20230208164143.286392-1-toke@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Use consistent names for the same field, e.g., 'dst' vs 'dst_reg'.
Previously a mix of terms were used for the same thing in various cases.
Signed-off-by: Dave Thaler <dthaler@microsoft.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230127224555.916-1-dthaler1968@googlemail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
BPF kernel <-> kernel API stability has been discussed at length over
the last several weeks and months. Now that we've largely aligned over
kfuncs being the way forward, and BPF helpers being considered
functionally frozen, it's time to document the expectations for kfunc
lifecycles and stability so that everyone (BPF users, kfunc developers,
and maintainers) are all aligned, and have a crystal-clear understanding
of the expectations surrounding kfuncs.
To do that, this patch adds that documentation to the main kfuncs
documentation page via a new 'kfunc lifecycle expectations' section. The
patch describes how decisions are made in the kernel regarding whether
to include, keep, deprecate, or change / remove a kfunc. As described
very overtly in the patch itself, but likely worth highlighting here:
"kfunc stability" does not mean, nor ever will mean, "BPF APIs may block
development elsewhere in the kernel".
Rather, the intention and expectation is for kfuncs to be treated like
EXPORT_SYMBOL_GPL symbols in the kernel. The goal is for kfuncs to be a
safe and valuable option for maintainers and kfunc developers to extend
the kernel, without tying anyone's hands, or imposing any kind of
restrictions on maintainers in the same way that UAPI changes do.
In addition to the 'kfunc lifecycle expectations' section, this patch
also adds documentation for a new KF_DEPRECATED kfunc flag which kfunc
authors or maintainers can choose to add to kfuncs if and when they
decide to deprecate them. Note that as described in the patch itself, a
kfunc need not be deprecated before being changed or removed -- this
flag is simply provided as an available deprecation mechanism for those
that want to provide a deprecation story / timeline to their users.
When necessary, kfuncs may be changed or removed to accommodate changes
elsewhere in the kernel without any deprecation at all.
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Co-developed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230203155727.793518-2-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Now that the __bpf_kfunc macro has been added to linux/btf.h, include a
blurb about it in the kfuncs.rst file. In order for the macro to
successfully render with .. kernel-doc, we'll also need to add it to the
c_id_attributes array.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230201173016.342758-3-void@manifault.com
When comparing BTF IDs for pointers being passed to kfunc arguments, the
verifier will allow pointer types that are equivalent according to the C
standard. For example, for:
struct bpf_cpumask {
cpumask_t cpumask;
refcount_t usage;
};
The verifier will allow a struct bpf_cpumask * to be passed to a kfunc
that takes a const struct cpumask * (cpumask_t is a typedef of struct
cpumask). The exception to this rule is if a type is suffixed with
___init, such as:
struct nf_conn___init {
struct nf_conn ct;
};
The verifier will _not_ allow a struct nf_conn___init * to be passed to
a kfunc that expects a struct nf_conn *. This patch documents this
behavior in the kfuncs documentation page.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125143816.721952-8-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
A prior change defined a new BTF_TYPE_SAFE_NESTED macro in the verifier
which allows developers to specify when a pointee field in a struct type
should inherit its parent pointer's trusted status. This patch updates
the kfuncs documentation to specify this macro and how it can be used.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125143816.721952-7-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Now that we've added a series of new cpumask kfuncs, we should document
them so users can easily use them. This patch adds a new cpumasks.rst
file to document them.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125143816.721952-6-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fix modulo zero, division by zero, overflow, and underflow. Also clarify how
a negative immediate value is used in unsigned division.
Signed-off-by: Dave Thaler <dthaler@microsoft.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230124001218.827-1-dthaler1968@googlemail.com
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY7X/4wAKCRDbK58LschI
g7gzAQCjKsLtAWg1OplW+B7pvEPwkQ8g3O1+PYWlToCUACTlzQD+PEMrqGnxB573
oQAk6I2yOTwLgvlHkrm+TIdKSouI4gs=
=2hUY
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
bpf-next 2023-01-04
We've added 45 non-merge commits during the last 21 day(s) which contain
a total of 50 files changed, 1454 insertions(+), 375 deletions(-).
The main changes are:
1) Fixes, improvements and refactoring of parts of BPF verifier's
state equivalence checks, from Andrii Nakryiko.
2) Fix a few corner cases in libbpf's BTF-to-C converter in particular
around padding handling and enums, also from Andrii Nakryiko.
3) Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better
support decap on GRE tunnel devices not operating in collect metadata,
from Christian Ehrig.
4) Improve x86 JIT's codegen for PROBE_MEM runtime error checks,
from Dave Marchevsky.
5) Remove the need for trace_printk_lock for bpf_trace_printk
and bpf_trace_vprintk helpers, from Jiri Olsa.
6) Add proper documentation for BPF_MAP_TYPE_SOCK{MAP,HASH} maps,
from Maryam Tahhan.
7) Improvements in libbpf's btf_parse_elf error handling, from Changbin Du.
8) Bigger batch of improvements to BPF tracing code samples,
from Daniel T. Lee.
9) Add LoongArch support to libbpf's bpf_tracing helper header,
from Hengqi Chen.
10) Fix a libbpf compiler warning in perf_event_open_probe on arm32,
from Khem Raj.
11) Optimize bpf_local_storage_elem by removing 56 bytes of padding,
from Martin KaFai Lau.
12) Use pkg-config to locate libelf for resolve_btfids build,
from Shen Jiamin.
13) Various libbpf improvements around API documentation and errno
handling, from Xin Liu.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits)
libbpf: Return -ENODATA for missing btf section
libbpf: Add LoongArch support to bpf_tracing.h
libbpf: Restore errno after pr_warn.
libbpf: Added the description of some API functions
libbpf: Fix invalid return address register in s390
samples/bpf: Use BPF_KSYSCALL macro in syscall tracing programs
samples/bpf: Fix tracex2 by using BPF_KSYSCALL macro
samples/bpf: Change _kern suffix to .bpf with syscall tracing program
samples/bpf: Use vmlinux.h instead of implicit headers in syscall tracing program
samples/bpf: Use kyscall instead of kprobe in syscall tracing program
bpf: rename list_head -> graph_root in field info types
libbpf: fix errno is overwritten after being closed.
bpf: fix regs_exact() logic in regsafe() to remap IDs correctly
bpf: perform byte-by-byte comparison only when necessary in regsafe()
bpf: reject non-exact register type matches in regsafe()
bpf: generalize MAYBE_NULL vs non-MAYBE_NULL rule
bpf: reorganize struct bpf_reg_state fields
bpf: teach refsafe() to take into account ID remapping
bpf: Remove unused field initialization in bpf's ctl_table
selftests/bpf: Add jit probe_mem corner case tests to s390x denylist
...
====================
Link: https://lore.kernel.org/r/20230105000926.31350-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Improve the grammar of the function descriptions and highlight
that the key is a socket fd.
Fixes: f3212ad5b7 ("docs/bpf: Add documentation for BPF_MAP_TYPE_SK_STORAGE")
Reported-by: Martin KaFai Lau <martin.lau@linux.dev>
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20221212101600.56026-1-donald.hunter@gmail.com
Add documentation for the BPF_MAP_TYPE_SK_STORAGE including
kernel version introduced, usage and examples.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221209112401.69319-1-donald.hunter@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf_cgroup_acquire(), bpf_cgroup_release(), bpf_cgroup_kptr_get(), and
bpf_cgroup_ancestor(), are kfuncs that were recently added to
kernel/bpf/helpers.c. These are "core" kfuncs in that they're available
for use in any tracepoint or struct_ops BPF program. Though they have no
ABI stability guarantees, we should still document them. This patch adds
a struct cgroup * subsection to the Core kfuncs section which describes
each of these kfuncs.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221207204911.873646-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf_task_acquire(), bpf_task_release(), and bpf_task_from_pid() are
kfuncs that were recently added to kernel/bpf/helpers.c. These are
"core" kfuncs in that they're available for use for any tracepoint or
struct_ops BPF program. Though they have no ABI stability guarantees, we
should still document them. This patch adds a new Core kfuncs section to
the BPF kfuncs doc, and adds entries for all of these task kfuncs.
Note that bpf_task_kptr_get() is not documented, as it still returns
NULL while we're working to resolve how it can use RCU to ensure struct
task_struct * lifetime.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221207204911.873646-2-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Document that describes how BPF iterators work, how to use iterators,
and how to pass parameters in BPF iterators.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Sreevani Sreejith <psreep@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221202221710.320810-2-ssreevani@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Refer to description of BPF_XOR, dst_reg should be used but not src_reg
in the examples.
Fixes: be3193cded ("bpf, docs: Add subsections for ALU and JMP instructions")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221129134558.2757043-1-zhengyejian1@huawei.com
Add documentation for BPF_MAP_TYPE_XSKMAP including kernel version introduced,
usage and examples.
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221123090043.83945-1-mtahhan@redhat.com
Commit c64779e24e88("selftests/bpf: Merge most of test_btf into test_progs")
renamed the BTF selftest from 'test_btf.c' to 'prog_tests/btf.c'.
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/tencent_1FA6904156E8E599CAE4ABDBE80F22830106@qq.com
Sphinx version >=3.1 warns about duplicate function declarations in the
DEVMAP documentation. This is because the function name is the same for
kernel and user space BPF progs but the parameters and return types
they take is what differs. This patch moves from using the ``c:function::``
directive to using the ``code-block:: c`` directive. The patches also fix
the indentation for the text associated with the "new" code block delcarations.
The missing support of c:namespace-push:: and c:namespace-pop:: directives by
helper scripts for kernel documentation prevents using the ``c:function::``
directive with proper namespacing.
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221123092321.88558-3-mtahhan@redhat.com
Sphinx version >=3.1 warns about duplicate function declarations in the
CPUMAP documentation. This is because the function name is the same for
kernel and user space BPF progs but the parameters and return types
they take is what differs. This patch moves from using the ``c:function::``
directive to using the ``code-block:: c`` directive. The patches also fix
the indentation for the text associated with the "new" code block delcarations.
The missing support of c:namespace-push:: and c:namespace-pop:: directives by
helper scripts for kernel documentation prevents using the ``c:function::``
directive with proper namespacing.
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221123092321.88558-2-mtahhan@redhat.com
Extend the libbpf documentation with a table of program types,
attach points and ELF section names.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20221121121734.98329-1-donald.hunter@gmail.com
Kfuncs currently support specifying the KF_TRUSTED_ARGS flag to signal
to the verifier that it should enforce that a BPF program passes it a
"safe", trusted pointer. Currently, "safe" means that the pointer is
either PTR_TO_CTX, or is refcounted. There may be cases, however, where
the kernel passes a BPF program a safe / trusted pointer to an object
that the BPF program wishes to use as a kptr, but because the object
does not yet have a ref_obj_id from the perspective of the verifier, the
program would be unable to pass it to a KF_ACQUIRE | KF_TRUSTED_ARGS
kfunc.
The solution is to expand the set of pointers that are considered
trusted according to KF_TRUSTED_ARGS, so that programs can invoke kfuncs
with these pointers without getting rejected by the verifier.
There is already a PTR_UNTRUSTED flag that is set in some scenarios,
such as when a BPF program reads a kptr directly from a map
without performing a bpf_kptr_xchg() call. These pointers of course can
and should be rejected by the verifier. Unfortunately, however,
PTR_UNTRUSTED does not cover all the cases for safety that need to
be addressed to adequately protect kfuncs. Specifically, pointers
obtained by a BPF program "walking" a struct are _not_ considered
PTR_UNTRUSTED according to BPF. For example, say that we were to add a
kfunc called bpf_task_acquire(), with KF_ACQUIRE | KF_TRUSTED_ARGS, to
acquire a struct task_struct *. If we only used PTR_UNTRUSTED to signal
that a task was unsafe to pass to a kfunc, the verifier would mistakenly
allow the following unsafe BPF program to be loaded:
SEC("tp_btf/task_newtask")
int BPF_PROG(unsafe_acquire_task,
struct task_struct *task,
u64 clone_flags)
{
struct task_struct *acquired, *nested;
nested = task->last_wakee;
/* Would not be rejected by the verifier. */
acquired = bpf_task_acquire(nested);
if (!acquired)
return 0;
bpf_task_release(acquired);
return 0;
}
To address this, this patch defines a new type flag called PTR_TRUSTED
which tracks whether a PTR_TO_BTF_ID pointer is safe to pass to a
KF_TRUSTED_ARGS kfunc or a BPF helper function. PTR_TRUSTED pointers are
passed directly from the kernel as a tracepoint or struct_ops callback
argument. Any nested pointer that is obtained from walking a PTR_TRUSTED
pointer is no longer PTR_TRUSTED. From the example above, the struct
task_struct *task argument is PTR_TRUSTED, but the 'nested' pointer
obtained from 'task->last_wakee' is not PTR_TRUSTED.
A subsequent patch will add kfuncs for storing a task kfunc as a kptr,
and then another patch will add selftests to validate.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221120051004.3605026-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add documentation for BPF_MAP_TYPE_DEVMAP and BPF_MAP_TYPE_DEVMAP_HASH
including kernel version introduced, usage and examples.
Add documentation that describes XDP_REDIRECT.
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221115144921.165483-1-mtahhan@redhat.com
Allow passing known constant scalars as arguments to kfuncs that do not
represent a size parameter. We use mark_chain_precision for the constant
scalar argument to mark it precise. This makes the search pruning
optimization of verifier more conservative for such kfunc calls, and
each non-distinct argument is considered unequivalent.
We will use this support to then expose a bpf_obj_new function where it
takes the local type ID of a type in program BTF, and returns a
PTR_TO_BTF_ID | MEM_ALLOC to the local type, and allows programs to
allocate their own objects.
Each type ID resolves to a distinct type with a possibly distinct size,
hence the type ID constant matters in terms of program safety and its
precision needs to be checked between old and cur states inside regsafe.
The use of mark_chain_precision enables this.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118015614.2013203-13-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit 26a9b433cf ("bpf/docs: Document how to run CI without patch
submission") caused a warning to be generated when compiling the
documentation:
> bpf_devel_QA.rst:55: WARNING: Unexpected indentation.
> bpf_devel_QA.rst:56: WARNING: Block quote ends without a blank line
This change fixes the problem by inserting the required blank lines.
Fixes: 26a9b433cf ("bpf/docs: Document how to run CI without patch submission")
Reported-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Akira Yokosawa <akiyks@gmail.com>
Link: https://lore.kernel.org/bpf/20221116174358.2744613-1-deso@posteo.net
This change documents the process for running the BPF CI before
submitting a patch to the upstream mailing list, similar to what happens
if a patch is send to bpf@vger.kernel.org: it builds kernel and
selftests and runs the latter on different architecture (but it notably
does not cover stylistic checks such as cover letter verification).
Running BPF CI this way can help achieve better test coverage ahead of
patch submission than merely running locally (say, using
tools/testing/selftests/bpf/vmtest.sh), as additional architectures may
be covered as well.
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221114211501.2068684-1-deso@posteo.net
We don't want to commit to a specific name for these. Simply call them
allocated objects coming from bpf_obj_new, which is completely clear in
itself.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fixup bpf_map_update_elem() declaration to use a single line.
Reported-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Akira Yokosawa <akiyks@gmail.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221113103327.3287482-1-mtahhan@redhat.com
Add documentation for the BPF_MAP_TYPE_ARRAY including kernel version
introduced, usage and examples. Also document BPF_MAP_TYPE_PERCPU_ARRAY
which is similar.
Co-developed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Maryam Tahhan <mtahhan@redhat.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/bpf/20221109174604.31673-2-donald.hunter@gmail.com
Add documentation for BPF_MAP_TYPE_QUEUE and BPF_MAP_TYPE_STACK,
including usage and examples.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221108093314.44851-1-donald.hunter@gmail.com
Add documentation for the ARRAY_OF_MAPS and HASH_OF_MAPS map types,
including usage and examples.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221108102215.47297-1-donald.hunter@gmail.com
Add documentation for BPF_MAP_TYPE_CPUMAP including
kernel version introduced, usage and examples.
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221107165207.2682075-2-mtahhan@redhat.com
Sphinx reported unknown target warning:
Documentation/bpf/bpf_design_QA.rst:329: WARNING: Unknown target name: "bpf".
The warning is caused by BPF type name prefix ("bpf_") which is written
without escaping the trailing underscore.
Escape the underscore to fix the warning. While at it, wrap the
containing paragraph in less than 80 characters.
Fixes: 9805af8d8a ("bpf: Document UAPI details for special BPF types")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20221104123913.50610-1-bagasdotme@gmail.com
The kernel recognizes some special BPF types in map values or local
kptrs. Document that only bpf_spin_lock and bpf_timer will preserve
backwards compatibility, and kptr will preserve backwards compatibility
for the operations on the pointer, not the types supported for such
kptrs.
For local kptrs, document that there are no stability guarantees at all.
Finally, document that 'bpf_' namespace is reserved for adding future
special fields, hence BPF programs must not declare types with such
names in their programs and still expect backwards compatibility.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221103191013.1236066-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add some descriptions and examples for BPF_MAP_TYPE_CGRP_STORAGE.
Also illustate the major difference between BPF_MAP_TYPE_CGRP_STORAGE
and BPF_MAP_TYPE_CGROUP_STORAGE and recommend to use
BPF_MAP_TYPE_CGRP_STORAGE instead of BPF_MAP_TYPE_CGROUP_STORAGE
in the end.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221026042922.676383-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a more complete introduction, with links to man pages.
Move toctree of map types above usage notes.
Format usage notes to improve readability.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20221012152715.25073-1-donald.hunter@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Core
----
- Introduce and use a single page frag cache for allocating small skb
heads, clawing back the 10-20% performance regression in UDP flood
test from previous fixes.
- Run packets which already went thru HW coalescing thru SW GRO.
This significantly improves TCP segment coalescing and simplifies
deployments as different workloads benefit from HW or SW GRO.
- Shrink the size of the base zero-copy send structure.
- Move TCP init under a new slow / sleepable version of DO_ONCE().
BPF
---
- Add BPF-specific, any-context-safe memory allocator.
- Add helpers/kfuncs for PKCS#7 signature verification from BPF
programs.
- Define a new map type and related helpers for user space -> kernel
communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF).
- Allow targeting BPF iterators to loop through resources of one
task/thread.
- Add ability to call selected destructive functions.
Expose crash_kexec() to allow BPF to trigger a kernel dump.
Use CAP_SYS_BOOT check on the loading process to judge permissions.
- Enable BPF to collect custom hierarchical cgroup stats efficiently
by integrating with the rstat framework.
- Support struct arguments for trampoline based programs.
Only structs with size <= 16B and x86 are supported.
- Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping
sockets (instead of just TCP and UDP sockets).
- Add a helper for accessing CLOCK_TAI for time sensitive network
related programs.
- Support accessing network tunnel metadata's flags.
- Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open.
- Add support for writing to Netfilter's nf_conn:mark.
Protocols
---------
- WiFi: more Extremely High Throughput (EHT) and Multi-Link
Operation (MLO) work (802.11be, WiFi 7).
- vsock: improve support for SO_RCVLOWAT.
- SMC: support SO_REUSEPORT.
- Netlink: define and document how to use netlink in a "modern" way.
Support reporting missing attributes via extended ACK.
- IPSec: support collect metadata mode for xfrm interfaces.
- TCPv6: send consistent autoflowlabel in SYN_RECV state
and RST packets.
- TCP: introduce optional per-netns connection hash table to allow
better isolation between namespaces (opt-in, at the cost of memory
and cache pressure).
- MPTCP: support TCP_FASTOPEN_CONNECT.
- Add NEXT-C-SID support in Segment Routing (SRv6) End behavior.
- Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets.
- Open vSwitch:
- Allow specifying ifindex of new interfaces.
- Allow conntrack and metering in non-initial user namespace.
- TLS: support the Korean ARIA-GCM crypto algorithm.
- Remove DECnet support.
Driver API
----------
- Allow selecting the conduit interface used by each port
in DSA switches, at runtime.
- Ethernet Power Sourcing Equipment and Power Device support.
- Add tc-taprio support for queueMaxSDU parameter, i.e. setting
per traffic class max frame size for time-based packet schedules.
- Support PHY rate matching - adapting between differing host-side
and link-side speeds.
- Introduce QUSGMII PHY mode and 1000BASE-KX interface mode.
- Validate OF (device tree) nodes for DSA shared ports; make
phylink-related properties mandatory on DSA and CPU ports.
Enforcing more uniformity should allow transitioning to phylink.
- Require that flash component name used during update matches one
of the components for which version is reported by info_get().
- Remove "weight" argument from driver-facing NAPI API as much
as possible. It's one of those magic knobs which seemed like
a good idea at the time but is too indirect to use in practice.
- Support offload of TLS connections with 256 bit keys.
New hardware / drivers
----------------------
- Ethernet:
- Microchip KSZ9896 6-port Gigabit Ethernet Switch
- Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs
- Analog Devices ADIN1110 and ADIN2111 industrial single pair
Ethernet (10BASE-T1L) MAC+PHY.
- Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP).
- Ethernet SFPs / modules:
- RollBall / Hilink / Turris 10G copper SFPs
- HALNy GPON module
- WiFi:
- CYW43439 SDIO chipset (brcmfmac)
- CYW89459 PCIe chipset (brcmfmac)
- BCM4378 on Apple platforms (brcmfmac)
Drivers
-------
- CAN:
- gs_usb: HW timestamp support
- Ethernet PHYs:
- lan8814: cable diagnostics
- Ethernet NICs:
- Intel (100G):
- implement control of FCS/CRC stripping
- port splitting via devlink
- L2TPv3 filtering offload
- nVidia/Mellanox:
- tunnel offload for sub-functions
- MACSec offload, w/ Extended packet number and replay
window offload
- significantly restructure, and optimize the AF_XDP support,
align the behavior with other vendors
- Huawei:
- configuring DSCP map for traffic class selection
- querying standard FEC statistics
- querying SerDes lane number via ethtool
- Marvell/Cavium:
- egress priority flow control
- MACSec offload
- AMD/SolarFlare:
- PTP over IPv6 and raw Ethernet
- small / embedded:
- ax88772: convert to phylink (to support SFP cages)
- altera: tse: convert to phylink
- ftgmac100: support fixed link
- enetc: standard Ethtool counters
- macb: ZynqMP SGMII dynamic configuration support
- tsnep: support multi-queue and use page pool
- lan743x: Rx IP & TCP checksum offload
- igc: add xdp frags support to ndo_xdp_xmit
- Ethernet high-speed switches:
- Marvell (prestera):
- support SPAN port features (traffic mirroring)
- nexthop object offloading
- Microchip (sparx5):
- multicast forwarding offload
- QoS queuing offload (tc-mqprio, tc-tbf, tc-ets)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- support RGMII cmode
- NXP (felix):
- standardized ethtool counters
- Microchip (lan966x):
- QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets)
- traffic policing and mirroring
- link aggregation / bonding offload
- QUSGMII PHY mode support
- Qualcomm 802.11ax WiFi (ath11k):
- cold boot calibration support on WCN6750
- support to connect to a non-transmit MBSSID AP profile
- enable remain-on-channel support on WCN6750
- Wake-on-WLAN support for WCN6750
- support to provide transmit power from firmware via nl80211
- support to get power save duration for each client
- spectral scan support for 160 MHz
- MediaTek WiFi (mt76):
- WiFi-to-Ethernet bridging offload for MT7986 chips
- RealTek WiFi (rtw89):
- P2P support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmM7vtkACgkQMUZtbf5S
Irvotg//dmh53rC+UMKO3OgOqPlSMnaqzbUdDEfN6mj4Mpox7Csb8zERVURHhBHY
fvlXWsDgxmvgTebI5fvNC5+f1iW5xcqgJV2TWnNmDOKWwvQwb6qQfgixVmunvkpe
IIukMXYt0dAf9bXeeEfbNXcCb85cPwB76stX0tMV6BX7osp3T0TL1fvFk0NJkL0j
TeydLad/yAQtPb4TbeWYjNDoxPVDf0cVpUrevLGmWE88UMYmgTqPze+h1W5Wri52
bzjdLklY/4cgcIZClHQ6F9CeRWqEBxvujA5Hj/cwOcn/ptVVJWUGi7sQo3sYkoSs
HFu+F8XsTec14kGNC0Ab40eVdqs5l/w8+E+4jvgXeKGOtVns8DwoiUIzqXpyty89
Ib04mffrwWNjFtHvo/kIsNwP05X2PGE9HUHfwsTUfisl/ASvMmQp7D7vUoqQC/4B
AMVzT5qpjkmfBHYQQGuw8FxJhMeAOjC6aAo6censhXJyiUhIfleQsN0syHdaNb8q
9RZlhAgQoVb6ZgvBV8r8unQh/WtNZ3AopwifwVJld2unsE/UNfQy2KyqOWBES/zf
LP9sfuX0JnmHn8s1BQEUMPU1jF9ZVZCft7nufJDL6JhlAL+bwZeEN4yCiAHOPZqE
ymSLHI9s8yWZoNpuMWKrI9kFexVnQFKmA3+quAJUcYHNMSsLkL8=
=Gsio
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- Introduce and use a single page frag cache for allocating small skb
heads, clawing back the 10-20% performance regression in UDP flood
test from previous fixes.
- Run packets which already went thru HW coalescing thru SW GRO. This
significantly improves TCP segment coalescing and simplifies
deployments as different workloads benefit from HW or SW GRO.
- Shrink the size of the base zero-copy send structure.
- Move TCP init under a new slow / sleepable version of DO_ONCE().
BPF:
- Add BPF-specific, any-context-safe memory allocator.
- Add helpers/kfuncs for PKCS#7 signature verification from BPF
programs.
- Define a new map type and related helpers for user space -> kernel
communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF).
- Allow targeting BPF iterators to loop through resources of one
task/thread.
- Add ability to call selected destructive functions. Expose
crash_kexec() to allow BPF to trigger a kernel dump. Use
CAP_SYS_BOOT check on the loading process to judge permissions.
- Enable BPF to collect custom hierarchical cgroup stats efficiently
by integrating with the rstat framework.
- Support struct arguments for trampoline based programs. Only
structs with size <= 16B and x86 are supported.
- Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping
sockets (instead of just TCP and UDP sockets).
- Add a helper for accessing CLOCK_TAI for time sensitive network
related programs.
- Support accessing network tunnel metadata's flags.
- Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open.
- Add support for writing to Netfilter's nf_conn:mark.
Protocols:
- WiFi: more Extremely High Throughput (EHT) and Multi-Link Operation
(MLO) work (802.11be, WiFi 7).
- vsock: improve support for SO_RCVLOWAT.
- SMC: support SO_REUSEPORT.
- Netlink: define and document how to use netlink in a "modern" way.
Support reporting missing attributes via extended ACK.
- IPSec: support collect metadata mode for xfrm interfaces.
- TCPv6: send consistent autoflowlabel in SYN_RECV state and RST
packets.
- TCP: introduce optional per-netns connection hash table to allow
better isolation between namespaces (opt-in, at the cost of memory
and cache pressure).
- MPTCP: support TCP_FASTOPEN_CONNECT.
- Add NEXT-C-SID support in Segment Routing (SRv6) End behavior.
- Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets.
- Open vSwitch:
- Allow specifying ifindex of new interfaces.
- Allow conntrack and metering in non-initial user namespace.
- TLS: support the Korean ARIA-GCM crypto algorithm.
- Remove DECnet support.
Driver API:
- Allow selecting the conduit interface used by each port in DSA
switches, at runtime.
- Ethernet Power Sourcing Equipment and Power Device support.
- Add tc-taprio support for queueMaxSDU parameter, i.e. setting per
traffic class max frame size for time-based packet schedules.
- Support PHY rate matching - adapting between differing host-side
and link-side speeds.
- Introduce QUSGMII PHY mode and 1000BASE-KX interface mode.
- Validate OF (device tree) nodes for DSA shared ports; make
phylink-related properties mandatory on DSA and CPU ports.
Enforcing more uniformity should allow transitioning to phylink.
- Require that flash component name used during update matches one of
the components for which version is reported by info_get().
- Remove "weight" argument from driver-facing NAPI API as much as
possible. It's one of those magic knobs which seemed like a good
idea at the time but is too indirect to use in practice.
- Support offload of TLS connections with 256 bit keys.
New hardware / drivers:
- Ethernet:
- Microchip KSZ9896 6-port Gigabit Ethernet Switch
- Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs
- Analog Devices ADIN1110 and ADIN2111 industrial single pair
Ethernet (10BASE-T1L) MAC+PHY.
- Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP).
- Ethernet SFPs / modules:
- RollBall / Hilink / Turris 10G copper SFPs
- HALNy GPON module
- WiFi:
- CYW43439 SDIO chipset (brcmfmac)
- CYW89459 PCIe chipset (brcmfmac)
- BCM4378 on Apple platforms (brcmfmac)
Drivers:
- CAN:
- gs_usb: HW timestamp support
- Ethernet PHYs:
- lan8814: cable diagnostics
- Ethernet NICs:
- Intel (100G):
- implement control of FCS/CRC stripping
- port splitting via devlink
- L2TPv3 filtering offload
- nVidia/Mellanox:
- tunnel offload for sub-functions
- MACSec offload, w/ Extended packet number and replay window
offload
- significantly restructure, and optimize the AF_XDP support,
align the behavior with other vendors
- Huawei:
- configuring DSCP map for traffic class selection
- querying standard FEC statistics
- querying SerDes lane number via ethtool
- Marvell/Cavium:
- egress priority flow control
- MACSec offload
- AMD/SolarFlare:
- PTP over IPv6 and raw Ethernet
- small / embedded:
- ax88772: convert to phylink (to support SFP cages)
- altera: tse: convert to phylink
- ftgmac100: support fixed link
- enetc: standard Ethtool counters
- macb: ZynqMP SGMII dynamic configuration support
- tsnep: support multi-queue and use page pool
- lan743x: Rx IP & TCP checksum offload
- igc: add xdp frags support to ndo_xdp_xmit
- Ethernet high-speed switches:
- Marvell (prestera):
- support SPAN port features (traffic mirroring)
- nexthop object offloading
- Microchip (sparx5):
- multicast forwarding offload
- QoS queuing offload (tc-mqprio, tc-tbf, tc-ets)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- support RGMII cmode
- NXP (felix):
- standardized ethtool counters
- Microchip (lan966x):
- QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets)
- traffic policing and mirroring
- link aggregation / bonding offload
- QUSGMII PHY mode support
- Qualcomm 802.11ax WiFi (ath11k):
- cold boot calibration support on WCN6750
- support to connect to a non-transmit MBSSID AP profile
- enable remain-on-channel support on WCN6750
- Wake-on-WLAN support for WCN6750
- support to provide transmit power from firmware via nl80211
- support to get power save duration for each client
- spectral scan support for 160 MHz
- MediaTek WiFi (mt76):
- WiFi-to-Ethernet bridging offload for MT7986 chips
- RealTek WiFi (rtw89):
- P2P support"
* tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1864 commits)
eth: pse: add missing static inlines
once: rename _SLOW to _SLEEPABLE
net: pse-pd: add regulator based PSE driver
dt-bindings: net: pse-dt: add bindings for regulator based PoDL PSE controller
ethtool: add interface to interact with Ethernet Power Equipment
net: mdiobus: search for PSE nodes by parsing PHY nodes.
net: mdiobus: fwnode_mdiobus_register_phy() rework error handling
net: add framework to support Ethernet PSE and PDs devices
dt-bindings: net: phy: add PoDL PSE property
net: marvell: prestera: Propagate nh state from hw to kernel
net: marvell: prestera: Add neighbour cache accounting
net: marvell: prestera: add stub handler neighbour events
net: marvell: prestera: Add heplers to interact with fib_notifier_info
net: marvell: prestera: Add length macros for prestera_ip_addr
net: marvell: prestera: add delayed wq and flush wq on deinit
net: marvell: prestera: Add strict cleanup of fib arbiter
net: marvell: prestera: Add cleanup of allocated fib_nodes
net: marvell: prestera: Add router nexthops ABI
eth: octeon: fix build after netif_napi_add() changes
net/mlx5: E-Switch, Return EBUSY if can't get mode lock
...
Sphinx reported warnings on missing implementation notes documentations in the
table of contents:
Documentation/bpf/clang-notes.rst: WARNING: document isn't included in any toctree
Documentation/bpf/linux-notes.rst: WARNING: document isn't included in any toctree
Add these documentations to the table of contents (index.rst) of BPF
documentation to fix the warnings.
Link: https://lore.kernel.org/linux-doc/202210020749.yfgDZbRL-lkp@intel.com/
Fixes: 6c7aaffb24 ("bpf, docs: Move Clang notes to a separate file")
Fixes: 6166da0a02 ("bpf, docs: Move legacy packet instructions to a separate file")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20221002032022.24693-1-bagasdotme@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>