mirror of
https://github.com/torvalds/linux.git
synced 2024-11-26 22:21:42 +00:00
16c294a6aa
BPF kernel <-> kernel API stability has been discussed at length over the last several weeks and months. Now that we've largely aligned over kfuncs being the way forward, and BPF helpers being considered functionally frozen, it's time to document the expectations for kfunc lifecycles and stability so that everyone (BPF users, kfunc developers, and maintainers) are all aligned, and have a crystal-clear understanding of the expectations surrounding kfuncs. To do that, this patch adds that documentation to the main kfuncs documentation page via a new 'kfunc lifecycle expectations' section. The patch describes how decisions are made in the kernel regarding whether to include, keep, deprecate, or change / remove a kfunc. As described very overtly in the patch itself, but likely worth highlighting here: "kfunc stability" does not mean, nor ever will mean, "BPF APIs may block development elsewhere in the kernel". Rather, the intention and expectation is for kfuncs to be treated like EXPORT_SYMBOL_GPL symbols in the kernel. The goal is for kfuncs to be a safe and valuable option for maintainers and kfunc developers to extend the kernel, without tying anyone's hands, or imposing any kind of restrictions on maintainers in the same way that UAPI changes do. In addition to the 'kfunc lifecycle expectations' section, this patch also adds documentation for a new KF_DEPRECATED kfunc flag which kfunc authors or maintainers can choose to add to kfuncs if and when they decide to deprecate them. Note that as described in the patch itself, a kfunc need not be deprecated before being changed or removed -- this flag is simply provided as an available deprecation mechanism for those that want to provide a deprecation story / timeline to their users. When necessary, kfuncs may be changed or removed to accommodate changes elsewhere in the kernel without any deprecation at all. 
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Co-developed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230203155727.793518-2-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
626 lines
24 KiB
ReStructuredText
.. SPDX-License-Identifier: GPL-2.0

.. _kfuncs-header-label:

=============================
BPF Kernel Functions (kfuncs)
=============================

1. Introduction
===============

BPF Kernel Functions, more commonly known as kfuncs, are functions in the Linux
kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
kfuncs do not have a stable interface and can change from one kernel release to
another. Hence, BPF programs need to be updated in response to changes in the
kernel. See :ref:`BPF_kfunc_lifecycle_expectations` for more information.

2. Defining a kfunc
===================

There are two ways to expose a kernel function to BPF programs: either make an
existing function in the kernel visible, or add a new wrapper for BPF. In both
cases, care must be taken that a BPF program can only call such a function in a
valid context. To enforce this, the visibility of a kfunc can be set per program
type.

If you are not creating a BPF wrapper for an existing kernel function, skip
ahead to :ref:`BPF_kfunc_nodef`.

2.1 Creating a wrapper kfunc
----------------------------

When defining a wrapper kfunc, the wrapper function should have extern linkage.
This prevents the compiler from optimizing away dead code, as this wrapper kfunc
is not invoked anywhere in the kernel itself. It is not necessary to provide a
prototype in a header for the wrapper kfunc.

An example is given below::

	/* Disables missing prototype warnings */
	__diag_push();
	__diag_ignore_all("-Wmissing-prototypes",
			  "Global kfuncs as their definitions will be in BTF");

	__bpf_kfunc struct task_struct *bpf_find_get_task_by_vpid(pid_t nr)
	{
		return find_get_task_by_vpid(nr);
	}

	__diag_pop();

A wrapper kfunc is often needed when we need to annotate parameters of the
kfunc. Otherwise, one may directly make the kfunc visible to the BPF program by
registering it with the BPF subsystem. See :ref:`BPF_kfunc_nodef`.

2.2 Annotating kfunc parameters
-------------------------------

As with BPF helpers, the verifier sometimes needs additional context to make the
usage of kernel functions safer and more useful. Hence, we can annotate a
parameter by suffixing the name of the kfunc's argument with a __tag, where tag
may be one of the supported annotations.

2.2.1 __sz Annotation
---------------------

This annotation is used to indicate a memory and size pair in the argument list.
An example is given below::

	__bpf_kfunc void bpf_memzero(void *mem, int mem__sz)
	{
		...
	}

Here, the verifier will treat the first argument as a PTR_TO_MEM, and the second
argument as its size. By default, without a __sz annotation, the size of the
pointer's type is used. Without a __sz annotation, a kfunc cannot accept a void
pointer.
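
To make the memory/size pairing concrete, here is a small userspace C model of
the bounds reasoning involved. It is a hypothetical sketch, not verifier code:
``mem_sz_pair_ok()`` and ``model_memzero()`` are made-up names, a single buffer
of ``region_size`` bytes stands in for memory owned by the BPF program, and
``size_t`` replaces the ``int`` size used by bpf_memzero() above.

.. code-block:: c

	#include <assert.h>
	#include <stdbool.h>
	#include <stddef.h>
	#include <string.h>

	/*
	 * Hypothetical model: the program owns a buffer of region_size bytes.
	 * A (mem, mem__sz) pair starting off bytes into it is only accepted
	 * if [off, off + sz) stays inside the buffer.
	 */
	static bool mem_sz_pair_ok(size_t region_size, size_t off, size_t sz)
	{
		/* Written this way so that off + sz can never overflow. */
		return off <= region_size && sz <= region_size - off;
	}

	/* Model kfunc body: because the size travels with it, a void * is usable. */
	static void model_memzero(void *mem, size_t mem__sz)
	{
		memset(mem, 0, mem__sz);
	}

The overflow-safe comparison is the important design choice: computing
``off + sz`` directly could wrap around and accept an out-of-bounds pair.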

2.2.2 __k Annotation
--------------------

This annotation is only understood for scalar arguments, where it indicates that
the verifier must check that the scalar argument is a known constant which does
not indicate a size parameter, and whose value is relevant to the safety of the
program.

An example is given below::

	__bpf_kfunc void *bpf_obj_new(u32 local_type_id__k, ...)
	{
		...
	}

Here, bpf_obj_new uses the local_type_id argument to find out the size of that
type ID in the program's BTF and return a sized pointer to it. Each type ID will
have a distinct size, hence it is crucial to treat each such call as distinct
when values don't match during verifier state pruning checks.

Hence, whenever a kfunc accepts a constant scalar argument which is not a size
parameter, and the value of the constant matters for program safety, the __k
suffix should be used.
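
The pruning hazard described above can be modelled in a few lines of userspace
C. This is a hypothetical sketch: the type-size table, ``model_obj_new()`` and
``model_states_equivalent()`` are illustrative stand-ins, not the verifier's
actual data structures.

.. code-block:: c

	#include <assert.h>
	#include <stdbool.h>
	#include <stddef.h>

	/* Hypothetical stand-in for a program's BTF: type ID -> type size. */
	static const size_t model_type_size[] = { 0, 8, 16, 64 };

	/* What a model verifier records about one bpf_obj_new()-style call. */
	struct model_call_state {
		unsigned int type_id;	/* value of the __k argument */
		size_t ret_size;	/* bytes covered by the returned pointer */
	};

	static struct model_call_state model_obj_new(unsigned int local_type_id__k)
	{
		size_t nr = sizeof(model_type_size) / sizeof(model_type_size[0]);
		struct model_call_state s;

		/* The constant must name a type that exists in the program's BTF. */
		assert(local_type_id__k > 0 && local_type_id__k < nr);
		s.type_id = local_type_id__k;
		s.ret_size = model_type_size[local_type_id__k];
		return s;
	}

	/*
	 * Two explored states may only be pruned as equivalent if the constant
	 * matches: a different type ID yields a differently sized object.
	 */
	static bool model_states_equivalent(struct model_call_state a,
					    struct model_call_state b)
	{
		return a.type_id == b.type_id && a.ret_size == b.ret_size;
	}

If the constant were treated like an ordinary unknown scalar, a state created
with type ID 1 (8 bytes) could be pruned against one created with type ID 3
(64 bytes), and the smaller object would then be accessed as if it were larger.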

.. _BPF_kfunc_nodef:

2.3 Using an existing kernel function
-------------------------------------

When an existing function in the kernel is fit for consumption by BPF programs,
it can be directly registered with the BPF subsystem. However, care must still
be taken to review the context in which it will be invoked by the BPF program
and whether it is safe to do so.

2.4 Annotating kfuncs
---------------------

In addition to kfuncs' arguments, the verifier may need more information about
the type of kfunc(s) being registered with the BPF subsystem. To do so, we
define flags on a set of kfuncs as follows::

	BTF_SET8_START(bpf_task_set)
	BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
	BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
	BTF_SET8_END(bpf_task_set)

This set encodes the BTF ID of each kfunc listed above, and encodes the flags
along with it. Of course, it is also allowed to specify no flags.
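
Conceptually, such a set is a table pairing each kfunc with its flag bits. The
userspace sketch below models that table; the ``MODEL_KF_*`` values and the
``bpf_no_flags_example`` entry are hypothetical, a real set stores BTF IDs
rather than names, and the kernel's actual flag definitions live in
``include/linux/btf.h``.

.. code-block:: c

	#include <assert.h>
	#include <stddef.h>
	#include <stdint.h>
	#include <string.h>

	/* Illustrative flag bits only; not the kernel's definitions. */
	#define MODEL_KF_ACQUIRE	(1u << 0)
	#define MODEL_KF_RELEASE	(1u << 1)
	#define MODEL_KF_RET_NULL	(1u << 2)

	/* One entry per kfunc, in the spirit of BTF_ID_FLAGS(). */
	struct model_kfunc_entry {
		const char *name;
		uint32_t flags;
	};

	/* Model of bpf_task_set; the last entry is hypothetical and has no flags. */
	static const struct model_kfunc_entry model_task_set[] = {
		{ "bpf_get_task_pid", MODEL_KF_ACQUIRE | MODEL_KF_RET_NULL },
		{ "bpf_put_pid", MODEL_KF_RELEASE },
		{ "bpf_no_flags_example", 0 },
	};

	/* Return the flags recorded for @name, or -1 if it is not in the set. */
	static int64_t model_kfunc_flags(const char *name)
	{
		size_t n = sizeof(model_task_set) / sizeof(model_task_set[0]);

		for (size_t i = 0; i < n; i++)
			if (strcmp(model_task_set[i].name, name) == 0)
				return model_task_set[i].flags;
		return -1;
	}

Because the flags are a bitmask, independent properties such as "acquires a
reference" and "may return NULL" can be combined freely on one entry.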

kfunc definitions should also always be annotated with the ``__bpf_kfunc``
macro. This prevents issues such as the compiler inlining the kfunc if it's a
static kernel function, or the function being elided in an LTO build as it's
not used in the rest of the kernel. Developers should not manually add
annotations to their kfunc to prevent these issues. If an annotation is
required to prevent such an issue with your kfunc, it is a bug and should be
added to the definition of the macro so that other kfuncs are similarly
protected. An example is given below::

	__bpf_kfunc struct task_struct *bpf_get_task_pid(s32 pid)
	{
		...
	}

2.4.1 KF_ACQUIRE flag
---------------------

The KF_ACQUIRE flag is used to indicate that the kfunc returns a pointer to a
refcounted object. The verifier will then ensure that the pointer to the object
is eventually released using a release kfunc, or transferred to a map using a
referenced kptr (by invoking bpf_kptr_xchg). If not, the verifier rejects the
BPF program: it is only loaded if no lingering references remain in any of the
program's explored states.

2.4.2 KF_RET_NULL flag
----------------------

The KF_RET_NULL flag is used to indicate that the pointer returned by the kfunc
may be NULL. Hence, it forces the user to do a NULL check on the pointer
returned from the kfunc before making use of it (dereferencing or passing to
another helper). This flag is often paired with the KF_ACQUIRE flag, but the
two are orthogonal to each other.
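
A minimal model of what KF_RET_NULL means for the caller, written as userspace
C under the assumption that a returned pointer can be tracked as one of three
states. The enum and helper names are hypothetical; the real verifier tracks
this per register.

.. code-block:: c

	#include <assert.h>
	#include <stdbool.h>

	/* Hypothetical states for a pointer returned by a KF_RET_NULL kfunc. */
	enum model_ptr_state {
		MODEL_PTR_MAYBE_NULL,	/* straight out of the kfunc */
		MODEL_PTR_NONNULL,	/* on the branch where the NULL check passed */
		MODEL_PTR_NULL,		/* on the branch where it was NULL */
	};

	/* State after "if (ptr)": @nonnull_branch selects which side we are on. */
	static enum model_ptr_state model_null_check(enum model_ptr_state s,
						     bool nonnull_branch)
	{
		if (s != MODEL_PTR_MAYBE_NULL)
			return s;	/* already known, the check changes nothing */
		return nonnull_branch ? MODEL_PTR_NONNULL : MODEL_PTR_NULL;
	}

	/* Dereferencing or passing the pointer on is only allowed once non-NULL. */
	static bool model_may_use(enum model_ptr_state s)
	{
		return s == MODEL_PTR_NONNULL;
	}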

2.4.3 KF_RELEASE flag
---------------------

The KF_RELEASE flag is used to indicate that the kfunc releases the pointer
passed in to it. Only one referenced pointer can be passed in. All copies of
the pointer being released are invalidated as a result of invoking a kfunc with
this flag.
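
The acquire/release bookkeeping described for KF_ACQUIRE and KF_RELEASE can be
sketched as a userspace model of one verification path. All names here are
hypothetical, the fixed-size array is a simplification, and transferring a
reference into a map with bpf_kptr_xchg is not modelled.

.. code-block:: c

	#include <assert.h>
	#include <stdbool.h>

	#define MODEL_MAX_REFS 8

	/* Hypothetical per-path reference state. */
	struct model_ref_state {
		int ids[MODEL_MAX_REFS];	/* acquired, not yet released */
		int nr;
		int next_id;
	};

	/* KF_ACQUIRE: the returned pointer gets a fresh reference id. */
	static int model_acquire(struct model_ref_state *s)
	{
		assert(s->nr < MODEL_MAX_REFS);
		s->ids[s->nr++] = ++s->next_id;
		return s->next_id;
	}

	/*
	 * KF_RELEASE: drop the reference. Every copy of a pointer carrying this
	 * id is invalidated, so a second release of the same id is rejected.
	 */
	static bool model_release(struct model_ref_state *s, int id)
	{
		for (int i = 0; i < s->nr; i++) {
			if (s->ids[i] == id) {
				s->ids[i] = s->ids[--s->nr];
				return true;
			}
		}
		return false;	/* unknown or already released */
	}

	/* At program exit, any lingering reference makes the load fail. */
	static bool model_exit_ok(const struct model_ref_state *s)
	{
		return s->nr == 0;
	}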

2.4.4 KF_KPTR_GET flag
----------------------

The KF_KPTR_GET flag is used to indicate that the kfunc takes the first argument
as a pointer to a kptr, safely increments the refcount of the object it points
to, and returns a reference to the user. The rest of the arguments may be normal
arguments of a kfunc. The KF_KPTR_GET flag should be used in conjunction with
the KF_ACQUIRE and KF_RET_NULL flags.
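
"Safely increments the refcount" typically means taking a reference only if the
count has not already dropped to zero. The userspace C11 sketch below shows that
pattern; ``struct model_obj``, ``model_ref_inc_not_zero()`` and
``model_kptr_get()`` are hypothetical, and the sketch assumes the object's memory
stays valid between loading the slot and bumping the count, which a real kernel
implementation has to guarantee separately (for example with RCU).

.. code-block:: c

	#include <assert.h>
	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stddef.h>

	/* Hypothetical refcounted object standing in for a kernel object. */
	struct model_obj {
		atomic_int refcnt;
	};

	/* Take a reference only if the object is still live (count > 0). */
	static bool model_ref_inc_not_zero(struct model_obj *o)
	{
		int old = atomic_load(&o->refcnt);

		while (old != 0) {
			/* On failure, old is reloaded with the current value. */
			if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
				return true;
		}
		return false;
	}

	/* Model of a KF_KPTR_GET kfunc: the argument points at the kptr slot. */
	static struct model_obj *model_kptr_get(_Atomic(struct model_obj *) *slot)
	{
		struct model_obj *o = atomic_load(slot);

		if (!o || !model_ref_inc_not_zero(o))
			return NULL;	/* empty slot, or object already being freed */
		return o;
	}

Both NULL cases are why such a kfunc is also marked KF_RET_NULL, and the
successful case hands back a new reference, which is why it is marked
KF_ACQUIRE.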

2.4.5 KF_TRUSTED_ARGS flag
--------------------------

The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
indicates that all pointer arguments are valid, and that all pointers to
BTF objects have been passed in their unmodified form (that is, at a zero
offset, and without having been obtained from walking another pointer, with one
exception described below).

There are two types of pointers to kernel objects which are considered "valid":

1. Pointers which are passed as tracepoint or struct_ops callback arguments.
2. Pointers which were returned from a KF_ACQUIRE or KF_KPTR_GET kfunc.

Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.

The definition of "valid" pointers is subject to change at any time, and has
absolutely no ABI stability guarantees.

As mentioned above, a nested pointer obtained from walking a trusted pointer is
no longer trusted, with one exception. If a struct type has a field that is
guaranteed to be valid as long as its parent pointer is trusted, the
``BTF_TYPE_SAFE_NESTED`` macro can be used to express that to the verifier as
follows:

.. code-block:: c

	BTF_TYPE_SAFE_NESTED(struct task_struct) {
		const cpumask_t *cpus_ptr;
	};

In other words, you must:

1. Wrap the trusted pointer type in the ``BTF_TYPE_SAFE_NESTED`` macro.

2. Specify the type and name of the trusted nested field. This field must match
   the field in the original type definition exactly.
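
The trust rules above can be summarised as a small userspace model. It is a
hypothetical sketch: ``struct model_ptr``, ``model_walk()`` and the
``model_safe_nested`` table (mirroring the BTF_TYPE_SAFE_NESTED example) are
illustrative names, and the non-BTF pointer case is ignored.

.. code-block:: c

	#include <assert.h>
	#include <stdbool.h>
	#include <stddef.h>
	#include <string.h>

	/* Hypothetical view of a BTF pointer as the verifier might track it. */
	struct model_ptr {
		const char *type;	/* BTF type name, e.g. "task_struct" */
		bool trusted;		/* callback arg, or KF_ACQUIRE/KF_KPTR_GET result */
		int off;		/* offset from the start of the object */
	};

	/* (parent type, field) pairs declared safe with BTF_TYPE_SAFE_NESTED(). */
	static const char *const model_safe_nested[][2] = {
		{ "task_struct", "cpus_ptr" },
	};

	/* Walking @field of @parent yields a trusted pointer only if listed above. */
	static struct model_ptr model_walk(struct model_ptr parent, const char *field,
					   const char *field_type)
	{
		struct model_ptr child = { field_type, false, 0 };
		size_t n = sizeof(model_safe_nested) / sizeof(model_safe_nested[0]);

		for (size_t i = 0; i < n; i++)
			if (parent.trusted &&
			    strcmp(model_safe_nested[i][0], parent.type) == 0 &&
			    strcmp(model_safe_nested[i][1], field) == 0)
				child.trusted = true;
		return child;
	}

	/* A KF_TRUSTED_ARGS kfunc accepts a BTF pointer only if trusted, at offset 0. */
	static bool model_trusted_arg_ok(struct model_ptr p)
	{
		return p.trusted && p.off == 0;
	}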

2.4.6 KF_SLEEPABLE flag
-----------------------

The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
be called by sleepable BPF programs (BPF_F_SLEEPABLE).

2.4.7 KF_DESTRUCTIVE flag
--------------------------

The KF_DESTRUCTIVE flag is used to indicate functions whose invocation is
destructive to the system. For example, such a call can result in the system
rebooting or panicking. Due to this, additional restrictions apply to these
calls. At the moment they only require the CAP_SYS_BOOT capability, but more
can be added later.

2.4.8 KF_RCU flag
-----------------

The KF_RCU flag is used for kfuncs which take an RCU pointer as their argument.
When used together with KF_ACQUIRE, it indicates the kfunc should have a
single argument which must be a trusted argument or a MEM_RCU pointer.
The argument may have a reference count of 0, and the kfunc must take this
into consideration.

.. _KF_deprecated_flag:

2.4.9 KF_DEPRECATED flag
------------------------

The KF_DEPRECATED flag is used for kfuncs which are scheduled to be
changed or removed in a subsequent kernel release. A kfunc that is
marked with KF_DEPRECATED should also have any relevant information
captured in its kernel doc. Such information typically includes the
kfunc's expected remaining lifespan, a recommendation for new
functionality that can replace it if any is available, and possibly a
rationale for why it is being removed.

Note that while, on some occasions, a KF_DEPRECATED kfunc may continue to be
supported and have its KF_DEPRECATED flag removed, it is likely to be far more
difficult to remove a KF_DEPRECATED flag after it's been added than it is to
prevent it from being added in the first place. As described in
:ref:`BPF_kfunc_lifecycle_expectations`, users that rely on specific kfuncs are
encouraged to make their use-cases known as early as possible, and participate
in upstream discussions regarding whether to keep, change, deprecate, or remove
those kfuncs if and when such discussions occur.

2.5 Registering the kfuncs
--------------------------

Once the kfunc is prepared for use, the final step to making it visible is
registering it with the BPF subsystem. Registration is done per BPF program
type. An example is shown below::

	BTF_SET8_START(bpf_task_set)
	BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
	BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
	BTF_SET8_END(bpf_task_set)

	static const struct btf_kfunc_id_set bpf_task_kfunc_set = {
		.owner = THIS_MODULE,
		.set   = &bpf_task_set,
	};

	static int init_subsystem(void)
	{
		return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_task_kfunc_set);
	}
	late_initcall(init_subsystem);
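
As a rough mental model of per-program-type registration, the userspace sketch
below keeps a list of registered sets for each program type and only resolves a
kfunc for programs of a type it was registered for. Everything in it is
hypothetical: ``model_register()`` only mimics the role of
register_btf_kfunc_id_set(), and the two program types stand in for
``enum bpf_prog_type`` values.

.. code-block:: c

	#include <assert.h>
	#include <stdbool.h>
	#include <string.h>

	/* Hypothetical program types. */
	enum model_prog_type { MODEL_PROG_TRACING, MODEL_PROG_XDP, MODEL_PROG_NR };

	#define MODEL_MAX_SETS 4

	/* For each program type, the NULL-terminated name lists registered for it. */
	static const char *const *model_registered[MODEL_PROG_NR][MODEL_MAX_SETS];
	static int model_nr_sets[MODEL_PROG_NR];

	/* Attach @set to program type @t; returns 0 on success, -1 if full. */
	static int model_register(enum model_prog_type t, const char *const *set)
	{
		if (model_nr_sets[t] == MODEL_MAX_SETS)
			return -1;
		model_registered[t][model_nr_sets[t]++] = set;
		return 0;
	}

	/* Programs of type @t may only call kfuncs registered for @t. */
	static bool model_kfunc_allowed(enum model_prog_type t, const char *name)
	{
		for (int i = 0; i < model_nr_sets[t]; i++)
			for (const char *const *p = model_registered[t][i]; *p; p++)
				if (strcmp(*p, name) == 0)
					return true;
		return false;
	}

Registering the same set for several program types would simply mean calling
the registration function once per type.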

2.6 Specifying no-cast aliases with ___init
--------------------------------------------

The verifier will always enforce that the BTF type of a pointer passed to a
kfunc by a BPF program matches the type of pointer specified in the kfunc
definition. The verifier does, however, allow types that are equivalent
according to the C standard to be passed to the same kfunc arg, even if their
BTF_IDs differ.

For example, for the following type definition:

.. code-block:: c

	struct bpf_cpumask {
		cpumask_t cpumask;
		refcount_t usage;
	};

The verifier would allow a ``struct bpf_cpumask *`` to be passed to a kfunc
taking a ``cpumask_t *`` (``cpumask_t`` is a typedef of ``struct cpumask``).
For instance, both ``struct cpumask *`` and ``struct bpf_cpumask *`` can be
passed to bpf_cpumask_test_cpu().
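
The aliasing relies on the C rule that a pointer to a structure, suitably
converted, points to its first member. A userspace sketch with hypothetical
stand-in types (``struct model_bpf_cpumask``, ``model_cpumask_t`` and
``model_test_cpu()`` are illustrative, not kernel definitions) shows both kinds
of pointer being accepted by the same function:

.. code-block:: c

	#include <assert.h>
	#include <stdbool.h>
	#include <stddef.h>

	/* Hypothetical userspace stand-ins for the kernel types. */
	struct model_cpumask { unsigned long bits[1]; };
	typedef struct model_cpumask model_cpumask_t;

	struct model_bpf_cpumask {
		model_cpumask_t cpumask;	/* first member: offset 0 */
		int usage;			/* stands in for refcount_t */
	};

	/* A "kfunc" taking the inner type, in the spirit of bpf_cpumask_test_cpu(). */
	static bool model_test_cpu(unsigned int cpu, const model_cpumask_t *mask)
	{
		assert(cpu < 8 * sizeof(mask->bits[0]));
		return (mask->bits[0] >> cpu) & 1UL;
	}

Since the first member sits at offset 0, passing the outer struct's address
yields the same memory as passing ``&m.cpumask``.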

In some cases, this type-aliasing behavior is not desired. ``struct
nf_conn___init`` is one such example:

.. code-block:: c

	struct nf_conn___init {
		struct nf_conn ct;
	};

The C standard would consider these types to be equivalent, but it would not
always be safe to pass either type to a trusted kfunc. ``struct
nf_conn___init`` represents an allocated ``struct nf_conn`` object that has
*not yet been initialized*, so it would therefore be unsafe to pass a ``struct
nf_conn___init *`` to a kfunc that's expecting a fully initialized ``struct
nf_conn *`` (e.g. ``bpf_ct_change_timeout()``).

In order to accommodate such requirements, the verifier will enforce strict
PTR_TO_BTF_ID type matching if two types have the exact same name, with one
being suffixed with ``___init``.
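
The name-based rule can be expressed as a small string check. This is a
hypothetical userspace sketch of the rule as described here, not the verifier's
implementation; ``model_is_init_variant()`` and
``model_strict_match_required()`` are made-up names.

.. code-block:: c

	#include <assert.h>
	#include <stdbool.h>
	#include <string.h>

	#define MODEL_INIT_SUFFIX "___init"

	/* True if @a is exactly @b followed by "___init". */
	static bool model_is_init_variant(const char *a, const char *b)
	{
		size_t lb = strlen(b), ls = strlen(MODEL_INIT_SUFFIX);

		return strlen(a) == lb + ls && strncmp(a, b, lb) == 0 &&
		       strcmp(a + lb, MODEL_INIT_SUFFIX) == 0;
	}

	/*
	 * If the passed and expected type names differ only by the ___init
	 * suffix, first-member aliasing is not allowed: the types must match
	 * exactly.
	 */
	static bool model_strict_match_required(const char *passed, const char *expected)
	{
		return model_is_init_variant(passed, expected) ||
		       model_is_init_variant(expected, passed);
	}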

.. _BPF_kfunc_lifecycle_expectations:

3. kfunc lifecycle expectations
===============================

kfuncs provide a kernel <-> kernel API, and thus are not bound by any of the
strict stability restrictions associated with kernel <-> user UAPIs. This means
they can be thought of as similar to EXPORT_SYMBOL_GPL, and can therefore be
modified or removed by a maintainer of the subsystem they're defined in when
it's deemed necessary.

As with any other change to the kernel, maintainers will not change or remove a
kfunc without having a reasonable justification. Whether or not they'll choose
to change a kfunc will ultimately depend on a variety of factors, such as how
widely used the kfunc is, how long the kfunc has been in the kernel, whether an
alternative kfunc exists, what the norm is in terms of stability for the
subsystem in question, and of course what the technical cost is of continuing
to support the kfunc.

There are several implications of this:

a) kfuncs that are widely used or have been in the kernel for a long time will
   be more difficult to justify being changed or removed by a maintainer. In
   other words, kfuncs that are known to have a lot of users and provide
   significant value provide stronger incentives for maintainers to invest the
   time and complexity in supporting them. It is therefore important for
   developers that are using kfuncs in their BPF programs to communicate and
   explain how and why those kfuncs are being used, and to participate in
   discussions regarding those kfuncs when they occur upstream.

b) Unlike regular kernel symbols marked with EXPORT_SYMBOL_GPL, BPF programs
   that call kfuncs are generally not part of the kernel tree. This means that
   refactoring cannot typically change callers in-place when a kfunc changes,
   as is done for e.g. an upstreamed driver being updated in place when a
   kernel symbol is changed.

   Unlike with regular kernel symbols, this is expected behavior for BPF
   symbols, and out-of-tree BPF programs that use kfuncs should be considered
   relevant to discussions and decisions around modifying and removing those
   kfuncs. The BPF community will take an active role in participating in
   upstream discussions when necessary to ensure that the perspectives of such
   users are taken into account.

c) A kfunc will never have any hard stability guarantees. BPF APIs cannot and
   will not ever hard-block a change in the kernel purely for stability
   reasons. That being said, kfuncs are features that are meant to solve
   problems and provide value to users. The decision of whether to change or
   remove a kfunc is a multivariate technical decision that is made on a
   case-by-case basis, and which is informed by data points such as those
   mentioned above. It is expected that a kfunc being removed or changed with
   no warning will not be a common occurrence or take place without sound
   justification, but it is a possibility that must be accepted if one is to
   use kfuncs.

3.1 kfunc deprecation
---------------------

As described above, while sometimes a maintainer may find that a kfunc must be
changed or removed immediately to accommodate some changes in their subsystem,
usually kfuncs will be able to accommodate a longer and more measured
deprecation process. For example, if a new kfunc comes along which provides
superior functionality to an existing kfunc, the existing kfunc may be
deprecated for some period of time to allow users to migrate their BPF programs
to use the new one. Or, if a kfunc has no known users, a decision may be made
to remove the kfunc (without providing an alternative API) after some
deprecation period so as to provide users with a window to notify the kfunc
maintainer if it turns out that the kfunc is actually being used.

The common case is expected to be that kfuncs go through a deprecation period
rather than being changed or removed without warning. As described in
:ref:`KF_deprecated_flag`, the kfunc framework provides the KF_DEPRECATED flag
to kfunc developers to signal to users that a kfunc has been deprecated. Once a
kfunc has been marked with KF_DEPRECATED, the following procedure is followed
for removal:

1. Any relevant information for deprecated kfuncs is documented in the kfunc's
   kernel docs. This documentation will typically include the kfunc's expected
   remaining lifespan, a recommendation for new functionality that can replace
   the usage of the deprecated function (or an explanation as to why no such
   replacement exists), etc.

2. The deprecated kfunc is kept in the kernel for some period of time after it
   was first marked as deprecated. This time period will be chosen on a
   case-by-case basis, and will typically depend on how widespread the use of
   the kfunc is, how long it has been in the kernel, and how hard it is to move
   to alternatives. This deprecation time period is "best effort", and as
   described :ref:`above<BPF_kfunc_lifecycle_expectations>`, circumstances may
   sometimes dictate that the kfunc be removed before the full intended
   deprecation period has elapsed.

3. After the deprecation period, the kfunc will be removed. At this point, BPF
   programs calling the kfunc will be rejected by the verifier.

4. Core kfuncs
==============

The BPF subsystem provides a number of "core" kfuncs that are potentially
applicable to a wide variety of different possible use cases and programs.
Those kfuncs are documented here.

4.1 struct task_struct * kfuncs
-------------------------------

There are a number of kfuncs that allow ``struct task_struct *`` objects to be
used as kptrs:

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_task_acquire bpf_task_release

These kfuncs are useful when you want to acquire or release a reference to a
``struct task_struct *`` that was passed as e.g. a tracepoint arg, or a
struct_ops callback arg. For example:

.. code-block:: c

	/**
	 * A trivial example tracepoint program that shows how to
	 * acquire and release a struct task_struct * pointer.
	 */
	SEC("tp_btf/task_newtask")
	int BPF_PROG(task_acquire_release_example, struct task_struct *task, u64 clone_flags)
	{
		struct task_struct *acquired;

		acquired = bpf_task_acquire(task);

		/*
		 * In a typical program you'd do something like store
		 * the task in a map, and the map will automatically
		 * release it later. Here, we release it manually.
		 */
		bpf_task_release(acquired);
		return 0;
	}

----

A BPF program can also look up a task from a pid. This can be useful if the
caller doesn't have a trusted pointer to a ``struct task_struct *`` object that
it can acquire a reference on with bpf_task_acquire().

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_task_from_pid

Here is an example of it being used:

.. code-block:: c

	SEC("tp_btf/task_newtask")
	int BPF_PROG(task_get_pid_example, struct task_struct *task, u64 clone_flags)
	{
		struct task_struct *lookup;

		lookup = bpf_task_from_pid(task->pid);
		if (!lookup)
			/* A task should always be found, as %task is a tracepoint arg. */
			return -ENOENT;

		if (lookup->pid != task->pid) {
			/* bpf_task_from_pid() looks up the task via its
			 * globally-unique pid from the init_pid_ns. Thus,
			 * the pid of the lookup task should always be the
			 * same as the input task.
			 */
			bpf_task_release(lookup);
			return -EINVAL;
		}

		/* bpf_task_from_pid() returns an acquired reference,
		 * so it must be dropped before returning from the
		 * tracepoint handler.
		 */
		bpf_task_release(lookup);
		return 0;
	}

4.2 struct cgroup * kfuncs
--------------------------

``struct cgroup *`` objects also have acquire and release functions:

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_cgroup_acquire bpf_cgroup_release

These kfuncs are used in exactly the same manner as bpf_task_acquire() and
bpf_task_release() respectively, so we won't provide examples for them.

----

You may also acquire a reference to a ``struct cgroup`` kptr that's already
stored in a map using bpf_cgroup_kptr_get():

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_cgroup_kptr_get

Here's an example of how it can be used:

.. code-block:: c

	/* struct containing the struct cgroup kptr which is actually stored in the map. */
	struct __cgroups_kfunc_map_value {
		struct cgroup __kptr_ref * cgroup;
	};

	/* The map containing struct __cgroups_kfunc_map_value entries. */
	struct {
		__uint(type, BPF_MAP_TYPE_HASH);
		__type(key, int);
		__type(value, struct __cgroups_kfunc_map_value);
		__uint(max_entries, 1);
	} __cgroups_kfunc_map SEC(".maps");

	/* ... */

	/**
	 * A simple example tracepoint program showing how a
	 * struct cgroup kptr that is stored in a map can
	 * be acquired using the bpf_cgroup_kptr_get() kfunc.
	 */
	SEC("tp_btf/cgroup_mkdir")
	int BPF_PROG(cgroup_kptr_get_example, struct cgroup *cgrp, const char *path)
	{
		struct cgroup *kptr;
		struct __cgroups_kfunc_map_value *v;
		s32 id = cgrp->self.id;

		/* Assume a cgroup kptr was previously stored in the map. */
		v = bpf_map_lookup_elem(&__cgroups_kfunc_map, &id);
		if (!v)
			return -ENOENT;

		/* Acquire a reference to the cgroup kptr that's already stored in the map. */
		kptr = bpf_cgroup_kptr_get(&v->cgroup);
		if (!kptr)
			/* If no cgroup was present in the map, it's because
			 * we're racing with another CPU that removed it with
			 * bpf_kptr_xchg() between the bpf_map_lookup_elem()
			 * above, and our call to bpf_cgroup_kptr_get().
			 * bpf_cgroup_kptr_get() internally safely handles this
			 * race, and will return NULL if the cgroup is no longer
			 * present in the map by the time we invoke the kfunc.
			 */
			return -EBUSY;

		/* Free the reference we just took above. Note that the
		 * original struct cgroup kptr is still in the map. It will
		 * be freed either at a later time if another context deletes
		 * it from the map, or automatically by the BPF subsystem if
		 * it's still present when the map is destroyed.
		 */
		bpf_cgroup_release(kptr);

		return 0;
	}

----

Another kfunc available for interacting with ``struct cgroup *`` objects is
bpf_cgroup_ancestor(). This allows callers to access the ancestor of a cgroup,
and return it as a cgroup kptr.

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_cgroup_ancestor

Eventually, BPF should be updated to allow this to happen with a normal memory
load in the program itself. This is currently not possible without more work in
the verifier. bpf_cgroup_ancestor() can be used as follows:

.. code-block:: c

	/**
	 * Simple tracepoint example that illustrates how a cgroup's
	 * ancestor can be accessed using bpf_cgroup_ancestor().
	 */
	SEC("tp_btf/cgroup_mkdir")
	int BPF_PROG(cgrp_ancestor_example, struct cgroup *cgrp, const char *path)
	{
		struct cgroup *parent;

		/* The parent cgroup resides at the level before the current cgroup's level. */
		parent = bpf_cgroup_ancestor(cgrp, cgrp->level - 1);
		if (!parent)
			return -ENOENT;

		bpf_printk("Parent id is %d", parent->self.id);

		/* Release the parent cgroup reference that was acquired above. */
		bpf_cgroup_release(parent);
		return 0;
	}
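
The ``cgrp->level - 1`` argument above relies on levels counting up from the
root, which is level 0. A hypothetical userspace model of a level-indexed
ancestor lookup (``struct model_cgroup`` and ``model_cgroup_ancestor()`` are
illustrative names, and reference counting is omitted) makes the edge case
visible: for the root cgroup the requested level is -1, so the model's lookup
fails, which is one reason to keep the NULL check in the example.

.. code-block:: c

	#include <assert.h>
	#include <stddef.h>

	/* Hypothetical cgroup node: root is level 0, a child is parent level + 1. */
	struct model_cgroup {
		int id;
		int level;
		struct model_cgroup *parent;
	};

	/* Return the ancestor of @cgrp at @level, or NULL if @level is out of range. */
	static struct model_cgroup *model_cgroup_ancestor(struct model_cgroup *cgrp,
							  int level)
	{
		if (level < 0 || level > cgrp->level)
			return NULL;
		while (cgrp->level > level)
			cgrp = cgrp->parent;
		return cgrp;
	}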

4.3 struct cpumask * kfuncs
---------------------------

BPF provides a set of kfuncs that can be used to query, allocate, mutate, and
destroy struct cpumask * objects. Please refer to :ref:`cpumasks-header-label`
for more details.