Commit Graph

1589 Commits

Author SHA1 Message Date
Andrii Nakryiko
5ab8cb89db libbpf: fix u64-to-pointer cast on 32-bit arches
It's been reported that (void *)map->map_extra is causing compilation
warnings on 32-bit architectures. It's easy enough to fix this by
casting to long first.

Fixes: 79ff13e991 ("libbpf: Add support for bpf_arena.")
Reported-by: Ryan Eatmon <reatmon@ti.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Message-ID: <20240319215143.1279312-1-andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-19 22:54:04 -07:00
Alexei Starovoitov
10ebe835c9 libbpf, selftests/bpf: Adjust libbpf, bpftool, selftests to match LLVM
The selftests use
to tell LLVM about special pointers. For LLVM there is nothing "arena"
about them. They are simply pointers in a different address space.
Hence LLVM diff https://github.com/llvm/llvm-project/pull/85161 renamed:
. macro __BPF_FEATURE_ARENA_CAST -> __BPF_FEATURE_ADDR_SPACE_CAST
. global variables in __attribute__((address_space(N))) are now
  placed in section named ".addr_space.N" instead of ".arena.N".

Adjust libbpf, bpftool, and selftests to match LLVM.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20240315021834.62988-3-alexei.starovoitov@gmail.com
2024-03-15 14:24:00 -07:00
Andrii Nakryiko
2e7ba4f8fd libbpf: Recognize __arena global variables.
LLVM automatically places __arena variables into ".arena.1" ELF section.
In order to use such global variables bpf program must include definition
of arena map in ".maps" section, like:
struct {
       __uint(type, BPF_MAP_TYPE_ARENA);
       __uint(map_flags, BPF_F_MMAPABLE);
       __uint(max_entries, 1000);         /* number of pages */
       __ulong(map_extra, 2ull << 44);    /* start of mmap() region */
} arena SEC(".maps");

libbpf recognizes both uses of arena and creates single `struct bpf_map *`
instance in libbpf APIs.
".arena.1" ELF section data is used as initial data image, which is exposed
through skeleton and bpf_map__initial_value() to the user, if they need to tune
it before the load phase. During load phase, this initial image is copied over
into mmap()'ed region corresponding to arena, and discarded.

Few small checks here and there had to be added to make sure this
approach works with bpf_map__initial_value(), mostly due to hard-coded
assumption that map->mmaped is set up with mmap() syscall and should be
munmap()'ed. For arena, .arena.1 can be (much) smaller than maximum
arena size, so this smaller data size has to be tracked separately.
Given it is enforced that there is only one arena for entire bpf_object
instance, we just keep it in a separate field. This can be generalized
if necessary later.

All global variables from ".arena.1" section are accessible from user space
via skel->arena->name_of_var.

For bss/data/rodata the skeleton/libbpf perform the following sequence:
1. addr = mmap(MAP_ANONYMOUS)
2. user space optionally modifies global vars
3. map_fd = bpf_create_map()
4. bpf_update_map_elem(map_fd, addr) // to store values into the kernel
5. mmap(addr, MAP_FIXED, map_fd)
after step 5 user spaces see the values it wrote at step 2 at the same addresses

arena doesn't support update_map_elem. Hence skeleton/libbpf do:
1. addr = malloc(sizeof SEC ".arena.1")
2. user space optionally modifies global vars
3. map_fd = bpf_create_map(MAP_TYPE_ARENA)
4. real_addr = mmap(map->map_extra, MAP_SHARED | MAP_FIXED, map_fd)
5. memcpy(real_addr, addr) // this will fault-in and allocate pages

At the end look and feel of global data vs __arena global data is the same from
bpf prog pov.

Another complication is:
struct {
  __uint(type, BPF_MAP_TYPE_ARENA);
} arena SEC(".maps");

int __arena foo;
int bar;

  ptr1 = &foo;   // relocation against ".arena.1" section
  ptr2 = &arena; // relocation against ".maps" section
  ptr3 = &bar;   // relocation against ".bss" section

Fo the kernel ptr1 and ptr2 has point to the same arena's map_fd
while ptr3 points to a different global array's map_fd.
For the verifier:
ptr1->type == unknown_scalar
ptr2->type == const_ptr_to_map
ptr3->type == ptr_to_map_value

After verification, from JIT pov all 3 ptr-s are normal ld_imm64 insns.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20240308010812.89848-11-alexei.starovoitov@gmail.com
2024-03-11 15:43:35 -07:00
Alexei Starovoitov
79ff13e991 libbpf: Add support for bpf_arena.
mmap() bpf_arena right after creation, since the kernel needs to
remember the address returned from mmap. This is user_vm_start.
LLVM will generate bpf_arena_cast_user() instructions where
necessary and JIT will add upper 32-bit of user_vm_start
to such pointers.

Fix up bpf_map_mmap_sz() to compute mmap size as
map->value_size * map->max_entries for arrays and
PAGE_SIZE * map->max_entries for arena.

Don't set BTF at arena creation time, since it doesn't support it.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240308010812.89848-9-alexei.starovoitov@gmail.com
2024-03-11 15:37:24 -07:00
Alexei Starovoitov
4d2b56081c libbpf: Add __arg_arena to bpf_helpers.h
Add __arg_arena to bpf_helpers.h

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240308010812.89848-8-alexei.starovoitov@gmail.com
2024-03-11 15:37:24 -07:00
Alexei Starovoitov
d147357e2e libbpf: Allow specifying 64-bit integers in map BTF.
__uint() macro that is used to specify map attributes like:
  __uint(type, BPF_MAP_TYPE_ARRAY);
  __uint(map_flags, BPF_F_MMAPABLE);
It is limited to 32-bit, since BTF_KIND_ARRAY has u32 "number of elements"
field in "struct btf_array".

Introduce __ulong() macro that allows specifying values bigger than 32-bit.
In map definition "map_extra" is the only u64 field, so far.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20240307031228.42896-5-alexei.starovoitov@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-07 14:58:48 -08:00
Eduard Zingerman
6ebaa3fb88 libbpf: Rewrite btf datasec names starting from '?'
Optional struct_ops maps are defined using question mark at the start
of the section name, e.g.:

    SEC("?.struct_ops")
    struct test_ops optional_map = { ... };

This commit teaches libbpf to detect if kernel allows '?' prefix
in datasec names, and if it doesn't then to rewrite such names
by replacing '?' with '_', e.g.:

    DATASEC ?.struct_ops -> DATASEC _.struct_ops

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-13-eddyz87@gmail.com
2024-03-06 15:18:16 -08:00
Eduard Zingerman
5ad0ecbe05 libbpf: Struct_ops in SEC("?.struct_ops") / SEC("?.struct_ops.link")
Allow using two new section names for struct_ops maps:
- SEC("?.struct_ops")
- SEC("?.struct_ops.link")

To specify maps that have bpf_map->autocreate == false after open.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-12-eddyz87@gmail.com
2024-03-06 15:18:16 -08:00
Eduard Zingerman
240bf8a516 libbpf: Replace elf_state->st_ops_* fields with SEC_ST_OPS sec_type
The next patch would add two new section names for struct_ops maps.
To make working with multiple struct_ops sections more convenient:
- remove fields like elf_state->st_ops_{shndx,link_shndx};
- mark section descriptions hosting struct_ops as
  elf_sec_desc->sec_type == SEC_ST_OPS;

After these changes struct_ops sections could be processed uniformly
by iterating bpf_object->efile.secs entries.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-11-eddyz87@gmail.com
2024-03-06 15:18:15 -08:00
Eduard Zingerman
fe9d049c3d libbpf: Sync progs autoload with maps autocreate for struct_ops maps
Automatically select which struct_ops programs to load depending on
which struct_ops maps are selected for automatic creation.
E.g. for the BPF code below:

    SEC("struct_ops/test_1") int BPF_PROG(foo) { ... }
    SEC("struct_ops/test_2") int BPF_PROG(bar) { ... }

    SEC(".struct_ops.link")
    struct test_ops___v1 A = {
        .foo = (void *)foo
    };

    SEC(".struct_ops.link")
    struct test_ops___v2 B = {
        .foo = (void *)foo,
        .bar = (void *)bar,
    };

And the following libbpf API calls:

    bpf_map__set_autocreate(skel->maps.A, true);
    bpf_map__set_autocreate(skel->maps.B, false);

The autoload would be enabled for program 'foo' and disabled for
program 'bar'.

During load, for each struct_ops program P, referenced from some
struct_ops map M:
- set P.autoload = true if M.autocreate is true for some M;
- set P.autoload = false if M.autocreate is false for all M;
- don't change P.autoload, if P is not referenced from any map.

Do this after bpf_object__init_kern_struct_ops_maps()
to make sure that shadow vars assignment is done.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-9-eddyz87@gmail.com
2024-03-06 15:18:15 -08:00
Eduard Zingerman
8db052615a libbpf: Honor autocreate flag for struct_ops maps
Skip load steps for struct_ops maps not marked for automatic creation.
This should allow to load bpf object in situations like below:

    SEC("struct_ops/foo") int BPF_PROG(foo) { ... }
    SEC("struct_ops/bar") int BPF_PROG(bar) { ... }

    struct test_ops___v1 {
    	int (*foo)(void);
    };

    struct test_ops___v2 {
    	int (*foo)(void);
    	int (*does_not_exist)(void);
    };

    SEC(".struct_ops.link")
    struct test_ops___v1 map_for_old = {
    	.test_1 = (void *)foo
    };

    SEC(".struct_ops.link")
    struct test_ops___v2 map_for_new = {
    	.test_1 = (void *)foo,
    	.does_not_exist = (void *)bar
    };

Suppose program is loaded on old kernel that does not have definition
for 'does_not_exist' struct_ops member. After this commit it would be
possible to load such object file after the following tweaks:

    bpf_program__set_autoload(skel->progs.bar, false);
    bpf_map__set_autocreate(skel->maps.map_for_new, false);

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20240306104529.6453-4-eddyz87@gmail.com
2024-03-06 15:18:15 -08:00
Eduard Zingerman
d9ab2f76ef libbpf: Tie struct_ops programs to kernel BTF ids, not to local ids
Enforce the following existing limitation on struct_ops programs based
on kernel BTF id instead of program-local BTF id:

    struct_ops BPF prog can be re-used between multiple .struct_ops &
    .struct_ops.link as long as it's the same struct_ops struct
    definition and the same function pointer field

This allows reusing same BPF program for versioned struct_ops map
definitions, e.g.:

    SEC("struct_ops/test")
    int BPF_PROG(foo) { ... }

    struct some_ops___v1 { int (*test)(void); };
    struct some_ops___v2 { int (*test)(void); };

    SEC(".struct_ops.link") struct some_ops___v1 a = { .test = foo }
    SEC(".struct_ops.link") struct some_ops___v2 b = { .test = foo }

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-3-eddyz87@gmail.com
2024-03-06 15:18:15 -08:00
Eduard Zingerman
a2a5172cf1 libbpf: Allow version suffixes (___smth) for struct_ops types
E.g. allow the following struct_ops definitions:

    struct bpf_testmod_ops___v1 { int (*test)(void); };
    struct bpf_testmod_ops___v2 { int (*test)(void); };

    SEC(".struct_ops.link")
    struct bpf_testmod_ops___v1 a = { .test = ... }
    SEC(".struct_ops.link")
    struct bpf_testmod_ops___v2 b = { .test = ... }

Where both bpf_testmod_ops__v1 and bpf_testmod_ops__v2 would be
resolved as 'struct bpf_testmod_ops' from kernel BTF.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240306104529.6453-2-eddyz87@gmail.com
2024-03-06 15:18:15 -08:00
Chen Shen
25703adf45 libbpf: Correct debug message in btf__load_vmlinux_btf
In the function btf__load_vmlinux_btf, the debug message incorrectly
refers to 'path' instead of 'sysfs_btf_path'.

Signed-off-by: Chen Shen <peterchenshen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240302062218.3587-1-peterchenshen@gmail.com
2024-03-04 14:33:51 +01:00
Kui-Feng Lee
69e4a9d2b3 libbpf: Convert st_ops->data to shadow type.
Convert st_ops->data to the shadow type of the struct_ops map. The shadow
type of a struct_ops type is a variant of the original struct type
providing a way to access/change the values in the maps of the struct_ops
type.

bpf_map__initial_value() will return st_ops->data for struct_ops types. The
skeleton is going to use it as the pointer to the shadow type of the
original struct type.

One of the main differences between the original struct type and the shadow
type is that all function pointers of the shadow type are converted to
pointers of struct bpf_program. Users can replace these bpf_program
pointers with other BPF programs. The st_ops->progs[] will be updated
before updating the value of a map to reflect the changes made by users.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240229064523.2091270-3-thinker.li@gmail.com
2024-02-29 14:23:52 -08:00
Kui-Feng Lee
3644d28546 libbpf: Set btf_value_type_id of struct bpf_map for struct_ops.
For a struct_ops map, btf_value_type_id is the type ID of it's struct
type. This value is required by bpftool to generate skeleton including
pointers of shadow types. The code generator gets the type ID from
bpf_map__btf_value_type_id() in order to get the type information of the
struct type of a map.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240229064523.2091270-2-thinker.li@gmail.com
2024-02-29 14:23:52 -08:00
Martin Kelly
58fd62e0aa bpf: Clarify batch lookup/lookup_and_delete semantics
The batch lookup and lookup_and_delete APIs have two parameters,
in_batch and out_batch, to facilitate iterative
lookup/lookup_and_deletion operations for supported maps. Except NULL
for in_batch at the start of these two batch operations, both parameters
need to point to memory equal or larger than the respective map key
size, except for various hashmaps (hash, percpu_hash, lru_hash,
lru_percpu_hash) where the in_batch/out_batch memory size should be
at least 4 bytes.

Document these semantics to clarify the API.

Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240221211838.1241578-1-martin.kelly@crowdstrike.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-22 10:24:38 -08:00
Matt Bobrowski
1159d27852 libbpf: Make remark about zero-initializing bpf_*_info structs
In some situations, if you fail to zero-initialize the
bpf_{prog,map,btf,link}_info structs supplied to the set of LIBBPF
helpers bpf_{prog,map,btf,link}_get_info_by_fd(), you can expect the
helper to return an error. This can possibly leave people in a
situation where they're scratching their heads for an unnnecessary
amount of time. Make an explicit remark about the requirement of
zero-initializing the supplied bpf_{prog,map,btf,link}_info structs
for the respective LIBBPF helpers.

Internally, LIBBPF helpers bpf_{prog,map,btf,link}_get_info_by_fd()
call into bpf_obj_get_info_by_fd() where the bpf(2)
BPF_OBJ_GET_INFO_BY_FD command is used. This specific command is
effectively backed by restrictions enforced by the
bpf_check_uarg_tail_zero() helper. This function ensures that if the
size of the supplied bpf_{prog,map,btf,link}_info structs are larger
than what the kernel can handle, trailing bits are zeroed. This can be
a problem when compiling against UAPI headers that don't necessarily
match the sizes of the same underlying types known to the kernel.

Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/ZcyEb8x4VbhieWsL@google.com
2024-02-14 09:48:46 -08:00
Cupertino Miranda
12bbcf8e84 libbpf: Add support to GCC in CORE macro definitions
Due to internal differences between LLVM and GCC the current
implementation for the CO-RE macros does not fit GCC parser, as it will
optimize those expressions even before those would be accessible by the
BPF backend.

As examples, the following would be optimized out with the original
definitions:
  - As enums are converted to their integer representation during
  parsing, the IR would not know how to distinguish an integer
  constant from an actual enum value.
  - Types need to be kept as temporary variables, as the existing type
  casts of the 0 address (as expanded for LLVM), are optimized away by
  the GCC C parser, never really reaching GCCs IR.

Although, the macros appear to add extra complexity, the expanded code
is removed from the compilation flow very early in the compilation
process, not really affecting the quality of the generated assembly.

Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240213173543.1397708-1-cupertino.miranda@oracle.com
2024-02-13 11:28:12 -08:00
Toke Høiland-Jørgensen
92a871ab9f libbpf: Use OPTS_SET() macro in bpf_xdp_query()
When the feature_flags and xdp_zc_max_segs fields were added to the libbpf
bpf_xdp_query_opts, the code writing them did not use the OPTS_SET() macro.
This causes libbpf to write to those fields unconditionally, which means
that programs compiled against an older version of libbpf (with a smaller
size of the bpf_xdp_query_opts struct) will have its stack corrupted by
libbpf writing out of bounds.

The patch adding the feature_flags field has an early bail out if the
feature_flags field is not part of the opts struct (via the OPTS_HAS)
macro, but the patch adding xdp_zc_max_segs does not. For consistency, this
fix just changes the assignments to both fields to use the OPTS_SET()
macro.

Fixes: 13ce2daa25 ("xsk: add new netlink attribute dedicated for ZC max frags")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240206125922.1992815-1-toke@redhat.com
2024-02-06 09:51:26 -08:00
Andrii Nakryiko
d7bc416aa5 libbpf: fix return value for PERF_EVENT __arg_ctx type fix up check
If PERF_EVENT program has __arg_ctx argument with matching
architecture-specific pt_regs/user_pt_regs/user_regs_struct pointer
type, libbpf should still perform type rewrite for old kernels, but not
emit the warning. Fix copy/paste from kernel code where 0 is meant to
signify "no error" condition. For libbpf we need to return "true" to
proceed with type rewrite (which for PERF_EVENT program will be
a canonical `struct bpf_perf_event_data *` type).

Fixes: 9eea8fafe3 ("libbpf: fix __arg_ctx type enforcement for perf_event programs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240206002243.1439450-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-05 20:01:46 -08:00
Andrii Nakryiko
b9551da8cf libbpf: Add missed btf_ext__raw_data() API
Another API that was declared in libbpf.map but actual implementation
was missing. btf_ext__get_raw_data() was intended as a discouraged alias
to consistently-named btf_ext__raw_data(), so make this an actuality.

Fixes: 20eccf29e2 ("libbpf: hide and discourage inconsistently named getters")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240201172027.604869-5-andrii@kernel.org
2024-02-01 22:16:12 +01:00
Andrii Nakryiko
c81a8ab196 libbpf: Add btf__new_split() API that was declared but not implemented
Seems like original commit adding split BTF support intended to add
btf__new_split() API, and even declared it in libbpf.map, but never
added (trivial) implementation. Fix this.

Fixes: ba451366bf ("libbpf: Implement basic split BTF support")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240201172027.604869-4-andrii@kernel.org
2024-02-01 22:16:12 +01:00
Andrii Nakryiko
93ee1eb85e libbpf: Add missing LIBBPF_API annotation to libbpf_set_memlock_rlim API
LIBBPF_API annotation seems missing on libbpf_set_memlock_rlim API, so
add it to make this API callable from libbpf's shared library version.

Fixes: e542f2c4cd ("libbpf: Auto-bump RLIMIT_MEMLOCK if kernel needs it for BPF")
Fixes: ab9a5a05dc ("libbpf: fix up few libbpf.map problems")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240201172027.604869-3-andrii@kernel.org
2024-02-01 22:16:11 +01:00
Andrii Nakryiko
9fa5e1a180 libbpf: Call memfd_create() syscall directly
Some versions of Android do not implement memfd_create() wrapper in
their libc implementation, leading to build failures ([0]). On the other
hand, memfd_create() is available as a syscall on quite old kernels
(3.17+, while bpf() syscall itself is available since 3.18+), so it is
ok to assume that syscall availability and call into it with syscall()
helper to avoid Android-specific workarounds.

Validated in libbpf-bootstrap's CI ([1]).

  [0] https://github.com/libbpf/libbpf-bootstrap/actions/runs/7701003207/job/20986080319#step:5:83
  [1] https://github.com/libbpf/libbpf-bootstrap/actions/runs/7715988887/job/21031767212?pr=253

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240201172027.604869-2-andrii@kernel.org
2024-02-01 22:16:11 +01:00
Eduard Zingerman
8263b3382d libbpf: Remove unnecessary null check in kernel_supports()
After recent changes, Coverity complained about inconsistent null checks
in kernel_supports() function:

    kernel_supports(const struct bpf_object *obj, ...)
    [...]
    // var_compare_op: Comparing obj to null implies that obj might be null
    if (obj && obj->gen_loader)
        return true;

    // var_deref_op: Dereferencing null pointer obj
    if (obj->token_fd)
        return feat_supported(obj->feat_cache, feat_id);
    [...]

- The original null check was introduced by commit [0], which introduced
  a call `kernel_supports(NULL, ...)` in function bump_rlimit_memlock();
- This call was refactored to use `feat_supported(NULL, ...)` in commit [1].

Looking at all places where kernel_supports() is called:

- There is either `obj->...` access before the call;
- Or `obj` comes from `prog->obj` expression, where `prog` comes from
  enumeration of programs in `obj`;
- Or `obj` comes from `prog->obj`, where `prog` is a parameter to one
  of the API functions:
  - bpf_program__attach_kprobe_opts;
  - bpf_program__attach_kprobe;
  - bpf_program__attach_ksyscall.

Assuming correct API usage, it appears that `obj` can never be null when
passed to kernel_supports(). Silence the Coverity warning by removing
redundant null check.

  [0] e542f2c4cd ("libbpf: Auto-bump RLIMIT_MEMLOCK if kernel needs it for BPF")
  [1] d6dd1d4936 ("libbpf: Further decouple feature checking logic from bpf_object")

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240131212615.20112-1-eddyz87@gmail.com
2024-02-01 11:13:45 +01:00
Andrii Nakryiko
20d59ee551 libbpf: add bpf_core_cast() macro
Add bpf_core_cast() macro that wraps bpf_rdonly_cast() kfunc. It's more
ergonomic than kfunc, as it automatically extracts btf_id with
bpf_core_type_id_kernel(), and works with type names. It also casts result
to (T *) pointer. See the definition of the macro, it's self-explanatory.

libbpf declares bpf_rdonly_cast() extern as __weak __ksym and should be
safe to not conflict with other possible declarations in user code.

But we do have a conflict with current BPF selftests that declare their
externs with first argument as `void *obj`, while libbpf opts into more
permissive `const void *obj`. This causes conflict, so we fix up BPF
selftests uses in the same patch.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240130212023.183765-2-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-30 15:55:50 -08:00
Andrii Nakryiko
d28bb1a86e libbpf: add __arg_trusted and __arg_nullable tag macros
Add __arg_trusted to annotate global func args that accept trusted
PTR_TO_BTF_ID arguments.

Also add __arg_nullable to combine with __arg_trusted (and maybe other
tags in the future) to force global subprog itself (i.e., callee) to do
NULL checks, as opposed to default non-NULL semantics (and thus caller's
responsibility to ensure non-NULL values).

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240130000648.2144827-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-30 09:41:50 -08:00
Ian Rogers
f2e4040c82 libbpf: Add some details for BTF parsing failures
As CONFIG_DEBUG_INFO_BTF is default off the existing "failed to find
valid kernel BTF" message makes diagnosing the kernel build issue somewhat
cryptic. Add a little more detail with the hope of helping users.

Before:
```
libbpf: failed to find valid kernel BTF
libbpf: Error loading vmlinux BTF: -3
```

After not accessible:
```
libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?
libbpf: failed to find valid kernel BTF
libbpf: Error loading vmlinux BTF: -3
```

After not readable:
```
libbpf: failed to read kernel BTF from (/sys/kernel/btf/vmlinux): -1
```

Closes: https://lore.kernel.org/bpf/CAP-5=fU+DN_+Y=Y4gtELUsJxKNDDCOvJzPHvjUVaUoeFAzNnig@mail.gmail.com/

Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240125231840.1647951-1-irogers@google.com
2024-01-29 16:42:42 -08:00
Andrii Nakryiko
9eea8fafe3 libbpf: fix __arg_ctx type enforcement for perf_event programs
Adjust PERF_EVENT type enforcement around __arg_ctx to match exactly
what kernel is doing.

Fixes: 76ec90a996 ("libbpf: warn on unexpected __arg_ctx type when rewriting BTF")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240125205510.3642094-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-29 12:34:13 -08:00
Andrii Nakryiko
0e6d0a9d23 libbpf: integrate __arg_ctx feature detector into kernel_supports()
Now that feature detection code is in bpf-next tree, integrate __arg_ctx
kernel-side support into kernel_supports() framework.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240125205510.3642094-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-29 12:34:13 -08:00
Andrii Nakryiko
ad57654053 libbpf: Fix faccessat() usage on Android
Android implementation of libc errors out with -EINVAL in faccessat() if
passed AT_EACCESS ([0]), this leads to ridiculous issue with libbpf
refusing to load /sys/kernel/btf/vmlinux on Androids ([1]). Fix by
detecting Android and redefining AT_EACCESS to 0, it's equivalent on
Android.

  [0] https://android.googlesource.com/platform/bionic/+/refs/heads/android13-release/libc/bionic/faccessat.cpp#50
  [1] https://github.com/libbpf/libbpf-bootstrap/issues/250#issuecomment-1911324250

Fixes: 6a4ab8869d ("libbpf: Fix the case of running as non-root with capabilities")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240126220944.2497665-1-andrii@kernel.org
2024-01-29 16:17:44 +01:00
Andrii Nakryiko
cac270ad79 libbpf: Support BPF token path setting through LIBBPF_BPF_TOKEN_PATH envvar
To allow external admin authority to override default BPF FS location
(/sys/fs/bpf) for implicit BPF token creation, teach libbpf to recognize
LIBBPF_BPF_TOKEN_PATH envvar. If it is specified and user application
didn't explicitly specify bpf_token_path option, it will be treated
exactly like bpf_token_path option, overriding default /sys/fs/bpf
location and making BPF token mandatory.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-29-andrii@kernel.org
2024-01-24 16:21:03 -08:00
Andrii Nakryiko
6b434b61b4 libbpf: Wire up BPF token support at BPF object level
Add BPF token support to BPF object-level functionality.

BPF token is supported by BPF object logic either as an explicitly
provided BPF token from outside (through BPF FS path), or implicitly
(unless prevented through bpf_object_open_opts).

Implicit mode is assumed to be the most common one for user namespaced
unprivileged workloads. The assumption is that privileged container
manager sets up default BPF FS mount point at /sys/fs/bpf with BPF token
delegation options (delegate_{cmds,maps,progs,attachs} mount options).
BPF object during loading will attempt to create BPF token from
/sys/fs/bpf location, and pass it for all relevant operations
(currently, map creation, BTF load, and program load).

In this implicit mode, if BPF token creation fails due to whatever
reason (BPF FS is not mounted, or kernel doesn't support BPF token,
etc), this is not considered an error. BPF object loading sequence will
proceed with no BPF token.

In explicit BPF token mode, user provides explicitly custom BPF FS mount
point path. In such case, BPF object will attempt to create BPF token
from provided BPF FS location. If BPF token creation fails, that is
considered a critical error and BPF object load fails with an error.

Libbpf provides a way to disable implicit BPF token creation, if it
causes any troubles (BPF token is designed to be completely optional and
shouldn't cause any problems even if provided, but in the world of BPF
LSM, custom security logic can be installed that might change outcome
depending on the presence of BPF token). To disable libbpf's default BPF
token creation behavior user should provide either invalid BPF token FD
(negative), or empty bpf_token_path option.

BPF token presence can influence libbpf's feature probing, so if BPF
object has associated BPF token, feature probing is instructed to use
BPF object-specific feature detection cache and token FD.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-26-andrii@kernel.org
2024-01-24 16:21:02 -08:00
Andrii Nakryiko
f3dcee938f libbpf: Wire up token_fd into feature probing logic
Adjust feature probing callbacks to take into account optional token_fd.
In unprivileged contexts, some feature detectors would fail to detect
kernel support just because BPF program, BPF map, or BTF object can't be
loaded due to privileged nature of those operations. So when BPF object
is loaded with BPF token, this token should be used for feature probing.

This patch is setting support for this scenario, but we don't yet pass
non-zero token FD. This will be added in the next patch.

We also switched BPF cookie detector from using kprobe program to
tracepoint one, as tracepoint is somewhat less dangerous BPF program
type and has higher likelihood of being allowed through BPF token in the
future. This change has no effect on detection behavior.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-25-andrii@kernel.org
2024-01-24 16:21:02 -08:00
Andrii Nakryiko
05f9cdd55d libbpf: Move feature detection code into its own file
It's quite a lot of well isolated code, so it seems like a good
candidate to move it out of libbpf.c to reduce its size.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-24-andrii@kernel.org
2024-01-24 16:21:02 -08:00
Andrii Nakryiko
d6dd1d4936 libbpf: Further decouple feature checking logic from bpf_object
Add feat_supported() helper that accepts feature cache instead of
bpf_object. This allows low-level code in bpf.c to not know or care
about higher-level concept of bpf_object, yet it will be able to utilize
custom feature checking in cases where BPF token might influence the
outcome.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-23-andrii@kernel.org
2024-01-24 16:21:02 -08:00
Andrii Nakryiko
ea4d587354 libbpf: Split feature detectors definitions from cached results
Split a list of supported feature detectors with their corresponding
callbacks from actual cached supported/missing values. This will allow
to have more flexible per-token or per-object feature detectors in
subsequent refactorings.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-22-andrii@kernel.org
2024-01-24 16:21:02 -08:00
Andrii Nakryiko
404cbc149c libbpf: Add BPF token support to bpf_prog_load() API
Wire through token_fd into bpf_prog_load().

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-16-andrii@kernel.org
2024-01-24 16:21:02 -08:00
Andrii Nakryiko
a3d63e8525 libbpf: Add BPF token support to bpf_btf_load() API
Allow user to specify token_fd for bpf_btf_load() API that wraps
kernel's BPF_BTF_LOAD command. This allows loading BTF from unprivileged
process as long as it has BPF token allowing BPF_BTF_LOAD command, which
can be created and delegated by privileged process.

Wire through new btf_flags as well, so that user can provide
BPF_F_TOKEN_FD flag, if necessary.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-15-andrii@kernel.org
2024-01-24 16:21:02 -08:00
Andrii Nakryiko
364f848375 libbpf: Add BPF token support to bpf_map_create() API
Add ability to provide token_fd for BPF_MAP_CREATE command through
bpf_map_create() API.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-14-andrii@kernel.org
2024-01-24 16:21:01 -08:00
Andrii Nakryiko
639ecd7d62 libbpf: Add bpf_token_create() API
Add low-level wrapper API for BPF_TOKEN_CREATE command in bpf() syscall.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20240124022127.2379740-13-andrii@kernel.org
2024-01-24 16:21:01 -08:00
Martin KaFai Lau
c9f1155645 libbpf: Ensure undefined bpf_attr field stays 0
The commit 9e926acda0 ("libbpf: Find correct module BTFs for struct_ops maps and progs.")
sets a newly added field (value_type_btf_obj_fd) to -1 in libbpf when
the caller of the libbpf's bpf_map_create did not define this field by
passing a NULL "opts" or passing in a "opts" that does not cover this
new field. OPT_HAS(opts, field) is used to decide if the field is
defined or not:

	((opts) && opts->sz >= offsetofend(typeof(*(opts)), field))

Once OPTS_HAS decided the field is not defined, that field should
be set to 0. For this particular new field (value_type_btf_obj_fd),
its corresponding map_flags "BPF_F_VTYPE_BTF_OBJ_FD" is not set.
Thus, the kernel does not treat it as an fd field.

Fixes: 9e926acda0 ("libbpf: Find correct module BTFs for struct_ops maps and progs.")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240124224418.2905133-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24 16:19:39 -08:00
Dima Tisnek
d47b9f68d2 libbpf: Correct bpf_core_read.h comment wrt bpf_core_relo struct
Past commit ([0]) removed the last vestiges of struct bpf_field_reloc,
it's called struct bpf_core_relo now.

  [0] 28b93c6449 ("libbpf: Clean up and improve CO-RE reloc logging")

Signed-off-by: Dima Tisnek <dimaqq@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240121060126.15650-1-dimaqq@gmail.com
2024-01-23 20:29:57 -08:00
Kui-Feng Lee
9e926acda0 libbpf: Find correct module BTFs for struct_ops maps and progs.
Locate the module BTFs for struct_ops maps and progs and pass them to the
kernel. This ensures that the kernel correctly resolves type IDs from the
appropriate module BTFs.

For the map of a struct_ops object, the FD of the module BTF is set to
bpf_map to keep a reference to the module BTF. The FD is passed to the
kernel as value_type_btf_obj_fd when the struct_ops object is loaded.

For a bpf_struct_ops prog, attach_btf_obj_fd of bpf_prog is the FD of a
module BTF in the kernel.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240119225005.668602-13-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-23 17:12:52 -08:00
Andrii Nakryiko
bc308d011a libbpf: call dup2() syscall directly
We've ran into issues with using dup2() API in production setting, where
libbpf is linked into large production environment and ends up calling
unintended custom implementations of dup2(). These custom implementations
don't provide atomic FD replacement guarantees of dup2() syscall,
leading to subtle and hard to debug issues.

To prevent this in the future and guarantee that no libc implementation
will do their own custom non-atomic dup2() implementation, call dup2()
syscall directly with syscall(SYS_dup2).

Note that some architectures don't seem to provide dup2 and have dup3
instead. Try to detect and pick best syscall.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240119210201.1295511-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23 15:13:47 -08:00
Andrey Grafin
f04deb90e5 libbpf: Apply map_set_def_max_entries() for inner_maps on creation
This patch allows to auto create BPF_MAP_TYPE_ARRAY_OF_MAPS and
BPF_MAP_TYPE_HASH_OF_MAPS with values of BPF_MAP_TYPE_PERF_EVENT_ARRAY
by bpf_object__load().

Previous behaviour created a zero filled btf_map_def for inner maps and
tried to use it for a map creation but the linux kernel forbids to create
a BPF_MAP_TYPE_PERF_EVENT_ARRAY map with max_entries=0.

Fixes: 646f02ffdd ("libbpf: Add BTF-defined map-in-map support")
Signed-off-by: Andrey Grafin <conquistador@yandex-team.ru>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20240117130619.9403-1-conquistador@yandex-team.ru
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-23 14:43:12 -08:00
Andrii Nakryiko
76ec90a996 libbpf: warn on unexpected __arg_ctx type when rewriting BTF
On kernel that don't support arg:ctx tag, before adjusting global
subprog BTF information to match kernel's expected canonical type names,
make sure that types used by user are meaningful, and if not, warn and
don't do BTF adjustments.

This is similar to checks that kernel performs, but narrower in scope,
as only a small subset of BPF program types can be accommodated by
libbpf using canonical type names.

Libbpf unconditionally allows `struct pt_regs *` for perf_event program
types, unlike kernel, which supports that conditionally on architecture.
This is done to keep things simple and not cause unnecessary false
positives. This seems like a minor and harmless deviation, which in
real-world programs will be caught by kernels with arg:ctx tag support
anyways. So KISS principle.

This logic is hard to test (especially on latest kernels), so manual
testing was performed instead. Libbpf emitted the following warning for
perf_event program with wrong context argument type:

  libbpf: prog 'arg_tag_ctx_perf': subprog 'subprog_ctx_tag' arg#0 is expected to be of `struct bpf_perf_event_data *` type

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240118033143.3384355-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-17 20:20:06 -08:00
Andrii Nakryiko
01b55f4f0c libbpf: feature-detect arg:ctx tag support in kernel
Add feature detector of kernel-side arg:ctx (__arg_ctx) tag support. If
this is detected, libbpf will avoid doing any __arg_ctx-related BTF
rewriting and checks in favor of letting kernel handle this completely.

test_global_funcs/ctx_arg_rewrite subtest is adjusted to do the same
feature detection (albeit in much simpler, though round-about and
inefficient, way), and skip the tests. This is done to still be able to
execute this test on older kernels (like in libbpf CI).

Note, BPF token series ([0]) does a major refactor and code moving of
libbpf-internal feature detection "framework", so to avoid unnecessary
conflicts we keep newly added feature detection stand-alone with ad-hoc
result caching. Once things settle, there will be a small follow up to
re-integrate everything back and move code into its final place in
newly-added (by BPF token series) features.c file.

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=814209&state=*

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240118033143.3384355-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-17 20:20:05 -08:00
Andrii Nakryiko
2f38fe6894 libbpf: implement __arg_ctx fallback logic
Out of all special global func arg tag annotations, __arg_ctx is
practically is the most immediately useful and most critical to have
working across multitude kernel version, if possible. This would allow
end users to write much simpler code if __arg_ctx semantics worked for
older kernels that don't natively understand btf_decl_tag("arg:ctx") in
verifier logic.

Luckily, it is possible to ensure __arg_ctx works on old kernels through
a bit of extra work done by libbpf, at least in a lot of common cases.

To explain the overall idea, we need to go back at how context argument
was supported in global funcs before __arg_ctx support was added. This
was done based on special struct name checks in kernel. E.g., for
BPF_PROG_TYPE_PERF_EVENT the expectation is that argument type `struct
bpf_perf_event_data *` mark that argument as PTR_TO_CTX. This is all
good as long as global function is used from the same BPF program types
only, which is often not the case. If the same subprog has to be called
from, say, kprobe and perf_event program types, there is no single
definition that would satisfy BPF verifier. Subprog will have context
argument either for kprobe (if using bpf_user_pt_regs_t struct name) or
perf_event (with bpf_perf_event_data struct name), but not both.

This limitation was the reason to add btf_decl_tag("arg:ctx"), making
the actual argument type not important, so that user can just define
"generic" signature:

  __noinline int global_subprog(void *ctx __arg_ctx) { ... }

I won't belabor how libbpf is implementing subprograms, see a huge
comment next to bpf_object_relocate_calls() function. The idea is that
each main/entry BPF program gets its own copy of global_subprog's code
appended.

This per-program copy of global subprog code *and* associated func_info
.BTF.ext information, pointing to FUNC -> FUNC_PROTO BTF type chain
allows libbpf to simulate __arg_ctx behavior transparently, even if the
kernel doesn't yet support __arg_ctx annotation natively.

The idea is straightforward: each time we append global subprog's code
and func_info information, we adjust its FUNC -> FUNC_PROTO type
information, if necessary (that is, libbpf can detect the presence of
btf_decl_tag("arg:ctx") just like BPF verifier would do it).

The rest is just mechanical and somewhat painful BTF manipulation code.
It's painful because we need to clone FUNC -> FUNC_PROTO, instead of
reusing it, as same FUNC -> FUNC_PROTO chain might be used by another
main BPF program within the same BPF object, so we can't just modify it
in-place (and cloning BTF types within the same struct btf object is
painful due to constant memory invalidation, see comments in code).
Uploaded BPF object's BTF information has to work for all BPF
programs at the same time.

Once we have FUNC -> FUNC_PROTO clones, we make sure that instead of
using some `void *ctx` parameter definition, we have an expected `struct
bpf_perf_event_data *ctx` definition (as far as BPF verifier and kernel
is concerned), which will mark it as context for BPF verifier. Same
global subprog relocated and copied into another main BPF program will
get different type information according to main program's type. It all
works out in the end in a completely transparent way for end user.

Libbpf maintains internal program type -> expected context struct name
mapping internally. Note, not all BPF program types have named context
struct, so this approach won't work for such programs (just like it
didn't before __arg_ctx). So native __arg_ctx is still important to have
in kernel to have generic context support across all BPF program types.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240104013847.3875810-8-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-03 21:22:49 -08:00