Add BPF_PROG_RUN command as an alias to BPF_RPOG_TEST_RUN to better
indicate the full range of use cases done by the command.
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210519014032.20908-1-alexei.starovoitov@gmail.com
Alexei Starovoitov says:
====================
v5->v6:
- fixed issue found by bpf CI. The light skeleton generation was
doing a dry-run of loading the program where all actual sys_bpf syscalls
were replaced by calls into gen_loader. Turned out that search for valid
vmlinux_btf was not stubbed out which was causing light skeleton gen
to fail on older kernels.
- significantly reduced verbosity of gen_loader.c.
- an example trace_printk.lskel.h generated out of progs/trace_printk.c
https://gist.github.com/4ast/774ea58f8286abac6aa8e3bf3bf3b903
v4->v5:
- addressed a bunch of minor comments from Andrii.
- the main difference is that lskel is now more robust in case of errors
and a bit cleaner looking.
v3->v4:
- cleaned up closing of temporary FDs in case intermediate sys_bpf fails during
execution of loader program.
- added support for rodata in the skeleton.
- enforce bpf_prog_type_syscall to be sleepable, since it needs bpf_copy_from_user
to populate rodata map.
- converted test trace_printk to use lskel to test rodata access.
- various small bug fixes.
v2->v3: Addressed comments from Andrii and John.
- added support for setting max_entries after signature verification
and used it in ringbuf test, since ringbuf's max_entries has to be updated
after skeleton open() and before load(). See patch 20.
- bpf_btf_find_by_name_kind doesn't take btf_fd anymore.
Because of that removed attach_prog_fd from bpf_prog_desc in lskel.
Both features to be added later.
- cleaned up closing of fd==0 during loader gen by resetting fds back to -1.
- converted loader gen to use memset(&attr, cmd_specific_attr_size).
would love to see this optimization in the rest of libbpf.
- fixed memory leak during loader_gen in case of enomem.
- support for fd_array kernel feature is added in patch 9 to have
exhaustive testing across all selftests and then partially reverted
in patch 15 to keep old style map_fd patching tested as well.
- since fentry_test/fexit_tests were extended with re-attach had to add
support for per-program attach method in lskel and use it in the tests.
- cleanup closing of fds in lskel in case of partial failures.
- fixed numerous small nits.
v1->v2: Addressed comments from Al, Yonghong and Andrii.
- documented sys_close fdget/fdput requirement and non-recursion check.
- reduced internal api leaks between libbpf and bpftool.
Now bpf_object__gen_loader() is the only new libbf api with minimal fields.
- fixed light skeleton __destroy() method to munmap and close maps and progs.
- refactored bpf_btf_find_by_name_kind to return btf_id | (btf_obj_fd << 32).
- refactored use of bpf_btf_find_by_name_kind from loader prog.
- moved auto-gen like code into skel_internal.h that is used by *.lskel.h
It has minimal static inline bpf_load_and_run() method used by lskel.
- added lksel.h example in patch 15.
- replaced union bpf_map_prog_desc with struct bpf_map_desc and struct bpf_prog_desc.
- removed mark_feat_supported and added a patch to pass 'obj' into kernel_supports.
- added proper tracking of temporary FDs in loader prog and their cleanup via bpf_sys_close.
- rename gen_trace.c into gen_loader.c to better align the naming throughout.
- expanded number of available helpers in new prog type.
- added support for raw_tp attaching in lskel.
lskel supports tracing and raw_tp progs now.
It correctly loads all networking prog types too, but __attach() method is tbd.
- converted progs/test_ksyms_module.c to lskel.
- minor feedback fixes all over.
The description of V1 set is still valid:
This is a first step towards signed bpf programs and the third approach of that kind.
The first approach was to bring libbpf into the kernel as a user-mode-driver.
The second approach was to invent a new file format and let kernel execute
that format as a sequence of syscalls that create maps and load programs.
This third approach is using new type of bpf program instead of inventing file format.
1st and 2nd approaches had too many downsides comparing to this 3rd and were discarded
after months of work.
To make it work the following new concepts are introduced:
1. syscall bpf program type
A kind of bpf program that can do sys_bpf and sys_close syscalls.
It can only execute in user context.
2. FD array or FD index.
Traditionally BPF instructions are patched with FDs.
What it means that maps has to be created first and then instructions modified
which breaks signature verification if the program is signed.
Instead of patching each instruction with FD patch it with an index into array of FDs.
That makes the program signature stable if it uses maps.
3. loader program that is generated as "strace of libbpf".
When libbpf is loading bpf_file.o it does a bunch of sys_bpf() syscalls to
load BTF, create maps, populate maps and finally load programs.
Instead of actually doing the syscalls generate a trace of what libbpf
would have done and represent it as the "loader program".
The "loader program" consists of single map and single bpf program that
does those syscalls.
Executing such "loader program" via bpf_prog_test_run() command will
replay the sequence of syscalls that libbpf would have done which will result
the same maps created and programs loaded as specified in the elf file.
The "loader program" removes libelf and majority of libbpf dependency from
program loading process.
4. light skeleton
Instead of embedding the whole elf file into skeleton and using libbpf
to parse it later generate a loader program and embed it into "light skeleton".
Such skeleton can load the same set of elf files, but it doesn't need
libbpf and libelf to do that. It only needs few sys_bpf wrappers.
Future steps:
- support CO-RE in the kernel. This patch set is already too big,
so that critical feature is left for the next step.
- generate light skeleton in golang to allow such users use BTF and
all other features provided by libbpf
- generate light skeleton for kernel, so that bpf programs can be embeded
in the kernel module. The UMD usage in bpf_preload will be replaced with
such skeleton, so bpf_preload would become a standard kernel module
without user space dependency.
- finally do the signing of the loader program.
The patches are work in progress with few rough edges.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add -L flag to bpftool to use libbpf gen_trace facility and syscall/loader program
for skeleton generation and program loading.
"bpftool gen skeleton -L" command will generate a "light skeleton" or "loader skeleton"
that is similar to existing skeleton, but has one major difference:
$ bpftool gen skeleton lsm.o > lsm.skel.h
$ bpftool gen skeleton -L lsm.o > lsm.lskel.h
$ diff lsm.skel.h lsm.lskel.h
@@ -5,34 +4,34 @@
#define __LSM_SKEL_H__
#include <stdlib.h>
-#include <bpf/libbpf.h>
+#include <bpf/bpf.h>
The light skeleton does not use majority of libbpf infrastructure.
It doesn't need libelf. It doesn't parse .o file.
It only needs few sys_bpf wrappers. All of them are in bpf/bpf.h file.
In future libbpf/bpf.c can be inlined into bpf.h, so not even libbpf.a would be
needed to work with light skeleton.
"bpftool prog load -L file.o" command is introduced for debugging of syscall/loader
program generation. Just like the same command without -L it will try to load
the programs from file.o into the kernel. It won't even try to pin them.
"bpftool prog load -L -d file.o" command will provide additional debug messages
on how syscall/loader program was generated.
Also the execution of syscall/loader program will use bpf_trace_printk() for
each step of loading BTF, creating maps, and loading programs.
The user can do "cat /.../trace_pipe" for further debug.
An example of fexit_sleep.lskel.h generated from progs/fexit_sleep.c:
struct fexit_sleep {
struct bpf_loader_ctx ctx;
struct {
struct bpf_map_desc bss;
} maps;
struct {
struct bpf_prog_desc nanosleep_fentry;
struct bpf_prog_desc nanosleep_fexit;
} progs;
struct {
int nanosleep_fentry_fd;
int nanosleep_fexit_fd;
} links;
struct fexit_sleep__bss {
int pid;
int fentry_cnt;
int fexit_cnt;
} *bss;
};
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-18-alexei.starovoitov@gmail.com
The BPF program loading process performed by libbpf is quite complex
and consists of the following steps:
"open" phase:
- parse elf file and remember relocations, sections
- collect externs and ksyms including their btf_ids in prog's BTF
- patch BTF datasec (since llvm couldn't do it)
- init maps (old style map_def, BTF based, global data map, kconfig map)
- collect relocations against progs and maps
"load" phase:
- probe kernel features
- load vmlinux BTF
- resolve externs (kconfig and ksym)
- load program BTF
- init struct_ops
- create maps
- apply CO-RE relocations
- patch ld_imm64 insns with src_reg=PSEUDO_MAP, PSEUDO_MAP_VALUE, PSEUDO_BTF_ID
- reposition subprograms and adjust call insns
- sanitize and load progs
During this process libbpf does sys_bpf() calls to load BTF, create maps,
populate maps and finally load programs.
Instead of actually doing the syscalls generate a trace of what libbpf
would have done and represent it as the "loader program".
The "loader program" consists of single map with:
- union bpf_attr(s)
- BTF bytes
- map value bytes
- insns bytes
and single bpf program that passes bpf_attr(s) and data into bpf_sys_bpf() helper.
Executing such "loader program" via bpf_prog_test_run() command will
replay the sequence of syscalls that libbpf would have done which will result
the same maps created and programs loaded as specified in the elf file.
The "loader program" removes libelf and majority of libbpf dependency from
program loading process.
kconfig, typeless ksym, struct_ops and CO-RE are not supported yet.
The order of relocate_data and relocate_calls had to change, so that
bpf_gen__prog_load() can see all relocations for a given program with
correct insn_idx-es.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-15-alexei.starovoitov@gmail.com
Add a pointer to 'struct bpf_object' to kernel_supports() helper.
It will be used in the next patch.
No functional changes.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-13-alexei.starovoitov@gmail.com
In order to be able to generate loader program in the later
patches change the order of data and text relocations.
Also improve the test to include data relos.
If the kernel supports "FD array" the map_fd relocations can be processed
before text relos since generated loader program won't need to manually
patch ld_imm64 insns with map_fd.
But ksym and kfunc relocations can only be processed after all calls
are relocated, since loader program will consist of a sequence
of calls to bpf_btf_find_by_name_kind() followed by patching of btf_id
and btf_obj_fd into corresponding ld_imm64 insns. The locations of those
ld_imm64 insns are specified in relocations.
Hence process all data relocations (maps, ksym, kfunc) together after call relos.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-12-alexei.starovoitov@gmail.com
Add bpf_sys_close() helper to be used by the syscall/loader program to close
intermediate FDs and other cleanup.
Note this helper must never be allowed inside fdget/fdput bracketing.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-11-alexei.starovoitov@gmail.com
Add new helper:
long bpf_btf_find_by_name_kind(char *name, int name_sz, u32 kind, int flags)
Description
Find BTF type with given name and kind in vmlinux BTF or in module's BTFs.
Return
Returns btf_id and btf_obj_fd in lower and upper 32 bits.
It will be used by loader program to find btf_id to attach the program to
and to find btf_ids of ksyms.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-10-alexei.starovoitov@gmail.com
Typical program loading sequence involves creating bpf maps and applying
map FDs into bpf instructions in various places in the bpf program.
This job is done by libbpf that is using compiler generated ELF relocations
to patch certain instruction after maps are created and BTFs are loaded.
The goal of fd_idx is to allow bpf instructions to stay immutable
after compilation. At load time the libbpf would still create maps as usual,
but it wouldn't need to patch instructions. It would store map_fds into
__u32 fd_array[] and would pass that pointer to sys_bpf(BPF_PROG_LOAD).
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-9-alexei.starovoitov@gmail.com
Similar to prog_load make btf_load command to be availble to
bpf_prog_type_syscall program.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-7-alexei.starovoitov@gmail.com
bpf_prog_type_syscall is a program that creates a bpf map,
updates it, and loads another bpf program using bpf_sys_bpf() helper.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-6-alexei.starovoitov@gmail.com
With the help from bpfptr_t prepare relevant bpf syscall commands
to be used from kernel and user space.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-4-alexei.starovoitov@gmail.com
Similar to sockptr_t introduce bpfptr_t with few additions:
make_bpfptr() creates new user/kernel pointer in the same address space as
existing user/kernel pointer.
bpfptr_add() advances the user/kernel pointer.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-3-alexei.starovoitov@gmail.com
Add placeholders for bpf_sys_bpf() helper and new program type.
Make sure to check that expected_attach_type is zero for future extensibility.
Allow tracing helper functions to be used in this program type, since they will
only execute from user context via bpf_prog_test_run.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210514003623.28033-2-alexei.starovoitov@gmail.com
In the forwarding path GRO -> BPF 6 to 4 -> GSO for TCP traffic, the
coalesced packet payload can be > MSS, but < MSS + 20.
bpf_skb_proto_6_to_4() will upgrade the MSS and it can be > the payload
length. After then tcp_gso_segment checks for the payload length if it
is <= MSS. The condition is causing the packet to be dropped.
tcp_gso_segment():
[...]
mss = skb_shinfo(skb)->gso_size;
if (unlikely(skb->len <= mss))
goto out;
[...]
Allow to upgrade/downgrade MSS only when BPF_F_ADJ_ROOM_FIXED_GSO is
not set.
Signed-off-by: Dongseok Yi <dseok.yi@samsung.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/bpf/1620804453-57566-1-git-send-email-dseok.yi@samsung.com
'err' and 'flags' are not used, we can just get rid of them.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <song@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210517022348.50555-1-xiyou.wangcong@gmail.com
After commit 96a71005bd ("bpf, arm64: remove obsolete exception handling
from div/mod"), there is no need to check twice about BPF_DIV and BPF_MOD,
remove the redundant switch case.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1621328170-17583-1-git-send-email-yangtiezhu@loongson.cn
This adds functions that wrap the netlink API used for adding, manipulating,
and removing traffic control filters.
The API summary:
A bpf_tc_hook represents a location where a TC-BPF filter can be attached.
This means that creating a hook leads to creation of the backing qdisc,
while destruction either removes all filters attached to a hook, or destroys
qdisc if requested explicitly (as discussed below).
The TC-BPF API functions operate on this bpf_tc_hook to attach, replace,
query, and detach tc filters. All functions return 0 on success, and a
negative error code on failure.
bpf_tc_hook_create - Create a hook
Parameters:
@hook - Cannot be NULL, ifindex > 0, attach_point must be set to
proper enum constant. Note that parent must be unset when
attach_point is one of BPF_TC_INGRESS or BPF_TC_EGRESS. Note
that as an exception BPF_TC_INGRESS|BPF_TC_EGRESS is also a
valid value for attach_point.
Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.
bpf_tc_hook_destroy - Destroy a hook
Parameters:
@hook - Cannot be NULL. The behaviour depends on value of
attach_point. If BPF_TC_INGRESS, all filters attached to
the ingress hook will be detached. If BPF_TC_EGRESS, all
filters attached to the egress hook will be detached. If
BPF_TC_INGRESS|BPF_TC_EGRESS, the clsact qdisc will be
deleted, also detaching all filters. As before, parent must
be unset for these attach_points, and set for BPF_TC_CUSTOM.
It is advised that if the qdisc is operated on by many programs,
then the program at least check that there are no other existing
filters before deleting the clsact qdisc. An example is shown
below:
DECLARE_LIBBPF_OPTS(bpf_tc_hook, .ifindex = if_nametoindex("lo"),
.attach_point = BPF_TC_INGRESS);
/* set opts as NULL, as we're not really interested in
* getting any info for a particular filter, but just
* detecting its presence.
*/
r = bpf_tc_query(&hook, NULL);
if (r == -ENOENT) {
/* no filters */
hook.attach_point = BPF_TC_INGRESS|BPF_TC_EGREESS;
return bpf_tc_hook_destroy(&hook);
} else {
/* failed or r == 0, the latter means filters do exist */
return r;
}
Note that there is a small race between checking for no
filters and deleting the qdisc. This is currently unavoidable.
Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.
bpf_tc_attach - Attach a filter to a hook
Parameters:
@hook - Cannot be NULL. Represents the hook the filter will be
attached to. Requirements for ifindex and attach_point are
same as described in bpf_tc_hook_create, but BPF_TC_CUSTOM
is also supported. In that case, parent must be set to the
handle where the filter will be attached (using BPF_TC_PARENT).
E.g. to set parent to 1:16 like in tc command line, the
equivalent would be BPF_TC_PARENT(1, 16).
@opts - Cannot be NULL. The following opts are optional:
* handle - The handle of the filter
* priority - The priority of the filter
Must be >= 0 and <= UINT16_MAX
Note that when left unset, they will be auto-allocated by
the kernel. The following opts must be set:
* prog_fd - The fd of the loaded SCHED_CLS prog
The following opts must be unset:
* prog_id - The ID of the BPF prog
The following opts are optional:
* flags - Currently only BPF_TC_F_REPLACE is allowed. It
allows replacing an existing filter instead of
failing with -EEXIST.
The following opts will be filled by bpf_tc_attach on a
successful attach operation if they are unset:
* handle - The handle of the attached filter
* priority - The priority of the attached filter
* prog_id - The ID of the attached SCHED_CLS prog
This way, the user can know what the auto allocated values
for optional opts like handle and priority are for the newly
attached filter, if they were unset.
Note that some other attributes are set to fixed default
values listed below (this holds for all bpf_tc_* APIs):
protocol as ETH_P_ALL, direct action mode, chain index of 0,
and class ID of 0 (this can be set by writing to the
skb->tc_classid field from the BPF program).
bpf_tc_detach
Parameters:
@hook - Cannot be NULL. Represents the hook the filter will be
detached from. Requirements are same as described above
in bpf_tc_attach.
@opts - Cannot be NULL. The following opts must be set:
* handle, priority
The following opts must be unset:
* prog_fd, prog_id, flags
bpf_tc_query
Parameters:
@hook - Cannot be NULL. Represents the hook where the filter lookup will
be performed. Requirements are same as described above in
bpf_tc_attach().
@opts - Cannot be NULL. The following opts must be set:
* handle, priority
The following opts must be unset:
* prog_fd, prog_id, flags
The following fields will be filled by bpf_tc_query upon a
successful lookup:
* prog_id
Some usage examples (using BPF skeleton infrastructure):
BPF program (test_tc_bpf.c):
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
SEC("classifier")
int cls(struct __sk_buff *skb)
{
return 0;
}
Userspace loader:
struct test_tc_bpf *skel = NULL;
int fd, r;
skel = test_tc_bpf__open_and_load();
if (!skel)
return -ENOMEM;
fd = bpf_program__fd(skel->progs.cls);
DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex =
if_nametoindex("lo"), .attach_point =
BPF_TC_INGRESS);
/* Create clsact qdisc */
r = bpf_tc_hook_create(&hook);
if (r < 0)
goto end;
DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .prog_fd = fd);
r = bpf_tc_attach(&hook, &opts);
if (r < 0)
goto end;
/* Print the auto allocated handle and priority */
printf("Handle=%u", opts.handle);
printf("Priority=%u", opts.priority);
opts.prog_fd = opts.prog_id = 0;
bpf_tc_detach(&hook, &opts);
end:
test_tc_bpf__destroy(skel);
This is equivalent to doing the following using tc command line:
# tc qdisc add dev lo clsact
# tc filter add dev lo ingress bpf obj foo.o sec classifier da
# tc filter del dev lo ingress handle <h> prio <p> bpf
... where the handle and priority can be found using:
# tc filter show dev lo ingress
Another example replacing a filter (extending prior example):
/* We can also choose both (or one), let's try replacing an
* existing filter.
*/
DECLARE_LIBBPF_OPTS(bpf_tc_opts, replace_opts, .handle =
opts.handle, .priority = opts.priority,
.prog_fd = fd);
r = bpf_tc_attach(&hook, &replace_opts);
if (r == -EEXIST) {
/* Expected, now use BPF_TC_F_REPLACE to replace it */
replace_opts.flags = BPF_TC_F_REPLACE;
return bpf_tc_attach(&hook, &replace_opts);
} else if (r < 0) {
return r;
}
/* There must be no existing filter with these
* attributes, so cleanup and return an error.
*/
replace_opts.prog_fd = replace_opts.prog_id = 0;
bpf_tc_detach(&hook, &replace_opts);
return -1;
To obtain info of a particular filter:
/* Find info for filter with handle 1 and priority 50 */
DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts, .handle = 1,
.priority = 50);
r = bpf_tc_query(&hook, &info_opts);
if (r == -ENOENT)
printf("Filter not found");
else if (r < 0)
return r;
printf("Prog ID: %u", info_opts.prog_id);
return 0;
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> # libbpf API design
[ Daniel: also did major patch cleanup ]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210512103451.989420-3-memxor@gmail.com
This change introduces a few helpers to wrap open coded attribute
preparation in netlink.c. It also adds a libbpf_netlink_send_recv() that
is useful to wrap send + recv handling in a generic way. Subsequent patch
will also use this function for sending and receiving a netlink response.
The libbpf_nl_get_link() helper has been removed instead, moving socket
creation into the newly named libbpf_netlink_send_recv().
Every nested attribute's closure must happen using the helper
nlattr_end_nested(), which sets its length properly. NLA_F_NESTED is
enforced using nlattr_begin_nested() helper. Other simple attributes
can be added directly.
The maxsz parameter corresponds to the size of the request structure
which is being filled in, so for instance with req being:
struct {
struct nlmsghdr nh;
struct tcmsg t;
char buf[4096];
} req;
Then, maxsz should be sizeof(req).
This change also converts the open coded attribute preparation with these
helpers. Note that the only failure the internal call to nlattr_add()
could result in the nested helper would be -EMSGSIZE, hence that is what
we return to our caller.
The libbpf_netlink_send_recv() call takes care of opening the socket,
sending the netlink message, receiving the response, potentially invoking
callbacks, and return errors if any, and then finally close the socket.
This allows users to avoid identical socket setup code in different places.
The only user of libbpf_nl_get_link() has been converted to make use of it.
__bpf_set_link_xdp_fd_replace() has also been refactored to use it.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
[ Daniel: major patch cleanup ]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210512103451.989420-2-memxor@gmail.com
Detect use of static entry-point BPF programs (those with SEC() markings) and
emit error message. This is similar to
c1cccec9c6 ("libbpf: Reject static maps") but for BPF programs.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210514195534.1440970-1-andrii@kernel.org
Static maps never really worked with libbpf, because all such maps were always
silently resolved to the very first map. Detect static maps (both legacy and
BTF-defined) and report user-friendly error.
Tested locally by switching few maps (legacy and BTF-defined) in selftests to
static ones and verifying that now libbpf rejects them loudly.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210513233643.194711-2-andrii@kernel.org
Adjust static_linked selftests to test a mix of global and static variables
and their handling of bpftool's skeleton generation code.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210513233643.194711-1-andrii@kernel.org
Use the common function round_up() directly to show the align size
explicitly, the function STACK_ALIGN() is needless, remove it. Other
JITs also just rely on round_up().
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1620651119-5663-1-git-send-email-yangtiezhu@loongson.cn
While cross compiling on ARM32 , the casting from pointer to __u64 will
cause warnings:
samples/bpf/task_fd_query_user.c: In function 'main':
samples/bpf/task_fd_query_user.c:399:23: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
399 | uprobe_file_offset = (__u64)main - (__u64)&__executable_start;
| ^
samples/bpf/task_fd_query_user.c:399:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
399 | uprobe_file_offset = (__u64)main - (__u64)&__executable_start;
Workaround this by using "unsigned long" to adapt to different ARCHs.
Signed-off-by: Hailong Liu <liu.hailong6@zte.com.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210511140429.89426-1-liuhailongg6@163.com
Do the same global -> static BTF update for global functions with STV_INTERNAL
visibility to turn on static BPF verification mode.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-7-andrii@kernel.org
As discussed in [0], stop emitting static variables in BPF skeletons to avoid
issues with name-conflicting static variables across multiple
statically-linked BPF object files.
Users using static variables to pass data between BPF programs and user-space
should do a trivial one-time switch according to the following simple rules:
- read-only `static volatile const` variables should be converted to
`volatile const`;
- read/write `static volatile` variables should just drop `static volatile`
modifiers to become global variables/symbols. To better handle older Clang
versions, such newly converted global variables should be explicitly
initialized with a specific value or `= 0`/`= {}`, whichever is
appropriate.
[0] https://lore.kernel.org/bpf/CAEf4BzZo7_r-hsNvJt3w3kyrmmBJj7ghGY8+k4nvKF0KLjma=w@mail.gmail.com/T/#m664d4b0d6b31ac8b2669360e0fc2d6962e9f5ec1
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-5-andrii@kernel.org
In preparation of skipping emitting static variables in BPF skeletons, switch
all current selftests uses of static variables to pass data between BPF and
user-space to use global variables.
All non-read-only `static volatile` variables become just plain global
variables by dropping `static volatile` part.
Read-only `static volatile const` variables, though, still require `volatile`
modifier, otherwise compiler will ignore whatever values are set from
user-space.
Few static linker tests are using name-conflicting static variables to
validate that static linker still properly handles static variables and
doesn't trip up on name conflicts.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-4-andrii@kernel.org
For better future extensibility add per-file linker options. Currently
the set of available options is empty. This changes bpf_linker__add_file()
API, but it's not a breaking change as bpf_linker APIs hasn't been released
yet.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-3-andrii@kernel.org
Similarly to .rodata, strip any const/volatile/restrict modifiers when
generating BPF skeleton. They are not helpful and actually just get in the way.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210507054119.270888-2-andrii@kernel.org
Lorenz Bauer says:
====================
github.com/cilium/ebpf runs integration tests with libbpf in a vm on CI.
I recently did some work to increase the code coverage from that, and
started experiencing OOM-kills in the VM. That led me down a rabbit
hole looking at verifier memory allocation patterns. I didn't figure out
what triggered the OOM-kills but refactored some often called memory
allocation code.
The key insight is that often times we don't need to do a full kfree /
kmalloc, but can instead just reallocate. The first patch adds two helpers
which do just that for the use cases in the verifier, which are sufficiently
different that they can't use stock krealloc_array and friends.
The series makes bpf_verif_scale about 10% faster in my VM set up, which
is especially noticeable when running with KASAN enabled.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
func_states_equal makes a very short lived allocation for idmap,
probably because it's too large to fit on the stack. However the
function is called quite often, leading to a lot of alloc / free
churn. Replace the temporary allocation with dedicated scratch
space in struct bpf_verifier_env.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/bpf/20210429134656.122225-4-lmb@cloudflare.com
Resizing and copying stack and reference tracking state currently
does a lot of kfree / kmalloc when the size of the tracked set changes.
The logic in copy_*_state and realloc_*_state is also hard to follow.
Refactor this into two core functions. copy_array copies from a source
into a destination. It avoids reallocation by taking the allocated
size of the destination into account via ksize(). The function is
essentially krealloc_array, with the difference that the contents of
dst are not preserved. realloc_array changes the size of an array and
zeroes newly allocated items. Contrary to krealloc both functions don't
free the destination if the size is zero. Instead we rely on free_func_state
to clean up.
realloc_stack_state is renamed to grow_stack_state to better convey
that it never shrinks the stack state.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210429134656.122225-2-lmb@cloudflare.com
* A fix to avoid over-allocating the kernel's mapping on !MMU systems,
which could lead to up to 2MiB of lost memory.
* The SiFive address extension errata only manifest on rv64, they are
now disabled on rv32 where they are unnecessary.
There are also a pair of late-landing cleanups.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmCWw5QTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiRKiEACIVsjL+QmsgpAxAK9z4YokneXl+bBw
PN401kDpyijVhHCmwXHr1eNilwc2FMkVsfjtGGp4mIgjYnzEoMd8wfaI9b7/1+Rk
73uYF6V/oJCrWvLQ/gIULp8D385F5vjGQDjuFm9mlQuhR1kpGTbvvliVoYwevQ8B
PqVn+zWPqqGyFUEnNNLXhVtTRntGrfZwIz/bxpZOtn/IN3BE1oEQW51+5rbfXpFr
eK59b808lfd5qlbSJ4Dgpwv1TAa01gM9TPYpxqSO/4mo9C1eZsBqS2/LxaJHDqBQ
CENkujqd0qMDq4CP/SN4pBxHWTFyR9wfE8u2d4hqrNwhJivFHM9+yP84ayElfBXh
IhM6PiMnW2N618DeMq92bqpGwrNAUdAVby+B38jpo9z+GCKVqgdMeAnTaBO4ApLx
Hgo6pOknJZxnta6Wmsog8/hNnONSkGDBrzcP7+bdNvsCG6j07P5wLtvsf05yPnIe
6BB/xGePYxaSWtFwebBenSrOYU5mEKV44yOnHvWuy9TwFx6khxb5xCGKtB8I16vy
f3kUG9TXJx6CqDmhsl12YGYHdT13l3YttRAN1MzMazFmzSKcbsmsDDsEmmN/zmXh
qkAXk0yRVxwBqxvehNw2PE0ViNYTpS31qmOHns7pcA85VEzAJSC0yD/k+qZcRuus
a/+f66xSe2KqbQ==
=mEPT
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-5.13-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix to avoid over-allocating the kernel's mapping on !MMU systems,
which could lead to up to 2MiB of lost memory
- The SiFive address extension errata only manifest on rv64, they are
now disabled on rv32 where they are unnecessary
- A pair of late-landing cleanups
* tag 'riscv-for-linus-5.13-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: remove unused handle_exception symbol
riscv: Consistify protect_kernel_linear_mapping_text_rodata() use
riscv: enable SiFive errata CIP-453 and CIP-1200 Kconfig only if CONFIG_64BIT=y
riscv: Only extend kernel reservation if mapped read-only
intel_dp_check_mst_status() uses a 14-byte array to read the DPRX Event
Status Indicator data, but then passes that buffer at offset 10 off as
an argument to drm_dp_channel_eq_ok().
End result: there are only 4 bytes remaining of the buffer, yet
drm_dp_channel_eq_ok() wants a 6-byte buffer. gcc-11 correctly warns
about this case:
drivers/gpu/drm/i915/display/intel_dp.c: In function ‘intel_dp_check_mst_status’:
drivers/gpu/drm/i915/display/intel_dp.c:3491:22: warning: ‘drm_dp_channel_eq_ok’ reading 6 bytes from a region of size 4 [-Wstringop-overread]
3491 | !drm_dp_channel_eq_ok(&esi[10], intel_dp->lane_count)) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/display/intel_dp.c:3491:22: note: referencing argument 1 of type ‘const u8 *’ {aka ‘const unsigned char *’}
In file included from drivers/gpu/drm/i915/display/intel_dp.c:38:
include/drm/drm_dp_helper.h:1466:6: note: in a call to function ‘drm_dp_channel_eq_ok’
1466 | bool drm_dp_channel_eq_ok(const u8 link_status[DP_LINK_STATUS_SIZE],
| ^~~~~~~~~~~~~~~~~~~~
6:14 elapsed
This commit just extends the original array by 2 zero-initialized bytes,
avoiding the warning.
There may be some underlying bug in here that caused this confusion, but
this is at least no worse than the existing situation that could use
random data off the stack.
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a set of minor fixes in various drivers (qla2xxx, ufs,
scsi_debug, lpfc) one doc fix and a fairly large update to the fnic
driver to remove the open coded iteration functions in favour of the
scsi provided ones.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYJa23iYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishVhnAP0WVxjv
V+L6SAyssgzN3Nbq/nY18qMU1/yeA5jVljRW1gD+JaQKpkOmU+lsldauwW2a3v0G
9XPGTricrtYOu1j3t7c=
=F0aC
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"This is a set of minor fixes in various drivers (qla2xxx, ufs,
scsi_debug, lpfc) one doc fix and a fairly large update to the fnic
driver to remove the open coded iteration functions in favour of the
scsi provided ones"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: fnic: Use scsi_host_busy_iter() to traverse commands
scsi: fnic: Kill 'exclude_id' argument to fnic_cleanup_io()
scsi: scsi_debug: Fix cmd_per_lun, set to max_queue
scsi: ufs: core: Narrow down fast path in system suspend path
scsi: ufs: core: Cancel rpm_dev_flush_recheck_work during system suspend
scsi: ufs: core: Do not put UFS power into LPM if link is broken
scsi: qla2xxx: Prevent PRLI in target mode
scsi: qla2xxx: Add marginal path handling support
scsi: target: tcmu: Return from tcmu_handle_completions() if cmd_id not found
scsi: ufs: core: Fix a typo in ufs-sysfs.c
scsi: lpfc: Fix bad memory access during VPD DUMP mailbox command
scsi: lpfc: Fix DMA virtual address ptr assignment in bsg
scsi: lpfc: Fix illegal memory access on Abort IOCBs
scsi: blk-mq: Fix build warning when making htmldocs
- Convert sh and sparc to use generic shell scripts to generate the
syscall headers
- refactor .gitignore files
- Update kernel/config_data.gz only when the content of the .config is
really changed, which avoids the unneeded re-link of vmlinux
- move "remove stale files" workarounds to scripts/remove-stale-files
- suppress unused-but-set-variable warnings by default for Clang as well
- fix locale setting LANG=C to LC_ALL=C
- improve 'make distclean'
- always keep intermediate objects from scripts/link-vmlinux.sh
- move IF_ENABLED out of <linux/kconfig.h> to make it self-contained
- misc cleanups
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmCWrucVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGRLkQAJ8t7PfMJLSh/VcgDXp3Z7fZ/V2M
RUGbOeRYErR1gylejuip/R19mS5MiBNecU60VrugZyDOMf98+mx61mI/ykpPeX92
sE3VU5MPXEwmv758QUr4gH014TZshMtHHo+tXA+NVUbqFp7RTnkZMDjOXGthYDHG
NhDou4LZ2P0CUKm8vb58SJPqB7ZdYOT9eEQEdHevm18Gx0KProCxRziup7loldy7
ET770okQ23if90ufCSVmnM6Ee6opoKYvXS5lv8V/a4xV/VbicbUclpzIZsHF7L2i
mIfr6dy480ncOaQlfWnX9ACgIeeqiFPOeZbAu7HAtwXzP5vCahgQ9FKVC7KPt+BP
Lf3LgdBrfSP5A7f7FrtkkPmP7pl1j6/Bq3+PhCur9XimtRIsvTOx7m7nuvsY4yHC
/wmBXFZgqE5DGyzpHXz1az8JHWw2AesP9L2f536BhfvRtdXaoOxPtZ/rmO1lfcMV
fWMa9f1em8lXwCiD1dR8UkBrIxItty+qqPffu2S/DlEepbiZrCg1gD827Fy7Mm3n
5rvrzYMOY2YK0yW1jtm+w3NlPlmG91BDUTP8tEcDxrTOIXezwqJf7fw8qIgGIy7W
3WzuBfgSvpT977ByMsB0YYugo2Xie+R1jpOWt7tv6KHM4varNBu0WpVhQhrKQr5o
agJiuvzsf3b+64oP
=935P
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:
- Convert sh and sparc to use generic shell scripts to generate the
syscall headers
- refactor .gitignore files
- Update kernel/config_data.gz only when the content of the .config
is really changed, which avoids the unneeded re-link of vmlinux
- move "remove stale files" workarounds to scripts/remove-stale-files
- suppress unused-but-set-variable warnings by default for Clang
as well
- fix locale setting LANG=C to LC_ALL=C
- improve 'make distclean'
- always keep intermediate objects from scripts/link-vmlinux.sh
- move IF_ENABLED out of <linux/kconfig.h> to make it self-contained
- misc cleanups
* tag 'kbuild-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (25 commits)
linux/kconfig.h: replace IF_ENABLED() with PTR_IF() in <linux/kernel.h>
kbuild: Don't remove link-vmlinux temporary files on exit/signal
kbuild: remove the unneeded comments for external module builds
kbuild: make distclean remove tag files in sub-directories
kbuild: make distclean work against $(objtree) instead of $(srctree)
kbuild: refactor modname-multi by using suffix-search
kbuild: refactor fdtoverlay rule
kbuild: parameterize the .o part of suffix-search
arch: use cross_compiling to check whether it is a cross build or not
kbuild: remove ARCH=sh64 support from top Makefile
.gitignore: prefix local generated files with a slash
kbuild: replace LANG=C with LC_ALL=C
Makefile: Move -Wno-unused-but-set-variable out of GCC only block
kbuild: add a script to remove stale generated files
kbuild: update config_data.gz only when the content of .config is changed
.gitignore: ignore only top-level modules.builtin
.gitignore: move tags and TAGS close to other tag files
kernel/.gitgnore: remove stale timeconst.h and hz.bc
usr/include: refactor .gitignore
genksyms: fix stale comment
...
- Remove the nvlink support now that it's only user has been removed.
- Enable huge vmalloc mappings for Radix MMU (P9).
- Fix KVM conversion to gfn-based MMU notifier callbacks.
- Fix a kexec/kdump crash with hot plugged CPUs.
- Fix boot failure on 32-bit with CONFIG_STACKPROTECTOR.
- Restore alphabetic order of the selects under CONFIG_PPC.
Thanks to: Christophe Leroy, Christoph Hellwig, Nicholas Piggin, Sandipan Das, Sourabh Jain.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmCWjQsTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgAeMD/0V0eVXtpYiPHZpEaRJS8H/ANInBdMz
+PxgCkeX97e+eiufq6IMJO5H5tbWdVIMGRTrhUipZG25M88pU4toRyZwdtdtW2x8
/BjRnmaOAyoTx2LbDTn5qJQmee3jHIVBZINKpVk+vVXwbs6Wmc+qJTVTwvOOPVZm
vgJzQL08JaA6F/y1CLCFnw+GXMdN/e0Nkrz89wu3YHCCPkHi0mZkQrKaEb7QRvU4
s9wbhBz0caFRNsGSXn6kUduBjOLZ/PZ70Dr1HOX4JF5F4gPgH/sjAmUscrKo9s2b
x4/LH0bxQ6zsjNOknBWtPuDOEEm1D6VGPSf/JUdhuOoErJHJBiMqWCljQhSwsHW7
vEB4TGY5mMpoGB4RhSSdl6VgnlwCAsntUx+3PidSSLTGmkUrj+M+oH97NGY1/UD4
JwuiAtdM66KNx7Xmoa8R9mNxS09hjWcCCUWVe0ezZ/6L+4HMgiozWg9xdhmfD2ec
NB4ibqzYxSznvjzljmzrIg8nZSf7KSFxPVs6ivB90X8fQoBsq8H021n0wbVut9+W
+75XKQnuCnfLFtHDZQjw+JxiDZzml5ccEBcwA59t09se34L6oG2k9n7xq01v/Qii
sg83LcB+gmlcRR3DjzLpwDuoUovAc9QGABXFRWuKVc4mc7guXI412HbNKEoHgzqn
je3PDrLwILElOQ==
=2P8l
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates and fixes from Michael Ellerman:
"A bit of a mixture of things, tying up some loose ends.
There's the removal of the nvlink code, which dependend on a commit in
the vfio tree. Then the enablement of huge vmalloc which was in next
for a few weeks but got dropped due to conflicts. And there's also a
few fixes.
Summary:
- Remove the nvlink support now that it's only user has been removed.
- Enable huge vmalloc mappings for Radix MMU (P9).
- Fix KVM conversion to gfn-based MMU notifier callbacks.
- Fix a kexec/kdump crash with hot plugged CPUs.
- Fix boot failure on 32-bit with CONFIG_STACKPROTECTOR.
- Restore alphabetic order of the selects under CONFIG_PPC.
Thanks to: Christophe Leroy, Christoph Hellwig, Nicholas Piggin,
Sandipan Das, and Sourabh Jain"
* tag 'powerpc-5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV: Fix conversion to gfn-based MMU notifier callbacks
powerpc/kconfig: Restore alphabetic order of the selects under CONFIG_PPC
powerpc/32: Fix boot failure with CONFIG_STACKPROTECTOR
powerpc/powernv/memtrace: Fix dcache flushing
powerpc/kexec_file: Use current CPU info while setting up FDT
powerpc/64s/radix: Enable huge vmalloc mappings
powerpc/powernv: remove the nvlink support
and netfilter trees. Self-contained fixes, nothing risky.
Current release - new code bugs:
- dsa: ksz: fix a few bugs found by static-checker in the new driver
- stmmac: fix frame preemption handshake not triggering after
interface restart
Previous releases - regressions:
- make nla_strcmp handle more then one trailing null character
- fix stack OOB reads while fragmenting IPv4 packets in openvswitch
and net/sched
- sctp: do asoc update earlier in sctp_sf_do_dupcook_a
- sctp: delay auto_asconf init until binding the first addr
- stmmac: clear receive all(RA) bit when promiscuous mode is off
- can: mcp251x: fix resume from sleep before interface was brought up
Previous releases - always broken:
- bpf: fix leakage of uninitialized bpf stack under speculation
- bpf: fix masking negation logic upon negative dst register
- netfilter: don't assume that skb_header_pointer() will never fail
- only allow init netns to set default tcp cong to a restricted algo
- xsk: fix xp_aligned_validate_desc() when len == chunk_size to
avoid false positive errors
- ethtool: fix missing NLM_F_MULTI flag when dumping
- can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
- sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_b
- bridge: fix NULL-deref caused by a races between assigning
rx_handler_data and setting the IFF_BRIDGE_PORT bit
Latecomer:
- seg6: add counters support for SRv6 Behaviors
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmCV3YoACgkQMUZtbf5S
IrsQ2w//Q8/qbl6wGTKUfu6DZHYUU5j5sTwiHR823PKKSgXI+okWMN0KUlZszOsz
qnPkH6GuojRooOE1s8PFLSlt9axKhQ0y7uzMTrWYafQ+JZTtgg9/MiPxQ8fdiE5i
uOG1ngttZ+1jlE5tMPL4GAOSegg3rWVDclzqnJTdsPPOco3MWj6SL9xN0LDPxCEL
BDysRqL/UiOIoh4v6IXQRx2UWjsNGu4biM1po+Jfumnd9T0zKoEpzu6UN6yPShbx
284LihZSQtughCbhGqkErBOxfjZcvpFOQrqmjEvI+Z/eYg4InfWZemt8Sa92/alE
yAFjK76MUTaUxaAO/gk8XauhvkYOzJJwKpqhbOmlaM7oj55QdzT5/8JxMxVoA6hV
pscHOixk15GVse49PdPV8v47cyTLc/Xi69i+/uUdNVVfuORL1wft1w1xbd0S6Pbe
7Gqax21S7zxcDsrUli7cFheYiqtbQAL0anlIUz8tUOZFz0VQ/zPuFd4rUYZ/o38V
Mrevdk3t6CXNxS4CRXyUW4UejYB1O6Qw12sUue31e3h73d6LiN3NAiN5Qp7SEk1/
fvk+jfOf8vvmtimYvcUK2i0D+vqj4Ec/qRIE/XXuUDBcp22tPL9uWMfWavwTdAj1
Se4SzksTWF+NM0lO0ItonMyPh3ZXcSLhIv/gHrZwEKuWkXCGO4M=
=JmWS
-----END PGP SIGNATURE-----
Merge tag 'net-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.13-rc1, including fixes from bpf, can and
netfilter trees. Self-contained fixes, nothing risky.
Current release - new code bugs:
- dsa: ksz: fix a few bugs found by static-checker in the new driver
- stmmac: fix frame preemption handshake not triggering after
interface restart
Previous releases - regressions:
- make nla_strcmp handle more then one trailing null character
- fix stack OOB reads while fragmenting IPv4 packets in openvswitch
and net/sched
- sctp: do asoc update earlier in sctp_sf_do_dupcook_a
- sctp: delay auto_asconf init until binding the first addr
- stmmac: clear receive all(RA) bit when promiscuous mode is off
- can: mcp251x: fix resume from sleep before interface was brought up
Previous releases - always broken:
- bpf: fix leakage of uninitialized bpf stack under speculation
- bpf: fix masking negation logic upon negative dst register
- netfilter: don't assume that skb_header_pointer() will never fail
- only allow init netns to set default tcp cong to a restricted algo
- xsk: fix xp_aligned_validate_desc() when len == chunk_size to avoid
false positive errors
- ethtool: fix missing NLM_F_MULTI flag when dumping
- can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
- sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_b
- bridge: fix NULL-deref caused by a races between assigning
rx_handler_data and setting the IFF_BRIDGE_PORT bit
Latecomer:
- seg6: add counters support for SRv6 Behaviors"
* tag 'net-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits)
atm: firestream: Use fallthrough pseudo-keyword
net: stmmac: Do not enable RX FIFO overflow interrupts
mptcp: fix splat when closing unaccepted socket
i40e: Remove LLDP frame filters
i40e: Fix PHY type identifiers for 2.5G and 5G adapters
i40e: fix the restart auto-negotiation after FEC modified
i40e: Fix use-after-free in i40e_client_subtask()
i40e: fix broken XDP support
netfilter: nftables: avoid potential overflows on 32bit arches
netfilter: nftables: avoid overflows in nft_hash_buckets()
tcp: Specify cmsgbuf is user pointer for receive zerocopy.
mlxsw: spectrum_mr: Update egress RIF list before route's action
net: ipa: fix inter-EE IRQ register definitions
can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
can: mcp251x: fix resume from sleep before interface was brought up
can: mcp251xfd: mcp251xfd_probe(): add missing can_rx_offload_del() in error path
can: mcp251xfd: mcp251xfd_probe(): fix an error pointer dereference in probe
netfilter: nftables: Fix a memleak from userdata error path in new objects
netfilter: remove BUG_ON() after skb_header_pointer()
netfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL check
...