linux

Author	SHA1	Message	Date
Mauricio Vásquez	704c91e59f	selftests/bpf: Test "bpftool gen min_core_btf" This commit reuses the core_reloc test to check if the BTF files generated with "bpftool gen min_core_btf" are correct. This introduces test_core_btfgen() that runs all the core_reloc tests, but this time the source BTF files are generated by using "bpftool gen min_core_btf". The goal of this test is to check that the generated files are usable, and not to check if the algorithm is creating an optimized BTF file. Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io> Signed-off-by: Rafael David Tinoco <rafael.tinoco@aquasec.com> Signed-off-by: Lorenzo Fontana <lorenzo.fontana@elastic.co> Signed-off-by: Leonardo Di Donato <leonardo.didonato@elastic.co> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220215225856.671072-8-mauricio@kinvolk.io	2022-02-16 10:14:34 -08:00
Rafael David Tinoco	1d1ffbf7f0	bpftool: Gen min_core_btf explanation and examples Add "min_core_btf" feature explanation and one example of how to use it to bpftool-gen man page. Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io> Signed-off-by: Rafael David Tinoco <rafael.tinoco@aquasec.com> Signed-off-by: Lorenzo Fontana <lorenzo.fontana@elastic.co> Signed-off-by: Leonardo Di Donato <leonardo.didonato@elastic.co> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220215225856.671072-7-mauricio@kinvolk.io	2022-02-16 10:13:21 -08:00
Mauricio Vásquez	dc695516b6	bpftool: Implement btfgen_get_btf() The last part of the BTFGen algorithm is to create a new BTF object with all the types that were recorded in the previous steps. This function performs two different steps: 1. Add the types to the new BTF object by using btf__add_type(). Some special logic around struct and unions is implemented to only add the members that are really used in the field-based relocations. The type ID on the new and old BTF objects is stored on a map. 2. Fix all the type IDs on the new BTF object by using the IDs saved in the previous step. Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io> Signed-off-by: Rafael David Tinoco <rafael.tinoco@aquasec.com> Signed-off-by: Lorenzo Fontana <lorenzo.fontana@elastic.co> Signed-off-by: Leonardo Di Donato <leonardo.didonato@elastic.co> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220215225856.671072-6-mauricio@kinvolk.io	2022-02-16 10:10:42 -08:00
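The two passes described above can be sketched on a toy type table. This is a minimal sketch, not bpftool's btfgen_get_btf(). `struct toy_type` and `toy_minimize()` are hypothetical, and each type holds a single reference instead of real BTF members. As in BTF, ID 0 stands for void. The sketch assumes marking was transitive, so no kept type points at a dropped one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified type record: real BTF types carry names,
 * sizes and members. Here each type refers to at most one other type
 * by ID, and ID 0 means "void", as in BTF. */
struct toy_type {
	int kind;
	int ref;   /* ID of the referenced type, 0 = void */
};

/* Copy the marked types of src[1..n-1] into dst and rewrite their
 * references. id_map[old_id] receives the new ID (0 if dropped).
 * Returns the number of dst slots used, including slot 0 (void). */
static size_t toy_minimize(const struct toy_type *src, const bool *marked,
			   size_t n, struct toy_type *dst, int *id_map)
{
	size_t next = 1;

	dst[0] = src[0];
	id_map[0] = 0;

	/* Pass 1: add the marked types and remember old -> new IDs. */
	for (size_t id = 1; id < n; id++) {
		if (!marked[id]) {
			id_map[id] = 0;
			continue;
		}
		dst[next] = src[id];
		id_map[id] = (int)next++;
	}

	/* Pass 2: copied types still hold old IDs, so translate them. */
	for (size_t id = 1; id < next; id++)
		dst[id].ref = id_map[dst[id].ref];

	return next;
}
```

The second pass is needed because a type may refer to one that is copied later, so the final IDs are only known once every type has been added.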
Mauricio Vásquez	a9caaba399	bpftool: Implement "gen min_core_btf" logic This commit implements the logic for the gen min_core_btf command. Specifically, it implements the following functions: - minimize_btf(): receives the paths of the source and destination BTF files and a list of BPF objects. This function records the relocations for all objects and then generates the BTF file by calling btfgen_get_btf() (implemented in the following commit). - btfgen_record_obj(): loads the BTF and BTF.ext sections of the BPF objects and loops through all CO-RE relocations. It uses bpf_core_calc_relo_insn() from libbpf and passes the target spec to btfgen_record_reloc(), which calls one of the following functions depending on the relocation kind. - btfgen_record_field_relo(): uses the target specification to mark all the types that are involved in a field-based CO-RE relocation. In this case types are resolved and marked recursively using btfgen_mark_type(). Only the struct and union members (and their types) involved in the relocation are marked to optimize the size of the generated BTF file. - btfgen_record_type_relo(): marks the types involved in a type-based CO-RE relocation. In this case no members for the struct and union types are marked as libbpf doesn't use them while performing this kind of relocation. Pointed types are marked as they are used by libbpf in this case. - btfgen_record_enumval_relo(): marks the whole enum type for enum-based relocations. Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io> Signed-off-by: Rafael David Tinoco <rafael.tinoco@aquasec.com> Signed-off-by: Lorenzo Fontana <lorenzo.fontana@elastic.co> Signed-off-by: Leonardo Di Donato <leonardo.didonato@elastic.co> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220215225856.671072-5-mauricio@kinvolk.io	2022-02-16 10:05:45 -08:00
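The recursive marking step can be illustrated with a small graph walk. The names below (`struct toy_type`, `toy_mark_type()`) are hypothetical stand-ins, not bpftool's btfgen_mark_type(). The sketch only shows why an already-marked check is needed: C types can be self-referential through pointers.

```c
#include <stdbool.h>

#define TOY_MAX_REFS 4

/* Hypothetical type-graph node: refs[] holds the IDs of the types this
 * one depends on (pointee, member types, ...); 0 means void/none. */
struct toy_type {
	int nr_refs;
	int refs[TOY_MAX_REFS];
};

/* Mark @id and everything reachable from it. Returning early on an
 * already-marked type stops cycles such as
 * struct list_head { struct list_head *next; }. */
static void toy_mark_type(const struct toy_type *types, bool *marked, int id)
{
	if (id == 0 || marked[id])
		return;
	marked[id] = true;
	for (int i = 0; i < types[id].nr_refs; i++)
		toy_mark_type(types, marked, types[id].refs[i]);
}
```

In the usage below, type 1 is a struct whose member is a pointer (type 2) back to itself, and type 3 is an int member. Marking the pointer pulls in the struct and the int, but it leaves the unrelated type 4 unmarked.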
Mauricio Vásquez	0a9f4a20c6	bpftool: Add gen min_core_btf command This command is implemented under the "gen" command in bpftool and the syntax is the following: $ bpftool gen min_core_btf INPUT OUTPUT OBJECT [OBJECT...] INPUT is the file that contains all the BTF types for a kernel and OUTPUT is the path of the minimized BTF file that will be created with only the types needed by the objects. Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io> Signed-off-by: Rafael David Tinoco <rafael.tinoco@aquasec.com> Signed-off-by: Lorenzo Fontana <lorenzo.fontana@elastic.co> Signed-off-by: Leonardo Di Donato <leonardo.didonato@elastic.co> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220215225856.671072-4-mauricio@kinvolk.io	2022-02-16 10:05:45 -08:00
Mauricio Vásquez	8de6cae40b	libbpf: Expose bpf_core_{add,free}_cands() to bpftool Expose bpf_core_add_cands() and bpf_core_free_cands() to handle candidates list. Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io> Signed-off-by: Rafael David Tinoco <rafael.tinoco@aquasec.com> Signed-off-by: Lorenzo Fontana <lorenzo.fontana@elastic.co> Signed-off-by: Leonardo Di Donato <leonardo.didonato@elastic.co> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220215225856.671072-3-mauricio@kinvolk.io	2022-02-16 10:05:45 -08:00
Mauricio Vásquez	adb8fa195e	libbpf: Split bpf_core_apply_relo() BTFGen needs to run the core relocation logic in order to understand which types are involved in a given relocation. Currently bpf_core_apply_relo() calculates and applies a relocation to an instruction. Having both operations in the same function makes it difficult to only calculate the relocation without patching the instruction. This commit splits that logic in two different phases: (1) calculate the relocation and (2) patch the instruction. For the first phase bpf_core_apply_relo() is renamed to bpf_core_calc_relo_insn(), which is now only in charge of calculating the relocation; the second phase uses the already existing bpf_core_patch_insn(). bpf_object__relocate_core() uses both of them and BTFGen will use only bpf_core_calc_relo_insn(). Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io> Signed-off-by: Rafael David Tinoco <rafael.tinoco@aquasec.com> Signed-off-by: Lorenzo Fontana <lorenzo.fontana@elastic.co> Signed-off-by: Leonardo Di Donato <leonardo.didonato@elastic.co> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220215225856.671072-2-mauricio@kinvolk.io	2022-02-16 10:05:42 -08:00
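The calculate/patch split can be pictured as two independent functions. This is a hypothetical sketch: `toy_calc_relo()` and `toy_patch_insn()` are not libbpf's bpf_core_calc_relo_insn() or bpf_core_patch_insn(). Here a "relocation" is reduced to replacing a field offset held in an instruction's immediate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for an instruction and a relocation result. */
struct toy_insn {
	int32_t imm;
};

struct toy_relo_res {
	int32_t orig_val;  /* value emitted by the compiler (local layout) */
	int32_t new_val;   /* value valid for the target kernel */
	bool poison;       /* field not found on the target */
};

/* Phase 1: calculate only. Nothing is modified, so a tool such as
 * BTFGen can call this just to learn what the target resolution is. */
static int toy_calc_relo(int32_t local_off, int32_t target_off,
			 struct toy_relo_res *res)
{
	res->orig_val = local_off;
	res->new_val = target_off;
	res->poison = target_off < 0;
	return 0;
}

/* Phase 2: apply a previously calculated result to one instruction. */
static int toy_patch_insn(struct toy_insn *insn, const struct toy_relo_res *res)
{
	if (res->poison)
		return -1;   /* a real loader would poison the insn instead */
	if (insn->imm != res->orig_val)
		return -1;   /* instruction does not hold the expected value */
	insn->imm = res->new_val;
	return 0;
}
```

A loader calls both phases in sequence. An analysis tool stops after phase 1 and leaves the instruction untouched.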
Hou Tao	8cbf062a25	bpf: Reject kfunc calls that overflow insn->imm Now kfunc call uses s32 to represent the offset between the address of kfunc and __bpf_call_base, but it doesn't check whether or not s32 will be overflowed. The overflow is possible when kfunc is in module and the offset between module and kernel is greater than 2GB. Take arm64 as an example, before commit `b2eed9b588` ("arm64/kernel: kaslr: reduce module randomization range to 2 GB"), the offset between module symbol and __bpf_call_base will be in the 4GB range due to KASLR and may overflow s32. So add an extra check to reject these invalid kfunc calls. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220215065732.3179408-1-houtao1@huawei.com	2022-02-15 10:05:11 -08:00
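The check amounts to asking whether a 64-bit address difference fits in a signed 32-bit immediate. This is a minimal sketch built around a hypothetical helper, `toy_kfunc_offset()`. It is not the verifier's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Store func_addr - call_base in *imm and return true if it fits in
 * an s32; return false (reject the call) otherwise. The subtraction
 * wraps in unsigned 64-bit arithmetic and is then read back as a
 * signed 64-bit offset, so the range test itself cannot overflow. */
static bool toy_kfunc_offset(uint64_t func_addr, uint64_t call_base, int32_t *imm)
{
	int64_t off = (int64_t)(func_addr - call_base);

	if (off < INT32_MIN || off > INT32_MAX)
		return false;   /* e.g. a module symbol more than 2 GiB away */
	*imm = (int32_t)off;
	return true;
}
```

Offsets of up to 2 GiB in either direction are accepted. A symbol 4 GiB away, which the commit says KASLR could produce on arm64, is rejected.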
Alexei Starovoitov	d2b94f33e4	Merge branch 'Make BPF skeleton easier to use from C++ code' Andrii Nakryiko says: ==================== Add minimal C++-specific additions to BPF skeleton codegen to facilitate easier use of C skeletons in C++ applications. These additions don't add any extra ongoing maintenance and allows C++ users to fit pure C skeleton better into their C++ code base. All that without the need to design, implement and support a separate C++ BPF skeleton implementation. v1->v2: - use default argument values in T::open() (Alexei). ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-02-15 09:59:02 -08:00
Andrii Nakryiko	189e0ecabc	selftests/bpf: Add Skeleton templated wrapper as an example Add an example of how to build C++ template-based BPF skeleton wrapper. It's an actually runnable valid use of skeleton through more C++-like interface. Note that skeleton destruction happens implicitly through Skeleton<T>'s destructor. Also make test_cpp runnable as it would have crashed on invalid btf passed into btf_dump__new(). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220212055733.539056-3-andrii@kernel.org	2022-02-15 09:59:01 -08:00
Andrii Nakryiko	bb8ffe61ea	bpftool: Add C++-specific open/load/etc skeleton wrappers Add C++-specific static methods for code-generated BPF skeleton for each skeleton operation: open, open_opts, open_and_load, load, attach, detach, destroy, and elf_bytes. This is to facilitate easier C++ templating on top of pure C BPF skeleton. In C, open/load/destroy/etc "methods" are of the form <skeleton_name>__<method>() to avoid name collision with similar "methods" of other skeletons within the same application. This works well, but is very inconvenient for C++ applications that would like to write generic (templated) wrappers around BPF skeleton to fit in with C++ code base and take advantage of destructors and other convenient C++ constructs. This patch makes it easier to build such generic templated wrappers by additionally defining C++ static methods for skeleton's struct with fixed names. This allows to refer to, say, open method as `T::open()` instead of having to somehow generate `T__open()` function call. Next patch adds an example template to test_cpp selftest to demonstrate how it's possible to have all the operations wrapped in a generic Skeleton<my_skeleton> type without explicitly passing function references. An example of generated declaration section without %1$s placeholders: #ifdef __cplusplus static struct test_attach_probe *open(const struct bpf_object_open_opts *opts = nullptr); static struct test_attach_probe *open_and_load(); static int load(struct test_attach_probe *skel); static int attach(struct test_attach_probe *skel); static void detach(struct test_attach_probe *skel); static void destroy(struct test_attach_probe *skel); static const void *elf_bytes(size_t *sz); #endif /* __cplusplus */ Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220212055733.539056-2-andrii@kernel.org	2022-02-15 09:59:01 -08:00
Andrii Nakryiko	d3b0b80064	selftests/bpf: Fix GCC11 compiler warnings in -O2 mode When compiling selftests in -O2 mode with GCC11, we get three new compilation warnings about potentially uninitialized variables. Compiler is wrong 2 out of 3 times, but this patch makes GCC11 happy anyways, as it doesn't cost us anything and makes optimized selftests build less annoying. The amazing one is tc_redirect case of token that is malloc()'ed before ASSERT_OK_PTR() check is done on it. Seems like GCC pessimistically assumes that libbpf_get_error() will dereference the contents of the pointer (no it won't), so the only way I found to shut GCC up was to do zero-initializing calloc(). This one was new to me. For linfo case, GCC didn't realize that linfo_size will be initialized by the function that is returning linfo_size as out parameter. core_reloc.c case was a real bug, we can goto cleanup before initializing obj. But we don't need to do any clean up, so just continue iteration instead. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220211190927.1434329-1-andrii@kernel.org	2022-02-15 09:58:34 -08:00
Yinjun Zhang	edc21dc909	bpftool: Fix the error when lookup in no-btf maps When reworking btf__get_from_id() in commit `a19f93cfaf` the error handling when calling bpf_btf_get_fd_by_id() changed. Before the rework if bpf_btf_get_fd_by_id() failed the error would not be propagated to callers of btf__get_from_id(), after the rework it is. This led to a change in behavior in print_key_value() that now prints an error when trying to lookup keys in maps with no btf available. Fix this by following the way used in dumping maps to allow to look up keys in no-btf maps, by which it decides whether and where to get the btf info according to the btf value type. Fixes: `a19f93cfaf` ("libbpf: Add internal helper to load BTF data by FD") Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/1644249625-22479-1-git-send-email-yinjun.zhang@corigine.com	2022-02-15 09:03:40 -08:00
Toke Høiland-Jørgensen	9c3de619e1	libbpf: Use dynamically allocated buffer when receiving netlink messages When receiving netlink messages, libbpf was using a statically allocated stack buffer of 4k bytes. This happened to work fine on systems with a 4k page size, but on systems with larger page sizes it can lead to truncated messages. The user-visible impact of this was that libbpf would insist no XDP program was attached to some interfaces because that bit of the netlink message got chopped off. Fix this by switching to a dynamically allocated buffer; we borrow the approach from iproute2 of using recvmsg() with MSG_PEEK|MSG_TRUNC to get the actual size of the pending message before receiving it, adjusting the buffer as necessary. While we're at it, also add retries on interrupted system calls around the recvmsg() call. v2: - Move peek logic to libbpf_netlink_recv(), don't double free on ENOMEM. Fixes: `8bbb77b7c7` ("libbpf: Add various netlink helpers") Reported-by: Zhiqian Guan <zhguan@redhat.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/bpf/20220211234819.612288-1-toke@redhat.com	2022-02-12 07:57:44 -08:00
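The peek-then-receive pattern can be sketched in user-space C. `recv_whole_msg()` and `recvmsg_noeintr()` are hypothetical helpers, not libbpf's libbpf_netlink_recv(). The sketch relies on Linux returning the real datagram length when MSG_TRUNC is passed to recvmsg(). recvmsg(2) documents this for netlink sockets and, since Linux 3.4, for UNIX datagram sockets.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Retry recvmsg() when a signal interrupts it. */
static ssize_t recvmsg_noeintr(int fd, struct msghdr *mh, int flags)
{
	ssize_t len;

	do {
		len = recvmsg(fd, mh, flags);
	} while (len < 0 && errno == EINTR);
	return len;
}

/* Receive one whole datagram into *buf, growing it with realloc() if
 * needed. Returns the message length, or -1 with errno set. On failure
 * *buf is still valid and owned by the caller. */
static ssize_t recv_whole_msg(int fd, char **buf, size_t *cap)
{
	struct iovec iov = { .iov_base = *buf, .iov_len = *cap };
	struct msghdr mh = { .msg_iov = &iov, .msg_iovlen = 1 };
	ssize_t len;

	/* Peek: the message stays queued, and MSG_TRUNC makes the return
	 * value the real length even if it exceeds iov_len. */
	len = recvmsg_noeintr(fd, &mh, MSG_PEEK | MSG_TRUNC);
	if (len < 0)
		return -1;

	if ((size_t)len > *cap) {
		char *nbuf = realloc(*buf, (size_t)len);

		if (!nbuf)
			return -1;
		*buf = nbuf;
		*cap = (size_t)len;
	}

	/* Now dequeue the message into a buffer that is large enough. */
	iov.iov_base = *buf;
	iov.iov_len = *cap;
	return recvmsg_noeintr(fd, &mh, 0);
}
```

A UNIX datagram socketpair() is enough to exercise the helper. With netlink, fd would be the netlink socket instead.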
Andrii Nakryiko	d130e954a0	libbpf: Fix libbpf.map inheritance chain for LIBBPF_0.7.0 Ensure that LIBBPF_0.7.0 inherits everything from LIBBPF_0.6.0. Fixes: `dbdd2c7f8c` ("libbpf: Add API to get/set log_level at per-program level") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220211205235.2089104-1-andrii@kernel.org	2022-02-11 12:59:28 -08:00
Andrii Nakryiko	4407fa06ae	Merge branch 'bpftool: Switch to new versioning scheme (align on libbpf's)' Quentin Monnet says: ==================== Hi, this set aims at updating the way bpftool versions are numbered. Instead of copying the version from the kernel (given that the sources for the kernel and bpftool are shipped together), align it on libbpf's version number, with a fixed offset (6) to avoid going backwards. Please refer to the description of the second commit for details on the motivations. The patchset also adds the number of the version of libbpf that was used to compile to the output of "bpftool version". Bpftool makes such a heavy usage of libbpf that it makes sense to indicate what version was used to build it. v3: - Compute bpftool's version at compile time, but from the macros exposed by libbpf instead of calling a shell to compute $(BPFTOOL_VERSION) in the Makefile. - Drop the commit which would add a "libbpfversion" target to libbpf's Makefile. This is no longer necessary. - Use libbpf's major, minor versions with jsonw_printf() to avoid offsetting the version string to skip the "v" prefix. - Reword documentation change. v2: - Align on libbpf's version number instead of creating an independent versioning scheme. - Use libbpf_version_string() to retrieve and display libbpf's version. - Re-order patches (1 <-> 2). ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2022-02-10 21:09:47 -08:00
Quentin Monnet	9910a74d6e	bpftool: Update versioning scheme, align on libbpf's version number Since the notion of versions was introduced for bpftool, it has been following the version number of the kernel (using the version number corresponding to the tree in which bpftool's sources are located). The rationale was that bpftool's features are loosely tied to BPF features in the kernel, and that we could defer versioning to the kernel repository itself. But this versioning scheme is confusing today, because a bpftool binary should be able to work with both older and newer kernels, even if some of its recent features won't be available on older systems. Furthermore, if bpftool is ported to other systems in the future, keeping a Linux-based version number is not a good option. Looking at other options, we could either have a totally independent scheme for bpftool, or we could align it on libbpf's version number (with an offset on the major version number, to avoid going backwards). The latter comes with a few drawbacks: - We may want bpftool releases in-between two libbpf versions. We can always append pre-release numbers to distinguish versions, although those won't look as "official" as something with a proper release number. But at the same time, having bpftool with version numbers that look "official" hasn't really been an issue so far. - If no new feature lands in bpftool for some time, we may move from e.g. 6.7.0 to 6.8.0 when libbpf levels up and have two different versions which are in fact the same. - Following libbpf's versioning scheme sounds better than kernel's, but ultimately it doesn't make too much sense either, because even though bpftool uses the lib a lot, its behaviour is not that much conditioned by the internal evolution of the library (or by new APIs that it may not use). Having an independent versioning scheme solves the above, but at the cost of heavier maintenance. 
Developers will likely forget to increase the numbers when adding features or bug fixes, and we would take the risk of having to send occasional "catch-up" patches just to update the version number. Based on these considerations, this patch aligns bpftool's version number on libbpf's. This is not a perfect solution, but 1) it's certainly an improvement over the current scheme, 2) the issues raised above are all minor at the moment, and 3) we can still move to an independent scheme in the future if we realise we need it. Given that libbpf is currently at version 0.7.0, and bpftool, before this patch, was at 5.16, we use an offset of 6 for the major version, bumping bpftool to 6.7.0. Libbpf does not export its patch number; leave bpftool's patch number at 0 for now. It remains possible to manually override the version number by setting BPFTOOL_VERSION when calling make. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220210104237.11649-3-quentin@isovalent.com	2022-02-10 21:09:47 -08:00
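The numbering rule takes libbpf's major version plus a fixed offset of 6, then libbpf's minor version, then a patch level of 0. libbpf exposes its version through macros such as LIBBPF_MAJOR_VERSION and LIBBPF_MINOR_VERSION. The hypothetical formatter below takes the two numbers as arguments instead, to show only the arithmetic.

```c
#include <stdio.h>

/* Offset chosen so that bpftool's major version does not go backwards
 * (5.16 -> 6.7.0 while libbpf was at 0.7). */
#define BPFTOOL_MAJOR_OFFSET 6

/* Hypothetical helper: build "MAJOR.MINOR.0" from libbpf's numbers.
 * The patch level stays 0 because libbpf does not export its own. */
static int format_bpftool_version(char *buf, size_t len,
				  unsigned int libbpf_major,
				  unsigned int libbpf_minor)
{
	return snprintf(buf, len, "%u.%u.0",
			libbpf_major + BPFTOOL_MAJOR_OFFSET, libbpf_minor);
}
```

For libbpf 0.7 this yields "6.7.0", matching the version named in the commit.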
Quentin Monnet	61fce9693f	bpftool: Add libbpf's version number to "bpftool version" output To help users check what version of libbpf is being used with bpftool, print the number along with bpftool's own version number. Output: $ ./bpftool version ./bpftool v5.16.0 using libbpf v0.7 features: libbfd, libbpf_strict, skeletons $ ./bpftool version --json --pretty { "version": "5.16.0", "libbpf_version": "0.7", "features": { "libbfd": true, "libbpf_strict": true, "skeletons": true } } Note that libbpf does not expose its patch number. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220210104237.11649-2-quentin@isovalent.com	2022-02-10 21:09:47 -08:00
Song Liu	4cc0991abd	bpf: Fix bpf_prog_pack build for ppc64_defconfig bpf_prog_pack causes build error with powerpc ppc64_defconfig: kernel/bpf/core.c:830:23: error: variably modified 'bitmap' at file scope 830 | unsigned long bitmap[BITS_TO_LONGS(BPF_PROG_CHUNK_COUNT)]; | ^~~~~~ This is because the macro expands as: unsigned long bitmap[((((((1UL) << (16 + __pte_index_size)) / (1 << 6))) \ + ((sizeof(long) * 8)) - 1) / ((sizeof(long) * 8)))]; where __pte_index_size is a global variable. Fix it by turning bitmap into a 0-length array. Fixes: `57631054fa` ("bpf: Introduce bpf_prog_pack allocator") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220211024939.2962537-1-song@kernel.org	2022-02-10 18:59:32 -08:00
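The failure comes from a file-scope array whose length depends on a run-time variable. In standalone C, one way to express the fix is to let the array size be decided when the object is allocated. The commit message describes a 0-length array, while this sketch uses the standard C99 spelling, a flexible array member. `struct pack`, `pack_alloc()` and `chunk_count` are hypothetical stand-ins for bpf_prog_pack and BPF_PROG_CHUNK_COUNT.

```c
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Stand-in for a count known only at run time; on ppc64 the real one
 * depends on the __pte_index_size variable. */
static unsigned long chunk_count = 1000;

/* "unsigned long bitmap[BITS_TO_LONGS(chunk_count)];" inside a
 * file-scope struct is rejected as variably modified. A flexible
 * array member has no size in the type; storage is added at
 * allocation time instead. */
struct pack {
	unsigned long nr_chunks;
	unsigned long bitmap[];
};

static struct pack *pack_alloc(void)
{
	size_t words = BITS_TO_LONGS(chunk_count);
	struct pack *p = calloc(1, sizeof(*p) + words * sizeof(unsigned long));

	if (p)
		p->nr_chunks = chunk_count;
	return p;
}
```

The trade-off is that sizeof(struct pack) no longer covers the bitmap, so every allocation site has to add the run-time size itself.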
Lorenzo Bianconi	a5a358abbc	selftest/bpf: Check invalid length in test_xdp_update_frags Update test_xdp_update_frags adding a test for a buffer size set to (MAX_SKB_FRAGS + 2) * PAGE_SIZE. The kernel is supposed to return -ENOMEM. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/3e4afa0ee4976854b2f0296998fe6754a80b62e5.1644366736.git.lorenzo@kernel.org	2022-02-10 17:48:30 -08:00
Daniel Borkmann	85fbd23303	Merge branch 'bpf-light-skel' Alexei Starovoitov says: ==================== The libbpf performs a set of complex operations to load BPF programs. With "loader program" and "CO-RE in the kernel" the loading job of libbpf was diminished. The light skeleton became lean enough to perform program loading and map creation tasks without libbpf. It's now possible to tweak it further to make light skeleton usable out of user space and out of kernel module. This allows bpf_preload.ko to drop user-mode-driver usage, drop host compiler dependency, allow cross compilation and simplify the code. It's a building block toward safe and portable kernel modules. v3->v4: - inlined skel_prep_init_value() as direct assignment in lskel v2->v3: - dropped vm_mmap() and switched to bpf_loader_ctx->flags & KERNEL approach. It allows bpf_preload.ko to be built-in. The kernel is able to load bpf progs before init process starts. - added comments (Yonghong's review) - added error checks in lskel (Andrii's review) - added Acks in all but 2nd patch. v1->v2: - removed redundant anon struct and added comments (Andrii's review) - added Yonghong's ack - fixed build warning when JIT is off ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2022-02-10 23:32:14 +01:00
Alexei Starovoitov	cb80ddc671	bpf: Convert bpf_preload.ko to use light skeleton. The main change is a move of the single line #include "iterators.lskel.h" from iterators/iterators.c to bpf_preload_kern.c. Which means that generated light skeleton can be used from user space or user mode driver like iterators.c or from the kernel module or the kernel itself. The direct use of light skeleton from the kernel module simplifies the code, since UMD is no longer necessary. The libbpf.a required user space and UMD. The CO-RE in the kernel and generated "loader bpf program" used by the light skeleton are capable to perform complex loading operations traditionally provided by libbpf. In addition UMD approach was launching UMD process every time bpffs has to be mounted. With light skeleton in the kernel the bpf_preload kernel module loads bpf iterators once and pins them multiple times into different bpffs mounts. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220209232001.27490-6-alexei.starovoitov@gmail.com	2022-02-10 23:31:51 +01:00
Alexei Starovoitov	d7beb3d6ab	bpf: Update iterators.lskel.h. Light skeleton and skel_internal.h have changed. Update iterators.lskel.h. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220209232001.27490-5-alexei.starovoitov@gmail.com	2022-02-10 23:31:51 +01:00
Alexei Starovoitov	28d743f671	bpftool: Generalize light skeleton generation. Generalize light skeleton by hiding mmap details in skel_internal.h In this form generated lskel.h is usable both by user space and by the kernel. Note that previously #include <bpf/bpf.h> was in *.lskel.h file. To avoid #ifdef-s in a generated lskel.h the include of bpf.h is moved to skel_internal.h, but skel_internal.h is also used by gen_loader.c which is part of libbpf. Therefore skel_internal.h does #include "bpf.h" in case of user space, so gen_loader.c and lskel.h have necessary definitions. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220209232001.27490-4-alexei.starovoitov@gmail.com	2022-02-10 23:31:51 +01:00
Alexei Starovoitov	6fe65f1b4d	libbpf: Prepare light skeleton for the kernel. Prepare light skeleton to be used in the kernel module and in the user space. The look and feel of lskel.h is mostly the same with the difference that for user space the skel->rodata is the same pointer before and after skel_load operation, while in the kernel the skel->rodata after skel_open and the skel->rodata after skel_load are different pointers. Typical usage of skeleton remains the same for kernel and user space: skel = my_bpf__open(); skel->rodata->my_global_var = init_val; err = my_bpf__load(skel); err = my_bpf__attach(skel); // access skel->rodata->my_global_var; // access skel->bss->another_var; Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220209232001.27490-3-alexei.starovoitov@gmail.com	2022-02-10 23:31:51 +01:00
Alexei Starovoitov	b1d18a7574	bpf: Extend sys_bpf commands for bpf_syscall programs. bpf_syscall programs can be used directly by the kernel modules to load programs and create maps via kernel skeleton. . Export bpf_sys_bpf syscall wrapper to be used in kernel skeleton. . Export bpf_map_get to be used in kernel skeleton. . Allow prog_run cmd for bpf_syscall programs with recursion check. . Enable link_create and raw_tp_open cmds. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220209232001.27490-2-alexei.starovoitov@gmail.com	2022-02-10 23:31:51 +01:00
kernel test robot	4f5e483b8c	net: dsa: qca8k: fix noderef.cocci warnings drivers/net/dsa/qca8k.c:422:37-43: ERROR: application of sizeof to pointer sizeof when applied to a pointer typed expression gives the size of the pointer Generated by: scripts/coccinelle/misc/noderef.cocci Fixes: `90386223f4` ("net: dsa: qca8k: add support for larger read/write size with mgmt Ethernet") CC: Ansuel Smith <ansuelsmth@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: kernel test robot <lkp@intel.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220209221304.GA17529@d2214a582157 Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-10 10:56:00 -08:00
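The class of bug that noderef.cocci reports is easy to reproduce: sizeof applied to a pointer measures the pointer, not the object it points to. The struct and helpers below are hypothetical and are not the qca8k code.

```c
#include <stddef.h>
#include <string.h>

struct mgmt_reply {             /* hypothetical payload */
	unsigned int data[4];
};

/* Buggy pattern: sizeof(r) is the size of the pointer
 * (8 bytes on 64-bit), not of struct mgmt_reply. */
static size_t reply_len_buggy(const struct mgmt_reply *r)
{
	return sizeof(r);
}

/* Fixed: dereference so sizeof measures the pointed-to struct. */
static size_t reply_len_fixed(const struct mgmt_reply *r)
{
	return sizeof(*r);
}

/* Copy a whole reply using the correct length. */
static void reply_copy(struct mgmt_reply *dst, const struct mgmt_reply *src)
{
	memcpy(dst, src, reply_len_fixed(src));
}
```

sizeof does not evaluate its operand, so `sizeof(*r)` is safe even when r is NULL.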
Minghao Chi (CGEL ZTE)	d8c2858181	net/switchdev: use struct_size over open coded arithmetic Replace zero-length array with flexible-array member and make use of the struct_size() helper in kmalloc(). For example: struct switchdev_deferred_item { ... unsigned long data[]; }; Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi (CGEL ZTE) <chi.minghao@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-10 15:37:47 +00:00
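struct_size() computes the size of a structure plus a number of trailing flexible-array elements, and returns SIZE_MAX on overflow instead of wrapping. The kernel macro is not available in user space. This sketch therefore uses a hypothetical `item_size()` helper and a simplified `struct deferred_item` to show the same calculation.

```c
#include <stdint.h>
#include <stdlib.h>

struct deferred_item {            /* simplified, hypothetical layout */
	int kind;
	unsigned long data[];     /* flexible array member */
};

/* User-space stand-in for struct_size(): header plus @count trailing
 * elements, saturating to SIZE_MAX on overflow so that a subsequent
 * malloc() fails instead of returning a buffer that is too small. */
static size_t item_size(size_t count)
{
	/* sizeof does not evaluate its operand, so the null pointer
	 * is only used to name the element type. */
	const size_t elem = sizeof(((struct deferred_item *)0)->data[0]);

	if (count > (SIZE_MAX - sizeof(struct deferred_item)) / elem)
		return SIZE_MAX;
	return sizeof(struct deferred_item) + count * elem;
}
```

Compared with an open-coded `sizeof(*item) + count * sizeof(unsigned long)`, the element type is derived from the member itself. A later change to its type cannot silently desynchronize the arithmetic.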
Guillaume Nault	dc513a405c	ipv4: Reject again rules with high DSCP values Commit `563f8e97e0` ("ipv4: Stop taking ECN bits into account in fib4-rules") replaced the validation test on frh->tos. While the new test is stricter for ECN bits, it doesn't detect the use of high order DSCP bits. This would be fine if IPv4 could properly handle them. But currently, most IPv4 lookups are done with the three high DSCP bits masked. Therefore, using these bits doesn't lead to the expected result. Let's reject such configurations again, so that nobody starts to use and make any assumption about how the stack handles the three high order DSCP bits in fib4 rules. Fixes: `563f8e97e0` ("ipv4: Stop taking ECN bits into account in fib4-rules") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-10 15:33:33 +00:00
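In the IPv4 TOS byte, the low two bits are ECN and the upper six are DSCP, so the "three high order DSCP bits" correspond to mask 0xE0. The validator below is a hypothetical sketch of a check in that spirit, rejecting both ECN bits and the three high DSCP bits. It is not the kernel's fib4 rule code.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOS_ECN_MASK  0x03  /* bits 1-0: ECN */
#define TOS_HIGH_DSCP 0xE0  /* bits 7-5: three high-order DSCP bits */

/* Hypothetical validator for the tos value of an IPv4 FIB rule:
 * only DSCP bits 4-2 (mask 0x1C) may be set. */
static bool fib4_rule_tos_ok(uint8_t tos)
{
	if (tos & TOS_ECN_MASK)
		return false;   /* ECN bits never select a route */
	if (tos & TOS_HIGH_DSCP)
		return false;   /* masked out by most IPv4 lookups today */
	return true;
}
```

Under this rule, a common marking such as DSCP EF (TOS byte 0xB8) is refused rather than silently matched on only part of its bits.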
Subbaraya Sundeep	4b0385bc8e	octeontx2-pf: Add TC feature for VFs This patch adds TC feature for VFs also. When MCAM rules are allocated for a VF then either TC or ntuple filters can be used. Below are the commands to use TC feature for a VF(say lbk0): devlink dev param set pci/0002:01:00.1 name mcam_count value 16 \ cmode runtime ethtool -K lbk0 hw-tc-offload on ifconfig lbk0 up tc qdisc add dev lbk0 ingress tc filter add dev lbk0 parent ffff: protocol ip flower skip_sw \ dst_mac 98:03:9b:83:aa:12 action police rate 100Mbit burst 5000 Also to modify any fields of the hardware context with NIX_AQ_INSTOP_WRITE command then corresponding masks of those fields must be set as per hardware. This was missing in ingress ratelimiting context. This patch sets those masks also. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-10 15:32:54 +00:00
Eric Dumazet	ede6c39c4f	net: make net->dev_unreg_count atomic Having to acquire rtnl from netdev_run_todo() for every dismantled device is not desirable when/if rtnl is under stress. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-10 15:30:26 +00:00
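Making the counter atomic lets the teardown path decrement it without taking a global lock such as RTNL. The minimal C11 sketch below shows that idea. `struct toy_net` and its helpers are hypothetical stand-ins for struct net and the netdev teardown path, not the kernel code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-namespace state with a lock-free counter. */
struct toy_net {
	atomic_int dev_unreg_count;
};

/* A device was queued for unregistration. */
static void toy_unregister_queued(struct toy_net *net)
{
	atomic_fetch_add(&net->dev_unreg_count, 1);
}

/* A device finished unregistering. Returns true when it was the last
 * pending one, i.e. when a waiter could be woken. */
static bool toy_unregister_done(struct toy_net *net)
{
	return atomic_fetch_sub(&net->dev_unreg_count, 1) == 1;
}
```

atomic_fetch_sub() returns the previous value, so a result of 1 means the counter has just reached zero.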
Venkata Sudheer Kumar Bhavaraju	ca2d5f1ff0	qed: prevent a fw assert during device shutdown Device firmware can assert if the device shutdown path in driver encounters async events from mfw (processed in qed_mcp_handle_events()) after qed_mcp_unload_req() returns. A call to qed_mcp_unload_req() currently marks the device as inactive and thus stops any new events, but there is a window where in-flight events might still be received by the driver. To prevent this race condition, atomically set QED_MCP_BYPASS_PROC_BIT in qed_mcp_unload_req() to make sure qed_mcp_handle_events() ignores all events. Wait for any event that might already be in-process to complete by monitoring QED_MCP_IN_PROCESSING_BIT. Signed-off-by: Pravin Kumar Ganesh Dhende <pdhende@marvell.com> Signed-off-by: Venkata Sudheer Kumar Bhavaraju <vbhavaraju@marvell.com> Signed-off-by: Alok Prasad <palok@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-10 15:27:44 +00:00
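The two-bit handshake can be sketched with C11 atomics. `struct toy_mcp`, the flag names and both functions are hypothetical stand-ins for the qed code. The sketch assumes events are handled from a single context, so one IN_PROCESSING bit is enough.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MCP_BYPASS_PROC   (1u << 0)  /* unload started: ignore events */
#define MCP_IN_PROCESSING (1u << 1)  /* an event handler is running */

struct toy_mcp {
	atomic_uint flags;
	int handled;                 /* events actually processed */
};

/* Event path. IN_PROCESSING is set *before* BYPASS is checked (both
 * sequentially consistent), so either the unload path sees the
 * handler running, or the handler sees BYPASS and backs off. */
static bool toy_handle_event(struct toy_mcp *mcp)
{
	atomic_fetch_or(&mcp->flags, MCP_IN_PROCESSING);
	if (atomic_load(&mcp->flags) & MCP_BYPASS_PROC) {
		atomic_fetch_and(&mcp->flags, ~MCP_IN_PROCESSING);
		return false;
	}
	mcp->handled++;              /* stand-in for the real handler */
	atomic_fetch_and(&mcp->flags, ~MCP_IN_PROCESSING);
	return true;
}

/* Unload path: block new events, then wait for one already in flight. */
static void toy_unload_req(struct toy_mcp *mcp)
{
	atomic_fetch_or(&mcp->flags, MCP_BYPASS_PROC);
	while (atomic_load(&mcp->flags) & MCP_IN_PROCESSING)
		;                    /* a driver would sleep/poll with a timeout */
}
```

Checking BYPASS first and only then setting IN_PROCESSING would reopen the window: unload could set BYPASS and see no handler running between those two steps.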
David S. Miller	57ea56b05b	Merge branch 'ping6-cmsg' Jakub Kicinski says: ==================== net: ping6: support basic socket cmsgs Add support for common SOL_SOCKET cmsgs in ICMPv6 sockets. Extend the cmsg tests to cover more cmsgs and socket types. SOL_IPV6 cmsgs to follow. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-10 15:04:52 +00:00
Jakub Kicinski	af6ca20591	selftests: net: test standard socket cmsgs across UDP and ICMP sockets Test TIMESTAMPING and TXTIME across UDP / ICMP and IP versions. Before ICMPv6 support: # ./tools/testing/selftests/net/cmsg_time.sh Case ICMPv6 - ts cnt returned '0', expected '2' Case ICMPv6 - ts0 SCHED returned '', expected 'OK' Case ICMPv6 - ts0 SND returned '', expected 'OK' Case ICMPv6 - TXTIME abs returned '', expected 'OK' Case ICMPv6 - TXTIME rel returned '', expected 'OK' FAIL - 5/36 cases failed After: # ./tools/testing/selftests/net/cmsg_time.sh OK Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-10 15:04:52 +00:00
Jakub Kicinski	eb8f3116fb	selftests: net: cmsg_sender: support Tx timestamping Support requesting Tx timestamps: $ ./cmsg_sender -p i -t -4 $tgt 123 -d 1000 SCHED ts0 61us SND ts0 1071us Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-10 15:04:52 +00:00
Jakub Kicinski	4d397424a5	selftests: net: cmsg_sender: support setting SO_TXTIME Add ability to send delayed packets. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-10 15:04:52 +00:00
Jakub Kicinski	9bbfbc92c6	selftests: net: cmsg_so_mark: test with SO_MARK set by setsockopt Test if setting SO_MARK with setsockopt works and if cmsg takes precedence over it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-10 15:04:51 +00:00
Jakub Kicinski	0344488e11	selftests: net: cmsg_so_mark: test ICMP and RAW sockets Use new capabilities of cmsg_sender to test ICMP and RAW sockets, previously only UDP was tested. Before SO_MARK support was added to ICMPv6: # ./cmsg_so_mark.sh Case ICMP rejection returned 0, expected 1 FAIL - 1/12 cases failed After: # ./cmsg_so_mark.sh OK Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-10 15:04:51 +00:00
Jakub Kicinski	de17e305a8	selftests: net: cmsg_sender: support icmp and raw sockets Support sending fake ICMP(v6) messages and UDP via RAW sockets. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-10 15:04:51 +00:00
Jakub Kicinski	49b7861302	selftests: net: make cmsg_so_mark ready for more options Parametrize the code so that it can support UDP and ICMP sockets in the future, and more cmsg types. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-10 15:04:51 +00:00
Jakub Kicinski	a086ee24cc	selftests: net: rename cmsg_so_mark Rename the file in prep for generalization. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-10 15:04:51 +00:00
Jakub Kicinski	3ebb0b1032	net: ping6: support setting socket options via cmsg Minor reordering of the code and a call to sock_cmsg_send() gives us support for setting the common socket options via cmsg (the usual ones - SO_MARK, SO_TIMESTAMPING_OLD, SCM_TXTIME). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-10 15:04:51 +00:00
Jakub Kicinski	e7b060460f	net: ping6: support packet timestamping Nothing prevents the user from requesting timestamping on ping6 sockets, yet timestamps are not going to be reported. Plumb the flags through. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-10 15:04:51 +00:00
Jakub Kicinski	4265223946	net: ping6: remove a pr_debug() statement We have ftrace and BPF today, there's no need for printing arguments at the start of a function. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-10 15:04:51 +00:00
David S. Miller	9557167bc6	Merge tag 'ieee802154-for-davem-2022-02-10' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next Stefan Schmidt says: ==================== pull-request: ieee802154-next 2022-02-10 An update from ieee802154 for your net-next tree. There is more ongoing in ieee802154 than usual. This will be the first pull request for this cycle, but I expect one more, depending on review and rework times. Pavel Skripkin ported the atusb driver over to the new USB api to avoid uninit problems as well as making use of the modern api without kmalloc() needs in the driver. Miquel Raynal landed some changes to ensure proper frame checksum checking with hwsim, documenting our use of wake and stop_queue and eliding a magic value by using the proper define. David Girault documented the address struct used in ieee802154. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-10 14:28:04 +00:00
David S. Miller	adc27288f2	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-02-09 This series contains updates to ice driver only. Brett adds support for QinQ. This begins with code refactoring and re-organization of VLAN configuration functions to allow for introduction of VSI VLAN ops to enable setting and calling of respective operations based on device support of single or double VLANs. Implementations are added for outer VLAN support. To support QinQ, the device must be set to double VLAN mode (DVM). In order for this to occur, the DDP package and NVM must also support DVM. Functions to determine compatibility and properly configure the device are added as well as setting the proper bits to advertise and utilize the proper offloads. Support for VIRTCHNL_VF_OFFLOAD_VLAN_V2 is also included to allow for VF to negotiate and utilize this functionality. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-10 11:00:13 +00:00
Jakub Kicinski	4523082982	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next 1) Conntrack sets on CHECKSUM_UNNECESSARY for UDP packet with no checksum, from Kevin Mitchell. 2) skb->priority support for nfqueue, from Nicolas Dichtel. 3) Remove conntrack extension register API, from Florian Westphal. 4) Move nat destroy hook to nf_nat_hook instead, to remove nf_ct_ext_destroy(), also from Florian. 5) Wrap pptp conntrack NAT hooks into single structure, from Florian Westphal. 6) Support for tcp option set to noop for nf_tables, also from Florian. 7) Do not run x_tables comment match from packet path in nf_tables, from Florian Westphal. 8) Replace spinlock by cmpxchg() loop to update missed ct event, from Florian Westphal. 9) Wrap cttimeout hooks into single structure, from Florian. 10) Add fast nft_cmp expression for up to 16-bytes. 11) Use cb->ctx to store context in ctnetlink dump, instead of using cb->args[], from Florian Westphal. 
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: ctnetlink: use dump structure instead of raw args nfqueue: enable to set skb->priority netfilter: nft_cmp: optimize comparison for 16-bytes netfilter: cttimeout: use option structure netfilter: ecache: don't use nf_conn spinlock netfilter: nft_compat: suppress comment match netfilter: exthdr: add support for tcp option removal netfilter: conntrack: pptp: use single option structure netfilter: conntrack: remove extension register api netfilter: conntrack: handle ->destroy hook via nat_ops instead netfilter: conntrack: move extension sizes into core netfilter: conntrack: make all extensions 8-byte alignned netfilter: nfqueue: enable to get skb->priority netfilter: conntrack: mark UDP zero checksum as CHECKSUM_UNNECESSARY ==================== Link: https://lore.kernel.org/r/20220209133616.165104-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-09 21:35:08 -08:00
Sebastian Andrzej Siewior	4f9bf2a2f5	tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH. Commit `9652dc2eb9` ("tcp: relax listening_hash operations") removed the need to disable bottom half while acquiring listening_hash.lock. There are still two callers left which disable bottom half before the lock is acquired. On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts as a lock to ensure that resources, that are protected by disabling bottom halves, remain protected. This leads to a circular locking dependency if the lock acquired with disabled bottom halves is also acquired with enabled bottom halves followed by disabling bottom halves. This is the reverse locking order. It has been observed with inet_listen_hashbucket::lock: local_bh_disable() + spin_lock(&ilb->lock): inet_listen() inet_csk_listen_start() sk->sk_prot->hash() := inet_hash() local_bh_disable() __inet_hash() spin_lock(&ilb->lock); acquire(&ilb->lock); Reverse order: spin_lock(&ilb2->lock) + local_bh_disable(): tcp_seq_next() listening_get_next() spin_lock(&ilb2->lock); acquire(&ilb2->lock); tcp4_seq_show() get_tcp4_sock() sock_i_ino() read_lock_bh(&sk->sk_callback_lock); acquire(softirq_ctrl) // <---- whoops acquire(&sk->sk_callback_lock) Drop local_bh_disable() around __inet_hash() which acquires listening_hash->lock. Split inet_unhash() and acquire the listen_hashbucket lock without disabling bottom halves; the inet_ehash lock with disabled bottom halves. Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lkml.kernel.org/r/12d6f9879a97cd56c09fb53dee343cbb14f7f1f7.camel@gmx.de Link: https://lkml.kernel.org/r/X9CheYjuXWc75Spa@hirez.programming.kicks-ass.net Link: https://lore.kernel.org/r/YgQOebeZ10eNx1W6@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-09 21:28:36 -08:00
Jakub Kicinski	1127170d45	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2022-02-09 We've added 126 non-merge commits during the last 16 day(s) which contain a total of 201 files changed, 4049 insertions(+), 2215 deletions(-). The main changes are: 1) Add custom BPF allocator for JITs that pack multiple programs into a huge page to reduce iTLB pressure, from Song Liu. 2) Add __user tagging support in vmlinux BTF and utilize it from BPF verifier when generating loads, from Yonghong Song. 3) Add per-socket fast path check guarding from cgroup/BPF overhead when used by only some sockets, from Pavel Begunkov. 4) Continued libbpf deprecation work of APIs/features and removal of their usage from samples, selftests, libbpf & bpftool, from Andrii Nakryiko and various others. 5) Improve BPF instruction set documentation by adding byte swap instructions and cleaning up load/store section, from Christoph Hellwig. 6) Switch BPF preload infra to light skeleton and remove libbpf dependency from it, from Alexei Starovoitov. 7) Fix architecture-agnostic macros in libbpf for accessing syscall arguments from BPF progs for non-x86 architectures, from Ilya Leoshkevich. 8) Rework port members in struct bpf_sk_lookup and struct bpf_sock to be of 16-bit field with anonymous zero padding, from Jakub Sitnicki. 9) Add new bpf_copy_from_user_task() helper to read memory from a different task than current. Add ability to create sleepable BPF iterator progs, from Kenny Yu. 10) Implement XSK batching for ice's zero-copy driver used by AF_XDP and utilize TX batching API from XSK buffer pool, from Maciej Fijalkowski. 11) Generate temporary netns names for BPF selftests to avoid naming collisions, from Hangbin Liu. 12) Implement bpf_core_types_are_compat() with limited recursion for in-kernel usage, from Matteo Croce. 
13) Simplify pahole version detection and finally enable CONFIG_DEBUG_INFO_DWARF5 to be selected with CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor. 14) Misc minor fixes to libbpf and selftests from various folks. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (126 commits) selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide libbpf: Fix compilation warning due to mismatched printf format selftests/bpf: Test BPF_KPROBE_SYSCALL macro libbpf: Add BPF_KPROBE_SYSCALL macro libbpf: Fix accessing the first syscall argument on s390 libbpf: Fix accessing the first syscall argument on arm64 libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL selftests/bpf: Skip test_bpf_syscall_macro's syscall_arg1 on arm64 and s390 libbpf: Fix accessing syscall arguments on riscv libbpf: Fix riscv register names libbpf: Fix accessing syscall arguments on powerpc selftests/bpf: Use PT_REGS_SYSCALL_REGS in bpf_syscall_macro libbpf: Add PT_REGS_SYSCALL_REGS macro selftests/bpf: Fix an endianness issue in bpf_syscall_macro test bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE bpf: Fix leftover header->pages in sparc and powerpc code. libbpf: Fix signedness bug in btf_dump_array_data() selftests/bpf: Do not export subtest as standalone test bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures ... ==================== Link: https://lore.kernel.org/r/20220209210050.8425-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-09 18:40:56 -08:00
Menglong Dong	5cad527d5f	net: drop_monitor: support drop reason In the commit `c504e5c2f9` ("net: skb: introduce kfree_skb_reason()") drop reason is introduced to the tracepoint of kfree_skb. Therefore, drop_monitor is able to report the drop reason to users by netlink. The drop reasons are reported as string to users, which is exactly the same as what we do when reporting it to ftrace. Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220209060838.55513-1-imagedong@tencent.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-09 17:25:57 -08:00


1073530 Commits