linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-07 19:41:31 +00:00

Author	SHA1	Message	Date
Yonghong Song	3144bfa507	bpf: Fix a compilation failure with clang lto build When building the kernel with clang lto (CONFIG_LTO_CLANG_FULL=y), the following compilation error will appear: $ make LLVM=1 LLVM_IAS=1 -j ... ld.lld: error: ld-temp.o <inline asm>:26889:1: symbol 'cgroup_storage_map_btf_ids' is already defined cgroup_storage_map_btf_ids:; ^ make[1]: *** [/.../bpf-next/scripts/Makefile.vmlinux_o:61: vmlinux.o] Error 1 In local_storage.c, we have BTF_ID_LIST_SINGLE(cgroup_storage_map_btf_ids, struct, bpf_local_storage_map) Commit `c4bcfb38a9` ("bpf: Implement cgroup storage available to non-cgroup-attached bpf progs") added the above identical BTF_ID_LIST_SINGLE definition in bpf_cgrp_storage.c. With duplicated definitions, llvm linker complains with lto build. Also, extracting btf_id of 'struct bpf_local_storage_map' is defined four times for sk, inode, task and cgrp local storages. Let us define a single global one with a different name than cgroup_storage_map_btf_ids, which also fixed the lto compilation error. Fixes: `c4bcfb38a9` ("bpf: Implement cgroup storage available to non-cgroup-attached bpf progs") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221130052147.1591625-1-yhs@fb.com	2022-11-30 17:13:25 -08:00
Pengcheng Yang	89903dcb3c	selftests/bpf: Add ingress tests for txmsg with apply_bytes Currently, the ingress redirect is not covered in "txmsg test apply". Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/1669718441-2654-5-git-send-email-yangpc@wangsu.com	2022-12-01 01:07:41 +01:00
Pengcheng Yang	9072931f02	bpf, sockmap: Fix data loss caused by using apply_bytes on ingress redirect Use apply_bytes on ingress redirect, when apply_bytes is less than the length of msg data, some data may be skipped and lost in bpf_tcp_ingress(). If there is still data in the scatterlist that has not been consumed, we cannot move the msg iter. Fixes: `604326b41a` ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/1669718441-2654-4-git-send-email-yangpc@wangsu.com	2022-12-01 01:07:36 +01:00
Pengcheng Yang	a351d6087b	bpf, sockmap: Fix missing BPF_F_INGRESS flag when using apply_bytes When redirecting, we use sk_msg_to_ingress() to get the BPF_F_INGRESS flag from the msg->flags. If apply_bytes is used and it is larger than the current data being processed, sk_psock_msg_verdict() will not be called when sendmsg() is called again. At this time, the msg->flags is 0, and we lost the BPF_F_INGRESS flag. So we need to save the BPF_F_INGRESS flag in sk_psock and use it when redirection. Fixes: `8934ce2fd0` ("bpf: sockmap redirect ingress support") Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/1669718441-2654-3-git-send-email-yangpc@wangsu.com	2022-12-01 01:07:32 +01:00
Pengcheng Yang	7a9841ca02	bpf, sockmap: Fix repeated calls to sock_put() when msg has more_data In tcp_bpf_send_verdict() redirection, the eval variable is assigned to __SK_REDIRECT after the apply_bytes data is sent, if msg has more_data, sock_put() will be called multiple times. We should reset the eval variable to __SK_NONE every time more_data starts. This causes: IPv4: Attempt to release TCP socket in state 1 00000000b4c925d7 ------------[ cut here ]------------ refcount_t: addition on 0; use-after-free. WARNING: CPU: 5 PID: 4482 at lib/refcount.c:25 refcount_warn_saturate+0x7d/0x110 Modules linked in: CPU: 5 PID: 4482 Comm: sockhash_bypass Kdump: loaded Not tainted 6.0.0 #1 Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014 Call Trace: <TASK> __tcp_transmit_skb+0xa1b/0xb90 ? __alloc_skb+0x8c/0x1a0 ? __kmalloc_node_track_caller+0x184/0x320 tcp_write_xmit+0x22a/0x1110 __tcp_push_pending_frames+0x32/0xf0 do_tcp_sendpages+0x62d/0x640 tcp_bpf_push+0xae/0x2c0 tcp_bpf_sendmsg_redir+0x260/0x410 ? preempt_count_add+0x70/0xa0 tcp_bpf_send_verdict+0x386/0x4b0 tcp_bpf_sendmsg+0x21b/0x3b0 sock_sendmsg+0x58/0x70 __sys_sendto+0xfa/0x170 ? xfd_validate_state+0x1d/0x80 ? switch_fpu_return+0x59/0xe0 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x37/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: `cd9733f5d7` ("tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function") Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/1669718441-2654-2-git-send-email-yangpc@wangsu.com	2022-12-01 01:07:21 +01:00
Alexei Starovoitov	c67cae551f	bpf: Tighten ptr_to_btf_id checks. The networking programs typically don't require CAP_PERFMON, but through kfuncs like bpf_cast_to_kern_ctx() they can access memory through PTR_TO_BTF_ID. In such case enforce CAP_PERFMON. Also make sure that only GPL programs can access kernel data structures. All kfuncs require GPL already. Also remove allow_ptr_to_map_access. It's the same as allow_ptr_leaks and different name for the same check only causes confusion. Fixes: `fd264ca020` ("bpf: Add a kfunc to type cast from bpf uapi ctx to kernel ctx") Fixes: `50c6b8a9ae` ("selftests/bpf: Add a test for btf_type_tag "percpu"") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20221125220617.26846-1-alexei.starovoitov@gmail.com	2022-11-30 15:33:48 -08:00
Daniel Borkmann	996c060e2b	selftests/bpf: Add bench test to arm64 and s390x denylist BPF CI fails for arm64 and s390x each with the following result: [...] All error logs: serial_test_kprobe_multi_bench_attach:PASS:get_syms 0 nsec serial_test_kprobe_multi_bench_attach:PASS:kprobe_multi_empty__open_and_load 0 nsec libbpf: prog 'test_kprobe_empty': failed to attach: Operation not supported serial_test_kprobe_multi_bench_attach:FAIL:bpf_program__attach_kprobe_multi_opts unexpected error: -95 #92 kprobe_multi_bench_attach:FAIL [...] Add the test to the deny list. Fixes: `5b6c7e5c44` ("selftests/bpf: Add attach bench test") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2022-12-01 00:07:30 +01:00
Andrii Nakryiko	f8186bf65a	selftests/bpf: Make sure enum-less bpf_enable_stats() API works in C++ mode Just a simple test to make sure we don't introduce unwanted compiler warnings and API still supports passing enums as input argument. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20221130200013.2997831-2-andrii@kernel.org	2022-11-30 22:56:47 +01:00
Andrii Nakryiko	b42693415b	libbpf: Avoid enum forward-declarations in public API in C++ mode C++ enum forward declarations are fundamentally not compatible with pure C enum definitions, and so libbpf's use of `enum bpf_stats_type;` forward declaration in libbpf/bpf.h public API header is causing C++ compilation issues. More details can be found in [0], but it comes down to C++ supporting enum forward declaration only with explicitly specified backing type: enum bpf_stats_type: int; In C (and I believe it's a GCC extension also), such forward declaration is simply: enum bpf_stats_type; Further, in Linux UAPI this enum is defined in pure C way: enum bpf_stats_type { BPF_STATS_RUN_TIME = 0; } And even though in both cases backing type is int, which can be confirmed by looking at DWARF information, for C++ compiler actual enum definition and forward declaration are incompatible. To eliminate this problem, for C++ mode define input argument as int, which makes enum unnecessary in libbpf public header. This solves the issue and as demonstrated by next patch doesn't cause any unwanted compiler warnings, at least with default warnings setting. [0] https://stackoverflow.com/questions/42766839/c11-enum-forward-causes-underlying-type-mismatch [1] Closes: https://github.com/libbpf/libbpf/issues/249 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20221130200013.2997831-1-andrii@kernel.org	2022-11-30 22:56:47 +01:00
Martin KaFai Lau	443f216448	selftests/bpf: Avoid pinning prog when attaching to tc ingress in btf_skc_cls_ingress This patch removes the need to pin prog when attaching to tc ingress in the btf_skc_cls_ingress test. Instead, directly use the bpf_tc_hook_create() and bpf_tc_attach(). The qdisc clsact will go away together with the netns, so no need to bpf_tc_hook_destroy(). Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20221129070900.3142427-8-martin.lau@linux.dev	2022-11-30 22:47:43 +01:00
Martin KaFai Lau	9b6a777397	selftests/bpf: Remove serial from tests using {open,close}_netns After removing the mount/umount dance from {open,close}_netns() in the pervious patch, "serial_" can be removed from the tests using {open,close}_netns(). Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20221129070900.3142427-7-martin.lau@linux.dev	2022-11-30 22:47:43 +01:00
Martin KaFai Lau	3084097c36	selftests/bpf: Remove the "/sys" mount and umount dance in {open,close}_netns The previous patches have removed the need to do the mount and umount dance when switching netns. In particular: * Avoid remounting /sys/fs/bpf to have a clean start * Avoid remounting /sys to get a ifindex of a particular netns This patch can finally remove the mount and umount dance in {open,close}_netns which is unnecessarily complicated and error-prone. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20221129070900.3142427-6-martin.lau@linux.dev	2022-11-30 22:47:43 +01:00
Martin KaFai Lau	5dc42a7fc2	selftests/bpf: Avoid pinning bpf prog in the netns_load_bpf() callers This patch removes the need to pin prog in the remaining tests in tc_redirect.c by directly using the bpf_tc_hook_create() and bpf_tc_attach(). The clsact qdisc will go away together with the test netns, so no need to do bpf_tc_hook_destroy(). Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20221129070900.3142427-5-martin.lau@linux.dev	2022-11-30 22:47:43 +01:00
Martin KaFai Lau	f1b73577bb	selftests/bpf: Avoid pinning bpf prog in the tc_redirect_peer_l3 test This patch removes the need to pin prog in the tc_redirect_peer_l3 test by directly using the bpf_tc_hook_create() and bpf_tc_attach(). The clsact qdisc will go away together with the test netns, so no need to do bpf_tc_hook_destroy(). Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20221129070900.3142427-4-martin.lau@linux.dev	2022-11-30 22:47:42 +01:00
Martin KaFai Lau	57d0863f1d	selftests/bpf: Avoid pinning bpf prog in the tc_redirect_dtime test This patch removes the need to pin prog in the tc_redirect_dtime test by directly using the bpf_tc_hook_create() and bpf_tc_attach(). The clsact qdisc will go away together with the test netns, so no need to do bpf_tc_hook_destroy(). Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20221129070900.3142427-3-martin.lau@linux.dev	2022-11-30 22:47:42 +01:00
Martin KaFai Lau	052c82dcdc	selftests/bpf: Use if_nametoindex instead of reading the /sys/net/class//ifindex When switching netns, the setns_by_fd() is doing dances in mount/umounting the /sys directories. One reason is the tc_redirect.c test is depending on the /sys/net/class//ifindex instead of using the if_nametoindex(). if_nametoindex() uses ioctl() to get the ifindex. This patch is to move all /sys/net/class/*/ifindex usages to if_nametoindex(). The current code checks ifindex >= 0 which is incorrect. ifindex > 0 should be checked instead. This patch also stores ifindex_veth_src and ifindex_veth_dst since the latter patch will need them. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20221129070900.3142427-2-martin.lau@linux.dev	2022-11-30 22:47:42 +01:00
Willem de Bruijn	91a7de8560	selftests/net: add csum offload test Test NIC hardware checksum offload: - Rx + Tx - IPv4 + IPv6 - TCP + UDP Optional features: - zero checksum 0xFFFF - checksum disable 0x0000 - transport encap headers - randomization See file header for detailed comments. Expected results differ depending on NIC features: - CHECKSUM_UNNECESSARY vs CHECKSUM_COMPLETE - NETIF_F_HW_CSUM (csum_start/csum_off) vs NETIF_F_IP(V6)_CSUM Signed-off-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20221128140210.553391-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 21:24:32 -08:00
Jakub Kicinski	5cb0c51fe3	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== ipsec-next 2022-11-26 1) Remove redundant variable in esp6. From Colin Ian King. 2) Update x->lastused for every packet. It was used only for outgoing mobile IPv6 packets, but showed to be usefull to check if the a SA is still in use in general. From Antony Antony. 3) Remove unused variable in xfrm_byidx_resize. From Leon Romanovsky. 4) Finalize extack support for xfrm. From Sabrina Dubroca. * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: add extack to xfrm_set_spdinfo xfrm: add extack to xfrm_alloc_userspi xfrm: add extack to xfrm_do_migrate xfrm: add extack to xfrm_new_ae and xfrm_replay_verify_len xfrm: add extack to xfrm_del_sa xfrm: add extack to xfrm_add_sa_expire xfrm: a few coding style clean ups xfrm: Remove not-used total variable xfrm: update x->lastused for every packet esp6: remove redundant variable err ==================== Link: https://lore.kernel.org/r/20221126110303.1859238-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 20:50:51 -08:00
Jakub Kicinski	b2d7b6e9e4	Merge branch 'net-pcs-altera-tse-simplify-and-clean-up-the-driver' Maxime Chevallier says: ==================== net: pcs: altera-tse: simplify and clean-up the driver This small series does a bit of code cleanup in the altera TSE pcs driver, removing unused register definitions, handling 1000BaseX speed configuration correctly according to the datasheet, and making use of proper poll_timeout helpers. No functional change is introduced. ==================== Link: https://lore.kernel.org/r/20221125131801.64234-1-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 20:29:59 -08:00
Maxime Chevallier	befd851de2	net: pcs: altera-tse: remove unnecessary register definitions remove unused register definitions, left from the split with the altera-tse mac driver. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 20:29:55 -08:00
Maxime Chevallier	b4a7bf9f5b	net: pcs: altera-tse: don't set the speed for 1000BaseX When disabling the SGMII mode bit, the PCS defaults to 1000BaseX mode. In that mode, we don't need to set the speed since it's always 1000Mbps. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 20:29:55 -08:00
Maxime Chevallier	d1a0ff5ff9	net: pcs: altera-tse: use read_poll_timeout to wait for reset Software resets on the TSE PCS don't clear registers, but rather reset all internal state machines regarding AN, comma detection and encoding/decoding. Use read_poll_timeout to wait for the reset to clear instead of manually polling the register. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 20:29:55 -08:00
Jakub Kicinski	7f0c940be5	Merge branch 'mptcp-msg_fastopen-and-tfo-listener-side-support' Matthieu Baerts says: ==================== mptcp: MSG_FASTOPEN and TFO listener side support Before this series, only the initiator of a connection was able to combine both TCP FastOpen and MPTCP when using TCP_FASTOPEN_CONNECT socket option. These new patches here add (in theory) the full support of TFO with MPTCP, which means: - MSG_FASTOPEN sendmsg flag support (patch 1/8) - TFO support for the listener side (patches 2-5/8) - TCP_FASTOPEN socket option (patch 6/8) - TCP_FASTOPEN_KEY socket option (patch 7/8) To support TFO for the server side, a few preparation patches are needed (patches 2 to 5/8). Some of them were inspired by a previous work from Benjamin Hesmans. Note that TFO support with MPTCP has been validated with selftests (patch 8/8) but also with Packetdrill tests running with a modified but still very WIP version supporting MPTCP. Both the modified tool and the tests are available online: https://github.com/multipath-tcp/packetdrill/ ==================== Link: https://lore.kernel.org/r/20221125222958.958636-1-matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 20:24:33 -08:00
Dmytro Shytyi	ca7ae89160	selftests: mptcp: mptfo Initiator/Listener This patch first adds TFO support in mptcp_connect.c. This can be enabled via a new option: -o MPTFO. Once enabled, the TCP_FASTOPEN socket option is enabled for the server side and a sendto() with MSG_FASTOPEN is used instead of a connect() for the client side. Note that the first SYN has a limit of bytes it can carry. In other words, it is allowed to send less data than the provided one. We then need to track more status info to properly allow the next sendmsg() starting from the next part of the data to send the rest. Also in TFO scenarios, we need to completely spool the partially xmitted buffer -- and account for that -- before starting sendfile/mmap xmit, otherwise the relevant tests will fail. Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Dmytro Shytyi <dmytro@shytyi.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 20:24:26 -08:00
Matthieu Baerts	cb99816cb5	mptcp: add support for TCP_FASTOPEN_KEY sockopt The goal of this socket option is to set different keys per listener, see commit `1fba70e5b6` ("tcp: socket option to set TCP fast open key") for more details about this socket option. The only thing to do here with MPTCP is to relay the request to the first subflow like it is already done for the other TCP_FASTOPEN* socket options. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 20:24:26 -08:00
Dmytro Shytyi	4ffb0a0234	mptcp: add TCP_FASTOPEN sock option The TCP_FASTOPEN socket option is one way for the application to tell the kernel TFO support has to be enabled for the listener socket. The only thing to do here with MPTCP is to relay the request to the first subflow like it is already done for the other TCP_FASTOPEN* socket options. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Dmytro Shytyi <dmytro@shytyi.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 20:24:25 -08:00
Dmytro Shytyi	36b122baf6	mptcp: add subflow_v(4,6)_send_synack() The send_synack() needs to be overridden for MPTCP to support TFO for two reasons: - There is not be enough space in the TCP options if the TFO cookie has to be added in the SYN+ACK with other options: MSS (4), SACK OK (2), Timestamps (10), Window Scale (3+1), TFO (10+2), MP_CAPABLE (12). MPTCPv1 specs -- RFC 8684, section B.1 [1] -- suggest to drop the TCP timestamps option in this case. - The data received in the SYN has to be handled: the SKB can be dequeued from the subflow sk and transferred to the MPTCP sk. Counters need to be updated accordingly and the application can be notified at the end because some bytes have been received. [1] https://www.rfc-editor.org/rfc/rfc8684.html#section-b.1 Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Dmytro Shytyi <dmytro@shytyi.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 20:24:25 -08:00
Dmytro Shytyi	dfc8d06030	mptcp: implement delayed seq generation for passive fastopen With fastopen in place, the first subflow socket is created before the MPC handshake completes, and we need to properly initialize the sequence numbers at MPC ACK reception. Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Dmytro Shytyi <dmytro@shytyi.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 20:24:25 -08:00
Paolo Abeni	b3ea6b272d	mptcp: consolidate initial ack seq generation Currently the initial ack sequence is generated on demand whenever it's requested and the remote key is handy. The relevant code is scattered in different places and can lead to multiple, unneeded, crypto operations. This change consolidates the ack sequence generation code in a single helper, storing the sequence number at the subflow level. The above additionally saves a few conditional in fast-path and will simplify the upcoming fast-open implementation. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 20:24:25 -08:00
Paolo Abeni	fe33d38626	mptcp: track accurately the incoming MPC suboption type Currently in the receive path we don't need to discriminate between MPC SYN, MPC SYN-ACK and MPC ACK, but soon the fastopen code will need that info to properly track the fully established status. Track the exact MPC suboption type into the receive opt bitmap. No functional change intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 20:24:24 -08:00
Dmytro Shytyi	1e777f39b4	mptcp: add MSG_FASTOPEN sendmsg flag support Since commit `54f1944ed6` ("mptcp: factor out mptcp_connect()"), all the infrastructure is now in place to support the MSG_FASTOPEN flag, we just need to call into the fastopen path in mptcp_sendmsg(). Co-developed-by: Benjamin Hesmans <benjamin.hesmans@tessares.net> Signed-off-by: Benjamin Hesmans <benjamin.hesmans@tessares.net> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Dmytro Shytyi <dmytro@shytyi.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 20:24:24 -08:00
Jakub Kicinski	f2bb566f5c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net tools/lib/bpf/ringbuf.c `927cbb478a` ("libbpf: Handle size overflow for ringbuf mmap") `b486d19a0a` ("libbpf: checkpatch: Fixed code alignments in ringbuf.c") https://lore.kernel.org/all/20221121122707.44d1446a@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 13:04:52 -08:00
Linus Torvalds	01f856ae6d	Including fixes from bpf, can and wifi. Current release - new code bugs: - eth: mlx5e: - use kvfree() in mlx5e_accel_fs_tcp_create() - MACsec, fix RX data path 16 RX security channel limit - MACsec, fix memory leak when MACsec device is deleted - MACsec, fix update Rx secure channel active field - MACsec, fix add Rx security association (SA) rule memory leak Previous releases - regressions: - wifi: cfg80211: don't allow multi-BSSID in S1G - stmmac: set MAC's flow control register to reflect current settings - eth: mlx5: - E-switch, fix duplicate lag creation - fix use-after-free when reverting termination table Previous releases - always broken: - ipv4: fix route deletion when nexthop info is not specified - bpf: fix a local storage BPF map bug where the value's spin lock field can get initialized incorrectly - tipc: re-fetch skb cb after tipc_msg_validate - wifi: wilc1000: fix Information Element parsing - packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE - sctp: fix memory leak in sctp_stream_outq_migrate() - can: can327: fix potential skb leak when netdev is down - can: add number of missing netdev freeing on error paths - aquantia: do not purge addresses when setting the number of rings - wwan: iosm: - fix incorrect skb length leading to truncated packet - fix crash in peek throughput test due to skb UAF Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmOGOdYACgkQMUZtbf5S IrsknQ//SAoOyDOEu15YzOt8hAupLKoF6MM+D0dwwTEQZLf7IVXCjPpkKtVh7Si7 YCBoyrqrDs7vwaUrVoKY19Amwov+EYrHCpdC+c7wdZ7uxTaYfUbJJUGmxYOR179o lV1+1Aiqg9F9C6CUsmZ5lDN2Yb7/uPDBICIV8LM+VzJAtXjurBVauyMwAxLxPOAr cgvM+h5xzE7DXMF2z8R/mUq5MSIWoJo9hy2UwbV+f2liMTQuw9rwTbyw3d7+H/6p xmJcBcVaABjoUEsEhld3NTlYbSEnlFgCQBfDWzf2e4y6jBxO0JepuIc7SZwJFRJY XBqdsKcGw5RkgKbksKUgxe126XFX0SUUQEp0UkOIqe15k7eC2yO9uj1gRm6OuV4s J94HKzHX9WNV5OQ790Ed2JyIJScztMZlNFVJ/cz2/+iKR42xJg6kaO6Rt2fobtmL VC2cH+RfHzLl+2+7xnfzXEDgFePSBlA02Aq1wihU3zB3r7WCFHchEf9T7sGt1QF0 03R+8E3+N2tYqphPAXyDoy6kXQJTPxJHAe1FNHJlwgfieUDEWZi/Pm+uQrKIkDeo oq9MAV2QBNSD1w4wl7cXfvicO5kBr/OP6YBqwkpsGao2jCSIgkWEX2DRrUaLczXl 5/Z+m/gCO5tAEcVRYfMivxUIon//9EIhbErVpHTlNWpRHk24eS4= =0Lnw -----END PGP SIGNATURE----- Merge tag 'net-6.1-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf, can and wifi. Current release - new code bugs: - eth: mlx5e: - use kvfree() in mlx5e_accel_fs_tcp_create() - MACsec, fix RX data path 16 RX security channel limit - MACsec, fix memory leak when MACsec device is deleted - MACsec, fix update Rx secure channel active field - MACsec, fix add Rx security association (SA) rule memory leak Previous releases - regressions: - wifi: cfg80211: don't allow multi-BSSID in S1G - stmmac: set MAC's flow control register to reflect current settings - eth: mlx5: - E-switch, fix duplicate lag creation - fix use-after-free when reverting termination table Previous releases - always broken: - ipv4: fix route deletion when nexthop info is not specified - bpf: fix a local storage BPF map bug where the value's spin lock field can get initialized incorrectly - tipc: re-fetch skb cb after tipc_msg_validate - wifi: wilc1000: fix Information Element parsing - packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE - sctp: fix memory leak in sctp_stream_outq_migrate() - can: can327: fix potential skb leak when netdev is down - can: add number of missing netdev freeing on error paths - aquantia: do not purge addresses when setting the number of rings - wwan: iosm: - fix incorrect skb length leading to truncated packet - fix crash in peek throughput test due to skb UAF" * tag 'net-6.1-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits) net: ethernet: renesas: ravb: Fix promiscuous mode after system resumed MAINTAINERS: Update maintainer list for chelsio drivers ionic: update MAINTAINERS entry sctp: fix memory leak in sctp_stream_outq_migrate() packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE net/mlx5: Lag, Fix for loop when checking lag Revert "net/mlx5e: MACsec, remove replay window size limitation in offload path" net: marvell: prestera: Fix a NULL vs IS_ERR() check in some functions net: tun: Fix use-after-free in tun_detach() net: mdiobus: fix unbalanced node reference count net: hsr: Fix potential use-after-free tipc: re-fetch skb cb after tipc_msg_validate mptcp: fix sleep in atomic at close time mptcp: don't orphan ssk in mptcp_close() dsa: lan9303: Correct stat name ipv4: Fix route deletion when nexthop info is not specified net: wwan: iosm: fix incorrect skb length net: wwan: iosm: fix crash in peek throughput test net: wwan: iosm: fix dma_alloc_coherent incompatible pointer type net: wwan: iosm: fix kernel test robot reported error ...	2022-11-29 09:52:10 -08:00
Yuan Can	7a945ce0c1	udp_tunnel: Add checks for nla_nest_start() in __udp_tunnel_nic_dump_write() As the nla_nest_start() may fail with NULL returned, the return value should be checked. Note that this is not a real bug, nothing will break here. The next nla_put() will fail as well and we'll bail (and nla_nest_cancel() can handle NULL). But we keep getting those "fixes" so whatever. Signed-off-by: Yuan Can <yuancan@huawei.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20221129013934.55184-1-yuancan@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 08:44:24 -08:00
Yoshihiro Shimoda	d66233a312	net: ethernet: renesas: ravb: Fix promiscuous mode after system resumed After system resumed on some environment board, the promiscuous mode is disabled because the SoC turned off. So, call ravb_set_rx_mode() in the ravb_resume() to fix the issue. Reported-by: Tho Vu <tho.vu.wh@renesas.com> Fixes: `0184165b2f` ("ravb: add sleep PM suspend/resume support") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/20221128065604.1864391-1-yoshihiro.shimoda.uh@renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 08:41:01 -08:00
Ayush Sawal	178833f99f	MAINTAINERS: Update maintainer list for chelsio drivers This updates the maintainers for chelsio inline crypto drivers and chelsio crypto drivers. Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Link: https://lore.kernel.org/r/20221128231348.8225-1-ayush.sawal@chelsio.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 08:41:01 -08:00
Shannon Nelson	91a2bbfff3	ionic: update MAINTAINERS entry Now that Pensando is a part of AMD we need to update a couple of addresses. We're keeping the mailing list address for the moment, but that will likely change in the near future. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Brett Creeley <brett.creeley@amd.com> Link: https://lore.kernel.org/r/20221129011734.20849-1-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 08:31:39 -08:00
Zhengchao Shao	9ed7bfc795	sctp: fix memory leak in sctp_stream_outq_migrate() When sctp_stream_outq_migrate() is called to release stream out resources, the memory pointed to by prio_head in stream out is not released. The memory leak information is as follows: unreferenced object 0xffff88801fe79f80 (size 64): comm "sctp_repo", pid 7957, jiffies 4294951704 (age 36.480s) hex dump (first 32 bytes): 80 9f e7 1f 80 88 ff ff 80 9f e7 1f 80 88 ff ff ................ 90 9f e7 1f 80 88 ff ff 90 9f e7 1f 80 88 ff ff ................ backtrace: [<ffffffff81b215c6>] kmalloc_trace+0x26/0x60 [<ffffffff88ae517c>] sctp_sched_prio_set+0x4cc/0x770 [<ffffffff88ad64f2>] sctp_stream_init_ext+0xd2/0x1b0 [<ffffffff88aa2604>] sctp_sendmsg_to_asoc+0x1614/0x1a30 [<ffffffff88ab7ff1>] sctp_sendmsg+0xda1/0x1ef0 [<ffffffff87f765ed>] inet_sendmsg+0x9d/0xe0 [<ffffffff8754b5b3>] sock_sendmsg+0xd3/0x120 [<ffffffff8755446a>] __sys_sendto+0x23a/0x340 [<ffffffff87554651>] __x64_sys_sendto+0xe1/0x1b0 [<ffffffff89978b49>] do_syscall_64+0x39/0xb0 [<ffffffff89a0008b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Link: https://syzkaller.appspot.com/bug?exrid=29c402e56c4760763cc0 Fixes: `637784ade2` ("sctp: introduce priority based stream scheduler") Reported-by: syzbot+29c402e56c4760763cc0@syzkaller.appspotmail.com Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20221126031720.378562-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 08:30:50 -08:00
Willem de Bruijn	b85f628aa1	packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE CHECKSUM_COMPLETE signals that skb->csum stores the sum over the entire packet. It does not imply that an embedded l4 checksum field has been validated. Fixes: `682f048bd4` ("af_packet: pass checksum validation status to the user") Signed-off-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20221128161812.640098-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 08:30:18 -08:00
Chris Mi	0e682f04b4	net/mlx5: Lag, Fix for loop when checking lag The cited commit adds a for loop to check if each port supports lag or not. But dev is not initialized correctly. Fix it by initializing dev for each iteration. Fixes: `e87c6a832f` ("net/mlx5: E-switch, Fix duplicate lag creation") Signed-off-by: Chris Mi <cmi@nvidia.com> Reported-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221129093006.378840-2-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 08:19:59 -08:00
Saeed Mahameed	dda3bbbb26	Revert "net/mlx5e: MACsec, remove replay window size limitation in offload path" This reverts commit `c0071be0e1`. The cited commit removed the validity checks which initialized the window_sz and never removed the use of the now uninitialized variable, so now we are left with wrong value in the window size and the following clang warning: [-Wuninitialized] drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c:232:45: warning: variable 'window_sz' is uninitialized when used here MLX5_SET(macsec_aso, aso_ctx, window_size, window_sz); Revet at this time to address the clang issue due to lack of time to test the proper solution. Fixes: `c0071be0e1` ("net/mlx5e: MACsec, remove replay window size limitation in offload path") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reported-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20221129093006.378840-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-11-29 08:19:59 -08:00
Yu Xiao	a61474c41e	nfp: ethtool: support reporting link modes Add support for reporting link modes, including `Supported link modes` and `Advertised link modes`, via ethtool $DEV. A new command `SPCODE_READ_MEDIA` is added to read info from management firmware. Also, the mapping table `nfp_eth_media_table` associates the link modes between NFP and kernel. Both of them help to support this ability. Signed-off-by: Yu Xiao <yu.xiao@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/r/20221125113030.141642-1-simon.horman@corigine.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-11-29 15:28:52 +01:00
Jiri Pirko	7666dbec72	net: devlink: add WARN_ON_ONCE to check return value of unregister_netdevice_notifier_net() call As the return value is not 0 only in case there is no such notifier block registered, add a WARN_ON_ONCE() to yell about it. Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20221125100255.1786741-1-jiri@resnulli.us Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-11-29 13:37:53 +01:00
Paolo Abeni	cb55ff7ac4	Merge branch 'add-support-for-lan966x-is2-vcap' Horatiu Vultur says: ==================== Add support for lan966x IS2 VCAP This provides initial support for lan966x for 'tc' traffic control userspace tool and its flower filter. For this is required to use the VCAP library. Currently supported flower filter keys and actions are: - source and destination MAC address keys - trap action ==================== Link: https://lore.kernel.org/r/20221125095010.124458-1-horatiu.vultur@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-11-29 13:08:26 +01:00
Horatiu Vultur	4f141e3671	net: microchip: vcap: Implement w32be On lan966x the layout of the vcap memory is different than on sparx5. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-11-29 13:08:23 +01:00
Horatiu Vultur	4426b78c62	net: lan966x: Add port keyset config and callback interface Implement vcap_operations and enable default port keyset configuration for each port. Now it is possible actually write/read/move entries in the VCAP. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-11-29 13:08:23 +01:00
Horatiu Vultur	61caac2d1a	net: lan966x: add tc matchall goto action Extend matchall with action goto. This is needed to enable the lookup in the VCAP. It is needed to connect chain 0 to a chain that is recognized by the HW. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-11-29 13:08:23 +01:00
Horatiu Vultur	3643abd6e6	net: lan966x: add tc flower support for VCAP API Currently the only supported action is ACTION_TRAP and the only dissector is ETH_ADDRS. Others will be added in future patches. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-11-29 13:08:23 +01:00
Horatiu Vultur	f919ccc93d	net: lan966x: add vcap registers Add registers used to access vcap controller. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-11-29 13:08:23 +01:00
Horatiu Vultur	39bedc169c	net: lan966x: Add is2 vcap model to vcap API. This provides the lan966x is2 model and adds it to the vcap control instance that will be provided to the vcap API. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-11-29 13:08:23 +01:00

1 2 3 4 5 ...

1139948 Commits