linux

Author	SHA1	Message	Date
Martin KaFai Lau	29003875bd	bpf: Change bpf_setsockopt(SOL_SOCKET) to reuse sk_setsockopt() After the prep work in the previous patches, this patch removes most of the dup code from bpf_setsockopt(SOL_SOCKET) and reuses them from sk_setsockopt(). The sock ptr test is added to the SO_RCVLOWAT because the sk->sk_socket could be NULL in some of the bpf hooks. The existing optname white-list is refactored into a new function sol_socket_setsockopt(). Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220817061804.4178920-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-08-18 17:06:13 -07:00
Martin KaFai Lau	e42c7beee7	bpf: net: Consider has_current_bpf_ctx() when testing capable() in sk_setsockopt() When bpf program calling bpf_setsockopt(SOL_SOCKET), it could be run in softirq and doesn't make sense to do the capable check. There was a similar situation in bpf_setsockopt(TCP_CONGESTION). In commit `8d650cdeda` ("tcp: fix tcp_set_congestion_control() use from bpf hook"), tcp_set_congestion_control(..., cap_net_admin) was added to skip the cap check for bpf prog. This patch adds sockopt_ns_capable() and sockopt_capable() for the sk_setsockopt() to use. They will consider the has_current_bpf_ctx() before doing the ns_capable() and capable() test. They are in EXPORT_SYMBOL for the ipv6 module to use in a latter patch. Suggested-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220817061723.4175820-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-08-18 17:06:12 -07:00
Martin KaFai Lau	24426654ed	bpf: net: Avoid sk_setsockopt() taking sk lock when called from bpf Most of the code in bpf_setsockopt(SOL_SOCKET) are duplicated from the sk_setsockopt(). The number of supported optnames are increasing ever and so as the duplicated code. One issue in reusing sk_setsockopt() is that the bpf prog has already acquired the sk lock. This patch adds a has_current_bpf_ctx() to tell if the sk_setsockopt() is called from a bpf prog. The bpf prog calling bpf_setsockopt() is either running in_task() or in_serving_softirq(). Both cases have the current->bpf_ctx initialized. Thus, the has_current_bpf_ctx() only needs to test !!current->bpf_ctx. This patch also adds sockopt_{lock,release}_sock() helpers for sk_setsockopt() to use. These helpers will test has_current_bpf_ctx() before acquiring/releasing the lock. They are in EXPORT_SYMBOL for the ipv6 module to use in a latter patch. Note on the change in sock_setbindtodevice(). sockopt_lock_sock() is done in sock_setbindtodevice() instead of doing the lock_sock in sock_bindtoindex(..., lock_sk = true). Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220817061717.4175589-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-08-18 17:06:12 -07:00
Eyal Birger	7ec9fce4b3	ip_tunnel: Respect tunnel key's "flow_flags" in IP tunnels Commit `451ef36bd2` ("ip_tunnels: Add new flow flags field to ip_tunnel_key") added a "flow_flags" member to struct ip_tunnel_key which was later used by the commit in the fixes tag to avoid dropping packets with sources that aren't locally configured when set in bpf_set_tunnel_key(). VXLAN and GENEVE were made to respect this flag, ip tunnels like IPIP and GRE were not. This commit fixes this omission by making ip_tunnel_init_flow() receive the flow flags from the tunnel key in the relevant collect_md paths. Fixes: `b8fff74852` ("bpf: Set flow flag to allow any source IP in bpf_tunnel_key") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Paul Chaignon <paul@isovalent.com> Link: https://lore.kernel.org/bpf/20220818074118.726639-1-eyal.birger@gmail.com	2022-08-18 21:18:28 +02:00
Jakub Kicinski	3f5f728a72	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Andrii Nakryiko says: ==================== bpf-next 2022-08-17 We've added 45 non-merge commits during the last 14 day(s) which contain a total of 61 files changed, 986 insertions(+), 372 deletions(-). The main changes are: 1) New bpf_ktime_get_tai_ns() BPF helper to access CLOCK_TAI, from Kurt Kanzenbach and Jesper Dangaard Brouer. 2) Few clean ups and improvements for libbpf 1.0, from Andrii Nakryiko. 3) Expose crash_kexec() as kfunc for BPF programs, from Artem Savkov. 4) Add ability to define sleepable-only kfuncs, from Benjamin Tissoires. 5) Teach libbpf's bpf_prog_load() and bpf_map_create() to gracefully handle unsupported names on old kernels, from Hangbin Liu. 6) Allow opting out from auto-attaching BPF programs by libbpf's BPF skeleton, from Hao Luo. 7) Relax libbpf's requirement for shared libs to be marked executable, from Henqgi Chen. 8) Improve bpf_iter internals handling of error returns, from Hao Luo. 9) Few accommodations in libbpf to support GCC-BPF quirks, from James Hilliard. 10) Fix BPF verifier logic around tracking dynptr ref_obj_id, from Joanne Koong. 11) bpftool improvements to handle full BPF program names better, from Manu Bretelle. 12) bpftool fixes around libcap use, from Quentin Monnet. 13) BPF map internals clean ups and improvements around memory allocations, from Yafang Shao. 14) Allow to use cgroup_get_from_file() on cgroupv1, allowing BPF cgroup iterator to work on cgroupv1, from Yosry Ahmed. 15) BPF verifier internal clean ups, from Dave Marchevsky and Joanne Koong. 16) Various fixes and clean ups for selftests/bpf and vmtest.sh, from Daniel Xu, Artem Savkov, Joanne Koong, Andrii Nakryiko, Shibin Koikkara Reeny. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits) selftests/bpf: Few fixes for selftests/bpf built in release mode libbpf: Clean up deprecated and legacy aliases libbpf: Streamline bpf_attr and perf_event_attr initialization libbpf: Fix potential NULL dereference when parsing ELF selftests/bpf: Tests libbpf autoattach APIs libbpf: Allows disabling auto attach selftests/bpf: Fix attach point for non-x86 arches in test_progs/lsm libbpf: Making bpf_prog_load() ignore name if kernel doesn't support selftests/bpf: Update CI kconfig selftests/bpf: Add connmark read test selftests/bpf: Add existing connection bpf_*_ct_lookup() test bpftool: Clear errno after libcap's checks bpf: Clear up confusion in bpf_skb_adjust_room()'s documentation bpftool: Fix a typo in a comment libbpf: Add names for auxiliary maps bpf: Use bpf_map_area_alloc consistently on bpf map creation bpf: Make __GFP_NOWARN consistent in bpf map creation bpf: Use bpf_map_area_free instread of kvfree bpf: Remove unneeded memset in queue_stack_map creation libbpf: preserve errno across pr_warn/pr_info/pr_debug ... ==================== Link: https://lore.kernel.org/r/20220817215656.1180215-1-andrii@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-17 20:29:36 -07:00
Jakub Kicinski	bec13ba9ce	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter: conntrack and nf_tables bug fixes The following patchset contains netfilter fixes for net. Broken since 5.19: A few ancient connection tracking helpers assume TCP packets cannot exceed 64kb in size, but this isn't the case anymore with 5.19 when BIG TCP got merged, from myself. Regressions since 5.19: 1. 'conntrack -E expect' won't display anything because nfnetlink failed to enable events for expectations, only for normal conntrack events. 2. partially revert change that added resched calls to a function that can be in atomic context. Both broken and fixed up by myself. Broken for several releases (up to original merge of nf_tables): Several fixes for nf_tables control plane, from Pablo. This fixes up resource leaks in error paths and adds more sanity checks for mutually exclusive attributes/flags. Kconfig: NF_CONNTRACK_PROCFS is very old and doesn't provide all info provided via ctnetlink, so it should not default to y. From Geert Uytterhoeven. Selftests: rework nft_flowtable.sh: it frequently indicated failure; the way it tried to detect an offload failure did not work reliably. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: testing: selftests: nft_flowtable.sh: rework test to detect offload failure testing: selftests: nft_flowtable.sh: use random netns names netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to y netfilter: nf_tables: check NFT_SET_CONCAT flag if field_count is specified netfilter: nf_tables: disallow NFT_SET_ELEM_CATCHALL and NFT_SET_ELEM_INTERVAL_END netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags netfilter: nf_tables: validate NFTA_SET_ELEM_OBJREF based on NFT_SET_OBJECT flag netfilter: nf_tables: really skip inactive sets when allocating name netfilter: nfnetlink: re-enable conntrack expectation events netfilter: nf_tables: fix scheduling-while-atomic splat netfilter: nf_ct_irc: cap packet search space to 4k netfilter: nf_ct_ftp: prefer skb_linearize netfilter: nf_ct_h323: cap packet size at 64k netfilter: nf_ct_sane: remove pseudo skb linearization netfilter: nf_tables: possible module reference underflow in error path netfilter: nf_tables: disallow NFTA_SET_ELEM_KEY_END with NFT_SET_ELEM_INTERVAL_END flag netfilter: nf_tables: use READ_ONCE and WRITE_ONCE for shared generation id access ==================== Link: https://lore.kernel.org/r/20220817140015.25843-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-17 20:17:45 -07:00
David Howells	fc4aaf9fb3	net: Fix suspicious RCU usage in bpf_sk_reuseport_detach() bpf_sk_reuseport_detach() calls __rcu_dereference_sk_user_data_with_flags() to obtain the value of sk->sk_user_data, but that function is only usable if the RCU read lock is held, and neither that function nor any of its callers hold it. Fix this by adding a new helper, __locked_read_sk_user_data_with_flags() that checks to see if sk->sk_callback_lock() is held and use that here instead. Alternatively, making __rcu_dereference_sk_user_data_with_flags() use rcu_dereference_checked() might suffice. Without this, the following warning can be occasionally observed: ============================= WARNING: suspicious RCU usage 6.0.0-rc1-build2+ #563 Not tainted ----------------------------- include/net/sock.h:592 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 5 locks held by locktest/29873: #0: ffff88812734b550 (&sb->s_type->i_mutex_key#9){+.+.}-{3:3}, at: __sock_release+0x77/0x121 #1: ffff88812f5621b0 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_close+0x1c/0x70 #2: ffff88810312f5c8 (&h->lhash2[i].lock){+.+.}-{2:2}, at: inet_unhash+0x76/0x1c0 #3: ffffffff83768bb8 (reuseport_lock){+...}-{2:2}, at: reuseport_detach_sock+0x18/0xdd #4: ffff88812f562438 (clock-AF_INET){++..}-{2:2}, at: bpf_sk_reuseport_detach+0x24/0xa4 stack backtrace: CPU: 1 PID: 29873 Comm: locktest Not tainted 6.0.0-rc1-build2+ #563 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: <TASK> dump_stack_lvl+0x4c/0x5f bpf_sk_reuseport_detach+0x6d/0xa4 reuseport_detach_sock+0x75/0xdd inet_unhash+0xa5/0x1c0 tcp_set_state+0x169/0x20f ? lockdep_sock_is_held+0x3a/0x3a ? __lock_release.isra.0+0x13e/0x220 ? reacquire_held_locks+0x1bb/0x1bb ? hlock_class+0x31/0x96 ? mark_lock+0x9e/0x1af __tcp_close+0x50/0x4b6 tcp_close+0x28/0x70 inet_release+0x8e/0xa7 __sock_release+0x95/0x121 sock_close+0x14/0x17 __fput+0x20f/0x36a task_work_run+0xa3/0xcc exit_to_user_mode_prepare+0x9c/0x14d syscall_exit_to_user_mode+0x18/0x44 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: `cf8c1e9672` ("net: refactor bpf_sk_reuseport_detach()") Signed-off-by: David Howells <dhowells@redhat.com> cc: Hawkins Jiawei <yin31149@gmail.com> Link: https://lore.kernel.org/r/166064248071.3502205.10036394558814861778.stgit@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-17 16:42:59 -07:00
Zhengchao Shao	52327d2e39	net: sched: remove the unused return value of unregister_qdisc Return value of unregister_qdisc is unused, remove it. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20220815030417.271894-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-16 19:37:06 -07:00
Alexander Mikhalitsyn	0ff4eb3d5e	neighbour: make proxy_queue.qlen limit per-device Right now we have a neigh_param PROXY_QLEN which specifies maximum length of neigh_table->proxy_queue. But in fact, this limitation doesn't work well because check condition looks like: tbl->proxy_queue.qlen > NEIGH_VAR(p, PROXY_QLEN) The problem is that p (struct neigh_parms) is a per-device thing, but tbl (struct neigh_table) is a system-wide global thing. It seems reasonable to make proxy_queue limit per-device based. v2: - nothing changed in this patch v3: - rebase to net tree Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@kernel.org> Cc: Yajun Deng <yajun.deng@linux.dev> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Christian Brauner <brauner@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Cc: Konstantin Khorenko <khorenko@virtuozzo.com> Cc: kernel@openvz.org Cc: devel@openvz.org Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-15 11:25:09 +01:00
Linus Torvalds	7ebfc85e2c	Including fixes from bluetooth, bpf, can and netfilter. A little longer PR than usual but it's all fixes, no late features. It's long partially because of timing, and partially because of follow ups to stuff that got merged a week or so before the merge window and wasn't as widely tested. Maybe the Bluetooth fixes are a little alarming so we'll address that, but the rest seems okay and not scary. Notably we're including a fix for the netfilter Kconfig [1], your WiFi warning [2] and a bluetooth fix which should unblock syzbot [3]. Current release - regressions: - Bluetooth: - don't try to cancel uninitialized works [3] - L2CAP: fix use-after-free caused by l2cap_chan_put - tls: rx: fix device offload after recent rework - devlink: fix UAF on failed reload and leftover locks in mlxsw Current release - new code bugs: - netfilter: - flowtable: fix incorrect Kconfig dependencies [1] - nf_tables: fix crash when nf_trace is enabled - bpf: - use proper target btf when exporting attach_btf_obj_id - arm64: fixes for bpf trampoline support - Bluetooth: - ISO: unlock on error path in iso_sock_setsockopt() - ISO: fix info leak in iso_sock_getsockopt() - ISO: fix iso_sock_getsockopt for BT_DEFER_SETUP - ISO: fix memory corruption on iso_pinfo.base - ISO: fix not using the correct QoS - hci_conn: fix updating ISO QoS PHY - phy: dp83867: fix get nvmem cell fail Previous releases - regressions: - wifi: cfg80211: fix validating BSS pointers in __cfg80211_connect_result [2] - atm: bring back zatm uAPI after ATM had been removed - properly fix old bug making bonding ARP monitor mode not being able to work with software devices with lockless Tx - tap: fix null-deref on skb->dev in dev_parse_header_protocol - revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" it helps some devices and breaks others - netfilter: - nf_tables: many fixes rejecting cross-object linking which may lead to UAFs - nf_tables: fix null deref due to zeroed list head - nf_tables: validate variable length element extension - bgmac: fix a BUG triggered by wrong bytes_compl - bcmgenet: indicate MAC is in charge of PHY PM Previous releases - always broken: - bpf: - fix bad pointer deref in bpf_sys_bpf() injected via test infra - disallow non-builtin bpf programs calling the prog_run command - don't reinit map value in prealloc_lru_pop - fix UAFs during the read of map iterator fd - fix invalidity check for values in sk local storage map - reject sleepable program for non-resched map iterator - mptcp: - move subflow cleanup in mptcp_destroy_common() - do not queue data on closed subflows - virtio_net: fix memory leak inside XDP_TX with mergeable - vsock: fix memory leak when multiple threads try to connect() - rework sk_user_data sharing to prevent psock leaks - geneve: fix TOS inheriting for ipv4 - tunnels & drivers: do not use RT_TOS for IPv6 flowlabel - phy: c45 baset1: do not skip aneg configuration if clock role is not specified - rose: avoid overflow when /proc displays timer information - x25: fix call timeouts in blocking connects - can: mcp251x: fix race condition on receive interrupt - can: j1939: - replace user-reachable WARN_ON_ONCE() with netdev_warn_once() - fix memory leak of skbs in j1939_session_destroy() Misc: - docs: bpf: clarify that many things are not uAPI - seg6: initialize induction variable to first valid array index (to silence clang vs objtool warning) - can: ems_usb: fix clang 14's -Wunaligned-access warning Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmL1TtkACgkQMUZtbf5S Iruz8Q/+O5xFFsjxuyZD0Mw9d3Jeo3ZI9PeeDvcYl5dZXVegpxqorujTFntxv1Ad JC8o5qqms3kO51d+W/yai6iDacEHX2YcJrupZve+vGvpOEVmBRY5O0E1AckJ18+u ItmjSVESkybUP5P08/An7Y0dMmj9Xb2z84dGkLe+n8lg6/fimo6Ki6yZjcOBOALu AYquMXUcnwztRMbTFjscbJjBd4xFMKZEtthljYtPdIReIN976wmMNYYx+jcPK7ha g39Kv6maklp4euerkGIJ/AMnOWHaOGCFjIaz7rr4444NDfrKdt/jeirUXJaz77Jo TJM2UOwgOeg6WZkSa3cmdq6UdjdkJ6LTe2CJFf1wJ1qfhAi+s8yWoszsM2Enp+66 c/mo9jTCMAjmgEJF11idZuz2S697/5j0hvbfM3ZPgNyNBgn8qxz/Z56fNOisx95u TkoKKFnGH+mcm/et+omBcyLBtBVK2+/6B6mpl6btf4DOkPn5KFYWHV67uV3ksHzQ ye+pnzidoIG0yKbRM2EQKXk7ELKROpl52xUHko93ZinMJt0Q7jBm7tZhJozNFEzi hWgUvpmNXgawzLYQcJ9jJmKw3PmYZnRhvYZB/1r91YamM28Hd58k9WfpWtUtjYJN N0X58L6JSnKPqzR70pcFppz6iBlh0tHdcEQGWhhKU5ScS3FDxGc= =C5Ck -----END PGP SIGNATURE----- Merge tag 'net-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, bpf, can and netfilter. A little larger than usual but it's all fixes, no late features. It's large partially because of timing, and partially because of follow ups to stuff that got merged a week or so before the merge window and wasn't as widely tested. Maybe the Bluetooth fixes are a little alarming so we'll address that, but the rest seems okay and not scary. Notably we're including a fix for the netfilter Kconfig [1], your WiFi warning [2] and a bluetooth fix which should unblock syzbot [3]. Current release - regressions: - Bluetooth: - don't try to cancel uninitialized works [3] - L2CAP: fix use-after-free caused by l2cap_chan_put - tls: rx: fix device offload after recent rework - devlink: fix UAF on failed reload and leftover locks in mlxsw Current release - new code bugs: - netfilter: - flowtable: fix incorrect Kconfig dependencies [1] - nf_tables: fix crash when nf_trace is enabled - bpf: - use proper target btf when exporting attach_btf_obj_id - arm64: fixes for bpf trampoline support - Bluetooth: - ISO: unlock on error path in iso_sock_setsockopt() - ISO: fix info leak in iso_sock_getsockopt() - ISO: fix iso_sock_getsockopt for BT_DEFER_SETUP - ISO: fix memory corruption on iso_pinfo.base - ISO: fix not using the correct QoS - hci_conn: fix updating ISO QoS PHY - phy: dp83867: fix get nvmem cell fail Previous releases - regressions: - wifi: cfg80211: fix validating BSS pointers in __cfg80211_connect_result [2] - atm: bring back zatm uAPI after ATM had been removed - properly fix old bug making bonding ARP monitor mode not being able to work with software devices with lockless Tx - tap: fix null-deref on skb->dev in dev_parse_header_protocol - revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" it helps some devices and breaks others - netfilter: - nf_tables: many fixes rejecting cross-object linking which may lead to UAFs - nf_tables: fix null deref due to zeroed list head - nf_tables: validate variable length element extension - bgmac: fix a BUG triggered by wrong bytes_compl - bcmgenet: indicate MAC is in charge of PHY PM Previous releases - always broken: - bpf: - fix bad pointer deref in bpf_sys_bpf() injected via test infra - disallow non-builtin bpf programs calling the prog_run command - don't reinit map value in prealloc_lru_pop - fix UAFs during the read of map iterator fd - fix invalidity check for values in sk local storage map - reject sleepable program for non-resched map iterator - mptcp: - move subflow cleanup in mptcp_destroy_common() - do not queue data on closed subflows - virtio_net: fix memory leak inside XDP_TX with mergeable - vsock: fix memory leak when multiple threads try to connect() - rework sk_user_data sharing to prevent psock leaks - geneve: fix TOS inheriting for ipv4 - tunnels & drivers: do not use RT_TOS for IPv6 flowlabel - phy: c45 baset1: do not skip aneg configuration if clock role is not specified - rose: avoid overflow when /proc displays timer information - x25: fix call timeouts in blocking connects - can: mcp251x: fix race condition on receive interrupt - can: j1939: - replace user-reachable WARN_ON_ONCE() with netdev_warn_once() - fix memory leak of skbs in j1939_session_destroy() Misc: - docs: bpf: clarify that many things are not uAPI - seg6: initialize induction variable to first valid array index (to silence clang vs objtool warning) - can: ems_usb: fix clang 14's -Wunaligned-access warning" * tag 'net-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (117 commits) net: atm: bring back zatm uAPI dpaa2-eth: trace the allocated address instead of page struct net: add missing kdoc for struct genl_multicast_group::flags nfp: fix use-after-free in area_cache_get() MAINTAINERS: use my korg address for mt7601u mlxsw: minimal: Fix deadlock in ports creation bonding: fix reference count leak in balance-alb mode net: usb: qmi_wwan: Add support for Cinterion MV32 bpf: Shut up kern_sys_bpf warning. net/tls: Use RCU API to access tls_ctx->netdev tls: rx: device: don't try to copy too much on detach tls: rx: device: bound the frag walk net_sched: cls_route: remove from list when handle is 0 selftests: forwarding: Fix failing tests with old libnet net: refactor bpf_sk_reuseport_detach() net: fix refcount bug in sk_psock_get (2) selftests/bpf: Ensure sleepable program is rejected by hash map iter selftests/bpf: Add write tests for sk local storage map iterator selftests/bpf: Add tests for reading a dangling map iter fd bpf: Only allow sleepable program for resched-able iterator ...	2022-08-11 13:45:37 -07:00
Jakub Kicinski	5c221f0af6	net: add missing kdoc for struct genl_multicast_group::flags Multicast group flags were added in commit `4d54cc3211` ("mptcp: avoid lock_fast usage in accept path"), but it missed adding the kdoc. Mention which flags go into that field, and do the same for op structs. Link: https://lore.kernel.org/r/20220809232012.403730-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-11 09:26:04 -07:00
Florian Westphal	0b2f3212b5	netfilter: nfnetlink: re-enable conntrack expectation events To avoid allocation of the conntrack extension area when possible, the default behaviour was changed to only allocate the event extension if a userspace program is subscribed to a notification group. Problem is that while 'conntrack -E' does enable the event allocation behind the scenes, 'conntrack -E expect' does not: no expectation events are delivered unless user sets "net.netfilter.nf_conntrack_events" back to 1 (always on). Fix the autodetection to also consider EXP type group. We need to track the 6 event groups (3+3, new/update/destroy for events and for expectations each) independently, else we'd disable events again if an expectation group becomes empty while there is still an active event group. Fixes: `2794cdb0b9` ("netfilter: nfnetlink: allow to detect if ctnetlink listeners exist") Reported-by: Yi Chen <yiche@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2022-08-11 18:09:54 +02:00
Maxim Mikityanskiy	94ce3b64c6	net/tls: Use RCU API to access tls_ctx->netdev Currently, tls_device_down synchronizes with tls_device_resync_rx using RCU, however, the pointer to netdev is stored using WRITE_ONCE and loaded using READ_ONCE. Although such approach is technically correct (rcu_dereference is essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to store NULL), using special RCU helpers for pointers is more valid, as it includes additional checks and might change the implementation transparently to the callers. Mark the netdev pointer as __rcu and use the correct RCU helpers to access it. For non-concurrent access pass the right conditions that guarantee safe access (locks taken, refcount value). Also use the correct helper in mlx5e, where even READ_ONCE was missing. The transition to RCU exposes existing issues, fixed by this commit: 1. bond_tls_device_xmit could read netdev twice, and it could become NULL the second time, after the NULL check passed. 2. Drivers shouldn't stop processing the last packet if tls_device_down just set netdev to NULL, before tls_dev_del was called. This prevents a possible packet drop when transitioning to the fallback software mode. Fixes: `89df6a8104` ("net/bonding: Implement TLS TX device offload") Fixes: `c55dcdd435` ("net/tls: Fix use-after-free after the TLS device goes down and up") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Link: https://lore.kernel.org/r/20220810081602.1435800-1-maximmi@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-10 22:58:43 -07:00
Jakub Kicinski	fbe8870f72	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== bpf 2022-08-10 We've added 23 non-merge commits during the last 7 day(s) which contain a total of 19 files changed, 424 insertions(+), 35 deletions(-). The main changes are: 1) Several fixes for BPF map iterator such as UAFs along with selftests, from Hou Tao. 2) Fix BPF syscall program's {copy,strncpy}_from_bpfptr() to not fault, from Jinghao Jia. 3) Reject BPF syscall programs calling BPF_PROG_RUN, from Alexei Starovoitov and YiFei Zhu. 4) Fix attach_btf_obj_id info to pick proper target BTF, from Stanislav Fomichev. 5) BPF design Q/A doc update to clarify what is not stable ABI, from Paul E. McKenney. 6) Fix BPF map's prealloc_lru_pop to not reinitialize, from Kumar Kartikeya Dwivedi. 7) Fix bpf_trampoline_put to avoid leaking ftrace hash, from Jiri Olsa. 8) Fix arm64 JIT to address sparse errors around BPF trampoline, from Xu Kuohai. 9) Fix arm64 JIT to use kvcalloc instead of kcalloc for internal program address offset buffer, from Aijun Sun. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (23 commits) selftests/bpf: Ensure sleepable program is rejected by hash map iter selftests/bpf: Add write tests for sk local storage map iterator selftests/bpf: Add tests for reading a dangling map iter fd bpf: Only allow sleepable program for resched-able iterator bpf: Check the validity of max_rdwr_access for sock local storage map iterator bpf: Acquire map uref in .init_seq_private for sock{map,hash} iterator bpf: Acquire map uref in .init_seq_private for sock local storage map iterator bpf: Acquire map uref in .init_seq_private for hash map iterator bpf: Acquire map uref in .init_seq_private for array map iterator bpf: Disallow bpf programs call prog_run command. bpf, arm64: Fix bpf trampoline instruction endianness selftests/bpf: Add test for prealloc_lru_pop bug bpf: Don't reinit map value in prealloc_lru_pop bpf: Allow calling bpf_prog_test kfuncs in tracing programs bpf, arm64: Allocate program buffer using kvcalloc instead of kcalloc selftests/bpf: Excercise bpf_obj_get_info_by_fd for bpf2bpf bpf: Use proper target btf when exporting attach_btf_obj_id mptcp, btf: Add struct mptcp_sock definition when CONFIG_MPTCP is disabled bpf: Cleanup ftrace hash in bpf_trampoline_put BPF: Fix potential bad pointer dereference in bpf_sys_bpf() ... ==================== Link: https://lore.kernel.org/r/20220810190624.10748-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-10 21:48:15 -07:00
Hawkins Jiawei	2a0133723f	net: fix refcount bug in sk_psock_get (2) Syzkaller reports refcount bug as follows: ------------[ cut here ]------------ refcount_t: saturated; leaking memory. WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19 Modules linked in: CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda7d90 #0 <TASK> __refcount_add_not_zero include/linux/refcount.h:163 [inline] __refcount_inc_not_zero include/linux/refcount.h:227 [inline] refcount_inc_not_zero include/linux/refcount.h:245 [inline] sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439 tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091 tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983 tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057 tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659 tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682 sk_backlog_rcv include/net/sock.h:1061 [inline] __release_sock+0x134/0x3b0 net/core/sock.c:2849 release_sock+0x54/0x1b0 net/core/sock.c:3404 inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909 __sys_shutdown_sock net/socket.c:2331 [inline] __sys_shutdown_sock net/socket.c:2325 [inline] __sys_shutdown+0xf1/0x1b0 net/socket.c:2343 __do_sys_shutdown net/socket.c:2351 [inline] __se_sys_shutdown net/socket.c:2349 [inline] __x64_sys_shutdown+0x50/0x70 net/socket.c:2349 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 </TASK> During SMC fallback process in connect syscall, kernel will replaces TCP with SMC. In order to forward wakeup smc socket waitqueue after fallback, kernel will sets clcsk->sk_user_data to origin smc socket in smc_fback_replace_callbacks(). Later, in shutdown syscall, kernel will calls sk_psock_get(), which treats the clcsk->sk_user_data as psock type, triggering the refcnt warning. So, the root cause is that smc and psock, both will use sk_user_data field. So they will mismatch this field easily. This patch solves it by using another bit(defined as SK_USER_DATA_PSOCK) in PTRMASK, to mark whether sk_user_data points to a psock object or not. This patch depends on a PTRMASK introduced in commit `f1ff5ce2cd` ("net, sk_msg: Clear sk_user_data pointer on clone if tagged"). For there will possibly be more flags in the sk_user_data field, this patch also refactor sk_user_data flags code to be more generic to improve its maintainability. Reported-and-tested-by: syzbot+5f26f85569bd179c18ce@syzkaller.appspotmail.com Suggested-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-10 21:47:58 -07:00
Christophe JAILLET	84b709d310	ax88796: Fix some typo in a comment s/by caused/be caused/ s/ax88786/ax88796/ Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/7db4b622d2c3e5af58c1d1f32b81836f4af71f18.1659801746.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-09 22:14:02 -07:00
Pablo Neira Ayuso	f323ef3a0d	netfilter: nf_tables: disallow jump to implicit chain from set element Extend struct nft_data_desc to add a flag field that specifies nft_data_init() is being called for set element data. Use it to disallow jump to implicit chain from set element, only jump to chain via immediate expression is allowed. Fixes: `d0e2c7de92` ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-09 20:13:29 +02:00
Pablo Neira Ayuso	341b694160	netfilter: nf_tables: upfront validation of data via nft_data_init() Instead of parsing the data and then validate that type and length are correct, pass a description of the expected data so it can be validated upfront before parsing it to bail out earlier. This patch adds a new .size field to specify the maximum size of the data area. The .len field is optional and it is used as an input/output field, it provides the specific length of the expected data in the input path. If then .len field is not specified, then obtained length from the netlink attribute is stored. This is required by cmp, bitwise, range and immediate, which provide no netlink attribute that describes the data length. The immediate expression uses the destination register type to infer the expected data type. Relying on opencoded validation of the expected data might lead to subtle bugs as described in `7e6bc1f6ca` ("netfilter: nf_tables: stricter validation of element data"). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-09 20:13:29 +02:00
Pablo Neira Ayuso	34aae2c2fb	netfilter: nf_tables: validate variable length element extension Update template to validate variable length extensions. This patch adds a new .ext_len[id] field to the template to store the expected extension length. This is used to sanity check the initialization of the variable length extension. Use PTR_ERR() in nft_set_elem_init() to report errors since, after this update, there are two reason why this might fail, either because of ENOMEM or insufficient room in the extension field (EINVAL). Kernels up until `7e6bc1f6ca` ("netfilter: nf_tables: stricter validation of element data") allowed to copy more data to the extension than was allocated. This ext_len field allows to validate if the destination has the correct size as additional check. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-09 19:38:16 +02:00
Kumar Kartikeya Dwivedi	6e116280b4	net: netfilter: Remove ifdefs for code shared by BPF and ctnetlink The current ifdefry for code shared by the BPF and ctnetlink side looks ugly. As per Pablo's request, simplify this by unconditionally compiling in the code. This can be revisited when the shared code between the two grows further. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220725085130.11553-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-08-09 09:41:54 -07:00
Jiri Olsa	f1d41f7720	mptcp, btf: Add struct mptcp_sock definition when CONFIG_MPTCP is disabled The btf_sock_ids array needs struct mptcp_sock BTF ID for the bpf_skc_to_mptcp_sock helper. When CONFIG_MPTCP is disabled, the 'struct mptcp_sock' is not defined and resolve_btfids will complain with: [...] BTFIDS vmlinux WARN: resolve_btfids: unresolved symbol mptcp_sock [...] Add an empty definition for struct mptcp_sock when CONFIG_MPTCP is disabled. Fixes: `3bc253c2e6` ("bpf: Add bpf_skc_to_mptcp_sock_proto") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220802163324.1873044-1-jolsa@kernel.org	2022-08-08 15:30:45 +02:00
Linus Torvalds	ea0c39260d	9p-for-5.20 - a couple of fixes - add a tracepoint for fid refcounting - some cleanup/followup on fid lookup - some cleanup around req refcounting -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmLuKqkACgkQq06b7GqY 5nA+RBAAvuA6AjKQSvsNxHsqSFMahwoE3cCPlF/QlmAnnZa1DzmRI3kKTAuuDKkE Q4c7DjxzDJCWRTlkpNUCGZycHqpu1QQbfoTo43sIqob7W8xlajeemc5Fxqtw5sPM m+0SzN7vvNJpCr+6pxMXwwGHSZmZOvpFjwj7cUjhpF/V1WO8bNxfCGzQdF0hX1Vn 2HoBbFUOmacL9Z/pF3O/ZG9LwCFuRQsH0EmbFBlJy1WdDtlVHTXlzDaGQ7EGaY4D 17UR6iITsj9ozacLhvk094PLIc3/RHDGLrm3C4Ka3zmUI7BsiYWPDmai3Pu/DNqn JJ5sZkdrVowxyBbGxw8GpZ4YDJtGsU5XglFPdkw+ZazxhZLNEIstPXg0HTZybrOe GE+WskWB0qS+RpX0tnYEcX6qOHWm3/63Yq5NG6A9tLQUSFku02jS/bCQSLBrmGWW Js24IWvFSTvl6XytHeldYhJP618pNUxXRSqgYv96vT/LI3mUrIMN+IVBNPujO6p2 jIYXNoaqLoY3efXKW/WQmp7C/52ZP4ly4fOiz7qtHTQsCcIk8Xo6zwHtm/FkNEqc sMZdqgLrxKPNBAlT8iEtt//wU2fB7mFt988p0pc+5lAK5t0h67KZJV6vDwTAMObX wV6Ht+QhOHtJwO779fk8FhZPNRPYZYMutIyFNlRx+4gCpBuqJPc= =941k -----END PGP SIGNATURE----- Merge tag '9p-for-5.20' of https://github.com/martinetd/linux Pull 9p updates from Dominique Martinet: - a couple of fixes - add a tracepoint for fid refcounting - some cleanup/followup on fid lookup - some cleanup around req refcounting * tag '9p-for-5.20' of https://github.com/martinetd/linux: net/9p: Initialize the iounit field during fid creation net: 9p: fix refcount leak in p9_read_work() error handling 9p: roll p9_tag_remove into p9_req_put 9p: Add client parameter to p9_req_put() 9p: Drop kref usage 9p: Fix some kernel-doc comments 9p fid refcount: cleanup p9_fid_put calls 9p fid refcount: add a 9p_fid_ref tracepoint 9p fid refcount: add p9_fid_get/put wrappers 9p: Fix minor typo in code comment 9p: Remove unnecessary variable for old fids while walking from d_parent 9p: Make the path walk logic more clear about when cloning is required 9p: Track the root fid with its own variable during lookups	2022-08-06 14:48:54 -07:00
Vladimir Oltean	06799a9085	net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS The bonding driver piggybacks on time stamps kept by the network stack for the purpose of the netdev TX watchdog, and this is problematic because it does not work with NETIF_F_LLTX devices. It is hard to say why the driver looks at dev_trans_start() of the slave->dev, considering that this is updated even by non-ARP/NS probes sent by us, and even by traffic not sent by us at all (for example PTP on physical slave devices). ARP monitoring in active-backup mode appears to still work even if we track only the last TX time of actual ARP probes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-03 19:20:12 -07:00
Linus Torvalds	f86d1fbbe7	Networking changes for 6.0. Core ---- - Refactor the forward memory allocation to better cope with memory pressure with many open sockets, moving from a per socket cache to a per-CPU one - Replace rwlocks with RCU for better fairness in ping, raw sockets and IP multicast router. - Network-side support for IO uring zero-copy send. - A few skb drop reason improvements, including codegen the source file with string mapping instead of using macro magic. - Rename reference tracking helpers to a more consistent netdev_* schema. - Adapt u64_stats_t type to address load/store tearing issues. - Refine debug helper usage to reduce the log noise caused by bots. BPF --- - Improve socket map performance, avoiding skb cloning on read operation. - Add support for 64 bits enum, to match types exposed by kernel. - Introduce support for sleepable uprobes program. - Introduce support for enum textual representation in libbpf. - New helpers to implement synproxy with eBPF/XDP. - Improve loop performances, inlining indirect calls when possible. - Removed all the deprecated libbpf APIs. - Implement new eBPF-based LSM flavor. - Add type match support, which allow accurate queries to the eBPF used types. - A few TCP congetsion control framework usability improvements. - Add new infrastructure to manipulate CT entries via eBPF programs. - Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel function. Protocols --------- - Introduce per network namespace lookup tables for unix sockets, increasing scalability and reducing contention. - Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support. - Add support to forciby close TIME_WAIT TCP sockets via user-space tools. - Significant performance improvement for the TLS 1.3 receive path, both for zero-copy and not-zero-copy. - Support for changing the initial MTPCP subflow priority/backup status - Introduce virtually contingus buffers for sockets over RDMA, to cope better with memory pressure. - Extend CAN ethtool support with timestamping capabilities - Refactor CAN build infrastructure to allow building only the needed features. Driver API ---------- - Remove devlink mutex to allow parallel commands on multiple links. - Add support for pause stats in distributed switch. - Implement devlink helpers to query and flash line cards. - New helper for phy mode to register conversion. New hardware / drivers ---------------------- - Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro. - Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch. - Ethernet DSA driver for the Microchip LAN937x switch. - Ethernet PHY driver for the Aquantia AQR113C EPHY. - CAN driver for the OBD-II ELM327 interface. - CAN driver for RZ/N1 SJA1000 CAN controller. - Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device. Drivers ------- - Intel Ethernet NICs: - i40e: add support for vlan pruning - i40e: add support for XDP framented packets - ice: improved vlan offload support - ice: add support for PPPoE offload - Mellanox Ethernet (mlx5) - refactor packet steering offload for performance and scalability - extend support for TC offload - refactor devlink code to clean-up the locking schema - support stacked vlans for bridge offloads - use TLS objects pool to improve connection rate - Netronome Ethernet NICs (nfp): - extend support for IPv6 fields mangling offload - add support for vepa mode in HW bridge - better support for virtio data path acceleration (VDPA) - enable TSO by default - Microsoft vNIC driver (mana) - add support for XDP redirect - Others Ethernet drivers: - bonding: add per-port priority support - microchip lan743x: extend phy support - Fungible funeth: support UDP segmentation offload and XDP xmit - Solarflare EF100: add support for virtual function representors - MediaTek SoC: add XDP support - Mellanox Ethernet/IB switch (mlxsw): - dropped support for unreleased H/W (XM router). - improved stats accuracy - unified bridge model coversion improving scalability (parts 1-6) - support for PTP in Spectrum-2 asics - Broadcom PHYs - add PTP support for BCM54210E - add support for the BCM53128 internal PHY - Marvell Ethernet switches (prestera): - implement support for multicast forwarding offload - Embedded Ethernet switches: - refactor OcteonTx MAC filter for better scalability - improve TC H/W offload for the Felix driver - refactor the Microchip ksz8 and ksz9477 drivers to share the probe code (parts 1, 2), add support for phylink mac configuration - Other WiFi: - Microchip wilc1000: diable WEP support and enable WPA3 - Atheros ath10k: encapsulation offload support Old code removal: - Neterion vxge ethernet driver: this is untouched since more than 10 years. Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmLqN+oSHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkB9kQAI9VqW0c3SfiTJnkVBEIovZ6Tnh5stD2 UYFkh1BdchLsYxi7W4XMpVPSzRztiTP87mIx5c/KvIzj+QNeWL1XWRJSPdI9HhTD pTAA/tM2OG7bqrbyQiKDNfpQdNl7+kk1RwnYd+f9RFl1QVuIJaYhmjVwrsN5xF/+ jUsotpROarM2dGFWiFwJbKhP2zMDT+6qEEahM8pEPggKhv8wRLYjany2cZVEe4e0 WGUpbINAS8gEKm0Ob922WaDfDrcK/N1Z0jNz/kMaENkK18Vvc7F6bCO0DzAawKX9 QZMMwm6mHp3EThflJAMAzCGIYiIcwLhykgdyj8rrjPhFrWbMD2Sdsbo21HOXU/8j u4aAhVl+d+h7emmbgBoJ8sycVJ7BQlXz7lX20sTgADv9xI4/dPhQ17CMRuwX6fXX JSrn6P6e1LTV5CEg6vrlSPnKPY6uhFn/cPw47FxCjRwJ9phVnp+8uZWQmf9Pz3yf Ok/tcj+juFbsmuOshHy2cbRkuNZNS0oRWlSTBo5795ZwOLSakMonR3L+ev2aOvzz DVrFp2Y/iIVwMSFdCbouYdYnhArPRhOAtCmZc2afY8aBN7aaMgrdTy3+mzUoHy3I FG3K+VuKpfi0vY4zn6ZoLZDIpyXIoJJ93RcSGltD32t3Dp1RaQMVEI4s45k05PVm 1nYpXKHA8qML =hxEG -----END PGP SIGNATURE----- Merge tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking changes from Paolo Abeni: "Core: - Refactor the forward memory allocation to better cope with memory pressure with many open sockets, moving from a per socket cache to a per-CPU one - Replace rwlocks with RCU for better fairness in ping, raw sockets and IP multicast router. - Network-side support for IO uring zero-copy send. - A few skb drop reason improvements, including codegen the source file with string mapping instead of using macro magic. - Rename reference tracking helpers to a more consistent netdev_* schema. - Adapt u64_stats_t type to address load/store tearing issues. - Refine debug helper usage to reduce the log noise caused by bots. BPF: - Improve socket map performance, avoiding skb cloning on read operation. - Add support for 64 bits enum, to match types exposed by kernel. - Introduce support for sleepable uprobes program. - Introduce support for enum textual representation in libbpf. - New helpers to implement synproxy with eBPF/XDP. - Improve loop performances, inlining indirect calls when possible. - Removed all the deprecated libbpf APIs. - Implement new eBPF-based LSM flavor. - Add type match support, which allow accurate queries to the eBPF used types. - A few TCP congetsion control framework usability improvements. - Add new infrastructure to manipulate CT entries via eBPF programs. - Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel function. Protocols: - Introduce per network namespace lookup tables for unix sockets, increasing scalability and reducing contention. - Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support. - Add support to forciby close TIME_WAIT TCP sockets via user-space tools. - Significant performance improvement for the TLS 1.3 receive path, both for zero-copy and not-zero-copy. - Support for changing the initial MTPCP subflow priority/backup status - Introduce virtually contingus buffers for sockets over RDMA, to cope better with memory pressure. - Extend CAN ethtool support with timestamping capabilities - Refactor CAN build infrastructure to allow building only the needed features. Driver API: - Remove devlink mutex to allow parallel commands on multiple links. - Add support for pause stats in distributed switch. - Implement devlink helpers to query and flash line cards. - New helper for phy mode to register conversion. New hardware / drivers: - Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro. - Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch. - Ethernet DSA driver for the Microchip LAN937x switch. - Ethernet PHY driver for the Aquantia AQR113C EPHY. - CAN driver for the OBD-II ELM327 interface. - CAN driver for RZ/N1 SJA1000 CAN controller. - Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device. Drivers: - Intel Ethernet NICs: - i40e: add support for vlan pruning - i40e: add support for XDP framented packets - ice: improved vlan offload support - ice: add support for PPPoE offload - Mellanox Ethernet (mlx5) - refactor packet steering offload for performance and scalability - extend support for TC offload - refactor devlink code to clean-up the locking schema - support stacked vlans for bridge offloads - use TLS objects pool to improve connection rate - Netronome Ethernet NICs (nfp): - extend support for IPv6 fields mangling offload - add support for vepa mode in HW bridge - better support for virtio data path acceleration (VDPA) - enable TSO by default - Microsoft vNIC driver (mana) - add support for XDP redirect - Others Ethernet drivers: - bonding: add per-port priority support - microchip lan743x: extend phy support - Fungible funeth: support UDP segmentation offload and XDP xmit - Solarflare EF100: add support for virtual function representors - MediaTek SoC: add XDP support - Mellanox Ethernet/IB switch (mlxsw): - dropped support for unreleased H/W (XM router). - improved stats accuracy - unified bridge model coversion improving scalability (parts 1-6) - support for PTP in Spectrum-2 asics - Broadcom PHYs - add PTP support for BCM54210E - add support for the BCM53128 internal PHY - Marvell Ethernet switches (prestera): - implement support for multicast forwarding offload - Embedded Ethernet switches: - refactor OcteonTx MAC filter for better scalability - improve TC H/W offload for the Felix driver - refactor the Microchip ksz8 and ksz9477 drivers to share the probe code (parts 1, 2), add support for phylink mac configuration - Other WiFi: - Microchip wilc1000: diable WEP support and enable WPA3 - Atheros ath10k: encapsulation offload support Old code removal: - Neterion vxge ethernet driver: this is untouched since more than 10 years" * tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1890 commits) doc: sfp-phylink: Fix a broken reference wireguard: selftests: support UML wireguard: allowedips: don't corrupt stack when detecting overflow wireguard: selftests: update config fragments wireguard: ratelimiter: use hrtimer in selftest net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ net: usb: ax88179_178a: Bind only to vendor-specific interface selftests: net: fix IOAM test skip return code net: usb: make USB_RTL8153_ECM non user configurable net: marvell: prestera: remove reduntant code octeontx2-pf: Reduce minimum mtu size to 60 net: devlink: Fix missing mutex_unlock() call net/tls: Remove redundant workqueue flush before destroy net: txgbe: Fix an error handling path in txgbe_probe() net: dsa: Fix spelling mistakes and cleanup code Documentation: devlink: add add devlink-selftests to the table of contents dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock net: ionic: fix error check for vlan flags in ionic_set_nic_features() net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr() nfp: flower: add support for tunnel offload without key ID ...	2022-08-03 16:29:08 -07:00
Paolo Abeni	7c6327c77d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: net/ax25/af_ax25.c `d7c4c9e075` ("ax25: fix incorrect dev_tracker usage") `d62607c3fe` ("net: rename reference+tracking helpers") drivers/net/netdevsim/fib.c `180a6a3ee6` ("netdevsim: fib: Fix reference count leak on route deletion failure") `012ec02ae4` ("netdevsim: convert driver to use unlocked devlink API during init/fini") Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-03 09:04:55 +02:00
Linus Torvalds	b349b1181d	for-5.20/io_uring-2022-07-29 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmLkm5gQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpmKMD/4l3QIrLbjYIxlfrzQcHbmYuUkbQtj3SbZg 6ejbnGVhCs1P9DdXH8MgE2BxgpiXQE0CqOK7vbSoo5ep2n2UTLI2DIxAl74SMIo7 0wmJXtUJySuViKr3NYVHqlN180MkQYddBz0nGElhkQBPBCMhW8CrtPCeURr/YyHp 2RxSYBXiUx2gRyig+klnp6oPEqelcBZJUyNHdA9yVrgl/RhB/t2rKj7D++8ukQM3 Zuyh8WIkTeTfUz9hdGG7fuCEdZN4DlO2CCEc7uy0cKi6VRCKH4hYUCqClJ+/cfd2 43dUI2O7B6D1t/ObFh8AGIDXBDqVA6ePQohQU6gooRkfQiBPKkc9d0ts4yIhRqca AjkzNM+0Eve3A01loJ8J84w8oZnvNpYEv5n8/sZVLWcyU3UIs0I88nC2OBiFtoRq d77CtFLwOTo+r3STtAhnZOqez90rhS6BqKtqlUP346PCuFItl6/MbGtwdTbLYEFj CVNIb2pERWSr2NxGv4lFyXaX/cRwruxojWH7yc3rRYjr4Ykevd1pe/fMGNiMAnKw 5em/3QU3qq0ZVcXLMihksKeHHFIQwGDRMuyuv/fktV10+yYXQ0t16WzkJT3aR8Xo cqs0r8+6Jnj3uYcOMzj/FoLcpEPr21hnwAtzLto1mG1Wh4JRn/D7Nx5zqxPLxcW+ NiU6VihPOw== =gxeV -----END PGP SIGNATURE----- Merge tag 'for-5.20/io_uring-2022-07-29' of git://git.kernel.dk/linux-block Pull io_uring updates from Jens Axboe: - As per (valid) complaint in the last merge window, fs/io_uring.c has grown quite large these days. io_uring isn't really tied to fs either, as it supports a wide variety of functionality outside of that. Move the code to io_uring/ and split it into files that either implement a specific request type, and split some code into helpers as well. The code is organized a lot better like this, and io_uring.c is now < 4K LOC (me). - Deprecate the epoll_ctl opcode. It'll still work, just trigger a warning once if used. If we don't get any complaints on this, and I don't expect any, then we can fully remove it in a future release (me). - Improve the cancel hash locking (Hao) - kbuf cleanups (Hao) - Efficiency improvements to the task_work handling (Dylan, Pavel) - Provided buffer improvements (Dylan) - Add support for recv/recvmsg multishot support. This is similar to the accept (or poll) support for have for multishot, where a single SQE can trigger everytime data is received. For applications that expect to do more than a few receives on an instantiated socket, this greatly improves efficiency (Dylan). - Efficiency improvements for poll handling (Pavel) - Poll cancelation improvements (Pavel) - Allow specifiying a range for direct descriptor allocations (Pavel) - Cleanup the cqe32 handling (Pavel) - Move io_uring types to greatly cleanup the tracing (Pavel) - Tons of great code cleanups and improvements (Pavel) - Add a way to do sync cancelations rather than through the sqe -> cqe interface, as that's a lot easier to use for some use cases (me). - Add support to IORING_OP_MSG_RING for sending direct descriptors to a different ring. This avoids the usually problematic SCM case, as we disallow those. (me) - Make the per-command alloc cache we use for apoll generic, place limits on it, and use it for netmsg as well (me). - Various cleanups (me, Michal, Gustavo, Uros) * tag 'for-5.20/io_uring-2022-07-29' of git://git.kernel.dk/linux-block: (172 commits) io_uring: ensure REQ_F_ISREG is set async offload net: fix compat pointer in get_compat_msghdr() io_uring: Don't require reinitable percpu_ref io_uring: fix types in io_recvmsg_multishot_overflow io_uring: Use atomic_long_try_cmpxchg in __io_account_mem io_uring: support multishot in recvmsg net: copy from user before calling __get_compat_msghdr net: copy from user before calling __copy_msghdr io_uring: support 0 length iov in buffer select in compat io_uring: fix multishot ending when not polled io_uring: add netmsg cache io_uring: impose max limit on apoll cache io_uring: add abstraction around apoll cache io_uring: move apoll cache to poll.c io_uring: consolidate hash_locked io-wq handling io_uring: clear REQ_F_HASH_LOCKED on hash removal io_uring: don't race double poll setting REQ_F_ASYNC_DATA io_uring: don't miss setting REQ_F_DOUBLE_POLL io_uring: disable multishot recvmsg io_uring: only trace one of complete or overflow ...	2022-08-02 13:20:44 -07:00
Maxim Mikityanskiy	8eaa1d1108	net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ Striding RQ uses MTT page mapping, where each page corresponds to an XSK frame. MTT pages have alignment requirements, and XSK frames don't have any alignment guarantees in the unaligned mode. Frames with improper alignment must be discarded, otherwise the packet data will be written at a wrong address. Fixes: `282c0c798f` ("net/mlx5e: Allow XSK frames smaller than a page") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20220729121356.3990867-1-maximmi@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-02 15:07:09 +02:00
Eric Dumazet	2df91e397d	net: rose: add netdev ref tracker to 'struct rose_sock' This will help debugging netdevice refcount problems with CONFIG_NET_DEV_REFCNT_TRACKER=y Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tested-by: Bernard Pidoux <f6bvp@free.fr> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-01 11:59:23 -07:00
Jakub Kicinski	5fc7c5887c	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Andrii Nakryiko says: ==================== bpf-next 2022-07-29 We've added 22 non-merge commits during the last 4 day(s) which contain a total of 27 files changed, 763 insertions(+), 120 deletions(-). The main changes are: 1) Fixes to allow setting any source IP with bpf_skb_set_tunnel_key() helper, from Paul Chaignon. 2) Fix for bpf_xdp_pointer() helper when doing sanity checking, from Joanne Koong. 3) Fix for XDP frame length calculation, from Lorenzo Bianconi. 4) Libbpf BPF_KSYSCALL docs improvements and fixes to selftests to accommodate s390x quirks with socketcall(), from Ilya Leoshkevich. 5) Allow/denylist and CI configs additions to selftests/bpf to improve BPF CI, from Daniel Müller. 6) BPF trampoline + ftrace follow up fixes, from Song Liu and Xu Kuohai. 7) Fix allocation warnings in netdevsim, from Jakub Kicinski. 8) bpf_obj_get_opts() libbpf API allowing to provide file flags, from Joe Burton. 9) vsnprintf usage fix in bpf_snprintf_btf(), from Fedor Tokarev. 10) Various small fixes and clean ups, from Daniel Müller, Rongguang Wei, Jörn-Thorben Hinz, Yang Li. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (22 commits) bpf: Remove unneeded semicolon libbpf: Add bpf_obj_get_opts() netdevsim: Avoid allocation warnings triggered from user space bpf: Fix NULL pointer dereference when registering bpf trampoline bpf: Fix test_progs -j error with fentry/fexit tests selftests/bpf: Bump internal send_signal/send_signal_tracepoint timeout bpftool: Don't try to return value from void function in skeleton bpftool: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZE macro bpf: btf: Fix vsnprintf return value check libbpf: Support PPC in arch_specific_syscall_pfx selftests/bpf: Adjust vmtest.sh to use local kernel configuration selftests/bpf: Copy over libbpf configs selftests/bpf: Sort configuration selftests/bpf: Attach to socketcall() in test_probe_user libbpf: Extend BPF_KSYSCALL documentation bpf, devmap: Compute proper xdp_frame len redirecting frames bpf: Fix bpf_xdp_pointer return pointer selftests/bpf: Don't assign outer source IP to host bpf: Set flow flag to allow any source IP in bpf_tunnel_key geneve: Use ip_tunnel_key flow flags in route lookups ... ==================== Link: https://lore.kernel.org/r/20220729230948.1313527-1-andrii@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-29 19:04:29 -07:00
Mike Manning	944fd1aeac	net: allow unbound socket for packets in VRF when tcp_l3mdev_accept set The commit `3c82a21f43` ("net: allow binding socket in a VRF when there's an unbound socket") changed the inet socket lookup to avoid packets in a VRF from matching an unbound socket. This is to ensure the necessary isolation between the default and other VRFs for routing and forwarding. VRF-unaware processes running in the default VRF cannot access another VRF and have to be run with 'ip vrf exec <vrf>'. This is to be expected with tcp_l3mdev_accept disabled, but could be reallowed when this sysctl option is enabled. So instead of directly checking dif and sdif in inet[6]_match, here call inet_sk_bound_dev_eq(). This allows a match on unbound socket for non-zero sdif i.e. for packets in a VRF, if tcp_l3mdev_accept is enabled. Fixes: `3c82a21f43` ("net: allow binding socket in a VRF when there's an unbound socket") Signed-off-by: Mike Manning <mvrmanning@gmail.com> Link: https://lore.kernel.org/netdev/a54c149aed38fded2d3b5fdb1a6c89e36a083b74.camel@lasnet.de/ Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-29 11:58:54 +01:00
Andy Shevchenko	29192a170e	firewire: net: Make use of get_unaligned_be48(), put_unaligned_be48() Since we have a proper endianness converters for BE 48-bit data use them. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220726144906.5217-1-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 22:21:54 -07:00
Eric Dumazet	d7c4c9e075	ax25: fix incorrect dev_tracker usage While investigating a separate rose issue [1], and enabling CONFIG_NET_DEV_REFCNT_TRACKER=y, Bernard reported an orthogonal ax25 issue [2] An ax25_dev can be used by one (or many) struct ax25_cb. We thus need different dev_tracker, one per struct ax25_cb. After this patch is applied, we are able to focus on rose. [1] https://lore.kernel.org/netdev/fb7544a1-f42e-9254-18cc-c9b071f4ca70@free.fr/ [2] [ 205.798723] reference already released. [ 205.798732] allocated in: [ 205.798734] ax25_bind+0x1a2/0x230 [ax25] [ 205.798747] __sys_bind+0xea/0x110 [ 205.798753] __x64_sys_bind+0x18/0x20 [ 205.798758] do_syscall_64+0x5c/0x80 [ 205.798763] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 205.798768] freed in: [ 205.798770] ax25_release+0x115/0x370 [ax25] [ 205.798778] __sock_release+0x42/0xb0 [ 205.798782] sock_close+0x15/0x20 [ 205.798785] __fput+0x9f/0x260 [ 205.798789] ____fput+0xe/0x10 [ 205.798792] task_work_run+0x64/0xa0 [ 205.798798] exit_to_user_mode_prepare+0x18b/0x190 [ 205.798804] syscall_exit_to_user_mode+0x26/0x40 [ 205.798808] do_syscall_64+0x69/0x80 [ 205.798812] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 205.798827] ------------[ cut here ]------------ [ 205.798829] WARNING: CPU: 2 PID: 2605 at lib/ref_tracker.c:136 ref_tracker_free.cold+0x60/0x81 [ 205.798837] Modules linked in: rose netrom mkiss ax25 rfcomm cmac algif_hash algif_skcipher af_alg bnep snd_hda_codec_hdmi nls_iso8859_1 i915 rtw88_8821ce rtw88_8821c x86_pkg_temp_thermal rtw88_pci intel_powerclamp rtw88_core snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio coretemp snd_hda_intel kvm_intel snd_intel_dspcfg mac80211 snd_hda_codec kvm i2c_algo_bit drm_buddy drm_dp_helper btusb drm_kms_helper snd_hwdep btrtl snd_hda_core btbcm joydev crct10dif_pclmul btintel crc32_pclmul ghash_clmulni_intel mei_hdcp btmtk intel_rapl_msr aesni_intel bluetooth input_leds snd_pcm crypto_simd syscopyarea processor_thermal_device_pci_legacy sysfillrect cryptd intel_soc_dts_iosf snd_seq sysimgblt ecdh_generic fb_sys_fops rapl libarc4 processor_thermal_device intel_cstate processor_thermal_rfim cec snd_timer ecc snd_seq_device cfg80211 processor_thermal_mbox mei_me processor_thermal_rapl mei rc_core at24 snd intel_pch_thermal intel_rapl_common ttm soundcore int340x_thermal_zone video [ 205.798948] mac_hid acpi_pad sch_fq_codel ipmi_devintf ipmi_msghandler drm msr parport_pc ppdev lp parport ramoops pstore_blk reed_solomon pstore_zone efi_pstore ip_tables x_tables autofs4 hid_generic usbhid hid i2c_i801 i2c_smbus r8169 xhci_pci ahci libahci realtek lpc_ich xhci_pci_renesas [last unloaded: ax25] [ 205.798992] CPU: 2 PID: 2605 Comm: ax25ipd Not tainted 5.18.11-F6BVP #3 [ 205.798996] Hardware name: To be filled by O.E.M. To be filled by O.E.M./CK3, BIOS 5.011 09/16/2020 [ 205.798999] RIP: 0010:ref_tracker_free.cold+0x60/0x81 [ 205.799005] Code: e8 d2 01 9b ff 83 7b 18 00 74 14 48 c7 c7 2f d7 ff 98 e8 10 6e fc ff 8b 7b 18 e8 b8 01 9b ff 4c 89 ee 4c 89 e7 e8 5d fd 07 00 <0f> 0b b8 ea ff ff ff e9 30 05 9b ff 41 0f b6 f7 48 c7 c7 a0 fa 4e [ 205.799008] RSP: 0018:ffffaf5281073958 EFLAGS: 00010286 [ 205.799011] RAX: 0000000080000000 RBX: ffff9a0bd687ebe0 RCX: 0000000000000000 [ 205.799014] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 00000000ffffffff [ 205.799016] RBP: ffffaf5281073a10 R08: 0000000000000003 R09: fffffffffffd5618 [ 205.799019] R10: 0000000000ffff10 R11: 000000000000000f R12: ffff9a0bc53384d0 [ 205.799022] R13: 0000000000000282 R14: 00000000ae000001 R15: 0000000000000001 [ 205.799024] FS: 0000000000000000(0000) GS:ffff9a0d0f300000(0000) knlGS:0000000000000000 [ 205.799028] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 205.799031] CR2: 00007ff6b8311554 CR3: 000000001ac10004 CR4: 00000000001706e0 [ 205.799033] Call Trace: [ 205.799035] <TASK> [ 205.799038] ? ax25_dev_device_down+0xd9/0x1b0 [ax25] [ 205.799047] ? ax25_device_event+0x9f/0x270 [ax25] [ 205.799055] ? raw_notifier_call_chain+0x49/0x60 [ 205.799060] ? call_netdevice_notifiers_info+0x52/0xa0 [ 205.799065] ? dev_close_many+0xc8/0x120 [ 205.799070] ? unregister_netdevice_many+0x13d/0x890 [ 205.799073] ? unregister_netdevice_queue+0x90/0xe0 [ 205.799076] ? unregister_netdev+0x1d/0x30 [ 205.799080] ? mkiss_close+0x7c/0xc0 [mkiss] [ 205.799084] ? tty_ldisc_close+0x2e/0x40 [ 205.799089] ? tty_ldisc_hangup+0x137/0x210 [ 205.799092] ? __tty_hangup.part.0+0x208/0x350 [ 205.799098] ? tty_vhangup+0x15/0x20 [ 205.799103] ? pty_close+0x127/0x160 [ 205.799108] ? tty_release+0x139/0x5e0 [ 205.799112] ? __fput+0x9f/0x260 [ 205.799118] ax25_dev_device_down+0xd9/0x1b0 [ax25] [ 205.799126] ax25_device_event+0x9f/0x270 [ax25] [ 205.799135] raw_notifier_call_chain+0x49/0x60 [ 205.799140] call_netdevice_notifiers_info+0x52/0xa0 [ 205.799146] dev_close_many+0xc8/0x120 [ 205.799152] unregister_netdevice_many+0x13d/0x890 [ 205.799157] unregister_netdevice_queue+0x90/0xe0 [ 205.799161] unregister_netdev+0x1d/0x30 [ 205.799165] mkiss_close+0x7c/0xc0 [mkiss] [ 205.799170] tty_ldisc_close+0x2e/0x40 [ 205.799173] tty_ldisc_hangup+0x137/0x210 [ 205.799178] __tty_hangup.part.0+0x208/0x350 [ 205.799184] tty_vhangup+0x15/0x20 [ 205.799188] pty_close+0x127/0x160 [ 205.799193] tty_release+0x139/0x5e0 [ 205.799199] __fput+0x9f/0x260 [ 205.799203] ____fput+0xe/0x10 [ 205.799208] task_work_run+0x64/0xa0 [ 205.799213] do_exit+0x33b/0xab0 [ 205.799217] ? __handle_mm_fault+0xc4f/0x15f0 [ 205.799224] do_group_exit+0x35/0xa0 [ 205.799228] __x64_sys_exit_group+0x18/0x20 [ 205.799232] do_syscall_64+0x5c/0x80 [ 205.799238] ? handle_mm_fault+0xba/0x290 [ 205.799242] ? debug_smp_processor_id+0x17/0x20 [ 205.799246] ? fpregs_assert_state_consistent+0x26/0x50 [ 205.799251] ? exit_to_user_mode_prepare+0x49/0x190 [ 205.799256] ? irqentry_exit_to_user_mode+0x9/0x20 [ 205.799260] ? irqentry_exit+0x33/0x40 [ 205.799263] ? exc_page_fault+0x87/0x170 [ 205.799268] ? asm_exc_page_fault+0x8/0x30 [ 205.799273] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 205.799277] RIP: 0033:0x7ff6b80eaca1 [ 205.799281] Code: Unable to access opcode bytes at RIP 0x7ff6b80eac77. [ 205.799283] RSP: 002b:00007fff6dfd4738 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 [ 205.799287] RAX: ffffffffffffffda RBX: 00007ff6b8215a00 RCX: 00007ff6b80eaca1 [ 205.799290] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001 [ 205.799293] RBP: 0000000000000001 R08: ffffffffffffff80 R09: 0000000000000028 [ 205.799295] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ff6b8215a00 [ 205.799298] R13: 0000000000000000 R14: 00007ff6b821aee8 R15: 00007ff6b821af00 [ 205.799304] </TASK> Fixes: `feef318c85` ("ax25: fix UAF bugs of net_device caused by rebinding operation") Reported-by: Bernard F6BVP <f6bvp@free.fr> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Duoming Zhou <duoming@zju.edu.cn> Link: https://lore.kernel.org/r/20220728051821.3160118-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 22:06:15 -07:00
Vikas Gupta	08f588fa30	devlink: introduce framework for selftests Add a framework for running selftests. Framework exposes devlink commands and test suite(s) to the user to execute and query the supported tests by the driver. Below are new entries in devlink_nl_ops devlink_nl_cmd_selftests_show_doit/dumpit: To query the supported selftests by the drivers. devlink_nl_cmd_selftests_run: To execute selftests. Users can provide a test mask for executing group tests or standalone tests. Documentation/networking/devlink/ path is already part of MAINTAINERS & the new files come under this path. Hence no update needed to the MAINTAINERS Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:56:53 -07:00
Tariq Toukan	7adc91e0c9	net/tls: Multi-threaded calls to TX tls_dev_del Multiple TLS device-offloaded contexts can be added in parallel via concurrent calls to .tls_dev_add, while calls to .tls_dev_del are sequential in tls_device_gc_task. This is not a sustainable behavior. This creates a rate gap between add and del operations (addition rate outperforms the deletion rate). When running for enough time, the TLS device resources could get exhausted, failing to offload new connections. Replace the single-threaded garbage collector work with a per-context alternative, so they can be handled on several cores in parallel. Use a new dedicated destruct workqueue for this. Tested with mlx5 device: Before: 22141 add/sec, 103 del/sec After: 11684 add/sec, 11684 del/sec Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 21:50:54 -07:00
Jakub Kicinski	272ac32f56	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 18:21:16 -07:00
Ziyang Xuan	85f0173df3	ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr Change net device's MTU to smaller than IPV6_MIN_MTU or unregister device while matching route. That may trigger null-ptr-deref bug for ip6_ptr probability as following. ========================================================= BUG: KASAN: null-ptr-deref in find_match.part.0+0x70/0x134 Read of size 4 at addr 0000000000000308 by task ping6/263 CPU: 2 PID: 263 Comm: ping6 Not tainted 5.19.0-rc7+ #14 Call trace: dump_backtrace+0x1a8/0x230 show_stack+0x20/0x70 dump_stack_lvl+0x68/0x84 print_report+0xc4/0x120 kasan_report+0x84/0x120 __asan_load4+0x94/0xd0 find_match.part.0+0x70/0x134 __find_rr_leaf+0x408/0x470 fib6_table_lookup+0x264/0x540 ip6_pol_route+0xf4/0x260 ip6_pol_route_output+0x58/0x70 fib6_rule_lookup+0x1a8/0x330 ip6_route_output_flags_noref+0xd8/0x1a0 ip6_route_output_flags+0x58/0x160 ip6_dst_lookup_tail+0x5b4/0x85c ip6_dst_lookup_flow+0x98/0x120 rawv6_sendmsg+0x49c/0xc70 inet_sendmsg+0x68/0x94 Reproducer as following: Firstly, prepare conditions: $ip netns add ns1 $ip netns add ns2 $ip link add veth1 type veth peer name veth2 $ip link set veth1 netns ns1 $ip link set veth2 netns ns2 $ip netns exec ns1 ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1 $ip netns exec ns2 ip -6 addr add 2001:0db8:0:f101::2/64 dev veth2 $ip netns exec ns1 ifconfig veth1 up $ip netns exec ns2 ifconfig veth2 up $ip netns exec ns1 ip -6 route add 2000::/64 dev veth1 metric 1 $ip netns exec ns2 ip -6 route add 2001::/64 dev veth2 metric 1 Secondly, execute the following two commands in two ssh windows respectively: $ip netns exec ns1 sh $while true; do ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1; ip -6 route add 2000::/64 dev veth1 metric 1; ping6 2000::2; done $ip netns exec ns1 sh $while true; do ip link set veth1 mtu 1000; ip link set veth1 mtu 1500; sleep 5; done It is because ip6_ptr has been assigned to NULL in addrconf_ifdown() firstly, then ip6_ignore_linkdown() accesses ip6_ptr directly without NULL check. cpu0 cpu1 fib6_table_lookup __find_rr_leaf addrconf_notify [ NETDEV_CHANGEMTU ] addrconf_ifdown RCU_INIT_POINTER(dev->ip6_ptr, NULL) find_match ip6_ignore_linkdown So we can add NULL check for ip6_ptr before using in ip6_ignore_linkdown() to fix the null-ptr-deref bug. Fixes: `dcd1f57295` ("net/ipv6: Remove fib6_idev") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220728013307.656257-1-william.xuanziyang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 10:42:44 -07:00
Paolo Abeni	7d85e9cb40	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice: PPPoE offload support Marcin Szycik says: Add support for dissecting PPPoE and PPP-specific fields in flow dissector: PPPoE session id and PPP protocol type. Add support for those fields in tc-flower and support offloading PPPoE. Finally, add support for hardware offload of PPPoE packets in switchdev mode in ice driver. Example filter: tc filter add dev $PF1 ingress protocol ppp_ses prio 1 flower pppoe_sid \ 1234 ppp_proto ip skip_sw action mirred egress redirect dev $VF1_PR Changes in iproute2 are required to use the new fields (will be submitted soon). ICE COMMS DDP package is required to create a filter in ice. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: Add support for PPPoE hardware offload flow_offload: Introduce flow_match_pppoe net/sched: flower: Add PPPoE filter flow_dissector: Add PPPoE dissectors ==================== Link: https://lore.kernel.org/r/20220726203133.2171332-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-07-28 11:54:56 +02:00
Jakub Kicinski	5f10376b6b	add missing includes and forward declarations to networking includes under linux/ Similarly to a recent include/net/ cleanup, this patch adds missing includes to networking headers under include/linux. All these problems are currently masked by the existing users including the missing dependency before the broken header. Link: https://lore.kernel.org/all/20220723045755.2676857-1-kuba@kernel.org/ v1 Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20220726215652.158167-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-07-28 11:29:36 +02:00
Stefan Raspl	8b2fed8e27	net/smc: Pass on DMBE bit mask in IRQ handler Make the DMBE bits, which are passed on individually in ism_move() as parameter idx, available to the receiver. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-27 13:24:42 +01:00
Stefan Raspl	0a2f4f9893	s390/ism: Cleanups Reworked signature of the function to retrieve the system EID: No plausible reason to use a double pointer. And neither to pass in the device as an argument, as this identifier is by definition per system, not per device. Plus some minor consistency edits. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-27 13:24:42 +01:00
Jakub Kicinski	84c61fe1a7	tls: rx: do not use the standard strparser TLS is a relatively poor fit for strparser. We pause the input every time a message is received, wait for a read which will decrypt the message, start the parser, repeat. strparser is built to delineate the messages, wrap them in individual skbs and let them float off into the stack or a different socket. TLS wants the data pages and nothing else. There's no need for TLS to keep cloning (and occasionally skb_unclone()'ing) the TCP rx queue. This patch uses a pre-allocated skb and attaches the skbs from the TCP rx queue to it as frags. TLS is careful never to modify the input skb without CoW'ing / detaching it first. Since we call TCP rx queue cleanup directly we also get back the benefit of skb deferred free. Overall this results in a 6% gain in my benchmarks. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-26 14:38:51 -07:00
Jakub Kicinski	3f92a64e44	tcp: allow tls to decrypt directly from the tcp rcv queue Expose TCP rx queue accessor and cleanup, so that TLS can decrypt directly from the TCP queue. The expectation is that the caller can access the skb returned from tcp_recv_skb() and up to inq bytes worth of data (some of which may be in ->next skbs) and then call tcp_read_done() when data has been consumed. The socket lock must be held continuously across those two operations. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-26 14:38:51 -07:00
Jiri Pirko	7b2d9a1a50	net: devlink: introduce nested devlink entity for line card For the purpose of exposing device info and allow flash update which is going to be implemented in follow-up patches, introduce a possibility for a line card to expose relation to nested devlink entity. The nested devlink entity represents the line card. Example: $ devlink lc show pci/0000:01:00.0 lc 1 pci/0000:01:00.0: lc 1 state active type 16x100G nested_devlink auxiliary/mlxsw_core.lc.0 supported_types: 16x100G $ devlink dev show auxiliary/mlxsw_core.lc.0 auxiliary/mlxsw_core.lc.0 Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-26 13:50:51 -07:00
Luiz Augusto von Dentz	d0be8347c6	Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put This fixes the following trace which is caused by hci_rx_work starting up after the final channel reference has been put() during sock_close() but before the references to the channel have been destroyed, so instead the code now rely on kref_get_unless_zero/l2cap_chan_hold_unless_zero to prevent referencing a channel that is about to be destroyed. refcount_t: increment on 0; use-after-free. BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0 Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705 CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28 Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT) Workqueue: hci0 hci_rx_work Call trace: dump_backtrace+0x0/0x378 show_stack+0x20/0x2c dump_stack+0x124/0x148 print_address_description+0x80/0x2e8 __kasan_report+0x168/0x188 kasan_report+0x10/0x18 __asan_load4+0x84/0x8c refcount_dec_and_test+0x20/0xd0 l2cap_chan_put+0x48/0x12c l2cap_recv_frame+0x4770/0x6550 l2cap_recv_acldata+0x44c/0x7a4 hci_acldata_packet+0x100/0x188 hci_rx_work+0x178/0x23c process_one_work+0x35c/0x95c worker_thread+0x4cc/0x960 kthread+0x1a8/0x1c4 ret_from_fork+0x10/0x18 Cc: stable@kernel.org Reported-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-26 13:35:24 -07:00
Wojciech Drewek	6a21b0856d	flow_offload: Introduce flow_match_pppoe Allow to offload PPPoE filters by adding flow_rule_match_pppoe. Drivers can extract PPPoE specific fields from now on. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-26 10:49:27 -07:00
Wojciech Drewek	46126db9c8	flow_dissector: Add PPPoE dissectors Allow to dissect PPPoE specific fields which are: - session ID (16 bits) - ppp protocol (16 bits) - type (16 bits) - this is PPPoE ethertype, for now only ETH_P_PPP_SES is supported, possible ETH_P_PPP_DISC in the future The goal is to make the following TC command possible: # tc filter add dev ens6f0 ingress prio 1 protocol ppp_ses \ flower \ pppoe_sid 12 \ ppp_proto ip \ action drop Note that only PPPoE Session is supported. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-07-26 09:49:12 -07:00
Paul Chaignon	451ef36bd2	ip_tunnels: Add new flow flags field to ip_tunnel_key This commit extends the ip_tunnel_key struct with a new field for the flow flags, to pass them to the route lookups. This new field will be populated and used in subsequent commits. Signed-off-by: Paul Chaignon <paul@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/f8bfd4983bd06685a59b1e3ba76ca27496f51ef3.1658759380.git.paul@isovalent.com	2022-07-26 12:43:16 +02:00
Jakub Kicinski	2baf8ba532	wireless-next patches for v5.20 Third set of patches for v5.20. MLO work continues and we have a lot of stack changes due to that, including driver API changes. Not much driver patches except on mt76. Major changes: cfg80211/mac80211 * more prepartion for Wi-Fi 7 Multi-Link Operation (MLO) support, works with one link now * align with IEEE Draft P802.11be_D2.0 * hardware timestamps for receive and transmit mt76 * preparation for new chipset support * ACPI SAR support -----BEGIN PGP SIGNATURE----- iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmLe1k4RHGt2YWxvQGtl cm5lbC5vcmcACgkQbhckVSbrbZvxlQf8DrZIllhF0q/7Wry3JuG0gbNA+V2eI/lc OYrephsDBm/dvvyjcFWcdUzxoNk0k1+aOrx/09JijHFgCGKVwuK1+hxYVfjW2q43 9mHxJBo4NcMk1RDDM3paVuZ8QMHuYugbv2mQOZeAEq2XloAaqEM7wVE+bb4Mgtgx VAKS5du2igrSt83wl8BRMFb9MPAM1EQ3Cw7Ro5T4y+1Qm/hrBm6qWizSpqh9CXYx pDLR3pvQxiD4Axa0Uq3rUbyF4hLwciqSFOJvr2sI3q7b9YElA7wIi6NQzMkYJH6Z 7HW5K6UIQbblAaQkv2BLqpU1N6puTHUOAf5Md31vOAaOcGbSI5hbUA== =Cnxg -----END PGP SIGNATURE----- Merge tag 'wireless-next-2022-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v5.20 Third set of patches for v5.20. MLO work continues and we have a lot of stack changes due to that, including driver API changes. Not much driver patches except on mt76. Major changes: cfg80211/mac80211 - more prepartion for Wi-Fi 7 Multi-Link Operation (MLO) support, works with one link now - align with IEEE Draft P802.11be_D2.0 - hardware timestamps for receive and transmit mt76 - preparation for new chipset support - ACPI SAR support * tag 'wireless-next-2022-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (254 commits) wifi: mac80211: fix link data leak wifi: mac80211: mlme: fix disassoc with MLO wifi: mac80211: add macros to loop over active links wifi: mac80211: remove erroneous sband/link validation wifi: mac80211: mlme: transmit assoc frame with address translation wifi: mac80211: verify link addresses are different wifi: mac80211: rx: track link in RX data wifi: mac80211: optionally implement MLO multicast TX wifi: mac80211: expand ieee80211_mgmt_tx() for MLO wifi: nl80211: add MLO link ID to the NL80211_CMD_FRAME TX API wifi: mac80211: report link ID to cfg80211 on mgmt RX wifi: cfg80211: report link ID in NL80211_CMD_FRAME wifi: mac80211: add hardware timestamps for RX and TX wifi: cfg80211: add hardware timestamps to frame RX info wifi: cfg80211/nl80211: move rx management data into a struct wifi: cfg80211: add a function for reporting TX status with hardware timestamps wifi: nl80211: add RX and TX timestamp attributes wifi: ieee80211: add helper functions for detecting TM/FTM frames wifi: mac80211_hwsim: handle links for wmediumd/virtio wifi: mac80211: sta_info: fix link_sta insertion ... ==================== Link: https://lore.kernel.org/r/20220725174547.EA465C341C6@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-25 18:20:52 -07:00
David S. Miller	e222dc8d84	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2022-07-20 1) Don't set DST_NOPOLICY in IPv4, a recent patch made this superfluous. From Eyal Birger. 2) Convert alg_key to flexible array member to avoid an iproute2 compile warning when built with gcc-12. From Stephen Hemminger. 3) xfrm_register_km and xfrm_unregister_km do always return 0 so change the type to void. From Zhengchao Shao. 4) Fix spelling mistake in esp6.c From Zhang Jiaming. 5) Improve the wording of comment above XFRM_OFFLOAD flags. From Petr Vaněk. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-25 13:25:39 +01:00
Kuniyuki Iwashima	0273954595	net: Fix data-races around sysctl_[rw]mem(_offset)?. While reading these sysctl variables, they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers. - .sysctl_rmem - .sysctl_rwmem - .sysctl_rmem_offset - .sysctl_wmem_offset - sysctl_tcp_rmem[1, 2] - sysctl_tcp_wmem[1, 2] - sysctl_decnet_rmem[1] - sysctl_decnet_wmem[1] - sysctl_tipc_rmem[1] Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-25 12:42:09 +01:00
Dylan Yudaken	72c531f8ef	net: copy from user before calling __get_compat_msghdr this is in preparation for multishot receive from io_uring, where it needs to have access to the original struct user_msghdr. functionally this should be a no-op. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220714110258.1336200-3-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:17 -06:00
Matthias May	7074732c8f	ip_tunnels: allow VXLAN/GENEVE to inherit TOS/TTL from VLAN The current code allows for VXLAN and GENEVE to inherit the TOS respective the TTL when skb-protocol is ETH_P_IP or ETH_P_IPV6. However when the payload is VLAN encapsulated, then this inheriting does not work, because the visible skb-protocol is of type ETH_P_8021Q or ETH_P_8021AD. Instead of skb->protocol use skb_protocol(). Signed-off-by: Matthias May <matthias.may@westermo.com> Link: https://lore.kernel.org/r/20220721202718.10092-1-matthias.may@westermo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-22 21:47:28 -07:00
Jakub Kicinski	4a934eca7b	bluetooth-next pull request for net-next: - Add support for IM Networks PID 0x3568 - Add support for BCM4349B1 - Add support for CYW55572 - Add support for MT7922 VID/PID 0489/e0e2 - Add support for Realtek RTL8852C - Initial support for Isochronous Channels/ISO sockets - Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING quirk -----BEGIN PGP SIGNATURE----- iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmLbPmsZHGx1aXoudm9u LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKSo7EACc8njgHO2pN8ncWvgu/gH8 0v1lRBoi+Tyzk5gZtdM0rIbE3t7tFqml3Kr0WsCkzV6CGgnqCw5i/MKZXCV8G4tG 0ZsY8y6NMiCFR6wQq3rdNS8NqPmlCHWm6yY2EISEM6qbtF8HwxQvXdkzznZPHgVG DR7i5fAVuGA6rIL+9NSG/TxHjJvq6Bmu3v9Uu/V062I7NrBMw9Jr0Ic1EaUqgYck QL8+653ZZMxYPxt978UekbQIYEp3YwZ5MTACtX86j2s5tlZKuivKTIZch1vSaOi1 1zC6up208+p2/4+Yq7FJ2kA6d2be3FD26oT1xymRhqiMakCRrHfdmTFpC7/J4ZgX /4mcIREkUoO2duim+91Hgt1Ww1vaD4joPwXD6AILbK1bdp0pi0gw47bQF8XO8uIh yPQqnoGWSJGD1VknPh5x7lGcAYQ3bgSg0L3TlQ4gN9Qc/emuC6UOQ9QPvwmyOilG ZrDn2p1Rpsoj8vVRTv6+CgKqLokXNUTPixCAaS4AIygRzhIzwReYYNMYUZgYMrTk Qf6bKqczEut1vq8NZiN3TTxLLOWHB5+3cdl/QRv4ZeoNXMXPmbtJ0y9Vud66GhZu vS6F+BNQapIkEbM6xrC/OepCgzrVoVn2BQuDO6SQA3pg4JxFeAl+V434Va3tqVeK 9h6GHrl8KePjh528NXbH3Q== =Ne87 -----END PGP SIGNATURE----- Merge tag 'for-net-next-2022-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - Add support for IM Networks PID 0x3568 - Add support for BCM4349B1 - Add support for CYW55572 - Add support for MT7922 VID/PID 0489/e0e2 - Add support for Realtek RTL8852C - Initial support for Isochronous Channels/ISO sockets - Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING quirk * tag 'for-net-next-2022-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (58 commits) Bluetooth: btusb: Detect if an ACL packet is in fact an ISO packet Bluetooth: btusb: Add support for ISO packets Bluetooth: ISO: Add broadcast support Bluetooth: Add initial implementation of BIS connections Bluetooth: Add BTPROTO_ISO socket type Bluetooth: Add initial implementation of CIS connections Bluetooth: hci_core: Introduce hci_recv_event_data Bluetooth: Convert delayed discov_off to hci_sync Bluetooth: Remove update_scan hci_request dependancy Bluetooth: Remove dead code from hci_request.c Bluetooth: btrtl: Fix typo in comment Bluetooth: MGMT: Fix holding hci_conn reference while command is queued Bluetooth: mgmt: Fix using hci_conn_abort Bluetooth: Use bt_status to convert from errno Bluetooth: Add bt_status Bluetooth: hci_sync: Split hci_dev_open_sync Bluetooth: hci_sync: Refactor remove Adv Monitor Bluetooth: hci_sync: Refactor add Adv Monitor Bluetooth: hci_sync: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING Bluetooth: btusb: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING for fake CSR ... ==================== Link: https://lore.kernel.org/r/20220723002232.964796-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-22 19:00:17 -07:00
Luiz Augusto von Dentz	f764a6c2c1	Bluetooth: ISO: Add broadcast support This adds broadcast support for BTPROTO_ISO by extending the sockaddr_iso with a new struct sockaddr_iso_bc where the socket user can set the broadcast address when receiving, the SID and the BIS indexes it wants to synchronize. When using BTPROTO_ISO for broadcast the roles are: Broadcaster -> uses connect with address set to BDADDR_ANY: > tools/isotest -s 00:00:00:00:00:00 Broadcast Receiver -> uses listen with address set to broadcaster: > tools/isotest -d 00:AA:01:00:00:00 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-22 17:14:13 -07:00
Luiz Augusto von Dentz	eca0ae4aea	Bluetooth: Add initial implementation of BIS connections This adds initial support for BIS/BIG which includes: == Broadcaster role: Setup a periodic advertising and create a BIG == > tools/isotest -s 00:00:00:00:00:00 isotest[63]: Connected [00:00:00:00:00:00] isotest[63]: QoS BIG 0x00 BIS 0x00 Packing 0x00 Framing 0x00] isotest[63]: Output QoS [Interval 10000 us Latency 10 ms SDU 40 PHY 0x02 RTN 2] isotest[63]: Sending ... isotest[63]: Number of packets: 1 isotest[63]: Socket jitter buffer: 80 buffer < HCI Command: LE Set Perio.. (0x08\|0x003e) plen 7 ... > HCI Event: Command Complete (0x0e) plen 4 LE Set Periodic Advertising Parameters (0x08\|0x003e) ncmd 1 Status: Success (0x00) < HCI Command: LE Set Perio.. (0x08\|0x003f) plen 7 ... > HCI Event: Command Complete (0x0e) plen 4 LE Set Periodic Advertising Data (0x08\|0x003f) ncmd 1 Status: Success (0x00) < HCI Command: LE Set Perio.. (0x08\|0x0040) plen 2 ... > HCI Event: Command Complete (0x0e) plen 4 LE Set Periodic Advertising Enable (0x08\|0x0040) ncmd 1 Status: Success (0x00) < HCI Command: LE Create B.. (0x08\|0x0068) plen 31 ... > HCI Event: Command Status (0x0f) plen 4 LE Create Broadcast Isochronous Group (0x08\|0x0068) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 21 LE Broadcast Isochronous Group Complete (0x1b) ... == Broadcast Receiver role: Create a PA Sync and BIG Sync == > tools/isotest -i hci1 -d 00:AA:01:00:00:00 isotest[66]: Waiting for connection 00:AA:01:00:00:00... < HCI Command: LE Periodic Advert.. (0x08\|0x0044) plen 14 ... > HCI Event: Command Status (0x0f) plen 4 LE Periodic Advertising Create Sync (0x08\|0x0044) ncmd 1 Status: Success (0x00) < HCI Command: LE Set Extended Sca.. (0x08\|0x0041) plen 8 ... > HCI Event: Command Complete (0x0e) plen 4 LE Set Extended Scan Parameters (0x08\|0x0041) ncmd 1 Status: Success (0x00) < HCI Command: LE Set Extended Sca.. (0x08\|0x0042) plen 6 ... > HCI Event: Command Complete (0x0e) plen 4 LE Set Extended Scan Enable (0x08\|0x0042) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 29 LE Extended Advertising Report (0x0d) ... > HCI Event: LE Meta Event (0x3e) plen 16 LE Periodic Advertising Sync Established (0x0e) ... < HCI Command: LE Broadcast Isoch.. (0x08\|0x006b) plen 25 ... > HCI Event: Command Status (0x0f) plen 4 LE Broadcast Isochronous Group Create Sync (0x08\|0x006b) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 17 LE Broadcast Isochronous Group Sync Estabilished (0x1d) ... Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-22 17:13:56 -07:00
Luiz Augusto von Dentz	ccf74f2390	Bluetooth: Add BTPROTO_ISO socket type This introduces a new socket type BTPROTO_ISO which can be enabled with use of ISO Socket experiemental UUID, it can used to initiate/accept connections and transfer packets between userspace and kernel similarly to how BTPROTO_SCO works: Central -> uses connect with address set to destination bdaddr: > tools/isotest -s 00:AA:01:00:00:00 Peripheral -> uses listen: > tools/isotest -d Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-22 17:13:39 -07:00
Luiz Augusto von Dentz	26afbd826e	Bluetooth: Add initial implementation of CIS connections This adds the initial implementation of CIS connections and introduces the ISO packets/links. == Central: Set CIG Parameters, create a CIS and Setup Data Path == > tools/isotest -s <address> < HCI Command: LE Extended Create... (0x08\|0x0043) plen 26 ... > HCI Event: Command Status (0x0f) plen 4 LE Extended Create Connection (0x08\|0x0043) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 31 LE Enhanced Connection Complete (0x0a) ... < HCI Command: LE Create Connected... (0x08\|0x0064) plen 5 ... > HCI Event: Command Status (0x0f) plen 4 LE Create Connected Isochronous Stream (0x08\|0x0064) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 29 LE Connected Isochronous Stream Established (0x19) ... < HCI Command: LE Setup Isochronou.. (0x08\|0x006e) plen 13 ... > HCI Event: Command Complete (0x0e) plen 6 LE Setup Isochronous Data Path (0x08\|0x006e) ncmd 1 Status: Success (0x00) Handle: 257 < HCI Command: LE Setup Isochronou.. (0x08\|0x006e) plen 13 ... > HCI Event: Command Complete (0x0e) plen 6 LE Setup Isochronous Data Path (0x08\|0x006e) ncmd 1 Status: Success (0x00) Handle: 257 == Peripheral: Accept CIS and Setup Data Path == > tools/isotest -d HCI Event: LE Meta Event (0x3e) plen 7 LE Connected Isochronous Stream Request (0x1a) ... < HCI Command: LE Accept Co.. (0x08\|0x0066) plen 2 ... > HCI Event: LE Meta Event (0x3e) plen 29 LE Connected Isochronous Stream Established (0x19) ... < HCI Command: LE Setup Is.. (0x08\|0x006e) plen 13 ... > HCI Event: Command Complete (0x0e) plen 6 LE Setup Isochronous Data Path (0x08\|0x006e) ncmd 1 Status: Success (0x00) Handle: 257 < HCI Command: LE Setup Is.. (0x08\|0x006e) plen 13 ... > HCI Event: Command Complete (0x0e) plen 6 LE Setup Isochronous Data Path (0x08\|0x006e) ncmd 1 Status: Success (0x00) Handle: 257 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-22 17:13:22 -07:00
Jakub Kicinski	b3fce974d4	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== bpf-next 2022-07-22 We've added 73 non-merge commits during the last 12 day(s) which contain a total of 88 files changed, 3458 insertions(+), 860 deletions(-). The main changes are: 1) Implement BPF trampoline for arm64 JIT, from Xu Kuohai. 2) Add ksyscall/kretsyscall section support to libbpf to simplify tracing kernel syscalls through kprobe mechanism, from Andrii Nakryiko. 3) Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel function, from Song Liu & Jiri Olsa. 4) Add new kfunc infrastructure for netfilter's CT e.g. to insert and change entries, from Kumar Kartikeya Dwivedi & Lorenzo Bianconi. 5) Add a ksym BPF iterator to allow for more flexible and efficient interactions with kernel symbols, from Alan Maguire. 6) Bug fixes in libbpf e.g. for uprobe binary path resolution, from Dan Carpenter. 7) Fix BPF subprog function names in stack traces, from Alexei Starovoitov. 8) libbpf support for writing custom perf event readers, from Jon Doron. 9) Switch to use SPDX tag for BPF helper man page, from Alejandro Colomar. 10) Fix xsk send-only sockets when in busy poll mode, from Maciej Fijalkowski. 11) Reparent BPF maps and their charging on memcg offlining, from Roman Gushchin. 12) Multiple follow-up fixes around BPF lsm cgroup infra, from Stanislav Fomichev. 13) Use bootstrap version of bpftool where possible to speed up builds, from Pu Lehui. 14) Cleanup BPF verifier's check_func_arg() handling, from Joanne Koong. 15) Make non-prealloced BPF map allocations low priority to play better with memcg limits, from Yafang Shao. 16) Fix BPF test runner to reject zero-length data for skbs, from Zhengchao Shao. 17) Various smaller cleanups and improvements all over the place. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (73 commits) bpf: Simplify bpf_prog_pack_[size\|mask] bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch) bpf, x64: Allow to use caller address from stack ftrace: Allow IPMODIFY and DIRECT ops on the same function ftrace: Add modify_ftrace_direct_multi_nolock bpf/selftests: Fix couldn't retrieve pinned program in xdp veth test bpf: Fix build error in case of !CONFIG_DEBUG_INFO_BTF selftests/bpf: Fix test_verifier failed test in unprivileged mode selftests/bpf: Add negative tests for new nf_conntrack kfuncs selftests/bpf: Add tests for new nf_conntrack kfuncs selftests/bpf: Add verifier tests for trusted kfunc args net: netfilter: Add kfuncs to set and change CT status net: netfilter: Add kfuncs to set and change CT timeout net: netfilter: Add kfuncs to allocate and insert CT net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup bpf: Add documentation for kfuncs bpf: Add support for forcing kfunc args to be trusted bpf: Switch to new kfunc flags infrastructure tools/resolve_btfids: Add support for 8-byte BTF sets bpf: Introduce 8-byte BTF set ... ==================== Link: https://lore.kernel.org/r/20220722221218.29943-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-22 16:55:44 -07:00
Wei Wang	4d8f24eeed	Revert "tcp: change pingpong threshold to 3" This reverts commit `4a41f453be`. This to-be-reverted commit was meant to apply a stricter rule for the stack to enter pingpong mode. However, the condition used to check for interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is jiffy based and might be too coarse, which delays the stack entering pingpong mode. We revert this patch so that we no longer use the above condition to determine interactive session, and also reduce pingpong threshold to 1. Fixes: `4a41f453be` ("tcp: change pingpong threshold to 3") Reported-by: LemmyHuang <hlm3280@163.com> Suggested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220721204404.388396-1-weiwan@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-22 15:09:10 -07:00
Luiz Augusto von Dentz	dfe6d5c3ec	Bluetooth: hci_core: Introduce hci_recv_event_data This introduces hci_recv_event_data to make it simpler to access the contents of last received event rather than having to pass its contents to the likes of _ind/_cfm callbacks. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-22 13:20:52 -07:00
Brian Gix	bb87672562	Bluetooth: Remove update_scan hci_request dependancy This removes the remaining calls to HCI_OP_WRITE_SCAN_ENABLE from hci_request call chains, and converts them to hci_sync calls. Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-22 12:55:40 -07:00
Brian Gix	ec2904c259	Bluetooth: Remove dead code from hci_request.c The discov_update work queue is no longer used as a result of the hci_sync rework. The __hci_req_hci_power_on() function is no longer referenced in the code as a result of the hci_sync rework. Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-22 12:55:21 -07:00
Gregory Greenman	9f781533bb	wifi: mac80211: add macros to loop over active links Add a preliminary version which will be updated later to loop over vif's and sta's active links. Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-22 14:28:47 +02:00
Johannes Berg	963d0e8d08	wifi: mac80211: optionally implement MLO multicast TX For drivers using software encryption for multicast TX, such as mac80211_hwsim, mac80211 needs to duplicate the multicast frames on each link, if MLO is enabled. Do this, but don't just make it dependent on the key but provide a separate flag for drivers to opt out of this. This is not very efficient, I expect that drivers will do it in firmware/hardware or at least with DMA engine assistence, so this is mostly for hwsim. To make this work, also implement the SNS11 sequence number space that an AP MLD shall have, and modify the API to the __ieee80211_subif_start_xmit() function to always require the link ID bits to be set. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-22 14:28:36 +02:00
Johannes Berg	e1e68b14c5	wifi: mac80211: expand ieee80211_mgmt_tx() for MLO There are a couple of new things that should be possible with MLO: * selecting the link to transmit to a station by link ID, which a previous patch added to the nl80211 API * selecting the link by frequency, similarly * allowing transmittion to an MLD without specifying any channel or link ID, with MLD addresses Enable these use cases. Also fix the address comparison in client mode to use the AP (MLD) address. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-22 14:28:35 +02:00
Johannes Berg	95f498bb49	wifi: nl80211: add MLO link ID to the NL80211_CMD_FRAME TX API Allow optionally specifying the link ID to transmit on, which can be done instead of the link frequency, on an MLD addressed frame. Both can also be omitted in which case the frame must be MLD addressed and link selection (and address translation) will be done on lower layers. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-22 14:28:33 +02:00
Johannes Berg	6074c9e574	wifi: cfg80211: report link ID in NL80211_CMD_FRAME If given by the underlying driver, report the link ID for MLO in NL80211_CMD_FRAME. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-22 14:28:30 +02:00
Avraham Stern	f9202638df	wifi: mac80211: add hardware timestamps for RX and TX When the low level driver reports hardware timestamps for frame TX status or frame RX, pass the timestamps to cfg80211. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-22 14:28:29 +02:00
Avraham Stern	1ff715ffa0	wifi: cfg80211: add hardware timestamps to frame RX info Add hardware timestamps to management frame RX info. This shall be used by drivers that support hardware timestamping for Timing measurement and Fine timing measurement action frames RX. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-22 14:28:27 +02:00
Avraham Stern	00b3d84010	wifi: cfg80211/nl80211: move rx management data into a struct The functions for reporting rx management take many arguments. Collect all the arguments into a struct, which also make it easier to add more arguments if needed. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-22 14:28:26 +02:00
Avraham Stern	ea7d50c925	wifi: cfg80211: add a function for reporting TX status with hardware timestamps Add a function for reporting TX status with hardware timestamps. This function shall be used for reporting the TX status of Timing measurement and Fine timing measurement action frames by devices that support reporting hardware timestamps. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-22 14:28:24 +02:00
Jakub Kicinski	949d6b405e	net: add missing includes and forward declarations under net/ This patch adds missing includes to headers under include/net. All these problems are currently masked by the existing users including the missing dependency before the broken header. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-22 12:53:22 +01:00
Kuniyuki Iwashima	36eeee75ef	tcp: Fix a data-race around sysctl_tcp_adv_win_scale. While reading sysctl_tcp_adv_win_scale, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-22 12:06:17 +01:00
Lorenzo Bianconi	ef69aa3a98	net: netfilter: Add kfuncs to set and change CT status Introduce bpf_ct_set_status and bpf_ct_change_status kfunc helpers in order to set nf_conn field of allocated entry or update nf_conn status field of existing inserted entry. Use nf_ct_change_status_common to share the permitted status field changes between netlink and BPF side by refactoring ctnetlink_change_status. It is required to introduce two kfuncs taking nf_conn___init and nf_conn instead of sharing one because KF_TRUSTED_ARGS flag causes strict type checking. This would disallow passing nf_conn___init to kfunc taking nf_conn, and vice versa. We cannot remove the KF_TRUSTED_ARGS flag as we only want to accept refcounted pointers and not e.g. ct->master. Hence, bpf_ct_set_* kfuncs are meant to be used on allocated CT, and bpf_ct_change_* kfuncs are meant to be used on inserted or looked up CT entry. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-10-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-07-21 21:03:16 -07:00
Kumar Kartikeya Dwivedi	0b38923644	net: netfilter: Add kfuncs to set and change CT timeout Introduce bpf_ct_set_timeout and bpf_ct_change_timeout kfunc helpers in order to change nf_conn timeout. This is same as ctnetlink_change_timeout, hence code is shared between both by extracting it out to __nf_ct_change_timeout. It is also updated to return an error when it sees IPS_FIXED_TIMEOUT_BIT bit in ct->status, as that check was missing. It is required to introduce two kfuncs taking nf_conn___init and nf_conn instead of sharing one because KF_TRUSTED_ARGS flag causes strict type checking. This would disallow passing nf_conn___init to kfunc taking nf_conn, and vice versa. We cannot remove the KF_TRUSTED_ARGS flag as we only want to accept refcounted pointers and not e.g. ct->master. Apart from this, bpf_ct_set_timeout is only called for newly allocated CT so it doesn't need to inspect the status field just yet. Sharing the helpers even if it was possible would make timeout setting helper sensitive to order of setting status and timeout after allocation. Hence, bpf_ct_set_* kfuncs are meant to be used on allocated CT, and bpf_ct_change_* kfuncs are meant to be used on inserted or looked up CT entry. Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-9-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-07-21 21:03:16 -07:00
Lorenzo Bianconi	d7e79c97c0	net: netfilter: Add kfuncs to allocate and insert CT Introduce bpf_xdp_ct_alloc, bpf_skb_ct_alloc and bpf_ct_insert_entry kfuncs in order to insert a new entry from XDP and TC programs. Introduce bpf_nf_ct_tuple_parse utility routine to consolidate common code. We extract out a helper __nf_ct_set_timeout, used by the ctnetlink and nf_conntrack_bpf code, extract it out to nf_conntrack_core, so that nf_conntrack_bpf doesn't need a dependency on CONFIG_NF_CT_NETLINK. Later this helper will be reused as a helper to set timeout of allocated but not yet inserted CT entry. The allocation functions return struct nf_conn___init instead of nf_conn, to distinguish allocated CT from an already inserted or looked up CT. This is later used to enforce restrictions on what kfuncs allocated CT can be used with. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-8-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-07-21 21:03:16 -07:00
Luiz Augusto von Dentz	1f7435c8f6	Bluetooth: mgmt: Fix using hci_conn_abort This fixes using hci_conn_abort instead of using hci_conn_abort_sync. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:16:10 -07:00
Luiz Augusto von Dentz	ca2045e059	Bluetooth: Add bt_status This adds bt_status which can be used to convert Unix errno to Bluetooth status. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:15:31 -07:00
Manish Mandlik	7cf5c2978f	Bluetooth: hci_sync: Refactor remove Adv Monitor Make use of hci_cmd_sync_queue for removing an advertisement monitor. Signed-off-by: Manish Mandlik <mmandlik@google.com> Reviewed-by: Miao-chen Chou <mcchou@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:14:55 -07:00
Manish Mandlik	b747a83690	Bluetooth: hci_sync: Refactor add Adv Monitor Make use of hci_cmd_sync_queue for adding an advertisement monitor. Signed-off-by: Manish Mandlik <mmandlik@google.com> Reviewed-by: Miao-chen Chou <mcchou@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:14:32 -07:00
Zijun Hu	63b1a7dd38	Bluetooth: hci_sync: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING Core driver addtionally checks LMP feature bit "Erroneous Data Reporting" instead of quirk HCI_QUIRK_BROKEN_ERR_DATA_REPORTING to decide if HCI commands HCI_Read\|Write_Default_Erroneous_Data_Reporting are broken, so remove this unnecessary quirk. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Tested-by: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:14:10 -07:00
Zijun Hu	766ae2422b	Bluetooth: hci_sync: Check LMP feature bit instead of quirk BT core driver should addtionally check LMP feature bit "Erroneous Data Reporting" instead of quirk HCI_QUIRK_BROKEN_ERR_DATA_REPORTING set by BT device driver to decide if HCI commands HCI_Read\|Write_Default_Erroneous_Data_Reporting are broken. BLUETOOTH CORE SPECIFICATION Version 5.3 \| Vol 2, Part C \| page 587 This feature indicates whether the device is able to support the Packet_Status_Flag and the HCI commands HCI_Write_Default_- Erroneous_Data_Reporting and HCI_Read_Default_Erroneous_- Data_Reporting. the quirk was introduced by 'commit `cde1a8a992` ("Bluetooth: btusb: Fix and detect most of the Chinese Bluetooth controllers")' to mark HCI commands HCI_Read\|Write_Default_Erroneous_Data_Reporting broken by BT device driver, but the reason why these two HCI commands are broken is that feature "Erroneous Data Reporting" is not enabled by firmware, this scenario is illustrated by below log of QCA controllers with USB I/F: @ RAW Open: hcitool (privileged) version 2.22 < HCI Command: Read Local Supported Commands (0x04\|0x0002) plen 0 > HCI Event: Command Complete (0x0e) plen 68 Read Local Supported Commands (0x04\|0x0002) ncmd 1 Status: Success (0x00) Commands: 288 entries ...... Read Default Erroneous Data Reporting (Octet 18 - Bit 2) Write Default Erroneous Data Reporting (Octet 18 - Bit 3) ...... < HCI Command: Read Default Erroneous Data Reporting (0x03\|0x005a) plen 0 > HCI Event: Command Complete (0x0e) plen 4 Read Default Erroneous Data Reporting (0x03\|0x005a) ncmd 1 Status: Unknown HCI Command (0x01) < HCI Command: Read Local Supported Features (0x04\|0x0003) plen 0 > HCI Event: Command Complete (0x0e) plen 12 Read Local Supported Features (0x04\|0x0003) ncmd 1 Status: Success (0x00) Features: 0xff 0xfe 0x0f 0xfe 0xd8 0x3f 0x5b 0x87 3 slot packets ...... Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Tested-by: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:13:18 -07:00
Dan Carpenter	6f43f6169a	Bluetooth: clean up error pointer checking The bt_skb_sendmsg() function can't return NULL so there is no need to check for that. Several of these checks were removed previously but this one was missed. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:11:10 -07:00
Luiz Augusto von Dentz	34a718bc86	Bluetooth: HCI: Fix not always setting Scan Response/Advertising Data The scan response and advertising data needs to be tracked on a per instance (adv_info) since when these instaces are removed so are their data, to fix that new flags are introduced which is used to mark when the data changes and then checked to confirm when the data needs to be synced with the controller. Tested-by: Tedd Ho-Jeong An <tedd.an@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:07:30 -07:00
Abhishek Pandit-Subedi	359ee4f834	Bluetooth: Unregister suspend with userchannel When HCI_USERCHANNEL is used, unregister the suspend notifier when binding and register when releasing. The userchannel socket should be left alone after open is completed. Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2022-07-21 17:05:58 -07:00
Schspa Shi	877afadad2	Bluetooth: When HCI work queue is drained, only queue chained work The HCI command, event, and data packet processing workqueue is drained to avoid deadlock in commit `76727c02c1` ("Bluetooth: Call drain_workqueue() before resetting state"). There is another delayed work, which will queue command to this drained workqueue. Which results in the following error report: Bluetooth: hci2: command 0x040f tx timeout WARNING: CPU: 1 PID: 18374 at kernel/workqueue.c:1438 __queue_work+0xdad/0x1140 Workqueue: events hci_cmd_timeout RIP: 0010:__queue_work+0xdad/0x1140 RSP: 0000:ffffc90002cffc60 EFLAGS: 00010093 RAX: 0000000000000000 RBX: ffff8880b9d3ec00 RCX: 0000000000000000 RDX: ffff888024ba0000 RSI: ffffffff814e048d RDI: ffff8880b9d3ec08 RBP: 0000000000000008 R08: 0000000000000000 R09: 00000000b9d39700 R10: ffffffff814f73c6 R11: 0000000000000000 R12: ffff88807cce4c60 R13: 0000000000000000 R14: ffff8880796d8800 R15: ffff8880796d8800 FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000c0174b4000 CR3: 000000007cae9000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? queue_work_on+0xcb/0x110 ? lockdep_hardirqs_off+0x90/0xd0 queue_work_on+0xee/0x110 process_one_work+0x996/0x1610 ? pwq_dec_nr_in_flight+0x2a0/0x2a0 ? rwlock_bug.part.0+0x90/0x90 ? _raw_spin_lock_irq+0x41/0x50 worker_thread+0x665/0x1080 ? process_one_work+0x1610/0x1610 kthread+0x2e9/0x3a0 ? kthread_complete_and_exit+0x40/0x40 ret_from_fork+0x1f/0x30 </TASK> To fix this, we can add a new HCI_DRAIN_WQ flag, and don't queue the timeout workqueue while command workqueue is draining. Fixes: `76727c02c1` ("Bluetooth: Call drain_workqueue() before resetting state") Reported-by: syzbot+63bed493aebbf6872647@syzkaller.appspotmail.com Signed-off-by: Schspa Shi <schspa@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2022-07-21 17:05:22 -07:00
Jakub Kicinski	6e0e846ee2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-21 13:03:39 -07:00
Jakub Kicinski	602ae008ab	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next: 1) Simplify nf_ct_get_tuple(), from Jackie Liu. 2) Add format to request_module() call, from Bill Wendling. 3) Add /proc/net/stats/nf_flowtable to monitor in-flight pending hardware offload objects to be processed, from Vlad Buslov. 4) Missing rcu annotation and accessors in the netfilter tree, from Florian Westphal. 5) Merge h323 conntrack helper nat hooks into single object, also from Florian. 6) A batch of update to fix sparse warnings treewide, from Florian Westphal. 7) Move nft_cmp_fast_mask() where it used, from Florian. 8) Missing const in nf_nat_initialized(), from James Yonan. 9) Use bitmap API for Maglev IPVS scheduler, from Christophe Jaillet. 10) Use refcount_inc instead of _inc_not_zero in flowtable, from Florian Westphal. 11) Remove pr_debug in xt_TPROXY, from Nathan Cancellor. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: xt_TPROXY: remove pr_debug invocations netfilter: flowtable: prefer refcount_inc netfilter: ipvs: Use the bitmap API to allocate bitmaps netfilter: nf_nat: in nf_nat_initialized(), use const struct nf_conn * netfilter: nf_tables: move nft_cmp_fast_mask to where its used netfilter: nf_tables: use correct integer types netfilter: nf_tables: add and use BE register load-store helpers netfilter: nf_tables: use the correct get/put helpers netfilter: x_tables: use correct integer types netfilter: nfnetlink: add missing __be16 cast netfilter: nft_set_bitmap: Fix spelling mistake netfilter: h323: merge nat hook pointers into one netfilter: nf_conntrack: use rcu accessors where needed netfilter: nf_conntrack: add missing __rcu annotations netfilter: nf_flow_table: count pending offload workqueue tasks net/sched: act_ct: set 'net' pointer when creating new nf_flow_table netfilter: conntrack: use correct format characters netfilter: conntrack: use fallthrough to cleanup ==================== Link: https://lore.kernel.org/r/20220720230754.209053-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-20 18:05:51 -07:00
Kuniyuki Iwashima	4845b5713a	tcp: Fix data-races around sysctl_tcp_slow_start_after_idle. While reading sysctl_tcp_slow_start_after_idle, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `35089bb203` ("[TCP]: Add tcp_slow_start_after_idle sysctl.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:14:50 +01:00
Kuniyuki Iwashima	3d72bb4188	udp: Fix a data-race around sysctl_udp_l3mdev_accept. While reading sysctl_udp_l3mdev_accept, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `63a6fff353` ("net: Avoid receiving packets with an l3mdev on unbound UDP sockets") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:14:49 +01:00
Kuniyuki Iwashima	9b55c20f83	ip: Fix data-races around sysctl_ip_prot_sock. sysctl_ip_prot_sock is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. Fixes: `4548b683b7` ("Introduce a sysctl that modifies the value of PROT_SOCK.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:14:49 +01:00
Davide Caratti	ca0cab1192	net/sched: remove qdisc_root_lock() helper the last caller has been removed with commit `96f5e66e8a` ("mac80211: fix aggregation for hardware with ampdu queues"), so it's safe to remove this function. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/703d549e3088367651d92a059743f1be848d74b7.1658133689.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-19 17:14:55 -07:00
Taehee Yoo	30e22a6ebc	amt: use workqueue for gateway side message handling There are some synchronization issues(amt->status, amt->req_cnt, etc) if the interface is in gateway mode because gateway message handlers are processed concurrently. This applies a work queue for processing these messages instead of expanding the locking context. So, the purposes of this patch are to fix exist race conditions and to make gateway to be able to validate a gateway status more correctly. When the AMT gateway interface is created, it tries to establish to relay. The establishment step looks stateless, but it should be managed well. In order to handle messages in the gateway, it saves the current status(i.e. AMT_STATUS_XXX). This patch makes gateway code to be worked with a single thread. Now, all messages except the multicast are triggered(received or delay expired), and these messages will be stored in the event queue(amt->events). Then, the single worker processes stored messages asynchronously one by one. The multicast data message type will be still processed immediately. Now, amt->lock is only needed to access the event queue(amt->events) if an interface is the gateway mode. Fixes: `cbc21dc1cf` ("amt: add data plane of amt interface") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-07-19 12:37:02 +02:00
Jiri Pirko	f655dacb59	net: devlink: remove unused locked functions Remove locked versions of functions that are no longer used by anyone. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 20:10:48 -07:00
Jiri Pirko	012ec02ae4	netdevsim: convert driver to use unlocked devlink API during init/fini Prepare for devlink reload being called with devlink->lock held and convert the netdevsim driver to use unlocked devlink API during init and fini flows. Take devl_lock() in reload_down() and reload_up() ops in the meantime before reload cmd is converted to take the lock itself. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 20:10:48 -07:00
Jiri Pirko	eb0e9fa2c6	net: devlink: add unlocked variants of devlink_region_create/destroy() functions Add unlocked variants of devlink_region_create/destroy() functions to be used in drivers called-in with devlink->lock held. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 20:10:48 -07:00
Jiri Pirko	70a2ff8936	net: devlink: add unlocked variants of devlink_dpipe() functions Add unlocked variants of devlink_dpipe() functions to be used in drivers called-in with devlink->lock held. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 20:10:47 -07:00
Jiri Pirko	755cfa69c4	net: devlink: add unlocked variants of devlink_sb() functions Add unlocked variants of devlink_sb() functions to be used in drivers called-in with devlink->lock held. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 20:10:47 -07:00
Jiri Pirko	c223d6a4bf	net: devlink: add unlocked variants of devlink_resource() functions Add unlocked variants of devlink_resource() functions to be used in drivers called-in with devlink->lock held. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 20:10:46 -07:00
Jiri Pirko	852e85a704	net: devlink: add unlocked variants of devling_trap() functions Add unlocked variants of devl_trap() functions to be used in drivers called-in with devlink->lock held. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 20:10:46 -07:00
Kuniyuki Iwashima	55be873695	tcp: Fix a data-race around sysctl_tcp_notsent_lowat. While reading sysctl_tcp_notsent_lowat, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `c9bee3b7fd` ("tcp: TCP_NOTSENT_LOWAT socket option") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima	39e24435a7	tcp: Fix data-races around some timeout sysctl knobs. While reading these sysctl knobs, they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers. - tcp_retries1 - tcp_retries2 - tcp_orphan_retries - tcp_fin_timeout Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima	f2f316e287	tcp: Fix data-races around keepalive sysctl knobs. While reading sysctl_tcp_keepalive_(time\|probes\|intvl), they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 12:21:54 +01:00
Jakub Kicinski	c618db2afe	tls: rx: async: hold onto the input skb Async crypto currently benefits from the fact that we decrypt in place. When we allow input and output to be different skbs we will have to hang onto the input while we move to the next record. Clone the inputs and keep them on a list. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 11:24:11 +01:00
Jakub Kicinski	53d57999fe	tls: rx: remove the message decrypted tracking We no longer allow a decrypted skb to remain linked to ctx->recv_pkt. Anything on the list is decrypted, anything on ctx->recv_pkt needs to be decrypted. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 11:24:10 +01:00
Jakub Kicinski	4cbc325ed6	tls: rx: allow only one reader at a time recvmsg() in TLS gets data from the skb list (rx_list) or fresh skbs we read from TCP via strparser. The former holds skbs which were already decrypted for peek or decrypted and partially consumed. tls_wait_data() only notices appearance of fresh skbs coming out of TCP (or psock). It is possible, if there is a concurrent call to peek() and recv() that the peek() will move the data from input to rx_list without recv() noticing. recv() will then read data out of order or never wake up. This is not a practical use case/concern, but it makes the self tests less reliable. This patch solves the problem by allowing only one reader in. Because having multiple processes calling read()/peek() is not normal avoid adding a lock and try to fast-path the single reader case. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 11:24:10 +01:00
Wen Gu	4bc5008e43	net/smc: Introduce a sysctl for setting SMC-R buffer type This patch introduces the sysctl smcr_buf_type for setting the type of SMC-R sndbufs and RMBs. Valid values includes: - SMCR_PHYS_CONT_BUFS, which means use physically contiguous buffers for better performance and is the default value. - SMCR_VIRT_CONT_BUFS, which means use virtually contiguous buffers in case of physically contiguous memory is scarce. - SMCR_MIXED_BUFS, which means first try to use physically contiguous buffers. If not available, then use virtually contiguous buffers. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 11:19:17 +01:00
Kuniyuki Iwashima	11052589cf	tcp/udp: Make early_demux back namespacified. Commit `e21145a987` ("ipv4: namespacify ip_early_demux sysctl knob") made it possible to enable/disable early_demux on a per-netns basis. Then, we introduced two knobs, tcp_early_demux and udp_early_demux, to switch it for TCP/UDP in commit `dddb64bcb3` ("net: Add sysctl to toggle early demux for tcp and udp"). However, the .proc_handler() was wrong and actually disabled us from changing the behaviour in each netns. We can execute early_demux if net.ipv4.ip_early_demux is on and each proto .early_demux() handler is not NULL. When we toggle (tcp\|udp)_early_demux, the change itself is saved in each netns variable, but the .early_demux() handler is a global variable, so the handler is switched based on the init_net's sysctl variable. Thus, netns (tcp\|udp)_early_demux knobs have nothing to do with the logic. Whether we CAN execute proto .early_demux() is always decided by init_net's sysctl knob, and whether we DO it or not is by each netns ip_early_demux knob. This patch namespacifies (tcp\|udp)_early_demux again. For now, the users of the .early_demux() handler are TCP and UDP only, and they are called directly to avoid retpoline. So, we can remove the .early_demux() handler from inet6?_protos and need not dereference them in ip6?_rcv_finish_core(). If another proto needs .early_demux(), we can restore it at that time. Fixes: `dddb64bcb3` ("net: Add sysctl to toggle early demux for tcp and udp") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20220713175207.7727-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-15 18:50:35 -07:00
Kuniyuki Iwashima	08a75f1067	tcp: Fix data-races around sysctl_tcp_l3mdev_accept. While reading sysctl_tcp_l3mdev_accept, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `6dd9a14e92` ("net: Allow accepted sockets to be bound to l3mdev domain") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima	1a0008f9df	tcp/dccp: Fix a data-race around sysctl_tcp_fwmark_accept. While reading sysctl_tcp_fwmark_accept, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `84f39b08d7` ("net: support marking accepting TCP sockets") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima	85d0b4dbd7	ip: Fix a data-race around sysctl_fwmark_reflect. While reading sysctl_fwmark_reflect, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `e110861f86` ("net: add a sysctl to reflect the fwmark on replies") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima	289d3b21fb	ip: Fix data-races around sysctl_ip_nonlocal_bind. While reading sysctl_ip_nonlocal_bind, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima	60c158dc7b	ip: Fix data-races around sysctl_ip_fwd_use_pmtu. While reading sysctl_ip_fwd_use_pmtu, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `f87c10a8aa` ("ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima	8281b7ec5c	ip: Fix data-races around sysctl_ip_default_ttl. While reading sysctl_ip_default_ttl, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-15 11:49:55 +01:00
Andrei Otcheretianski	3e0278b717	wifi: mac80211: select link when transmitting to non-MLO stations When an MLO AP is transmitting to a non-MLO station, addr2 should be set to a link address. This should be done before the frame is encrypted as otherwise aad verification would fail. In case of software encryption this can't be left for the device to handle, and should be done by mac80211 when building the frame hdr. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:23 +02:00
Johannes Berg	7464f66515	wifi: cfg80211: add cfg80211_get_iftype_ext_capa() Add a helper function cfg80211_get_iftype_ext_capa() to look up interface type-specific (extended) capabilities. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:23 +02:00
Gregory Greenman	7840bd468a	wifi: mac80211: remove link_id parameter from link_info_changed() Since struct ieee80211_bss_conf already contains link_id, passing link_id is not necessary. Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:20 +02:00
Gregory Greenman	727eff4dd1	wifi: mac80211: replace link_id with link_conf in switch/(un)assign_vif_chanctx() Since mac80211 already has a protected pointer to link_conf, pass it to the driver to avoid additional RCU locking. Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:20 +02:00
Andrei Otcheretianski	67207bab93	wifi: cfg80211/mac80211: Support control port TX from specific link In case of authentication with a legacy station, link addressed EAPOL frames should be sent. Support it. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:19 +02:00
Johannes Berg	4e9c3af398	wifi: nl80211: add EML/MLD capabilities to per-iftype capabilities We have the per-interface type capabilities, currently for extended capabilities, add the EML/MLD capabilities there to have this advertised by the driver. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:19 +02:00
Johannes Berg	19654a61bf	wifi: cfg80211: add ieee80211_chanwidth_rate_flags() To simplify things when we don't have a full chandef, add ieee80211_chanwidth_rate_flags(). Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:18 +02:00
Gregory Greenman	b327c84c32	wifi: mac80211: replace link_id with link_conf in start/stop_ap() When calling start/stop_ap(), mac80211 already has a protected link_conf pointer. Pass it to the driver, so it shouldn't handle RCU protection. Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:17 +02:00
Johannes Berg	5cd212cb64	wifi: cfg80211: extend cfg80211_rx_assoc_resp() for MLO Extend the cfg80211_rx_assoc_resp() to cover multiple BSSes, the AP MLD address and local link addresses for MLO. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:17 +02:00
Johannes Berg	cd47c0f57a	wifi: cfg80211: put cfg80211_rx_assoc_resp() arguments into a struct For MLO we'll need a lot more arguments, including all the BSS pointers and link addresses, so move the data to a struct to be able to extend it more easily later. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:17 +02:00
Johannes Berg	e69dac88a1	wifi: cfg80211: adjust assoc comeback for MLO We only report the BSSID to userspace, so change the argument from BSS struct pointer to AP address, which we'll use to carry either the BSSID or AP MLD address. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:17 +02:00
Johannes Berg	f662d2f4e2	wifi: cfg80211: prepare association failure APIs for MLO For MLO, we need the ability to report back multiple BSS structures to release, as well as the AP MLD address (if attempting to make an MLO connection). Unify cfg80211_assoc_timeout() and cfg80211_abandon_assoc() into a new cfg80211_assoc_failure() that gets a structure parameter with the necessary data. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:16 +02:00
Johannes Berg	8f6e0dfc22	wifi: cfg80211: remove BSS pointer from cfg80211_disassoc_request The race described by the comment in mac80211 hasn't existed since the locking rework to use the same lock and for MLO we need to pass the AP MLD address, so just pass the BSSID or AP MLD address instead of the BSS struct pointer, and adjust all the code accordingly. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:16 +02:00
Johannes Berg	b65567b03c	wifi: mac80211: mlme: track AP (MLD) address separately To prepare a bit more for MLO in the client code, track the AP's address (for now only the BSSID, but will track the AP MLD's address later) separately from the per-link BSSID. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:16 +02:00
Johannes Berg	b3e2130bf5	wifi: mac80211: change QoS settings API to take link into account Take the link into account in the QoS settings (EDCA parameters) APIs. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:15 +02:00
Johannes Berg	a3b8008dc1	wifi: mac80211: move ps setting to vif config This really shouldn't be in a per-link config, we don't want to let anyone control it that way (if anything, link powersave could be forced through APIs to activate/deactivate a link), and we don't support powersave in software with devices that can do MLO. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:15 +02:00
Johannes Berg	3fbddae46e	wifi: mac80211: provide link ID in link_conf It might be useful to drivers to be able to pass only the link_conf pointer, rather than both the pointer and the link_id; add the link_id to the link_conf to facility that. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:15 +02:00
Johannes Berg	23cc6d8c37	wifi: cfg80211: make cfg80211_auth_request::key_idx signed We might assign -1 to it in some cases when key is NULL, which means the key_idx isn't used but can lead to a warning from static checkers such as smatch. Make the struct member signed simply to avoid that, we only need a range of -1..3 anyway. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:14 +02:00
Johannes Berg	d8675a6351	wifi: mac80211: RCU-ify link/link_conf pointers Since links can be added and removed dynamically, we need to somehow protect the sdata->link[] and vif->link_conf[] array pointers from disappearing when accessing them without locks. RCU-ify the pointers to achieve this, which requires quite a bit of rework. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:14 +02:00
Shaul Triebitz	b95eb7f0ee	wifi: cfg80211/mac80211: separate link params from station params Put the link_station_parameters structure in the station_parameters structure (and remove the station_parameters fields already existing in link_station_parameters). Now, for an MLD station, the default link is added together with the station. Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:13 +02:00
Shaul Triebitz	577e5b8c39	wifi: cfg80211: add API to add/modify/remove a link station Add an API for adding/modifying/removing a link of a station. Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-15 11:43:13 +02:00
Jiri Pirko	9a7923668b	net: devlink: make devlink_dpipe_headers_register() return void The return value is not used, so change the return value type to void. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-14 21:58:46 -07:00
Jakub Kicinski	816cd16883	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net include/net/sock.h `310731e2f1` ("net: Fix data-races around sysctl_mem.") `e70f3c7012` ("Revert "net: set SK_MEM_QUANTUM to 4096"") https://lore.kernel.org/all/20220711120211.7c8b7cba@canb.auug.org.au/ net/ipv4/fib_semantics.c `747c143072` ("ip: fix dflt addr selection for connected nexthop") `d62607c3fe` ("net: rename reference+tracking helpers") net/tls/tls.h include/net/tls.h `3d8c51b25a` ("net/tls: Check for errors in tls_device_init") `5879031423` ("tls: create an internal header") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-14 15:27:35 -07:00
Maciej Fijalkowski	ca2e1a6270	xsk: Mark napi_id on sendmsg() When application runs in busy poll mode and does not receive a single packet but only sends them, it is currently impossible to get into napi_busy_loop() as napi_id is only marked on Rx side in xsk_rcv_check(). In there, napi_id is being taken from xdp_rxq_info carried by xdp_buff. From Tx perspective, we do not have access to it. What we have handy is the xsk pool. Xsk pool works on a pool of internal xdp_buff wrappers called xdp_buff_xsk. AF_XDP ZC enabled drivers call xp_set_rxq_info() so each of xdp_buff_xsk has a valid pointer to xdp_rxq_info of underlying queue. Therefore, on Tx side, napi_id can be pulled from xs->pool->heads[0].xdp.rxq->napi_id. Hide this pointer chase under helper function, xsk_pool_get_napi_id(). Do this only for sockets working in ZC mode as otherwise rxq pointers would not be initialized. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220707130842.49408-1-maciej.fijalkowski@intel.com	2022-07-14 22:45:34 +02:00
Tariq Toukan	3d8c51b25a	net/tls: Check for errors in tls_device_init Add missing error checks in tls_device_init. Fixes: `e8f6979981` ("net/tls: Add generic NIC offload infrastructure") Reported-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220714070754.1428-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-14 10:12:39 -07:00
James Yonan	9d2f00fb0a	netfilter: nf_nat: in nf_nat_initialized(), use const struct nf_conn * nf_nat_initialized() doesn't modify passed struct nf_conn, so declare as const. This is helpful for code readability and makes it possible to call nf_nat_initialized() with a const struct nf_conn *. Signed-off-by: James Yonan <james@openvpn.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-14 00:24:06 +02:00
Zhengchao Shao	bc5c8260f4	net/sched: remove return value of unregister_tcf_proto_ops Return value of unregister_tcf_proto_ops is unused, remove it. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 14:46:59 +01:00
David S. Miller	736002fb6a	A fairly large set of updates for next, highlights: ath10k * ethernet frame format support rtw89 * TDLS support cfg80211/mac80211 * airtime fairness fixes * EHT support continued, especially in AP mode * initial (and still major) rework for multi-link operation (MLO) from 802.11be/wifi 7 As usual, also many small updates/cleanups/fixes/etc. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmLOca8ACgkQB8qZga/f l8S2sQ//VyUyfPxKTnos4xLm9cZFYbP4/JAl+e1QwbYpa8TtQFMjyiDq+/mTiowA gS5qdiAllS75MyxH5LuVJ1fSWe7DmSQ1A733gO4cQUxPUtaUrtXWZpsinYT+Vk4J a20kOic/9KCD6j1JFLEFToaDBHxO6Rbqo1knnTuOpMXIV6H/ou0PNlj6Ys66oFLV V5SvsoeIfCXsN3j/8JyGgjIC52LiNLam3VfdalParurY8yAxda0ub9IKvYqL/s3M PZyuHUc0kJsL/2094sjmn6SKZobjTzrOQcLgq4nPXgspp+8YQ+CUf97QS8nH5rBV AOlv7+WOiC9Ext/rBzxwZvjCmJUZSVn44mDMjafzIfTYDn0sB9m4CpqfQpgK5zvC mf+jhvI99VuK3S4Zx/xRhNFZMAZZG65zkJKEACclBL2Bcs9A+z12CPIWvalEb3/k Hk38VlUIMWPQlbcJW7oVTNH8HNpKIuOCecxKWZC+8MDDb/ZhIYhFqFNMb5TnbOBI GMXIDBlfYZgvBKHgwcj9G24QGgm1P+yKGyDcnVH0KPismZwt0gm9R+VX2B4HyBnD neT/7wx8yxsm7ujJIF28CM+BnF9vxZKVPGUS6XhS2aarOKanAalybsm9DKLwlArZ Qlr2rwaTM+ZkHS82Yapv6At97IYvfiq+ju3b940aL3YrOmgHoqs= =smwk -----END PGP SIGNATURE----- Merge tag 'wireless-next-2022-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== A fairly large set of updates for next, highlights: ath10k * ethernet frame format support rtw89 * TDLS support cfg80211/mac80211 * airtime fairness fixes * EHT support continued, especially in AP mode * initial (and still major) rework for multi-link operation (MLO) from 802.11be/wifi 7 As usual, also many small updates/cleanups/fixes/etc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 14:28:52 +01:00
David S. Miller	67de8acdd3	A small set of fixes for * queue selection in mesh/ocb * queue handling on interface stop * hwsim virtio device vs. some other virtio changes * dt-bindings email addresses * color collision memory allocation * a const variable in rtw88 * shared SKB transmit in the ethernet format path * P2P client port authorization -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmLOcFIACgkQB8qZga/f l8TK6g//dM2kjGZhyDJUnUicUplN6m4sHLeVqqWCJiUaZepg0Zb3zwEhfEjXnYgn nWfFCqyRYN2JgESKFG2LNliAUW954ccu5mAHNoR41SXjwPxPLZblYqdirdtMsbv3 VM6Ar7WKVWqIer103lUOmiH+tSMObuUhfESbFVByutJfRAcWOolEIJdoAQEmqoKt BgU0frkZLGpX9PTzJaT5KmgOnXstrWqdTY1JzLPR93k+fN0kwsOcBtwipqYTombI gcnIMb5eY16EHQES9Rf02PIGDe9Oka2+xr9gfOAwFE5JWgh6j6TwHnXBi6UM5mby /i6owhSS9km1rwTzsqJnpC89zZ1E26e5W7i6tDdQ+70OorSgPjMOGiyPNP+1KX0x P9CfFGV6c2CICCfylva7lQXoBkAUn9uQsimGBOzYY3eWt5gYZKrwNistLKlrZQca qRMRCXApfPvcyPvkX4DEuiJDgi+74nUqm0okIHLVHN4QfAuoq22DzTlTlFiF6OCJ Fj5URCCfwyuwNtaF0W6IH8PnhkD8VQjYHH0RqclQAUaS5yJxj4x///GTGPwYDCxe JcbASQfDOK1QmN4C3vOweym9J5jUdJR4fbvuj2iJhL0qQLrQZrKHoPfu8J5G4EyC rtHAVmz8eI+IQtYsppRpQbRpNtmcj773FXhQ2wNqkZ6Y7i/GtFE= =GrDi -----END PGP SIGNATURE----- Merge tag 'wireless-2022-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== A small set of fixes for * queue selection in mesh/ocb * queue handling on interface stop * hwsim virtio device vs. some other virtio changes * dt-bindings email addresses * color collision memory allocation * a const variable in rtw88 * shared SKB transmit in the ethernet format path * P2P client port authorization ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 14:27:38 +01:00
Jiri Pirko	277cbb6bc4	net: devlink: move unlocked function prototypes alongside the locked ones Maintain the same order as it is in devlink.c for function prototypes. The most of the locked variants would very likely soon be removed and the unlocked version would be the only one. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 13:49:44 +01:00
Kuniyuki Iwashima	1dace01492	raw: Fix a data-race around sysctl_raw_l3mdev_accept. While reading sysctl_raw_l3mdev_accept, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `6897445fb1` ("net: provide a sysctl raw_l3mdev_accept for raw socket lookup with VRFs") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 12:56:49 +01:00
Maksym Glubokiy	83d85bb069	net: extract port range fields from fl_flow_key So it can be used for port range filter offloading. Co-developed-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu> Signed-off-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu> Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-13 12:16:56 +01:00
Zhengchao Shao	5022e221c9	net: change the type of ip_route_input_rcu to static The type of ip_route_input_rcu should be static. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20220711073549.8947-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-07-12 15:08:45 +02:00
Moshe Shemesh	df539fc62b	devlink: Remove unused functions devlink_rate_leaf_create/destroy The previous patch removed the last usage of the functions devlink_rate_leaf_create() and devlink_rate_nodes_destroy(). Thus, remove these function from devlink API. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-07-12 10:26:22 +02:00
Moshe Shemesh	868232f5cd	devlink: Remove unused function devlink_rate_nodes_destroy The previous patch removed the last usage of the function devlink_rate_nodes_destroy(). Thus, remove this function from devlink API. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-07-12 10:26:22 +02:00
Christophe JAILLET	2b8bf3d6c9	net/fq_impl: Use the bitmap API to allocate bitmaps Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/c7bf099af07eb497b02d195906ee8c11fea3b3bd.1657377335.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-11 19:49:38 -07:00
Florian Westphal	6b77205374	netfilter: nf_tables: move nft_cmp_fast_mask to where its used ... and cast result to u32 so sparse won't complain anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-11 16:40:46 +02:00
Florian Westphal	7278b3c1e4	netfilter: nf_tables: add and use BE register load-store helpers Same as the existing ones, no conversions. This is just for sparse sake only so that we no longer mix be16/u16 and be32/u32 types. Alternative is to add __force __beX in various places, but this seems nicer. objdiff shows no changes. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-11 16:40:46 +02:00
Florian Westphal	6976890e89	netfilter: nf_conntrack: add missing __rcu annotations Access to the hook pointers use correct helpers but the pointers lack the needed __rcu annotation. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-11 16:25:15 +02:00
Vlad Buslov	b038177636	netfilter: nf_flow_table: count pending offload workqueue tasks To improve hardware offload debuggability count pending 'add', 'del' and 'stats' flow_table offload workqueue tasks. Counters are incremented before scheduling new task and decremented when workqueue handler finishes executing. These counters allow user to diagnose congestion on hardware offload workqueues that can happen when either CPU is starved and workqueue jobs are executed at lower rate than new ones are added or when hardware/driver can't keep up with the rate. Implement the described counters as percpu counters inside new struct netns_ft which is stored inside struct net. Expose them via new procfs file '/proc/net/stats/nf_flowtable' that is similar to existing 'nf_conntrack' file. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-11 16:25:14 +02:00
sewookseo	e22aa14866	net: Find dst with sk's xfrm policy not ctl_sk If we set XFRM security policy by calling setsockopt with option IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock' struct. However tcp_v6_send_response doesn't look up dst_entry with the actual socket but looks up with tcp control socket. This may cause a problem that a RST packet is sent without ESP encryption & peer's TCP socket can't receive it. This patch will make the function look up dest_entry with actual socket, if the socket has XFRM policy(sock_policy), so that the TCP response packet via this function can be encrypted, & aligned on the encrypted TCP socket. Tested: We encountered this problem when a TCP socket which is encrypted in ESP transport mode encryption, receives challenge ACK at SYN_SENT state. After receiving challenge ACK, TCP needs to send RST to establish the socket at next SYN try. But the RST was not encrypted & peer TCP socket still remains on ESTABLISHED state. So we verified this with test step as below. [Test step] 1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED). 2. Client tries a new connection on the same TCP ports(src & dst). 3. Server will return challenge ACK instead of SYN,ACK. 4. Client will send RST to server to clear the SOCKET. 5. Client will retransmit SYN to server on the same TCP ports. [Expected result] The TCP connection should be established. Cc: Maciej Żenczykowski <maze@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Sehee Lee <seheele@google.com> Signed-off-by: Sewook Seo <sewookseo@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-11 13:39:56 +01:00
David S. Miller	e45955766b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) refcount_inc_not_zero() is not semantically equivalent to atomic_int_not_zero(), from Florian Westphal. My understanding was that refcount_*() API provides a wrapper to easier debugging of reference count leaks, however, there are semantic differences between these two APIs, where refcount_inc_not_zero() needs a barrier. Reason for this subtle difference to me is unknown. 2) packet logging is not correct for ARP and IP packets, from the ARP family and netdev/egress respectively. Use skb_network_offset() to reach the headers accordingly. 3) set element extension length have been growing over time, replace a BUG_ON by EINVAL which might be triggerable from userspace. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-11 11:58:38 +01:00
Jakub Kicinski	0076cad301	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2022-07-09 We've added 94 non-merge commits during the last 19 day(s) which contain a total of 125 files changed, 5141 insertions(+), 6701 deletions(-). The main changes are: 1) Add new way for performing BTF type queries to BPF, from Daniel Müller. 2) Add inlining of calls to bpf_loop() helper when its function callback is statically known, from Eduard Zingerman. 3) Implement BPF TCP CC framework usability improvements, from Jörn-Thorben Hinz. 4) Add LSM flavor for attaching per-cgroup BPF programs to existing LSM hooks, from Stanislav Fomichev. 5) Remove all deprecated libbpf APIs in prep for 1.0 release, from Andrii Nakryiko. 6) Add benchmarks around local_storage to BPF selftests, from Dave Marchevsky. 7) AF_XDP sample removal (given move to libxdp) and various improvements around AF_XDP selftests, from Magnus Karlsson & Maciej Fijalkowski. 8) Add bpftool improvements for memcg probing and bash completion, from Quentin Monnet. 9) Add arm64 JIT support for BPF-2-BPF coupled with tail calls, from Jakub Sitnicki. 10) Sockmap optimizations around throughput of UDP transmissions which have been improved by 61%, from Cong Wang. 11) Rework perf's BPF prologue code to remove deprecated functions, from Jiri Olsa. 12) Fix sockmap teardown path to avoid sleepable sk_psock_stop, from John Fastabend. 13) Fix libbpf's cleanup around legacy kprobe/uprobe on error case, from Chuang Wang. 14) Fix libbpf's bpf_helpers.h to work with gcc for the case of its sec/pragma macro, from James Hilliard. 15) Fix libbpf's pt_regs macros for riscv to use a0 for RC register, from Yixun Lan. 16) Fix bpftool to show the name of type BPF_OBJ_LINK, from Yafang Shao. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (94 commits) selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/n bpf: Correctly propagate errors up from bpf_core_composites_match libbpf: Disable SEC pragma macro on GCC bpf: Check attach_func_proto more carefully in check_return_code selftests/bpf: Add test involving restrict type qualifier bpftool: Add support for KIND_RESTRICT to gen min_core_btf command MAINTAINERS: Add entry for AF_XDP selftests files selftests, xsk: Rename AF_XDP testing app bpf, docs: Remove deprecated xsk libbpf APIs description selftests/bpf: Add benchmark for local_storage RCU Tasks Trace usage libbpf, riscv: Use a0 for RC register libbpf: Remove unnecessary usdt_rel_ip assignments selftests/bpf: Fix few more compiler warnings selftests/bpf: Fix bogus uninitialized variable warning bpftool: Remove zlib feature test from Makefile libbpf: Cleanup the legacy uprobe_event on failed add/attach_event() libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy() libbpf: Cleanup the legacy kprobe_event on failed add/attach_event() selftests/bpf: Add type match test against kernel's task_struct selftests/bpf: Add nested type to type based tests ... ==================== Link: https://lore.kernel.org/r/20220708233145.32365-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-09 12:24:16 -07:00
Pablo Neira Ayuso	c39ba4de6b	netfilter: nf_tables: replace BUG_ON by element length check BUG_ON can be triggered from userspace with an element with a large userdata area. Replace it by length check and return EINVAL instead. Over time extensions have been growing in size. Pick a sufficiently old Fixes: tag to propagate this fix. Fixes: `7d7402642e` ("netfilter: nf_tables: variable sized set element keys / data") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-09 16:25:09 +02:00
Geliang Tang	f7657ff4a7	mptcp: move MPTCPOPT_HMAC_LEN to net/mptcp.h Move macro MPTCPOPT_HMAC_LEN definition from net/mptcp/protocol.h to include/net/mptcp.h. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-09 12:19:23 +01:00
Kent Overstreet	8b11ff098a	9p: Add client parameter to p9_req_put() This is to aid in adding mempools, in the next patch. Link: https://lkml.kernel.org/r/20220704014243.153050-2-kent.overstreet@gmail.com Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2022-07-09 14:38:35 +09:00
Kent Overstreet	6cda12864c	9p: Drop kref usage An upcoming patch is going to require passing the client through p9_req_put() -> p9_req_free(), but that's awkward with the kref indirection - so this patch switches to using refcount_t directly. Link: https://lkml.kernel.org/r/20220704014243.153050-1-kent.overstreet@gmail.com Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2022-07-09 14:38:12 +09:00
Jakub Kicinski	5879031423	tls: create an internal header include/net/tls.h is getting a little long, and is probably hard for driver authors to navigate. Split out the internals into a header which will live under net/tls/. While at it move some static inlines with a single user into the source files, add a few tls_ prefixes and fix spelling of 'proccess'. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-08 18:38:45 -07:00
Jakub Kicinski	50a07aa531	tls: rx: always allocate max possible aad size for decrypt AAD size is either 5 or 13. Really no point complicating the code for the 8B of difference. This will also let us turn the chunked up buffer into a sane struct. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-08 18:38:45 -07:00
Jakub Kicinski	2d91ecace6	strparser: pad sk_skb_cb to avoid straddling cachelines sk_skb_cb lives within skb->cb[]. skb->cb[] straddles 2 cache lines, each containing 24B of data. The first cache line does not contain much interesting information for users of strparser, so pad things a little. Previously strp_msg->full_len would live in the first cache line and strp_msg->offset in the second. We need to reorder the 8 byte temp_reg with struct tls_msg to prevent a 4B hole which would push the struct over 48B. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-08 18:38:44 -07:00
Kuniyuki Iwashima	310731e2f1	net: Fix data-races around sysctl_mem. While reading .sysctl_mem, it can be changed concurrently. So, we need to add READ_ONCE() to avoid data-races. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-08 12:10:33 +01:00
Jakub Kicinski	83ec88d81a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-07 12:07:37 -07:00
Jakub Kicinski	88527790c0	tls: rx: add sockopt for enabling optimistic decrypt with TLS 1.3 Since optimisitic decrypt may add extra load in case of retries require socket owner to explicitly opt-in. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-06 12:56:35 +01:00
Vlad Buslov	052f744f44	net/sched: act_police: allow 'continue' action offload Offloading police with action TC_ACT_UNSPEC was erroneously disabled even though it was supported by mlx5 matchall offload implementation, which didn't verify the action type but instead assumed that any single police action attached to matchall classifier is a 'continue' action. Lack of action type check made it non-obvious what mlx5 matchall implementation actually supports and caused implementers and reviewers of referenced commits to disallow it as a part of improved validation code. Fixes: `b8cd5831c6` ("net: flow_offload: add tc police action parameters") Fixes: `b50e462bc2` ("net/sched: act_police: Add extack messages for offload failure") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-06 12:44:39 +01:00
Vladimir Oltean	d7be266adb	net: sched: provide shim definitions for taprio_offload_{get,free} All callers of taprio_offload_get() and taprio_offload_free() prior to the blamed commit are conditionally compiled based on CONFIG_NET_SCH_TAPRIO. felix_vsc9959.c is different; it provides vsc9959_qos_port_tas_set() even when taprio is compiled out. Provide shim definitions for the functions exported by taprio so that felix_vsc9959.c is able to compile. vsc9959_qos_port_tas_set() in that case is dead code anyway, and ocelot_port->taprio remains NULL, which is fine for the rest of the logic. Fixes: `1c9017e44a` ("net: dsa: felix: keep reference on entire tc-taprio config") Reported-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Colin Foster <colin.foster@in-advantage.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Link: https://lore.kernel.org/r/20220704190241.1288847-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-05 17:50:38 -07:00
Prasanna Vengateshan	092f875131	net: dsa: tag_ksz: add tag handling for Microchip LAN937x The Microchip LAN937X switches have a tagging protocol which is very similar to KSZ tagging. So that the implementation is added to tag_ksz.c and reused common APIs Signed-off-by: Prasanna Vengateshan <prasanna.vengateshan@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-02 16:34:05 +01:00
Dominique Martinet	286c171b86	9p fid refcount: add a 9p_fid_ref tracepoint This adds a tracepoint event for 9p fid lifecycle tracing: when a fid is created, its reference count increased/decreased, and freed. The new 9p_fid_ref tracepoint should help anyone wishing to debug any fid problem such as missing clunk (destroy) or use-after-free. Link: https://lkml.kernel.org/r/20220612085330.1451496-6-asmadeus@codewreck.org Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2022-07-02 18:52:21 +09:00
Dominique Martinet	b48dbb998d	9p fid refcount: add p9_fid_get/put wrappers I was recently reminded that it is not clear that p9_client_clunk() was actually just decrementing refcount and clunking only when that reaches zero: make it clear through a set of helpers. This will also allow instrumenting refcounting better for debugging next patch Link: https://lkml.kernel.org/r/20220612085330.1451496-5-asmadeus@codewreck.org Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2022-07-02 18:52:21 +09:00
Paolo Abeni	e918c137db	net: remove SK_RECLAIM_THRESHOLD and SK_RECLAIM_CHUNK There are no more users for the mentioned macros, just drop them. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-01 13:25:00 +01:00
Aloka Dixit	8bc65d38ee	wifi: nl80211: retrieve EHT related elements in AP mode Add support to retrieve EHT capabilities and EHT operation elements passed by the userspace in the beacon template and store the pointers in struct cfg80211_ap_settings to be used by the drivers. Co-developed-by: Vikram Kandukuri <quic_vikram@quicinc.com> Signed-off-by: Vikram Kandukuri <quic_vikram@quicinc.com> Co-developed-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com> Link: https://lore.kernel.org/r/20220523064904.28523-1-quic_alokad@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 12:37:54 +02:00
Veerendranath Jakkam	ecad3b0b99	wifi: cfg80211: Increase akm_suites array size in cfg80211_crypto_settings Increase akm_suites array size in struct cfg80211_crypto_settings to 10 and advertise the capability to userspace. This allows userspace to send more than two AKMs to driver in netlink commands such as NL80211_CMD_CONNECT. This capability is needed for implementing WPA3-Personal transition mode correctly with any driver that handles roaming internally. Currently, the possible AKMs for multi-AKM connect can include PSK, PSK-SHA-256, SAE, FT-PSK and FT-SAE. Since the count is already 5, increasing the akm_suites array size to 10 should be reasonable for future usecases. Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Link: https://lore.kernel.org/r/1653312358-12321-1-git-send-email-quic_vjakkam@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 12:07:08 +02:00
Felix Fietkau	942741dabc	wifi: mac80211: switch airtime fairness back to deficit round-robin scheduling This reverts commits `6a789ba679` and `2433647bc8`. The virtual time scheduler code has a number of issues: - queues slowed down by hardware/firmware powersave handling were not properly handled. - on ath10k in push-pull mode, tx queues that the driver tries to pull from were starved, causing excessive latency - delay between tx enqueue and reported airtime use were causing excessively bursty tx behavior The bursty behavior may also be present on the round-robin scheduler, but there it is much easier to fix without introducing additional regressions Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://lore.kernel.org/r/20220625212411.36675-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 10:51:41 +02:00
Johannes Berg	7f884baae6	wifi: mac80211: fix a kernel-doc complaint Somehow kernel-doc complains here about strong markup, but we really don't need the [] so just remove that. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 10:42:40 +02:00
Johannes Berg	c8a9415e6d	wifi: cfg80211: remove redundant documentation These struct members no longer exist, remove them from documentation. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 10:36:36 +02:00
Mauro Carvalho Chehab	82757b792b	wifi: mac80211: add a missing comma at kernel-doc markup The lack of the colon makes it not parse the function parameter: include/net/mac80211.h:6250: warning: Function parameter or member 'vif' not described in 'ieee80211_channel_switch_disconnect' Fix it. Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Link: https://lore.kernel.org/r/11c1bdb861d89c93058fcfe312749b482851cbdb.1656409369.git.mchehab@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 10:30:05 +02:00
Mauro Carvalho Chehab	2d8b08fef0	wifi: cfg80211: fix kernel-doc warnings all over the file There are currently 17 kernel-doc warnings on this file: include/net/cfg80211.h:391: warning: Function parameter or member 'bw' not described in 'ieee80211_eht_mcs_nss_supp' include/net/cfg80211.h:437: warning: Function parameter or member 'eht_cap' not described in 'ieee80211_sband_iftype_data' include/net/cfg80211.h:507: warning: Function parameter or member 's1g' not described in 'ieee80211_sta_s1g_cap' include/net/cfg80211.h:1390: warning: Function parameter or member 'counter_offset_beacon' not described in 'cfg80211_color_change_settings' include/net/cfg80211.h:1390: warning: Function parameter or member 'counter_offset_presp' not described in 'cfg80211_color_change_settings' include/net/cfg80211.h:1430: warning: Enum value 'STATION_PARAM_APPLY_STA_TXPOWER' not described in enum 'station_parameters_apply_mask' include/net/cfg80211.h:2195: warning: Function parameter or member 'dot11MeshConnectedToAuthServer' not described in 'mesh_config' include/net/cfg80211.h:2341: warning: Function parameter or member 'short_ssid' not described in 'cfg80211_scan_6ghz_params' include/net/cfg80211.h:3328: warning: Function parameter or member 'kck_len' not described in 'cfg80211_gtk_rekey_data' include/net/cfg80211.h:3698: warning: Function parameter or member 'ftm' not described in 'cfg80211_pmsr_result' include/net/cfg80211.h:3828: warning: Function parameter or member 'global_mcast_stypes' not described in 'mgmt_frame_regs' include/net/cfg80211.h:4977: warning: Function parameter or member 'ftm' not described in 'cfg80211_pmsr_capabilities' include/net/cfg80211.h:5742: warning: Function parameter or member 'u' not described in 'wireless_dev' include/net/cfg80211.h:5742: warning: Function parameter or member 'links' not described in 'wireless_dev' include/net/cfg80211.h:5742: warning: Function parameter or member 'valid_links' not described in 'wireless_dev' include/net/cfg80211.h:6076: warning: Function parameter or member 'is_amsdu' not described in 'ieee80211_data_to_8023_exthdr' include/net/cfg80211.h:6949: warning: Function parameter or member 'sig_dbm' not described in 'cfg80211_notify_new_peer_candidate' Address them, in order to build a better documentation from this header. Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Link: https://lore.kernel.org/r/f6f522cdc716a01744bb0eae2186f4592976222b.1656409369.git.mchehab@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-01 10:28:55 +02:00
Jakub Kicinski	0d8730f07c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c `9c5de246c1` ("net: sparx5: mdb add/del handle non-sparx5 devices") `fbb89d02e3` ("net: sparx5: Allow mdb entries to both CPU and ports") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-30 16:31:00 -07:00
Yuwei Wang	211da42eaa	net, neigh: introduce interval_probe_time_ms for periodic probe commit `ed6cd6a178` ("net, neigh: Set lower cap for neigh_managed_work rearming") fixed a case when DELAY_PROBE_TIME is configured to 0, the processing of the system work queue hog CPU to 100%, and further more we should introduce a new option used by periodic probe Signed-off-by: Yuwei Wang <wangyuweihx@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-06-30 13:14:35 +02:00
Vladimir Oltean	3eb4a4c344	net: switchdev: add reminder near struct switchdev_notifier_fdb_info br_switchdev_fdb_notify() creates an on-stack FDB info variable, and initializes it member by member. As such, newly added fields which are not initialized by br_switchdev_fdb_notify() will contain junk bytes from the stack. Other uses of struct switchdev_notifier_fdb_info have a struct initializer which should put zeroes in the uninitialized fields. Add a reminder above the structure for future developers. Recently discussed during review. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20220524152144.40527-2-schultz.hans+netdev@gmail.com/#24877698 Link: https://patchwork.kernel.org/project/netdevbpf/patch/20220524152144.40527-3-schultz.hans+netdev@gmail.com/#24912269 Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20220628100831.2899434-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-29 20:37:36 -07:00
Oleksij Rempel	3d410403a5	net: dsa: add get_pause_stats support Add support for pause stats Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-29 20:17:11 -07:00
Lorenzo Bianconi	03895c8414	wifi: mac80211: add gfp_t parameter to ieeee80211_obss_color_collision_notify Introduce the capability to specify gfp_t parameter to ieeee80211_obss_color_collision_notify routine since it runs in interrupt context in ieee80211_rx_check_bss_color_collision(). Fixes: `6d945a33f2` ("mac80211: introduce BSS color collision detection") Co-developed-by: Ryder Lee <ryder.lee@mediatek.com> Signed-off-by: Ryder Lee <ryder.lee@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/02c990fb3fbd929c8548a656477d20d6c0427a13.1655419135.git.lorenzo@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-06-29 11:43:15 +02:00
Florian Westphal	e34b9ed96c	netfilter: nf_tables: avoid skb access on nf_stolen When verdict is NF_STOLEN, the skb might have been freed. When tracing is enabled, this can result in a use-after-free: 1. access to skb->nf_trace 2. access to skb->mark 3. computation of trace id 4. dump of packet payload To avoid 1, keep a cached copy of skb->nf_trace in the trace state struct. Refresh this copy whenever verdict is != STOLEN. Avoid 2 by skipping skb->mark access if verdict is STOLEN. 3 is avoided by precomputing the trace id. Only dump the packet when verdict is not "STOLEN". Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-06-27 19:22:54 +02:00
Clément Léger	a08d6a6dc8	net: dsa: add Renesas RZ/N1 switch tag driver The switch that is present on the Renesas RZ/N1 SoC uses a specific VLAN value followed by 6 bytes which contains forwarding configuration. Signed-off-by: Clément Léger <clement.leger@bootlin.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-27 11:37:55 +01:00
Clément Léger	67f38b1c73	net: dsa: add support for ethtool get_rmon_stats() Add support to allow dsa drivers to specify the .get_rmon_stats() operation. Signed-off-by: Clément Léger <clement.leger@bootlin.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-27 11:37:55 +01:00
Richard Gobert	ede57d58e6	net: helper function skb_len_add Move the len fields manipulation in the skbs to a helper function. There is a comment specifically requesting this and there are several other areas in the code displaying the same pattern which can be refactored. This improves code readability. Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Link: https://lore.kernel.org/r/20220622160853.GA6478@debian Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-24 16:24:38 -07:00
Hangbin Liu	0a2ff7cc8a	Bonding: add per-port priority for failover re-selection Add per port priority support for bonding active slave re-selection during failover. A higher number means higher priority in selection. The primary slave still has the highest priority. This option also follows the primary_reselect rules. This option could only be configured via netlink. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-24 11:27:59 +01:00
Hangbin Liu	f2b3b28ce5	bonding: add slave_dev field for bond_opt_value Currently, bond_opt_value are mostly used for bonding option settings. If we want to set a value for slave, we need to re-alloc a string to store both slave name and vlaue, like bond_option_queue_id_set() does, which is complex and dumb. As Jon suggested, let's add a union field slave_dev for bond_opt_value, which will be benefit for future slave option setting. In function __bond_opt_init(), we will always check the extra field and set it if it's not NULL. Suggested-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-24 11:27:59 +01:00
Zhengchao Shao	f41b284a2c	xfrm: change the type of xfrm_register_km and xfrm_unregister_km Functions xfrm_register_km and xfrm_unregister_km do always return 0, change the type of functions to void. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-06-24 10:19:11 +02:00
Jakub Kicinski	93817be8b6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-23 12:33:24 -07:00
Jakub Kicinski	e34a07c0ae	sock: redo the psock vs ULP protection check Commit `8a59f9d1e3` ("sock: Introduce sk->sk_prot->psock_update_sk_prot()") has moved the inet_csk_has_ulp(sk) check from sk_psock_init() to the new tcp_bpf_update_proto() function. I'm guessing that this was done to allow creating psocks for non-inet sockets. Unfortunately the destruction path for psock includes the ULP unwind, so we need to fail the sk_psock_init() itself. Otherwise if ULP is already present we'll notice that later, and call tcp_update_ulp() with the sk_proto of the ULP itself, which will most likely result in the ULP looping its callbacks. Fixes: `8a59f9d1e3` ("sock: Introduce sk->sk_prot->psock_update_sk_prot()") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/r/20220620191353.1184629-2-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-06-23 10:08:30 +02:00
Kuniyuki Iwashima	2f7ca90a01	af_unix: Remove unix_table_locks. unix_table_locks are to protect the global hash table, unix_socket_table. The previous commit removed it, so let's clean up the unnecessary locks. Here is a test result on EC2 c5.9xlarge where 10 processes run concurrently in different netns and bind 100,000 sockets for each. without this series : 1m 38s with this series : 11s It is ~10x faster because the global hash table is split into 10 netns in this case. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-22 12:59:43 +01:00
Kuniyuki Iwashima	cf2f225e26	af_unix: Put a socket into a per-netns hash table. This commit replaces the global hash table with a per-netns one and removes the global one. We now link a socket in each netns's hash table so we can save some netns comparisons when iterating through a hash bucket. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-22 12:59:43 +01:00
Kuniyuki Iwashima	b6e8113830	af_unix: Define a per-netns hash table. This commit adds a per netns hash table for AF_UNIX, which size is fixed as UNIX_HASH_SIZE for now. The first implementation defines a per-netns hash table as a single array of lock and list: struct unix_hashbucket { spinlock_t lock; struct hlist_head head; }; struct netns_unix { struct unix_hashbucket hash; ... }; But, Eric pointed out memory cost that the structure has holes because of sizeof(spinlock_t), which is 4 (or more if LOCKDEP is enabled). [0] It could be expensive on a host with thousands of netns and few AF_UNIX sockets. For this reason, a per-netns hash table uses two dense arrays. struct unix_table { spinlock_t locks; struct hlist_head *buckets; }; struct netns_unix { struct unix_table table; ... }; Note the length of the list has a significant impact rather than lock contention, so having shared locks can be an option. But, per-netns locks and lists still perform better than the global locks and per-netns lists. [1] Also, this patch adds a change so that struct netns_unix disappears from struct net if CONFIG_UNIX is disabled. [0]: https://lore.kernel.org/netdev/CANn89iLVxO5aqx16azNU7p7Z-nz5NrnM5QTqOzueVxEnkVTxyg@mail.gmail.com/ [1]: https://lore.kernel.org/netdev/20220617175215.1769-1-kuniyu@amazon.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-22 12:59:43 +01:00
Kuniyuki Iwashima	f302d180c6	af_unix: Include the whole hash table size in UNIX_HASH_SIZE. Currently, the size of AF_UNIX hash table is UNIX_HASH_SIZE * 2, the first half for bind()ed sockets and the second half for unbound ones. UNIX_HASH_SIZE * 2 is used to define the table and iterate over it. In some places, we use ARRAY_SIZE(unix_socket_table) instead of UNIX_HASH_SIZE * 2. However, we cannot use it anymore because we will allocate the hash table dynamically. Then, we would have to add UNIX_HASH_SIZE * 2 in many places, which would be troublesome. This patch adapts the UNIX_HASH_SIZE definition to include bound and unbound sockets and defines a new UNIX_HASH_MOD macro to ease calculations. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-22 12:59:43 +01:00
Eric Dumazet	af185d8c76	raw: complete rcu conversion raw_diag_dump() can use rcu_read_lock() instead of read_lock() Now the hashinfo lock is only used from process context, in write mode only, we can convert it to a spinlock, and we do not need to block BH anymore. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220620100509.3493504-1-eric.dumazet@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-06-21 11:38:29 +02:00
Cong Wang	965b57b469	net: Introduce a new proto_ops ->read_skb() Currently both splice() and sockmap use ->read_sock() to read skb from receive queue, but for sockmap we only read one entire skb at a time, so ->read_sock() is too conservative to use. Introduce a new proto_ops ->read_skb() which supports this sematic, with this we can finally pass the ownership of skb to recv actors. For non-TCP protocols, all ->read_sock() can be simply converted to ->read_skb(). Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220615162014.89193-3-xiyou.wangcong@gmail.com	2022-06-20 14:05:52 +02:00

... 2 3 4 5 6 ...

16112 Commits