linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-13 06:32:50 +00:00

Author	SHA1	Message	Date
Kuniyuki Iwashima	1227c1771d	net: Fix data-races around sysctl_[rw]mem_(max\|default). While reading sysctl_[rw]mem_(max\|default), they can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:46:57 +01:00
lily	c624c58e08	net/core/skbuff: Check the return value of skb_copy_bits() skb_copy_bits() could fail, which requires a check on the return value. Signed-off-by: Li Zhong <floridsleeves@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:16:48 +01:00
David S. Miller	76de008340	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2022-08-24 1) Fix a refcount leak in __xfrm_policy_check. From Xin Xiong. 2) Revert "xfrm: update SA curlft.use_time". This violates RFC 2367. From Antony Antony. 3) Fix a comment on XFRMA_LASTUSED. From Antony Antony. 4) x->lastused is not cloned in xfrm_do_migrate. Fix from Antony Antony. 5) Serialize the calls to xfrm_probe_algs. From Herbert Xu. 6) Fix a null pointer dereference of dst->dev on a metadata dst in xfrm_lookup_with_ifid. From Nikolay Aleksandrov. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 12:51:50 +01:00
Yang Yingliang	d5485d9dd2	net: neigh: don't call kfree_skb() under spin_lock_irqsave() It is not allowed to call kfree_skb() from hardware interrupt context or with interrupts being disabled. So add all skb to a tmp list, then free them after spin_unlock_irqrestore() at once. Fixes: `66ba215cb5` ("neigh: fix possible DoS due to net iface start/stop loop") Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 09:49:20 +01:00
Eric Dumazet	00cd7bf9f9	netfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increases Currently, net.netfilter.nf_conntrack_frag6_high_thresh can only be lowered. I found this issue while investigating a probable kernel issue causing flakes in tools/testing/selftests/net/ip_defrag.sh In particular, these sysctl changes were ignored: ip netns exec "${NETNS}" sysctl -w net.netfilter.nf_conntrack_frag6_high_thresh=9000000 >/dev/null 2>&1 ip netns exec "${NETNS}" sysctl -w net.netfilter.nf_conntrack_frag6_low_thresh=7000000 >/dev/null 2>&1 This change is inline with commit `8361962392` ("net/ipfrag: let ip[6]frag_high_thresh in ns be higher than in init_net") Fixes: 8db3d41569bb ("netfilter: nf_defrag_ipv6: use net_generic infra") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 08:06:44 +02:00
Pablo Neira Ayuso	9afb4b2734	netfilter: flowtable: fix stuck flows on cleanup due to pending work To clear the flow table on flow table free, the following sequence normally happens in order: 1) gc_step work is stopped to disable any further stats/del requests. 2) All flow table entries are set to teardown state. 3) Run gc_step which will queue HW del work for each flow table entry. 4) Waiting for the above del work to finish (flush). 5) Run gc_step again, deleting all entries from the flow table. 6) Flow table is freed. But if a flow table entry already has pending HW stats or HW add work step 3 will not queue HW del work (it will be skipped), step 4 will wait for the pending add/stats to finish, and step 5 will queue HW del work which might execute after freeing of the flow table. To fix the above, this patch flushes the pending work, then it sets the teardown flag to all flows in the flowtable and it forces a garbage collector run to queue work to remove the flows from hardware, then it flushes this new pending work and (finally) it forces another garbage collector run to remove the entry from the software flowtable. Stack trace: [47773.882335] BUG: KASAN: use-after-free in down_read+0x99/0x460 [47773.883634] Write of size 8 at addr ffff888103b45aa8 by task kworker/u20:6/543704 [47773.885634] CPU: 3 PID: 543704 Comm: kworker/u20:6 Not tainted 5.12.0-rc7+ #2 [47773.886745] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009) [47773.888438] Workqueue: nf_ft_offload_del flow_offload_work_handler [nf_flow_table] [47773.889727] Call Trace: [47773.890214] dump_stack+0xbb/0x107 [47773.890818] print_address_description.constprop.0+0x18/0x140 [47773.892990] kasan_report.cold+0x7c/0xd8 [47773.894459] kasan_check_range+0x145/0x1a0 [47773.895174] down_read+0x99/0x460 [47773.899706] nf_flow_offload_tuple+0x24f/0x3c0 [nf_flow_table] [47773.907137] flow_offload_work_handler+0x72d/0xbe0 [nf_flow_table] [47773.913372] process_one_work+0x8ac/0x14e0 [47773.921325] [47773.921325] Allocated by task 592159: [47773.922031] kasan_save_stack+0x1b/0x40 [47773.922730] __kasan_kmalloc+0x7a/0x90 [47773.923411] tcf_ct_flow_table_get+0x3cb/0x1230 [act_ct] [47773.924363] tcf_ct_init+0x71c/0x1156 [act_ct] [47773.925207] tcf_action_init_1+0x45b/0x700 [47773.925987] tcf_action_init+0x453/0x6b0 [47773.926692] tcf_exts_validate+0x3d0/0x600 [47773.927419] fl_change+0x757/0x4a51 [cls_flower] [47773.928227] tc_new_tfilter+0x89a/0x2070 [47773.936652] [47773.936652] Freed by task 543704: [47773.937303] kasan_save_stack+0x1b/0x40 [47773.938039] kasan_set_track+0x1c/0x30 [47773.938731] kasan_set_free_info+0x20/0x30 [47773.939467] __kasan_slab_free+0xe7/0x120 [47773.940194] slab_free_freelist_hook+0x86/0x190 [47773.941038] kfree+0xce/0x3a0 [47773.941644] tcf_ct_flow_table_cleanup_work Original patch description and stack trace by Paul Blakey. Fixes: `c29f74e0df` ("netfilter: nf_flow_table: hardware offload support") Reported-by: Paul Blakey <paulb@nvidia.com> Tested-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso	759eebbcfa	netfilter: flowtable: add function to invoke garbage collection immediately Expose nf_flow_table_gc_run() to force a garbage collector run from the offload infrastructure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso	e02f0d3970	netfilter: nf_tables: disallow binding to already bound chain Update nft_data_init() to report EINVAL if chain is already bound. Fixes: `d0e2c7de92` ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Reported-by: Gwangun Jung <exsociety@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso	01e4092d53	netfilter: nft_tunnel: restrict it to netdev family Only allow to use this expression from NFPROTO_NETDEV family. Fixes: `af308b94a2` ("netfilter: nf_tables: add tunnel support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso	5f3b7aae14	netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families As it was originally intended, restrict extension to supported families. Fixes: `b96af92d6e` ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso	43eb8949cf	netfilter: nf_tables: do not leave chain stats enabled on error Error might occur later in the nf_tables_addchain() codepath, enable static key only after transaction has been created. Fixes: `9f08ea8481` ("netfilter: nf_tables: keep chain counters away from hot path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso	7044ab281f	netfilter: nft_payload: do not truncate csum_offset and csum_type Instead report ERANGE if csum_offset is too long, and EOPNOTSUPP if type is not support. Fixes: `7ec3f7b47b` ("netfilter: nft_payload: add packet mangling support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso	94254f990c	netfilter: nft_payload: report ERANGE for too long offset and length Instead of offset and length are truncation to u8, report ERANGE. Fixes: `96518518cc` ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:20 +02:00
Pablo Neira Ayuso	ab482c6b66	netfilter: nf_tables: make table handle allocation per-netns friendly mutex is per-netns, move table_netns to the pernet area. read-write to 0xffffffff883a01e8 of 8 bytes by task 6542 on cpu 0: nf_tables_newtable+0x6dc/0xc00 net/netfilter/nf_tables_api.c:1221 nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline] nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] nfnetlink_rcv+0xa6a/0x13a0 net/netfilter/nfnetlink.c:652 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x652/0x730 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x643/0x740 net/netlink/af_netlink.c:1921 Fixes: `f102d66b33` ("netfilter: nf_tables: use dedicated mutex to guard transactions") Reported-by: Abhishek Shah <abhishek.shah@columbia.edu> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:20 +02:00
Pablo Neira Ayuso	5dc52d83ba	netfilter: nf_tables: disallow updates of implicit chain Updates on existing implicit chain make no sense, disallow this. Fixes: `d0e2c7de92` ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:20 +02:00
Florian Westphal	18bbc32133	netfilter: nft_tproxy: restrict to prerouting hook TPROXY is only allowed from prerouting, but nft_tproxy doesn't check this. This fixes a crash (null dereference) when using tproxy from e.g. output. Fixes: `4ed8eb6570` ("netfilter: nf_tables: Add native tproxy support") Reported-by: Shell Chen <xierch@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2022-08-23 21:24:34 +02:00
Florian Westphal	cf97769c76	netfilter: conntrack: work around exceeded receive window When a TCP sends more bytes than allowed by the receive window, all future packets can be marked as invalid. This can clog up the conntrack table because of 5-day default timeout. Sequence of packets: 01 initiator > responder: [S], seq 171, win 5840, options [mss 1330,sackOK,TS val 63 ecr 0,nop,wscale 1] 02 responder > initiator: [S.], seq 33211, ack 172, win 65535, options [mss 1460,sackOK,TS val 010 ecr 63,nop,wscale 8] 03 initiator > responder: [.], ack 33212, win 2920, options [nop,nop,TS val 068 ecr 010], length 0 04 initiator > responder: [P.], seq 172:240, ack 33212, win 2920, options [nop,nop,TS val 279 ecr 010], length 68 Window is 5840 starting from 33212 -> 39052. 05 responder > initiator: [.], ack 240, win 256, options [nop,nop,TS val 872 ecr 279], length 0 06 responder > initiator: [.], seq 33212:34530, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 This is fine, conntrack will flag the connection as having outstanding data (UNACKED), which lowers the conntrack timeout to 300s. 07 responder > initiator: [.], seq 34530:35848, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 08 responder > initiator: [.], seq 35848:37166, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 09 responder > initiator: [.], seq 37166:38484, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 10 responder > initiator: [.], seq 38484:39802, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 Packet 10 is already sending more than permitted, but conntrack doesn't validate this (only seq is tested vs. maxend, not 'seq+len'). 38484 is acceptable, but only up to 39052, so this packet should not have been sent (or only 568 bytes, not 1318). At this point, connection is still in '300s' mode. Next packet however will get flagged: 11 responder > initiator: [P.], seq 39802:40128, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 326 nf_ct_proto_6: SEQ is over the upper bound (over the window of the receiver) .. LEN=378 .. SEQ=39802 ACK=240 ACK PSH .. Now, a couple of replies/acks comes in: 12 initiator > responder: [.], ack 34530, win 4368, [.. irrelevant acks removed ] 16 initiator > responder: [.], ack 39802, win 8712, options [nop,nop,TS val 296201291 ecr 2982371892], length 0 This ack is significant -- this acks the last packet send by the responder that conntrack considered valid. This means that ack == td_end. This will withdraw the 'unacked data' flag, the connection moves back to the 5-day timeout of established conntracks. 17 initiator > responder: ack 40128, win 10030, ... This packet is also flagged as invalid. Because conntrack only updates state based on packets that are considered valid, packet 11 'did not exist' and that gets us: nf_ct_proto_6: ACK is over upper bound 39803 (ACKed data not seen yet) .. SEQ=240 ACK=40128 WINDOW=10030 RES=0x00 ACK URG Because this received and processed by the endpoints, the conntrack entry remains in a bad state, no packets will ever be considered valid again: 30 responder > initiator: [F.], seq 40432, ack 2045, win 391, .. 31 initiator > responder: [.], ack 40433, win 11348, .. 32 initiator > responder: [F.], seq 2045, ack 40433, win 11348 .. ... all trigger 'ACK is over bound' test and we end up with non-early-evictable 5-day default timeout. NB: This patch triggers a bunch of checkpatch warnings because of silly indent. I will resend the cleanup series linked below to reduce the indent level once this change has propagated to net-next. I could route the cleanup via nf but that causes extra backport work for stable maintainers. Link: https://lore.kernel.org/netfilter-devel/20220720175228.17880-1-fw@strlen.de/T/#mb1d7147d36294573cc4f81d00f9f8dadfdd06cd8 Signed-off-by: Florian Westphal <fw@strlen.de>	2022-08-23 18:26:48 +02:00
Florian Westphal	7997eff828	netfilter: ebtables: reject blobs that don't provide all entry points Harshit Mogalapalli says: In ebt_do_table() function dereferencing 'private->hook_entry[hook]' can lead to NULL pointer dereference. [..] Kernel panic: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] [..] RIP: 0010:ebt_do_table+0x1dc/0x1ce0 Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 5c 16 00 00 48 b8 00 00 00 00 00 fc ff df 49 8b 6c df 08 48 8d 7d 2c 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 88 [..] Call Trace: nf_hook_slow+0xb1/0x170 __br_forward+0x289/0x730 maybe_deliver+0x24b/0x380 br_flood+0xc6/0x390 br_dev_xmit+0xa2e/0x12c0 For some reason ebtables rejects blobs that provide entry points that are not supported by the table, but what it should instead reject is the opposite: blobs that DO NOT provide an entry point supported by the table. t->valid_hooks is the bitmask of hooks (input, forward ...) that will see packets. Providing an entry point that is not support is harmless (never called/used), but the inverse isn't: it results in a crash because the ebtables traverser doesn't expect a NULL blob for a location its receiving packets for. Instead of fixing all the individual checks, do what iptables is doing and reject all blobs that differ from the expected hooks. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2022-08-23 18:23:15 +02:00
Vladimir Oltean	855a28f9c9	net: dsa: don't dereference NULL extack in dsa_slave_changeupper() When a driver returns -EOPNOTSUPP in dsa_port_bridge_join() but failed to provide a reason for it, DSA attempts to set the extack to say that software fallback will kick in. The problem is, when we use brctl and the legacy bridge ioctls, the extack will be NULL, and DSA dereferences it in the process of setting it. Sergei Antonov proves this using the following stack trace: Unable to handle kernel NULL pointer dereference at virtual address 00000000 PC is at dsa_slave_changeupper+0x5c/0x158 dsa_slave_changeupper from raw_notifier_call_chain+0x38/0x6c raw_notifier_call_chain from __netdev_upper_dev_link+0x198/0x3b4 __netdev_upper_dev_link from netdev_master_upper_dev_link+0x50/0x78 netdev_master_upper_dev_link from br_add_if+0x430/0x7f4 br_add_if from br_ioctl_stub+0x170/0x530 br_ioctl_stub from br_ioctl_call+0x54/0x7c br_ioctl_call from dev_ifsioc+0x4e0/0x6bc dev_ifsioc from dev_ioctl+0x2f8/0x758 dev_ioctl from sock_ioctl+0x5f0/0x674 sock_ioctl from sys_ioctl+0x518/0xe40 sys_ioctl from ret_fast_syscall+0x0/0x1c Fix the problem by only overriding the extack if non-NULL. Fixes: `1c6e8088d9` ("net: dsa: allow port_bridge_join() to override extack message") Link: https://lore.kernel.org/netdev/CABikg9wx7vB5eRDAYtvAm7fprJ09Ta27a4ZazC=NX5K4wn6pWA@mail.gmail.com/ Reported-by: Sergei Antonov <saproj@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Sergei Antonov <saproj@gmail.com> Link: https://lore.kernel.org/r/20220819173925.3581871-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-23 07:54:16 -07:00
Linus Torvalds	072e51356c	NFS client bugfixes for Linux 6.0 Highlights include: Stable fixes - NFS: Fix another fsync() issue after a server reboot Bugfixes - NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENT - NFS: Fix missing unlock in nfs_unlink() - Add sanity checking of the file type used by __nfs42_ssc_open - Fix a case where we're failing to set task->tk_rpc_status Cleanups - Remove the flag NFS_CONTEXT_RESEND_WRITES that got obsoleted by the fsync() fix -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmMDk3wACgkQZwvnipYK APJLpw/+ONqG16L5W31/BzGJ80DlG9CERMad7Yt8+lk+ih574k/OrCotHThMyBm9 2TfY3S8zD9QoLnsPesDKeoc6AYyL3el0Wo2vKmWlGvrirvrzNt9nMc61CDMs2IHT kN7gjO2P1LCZln8GTE87C4tI3Pg0Cwr4UUlyHHjMSKdYuJckJugj1gDvblSjn5h4 bGKGEJ9G71G1REn013sVqmQ6huvQ3iif07X5NaN7T5e+TpNFet/0AlTmrA9zsUDI WPm+efP+ieTmihvhqOSYdV31uHN/ECx4p60ITzAlWwPYPyXr1M0r9acUGX10ENna eX1B9nyxbUAzO6rxxPgXi3LXgvmgRDVEmbSs5IL985XR2zsVR+AF3dgAwoJXqV9y 7mAtoiwyqe3idvaK+mHU4OWCSqdhZbauJJ+Jc0ZHZHy2vzHPS2CWcpvXHjVTw63R txOkUFL89SwnqJv03N6CZt4OyY1av97dDOEvPqHuRx4NyfT3v/QvF5W3V/UvLnt2 hTPNGIRUPZU1lpfqEgd7NXWO6LLtkWK2MciRGVnSFf2S5uKYqvlbPsVqWc6CviXc Mu4o2RoctkIwxexSfHY0p7UQrbu3OvYgTuIzgy6cIZ2GK70L29UpJYBe5YEh9Qru J/Pgn1ZSdGgDgwqzR8S92PTbbKq1caOqnGReFdyJDnCetb6LrrA= =tpKO -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client fixes from Trond Myklebust: "Stable fixes: - NFS: Fix another fsync() issue after a server reboot Bugfixes: - NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENT - NFS: Fix missing unlock in nfs_unlink() - Add sanity checking of the file type used by __nfs42_ssc_open - Fix a case where we're failing to set task->tk_rpc_status Cleanups: - Remove the NFS_CONTEXT_RESEND_WRITES flag that got obsoleted by the fsync() fix" * tag 'nfs-for-5.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: RPC level errors should set task->tk_rpc_status NFSv4.2 fix problems with __nfs42_ssc_open NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENT NFS: Cleanup to remove unused flag NFS_CONTEXT_RESEND_WRITES NFS: Remove a bogus flag setting in pnfs_write_done_resend_to_mds NFS: Fix another fsync() issue after a server reboot NFS: Fix missing unlock in nfs_unlink()	2022-08-22 11:40:01 -07:00
Bernard Pidoux	3c53cd65de	rose: check NULL rose_loopback_neigh->loopback Commit `3b3fd068c5` added NULL check for `rose_loopback_neigh->dev` in rose_loopback_timer() but omitted to check rose_loopback_neigh->loopback. It thus prevents all rose connect. The reason is that a special rose_neigh loopback has a NULL device. /proc/net/rose_neigh illustrates it via rose_neigh_show() function : [...] seq_printf(seq, "%05d %-9s %-4s %3d %3d %3s %3s %3lu %3lu", rose_neigh->number, (rose_neigh->loopback) ? "RSLOOP-0" : ax2asc(buf, &rose_neigh->callsign), rose_neigh->dev ? rose_neigh->dev->name : "???", rose_neigh->count, /proc/net/rose_neigh displays special rose_loopback_neigh->loopback as callsign RSLOOP-0: addr callsign dev count use mode restart t0 tf digipeaters 00001 RSLOOP-0 ??? 1 2 DCE yes 0 0 By checking rose_loopback_neigh->loopback, rose_rx_call_request() is called even in case rose_loopback_neigh->dev is NULL. This repairs rose connections. Verification with rose client application FPAC: FPAC-Node v 4.1.3 (built Aug 5 2022) for LINUX (help = h) F6BVP-4 (Commands = ?) : u Users - AX.25 Level 2 sessions : Port Callsign Callsign AX.25 state ROSE state NetRom status axudp F6BVP-5 -> F6BVP-9 Connected Connected --------- Fixes: `3b3fd068c5` ("rose: Fix Null pointer dereference in rose_send_frame()") Signed-off-by: Bernard Pidoux <f6bvp@free.fr> Suggested-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Thomas DL9SAU Osterried <thomas@osterried.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-22 14:24:54 +01:00
Shigeru Yoshida	b1cb8a71f1	batman-adv: Fix hang up with small MTU hard-interface The system hangs up when batman-adv soft-interface is created on hard-interface with small MTU. For example, the following commands create batman-adv soft-interface on dummy interface with zero MTU: # ip link add name dummy0 type dummy # ip link set mtu 0 dev dummy0 # ip link set up dev dummy0 # ip link add name bat0 type batadv # ip link set dev dummy0 master bat0 These commands cause the system hang up with the following messages: [ 90.578925][ T6689] batman_adv: bat0: Adding interface: dummy0 [ 90.580884][ T6689] batman_adv: bat0: The MTU of interface dummy0 is too small (0) to handle the transport of batman-adv packets. Packets going over this interface will be fragmented on layer2 which could impact the performance. Setting the MTU to 1560 would solve the problem. [ 90.586264][ T6689] batman_adv: bat0: Interface activated: dummy0 [ 90.590061][ T6689] batman_adv: bat0: Forced to purge local tt entries to fit new maximum fragment MTU (-320) [ 90.595517][ T6689] batman_adv: bat0: Forced to purge local tt entries to fit new maximum fragment MTU (-320) [ 90.598499][ T6689] batman_adv: bat0: Forced to purge local tt entries to fit new maximum fragment MTU (-320) This patch fixes this issue by returning error when enabling hard-interface with small MTU size. Fixes: `c6c8fea297` ("net: Add batman-adv meshing protocol") Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2022-08-20 14:17:45 +02:00
Trond Myklebust	ed06fce0b0	SUNRPC: RPC level errors should set task->tk_rpc_status Fix up a case in call_encode() where we're failing to set task->tk_rpc_status when an RPC level error occurred. Fixes: `9c5948c248` ("SUNRPC: task should be exit if encode return EKEYEXPIRED more times") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2022-08-19 20:32:05 -04:00
Eyal Birger	7ec9fce4b3	ip_tunnel: Respect tunnel key's "flow_flags" in IP tunnels Commit `451ef36bd2` ("ip_tunnels: Add new flow flags field to ip_tunnel_key") added a "flow_flags" member to struct ip_tunnel_key which was later used by the commit in the fixes tag to avoid dropping packets with sources that aren't locally configured when set in bpf_set_tunnel_key(). VXLAN and GENEVE were made to respect this flag, ip tunnels like IPIP and GRE were not. This commit fixes this omission by making ip_tunnel_init_flow() receive the flow flags from the tunnel key in the relevant collect_md paths. Fixes: `b8fff74852` ("bpf: Set flow flag to allow any source IP in bpf_tunnel_key") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Paul Chaignon <paul@isovalent.com> Link: https://lore.kernel.org/bpf/20220818074118.726639-1-eyal.birger@gmail.com	2022-08-18 21:18:28 +02:00
Cong Wang	2e23acd99e	tcp: handle pure FIN case correctly When skb->len==0, the recv_actor() returns 0 too, but we also use 0 for error conditions. This patch amends this by propagating the errors to tcp_read_skb() so that we can distinguish skb->len==0 case from error cases. Fixes: `04919bed94` ("tcp: Introduce tcp_read_skb()") Reported-by: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-18 11:04:56 -07:00
Cong Wang	a8688821f3	tcp: refactor tcp_read_skb() a bit As tcp_read_skb() only reads one skb at a time, the while loop is unnecessary, we can turn it into an if. This also simplifies the code logic. Cc: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-18 11:04:56 -07:00
Cong Wang	c457985aaa	tcp: fix tcp_cleanup_rbuf() for tcp_read_skb() tcp_cleanup_rbuf() retrieves the skb from sk_receive_queue, it assumes the skb is not yet dequeued. This is no longer true for tcp_read_skb() case where we dequeue the skb first. Fix this by introducing a helper __tcp_cleanup_rbuf() which does not require any skb and calling it in tcp_read_skb(). Fixes: `04919bed94` ("tcp: Introduce tcp_read_skb()") Cc: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-18 11:04:55 -07:00
Cong Wang	e9c6e79760	tcp: fix sock skb accounting in tcp_read_skb() Before commit `965b57b469` ("net: Introduce a new proto_ops ->read_skb()"), skb was not dequeued from receive queue hence when we close TCP socket skb can be just flushed synchronously. After this commit, we have to uncharge skb immediately after being dequeued, otherwise it is still charged in the original sock. And we still need to retain skb->sk, as eBPF programs may extract sock information from skb->sk. Therefore, we have to call skb_set_owner_sk_safe() here. Fixes: `965b57b469` ("net: Introduce a new proto_ops ->read_skb()") Reported-and-tested-by: syzbot+a0e6f8738b58f7654417@syzkaller.appspotmail.com Tested-by: Stanislav Fomichev <sdf@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-18 11:04:55 -07:00
Jakub Kicinski	249801360d	net: genl: fix error path memory leak in policy dumping If construction of the array of policies fails when recording non-first policy we need to unwind. netlink_policy_dump_add_policy() itself also needs fixing as it currently gives up on error without recording the allocated pointer in the pstate pointer. Reported-by: syzbot+dc54d9ba8153b216cae0@syzkaller.appspotmail.com Fixes: `50a896cf2d` ("genetlink: properly support per-op policy dumping") Link: https://lore.kernel.org/r/20220816161939.577583-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-18 10:20:48 -07:00
Vladimir Oltean	211987f3ac	net: dsa: don't warn in dsa_port_set_state_now() when driver doesn't support it ds->ops->port_stp_state_set() is, like most DSA methods, optional, and if absent, the port is supposed to remain in the forwarding state (as standalone). Such is the case with the mv88e6060 driver, which does not offload the bridge layer. DSA warns that the STP state can't be changed to FORWARDING as part of dsa_port_enable_rt(), when in fact it should not. The error message is also not up to modern standards, so take the opportunity to make it more descriptive. Fixes: `fd36454131` ("net: dsa: change scope of STP state setter") Reported-by: Sergei Antonov <saproj@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Sergei Antonov <saproj@gmail.com> Link: https://lore.kernel.org/r/20220816201445.1809483-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-17 21:58:22 -07:00
Jakub Kicinski	bec13ba9ce	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter: conntrack and nf_tables bug fixes The following patchset contains netfilter fixes for net. Broken since 5.19: A few ancient connection tracking helpers assume TCP packets cannot exceed 64kb in size, but this isn't the case anymore with 5.19 when BIG TCP got merged, from myself. Regressions since 5.19: 1. 'conntrack -E expect' won't display anything because nfnetlink failed to enable events for expectations, only for normal conntrack events. 2. partially revert change that added resched calls to a function that can be in atomic context. Both broken and fixed up by myself. Broken for several releases (up to original merge of nf_tables): Several fixes for nf_tables control plane, from Pablo. This fixes up resource leaks in error paths and adds more sanity checks for mutually exclusive attributes/flags. Kconfig: NF_CONNTRACK_PROCFS is very old and doesn't provide all info provided via ctnetlink, so it should not default to y. From Geert Uytterhoeven. Selftests: rework nft_flowtable.sh: it frequently indicated failure; the way it tried to detect an offload failure did not work reliably. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: testing: selftests: nft_flowtable.sh: rework test to detect offload failure testing: selftests: nft_flowtable.sh: use random netns names netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to y netfilter: nf_tables: check NFT_SET_CONCAT flag if field_count is specified netfilter: nf_tables: disallow NFT_SET_ELEM_CATCHALL and NFT_SET_ELEM_INTERVAL_END netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags netfilter: nf_tables: validate NFTA_SET_ELEM_OBJREF based on NFT_SET_OBJECT flag netfilter: nf_tables: really skip inactive sets when allocating name netfilter: nfnetlink: re-enable conntrack expectation events netfilter: nf_tables: fix scheduling-while-atomic splat netfilter: nf_ct_irc: cap packet search space to 4k netfilter: nf_ct_ftp: prefer skb_linearize netfilter: nf_ct_h323: cap packet size at 64k netfilter: nf_ct_sane: remove pseudo skb linearization netfilter: nf_tables: possible module reference underflow in error path netfilter: nf_tables: disallow NFTA_SET_ELEM_KEY_END with NFT_SET_ELEM_INTERVAL_END flag netfilter: nf_tables: use READ_ONCE and WRITE_ONCE for shared generation id access ==================== Link: https://lore.kernel.org/r/20220817140015.25843-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-17 20:17:45 -07:00
Liu Jian	583585e48d	skmsg: Fix wrong last sg check in sk_msg_recvmsg() Fix one kernel NULL pointer dereference as below: [ 224.462334] Call Trace: [ 224.462394] __tcp_bpf_recvmsg+0xd3/0x380 [ 224.462441] ? sock_has_perm+0x78/0xa0 [ 224.462463] tcp_bpf_recvmsg+0x12e/0x220 [ 224.462494] inet_recvmsg+0x5b/0xd0 [ 224.462534] __sys_recvfrom+0xc8/0x130 [ 224.462574] ? syscall_trace_enter+0x1df/0x2e0 [ 224.462606] ? __do_page_fault+0x2de/0x500 [ 224.462635] __x64_sys_recvfrom+0x24/0x30 [ 224.462660] do_syscall_64+0x5d/0x1d0 [ 224.462709] entry_SYSCALL_64_after_hwframe+0x65/0xca In commit `9974d37ea7` ("skmsg: Fix invalid last sg check in sk_msg_recvmsg()"), we change last sg check to sg_is_last(), but in sockmap redirection case (without stream_parser/stream_verdict/ skb_verdict), we did not mark the end of the scatterlist. Check the sk_msg_alloc, sk_msg_page_add, and bpf_msg_push_data functions, they all do not mark the end of sg. They are expected to use sg.end for end judgment. So the judgment of '(i != msg_rx->sg.end)' is added back here. Fixes: `9974d37ea7` ("skmsg: Fix invalid last sg check in sk_msg_recvmsg()") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20220809094915.150391-1-liujian56@huawei.com	2022-08-17 22:33:20 +02:00
Jakub Kicinski	849f16bbfb	tls: rx: react to strparser initialization errors Even though the normal strparser's init function has a return value we got away with ignoring errors until now, as it only validates the parameters and we were passing correct parameters. tls_strp can fail to init on memory allocation errors, which syzbot duly induced and reported. Reported-by: syzbot+abd45eb849b05194b1b6@syzkaller.appspotmail.com Fixes: `84c61fe1a7` ("tls: rx: do not use the standard strparser") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-17 10:24:00 +01:00
Nikolay Aleksandrov	17ecd4a4db	xfrm: policy: fix metadata dst->dev xmit null pointer dereference When we try to transmit an skb with metadata_dst attached (i.e. dst->dev == NULL) through xfrm interface we can hit a null pointer dereference[1] in xfrmi_xmit2() -> xfrm_lookup_with_ifid() due to the check for a loopback skb device when there's no policy which dereferences dst->dev unconditionally. Not having dst->dev can be interepreted as it not being a loopback device, so just add a check for a null dst_orig->dev. With this fix xfrm interface's Tx error counters go up as usual. [1] net-next calltrace captured via netconsole: BUG: kernel NULL pointer dereference, address: 00000000000000c0 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP CPU: 1 PID: 7231 Comm: ping Kdump: loaded Not tainted 5.19.0+ #24 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014 RIP: 0010:xfrm_lookup_with_ifid+0x5eb/0xa60 Code: 8d 74 24 38 e8 26 a4 37 00 48 89 c1 e9 12 fc ff ff 49 63 ed 41 83 fd be 0f 85 be 01 00 00 41 be ff ff ff ff 45 31 ed 48 8b 03 <f6> 80 c0 00 00 00 08 75 0f 41 80 bc 24 19 0d 00 00 01 0f 84 1e 02 RSP: 0018:ffffb0db82c679f0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffd0db7fcad430 RCX: ffffb0db82c67a10 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffb0db82c67a80 RBP: ffffb0db82c67a80 R08: ffffb0db82c67a14 R09: 0000000000000000 R10: 0000000000000000 R11: ffff8fa449667dc8 R12: ffffffff966db880 R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000000 FS: 00007ff35c83f000(0000) GS:ffff8fa478480000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000c0 CR3: 000000001ebb7000 CR4: 0000000000350ee0 Call Trace: <TASK> xfrmi_xmit+0xde/0x460 ? tcf_bpf_act+0x13d/0x2a0 dev_hard_start_xmit+0x72/0x1e0 __dev_queue_xmit+0x251/0xd30 ip_finish_output2+0x140/0x550 ip_push_pending_frames+0x56/0x80 raw_sendmsg+0x663/0x10a0 ? try_charge_memcg+0x3fd/0x7a0 ? __mod_memcg_lruvec_state+0x93/0x110 ? sock_sendmsg+0x30/0x40 sock_sendmsg+0x30/0x40 __sys_sendto+0xeb/0x130 ? handle_mm_fault+0xae/0x280 ? do_user_addr_fault+0x1e7/0x680 ? kvm_read_and_reset_apf_flags+0x3b/0x50 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x34/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7ff35cac1366 Code: eb 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 c3 90 55 48 83 ec 30 44 89 4c 24 2c 4c 89 RSP: 002b:00007fff738e4028 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007fff738e57b0 RCX: 00007ff35cac1366 RDX: 0000000000000040 RSI: 0000557164e4b450 RDI: 0000000000000003 RBP: 0000557164e4b450 R08: 00007fff738e7a2c R09: 0000000000000010 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040 R13: 00007fff738e5770 R14: 00007fff738e4030 R15: 0000001d00000001 </TASK> Modules linked in: netconsole veth br_netfilter bridge bonding virtio_net [last unloaded: netconsole] CR2: 00000000000000c0 CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Daniel Borkmann <daniel@iogearbox.net> Fixes: `2d151d3907` ("xfrm: Add possibility to set the default to block if we have no policy") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-08-17 11:06:37 +02:00
Geert Uytterhoeven	aa5762c342	netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to y NF_CONNTRACK_PROCFS was marked obsolete in commit `54b07dca68` ("netfilter: provide config option to disable ancient procfs parts") in v3.3. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2022-08-17 08:46:30 +02:00
Zhengchao Shao	de64b6b6fb	net: sched: fix misuse of qcpu->backlog in gnet_stats_add_queue_cpu In the gnet_stats_add_queue_cpu function, the qstats->qlen statistics are incorrectly set to qcpu->backlog. Fixes: `448e163f8b` ("gen_stats: Add gnet_stats_add_queue()") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20220815030848.276746-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-16 19:38:20 -07:00
Zhengchao Shao	5b22f62724	net: rtnetlink: fix module reference count leak issue in rtnetlink_rcv_msg When bulk delete command is received in the rtnetlink_rcv_msg function, if bulk delete is not supported, module_put is not called to release the reference counting. As a result, module reference count is leaked. Fixes: `a6cec0bcd3` ("net: rtnetlink: add bulk delete support flag") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20220815024629.240367-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-15 19:58:30 -07:00
Pablo Neira Ayuso	1b6345d416	netfilter: nf_tables: check NFT_SET_CONCAT flag if field_count is specified Since `f3a2181e16` ("netfilter: nf_tables: Support for sets with multiple ranged fields"), it possible to combine intervals and concatenations. Later on, `ef516e8625` ("netfilter: nf_tables: reintroduce the NFT_SET_CONCAT flag") provides the NFT_SET_CONCAT flag for userspace to report that the set stores a concatenation. Make sure NFT_SET_CONCAT is set on if field_count is specified for consistency. Otherwise, if NFT_SET_CONCAT is specified with no field_count, bail out with EINVAL. Fixes: `ef516e8625` ("netfilter: nf_tables: reintroduce the NFT_SET_CONCAT flag") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-15 18:41:21 +02:00
Pablo Neira Ayuso	fc0ae524b5	netfilter: nf_tables: disallow NFT_SET_ELEM_CATCHALL and NFT_SET_ELEM_INTERVAL_END These flags are mutually exclusive, report EINVAL in this case. Fixes: `aaa31047a6` ("netfilter: nftables: add catch-all set element support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-15 17:54:59 +02:00
Pablo Neira Ayuso	88cccd908d	netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags If the NFT_SET_CONCAT\|NFT_SET_INTERVAL flags are set on, then the netlink attribute NFTA_SET_ELEM_KEY_END must be specified. Otherwise, NFTA_SET_ELEM_KEY_END should not be present. For catch-all element, NFTA_SET_ELEM_KEY_END should not be present. The NFT_SET_ELEM_INTERVAL_END is never used with this set flags combination. Fixes: `7b225d0b5c` ("netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-15 17:54:51 +02:00
Magnus Karlsson	58ca14ed98	xsk: Fix corrupted packets for XDP_SHARED_UMEM Fix an issue in XDP_SHARED_UMEM mode together with aligned mode where packets are corrupted for the second and any further sockets bound to the same umem. In other words, this does not affect the first socket bound to the umem. The culprit for this bug is that the initialization of the DMA addresses for the pre-populated xsk buffer pool entries was not performed for any socket but the first one bound to the umem. Only the linear array of DMA addresses was populated. Fix this by populating the DMA addresses in the xsk buffer pool for every socket bound to the same umem. Fixes: `94033cd8e7` ("xsk: Optimize for aligned case") Reported-by: Alasdair McWilliam <alasdair.mcwilliam@outlook.com> Reported-by: Intrusion Shield Team <dnevil@intrusion.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Alasdair McWilliam <alasdair.mcwilliam@outlook.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/xdp-newbies/6205E10C-292E-4995-9D10-409649354226@outlook.com/ Link: https://lore.kernel.org/bpf/20220812113259.531-1-magnus.karlsson@gmail.com	2022-08-15 17:26:07 +02:00
Jamal Hadi Salim	0279957171	net_sched: cls_route: disallow handle of 0 Follows up on: https://lore.kernel.org/all/20220809170518.164662-1-cascardo@canonical.com/ handle of 0 implies from/to of universe realm which is not very sensible. Lets see what this patch will do: $sudo tc qdisc add dev $DEV root handle 1:0 prio //lets manufacture a way to insert handle of 0 $sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 \ route to 0 from 0 classid 1:10 action ok //gets rejected... Error: handle of 0 is not valid. We have an error talking to the kernel, -1 //lets create a legit entry.. sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 route from 10 \ classid 1:10 action ok //what did the kernel insert? $sudo tc filter ls dev $DEV parent 1:0 filter protocol ip pref 100 route chain 0 filter protocol ip pref 100 route chain 0 fh 0x000a8000 flowid 1:10 from 10 action order 1: gact action pass random type none pass val 0 index 1 ref 1 bind 1 //Lets try to replace that legit entry with a handle of 0 $ sudo tc filter replace dev $DEV parent 1:0 protocol ip prio 100 \ handle 0x000a8000 route to 0 from 0 classid 1:10 action drop Error: Replacing with handle of 0 is invalid. We have an error talking to the kernel, -1 And last, lets run Cascardo's POC: $ ./poc 0 0 -22 -22 -22 Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-15 11:46:30 +01:00
Xin Xiong	7396ba87f1	net: fix potential refcount leak in ndisc_router_discovery() The issue happens on specific paths in the function. After both the object `rt` and `neigh` are grabbed successfully, when `lifetime` is nonzero but the metric needs change, the function just deletes the route and set `rt` to NULL. Then, it may try grabbing `rt` and `neigh` again if above conditions hold. The function simply overwrite `neigh` if succeeds or returns if fails, without decreasing the reference count of previous `neigh`. This may result in memory leaks. Fix it by decrementing the reference count of `neigh` in place. Fixes: `6b2e04bc24` ("net: allow user to set metric on default route learned via Router Advertisement") Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-15 11:40:28 +01:00
Alexander Mikhalitsyn	0ff4eb3d5e	neighbour: make proxy_queue.qlen limit per-device Right now we have a neigh_param PROXY_QLEN which specifies maximum length of neigh_table->proxy_queue. But in fact, this limitation doesn't work well because check condition looks like: tbl->proxy_queue.qlen > NEIGH_VAR(p, PROXY_QLEN) The problem is that p (struct neigh_parms) is a per-device thing, but tbl (struct neigh_table) is a system-wide global thing. It seems reasonable to make proxy_queue limit per-device based. v2: - nothing changed in this patch v3: - rebase to net tree Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@kernel.org> Cc: Yajun Deng <yajun.deng@linux.dev> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Christian Brauner <brauner@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Cc: Konstantin Khorenko <khorenko@virtuozzo.com> Cc: kernel@openvz.org Cc: devel@openvz.org Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-15 11:25:09 +01:00
Denis V. Lunev	66ba215cb5	neigh: fix possible DoS due to net iface start/stop loop Normal processing of ARP request (usually this is Ethernet broadcast packet) coming to the host is looking like the following: * the packet comes to arp_process() call and is passed through routing procedure * the request is put into the queue using pneigh_enqueue() if corresponding ARP record is not local (common case for container records on the host) * the request is processed by timer (within 80 jiffies by default) and ARP reply is sent from the same arp_process() using NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED condition (flag is set inside pneigh_enqueue()) And here the problem comes. Linux kernel calls pneigh_queue_purge() which destroys the whole queue of ARP requests on ANY network interface start/stop event through __neigh_ifdown(). This is actually not a problem within the original world as network interface start/stop was accessible to the host 'root' only, which could do more destructive things. But the world is changed and there are Linux containers available. Here container 'root' has an access to this API and could be considered as untrusted user in the hosting (container's) world. Thus there is an attack vector to other containers on node when container's root will endlessly start/stop interfaces. We have observed similar situation on a real production node when docker container was doing such activity and thus other containers on the node become not accessible. The patch proposed doing very simple thing. It drops only packets from the same namespace in the pneigh_queue_purge() where network interface state change is detected. This is enough to prevent the problem for the whole node preserving original semantics of the code. v2: - do del_timer_sync() if queue is empty after pneigh_queue_purge() v3: - rebase to net tree Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@kernel.org> Cc: Yajun Deng <yajun.deng@linux.dev> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Christian Brauner <brauner@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Cc: Konstantin Khorenko <khorenko@virtuozzo.com> Cc: kernel@openvz.org Cc: devel@openvz.org Investigated-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-15 11:25:09 +01:00
Maxim Kochetkov	68a838b84e	net: qrtr: start MHI channel after endpoit creation MHI channel may generates event/interrupt right after enabling. It may leads to 2 race conditions issues. 1) Such event may be dropped by qcom_mhi_qrtr_dl_callback() at check: if (!qdev \|\| mhi_res->transaction_status) return; Because dev_set_drvdata(&mhi_dev->dev, qdev) may be not performed at this moment. In this situation qrtr-ns will be unable to enumerate services in device. --------------------------------------------------------------- 2) Such event may come at the moment after dev_set_drvdata() and before qrtr_endpoint_register(). In this case kernel will panic with accessing wrong pointer at qcom_mhi_qrtr_dl_callback(): rc = qrtr_endpoint_post(&qdev->ep, mhi_res->buf_addr, mhi_res->bytes_xferd); Because endpoint is not created yet. -------------------------------------------------------------- So move mhi_prepare_for_transfer_autoqueue after endpoint creation to fix it. Fixes: `a2e2cc0dbb` ("net: qrtr: Start MHI channels during init") Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru> Reviewed-by: Hemant Kumar <quic_hemantk@quicinc.com> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-15 11:21:42 +01:00
Hongbin Wang	7778856731	ip6_tunnel: Fix the type of functions Functions ip6_tnl_change, ip6_tnl_update and ip6_tnl0_update do always return 0, change the type of functions to void. Signed-off-by: Hongbin Wang <wh_bin@126.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-13 10:27:36 +01:00
Pablo Neira Ayuso	5a2f3dc318	netfilter: nf_tables: validate NFTA_SET_ELEM_OBJREF based on NFT_SET_OBJECT flag If the NFTA_SET_ELEM_OBJREF netlink attribute is present and NFT_SET_OBJECT flag is set on, report EINVAL. Move existing sanity check earlier to validate that NFT_SET_OBJECT requires NFTA_SET_ELEM_OBJREF. Fixes: `8aeff920dc` ("netfilter: nf_tables: add stateful object reference to set elements") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-12 16:32:16 +02:00
Xin Xiong	bfc48f1b05	net/sunrpc: fix potential memory leaks in rpc_sysfs_xprt_state_change() The issue happens on some error handling paths. When the function fails to grab the object `xprt`, it simply returns 0, forgetting to decrease the reference count of another object `xps`, which is increased by rpc_sysfs_xprt_kobj_get_xprt_switch(), causing refcount leaks. Also, the function forgets to check whether `xps` is valid before using it, which may result in NULL-dereferencing issues. Fix it by adding proper error handling code when either `xprt` or `xps` is NULL. Fixes: `5b7eb78486` ("SUNRPC: take a xprt offline using sysfs") Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-12 11:21:28 +01:00
Mikulas Patocka	9f414eb409	rds: add missing barrier to release_refill The functions clear_bit and set_bit do not imply a memory barrier, thus it may be possible that the waitqueue_active function (which does not take any locks) is moved before clear_bit and it could miss a wakeup event. Fix this bug by adding a memory barrier after clear_bit. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-12 10:46:01 +01:00
Linus Torvalds	7ebfc85e2c	Including fixes from bluetooth, bpf, can and netfilter. A little longer PR than usual but it's all fixes, no late features. It's long partially because of timing, and partially because of follow ups to stuff that got merged a week or so before the merge window and wasn't as widely tested. Maybe the Bluetooth fixes are a little alarming so we'll address that, but the rest seems okay and not scary. Notably we're including a fix for the netfilter Kconfig [1], your WiFi warning [2] and a bluetooth fix which should unblock syzbot [3]. Current release - regressions: - Bluetooth: - don't try to cancel uninitialized works [3] - L2CAP: fix use-after-free caused by l2cap_chan_put - tls: rx: fix device offload after recent rework - devlink: fix UAF on failed reload and leftover locks in mlxsw Current release - new code bugs: - netfilter: - flowtable: fix incorrect Kconfig dependencies [1] - nf_tables: fix crash when nf_trace is enabled - bpf: - use proper target btf when exporting attach_btf_obj_id - arm64: fixes for bpf trampoline support - Bluetooth: - ISO: unlock on error path in iso_sock_setsockopt() - ISO: fix info leak in iso_sock_getsockopt() - ISO: fix iso_sock_getsockopt for BT_DEFER_SETUP - ISO: fix memory corruption on iso_pinfo.base - ISO: fix not using the correct QoS - hci_conn: fix updating ISO QoS PHY - phy: dp83867: fix get nvmem cell fail Previous releases - regressions: - wifi: cfg80211: fix validating BSS pointers in __cfg80211_connect_result [2] - atm: bring back zatm uAPI after ATM had been removed - properly fix old bug making bonding ARP monitor mode not being able to work with software devices with lockless Tx - tap: fix null-deref on skb->dev in dev_parse_header_protocol - revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" it helps some devices and breaks others - netfilter: - nf_tables: many fixes rejecting cross-object linking which may lead to UAFs - nf_tables: fix null deref due to zeroed list head - nf_tables: validate variable length element extension - bgmac: fix a BUG triggered by wrong bytes_compl - bcmgenet: indicate MAC is in charge of PHY PM Previous releases - always broken: - bpf: - fix bad pointer deref in bpf_sys_bpf() injected via test infra - disallow non-builtin bpf programs calling the prog_run command - don't reinit map value in prealloc_lru_pop - fix UAFs during the read of map iterator fd - fix invalidity check for values in sk local storage map - reject sleepable program for non-resched map iterator - mptcp: - move subflow cleanup in mptcp_destroy_common() - do not queue data on closed subflows - virtio_net: fix memory leak inside XDP_TX with mergeable - vsock: fix memory leak when multiple threads try to connect() - rework sk_user_data sharing to prevent psock leaks - geneve: fix TOS inheriting for ipv4 - tunnels & drivers: do not use RT_TOS for IPv6 flowlabel - phy: c45 baset1: do not skip aneg configuration if clock role is not specified - rose: avoid overflow when /proc displays timer information - x25: fix call timeouts in blocking connects - can: mcp251x: fix race condition on receive interrupt - can: j1939: - replace user-reachable WARN_ON_ONCE() with netdev_warn_once() - fix memory leak of skbs in j1939_session_destroy() Misc: - docs: bpf: clarify that many things are not uAPI - seg6: initialize induction variable to first valid array index (to silence clang vs objtool warning) - can: ems_usb: fix clang 14's -Wunaligned-access warning Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmL1TtkACgkQMUZtbf5S Iruz8Q/+O5xFFsjxuyZD0Mw9d3Jeo3ZI9PeeDvcYl5dZXVegpxqorujTFntxv1Ad JC8o5qqms3kO51d+W/yai6iDacEHX2YcJrupZve+vGvpOEVmBRY5O0E1AckJ18+u ItmjSVESkybUP5P08/An7Y0dMmj9Xb2z84dGkLe+n8lg6/fimo6Ki6yZjcOBOALu AYquMXUcnwztRMbTFjscbJjBd4xFMKZEtthljYtPdIReIN976wmMNYYx+jcPK7ha g39Kv6maklp4euerkGIJ/AMnOWHaOGCFjIaz7rr4444NDfrKdt/jeirUXJaz77Jo TJM2UOwgOeg6WZkSa3cmdq6UdjdkJ6LTe2CJFf1wJ1qfhAi+s8yWoszsM2Enp+66 c/mo9jTCMAjmgEJF11idZuz2S697/5j0hvbfM3ZPgNyNBgn8qxz/Z56fNOisx95u TkoKKFnGH+mcm/et+omBcyLBtBVK2+/6B6mpl6btf4DOkPn5KFYWHV67uV3ksHzQ ye+pnzidoIG0yKbRM2EQKXk7ELKROpl52xUHko93ZinMJt0Q7jBm7tZhJozNFEzi hWgUvpmNXgawzLYQcJ9jJmKw3PmYZnRhvYZB/1r91YamM28Hd58k9WfpWtUtjYJN N0X58L6JSnKPqzR70pcFppz6iBlh0tHdcEQGWhhKU5ScS3FDxGc= =C5Ck -----END PGP SIGNATURE----- Merge tag 'net-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, bpf, can and netfilter. A little larger than usual but it's all fixes, no late features. It's large partially because of timing, and partially because of follow ups to stuff that got merged a week or so before the merge window and wasn't as widely tested. Maybe the Bluetooth fixes are a little alarming so we'll address that, but the rest seems okay and not scary. Notably we're including a fix for the netfilter Kconfig [1], your WiFi warning [2] and a bluetooth fix which should unblock syzbot [3]. Current release - regressions: - Bluetooth: - don't try to cancel uninitialized works [3] - L2CAP: fix use-after-free caused by l2cap_chan_put - tls: rx: fix device offload after recent rework - devlink: fix UAF on failed reload and leftover locks in mlxsw Current release - new code bugs: - netfilter: - flowtable: fix incorrect Kconfig dependencies [1] - nf_tables: fix crash when nf_trace is enabled - bpf: - use proper target btf when exporting attach_btf_obj_id - arm64: fixes for bpf trampoline support - Bluetooth: - ISO: unlock on error path in iso_sock_setsockopt() - ISO: fix info leak in iso_sock_getsockopt() - ISO: fix iso_sock_getsockopt for BT_DEFER_SETUP - ISO: fix memory corruption on iso_pinfo.base - ISO: fix not using the correct QoS - hci_conn: fix updating ISO QoS PHY - phy: dp83867: fix get nvmem cell fail Previous releases - regressions: - wifi: cfg80211: fix validating BSS pointers in __cfg80211_connect_result [2] - atm: bring back zatm uAPI after ATM had been removed - properly fix old bug making bonding ARP monitor mode not being able to work with software devices with lockless Tx - tap: fix null-deref on skb->dev in dev_parse_header_protocol - revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" it helps some devices and breaks others - netfilter: - nf_tables: many fixes rejecting cross-object linking which may lead to UAFs - nf_tables: fix null deref due to zeroed list head - nf_tables: validate variable length element extension - bgmac: fix a BUG triggered by wrong bytes_compl - bcmgenet: indicate MAC is in charge of PHY PM Previous releases - always broken: - bpf: - fix bad pointer deref in bpf_sys_bpf() injected via test infra - disallow non-builtin bpf programs calling the prog_run command - don't reinit map value in prealloc_lru_pop - fix UAFs during the read of map iterator fd - fix invalidity check for values in sk local storage map - reject sleepable program for non-resched map iterator - mptcp: - move subflow cleanup in mptcp_destroy_common() - do not queue data on closed subflows - virtio_net: fix memory leak inside XDP_TX with mergeable - vsock: fix memory leak when multiple threads try to connect() - rework sk_user_data sharing to prevent psock leaks - geneve: fix TOS inheriting for ipv4 - tunnels & drivers: do not use RT_TOS for IPv6 flowlabel - phy: c45 baset1: do not skip aneg configuration if clock role is not specified - rose: avoid overflow when /proc displays timer information - x25: fix call timeouts in blocking connects - can: mcp251x: fix race condition on receive interrupt - can: j1939: - replace user-reachable WARN_ON_ONCE() with netdev_warn_once() - fix memory leak of skbs in j1939_session_destroy() Misc: - docs: bpf: clarify that many things are not uAPI - seg6: initialize induction variable to first valid array index (to silence clang vs objtool warning) - can: ems_usb: fix clang 14's -Wunaligned-access warning" * tag 'net-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (117 commits) net: atm: bring back zatm uAPI dpaa2-eth: trace the allocated address instead of page struct net: add missing kdoc for struct genl_multicast_group::flags nfp: fix use-after-free in area_cache_get() MAINTAINERS: use my korg address for mt7601u mlxsw: minimal: Fix deadlock in ports creation bonding: fix reference count leak in balance-alb mode net: usb: qmi_wwan: Add support for Cinterion MV32 bpf: Shut up kern_sys_bpf warning. net/tls: Use RCU API to access tls_ctx->netdev tls: rx: device: don't try to copy too much on detach tls: rx: device: bound the frag walk net_sched: cls_route: remove from list when handle is 0 selftests: forwarding: Fix failing tests with old libnet net: refactor bpf_sk_reuseport_detach() net: fix refcount bug in sk_psock_get (2) selftests/bpf: Ensure sleepable program is rejected by hash map iter selftests/bpf: Add write tests for sk local storage map iterator selftests/bpf: Add tests for reading a dangling map iter fd bpf: Only allow sleepable program for resched-able iterator ...	2022-08-11 13:45:37 -07:00
Linus Torvalds	786da5da56	We have a good pile of various fixes and cleanups from Xiubo, Jeff, Luis and others, almost exclusively in the filesystem. Several patches touch files outside of our normal purview to set the stage for bringing in Jeff's long awaited ceph+fscrypt series in the near future. All of them have appropriate acks and sat in linux-next for a while. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmL1HF8THGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHziwOuB/97JKHFuOlP1HrD6fYe5a0ul9zC9VG4 57XPDNqG2PSmfXCjvZhyVU4n53sUlJTqzKDSTXydoPCMQjtyHvysA6gEvcgUJFPd PHaZDCd9TmqX8my67NiTK70RVpNR9BujJMVMbOfM+aaisl0K6WQbitO+BfhEiJcK QStdKm5lPyf02ESH9jF+Ga0DpokARaLbtDFH7975owxske6gWuoPBCJNrkMooKiX LjgEmNgH1F/sJSZXftmKdlw9DtGBFaLQBdfbfSB5oVPRb7chI7xBeraNr6Od3rls o4davbFkcsOr+s6LJPDH2BJobmOg+HoMoma7ezspF7ZqBF4Uipv5j3VC =1427 -----END PGP SIGNATURE----- Merge tag 'ceph-for-5.20-rc1' of https://github.com/ceph/ceph-client Pull ceph updates from Ilya Dryomov: "We have a good pile of various fixes and cleanups from Xiubo, Jeff, Luis and others, almost exclusively in the filesystem. Several patches touch files outside of our normal purview to set the stage for bringing in Jeff's long awaited ceph+fscrypt series in the near future. All of them have appropriate acks and sat in linux-next for a while" * tag 'ceph-for-5.20-rc1' of https://github.com/ceph/ceph-client: (27 commits) libceph: clean up ceph_osdc_start_request prototype libceph: fix ceph_pagelist_reserve() comment typo ceph: remove useless check for the folio ceph: don't truncate file in atomic_open ceph: make f_bsize always equal to f_frsize ceph: flush the dirty caps immediatelly when quota is approaching libceph: print fsid and epoch with osd id libceph: check pointer before assigned to "c->rules[]" ceph: don't get the inline data for new creating files ceph: update the auth cap when the async create req is forwarded ceph: make change_auth_cap_ses a global symbol ceph: fix incorrect old_size length in ceph_mds_request_args ceph: switch back to testing for NULL folio->private in ceph_dirty_folio ceph: call netfs_subreq_terminated with was_async == false ceph: convert to generic_file_llseek ceph: fix the incorrect comment for the ceph_mds_caps struct ceph: don't leak snap_rwsem in handle_cap_grant ceph: prevent a client from exceeding the MDS maximum xattr size ceph: choose auth MDS for getxattr with the Xs caps ceph: add session already open notify support ...	2022-08-11 12:41:07 -07:00
Pablo Neira Ayuso	271c5ca826	netfilter: nf_tables: really skip inactive sets when allocating name While looping to build the bitmap of used anonymous set names, check the current set in the iteration, instead of the one that is being created. Fixes: `37a9cc5255` ("netfilter: nf_tables: add generation mask to sets") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-11 18:53:48 +02:00
Florian Westphal	0b2f3212b5	netfilter: nfnetlink: re-enable conntrack expectation events To avoid allocation of the conntrack extension area when possible, the default behaviour was changed to only allocate the event extension if a userspace program is subscribed to a notification group. Problem is that while 'conntrack -E' does enable the event allocation behind the scenes, 'conntrack -E expect' does not: no expectation events are delivered unless user sets "net.netfilter.nf_conntrack_events" back to 1 (always on). Fix the autodetection to also consider EXP type group. We need to track the 6 event groups (3+3, new/update/destroy for events and for expectations each) independently, else we'd disable events again if an expectation group becomes empty while there is still an active event group. Fixes: `2794cdb0b9` ("netfilter: nfnetlink: allow to detect if ctnetlink listeners exist") Reported-by: Yi Chen <yiche@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2022-08-11 18:09:54 +02:00
Florian Westphal	2024439bd5	netfilter: nf_tables: fix scheduling-while-atomic splat nf_tables_check_loops() can be called from rhashtable list walk so cond_resched() cannot be used here. Fixes: `81ea010667` ("netfilter: nf_tables: add rescheduling points during loop detection walks") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-11 17:57:28 +02:00
Florian Westphal	976bf59c69	netfilter: nf_ct_irc: cap packet search space to 4k This uses a pseudo-linearization scheme with a 64k global buffer, but BIG TCP arrival means IPv6 TCP stack can generate skbs that exceed this size. In practice, IRC commands are not expected to exceed 512 bytes, plus this is interactive protocol, so we should not see large packets in practice. Given most IRC connections nowadays use TLS so this helper could also be removed in the near future. Fixes: `7c4e983c4f` ("net: allow gso_max_size to exceed 65536") Fixes: `0fe79f28bf` ("net: allow gro_max_size to exceed 65536") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-11 16:51:13 +02:00
Florian Westphal	c783a29c7e	netfilter: nf_ct_ftp: prefer skb_linearize This uses a pseudo-linearization scheme with a 64k global buffer, but BIG TCP arrival means IPv6 TCP stack can generate skbs that exceed this size. Use skb_linearize. It should be possible to rewrite this to properly deal with segmented skbs (i.e., only do small chunk-wise accesses), but this is going to be a lot more intrusive than this because every helper function needs to get the sk_buff instead of a pointer to a raw data buffer. In practice, provided we're really looking at FTP control channel packets, there should never be a case where we deal with huge packets. Fixes: `7c4e983c4f` ("net: allow gso_max_size to exceed 65536") Fixes: `0fe79f28bf` ("net: allow gro_max_size to exceed 65536") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-11 16:51:01 +02:00
Florian Westphal	f3e124c36f	netfilter: nf_ct_h323: cap packet size at 64k With BIG TCP, packets generated by tcp stack may exceed 64kb. Cap datalen at 64kb. The internal message format uses 16bit fields, so no embedded message can exceed 64k size. Multiple h323 messages in a single superpacket may now result in a message to get treated as incomplete/truncated, but thats better than scribbling past h323_buffer. Another alternative suitable for net tree would be a switch to skb_linearize(). Fixes: `7c4e983c4f` ("net: allow gso_max_size to exceed 65536") Fixes: `0fe79f28bf` ("net: allow gro_max_size to exceed 65536") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-11 16:50:49 +02:00
Florian Westphal	a664375da7	netfilter: nf_ct_sane: remove pseudo skb linearization For historical reason this code performs pseudo linearization of skbs via skb_header_pointer and a global 64k buffer. With arrival of BIG TCP, packets generated by TCP stack can exceed 64kb. Rewrite this to only extract the needed header data. This also allows to get rid of the locking. Fixes: `7c4e983c4f` ("net: allow gso_max_size to exceed 65536") Fixes: `0fe79f28bf` ("net: allow gro_max_size to exceed 65536") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-11 16:50:25 +02:00
Maxim Mikityanskiy	94ce3b64c6	net/tls: Use RCU API to access tls_ctx->netdev Currently, tls_device_down synchronizes with tls_device_resync_rx using RCU, however, the pointer to netdev is stored using WRITE_ONCE and loaded using READ_ONCE. Although such approach is technically correct (rcu_dereference is essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to store NULL), using special RCU helpers for pointers is more valid, as it includes additional checks and might change the implementation transparently to the callers. Mark the netdev pointer as __rcu and use the correct RCU helpers to access it. For non-concurrent access pass the right conditions that guarantee safe access (locks taken, refcount value). Also use the correct helper in mlx5e, where even READ_ONCE was missing. The transition to RCU exposes existing issues, fixed by this commit: 1. bond_tls_device_xmit could read netdev twice, and it could become NULL the second time, after the NULL check passed. 2. Drivers shouldn't stop processing the last packet if tls_device_down just set netdev to NULL, before tls_dev_del was called. This prevents a possible packet drop when transitioning to the fallback software mode. Fixes: `89df6a8104` ("net/bonding: Implement TLS TX device offload") Fixes: `c55dcdd435` ("net/tls: Fix use-after-free after the TLS device goes down and up") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Link: https://lore.kernel.org/r/20220810081602.1435800-1-maximmi@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-10 22:58:43 -07:00
Jakub Kicinski	d800a7b357	tls: rx: device: don't try to copy too much on detach Another device offload bug, we use the length of the output skb as an indication of how much data to copy. But that skb is sized to offset + record length, and we start from offset. So we end up double-counting the offset which leads to skb_copy_bits() returning -EFAULT. Reported-by: Tariq Toukan <tariqt@nvidia.com> Fixes: `84c61fe1a7` ("tls: rx: do not use the standard strparser") Tested-by: Ran Rozenstein <ranro@nvidia.com> Link: https://lore.kernel.org/r/20220809175544.354343-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-10 22:53:25 -07:00
Jakub Kicinski	86b259f6f8	tls: rx: device: bound the frag walk We can't do skb_walk_frags() on the input skbs, because the input skbs is really just a pointer to the tcp read queue. We need to bound the "is decrypted" check by the amount of data in the message. Note that the walk in tls_device_reencrypt() is after a CoW so the skb there is safe to walk. Actually in the current implementation it can't have frags at all, but whatever, maybe one day it will. Reported-by: Tariq Toukan <tariqt@nvidia.com> Fixes: `84c61fe1a7` ("tls: rx: do not use the standard strparser") Tested-by: Ran Rozenstein <ranro@nvidia.com> Link: https://lore.kernel.org/r/20220809175544.354343-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-10 22:53:25 -07:00
Thadeu Lima de Souza Cascardo	9ad36309e2	net_sched: cls_route: remove from list when handle is 0 When a route filter is replaced and the old filter has a 0 handle, the old one won't be removed from the hashtable, while it will still be freed. The test was there since before commit `1109c00547` ("net: sched: RCU cls_route"), when a new filter was not allocated when there was an old one. The old filter was reused and the reinserting would only be necessary if an old filter was replaced. That was still wrong for the same case where the old handle was 0. Remove the old filter from the list independently from its handle value. This fixes CVE-2022-2588, also reported as ZDI-CAN-17440. Reported-by: Zhenpeng Lin <zplin@u.northwestern.edu> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Reviewed-by: Kamal Mostafa <kamal@canonical.com> Cc: <stable@vger.kernel.org> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20220809170518.164662-1-cascardo@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-10 22:53:11 -07:00
Jakub Kicinski	fbe8870f72	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== bpf 2022-08-10 We've added 23 non-merge commits during the last 7 day(s) which contain a total of 19 files changed, 424 insertions(+), 35 deletions(-). The main changes are: 1) Several fixes for BPF map iterator such as UAFs along with selftests, from Hou Tao. 2) Fix BPF syscall program's {copy,strncpy}_from_bpfptr() to not fault, from Jinghao Jia. 3) Reject BPF syscall programs calling BPF_PROG_RUN, from Alexei Starovoitov and YiFei Zhu. 4) Fix attach_btf_obj_id info to pick proper target BTF, from Stanislav Fomichev. 5) BPF design Q/A doc update to clarify what is not stable ABI, from Paul E. McKenney. 6) Fix BPF map's prealloc_lru_pop to not reinitialize, from Kumar Kartikeya Dwivedi. 7) Fix bpf_trampoline_put to avoid leaking ftrace hash, from Jiri Olsa. 8) Fix arm64 JIT to address sparse errors around BPF trampoline, from Xu Kuohai. 9) Fix arm64 JIT to use kvcalloc instead of kcalloc for internal program address offset buffer, from Aijun Sun. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (23 commits) selftests/bpf: Ensure sleepable program is rejected by hash map iter selftests/bpf: Add write tests for sk local storage map iterator selftests/bpf: Add tests for reading a dangling map iter fd bpf: Only allow sleepable program for resched-able iterator bpf: Check the validity of max_rdwr_access for sock local storage map iterator bpf: Acquire map uref in .init_seq_private for sock{map,hash} iterator bpf: Acquire map uref in .init_seq_private for sock local storage map iterator bpf: Acquire map uref in .init_seq_private for hash map iterator bpf: Acquire map uref in .init_seq_private for array map iterator bpf: Disallow bpf programs call prog_run command. bpf, arm64: Fix bpf trampoline instruction endianness selftests/bpf: Add test for prealloc_lru_pop bug bpf: Don't reinit map value in prealloc_lru_pop bpf: Allow calling bpf_prog_test kfuncs in tracing programs bpf, arm64: Allocate program buffer using kvcalloc instead of kcalloc selftests/bpf: Excercise bpf_obj_get_info_by_fd for bpf2bpf bpf: Use proper target btf when exporting attach_btf_obj_id mptcp, btf: Add struct mptcp_sock definition when CONFIG_MPTCP is disabled bpf: Cleanup ftrace hash in bpf_trampoline_put BPF: Fix potential bad pointer dereference in bpf_sys_bpf() ... ==================== Link: https://lore.kernel.org/r/20220810190624.10748-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-10 21:48:15 -07:00
Hawkins Jiawei	2a0133723f	net: fix refcount bug in sk_psock_get (2) Syzkaller reports refcount bug as follows: ------------[ cut here ]------------ refcount_t: saturated; leaking memory. WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19 Modules linked in: CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda7d90 #0 <TASK> __refcount_add_not_zero include/linux/refcount.h:163 [inline] __refcount_inc_not_zero include/linux/refcount.h:227 [inline] refcount_inc_not_zero include/linux/refcount.h:245 [inline] sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439 tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091 tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983 tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057 tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659 tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682 sk_backlog_rcv include/net/sock.h:1061 [inline] __release_sock+0x134/0x3b0 net/core/sock.c:2849 release_sock+0x54/0x1b0 net/core/sock.c:3404 inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909 __sys_shutdown_sock net/socket.c:2331 [inline] __sys_shutdown_sock net/socket.c:2325 [inline] __sys_shutdown+0xf1/0x1b0 net/socket.c:2343 __do_sys_shutdown net/socket.c:2351 [inline] __se_sys_shutdown net/socket.c:2349 [inline] __x64_sys_shutdown+0x50/0x70 net/socket.c:2349 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 </TASK> During SMC fallback process in connect syscall, kernel will replaces TCP with SMC. In order to forward wakeup smc socket waitqueue after fallback, kernel will sets clcsk->sk_user_data to origin smc socket in smc_fback_replace_callbacks(). Later, in shutdown syscall, kernel will calls sk_psock_get(), which treats the clcsk->sk_user_data as psock type, triggering the refcnt warning. So, the root cause is that smc and psock, both will use sk_user_data field. So they will mismatch this field easily. This patch solves it by using another bit(defined as SK_USER_DATA_PSOCK) in PTRMASK, to mark whether sk_user_data points to a psock object or not. This patch depends on a PTRMASK introduced in commit `f1ff5ce2cd` ("net, sk_msg: Clear sk_user_data pointer on clone if tagged"). For there will possibly be more flags in the sk_user_data field, this patch also refactor sk_user_data flags code to be more generic to improve its maintainability. Reported-and-tested-by: syzbot+5f26f85569bd179c18ce@syzkaller.appspotmail.com Suggested-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-10 21:47:58 -07:00
Linus Torvalds	aeb6e6ac18	NFS client updates for Linux 5.20 Highlights include: Stable fixes: - pNFS/flexfiles: Fix infinite looping when the RDMA connection errors out Bugfixes: - NFS: fix port value parsing - SUNRPC: Reinitialise the backchannel request buffers before reuse - SUNRPC: fix expiry of auth creds - NFSv4: Fix races in the legacy idmapper upcall - NFS: O_DIRECT fixes from Jeff Layton - NFSv4.1: Fix OP_SEQUENCE error handling - SUNRPC: Fix an RPC/RDMA performance regression - NFS: Fix case insensitive renames - NFSv4/pnfs: Fix a use-after-free bug in open - NFSv4.1: RECLAIM_COMPLETE must handle EACCES Features: - NFSv4.1: session trunking enhancements - NFSv4.2: READ_PLUS performance optimisations - NFS: relax the rules for rsize/wsize mount options - NFS: don't unhash dentry during unlink/rename - SUNRPC: Fail faster on bad verifier - NFS/SUNRPC: Various tracing improvements -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmLzy24ACgkQZwvnipYK APJJvhAAnVmv3jeOLjwDm+3X9xBevTq8PXAikWVeJgbSKCdjfqXO1J+XF49MpxXl N8PQZMqnwWkF3WvhvYobblwGSl6gJ3RnhVuTgdv4jiPl0ZWyS3XngJY0dCTnNgdL 5O9AjoOMtnGVZwN5j8agymA8f2TUcel5mED6sAk10t2zDZY7VxuQqjp0m696jYjF 0PKBmxoC4+6tXtYJS7d2PGRCTjfEUx2BLnnGuLOKsB7X8f63XmxJiu8/AIiY7spr M/M+BjAF3ok86dT1LjGlkvSNp23H70Wmsv98udfvxIWBm1l4972oCR/CS+kh3/mU dYM+oQ3JxxgKEN7Fdak+zU/+qma9q5z2rPFpSIT1fMEuaKN/7H2cbiHPi5RnEBLa AHWilX/lWBIMnZJZd9g3yYcGe6E/pkT6TqW5JY+2510koyfNER4IismAWMx2iYKU D0WSZOkmEBS/OYZxpTnqGwvS4L1szo9DN3c+yG2KXLifnmVPpjXZO25wahqSuUo3 V6eYUCXRJmVg+IuXGsMNdrjxGYxD12xChoYzx5RlXls2lwHGeZr+iG3aL3+XayHa I1Kji3500UmfEOUUUr4UiQ428dOdL3QqNzVzdymN8Vh4d7v64LUL0GSseY+10Xrs xcbR6l/hwjBIo+I1Bi2mmv3W10tKErFy9eBIKzql3D6VHg7ESOo= =U00h -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - pNFS/flexfiles: Fix infinite looping when the RDMA connection errors out Bugfixes: - NFS: fix port value parsing - SUNRPC: Reinitialise the backchannel request buffers before reuse - SUNRPC: fix expiry of auth creds - NFSv4: Fix races in the legacy idmapper upcall - NFS: O_DIRECT fixes from Jeff Layton - NFSv4.1: Fix OP_SEQUENCE error handling - SUNRPC: Fix an RPC/RDMA performance regression - NFS: Fix case insensitive renames - NFSv4/pnfs: Fix a use-after-free bug in open - NFSv4.1: RECLAIM_COMPLETE must handle EACCES Features: - NFSv4.1: session trunking enhancements - NFSv4.2: READ_PLUS performance optimisations - NFS: relax the rules for rsize/wsize mount options - NFS: don't unhash dentry during unlink/rename - SUNRPC: Fail faster on bad verifier - NFS/SUNRPC: Various tracing improvements" * tag 'nfs-for-5.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (46 commits) NFS: Improve readpage/writepage tracing NFS: Improve O_DIRECT tracing NFS: Improve write error tracing NFS: don't unhash dentry during unlink/rename NFSv4/pnfs: Fix a use-after-free bug in open NFS: nfs_async_write_reschedule_io must not recurse into the writeback code SUNRPC: Don't reuse bvec on retransmission of the request SUNRPC: Reinitialise the backchannel request buffers before reuse NFSv4.1: RECLAIM_COMPLETE must handle EACCES NFSv4.1 probe offline transports for trunking on session creation SUNRPC create a function that probes only offline transports SUNRPC export xprt_iter_rewind function SUNRPC restructure rpc_clnt_setup_test_and_add_xprt NFSv4.1 remove xprt from xprt_switch if session trunking test fails SUNRPC create an rpc function that allows xprt removal from rpc_clnt SUNRPC enable back offline transports in trunking discovery SUNRPC create an iterator to list only OFFLINE xprts NFSv4.1 offline trunkable transports on DESTROY_SESSION SUNRPC add function to offline remove trunkable transports SUNRPC expose functions for offline remote xprt functionality ...	2022-08-10 14:04:32 -07:00
Hou Tao	52bd05eb7c	bpf: Check the validity of max_rdwr_access for sock local storage map iterator The value of sock local storage map is writable in map iterator, so check max_rdwr_access instead of max_rdonly_access. Fixes: `5ce6e77c7e` ("bpf: Implement bpf iterator for sock local storage map") Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220810080538.1845898-6-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-08-10 10:12:48 -07:00
Hou Tao	f0d2b2716d	bpf: Acquire map uref in .init_seq_private for sock{map,hash} iterator sock_map_iter_attach_target() acquires a map uref, and the uref may be released before or in the middle of iterating map elements. For example, the uref could be released in sock_map_iter_detach_target() as part of bpf_link_release(), or could be released in bpf_map_put_with_uref() as part of bpf_map_release(). Fixing it by acquiring an extra map uref in .init_seq_private and releasing it in .fini_seq_private. Fixes: `0365351524` ("net: Allow iterating sockmap and sockhash") Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220810080538.1845898-5-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-08-10 10:12:48 -07:00
Hou Tao	3c5f6e698b	bpf: Acquire map uref in .init_seq_private for sock local storage map iterator bpf_iter_attach_map() acquires a map uref, and the uref may be released before or in the middle of iterating map elements. For example, the uref could be released in bpf_iter_detach_map() as part of bpf_link_release(), or could be released in bpf_map_put_with_uref() as part of bpf_map_release(). So acquiring an extra map uref in bpf_iter_init_sk_storage_map() and releasing it in bpf_iter_fini_sk_storage_map(). Fixes: `5ce6e77c7e` ("bpf: Implement bpf iterator for sock local storage map") Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220810080538.1845898-4-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-08-10 10:12:48 -07:00
Pablo Neira Ayuso	c485c35ff6	netfilter: nf_tables: possible module reference underflow in error path dst->ops is set on when nft_expr_clone() fails, but module refcount has not been bumped yet, therefore nft_expr_destroy() leads to module reference underflow. Fixes: `8cfd9b0f85` ("netfilter: nftables: generalize set expressions support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-10 17:06:05 +02:00
Pablo Neira Ayuso	4963674c2e	netfilter: nf_tables: disallow NFTA_SET_ELEM_KEY_END with NFT_SET_ELEM_INTERVAL_END flag These are mutually exclusive, actually NFTA_SET_ELEM_KEY_END replaces the flag notation. Fixes: `7b225d0b5c` ("netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-10 17:06:05 +02:00
Pablo Neira Ayuso	3400278328	netfilter: nf_tables: use READ_ONCE and WRITE_ONCE for shared generation id access The generation ID is bumped from the commit path while holding the mutex, however, netlink dump operations rely on RCU. This patch also adds missing cb->base_eq initialization in nf_tables_dump_set(). Fixes: `38e029f14a` ("netfilter: nf_tables: set NLM_F_DUMP_INTR if netlink dumping is stale") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-10 17:06:05 +02:00
Ido Schimmel	6b4db2e528	devlink: Fix use-after-free after a failed reload After a failed devlink reload, devlink parameters are still registered, which means user space can set and get their values. In the case of the mlxsw "acl_region_rehash_interval" parameter, these operations will trigger a use-after-free [1]. Fix this by rejecting set and get operations while in the failed state. Return the "-EOPNOTSUPP" error code which does not abort the parameters dump, but instead causes it to skip over the problematic parameter. Another possible fix is to perform these checks in the mlxsw parameter callbacks, but other drivers might be affected by the same problem and I am not aware of scenarios where these stricter checks will cause a regression. [1] mlxsw_spectrum3 0000:00:10.0: Port 125: Failed to register netdev mlxsw_spectrum3 0000:00:10.0: Failed to create ports ================================================================== BUG: KASAN: use-after-free in mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xbd/0xd0 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:904 Read of size 4 at addr ffff8880099dcfd8 by task kworker/u4:4/777 CPU: 1 PID: 777 Comm: kworker/u4:4 Not tainted 5.19.0-rc7-custom-126601-gfe26f28c586d #1 Hardware name: QEMU MSN4700, BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: netns cleanup_net Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x92/0xbd lib/dump_stack.c:106 print_address_description mm/kasan/report.c:313 [inline] print_report.cold+0x5e/0x5cf mm/kasan/report.c:429 kasan_report+0xb9/0xf0 mm/kasan/report.c:491 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report_generic.c:306 mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xbd/0xd0 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:904 mlxsw_sp_acl_region_rehash_intrvl_get+0x49/0x60 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c:1106 mlxsw_sp_params_acl_region_rehash_intrvl_get+0x33/0x80 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3854 devlink_param_get net/core/devlink.c:4981 [inline] devlink_nl_param_fill+0x238/0x12d0 net/core/devlink.c:5089 devlink_param_notify+0xe5/0x230 net/core/devlink.c:5168 devlink_ns_change_notify net/core/devlink.c:4417 [inline] devlink_ns_change_notify net/core/devlink.c:4396 [inline] devlink_reload+0x15f/0x700 net/core/devlink.c:4507 devlink_pernet_pre_exit+0x112/0x1d0 net/core/devlink.c:12272 ops_pre_exit_list net/core/net_namespace.c:152 [inline] cleanup_net+0x494/0xc00 net/core/net_namespace.c:582 process_one_work+0x9fc/0x1710 kernel/workqueue.c:2289 worker_thread+0x675/0x10b0 kernel/workqueue.c:2436 kthread+0x30c/0x3d0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 </TASK> The buggy address belongs to the physical page: page:ffffea0000267700 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x99dc flags: 0x100000000000000(node=0\|zone=1) raw: 0100000000000000 0000000000000000 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880099dce80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff8880099dcf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff8880099dcf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff8880099dd000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff8880099dd080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== Fixes: `98bbf70c1c` ("mlxsw: spectrum: add "acl_region_rehash_interval" devlink param") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-10 13:48:04 +01:00
Peilin Ye	a3e7b29e30	vsock: Set socket state back to SS_UNCONNECTED in vsock_connect_timeout() Imagine two non-blocking vsock_connect() requests on the same socket. The first request schedules @connect_work, and after it times out, vsock_connect_timeout() sets sock state back to TCP_CLOSE, but keeps socket state as SS_CONNECTING. Later, the second request returns -EALREADY, meaning the socket "already has a pending connection in progress", even though the first request has already timed out. As suggested by Stefano, fix it by setting socket state back to SS_UNCONNECTED, so that the second request will return -ETIMEDOUT. Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Fixes: `d021c34405` ("VSOCK: Introduce VM Sockets") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-10 09:50:18 +01:00
Peilin Ye	7e97cfed99	vsock: Fix memory leak in vsock_connect() An O_NONBLOCK vsock_connect() request may try to reschedule @connect_work. Imagine the following sequence of vsock_connect() requests: 1. The 1st, non-blocking request schedules @connect_work, which will expire after 200 jiffies. Socket state is now SS_CONNECTING; 2. Later, the 2nd, blocking request gets interrupted by a signal after a few jiffies while waiting for the connection to be established. Socket state is back to SS_UNCONNECTED, but @connect_work is still pending, and will expire after 100 jiffies. 3. Now, the 3rd, non-blocking request tries to schedule @connect_work again. Since @connect_work is already scheduled, schedule_delayed_work() silently returns. sock_hold() is called twice, but sock_put() will only be called once in vsock_connect_timeout(), causing a memory leak reported by syzbot: BUG: memory leak unreferenced object 0xffff88810ea56a40 (size 1232): comm "syz-executor756", pid 3604, jiffies 4294947681 (age 12.350s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 28 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00 (..@............ backtrace: [<ffffffff837c830e>] sk_prot_alloc+0x3e/0x1b0 net/core/sock.c:1930 [<ffffffff837cbe22>] sk_alloc+0x32/0x2e0 net/core/sock.c:1989 [<ffffffff842ccf68>] __vsock_create.constprop.0+0x38/0x320 net/vmw_vsock/af_vsock.c:734 [<ffffffff842ce8f1>] vsock_create+0xc1/0x2d0 net/vmw_vsock/af_vsock.c:2203 [<ffffffff837c0cbb>] __sock_create+0x1ab/0x2b0 net/socket.c:1468 [<ffffffff837c3acf>] sock_create net/socket.c:1519 [inline] [<ffffffff837c3acf>] __sys_socket+0x6f/0x140 net/socket.c:1561 [<ffffffff837c3bba>] __do_sys_socket net/socket.c:1570 [inline] [<ffffffff837c3bba>] __se_sys_socket net/socket.c:1568 [inline] [<ffffffff837c3bba>] __x64_sys_socket+0x1a/0x20 net/socket.c:1568 [<ffffffff84512815>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff84512815>] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:80 [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae <...> Use mod_delayed_work() instead: if @connect_work is already scheduled, reschedule it, and undo sock_hold() to keep the reference count balanced. Reported-and-tested-by: syzbot+b03f55bf128f9a38f064@syzkaller.appspotmail.com Fixes: `d021c34405` ("VSOCK: Introduce VM Sockets") Co-developed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-10 09:50:18 +01:00
Topi Miettinen	2cd0e8dba7	netlabel: fix typo in comment 'IPv4 and IPv4' should be 'IPv4 and IPv6'. Signed-off-by: Topi Miettinen <toiwoton@gmail.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-10 09:24:41 +01:00
David S. Miller	e7f164955f	linux-can-fixes-for-6.0-20220810 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEBsvAIBsPu6mG7thcrX5LkNig010FAmLzWagTHG1rbEBwZW5n dXRyb25peC5kZQAKCRCtfkuQ2KDTXU/wB/4lE4h9FGmCkAT14+y/83zPZ+eK3Sym EjtOLJA6RX7WwsIICU5xTGr8imQQIju0Q2fMzmaweT3+bley8uvdN6zUVhf5UyjX 7wmud4x5TfZxj22EUQM+MmWCuAiUet3zf9ad+2zdUsCNWI6VH6kwGYHZAC5JhZz/ zZqG8Z+oB/nt0ykMsmNHea6w60P3DDD5icAqY6J8nkIOozpxhm1anRGshYj88YwH CRwXZv1DgjwMgJyPMMyM8xWb1zEOsPDOu6HgzChnphLZTe+XL/prU5TZazG52aNU yEq5ooV7kT0ld5enClyY5v4voTlR+TAgJshpdiVV19peZXKtslQA+dfk =sGmZ -----END PGP SIGNATURE----- Merge tag 'linux-can-fixes-for-6.0-20220810' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== this is a pull request of 4 patches for net/master, with the whitespace issue fixed. Fedor Pchelkin contributes 2 fixes for the j1939 CAN protocol. A patch by me for the ems_usb driver fixes an unaligned access warning. Sebastian Würl's patch for the mcp251x driver fixes a race condition in the receive interrupt. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-10 09:19:50 +01:00
Matthias May	ab7e2e0dfa	ipv6: do not use RT_TOS for IPv6 flowlabel According to Guillaume Nault RT_TOS should never be used for IPv6. Quote: RT_TOS() is an old macro used to interprete IPv4 TOS as described in the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4 code, although, given the current state of the code, most of the existing calls have no consequence. But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS" field to be interpreted the RFC 1349 way. There's no historical compatibility to worry about. Fixes: `571912c69f` ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.") Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Matthias May <matthias.may@westermo.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-09 22:19:21 -07:00
Jakub Kicinski	690bf64395	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Harden set element field checks to avoid out-of-bound memory access, this patch also fixes the type of issue described in `7e6bc1f6ca` ("netfilter: nf_tables: stricter validation of element data") in a broader way. 2) Patches to restrict the chain, set, and rule id lookup in the transaction to the corresponding top-level table, patches from Thadeu Lima de Souza Cascardo. 3) Fix incorrect comment in ip6t_LOG.h 4) nft_data_init() performs upfront validation of the expected data. struct nft_data_desc is used to describe the expected data to be received from userspace. The .size field represents the maximum size that can be stored, for bound checks. Then, .len is an input/output field which stores the expected length as input (this is optional, to restrict the checks), as output it stores the real length received from userspace (if it was not specified as input). This patch comes in response to `7e6bc1f6ca` ("netfilter: nf_tables: stricter validation of element data") to address this type of issue in a more generic way by avoid opencoded data validation. Next patch requires this as a dependency. 5) Disallow jump to implicit chain from set element, this configuration is invalid. Only allow jump to chain via immediate expression is supported at this stage. 6) Fix possible null-pointer derefence in the error path of table updates, if memory allocation of the transaction fails. From Florian Westphal. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: fix null deref due to zeroed list head netfilter: nf_tables: disallow jump to implicit chain from set element netfilter: nf_tables: upfront validation of data via nft_data_init() netfilter: ip6t_LOG: Fix a typo in a comment netfilter: nf_tables: do not allow RULE_ID to refer to another chain netfilter: nf_tables: do not allow CHAIN_ID to refer to another table netfilter: nf_tables: do not allow SET_ID to refer to another table netfilter: nf_tables: validate variable length element extension ==================== Link: https://lore.kernel.org/r/20220809220532.130240-1-pablo@netfilter.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-09 21:28:21 -07:00
Kumar Kartikeya Dwivedi	1f0752628e	bpf: Allow calling bpf_prog_test kfuncs in tracing programs In addition to TC hook, enable these in tracing programs so that they can be used in selftests. Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220809213033.24147-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-08-09 18:46:11 -07:00
Linus Torvalds	e394ff83bb	NFSD 6.0 Release Notes Work on "courteous server", which was introduced in 5.19, continues apace. This release introduces a more flexible limit on the number of NFSv4 clients that NFSD allows, now that NFSv4 clients can remain in courtesy state long after the lease expiration timeout. The client limit is adjusted based on the physical memory size of the server. The NFSD filecache is a cache of files held open by NFSv4 clients or recently touched by NFSv2 or NFSv3 clients. This cache had some significant scalability constraints that have been relieved in this release. Thanks to all who contributed to this work. A data corruption bug found during the most recent NFS bake-a-thon that involves NFSv3 and NFSv4 clients writing the same file has been addressed in this release. This release includes several improvements in CPU scalability for NFSv4 operations. In addition, Neil Brown provided patches that simplify locking during file lookup, creation, rename, and removal that enables subsequent work on making these operations more scalable. We expect to see that work materialize in the next release. There are also numerous single-patch fixes, clean-ups, and the usual improvements in observability. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmLujF0ACgkQM2qzM29m f5c14w/+Lsoryo5vdTXMAZBXBNvVdXQmHqLIEotJEVA3sECr+Kad2bF8rFaCWVzS Gf9KhTetmcDO9O73I5I/UtJ2qFT9B4I6baSGpOzIkjsM/aKeEEQpbpdzPYhKrCEv bQu3P54js7snH4YV8s0I39nBFOWdnahYXaw7peqE/2GOHxaR2mz88bkrQ+OybCxz KETqUxA6bKzkOT61S0nHcnQKd8HQzhocMDtrxtANHGsMM167ngI1dw4tUQAtfAUI s9R+GS6qwiKgwGz1oqhTR6LA/h4DROxPnc7AieuD9FvuAnR3kXw61bGMN5Biwv2T JZUTBbQvWhNasSV+7qOY9nBu+sHVC6Q7OZ5C9F/KjMyqCioDX0DnbxX9uKP20CDd EAAMS8n4Tdgd4KRBWdkLXPzizWYAjZQmFIJtcZne1JzGZ4IWRnikgM5qD6n1VviZ kcPRm5EN3DRHA+Hte4jG0EHIrE/7g5gnf+zr9dWl3uNhZtfTmumCfU16YYmKG8pP QN4kXBR2w7dAvp8nRaOsY6bBFLDAk/jHbpY8Q4xoUO4tsojfWayCTGVFOrecOjxv uSn0LhiidC5pLlkcPgwemhysVywDzr+gGXBRJXeUOHfdd05Q2gbFK8OpqDSvJ3dZ aC/RxFvHc8jaktUcuIjkE6Rsz6AVaAH3EZj84oMZ4hZhyGbEreg= =PEJ3 -----END PGP SIGNATURE----- Merge tag 'nfsd-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd updates from Chuck Lever: "Work on 'courteous server', which was introduced in 5.19, continues apace. This release introduces a more flexible limit on the number of NFSv4 clients that NFSD allows, now that NFSv4 clients can remain in courtesy state long after the lease expiration timeout. The client limit is adjusted based on the physical memory size of the server. The NFSD filecache is a cache of files held open by NFSv4 clients or recently touched by NFSv2 or NFSv3 clients. This cache had some significant scalability constraints that have been relieved in this release. Thanks to all who contributed to this work. A data corruption bug found during the most recent NFS bake-a-thon that involves NFSv3 and NFSv4 clients writing the same file has been addressed in this release. This release includes several improvements in CPU scalability for NFSv4 operations. In addition, Neil Brown provided patches that simplify locking during file lookup, creation, rename, and removal that enables subsequent work on making these operations more scalable. We expect to see that work materialize in the next release. There are also numerous single-patch fixes, clean-ups, and the usual improvements in observability" * tag 'nfsd-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (78 commits) lockd: detect and reject lock arguments that overflow NFSD: discard fh_locked flag and fh_lock/fh_unlock NFSD: use (un)lock_inode instead of fh_(un)lock for file operations NFSD: use explicit lock/unlock for directory ops NFSD: reduce locking in nfsd_lookup() NFSD: only call fh_unlock() once in nfsd_link() NFSD: always drop directory lock in nfsd_unlink() NFSD: change nfsd_create()/nfsd_symlink() to unlock directory before returning. NFSD: add posix ACLs to struct nfsd_attrs NFSD: add security label to struct nfsd_attrs NFSD: set attributes when creating symlinks NFSD: introduce struct nfsd_attrs NFSD: verify the opened dentry after setting a delegation NFSD: drop fh argument from alloc_init_deleg NFSD: Move copy offload callback arguments into a separate structure NFSD: Add nfsd4_send_cb_offload() NFSD: Remove kmalloc from nfsd4_do_async_copy() NFSD: Refactor nfsd4_do_copy() NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2) NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2) ...	2022-08-09 14:56:49 -07:00
Jakub Kicinski	7ba0fa7f32	wireless fixes for v6.0 First set of fixes for v6.0. Small one this time, fix a cfg80211 warning seen with brcmfmac and remove an unncessary inline keyword from wilc1000. -----BEGIN PGP SIGNATURE----- iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmLyj2ERHGt2YWxvQGtl cm5lbC5vcmcACgkQbhckVSbrbZtxaggAptzh9NVi2qCWpCdwIjp+d6CusPoEA4NN eI7PSLecWPA5MVCR5YXSOboVDEtV/wGDOk/N1fKpKVXW02+7nvuLohx5tOclFpms CZtS2thpyEvUW6Zu+bE1Opwyx1v4e3nyznrNXMHW8tcnaVI3BNwYpdp7LRCylv07 JQPNKZvxR5fs8NuIhf0O1TSjPaUSvRrMWfRn3ZioHWVa7+j8qMfnxWk+o6n38zP5 fqbYlhLEBS3Nu9jp3e26KRMRrkAs/OTb/oRc/bPbU68V0VFPquP97Fz0vOobyjzO +B5+qAcaNpP6lSlAmrVyPxFEO1Y0utXblXblrWQsAqox7rt/PXQecg== =2QwM -----END PGP SIGNATURE----- Merge tag 'wireless-2022-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Kalle Valo says: ==================== wireless fixes for v6.0 First set of fixes for v6.0. Small one this time, fix a cfg80211 warning seen with brcmfmac and remove an unncessary inline keyword from wilc1000. * tag 'wireless-2022-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: wilc1000: fix spurious inline in wilc_handle_disconnect() wifi: cfg80211: Fix validating BSS pointers in __cfg80211_connect_result ==================== Link: https://lore.kernel.org/r/20220809164756.B1DAEC433D6@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-09 11:51:00 -07:00
Florian Westphal	580077855a	netfilter: nf_tables: fix null deref due to zeroed list head In nf_tables_updtable, if nf_tables_table_enable returns an error, nft_trans_destroy is called to free the transaction object. nft_trans_destroy() calls list_del(), but the transaction was never placed on a list -- the list head is all zeroes, this results in a null dereference: BUG: KASAN: null-ptr-deref in nft_trans_destroy+0x26/0x59 Call Trace: nft_trans_destroy+0x26/0x59 nf_tables_newtable+0x4bc/0x9bc [..] Its sane to assume that nft_trans_destroy() can be called on the transaction object returned by nft_trans_alloc(), so make sure the list head is initialised. Fixes: `55dd6f9307` ("netfilter: nf_tables: use new transaction infrastructure to handle table") Reported-by: mingi cho <mgcho.minic@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-09 20:13:30 +02:00
Pablo Neira Ayuso	f323ef3a0d	netfilter: nf_tables: disallow jump to implicit chain from set element Extend struct nft_data_desc to add a flag field that specifies nft_data_init() is being called for set element data. Use it to disallow jump to implicit chain from set element, only jump to chain via immediate expression is allowed. Fixes: `d0e2c7de92` ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-09 20:13:29 +02:00
Pablo Neira Ayuso	341b694160	netfilter: nf_tables: upfront validation of data via nft_data_init() Instead of parsing the data and then validate that type and length are correct, pass a description of the expected data so it can be validated upfront before parsing it to bail out earlier. This patch adds a new .size field to specify the maximum size of the data area. The .len field is optional and it is used as an input/output field, it provides the specific length of the expected data in the input path. If then .len field is not specified, then obtained length from the netlink attribute is stored. This is required by cmp, bitwise, range and immediate, which provide no netlink attribute that describes the data length. The immediate expression uses the destination register type to infer the expected data type. Relying on opencoded validation of the expected data might lead to subtle bugs as described in `7e6bc1f6ca` ("netfilter: nf_tables: stricter validation of element data"). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-09 20:13:29 +02:00
Thadeu Lima de Souza Cascardo	36d5b29132	netfilter: nf_tables: do not allow RULE_ID to refer to another chain When doing lookups for rules on the same batch by using its ID, a rule from a different chain can be used. If a rule is added to a chain but tries to be positioned next to a rule from a different chain, it will be linked to chain2, but the use counter on chain1 would be the one to be incremented. When looking for rules by ID, use the chain that was used for the lookup by name. The chain used in the context copied to the transaction needs to match that same chain. That way, struct nft_rule does not need to get enlarged with another member. Fixes: `1a94e38d25` ("netfilter: nf_tables: add NFTA_RULE_ID attribute") Fixes: `75dd48e2e4` ("netfilter: nf_tables: Support RULE_ID reference in new rule") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-09 19:38:18 +02:00
Thadeu Lima de Souza Cascardo	95f466d223	netfilter: nf_tables: do not allow CHAIN_ID to refer to another table When doing lookups for chains on the same batch by using its ID, a chain from a different table can be used. If a rule is added to a table but refers to a chain in a different table, it will be linked to the chain in table2, but would have expressions referring to objects in table1. Then, when table1 is removed, the rule will not be removed as its linked to a chain in table2. When expressions in the rule are processed or removed, that will lead to a use-after-free. When looking for chains by ID, use the table that was used for the lookup by name, and only return chains belonging to that same table. Fixes: `837830a4b4` ("netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-09 19:38:17 +02:00
Thadeu Lima de Souza Cascardo	470ee20e06	netfilter: nf_tables: do not allow SET_ID to refer to another table When doing lookups for sets on the same batch by using its ID, a set from a different table can be used. Then, when the table is removed, a reference to the set may be kept after the set is freed, leading to a potential use-after-free. When looking for sets by ID, use the table that was used for the lookup by name, and only return sets belonging to that same table. This fixes CVE-2022-2586, also reported as ZDI-CAN-17470. Reported-by: Team Orca of Sea Security (@seasecresponse) Fixes: `958bee14d0` ("netfilter: nf_tables: use new transaction infrastructure to handle sets") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-09 19:38:17 +02:00
Pablo Neira Ayuso	34aae2c2fb	netfilter: nf_tables: validate variable length element extension Update template to validate variable length extensions. This patch adds a new .ext_len[id] field to the template to store the expected extension length. This is used to sanity check the initialization of the variable length extension. Use PTR_ERR() in nft_set_elem_init() to report errors since, after this update, there are two reason why this might fail, either because of ENOMEM or insufficient room in the extension field (EINVAL). Kernels up until `7e6bc1f6ca` ("netfilter: nf_tables: stricter validation of element data") allowed to copy more data to the extension than was allocated. This ext_len field allows to validate if the destination has the correct size as additional check. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-09 19:38:16 +02:00
Fedor Pchelkin	8c21c54a53	can: j1939: j1939_session_destroy(): fix memory leak of skbs We need to drop skb references taken in j1939_session_skb_queue() when destroying a session in j1939_session_destroy(). Otherwise those skbs would be lost. Link to Syzkaller info and repro: https://forge.ispras.ru/issues/11743. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. V1: https://lore.kernel.org/all/20220708175949.539064-1-pchelkin@ispras.ru Fixes: `9d71dd0c70` ("can: add support of SAE J1939 protocol") Suggested-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/all/20220805150216.66313-1-pchelkin@ispras.ru Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-08-09 09:05:06 +02:00
Fedor Pchelkin	8ef49f7f82	can: j1939: j1939_sk_queue_activate_next_locked(): replace WARN_ON_ONCE with netdev_warn_once() We should warn user-space that it is doing something wrong when trying to activate sessions with identical parameters but WARN_ON_ONCE macro can not be used here as it serves a different purpose. So it would be good to replace it with netdev_warn_once() message. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: `9d71dd0c70` ("can: add support of SAE J1939 protocol") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/all/20220729143655.1108297-1-pchelkin@ispras.ru [mkl: fix indention] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-08-09 09:05:06 +02:00
Jakub Kicinski	b8c3bf0ed2	bluetooth pull request for net: - Fixes various issues related to ISO channel/socket support - Fixes issues when building with C=1 - Fix cancel uninitilized work which blocks syzbot to run -----BEGIN PGP SIGNATURE----- iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmLxpdoZHGx1aXoudm9u LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKbjqEACZiIUKIACTYWa8Os0fTuzu LM/h4aOnh3L+W2KyA9Kh4Hmm7Caf9JtUZrTIMhMigGiTTN91eLPoScu6ATm7q0vY 7JgfRKbsLCjhUV8uQfypDBM0uQq7exbEiwd1KTHo8XfOgiheZL6ergN4r2g+V/gt up0a58j4ukc6PhWAxujc3UzvMj2c1Sb5jY6TIuyiQM7RONtWLH9VDzc0InRNGqpa eEpPDqCuXsgDTKDAvcJoWARwnj6nsODN3QaSWVlwgN1JgE0/OjXI9hoUNQ83ueCH pl6qigJIuCnGq4ZDbdDE+QcK5I2ouoGoJ9rQMLuUFdupmaBtTEdMK7pw7opzYt3c HqW/TvIR8t2LG0oFmrvFSKH+OMHkIH7D7zaCHGYx5T7B778x5fnUK4OfhvnJ4NPu HkKYD5BJv92X7cHacgclJwQdwwbParrr7wPbqGiSRgiw2ec2puC1VQSYj/+4nwV5 De3AJ2OORv+2kcIw+zi3T0wGzddQF07gXXpz7ckOnFQ1A5jiYX5yGrfGlJezvblX LnXikwvPkkl640ZRrSZvGQBPNySKv8N2yuE/FtbkKfNjoumAkC67PA+4NYOLc9g9 gkgPnR6y4Cm+h3yLILV3njcYbif+5Ue0KHx8L3rr523bZ9C7vUKdcE1i+Tkr9B0y I6rMyxtfkUFwVRRFKwvguw== =nFPi -----END PGP SIGNATURE----- Merge tag 'for-net-2022-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fixes various issues related to ISO channel/socket support - Fixes issues when building with C=1 - Fix cancel uninitilized work which blocks syzbot to run * tag 'for-net-2022-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: ISO: Fix not using the correct QoS Bluetooth: don't try to cancel uninitialized works at mgmt_index_removed() Bluetooth: ISO: Fix iso_sock_getsockopt for BT_DEFER_SETUP Bluetooth: MGMT: Fixes build warnings with C=1 Bluetooth: hci_event: Fix build warning with C=1 Bluetooth: ISO: Fix memory corruption Bluetooth: Fix null pointer deref on unexpected status event Bluetooth: ISO: Fix info leak in iso_sock_getsockopt() Bluetooth: hci_conn: Fix updating ISO QoS PHY Bluetooth: ISO: unlock on error path in iso_sock_setsockopt() Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm regression ==================== Link: https://lore.kernel.org/r/20220809001224.412807-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-08 20:59:07 -07:00
Martin Schiller	944e594cfa	net/x25: fix call timeouts in blocking connects When a userspace application starts a blocking connect(), a CALL REQUEST is sent, the t21 timer is started and the connect is waiting in x25_wait_for_connection_establishment(). If then for some reason the t21 timer expires before any reaction on the assigned logical channel (e.g. CALL ACCEPT, CLEAR REQUEST), there is sent a CLEAR REQUEST and timer t23 is started waiting for a CLEAR confirmation. If we now receive a CLEAR CONFIRMATION from the peer, x25_disconnect() is called in x25_state2_machine() with reason "0", which means "normal" call clearing. This is ok, but the parameter "reason" is used as sk->sk_err in x25_disconnect() and sock_error(sk) is evaluated in x25_wait_for_connection_establishment() to check if the call is still pending. As "0" is not rated as an error, the connect will stuck here forever. To fix this situation, also check if the sk->sk_state changed form TCP_SYN_SENT to TCP_CLOSE in the meantime, which is also done by x25_disconnect(). Signed-off-by: Martin Schiller <ms@dev.tdt.de> Link: https://lore.kernel.org/r/20220805061810.10824-1-ms@dev.tdt.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-08 20:48:51 -07:00
Linus Torvalds	f30adc0d33	iov_iter stuff, part 2, rebased * more new_sync_{read,write}() speedups - ITER_UBUF introduction * ITER_PIPE cleanups * unification of iov_iter_get_pages/iov_iter_get_pages_alloc and switching them to advancing semantics * making ITER_PIPE take high-order pages without splitting them * handling copy_page_from_iter() for high-order pages properly Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYvHI8QAKCRBZ7Krx/gZQ 62CQAPsGlbebqBeAT2pMulaGDxfLAsgz5Yf4BEaMLhPtRqFOQgD+KrZQId7Sd8O0 3IWucpTb2c4jvLlXhGMS+XWnusQH+AQ= =pBux -----END PGP SIGNATURE----- Merge tag 'pull-work.iov_iter-rebased' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more iov_iter updates from Al Viro: - more new_sync_{read,write}() speedups - ITER_UBUF introduction - ITER_PIPE cleanups - unification of iov_iter_get_pages/iov_iter_get_pages_alloc and switching them to advancing semantics - making ITER_PIPE take high-order pages without splitting them - handling copy_page_from_iter() for high-order pages properly * tag 'pull-work.iov_iter-rebased' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (32 commits) fix copy_page_from_iter() for compound destinations hugetlbfs: copy_page_to_iter() can deal with compound pages copy_page_to_iter(): don't split high-order page in case of ITER_PIPE expand those iov_iter_advance()... pipe_get_pages(): switch to append_pipe() get rid of non-advancing variants ceph: switch the last caller of iov_iter_get_pages_alloc() 9p: convert to advancing variant of iov_iter_get_pages_alloc() af_alg_make_sg(): switch to advancing variant of iov_iter_get_pages() iter_to_pipe(): switch to advancing variant of iov_iter_get_pages() block: convert to advancing variants of iov_iter_get_pages{,_alloc}() iov_iter: advancing variants of iov_iter_get_pages{,_alloc}() iov_iter: saner helper for page array allocation fold __pipe_get_pages() into pipe_get_pages() ITER_XARRAY: don't open-code DIV_ROUND_UP() unify the rest of iov_iter_get_pages()/iov_iter_get_pages_alloc() guts unify xarray_get_pages() and xarray_get_pages_alloc() unify pipe_get_pages() and pipe_get_pages_alloc() iov_iter_get_pages(): sanity-check arguments iov_iter_get_pages_alloc(): lift freeing pages array on failure exits into wrapper ...	2022-08-08 20:04:35 -07:00
Al Viro	7f02464739	9p: convert to advancing variant of iov_iter_get_pages_alloc() that one is somewhat clumsier than usual and needs serious testing. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-08-08 22:37:23 -04:00
Al Viro	1ef255e257	iov_iter: advancing variants of iov_iter_get_pages{,_alloc}() Most of the users immediately follow successful iov_iter_get_pages() with advancing by the amount it had returned. Provide inline wrappers doing that, convert trivial open-coded uses of those. BTW, iov_iter_get_pages() never returns more than it had been asked to; such checks in cifs ought to be removed someday... Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-08-08 22:37:22 -04:00
Luiz Augusto von Dentz	1d1ab5d39b	Bluetooth: ISO: Fix not using the correct QoS This fixes using wrong QoS settings when attempting to send frames while acting as peripheral since the QoS settings in use are stored in hconn->iso_qos not in sk->qos, this is actually properly handled on getsockopt(BT_ISO_QOS) but not on iso_send_frame. Fixes: `ccf74f2390` ("Bluetooth: Add BTPROTO_ISO socket type") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-08-08 17:06:36 -07:00
Tetsuo Handa	3f2893d3c1	Bluetooth: don't try to cancel uninitialized works at mgmt_index_removed() syzbot is reporting attempt to cancel uninitialized work at mgmt_index_removed() [1], for calling cancel_delayed_work_sync() without INIT_DELAYED_WORK() is not permitted. INIT_DELAYED_WORK() is called from mgmt_init_hdev() via chan->hdev_init() from hci_mgmt_cmd(), but cancel_delayed_work_sync() is unconditionally called from mgmt_index_removed(). Call cancel_delayed_work_sync() only if HCI_MGMT flag was set, for mgmt_init_hdev() sets HCI_MGMT flag when calling INIT_DELAYED_WORK(). Link: https://syzkaller.appspot.com/bug?extid=b8ddd338a8838e581b1c [1] Reported-by: syzbot <syzbot+b8ddd338a8838e581b1c@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: `0ef08313ce` ("Bluetooth: Convert delayed discov_off to hci_sync") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-08-08 17:06:23 -07:00
Luiz Augusto von Dentz	9dfe1727b2	Bluetooth: ISO: Fix iso_sock_getsockopt for BT_DEFER_SETUP BT_DEFER_SETUP shall be considered valid for all states except for BT_CONNECTED as it is also used when initiated a connection rather then only for BT_BOUND and BT_LISTEN. Fixes: `ccf74f2390` ("Bluetooth: Add BTPROTO_ISO socket type") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-08-08 17:06:10 -07:00
Luiz Augusto von Dentz	0c7937587d	Bluetooth: MGMT: Fixes build warnings with C=1 This fixes the following warning when building with make C=1: net/bluetooth/mgmt.c:3821:29: warning: restricted __le16 degrades to integer net/bluetooth/mgmt.c:4625:9: warning: cast to restricted __le32 Fixes: `600a87490f` ("Bluetooth: Implementation of MGMT_OP_SET_BLOCKED_KEYS.") Fixes: `4c54bf2b09` ("Bluetooth: Add get/set device flags mgmt op") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-08-08 17:05:45 -07:00
Luiz Augusto von Dentz	889f0346d4	Bluetooth: hci_event: Fix build warning with C=1 This fixes the following warning when build with make C=1: net/bluetooth/hci_event.c:337:15: warning: restricted __le16 degrades to integer Fixes: `a936612036` ("Bluetooth: Process result of HCI Delete Stored Link Key command") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-08-08 17:05:12 -07:00
Luiz Augusto von Dentz	b444342327	Bluetooth: ISO: Fix memory corruption The following memory corruption can happen since iso_pinfo.base size did not account for its headers (4 bytes): net/bluetooth/eir.c 76 memcpy(&eir[eir_len], data, data_len); ^^^^^^^ ^^^^^^^^ 77 eir_len += data_len; 78 79 return eir_len; 80 } The "eir" buffer has 252 bytes and data_len is 252 but we do a memcpy() to &eir[4] so this can corrupt 4 bytes beyond the end of the buffer. Fixes: `f764a6c2c1` ("Bluetooth: ISO: Add broadcast support") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>	2022-08-08 17:04:51 -07:00
Soenke Huster	ce78e557ff	Bluetooth: Fix null pointer deref on unexpected status event __hci_cmd_sync returns NULL if the controller responds with a status event. This is unexpected for the commands sent here, but on occurrence leads to null pointer dereferences and thus must be handled. Signed-off-by: Soenke Huster <soenke.huster@eknoes.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-08-08 17:04:37 -07:00
Luiz Augusto von Dentz	0eee4995f4	Bluetooth: ISO: Fix info leak in iso_sock_getsockopt() The C standard rules for when struct holes are zeroed out are slightly weird. The existing assignments might initialize everything, but GCC is allowed to (and does sometimes) leave the struct holes uninitialized, so instead of using yet another variable and copy the QoS settings just use a pointer to the stored QoS settings. Fixes: `ccf74f2390` ("Bluetooth: Add BTPROTO_ISO socket type") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-08-08 17:04:24 -07:00
Luiz Augusto von Dentz	10b9adb556	Bluetooth: hci_conn: Fix updating ISO QoS PHY BT_ISO_QOS has different semantics when it comes to QoS PHY as it uses 0x00 to disable a direction but that value is invalid over HCI and sockets using DEFER_SETUP to connect may attempt to use hci_bind_cis multiple times in order to detect if the parameters have changed, so to fix the code will now just mirror the PHY for the parameters of HCI_OP_LE_SET_CIG_PARAMS and will not update the PHY of the socket leaving it disabled. Fixes: `26afbd826e` ("Bluetooth: Add initial implementation of CIS connections") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-08-08 17:04:11 -07:00
Dan Carpenter	164dac9755	Bluetooth: ISO: unlock on error path in iso_sock_setsockopt() Call release_sock(sk); before returning on this error path. Fixes: `ccf74f2390` ("Bluetooth: Add BTPROTO_ISO socket type") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-08-08 17:03:57 -07:00
Luiz Augusto von Dentz	332f1795ca	Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm regression The patch `d0be8347c6`: "Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put" from Jul 21, 2022, leads to the following Smatch static checker warning: net/bluetooth/l2cap_core.c:1977 l2cap_global_chan_by_psm() error: we previously assumed 'c' could be null (see line 1996) Fixes: `d0be8347c6` ("Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-08-08 17:03:40 -07:00
Gao Feng	f574f7f839	net: bpf: Use the protocol's set_rcvlowat behavior if there is one The commit `d1361840f8` ("tcp: fix SO_RCVLOWAT and RCVBUF autotuning") add one new (struct proto_ops)->set_rcvlowat method so that a protocol can override the default setsockopt(SO_RCVLOWAT) behavior. The prior bpf codes don't check and invoke the protos's set_rcvlowat, now correct it. Signed-off-by: Gao Feng <gfree.wind@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-08 09:45:14 +01:00
Veerendranath Jakkam	baa56dfe2c	wifi: cfg80211: Fix validating BSS pointers in __cfg80211_connect_result Driver's SME is allowed to fill either BSSID or BSS pointers in struct cfg80211_connect_resp_params when indicating connect response but a check in __cfg80211_connect_result() is giving unnecessary warning when driver's SME fills only BSSID pointer and not BSS pointer in struct cfg80211_connect_resp_params. In case of mac80211 with auth/assoc path, it is always expected to fill BSS pointers in struct cfg80211_connect_resp_params when calling __cfg80211_connect_result() since cfg80211 must have hold BSS pointers in cfg80211_mlme_assoc(). So, skip the check for the drivers which support cfg80211 connect callback, for example with brcmfmac is one such driver which had the warning: WARNING: CPU: 5 PID: 514 at net/wireless/sme.c:786 __cfg80211_connect_result+0x2fc/0x5c0 [cfg80211] Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: `efbabc1165` ("cfg80211: Indicate MLO connection info in connect and roam callbacks") Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> [kvalo@kernel.org: add more info to the commit log] Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220805135259.4126630-1-quic_vjakkam@quicinc.com	2022-08-08 11:09:52 +03:00
Linus Torvalds	ea0c39260d	9p-for-5.20 - a couple of fixes - add a tracepoint for fid refcounting - some cleanup/followup on fid lookup - some cleanup around req refcounting -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmLuKqkACgkQq06b7GqY 5nA+RBAAvuA6AjKQSvsNxHsqSFMahwoE3cCPlF/QlmAnnZa1DzmRI3kKTAuuDKkE Q4c7DjxzDJCWRTlkpNUCGZycHqpu1QQbfoTo43sIqob7W8xlajeemc5Fxqtw5sPM m+0SzN7vvNJpCr+6pxMXwwGHSZmZOvpFjwj7cUjhpF/V1WO8bNxfCGzQdF0hX1Vn 2HoBbFUOmacL9Z/pF3O/ZG9LwCFuRQsH0EmbFBlJy1WdDtlVHTXlzDaGQ7EGaY4D 17UR6iITsj9ozacLhvk094PLIc3/RHDGLrm3C4Ka3zmUI7BsiYWPDmai3Pu/DNqn JJ5sZkdrVowxyBbGxw8GpZ4YDJtGsU5XglFPdkw+ZazxhZLNEIstPXg0HTZybrOe GE+WskWB0qS+RpX0tnYEcX6qOHWm3/63Yq5NG6A9tLQUSFku02jS/bCQSLBrmGWW Js24IWvFSTvl6XytHeldYhJP618pNUxXRSqgYv96vT/LI3mUrIMN+IVBNPujO6p2 jIYXNoaqLoY3efXKW/WQmp7C/52ZP4ly4fOiz7qtHTQsCcIk8Xo6zwHtm/FkNEqc sMZdqgLrxKPNBAlT8iEtt//wU2fB7mFt988p0pc+5lAK5t0h67KZJV6vDwTAMObX wV6Ht+QhOHtJwO779fk8FhZPNRPYZYMutIyFNlRx+4gCpBuqJPc= =941k -----END PGP SIGNATURE----- Merge tag '9p-for-5.20' of https://github.com/martinetd/linux Pull 9p updates from Dominique Martinet: - a couple of fixes - add a tracepoint for fid refcounting - some cleanup/followup on fid lookup - some cleanup around req refcounting * tag '9p-for-5.20' of https://github.com/martinetd/linux: net/9p: Initialize the iounit field during fid creation net: 9p: fix refcount leak in p9_read_work() error handling 9p: roll p9_tag_remove into p9_req_put 9p: Add client parameter to p9_req_put() 9p: Drop kref usage 9p: Fix some kernel-doc comments 9p fid refcount: cleanup p9_fid_put calls 9p fid refcount: add a 9p_fid_ref tracepoint 9p fid refcount: add p9_fid_get/put wrappers 9p: Fix minor typo in code comment 9p: Remove unnecessary variable for old fids while walking from d_parent 9p: Make the path walk logic more clear about when cloning is required 9p: Track the root fid with its own variable during lookups	2022-08-06 14:48:54 -07:00
Nick Desaulniers	ac0dbed9ba	net: seg6: initialize induction variable to first valid array index Fixes the following warnings observed when building CONFIG_IPV6_SEG6_LWTUNNEL=y with clang: net/ipv6/seg6_local.o: warning: objtool: seg6_local_fill_encap() falls through to next function seg6_local_get_encap_size() net/ipv6/seg6_local.o: warning: objtool: seg6_local_cmp_encap() falls through to next function input_action_end() LLVM can fully unroll loops in seg6_local_get_encap_size() and seg6_local_cmp_encap(). One issue in those loops is that the induction variable is initialized to 0. The loop iterates over members of seg6_action_params, a global array of struct seg6_action_param calling their put() function pointer members. seg6_action_param uses an array initializer to initialize SEG6_LOCAL_SRH and later elements, which is the third enumeration of an anonymous union. The guard `if (attrs & SEG6_F_ATTR(i))` may prevent this from being called at runtime, but it would still be UB for `seg6_action_params[0]->put` to be called; the unrolled loop will make the initial iterations unreachable, which LLVM will later rotate to fallthrough to the next function. Make this more obvious that this cannot happen to the compiler by initializing the loop induction variable to the minimum valid index that seg6_action_params is initialized to. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20220802161203.622293-1-ndesaulniers@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-05 19:34:54 -07:00
Francois Romieu	df1c941468	net: avoid overflow when rose /proc displays timer information. rose /proc code does not serialize timer accesses. Initial report by Bernard F6BVP Pidoux exhibits overflow amounting to 116 ticks on its HZ=250 system. Full timer access serialization would imho be overkill as rose /proc does not enforce consistency between displayed ROSE_STATE_XYZ and timer values during changes of state. The patch may also fix similar behavior in ax25 /proc, ax25 ioctl and netrom /proc as they all exhibit the same timer serialization policy. This point has not been reported though. The sole remaining use of ax25_display_timer - ax25 rtt valuation - may also perform marginally better but I have not analyzed it too deeply. Cc: Thomas DL9SAU Osterried <thomas@osterried.de> Link: https://lore.kernel.org/all/d5e93cc7-a91f-13d3-49a1-b50c11f0f811@free.fr/ Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Tested-by: Bernard Pidoux <f6bvp@free.fr> Link: https://lore.kernel.org/r/Yuk9vq7t7VhmnOXu@electric-eye.fr.zoreil.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-05 19:00:02 -07:00
Pablo Neira Ayuso	b06ada6df9	netfilter: flowtable: fix incorrect Kconfig dependencies Remove default to 'y', this infrastructure is not fundamental for the flowtable operational. Add a missing dependency on CONFIG_NF_FLOW_TABLE. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: `b038177636` ("netfilter: nf_flow_table: count pending offload workqueue tasks") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-05 18:50:15 -07:00
Florian Westphal	399a14ec79	netfilter: nf_tables: fix crash when nf_trace is enabled do not access info->pkt when info->trace is not 1. nft_traceinfo is not initialized, except when tracing is enabled. The 'nft_trace_enabled' static key cannot be used for this, we must always check info->trace first. Pass nft_pktinfo directly to avoid this. Fixes: `e34b9ed96c` ("netfilter: nf_tables: avoid skb access on nf_stolen") Reported-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-05 18:50:14 -07:00
Linus Torvalds	6614a3c316	- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe Lin, Yang Shi, Anshuman Khandual and Mike Rapoport - Some kmemleak fixes from Patrick Wang and Waiman Long - DAMON updates from SeongJae Park - memcg debug/visibility work from Roman Gushchin - vmalloc speedup from Uladzislau Rezki - more folio conversion work from Matthew Wilcox - enhancements for coherent device memory mapping from Alex Sierra - addition of shared pages tracking and CoW support for fsdax, from Shiyang Ruan - hugetlb optimizations from Mike Kravetz - Mel Gorman has contributed some pagealloc changes to improve latency and realtime behaviour. - mprotect soft-dirty checking has been improved by Peter Xu - Many other singleton patches all over the place -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYuravgAKCRDdBJ7gKXxA jpqSAQDrXSdII+ht9kSHlaCVYjqRFQz/rRvURQrWQV74f6aeiAD+NHHeDPwZn11/ SPktqEUrF1pxnGQxqLh1kUFUhsVZQgE= =w/UH -----END PGP SIGNATURE----- Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Most of the MM queue. A few things are still pending. Liam's maple tree rework didn't make it. This has resulted in a few other minor patch series being held over for next time. Multi-gen LRU still isn't merged as we were waiting for mapletree to stabilize. The current plan is to merge MGLRU into -mm soon and to later reintroduce mapletree, with a view to hopefully getting both into 6.1-rc1. Summary: - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe Lin, Yang Shi, Anshuman Khandual and Mike Rapoport - Some kmemleak fixes from Patrick Wang and Waiman Long - DAMON updates from SeongJae Park - memcg debug/visibility work from Roman Gushchin - vmalloc speedup from Uladzislau Rezki - more folio conversion work from Matthew Wilcox - enhancements for coherent device memory mapping from Alex Sierra - addition of shared pages tracking and CoW support for fsdax, from Shiyang Ruan - hugetlb optimizations from Mike Kravetz - Mel Gorman has contributed some pagealloc changes to improve latency and realtime behaviour. - mprotect soft-dirty checking has been improved by Peter Xu - Many other singleton patches all over the place" [ XFS merge from hell as per Darrick Wong in https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ] * tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits) tools/testing/selftests/vm/hmm-tests.c: fix build mm: Kconfig: fix typo mm: memory-failure: convert to pr_fmt() mm: use is_zone_movable_page() helper hugetlbfs: fix inaccurate comment in hugetlbfs_statfs() hugetlbfs: cleanup some comments in inode.c hugetlbfs: remove unneeded header file hugetlbfs: remove unneeded hugetlbfs_ops forward declaration hugetlbfs: use helper macro SZ_1{K,M} mm: cleanup is_highmem() mm/hmm: add a test for cross device private faults selftests: add soft-dirty into run_vmtests.sh selftests: soft-dirty: add test for mprotect mm/mprotect: fix soft-dirty check in can_change_pte_writable() mm: memcontrol: fix potential oom_lock recursion deadlock mm/gup.c: fix formatting in check_and_migrate_movable_page() xfs: fail dax mount if reflink is enabled on a partition mm/memcontrol.c: remove the redundant updating of stats_flush_threshold userfaultfd: don't fail on unrecognized features hugetlb_cgroup: fix wrong hugetlb cgroup numa stat ...	2022-08-05 16:32:45 -07:00
Linus Torvalds	965a9d75e3	Tracing updates for 5.20 / 6.0 - Runtime verification infrastructure This is the biggest change for this pull request. It introduces the runtime verification that is necessary for running Linux on safety critical systems. It allows for deterministic automata models to be inserted into the kernel that will attach to tracepoints, where the information on these tracepoints will move the model from state to state. If a state is encountered that does not belong to the model, it will then activate a given reactor, that could just inform the user or even panic the kernel (for which safety critical systems will detect and can recover from). - Two monitor models are also added: Wakeup In Preemptive (WIP - not to be confused with "work in progress"), and Wakeup While Not Running (WWNR). - Added __vstring() helper to the TRACE_EVENT() macro to replace several vsnprintf() usages that were all doing it wrong. - eprobes now can have their event autogenerated when the event name is left off. - The rest is various cleanups and fixes. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYu0yzRQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qj4HAP4tQtV55rjj4DQ5XIXmtI3/64PmyRSJ +y4DEXi1UvEUCQD/QAuQfWoT/7gh35ltkfeS4t3ockzy14rrkP5drZigiQA= =kEtM -----END PGP SIGNATURE----- Merge tag 'trace-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - Runtime verification infrastructure This is the biggest change here. It introduces the runtime verification that is necessary for running Linux on safety critical systems. It allows for deterministic automata models to be inserted into the kernel that will attach to tracepoints, where the information on these tracepoints will move the model from state to state. If a state is encountered that does not belong to the model, it will then activate a given reactor, that could just inform the user or even panic the kernel (for which safety critical systems will detect and can recover from). - Two monitor models are also added: Wakeup In Preemptive (WIP - not to be confused with "work in progress"), and Wakeup While Not Running (WWNR). - Added __vstring() helper to the TRACE_EVENT() macro to replace several vsnprintf() usages that were all doing it wrong. - eprobes now can have their event autogenerated when the event name is left off. - The rest is various cleanups and fixes. * tag 'trace-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (50 commits) rv: Unlock on error path in rv_unregister_reactor() tracing: Use alignof__(struct {type b;}) instead of offsetof() tracing/eprobe: Show syntax error logs in error_log file scripts/tracing: Fix typo 'the the' in comment tracepoints: It is CONFIG_TRACEPOINTS not CONFIG_TRACEPOINT tracing: Use free_trace_buffer() in allocate_trace_buffers() tracing: Use a struct alignof to determine trace event field alignment rv/reactor: Add the panic reactor rv/reactor: Add the printk reactor rv/monitor: Add the wwnr monitor rv/monitor: Add the wip monitor rv/monitor: Add the wip monitor skeleton created by dot2k Documentation/rv: Add deterministic automata instrumentation documentation Documentation/rv: Add deterministic automata monitor synthesis documentation tools/rv: Add dot2k Documentation/rv: Add deterministic automaton documentation tools/rv: Add dot2c Documentation/rv: Add a basic documentation rv/include: Add instrumentation helper functions rv/include: Add deterministic automata monitor definition via C macros ...	2022-08-05 09:41:12 -07:00
Herbert Xu	ba953a9d89	af_key: Do not call xfrm_probe_algs in parallel When namespace support was added to xfrm/afkey, it caused the previously single-threaded call to xfrm_probe_algs to become multi-threaded. This is buggy and needs to be fixed with a mutex. Reported-by: Abhishek Shah <abhishek.shah@columbia.edu> Fixes: `283bc9f35b` ("xfrm: Namespacify xfrm state/policy locks") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-08-05 10:22:14 +02:00
Paolo Abeni	c886d70286	mptcp: do not queue data on closed subflows Dipanjan reported a syzbot splat at close time: WARNING: CPU: 1 PID: 10818 at net/ipv4/af_inet.c:153 inet_sock_destruct+0x6d0/0x8e0 net/ipv4/af_inet.c:153 Modules linked in: uio_ivshmem(OE) uio(E) CPU: 1 PID: 10818 Comm: kworker/1:16 Tainted: G OE 5.19.0-rc6-g2eae0556bb9d #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:inet_sock_destruct+0x6d0/0x8e0 net/ipv4/af_inet.c:153 Code: 21 02 00 00 41 8b 9c 24 28 02 00 00 e9 07 ff ff ff e8 34 4d 91 f9 89 ee 4c 89 e7 e8 4a 47 60 ff e9 a6 fc ff ff e8 20 4d 91 f9 <0f> 0b e9 84 fe ff ff e8 14 4d 91 f9 0f 0b e9 d4 fd ff ff e8 08 4d RSP: 0018:ffffc9001b35fa78 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000002879d0 RCX: ffff8881326f3b00 RDX: 0000000000000000 RSI: ffff8881326f3b00 RDI: 0000000000000002 RBP: ffff888179662674 R08: ffffffff87e983a0 R09: 0000000000000000 R10: 0000000000000005 R11: 00000000000004ea R12: ffff888179662400 R13: ffff888179662428 R14: 0000000000000001 R15: ffff88817e38e258 FS: 0000000000000000(0000) GS:ffff8881f5f00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020007bc0 CR3: 0000000179592000 CR4: 0000000000150ee0 Call Trace: <TASK> __sk_destruct+0x4f/0x8e0 net/core/sock.c:2067 sk_destruct+0xbd/0xe0 net/core/sock.c:2112 __sk_free+0xef/0x3d0 net/core/sock.c:2123 sk_free+0x78/0xa0 net/core/sock.c:2134 sock_put include/net/sock.h:1927 [inline] __mptcp_close_ssk+0x50f/0x780 net/mptcp/protocol.c:2351 __mptcp_destroy_sock+0x332/0x760 net/mptcp/protocol.c:2828 mptcp_worker+0x5d2/0xc90 net/mptcp/protocol.c:2586 process_one_work+0x9cc/0x1650 kernel/workqueue.c:2289 worker_thread+0x623/0x1070 kernel/workqueue.c:2436 kthread+0x2e9/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302 </TASK> The root cause of the problem is that an mptcp-level (re)transmit can race with mptcp_close() and the packet scheduler checks the subflow state before acquiring the socket lock: we can try to (re)transmit on an already closed ssk. Fix the issue checking again the subflow socket status under the subflow socket lock protection. Additionally add the missing check for the fallback-to-tcp case. Fixes: `d5f49190de` ("mptcp: allow picking different xmit subflows") Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-05 08:51:28 +01:00
Paolo Abeni	c0bf3c6aa4	mptcp: move subflow cleanup in mptcp_destroy_common() If the mptcp socket creation fails due to a CGROUP_INET_SOCK_CREATE eBPF program, the MPTCP protocol ends-up leaking all the subflows: the related cleanup happens in __mptcp_destroy_sock() that is not invoked in such code path. Address the issue moving the subflow sockets cleanup in the mptcp_destroy_common() helper, which is invoked in every msk cleanup path. Additionally get rid of the intermediate list_splice_init step, which is an unneeded relic from the past. The issue is present since before the reported root cause commit, but any attempt to backport the fix before that hash will require a complete rewrite. Fixes: `e16163b6e2` ("mptcp: refactor shutdown and close") Reported-by: Nguyen Dinh Phi <phind.uet@gmail.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Co-developed-by: Nguyen Dinh Phi <phind.uet@gmail.com> Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-05 08:51:28 +01:00
Linus Torvalds	c1c76700a0	SPDX changes for 6.0-rc1 Here is the set of SPDX comment updates for 6.0-rc1. Nothing huge here, just a number of updated SPDX license tags and cleanups based on the review of a number of common patterns in GPLv2 boilerplate text. Also included in here are a few other minor updates, 2 USB files, and one Documentation file update to get the SPDX lines correct. All of these have been in the linux-next tree for a very long time. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYupz3g8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ynPUgCgslaf2ssCgW5IeuXbhla+ZBRAzisAnjVgOvLN 4AKdqbiBNlFbCroQwmeQ =v1sg -----END PGP SIGNATURE----- Merge tag 'spdx-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx Pull SPDX updates from Greg KH: "Here is the set of SPDX comment updates for 6.0-rc1. Nothing huge here, just a number of updated SPDX license tags and cleanups based on the review of a number of common patterns in GPLv2 boilerplate text. Also included in here are a few other minor updates, two USB files, and one Documentation file update to get the SPDX lines correct. All of these have been in the linux-next tree for a very long time" * tag 'spdx-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: (28 commits) Documentation: samsung-s3c24xx: Add blank line after SPDX directive x86/crypto: Remove stray comment terminator treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_406.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_398.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_391.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_390.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_385.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_320.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_319.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_318.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_298.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_292.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_179.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_168.RULE (part 2) treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_168.RULE (part 1) treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_160.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_152.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_149.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_147.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_133.RULE ...	2022-08-04 12:12:54 -07:00
Linus Torvalds	cfeafd9466	Driver core / kernfs changes for 6.0-rc1 Here is the set of driver core and kernfs changes for 6.0-rc1. "biggest" thing in here is some scalability improvements for kernfs for large systems. Other than that, included in here are: - arch topology and cache info changes that have been reviewed and discussed a lot. - potential error path cleanup fixes - deferred driver probe cleanups - firmware loader cleanups and tweaks - documentation updates - other small things All of these have been in the linux-next tree for a while with no reported problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYuqCnw8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ym/JgCcCnaycJY00ZPRQm3LQCyzfJ0HgqoAn2qxGV+K NKycLeXZSnuvIA87dycE =/4Jk -----END PGP SIGNATURE----- Merge tag 'driver-core-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core / kernfs updates from Greg KH: "Here is the set of driver core and kernfs changes for 6.0-rc1. The "biggest" thing in here is some scalability improvements for kernfs for large systems. Other than that, included in here are: - arch topology and cache info changes that have been reviewed and discussed a lot. - potential error path cleanup fixes - deferred driver probe cleanups - firmware loader cleanups and tweaks - documentation updates - other small things All of these have been in the linux-next tree for a while with no reported problems" * tag 'driver-core-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (63 commits) docs: embargoed-hardware-issues: fix invalid AMD contact email firmware_loader: Replace kmap() with kmap_local_page() sysfs docs: ABI: Fix typo in comment kobject: fix Kconfig.debug "its" grammar kernfs: Fix typo 'the the' in comment docs: driver-api: firmware: add driver firmware guidelines. (v3) arch_topology: Fix cache attributes detection in the CPU hotplug path ACPI: PPTT: Leave the table mapped for the runtime usage cacheinfo: Use atomic allocation for percpu cache attributes drivers/base: fix userspace break from using bin_attributes for cpumap and cpulist MAINTAINERS: Change mentions of mpm to olivia docs: ABI: sysfs-devices-soc: Update Lee Jones' email address docs: ABI: sysfs-class-pwm: Update Lee Jones' email address Documentation/process: Add embargoed HW contact for LLVM Revert "kernfs: Change kernfs_notify_list to llist." ACPI: Remove the unused find_acpi_cpu_cache_topology() arch_topology: Warn that topology for nested clusters is not supported arch_topology: Add support for parsing sockets in /cpu-map arch_topology: Set cluster identifier in each core/thread from /cpu-map arch_topology: Limit span of cpu_clustergroup_mask() ...	2022-08-04 11:31:20 -07:00
Vladimir Oltean	4873a1b202	net/sched: remove hacks added to dev_trans_start() for bonding to work Now that the bonding driver keeps track of the last TX time of ARP and NS probes, we effectively revert the following commits: `32d3e51a82` ("net_sched: use macvlan real dev trans_start in dev_trans_start()") `07ce76aa9b` ("net_sched: make dev_trans_start return vlan's real dev trans_start") Note that the approach of continuing to hack at this function would not get us very far, hence the desire to take a different approach. DSA is also a virtual device that uses NETIF_F_LLTX, but there, many uppers share the same lower (DSA master, i.e. the physical host port of a switch). By making dev_trans_start() on a DSA interface return the dev_trans_start() of the master, we effectively assume that all other DSA interfaces are silent, otherwise this corrupts the validity of the probe timestamp data from the bonding driver's perspective. Furthermore, the hacks didn't take into consideration the fact that the lower interface of @dev may not have been physical either. For example, VLAN over VLAN, or DSA with 2 masters in a LAG. And even furthermore, there are NETIF_F_LLTX devices which are not stacked, like veth. The hack here would not work with those, because it would not have to provide the bonding driver something to chew at all. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-03 19:20:13 -07:00
Linus Torvalds	f86d1fbbe7	Networking changes for 6.0. Core ---- - Refactor the forward memory allocation to better cope with memory pressure with many open sockets, moving from a per socket cache to a per-CPU one - Replace rwlocks with RCU for better fairness in ping, raw sockets and IP multicast router. - Network-side support for IO uring zero-copy send. - A few skb drop reason improvements, including codegen the source file with string mapping instead of using macro magic. - Rename reference tracking helpers to a more consistent netdev_* schema. - Adapt u64_stats_t type to address load/store tearing issues. - Refine debug helper usage to reduce the log noise caused by bots. BPF --- - Improve socket map performance, avoiding skb cloning on read operation. - Add support for 64 bits enum, to match types exposed by kernel. - Introduce support for sleepable uprobes program. - Introduce support for enum textual representation in libbpf. - New helpers to implement synproxy with eBPF/XDP. - Improve loop performances, inlining indirect calls when possible. - Removed all the deprecated libbpf APIs. - Implement new eBPF-based LSM flavor. - Add type match support, which allow accurate queries to the eBPF used types. - A few TCP congetsion control framework usability improvements. - Add new infrastructure to manipulate CT entries via eBPF programs. - Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel function. Protocols --------- - Introduce per network namespace lookup tables for unix sockets, increasing scalability and reducing contention. - Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support. - Add support to forciby close TIME_WAIT TCP sockets via user-space tools. - Significant performance improvement for the TLS 1.3 receive path, both for zero-copy and not-zero-copy. - Support for changing the initial MTPCP subflow priority/backup status - Introduce virtually contingus buffers for sockets over RDMA, to cope better with memory pressure. - Extend CAN ethtool support with timestamping capabilities - Refactor CAN build infrastructure to allow building only the needed features. Driver API ---------- - Remove devlink mutex to allow parallel commands on multiple links. - Add support for pause stats in distributed switch. - Implement devlink helpers to query and flash line cards. - New helper for phy mode to register conversion. New hardware / drivers ---------------------- - Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro. - Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch. - Ethernet DSA driver for the Microchip LAN937x switch. - Ethernet PHY driver for the Aquantia AQR113C EPHY. - CAN driver for the OBD-II ELM327 interface. - CAN driver for RZ/N1 SJA1000 CAN controller. - Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device. Drivers ------- - Intel Ethernet NICs: - i40e: add support for vlan pruning - i40e: add support for XDP framented packets - ice: improved vlan offload support - ice: add support for PPPoE offload - Mellanox Ethernet (mlx5) - refactor packet steering offload for performance and scalability - extend support for TC offload - refactor devlink code to clean-up the locking schema - support stacked vlans for bridge offloads - use TLS objects pool to improve connection rate - Netronome Ethernet NICs (nfp): - extend support for IPv6 fields mangling offload - add support for vepa mode in HW bridge - better support for virtio data path acceleration (VDPA) - enable TSO by default - Microsoft vNIC driver (mana) - add support for XDP redirect - Others Ethernet drivers: - bonding: add per-port priority support - microchip lan743x: extend phy support - Fungible funeth: support UDP segmentation offload and XDP xmit - Solarflare EF100: add support for virtual function representors - MediaTek SoC: add XDP support - Mellanox Ethernet/IB switch (mlxsw): - dropped support for unreleased H/W (XM router). - improved stats accuracy - unified bridge model coversion improving scalability (parts 1-6) - support for PTP in Spectrum-2 asics - Broadcom PHYs - add PTP support for BCM54210E - add support for the BCM53128 internal PHY - Marvell Ethernet switches (prestera): - implement support for multicast forwarding offload - Embedded Ethernet switches: - refactor OcteonTx MAC filter for better scalability - improve TC H/W offload for the Felix driver - refactor the Microchip ksz8 and ksz9477 drivers to share the probe code (parts 1, 2), add support for phylink mac configuration - Other WiFi: - Microchip wilc1000: diable WEP support and enable WPA3 - Atheros ath10k: encapsulation offload support Old code removal: - Neterion vxge ethernet driver: this is untouched since more than 10 years. Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmLqN+oSHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkB9kQAI9VqW0c3SfiTJnkVBEIovZ6Tnh5stD2 UYFkh1BdchLsYxi7W4XMpVPSzRztiTP87mIx5c/KvIzj+QNeWL1XWRJSPdI9HhTD pTAA/tM2OG7bqrbyQiKDNfpQdNl7+kk1RwnYd+f9RFl1QVuIJaYhmjVwrsN5xF/+ jUsotpROarM2dGFWiFwJbKhP2zMDT+6qEEahM8pEPggKhv8wRLYjany2cZVEe4e0 WGUpbINAS8gEKm0Ob922WaDfDrcK/N1Z0jNz/kMaENkK18Vvc7F6bCO0DzAawKX9 QZMMwm6mHp3EThflJAMAzCGIYiIcwLhykgdyj8rrjPhFrWbMD2Sdsbo21HOXU/8j u4aAhVl+d+h7emmbgBoJ8sycVJ7BQlXz7lX20sTgADv9xI4/dPhQ17CMRuwX6fXX JSrn6P6e1LTV5CEg6vrlSPnKPY6uhFn/cPw47FxCjRwJ9phVnp+8uZWQmf9Pz3yf Ok/tcj+juFbsmuOshHy2cbRkuNZNS0oRWlSTBo5795ZwOLSakMonR3L+ev2aOvzz DVrFp2Y/iIVwMSFdCbouYdYnhArPRhOAtCmZc2afY8aBN7aaMgrdTy3+mzUoHy3I FG3K+VuKpfi0vY4zn6ZoLZDIpyXIoJJ93RcSGltD32t3Dp1RaQMVEI4s45k05PVm 1nYpXKHA8qML =hxEG -----END PGP SIGNATURE----- Merge tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking changes from Paolo Abeni: "Core: - Refactor the forward memory allocation to better cope with memory pressure with many open sockets, moving from a per socket cache to a per-CPU one - Replace rwlocks with RCU for better fairness in ping, raw sockets and IP multicast router. - Network-side support for IO uring zero-copy send. - A few skb drop reason improvements, including codegen the source file with string mapping instead of using macro magic. - Rename reference tracking helpers to a more consistent netdev_* schema. - Adapt u64_stats_t type to address load/store tearing issues. - Refine debug helper usage to reduce the log noise caused by bots. BPF: - Improve socket map performance, avoiding skb cloning on read operation. - Add support for 64 bits enum, to match types exposed by kernel. - Introduce support for sleepable uprobes program. - Introduce support for enum textual representation in libbpf. - New helpers to implement synproxy with eBPF/XDP. - Improve loop performances, inlining indirect calls when possible. - Removed all the deprecated libbpf APIs. - Implement new eBPF-based LSM flavor. - Add type match support, which allow accurate queries to the eBPF used types. - A few TCP congetsion control framework usability improvements. - Add new infrastructure to manipulate CT entries via eBPF programs. - Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel function. Protocols: - Introduce per network namespace lookup tables for unix sockets, increasing scalability and reducing contention. - Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support. - Add support to forciby close TIME_WAIT TCP sockets via user-space tools. - Significant performance improvement for the TLS 1.3 receive path, both for zero-copy and not-zero-copy. - Support for changing the initial MTPCP subflow priority/backup status - Introduce virtually contingus buffers for sockets over RDMA, to cope better with memory pressure. - Extend CAN ethtool support with timestamping capabilities - Refactor CAN build infrastructure to allow building only the needed features. Driver API: - Remove devlink mutex to allow parallel commands on multiple links. - Add support for pause stats in distributed switch. - Implement devlink helpers to query and flash line cards. - New helper for phy mode to register conversion. New hardware / drivers: - Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro. - Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch. - Ethernet DSA driver for the Microchip LAN937x switch. - Ethernet PHY driver for the Aquantia AQR113C EPHY. - CAN driver for the OBD-II ELM327 interface. - CAN driver for RZ/N1 SJA1000 CAN controller. - Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device. Drivers: - Intel Ethernet NICs: - i40e: add support for vlan pruning - i40e: add support for XDP framented packets - ice: improved vlan offload support - ice: add support for PPPoE offload - Mellanox Ethernet (mlx5) - refactor packet steering offload for performance and scalability - extend support for TC offload - refactor devlink code to clean-up the locking schema - support stacked vlans for bridge offloads - use TLS objects pool to improve connection rate - Netronome Ethernet NICs (nfp): - extend support for IPv6 fields mangling offload - add support for vepa mode in HW bridge - better support for virtio data path acceleration (VDPA) - enable TSO by default - Microsoft vNIC driver (mana) - add support for XDP redirect - Others Ethernet drivers: - bonding: add per-port priority support - microchip lan743x: extend phy support - Fungible funeth: support UDP segmentation offload and XDP xmit - Solarflare EF100: add support for virtual function representors - MediaTek SoC: add XDP support - Mellanox Ethernet/IB switch (mlxsw): - dropped support for unreleased H/W (XM router). - improved stats accuracy - unified bridge model coversion improving scalability (parts 1-6) - support for PTP in Spectrum-2 asics - Broadcom PHYs - add PTP support for BCM54210E - add support for the BCM53128 internal PHY - Marvell Ethernet switches (prestera): - implement support for multicast forwarding offload - Embedded Ethernet switches: - refactor OcteonTx MAC filter for better scalability - improve TC H/W offload for the Felix driver - refactor the Microchip ksz8 and ksz9477 drivers to share the probe code (parts 1, 2), add support for phylink mac configuration - Other WiFi: - Microchip wilc1000: diable WEP support and enable WPA3 - Atheros ath10k: encapsulation offload support Old code removal: - Neterion vxge ethernet driver: this is untouched since more than 10 years" * tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1890 commits) doc: sfp-phylink: Fix a broken reference wireguard: selftests: support UML wireguard: allowedips: don't corrupt stack when detecting overflow wireguard: selftests: update config fragments wireguard: ratelimiter: use hrtimer in selftest net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ net: usb: ax88179_178a: Bind only to vendor-specific interface selftests: net: fix IOAM test skip return code net: usb: make USB_RTL8153_ECM non user configurable net: marvell: prestera: remove reduntant code octeontx2-pf: Reduce minimum mtu size to 60 net: devlink: Fix missing mutex_unlock() call net/tls: Remove redundant workqueue flush before destroy net: txgbe: Fix an error handling path in txgbe_probe() net: dsa: Fix spelling mistakes and cleanup code Documentation: devlink: add add devlink-selftests to the table of contents dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock net: ionic: fix error check for vlan flags in ionic_set_nic_features() net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr() nfp: flower: add support for tunnel offload without key ID ...	2022-08-03 16:29:08 -07:00
Linus Torvalds	ff89dd08c0	net/9p abuses iov_iter primitives - attempts to copy _from_ a destination-only iov_iter when it handles Rerror arriving in reply to zero-copy request. Not hard to fix, fortunately; it's a prereq for the iov_iter_get_pages() work in the second part of iov_iter series, ended up in a separate branch. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYurQLQAKCRBZ7Krx/gZQ 65AiAP9Mmpu3yMWmfMEnTEjBv4iSuG37JdgHE/IE/P6q99opfQEAxThED/nJVuaG YZuNUx60OT9Au1hSdfl7EjAN4dg/Kw8= =tL2V -----END PGP SIGNATURE----- Merge tag 'pull-work.9p' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 9p iov_iter fix from Al Viro: "net/9p abuses iov_iter primitives - it attempts to copy _from_ a destination-only iov_iter when it handles Rerror arriving in reply to zero-copy request. Not hard to fix, fortunately. This is a prereq for the iov_iter_get_pages() work in the second part of iov_iter series, ended up in a separate branch" * tag 'pull-work.9p' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: 9p: handling Rerror without copy_from_iter_full()	2022-08-03 14:03:51 -07:00
Jeff Layton	a8af0d682a	libceph: clean up ceph_osdc_start_request prototype This function always returns 0, and ignores the nofail boolean. Drop the nofail argument, make the function void return and fix up the callers. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 14:05:39 +02:00
Paolo Abeni	7c6327c77d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: net/ax25/af_ax25.c `d7c4c9e075` ("ax25: fix incorrect dev_tracker usage") `d62607c3fe` ("net: rename reference+tracking helpers") drivers/net/netdevsim/fib.c `180a6a3ee6` ("netdevsim: fib: Fix reference count leak on route deletion failure") `012ec02ae4` ("netdevsim: convert driver to use unlocked devlink API during init/fini") Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-03 09:04:55 +02:00
Antony Antony	6aa811acdb	xfrm: clone missing x->lastused in xfrm_do_migrate x->lastused was not cloned in xfrm_do_migrate. Add it to clone during migrate. Fixes: `80c9abaabf` ("[XFRM]: Extension for dynamic update of endpoint address(es)") Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-08-03 07:27:37 +02:00
Antony Antony	717ada9f10	Revert "xfrm: update SA curlft.use_time" This reverts commit `af734a26a1`. The abvoce commit is a regression according RFC 2367. A better fix would be use x->lastused. Which will be propsed later. according to RFC 2367 use_time == sadb_lifetime_usetime. "sadb_lifetime_usetime For CURRENT, the time, in seconds, when association was first used. For HARD and SOFT, the number of seconds after the first use of the association until it expires." Fixes: `af734a26a1` ("xfrm: update SA curlft.use_time") Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-08-03 07:27:37 +02:00
Linus Torvalds	c2a24a7a03	This update includes the following changes: API: - Make proc files report fips module name and version. Algorithms: - Move generic SHA1 code into lib/crypto. - Implement Chinese Remainder Theorem for RSA. - Remove blake2s. - Add XCTR with x86/arm64 acceleration. - Add POLYVAL with x86/arm64 acceleration. - Add HCTR2. - Add ARIA. Drivers: - Add support for new CCP/PSP device ID in ccp. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmLosAAACgkQxycdCkmx i6dvgxAAzcw0cKMuq3dbQamzeVu1bDW8rPb7yHnpXal3ao5ewa15+hFjsKhdh/s3 cjM5Lu7Qx4lnqtsh2JVSU5o2SgEpptxXNfxAngcn46ld5EgV/G4DYNKuXsatMZ2A erCzXqG9dDxJmREat+5XgVfD1RFVsglmEA/Nv4Rvn+9O4O6PfwRa8GyUzeKC+byG qs/1JyiPqpyApgzCvlQFAdTF4PM7ruDtg3mnMy2EKAzqj4JUseXRi1i81vLVlfBL T40WESG/CnOwIF5MROhziAtkJMS4Y4v2VQ2++1p0gwG6pDCnq4w7u9cKPXYfNgZK fMVCxrNlxIH3W99VfVXbXwqDSN6qEZtQvhnliwj9aEbEltIoH+B02wNfS/BDsTec im+5NCnNQ6olMPyL0yHrMKisKd+DwTrEfYT5H2kFhcdcYZncQ9C6el57kimnJRzp 4ymPRudCKm/8weWGTtmjFMi+PFP4LgvCoR+VMUd+gVe91F9ZMAO0K7b5z5FVDyDf wmsNBvsEnTdm/r7YceVzGwdKQaP9sE5wq8iD/yySD1PjlmzZos1CtCrqAIT/v2RK pQdZCIkT8qCB+Jm03eEd4pwjEDnbZdQmpKt4cTy0HWIeLJVG1sXPNpgwPCaBEV4U g0nctILtypChlSDmuGhTCyuElfMg6CXt4cgSZJTBikT+QcyWOm4= =rfWK -----END PGP SIGNATURE----- Merge tag 'v5.20-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Make proc files report fips module name and version Algorithms: - Move generic SHA1 code into lib/crypto - Implement Chinese Remainder Theorem for RSA - Remove blake2s - Add XCTR with x86/arm64 acceleration - Add POLYVAL with x86/arm64 acceleration - Add HCTR2 - Add ARIA Drivers: - Add support for new CCP/PSP device ID in ccp" * tag 'v5.20-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (89 commits) crypto: tcrypt - Remove the static variable initialisations to NULL crypto: arm64/poly1305 - fix a read out-of-bound crypto: hisilicon/zip - Use the bitmap API to allocate bitmaps crypto: hisilicon/sec - fix auth key size error crypto: ccree - Remove a useless dma_supported() call crypto: ccp - Add support for new CCP/PSP device ID crypto: inside-secure - Add missing MODULE_DEVICE_TABLE for of crypto: hisilicon/hpre - don't use GFP_KERNEL to alloc mem during softirq crypto: testmgr - some more fixes to RSA test vectors cyrpto: powerpc/aes - delete the rebundant word "block" in comments hwrng: via - Fix comment typo crypto: twofish - Fix comment typo crypto: rmd160 - fix Kconfig "its" grammar crypto: keembay-ocs-ecc - Drop if with an always false condition Documentation: qat: rewrite description Documentation: qat: Use code block for qat sysfs example crypto: lib - add module license to libsha1 crypto: lib - make the sha1 library optional crypto: lib - move lib/sha1.c into lib/crypto/ crypto: fips - make proc files report fips module name and version ...	2022-08-02 17:45:14 -07:00
Jason Wang	4f88619455	libceph: fix ceph_pagelist_reserve() comment typo The double `without' is duplicated in the comment, remove one. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:13 +02:00
Daichi Mukai	842d6b019b	libceph: print fsid and epoch with osd id Print fsid and epoch in libceph log messages to distinct from which each message come. [ idryomov: don't bother with gid for now, print epoch instead ] Signed-off-by: Satoru Takeuchi <satoru.takeuchi@gmail.com> Signed-off-by: Daichi Mukai <daichi-mukai@cybozu.co.jp> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:12 +02:00
Li Qiong	fc54cb8d87	libceph: check pointer before assigned to "c->rules[]" It should be better to check pointer firstly, then assign it to c->rules[]. Refine code a little bit. Signed-off-by: Li Qiong <liqiong@nfschina.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2022-08-03 00:54:12 +02:00
Linus Torvalds	42df1cbf6a	for-5.20/io_uring-zerocopy-send-2022-07-29 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmLkm/MQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpoaXD/9Nevo4KQmlG83ZcZfu2d51VlGtt6/Dl7LL pr07RfnRFJcjeCPCwXCXmu6rrlY+inpfEWv9iCR/ImoeESOJCzm0dN/nlffO/zT1 E0h5AlEoDv2bYrCnVkbfvxL722TZqGeLiDE4YY1jVbuUfs3TDmLQzfGbORK+Zw4y wPEMDZP1yWHoyeHUGWFasu6dpWiAwsZ4sTX0J631YwIBDNWKZqtienIiY15rK4dz GioBea6voe8Fos0VEhCBOKXMmV9mG4yVOPeaDbTWTRfuzGNF8b7t2vg7mz+PrbBY M8h1oEt+/+FnsCIZqfaEUzqHX6quv46OVtq/F5L3yNz/5QEsnqfv08ZFwD3sXdgZ /RFxXamfcn/LoxzZ9eLu3MeyzpXp6frxBcgTNGc3q2TlIwXr1WsIx2N4PxZh00GM ssW/ulaOZvZmOmDlbdeSC7sp3R1JmHO4qVlHowr58ce8pkishNTwlZZGr0sHyeNq /Wkd9NQEQEFD6AIzZ/Mz9CsmzHeHYpy6GhicFrcLuU4YF/fnQ6T4hTjlIlucGv/S IeqoAHrurCB0/p1ml6VfJ58xUWXNCCCkKC5+xu8Vm6/RgMlIw5KkzvVEBfflnomB wVJLYsLw41gnlqqpwISR39I7cDV+s6xC5P8YAA/NLz692HDIUrRX14dlbZuXIgbc ROeHB2N5+g== =vSwm -----END PGP SIGNATURE----- Merge tag 'for-5.20/io_uring-zerocopy-send-2022-07-29' of git://git.kernel.dk/linux-block Pull io_uring zerocopy support from Jens Axboe: "This adds support for efficient support for zerocopy sends through io_uring. Both ipv4 and ipv6 is supported, as well as both TCP and UDP. The core network changes to support this is in a stable branch from Jakub that both io_uring and net-next has pulled in, and the io_uring changes are layered on top of that. All of the work has been done by Pavel" * tag 'for-5.20/io_uring-zerocopy-send-2022-07-29' of git://git.kernel.dk/linux-block: (34 commits) io_uring: notification completion optimisation io_uring: export req alloc from core io_uring/net: use unsigned for flags io_uring/net: make page accounting more consistent io_uring/net: checks errors of zc mem accounting io_uring/net: improve io_get_notif_slot types selftests/io_uring: test zerocopy send io_uring: enable managed frags with register buffers io_uring: add zc notification flush requests io_uring: rename IORING_OP_FILES_UPDATE io_uring: flush notifiers after sendzc io_uring: sendzc with fixed buffers io_uring: allow to pass addr into sendzc io_uring: account locked pages for non-fixed zc io_uring: wire send zc request type io_uring: add notification slot registration io_uring: add rsrc referencing for notifiers io_uring: complete notifiers in tw io_uring: cache struct io_notif io_uring: add zc notification infrastructure ...	2022-08-02 13:37:55 -07:00
Linus Torvalds	b349b1181d	for-5.20/io_uring-2022-07-29 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmLkm5gQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpmKMD/4l3QIrLbjYIxlfrzQcHbmYuUkbQtj3SbZg 6ejbnGVhCs1P9DdXH8MgE2BxgpiXQE0CqOK7vbSoo5ep2n2UTLI2DIxAl74SMIo7 0wmJXtUJySuViKr3NYVHqlN180MkQYddBz0nGElhkQBPBCMhW8CrtPCeURr/YyHp 2RxSYBXiUx2gRyig+klnp6oPEqelcBZJUyNHdA9yVrgl/RhB/t2rKj7D++8ukQM3 Zuyh8WIkTeTfUz9hdGG7fuCEdZN4DlO2CCEc7uy0cKi6VRCKH4hYUCqClJ+/cfd2 43dUI2O7B6D1t/ObFh8AGIDXBDqVA6ePQohQU6gooRkfQiBPKkc9d0ts4yIhRqca AjkzNM+0Eve3A01loJ8J84w8oZnvNpYEv5n8/sZVLWcyU3UIs0I88nC2OBiFtoRq d77CtFLwOTo+r3STtAhnZOqez90rhS6BqKtqlUP346PCuFItl6/MbGtwdTbLYEFj CVNIb2pERWSr2NxGv4lFyXaX/cRwruxojWH7yc3rRYjr4Ykevd1pe/fMGNiMAnKw 5em/3QU3qq0ZVcXLMihksKeHHFIQwGDRMuyuv/fktV10+yYXQ0t16WzkJT3aR8Xo cqs0r8+6Jnj3uYcOMzj/FoLcpEPr21hnwAtzLto1mG1Wh4JRn/D7Nx5zqxPLxcW+ NiU6VihPOw== =gxeV -----END PGP SIGNATURE----- Merge tag 'for-5.20/io_uring-2022-07-29' of git://git.kernel.dk/linux-block Pull io_uring updates from Jens Axboe: - As per (valid) complaint in the last merge window, fs/io_uring.c has grown quite large these days. io_uring isn't really tied to fs either, as it supports a wide variety of functionality outside of that. Move the code to io_uring/ and split it into files that either implement a specific request type, and split some code into helpers as well. The code is organized a lot better like this, and io_uring.c is now < 4K LOC (me). - Deprecate the epoll_ctl opcode. It'll still work, just trigger a warning once if used. If we don't get any complaints on this, and I don't expect any, then we can fully remove it in a future release (me). - Improve the cancel hash locking (Hao) - kbuf cleanups (Hao) - Efficiency improvements to the task_work handling (Dylan, Pavel) - Provided buffer improvements (Dylan) - Add support for recv/recvmsg multishot support. This is similar to the accept (or poll) support for have for multishot, where a single SQE can trigger everytime data is received. For applications that expect to do more than a few receives on an instantiated socket, this greatly improves efficiency (Dylan). - Efficiency improvements for poll handling (Pavel) - Poll cancelation improvements (Pavel) - Allow specifiying a range for direct descriptor allocations (Pavel) - Cleanup the cqe32 handling (Pavel) - Move io_uring types to greatly cleanup the tracing (Pavel) - Tons of great code cleanups and improvements (Pavel) - Add a way to do sync cancelations rather than through the sqe -> cqe interface, as that's a lot easier to use for some use cases (me). - Add support to IORING_OP_MSG_RING for sending direct descriptors to a different ring. This avoids the usually problematic SCM case, as we disallow those. (me) - Make the per-command alloc cache we use for apoll generic, place limits on it, and use it for netmsg as well (me). - Various cleanups (me, Michal, Gustavo, Uros) * tag 'for-5.20/io_uring-2022-07-29' of git://git.kernel.dk/linux-block: (172 commits) io_uring: ensure REQ_F_ISREG is set async offload net: fix compat pointer in get_compat_msghdr() io_uring: Don't require reinitable percpu_ref io_uring: fix types in io_recvmsg_multishot_overflow io_uring: Use atomic_long_try_cmpxchg in __io_account_mem io_uring: support multishot in recvmsg net: copy from user before calling __get_compat_msghdr net: copy from user before calling __copy_msghdr io_uring: support 0 length iov in buffer select in compat io_uring: fix multishot ending when not polled io_uring: add netmsg cache io_uring: impose max limit on apoll cache io_uring: add abstraction around apoll cache io_uring: move apoll cache to poll.c io_uring: consolidate hash_locked io-wq handling io_uring: clear REQ_F_HASH_LOCKED on hash removal io_uring: don't race double poll setting REQ_F_ASYNC_DATA io_uring: don't miss setting REQ_F_DOUBLE_POLL io_uring: disable multishot recvmsg io_uring: only trace one of complete or overflow ...	2022-08-02 13:20:44 -07:00
Ammar Faizi	80ef928643	net: devlink: Fix missing mutex_unlock() call Commit `2dec18ad82` forgets to call mutex_unlock() before the function returns in the error path: New smatch warnings: net/core/devlink.c:6392 devlink_nl_cmd_region_new() warn: inconsistent \ returns '&region->snapshot_lock'. Make sure we call mutex_unlock() in this error path. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: `2dec18ad82` ("net: devlink: remove region snapshots list dependency on devlink->lock") Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20220801115742.1309329-1-ammar.faizi@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-01 12:47:10 -07:00
Tariq Toukan	d81c7cdd7a	net/tls: Remove redundant workqueue flush before destroy destroy_workqueue() safely destroys the workqueue after draining it. No need for the explicit call to flush_workqueue(). Remove it. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220801112444.26175-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-01 12:44:38 -07:00
Xie Shaowen	062cf5ebc2	net: dsa: Fix spelling mistakes and cleanup code fix follow spelling misktakes: desconstructed ==> deconstructed enforcment ==> enforcement Reported-by: Hacash Robot <hacashRobot@santino.com> Signed-off-by: Xie Shaowen <studentxswpy@163.com> Link: https://lore.kernel.org/r/20220730092254.3102875-1-studentxswpy@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-01 12:23:06 -07:00
Hangyu Hua	a41b17ff9d	dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock In the case of sk->dccps_qpolicy == DCCPQ_POLICY_PRIO, dccp_qpolicy_full will drop a skb when qpolicy is full. And the lock in dccp_sendmsg is released before sock_alloc_send_skb and then relocked after sock_alloc_send_skb. The following conditions may lead dccp_qpolicy_push to add skb to an already full sk_write_queue: thread1--->lock thread1--->dccp_qpolicy_full: queue is full. drop a skb thread1--->unlock thread2--->lock thread2--->dccp_qpolicy_full: queue is not full. no need to drop. thread2--->unlock thread1--->lock thread1--->dccp_qpolicy_push: add a skb. queue is full. thread1--->unlock thread2--->lock thread2--->dccp_qpolicy_push: add a skb! thread2--->unlock Fix this by moving dccp_qpolicy_full. Fixes: `b1308dc015` ("[DCCP]: Set TX Queue Length Bounds via Sysctl") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Link: https://lore.kernel.org/r/20220729110027.40569-1-hbh25y@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-01 12:11:56 -07:00
Eric Dumazet	2df91e397d	net: rose: add netdev ref tracker to 'struct rose_sock' This will help debugging netdevice refcount problems with CONFIG_NET_DEV_REFCNT_TRACKER=y Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tested-by: Bernard Pidoux <f6bvp@free.fr> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-01 11:59:23 -07:00
Eric Dumazet	931027820e	net: rose: fix netdev reference changes Bernard reported that trying to unload rose module would lead to infamous messages: unregistered_netdevice: waiting for rose0 to become free. Usage count = xx This patch solves the issue, by making sure each socket referring to a netdevice holds a reference count on it, and properly releases it in rose_release(). rose_dev_first() is also fixed to take a device reference before leaving the rcu_read_locked section. Following patch will add ref_tracker annotations to ease future bug hunting. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: Bernard Pidoux <f6bvp@free.fr> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Bernard Pidoux <f6bvp@free.fr> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-01 11:59:23 -07:00
Jiri Pirko	09b278462f	net: devlink: enable parallel ops on netlink interface As the devlink_mutex was removed and all devlink instances are protected individually by devlink->lock mutex, allow the netlink ops to run in parallel and therefore allow user to execute commands on multiple devlink instances simultaneously. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-01 12:14:00 +01:00
Jiri Pirko	d3efc2a6a6	net: devlink: remove devlink_mutex All accesses to devlink structure from userspace and drivers are locked with devlink->lock instance mutex. Also, devlinks xa_array iteration is taken care of by iteration helpers taking devlink reference. Therefore, remove devlink_mutex as it is no longer needed. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-01 12:14:00 +01:00
Jiri Pirko	644a66c60f	net: devlink: convert reload command to take implicit devlink->lock Convert reload command to behave the same way as the rest of the commands and let if be called with devlink->lock held. Remove the temporary devl_lock taking from drivers. As the DEVLINK_NL_FLAG_NO_LOCK flag is no longer used, remove it alongside. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-01 12:14:00 +01:00
Jiri Pirko	c2368b1980	net: devlink: introduce "unregistering" mark and use it during devlinks iteration Add new mark called "unregistering" to be set at the beginning of devlink_unregister() function. Check this mark during devlinks iteration in order to prevent getting a reference of devlink which is being currently unregistered. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-01 12:14:00 +01:00
Kuniyuki Iwashima	02a7cb2866	udp: Remove redundant __udp_sysctl_init() call from udp_init(). __udp_sysctl_init() is called for init_net via udp_sysctl_ops. While at it, we can rename __udp_sysctl_init() to udp_sysctl_init(). Fixes: `1e80295158` ("udp: Move the udp sysctl to namespace.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-01 12:07:53 +01:00
Li Qiong	5121db6afb	net/rds: Use PTR_ERR instead of IS_ERR for rdsdebug() If 'local_odp_mr->r_trans_private' is a error code, it is better to print the error code than to print the value of IS_ERR(). Signed-off-by: Li Qiong <liqiong@nfschina.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-01 11:45:15 +01:00
Steven Rostedt (Google)	9abc291812	batman-adv: tracing: Use the new __vstring() helper Instead of open coding a __dynamic_array() with a fixed length (which defeats the purpose of the dynamic array in the first place). Use the new __vstring() helper that will use a va_list and only write enough of the string into the ring buffer that is needed. Link: https://lkml.kernel.org/r/20220724191650.236b1355@rorschach.local.home Cc: Marek Lindner <mareklindner@neomailbox.ch> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Simon Wunderlich <sw@simonwunderlich.de> Cc: Antonio Quartulli <a@unstable.cc> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: b.a.t.m.a.n@lists.open-mesh.org Cc: netdev@vger.kernel.org Acked-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2022-07-30 13:52:47 -04:00
Yu Zhe	0f14a8351a	dn_route: replace "jiffies-now>0" with "jiffies!=now" Use "jiffies != now" to replace "jiffies - now > 0" to make code more readable. We want to put a limit on how long the loop can run for before rescheduling. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Link: https://lore.kernel.org/r/20220729061712.22666-1-yuzhe@nfschina.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-29 20:12:49 -07:00
Jakub Kicinski	5fc7c5887c	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Andrii Nakryiko says: ==================== bpf-next 2022-07-29 We've added 22 non-merge commits during the last 4 day(s) which contain a total of 27 files changed, 763 insertions(+), 120 deletions(-). The main changes are: 1) Fixes to allow setting any source IP with bpf_skb_set_tunnel_key() helper, from Paul Chaignon. 2) Fix for bpf_xdp_pointer() helper when doing sanity checking, from Joanne Koong. 3) Fix for XDP frame length calculation, from Lorenzo Bianconi. 4) Libbpf BPF_KSYSCALL docs improvements and fixes to selftests to accommodate s390x quirks with socketcall(), from Ilya Leoshkevich. 5) Allow/denylist and CI configs additions to selftests/bpf to improve BPF CI, from Daniel Müller. 6) BPF trampoline + ftrace follow up fixes, from Song Liu and Xu Kuohai. 7) Fix allocation warnings in netdevsim, from Jakub Kicinski. 8) bpf_obj_get_opts() libbpf API allowing to provide file flags, from Joe Burton. 9) vsnprintf usage fix in bpf_snprintf_btf(), from Fedor Tokarev. 10) Various small fixes and clean ups, from Daniel Müller, Rongguang Wei, Jörn-Thorben Hinz, Yang Li. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (22 commits) bpf: Remove unneeded semicolon libbpf: Add bpf_obj_get_opts() netdevsim: Avoid allocation warnings triggered from user space bpf: Fix NULL pointer dereference when registering bpf trampoline bpf: Fix test_progs -j error with fentry/fexit tests selftests/bpf: Bump internal send_signal/send_signal_tracepoint timeout bpftool: Don't try to return value from void function in skeleton bpftool: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZE macro bpf: btf: Fix vsnprintf return value check libbpf: Support PPC in arch_specific_syscall_pfx selftests/bpf: Adjust vmtest.sh to use local kernel configuration selftests/bpf: Copy over libbpf configs selftests/bpf: Sort configuration selftests/bpf: Attach to socketcall() in test_probe_user libbpf: Extend BPF_KSYSCALL documentation bpf, devmap: Compute proper xdp_frame len redirecting frames bpf: Fix bpf_xdp_pointer return pointer selftests/bpf: Don't assign outer source IP to host bpf: Set flow flag to allow any source IP in bpf_tunnel_key geneve: Use ip_tunnel_key flow flags in route lookups ... ==================== Link: https://lore.kernel.org/r/20220729230948.1313527-1-andrii@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-29 19:04:29 -07:00
Chuck Lever	28fffa6c57	SUNRPC: Expand the svc_alloc_arg_err tracepoint Record not only the number of pages requested, but the number of pages that were actually allocated, to get a measure of progress (or lack thereof). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2022-07-29 20:08:56 -04:00

1 2 3 4 5 ...

70416 Commits