linux

Author	SHA1	Message	Date
Menglong Dong	10580c4791	net: ipv4: use kfree_skb_reason() in ip_protocol_deliver_rcu() Replace kfree_skb() with kfree_skb_reason() in ip_protocol_deliver_rcu(). Following new drop reasons are introduced: SKB_DROP_REASON_XFRM_POLICY SKB_DROP_REASON_IP_NOPROTO Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-07 11:18:49 +00:00
Menglong Dong	c1f166d1f7	net: ipv4: use kfree_skb_reason() in ip_rcv_finish_core() Replace kfree_skb() with kfree_skb_reason() in ip_rcv_finish_core(), following drop reasons are introduced: SKB_DROP_REASON_IP_RPFILTER SKB_DROP_REASON_UNICAST_IN_L2_MULTICAST Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-07 11:18:49 +00:00
Menglong Dong	33cba42985	net: ipv4: use kfree_skb_reason() in ip_rcv_core() Replace kfree_skb() with kfree_skb_reason() in ip_rcv_core(). Three new drop reasons are introduced: SKB_DROP_REASON_OTHERHOST SKB_DROP_REASON_IP_CSUM SKB_DROP_REASON_IP_INHDR Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-07 11:18:49 +00:00
Menglong Dong	2df3041ba3	net: netfilter: use kfree_drop_reason() for NF_DROP Replace kfree_skb() with kfree_skb_reason() in nf_hook_slow() when skb is dropped by reason of NF_DROP. Following new drop reasons are introduced: SKB_DROP_REASON_NETFILTER_DROP Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-07 11:18:49 +00:00
Eric Dumazet	28f9222138	net/smc: fix ref_tracker issue in smc_pnet_add() I added the netdev_tracker_alloc() right after ndev was stored into the newly allocated object: new_pe->ndev = ndev; if (ndev) netdev_tracker_alloc(ndev, &new_pe->dev_tracker, GFP_KERNEL); But I missed that later, we could end up freeing new_pe, then calling dev_put(ndev) to release the reference on ndev. The new_pe->dev_tracker would not be freed. To solve this issue, move the netdev_tracker_alloc() call to the point we know for sure new_pe will be kept. syzbot report (on net-next tree, but the bug is present in net tree) WARNING: CPU: 0 PID: 6019 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Modules linked in: CPU: 0 PID: 6019 Comm: syz-executor.3 Not tainted 5.17.0-rc2-syzkaller-00650-g5a8fb33e5305 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Code: 1d f4 70 a0 09 31 ff 89 de e8 4d bc 99 fd 84 db 75 e0 e8 64 b8 99 fd 48 c7 c7 20 0c 06 8a c6 05 d4 70 a0 09 01 e8 9e 4e 28 05 <0f> 0b eb c4 e8 48 b8 99 fd 0f b6 1d c3 70 a0 09 31 ff 89 de e8 18 RSP: 0018:ffffc900043b7400 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000040000 RSI: ffffffff815fb318 RDI: fffff52000876e72 RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815f507e R11: 0000000000000000 R12: 1ffff92000876e85 R13: 0000000000000000 R14: ffff88805c1c6600 R15: 0000000000000000 FS: 00007f1ef6feb700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2d02b000 CR3: 00000000223f4000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __refcount_dec include/linux/refcount.h:344 [inline] refcount_dec include/linux/refcount.h:359 [inline] ref_tracker_free+0x53f/0x6c0 lib/ref_tracker.c:119 netdev_tracker_free include/linux/netdevice.h:3867 [inline] dev_put_track include/linux/netdevice.h:3884 [inline] dev_put_track include/linux/netdevice.h:3880 [inline] dev_put include/linux/netdevice.h:3910 [inline] smc_pnet_add_eth net/smc/smc_pnet.c:399 [inline] smc_pnet_enter net/smc/smc_pnet.c:493 [inline] smc_pnet_add+0x5fc/0x15f0 net/smc/smc_pnet.c:556 genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:731 genl_family_rcv_msg net/netlink/genetlink.c:775 [inline] genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:792 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494 genl_rcv+0x24/0x40 net/netlink/genetlink.c:803 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:725 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: `b60645248a` ("net/smc: add net device tracker to struct smc_pnetentry") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-06 11:08:03 +00:00
Eric Dumazet	9c1be1935f	net: initialize init_net earlier While testing a patch that will follow later ("net: add netns refcount tracker to struct nsproxy") I found that devtmpfs_init() was called before init_net was initialized. This is a bug, because devtmpfs_setup() calls ksys_unshare(CLONE_NEWNS); This has the effect of increasing init_net refcount, which will be later overwritten to 1, as part of setup_net(&init_net) We had too many prior patches [1] trying to work around the root cause. Really, make sure init_net is in BSS section, and that net_ns_init() is called earlier at boot time. Note that another patch ("vfs: add netns refcount tracker to struct fs_context") also will need net_ns_init() being called before vfs_caches_init() As a bonus, this patch saves around 4KB in .data section. [1] `f8c46cb390` ("netns: do not call pernet ops for not yet set up init_net namespace") `b5082df801` ("net: Initialise init_net.count to 1") `734b65417b` ("net: Statically initialize init_net.dev_base_head") v2: fixed a build error reported by kernel build bots (CONFIG_NET=n) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-06 11:04:29 +00:00
Juhee Kang	4acc45db71	net: hsr: use hlist_head instead of list_head for mac addresses Currently, HSR manages mac addresses of known HSR nodes by using list_head. It takes a lot of time when there are a lot of registered nodes due to finding specific mac address nodes by using linear search. We can be reducing the time by using hlist. Thus, this patch moves list_head to hlist_head for mac addresses and this allows for further improvement of network performance. Condition: registered 10,000 known HSR nodes Before: # iperf3 -c 192.168.10.1 -i 1 -t 10 Connecting to host 192.168.10.1, port 5201 [ 5] local 192.168.10.2 port 59442 connected to 192.168.10.1 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.49 sec 3.75 MBytes 21.1 Mbits/sec 0 158 KBytes [ 5] 1.49-2.05 sec 1.25 MBytes 18.7 Mbits/sec 0 166 KBytes [ 5] 2.05-3.06 sec 2.44 MBytes 20.3 Mbits/sec 56 16.9 KBytes [ 5] 3.06-4.08 sec 1.43 MBytes 11.7 Mbits/sec 11 38.0 KBytes [ 5] 4.08-5.00 sec 951 KBytes 8.49 Mbits/sec 0 56.3 KBytes After: # iperf3 -c 192.168.10.1 -i 1 -t 10 Connecting to host 192.168.10.1, port 5201 [ 5] local 192.168.10.2 port 36460 connected to 192.168.10.1 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 7.39 MBytes 62.0 Mbits/sec 3 130 KBytes [ 5] 1.00-2.00 sec 5.06 MBytes 42.4 Mbits/sec 16 113 KBytes [ 5] 2.00-3.00 sec 8.58 MBytes 72.0 Mbits/sec 42 94.3 KBytes [ 5] 3.00-4.00 sec 7.44 MBytes 62.4 Mbits/sec 2 131 KBytes [ 5] 4.00-5.07 sec 8.13 MBytes 63.5 Mbits/sec 38 92.9 KBytes Signed-off-by: Juhee Kang <claudiajkang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-06 10:55:52 +00:00
Eric Dumazet	5a8fb33e53	skmsg: convert struct sk_msg_sg::copy to a bitmap We have plans for increasing MAX_SKB_FRAGS, but sk_msg_sg::copy is currently an unsigned long, limiting MAX_SKB_FRAGS to 30 on 32bit arches. Convert it to a bitmap, as Jakub suggested. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-05 15:34:47 +00:00
Eric Dumazet	4c6c11ea0f	net: refine dev_put()/dev_hold() debugging We are still chasing some syzbot reports where we think a rogue dev_put() is called with no corresponding prior dev_hold(). Unfortunately it eats a reference on dev->dev_refcnt taken by innocent dev_hold_track(), meaning that the refcount saturation splat comes too late to be useful. Make sure that 'not tracked' dev_put() and dev_hold() better use CONFIG_NET_DEV_REFCNT_TRACKER=y debug infrastructure: Prior patch in the series allowed ref_tracker_alloc() and ref_tracker_free() to be called with a NULL @trackerp parameter, and to use a separate refcount only to detect too many put() even in the following case: dev_hold_track(dev, tracker_1, GFP_ATOMIC); dev_hold(dev); dev_put(dev); dev_put(dev); // Should complain loudly here. dev_put_track(dev, tracker_1); // instead of here Add clarification about netdev_tracker_alloc() role. v2: I replaced the dev_put() in linkwatch_do_dev() with __dev_put() because callers called netdev_tracker_free(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-05 15:22:45 +00:00
Eric Dumazet	f2f2325ec7	ip6mr: ip6mr_sk_done() can exit early in common cases In many cases, ip6mr_sk_done() is called while no ipmr socket has been registered. This removes 4 rtnl acquisitions per netns dismantle, with following callers: igmp6_net_exit(), tcpv6_net_exit(), ndisc_net_exit() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-05 15:20:34 +00:00
Eric Dumazet	145c7a7938	ipv6: make mc_forwarding atomic This fixes minor data-races in ip6_mc_input() and batadv_mcast_mla_rtr_flags_softif_get_ipv6() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-05 15:20:34 +00:00
Paolo Abeni	de5a1f3ce4	net: gro: minor optimization for dev_gro_receive() While inspecting some perf report, I noticed that the compiler emits suboptimal code for the napi CB initialization, fetching and storing multiple times the memory for flags bitfield. This is with gcc 10.3.1, but I observed the same with older compiler versions. We can help the compiler to do a nicer work clearing several fields at once using an u32 alias. The generated code is quite smaller, with the same number of conditional. Before: objdump -t net/core/gro.o \| grep " F .text" 0000000000000bb0 l F .text 0000000000000357 dev_gro_receive After: 0000000000000bb0 l F .text 000000000000033c dev_gro_receive v1 -> v2: - use struct_group (Alexander and Alex) RFC -> v1: - use __struct_group to delimit the zeroed area (Alexander) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-05 15:13:52 +00:00
Paolo Abeni	7881453e4a	net: gro: avoid re-computing truesize twice on recycle After commit `5e10da5385` ("skbuff: allow 'slow_gro' for skb carring sock reference") and commit `af352460b4` ("net: fix GRO skb truesize update") the truesize of the skb with stolen head is properly updated by the GRO engine, we don't need anymore resetting it at recycle time. v1 -> v2: - clarify the commit message (Alexander) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-05 15:13:52 +00:00
Paul Blakey	35d39fecbc	net/sched: Enable tc skb ext allocation on chain miss only when needed Currently tc skb extension is used to send miss info from tc to ovs datapath module, and driver to tc. For the tc to ovs miss it is currently always allocated even if it will not be used by ovs datapath (as it depends on a requested feature). Export the static key which is used by openvswitch module to guard this code path as well, so it will be skipped if ovs datapath doesn't need it. Enable this code path once ovs datapath needs it. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-05 10:12:53 +00:00
Geliang Tang	09f12c3ab7	mptcp: allow to use port and non-signal in set_flags It's illegal to use both port and non-signal flags for adding address. But it's legal to use both of them for setting flags, which always uses non-signal flags, backup or fullmesh. This patch moves this non-signal flag with port check from mptcp_pm_parse_addr() to mptcp_nl_cmd_add_addr(). Do the check only when adding addresses, not setting flags or deleting addresses. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-04 20:30:23 -08:00
Justin Iurman	08731d30e7	ipv6: ioam: Insertion frequency in lwtunnel output Add support for the IOAM insertion frequency inside its lwtunnel output function. This patch introduces a new (atomic) counter for packets, based on which the algorithm will decide if IOAM should be added or not. Default frequency is "1/1" (i.e., applied to all packets) for backward compatibility. The iproute2 patch is ready and will be submitted as soon as this one is accepted. Previous iproute2 command: ip -6 ro ad fc00::1/128 encap ioam6 [ mode ... ] ... New iproute2 command: ip -6 ro ad fc00::1/128 encap ioam6 [ freq k/n ] [ mode ... ] ... Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-04 20:24:45 -08:00
Eric Dumazet	f8d9d93851	tcp: take care of mixed splice()/sendmsg(MSG_ZEROCOPY) case syzbot found that mixing sendpage() and sendmsg(MSG_ZEROCOPY) calls over the same TCP socket would again trigger the infamous warning in inet_sock_destruct() WARN_ON(sk_forward_alloc_get(sk)); While Talal took into account a mix of regular copied data and MSG_ZEROCOPY one in the same skb, the sendpage() path has been forgotten. We want the charging to happen for sendpage(), because pages could be coming from a pipe. What is missing is the downgrading of pure zerocopy status to make sure sk_forward_alloc will stay synced. Add tcp_downgrade_zcopy_pure() helper so that we can use it from the two callers. Fixes: `9b65b17db7` ("net: avoid double accounting for pure zerocopy skbs") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Talal Ahmad <talalahmad@google.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Link: https://lore.kernel.org/r/20220203225547.665114-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-04 20:07:12 -08:00
Jakub Kicinski	c78b8b20e3	net: don't include ndisc.h from ipv6.h Nothing in ipv6.h needs ndisc.h, drop it. Link: https://lore.kernel.org/r/20220203043457.2222388-1-kuba@kernel.org Acked-by: Jeremy Kerr <jk@codeconstruct.com.au> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> Link: https://lore.kernel.org/r/20220203231240.2297588-1-kuba@kernel.org Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-04 14:15:11 -08:00
Linus Torvalds	cff7f2237c	Merge tag 'ceph-for-5.17-rc3' of git://github.com/ceph/ceph-client Pull ceph fixes from Ilya Dryomov: "A patch to make it possible to disable zero copy path in the messenger to avoid checksum or authentication tag mismatches and ensuing session resets in case the destination buffer isn't guaranteed to be stable" * tag 'ceph-for-5.17-rc3' of git://github.com/ceph/ceph-client: libceph: optionally use bounce buffer on recv path in crc mode libceph: make recv path in secure mode work the same as send path	2022-02-04 09:54:02 -08:00
Johannes Berg	f0a6fd1527	cfg80211: fix race in netlink owner interface destruction My previous fix here to fix the deadlock left a race where the exact same deadlock (see the original commit referenced below) can still happen if cfg80211_destroy_ifaces() already runs while nl80211_netlink_notify() is still marking some interfaces as nl_owner_dead. The race happens because we have two loops here - first we dev_close() all the netdevs, and then we destroy them. If we also have two netdevs (first one need only be a wdev though) then we can find one during the first iteration, close it, and go to the second iteration -- but then find two, and try to destroy also the one we didn't close yet. Fix this by only iterating once. Reported-by: Toke Høiland-Jørgensen <toke@redhat.com> Fixes: `ea6b2098dd` ("cfg80211: fix locking in netlink owner interface destruction") Tested-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20220201130951.22093-1-johannes@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-02-04 16:31:44 +01:00
Jiapeng Chong	c761161851	mac80211: Remove redundent assignment channel_type Fix the following coccicheck warnings: net/mac80211/util.c:3265:3: warning: Value stored to 'channel_type' is never read [clang-analyzer-deadcode.DeadStores]. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://lore.kernel.org/r/20220113161557.129427-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-02-04 16:27:45 +01:00
Baligh Gasmi	45d33746d2	mac80211: remove useless ieee80211_vif_is_mesh() check We check ieee80211_vif_is_mesh() at the top if() block, there's no need to check for it again. Signed-off-by: Baligh Gasmi <gasmibal@gmail.com> Link: https://lore.kernel.org/r/20220203153035.198697-1-gasmibal@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-02-04 16:27:07 +01:00
Avraham Stern	ea5907db2a	mac80211: fix struct ieee80211_tx_info size The size of the status_driver_data field was not adjusted when the is_valid_ack_signal field was added. Since the size of struct ieee80211_tx_info is limited, replace the is_valid_ack_signal field with a flags field, and adjust the struct size accordingly. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20220202104617.0ff363d4fa56.I45792c0187034a6d0e1c99a7db741996ef7caba3@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-02-04 16:26:53 +01:00
Mordechay Goodstein	97634ef4bf	mac80211: mlme: validate peer HE supported rates We validate that AP has mandatory rates set in HE capabilities. Also we make sure AP is consistent with itself on rates set in HE basic rates required joining the BSS and rates set in HE capabilities. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20220202104617.7023450fdf16.I194df59252097ba25a0a543456d4350f1607a538@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-02-04 16:26:40 +01:00
Johannes Berg	453a2a8205	mac80211: remove unused macros Various macros in mac80211 aren't used, remove them. In one case it's used under ifdef, so ifdef it for the W=2 warning. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20220202104617.5172d7fd878e.I2f1fce686a2b71003f083b2566fb09cf16b8165a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-02-04 16:26:27 +01:00
Johannes Berg	1b198233a3	cfg80211: pmsr: remove useless ifdef guards This isn't a header file, I guess I must've copied from the header file and forgotten to remove the guards. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20220202104617.330d03623b08.Idda91cd6f1c7bd865a50c47d408e5cdab0fd951f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-02-04 16:26:16 +01:00
Johannes Berg	ae962e5f63	mac80211: airtime: avoid variable shadowing This isn't very dangerous, since the outer 'rate' variable isn't even a pointer, but it's still confusing, so use a different variable inside. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20220202104617.8e9b2bfaa0f5.I41c53f754eef28206d04dafc7263ccb99b63d490@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-02-04 16:25:55 +01:00
Mordechay Goodstein	6ad1dce5eb	mac80211: mlme: add documentation from spec to code Reference the spec why we decline HE support in case STA don't support all HE basic rates recurred by AP. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20220202104617.f1bafd0861b7.I566612d99bca5245dc06cbcc70369b94a525389c@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-02-04 16:25:45 +01:00
Mordechay Goodstein	abd5a8e5cc	mac80211: vht: use HE macros for parsing HE capabilities IEEE80211_VHT_MCS_NOT_SUPPORTED and IEEE80211_HE_MCS_NOT_SUPPORTED have the same value so no real bug, but for code integrity use the HE macros for parsing HE capabilities. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20220202104617.e974b7b3b217.I732cc7f770c7fa06e4840adb5d45d7ee99ac8eb5@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-02-04 16:25:32 +01:00
Avraham Stern	5666ee154f	cfg80211: don't add non transmitted BSS to 6GHz scanned channels When adding 6GHz channels to scan request based on reported co-located APs, don't add channels that have only APs with "non-transmitted" BSSes if they only match the wildcard SSID since they will be found by probing the "transmitted" BSS. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20220202104617.f6ddf099f934.I231e55885d3644f292d00dfe0f42653269f2559e@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-02-04 16:24:41 +01:00
Johannes Berg	667aa74264	cfg80211/mac80211: assume CHECKSUM_COMPLETE includes SNAP There's currently only one driver that reports CHECKSUM_COMPLETE, that is iwlwifi. The current hardware there calculates checksum after the SNAP header, but only RFC 1042 (and some other cases, but replicating the exact hardware logic for corner cases in the driver seemed awkward.) Newer generations of hardware will checksum _including_ the SNAP, which makes things easier. To handle that, simply always assume the checksum _includes_ the SNAP header, which this patch does, requiring to first add it for older iwlwifi hardware, and then remove it again later on conversion. Alternatively, we could have: 1) Always assumed the checksum starts _after_ the SNAP header; the problem with this is that we'd have to replace the exact "what is the SNAP" check in iwlwifi that cfg80211 has. 2) Made it configurable with some flag, but that seemed like too much complexity. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20220202104617.230736e19e0e.I3e6745873585ad943c152fab9e23b5221f17a95f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-02-04 16:23:19 +01:00
Mordechay Goodstein	f39b7d62a1	mac80211: consider RX NSS in UHB connection In UHB connection we don't have any HT/VHT elemens so in order to calculate the max RX-NSS we need also to look at HE capa element, this causes to limit us to max rx nss in UHB to 1. Also anyway we need to look at HE max rx NSS and not only at HT/VHT capa to determine the max rx nss over the connection. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20220202104617.3713e0dea5dd.I3b9a15b4c53465c3f86f35459e9dc15ae4ea2abd@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-02-04 16:23:01 +01:00
Johannes Berg	1f2c104448	mac80211: limit bandwidth in HE capabilities If we're limiting bandwidth for some reason such as regulatory restrictions, then advertise that limitation just like we do for VHT today, so the AP is aware we cannot use the higher BW it might be using. Fixes: `41cbb0f5a2` ("mac80211: add support for HE") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20220202104617.70c8e3e7ee76.If317630de69ff1146bec7d47f5b83038695eb71d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-02-04 16:22:39 +01:00
Jakub Kicinski	b93235e689	tls: cap the output scatter list to something reasonable TLS recvmsg() passes user pages as destination for decrypt. The decrypt operation is repeated record by record, each record being 16kB, max. TLS allocates an sg_table and uses iov_iter_get_pages() to populate it with enough pages to fit the decrypted record. Even though we decrypt a single message at a time we size the sg_table based on the entire length of the iovec. This leads to unnecessarily large allocations, risking triggering OOM conditions. Use iov_iter_truncate() / iov_iter_reexpand() to construct a "capped" version of iov_iter_npages(). Alternatively we could parametrize iov_iter_npages() to take the size as arg instead of using i->count, or do something else.. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-04 10:14:07 +00:00
Florian Westphal	c828414ac9	netfilter: nft_compat: suppress comment match No need to have the datapath call the always-true comment match stub. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-02-04 06:30:28 +01:00
Florian Westphal	7890cbea66	netfilter: exthdr: add support for tcp option removal This allows to replace a tcp option with nop padding to selectively disable a particular tcp option. Optstrip mode is chosen when userspace passes the exthdr expression with neither a source nor a destination register attribute. This is identical to xtables TCPOPTSTRIP extension. The only difference is that TCPOPTSTRIP allows to pass in a bitmap of options to remove rather than a single number. Unlike TCPOPTSTRIP this expression can be used multiple times in the same rule to get the same effect. We could add a new nested attribute later on in case there is a use case for single-expression-multi-remove. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-02-04 06:30:28 +01:00
Florian Westphal	20ff320246	netfilter: conntrack: pptp: use single option structure Instead of exposing the four hooks individually use a sinle hook ops structure. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-02-04 06:30:28 +01:00
Florian Westphal	1015c3de23	netfilter: conntrack: remove extension register api These no longer register/unregister a meaningful structure so remove it. Cc: Paul Blakey <paulb@nvidia.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-02-04 06:30:28 +01:00
Florian Westphal	1bc91a5ddf	netfilter: conntrack: handle ->destroy hook via nat_ops instead The nat module already exposes a few functions to the conntrack core. Move the nat extension destroy hook to it. After this, no conntrack extension needs a destroy hook. 'struct nf_ct_ext_type' and the register/unregister api can be removed in a followup patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-02-04 06:30:28 +01:00
Florian Westphal	5f31edc067	netfilter: conntrack: move extension sizes into core No need to specify this in the registration modules, we already collect all sizes for build-time checks on the maximum combined size. After this change, all extensions except nat have no meaningful content in their nf_ct_ext_type struct definition. Next patch handles nat, this will then allow to remove the dynamic register api completely. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-02-04 06:30:28 +01:00
Florian Westphal	bb62a765b1	netfilter: conntrack: make all extensions 8-byte alignned All extensions except one need 8 byte alignment, so just make that the default. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-02-04 06:30:28 +01:00
Nicolas Dichtel	8b54136472	netfilter: nfqueue: enable to get skb->priority This info could be useful to improve traffic analysis. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-02-04 06:30:27 +01:00
Kevin Mitchell	5bed9f3f63	netfilter: conntrack: mark UDP zero checksum as CHECKSUM_UNNECESSARY The udp_error function verifies the checksum of incoming UDP packets if one is set. This has the desirable side effect of setting skb->ip_summed to CHECKSUM_COMPLETE, signalling that this verification need not be repeated further up the stack. Conversely, when the UDP checksum is empty, which is perfectly legal (at least inside IPv4), udp_error previously left no trace that the checksum had been deemed acceptable. This was a problem in particular for nf_reject_ipv4, which verifies the checksum in nf_send_unreach() before sending ICMP_DEST_UNREACH. It makes no accommodation for zero UDP checksums unless they are already marked as CHECKSUM_UNNECESSARY. This commit ensures packets with empty UDP checksum are marked as CHECKSUM_UNNECESSARY, which is explicitly recommended in skbuff.h. Signed-off-by: Kevin Mitchell <kevmitch@arista.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-02-04 06:30:27 +01:00
Florian Westphal	d1ca60efc5	netfilter: ctnetlink: disable helper autoassign When userspace, e.g. conntrackd, inserts an entry with a specified helper, its possible that the helper is lost immediately after its added: ctnetlink_create_conntrack -> nf_ct_helper_ext_add + assign helper -> ctnetlink_setup_nat -> ctnetlink_parse_nat_setup -> parse_nat_setup -> nfnetlink_parse_nat_setup -> nf_nat_setup_info -> nf_conntrack_alter_reply -> __nf_ct_try_assign_helper ... and __nf_ct_try_assign_helper will zero the helper again. Set IPS_HELPER bit to bypass auto-assign logic, its unwanted, just like when helper is assigned via ruleset. Dropped old 'not strictly necessary' comment, it referred to use of rcu_assign_pointer() before it got replaced by RCU_INIT_POINTER(). NB: Fixes tag intentionally incorrect, this extends the referenced commit, but this change won't build without IPS_HELPER introduced there. Fixes: `6714cf5465` ("netfilter: nf_conntrack: fix explicit helper attachment and NAT") Reported-by: Pham Thanh Tuyen <phamtyn@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-02-04 05:39:57 +01:00
Florian Westphal	82b72cb946	netfilter: conntrack: re-init state for retransmitted syn-ack TCP conntrack assumes that a syn-ack retransmit is identical to the previous syn-ack. This isn't correct and causes stuck 3whs in some more esoteric scenarios. tcpdump to illustrate the problem: client > server: Flags [S] seq 1365731894, win 29200, [mss 1460,sackOK,TS val 2083035583 ecr 0,wscale 7] server > client: Flags [S.] seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215367629 ecr 2082921663] Note the invalid/outdated synack ack number. Conntrack marks this syn-ack as out-of-window/invalid, but it did initialize the reply direction parameters based on this packets content. client > server: Flags [S] seq 1365731894, win 29200, [mss 1460,sackOK,TS val 2083036623 ecr 0,wscale 7] ... retransmit... server > client: Flags [S.], seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215368644 ecr 2082921663] and another bogus synack. This repeats, then client re-uses for a new attempt: client > server: Flags [S], seq 2375731741, win 29200, [mss 1460,sackOK,TS val 2083100223 ecr 0,wscale 7] server > client: Flags [S.], seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215430754 ecr 2082921663] ... but still gets a invalid syn-ack. This repeats until: server > client: Flags [S.], seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215437785 ecr 2082921663] server > client: Flags [R.], seq 145824454, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215443451 ecr 2082921663] client > server: Flags [S], seq 2375731741, win 29200, [mss 1460,sackOK,TS val 2083115583 ecr 0,wscale 7] server > client: Flags [S.], seq 162602410, ack 2375731742, win 65535, [mss 8952,wscale 5,TS val 3215445754 ecr 2083115583] This syn-ack has the correct ack number, but conntrack flags it as invalid: The internal state was created from the first syn-ack seen so the sequence number of the syn-ack is treated as being outside of the announced window. Don't assume that retransmitted syn-ack is identical to previous one. Treat it like the first syn-ack and reinit state. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-02-04 05:39:51 +01:00
Florian Westphal	cc4f9d6203	netfilter: conntrack: move synack init code to helper It seems more readable to use a common helper in the followup fix rather than copypaste or goto. No functional change intended. The function is only called for syn-ack or syn in repy direction in case of simultaneous open. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-02-04 05:39:27 +01:00
Florian Westphal	a9e8503def	netfilter: nft_payload: don't allow th access for fragments Loads relative to ->thoff naturally expect that this points to the transport header, but this is only true if pkt->fragoff == 0. This has little effect for rulesets with connection tracking/nat because these enable ip defra. For other rulesets this prevents false matches. Fixes: `96518518cc` ("netfilter: add nftables") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-02-04 05:38:15 +01:00
Florian Westphal	77b337196a	netfilter: conntrack: don't refresh sctp entries in closed state Vivek Thrivikraman reported: An SCTP server application which is accessed continuously by client application. When the session disconnects the client retries to establish a connection. After restart of SCTP server application the session is not established because of stale conntrack entry with connection state CLOSED as below. (removing this entry manually established new connection): sctp 9 CLOSED src=10.141.189.233 [..] [ASSURED] Just skip timeout update of closed entries, we don't want them to stay around forever. Reported-and-tested-by: Vivek Thrivikraman <vivek.thrivikraman@est.tech> Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1579 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-02-04 05:38:15 +01:00
Eric Dumazet	25ee1660a5	net: minor __dev_alloc_name() optimization __dev_alloc_name() allocates a private zeroed page, then sets bits in it while iterating through net devices. It can use __set_bit() to avoid unnecessary locked operations. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220203064609.3242863-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-03 19:11:14 -08:00
Jakub Kicinski	c59400a68c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-03 17:36:16 -08:00

... 29 30 31 32 33 ...

69599 Commits