linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-25 21:51:40 +00:00

Author	SHA1	Message	Date
Jakub Kicinski	a717932db1	netfilter pull request 24-01-24 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmWxX0UACgkQ1V2XiooU IOTN2xAAq60JXo1A05dnma3DzXxIsCQKmum+ph1ii8gvZ2qqJT5+CVk0pMuHXEfi UPt/FfGIC3WZQ7yOLLxkGeUv8g7rxCncIsJtjxQSTh1gaQhePiATMUpwIhJ5W5tq QUw6DrZfv3Y95Gth61BDokBEhGVntWTV2ra608gx5PrpXQvCid7CJqKeg3hzaoSr cWonnRsxlwHdu1R4vqrZwnEMj6BBJfviOvS9HPGEul9LRQneXNRMuEJ0L73vjU9h gdh8oQHxoAgOHsK1KczNK9no9rnGgmLS9K98tBjRYdJwfxsI4YSjL3/+FnxTgWgp GsGZP+/+aQD+GKAN/eAO7cjfaSZweqhr8JxaUdyOZRqeQYw2v3SV2zyEKTw2hahM 1JvY7oEtthMMylPxJac075jbeizoNRU9J5AeJNqWexMZK5P6Yb0BC5DRUeHC0+RN y99Yj5z4fwMmBgWdTBXnIm0k87bouQsQBDS3Rn7QTD2b1RhQTTEar+W816QyOBng 8tllV25b4SOtXc6xouqP4YgCB1wuYN74Bvxb+vgGK2mj9oTLpTMEtFXXduenqoGl wQutx0fumJMAHTIHz77RDlTAanEbFOEq35R0PV+cfm9173inI77bZyGcznWlM+Zn fM+Gr1ka4JmO6RegYJHh3kUrM2l43zVmN8O1gQCnkRuJJZ6BwVk= =SirU -----END PGP SIGNATURE----- Merge tag 'nf-24-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Update nf_tables kdoc to keep it in sync with the code, from George Guo. 2) Handle NETDEV_UNREGISTER event for inet/ingress basechain. 3) Reject configuration that cause nft_limit to overflow, from Florian Westphal. 4) Restrict anonymous set/map names to 16 bytes, from Florian Westphal. 5) Disallow to encode queue number and error in verdicts. This reverts a patch which seems to have introduced an early attempt to support for nfqueue maps, which is these days supported via nft_queue expression. 6) Sanitize family via .validate for expressions that explicitly refer to NF_INET_* hooks. * tag 'nf-24-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: validate NFPROTO_* family netfilter: nf_tables: reject QUEUE/DROP verdict parameters netfilter: nf_tables: restrict anonymous set and map names to 16 bytes netfilter: nft_limit: reject configurations that cause integer overflow netfilter: nft_chain_filter: handle NETDEV_UNREGISTER for inet/ingress basechain netfilter: nf_tables: cleanup documentation ==================== Link: https://lore.kernel.org/r/20240124191248.75463-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-24 21:03:17 -08:00
Zhipeng Lu	f6cc4b6a3a	fjes: fix memleaks in fjes_hw_setup In fjes_hw_setup, it allocates several memory and delay the deallocation to the fjes_hw_exit in fjes_probe through the following call chain: fjes_probe \|-> fjes_hw_init \|-> fjes_hw_setup \|-> fjes_hw_exit However, when fjes_hw_setup fails, fjes_hw_exit won't be called and thus all the resources allocated in fjes_hw_setup will be leaked. In this patch, we free those resources in fjes_hw_setup and prevents such leaks. Fixes: `2fcbca6877` ("fjes: platform_driver's .probe and .remove routine") Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240122172445.3841883-1-alexious@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-24 18:03:53 -08:00
Jakub Kicinski	77be22473b	Merge branch 'fix-module_description-for-net-p2' Breno Leitao says: ==================== Fix MODULE_DESCRIPTION() for net (p2) There are hundreds of network modules that misses MODULE_DESCRIPTION(), causing a warnning when compiling with W=1. Example: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/com90io.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/arc-rimi.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/com20020.o This part2 of the patchset focus on the drivers/net/ethernet drivers. There are still some missing warnings in drivers/net/ethernet that will be fixed in an upcoming patchset. v1: https://lore.kernel.org/all/20240122184543.2501493-2-leitao@debian.org/ ==================== Link: https://lore.kernel.org/r/20240123190332.677489-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-24 15:12:56 -08:00
Breno Leitao	bdc6734115	net: fill in MODULE_DESCRIPTION()s for rvu_mbox W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the Marvel RVU mbox driver. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240123190332.677489-11-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-24 15:12:21 -08:00
Breno Leitao	07d1e0ce87	net: fill in MODULE_DESCRIPTION()s for litex W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the LiteX Liteeth Ethernet device. Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Gabriel Somlo <gsomlo@gmail.com> Link: https://lore.kernel.org/r/20240123190332.677489-10-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-24 15:12:21 -08:00
Breno Leitao	8183c470c1	net: fill in MODULE_DESCRIPTION()s for fsl_pq_mdio W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the Freescale PQ MDIO driver. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240123190332.677489-9-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-24 15:12:21 -08:00
Breno Leitao	2e87576488	net: fill in MODULE_DESCRIPTION()s for fec W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the FEC (MPC8xx) Ethernet controller. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://lore.kernel.org/r/20240123190332.677489-8-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-24 15:12:20 -08:00
Breno Leitao	07c42d2375	net: fill in MODULE_DESCRIPTION()s for enetc W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the NXP ENETC Ethernet driver. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240123190332.677489-7-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-24 15:12:20 -08:00
Breno Leitao	27881ca8c8	net: fill in MODULE_DESCRIPTION()s for nps_enet W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the EZchip NPS ethernet driver. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240123190332.677489-6-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-24 15:12:20 -08:00
Breno Leitao	53c83e2d36	net: fill in MODULE_DESCRIPTION()s for ep93xxx_eth W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the Cirrus EP93xx ethernet driver. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240123190332.677489-5-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-24 15:12:20 -08:00
Breno Leitao	bb567fbbbb	net: fill in MODULE_DESCRIPTION()s for liquidio W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the Cavium Liquidio. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240123190332.677489-4-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-24 15:12:20 -08:00
Breno Leitao	39535d7ff6	net: fill in MODULE_DESCRIPTION()s for Broadcom bgmac W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the Broadcom iProc GBit driver. Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20240123190332.677489-3-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-24 15:12:20 -08:00
Breno Leitao	f5e414167b	net: fill in MODULE_DESCRIPTION()s for 8390 W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to all the good old 8390 modules and drivers. Signed-off-by: Breno Leitao <leitao@debian.org> CC: geert@linux-m68k.org Link: https://lore.kernel.org/r/20240123190332.677489-2-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-24 15:12:20 -08:00
Jakub Kicinski	0879020a78	selftests: netdevsim: fix the udp_tunnel_nic test This test is missing a whole bunch of checks for interface renaming and one ifup. Presumably it was only used on a system with renaming disabled and NetworkManager running. Fixes: `91f430b2c4` ("selftests: net: add a test for UDP tunnel info infra") Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240123060529.1033912-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-24 15:11:10 -08:00
Jakub Kicinski	0719b5338a	selftests: net: fix rps_default_mask with >32 CPUs If there is more than 32 cpus the bitmask will start to contain commas, leading to: ./rps_default_mask.sh: line 36: [: 00000000,00000000: integer expression expected Remove the commas, bash doesn't interpret leading zeroes as oct so that should be good enough. Switch to bash, Simon reports that not all shells support this type of substitution. Fixes: `c12e0d5f26` ("self-tests: introduce self-tests for RPS default mask") Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240122195815.638997-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-24 13:55:19 -08:00
Jenishkumar Maheshbhai Patel	9f538b415d	net: mvpp2: clear BM pool before initialization Register value persist after booting the kernel using kexec which results in kernel panic. Thus clear the BM pool registers before initialisation to fix the issue. Fixes: `3f518509de` ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Jenishkumar Maheshbhai Patel <jpatel2@marvell.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://lore.kernel.org/r/20240119035914.2595665-1-jpatel2@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-24 12:27:33 -08:00
Bernd Edlinger	a5f5eee282	net: stmmac: Wait a bit for the reset to take effect otherwise the synopsys_id value may be read out wrong, because the GMAC_VERSION register might still be in reset state, for at least 1 us after the reset is de-asserted. Add a wait for 10 us before continuing to be on the safe side. > From what have you got that delay value? Just try and error, with very old linux versions and old gcc versions the synopsys_id was read out correctly most of the time (but not always), with recent linux versions and recnet gcc versions it was read out wrongly most of the time, but again not always. I don't have access to the VHDL code in question, so I cannot tell why it takes so long to get the correct values, I also do not have more than a few hardware samples, so I cannot tell how long this timeout must be in worst case. Experimentally I can tell that the register is read several times as zero immediately after the reset is de-asserted, also adding several no-ops is not enough, adding a printk is enough, also udelay(1) seems to be enough but I tried that not very often, and I have not access to many hardware samples to be 100% sure about the necessary delay. And since the udelay here is only executed once per device instance, it seems acceptable to delay the boot for 10 us. BTW: my hardware's synopsys id is 0x37. Fixes: `c5e4ddbdfa` ("net: stmmac: Add support for optional reset control") Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Link: https://lore.kernel.org/r/AS8P193MB1285A810BD78C111E7F6AA34E4752@AS8P193MB1285.EURP193.PROD.OUTLOOK.COM Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-24 12:19:59 -08:00
Pablo Neira Ayuso	d0009effa8	netfilter: nf_tables: validate NFPROTO_* family Several expressions explicitly refer to NF_INET_* hook definitions from expr->ops->validate, however, family is not validated. Bail out with EOPNOTSUPP in case they are used from unsupported families. Fixes: `0ca743a559` ("netfilter: nf_tables: add compatibility layer for x_tables") Fixes: `a3c90f7a23` ("netfilter: nf_tables: flow offload expression") Fixes: `2fa841938c` ("netfilter: nf_tables: introduce routing expression") Fixes: `554ced0a6e` ("netfilter: nf_tables: add support for native socket matching") Fixes: `ad49d86e07` ("netfilter: nf_tables: Add synproxy support") Fixes: `4ed8eb6570` ("netfilter: nf_tables: Add native tproxy support") Fixes: `6c47260250` ("netfilter: nf_tables: add xfrm expression") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-01-24 20:02:40 +01:00
Florian Westphal	f342de4e2f	netfilter: nf_tables: reject QUEUE/DROP verdict parameters This reverts commit `e0abdadcc6`. core.c:nf_hook_slow assumes that the upper 16 bits of NF_DROP verdicts contain a valid errno, i.e. -EPERM, -EHOSTUNREACH or similar, or 0. Due to the reverted commit, its possible to provide a positive value, e.g. NF_ACCEPT (1), which results in use-after-free. Its not clear to me why this commit was made. NF_QUEUE is not used by nftables; "queue" rules in nftables will result in use of "nft_queue" expression. If we later need to allow specifiying errno values from userspace (do not know why), this has to call NF_DROP_GETERR and check that "err <= 0" holds true. Fixes: `e0abdadcc6` ("netfilter: nf_tables: accept QUEUE/DROP verdict parameters") Cc: stable@vger.kernel.org Reported-by: Notselwyn <notselwyn@pwning.tech> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-01-24 20:02:39 +01:00
Florian Westphal	b462579b2b	netfilter: nf_tables: restrict anonymous set and map names to 16 bytes nftables has two types of sets/maps, one where userspace defines the name, and anonymous sets/maps, where userspace defines a template name. For the latter, kernel requires presence of exactly one "%d". nftables uses "__set%d" and "__map%d" for this. The kernel will expand the format specifier and replaces it with the smallest unused number. As-is, userspace could define a template name that allows to move the set name past the 256 bytes upperlimit (post-expansion). I don't see how this could be a problem, but I would prefer if userspace cannot do this, so add a limit of 16 bytes for the '%d' template name. 16 bytes is the old total upper limit for set names that existed when nf_tables was merged initially. Fixes: `387454901b` ("netfilter: nf_tables: Allow set names of up to 255 chars") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-01-24 20:02:30 +01:00
Florian Westphal	c9d9eb9c53	netfilter: nft_limit: reject configurations that cause integer overflow Reject bogus configs where internal token counter wraps around. This only occurs with very very large requests, such as 17gbyte/s. Its better to reject this rather than having incorrect ratelimit. Fixes: `d2168e849e` ("netfilter: nft_limit: add per-byte limiting") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-01-24 20:01:16 +01:00
Pablo Neira Ayuso	01acb2e866	netfilter: nft_chain_filter: handle NETDEV_UNREGISTER for inet/ingress basechain Remove netdevice from inet/ingress basechain in case NETDEV_UNREGISTER event is reported, otherwise a stale reference to netdevice remains in the hook list. Fixes: `60a3815da7` ("netfilter: add inet ingress support") Cc: stable@vger.kernel.org Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-01-24 19:50:21 +01:00
George Guo	b253d87fd7	netfilter: nf_tables: cleanup documentation - Correct comments for nlpid, family, udlen and udata in struct nft_table, and afinfo is no longer a member of enum nft_set_class. - Add comment for data in struct nft_set_elem. - Add comment for flags in struct nft_ctx. - Add comments for timeout in struct nft_set_iter, and flags is not a member of struct nft_set_iter, remove the comment for it. - Add comments for commit, abort, estimate and gc_init in struct nft_set_ops. - Add comments for pending_update, num_exprs, exprs and catchall_list in struct nft_set. - Add comment for ext_len in struct nft_set_ext_tmpl. - Add comment for inner_ops in struct nft_expr_type. - Add comments for clone, destroy_clone, reduce, gc, offload, offload_action, offload_stats in struct nft_expr_ops. - Add comments for blob_gen_0, blob_gen_1, bound, genmask, udlen, udata, blob_next in struct nft_chain. - Add comment for flags in struct nft_base_chain. - Add comments for udlen, udata in struct nft_object. - Add comment for type in struct nft_object_ops. - Add comment for hook_list in struct nft_flowtable, and remove comments for dev_name and ops which are not members of struct nft_flowtable. Signed-off-by: George Guo <guodongtai@kylinos.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-01-24 19:50:20 +01:00
Ido Schimmel	32f2a0afa9	net/sched: flower: Fix chain template offload When a qdisc is deleted from a net device the stack instructs the underlying driver to remove its flow offload callback from the associated filter block using the 'FLOW_BLOCK_UNBIND' command. The stack then continues to replay the removal of the filters in the block for this driver by iterating over the chains in the block and invoking the 'reoffload' operation of the classifier being used. In turn, the classifier in its 'reoffload' operation prepares and emits a 'FLOW_CLS_DESTROY' command for each filter. However, the stack does not do the same for chain templates and the underlying driver never receives a 'FLOW_CLS_TMPLT_DESTROY' command when a qdisc is deleted. This results in a memory leak [1] which can be reproduced using [2]. Fix by introducing a 'tmplt_reoffload' operation and have the stack invoke it with the appropriate arguments as part of the replay. Implement the operation in the sole classifier that supports chain templates (flower) by emitting the 'FLOW_CLS_TMPLT_{CREATE,DESTROY}' command based on whether a flow offload callback is being bound to a filter block or being unbound from one. As far as I can tell, the issue happens since cited commit which reordered tcf_block_offload_unbind() before tcf_block_flush_all_chains() in __tcf_block_put(). The order cannot be reversed as the filter block is expected to be freed after flushing all the chains. [1] unreferenced object 0xffff888107e28800 (size 2048): comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s) hex dump (first 32 bytes): b1 a6 7c 11 81 88 ff ff e0 5b b3 10 81 88 ff ff ..\|......[...... 01 00 00 00 00 00 00 00 e0 aa b0 84 ff ff ff ff ................ backtrace: [<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320 [<ffffffff81ab374e>] __kmalloc+0x4e/0x90 [<ffffffff832aec6d>] mlxsw_sp_acl_ruleset_get+0x34d/0x7a0 [<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180 [<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280 [<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340 [<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0 [<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170 [<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0 [<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440 [<ffffffff83ac6270>] netlink_unicast+0x540/0x820 [<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0 [<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80 [<ffffffff8379d29a>] ___sys_sendmsg+0x13a/0x1e0 [<ffffffff8379d50c>] __sys_sendmsg+0x11c/0x1f0 [<ffffffff843b9ce0>] do_syscall_64+0x40/0xe0 unreferenced object 0xffff88816d2c0400 (size 1024): comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s) hex dump (first 32 bytes): 40 00 00 00 00 00 00 00 57 f6 38 be 00 00 00 00 @.......W.8..... 10 04 2c 6d 81 88 ff ff 10 04 2c 6d 81 88 ff ff ..,m......,m.... backtrace: [<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320 [<ffffffff81ab36c1>] __kmalloc_node+0x51/0x90 [<ffffffff81a8ed96>] kvmalloc_node+0xa6/0x1f0 [<ffffffff82827d03>] bucket_table_alloc.isra.0+0x83/0x460 [<ffffffff82828d2b>] rhashtable_init+0x43b/0x7c0 [<ffffffff832aed48>] mlxsw_sp_acl_ruleset_get+0x428/0x7a0 [<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180 [<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280 [<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340 [<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0 [<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170 [<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0 [<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440 [<ffffffff83ac6270>] netlink_unicast+0x540/0x820 [<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0 [<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80 [2] # tc qdisc add dev swp1 clsact # tc chain add dev swp1 ingress proto ip chain 1 flower dst_ip 0.0.0.0/32 # tc qdisc del dev swp1 clsact # devlink dev reload pci/0000:06:00.0 Fixes: `bbf73830cd` ("net: sched: traverse chains in block with tcf_get_next_chain()") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-24 01:33:59 +00:00
Jakub Kicinski	04fe7c5029	selftests: fill in some missing configs for net We are missing a lot of config options from net selftests, it seems: tun/tap: CONFIG_TUN, CONFIG_MACVLAN, CONFIG_MACVTAP fib_tests: CONFIG_NET_SCH_FQ_CODEL l2tp: CONFIG_L2TP, CONFIG_L2TP_V3, CONFIG_L2TP_IP, CONFIG_L2TP_ETH sctp-vrf: CONFIG_INET_DIAG txtimestamp: CONFIG_NET_CLS_U32 vxlan_mdb: CONFIG_BRIDGE_VLAN_FILTERING gre_gso: CONFIG_NET_IPGRE_DEMUX, CONFIG_IP_GRE, CONFIG_IPV6_GRE srv6_end_dt*_l3vpn: CONFIG_IPV6_SEG6_LWTUNNEL ip_local_port_range: CONFIG_MPTCP fib_test: CONFIG_NET_CLS_BASIC rtnetlink: CONFIG_MACSEC, CONFIG_NET_SCH_HTB, CONFIG_XFRM_INTERFACE CONFIG_NET_IPGRE, CONFIG_BONDING fib_nexthops: CONFIG_MPLS, CONFIG_MPLS_ROUTING vxlan_mdb: CONFIG_NET_ACT_GACT tls: CONFIG_TLS, CONFIG_CRYPTO_CHACHA20POLY1305 psample: CONFIG_PSAMPLE fcnal: CONFIG_TCP_MD5SIG Try to add them in a semi-alphabetical order. Fixes: `62199e3f16` ("selftests: net: Add VXLAN MDB test") Fixes: `c12e0d5f26` ("self-tests: introduce self-tests for RPS default mask") Fixes: `122db5e363` ("selftests/net: add MPTCP coverage for IP_LOCAL_PORT_RANGE") Link: https://lore.kernel.org/r/20240122203528.672004-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-23 17:22:58 -08:00
Michael Kelley	6941f67ad3	hv_netvsc: Calculate correct ring size when PAGE_SIZE is not 4 Kbytes Current code in netvsc_drv_init() incorrectly assumes that PAGE_SIZE is 4 Kbytes, which is wrong on ARM64 with 16K or 64K page size. As a result, the default VMBus ring buffer size on ARM64 with 64K page size is 8 Mbytes instead of the expected 512 Kbytes. While this doesn't break anything, a typical VM with 8 vCPUs and 8 netvsc channels wastes 120 Mbytes (8 channels * 2 ring buffers/channel * 7.5 Mbytes/ring buffer). Unfortunately, the module parameter specifying the ring buffer size is in units of 4 Kbyte pages. Ideally, it should be in units that are independent of PAGE_SIZE, but backwards compatibility prevents changing that now. Fix this by having netvsc_drv_init() hardcode 4096 instead of using PAGE_SIZE when calculating the ring buffer size in bytes. Also use the VMBUS_RING_SIZE macro to ensure proper alignment when running with page size larger than 4K. Cc: <stable@vger.kernel.org> # 5.15.x Fixes: `7aff79e297` ("Drivers: hv: Enable Hyper-V code to be built on ARM64") Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20240122162028.348885-1-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-23 17:19:42 -08:00
Rahul Rameshbabu	3222bc997a	Revert "net: macsec: use skb_ensure_writable_head_tail to expand the skb" This reverts commit `b34ab3527b`. Using skb_ensure_writable_head_tail without a call to skb_unshare causes the MACsec stack to operate on the original skb rather than a copy in the macsec_encrypt path. This causes the buffer to be exceeded in space, and leads to warnings generated by skb_put operations. Opting to revert this change since skb_copy_expand is more efficient than skb_ensure_writable_head_tail followed by a call to skb_unshare. Log: ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:2464! invalid opcode: 0000 [#1] SMP KASAN CPU: 21 PID: 61997 Comm: iperf3 Not tainted 6.7.0-rc8_for_upstream_debug_2024_01_07_17_05 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:skb_put+0x113/0x190 Code: 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 70 3b 9d bc 00 00 00 77 0e 48 83 c4 08 4c 89 e8 5b 5d 41 5d c3 <0f> 0b 4c 8b 6c 24 20 89 74 24 04 e8 6d b7 f0 fe 8b 74 24 04 48 c7 RSP: 0018:ffff8882694e7278 EFLAGS: 00010202 RAX: 0000000000000025 RBX: 0000000000000100 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000010 RDI: ffff88816ae0bad4 RBP: ffff88816ae0ba60 R08: 0000000000000004 R09: 0000000000000004 R10: 0000000000000001 R11: 0000000000000001 R12: ffff88811ba5abfa R13: ffff8882bdecc100 R14: ffff88816ae0ba60 R15: ffff8882bdecc0ae FS: 00007fe54df02740(0000) GS:ffff88881f080000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe54d92e320 CR3: 000000010a345003 CR4: 0000000000370eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? die+0x33/0x90 ? skb_put+0x113/0x190 ? do_trap+0x1b4/0x3b0 ? skb_put+0x113/0x190 ? do_error_trap+0xb6/0x180 ? skb_put+0x113/0x190 ? handle_invalid_op+0x2c/0x30 ? skb_put+0x113/0x190 ? exc_invalid_op+0x2b/0x40 ? asm_exc_invalid_op+0x16/0x20 ? skb_put+0x113/0x190 ? macsec_start_xmit+0x4e9/0x21d0 macsec_start_xmit+0x830/0x21d0 ? get_txsa_from_nl+0x400/0x400 ? lock_downgrade+0x690/0x690 ? dev_queue_xmit_nit+0x78b/0xae0 dev_hard_start_xmit+0x151/0x560 __dev_queue_xmit+0x1580/0x28f0 ? check_chain_key+0x1c5/0x490 ? netdev_core_pick_tx+0x2d0/0x2d0 ? __ip_queue_xmit+0x798/0x1e00 ? lock_downgrade+0x690/0x690 ? mark_held_locks+0x9f/0xe0 ip_finish_output2+0x11e4/0x2050 ? ip_mc_finish_output+0x520/0x520 ? ip_fragment.constprop.0+0x230/0x230 ? __ip_queue_xmit+0x798/0x1e00 __ip_queue_xmit+0x798/0x1e00 ? __skb_clone+0x57a/0x760 __tcp_transmit_skb+0x169d/0x3490 ? lock_downgrade+0x690/0x690 ? __tcp_select_window+0x1320/0x1320 ? mark_held_locks+0x9f/0xe0 ? lockdep_hardirqs_on_prepare+0x286/0x400 ? tcp_small_queue_check.isra.0+0x120/0x3d0 tcp_write_xmit+0x12b6/0x7100 ? skb_page_frag_refill+0x1e8/0x460 __tcp_push_pending_frames+0x92/0x320 tcp_sendmsg_locked+0x1ed4/0x3190 ? tcp_sendmsg_fastopen+0x650/0x650 ? tcp_sendmsg+0x1a/0x40 ? mark_held_locks+0x9f/0xe0 ? lockdep_hardirqs_on_prepare+0x286/0x400 tcp_sendmsg+0x28/0x40 ? inet_send_prepare+0x1b0/0x1b0 __sock_sendmsg+0xc5/0x190 sock_write_iter+0x222/0x380 ? __sock_sendmsg+0x190/0x190 ? kfree+0x96/0x130 vfs_write+0x842/0xbd0 ? kernel_write+0x530/0x530 ? __fget_light+0x51/0x220 ? __fget_light+0x51/0x220 ksys_write+0x172/0x1d0 ? update_socket_protocol+0x10/0x10 ? __x64_sys_read+0xb0/0xb0 ? lockdep_hardirqs_on_prepare+0x286/0x400 do_syscall_64+0x40/0xe0 entry_SYSCALL_64_after_hwframe+0x46/0x4e RIP: 0033:0x7fe54d9018b7 Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 RSP: 002b:00007ffdbd4191d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000025 RCX: 00007fe54d9018b7 RDX: 0000000000000025 RSI: 0000000000d9859c RDI: 0000000000000004 RBP: 0000000000d9859c R08: 0000000000000004 R09: 0000000000000000 R10: 00007fe54d80afe0 R11: 0000000000000246 R12: 0000000000000004 R13: 0000000000000025 R14: 00007fe54e00ec00 R15: 0000000000d982a0 </TASK> Modules linked in: 8021q garp mrp iptable_raw bonding vfio_pci rdma_ucm ib_umad mlx5_vfio_pci mlx5_ib vfio_pci_core vfio_iommu_type1 ib_uverbs vfio mlx5_core ip_gre nf_tables ipip tunnel4 ib_ipoib ip6_gre gre ip6_tunnel tunnel6 geneve openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core zram zsmalloc fuse [last unloaded: ib_uverbs] ---[ end trace 0000000000000000 ]--- Cc: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Cc: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Link: https://lore.kernel.org/r/20240118191811.50271-1-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-23 17:17:04 -08:00
Jakub Kicinski	1347775dea	wireless fixes for v6.8-rc2 The most visible fix here is the ath11k crash fix which was introduced in v6.7. We also have a fix for iwlwifi memory corruption and few smaller fixes in the stack. -----BEGIN PGP SIGNATURE----- iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmWuipMRHGt2YWxvQGtl cm5lbC5vcmcACgkQbhckVSbrbZt17wgAhrkxpwRpMuRrV6VxHl9m+NXk7is2vni2 JZbqlvMIw1Hm+40K9D0WgFdNZUeAtBcd567MAbiqdzqRNB9DtEvnsXIKlKINwxIA QFskkXR1f0sj79Hz3q7iWQq+jxDvAU5tge/WU65Na7+224sdyzBg7DZab8/buOsm 1xdx69MtGNU+dm4+V1Xp8h9jB7WAjq7N+ZhC6YfH6QSCL7JSL9Co/NC098gBnAEx cm59vPOxk8+QoHKDjjmClTIhxOEgR6pSM8T3Dne9OYO8ONhxqdVSgd0Br+mEZgQ4 r61i88zK6ZmVZYckk6fhuGCLiKC6CFwS0eCLDQnKK1ufyRxDi84Y/Q== =Cwmf -----END PGP SIGNATURE----- Merge tag 'wireless-2024-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Kalle Valo says: ==================== wireless fixes for v6.8-rc2 The most visible fix here is the ath11k crash fix which was introduced in v6.7. We also have a fix for iwlwifi memory corruption and few smaller fixes in the stack. * tag 'wireless-2024-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211: fix race condition on enabling fast-xmit wifi: iwlwifi: fix a memory corruption wifi: mac80211: fix potential sta-link leak wifi: cfg80211/mac80211: remove dependency on non-existing option wifi: cfg80211: fix missing interfaces when dumping wifi: ath11k: rely on mac80211 debugfs handling for vif wifi: p54: fix GCC format truncation warning with wiphy->fw_version ==================== Link: https://lore.kernel.org/r/20240122153434.E0254C433C7@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-23 08:38:13 -08:00
Zhengchao Shao	435e202d64	ipv6: init the accept_queue's spinlocks in inet6_create In commit 198bc90e0e73("tcp: make sure init the accept_queue's spinlocks once"), the spinlocks of accept_queue are initialized only when socket is created in the inet4 scenario. The locks are not initialized when socket is created in the inet6 scenario. The kernel reports the following error: INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:107) register_lock_class (kernel/locking/lockdep.c:1289) __lock_acquire (kernel/locking/lockdep.c:5015) lock_acquire.part.0 (kernel/locking/lockdep.c:5756) _raw_spin_lock_bh (kernel/locking/spinlock.c:178) inet_csk_listen_stop (net/ipv4/inet_connection_sock.c:1386) tcp_disconnect (net/ipv4/tcp.c:2981) inet_shutdown (net/ipv4/af_inet.c:935) __sys_shutdown (./include/linux/file.h:32 net/socket.c:2438) __x64_sys_shutdown (net/socket.c:2445) do_syscall_64 (arch/x86/entry/common.c:52) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) RIP: 0033:0x7f52ecd05a3d Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ab a3 0e 00 f7 d8 64 89 01 48 RSP: 002b:00007f52ecf5dde8 EFLAGS: 00000293 ORIG_RAX: 0000000000000030 RAX: ffffffffffffffda RBX: 00007f52ecf5e640 RCX: 00007f52ecd05a3d RDX: 00007f52ecc8b188 RSI: 0000000000000000 RDI: 0000000000000004 RBP: 00007f52ecf5de20 R08: 00007ffdae45c69f R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000293 R12: 00007f52ecf5e640 R13: 0000000000000000 R14: 00007f52ecc8b060 R15: 00007ffdae45c6e0 Fixes: `198bc90e0e` ("tcp: make sure init the accept_queue's spinlocks once") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240122102001.2851701-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-01-23 13:44:50 +01:00
Zhengchao Shao	234ec0b603	netlink: fix potential sleeping issue in mqueue_flush_file I analyze the potential sleeping issue of the following processes: Thread A Thread B ... netlink_create //ref = 1 do_mq_notify ... sock = netlink_getsockbyfilp ... //ref = 2 info->notify_sock = sock; ... ... netlink_sendmsg ... skb = netlink_alloc_large_skb //skb->head is vmalloced ... netlink_unicast ... sk = netlink_getsockbyportid //ref = 3 ... netlink_sendskb ... __netlink_sendskb ... skb_queue_tail //put skb to sk_receive_queue ... sock_put //ref = 2 ... ... ... netlink_release ... deferred_put_nlk_sk //ref = 1 mqueue_flush_file spin_lock remove_notification netlink_sendskb sock_put //ref = 0 sk_free ... __sk_destruct netlink_sock_destruct skb_queue_purge //get skb from sk_receive_queue ... __skb_queue_purge_reason kfree_skb_reason __kfree_skb ... skb_release_all skb_release_head_state netlink_skb_destructor vfree(skb->head) //sleeping while holding spinlock In netlink_sendmsg, if the memory pointed to by skb->head is allocated by vmalloc, and is put to sk_receive_queue queue, also the skb is not freed. When the mqueue executes flush, the sleeping bug will occur. Use vfree_atomic instead of vfree in netlink_skb_destructor to solve the issue. Fixes: `c05cdb1b86` ("netlink: allow large data transfers from user-space") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20240122011807.2110357-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-01-23 11:21:18 +01:00
Kuniyuki Iwashima	97de5a15ed	selftest: Don't reuse port for SO_INCOMING_CPU test. Jakub reported that ASSERT_EQ(cpu, i) in so_incoming_cpu.c seems to fire somewhat randomly. # # RUN so_incoming_cpu.before_reuseport.test3 ... # # so_incoming_cpu.c:191:test3:Expected cpu (32) == i (0) # # test3: Test terminated by assertion # # FAIL so_incoming_cpu.before_reuseport.test3 # not ok 3 so_incoming_cpu.before_reuseport.test3 When the test failed, not-yet-accepted CLOSE_WAIT sockets received SYN with a "challenging" SEQ number, which was sent from an unexpected CPU that did not create the receiver. The test basically does: 1. for each cpu: 1-1. create a server 1-2. set SO_INCOMING_CPU 2. for each cpu: 2-1. set cpu affinity 2-2. create some clients 2-3. let clients connect() to the server on the same cpu 2-4. close() clients 3. for each server: 3-1. accept() all child sockets 3-2. check if all children have the same SO_INCOMING_CPU with the server The root cause was the close() in 2-4. and net.ipv4.tcp_tw_reuse. In a loop of 2., close() changed the client state to FIN_WAIT_2, and the peer transitioned to CLOSE_WAIT. In another loop of 2., connect() happened to select the same port of the FIN_WAIT_2 socket, and it was reused as the default value of net.ipv4.tcp_tw_reuse is 2. As a result, the new client sent SYN to the CLOSE_WAIT socket from a different CPU, and the receiver's sk_incoming_cpu was overwritten with unexpected CPU ID. Also, the SYN had a different SEQ number, so the CLOSE_WAIT socket responded with Challenge ACK. The new client properly returned RST and effectively killed the CLOSE_WAIT socket. This way, all clients were created successfully, but the error was detected later by 3-2., ASSERT_EQ(cpu, i). To avoid the failure, let's make sure that (i) the number of clients is less than the number of available ports and (ii) such reuse never happens. Fixes: `6df96146b2` ("selftest: Add test for SO_INCOMING_CPU.") Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Tested-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20240120031642.67014-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-01-23 10:48:07 +01:00
Salvatore Dipietro	7267e8dcad	tcp: Add memory barrier to tcp_push() On CPUs with weak memory models, reads and updates performed by tcp_push to the sk variables can get reordered leaving the socket throttled when it should not. The tasklet running tcp_wfree() may also not observe the memory updates in time and will skip flushing any packets throttled by tcp_push(), delaying the sending. This can pathologically cause 40ms extra latency due to bad interactions with delayed acks. Adding a memory barrier in tcp_push removes the bug, similarly to the previous commit `bf06200e73` ("tcp: tsq: fix nonagle handling"). smp_mb__after_atomic() is used to not incur in unnecessary overhead on x86 since not affected. Patch has been tested using an AWS c7g.2xlarge instance with Ubuntu 22.04 and Apache Tomcat 9.0.83 running the basic servlet below: import java.io.IOException; import java.io.OutputStreamWriter; import java.io.PrintWriter; import javax.servlet.ServletException; import javax.servlet.http.HttpServlet; import javax.servlet.http.HttpServletRequest; import javax.servlet.http.HttpServletResponse; public class HelloWorldServlet extends HttpServlet { @Override protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { response.setContentType("text/html;charset=utf-8"); OutputStreamWriter osw = new OutputStreamWriter(response.getOutputStream(),"UTF-8"); String s = "a".repeat(3096); osw.write(s,0,s.length()); osw.flush(); } } Load was applied using wrk2 (https://github.com/kinvolk/wrk2) from an AWS c6i.8xlarge instance. Before the patch an additional 40ms latency from P99.99+ values is observed while, with the patch, the extra latency disappears. No patch and tcp_autocorking=1 ./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.60.173:8080/hello/hello ... 50.000% 0.91ms 75.000% 1.13ms 90.000% 1.46ms 99.000% 1.74ms 99.900% 1.89ms 99.990% 41.95ms <<< 40+ ms extra latency 99.999% 48.32ms 100.000% 48.96ms With patch and tcp_autocorking=1 ./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.60.173:8080/hello/hello ... 50.000% 0.90ms 75.000% 1.13ms 90.000% 1.45ms 99.000% 1.72ms 99.900% 1.83ms 99.990% 2.11ms <<< no 40+ ms extra latency 99.999% 2.53ms 100.000% 2.62ms Patch has been also tested on x86 (m7i.2xlarge instance) which it is not affected by this issue and the patch doesn't introduce any additional delay. Fixes: `7aa5470c2c` ("tcp: tsq: move tsq_flags close to sk_wmem_alloc") Signed-off-by: Salvatore Dipietro <dipiets@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240119190133.43698-1-dipiets@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-01-23 10:22:31 +01:00
Sharath Srinivasan	13e788deb7	net/rds: Fix UBSAN: array-index-out-of-bounds in rds_cmsg_recv Syzcaller UBSAN crash occurs in rds_cmsg_recv(), which reads inc->i_rx_lat_trace[j + 1] with index 4 (3 + 1), but with array size of 4 (RDS_RX_MAX_TRACES). Here 'j' is assigned from rs->rs_rx_trace[i] and in-turn from trace.rx_trace_pos[i] in rds_recv_track_latency(), with both arrays sized 3 (RDS_MSG_RX_DGRAM_TRACE_MAX). So fix the off-by-one bounds check in rds_recv_track_latency() to prevent a potential crash in rds_cmsg_recv(). Found by syzcaller: ================================================================= UBSAN: array-index-out-of-bounds in net/rds/recv.c:585:39 index 4 is out of range for type 'u64 [4]' CPU: 1 PID: 8058 Comm: syz-executor228 Not tainted 6.6.0-gd2f51b3516da #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x136/0x150 lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:217 [inline] __ubsan_handle_out_of_bounds+0xd5/0x130 lib/ubsan.c:348 rds_cmsg_recv+0x60d/0x700 net/rds/recv.c:585 rds_recvmsg+0x3fb/0x1610 net/rds/recv.c:716 sock_recvmsg_nosec net/socket.c:1044 [inline] sock_recvmsg+0xe2/0x160 net/socket.c:1066 __sys_recvfrom+0x1b6/0x2f0 net/socket.c:2246 __do_sys_recvfrom net/socket.c:2264 [inline] __se_sys_recvfrom net/socket.c:2260 [inline] __x64_sys_recvfrom+0xe0/0x1b0 net/socket.c:2260 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b ================================================================== Fixes: `3289025aed` ("RDS: add receive message trace used by application") Reported-by: Chenyuan Yang <chenyuan0y@gmail.com> Closes: https://lore.kernel.org/linux-rdma/CALGdzuoVdq-wtQ4Az9iottBqC5cv9ZhcE5q8N7LfYFvkRsOVcw@mail.gmail.com/ Signed-off-by: Sharath Srinivasan <sharath.srinivasan@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-22 11:24:00 +00:00
Horatiu Vultur	aaf632f7ab	net: micrel: Fix PTP frame parsing for lan8814 The HW has the capability to check each frame if it is a PTP frame, which domain it is, which ptp frame type it is, different ip address in the frame. And if one of these checks fail then the frame is not timestamp. Most of these checks were disabled except checking the field minorVersionPTP inside the PTP header. Meaning that once a partner sends a frame compliant to 8021AS which has minorVersionPTP set to 1, then the frame was not timestamp because the HW expected by default a value of 0 in minorVersionPTP. This is exactly the same issue as on lan8841. Fix this issue by removing this check so the userspace can decide on this. Fixes: `ece1950283` ("net: phy: micrel: 1588 support for LAN8814 phy") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Divya Koppera <divya.koppera@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-22 11:02:23 +00:00
David S. Miller	94fa82b095	Merge branch 'dpll-fixes' Arkadiusz Kubalewski says: ==================== dpll: fix unordered unbind/bind registerer issues Fix issues when performing unordered unbind/bind of a kernel modules which are using a dpll device with DPLL_PIN_TYPE_MUX pins. Currently only serialized bind/unbind of such use case works, fix the issues and allow for unserialized kernel module bind order. The issues are observed on the ice driver, i.e., $ echo 0000:af:00.0 > /sys/bus/pci/drivers/ice/unbind $ echo 0000:af:00.1 > /sys/bus/pci/drivers/ice/unbind results in: ice 0000:af:00.0: Removed PTP clock BUG: kernel NULL pointer dereference, address: 0000000000000010 PF: supervisor read access in kernel mode PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 7 PID: 71848 Comm: bash Kdump: loaded Not tainted 6.6.0-rc5_next-queue_19th-Oct-2023-01625-g039e5d15e451 #1 Hardware name: Intel Corporation S2600STB/S2600STB, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 RIP: 0010:ice_dpll_rclk_state_on_pin_get+0x2f/0x90 [ice] Code: 41 57 4d 89 cf 41 56 41 55 4d 89 c5 41 54 55 48 89 f5 53 4c 8b 66 08 48 89 cb 4d 8d b4 24 f0 49 00 00 4c 89 f7 e8 71 ec 1f c5 <0f> b6 5b 10 41 0f b6 84 24 30 4b 00 00 29 c3 41 0f b6 84 24 28 4b RSP: 0018:ffffc902b179fb60 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff8882c1398000 RSI: ffff888c7435cc60 RDI: ffff888c7435cb90 RBP: ffff888c7435cc60 R08: ffffc902b179fbb0 R09: 0000000000000000 R10: ffff888ef1fc8050 R11: fffffffffff82700 R12: ffff888c743581a0 R13: ffffc902b179fbb0 R14: ffff888c7435cb90 R15: 0000000000000000 FS: 00007fdc7dae0740(0000) GS:ffff888c105c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 0000000132c24002 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __die+0x20/0x70 ? page_fault_oops+0x76/0x170 ? exc_page_fault+0x65/0x150 ? asm_exc_page_fault+0x22/0x30 ? ice_dpll_rclk_state_on_pin_get+0x2f/0x90 [ice] ? __pfx_ice_dpll_rclk_state_on_pin_get+0x10/0x10 [ice] dpll_msg_add_pin_parents+0x142/0x1d0 dpll_pin_event_send+0x7d/0x150 dpll_pin_on_pin_unregister+0x3f/0x100 ice_dpll_deinit_pins+0xa1/0x230 [ice] ice_dpll_deinit+0x29/0xe0 [ice] ice_remove+0xcd/0x200 [ice] pci_device_remove+0x33/0xa0 device_release_driver_internal+0x193/0x200 unbind_store+0x9d/0xb0 kernfs_fop_write_iter+0x128/0x1c0 vfs_write+0x2bb/0x3e0 ksys_write+0x5f/0xe0 do_syscall_64+0x59/0x90 ? filp_close+0x1b/0x30 ? do_dup2+0x7d/0xd0 ? syscall_exit_work+0x103/0x130 ? syscall_exit_to_user_mode+0x22/0x40 ? do_syscall_64+0x69/0x90 ? syscall_exit_work+0x103/0x130 ? syscall_exit_to_user_mode+0x22/0x40 ? do_syscall_64+0x69/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7fdc7d93eb97 Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 RSP: 002b:00007fff2aa91028 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fdc7d93eb97 RDX: 000000000000000d RSI: 00005644814ec9b0 RDI: 0000000000000001 RBP: 00005644814ec9b0 R08: 0000000000000000 R09: 00007fdc7d9b14e0 R10: 00007fdc7d9b13e0 R11: 0000000000000246 R12: 000000000000000d R13: 00007fdc7d9fb780 R14: 000000000000000d R15: 00007fdc7d9f69e0 </TASK> Modules linked in: uinput vfio_pci vfio_pci_core vfio_iommu_type1 vfio irqbypass ixgbevf snd_seq_dummy snd_hrtimer snd_seq snd_timer snd_seq_device snd soundcore overlay qrtr rfkill vfat fat xfs libcrc32c rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common skx_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp irdma rapl intel_cstate ib_uverbs iTCO_wdt iTCO_vendor_support acpi_ipmi intel_uncore mei_me ipmi_si pcspkr i2c_i801 ib_core mei ipmi_devintf intel_pch_thermal ioatdma i2c_smbus ipmi_msghandler lpc_ich joydev acpi_power_meter acpi_pad ext4 mbcache jbd2 sd_mod t10_pi sg ast i2c_algo_bit drm_shmem_helper drm_kms_helper ice crct10dif_pclmul ixgbe crc32_pclmul drm crc32c_intel ahci i40e libahci ghash_clmulni_intel libata mdio dca gnss wmi fuse [last unloaded: iavf] CR2: 0000000000000010 v6: - fix memory corruption on error path in patch [v5 2/4] ==================== Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-22 11:01:11 +00:00
Arkadiusz Kubalewski	7dc5b18ff7	dpll: fix register pin with unregistered parent pin In case of multiple kernel module instances using the same dpll device: if only one registers dpll device, then only that one can register directly connected pins with a dpll device. When unregistered parent is responsible for determining if the muxed pin can be registered with it or not, the drivers need to be loaded in serialized order to work correctly - first the driver instance which registers the direct pins needs to be loaded, then the other instances could register muxed type pins. Allow registration of a pin with a parent even if the parent was not yet registered, thus allow ability for unserialized driver instance load order. Do not WARN_ON notification for unregistered pin, which can be invoked for described case, instead just return error. Fixes: `9431063ad3` ("dpll: core: Add DPLL framework base functions") Fixes: `9d71b54b65` ("dpll: netlink: Add DPLL framework base functions") Reviewed-by: Jan Glaza <jan.glaza@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-22 11:01:11 +00:00
Arkadiusz Kubalewski	db2ec3c946	dpll: fix userspace availability of pins If parent pin was unregistered but child pin was not, the userspace would see the "zombie" pins - the ones that were registered with a parent pin (dpll_pin_on_pin_register(..)). Technically those are not available - as there is no dpll device in the system. Do not dump those pins and prevent userspace from any interaction with them. Provide a unified function to determine if the pin is available and use it before acting/responding for user requests. Fixes: `9d71b54b65` ("dpll: netlink: Add DPLL framework base functions") Reviewed-by: Jan Glaza <jan.glaza@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-22 11:01:11 +00:00
Arkadiusz Kubalewski	830ead5fb0	dpll: fix pin dump crash for rebound module When a kernel module is unbound but the pin resources were not entirely freed (other kernel module instance of the same PCI device have had kept the reference to that pin), and kernel module is again bound, the pin properties would not be updated (the properties are only assigned when memory for the pin is allocated), prop pointer still points to the kernel module memory of the kernel module which was deallocated on the unbind. If the pin dump is invoked in this state, the result is a kernel crash. Prevent the crash by storing persistent pin properties in dpll subsystem, copy the content from the kernel module when pin is allocated, instead of using memory of the kernel module. Fixes: `9431063ad3` ("dpll: core: Add DPLL framework base functions") Fixes: `9d71b54b65` ("dpll: netlink: Add DPLL framework base functions") Reviewed-by: Jan Glaza <jan.glaza@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-22 11:01:11 +00:00
Arkadiusz Kubalewski	b6a11a7fc4	dpll: fix broken error path in dpll_pin_alloc(..) If pin type is not expected, or pin properities failed to allocate memory, the unwind error path shall not destroy pin's xarrays, which were not yet initialized. Add new goto label and use it to fix broken error path. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-22 11:01:11 +00:00
David S. Miller	ef48521672	Merge branch 'tun-fixes' Yunjian Wang says: ==================== fixes for tun There are few places on the receive path where packet receives and packet drops were not accounted for. This patchset fixes that issue. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-22 10:58:05 +00:00
Yunjian Wang	f1084c427f	tun: add missing rx stats accounting in tun_xdp_act The TUN can be used as vhost-net backend, and it is necessary to count the packets transmitted from TUN to vhost-net/virtio-net. However, there are some places in the receive path that were not taken into account when using XDP. It would be beneficial to also include new accounting for successfully received bytes using dev_sw_netstats_rx_add. Fixes: `761876c857` ("tap: XDP support") Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-22 10:58:04 +00:00
Yunjian Wang	5744ba05e7	tun: fix missing dropped counter in tun_xdp_act The commit `8ae1aff0b3` ("tuntap: split out XDP logic") includes dropped counter for XDP_DROP, XDP_ABORTED, and invalid XDP actions. Unfortunately, that commit missed the dropped counter when error occurs during XDP_TX and XDP_REDIRECT actions. This patch fixes this issue. Fixes: `8ae1aff0b3` ("tuntap: split out XDP logic") Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-22 10:58:04 +00:00
Jakub Kicinski	d09486a04f	net: fix removing a namespace with conflicting altnames Mark reports a BUG() when a net namespace is removed. kernel BUG at net/core/dev.c:11520! Physical interfaces moved outside of init_net get "refunded" to init_net when that namespace disappears. The main interface name may get overwritten in the process if it would have conflicted. We need to also discard all conflicting altnames. Recent fixes addressed ensuring that altnames get moved with the main interface, which surfaced this problem. Reported-by: Марк Коренберг <socketpair@gmail.com> Link: https://lore.kernel.org/all/CAEmTpZFZ4Sv3KwqFOY2WKDHeZYdi0O7N5H1nTvcGp=SAEavtDg@mail.gmail.com/ Fixes: `7663d52209` ("net: check for altname conflicts when changing netdev's netns") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-22 01:09:30 +00:00
Michal Schmidt	359724fa3a	idpf: distinguish vports by the dev_port attribute idpf registers multiple netdevs (virtual ports) for one PCI function, but it does not provide a way for userspace to distinguish them with sysfs attributes. Per Documentation/ABI/testing/sysfs-class-net, it is a bug not to set dev_port for independent ports on the same PCI bus, device and function. Without dev_port set, systemd-udevd's default naming policy attempts to assign the same name ("ens2f0") to all four idpf netdevs on my test system and obviously fails, leaving three of them with the initial eth<N> name. With this patch, systemd-udevd is able to assign unique names to the netdevs (e.g. "ens2f0", "ens2f0d1", "ens2f0d2", "ens2f0d3"). The Intel-provided out-of-tree idpf driver already sets dev_port. In this patch I chose to do it in the same place in the idpf_cfg_netdev function. Fixes: `0fe45467a1` ("idpf: add create vport and netdev configuration") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-21 18:11:09 +00:00
Eric Dumazet	a54d51fb2d	udp: fix busy polling Generic sk_busy_loop_end() only looks at sk->sk_receive_queue for presence of packets. Problem is that for UDP sockets after blamed commit, some packets could be present in another queue: udp_sk(sk)->reader_queue In some cases, a busy poller could spin until timeout expiration, even if some packets are available in udp_sk(sk)->reader_queue. v3: - make sk_busy_loop_end() nicer (Willem) v2: - add a READ_ONCE(sk->sk_family) in sk_is_inet() to avoid KCSAN splats. - add a sk_is_inet() check in sk_is_udp() (Willem feedback) - add a sk_is_inet() check in sk_is_tcp(). Fixes: `2276f58ac5` ("udp: use a separate rx queue for packet reception") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-01-21 18:09:30 +00:00
Kuniyuki Iwashima	e3f9bed9be	llc: Drop support for ETH_P_TR_802_2. syzbot reported an uninit-value bug below. [0] llc supports ETH_P_802_2 (0x0004) and used to support ETH_P_TR_802_2 (0x0011), and syzbot abused the latter to trigger the bug. write$tun(r0, &(0x7f0000000040)={@val={0x0, 0x11}, @val, @mpls={[], @llc={@snap={0xaa, 0x1, ')', "90e5dd"}}}}, 0x16) llc_conn_handler() initialises local variables {saddr,daddr}.mac based on skb in llc_pdu_decode_sa()/llc_pdu_decode_da() and passes them to __llc_lookup(). However, the initialisation is done only when skb->protocol is htons(ETH_P_802_2), otherwise, __llc_lookup_established() and __llc_lookup_listener() will read garbage. The missing initialisation existed prior to commit `211ed86510` ("net: delete all instances of special processing for token ring"). It removed the part to kick out the token ring stuff but forgot to close the door allowing ETH_P_TR_802_2 packets to sneak into llc_rcv(). Let's remove llc_tr_packet_type and complete the deprecation. [0]: BUG: KMSAN: uninit-value in __llc_lookup_established+0xe9d/0xf90 __llc_lookup_established+0xe9d/0xf90 __llc_lookup net/llc/llc_conn.c:611 [inline] llc_conn_handler+0x4bd/0x1360 net/llc/llc_conn.c:791 llc_rcv+0xfbb/0x14a0 net/llc/llc_input.c:206 __netif_receive_skb_one_core net/core/dev.c:5527 [inline] __netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5641 netif_receive_skb_internal net/core/dev.c:5727 [inline] netif_receive_skb+0x58/0x660 net/core/dev.c:5786 tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1555 tun_get_user+0x53af/0x66d0 drivers/net/tun.c:2002 tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048 call_write_iter include/linux/fs.h:2020 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x8ef/0x1490 fs/read_write.c:584 ksys_write+0x20f/0x4c0 fs/read_write.c:637 __do_sys_write fs/read_write.c:649 [inline] __se_sys_write fs/read_write.c:646 [inline] __x64_sys_write+0x93/0xd0 fs/read_write.c:646 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b Local variable daddr created at: llc_conn_handler+0x53/0x1360 net/llc/llc_conn.c:783 llc_rcv+0xfbb/0x14a0 net/llc/llc_input.c:206 CPU: 1 PID: 5004 Comm: syz-executor994 Not tainted 6.6.0-syzkaller-14500-g1c41041124bd #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023 Fixes: `211ed86510` ("net: delete all instances of special processing for token ring") Reported-by: syzbot+b5ad66046b913bc04c6f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b5ad66046b913bc04c6f Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240119015515.61898-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-19 21:30:09 -08:00
Eric Dumazet	dad555c816	llc: make llc_ui_sendmsg() more robust against bonding changes syzbot was able to trick llc_ui_sendmsg(), allocating an skb with no headroom, but subsequently trying to push 14 bytes of Ethernet header [1] Like some others, llc_ui_sendmsg() releases the socket lock before calling sock_alloc_send_skb(). Then it acquires it again, but does not redo all the sanity checks that were performed. This fix: - Uses LL_RESERVED_SPACE() to reserve space. - Check all conditions again after socket lock is held again. - Do not account Ethernet header for mtu limitation. [1] skbuff: skb_under_panic: text:ffff800088baa334 len:1514 put:14 head:ffff0000c9c37000 data:ffff0000c9c36ff2 tail:0x5dc end:0x6c0 dev:bond0 kernel BUG at net/core/skbuff.c:193 ! Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 6875 Comm: syz-executor.0 Not tainted 6.7.0-rc8-syzkaller-00101-g0802e17d9aca-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023 pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : skb_panic net/core/skbuff.c:189 [inline] pc : skb_under_panic+0x13c/0x140 net/core/skbuff.c:203 lr : skb_panic net/core/skbuff.c:189 [inline] lr : skb_under_panic+0x13c/0x140 net/core/skbuff.c:203 sp : ffff800096f97000 x29: ffff800096f97010 x28: ffff80008cc8d668 x27: dfff800000000000 x26: ffff0000cb970c90 x25: 00000000000005dc x24: ffff0000c9c36ff2 x23: ffff0000c9c37000 x22: 00000000000005ea x21: 00000000000006c0 x20: 000000000000000e x19: ffff800088baa334 x18: 1fffe000368261ce x17: ffff80008e4ed000 x16: ffff80008a8310f8 x15: 0000000000000001 x14: 1ffff00012df2d58 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000001 x10: 0000000000ff0100 x9 : e28a51f1087e8400 x8 : e28a51f1087e8400 x7 : ffff80008028f8d0 x6 : 0000000000000000 x5 : 0000000000000001 x4 : 0000000000000001 x3 : ffff800082b78714 x2 : 0000000000000001 x1 : 0000000100000000 x0 : 0000000000000089 Call trace: skb_panic net/core/skbuff.c:189 [inline] skb_under_panic+0x13c/0x140 net/core/skbuff.c:203 skb_push+0xf0/0x108 net/core/skbuff.c:2451 eth_header+0x44/0x1f8 net/ethernet/eth.c:83 dev_hard_header include/linux/netdevice.h:3188 [inline] llc_mac_hdr_init+0x110/0x17c net/llc/llc_output.c:33 llc_sap_action_send_xid_c+0x170/0x344 net/llc/llc_s_ac.c:85 llc_exec_sap_trans_actions net/llc/llc_sap.c:153 [inline] llc_sap_next_state net/llc/llc_sap.c:182 [inline] llc_sap_state_process+0x1ec/0x774 net/llc/llc_sap.c:209 llc_build_and_send_xid_pkt+0x12c/0x1c0 net/llc/llc_sap.c:270 llc_ui_sendmsg+0x7bc/0xb1c net/llc/af_llc.c:997 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] sock_sendmsg+0x194/0x274 net/socket.c:767 splice_to_socket+0x7cc/0xd58 fs/splice.c:881 do_splice_from fs/splice.c:933 [inline] direct_splice_actor+0xe4/0x1c0 fs/splice.c:1142 splice_direct_to_actor+0x2a0/0x7e4 fs/splice.c:1088 do_splice_direct+0x20c/0x348 fs/splice.c:1194 do_sendfile+0x4bc/0xc70 fs/read_write.c:1254 __do_sys_sendfile64 fs/read_write.c:1322 [inline] __se_sys_sendfile64 fs/read_write.c:1308 [inline] __arm64_sys_sendfile64+0x160/0x3b4 fs/read_write.c:1308 __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155 el0_svc+0x54/0x158 arch/arm64/kernel/entry-common.c:678 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:696 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:595 Code: aa1803e6 aa1903e7 a90023f5 94792f6a (d4210000) Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-and-tested-by: syzbot+2a7024e9502df538e8ef@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240118183625.4007013-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-19 21:28:50 -08:00
Lin Ma	6c21660fe2	vlan: skip nested type that is not IFLA_VLAN_QOS_MAPPING In the vlan_changelink function, a loop is used to parse the nested attributes IFLA_VLAN_EGRESS_QOS and IFLA_VLAN_INGRESS_QOS in order to obtain the struct ifla_vlan_qos_mapping. These two nested attributes are checked in the vlan_validate_qos_map function, which calls nla_validate_nested_deprecated with the vlan_map_policy. However, this deprecated validator applies a LIBERAL strictness, allowing the presence of an attribute with the type IFLA_VLAN_QOS_UNSPEC. Consequently, the loop in vlan_changelink may parse an attribute of type IFLA_VLAN_QOS_UNSPEC and believe it carries a payload of struct ifla_vlan_qos_mapping, which is not necessarily true. To address this issue and ensure compatibility, this patch introduces two type checks that skip attributes whose type is not IFLA_VLAN_QOS_MAPPING. Fixes: `07b5b17e15` ("[VLAN]: Use rtnl_link API") Signed-off-by: Lin Ma <linma@zju.edu.cn> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240118130306.1644001-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-19 21:25:06 -08:00
Jakub Kicinski	9b6979563b	Merge branch 'bnxt_en-bug-fixes' Michael Chan says: ==================== bnxt_en: Bug fixes This series contains 5 miscellaneous fixes. The fixes include adding delay for FLR, buffer memory leak, RSS table size calculation, ethtool self test kernel warning, and mqprio crash. ==================== Link: https://lore.kernel.org/r/20240117234515.226944-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-19 21:15:18 -08:00
Michael Chan	467739baf6	bnxt_en: Fix possible crash after creating sw mqprio TCs The driver relies on netdev_get_num_tc() to get the number of HW offloaded mqprio TCs to allocate and free TX rings. This won't work and can potentially crash the system if software mqprio or taprio TCs have been setup. netdev_get_num_tc() will return the number of software TCs and it may cause the driver to allocate or free more TX rings that it should. Fix it by adding a bp->num_tc field to store the number of HW offload mqprio TCs for the device. Use bp->num_tc instead of netdev_get_num_tc(). This fixes a crash like this: BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 42b8404067 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 120 PID: 8661 Comm: ifconfig Kdump: loaded Tainted: G OE 5.18.16 #1 Hardware name: Lenovo ThinkSystem SR650 V3/SB27A92818, BIOS ESE114N-2.12 04/25/2023 RIP: 0010:bnxt_hwrm_cp_ring_alloc_p5+0x10/0x90 [bnxt_en] Code: 41 5c 41 5d 41 5e c3 cc cc cc cc 41 8b 44 24 08 66 89 03 eb c6 e8 b0 f1 7d db 0f 1f 44 00 00 41 56 41 55 41 54 55 48 89 fd 53 <48> 8b 06 48 89 f3 48 81 c6 28 01 00 00 0f b6 96 13 ff ff ff 44 8b RSP: 0018:ff65907660d1fa88 EFLAGS: 00010202 RAX: 0000000000000010 RBX: ff4dde1d907e4980 RCX: f400000000000000 RDX: 0000000000000010 RSI: 0000000000000000 RDI: ff4dde1d907e4980 RBP: ff4dde1d907e4980 R08: 000000000000000f R09: 0000000000000000 R10: ff4dde5f02671800 R11: 0000000000000008 R12: 0000000088888889 R13: 0500000000000000 R14: 00f0000000000000 R15: ff4dde5f02671800 FS: 00007f4b126b5740(0000) GS:ff4dde9bff600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000416f9c6002 CR4: 0000000000771ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> bnxt_hwrm_ring_alloc+0x204/0x770 [bnxt_en] bnxt_init_chip+0x4d/0x680 [bnxt_en] ? bnxt_poll+0x1a0/0x1a0 [bnxt_en] __bnxt_open_nic+0xd2/0x740 [bnxt_en] bnxt_open+0x10b/0x220 [bnxt_en] ? raw_notifier_call_chain+0x41/0x60 __dev_open+0xf3/0x1b0 __dev_change_flags+0x1db/0x250 dev_change_flags+0x21/0x60 devinet_ioctl+0x590/0x720 ? avc_has_extended_perms+0x1b7/0x420 ? _copy_from_user+0x3a/0x60 inet_ioctl+0x189/0x1c0 ? wp_page_copy+0x45a/0x6e0 sock_do_ioctl+0x42/0xf0 ? ioctl_has_perm.constprop.0.isra.0+0xbd/0x120 sock_ioctl+0x1ce/0x2e0 __x64_sys_ioctl+0x87/0xc0 do_syscall_64+0x59/0x90 ? syscall_exit_work+0x103/0x130 ? syscall_exit_to_user_mode+0x12/0x30 ? do_syscall_64+0x69/0x90 ? exc_page_fault+0x62/0x150 Fixes: `c0c050c58d` ("bnxt_en: New Broadcom ethernet driver.") Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240117234515.226944-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-19 21:15:13 -08:00

1 2 3 4 5 ...

1247975 Commits