linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-04 18:13:04 +00:00

Author	SHA1	Message	Date
Jakub Kicinski	d4031ec844	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. Conflicts: net/ipv4/raw.c `3632679d9e` ("ipv{4,6}/raw: fix output xfrm lookup wrt protocol") `c85be08fc4` ("raw: Stop using RTO_ONLINK.") https://lore.kernel.org/all/20230525110037.2b532b83@canb.auug.org.au/ Adjacent changes: drivers/net/ethernet/freescale/fec_main.c `9025944fdd` ("net: fec: add dma_wmb to ensure correct descriptor values") `144470c88c` ("net: fec: using the standard return codes when xdp xmit errors") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-25 19:57:39 -07:00
Niklas Schnelle	657d42cf5d	s390/ism: Set DMA coherent mask A future change will convert the DMA API implementation from the architecture specific arch/s390/pci/pci_dma.c to using the common code drivers/iommu/dma-iommu.c which the utilizes the same IOMMU hardware through the s390-iommu driver. Unlike the s390 specific DMA API this requires devices to correctly set the coherent mask to be allowed to use IOVAs >2^32 in dma_alloc_coherent(). This was however not done for ISM devices. ISM requires such addresses since currently the DMA aperture for PCI devices starts at 2^32 and all calls to dma_alloc_coherent() would thus fail. Link: https://lore.kernel.org/all/20230310-dma_iommu-v9-1-65bb8edd2beb@linux.ibm.com/ Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20230524075411.3734141-1-schnelle@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-05-25 15:39:58 +02:00
Christophe JAILLET	623a713853	net/mlx4: Use bitmap_weight_and() Use bitmap_weight_and() instead of hand writing it. This saves a few LoC and is slightly faster, should it mater. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/a29c2348a062408bec45cee2601b2417310e5ea7.1684865809.git.christophe.jaillet@wanadoo.fr Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-05-25 13:35:08 +02:00
Paolo Abeni	e8f8b33230	Merge branch 'net-tcp-make-txhash-use-consistent-for-ipv4' Antoine Tenart says: ==================== net: tcp: make txhash use consistent for IPv4 Series is divided in two parts. First two commits make the txhash (used for the skb hash in TCP) to be consistent for all IPv4/TCP packets (IPv6 doesn't have the same issue). Last commit improve a hash-related doc. One example is when using OvS with dp_hash, which uses skb->hash, to select a path. We'd like packets from the same flow to be consistent, as well as the hash being stable over time when using net.core.txrehash=0. Same applies for kernel ECMP which also can use skb->hash. ==================== Link: https://lore.kernel.org/r/20230523161453.196094-1-atenart@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-05-25 13:20:48 +02:00
Antoine Tenart	7016eb7386	Documentation: net: net.core.txrehash is not specific to listening sockets The net.core.txrehash documentation mentions this knob is for listening sockets only, while sk_rethink_txhash can be called on SYN and RTO retransmits on all TCP sockets. Remove the listening socket part. Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-05-25 13:20:45 +02:00
Antoine Tenart	c0a8966e2b	net: ipv4: use consistent txhash in TIME_WAIT and SYN_RECV When using IPv4/TCP, skb->hash comes from sk->sk_txhash except in TIME_WAIT and SYN_RECV where it's not set in the reply skb from ip_send_unicast_reply. Those packets will have a mismatched hash with others from the same flow as their hashes will be 0. IPv6 does not have the same issue as the hash is set from the socket txhash in those cases. This commits sets the hash in the reply skb from ip_send_unicast_reply, which makes the IPv4 code behaving like IPv6. Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-05-25 13:20:45 +02:00
Antoine Tenart	4fbfde4e27	net: tcp: make the txhash available in TIME_WAIT sockets for IPv4 too Commit `c67b85558f` ("ipv6: tcp: send consistent autoflowlabel in TIME_WAIT state") made the socket txhash also available in TIME_WAIT sockets but for IPv6 only. Make it available for IPv4 too as we'll use it in later commits. Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-05-25 13:20:45 +02:00
Kuniyuki Iwashima	ad42a35bdf	udplite: Fix NULL pointer dereference in __sk_mem_raise_allocated(). syzbot reported [0] a null-ptr-deref in sk_get_rmem0() while using IPPROTO_UDPLITE (0x88): 14:25:52 executing program 1: r0 = socket$inet6(0xa, 0x80002, 0x88) We had a similar report [1] for probably sk_memory_allocated_add() in __sk_mem_raise_allocated(), and commit `c915fe13cb` ("udplite: fix NULL pointer dereference") fixed it by setting .memory_allocated for udplite_prot and udplitev6_prot. To fix the variant, we need to set either .sysctl_wmem_offset or .sysctl_rmem. Now UDP and UDPLITE share the same value for .memory_allocated, so we use the same .sysctl_wmem_offset for UDP and UDPLITE. [0]: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 0 PID: 6829 Comm: syz-executor.1 Not tainted 6.4.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/28/2023 RIP: 0010:sk_get_rmem0 include/net/sock.h:2907 [inline] RIP: 0010:__sk_mem_raise_allocated+0x806/0x17a0 net/core/sock.c:3006 Code: c1 ea 03 80 3c 02 00 0f 85 23 0f 00 00 48 8b 44 24 08 48 8b 98 38 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 da 48 c1 ea 03 <0f> b6 14 02 48 89 d8 83 e0 07 83 c0 03 38 d0 0f 8d 6f 0a 00 00 8b RSP: 0018:ffffc90005d7f450 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90004d92000 RDX: 0000000000000000 RSI: ffffffff88066482 RDI: ffffffff8e2ccbb8 RBP: ffff8880173f7000 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000030000 R13: 0000000000000001 R14: 0000000000000340 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff8880b9800000(0063) knlGS:00000000f7f1cb40 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 000000002e82f000 CR3: 0000000034ff0000 CR4: 00000000003506f0 Call Trace: <TASK> __sk_mem_schedule+0x6c/0xe0 net/core/sock.c:3077 udp_rmem_schedule net/ipv4/udp.c:1539 [inline] __udp_enqueue_schedule_skb+0x776/0xb30 net/ipv4/udp.c:1581 __udpv6_queue_rcv_skb net/ipv6/udp.c:666 [inline] udpv6_queue_rcv_one_skb+0xc39/0x16c0 net/ipv6/udp.c:775 udpv6_queue_rcv_skb+0x194/0xa10 net/ipv6/udp.c:793 __udp6_lib_mcast_deliver net/ipv6/udp.c:906 [inline] __udp6_lib_rcv+0x1bda/0x2bd0 net/ipv6/udp.c:1013 ip6_protocol_deliver_rcu+0x2e7/0x1250 net/ipv6/ip6_input.c:437 ip6_input_finish+0x150/0x2f0 net/ipv6/ip6_input.c:482 NF_HOOK include/linux/netfilter.h:303 [inline] NF_HOOK include/linux/netfilter.h:297 [inline] ip6_input+0xa0/0xd0 net/ipv6/ip6_input.c:491 ip6_mc_input+0x40b/0xf50 net/ipv6/ip6_input.c:585 dst_input include/net/dst.h:468 [inline] ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline] NF_HOOK include/linux/netfilter.h:303 [inline] NF_HOOK include/linux/netfilter.h:297 [inline] ipv6_rcv+0x250/0x380 net/ipv6/ip6_input.c:309 __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5491 __netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5605 netif_receive_skb_internal net/core/dev.c:5691 [inline] netif_receive_skb+0x133/0x7a0 net/core/dev.c:5750 tun_rx_batched+0x4b3/0x7a0 drivers/net/tun.c:1553 tun_get_user+0x2452/0x39c0 drivers/net/tun.c:1989 tun_chr_write_iter+0xdf/0x200 drivers/net/tun.c:2035 call_write_iter include/linux/fs.h:1868 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x945/0xd50 fs/read_write.c:584 ksys_write+0x12b/0x250 fs/read_write.c:637 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline] __do_fast_syscall_32+0x65/0xf0 arch/x86/entry/common.c:178 do_fast_syscall_32+0x33/0x70 arch/x86/entry/common.c:203 entry_SYSENTER_compat_after_hwframe+0x70/0x82 RIP: 0023:0xf7f21579 Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00 RSP: 002b:00000000f7f1c590 EFLAGS: 00000282 ORIG_RAX: 0000000000000004 RAX: ffffffffffffffda RBX: 00000000000000c8 RCX: 0000000020000040 RDX: 0000000000000083 RSI: 00000000f734e000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000296 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Modules linked in: Link: https://lore.kernel.org/netdev/CANaxB-yCk8hhP68L4Q2nFOJht8sqgXGGQO2AftpHs0u1xyGG5A@mail.gmail.com/ [1] Fixes: `850cbaddb5` ("udp: use it's own memory accounting schema") Reported-by: syzbot+444ca0907e96f7c5e48b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=444ca0907e96f7c5e48b Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230523163305.66466-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-05-25 10:51:58 +02:00
Russell King (Oracle)	ae4899bb48	net: phylink: provide phylink_pcs_config() and phylink_pcs_link_up() Add two helper functions for calling PCS methods. phylink_pcs_config() allows us to handle PCS configuration specifics in one location, rather than the two call sites. phylink_pcs_link_up() gives us consistency. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1q1TzK-007Exd-Rs@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 22:23:29 -07:00
Jakub Kicinski	aa015a204b	Merge branch 'net-phy-mscc-support-vsc8501' David Epping says: ==================== net: phy: mscc: support VSC8501 this updated series of patches adds support for the VSC8501 Ethernet PHY and fixes support for the VSC8502 PHY in cases where no other software (like U-Boot) has initialized the PHY after power up. The first patch simply adds the VSC8502 to the MODULE_DEVICE_TABLE, where I guess it was unintentionally missing. I have no hardware to test my change. The second patch adds the VSC8501 PHY with exactly the same driver implementation as the existing VSC8502. The (new) third patch removes phydev locking from vsc85xx_rgmii_set_skews(), as discussed for v2 of the patch set. The (now) fourth patch fixes the initialization for VSC8501 and VSC8502. I have tested this patch with VSC8501 on hardware in RGMII mode only. https://ww1.microchip.com/downloads/aemDocuments/documents/UNG/ProductDocuments/DataSheets/VSC8501-03_Datasheet_60001741A.PDF https://ww1.microchip.com/downloads/aemDocuments/documents/UNG/ProductDocuments/DataSheets/VSC8502-03_Datasheet_60001742B.pdf Table 4-42 "RGMII CONTROL, ADDRESS 20E2 (0X14)" Bit 11 for each of them. By default the RX_CLK is disabled for these PHYs. In cases where no other software, like U-Boot, enabled the clock, this results in no received packets being handed to the MAC. The patch enables this clock output. According to Microchip support (case number 01268776) this applies to all modes (RGMII, GMII, and MII). Other PHYs sharing the same register map and code, like VSC8530/31/40/41 have the clock enabled and the relevant bit 11 is reserved and read-only for them. As per previous discussion the patch still clears the bit on these PHYs, too, possibly more easily supporting other future PHYs implementing this functionality. For the VSC8572 family of PHYs, having a different register map, no such changes are applied. ==================== Link: https://lore.kernel.org/r/20230523153108.18548-1-david.epping@missinglinkelectronics.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 22:14:29 -07:00
David Epping	71460c9ec5	net: phy: mscc: enable VSC8501/2 RGMII RX clock By default the VSC8501 and VSC8502 RGMII/GMII/MII RX_CLK output is disabled. To allow packet forwarding towards the MAC it needs to be enabled. For other PHYs supported by this driver the clock output is enabled by default. Fixes: `d316986331` ("net: phy: mscc: add support for VSC8502") Signed-off-by: David Epping <david.epping@missinglinkelectronics.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 22:14:23 -07:00
David Epping	7df0b33d79	net: phy: mscc: remove unnecessary phydev locking Holding the struct phy_device (phydev) lock is unnecessary when accessing phydev->interface in the PHY driver .config_init method, which is the only place that vsc85xx_rgmii_set_skews() is called from. The phy_modify_paged() function implements required MDIO bus level locking, which can not be achieved by a phydev lock. Signed-off-by: David Epping <david.epping@missinglinkelectronics.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 22:14:10 -07:00
David Epping	fb055ce4a9	net: phy: mscc: add support for VSC8501 The VSC8501 PHY can use the same driver implementation as the VSC8502. Adding the PHY ID and copying the handler functions of VSC8502 is sufficient to operate it. Signed-off-by: David Epping <david.epping@missinglinkelectronics.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 22:14:10 -07:00
David Epping	57fb54ab9f	net: phy: mscc: add VSC8502 to MODULE_DEVICE_TABLE The mscc driver implements support for VSC8502, so its ID should be in the MODULE_DEVICE_TABLE for automatic loading. Signed-off-by: David Epping <david.epping@missinglinkelectronics.com> Fixes: `d316986331` ("net: phy: mscc: add support for VSC8502") Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 22:14:10 -07:00
Jakub Kicinski	1de5900c81	Merge branch 'bug-fixes-for-net-handshake' Chuck Lever says: ==================== Bug fixes for net/handshake Paolo observed that there is a possible leak of sock->file. I haven't looked into that yet, but it seems to be separate from the fixes in this series, so no need to hold these up. ==================== The submissions mentions net-next but it means netdev (perhaps merge window left over when trees are converged). In any case, it should have gone into net, but was instead applied to net-next as commit `deb2e484ba` ("Merge branch 'net-handshake-fixes'"). These are fixes tho, and Chuck needs them to make progress with the client so double-merging them into net... it is what it is :( Link: https://lore.kernel.org/r/168381978252.84244.1933636428135211300.stgit@91.116.238.104.host.secureserver.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 22:05:26 -07:00
Chuck Lever	26fb5480a2	net/handshake: Enable the SNI extension to work properly Enable the upper layer protocol to specify the SNI peername. This avoids the need for tlshd to use a DNS lookup, which can return a hostname that doesn't match the incoming certificate's SubjectName. Fixes: `2fd5532044` ("net/handshake: Add a kernel API for requesting a TLSv1.3 handshake") Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 22:05:24 -07:00
Chuck Lever	1ce77c998f	net/handshake: Unpin sock->file if a handshake is cancelled If user space never calls DONE, sock->file's reference count remains elevated. Enable sock->file to be freed eventually in this case. Reported-by: Jakub Kacinski <kuba@kernel.org> Fixes: `3b3009ea8a` ("net/handshake: Create a NETLINK service for handling handshake requests") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 22:05:24 -07:00
Chuck Lever	fc490880e3	net/handshake: handshake_genl_notify() shouldn't ignore @flags Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: `3b3009ea8a` ("net/handshake: Create a NETLINK service for handling handshake requests") Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 22:05:24 -07:00
Chuck Lever	7afc6d0a10	net/handshake: Fix uninitialized local variable trace_handshake_cmd_done_err() simply records the pointer in @req, so initializing it to NULL is sufficient and safe. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: `3b3009ea8a` ("net/handshake: Create a NETLINK service for handling handshake requests") Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 22:05:24 -07:00
Chuck Lever	7ea9c1ec66	net/handshake: Fix handshake_dup() ref counting If get_unused_fd_flags() fails, we ended up calling fput(sock->file) twice. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Suggested-by: Paolo Abeni <pabeni@redhat.com> Fixes: `3b3009ea8a` ("net/handshake: Create a NETLINK service for handling handshake requests") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 22:05:24 -07:00
Chuck Lever	a095326e2c	net/handshake: Remove unneeded check from handshake_dup() handshake_req_submit() now verifies that the socket has a file. Fixes: `3b3009ea8a` ("net/handshake: Create a NETLINK service for handling handshake requests") Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 22:05:23 -07:00
Jakub Kicinski	0c615f1cc3	bpf-for-netdev -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZG4AiAAKCRDbK58LschI g+xlAQCmefGbDuwPckZLnomvt6gl4bkIjs7kc1ySbG9QBnaInwD/WyrJaQIPijuD qziHPAyx+MEgPseFU1b7Le35SZ66IwM= =s4R1 -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-05-24 We've added 19 non-merge commits during the last 10 day(s) which contain a total of 20 files changed, 738 insertions(+), 448 deletions(-). The main changes are: 1) Batch of BPF sockmap fixes found when running against NGINX TCP tests, from John Fastabend. 2) Fix a memleak in the LRU{,_PERCPU} hash map when bucket locking fails, from Anton Protopopov. 3) Init the BPF offload table earlier than just late_initcall, from Jakub Kicinski. 4) Fix ctx access mask generation for 32-bit narrow loads of 64-bit fields, from Will Deacon. 5) Remove a now unsupported __fallthrough in BPF samples, from Andrii Nakryiko. 6) Fix a typo in pkg-config call for building sign-file, from Jeremy Sowden. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, sockmap: Test progs verifier error with latest clang bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer bpf, sockmap: Test shutdown() correctly exits epoll and recv()=0 bpf, sockmap: Build helper to create connected socket pair bpf, sockmap: Pull socket helpers out of listen test for general use bpf, sockmap: Incorrectly handling copied_seq bpf, sockmap: Wake up polling after data copy bpf, sockmap: TCP data stall on recv before accept bpf, sockmap: Handle fin correctly bpf, sockmap: Improved check for empty queue bpf, sockmap: Reschedule is now done through backlog bpf, sockmap: Convert schedule_work into delayed_work bpf, sockmap: Pass skb ownership through read_skb bpf: fix a memory leak in the LRU and LRU_PERCPU hash maps bpf: Fix mask generation for 32-bit narrow loads of 64-bit fields samples/bpf: Drop unnecessary fallthrough bpf: netdev: init the offload table earlier selftests/bpf: Fix pkg-config call building sign-file ==================== Link: https://lore.kernel.org/r/20230524170839.13905-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 21:57:57 -07:00
Jakub Kicinski	8a5ad2ea6b	Merge branch 'net-pcs-xpcs-cleanups-for-clause-73-support' Russell King says: ==================== net: pcs: xpcs: cleanups for clause 73 support This series cleans up xpcs code, moving much of the clause 73 code out of the driver into places where others can make use of it. Specifically, we add a helper to convert a clause 73 advertisement to ethtool link modes to mdio.h, and a helper to resolve the clause 73 negotiation state to phylink, which includes the pause modes. In doing this cleanup, several issues were identified with the original xpcs implementation: 1) it masks the link partner advertisement with its own advertisement so userspace can't see what the full link partner advertisement was. 2) it was always setting pause modes irrespective of the advertisements on either end of the link. 3) it was reading the STAT1 registers multiple times. Reading STAT1 has the side effect of unlatching the link-down status, so multiple reads should be avoided. This patch series addresses the first two first by addressing the issues, and then by moving over to the new helpers. The third issue is solved by restructuring the xpcs code. ==================== Link: https://lore.kernel.org/r/ZGyR/jDyYTYzRklg@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 09:13:30 -07:00
Russell King (Oracle)	883a98ede4	net: pcs: xpcs: avoid reading STAT1 more than once Avoid reading the STAT1 registers more than once while getting the PCS state, as this register contains latching-low bits that are lost after the first read. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 09:13:23 -07:00
Russell King (Oracle)	21234ef166	net: pcs: xpcs: use phylink_resolve_c73() helper Use phylink_resolve_c73() to resolve the clause 73 autonegotiation result. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 09:13:23 -07:00
Russell King (Oracle)	428d603fca	net: pcs: xpcs: correct pause resolution xpcs was indicating symmetric pause should be enabled regardless of the advertisements by either party. Fix this to use linkmode_resolve_pause() now that we're no longer obliterating the link partner's advertisement by logically anding it with our own. This is transitional, the function will be entirely replaced with phylink_resolve_c73() in the following patch. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 09:13:22 -07:00
Russell King (Oracle)	1f94ba198b	net: pcs: xpcs: correct lp_advertising contents lp_advertising is supposed to reflect the link partner's advertisement unmodified by the local advertisement, but xpcs bitwise ands it with the local advertisement prior to calculating the resolution of the negotiation. Fix this by moving the bitwise and to xpcs_resolve_lpa_c73() so it can place the results in a temporary bitmap before passing that to ixpcs_get_max_usxgmii_speed(). Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 09:13:22 -07:00
Russell King (Oracle)	3f0360e09c	net: pcs: xpcs: use mii_c73_to_linkmode() helper Convert xpcs clause 73 reading to use the newly introduced mii_c73_to_linkmode() helper to translate the link partner advertisement to an ethtool bitmap. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 09:13:22 -07:00
Russell King (Oracle)	6f7b89b45f	net: pcs: xpcs: clean up reading clause 73 link partner advertisement Read the clause 73 link partner advertisement in a loop and then translate to the ethtool modes. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 09:13:22 -07:00
Russell King (Oracle)	dad987484e	net: phylink: add function to resolve clause 73 negotiation Add a function to resolve clause 73 negotiation according to the priority resolution function described in clause 73.3.6. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 09:13:22 -07:00
Russell King (Oracle)	dc7a51411e	net: phylink: remove duplicated linkmode pause resolution Phylink had two chunks of code virtually the same for resolving the negotiated pause modes. Factor this down to one function. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 09:13:22 -07:00
Russell King (Oracle)	e9261467ae	net: mdio: add clause 73 to ethtool conversion helper Add a helper to convert a clause 73 advertisement to an ethtool bitmap. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-24 09:13:22 -07:00
David S. Miller	41a45ea49d	Merge branch 'devlink-port_del-new-cleanup' Jiri Pirko says: ==================== devlink: small port_new/del() cleanup This patchset cleans up couple of leftovers after recent devlink locking changes. Previously, both port_new/dev() commands were called without holding instance lock. Currently all devlink commands are called with instance lock held. The first patch just removes redundant port notification. The second one removes couple of outdated comments. The last patch changes port_dev() to have devlink_port pointer as an arg instead of port_index, which makes it similar to the rest of port related ops. ==================== Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-24 10:34:26 +01:00
Jiri Pirko	9277649c66	devlink: pass devlink_port pointer to ops->port_del() instead of index Historically there was a reason why port_dev() along with for example port_split() did get port_index instead of the devlink_port pointer. With the locking changes that were done which ensured devlink instance mutex is hold for every command, the port ops could get devlink_port pointer directly. Change the forgotten port_dev() op to be as others and pass devlink_port pointer instead of port_index. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-24 10:34:26 +01:00
Jiri Pirko	1bb1b57898	devlink: remove no longer true locking comment from port_new/del() All commands are called holding instance lock. Remove the outdated comment that says otherwise. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-24 10:34:26 +01:00
Jiri Pirko	c496daeb86	devlink: remove duplicate port notification The notification about created port is send from devl_port_register() function called from ops->port_new(). No need to send it again here, so remove the call and the helper function. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-24 10:34:26 +01:00
David S. Miller	47469d2d59	Merge branch 'tools-ynl-byteorder' Donald Hunter says: ==================== tools: ynl: Add byte-order support for struct members This patchset adds support to ynl for handling byte-order in struct members. The first patch is a refactor to use predefined Struct() objects instead of generating byte-order specific formats on the fly. The second patch adds byte-order handling for struct members. ==================== Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-24 08:46:54 +01:00
Donald Hunter	bddd2e561b	tools: ynl: Handle byte-order in struct members Add support for byte-order in struct members in the genetlink-legacy spec. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-24 08:46:54 +01:00
Donald Hunter	7c2435ef76	tools: ynl: Use dict of predefined Structs to decode scalar types Use a dict of predefined Struct() objects to decode scalar types in native, big or little endian format. This removes the repetitive code for the scalar variants and ensures all the signed variants are supported. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-24 08:46:54 +01:00
Gavrilov Ilia	878ecb0897	ipv6: Fix out-of-bounds access in ipv6_find_tlv() optlen is fetched without checking whether there is more than one byte to parse. It can lead to out-of-bounds access. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Fixes: `c61a404325` ("[IPV6]: Find option offset by type.") Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-24 08:43:39 +01:00
David S. Miller	ba46c96db9	mlx5-fixes-2023-05-22 -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmRsUT8ACgkQSD+KveBX +j61Zwf9GyvzrD29Lmu0/BTsLAnf7GAyJi/SMzXJ09Tp1dAYSWmF2DE3fzKvNoQ/ VT2udSKbZ96b2N9SGF396KZaV8gHxg23IAzILia1JDPd4Pn7YaNymAIWGU7vn+Tq ErG7atPVnJV5R1H6SwO2KpOClG7jOjUPMF87uDCl2g+IpYNgjKa9hcnt5bguztC2 KBW/sV7BCYVWOUrmlSe1hH2Fn4djhga3i4JBIzjp55Dz1voIu5SHsT13Ou2/UuiC 1RDqBTJ9WvnviAxICbI96TLMJTFnDo9HFGHPIQRhZ6k25PIuWX6GLMKaceVlfCd+ BZvRG+PNOsDR9a9tFCjMBfx7KE0bMw== =dwjN -----END PGP SIGNATURE----- Merge tag 'mlx5-fixes-2023-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-fixes-2023-05-22 This series provides bug fixes for the mlx5 driver. Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-24 08:40:14 +01:00
Russell King (Oracle)	59088b5a94	net: phy: avoid kernel warning dump when stopping an errored PHY When taking a network interface down (or removing a SFP module) after the PHY has encountered an error, phy_stop() complains incorrectly that it was called from HALTED state. The reason this is incorrect is that the network driver will have called phy_start() when the interface was brought up, and the fact that the PHY has a problem bears no relationship to the administrative state of the interface. Taking the interface administratively down (which calls phy_stop()) is always the right thing to do after a successful phy_start() call, whether or not the PHY has encountered an error. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-24 08:28:27 +01:00
David S. Miller	18731fe01d	Merge branch 'RTO_ONLINK' Guillaume Nault says: ==================== ipv4: Remove RTO_ONLINK from udp, ping and raw sockets. udp_sendmsg(), ping_v4_sendmsg() and raw_sendmsg() use similar patterns for restricting their route lookup to on-link hosts. Although they use slightly different code, they all use RTO_ONLINK to override the least significant bit of their tos value. RTO_ONLINK is used to restrict the route scope even when the scope is set to RT_SCOPE_UNIVERSE. Therefore it isn't necessary: we can properly set the scope to RT_SCOPE_LINK instead. Removing RTO_ONLINK will allow to convert .flowi4_tos to dscp_t in the future, thus allowing to properly separate the DSCP from the ECN bits in the networking stack. This patch series defines a common helper to figure out what's the scope of the route lookup. This unifies the way udp, ping and raw sockets get their routing scope and removes their dependency on RTO_ONLINK. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-24 08:22:06 +01:00
Guillaume Nault	0e26371db5	udp: Stop using RTO_ONLINK. Use ip_sendmsg_scope() to properly initialise the scope in flowi4_init_output(), instead of overriding tos with the RTO_ONLINK flag. The objective is to eventually remove RTO_ONLINK, which will allow converting .flowi4_tos to dscp_t. Now that the scope is determined by ip_sendmsg_scope(), we need to check its result to set the 'connected' variable. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-24 08:22:06 +01:00
Guillaume Nault	c85be08fc4	raw: Stop using RTO_ONLINK. Use ip_sendmsg_scope() to properly initialise the scope in flowi4_init_output(), instead of overriding tos with the RTO_ONLINK flag. The objective is to eventually remove RTO_ONLINK, which will allow converting .flowi4_tos to dscp_t. The MSG_DONTROUTE and SOCK_LOCALROUTE cases were already handled by raw_sendmsg() (SOCK_LOCALROUTE was handled by the RT_CONN_FLAGS*() macros called by get_rtconn_flags()). However, opt.is_strictroute wasn't taken into account. Therefore, a side effect of this patch is to now honour opt.is_strictroute, and thus align raw_sendmsg() with ping_v4_sendmsg() and udp_sendmsg(). Since raw_sendmsg() was the only user of get_rtconn_flags(), we can now remove this function. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-24 08:22:06 +01:00
Guillaume Nault	726de790f6	ping: Stop using RTO_ONLINK. Define a new helper to figure out the correct route scope to use on TX, depending on socket configuration, ancillary data and send flags. Use this new helper to properly initialise the scope in flowi4_init_output(), instead of overriding tos with the RTO_ONLINK flag. The objective is to eventually remove RTO_ONLINK, which will allow converting .flowi4_tos to dscp_t. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-24 08:22:06 +01:00
Arınç ÜNAL	04910d8cbf	net: ethernet: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs The commit `c6d96df9fa` ("net: ethernet: mtk_eth_soc: drop generic vlan rx offload, only use DSA untagging") makes VLAN RX offloading to be only used on the SoCs without the MTK_NETSYS_V2 ability (which are not just MT7621 and MT7622). The commit disables the proper handling of special tagged (DSA) frames, added with commit `87e3df4961` ("net-next: ethernet: mediatek: add CDM able to recognize the tag for DSA"), for non MTK_NETSYS_V2 SoCs when it finds a MAC that does not use DSA. So if the other MAC uses DSA, the CDMQ component transmits DSA tagged frames to the CPU improperly. This issue can be observed on frames with TCP, for example, a TCP speed test using iperf3 won't work. The commit disables the proper handling of special tagged (DSA) frames because it assumes that these SoCs don't use more than one MAC, which is wrong. Although I made Frank address this false assumption on the patch log when they sent the patch on behalf of Felix, the code still made changes with this assumption. Therefore, the proper handling of special tagged (DSA) frames must be kept enabled in all circumstances as it doesn't affect non DSA tagged frames. Hardware DSA untagging, introduced with the commit `2d7605a729` ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging"), and VLAN RX offloading are operations on the two CDM components of the frame engine, CDMP and CDMQ, which connect to Packet DMA (PDMA) and QoS DMA (QDMA) and are between the MACs and the CPU. These operations apply to all MACs of the SoC so if one MAC uses DSA and the other doesn't, the hardware DSA untagging operation will cause the CDMP component to transmit non DSA tagged frames to the CPU improperly. Since the VLAN RX offloading feature configuration was dropped, VLAN RX offloading can only be used along with hardware DSA untagging. So, for the case above, we need to disable both features and leave it to the CPU, therefore software, to untag the DSA and VLAN tags. So the correct way to handle this is: For all SoCs: Enable the proper handling of special tagged (DSA) frames (MTK_CDMQ_IG_CTRL). For non MTK_NETSYS_V2 SoCs: Enable hardware DSA untagging (MTK_CDMP_IG_CTRL). Enable VLAN RX offloading (MTK_CDMP_EG_CTRL). When a non MTK_NETSYS_V2 SoC MAC does not use DSA: Disable hardware DSA untagging (MTK_CDMP_IG_CTRL). Disable VLAN RX offloading (MTK_CDMP_EG_CTRL). Fixes: `c6d96df9fa` ("net: ethernet: mtk_eth_soc: drop generic vlan rx offload, only use DSA untagging") Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-24 08:04:41 +01:00
Jakub Kicinski	7e7b3b097a	docs: netdev: document the existence of the mail bot We had a good run, but after 4 weeks of use we heard someone asking about pw-bot commands. Let's explain its existence in the docs. It's not a complete documentation but hopefully it's enough for the casual contributor. The project and scope are in flux so the details would likely become out of date, if we were to document more in depth. Link: https://lore.kernel.org/all/20230522140057.GB18381@nucnuc.mle/ Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230522230903.1853151-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-23 21:11:54 -07:00
Coco Li	a695641c8e	gve: Support IPv6 Big TCP on DQ Add support for using IPv6 Big TCP on DQ which can handle large TSO/GRO packets. See https://lwn.net/Articles/895398/. This can improve the throughput and CPU usage. Perf test result: ip -d link show $DEV gso_max_size 185000 gso_max_segs 65535 tso_max_size 262143 tso_max_segs 65535 gro_max_size 185000 For performance, tested with neper using 9k MTU on hardware that supports 200Gb/s line rate. In single streams when line rate is not saturated, we expect throughput improvements. When the networking is performing at line rate, we expect cpu usage improvements. Tcp_stream (unidirectional stream test, T=thread, F=flow): skb=180kb, T=1, F=1, no zerocopy: throughput average=64576.88 Mb/s, sender stime=8.3, receiver stime=10.68 skb=64kb, T=1, F=1, no zerocopy: throughput average=64862.54 Mb/s, sender stime=9.96, receiver stime=12.67 skb=180kb, T=1, F=1, yes zerocopy: throughput average=146604.97 Mb/s, sender stime=10.61, receiver stime=5.52 skb=64kb, T=1, F=1, yes zerocopy: throughput average=131357.78 Mb/s, sender stime=12.11, receiver stime=12.25 skb=180kb, T=20, F=100, no zerocopy: throughput average=182411.37 Mb/s, sender stime=41.62, receiver stime=79.4 skb=64kb, T=20, F=100, no zerocopy: throughput average=182892.02 Mb/s, sender stime=57.39, receiver stime=72.69 skb=180kb, T=20, F=100, yes zerocopy: throughput average=182337.65 Mb/s, sender stime=27.94, receiver stime=39.7 skb=64kb, T=20, F=100, yes zerocopy: throughput average=182144.20 Mb/s, sender stime=47.06, receiver stime=39.01 Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Coco Li <lixiaoyan@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230522201552.3585421-1-ziweixiao@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-23 21:11:35 -07:00
Pratyush Yadav	8a02fb71d7	net: fix skb leak in __skb_tstamp_tx() Commit `50749f2dd6` ("tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.") added a call to skb_orphan_frags_rx() to fix leaks with zerocopy skbs. But it ended up adding a leak of its own. When skb_orphan_frags_rx() fails, the function just returns, leaking the skb it just cloned. Free it before returning. This bug was discovered and resolved using Coverity Static Analysis Security Testing (SAST) by Synopsys, Inc. Fixes: `50749f2dd6` ("tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.") Signed-off-by: Pratyush Yadav <ptyadav@amazon.de> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230522153020.32422-1-ptyadav@amazon.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-23 20:51:43 -07:00

1 2 3 4 5 ...

1185991 Commits