linux

Author	SHA1	Message	Date
Tobias Waldekranz	87c167bb94	net: bridge: mst: Notify switchdev drivers of MST mode changes Trigger a switchdev event whenever the bridge's MST mode is enabled/disabled. This allows constituent ports to either perform any required hardware config, or refuse the change if it not supported. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-17 16:49:57 -07:00
Tobias Waldekranz	122c29486e	net: bridge: mst: Support setting and reporting MST port states Make it possible to change the port state in a given MSTI by extending the bridge port netlink interface (RTM_SETLINK on PF_BRIDGE).The proposed iproute2 interface would be: bridge mst set dev <PORT> msti <MSTI> state <STATE> Current states in all applicable MSTIs can also be dumped via a corresponding RTM_GETLINK. The proposed iproute interface looks like this: $ bridge mst port msti vb1 0 state forwarding 100 state disabled vb2 0 state forwarding 100 state forwarding The preexisting per-VLAN states are still valid in the MST mode (although they are read-only), and can be queried as usual if one is interested in knowing a particular VLAN's state without having to care about the VID to MSTI mapping (in this example VLAN 20 and 30 are bound to MSTI 100): $ bridge -d vlan port vlan-id vb1 10 state forwarding mcast_router 1 20 state disabled mcast_router 1 30 state disabled mcast_router 1 40 state forwarding mcast_router 1 vb2 10 state forwarding mcast_router 1 20 state forwarding mcast_router 1 30 state forwarding mcast_router 1 40 state forwarding mcast_router 1 Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-17 16:49:57 -07:00
Tobias Waldekranz	8c678d6056	net: bridge: mst: Allow changing a VLAN's MSTI Allow a VLAN to move out of the CST (MSTI 0), to an independent tree. The user manages the VID to MSTI mappings via a global VLAN setting. The proposed iproute2 interface would be: bridge vlan global set dev br0 vid <VID> msti <MSTI> Changing the state in non-zero MSTIs is still not supported, but will be addressed in upcoming changes. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-17 16:49:57 -07:00
Tobias Waldekranz	ec7328b591	net: bridge: mst: Multiple Spanning Tree (MST) mode Allow the user to switch from the current per-VLAN STP mode to an MST mode. Up to this point, per-VLAN STP states where always isolated from each other. This is in contrast to the MSTP standard (802.1Q-2018, Clause 13.5), where VLANs are grouped into MST instances (MSTIs), and the state is managed on a per-MSTI level, rather that at the per-VLAN level. Perhaps due to the prevalence of the standard, many switching ASICs are built after the same model. Therefore, add a corresponding MST mode to the bridge, which we can later add offloading support for in a straight-forward way. For now, all VLANs are fixed to MSTI 0, also called the Common Spanning Tree (CST). That is, all VLANs will follow the port-global state. Upcoming changes will make this actually useful by allowing VLANs to be mapped to arbitrary MSTIs and allow individual MSTI states to be changed. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-17 16:49:57 -07:00
Bill Wendling	8624a95ecd	vlan: use correct format characters When compiling with -Wformat, clang emits the following warning: net/8021q/vlanproc.c:284:22: warning: format specifies type 'unsigned short' but the argument has type 'int' [-Wformat] mp->priority, ((mp->vlan_qos >> 13) & 0x7)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ The types of these arguments are unconditionally defined, so this patch updates the format character to the correct ones for ints and unsigned ints. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Bill Wendling <morbo@google.com> Link: https://lore.kernel.org/r/20220316213125.2353370-1-morbo@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-17 16:34:49 -07:00
Jakub Kicinski	e243f39685	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-17 13:56:58 -07:00
Miaoqian Lin	cb0b430b4e	net: dsa: Add missing of_node_put() in dsa_port_parse_of The device_node pointer is returned by of_parse_phandle() with refcount incremented. We should use of_node_put() on it when done. Fixes: `6d4e5c570c` ("net: dsa: get port type at parse time") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20220316082602.10785-1-linmq006@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-03-17 13:13:27 +01:00
Maor Dickman	ab95465cde	net/sched: add vlan push_eth and pop_eth action to the hardware IR Add vlan push_eth and pop_eth action to the hardware intermediate representation model which would subsequently allow it to be used by drivers for offload. Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-16 19:59:36 -07:00
Tobias Waldekranz	a860352e9d	net: dsa: Never offload FDB entries on standalone ports If a port joins a bridge that it can't offload, it will fallback to standalone mode and software bridging. In this case, we never want to offload any FDB entries to hardware either. Previously, for host addresses, we would eventually end up in dsa_port_bridge_host_fdb_add, which would unconditionally dereference dp->bridge and cause a segfault. Fixes: `c26933639b` ("net: dsa: request drivers to perform FDB isolation") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220315233033.1468071-1-tobias@waldekranz.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-16 19:36:56 -07:00
Jakub Kicinski	a0bfd73deb	linux-can-next-for-5.18-20220316 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEBsvAIBsPu6mG7thcrX5LkNig010FAmIyS4wTHG1rbEBwZW5n dXRyb25peC5kZQAKCRCtfkuQ2KDTXXh+B/9f2BSFmv8Um1ajAl+On7JTGi9DhPj7 WOlFCX3NbcyQIrZ/0F4exi7WAani2NyCLke+UQAbIBiNPyUpkTv9gYCcPj8Tq5S6 oJyWf8qRt2T8Gth/MJVm60N8TDUjjNfdSrYHLL4HSUF6zd9hpAQXv/4AXTXrZC6K 4bPGTeNOEvsWwKg+wmdtIO5YnZwilc1sA85TLNq6En0asBUAT3h08P56rex2Stg0 qu9HjAvQ+szpGrgwB2Uk6x8h2BKF+BXhz9/Cn7eCFjOP+YTDjVbqbap0wDXZy4Mt Eg2g9gRmfvB0MjseWf+PoxoAsO7X9T0JNBqbULBipR+PvWOne6oD53yK =KN3V -----END PGP SIGNATURE----- Merge tag 'linux-can-next-for-5.18-20220316' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2022-03-16 the first 3 patches are by Oliver Hartkopp target the CAN ISOTP protocol and fix a problem found by syzbot in isotp_bind(), return -EADDRNOTAVAIL in unbound sockets in isotp_recvmsg() and add support for MSG_TRUNC to isotp_recvmsg(). Amit Kumar Mahapatra converts the xilinx,can device tree bindings to yaml. The last patch is by Julia Lawall and fixes typos in the ucan driver. * tag 'linux-can-next-for-5.18-20220316' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: can: ucan: fix typos in comments dt-bindings: can: xilinx_can: Convert Xilinx CAN binding to YAML can: isotp: support MSG_TRUNC flag when reading from socket can: isotp: return -EADDRNOTAVAIL when reading from unbound socket can: isotp: sanitize CAN ID checks in isotp_bind() ==================== Link: https://lore.kernel.org/r/20220316204710.716341-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-16 18:56:27 -07:00
Oliver Hartkopp	42bf50a179	can: isotp: support MSG_TRUNC flag when reading from socket When providing the MSG_TRUNC flag via recvmsg() syscall the return value provides the real length of the packet or datagram, even when it was longer than the passed buffer. Fixes: `e057dd3fc2` ("can: add ISO 15765-2:2016 transport protocol") Link: https://github.com/linux-can/can-utils/issues/347#issuecomment-1065932671 Link: https://lore.kernel.org/all/20220316164258.54155-3-socketcan@hartkopp.net Suggested-by: Derek Will <derekrobertwill@gmail.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-03-16 21:41:40 +01:00
Oliver Hartkopp	30ffd5332e	can: isotp: return -EADDRNOTAVAIL when reading from unbound socket When reading from an unbound can-isotp socket the syscall blocked indefinitely. As unbound sockets (without given CAN address information) do not make sense anyway we directly return -EADDRNOTAVAIL on read() analogue to the known behavior from sendmsg(). Fixes: `e057dd3fc2` ("can: add ISO 15765-2:2016 transport protocol") Link: https://github.com/linux-can/can-utils/issues/349 Link: https://lore.kernel.org/all/20220316164258.54155-2-socketcan@hartkopp.net Suggested-by: Derek Will <derekrobertwill@gmail.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-03-16 21:41:40 +01:00
Oliver Hartkopp	3ea566422c	can: isotp: sanitize CAN ID checks in isotp_bind() Syzbot created an environment that lead to a state machine status that can not be reached with a compliant CAN ID address configuration. The provided address information consisted of CAN ID 0x6000001 and 0xC28001 which both boil down to 11 bit CAN IDs 0x001 in sending and receiving. Sanitize the SFF/EFF CAN ID values before performing the address checks. Fixes: `e057dd3fc2` ("can: add ISO 15765-2:2016 transport protocol") Link: https://lore.kernel.org/all/20220316164258.54155-1-socketcan@hartkopp.net Reported-by: syzbot+2339c27f5c66c652843e@syzkaller.appspotmail.com Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-03-16 21:41:39 +01:00
Jakub Kicinski	706217c1ce	devlink: pass devlink_port to port_split / port_unsplit callbacks Now that devlink ports are protected by the instance lock it seems natural to pass devlink_port as an argument to the port_split / port_unsplit callbacks. This should save the drivers from doing a lookup. In theory drivers may have supported unsplitting ports which were not registered prior to this change. Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-16 12:56:45 -07:00
Jakub Kicinski	49e83bbe8c	devlink: hold the instance lock in port_split / port_unsplit callbacks Let the core take the devlink instance lock around port splitting and remove the now redundant locking in the drivers. Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-16 12:56:42 -07:00
Jakub Kicinski	2cb7b4890d	devlink: expose instance locking and add locked port registering It should be familiar and beneficial to expose devlink instance lock to the drivers. This way drivers can block devlink from calling them during critical sections without breakneck locking. Add port helpers, port splitting callbacks will be the first target. Use 'devl_' prefix for "explicitly locked" API. Initial RFC used '__devlink' but that's too much typing. devl_lock_is_held() is not defined without lockdep, which is the same behavior as lockdep_is_held() itself. Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-16 12:56:31 -07:00
Jakub Kicinski	186abea8a8	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2022-03-16 1) Fix a kernel-info-leak in pfkey. From Haimin Zhang. 2) Fix an incorrect check of the return value of ipv6_skip_exthdr. From Sabrina Dubroca. * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: esp6: fix check on ipv6_skip_exthdr's return value af_key: add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register ==================== Link: https://lore.kernel.org/r/20220316121142.3142336-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-16 11:39:37 -07:00
David Ahern	40867d74c3	net: Add l3mdev index to flow struct and avoid oif reset for port devices The fundamental premise of VRF and l3mdev core code is binding a socket to a device (l3mdev or netdev with an L3 domain) to indicate L3 scope. Legacy code resets flowi_oif to the l3mdev losing any original port device binding. Ben (among others) has demonstrated use cases where the original port device binding is important and needs to be retained. This patch handles that by adding a new entry to the common flow struct that can indicate the l3mdev index for later rule and table matching avoiding the need to reset flowi_oif. In addition to allowing more use cases that require port device binds, this patch brings a few datapath simplications: 1. l3mdev_fib_rule_match is only called when walking fib rules and always after l3mdev_update_flow. That allows an optimization to bail early for non-VRF type uses cases when flowi_l3mdev is not set. Also, only that index needs to be checked for the FIB table id. 2. l3mdev_update_flow can be called with flowi_oif set to a l3mdev (e.g., VRF) device. By resetting flowi_oif only for this case the FLOWI_FLAG_SKIP_NH_OIF flag is not longer needed and can be removed, removing several checks in the datapath. The flowi_iif path can be simplified to only be called if the it is not loopback (loopback can not be assigned to an L3 domain) and the l3mdev index is not already set. 3. Avoid another device lookup in the output path when the fib lookup returns a reject failure. Note: 2 functional tests for local traffic with reject fib rules are updated to reflect the new direct failure at FIB lookup time for ping rather than the failure on packet path. The current code fails like this: HINT: Fails since address on vrf device is out of device scope COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1 ping: Warning: source address might be selected on device other than: eth1 PING 172.16.3.1 (172.16.3.1) from 172.16.3.1 eth1: 56(84) bytes of data. --- 172.16.3.1 ping statistics --- 1 packets transmitted, 0 received, 100% packet loss, time 0ms where the test now directly fails: HINT: Fails since address on vrf device is out of device scope COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1 ping: connect: No route to host Signed-off-by: David Ahern <dsahern@kernel.org> Tested-by: Ben Greear <greearb@candelatech.com> Link: https://lore.kernel.org/r/20220314204551.16369-1-dsahern@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-15 20:20:02 -07:00
Jakub Kicinski	abe2fec8ee	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next 1) Revert CHECKSUM_UNNECESSARY for UDP packet from conntrack. 2) Reject unsupported families when creating tables, from Phil Sutter. 3) GRE support for the flowtable, from Toshiaki Makita. 4) Add GRE offload support for act_ct, also from Toshiaki. 5) Update mlx5 driver to support for GRE flowtable offload, from Toshiaki Makita. 6) Oneliner to clean up incorrect indentation in nf_conntrack_bridge, from Jiapeng Chong. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: bridge: clean up some inconsistent indenting net/mlx5: Support GRE conntrack offload act_ct: Support GRE offload netfilter: flowtable: Support GRE netfilter: nf_tables: Reject tables of unsupported family Revert "netfilter: conntrack: mark UDP zero checksum as CHECKSUM_UNNECESSARY" ==================== Link: https://lore.kernel.org/r/20220315091513.66544-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-15 11:52:25 -07:00
Eric Dumazet	c700525fcc	net/packet: fix slab-out-of-bounds access in packet_recvmsg() syzbot found that when an AF_PACKET socket is using PACKET_COPY_THRESH and mmap operations, tpacket_rcv() is queueing skbs with garbage in skb->cb[], triggering a too big copy [1] Presumably, users of af_packet using mmap() already gets correct metadata from the mapped buffer, we can simply make sure to clear 12 bytes that might be copied to user space later. BUG: KASAN: stack-out-of-bounds in memcpy include/linux/fortify-string.h:225 [inline] BUG: KASAN: stack-out-of-bounds in packet_recvmsg+0x56c/0x1150 net/packet/af_packet.c:3489 Write of size 165 at addr ffffc9000385fb78 by task syz-executor233/3631 CPU: 0 PID: 3631 Comm: syz-executor233 Not tainted 5.17.0-rc7-syzkaller-02396-g0b3660695e80 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0xf/0x336 mm/kasan/report.c:255 __kasan_report mm/kasan/report.c:442 [inline] kasan_report.cold+0x83/0xdf mm/kasan/report.c:459 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189 memcpy+0x39/0x60 mm/kasan/shadow.c:66 memcpy include/linux/fortify-string.h:225 [inline] packet_recvmsg+0x56c/0x1150 net/packet/af_packet.c:3489 sock_recvmsg_nosec net/socket.c:948 [inline] sock_recvmsg net/socket.c:966 [inline] sock_recvmsg net/socket.c:962 [inline] ____sys_recvmsg+0x2c4/0x600 net/socket.c:2632 ___sys_recvmsg+0x127/0x200 net/socket.c:2674 __sys_recvmsg+0xe2/0x1a0 net/socket.c:2704 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fdfd5954c29 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffcf8e71e48 EFLAGS: 00000246 ORIG_RAX: 000000000000002f RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fdfd5954c29 RDX: 0000000000000000 RSI: 0000000020000500 RDI: 0000000000000005 RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffcf8e71e60 R13: 00000000000f4240 R14: 000000000000c1ff R15: 00007ffcf8e71e54 </TASK> addr ffffc9000385fb78 is located in stack of task syz-executor233/3631 at offset 32 in frame: ____sys_recvmsg+0x0/0x600 include/linux/uio.h:246 this frame has 1 object: [32, 160) 'addr' Memory state around the buggy address: ffffc9000385fa80: 00 04 f3 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 ffffc9000385fb00: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 >ffffc9000385fb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 ^ ffffc9000385fc00: f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 f1 ffffc9000385fc80: f1 f1 f1 00 f2 f2 f2 00 f2 f2 f2 00 00 00 00 00 ================================================================== Fixes: `0fb375fb9b` ("[AF_PACKET]: Allow for > 8 byte hardware addresses.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220312232958.3535620-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-14 22:08:34 -07:00
Jakub Kicinski	15d703921f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net coming late in the 5.17-rc process: 1) Revert port remap to mitigate shadowing service ports, this is causing problems in existing setups and this mitigation can be achieved with explicit ruleset, eg. ... tcp sport < 16386 tcp dport >= 32768 masquerade random This patches provided a built-in policy similar to the one described above. 2) Disable register tracking infrastructure in nf_tables. Florian reported two issues: - Existing expressions with no implemented .reduce interface that causes data-store on register should cancel the tracking. - Register clobbering might be possible storing data on registers that are larger than 32-bits. This might lead to generating incorrect ruleset bytecode. These two issues are scheduled to be addressed in the next release cycle. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: disable register tracking Revert "netfilter: conntrack: tag conntracks picked up in local out hook" Revert "netfilter: nat: force port remap to prevent shadowing well-known ports" ==================== Link: https://lore.kernel.org/r/20220312220315.64531-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-14 15:51:10 -07:00
Sabrina Dubroca	4db4075f92	esp6: fix check on ipv6_skip_exthdr's return value Commit `5f9c55c806` ("ipv6: check return value of ipv6_skip_exthdr") introduced an incorrect check, which leads to all ESP packets over either TCPv6 or UDPv6 encapsulation being dropped. In this particular case, offset is negative, since skb->data points to the ESP header in the following chain of headers, while skb->network_header points to the IPv6 header: IPv6 \| ext \| ... \| ext \| UDP \| ESP \| ... That doesn't seem to be a problem, especially considering that if we reach esp6_input_done2, we're guaranteed to have a full set of headers available (otherwise the packet would have been dropped earlier in the stack). However, it means that the return value will (intentionally) be negative. We can make the test more specific, as the expected return value of ipv6_skip_exthdr will be the (negated) size of either a UDP header, or a TCP header with possible options. In the future, we should probably either make ipv6_skip_exthdr explicitly accept negative offsets (and adjust its return value for error cases), or make ipv6_skip_exthdr only take non-negative offsets (and audit all callers). Fixes: `5f9c55c806` ("ipv6: check return value of ipv6_skip_exthdr") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2022-03-14 11:42:27 +01:00
Vladimir Oltean	47d75f7822	net: dsa: report and change port dscp priority using dcbnl Similar to the port-based default priority, IEEE 802.1Q-2018 allows the Application Priority Table to define QoS classes (0 to 7) per IP DSCP value (0 to 63). In the absence of an app table entry for a packet with DSCP value X, QoS classification for that packet falls back to other methods (VLAN PCP or port-based default). The presence of an app table for DSCP value X with priority Y makes the hardware classify the packet to QoS class Y. As opposed to the default-prio where DSA exposes only a "set" in dsa_switch_ops (because the port-based default is the fallback, it always exists, either implicitly or explicitly), for DSCP priorities we expose an "add" and a "del". The addition of a DSCP entry means trusting that DSCP priority, the deletion means ignoring it. Drivers that already trust (at least some) DSCP values can describe their configuration in dsa_switch_ops :: port_get_dscp_prio(), which is called for each DSCP value from 0 to 63. Again, there can be more than one dcbnl app table entry for the same DSCP value, DSA chooses the one with the largest configured priority. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-14 10:36:15 +00:00
Vladimir Oltean	d538eca85c	net: dsa: report and change port default priority using dcbnl The port-based default QoS class is assigned to packets that lack a VLAN PCP (or the port is configured to not trust the VLAN PCP), an IP DSCP (or the port is configured to not trust IP DSCP), and packets on which no tc-skbedit action has matched. Similar to other drivers, this can be exposed to user space using the DCB Application Priority Table. IEEE 802.1Q-2018 specifies in Table D-8 - Sel field values that when the Selector is 1, the Protocol ID value of 0 denotes the "Default application priority. For use when application priority is not otherwise specified." The way in which the dcbnl integration in DSA has been designed has to do with its requirements. Andrew Lunn explains that SOHO switches are expected to come with some sort of pre-configured QoS profile, and that it is desirable for this to come pre-loaded into the DSA slave interfaces' DCB application priority table. In the dcbnl design, this is possible because calls to dcb_ieee_setapp() can be initiated by anyone including being self-initiated by this device driver. However, what makes this challenging to implement in DSA is that the DSA core manages the net_devices (effectively hiding them from drivers), while drivers manage the hardware. The DSA core has no knowledge of what individual drivers' QoS policies are. DSA could export to drivers a wrapper over dcb_ieee_setapp() and these could call that function to pre-populate the app priority table, however drivers don't have a good moment in time to do this. The dsa_switch_ops :: setup() method gets called before the net_devices are created (dsa_slave_create), and so is dsa_switch_ops :: port_setup(). What remains is dsa_switch_ops :: port_enable(), but this gets called upon each ndo_open. If we add app table entries on every open, we'd need to remove them on close, to avoid duplicate entry errors. But if we delete app priority entries on close, what we delete may not be the initial, driver pre-populated entries, but rather user-added entries. So it is clear that letting drivers choose the timing of the dcb_ieee_setapp() call is inappropriate. The alternative which was chosen is to introduce hardware-specific ops in dsa_switch_ops, and effectively hide dcbnl details from drivers as well. For pre-populating the application table, dsa_slave_dcbnl_init() will call ds->ops->port_get_default_prio() which is supposed to read from hardware. If the operation succeeds, DSA creates a default-prio app table entry. The method is called as soon as the slave_dev is registered, but before we release the rtnl_mutex. This is done such that user space sees the app table entries as soon as it sees the interface being registered. The fact that we populate slave_dev->dcbnl_ops with a non-NULL pointer changes behavior in dcb_doit() from net/dcb/dcbnl.c, which used to return -EOPNOTSUPP for any dcbnl operation where netdev->dcbnl_ops is NULL. Because there are still dcbnl-unaware DSA drivers even if they have dcbnl_ops populated, the way to restore the behavior is to make all dcbnl_ops return -EOPNOTSUPP on absence of the hardware-specific dsa_switch_ops method. The dcbnl framework absurdly allows there to be more than one app table entry for the same selector and protocol (in other words, more than one port-based default priority). In the iproute2 dcb program, there is a "replace" syntactical sugar command which performs an "add" and a "del" to hide this away. But we choose the largest configured priority when we call ds->ops->port_set_default_prio(), using __fls(). When there is no default-prio app table entry left, the port-default priority is restored to 0. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210113154139.1803705-2-olteanv@gmail.com/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-14 10:36:15 +00:00
Sebastian Andrzej Siewior	fbd9a2ceba	net: Add lockdep asserts to ____napi_schedule(). ____napi_schedule() needs to be invoked with disabled interrupts due to __raise_softirq_irqoff (in order not to corrupt the per-CPU list). ____napi_schedule() needs also to be invoked from an interrupt context so that the raised-softirq is processed while the interrupt context is left. Add lockdep asserts for both conditions. While this is the second time the irq/softirq check is needed, provide a generic lockdep_assert_softirq_will_run() which is used by both caller. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-14 10:09:28 +00:00
Pablo Neira Ayuso	ed5f85d422	netfilter: nf_tables: disable register tracking The register tracking infrastructure is incomplete, it might lead to generating incorrect ruleset bytecode, disable it by now given we are late in the release process. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-03-12 16:07:38 +01:00
David S. Miller	97aeb877de	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice: GTP support in switchdev Marcin Szycik says: Add support for adding GTP-C and GTP-U filters in switchdev mode. To create a filter for GTP, create a GTP-type netdev with ip tool, enable hardware offload, add qdisc and add a filter in tc: ip link add $GTP0 type gtp role <sgsn/ggsn> hsize <hsize> ethtool -K $PF0 hw-tc-offload on tc qdisc add dev $GTP0 ingress tc filter add dev $GTP0 ingress prio 1 flower enc_key_id 1337 \ action mirred egress redirect dev $VF1_PR By default, a filter for GTP-U will be added. To add a filter for GTP-C, specify enc_dst_port = 2123, e.g.: tc filter add dev $GTP0 ingress prio 1 flower enc_key_id 1337 \ enc_dst_port 2123 action mirred egress redirect dev $VF1_PR Note: outer IPv6 offload is not supported yet. Note: GTP-U with no payload offload is not supported yet. ICE COMMS package is required to create a filter as it contains GTP profiles. Changes in iproute2 [1] are required to be able to add GTP netdev and use GTP-specific options (QFI and PDU type). [1] https://lore.kernel.org/netdev/20220211182902.11542-1-wojciech.drewek@intel.com/T --- v2: Add more CC v3: Fix mail thread, sorry for spam v4: Add GTP echo response in gtp module v5: Change patch order v6: Add GTP echo request in gtp module v7: Fix kernel-docs in ice v8: Remove handling of GTP Echo Response v9: Add sending of multicast message on GTP Echo Response, fix GTP-C dummy packet selection v10: Rebase, fixed most 80 char line limits v11: Rebase, collect Harald's Reviewed-by on patch 3 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-12 11:54:29 +00:00
Eric Dumazet	625788b584	net: add per-cpu storage and net->core_stats Before adding yet another possibly contended atomic_long_t, it is time to add per-cpu storage for existing ones: dev->tx_dropped, dev->rx_dropped, and dev->rx_nohandler Because many devices do not have to increment such counters, allocate the per-cpu storage on demand, so that dev_get_stats() does not have to spend considerable time folding zero counters. Note that some drivers have abused these counters which were supposed to be only used by core networking stack. v4: should use per_cpu_ptr() in dev_get_stats() (Jakub) v3: added a READ_ONCE() in netdev_core_stats_alloc() (Paolo) v2: add a missing include (reported by kernel test robot <lkp@intel.com>) Change in netdev_core_stats_alloc() (Jakub) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: jeffreyji <jeffreyji@google.com> Reviewed-by: Brian Vazquez <brianvv@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20220311051420.2608812-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-11 23:17:24 -08:00
Jiyong Park	8e6ed96376	vsock: each transport cycles only on its own sockets When iterating over sockets using vsock_for_each_connected_socket, make sure that a transport filters out sockets that don't belong to the transport. There actually was an issue caused by this; in a nested VM configuration, destroying the nested VM (which often involves the closing of /dev/vhost-vsock if there was h2g connections to the nested VM) kills not only the h2g connections, but also all existing g2h connections to the (outmost) host which are totally unrelated. Tested: Executed the following steps on Cuttlefish (Android running on a VM) [1]: (1) Enter into an `adb shell` session - to have a g2h connection inside the VM, (2) open and then close /dev/vhost-vsock by `exec 3< /dev/vhost-vsock && exec 3<&-`, (3) observe that the adb session is not reset. [1] https://android.googlesource.com/device/google/cuttlefish/ Fixes: `c0cfa2d8a7` ("vsock: add multi-transports support") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jiyong Park <jiyong@google.com> Link: https://lore.kernel.org/r/20220311020017.1509316-1-jiyong@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-11 23:14:19 -08:00
Jakub Kicinski	2387834dd2	net: remove exports for netdev_name_node_alt_create() and destroy netdev_name_node_alt_create() and netdev_name_node_alt_destroy() are only called by rtnetlink, so no need for exports. Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220310223952.558779-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-11 22:55:09 -08:00
Christoph Hellwig	515bb3071e	tcp: unexport tcp_ca_get_key_by_name and tcp_ca_get_name_by_key Both functions are only used by core networking code. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220310143229.895319-1-hch@lst.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-11 22:51:40 -08:00
Tadeusz Struk	5e34af4142	net: ipv6: fix skb_over_panic in __ip6_append_data Syzbot found a kernel bug in the ipv6 stack: LINK: https://syzkaller.appspot.com/bug?id=205d6f11d72329ab8d62a610c44c5e7e25415580 The reproducer triggers it by sending a crafted message via sendmmsg() call, which triggers skb_over_panic, and crashes the kernel: skbuff: skb_over_panic: text:ffffffff84647fb4 len:65575 put:65575 head:ffff888109ff0000 data:ffff888109ff0088 tail:0x100af end:0xfec0 dev:<NULL> Update the check that prevents an invalid packet with MTU equal to the fregment header size to eat up all the space for payload. The reproducer can be found here: LINK: https://syzkaller.appspot.com/text?tag=ReproC&x=1648c83fb00000 Reported-by: syzbot+e223cf47ec8ae183f2a0@syzkaller.appspotmail.com Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20220310232538.1044947-1-tadeusz.struk@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-11 17:11:38 -08:00
Jakub Kicinski	0b3660695e	brcmfmac * add BCM43454/6 support rtw89 * add support for 160 MHz channels and 6 GHz band * hardware scan support iwlwifi * support UHB TAS enablement via BIOS * remove a bunch of W=1 warnings * add support for channel switch offload * support 32 Rx AMPDU sessions in newer devices * add support for a couple of new devices * add support for band disablement via BIOS mt76 * mt7915 thermal management improvements * SAR support for more mt76 drivers * mt7986 wmac support on mt7915 ath11k * debugfs interface to configure firmware debug log level * debugfs interface to test Target Wake Time (TWT) * provide 802.11ax High Efficiency (HE) data via radiotap ath9k * use hw_random API instead of directly dumping into random.c wcn36xx * fix wcn3660 to work on 5 GHz band ath6kl * add device ID for WLU5150-D81 cfg80211/mac80211 * initial EHT (from 802.11be) support (EHT rates, 320 MHz, larger block-ack) * support disconnect on HW restart -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmIrQoUACgkQB8qZga/f l8SV9RAAhZwiX4tkcjOYh3vDCOlmZRZV7dy0CYtcRlyHvO/4xH0DJUCbItW3hkeY HwLeaTE9J6INCui/iWbWVWsBKoiYQHEWxbfLYg6xDeQR4ijYQaz1c9inevu6qdOn 3STKzBjsJ8uQF81ANjTFsL33B9olceIrHttqVI0Ezv6YlAQ1JYRNBBikh8NM+XPN /AUdsG9KyWRuraPbPf1sZapMJBGpvDMhKlo8LW08Xv9sC8to57Tw5AHVwMY71Ipu ClE0EyDGYRm8W+cbJvZ1bp7D/TGcIspAdpPR9JAznXWeFhyl6bswGtUsf3FGxXNk 1i+1tonRlL3Xi9CvXDmGk2fstYe4MSmWXVFehoulMY9F2C1ibp6PrLa8SLjC+wzu 1QDfM65ggc90uu0AJLTOp9qnkapvz3/FGL5z9sx2OEM1Iks2RwOpbB6gKo+C0A9W 3wMxgPPt4mMV2WIgYv1okfcghUoH2l3b1n+Iq+osOa9pbdLrMhvzsrhIQZBaFnBa 3S5yhGh8djEla2+FmmMs0RKvRX+m+FeVjkJ8ozPLZl880A0OLmZZ+6Wnoa3ZQHmi AkuOLhCGm3PVXCN8Mb0nwHmc+LJS/V/U5VBDzieOXMKM4OjMlbGQNt4+2bEJ+Qd3 jlTkt1cLI/gFvdoFmsJUEOrpT49qZ94obmX8u07pEO/fI+bXHF4= =ccps -----END PGP SIGNATURE----- Merge tag 'wireless-next-2022-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== brcmfmac * add BCM43454/6 support rtw89 * add support for 160 MHz channels and 6 GHz band * hardware scan support iwlwifi * support UHB TAS enablement via BIOS * remove a bunch of W=1 warnings * add support for channel switch offload * support 32 Rx AMPDU sessions in newer devices * add support for a couple of new devices * add support for band disablement via BIOS mt76 * mt7915 thermal management improvements * SAR support for more mt76 drivers * mt7986 wmac support on mt7915 ath11k * debugfs interface to configure firmware debug log level * debugfs interface to test Target Wake Time (TWT) * provide 802.11ax High Efficiency (HE) data via radiotap ath9k * use hw_random API instead of directly dumping into random.c wcn36xx * fix wcn3660 to work on 5 GHz band ath6kl * add device ID for WLU5150-D81 cfg80211/mac80211 * initial EHT (from 802.11be) support (EHT rates, 320 MHz, larger block-ack) * support disconnect on HW restart * tag 'wireless-next-2022-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (247 commits) mac80211: Add support to trigger sta disconnect on hardware restart mac80211: fix potential double free on mesh join mac80211: correct legacy rates check in ieee80211_calc_rx_airtime nl80211: fix typo of NL80211_IF_TYPE_OCB in documentation mac80211: Use GFP_KERNEL instead of GFP_ATOMIC when possible mac80211: replace DEFINE_SIMPLE_ATTRIBUTE with DEFINE_DEBUGFS_ATTRIBUTE rtw89: 8852c: process logic efuse map rtw89: 8852c: process efuse of phycap rtw89: support DAV efuse reading operation rtw89: 8852c: add chip::dle_mem rtw89: add page_regs to handle v1 chips rtw89: add chip_info::{h2c,c2h}_reg to support more chips rtw89: add hci_func_en_addr to support variant generation rtw89: add power_{on/off}_func rtw89: read chip version depends on chip ID rtw89: pci: use a struct to describe all registers address related to DMA channel rtw89: pci: add V1 of PCI channel address rtw89: pci: add struct rtw89_pci_info rtw89: 8852c: add 8852c empty files MAINTAINERS: add devicetree bindings entry for mt76 ... ==================== Link: https://lore.kernel.org/r/20220311124029.213470-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-11 13:00:17 -08:00
Wojciech Drewek	e3acda7ade	net/sched: Allow flower to match on GTP options Options are as follows: PDU_TYPE:QFI and they refernce to the fields from the PDU Session Protocol. PDU Session data is conveyed in GTP-U Extension Header. GTP-U Extension Header is described in 3GPP TS 29.281. PDU Session Protocol is described in 3GPP TS 38.415. PDU_TYPE - indicates the type of the PDU Session Information (4 bits) QFI - QoS Flow Identifier (6 bits) # ip link add gtp_dev type gtp role sgsn # tc qdisc add dev gtp_dev ingress # tc filter add dev gtp_dev protocol ip parent ffff: \ flower \ enc_key_id 11 \ gtp_opts 1:8/ff:ff \ action mirred egress redirect dev eth0 Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-03-11 08:28:27 -08:00
Kurt Kanzenbach	f65e58440d	flow_dissector: Add support for HSRv0 Commit `bf08824a0f` ("flow_dissector: Add support for HSR") added support for HSR within the flow dissector. However, it only works for HSR in version 1. Version 0 uses a different Ether Type. Add support for it. Reported-by: Anthony Harivel <anthony.harivel@linutronix.de> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-11 11:05:52 +00:00
Youghandhar Chintala	7d352ccf1e	mac80211: Add support to trigger sta disconnect on hardware restart Currently in case of target hardware restart, we just reconfig and re-enable the security keys and enable the network queues to start data traffic back from where it was interrupted. Many ath10k wifi chipsets have sequence numbers for the data packets assigned by firmware and the mac sequence number will restart from zero after target hardware restart leading to mismatch in the sequence number expected by the remote peer vs the sequence number of the frame sent by the target firmware. This mismatch in sequence number will cause out-of-order packets on the remote peer and all the frames sent by the device are dropped until we reach the sequence number which was sent before we restarted the target hardware In order to fix this, we trigger a sta disconnect, in case of target hw restart. After this there will be a fresh connection and thereby avoiding the dropping of frames by remote peer. The right fix would be to pull the entire data path into the host which is not feasible or would need lots of complex changes and will still be inefficient. Tested on ath10k using WCN3990, QCA6174 Signed-off-by: Youghandhar Chintala <youghand@codeaurora.org> Link: https://lore.kernel.org/r/20220308115325.5246-2-youghand@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-03-11 11:59:19 +01:00
Linus Lüssing	4a2d4496e1	mac80211: fix potential double free on mesh join While commit `6a01afcf84` ("mac80211: mesh: Free ie data when leaving mesh") fixed a memory leak on mesh leave / teardown it introduced a potential memory corruption caused by a double free when rejoining the mesh: ieee80211_leave_mesh() -> kfree(sdata->u.mesh.ie); ... ieee80211_join_mesh() -> copy_mesh_setup() -> old_ie = ifmsh->ie; -> kfree(old_ie); This double free / kernel panics can be reproduced by using wpa_supplicant with an encrypted mesh (if set up without encryption via "iw" then ifmsh->ie is always NULL, which avoids this issue). And then calling: $ iw dev mesh0 mesh leave $ iw dev mesh0 mesh join my-mesh Note that typically these commands are not used / working when using wpa_supplicant. And it seems that wpa_supplicant or wpa_cli are going through a NETDEV_DOWN/NETDEV_UP cycle between a mesh leave and mesh join where the NETDEV_UP resets the mesh.ie to NULL via a memcpy of default_mesh_setup in cfg80211_netdev_notifier_call, which then avoids the memory corruption, too. The issue was first observed in an application which was not using wpa_supplicant but "Senf" instead, which implements its own calls to nl80211. Fixing the issue by removing the kfree()'ing of the mesh IE in the mesh join function and leaving it solely up to the mesh leave to free the mesh IE. Cc: stable@vger.kernel.org Fixes: `6a01afcf84` ("mac80211: mesh: Free ie data when leaving mesh") Reported-by: Matthias Kretschmer <mathias.kretschmer@fit.fraunhofer.de> Signed-off-by: Linus Lüssing <ll@simonwunderlich.de> Tested-by: Mathias Kretschmer <mathias.kretschmer@fit.fraunhofer.de> Link: https://lore.kernel.org/r/20220310183513.28589-1-linus.luessing@c0d3.blue Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-03-11 11:51:18 +01:00
MeiChia Chiu	022143d0c5	mac80211: correct legacy rates check in ieee80211_calc_rx_airtime There are no legacy rates on 60GHz or sub-1GHz band, so modify the check. Signed-off-by: Ryder Lee <ryder.lee@mediatek.com> Signed-off-by: MeiChia Chiu <MeiChia.Chiu@mediatek.com> Link: https://lore.kernel.org/r/20220308021645.16272-1-MeiChia.Chiu@mediatek.com [Ghz -> GHz] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-03-11 11:45:36 +01:00
Christophe JAILLET	60df54f8e6	mac80211: Use GFP_KERNEL instead of GFP_ATOMIC when possible Previous memory allocations in this function already use GFP_KERNEL, so use __dev_alloc_skb() and an explicit GFP_KERNEL instead of an implicit GFP_ATOMIC. This gives more opportunities of successful allocation. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/194a0e2ff00c3fae88cc9fba47431747360c8242.1645345378.git.christophe.jaillet@wanadoo.fr Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-03-11 11:42:49 +01:00
Jakub Kicinski	155fb43b70	net: limit altnames to 64k total Property list (altname is a link "property") is wrapped in a nlattr. nlattrs length is 16bit so practically speaking the list of properties can't be longer than that, otherwise user space would have to interpret broken netlink messages. Prevent the problem from occurring by checking the length of the property list before adding new entries. Reported-by: George Shuklin <george.shuklin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-10 20:15:23 -08:00
Jakub Kicinski	5d26cff5bd	net: account alternate interface name memory George reports that altnames can eat up kernel memory. We should charge that memory appropriately. Reported-by: George Shuklin <george.shuklin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-10 20:15:20 -08:00
Ilya Maximets	1926407a4a	net: openvswitch: fix uAPI incompatibility with existing user space Few years ago OVS user space made a strange choice in the commit [1] to define types only valid for the user space inside the copy of a kernel uAPI header. '#ifndef __KERNEL__' and another attribute was added later. This leads to the inevitable clash between user space and kernel types when the kernel uAPI is extended. The issue was unveiled with the addition of a new type for IPv6 extension header in kernel uAPI. When kernel provides the OVS_KEY_ATTR_IPV6_EXTHDRS attribute to the older user space application, application tries to parse it as OVS_KEY_ATTR_PACKET_TYPE and discards the whole netlink message as malformed. Since OVS_KEY_ATTR_IPV6_EXTHDRS is supplied along with every IPv6 packet that goes to the user space, IPv6 support is fully broken. Fixing that by bringing these user space attributes to the kernel uAPI to avoid the clash. Strictly speaking this is not the problem of the kernel uAPI, but changing it is the only way to avoid breakage of the older user space applications at this point. These 2 types are explicitly rejected now since they should not be passed to the kernel. Additionally, OVS_KEY_ATTR_TUNNEL_INFO moved out from the '#ifdef __KERNEL__' as there is no good reason to hide it from the userspace. And it's also explicitly rejected now, because it's for in-kernel use only. Comments with warnings were added to avoid the problem coming back. (1 << type) converted to (1ULL << type) to avoid integer overflow on OVS_KEY_ATTR_IPV6_EXTHDRS, since it equals 32 now. [1] beb75a40fdc2 ("userspace: Switching of L3 packets in L2 pipeline") Fixes: `28a3f06017` ("net: openvswitch: IPv6: Add IPv6 extension header support") Link: https://lore.kernel.org/netdev/3adf00c7-fe65-3ef4-b6d7-6d8a0cad8a5f@nvidia.com Link: `beb75a40fd` Reported-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Aaron Conole <aconole@redhat.com> Link: https://lore.kernel.org/r/20220309222033.3018976-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-10 20:14:52 -08:00
Jakub Kicinski	8bed3d02a6	linux-can-next-for-5.18-20220310 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEBsvAIBsPu6mG7thcrX5LkNig010FAmIpyLkTHG1rbEBwZW5n dXRyb25peC5kZQAKCRCtfkuQ2KDTXU2LB/oDq++xqO2WjiRgj//tzyz8MIUlDcR6 WsLGiXUj3XOhj/hX11RcttIKceYf2Tzen9skbabOe3zwBOqutqiXkLRjPKZ6EVpX sZsVq0J+Bgz8IeE3u+qAHDt2ycc4AOMrVefkssXSn+r/06cKLZKnalw+8ioce4cu 4GyR6Bfm59aY1tdL3p+KcoPwR78FQsTxrbUAYyJ9UlfagZvc6uWe42hk6JBsXjBp eU/1UvnUuxhHsCetovVUbmRXFnsyWOIbyKrcPMsn+agUDNP+gASbP3lL710w812v 8DS79MllMe3OFPpWi4Yrg/ihG0m5o4n1OVSWZkyKvpSE0rIpn3lTY2gB =u2iw -----END PGP SIGNATURE----- Merge tag 'linux-can-next-for-5.18-20220310' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2022-03-10 The first 3 patches are by Oliver Hartkopp, target the CAN ISOTP protocol and update the CAN frame sending behavior, and increases the max PDU size to 64 kByte. The next 2 patches are also by Oliver Hartkopp and update the virtual VXCAN driver so that CAN frames send into the peer name space show up as RX'ed CAN frames. Vincent Mailhol contributes a patch for the etas_es58x driver to fix a false positive dereference uninitialized variable warning. 2 patches by Ulrich Hecht add r8a779a0 SoC support to the rcar_canfd driver. The remaining 21 patches target the gs_usb driver and are by Peter Fink, Ben Evans, Eric Evenchick and me. This series cleans up the gs-usb driver, documents some bits of the USB ABI used by the widely used open source firmware candleLight, adds support for up to 3 CAN interfaces per USB device, adds CAN-FD support, adds quirks for some hardware and software workarounds and finally adds support for 2 new devices. * tag 'linux-can-next-for-5.18-20220310' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (29 commits) can: gs_usb: add VID/PID for ABE CAN Debugger devices can: gs_usb: add VID/PID for CES CANext FD devices can: gs_usb: add extended bt_const feature can: gs_usb: activate quirks for CANtact Pro unconditionally can: gs_usb: add quirk for CANtact Pro overlapping GS_USB_BREQ value can: gs_usb: add usb quirk for NXP LPC546xx controllers can: gs_usb: add CAN-FD support can: gs_usb: use union and FLEX_ARRAY for data in struct gs_host_frame can: gs_usb: support up to 3 channels per device can: gs_usb: gs_usb_probe(): introduce udev and make use of it can: gs_usb: document the PAD_PKTS_TO_MAX_PKT_SIZE feature can: gs_usb: document the USER_ID feature can: gs_usb: update GS_CAN_FEATURE_IDENTIFY documentation can: gs_usb: add HW timestamp mode bit can: gs_usb: gs_make_candev(): call SET_NETDEV_DEV() after handling all bt_const->feature can: gs_usb: rewrap usb_control_msg() and usb_fill_bulk_urb() can: gs_usb: rewrap error messages can: gs_usb: GS_CAN_FLAG_OVERFLOW: make use of BIT() can: gs_usb: sort include files alphabetically can: gs_usb: fix checkpatch warning ... ==================== Link: https://lore.kernel.org/r/20220310142903.341658-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-10 20:09:27 -08:00
Jakub Kicinski	1e8a3f0d2a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net net/dsa/dsa2.c commit `afb3cc1a39` ("net: dsa: unlock the rtnl_mutex when dsa_master_setup() fails") commit `e83d565378` ("net: dsa: replay master state events in dsa_tree_{setup,teardown}_master") https://lore.kernel.org/all/20220307101436.7ae87da0@canb.auug.org.au/ drivers/net/ethernet/intel/ice/ice.h commit `97b0129146` ("ice: Fix error with handling of bonding MTU") commit `43113ff734` ("ice: add TTY for GNSS module for E810T device") https://lore.kernel.org/all/20220310112843.3233bcf1@canb.auug.org.au/ drivers/staging/gdm724x/gdm_lte.c commit `fc7f750dc9` ("staging: gdm724x: fix use after free in gdm_lte_rx()") commit `4bcc4249b4` ("staging: Use netif_rx().") https://lore.kernel.org/all/20220308111043.1018a59d@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-10 17:16:56 -08:00
Linus Torvalds	186d32bbf0	Networking fixes for 5.17-rc8/final, including fixes from bluetooth, and ipsec. Current release - regressions: - Bluetooth: fix unbalanced unlock in set_device_flags() - Bluetooth: fix not processing all entries on cmd_sync_work, make connect with qualcomm and intel adapters reliable - Revert "xfrm: state and policy should fail if XFRMA_IF_ID 0" - xdp: xdp_mem_allocator can be NULL in trace_mem_connect() - eth: ice: fix race condition and deadlock during interface enslave Current release - new code bugs: - tipc: fix incorrect order of state message data sanity check Previous releases - regressions: - esp: fix possible buffer overflow in ESP transformation - dsa: unlock the rtnl_mutex when dsa_master_setup() fails - phy: meson-gxl: fix interrupt handling in forced mode - smsc95xx: ignore -ENODEV errors when device is unplugged Previous releases - always broken: - xfrm: fix tunnel mode fragmentation behavior - esp: fix inter address family tunneling on GSO - tipc: fix null-deref due to race when enabling bearer - sctp: fix kernel-infoleak for SCTP sockets - eth: macb: fix lost RX packet wakeup race in NAPI receive - eth: intel stop disabling VFs due to PF error responses - eth: bcmgenet: don't claim WOL when its not available Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmIqlOsACgkQMUZtbf5S IrtKJBAAjZpYBwwHty6JR7AahLF4LNO+o1KmraqFV7YByS5NRfBRpXV7asvpxJNF 9iJhOWtLMsz/mVq0OXdx/+NpDh9JIHrQzb3GiskeKzBdhHmW4HjuYug1gytqRDMx uZOiQEuJSREu0tCsfcVWTF8wm4OgmPWtyZNZq2kwXsHiKoptB9KFK9pcvD6Utxrg jTpYBS5I9cX0Sj+gG9fZFNeyaxgmKkC5cM4cSLcheGSKHvEbX6MIXfi2Wb1VRBzE Qk/1JbkQf4gQ1BAu9kt8+jgWqW7vSnDn2iYUVw7RSSlj5xIM4f4m71nS9XzejJLb ADry24arlmknMS9Rhpy7n3ogNn/5MtlsZt01z/AAyZDRc1rrsWDqOJugtDRSnSEh yAhAsl/vqOuoovA86IRBTji8JlyfNZXt33K7+1KKDsj1wzSpcB9AKTDps8Ncu9uL elyaU2v4bTdhdqkQnxpcsLlLcV3FzLaWUVLpcla3XVLvzjEnoY+mhR5boW735uj7 f8Ig9Aj4UceJ+sQtXywciknE1+s48/pWqs8b8Y5DXX1P168A1ud5voy4Po6RvqQG B17WvAaq/7DsMKcuofeykFHCKlwO36xdt6l0ExaQuzmV+NgoEBWAmgwsyl9ktFpT I09D2RMPfTqYgdNvYkKGBrMKV87weVvHpMIeJiG1YeiBB3e1Xw8= =WfAR -----END PGP SIGNATURE----- Merge tag 'net-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, and ipsec. Current release - regressions: - Bluetooth: fix unbalanced unlock in set_device_flags() - Bluetooth: fix not processing all entries on cmd_sync_work, make connect with qualcomm and intel adapters reliable - Revert "xfrm: state and policy should fail if XFRMA_IF_ID 0" - xdp: xdp_mem_allocator can be NULL in trace_mem_connect() - eth: ice: fix race condition and deadlock during interface enslave Current release - new code bugs: - tipc: fix incorrect order of state message data sanity check Previous releases - regressions: - esp: fix possible buffer overflow in ESP transformation - dsa: unlock the rtnl_mutex when dsa_master_setup() fails - phy: meson-gxl: fix interrupt handling in forced mode - smsc95xx: ignore -ENODEV errors when device is unplugged Previous releases - always broken: - xfrm: fix tunnel mode fragmentation behavior - esp: fix inter address family tunneling on GSO - tipc: fix null-deref due to race when enabling bearer - sctp: fix kernel-infoleak for SCTP sockets - eth: macb: fix lost RX packet wakeup race in NAPI receive - eth: intel stop disabling VFs due to PF error responses - eth: bcmgenet: don't claim WOL when its not available" * tag 'net-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits) xdp: xdp_mem_allocator can be NULL in trace_mem_connect(). ice: Fix race condition during interface enslave net: phy: meson-gxl: improve link-up behavior net: bcmgenet: Don't claim WOL when its not available net: arc_emac: Fix use after free in arc_mdio_probe() sctp: fix kernel-infoleak for SCTP sockets net: phy: correct spelling error of media in documentation net: phy: DP83822: clear MISR2 register to disable interrupts gianfar: ethtool: Fix refcount leak in gfar_get_ts_info selftests: pmtu.sh: Kill nettest processes launched in subshell. selftests: pmtu.sh: Kill tcpdump processes launched by subshell. NFC: port100: fix use-after-free in port100_send_complete net/mlx5e: SHAMPO, reduce TIR indication net/mlx5e: Lag, Only handle events from highest priority multipath entry net/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLE net/mlx5: Fix a race on command flush flow net/mlx5: Fix size field in bufferx_reg struct ax25: Fix NULL pointer dereference in ax25_kill_by_device net: marvell: prestera: Add missing of_node_put() in prestera_switch_set_base_mac_addr net: ethernet: lpc_eth: Handle error for clk_enable ...	2022-03-10 16:47:58 -08:00
Sebastian Andrzej Siewior	e0ae713023	xdp: xdp_mem_allocator can be NULL in trace_mem_connect(). Since the commit mentioned below __xdp_reg_mem_model() can return a NULL pointer. This pointer is dereferenced in trace_mem_connect() which leads to segfault. The trace points (mem_connect + mem_disconnect) were put in place to pair connect/disconnect using the IDs. The ID is only assigned if __xdp_reg_mem_model() does not return NULL. That connect trace point is of no use if there is no ID. Skip that connect trace point if xdp_alloc is NULL. [ Toke Høiland-Jørgensen delivered the reasoning for skipping the trace point ] Fixes: `4a48ef70b9` ("xdp: Allow registering memory model without rxq reference") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/YikmmXsffE+QajTB@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-10 16:09:29 -08:00
Eric Dumazet	633593a808	sctp: fix kernel-infoleak for SCTP sockets syzbot reported a kernel infoleak [1] of 4 bytes. After analysis, it turned out r->idiag_expires is not initialized if inet_sctp_diag_fill() calls inet_diag_msg_common_fill() Make sure to clear idiag_timer/idiag_retrans/idiag_expires and let inet_diag_msg_sctpasoc_fill() fill them again if needed. [1] BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline] BUG: KMSAN: kernel-infoleak in copyout lib/iov_iter.c:154 [inline] BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x6ef/0x25a0 lib/iov_iter.c:668 instrument_copy_to_user include/linux/instrumented.h:121 [inline] copyout lib/iov_iter.c:154 [inline] _copy_to_iter+0x6ef/0x25a0 lib/iov_iter.c:668 copy_to_iter include/linux/uio.h:162 [inline] simple_copy_to_iter+0xf3/0x140 net/core/datagram.c:519 __skb_datagram_iter+0x2d5/0x11b0 net/core/datagram.c:425 skb_copy_datagram_iter+0xdc/0x270 net/core/datagram.c:533 skb_copy_datagram_msg include/linux/skbuff.h:3696 [inline] netlink_recvmsg+0x669/0x1c80 net/netlink/af_netlink.c:1977 sock_recvmsg_nosec net/socket.c:948 [inline] sock_recvmsg net/socket.c:966 [inline] __sys_recvfrom+0x795/0xa10 net/socket.c:2097 __do_sys_recvfrom net/socket.c:2115 [inline] __se_sys_recvfrom net/socket.c:2111 [inline] __x64_sys_recvfrom+0x19d/0x210 net/socket.c:2111 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae Uninit was created at: slab_post_alloc_hook mm/slab.h:737 [inline] slab_alloc_node mm/slub.c:3247 [inline] __kmalloc_node_track_caller+0xe0c/0x1510 mm/slub.c:4975 kmalloc_reserve net/core/skbuff.c:354 [inline] __alloc_skb+0x545/0xf90 net/core/skbuff.c:426 alloc_skb include/linux/skbuff.h:1158 [inline] netlink_dump+0x3e5/0x16c0 net/netlink/af_netlink.c:2248 __netlink_dump_start+0xcf8/0xe90 net/netlink/af_netlink.c:2373 netlink_dump_start include/linux/netlink.h:254 [inline] inet_diag_handler_cmd+0x2e7/0x400 net/ipv4/inet_diag.c:1341 sock_diag_rcv_msg+0x24a/0x620 netlink_rcv_skb+0x40c/0x7e0 net/netlink/af_netlink.c:2494 sock_diag_rcv+0x63/0x80 net/core/sock_diag.c:277 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x1093/0x1360 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x14d9/0x1720 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] sock_write_iter+0x594/0x690 net/socket.c:1061 do_iter_readv_writev+0xa7f/0xc70 do_iter_write+0x52c/0x1500 fs/read_write.c:851 vfs_writev fs/read_write.c:924 [inline] do_writev+0x645/0xe00 fs/read_write.c:967 __do_sys_writev fs/read_write.c:1040 [inline] __se_sys_writev fs/read_write.c:1037 [inline] __x64_sys_writev+0xe5/0x120 fs/read_write.c:1037 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae Bytes 68-71 of 2508 are uninitialized Memory access of size 2508 starts at ffff888114f9b000 Data copied to user address 00007f7fe09ff2e0 CPU: 1 PID: 3478 Comm: syz-executor306 Not tainted 5.17.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: `8f840e47f1` ("sctp: add the sctp_diag.c file") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20220310001145.297371-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-10 14:46:42 -08:00
Oliver Hartkopp	9c0c191d82	can: isotp: set max PDU size to 64 kByte The reason to extend the max PDU size from 4095 Byte (12 bit length value) to a 32 bit value (up to 4 GByte) was to be able to flash 64 kByte bootloaders with a single ISO-TP PDU. The max PDU size in the Linux kernel implementation was set to 8200 Bytes to be able to test the length information escape sequence. It turns out that the demand for 64 kByte PDUs is real so the value for MAX_MSG_LENGTH is set to 66000 to be able to potentially add some checksums to the 65.536 Byte block. Link: https://github.com/linux-can/can-utils/issues/347#issuecomment-1056142301 Link: https://lore.kernel.org/all/20220309120416.83514-3-socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-03-10 09:23:45 +01:00
Oliver Hartkopp	530e0d46c6	can: isotp: set default value for N_As to 50 micro seconds The N_As value describes the time a CAN frame needs on the wire when transmitted by the CAN controller. Even very short CAN FD frames need arround 100 usecs (bitrate 1Mbit/s, data bitrate 8Mbit/s). Having N_As to be zero (the former default) leads to 'no CAN frame separation' when STmin is set to zero by the receiving node. This 'burst mode' should not be enabled by default as it could potentially dump a high number of CAN frames into the netdev queue from the soft hrtimer context. This does not affect the system stability but is just not nice and cooperative. With this N_As/frame_txtime value the 'burst mode' is disabled by default. As user space applications usually do not set the frame_txtime element of struct can_isotp_options the new in-kernel default is very likely overwritten with zero when the sockopt() CAN_ISOTP_OPTS is invoked. To make sure that a N_As value of zero is only set intentional the value '0' is now interpreted as 'do not change the current value'. When a frame_txtime of zero is required for testing purposes this CAN_ISOTP_FRAME_TXTIME_ZERO u32 value has to be set in frame_txtime. Link: https://lore.kernel.org/all/20220309120416.83514-2-socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-03-10 09:23:45 +01:00
Oliver Hartkopp	4b7fe92c06	can: isotp: add local echo tx processing for consecutive frames Instead of dumping the CAN frames into the netdevice queue the process to transmit consecutive frames (CF) now waits for the frame to be transmitted and therefore echo'ed from the CAN interface. Link: https://lore.kernel.org/all/20220309120416.83514-1-socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-03-10 09:23:45 +01:00

1 2 3 4 5 ...

68566 Commits