linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-23 04:31:50 +00:00

Author	SHA1	Message	Date
Kevin Brodsky	60daf8d40b	net/compat: Update msg_control_is_user when setting a kernel pointer cmsghdr_from_user_compat_to_kern() is an unusual case w.r.t. how the kmsg->msg_control* fields are used. The input struct msghdr holds a pointer to a user buffer, i.e. ksmg->msg_control_user is active. However, upon success, a kernel pointer is stored in kmsg->msg_control. kmsg->msg_control_is_user should therefore be updated accordingly. Cc: Christoph Hellwig <hch@lst.de> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-14 11:09:27 +01:00
Kevin Brodsky	c39ef21304	net: Ensure ->msg_control_user is used for user buffers Since commit `1f466e1f15` ("net: cleanly handle kernel vs user buffers for ->msg_control"), pointers to user buffers should be stored in struct msghdr::msg_control_user, instead of the msg_control field. Most users of msg_control have already been converted (where user buffers are involved), but not all of them. This patch attempts to address the remaining cases. An exception is made for null checks, as it should be safe to use msg_control unconditionally for that purpose. Cc: Christoph Hellwig <hch@lst.de> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-14 11:09:27 +01:00
Arseniy Krasnov	eaaa4e9239	vsock/loopback: don't disable irqs for queue access This replaces 'skb_queue_tail()' with 'virtio_vsock_skb_queue_tail()'. The first one uses 'spin_lock_irqsave()', second uses 'spin_lock_bh()'. There is no need to disable interrupts in the loopback transport as there is no access to the queue with skbs from interrupt context. Both virtio and vhost transports work in the same way. Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-14 11:04:04 +01:00
David S. Miller	c61fcc090f	Merge branch 'mana-jumbo-frames' Haiyang Zhang says: ==================== net: mana: Add support for jumbo frame The set adds support for jumbo frame, with some optimization for the RX path. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-14 08:56:20 +01:00
Haiyang Zhang	80f6215b45	net: mana: Add support for jumbo frame During probe, get the hardware-allowed max MTU by querying the device configuration. Users can select MTU up to the device limit. When XDP is in use, limit MTU settings so the buffer size is within one page. And, when MTU is set to a too large value, XDP is not allowed to run. Also, to prevent changing MTU fails, and leaves the NIC in a bad state, pre-allocate all buffers before starting the change. So in low memory condition, it will return error, without affecting the NIC. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-14 08:56:19 +01:00
Haiyang Zhang	2fbbd712ba	net: mana: Enable RX path to handle various MTU sizes Update RX data path to allocate and use RX queue DMA buffers with proper size based on potentially various MTU sizes. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-14 08:56:19 +01:00
Haiyang Zhang	a2917b2349	net: mana: Refactor RX buffer allocation code to prepare for various MTU Move out common buffer allocation code from mana_process_rx_cqe() and mana_alloc_rx_wqe() to helper functions. Refactor related variables so they can be changed in one place, and buffer sizes are in sync. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-14 08:56:19 +01:00
Haiyang Zhang	ce518bc3e9	net: mana: Use napi_build_skb in RX path Use napi_build_skb() instead of build_skb() to take advantage of the NAPI percpu caches to obtain skbuff_head. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-04-14 08:56:19 +01:00
Jakub Kicinski	e473ea818b	mlx5-updates-2023-04-11 1) Vlad adds the support for linux bridge multicast offload support Patches #1 through #9 Synopsis Vlad Says: ============== Implement support of bridge multicast offload in mlx5. Handle port object attribute SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED notification to toggle multicast offload and bridge snooping support on bridge. Handle port object SWITCHDEV_OBJ_ID_PORT_MDB notification to attach a bridge port to MDB. Steering architecture Existing offload infrastructure relies on two levels of flow tables - bridge ingress and egress. For multicast offload the architecture is extended with additional layer of per-port multicast replication tables. Such tables filter loopback traffic (so packets are not replicated to their source port) and pop VLAN headers for "untagged" VLANs. The tables are referenced by the MDB rules in egress table. MDB egress rule can point to multiple per-port multicast tables, which causes matching multicast traffic to be replicated to all of them, and, consecutively, to several bridge ports: +--------+--+ +---------------------------------------> Port 1 \| \| \| +-^------+--+ \| \| \| \| +-----------------------------------------+ \| +---------------------------+ \| \| EGRESS table \| \| +--> PORT 1 multicast table \| \| +----------------------------------+ +-----------------------------------------+ \| \| +---------------------------+ \| \| INGRESS table \| \| \| \| \| \| \| \| +----------------------------------+ \| dst_mac=P1,vlan=X -> pop vlan, goto P1 +--+ \| \| FG0: \| \| \| \| \| dst_mac=P1,vlan=Y -> pop vlan, goto P1 \| \| \| src_port=dst_port -> drop \| \| \| src_mac=M1,vlan=X -> goto egress +---> dst_mac=P2,vlan=X -> pop vlan, goto P2 +--+ \| \| FG1: \| \| \| ... \| \| dst_mac=P2,vlan=Y -> goto P2 \| \| \| \| VLAN X -> pop, goto port \| \| \| \| \| dst_mac=MDB1,vlan=Y -> goto mcast P1,P2 +-----+ \| ... \| \| +----------------------------------+ \| \| \| \| \| VLAN Y -> pop, goto port +-------+ +-----------------------------------------+ \| \| \| FG3: \| \| \| \| matchall -> goto port \| \| \| \| \| \| \| +---------------------------+ \| \| \| \| \| \| +--------+--+ +---------------------------------------> Port 2 \| \| \| +-^------+--+ \| \| \| \| \| +---------------------------+ \| +--> PORT 2 multicast table \| \| +---------------------------+ \| \| \| \| \| FG0: \| \| \| src_port=dst_port -> drop \| \| \| FG1: \| \| \| VLAN X -> pop, goto port \| \| \| ... \| \| \| \| \| \| FG3: \| \| \| matchall -> goto port +-------+ \| \| +---------------------------+ Patches overview: - Patch 1 adds hardware definition bits for capabilities required to replicate multicast packets to multiple per-port tables. These bits are used by following patches to only attempt multicast offload if firmware and hardware provide necessary support. - Pathces 2-4 patches are preparations and refactoring. - Patch 5 implements necessary infrastructure to toggle multicast offload via SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED port object attribute notification. This also enabled IGMP and MLD snooping. - Patch 6 implements per-port multicast replication tables. It only supports filtering of loopback packets. - Patch 7 extends per-port multicast tables with VLAN pop support for 'untagged' VLANs. - Patch 8 handles SWITCHDEV_OBJ_ID_PORT_MDB port object notifications. It creates MDB replication rules in egress table that can replicate packets to multiple per-port multicast tables. - Patch 9 adds tracepoints for MDB events. ============== 2) Parav Create a new allocation profile for SFs, to save on memory 3) Yevgeny provides some initial patches for upcoming software steering support new pattern/arguments type of modify_header actions. Starting with ConnectX-6 DX, we use a new design of modify_header FW object. The current modify_header object allows for having only limited number of these FW objects, which means that we are limited in the number of offloaded flows that require modify_header action. As a preparation Yevgeny provides the following 4 patches: - Patch 1: Add required mlx5_ifc HW bits - Patch 2, 3: Add new WQE type and opcode that is required for pattern/arg support and adds appropriate support in dr_send.c - Patch 4: Add ICM pool for modify-header-pattern objects and implement patterns cache, allowing patterns reuse for different flows -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmQ2LDIACgkQSD+KveBX +j59UAf+NfLEq7i/gYOri1rLwXOrgPd13w64Oob79+pKVXzRqik/gBg4d0CKdc2X Qafntudus7IvVrGJCM0vurw7nC75H9BMFh3dQW2u3uaR9m6i0k4tSKIAoJUFHMdj dMgAJywYkCCXlBbGVhRE3yybal2EXfOSJ+Fr2tGeqpL4vRlg4kIaVuFGcGk2gMpq gOahX6wtuop3ABzY3sqKCIQ1Q2jWx9AWWz+lEmw7HJGmHGugscYrgSIloHDnu1fz Bt6uODn7GFr866rQr9+JemG0VyK8ahyisEFnX1oJ3642ZidPcDrbIrrtMWeEqo7X 1tcmY/wNti87xFbg3gL+DZekhHTHoQ== =oIN8 -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2023-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-04-11 1) Vlad adds the support for linux bridge multicast offload support Patches #1 through #9 Synopsis Vlad Says: ============== Implement support of bridge multicast offload in mlx5. Handle port object attribute SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED notification to toggle multicast offload and bridge snooping support on bridge. Handle port object SWITCHDEV_OBJ_ID_PORT_MDB notification to attach a bridge port to MDB. Steering architecture Existing offload infrastructure relies on two levels of flow tables - bridge ingress and egress. For multicast offload the architecture is extended with additional layer of per-port multicast replication tables. Such tables filter loopback traffic (so packets are not replicated to their source port) and pop VLAN headers for "untagged" VLANs. The tables are referenced by the MDB rules in egress table. MDB egress rule can point to multiple per-port multicast tables, which causes matching multicast traffic to be replicated to all of them, and, consecutively, to several bridge ports: +--------+--+ +---------------------------------------> Port 1 \| \| \| +-^------+--+ \| \| \| \| +-----------------------------------------+ \| +---------------------------+ \| \| EGRESS table \| \| +--> PORT 1 multicast table \| \| +----------------------------------+ +-----------------------------------------+ \| \| +---------------------------+ \| \| INGRESS table \| \| \| \| \| \| \| \| +----------------------------------+ \| dst_mac=P1,vlan=X -> pop vlan, goto P1 +--+ \| \| FG0: \| \| \| \| \| dst_mac=P1,vlan=Y -> pop vlan, goto P1 \| \| \| src_port=dst_port -> drop \| \| \| src_mac=M1,vlan=X -> goto egress +---> dst_mac=P2,vlan=X -> pop vlan, goto P2 +--+ \| \| FG1: \| \| \| ... \| \| dst_mac=P2,vlan=Y -> goto P2 \| \| \| \| VLAN X -> pop, goto port \| \| \| \| \| dst_mac=MDB1,vlan=Y -> goto mcast P1,P2 +-----+ \| ... \| \| +----------------------------------+ \| \| \| \| \| VLAN Y -> pop, goto port +-------+ +-----------------------------------------+ \| \| \| FG3: \| \| \| \| matchall -> goto port \| \| \| \| \| \| \| +---------------------------+ \| \| \| \| \| \| +--------+--+ +---------------------------------------> Port 2 \| \| \| +-^------+--+ \| \| \| \| \| +---------------------------+ \| +--> PORT 2 multicast table \| \| +---------------------------+ \| \| \| \| \| FG0: \| \| \| src_port=dst_port -> drop \| \| \| FG1: \| \| \| VLAN X -> pop, goto port \| \| \| ... \| \| \| \| \| \| FG3: \| \| \| matchall -> goto port +-------+ \| \| +---------------------------+ Patches overview: - Patch 1 adds hardware definition bits for capabilities required to replicate multicast packets to multiple per-port tables. These bits are used by following patches to only attempt multicast offload if firmware and hardware provide necessary support. - Pathces 2-4 patches are preparations and refactoring. - Patch 5 implements necessary infrastructure to toggle multicast offload via SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED port object attribute notification. This also enabled IGMP and MLD snooping. - Patch 6 implements per-port multicast replication tables. It only supports filtering of loopback packets. - Patch 7 extends per-port multicast tables with VLAN pop support for 'untagged' VLANs. - Patch 8 handles SWITCHDEV_OBJ_ID_PORT_MDB port object notifications. It creates MDB replication rules in egress table that can replicate packets to multiple per-port multicast tables. - Patch 9 adds tracepoints for MDB events. ============== 2) Parav Create a new allocation profile for SFs, to save on memory 3) Yevgeny provides some initial patches for upcoming software steering support new pattern/arguments type of modify_header actions. Starting with ConnectX-6 DX, we use a new design of modify_header FW object. The current modify_header object allows for having only limited number of these FW objects, which means that we are limited in the number of offloaded flows that require modify_header action. As a preparation Yevgeny provides the following 4 patches: - Patch 1: Add required mlx5_ifc HW bits - Patch 2, 3: Add new WQE type and opcode that is required for pattern/arg support and adds appropriate support in dr_send.c - Patch 4: Add ICM pool for modify-header-pattern objects and implement patterns cache, allowing patterns reuse for different flows * tag 'mlx5-updates-2023-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: DR, Add modify-header-pattern ICM pool net/mlx5: DR, Prepare sending new WQE type net/mlx5: Add new WQE for updating flow table net/mlx5: Add mlx5_ifc bits for modify header argument net/mlx5: DR, Set counter ID on the last STE for STEv1 TX net/mlx5: Create a new profile for SFs net/mlx5: Bridge, add tracepoints for multicast net/mlx5: Bridge, implement mdb offload net/mlx5: Bridge, support multicast VLAN pop net/mlx5: Bridge, add per-port multicast replication tables net/mlx5: Bridge, snoop igmp/mld packets net/mlx5: Bridge, extract code to lookup parent bridge of port net/mlx5: Bridge, move additional data structures to priv header net/mlx5: Bridge, increase bridge tables sizes net/mlx5: Add mlx5_ifc definitions for bridge multicast support ==================== Link: https://lore.kernel.org/r/20230412040752.14220-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:28:03 -07:00
Jakub Kicinski	f7d29571ab	Merge branch 'add-kernel-tc-mqprio-and-tc-taprio-support-for-preemptible-traffic-classes' Vladimir Oltean says: ==================== Add kernel tc-mqprio and tc-taprio support for preemptible traffic classes The last RFC in August 2022 contained a proposal for the UAPI of both TSN standards which together form Frame Preemption (802.1Q and 802.3): https://lore.kernel.org/netdev/20220816222920.1952936-1-vladimir.oltean@nxp.com/ It wasn't clear at the time whether the 802.1Q portion of Frame Preemption should be exposed via the tc qdisc (mqprio, taprio) or via some other layer (perhaps also ethtool like the 802.3 portion, or dcbnl), even though the options were discussed extensively, with pros and cons: https://lore.kernel.org/netdev/20220816222920.1952936-3-vladimir.oltean@nxp.com/ So the 802.3 portion got submitted separately and finally was accepted: https://lore.kernel.org/netdev/20230119122705.73054-1-vladimir.oltean@nxp.com/ leaving the only remaining question: how do we expose the 802.1Q bits? This series proposes that we use the Qdisc layer, through separate (albeit very similar) UAPI in mqprio and taprio, and that both these Qdiscs pass the information down to the offloading device driver through the common mqprio offload structure (which taprio also passes). An implementation is provided for the NXP LS1028A on-board Ethernet endpoint (enetc). Previous versions also contained support for its embedded switch (felix), but this needs more work and will be submitted separately. v4: https://lore.kernel.org/netdev/20230403103440.2895683-1-vladimir.oltean@nxp.com/ v2: https://lore.kernel.org/netdev/20230219135309.594188-1-vladimir.oltean@nxp.com/ v1: https://lore.kernel.org/netdev/20230216232126.3402975-1-vladimir.oltean@nxp.com/ ==================== Link: https://lore.kernel.org/r/20230411180157.1850527-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:22:12 -07:00
Vladimir Oltean	01e23b2b3b	net: enetc: add support for preemptible traffic classes PFs which support the MAC Merge layer also have a set of 8 registers called "Port traffic class N frame preemption register (PTC0FPR - PTC7FPR)". Through these, a traffic class (group of TX rings of same dequeue priority) can be mapped to the eMAC or to the pMAC. There's nothing particularly spectacular here. We should probably only commit the preemptible TCs to hardware once the MAC Merge layer became active, but unlike Felix, we don't have an IRQ that notifies us of that. We'd have to sleep for up to verifyTime (127 ms) to wait for a resolution coming from the verification state machine; not only from the ndo_setup_tc() code path, but also from enetc_mm_link_state_update(). Since it's relatively complicated and has a relatively small benefit, I'm not doing it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:22:10 -07:00
Vladimir Oltean	50764da37c	net: enetc: rename "mqprio" to "qopt" To gain access to the larger encapsulating structure which has the type tc_mqprio_qopt_offload, rename just the "qopt" field as "qopt". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:22:10 -07:00
Vladimir Oltean	a721c3e54b	net/sched: taprio: allow per-TC user input of FP adminStatus This is a duplication of the FP adminStatus logic introduced for tc-mqprio. Offloading is done through the tc_mqprio_qopt_offload structure embedded within tc_taprio_qopt_offload. So practically, if a device driver is written to treat the mqprio portion of taprio just like standalone mqprio, it gets unified handling of frame preemption. I would have reused more code with taprio, but this is mostly netlink attribute parsing, which is hard to transform into generic code without having something that stinks as a result. We have the same variables with the same semantics, just different nlattr type values (TCA_MQPRIO_TC_ENTRY=5 vs TCA_TAPRIO_ATTR_TC_ENTRY=12; TCA_MQPRIO_TC_ENTRY_FP=2 vs TCA_TAPRIO_TC_ENTRY_FP=3, etc) and consequently, different policies for the nest. Every time nla_parse_nested() is called, an on-stack table "tb" of nlattr pointers is allocated statically, up to the maximum understood nlattr type. That array size is hardcoded as a constant, but when transforming this into a common parsing function, it would become either a VLA (which the Linux kernel rightfully doesn't like) or a call to the allocator. Having FP adminStatus in tc-taprio can be seen as addressing the 802.1Q Annex S.3 "Scheduling and preemption used in combination, no HOLD/RELEASE" and S.4 "Scheduling and preemption used in combination with HOLD/RELEASE" use cases. HOLD and RELEASE events are emitted towards the underlying MAC Merge layer when the schedule hits a Set-And-Hold-MAC or a Set-And-Release-MAC gate operation. So within the tc-taprio UAPI space, one can distinguish between the 2 use cases by choosing whether to use the TC_TAPRIO_CMD_SET_AND_HOLD and TC_TAPRIO_CMD_SET_AND_RELEASE gate operations within the schedule, or just TC_TAPRIO_CMD_SET_GATES. A small part of the change is dedicated to refactoring the max_sdu nlattr parsing to put all logic under the "if" that tests for presence of that nlattr. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:22:10 -07:00
Vladimir Oltean	f62af20bed	net/sched: mqprio: allow per-TC user input of FP adminStatus IEEE 802.1Q-2018 clause 6.7.2 Frame preemption specifies that each packet priority can be assigned to a "frame preemption status" value of either "express" or "preemptible". Express priorities are transmitted by the local device through the eMAC, and preemptible priorities through the pMAC (the concepts of eMAC and pMAC come from the 802.3 MAC Merge layer). The FP adminStatus is defined per packet priority, but 802.1Q clause 12.30.1.1.1 framePreemptionAdminStatus also says that: \| Priorities that all map to the same traffic class should be \| constrained to use the same value of preemption status. It is impossible to ignore the cognitive dissonance in the standard here, because it practically means that the FP adminStatus only takes distinct values per traffic class, even though it is defined per priority. I can see no valid use case which is prevented by having the kernel take the FP adminStatus as input per traffic class (what we do here). In addition, this also enforces the above constraint by construction. User space network managers which wish to expose FP adminStatus per priority are free to do so; they must only observe the prio_tc_map of the netdev (which presumably is also under their control, when constructing the mqprio netlink attributes). The reason for configuring frame preemption as a property of the Qdisc layer is that the information about "preemptible TCs" is closest to the place which handles the num_tc and prio_tc_map of the netdev. If the UAPI would have been any other layer, it would be unclear what to do with the FP information when num_tc collapses to 0. A key assumption is that only mqprio/taprio change the num_tc and prio_tc_map of the netdev. Not sure if that's a great assumption to make. Having FP in tc-mqprio can be seen as an implementation of the use case defined in 802.1Q Annex S.2 "Preemption used in isolation". There will be a separate implementation of FP in tc-taprio, for the other use cases. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:22:10 -07:00
Vladimir Oltean	c54876cd59	net/sched: pass netlink extack to mqprio and taprio offload With the multiplexed ndo_setup_tc() model which lacks a first-class struct netlink_ext_ack * argument, the only way to pass the netlink extended ACK message down to the device driver is to embed it within the offload structure. Do this for struct tc_mqprio_qopt_offload and struct tc_taprio_qopt_offload. Since struct tc_taprio_qopt_offload also contains a tc_mqprio_qopt_offload structure, and since device drivers might effectively reuse their mqprio implementation for the mqprio portion of taprio, we make taprio set the extack in both offload structures to point at the same netlink extack message. In fact, the taprio handling is a bit more tricky, for 2 reasons. First is because the offload structure has a longer lifetime than the extack structure. The driver is supposed to populate the extack synchronously from ndo_setup_tc() and leave it alone afterwards. To not have any use-after-free surprises, we zero out the extack pointer when we leave taprio_enable_offload(). The second reason is because taprio does overwrite the extack message on ndo_setup_tc() error. We need to switch to the weak form of setting an extack message, which preserves a potential message set by the driver. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:22:10 -07:00
Vladimir Oltean	ab277d2084	net/sched: mqprio: add an extack message to mqprio_parse_opt() Ferenc reports that a combination of poor iproute2 defaults and obscure cases where the kernel returns -EINVAL make it difficult to understand what is wrong with this command: $ ip link add veth0 numtxqueues 8 numrxqueues 8 type veth peer name veth1 $ tc qdisc add dev veth0 root mqprio num_tc 8 map 0 1 2 3 4 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 RTNETLINK answers: Invalid argument Hopefully with this patch, the cause is clearer: Error: Device does not support hardware offload. The kernel was (and still is) rejecting this because iproute2 defaults to "hw 1" if this command line option is not specified. Link: https://lore.kernel.org/netdev/ede5e9a2f27bf83bfb86d3e8c4ca7b34093b99e2.camel@inf.elte.hu/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:22:10 -07:00
Vladimir Oltean	57f21bf854	net/sched: mqprio: add extack to mqprio_parse_nlattr() Netlink attribute parsing in mqprio is a minesweeper game, with many options having the possibility of being passed incorrectly and the user being none the wiser. Try to make errors less sour by giving user space some information regarding what went wrong. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:22:10 -07:00
Vladimir Oltean	3dd0c16ec9	net/sched: mqprio: simplify handling of nlattr portion of TCA_OPTIONS In commit `4e8b86c062` ("mqprio: Introduce new hardware offload mode and shaper in mqprio"), the TCA_OPTIONS format of mqprio was extended to contain a fixed portion (of size NLA_ALIGN(sizeof struct tc_mqprio_qopt)) and a variable portion of other nlattrs (in the TCA_MQPRIO_* type space) following immediately afterwards. In commit `feb2cf3dcf` ("net/sched: mqprio: refactor nlattr parsing to a separate function"), we've moved the nlattr handling to a smaller function, but yet, a small parse_attr() still remains, and the larger mqprio_parse_nlattr() still does not have access to the beginning, and the length, of the TCA_OPTIONS region containing these other nlattrs. In a future change, the mqprio qdisc will need to iterate through this nlattr region to discover other attributes, so eliminate parse_attr() and add 2 variables in mqprio_parse_nlattr() which hold the beginning and the length of the nlattr range. We avoid the need to memset when nlattr_opt_len has insufficient length by pre-initializing the table "tb". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:22:10 -07:00
Vladimir Oltean	d54151aa0f	net: ethtool: create and export ethtool_dev_mm_supported() Create a wrapper over __ethtool_dev_mm_supported() which also calls ethnl_ops_begin() and ethnl_ops_complete(). It can be used by other code layers, such as tc, to make sure that preemptible TCs are supported (this is true if an underlying MAC Merge layer exists). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:22:10 -07:00
Rahul Rameshbabu	85a4abed15	tools: ynl: Rename ethtool to ethtool.py Make it explicit that this tool is not a drop-in replacement for ethtool. This tool is intended for testing ethtool functionality implemented in the kernel and should use a name that differentiates it from the ethtool utility. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Link: https://lore.kernel.org/r/20230413012252.184434-2-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:18:29 -07:00
Rahul Rameshbabu	3ea31e6664	tools: ynl: Remove absolute paths to yaml files from ethtool testing tool Absolute paths for the spec and schema files make the ethtool testing tool unusable with freshly checked-out source trees. Replace absolute paths with relative paths for both files in the Documentation/ directory. Issue seen before the change Traceback (most recent call last): File "/home/binary-eater/Documents/mlx/linux/tools/net/ynl/./ethtool", line 424, in <module> main() File "/home/binary-eater/Documents/mlx/linux/tools/net/ynl/./ethtool", line 158, in main ynl = YnlFamily(spec, schema) File "/home/binary-eater/Documents/mlx/linux/tools/net/ynl/lib/ynl.py", line 342, in __init__ super().__init__(def_path, schema) File "/home/binary-eater/Documents/mlx/linux/tools/net/ynl/lib/nlspec.py", line 333, in __init__ with open(spec_path, "r") as stream: FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/google/home/sdf/src/linux/Documentation/netlink/specs/ethtool.yaml' Fixes: `f3d07b02b2` ("tools: ynl: ethtool testing tool") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Link: https://lore.kernel.org/r/20230413012252.184434-1-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:18:29 -07:00
Jakub Kicinski	916b15fbf2	Merge branch 'macb-ptp-minor-updates' Harini Katakam says: ==================== Macb PTP minor updates - Enable PTP unicast - Optimize HW timestamp reading ==================== Link: https://lore.kernel.org/r/20230411123712.11459-1-harini.katakam@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:16:12 -07:00
Harini Katakam	8c0d0fe044	net: macb: Optimize reading HW timestamp The seconds input from BD (6 bits) just needs to be ORed with the upper bits from timer in this function. Avoid addition operation every single time. Seconds rollover handling is left untouched. Signed-off-by: Harini Katakam <harini.katakam@xilinx.com> Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:16:09 -07:00
Harini Katakam	ee4e92c26c	net: macb: Enable PTP unicast Enable transmission and reception of PTP unicast packets by updating PTP unicast config bit and setting current HW mac address as allowed address in PTP unicast filter registers. Signed-off-by: Harini Katakam <harini.katakam@xilinx.com> Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:16:09 -07:00
Harini Katakam	adee474a3b	net: macb: Update gem PTP support check There are currently two checks for PTP functionality - one on GEM capability and another on the kernel config option. Combine them into a single function as there's no use case where gem_has_ptp is TRUE and MACB_USE_HWSTAMP is false. Signed-off-by: Harini Katakam <harini.katakam@amd.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 22:16:09 -07:00
Jakub Kicinski	fb4be9a4e7	Merge branch 'ocelot-felix-driver-cleanup' Vladimir Oltean says: ==================== Ocelot/Felix driver cleanup The cleanup mostly handles the statistics code path - some issues regarding understandability became apparent after the series "Fix trainwreck with Ocelot switch statistics counters": https://lore.kernel.org/netdev/20230321010325.897817-1-vladimir.oltean@nxp.com/ There is also one patch which cleans up a misleading comment in the DSA felix_setup(). ==================== Link: https://lore.kernel.org/r/20230412124737.2243527-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 21:56:08 -07:00
Vladimir Oltean	a291399e63	net: mscc: ocelot: fix ineffective WARN_ON() in ocelot_stats.c Since it is hopefully now clear that, since "last" and "layout[i].reg" are enum types and not addresses, the existing WARN_ON() is ineffective in checking that the _addresses_ are sorted in the proper order. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 21:56:06 -07:00
Vladimir Oltean	6663c01eca	net: mscc: ocelot: strengthen type of "int i" in ocelot_stats.c The "int i" used to index the struct ocelot_stat_layout array actually has a specific type: enum ocelot_stat. Use it, so that the WARN() comment from ocelot_prepare_stats_regions() makes more sense. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 21:56:06 -07:00
Vladimir Oltean	eae0b9d15b	net: mscc: ocelot: strengthen type of "u32 reg" and "u32 base" in ocelot_stats.c Use the specific enum ocelot_reg to make it clear that the region registers are encoded and not plain addresses. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 21:56:06 -07:00
Vladimir Oltean	a9afc3e41c	net: dsa: felix: remove confusing/incorrect comment from felix_setup() That comment was written prior to knowing that what I was actually seeing was a manifestation of the bug fixed in commit `b4024c9e5c` ("felix: Fix initialization of ioremap resources"). There isn't any particular reason now why the hardware initialization is done in felix_setup(), so just delete that comment to avoid spreading misinformation. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 21:56:06 -07:00
Vladimir Oltean	93f0f93bbd	net: mscc: ocelot: remove blank line at the end of ocelot_stats.c Commit `a3bb8f521f` ("net: mscc: ocelot: remove unnecessary exposure of stats structures") made an unnecessary change which was to add a new line at the end of ocelot_stats.c. Remove it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Colin Foster <colin.foster@in-advantage.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 21:56:06 -07:00
Vladimir Oltean	07de32655b	net: mscc: ocelot: debugging print for statistics regions To make it easier to debug future issues with statistics counters not getting aggregated properly into regions, like what happened in commit `6acc72a43e` ("net: mscc: ocelot: fix stats region batching"), add some dev_dbg() prints which show the regions that were dynamically determined. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 21:56:06 -07:00
Vladimir Oltean	40cd07cb42	net: mscc: ocelot: refactor enum ocelot_reg decoding to helper ocelot_io.c duplicates the decoding of an enum ocelot_reg (which holds an enum ocelot_target in the upper bits and an index into a regmap array in the lower bits) 4 times. We'd like to reuse that logic once more, from ocelot.c. In order to do that, let's consolidate the existing 4 instances into a header accessible both by ocelot.c as well as by ocelot_io.c. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 21:56:06 -07:00
Vladimir Oltean	9ecd05794b	net: mscc: ocelot: strengthen type of "u32 reg" in I/O accessors The "u32 reg" argument that is passed to these functions is not a plain address, but rather a driver-specific encoding of another enum ocelot_target target in the upper bits, and an index into the u32 ocelot->map[target][] array in the lower bits. That encoded value takes the type "enum ocelot_reg" and is what is passed to these I/O functions, so let's actually use that to prevent type confusion. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 21:56:06 -07:00
Jakub Kicinski	c2865b1122	bpf-next-for-netdev -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZDhSiwAKCRDbK58LschI g8cbAQCH4xrquOeDmYyGXFQGchHZAIj++tKg8ABU4+hYeJtrlwEA6D4W6wjoSZRk mLSptZ9qro8yZA86BvyPvlBT1h9ELQA= =StAc -----END PGP SIGNATURE----- Daniel Borkmann says: ==================== pull-request: bpf-next 2023-04-13 We've added 260 non-merge commits during the last 36 day(s) which contain a total of 356 files changed, 21786 insertions(+), 11275 deletions(-). The main changes are: 1) Rework BPF verifier log behavior and implement it as a rotating log by default with the option to retain old-style fixed log behavior, from Andrii Nakryiko. 2) Adds support for using {FOU,GUE} encap with an ipip device operating in collect_md mode and add a set of BPF kfuncs for controlling encap params, from Christian Ehrig. 3) Allow BPF programs to detect at load time whether a particular kfunc exists or not, and also add support for this in light skeleton, from Alexei Starovoitov. 4) Optimize hashmap lookups when key size is multiple of 4, from Anton Protopopov. 5) Enable RCU semantics for task BPF kptrs and allow referenced kptr tasks to be stored in BPF maps, from David Vernet. 6) Add support for stashing local BPF kptr into a map value via bpf_kptr_xchg(). This is useful e.g. for rbtree node creation for new cgroups, from Dave Marchevsky. 7) Fix BTF handling of is_int_ptr to skip modifiers to work around tracing issues where a program cannot be attached, from Feng Zhou. 8) Migrate a big portion of test_verifier unit tests over to test_progs -a verifier_* via inline asm to ease {read,debug}ability, from Eduard Zingerman. 9) Several updates to the instruction-set.rst documentation which is subject to future IETF standardization (https://lwn.net/Articles/926882/), from Dave Thaler. 10) Fix BPF verifier in the __reg_bound_offset's 64->32 tnum sub-register known bits information propagation, from Daniel Borkmann. 11) Add skb bitfield compaction work related to BPF with the overall goal to make more of the sk_buff bits optional, from Jakub Kicinski. 12) BPF selftest cleanups for build id extraction which stand on its own from the upcoming integration work of build id into struct file object, from Jiri Olsa. 13) Add fixes and optimizations for xsk descriptor validation and several selftest improvements for xsk sockets, from Kal Conley. 14) Add BPF links for struct_ops and enable switching implementations of BPF TCP cong-ctls under a given name by replacing backing struct_ops map, from Kui-Feng Lee. 15) Remove a misleading BPF verifier env->bypass_spec_v1 check on variable offset stack read as earlier Spectre checks cover this, from Luis Gerhorst. 16) Fix issues in copy_from_user_nofault() for BPF and other tracers to resemble copy_from_user_nmi() from safety PoV, from Florian Lehner and Alexei Starovoitov. 17) Add --json-summary option to test_progs in order for CI tooling to ease parsing of test results, from Manu Bretelle. 18) Batch of improvements and refactoring to prep for upcoming bpf_local_storage conversion to bpf_mem_cache_{alloc,free} allocator, from Martin KaFai Lau. 19) Improve bpftool's visual program dump which produces the control flow graph in a DOT format by adding C source inline annotations, from Quentin Monnet. 20) Fix attaching fentry/fexit/fmod_ret/lsm to modules by extracting the module name from BTF of the target and searching kallsyms of the correct module, from Viktor Malik. 21) Improve BPF verifier handling of '<const> <cond> <non_const>' to better detect whether in particular jmp32 branches are taken, from Yonghong Song. 22) Allow BPF TCP cong-ctls to write app_limited of struct tcp_sock. A built-in cc or one from a kernel module is already able to write to app_limited, from Yixin Shen. Conflicts: Documentation/bpf/bpf_devel_QA.rst `b7abcd9c65` ("bpf, doc: Link to submitting-patches.rst for general patch submission info") `0f10f647f4` ("bpf, docs: Use internal linking for link to netdev subsystem doc") https://lore.kernel.org/all/20230307095812.236eb1be@canb.auug.org.au/ include/net/ip_tunnels.h `bc9d003dc4` ("ip_tunnel: Preserve pointer const in ip_tunnel_info_opts") `ac931d4cde` ("ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices") https://lore.kernel.org/all/20230413161235.4093777-1-broonie@kernel.org/ net/bpf/test_run.c `e5995bc7e2` ("bpf, test_run: fix crashes due to XDP frame overwriting/corruption") `294635a816` ("bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES") https://lore.kernel.org/all/20230320102619.05b80a98@canb.auug.org.au/ ==================== Link: https://lore.kernel.org/r/20230413191525.7295-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 16:43:38 -07:00
Jakub Kicinski	800e68c44f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: tools/testing/selftests/net/config `62199e3f16` ("selftests: net: Add VXLAN MDB test") `3a0385be13` ("selftests: add the missing CONFIG_IP_SCTP in net config") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 16:04:28 -07:00
Linus Torvalds	829cca4d17	Including fixes from bpf, and bluetooth. Not all that quiet given spring celebrations, but "current" fixes are thinning out, which is encouraging. One outstanding regression in the mlx5 driver when using old FW, not blocking but we're pushing for a fix. Current release - new code bugs: - eth: enetc: workaround for unresponsive pMAC after receiving express traffic Previous releases - regressions: - rtnetlink: restore RTM_NEW/DELLINK notification behavior, keep the pid/seq fields 0 for backward compatibility Previous releases - always broken: - sctp: fix a potential overflow in sctp_ifwdtsn_skip - mptcp: - use mptcp_schedule_work instead of open-coding it and make the worker check stricter, to avoid scheduling work on closed sockets - fix NULL pointer dereference on fastopen early fallback - skbuff: fix memory corruption due to a race between skb coalescing and releasing clones confusing page_pool reference counting - bonding: fix neighbor solicitation validation on backup slaves - bpf: tcp: use sock_gen_put instead of sock_put in bpf_iter_tcp - bpf: arm64: fixed a BTI error on returning to patched function - openvswitch: fix race on port output leading to inf loop - sfp: initialize sfp->i2c_block_size at sfp allocation to avoid returning a different errno than expected - phy: nxp-c45-tja11xx: unregister PTP, purge queues on remove - Bluetooth: fix printing errors if LE Connection times out - Bluetooth: assorted UaF, deadlock and data race fixes - eth: macb: fix memory corruption in extended buffer descriptor mode Misc: - adjust the XDP Rx flow hash API to also include the protocol layers over which the hash was computed Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmQ4ZrsACgkQMUZtbf5S IruWUQ/9F+HlnHf3Sv08zGlnV++vLaJ/Ld8C2YNYNxRwuoJvcCyikQ/ZfUKdKAoS kCf0XqFD2SMl8wHpCQBhK4uXvKBdBMx6L6wEp7dbDciGl/+5yihe9opBBXKekWbB ULRZcZE7RACri/QsXQhD7Y8p530xnYWQXO8ZMjR3vOAWxplJtBDNDnXi7hqtxQpW Vzwl1ntvD1msmxhvy0UZrgeesL8R3UckFvqYEqnINeyd8E8HB1dAOg899/ZPUbjA UoEw5VsSBSr9DE7+Fs6uD8trBxQ1CrdRVJjhRhwivHk8/Ro7dIzjcVV30ei3wucz 0RiNvCqypkeLeRrcVlSk8lR5r9FBGvhDMFbcGM8lHnxSc0WB+Sj+2iup4fpTE8/p VUIvhhzuBuXU4Sc022pm6BL5DBSKdnJRquFq6XCTwnQM6v7fvzu1yWNXsQom8Nbi 1/ZcFdj27FHwMpU0JPZ4PFxT7Ta830UyulVZuyWA+zEzlN7DvW3O7bGQC72GEuID Xc58D4kVtywzbntFmUjuhXCD/6vvD5WW5orLpMWg5SH9F14prt3C9OFSpTwTTq+W uPBEslhnhhCPecTNo2iFPLX3bN67n8KDVUWua1AHaqmcK7QFGs0njJGGLpFdHMll SuNgrNrtQE9EHH8XL6VbSD2zf35ZfoKVg6qvL3oeLzXkGkZrnls= =W+J2 -----END PGP SIGNATURE----- Merge tag 'net-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf, and bluetooth. Not all that quiet given spring celebrations, but "current" fixes are thinning out, which is encouraging. One outstanding regression in the mlx5 driver when using old FW, not blocking but we're pushing for a fix. Current release - new code bugs: - eth: enetc: workaround for unresponsive pMAC after receiving express traffic Previous releases - regressions: - rtnetlink: restore RTM_NEW/DELLINK notification behavior, keep the pid/seq fields 0 for backward compatibility Previous releases - always broken: - sctp: fix a potential overflow in sctp_ifwdtsn_skip - mptcp: - use mptcp_schedule_work instead of open-coding it and make the worker check stricter, to avoid scheduling work on closed sockets - fix NULL pointer dereference on fastopen early fallback - skbuff: fix memory corruption due to a race between skb coalescing and releasing clones confusing page_pool reference counting - bonding: fix neighbor solicitation validation on backup slaves - bpf: tcp: use sock_gen_put instead of sock_put in bpf_iter_tcp - bpf: arm64: fixed a BTI error on returning to patched function - openvswitch: fix race on port output leading to inf loop - sfp: initialize sfp->i2c_block_size at sfp allocation to avoid returning a different errno than expected - phy: nxp-c45-tja11xx: unregister PTP, purge queues on remove - Bluetooth: fix printing errors if LE Connection times out - Bluetooth: assorted UaF, deadlock and data race fixes - eth: macb: fix memory corruption in extended buffer descriptor mode Misc: - adjust the XDP Rx flow hash API to also include the protocol layers over which the hash was computed" * tag 'net-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits) selftests/bpf: Adjust bpf_xdp_metadata_rx_hash for new arg mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type veth: bpf_xdp_metadata_rx_hash add xdp rss hash type mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type xdp: rss hash types representation selftests/bpf: xdp_hw_metadata remove bpf_printk and add counters skbuff: Fix a race between coalescing and releasing SKBs net: macb: fix a memory corruption in extended buffer descriptor mode selftests: add the missing CONFIG_IP_SCTP in net config udp6: fix potential access to stale information selftests: openvswitch: adjust datapath NL message declaration selftests: mptcp: userspace pm: uniform verify events mptcp: fix NULL pointer dereference on fastopen early fallback mptcp: stricter state check in mptcp_worker mptcp: use mptcp_schedule_work instead of open-coding it net: enetc: workaround for unresponsive pMAC after receiving express traffic sctp: fix a potential overflow in sctp_ifwdtsn_skip net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume() rtnetlink: Restore RTM_NEW/DELLINK notification behavior net: ti/cpsw: Add explicit platform_device.h and of_platform.h includes ...	2023-04-13 15:33:04 -07:00
Linus Torvalds	4413ad01e2	Devicetree fixes for v6.2, part 3: - Fix interaction between fw_devlink and DT overlays causing devices to not be probed - Fix the compatible string for loongson,cpu-interrupt-controller -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmQ1t60ACgkQ+vtdtY28 YcNfvBAApQjSAtrvYlHH5Lp6ANCzNZiXJO2FX5kpPtcdIHMLB0Soo4z2IE6B9V1r e61dw5j21CuRDlsoOG1odg9//02/KK2Dgz7ebisKWVF+1UuWbps1stNuXO3MbLUj Cq4GH4EUs5JwED145jOhLWWq2/bkymJvgWVU8n3Q/q/uL+Fjm4aJxgx92p6IZdN3 CixxhXBAkWLOs9ij8f6bsSUts28XoPZsk+kucdXc+83UninAXJC2KzuvQga8mBPF MGuxQTXmD5vgdPyvqj1D9U3uqkDE6HudrUDXr1yq9esPObjvUTkh09/Wm7OqiDu9 GBUyhD+3GaBK18rKiL0JSDGbGamNR9BaFshovuPEmlAtaoHv8nbv/MmfTnCWwjjs 5lQ7LmOJCuRuQmcTOR8q1csVhUXxssNGaOaclOOXou/crItmSDlLAaj6XLRTPt45 4jPiNKgDH4Pj2vqGBeYnhNyPyc+Y5IVc88pV8kUxfsqnMluzsoLC+0JADXNMhk1T 3sfecpceQav4TFhPOcMIHAkgldBnPQomW6laEn4Ul+dAyAes7q6Y0SvjQy03gDKU LY5QIsHLs5YZXur8TYSbU7Yt44hr4SAm0uz1kcCArlmNtidcjYuw1tLAWnGTxZrx 5q+ZuQ6NTiPKwxTK0Zhf+HqZdzx2IE6JXaPBOeOdYxjTSWGmcKk= =x2A8 -----END PGP SIGNATURE----- Merge tag 'devicetree-fixes-for-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes from Rob Herring: - Fix interaction between fw_devlink and DT overlays causing devices to not be probed - Fix the compatible string for loongson,cpu-interrupt-controller * tag 'devicetree-fixes-for-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: treewide: Fix probing of devices in DT overlays dt-bindings: interrupt-controller: loongarch: Fix mismatched compatible	2023-04-13 15:21:56 -07:00
Linus Torvalds	531f27ad5e	This is just a revert of the AMD fix, because the fix fixed broken some laptops. We are working on a proper solution. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAmQ4YHcACgkQQRCzN7AZ XXO3QRAAlEE9OIeq6yhrnmQyhXTysZNCjMigkppIWAjY96twBifKo2PyaALb5v9O VPlbLDUB250RInrakOjjY3AFGiXP/UEYBsMk7L5UGS3bloyrBLyFEKoXccAO0qO6 pPlVlLN+iszsO+vx2fSiE65o6ZMZTU9FWcroTbvDfO228e9xeq1mjH4d4H88sr90 /PUhK/CT13zdTYX/eB8rF9IwlqAGRbuoNPr70M8cpJckVBs/BDt/EQLdBNzScY2l 5zcaYcpxQlLI+uiBjf3kHKXVcOZwSKYMFK04306blgWhSAxEqMsM3aiwk3C612Ri fc70ONHl7OXSnIREcobIEq22Ehd3L/TEI0br76DkWkurQB29YT4WyEWAj1lmlzCY afHhX5d+ipVybOW3rQfCTdf/U23jVLrvA+n1bwsJqEpACsEXHyfHA+0oidBeiW2O 62wmP9wxjUiO2AkIfNJuMdf12BdK+r0Rvk/5mRIZTUKOkx7B1T/zVHCnt9Qir2BT 0O/3wUTz23onrR/1OnLiSOYQfmlly0/jZu2jyFYoIFxlB2imeqKfFnB0QHjzXOxe BeBTXPa1Y7r4ESNjt5z78MvlGIZLXf+PLx7nfxfVgClu3SUauNPFTFVkv45RJLWU AcL33B5OpmDT1EFxToPNto/kFTa4SzPm1dxEHpNcdURNEaq2KfQ= =FCMQ -----END PGP SIGNATURE----- Merge tag 'pinctrl-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fix from Linus Walleij: "This is just a revert of the AMD fix, because the fix broke some laptops. We are working on a proper solution" * tag 'pinctrl-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: Revert "pinctrl: amd: Disable and mask interrupts on resume"	2023-04-13 15:17:59 -07:00
Linus Torvalds	f1be7b6c16	drm-fixes for -rc7 - two fbcon regressions - amdgpu: dp mst, smu13 - i915: dual link dsi for tgl+ - armada, nouveau, drm/sched, fbmem -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEb4nG6jLu8Y5XI+PfTA9ye/CYqnEFAmQ4XEYACgkQTA9ye/CY qnFDIA/9FZJx4rW3rNl1fSN9UufQuDBq86kOECbG53VM9QcZVdDaV35cWBEyqG4p Ue8M3aZwltZ0+x6qeZ0rV+qZKlD3KYidq/MgW0YGPdVvSdrdVk0OOc4N6uQU1P1v Phr0hFHVao0+bWQE5gdK9S7DW23m/ys0y0C+LXVRNWWf4kVh5DFSjaqEo+SU8Gj1 6GOUWf+rH44r4cHqx4sgZ8r79jWdc3Bjb6OLSE3YVhyU2cffVJ8+AgIA6JyStOR0 tw0NKfAMI0BBlEewvvNWqKcLvZ7qz5bG+byy6amzX8bLmdyyz2BarM5MKSf7F0NY BC2GiQKUEh+hLsvUhLdqYV2iLKRI2Qjd0ESYF9WO2UElsTEf/2tuGKD5WTMmdJLG 1BjEC1lQL/uSHWy+9qpppEKkmQHo+6MLlvbbbNVxgz8n/ysxnJGTe5ntGRWc88nR 7yrHILf5Ry7/4D+PBctZqalf/JfklrY3hIuTl6XTgLE6eKM5Wgc2xn0ItmUXQD9i B18TCUiElr/FlMyw/oTWcstrI9rjrAv9FBZ0qoEqkXCfu7TPT+tJ+6oLq2rCJKUh 4GuzOm7PYFBu0lof8Cmthz/Z8GDKJrdgdfDfp8yJp7jmIz2q8Xm3yqG66xSQKuAE dOJpHIrQgQAxgTvsSAALXmlPN2XAp5dHBaS4vXPgbFNKWkWi3H0= =aI2h -----END PGP SIGNATURE----- Merge tag 'drm-fixes-2023-04-13' of git://anongit.freedesktop.org/drm/drm Pull drm fixes from Daniel Vetter: - two fbcon regressions - amdgpu: dp mst, smu13 - i915: dual link dsi for tgl+ - armada, nouveau, drm/sched, fbmem * tag 'drm-fixes-2023-04-13' of git://anongit.freedesktop.org/drm/drm: fbcon: set_con2fb_map needs to set con2fb_map! fbcon: Fix error paths in set_con2fb_map drm/amd/pm: correct the pcie link state check for SMU13 drm/amd/pm: correct SMU13.0.7 max shader clock reporting drm/amd/pm: correct SMU13.0.7 pstate profiling clock settings drm/amd/display: Pass the right info to drm_dp_remove_payload drm/armada: Fix a potential double free in an error handling path fbmem: Reject FB_ACTIVATE_KD_TEXT from userspace drm/nouveau/fb: add missing sysmen flush callbacks drm/i915/dsi: fix DSS CTL register offsets for TGL+ drm/scheduler: Fix UAF race in drm_sched_entity_push_job()	2023-04-13 14:58:55 -07:00
Jakub Kicinski	d0f89c4c1d	bpf-for-netdev -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZDhVyQAKCRDbK58LschI g0lKAQDScgS8nBlgupnWVal4JzhzzJaoabETf2sIl6Sd0UJAQwD8C3DakFP7O24N 0YE3WGpHMEvVzeCnM5HTFbbdKCjgPQc= =QmoW -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-04-13 We've added 6 non-merge commits during the last 1 day(s) which contain a total of 14 files changed, 205 insertions(+), 38 deletions(-). The main changes are: 1) One late straggler fix on the XDP hints side which fixes bpf_xdp_metadata_rx_hash kfunc API before the release goes out in order to provide information on the RSS hash type, from Jesper Dangaard Brouer. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Adjust bpf_xdp_metadata_rx_hash for new arg mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type veth: bpf_xdp_metadata_rx_hash add xdp rss hash type mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type xdp: rss hash types representation selftests/bpf: xdp_hw_metadata remove bpf_printk and add counters ==================== Link: https://lore.kernel.org/r/20230413192939.10202-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-04-13 13:04:44 -07:00
Daniel Vetter	cab2932213	Short summary of fixes pull: * armada: Fix double free * fb: Clear FB_ACTIVATE_KD_TEXT in ioctl * nouveau: Add missing callbacks * scheduler: Fix use-after-free error -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmQ4TOMACgkQaA3BHVML eiPw+wf9EpgmwUbs3gAb1gXTnXlum8zfkXrYNgZ7nU9G7e16tt5ToZnkGpJVEO5x Ep8mkvGHAathronn1kT/EjC2V6cVbFnWb2IQX5Hb7OhMxSwSy2lFCsPgMZlPCMCX 3hxxLxpWE+GqkxZDS8+99xif7FCLqgbAR77Ca0VXG5vfaKfd8sqh0tXxP/m23eyT CmvT28kw6+cgG32Etf52UN9RJLBLSIUfl/34DGu8hmoIaYK0+AVNwOQNXKnDX/MM HbAKDKdx2souQYrBz+5rHsfkGVXyZx2gmxH7TD4srIwgBuvNude+LICVpH1qWUlq ZUM8B88yxjMte9Wr1CBCWy76OqxcIQ== =Ko4T -----END PGP SIGNATURE----- Merge tag 'drm-misc-fixes-2023-04-13' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Short summary of fixes pull: * armada: Fix double free * fb: Clear FB_ACTIVATE_KD_TEXT in ioctl * nouveau: Add missing callbacks * scheduler: Fix use-after-free error Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230413184233.GA8148@linux-uq9g	2023-04-13 20:47:58 +02:00
Daniel Borkmann	8c5c2a4898	bpf, sockmap: Revert buggy deadlock fix in the sockhash and sockmap syzbot reported a splat and bisected it to recent commit `ed17aa92dc` ("bpf, sockmap: fix deadlocks in the sockhash and sockmap"): [...] WARNING: CPU: 1 PID: 9280 at kernel/softirq.c:376 __local_bh_enable_ip+0xbe/0x130 kernel/softirq.c:376 Modules linked in: CPU: 1 PID: 9280 Comm: syz-executor.1 Not tainted 6.2.0-syzkaller-13249-gd319f344561d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/30/2023 RIP: 0010:__local_bh_enable_ip+0xbe/0x130 kernel/softirq.c:376 [...] Call Trace: <TASK> spin_unlock_bh include/linux/spinlock.h:395 [inline] sock_map_del_link+0x2ea/0x510 net/core/sock_map.c:165 sock_map_unref+0xb0/0x1d0 net/core/sock_map.c:184 sock_hash_delete_elem+0x1ec/0x2a0 net/core/sock_map.c:945 map_delete_elem kernel/bpf/syscall.c:1536 [inline] __sys_bpf+0x2edc/0x53e0 kernel/bpf/syscall.c:5053 __do_sys_bpf kernel/bpf/syscall.c:5166 [inline] __se_sys_bpf kernel/bpf/syscall.c:5164 [inline] __x64_sys_bpf+0x79/0xc0 kernel/bpf/syscall.c:5164 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fe8f7c8c169 </TASK> [...] Revert for now until we have a proper solution. Fixes: `ed17aa92dc` ("bpf, sockmap: fix deadlocks in the sockhash and sockmap") Reported-by: syzbot+49f6cef45247ff249498@syzkaller.appspotmail.com Cc: Hsin-Wei Hung <hsinweih@uci.edu> Cc: Xin Liu <liuxin350@huawei.com> Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/000000000000f1db9605f939720e@google.com/	2023-04-13 20:36:32 +02:00
Alexei Starovoitov	b65ef48c95	Merge branch 'XDP-hints: change RX-hash kfunc bpf_xdp_metadata_rx_hash' Jesper Dangaard Brouer says: ==================== Current API for bpf_xdp_metadata_rx_hash() returns the raw RSS hash value, but doesn't provide information on the RSS hash type (part of 6.3-rc). This patchset proposal is to change the function call signature via adding a pointer value argument for providing the RSS hash type. Patchset also removes all bpf_printk's from xdp_hw_metadata program that we expect driver developers to use. Instead counters are introduced for relaying e.g. skip and fail info. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-04-13 11:15:11 -07:00
Jesper Dangaard Brouer	0f26b74e7d	selftests/bpf: Adjust bpf_xdp_metadata_rx_hash for new arg Update BPF selftests to use the new RSS type argument for kfunc bpf_xdp_metadata_rx_hash. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/168132894068.340624.8914711185697163690.stgit@firesoul Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-04-13 11:15:11 -07:00
Jesper Dangaard Brouer	9123397aee	mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type via matching individual Completion Queue Entry (CQE) status bits. Fixes: `ab46182d0d` ("net/mlx4_en: Support RX XDP metadata") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/168132893562.340624.12779118462402031248.stgit@firesoul Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-04-13 11:15:11 -07:00
Jesper Dangaard Brouer	96b1a098f3	veth: bpf_xdp_metadata_rx_hash add xdp rss hash type Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type. The veth driver currently only support XDP-hints based on SKB code path. The SKB have lost information about the RSS hash type, by compressing the information down to a single bitfield skb->l4_hash, that only knows if this was a L4 hash value. In preparation for veth, the xdp_rss_hash_type have an L4 indication bit that allow us to return a meaningful L4 indication when working with SKB based packets. Fixes: `306531f024` ("veth: Support RX XDP metadata") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/168132893055.340624.16209448340644513469.stgit@firesoul Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-04-13 11:15:10 -07:00
Jesper Dangaard Brouer	67f245c2ec	mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type via mapping table. The mlx5 hardware can also identify and RSS hash IPSEC. This indicate hash includes SPI (Security Parameters Index) as part of IPSEC hash. Extend xdp core enum xdp_rss_hash_type with IPSEC hash type. Fixes: `bc8d405b1b` ("net/mlx5e: Support RX XDP metadata") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/168132892548.340624.11185734579430124869.stgit@firesoul Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-04-13 11:15:10 -07:00
Jesper Dangaard Brouer	0cd917a4a8	xdp: rss hash types representation The RSS hash type specifies what portion of packet data NIC hardware used when calculating RSS hash value. The RSS types are focused on Internet traffic protocols at OSI layers L3 and L4. L2 (e.g. ARP) often get hash value zero and no RSS type. For L3 focused on IPv4 vs. IPv6, and L4 primarily TCP vs UDP, but some hardware supports SCTP. Hardware RSS types are differently encoded for each hardware NIC. Most hardware represent RSS hash type as a number. Determining L3 vs L4 often requires a mapping table as there often isn't a pattern or sorting according to ISO layer. The patch introduce a XDP RSS hash type (enum xdp_rss_hash_type) that contains both BITs for the L3/L4 types, and combinations to be used by drivers for their mapping tables. The enum xdp_rss_type_bits get exposed to BPF via BTF, and it is up to the BPF-programmer to match using these defines. This proposal change the kfunc API bpf_xdp_metadata_rx_hash() adding a pointer value argument for provide the RSS hash type. Change signature for all xmo_rx_hash calls in drivers to make it compile. The RSS type implementations for each driver comes as separate patches. Fixes: `3d76a4d3d4` ("bpf: XDP metadata RX kfuncs") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/168132892042.340624.582563003880565460.stgit@firesoul Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-04-13 11:15:10 -07:00
Jesper Dangaard Brouer	e8163b98d9	selftests/bpf: xdp_hw_metadata remove bpf_printk and add counters The tool xdp_hw_metadata can be used by driver developers implementing XDP-hints metadata kfuncs. Remove all bpf_printk calls, as the tool already transfers all the XDP-hints related information via metadata area to AF_XDP userspace process. Add counters for providing remaining information about failure and skipped packet events. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/168132891533.340624.7313781245316405141.stgit@firesoul Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-04-13 11:15:10 -07:00

1 2 3 4 5 ...

1172508 Commits