Commit Graph

1172508 Commits

Author SHA1 Message Date
Kevin Brodsky
60daf8d40b net/compat: Update msg_control_is_user when setting a kernel pointer
cmsghdr_from_user_compat_to_kern() is an unusual case w.r.t. how
the kmsg->msg_control* fields are used. The input struct msghdr
holds a pointer to a user buffer, i.e. ksmg->msg_control_user is
active. However, upon success, a kernel pointer is stored in
kmsg->msg_control. kmsg->msg_control_is_user should therefore be
updated accordingly.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-14 11:09:27 +01:00
Kevin Brodsky
c39ef21304 net: Ensure ->msg_control_user is used for user buffers
Since commit 1f466e1f15 ("net: cleanly handle kernel vs user
buffers for ->msg_control"), pointers to user buffers should be
stored in struct msghdr::msg_control_user, instead of the
msg_control field.  Most users of msg_control have already been
converted (where user buffers are involved), but not all of them.

This patch attempts to address the remaining cases. An exception is
made for null checks, as it should be safe to use msg_control
unconditionally for that purpose.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-14 11:09:27 +01:00
Arseniy Krasnov
eaaa4e9239 vsock/loopback: don't disable irqs for queue access
This replaces 'skb_queue_tail()' with 'virtio_vsock_skb_queue_tail()'.
The first one uses 'spin_lock_irqsave()', second uses 'spin_lock_bh()'.
There is no need to disable interrupts in the loopback transport as
there is no access to the queue with skbs from interrupt context. Both
virtio and vhost transports work in the same way.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-14 11:04:04 +01:00
David S. Miller
c61fcc090f Merge branch 'mana-jumbo-frames'
Haiyang Zhang says:

====================
net: mana: Add support for jumbo frame

The set adds support for jumbo frame,
with some optimization for the RX path.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-14 08:56:20 +01:00
Haiyang Zhang
80f6215b45 net: mana: Add support for jumbo frame
During probe, get the hardware-allowed max MTU by querying the device
configuration. Users can select MTU up to the device limit.
When XDP is in use, limit MTU settings so the buffer size is within
one page. And, when MTU is set to a too large value, XDP is not allowed
to run.
Also, to prevent changing MTU fails, and leaves the NIC in a bad state,
pre-allocate all buffers before starting the change. So in low memory
condition, it will return error, without affecting the NIC.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-14 08:56:19 +01:00
Haiyang Zhang
2fbbd712ba net: mana: Enable RX path to handle various MTU sizes
Update RX data path to allocate and use RX queue DMA buffers with
proper size based on potentially various MTU sizes.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-14 08:56:19 +01:00
Haiyang Zhang
a2917b2349 net: mana: Refactor RX buffer allocation code to prepare for various MTU
Move out common buffer allocation code from mana_process_rx_cqe() and
mana_alloc_rx_wqe() to helper functions.
Refactor related variables so they can be changed in one place, and buffer
sizes are in sync.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-14 08:56:19 +01:00
Haiyang Zhang
ce518bc3e9 net: mana: Use napi_build_skb in RX path
Use napi_build_skb() instead of build_skb() to take advantage of the
NAPI percpu caches to obtain skbuff_head.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-14 08:56:19 +01:00
Jakub Kicinski
e473ea818b mlx5-updates-2023-04-11
1) Vlad adds the support for linux bridge multicast offload support
    Patches #1 through #9
    Synopsis
 
 Vlad Says:
 ==============
 Implement support of bridge multicast offload in mlx5. Handle port object
 attribute SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED notification to toggle multicast
 offload and bridge snooping support on bridge. Handle port object
 SWITCHDEV_OBJ_ID_PORT_MDB notification to attach a bridge port to MDB.
 
 Steering architecture
 
 Existing offload infrastructure relies on two levels of flow tables - bridge
 ingress and egress. For multicast offload the architecture is extended with
 additional layer of per-port multicast replication tables. Such tables filter
 loopback traffic (so packets are not replicated to their source port) and pop
 VLAN headers for "untagged" VLANs. The tables are referenced by the MDB rules in
 egress table. MDB egress rule can point to multiple per-port multicast tables,
 which causes matching multicast traffic to be replicated to all of them, and,
 consecutively, to several bridge ports:
 
                                                                                                                             +--------+--+
                                                                                     +---------------------------------------> Port 1 |  |
                                                                                     |                                       +-^------+--+
                                                                                     |                                         |
                                                                                     |                                         |
                                        +-----------------------------------------+  |     +---------------------------+       |
                                        | EGRESS table                            |  |  +--> PORT 1 multicast table    |       |
 +----------------------------------+   +-----------------------------------------+  |  |  +---------------------------+       |
 | INGRESS table                    |   |                                         |  |  |  |                           |       |
 +----------------------------------+   | dst_mac=P1,vlan=X -> pop vlan, goto P1  +--+  |  | FG0:                      |       |
 |                                  |   | dst_mac=P1,vlan=Y -> pop vlan, goto P1  |     |  | src_port=dst_port -> drop |       |
 | src_mac=M1,vlan=X -> goto egress +---> dst_mac=P2,vlan=X -> pop vlan, goto P2  +--+  |  | FG1:                      |       |
 | ...                              |   | dst_mac=P2,vlan=Y -> goto P2            |  |  |  | VLAN X -> pop, goto port  |       |
 |                                  |   | dst_mac=MDB1,vlan=Y -> goto mcast P1,P2 +-----+  | ...                       |       |
 +----------------------------------+   |                                         |  |  |  | VLAN Y -> pop, goto port  +-------+
                                        +-----------------------------------------+  |  |  | FG3:                      |
                                                                                     |  |  | matchall -> goto port     |
                                                                                     |  |  |                           |
                                                                                     |  |  +---------------------------+
                                                                                     |  |
                                                                                     |  |
                                                                                     |  |                                    +--------+--+
                                                                                     +---------------------------------------> Port 2 |  |
                                                                                        |                                    +-^------+--+
                                                                                        |                                      |
                                                                                        |                                      |
                                                                                        |  +---------------------------+       |
                                                                                        +--> PORT 2 multicast table    |       |
                                                                                           +---------------------------+       |
                                                                                           |                           |       |
                                                                                           | FG0:                      |       |
                                                                                           | src_port=dst_port -> drop |       |
                                                                                           | FG1:                      |       |
                                                                                           | VLAN X -> pop, goto port  |       |
                                                                                           | ...                       |       |
                                                                                           |                           |       |
                                                                                           | FG3:                      |       |
                                                                                           | matchall -> goto port     +-------+
                                                                                           |                           |
                                                                                           +---------------------------+
 
 Patches overview:
 
 - Patch 1 adds hardware definition bits for capabilities required to replicate
   multicast packets to multiple per-port tables. These bits are used by
   following patches to only attempt multicast offload if firmware and hardware
   provide necessary support.
 
 - Pathces 2-4 patches are preparations and refactoring.
 
 - Patch 5 implements necessary infrastructure to toggle multicast offload
   via SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED port object attribute notification.
   This also enabled IGMP and MLD snooping.
 
 - Patch 6 implements per-port multicast replication tables. It only supports
   filtering of loopback packets.
 
 - Patch 7 extends per-port multicast tables with VLAN pop support for 'untagged'
   VLANs.
 
 - Patch 8 handles SWITCHDEV_OBJ_ID_PORT_MDB port object notifications. It
   creates MDB replication rules in egress table that can replicate packets to
   multiple per-port multicast tables.
 
 - Patch 9 adds tracepoints for MDB events.
 
 ==============
 
 2) Parav Create a new allocation profile for SFs, to save on memory
 
 3) Yevgeny provides some initial patches for upcoming software steering
    support new pattern/arguments type of modify_header actions.
 
 Starting with ConnectX-6 DX, we use a new design of modify_header FW object.
 The current modify_header object allows for having only limited number of
 these FW objects, which means that we are limited in the number of offloaded
 flows that require modify_header action.
 
 As a preparation Yevgeny provides the following 4 patches:
  - Patch 1: Add required mlx5_ifc HW bits
  - Patch 2, 3: Add new WQE type and opcode that is required for pattern/arg
    support and adds appropriate support in dr_send.c
  - Patch 4: Add ICM pool for modify-header-pattern objects and implement
    patterns cache, allowing patterns reuse for different flows
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmQ2LDIACgkQSD+KveBX
 +j59UAf+NfLEq7i/gYOri1rLwXOrgPd13w64Oob79+pKVXzRqik/gBg4d0CKdc2X
 Qafntudus7IvVrGJCM0vurw7nC75H9BMFh3dQW2u3uaR9m6i0k4tSKIAoJUFHMdj
 dMgAJywYkCCXlBbGVhRE3yybal2EXfOSJ+Fr2tGeqpL4vRlg4kIaVuFGcGk2gMpq
 gOahX6wtuop3ABzY3sqKCIQ1Q2jWx9AWWz+lEmw7HJGmHGugscYrgSIloHDnu1fz
 Bt6uODn7GFr866rQr9+JemG0VyK8ahyisEFnX1oJ3642ZidPcDrbIrrtMWeEqo7X
 1tcmY/wNti87xFbg3gL+DZekhHTHoQ==
 =oIN8
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2023-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-04-11

1) Vlad adds the support for linux bridge multicast offload support
   Patches #1 through #9
   Synopsis

Vlad Says:
==============
Implement support of bridge multicast offload in mlx5. Handle port object
attribute SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED notification to toggle multicast
offload and bridge snooping support on bridge. Handle port object
SWITCHDEV_OBJ_ID_PORT_MDB notification to attach a bridge port to MDB.

Steering architecture

Existing offload infrastructure relies on two levels of flow tables - bridge
ingress and egress. For multicast offload the architecture is extended with
additional layer of per-port multicast replication tables. Such tables filter
loopback traffic (so packets are not replicated to their source port) and pop
VLAN headers for "untagged" VLANs. The tables are referenced by the MDB rules in
egress table. MDB egress rule can point to multiple per-port multicast tables,
which causes matching multicast traffic to be replicated to all of them, and,
consecutively, to several bridge ports:

                                                                                                                            +--------+--+
                                                                                    +---------------------------------------> Port 1 |  |
                                                                                    |                                       +-^------+--+
                                                                                    |                                         |
                                                                                    |                                         |
                                       +-----------------------------------------+  |     +---------------------------+       |
                                       | EGRESS table                            |  |  +--> PORT 1 multicast table    |       |
+----------------------------------+   +-----------------------------------------+  |  |  +---------------------------+       |
| INGRESS table                    |   |                                         |  |  |  |                           |       |
+----------------------------------+   | dst_mac=P1,vlan=X -> pop vlan, goto P1  +--+  |  | FG0:                      |       |
|                                  |   | dst_mac=P1,vlan=Y -> pop vlan, goto P1  |     |  | src_port=dst_port -> drop |       |
| src_mac=M1,vlan=X -> goto egress +---> dst_mac=P2,vlan=X -> pop vlan, goto P2  +--+  |  | FG1:                      |       |
| ...                              |   | dst_mac=P2,vlan=Y -> goto P2            |  |  |  | VLAN X -> pop, goto port  |       |
|                                  |   | dst_mac=MDB1,vlan=Y -> goto mcast P1,P2 +-----+  | ...                       |       |
+----------------------------------+   |                                         |  |  |  | VLAN Y -> pop, goto port  +-------+
                                       +-----------------------------------------+  |  |  | FG3:                      |
                                                                                    |  |  | matchall -> goto port     |
                                                                                    |  |  |                           |
                                                                                    |  |  +---------------------------+
                                                                                    |  |
                                                                                    |  |
                                                                                    |  |                                    +--------+--+
                                                                                    +---------------------------------------> Port 2 |  |
                                                                                       |                                    +-^------+--+
                                                                                       |                                      |
                                                                                       |                                      |
                                                                                       |  +---------------------------+       |
                                                                                       +--> PORT 2 multicast table    |       |
                                                                                          +---------------------------+       |
                                                                                          |                           |       |
                                                                                          | FG0:                      |       |
                                                                                          | src_port=dst_port -> drop |       |
                                                                                          | FG1:                      |       |
                                                                                          | VLAN X -> pop, goto port  |       |
                                                                                          | ...                       |       |
                                                                                          |                           |       |
                                                                                          | FG3:                      |       |
                                                                                          | matchall -> goto port     +-------+
                                                                                          |                           |
                                                                                          +---------------------------+

Patches overview:

- Patch 1 adds hardware definition bits for capabilities required to replicate
  multicast packets to multiple per-port tables. These bits are used by
  following patches to only attempt multicast offload if firmware and hardware
  provide necessary support.

- Pathces 2-4 patches are preparations and refactoring.

- Patch 5 implements necessary infrastructure to toggle multicast offload
  via SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED port object attribute notification.
  This also enabled IGMP and MLD snooping.

- Patch 6 implements per-port multicast replication tables. It only supports
  filtering of loopback packets.

- Patch 7 extends per-port multicast tables with VLAN pop support for 'untagged'
  VLANs.

- Patch 8 handles SWITCHDEV_OBJ_ID_PORT_MDB port object notifications. It
  creates MDB replication rules in egress table that can replicate packets to
  multiple per-port multicast tables.

- Patch 9 adds tracepoints for MDB events.

==============

2) Parav Create a new allocation profile for SFs, to save on memory

3) Yevgeny provides some initial patches for upcoming software steering
   support new pattern/arguments type of modify_header actions.

Starting with ConnectX-6 DX, we use a new design of modify_header FW object.
The current modify_header object allows for having only limited number of
these FW objects, which means that we are limited in the number of offloaded
flows that require modify_header action.

As a preparation Yevgeny provides the following 4 patches:
 - Patch 1: Add required mlx5_ifc HW bits
 - Patch 2, 3: Add new WQE type and opcode that is required for pattern/arg
   support and adds appropriate support in dr_send.c
 - Patch 4: Add ICM pool for modify-header-pattern objects and implement
   patterns cache, allowing patterns reuse for different flows

* tag 'mlx5-updates-2023-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: DR, Add modify-header-pattern ICM pool
  net/mlx5: DR, Prepare sending new WQE type
  net/mlx5: Add new WQE for updating flow table
  net/mlx5: Add mlx5_ifc bits for modify header argument
  net/mlx5: DR, Set counter ID on the last STE for STEv1 TX
  net/mlx5: Create a new profile for SFs
  net/mlx5: Bridge, add tracepoints for multicast
  net/mlx5: Bridge, implement mdb offload
  net/mlx5: Bridge, support multicast VLAN pop
  net/mlx5: Bridge, add per-port multicast replication tables
  net/mlx5: Bridge, snoop igmp/mld packets
  net/mlx5: Bridge, extract code to lookup parent bridge of port
  net/mlx5: Bridge, move additional data structures to priv header
  net/mlx5: Bridge, increase bridge tables sizes
  net/mlx5: Add mlx5_ifc definitions for bridge multicast support
====================

Link: https://lore.kernel.org/r/20230412040752.14220-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 22:28:03 -07:00
Jakub Kicinski
f7d29571ab Merge branch 'add-kernel-tc-mqprio-and-tc-taprio-support-for-preemptible-traffic-classes'
Vladimir Oltean says:

====================
Add kernel tc-mqprio and tc-taprio support for preemptible traffic classes

The last RFC in August 2022 contained a proposal for the UAPI of both
TSN standards which together form Frame Preemption (802.1Q and 802.3):
https://lore.kernel.org/netdev/20220816222920.1952936-1-vladimir.oltean@nxp.com/

It wasn't clear at the time whether the 802.1Q portion of Frame Preemption
should be exposed via the tc qdisc (mqprio, taprio) or via some other
layer (perhaps also ethtool like the 802.3 portion, or dcbnl), even
though the options were discussed extensively, with pros and cons:
https://lore.kernel.org/netdev/20220816222920.1952936-3-vladimir.oltean@nxp.com/

So the 802.3 portion got submitted separately and finally was accepted:
https://lore.kernel.org/netdev/20230119122705.73054-1-vladimir.oltean@nxp.com/

leaving the only remaining question: how do we expose the 802.1Q bits?

This series proposes that we use the Qdisc layer, through separate
(albeit very similar) UAPI in mqprio and taprio, and that both these
Qdiscs pass the information down to the offloading device driver through
the common mqprio offload structure (which taprio also passes).

An implementation is provided for the NXP LS1028A on-board Ethernet
endpoint (enetc). Previous versions also contained support for its
embedded switch (felix), but this needs more work and will be submitted
separately.

v4: https://lore.kernel.org/netdev/20230403103440.2895683-1-vladimir.oltean@nxp.com/
v2: https://lore.kernel.org/netdev/20230219135309.594188-1-vladimir.oltean@nxp.com/
v1: https://lore.kernel.org/netdev/20230216232126.3402975-1-vladimir.oltean@nxp.com/
====================

Link: https://lore.kernel.org/r/20230411180157.1850527-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 22:22:12 -07:00
Vladimir Oltean
01e23b2b3b net: enetc: add support for preemptible traffic classes
PFs which support the MAC Merge layer also have a set of 8 registers
called "Port traffic class N frame preemption register (PTC0FPR - PTC7FPR)".
Through these, a traffic class (group of TX rings of same dequeue
priority) can be mapped to the eMAC or to the pMAC.

There's nothing particularly spectacular here. We should probably only
commit the preemptible TCs to hardware once the MAC Merge layer became
active, but unlike Felix, we don't have an IRQ that notifies us of that.
We'd have to sleep for up to verifyTime (127 ms) to wait for a
resolution coming from the verification state machine; not only from the
ndo_setup_tc() code path, but also from enetc_mm_link_state_update().
Since it's relatively complicated and has a relatively small benefit,
I'm not doing it.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 22:22:10 -07:00
Vladimir Oltean
50764da37c net: enetc: rename "mqprio" to "qopt"
To gain access to the larger encapsulating structure which has the type
tc_mqprio_qopt_offload, rename just the "qopt" field as "qopt".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 22:22:10 -07:00
Vladimir Oltean
a721c3e54b net/sched: taprio: allow per-TC user input of FP adminStatus
This is a duplication of the FP adminStatus logic introduced for
tc-mqprio. Offloading is done through the tc_mqprio_qopt_offload
structure embedded within tc_taprio_qopt_offload. So practically, if a
device driver is written to treat the mqprio portion of taprio just like
standalone mqprio, it gets unified handling of frame preemption.

I would have reused more code with taprio, but this is mostly netlink
attribute parsing, which is hard to transform into generic code without
having something that stinks as a result. We have the same variables
with the same semantics, just different nlattr type values
(TCA_MQPRIO_TC_ENTRY=5 vs TCA_TAPRIO_ATTR_TC_ENTRY=12;
TCA_MQPRIO_TC_ENTRY_FP=2 vs TCA_TAPRIO_TC_ENTRY_FP=3, etc) and
consequently, different policies for the nest.

Every time nla_parse_nested() is called, an on-stack table "tb" of
nlattr pointers is allocated statically, up to the maximum understood
nlattr type. That array size is hardcoded as a constant, but when
transforming this into a common parsing function, it would become either
a VLA (which the Linux kernel rightfully doesn't like) or a call to the
allocator.

Having FP adminStatus in tc-taprio can be seen as addressing the 802.1Q
Annex S.3 "Scheduling and preemption used in combination, no HOLD/RELEASE"
and S.4 "Scheduling and preemption used in combination with HOLD/RELEASE"
use cases. HOLD and RELEASE events are emitted towards the underlying
MAC Merge layer when the schedule hits a Set-And-Hold-MAC or a
Set-And-Release-MAC gate operation. So within the tc-taprio UAPI space,
one can distinguish between the 2 use cases by choosing whether to use
the TC_TAPRIO_CMD_SET_AND_HOLD and TC_TAPRIO_CMD_SET_AND_RELEASE gate
operations within the schedule, or just TC_TAPRIO_CMD_SET_GATES.

A small part of the change is dedicated to refactoring the max_sdu
nlattr parsing to put all logic under the "if" that tests for presence
of that nlattr.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 22:22:10 -07:00
Vladimir Oltean
f62af20bed net/sched: mqprio: allow per-TC user input of FP adminStatus
IEEE 802.1Q-2018 clause 6.7.2 Frame preemption specifies that each
packet priority can be assigned to a "frame preemption status" value of
either "express" or "preemptible". Express priorities are transmitted by
the local device through the eMAC, and preemptible priorities through
the pMAC (the concepts of eMAC and pMAC come from the 802.3 MAC Merge
layer).

The FP adminStatus is defined per packet priority, but 802.1Q clause
12.30.1.1.1 framePreemptionAdminStatus also says that:

| Priorities that all map to the same traffic class should be
| constrained to use the same value of preemption status.

It is impossible to ignore the cognitive dissonance in the standard
here, because it practically means that the FP adminStatus only takes
distinct values per traffic class, even though it is defined per
priority.

I can see no valid use case which is prevented by having the kernel take
the FP adminStatus as input per traffic class (what we do here).
In addition, this also enforces the above constraint by construction.
User space network managers which wish to expose FP adminStatus per
priority are free to do so; they must only observe the prio_tc_map of
the netdev (which presumably is also under their control, when
constructing the mqprio netlink attributes).

The reason for configuring frame preemption as a property of the Qdisc
layer is that the information about "preemptible TCs" is closest to the
place which handles the num_tc and prio_tc_map of the netdev. If the
UAPI would have been any other layer, it would be unclear what to do
with the FP information when num_tc collapses to 0. A key assumption is
that only mqprio/taprio change the num_tc and prio_tc_map of the netdev.
Not sure if that's a great assumption to make.

Having FP in tc-mqprio can be seen as an implementation of the use case
defined in 802.1Q Annex S.2 "Preemption used in isolation". There will
be a separate implementation of FP in tc-taprio, for the other use
cases.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 22:22:10 -07:00
Vladimir Oltean
c54876cd59 net/sched: pass netlink extack to mqprio and taprio offload
With the multiplexed ndo_setup_tc() model which lacks a first-class
struct netlink_ext_ack * argument, the only way to pass the netlink
extended ACK message down to the device driver is to embed it within the
offload structure.

Do this for struct tc_mqprio_qopt_offload and struct tc_taprio_qopt_offload.

Since struct tc_taprio_qopt_offload also contains a tc_mqprio_qopt_offload
structure, and since device drivers might effectively reuse their mqprio
implementation for the mqprio portion of taprio, we make taprio set the
extack in both offload structures to point at the same netlink extack
message.

In fact, the taprio handling is a bit more tricky, for 2 reasons.

First is because the offload structure has a longer lifetime than the
extack structure. The driver is supposed to populate the extack
synchronously from ndo_setup_tc() and leave it alone afterwards.
To not have any use-after-free surprises, we zero out the extack pointer
when we leave taprio_enable_offload().

The second reason is because taprio does overwrite the extack message on
ndo_setup_tc() error. We need to switch to the weak form of setting an
extack message, which preserves a potential message set by the driver.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 22:22:10 -07:00
Vladimir Oltean
ab277d2084 net/sched: mqprio: add an extack message to mqprio_parse_opt()
Ferenc reports that a combination of poor iproute2 defaults and obscure
cases where the kernel returns -EINVAL make it difficult to understand
what is wrong with this command:

$ ip link add veth0 numtxqueues 8 numrxqueues 8 type veth peer name veth1
$ tc qdisc add dev veth0 root mqprio num_tc 8 map 0 1 2 3 4 5 6 7 \
        queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7
RTNETLINK answers: Invalid argument

Hopefully with this patch, the cause is clearer:

Error: Device does not support hardware offload.

The kernel was (and still is) rejecting this because iproute2 defaults
to "hw 1" if this command line option is not specified.

Link: https://lore.kernel.org/netdev/ede5e9a2f27bf83bfb86d3e8c4ca7b34093b99e2.camel@inf.elte.hu/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 22:22:10 -07:00
Vladimir Oltean
57f21bf854 net/sched: mqprio: add extack to mqprio_parse_nlattr()
Netlink attribute parsing in mqprio is a minesweeper game, with many
options having the possibility of being passed incorrectly and the user
being none the wiser.

Try to make errors less sour by giving user space some information
regarding what went wrong.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 22:22:10 -07:00
Vladimir Oltean
3dd0c16ec9 net/sched: mqprio: simplify handling of nlattr portion of TCA_OPTIONS
In commit 4e8b86c062 ("mqprio: Introduce new hardware offload mode and
shaper in mqprio"), the TCA_OPTIONS format of mqprio was extended to
contain a fixed portion (of size NLA_ALIGN(sizeof struct tc_mqprio_qopt))
and a variable portion of other nlattrs (in the TCA_MQPRIO_* type space)
following immediately afterwards.

In commit feb2cf3dcf ("net/sched: mqprio: refactor nlattr parsing to a
separate function"), we've moved the nlattr handling to a smaller
function, but yet, a small parse_attr() still remains, and the larger
mqprio_parse_nlattr() still does not have access to the beginning, and
the length, of the TCA_OPTIONS region containing these other nlattrs.

In a future change, the mqprio qdisc will need to iterate through this
nlattr region to discover other attributes, so eliminate parse_attr()
and add 2 variables in mqprio_parse_nlattr() which hold the beginning
and the length of the nlattr range.

We avoid the need to memset when nlattr_opt_len has insufficient length
by pre-initializing the table "tb".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 22:22:10 -07:00
Vladimir Oltean
d54151aa0f net: ethtool: create and export ethtool_dev_mm_supported()
Create a wrapper over __ethtool_dev_mm_supported() which also calls
ethnl_ops_begin() and ethnl_ops_complete(). It can be used by other code
layers, such as tc, to make sure that preemptible TCs are supported
(this is true if an underlying MAC Merge layer exists).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 22:22:10 -07:00
Rahul Rameshbabu
85a4abed15 tools: ynl: Rename ethtool to ethtool.py
Make it explicit that this tool is not a drop-in replacement for ethtool.
This tool is intended for testing ethtool functionality implemented in the
kernel and should use a name that differentiates it from the ethtool
utility.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Link: https://lore.kernel.org/r/20230413012252.184434-2-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 22:18:29 -07:00
Rahul Rameshbabu
3ea31e6664 tools: ynl: Remove absolute paths to yaml files from ethtool testing tool
Absolute paths for the spec and schema files make the ethtool testing tool
unusable with freshly checked-out source trees. Replace absolute paths with
relative paths for both files in the Documentation/ directory.

Issue seen before the change

  Traceback (most recent call last):
    File "/home/binary-eater/Documents/mlx/linux/tools/net/ynl/./ethtool", line 424, in <module>
      main()
    File "/home/binary-eater/Documents/mlx/linux/tools/net/ynl/./ethtool", line 158, in main
      ynl = YnlFamily(spec, schema)
    File "/home/binary-eater/Documents/mlx/linux/tools/net/ynl/lib/ynl.py", line 342, in __init__
      super().__init__(def_path, schema)
    File "/home/binary-eater/Documents/mlx/linux/tools/net/ynl/lib/nlspec.py", line 333, in __init__
      with open(spec_path, "r") as stream:
  FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/google/home/sdf/src/linux/Documentation/netlink/specs/ethtool.yaml'

Fixes: f3d07b02b2 ("tools: ynl: ethtool testing tool")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Link: https://lore.kernel.org/r/20230413012252.184434-1-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 22:18:29 -07:00
Jakub Kicinski
916b15fbf2 Merge branch 'macb-ptp-minor-updates'
Harini Katakam says:

====================
Macb PTP minor updates

- Enable PTP unicast
- Optimize HW timestamp reading
====================

Link: https://lore.kernel.org/r/20230411123712.11459-1-harini.katakam@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 22:16:12 -07:00
Harini Katakam
8c0d0fe044 net: macb: Optimize reading HW timestamp
The seconds input from BD (6 bits) just needs to be ORed with the
upper bits from timer in this function. Avoid addition operation
every single time. Seconds rollover handling is left untouched.

Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 22:16:09 -07:00
Harini Katakam
ee4e92c26c net: macb: Enable PTP unicast
Enable transmission and reception of PTP unicast packets by
updating PTP unicast config bit and setting current HW mac
address as allowed address in PTP unicast filter registers.

Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 22:16:09 -07:00
Harini Katakam
adee474a3b net: macb: Update gem PTP support check
There are currently two checks for PTP functionality - one on GEM
capability and another on the kernel config option. Combine them
into a single function as there's no use case where gem_has_ptp is
TRUE and MACB_USE_HWSTAMP is false.

Signed-off-by: Harini Katakam <harini.katakam@amd.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 22:16:09 -07:00
Jakub Kicinski
fb4be9a4e7 Merge branch 'ocelot-felix-driver-cleanup'
Vladimir Oltean says:

====================
Ocelot/Felix driver cleanup

The cleanup mostly handles the statistics code path - some issues
regarding understandability became apparent after the series
"Fix trainwreck with Ocelot switch statistics counters":
https://lore.kernel.org/netdev/20230321010325.897817-1-vladimir.oltean@nxp.com/

There is also one patch which cleans up a misleading comment
in the DSA felix_setup().
====================

Link: https://lore.kernel.org/r/20230412124737.2243527-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 21:56:08 -07:00
Vladimir Oltean
a291399e63 net: mscc: ocelot: fix ineffective WARN_ON() in ocelot_stats.c
Since it is hopefully now clear that, since "last" and "layout[i].reg"
are enum types and not addresses, the existing WARN_ON() is ineffective
in checking that the _addresses_ are sorted in the proper order.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 21:56:06 -07:00
Vladimir Oltean
6663c01eca net: mscc: ocelot: strengthen type of "int i" in ocelot_stats.c
The "int i" used to index the struct ocelot_stat_layout array actually
has a specific type: enum ocelot_stat. Use it, so that the WARN()
comment from ocelot_prepare_stats_regions() makes more sense.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 21:56:06 -07:00
Vladimir Oltean
eae0b9d15b net: mscc: ocelot: strengthen type of "u32 reg" and "u32 base" in ocelot_stats.c
Use the specific enum ocelot_reg to make it clear that the region
registers are encoded and not plain addresses.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 21:56:06 -07:00
Vladimir Oltean
a9afc3e41c net: dsa: felix: remove confusing/incorrect comment from felix_setup()
That comment was written prior to knowing that what I was actually
seeing was a manifestation of the bug fixed in commit b4024c9e5c
("felix: Fix initialization of ioremap resources").

There isn't any particular reason now why the hardware initialization is
done in felix_setup(), so just delete that comment to avoid spreading
misinformation.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 21:56:06 -07:00
Vladimir Oltean
93f0f93bbd net: mscc: ocelot: remove blank line at the end of ocelot_stats.c
Commit a3bb8f521f ("net: mscc: ocelot: remove unnecessary exposure of
stats structures") made an unnecessary change which was to add a new
line at the end of ocelot_stats.c. Remove it.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 21:56:06 -07:00
Vladimir Oltean
07de32655b net: mscc: ocelot: debugging print for statistics regions
To make it easier to debug future issues with statistics counters not
getting aggregated properly into regions, like what happened in commit
6acc72a43e ("net: mscc: ocelot: fix stats region batching"), add some
dev_dbg() prints which show the regions that were dynamically
determined.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 21:56:06 -07:00
Vladimir Oltean
40cd07cb42 net: mscc: ocelot: refactor enum ocelot_reg decoding to helper
ocelot_io.c duplicates the decoding of an enum ocelot_reg (which holds
an enum ocelot_target in the upper bits and an index into a regmap array
in the lower bits) 4 times.

We'd like to reuse that logic once more, from ocelot.c. In order to do
that, let's consolidate the existing 4 instances into a header
accessible both by ocelot.c as well as by ocelot_io.c.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 21:56:06 -07:00
Vladimir Oltean
9ecd05794b net: mscc: ocelot: strengthen type of "u32 reg" in I/O accessors
The "u32 reg" argument that is passed to these functions is not a plain
address, but rather a driver-specific encoding of another enum
ocelot_target target in the upper bits, and an index into the
u32 ocelot->map[target][] array in the lower bits. That encoded value
takes the type "enum ocelot_reg" and is what is passed to these I/O
functions, so let's actually use that to prevent type confusion.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 21:56:06 -07:00
Jakub Kicinski
c2865b1122 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZDhSiwAKCRDbK58LschI
 g8cbAQCH4xrquOeDmYyGXFQGchHZAIj++tKg8ABU4+hYeJtrlwEA6D4W6wjoSZRk
 mLSptZ9qro8yZA86BvyPvlBT1h9ELQA=
 =StAc
 -----END PGP SIGNATURE-----

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-04-13

We've added 260 non-merge commits during the last 36 day(s) which contain
a total of 356 files changed, 21786 insertions(+), 11275 deletions(-).

The main changes are:

1) Rework BPF verifier log behavior and implement it as a rotating log
   by default with the option to retain old-style fixed log behavior,
   from Andrii Nakryiko.

2) Adds support for using {FOU,GUE} encap with an ipip device operating
   in collect_md mode and add a set of BPF kfuncs for controlling encap
   params, from Christian Ehrig.

3) Allow BPF programs to detect at load time whether a particular kfunc
   exists or not, and also add support for this in light skeleton,
   from Alexei Starovoitov.

4) Optimize hashmap lookups when key size is multiple of 4,
   from Anton Protopopov.

5) Enable RCU semantics for task BPF kptrs and allow referenced kptr
   tasks to be stored in BPF maps, from David Vernet.

6) Add support for stashing local BPF kptr into a map value via
   bpf_kptr_xchg(). This is useful e.g. for rbtree node creation
   for new cgroups, from Dave Marchevsky.

7) Fix BTF handling of is_int_ptr to skip modifiers to work around
   tracing issues where a program cannot be attached, from Feng Zhou.

8) Migrate a big portion of test_verifier unit tests over to
   test_progs -a verifier_* via inline asm to ease {read,debug}ability,
   from Eduard Zingerman.

9) Several updates to the instruction-set.rst documentation
   which is subject to future IETF standardization
   (https://lwn.net/Articles/926882/), from Dave Thaler.

10) Fix BPF verifier in the __reg_bound_offset's 64->32 tnum sub-register
    known bits information propagation, from Daniel Borkmann.

11) Add skb bitfield compaction work related to BPF with the overall goal
    to make more of the sk_buff bits optional, from Jakub Kicinski.

12) BPF selftest cleanups for build id extraction which stand on its own
    from the upcoming integration work of build id into struct file object,
    from Jiri Olsa.

13) Add fixes and optimizations for xsk descriptor validation and several
    selftest improvements for xsk sockets, from Kal Conley.

14) Add BPF links for struct_ops and enable switching implementations
    of BPF TCP cong-ctls under a given name by replacing backing
    struct_ops map, from Kui-Feng Lee.

15) Remove a misleading BPF verifier env->bypass_spec_v1 check on variable
    offset stack read as earlier Spectre checks cover this,
    from Luis Gerhorst.

16) Fix issues in copy_from_user_nofault() for BPF and other tracers
    to resemble copy_from_user_nmi() from safety PoV, from Florian Lehner
    and Alexei Starovoitov.

17) Add --json-summary option to test_progs in order for CI tooling to
    ease parsing of test results, from Manu Bretelle.

18) Batch of improvements and refactoring to prep for upcoming
    bpf_local_storage conversion to bpf_mem_cache_{alloc,free} allocator,
    from Martin KaFai Lau.

19) Improve bpftool's visual program dump which produces the control
    flow graph in a DOT format by adding C source inline annotations,
    from Quentin Monnet.

20) Fix attaching fentry/fexit/fmod_ret/lsm to modules by extracting
    the module name from BTF of the target and searching kallsyms of
    the correct module, from Viktor Malik.

21) Improve BPF verifier handling of '<const> <cond> <non_const>'
    to better detect whether in particular jmp32 branches are taken,
    from Yonghong Song.

22) Allow BPF TCP cong-ctls to write app_limited of struct tcp_sock.
    A built-in cc or one from a kernel module is already able to write
    to app_limited, from Yixin Shen.

Conflicts:

Documentation/bpf/bpf_devel_QA.rst
  b7abcd9c65 ("bpf, doc: Link to submitting-patches.rst for general patch submission info")
  0f10f647f4 ("bpf, docs: Use internal linking for link to netdev subsystem doc")
https://lore.kernel.org/all/20230307095812.236eb1be@canb.auug.org.au/

include/net/ip_tunnels.h
  bc9d003dc4 ("ip_tunnel: Preserve pointer const in ip_tunnel_info_opts")
  ac931d4cde ("ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices")
https://lore.kernel.org/all/20230413161235.4093777-1-broonie@kernel.org/

net/bpf/test_run.c
  e5995bc7e2 ("bpf, test_run: fix crashes due to XDP frame overwriting/corruption")
  294635a816 ("bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES")
https://lore.kernel.org/all/20230320102619.05b80a98@canb.auug.org.au/
====================

Link: https://lore.kernel.org/r/20230413191525.7295-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 16:43:38 -07:00
Jakub Kicinski
800e68c44f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

tools/testing/selftests/net/config
  62199e3f16 ("selftests: net: Add VXLAN MDB test")
  3a0385be13 ("selftests: add the missing CONFIG_IP_SCTP in net config")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 16:04:28 -07:00
Linus Torvalds
829cca4d17 Including fixes from bpf, and bluetooth.
Not all that quiet given spring celebrations, but "current" fixes
 are thinning out, which is encouraging. One outstanding regression
 in the mlx5 driver when using old FW, not blocking but we're pushing
 for a fix.
 
 Current release - new code bugs:
 
  - eth: enetc: workaround for unresponsive pMAC after receiving
    express traffic
 
 Previous releases - regressions:
 
  - rtnetlink: restore RTM_NEW/DELLINK notification behavior,
    keep the pid/seq fields 0 for backward compatibility
 
 Previous releases - always broken:
 
  - sctp: fix a potential overflow in sctp_ifwdtsn_skip
 
  - mptcp:
    - use mptcp_schedule_work instead of open-coding it and make
      the worker check stricter, to avoid scheduling work on closed
      sockets
    - fix NULL pointer dereference on fastopen early fallback
 
  - skbuff: fix memory corruption due to a race between skb coalescing
    and releasing clones confusing page_pool reference counting
 
  - bonding: fix neighbor solicitation validation on backup slaves
 
  - bpf: tcp: use sock_gen_put instead of sock_put in bpf_iter_tcp
 
  - bpf: arm64: fixed a BTI error on returning to patched function
 
  - openvswitch: fix race on port output leading to inf loop
 
  - sfp: initialize sfp->i2c_block_size at sfp allocation to avoid
    returning a different errno than expected
 
  - phy: nxp-c45-tja11xx: unregister PTP, purge queues on remove
 
  - Bluetooth: fix printing errors if LE Connection times out
 
  - Bluetooth: assorted UaF, deadlock and data race fixes
 
  - eth: macb: fix memory corruption in extended buffer descriptor mode
 
 Misc:
 
  - adjust the XDP Rx flow hash API to also include the protocol layers
    over which the hash was computed
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmQ4ZrsACgkQMUZtbf5S
 IruWUQ/9F+HlnHf3Sv08zGlnV++vLaJ/Ld8C2YNYNxRwuoJvcCyikQ/ZfUKdKAoS
 kCf0XqFD2SMl8wHpCQBhK4uXvKBdBMx6L6wEp7dbDciGl/+5yihe9opBBXKekWbB
 ULRZcZE7RACri/QsXQhD7Y8p530xnYWQXO8ZMjR3vOAWxplJtBDNDnXi7hqtxQpW
 Vzwl1ntvD1msmxhvy0UZrgeesL8R3UckFvqYEqnINeyd8E8HB1dAOg899/ZPUbjA
 UoEw5VsSBSr9DE7+Fs6uD8trBxQ1CrdRVJjhRhwivHk8/Ro7dIzjcVV30ei3wucz
 0RiNvCqypkeLeRrcVlSk8lR5r9FBGvhDMFbcGM8lHnxSc0WB+Sj+2iup4fpTE8/p
 VUIvhhzuBuXU4Sc022pm6BL5DBSKdnJRquFq6XCTwnQM6v7fvzu1yWNXsQom8Nbi
 1/ZcFdj27FHwMpU0JPZ4PFxT7Ta830UyulVZuyWA+zEzlN7DvW3O7bGQC72GEuID
 Xc58D4kVtywzbntFmUjuhXCD/6vvD5WW5orLpMWg5SH9F14prt3C9OFSpTwTTq+W
 uPBEslhnhhCPecTNo2iFPLX3bN67n8KDVUWua1AHaqmcK7QFGs0njJGGLpFdHMll
 SuNgrNrtQE9EHH8XL6VbSD2zf35ZfoKVg6qvL3oeLzXkGkZrnls=
 =W+J2
 -----END PGP SIGNATURE-----

Merge tag 'net-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf, and bluetooth.

  Not all that quiet given spring celebrations, but "current" fixes are
  thinning out, which is encouraging. One outstanding regression in the
  mlx5 driver when using old FW, not blocking but we're pushing for a
  fix.

  Current release - new code bugs:

   - eth: enetc: workaround for unresponsive pMAC after receiving
     express traffic

  Previous releases - regressions:

   - rtnetlink: restore RTM_NEW/DELLINK notification behavior, keep the
     pid/seq fields 0 for backward compatibility

  Previous releases - always broken:

   - sctp: fix a potential overflow in sctp_ifwdtsn_skip

   - mptcp:
      - use mptcp_schedule_work instead of open-coding it and make the
        worker check stricter, to avoid scheduling work on closed
        sockets
      - fix NULL pointer dereference on fastopen early fallback

   - skbuff: fix memory corruption due to a race between skb coalescing
     and releasing clones confusing page_pool reference counting

   - bonding: fix neighbor solicitation validation on backup slaves

   - bpf: tcp: use sock_gen_put instead of sock_put in bpf_iter_tcp

   - bpf: arm64: fixed a BTI error on returning to patched function

   - openvswitch: fix race on port output leading to inf loop

   - sfp: initialize sfp->i2c_block_size at sfp allocation to avoid
     returning a different errno than expected

   - phy: nxp-c45-tja11xx: unregister PTP, purge queues on remove

   - Bluetooth: fix printing errors if LE Connection times out

   - Bluetooth: assorted UaF, deadlock and data race fixes

   - eth: macb: fix memory corruption in extended buffer descriptor mode

  Misc:

   - adjust the XDP Rx flow hash API to also include the protocol layers
     over which the hash was computed"

* tag 'net-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
  selftests/bpf: Adjust bpf_xdp_metadata_rx_hash for new arg
  mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type
  veth: bpf_xdp_metadata_rx_hash add xdp rss hash type
  mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type
  xdp: rss hash types representation
  selftests/bpf: xdp_hw_metadata remove bpf_printk and add counters
  skbuff: Fix a race between coalescing and releasing SKBs
  net: macb: fix a memory corruption in extended buffer descriptor mode
  selftests: add the missing CONFIG_IP_SCTP in net config
  udp6: fix potential access to stale information
  selftests: openvswitch: adjust datapath NL message declaration
  selftests: mptcp: userspace pm: uniform verify events
  mptcp: fix NULL pointer dereference on fastopen early fallback
  mptcp: stricter state check in mptcp_worker
  mptcp: use mptcp_schedule_work instead of open-coding it
  net: enetc: workaround for unresponsive pMAC after receiving express traffic
  sctp: fix a potential overflow in sctp_ifwdtsn_skip
  net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume()
  rtnetlink: Restore RTM_NEW/DELLINK notification behavior
  net: ti/cpsw: Add explicit platform_device.h and of_platform.h includes
  ...
2023-04-13 15:33:04 -07:00
Linus Torvalds
4413ad01e2 Devicetree fixes for v6.2, part 3:
- Fix interaction between fw_devlink and DT overlays causing
   devices to not be probed
 
 - Fix the compatible string for loongson,cpu-interrupt-controller
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmQ1t60ACgkQ+vtdtY28
 YcNfvBAApQjSAtrvYlHH5Lp6ANCzNZiXJO2FX5kpPtcdIHMLB0Soo4z2IE6B9V1r
 e61dw5j21CuRDlsoOG1odg9//02/KK2Dgz7ebisKWVF+1UuWbps1stNuXO3MbLUj
 Cq4GH4EUs5JwED145jOhLWWq2/bkymJvgWVU8n3Q/q/uL+Fjm4aJxgx92p6IZdN3
 CixxhXBAkWLOs9ij8f6bsSUts28XoPZsk+kucdXc+83UninAXJC2KzuvQga8mBPF
 MGuxQTXmD5vgdPyvqj1D9U3uqkDE6HudrUDXr1yq9esPObjvUTkh09/Wm7OqiDu9
 GBUyhD+3GaBK18rKiL0JSDGbGamNR9BaFshovuPEmlAtaoHv8nbv/MmfTnCWwjjs
 5lQ7LmOJCuRuQmcTOR8q1csVhUXxssNGaOaclOOXou/crItmSDlLAaj6XLRTPt45
 4jPiNKgDH4Pj2vqGBeYnhNyPyc+Y5IVc88pV8kUxfsqnMluzsoLC+0JADXNMhk1T
 3sfecpceQav4TFhPOcMIHAkgldBnPQomW6laEn4Ul+dAyAes7q6Y0SvjQy03gDKU
 LY5QIsHLs5YZXur8TYSbU7Yt44hr4SAm0uz1kcCArlmNtidcjYuw1tLAWnGTxZrx
 5q+ZuQ6NTiPKwxTK0Zhf+HqZdzx2IE6JXaPBOeOdYxjTSWGmcKk=
 =x2A8
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-fixes-for-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

 - Fix interaction between fw_devlink and DT overlays causing devices to
   not be probed

 - Fix the compatible string for loongson,cpu-interrupt-controller

* tag 'devicetree-fixes-for-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  treewide: Fix probing of devices in DT overlays
  dt-bindings: interrupt-controller: loongarch: Fix mismatched compatible
2023-04-13 15:21:56 -07:00
Linus Torvalds
531f27ad5e This is just a revert of the AMD fix, because the fix fixed
broken some laptops. We are working on a proper solution.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAmQ4YHcACgkQQRCzN7AZ
 XXO3QRAAlEE9OIeq6yhrnmQyhXTysZNCjMigkppIWAjY96twBifKo2PyaALb5v9O
 VPlbLDUB250RInrakOjjY3AFGiXP/UEYBsMk7L5UGS3bloyrBLyFEKoXccAO0qO6
 pPlVlLN+iszsO+vx2fSiE65o6ZMZTU9FWcroTbvDfO228e9xeq1mjH4d4H88sr90
 /PUhK/CT13zdTYX/eB8rF9IwlqAGRbuoNPr70M8cpJckVBs/BDt/EQLdBNzScY2l
 5zcaYcpxQlLI+uiBjf3kHKXVcOZwSKYMFK04306blgWhSAxEqMsM3aiwk3C612Ri
 fc70ONHl7OXSnIREcobIEq22Ehd3L/TEI0br76DkWkurQB29YT4WyEWAj1lmlzCY
 afHhX5d+ipVybOW3rQfCTdf/U23jVLrvA+n1bwsJqEpACsEXHyfHA+0oidBeiW2O
 62wmP9wxjUiO2AkIfNJuMdf12BdK+r0Rvk/5mRIZTUKOkx7B1T/zVHCnt9Qir2BT
 0O/3wUTz23onrR/1OnLiSOYQfmlly0/jZu2jyFYoIFxlB2imeqKfFnB0QHjzXOxe
 BeBTXPa1Y7r4ESNjt5z78MvlGIZLXf+PLx7nfxfVgClu3SUauNPFTFVkv45RJLWU
 AcL33B5OpmDT1EFxToPNto/kFTa4SzPm1dxEHpNcdURNEaq2KfQ=
 =FCMQ
 -----END PGP SIGNATURE-----

Merge tag 'pinctrl-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fix from Linus Walleij:
 "This is just a revert of the AMD fix, because the fix broke some
  laptops. We are working on a proper solution"

* tag 'pinctrl-v6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  Revert "pinctrl: amd: Disable and mask interrupts on resume"
2023-04-13 15:17:59 -07:00
Linus Torvalds
f1be7b6c16 drm-fixes for -rc7
- two fbcon regressions
 - amdgpu: dp mst, smu13
 - i915: dual link dsi for tgl+
 - armada, nouveau, drm/sched, fbmem
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEb4nG6jLu8Y5XI+PfTA9ye/CYqnEFAmQ4XEYACgkQTA9ye/CY
 qnFDIA/9FZJx4rW3rNl1fSN9UufQuDBq86kOECbG53VM9QcZVdDaV35cWBEyqG4p
 Ue8M3aZwltZ0+x6qeZ0rV+qZKlD3KYidq/MgW0YGPdVvSdrdVk0OOc4N6uQU1P1v
 Phr0hFHVao0+bWQE5gdK9S7DW23m/ys0y0C+LXVRNWWf4kVh5DFSjaqEo+SU8Gj1
 6GOUWf+rH44r4cHqx4sgZ8r79jWdc3Bjb6OLSE3YVhyU2cffVJ8+AgIA6JyStOR0
 tw0NKfAMI0BBlEewvvNWqKcLvZ7qz5bG+byy6amzX8bLmdyyz2BarM5MKSf7F0NY
 BC2GiQKUEh+hLsvUhLdqYV2iLKRI2Qjd0ESYF9WO2UElsTEf/2tuGKD5WTMmdJLG
 1BjEC1lQL/uSHWy+9qpppEKkmQHo+6MLlvbbbNVxgz8n/ysxnJGTe5ntGRWc88nR
 7yrHILf5Ry7/4D+PBctZqalf/JfklrY3hIuTl6XTgLE6eKM5Wgc2xn0ItmUXQD9i
 B18TCUiElr/FlMyw/oTWcstrI9rjrAv9FBZ0qoEqkXCfu7TPT+tJ+6oLq2rCJKUh
 4GuzOm7PYFBu0lof8Cmthz/Z8GDKJrdgdfDfp8yJp7jmIz2q8Xm3yqG66xSQKuAE
 dOJpHIrQgQAxgTvsSAALXmlPN2XAp5dHBaS4vXPgbFNKWkWi3H0=
 =aI2h
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2023-04-13' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Daniel Vetter:

 - two fbcon regressions

 - amdgpu: dp mst, smu13

 - i915: dual link dsi for tgl+

 - armada, nouveau, drm/sched, fbmem

* tag 'drm-fixes-2023-04-13' of git://anongit.freedesktop.org/drm/drm:
  fbcon: set_con2fb_map needs to set con2fb_map!
  fbcon: Fix error paths in set_con2fb_map
  drm/amd/pm: correct the pcie link state check for SMU13
  drm/amd/pm: correct SMU13.0.7 max shader clock reporting
  drm/amd/pm: correct SMU13.0.7 pstate profiling clock settings
  drm/amd/display: Pass the right info to drm_dp_remove_payload
  drm/armada: Fix a potential double free in an error handling path
  fbmem: Reject FB_ACTIVATE_KD_TEXT from userspace
  drm/nouveau/fb: add missing sysmen flush callbacks
  drm/i915/dsi: fix DSS CTL register offsets for TGL+
  drm/scheduler: Fix UAF race in drm_sched_entity_push_job()
2023-04-13 14:58:55 -07:00
Jakub Kicinski
d0f89c4c1d bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZDhVyQAKCRDbK58LschI
 g0lKAQDScgS8nBlgupnWVal4JzhzzJaoabETf2sIl6Sd0UJAQwD8C3DakFP7O24N
 0YE3WGpHMEvVzeCnM5HTFbbdKCjgPQc=
 =QmoW
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2023-04-13

We've added 6 non-merge commits during the last 1 day(s) which contain
a total of 14 files changed, 205 insertions(+), 38 deletions(-).

The main changes are:

1) One late straggler fix on the XDP hints side which fixes
   bpf_xdp_metadata_rx_hash kfunc API before the release goes out
   in order to provide information on the RSS hash type,
   from Jesper Dangaard Brouer.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Adjust bpf_xdp_metadata_rx_hash for new arg
  mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type
  veth: bpf_xdp_metadata_rx_hash add xdp rss hash type
  mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type
  xdp: rss hash types representation
  selftests/bpf: xdp_hw_metadata remove bpf_printk and add counters
====================

Link: https://lore.kernel.org/r/20230413192939.10202-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 13:04:44 -07:00
Daniel Vetter
cab2932213 Short summary of fixes pull:
* armada: Fix double free
  * fb: Clear FB_ACTIVATE_KD_TEXT in ioctl
  * nouveau: Add missing callbacks
  * scheduler: Fix use-after-free error
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmQ4TOMACgkQaA3BHVML
 eiPw+wf9EpgmwUbs3gAb1gXTnXlum8zfkXrYNgZ7nU9G7e16tt5ToZnkGpJVEO5x
 Ep8mkvGHAathronn1kT/EjC2V6cVbFnWb2IQX5Hb7OhMxSwSy2lFCsPgMZlPCMCX
 3hxxLxpWE+GqkxZDS8+99xif7FCLqgbAR77Ca0VXG5vfaKfd8sqh0tXxP/m23eyT
 CmvT28kw6+cgG32Etf52UN9RJLBLSIUfl/34DGu8hmoIaYK0+AVNwOQNXKnDX/MM
 HbAKDKdx2souQYrBz+5rHsfkGVXyZx2gmxH7TD4srIwgBuvNude+LICVpH1qWUlq
 ZUM8B88yxjMte9Wr1CBCWy76OqxcIQ==
 =Ko4T
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2023-04-13' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Short summary of fixes pull:

 * armada: Fix double free
 * fb: Clear FB_ACTIVATE_KD_TEXT in ioctl
 * nouveau: Add missing callbacks
 * scheduler: Fix use-after-free error

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230413184233.GA8148@linux-uq9g
2023-04-13 20:47:58 +02:00
Daniel Borkmann
8c5c2a4898 bpf, sockmap: Revert buggy deadlock fix in the sockhash and sockmap
syzbot reported a splat and bisected it to recent commit ed17aa92dc ("bpf,
sockmap: fix deadlocks in the sockhash and sockmap"):

  [...]
  WARNING: CPU: 1 PID: 9280 at kernel/softirq.c:376 __local_bh_enable_ip+0xbe/0x130 kernel/softirq.c:376
  Modules linked in:
  CPU: 1 PID: 9280 Comm: syz-executor.1 Not tainted 6.2.0-syzkaller-13249-gd319f344561d #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/30/2023
  RIP: 0010:__local_bh_enable_ip+0xbe/0x130 kernel/softirq.c:376
  [...]
  Call Trace:
  <TASK>
  spin_unlock_bh include/linux/spinlock.h:395 [inline]
  sock_map_del_link+0x2ea/0x510 net/core/sock_map.c:165
  sock_map_unref+0xb0/0x1d0 net/core/sock_map.c:184
  sock_hash_delete_elem+0x1ec/0x2a0 net/core/sock_map.c:945
  map_delete_elem kernel/bpf/syscall.c:1536 [inline]
  __sys_bpf+0x2edc/0x53e0 kernel/bpf/syscall.c:5053
  __do_sys_bpf kernel/bpf/syscall.c:5166 [inline]
  __se_sys_bpf kernel/bpf/syscall.c:5164 [inline]
  __x64_sys_bpf+0x79/0xc0 kernel/bpf/syscall.c:5164
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  RIP: 0033:0x7fe8f7c8c169
  </TASK>
  [...]

Revert for now until we have a proper solution.

Fixes: ed17aa92dc ("bpf, sockmap: fix deadlocks in the sockhash and sockmap")
Reported-by: syzbot+49f6cef45247ff249498@syzkaller.appspotmail.com
Cc: Hsin-Wei Hung <hsinweih@uci.edu>
Cc: Xin Liu <liuxin350@huawei.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/000000000000f1db9605f939720e@google.com/
2023-04-13 20:36:32 +02:00
Alexei Starovoitov
b65ef48c95 Merge branch 'XDP-hints: change RX-hash kfunc bpf_xdp_metadata_rx_hash'
Jesper Dangaard Brouer says:

====================

Current API for bpf_xdp_metadata_rx_hash() returns the raw RSS hash value,
but doesn't provide information on the RSS hash type (part of 6.3-rc).

This patchset proposal is to change the function call signature via adding
a pointer value argument for providing the RSS hash type.

Patchset also removes all bpf_printk's from xdp_hw_metadata program
that we expect driver developers to use. Instead counters are introduced
for relaying e.g. skip and fail info.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-13 11:15:11 -07:00
Jesper Dangaard Brouer
0f26b74e7d selftests/bpf: Adjust bpf_xdp_metadata_rx_hash for new arg
Update BPF selftests to use the new RSS type argument for kfunc
bpf_xdp_metadata_rx_hash.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132894068.340624.8914711185697163690.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-13 11:15:11 -07:00
Jesper Dangaard Brouer
9123397aee mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type
Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type
via matching individual Completion Queue Entry (CQE) status bits.

Fixes: ab46182d0d ("net/mlx4_en: Support RX XDP metadata")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132893562.340624.12779118462402031248.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-13 11:15:11 -07:00
Jesper Dangaard Brouer
96b1a098f3 veth: bpf_xdp_metadata_rx_hash add xdp rss hash type
Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type.

The veth driver currently only support XDP-hints based on SKB code path.
The SKB have lost information about the RSS hash type, by compressing
the information down to a single bitfield skb->l4_hash, that only knows
if this was a L4 hash value.

In preparation for veth, the xdp_rss_hash_type have an L4 indication
bit that allow us to return a meaningful L4 indication when working
with SKB based packets.

Fixes: 306531f024 ("veth: Support RX XDP metadata")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132893055.340624.16209448340644513469.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-13 11:15:10 -07:00
Jesper Dangaard Brouer
67f245c2ec mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type
Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type
via mapping table.

The mlx5 hardware can also identify and RSS hash IPSEC.  This indicate
hash includes SPI (Security Parameters Index) as part of IPSEC hash.

Extend xdp core enum xdp_rss_hash_type with IPSEC hash type.

Fixes: bc8d405b1b ("net/mlx5e: Support RX XDP metadata")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132892548.340624.11185734579430124869.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-13 11:15:10 -07:00
Jesper Dangaard Brouer
0cd917a4a8 xdp: rss hash types representation
The RSS hash type specifies what portion of packet data NIC hardware used
when calculating RSS hash value. The RSS types are focused on Internet
traffic protocols at OSI layers L3 and L4. L2 (e.g. ARP) often get hash
value zero and no RSS type. For L3 focused on IPv4 vs. IPv6, and L4
primarily TCP vs UDP, but some hardware supports SCTP.

Hardware RSS types are differently encoded for each hardware NIC. Most
hardware represent RSS hash type as a number. Determining L3 vs L4 often
requires a mapping table as there often isn't a pattern or sorting
according to ISO layer.

The patch introduce a XDP RSS hash type (enum xdp_rss_hash_type) that
contains both BITs for the L3/L4 types, and combinations to be used by
drivers for their mapping tables. The enum xdp_rss_type_bits get exposed
to BPF via BTF, and it is up to the BPF-programmer to match using these
defines.

This proposal change the kfunc API bpf_xdp_metadata_rx_hash() adding
a pointer value argument for provide the RSS hash type.
Change signature for all xmo_rx_hash calls in drivers to make it compile.

The RSS type implementations for each driver comes as separate patches.

Fixes: 3d76a4d3d4 ("bpf: XDP metadata RX kfuncs")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132892042.340624.582563003880565460.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-13 11:15:10 -07:00
Jesper Dangaard Brouer
e8163b98d9 selftests/bpf: xdp_hw_metadata remove bpf_printk and add counters
The tool xdp_hw_metadata can be used by driver developers
implementing XDP-hints metadata kfuncs.

Remove all bpf_printk calls, as the tool already transfers all the
XDP-hints related information via metadata area to AF_XDP
userspace process.

Add counters for providing remaining information about failure and
skipped packet events.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132891533.340624.7313781245316405141.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-13 11:15:10 -07:00