Commit Graph

1186949 Commits

Author SHA1 Message Date
Jiawen Wu
c3e382ad6d net: txgbe: Add software nodes to support phylink
Register software nodes for GPIO, I2C, SFP and PHYLINK. Define the
device properties.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-08 13:25:10 +02:00
Michal Smulski
4a56212774 net: dsa: mv88e6xxx: implement USXGMII mode for mv88e6393x
Enable USXGMII mode for mv88e6393x chips. Tested on Marvell 88E6191X.

Signed-off-by: Michal Smulski <michal.smulski@ooma.com>
Link: https://lore.kernel.org/r/20230605174442.12493-1-msmulski2@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07 21:26:36 -07:00
Jakub Kicinski
e06bd5e3ad Merge branch 'followup-fixes-for-the-dwmac-and-altera-lynx-conversion'
Maxime Chevallier says:

====================
Followup fixes for the dwmac and altera lynx conversion

Here's yet another version of the cleanup series for the TSE PCS replacement
by PCS Lynx. It includes Kconfig fixups, some missing initialisations
and a slight rework suggested by Russell for the dwmac cleanup sequence,
along with more explicit zeroing of local structures as per MAciej's
review.
====================

Link: https://lore.kernel.org/r/20230607135941.407054-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07 13:30:50 -07:00
Maxime Chevallier
06b9dede1e net: dwmac_socfpga: initialize local data for mdio regmap configuration
Explicitly zero-ize the local mdio_regmap_config data, and explicitly
set the .autoscan parameter, as we only have a PCS on this bus.

Fixes: 5d1f3fe7d2 ("net: stmmac: dwmac-sogfpga: use the lynx pcs driver")
Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07 13:30:46 -07:00
Maxime Chevallier
fa19a5d9dc net: altera_tse: explicitly disable autoscan on the regmap-mdio bus
Set the .autoscan flag to false on the regmap-mdio bus, to avoid using a
random uninitialized value. We don't want autoscan in this case as the
mdio device is a PCS and not a PHY.

Fixes: db48abbaa1 ("net: ethernet: altera-tse: Convert to mdio-regmap and use PCS Lynx")
Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07 13:30:12 -07:00
Maxime Chevallier
a8dd7404c2 net: stmmac: make the pcs_lynx cleanup sequence specific to dwmac_socfpga
So far, only the dwmac_socfpga variant of stmmac uses PCS Lynx. Use a
dedicated cleanup sequence for dwmac_socfpga instead of using the
generic stmmac one.

Fixes: 5d1f3fe7d2 ("net: stmmac: dwmac-sogfpga: use the lynx pcs driver")
Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07 13:30:12 -07:00
Maxime Chevallier
fae555f5a5 net: altera_tse: Use the correct Kconfig option for the PCS_LYNX dependency
Use the correct Kconfig dependency for altera_tse as PCS_ALTERA_TSE was
replaced by PCS_LYNX.

Fixes: db48abbaa1 ("net: ethernet: altera-tse: Convert to mdio-regmap and use PCS Lynx")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07 13:30:12 -07:00
Maxime Chevallier
2d830f7a41 net: altera-tse: Initialize local structs before using it
The regmap_config and mdio_regmap_config objects needs to be zeroed before
using them. This will cause spurious errors at probe time as config->pad_bits
is containing random uninitialized data.

Fixes: db48abbaa1 ("net: ethernet: altera-tse: Convert to mdio-regmap and use PCS Lynx")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07 13:30:11 -07:00
Jakub Kicinski
6878eb59d9 Merge branch 'tools-ynl-generate-code-for-the-handshake-family'
Jakub Kicinski says:

====================
tools: ynl: generate code for the handshake family

Add necessary features and generate user space C code for serializing
/ deserializing messages of the handshake family.

In addition to basics already present in netdev and fou, handshake
has nested attrs and multi-attr u32.
====================

Link: https://lore.kernel.org/r/20230606194302.919343-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07 12:53:14 -07:00
Jakub Kicinski
7a11f70ce8 tools: ynl: generate code for the handshake family
Generate support for the handshake family.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07 12:53:10 -07:00
Jakub Kicinski
58da455b31 tools: ynl-gen: improve unwind on parsing errors
When parsing multi-attr we count the objects and then allocate
an array to hold the parsed objects. If an attr space has multiple
multi-attr objects, however, if parsing the first array fails
we'll leave the object count for the second even tho the second
array was never allocated.

This may cause crashes when freeing objects on error.

Count attributes to a variable on the stack and only set the count
in the object once the memory was allocated.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07 12:53:10 -07:00
Jakub Kicinski
2cc9671a82 tools: ynl-gen: fill in support for MultiAttr scalars
The handshake family needs support for MultiAttr scalars.
Right now we only support code gen for MultiAttr nested
types.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07 12:53:10 -07:00
Simon Horman
e7214663e0 net: txgbe: Avoid passing uninitialised parameter to pci_wake_from_d3()
txgbe_shutdown() relies on txgbe_dev_shutdown() to initialise
wake by passing it by reference. However, txgbe_dev_shutdown()
doesn't use this parameter at all.

wake is then passed uninitialised by txgbe_dev_shutdown()
to pci_wake_from_d3().

Resolve this problem by:
* Removing the unused parameter from txgbe_dev_shutdown()
* Removing the uninitialised variable wake from txgbe_dev_shutdown()
* Passing false to pci_wake_from_d3() - this assumes that
  although uninitialised wake was in practice false (0).

I'm not sure that this counts as a bug, as I'm not sure that
it manifests in any unwanted behaviour. But in any case, the issue
was introduced by:

  3ce7547e5b ("net: txgbe: Add build support for txgbe")

Flagged by Smatch as:

  .../txgbe_main.c:486 txgbe_shutdown() error: uninitialized symbol 'wake'.

No functional change intended.
Compile tested only.

Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07 12:28:58 +01:00
Atin Bainada
92db9e2e04 net: dsa: qca8k: remove unnecessary (void*) conversions
Pointer variables of (void*) type do not require type cast.

Signed-off-by: Atin Bainada <hi@atinb.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07 12:26:19 +01:00
Masahiro Yamada
f71be9d084 net: liquidio: fix mixed module-builtin object
With CONFIG_LIQUIDIO=m and CONFIG_LIQUIDIO_VF=y (or vice versa),
$(common-objs) are linked to a module and also to vmlinux even though
the expected CFLAGS are different between builtins and modules.

This is the same situation as fixed by commit 637a642f5c ("zstd:
Fixing mixed module-builtin objects").

Introduce the new module, liquidio-core, to provide the common functions
to liquidio and liquidio-vf.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07 12:22:44 +01:00
David Morley
0824a987a5 tcp: fix formatting in sysctl_net_ipv4.c
Fix incorrectly formatted tcp_syn_linear_timeouts sysctl in the
ipv4_net_table.

Fixes: ccce324dab ("tcp: make the first N SYN RTO backoffs linear")
Signed-off-by: David Morley <morleyd@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Tested-by: David Morley <morleyd@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07 10:57:28 +01:00
Dan Carpenter
cad7526f33 net: dsa: ocelot: unlock on error in vsc9959_qos_port_tas_set()
This error path needs call mutex_unlock(&ocelot->tas_lock) before
returning.

Fixes: 2d800bc500 ("net/sched: taprio: replace tc_taprio_qopt_offload :: enable with a "cmd" enum")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07 10:10:52 +01:00
David S. Miller
2f27d7890f Merge branch 'realtek-external-phy-clock'
Detlev Casanova says:

====================
net: phy: realtek: Support external PHY clock

Some PHYs can use an external clock that must be enabled before
communicating with them.

Changes since v3:
 * Do not call genphy_suspend if WoL is enabled.
Changes since v2:
 * Reword documentation commit message
Changes since v1:
 * Remove the clock name as it is not guaranteed to be identical across
   different PHYs
 * Disable/Enable the clock when suspending/resuming
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07 09:52:25 +01:00
Detlev Casanova
59e227e289 net: phy: realtek: Disable clock on suspend
For PHYs that call rtl821x_probe() where an external clock can be
configured, make sure that the clock is disabled
when ->suspend() is called and enabled on resume.

The PHY_ALWAYS_CALL_SUSPEND is added to ensure that the suspend function
is actually always called.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07 09:52:24 +01:00
Detlev Casanova
350b7a258f dt-bindings: net: phy: Document support for external PHY clk
Ethern PHYs can have external an clock that needs to be activated before
communicating with the PHY.

Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07 09:52:24 +01:00
Detlev Casanova
7300c9b574 net: phy: realtek: Add optional external PHY clock
In some cases, the PHY can use an external clock source instead of a
crystal.

Add an optional clock in the phy node to make sure that the clock source
is enabled, if specified, before probing.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07 09:52:24 +01:00
Shradha Gupta
4cab498f33 hv_netvsc: Allocate rx indirection table size dynamically
Allocate the size of rx indirection table dynamically in netvsc
from the value of size provided by OID_GEN_RECEIVE_SCALE_CAPABILITIES
query instead of using a constant value of ITAB_NUM.

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Tested-on: Ubuntu22 (azure VM, SKU size: Standard_F72s_v2)
Testcases:
1. ethtool -x eth0 output
2. LISA testcase:PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-Synthetic
3. LISA testcase:PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-SRIOV
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07 09:46:49 +01:00
Xin Long
ae28ea5cbd tipc: replace open-code bearer rcu_dereference access in bearer.c
Replace these open-code bearer rcu_dereference access with bearer_get(),
like other places in bearer.c. While at it, also use tipc_net() instead
of net_generic(net, tipc_net_id) to get "tn" in bearer.c.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Link: https://lore.kernel.org/r/1072588a8691f970bda950c7e2834d1f2983f58e.1685976044.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 21:46:21 -07:00
Jakub Kicinski
be1f4a262b Merge branch 'ipv4-remove-rt_conn_flags-calls-in-flowi4_init_output'
Guillaume Nault says:

====================
ipv4: Remove RT_CONN_FLAGS() calls in flowi4_init_output().

Remove a few RT_CONN_FLAGS() calls used inside flowi4_init_output().
These users can be easily converted to set the scope properly, instead
of overloading the tos parameter with scope information as done by
RT_CONN_FLAGS().

The objective is to eventually remove RT_CONN_FLAGS() entirely, which
will then allow to also remove RTO_ONLINK and to finally convert
->flowi4_tos to dscp_t.
====================

Link: https://lore.kernel.org/r/cover.1685999117.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 21:13:05 -07:00
Guillaume Nault
6f8a76f802 tcp: Set route scope properly in cookie_v4_check().
RT_CONN_FLAGS(sk) overloads flowi4_tos with the RTO_ONLINK bit when
sk has the SOCK_LOCALROUTE flag set. This allows
ip_route_output_key_hash() to eventually adjust flowi4_scope.

Instead of relying on special handling of the RTO_ONLINK bit, we can
just set the route scope correctly. This will eventually allow to avoid
special interpretation of tos variables and to convert ->flowi4_tos to
dscp_t.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 21:13:03 -07:00
Guillaume Nault
4b095281ca ipv4: Set correct scope in inet_csk_route_*().
RT_CONN_FLAGS(sk) overloads the tos parameter with the RTO_ONLINK bit
when sk has the SOCK_LOCALROUTE flag set. This is only useful for
ip_route_output_key_hash() to eventually adjust the route scope.

Let's drop RTO_ONLINK and set the correct scope directly to avoid this
special case in the future and to allow converting ->flowi4_tos to
dscp_t.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 21:13:03 -07:00
Jakub Kicinski
fe109f6b63 Merge branch 'move-ksz9477-errata-handling-to-phy-driver'
Robert Hancock says:

====================
Move KSZ9477 errata handling to PHY driver

Patches to move handling for KSZ9477 PHY errata register fixes from
the DSA switch driver into the corresponding PHY driver, for more
proper layering and ordering.
====================

Link: https://lore.kernel.org/r/20230605153943.1060444-1-robert.hancock@calian.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 21:08:39 -07:00
Robert Hancock
6068e6d7ba net: dsa: microchip: remove KSZ9477 PHY errata handling
The KSZ9477 PHY errata handling code has now been moved into the Micrel
PHY driver, so it is no longer needed inside the DSA switch driver.
Remove it.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 21:08:37 -07:00
Robert Hancock
26dd2974c5 net: phy: micrel: Move KSZ9477 errata fixes to PHY driver
The ksz9477 DSA switch driver is currently updating some MMD registers
on the internal port PHYs to address some chip errata. However, these
errata are really a property of the PHY itself, not the switch they are
part of, so this is kind of a layering violation. It makes more sense for
these writes to be done inside the driver which binds to the PHY and not
the driver for the containing device.

This also addresses some issues where the ordering of when these writes
are done may have been incorrect, causing the link to erratically fail to
come up at the proper speed or at all. Doing this in the PHY driver
during config_init ensures that they happen before anything else tries to
change the state of the PHY on the port.

The new code also ensures that autonegotiation is disabled during the
register writes and re-enabled afterwards, as indicated by the latest
version of the errata documentation from Microchip.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 21:08:37 -07:00
Jakub Kicinski
2dc476404e Merge branch 'tools-ynl-user-space-c'
Jakub Kicinski says:

====================
tools: ynl: user space C

Use the code gen which is already in tree to generate a user space
library for a handful of simple families. I find YNL C quite useful
in some WIP projects, and I think others may find it useful, too.
I was hoping someone will pick this work up and finish it...
but it seems that Python YNL has largely stolen the thunder.
Python may not be great for selftest, tho, and actually this lib
is more fully-featured. The Python script was meant as a quick demo,
funny how those things go.

v2: https://lore.kernel.org/all/20230604175843.662084-1-kuba@kernel.org/
v1: https://lore.kernel.org/all/20230603052547.631384-1-kuba@kernel.org/
====================

Link: https://lore.kernel.org/r/20230605190108.809439-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 12:31:34 -07:00
Jakub Kicinski
ee0202e2e7 tools: ynl: add sample for netdev
Add a sample application using the C library.
My main goal is to make writing selftests easier but until
I have some of those ready I think it's useful to show off
the functionality and let people poke and tinker.

Sample outputs - dump:

$ ./netdev
Select ifc ($ifindex; or 0 = dump; or -2 ntf check): 0
      lo[1]	0:
  enp1s0[2]	23: basic redirect rx-sg

Notifications (watching veth pair getting added and deleted):

$ ./netdev
Select ifc ($ifindex; or 0 = dump; or -2 ntf check): -2
[53]	0: (ntf: dev-add-ntf)
[54]	0: (ntf: dev-add-ntf)
[54]	23: basic redirect rx-sg (ntf: dev-change-ntf)
[53]	23: basic redirect rx-sg (ntf: dev-change-ntf)
[53]	23: basic redirect rx-sg (ntf: dev-del-ntf)
[54]	23: basic redirect rx-sg (ntf: dev-del-ntf)

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 12:31:32 -07:00
Jakub Kicinski
d75fdfbc6f tools: ynl: support fou and netdev in C
Generate the code for netdev and fou families. They are simple
and already supported by the code gen.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 12:31:32 -07:00
Jakub Kicinski
86878f14d7 tools: ynl: user space helpers
Add "fixed" part of the user space Netlink Spec-based library.
This will get linked with the protocol implementations to form
a full API.

Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 12:31:31 -07:00
Jakub Kicinski
a99bfdf647 tools: ynl-gen: clean up stray new lines at the end of reply-less requests
Do not print empty lines before closing brackets.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 12:31:31 -07:00
Lukas Bulwahn
ae91f7e436 net/pppoe: fix a typo for the PPPOE_HASH_BITS_1 definition
Instead of its intention to define PPPOE_HASH_BITS_1, commit 96ba44c637
("net/pppoe: make number of hash bits configurable") actually defined
config PPPOE_HASH_BITS_2 twice in the ppp's Kconfig file due to a quick
typo with the numbers.

Fix the typo and define PPPOE_HASH_BITS_1.

Fixes: 96ba44c637 ("net/pppoe: make number of hash bits configurable")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Jaco Kroon <jaco@uls.co.za>
Link: https://lore.kernel.org/r/20230605072743.11247-1-lukas.bulwahn@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-06 13:28:30 +02:00
Andy Shevchenko
8d2b2281ae mac_pton: Clean up the header inclusions
Since hex_to_bin() is provided by hex.h there is no need to require
kernel.h. Replace the latter by the former and add missing export.h.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230604132858.6650-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-06 13:18:32 +02:00
Richard Gobert
7b355b76e2 gro: decrease size of CB
The GRO control block (NAPI_GRO_CB) is currently at its maximum size.
This commit reduces its size by putting two groups of fields that are
used only at different times into a union.

Specifically, the fields frag0 and frag0_len are the fields that make up
the frag0 optimisation mechanism, which is used during the initial
parsing of the SKB.

The fields last and age are used after the initial parsing, while the
SKB is stored in the GRO list, waiting for other packets to arrive.

There was one location in dev_gro_receive that modified the frag0 fields
after setting last and age. I changed this accordingly without altering
the code behaviour.

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230601161407.GA9253@debian
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-06 11:12:20 +02:00
Jakub Kicinski
ddb8701dcb Merge branch 'splice-net-handle-msg_splice_pages-in-af_kcm'
David Howells says:

====================
splice, net: Handle MSG_SPLICE_PAGES in AF_KCM

Here are patches to make AF_KCM handle the MSG_SPLICE_PAGES internal
sendmsg flag.  MSG_SPLICE_PAGES is an internal hint that tells the protocol
that it should splice the pages supplied if it can.  Its sendpage
implementation is then turned into a wrapper around that.

Does anyone actually use AF_KCM?  Upstream it has some issues.  It doesn't
seem able to handle a "message" longer than 113920 bytes without jamming
and doesn't handle the client termination once it is jammed.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=51c78a4d532efe9543a4df019ff405f05c6157f6 # part 1
Link: https://lore.kernel.org/r/20230524144923.3623536-1-dhowells@redhat.com/ # v1
====================

Link: https://lore.kernel.org/r/20230531110423.643196-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 20:51:58 -07:00
David Howells
5bb3a5cb3e kcm: Convert kcm_sendpage() to use MSG_SPLICE_PAGES
Convert kcm_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather than
directly splicing in the pages itself.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Tom Herbert <tom@herbertland.com>
cc: Tom Herbert <tom@quantonium.net>
cc: Cong Wang <cong.wang@bytedance.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 20:51:56 -07:00
David Howells
2b03bcae66 kcm: Support MSG_SPLICE_PAGES
Make AF_KCM sendmsg() support MSG_SPLICE_PAGES.  This causes pages to be
spliced from the source iterator if possible.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Tom Herbert <tom@herbertland.com>
cc: Tom Herbert <tom@quantonium.net>
cc: Cong Wang <cong.wang@bytedance.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 20:51:56 -07:00
Jakub Kicinski
28cfea989d mlx5-updates-2023-05-31
net/mlx5: Support 4 ports VF LAG, part 1/2
 
 This series continues the series[1] "Support 4 ports HCAs LAG mode"
 by Mark Bloch. This series adds support for 4 ports VF LAG (single FDB
 E-Switch).
 
 This series of patches focuses on refactoring different sections of the
 code that make assumptions about VF LAG supporting only two ports. For
 instance, it assumes that each device can only have one peer.
 
 Patches 1-5:
 - Refactor ETH handling of TC rules of eswitches with peers.
 Patch 6:
 - Refactors peer miss group table.
 Patches 7-9:
 - Refactor single FDB E-Switch creation.
 Patch 10:
 - Refactor the DR layer.
 Patches 11-14:
 - Refactors devcom layer.
 
 Next series will refactor LAG layer and enable 4 ports VF LAG.
 This series specifically allows HCAs with 4 ports to create a VF LAG
 with only 4 ports. It is not possible to create a VF LAG with 2 or 3
 ports using HCAs that have 4 ports.
 
 Currently, the Merged E-Switch feature only supports HCAs with 2 ports.
 However, upcoming patches will introduce support for HCAs with 4 ports.
 
 In order to activate VF LAG a user can execute:
 
 devlink dev eswitch set pci/0000:08:00.0 mode switchdev
 devlink dev eswitch set pci/0000:08:00.1 mode switchdev
 devlink dev eswitch set pci/0000:08:00.2 mode switchdev
 devlink dev eswitch set pci/0000:08:00.3 mode switchdev
 ip link add name bond0 type bond
 ip link set dev bond0 type bond mode 802.3ad
 ip link set dev eth2 master bond0
 ip link set dev eth3 master bond0
 ip link set dev eth4 master bond0
 ip link set dev eth5 master bond0
 
 Where eth2, eth3, eth4 and eth5 are net-interfaces of pci/0000:08:00.0
 pci/0000:08:00.1 pci/0000:08:00.2 pci/0000:08:00.3 respectively.
 
 User can verify LAG state and type via debugfs:
 /sys/kernel/debug/mlx5/0000\:08\:00.0/lag/state
 /sys/kernel/debug/mlx5/0000\:08\:00.0/lag/type
 
 [1]
 https://lore.kernel.org/netdev/20220510055743.118828-1-saeedm@nvidia.com/
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmR6PrkACgkQSD+KveBX
 +j4QYAf/TlwnjQP+Is+9lJEIC+RH1bDPEsbiw+kwlMU6AveDYaHl1bevjJoE7qS5
 uvVunRh7SENWtU56TOiYO7CwkbTHAx1WdJdXZDudfed1olF0hMUO5yDixaFgiCMn
 hTWJJvNprWWK5Ti+yZ39ZHsrigXze7g4ZoJ0JFyDKfR3cobqNjgpOLXz+XMR0NEY
 7ym6VxZATNjVWorBeHMA+Oe249s4oK/m5i0LV/8yrkhDpFkCcfkTdLmvKQ+TRu11
 Akq1dd236LqGqAPO2tFg8NgnjsXu1vvCJnvjML5VFDbruDv3h3fa6F0mClrRmYm7
 f7I32QW10+NYBANcoWVCn6EQzmWDpQ==
 =ZDar
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2023-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-05-31

net/mlx5: Support 4 ports VF LAG, part 1/2

This series continues the series[1] "Support 4 ports HCAs LAG mode"
by Mark Bloch. This series adds support for 4 ports VF LAG (single FDB
E-Switch).

This series of patches focuses on refactoring different sections of the
code that make assumptions about VF LAG supporting only two ports. For
instance, it assumes that each device can only have one peer.

Patches 1-5:
- Refactor ETH handling of TC rules of eswitches with peers.
Patch 6:
- Refactors peer miss group table.
Patches 7-9:
- Refactor single FDB E-Switch creation.
Patch 10:
- Refactor the DR layer.
Patches 11-14:
- Refactors devcom layer.

Next series will refactor LAG layer and enable 4 ports VF LAG.
This series specifically allows HCAs with 4 ports to create a VF LAG
with only 4 ports. It is not possible to create a VF LAG with 2 or 3
ports using HCAs that have 4 ports.

Currently, the Merged E-Switch feature only supports HCAs with 2 ports.
However, upcoming patches will introduce support for HCAs with 4 ports.

In order to activate VF LAG a user can execute:

devlink dev eswitch set pci/0000:08:00.0 mode switchdev
devlink dev eswitch set pci/0000:08:00.1 mode switchdev
devlink dev eswitch set pci/0000:08:00.2 mode switchdev
devlink dev eswitch set pci/0000:08:00.3 mode switchdev
ip link add name bond0 type bond
ip link set dev bond0 type bond mode 802.3ad
ip link set dev eth2 master bond0
ip link set dev eth3 master bond0
ip link set dev eth4 master bond0
ip link set dev eth5 master bond0

Where eth2, eth3, eth4 and eth5 are net-interfaces of pci/0000:08:00.0
pci/0000:08:00.1 pci/0000:08:00.2 pci/0000:08:00.3 respectively.

User can verify LAG state and type via debugfs:
/sys/kernel/debug/mlx5/0000\:08\:00.0/lag/state
/sys/kernel/debug/mlx5/0000\:08\:00.0/lag/type

[1]
https://lore.kernel.org/netdev/20220510055743.118828-1-saeedm@nvidia.com/

* tag 'mlx5-updates-2023-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Devcom, extend mlx5_devcom_send_event to work with more than two devices
  net/mlx5: Devcom, introduce devcom_for_each_peer_entry
  net/mlx5: E-switch, mark devcom as not ready when all eswitches are unpaired
  net/mlx5: Devcom, Rename paired to ready
  net/mlx5: DR, handle more than one peer domain
  net/mlx5: E-switch, generalize shared FDB creation
  net/mlx5: E-switch, Handle multiple master egress rules
  net/mlx5: E-switch, refactor FDB miss rule add/remove
  net/mlx5: E-switch, enlarge peer miss group table
  net/mlx5e: Handle offloads flows per peer
  net/mlx5e: en_tc, re-factor query route port
  net/mlx5e: rep, store send to vport rules per peer
  net/mlx5e: tc, Refactor peer add/del flow
  net/mlx5e: en_tc, Extend peer flows to a list
====================

Link: https://lore.kernel.org/r/20230602191301.47004-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 15:42:22 -07:00
Jakub Kicinski
c422ac94e6 Merge branch 'drm-i915-use-ref_tracker-library-for-tracking-wakerefs'
Andrzej Hajda says:

====================
drm/i915: use ref_tracker library for tracking wakerefs

This is reviewed series of ref_tracker patches, ready to merge
via network tree, rebased on net-next/main.
i915 patches will be merged later via intel-gfx tree.
====================

Merge on top of an -rc tag in case it's needed in another tree.

Link: https://lore.kernel.org/r/20230224-track_gt-v9-0-5b47a33f55d1@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 15:28:45 -07:00
Andrzej Hajda
acd8f0e5d7 lib/ref_tracker: remove warnings in case of allocation failure
Library can handle allocation failures. To avoid allocation warnings
__GFP_NOWARN has been added everywhere. Moreover GFP_ATOMIC has been
replaced with GFP_NOWAIT in case of stack allocation on tracker free
call.

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 15:28:42 -07:00
Andrzej Hajda
227c6c8323 lib/ref_tracker: add printing to memory buffer
Similar to stack_(depot|trace)_snprint the patch
adds helper to printing stats to memory buffer.
It will be helpful in case of debugfs.

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 15:28:42 -07:00
Andrzej Hajda
b6d7c0eb2d lib/ref_tracker: improve printing stats
In case the library is tracking busy subsystem, simply
printing stack for every active reference will spam log
with long, hard to read, redundant stack traces. To improve
readabilty following changes have been made:
- reports are printed per stack_handle - log is more compact,
- added display name for ref_tracker_dir - it will differentiate
  multiple subsystems,
- stack trace is printed indented, in the same printk call,
- info about dropped references is printed as well.

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 15:28:42 -07:00
Andrzej Hajda
7a113ff635 lib/ref_tracker: add unlocked leak print helper
To have reliable detection of leaks, caller must be able to check under
the same lock both: tracked counter and the leaks. dir.lock is natural
candidate for such lock and unlocked print helper can be called with this
lock taken.
As a bonus we can reuse this helper in ref_tracker_dir_exit.

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05 15:28:42 -07:00
David S. Miller
69da40ac34 Merge branch 'mlxsw-selftests-cleanups'
Petr Machata says:

====================
mlxsw, selftests: Cleanups

This patchset consolidates a number of disparate items that can all be
considered cleanups. They are all related to mlxsw in that they are
directly in mlxsw code, or in selftests that mlxsw heavily uses.

- patch #1 fixes a comment, patch #2 propagates an extack

- patches #3 and #4 tweak several loops to query a resource once and cache
  in a local variable instead of querying on each iteration

- patches #5 and #6 fix selftest diagrams, and #7 adds a missing diagram
  into an existing test

- patch #8 disables a PVID on a bridge in a selftest that should not need
  said PVID
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 11:29:49 +01:00
Petr Machata
f5136877f4 selftests: router_bridge_vlan: Set vlan_default_pvid 0 on the bridge
When everything is configured, VLAN membership on the bridge in this
selftest are as follows:

    # bridge vlan show
    port              vlan-id
    swp2              1 PVID Egress Untagged
                      555
    br1               1 Egress Untagged
                      555 PVID Egress Untagged

Note that it is possible for untagged traffic to just flow through as VLAN
1, instead of using VLAN 555 as intended by the test. This configuration
seems too close to "works by accident", and it would be better to just shut
out VLAN 1 altogether.

To that end, configure vlan_default_pvid of 0:

    # bridge vlan show
    port              vlan-id
    swp2              555
    br1               555 PVID Egress Untagged

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 11:29:49 +01:00
Petr Machata
812de4dfab selftests: router_bridge_vlan: Add a diagram
Add a topology diagram to this selftest to make the configuration easier to
understand.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 11:29:49 +01:00
Petr Machata
34ad708d1b selftests: mlxsw: egress_vid_classification: Fix the diagram
The topology diagram implies that $swp1 and $swp2 are members of the bridge
br0, when in fact only their uppers, $swp1.10 and $swp2.10 are. Adjust the
diagram.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 11:29:49 +01:00