XDP and XDP socket require extra SQ/RQ/CQs. Most of these resources
are dynamically created: no XDP program loaded, no resources are
created. One exception is the SQ/CQ created for XDP_REDRIECT, used
for other netdev to forward packet to mlx5 for transmit. The patch
disables creation of SQ and CQ used for egress XDP_REDIRECT, by
checking whether ndo_xdp_xmit is set or not.
For netdev without XDP support such as non-uplink representor, this
saves around 0.35MB of memory, per representor netdevice per channel.
Signed-off-by: William Tu <witu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241031125856.530927-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dynamically allocating xdpsq, used by egress side XDP_REDIRECT.
mlx5 has multiple XDP sqs. Under struct mlx5e_channel:
1. rx_xdpsq: used for XDP_TX, an XDP prog handles the rx packet and
transmits using the same queue as rx.
2. xdpsq: used by egress side XDP_REDIRECT. This is for another interface
to redirect packet to the mlx5 interface, using ndo_xdp_xmit .
3. xsksq: used by XSK. XSK has its own dedicated channel, and it also
has resources of 1 and 2.
The patch changes only the 2. xdpsq.
Signed-off-by: William Tu <witu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241031125856.530927-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Removed the 'mlx5hws_' file name prefix from the internal HWS files.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241031125856.530927-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
After adding HWS support in a separate folder, moving all the SWS
code into its own folder as well.
Now SWS and HWS implementation are located in their appropriate
folders:
- steering/sws/
- steering/hws/
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241031125856.530927-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The first approach was flawed, because there are situations where the
esw mode change fails, leaving the qos domain as NULL. Various calls
into the QoS infra then trigger a NULL pointer access and unhappiness.
Improve that by a combination of:
- Allocating the QoS domain on esw init and cleaning it up on teardown.
- Refactoring mode change to only call qos domain init but not cleanup.
- Making qos domain init idempotent - not change anything if nothing
needs changing.
Together, these should guarantee that, as long as the memory allocations
succeed, there should always be a valid qos domain until the esw
cleanup, no matter what mode changes happen (or failures thereof).
Fixes: 107a034d5c ("net/mlx5: qos: Store rate groups in a qos domain")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241031125856.530927-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add 3 tests to check for the expected behaviour of
qdisc_tree_reduce_backlog in special scenarios.
- The first test checks if the qdisc class is notified of deletion for
major handle 'ffff:'.
- The second test checks the same as the first test but with 'ffff:' as the root
qdisc.
- The third test checks if everything works if ingress is active.
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Link: https://patch.msgid.link/20241101143148.1218890-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Furong Xu says:
====================
net: stmmac: Refactor FPE as a separate module
Refactor FPE implementation by moving common code for DWMAC4 and
DWXGMAC into a separate FPE module.
FPE implementation for DWMAC4 and DWXGMAC differs only for:
1) Offset address of MAC_FPE_CTRL_STS and MTL_FPE_CTRL_STS
2) FPRQ(Frame Preemption Residue Queue) field in MAC_RxQ_Ctrl1
3) Bit offset of Frame Preemption Interrupt Enable
Tested on DWMAC CORE 5.20a and DWXGMAC CORE 3.20a
====================
Link: https://patch.msgid.link/cover.1730449003.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Synopsys XGMAC Databook defines MAC_RxQ_Ctrl1 register:
RQ: Frame Preemption Residue Queue
XGMAC_FPRQ is more readable and more consistent with GMAC4.
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/611991edf9e9d6fac8b29c3fe952791b193ca179.1730449003.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
FPE implementation for DWMAC4 and DWXGMAC differs only for:
1) Offset address of MAC_FPE_CTRL_STS and MTL_FPE_CTRL_STS
2) FPRQ(Frame Preemption Residue Queue) field in MAC_RxQ_Ctrl1
3) Bit offset of Frame Preemption Interrupt Enable
Refactor FPE functions to avoid code duplication and
to simplify the code flow by avoiding the use of
function pointers.
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/49de4607bae69ffe751b13329a3c07a990b82419.1730449003.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A single "priv->dma_cap.fpesel" checks HW capability only,
while both HW capability and driver capability shall be
checked by later refactoring to prevent unexpected behavior
for FPE on unsupported MAC cores and keep FPE as an optional
implementation for current and new MAC cores.
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/01e9cd13aedd38cb0e9a5d9875c475ce35250188.1730449003.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
By moving FPE related code info separate files, FPE implementation
becomes a separate module initially.
No functional change intended.
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/e9ddf4fbf0fc053ae30592aa6c4363e72a4d8e62.1730449003.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit says:
====================
r8169: align RTL8125/RTL8126 PHY config with vendor driver
This series aligns the RTL8125/RTL8126 PHY config with vendor drivers
r8125 and r8126 respectively.
====================
Link: https://patch.msgid.link/7a849c7c-50ff-4a9b-9a1c-a963b0561c79@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This aligns some parameters with vendor driver r8125/r8126 to avoid
compatibility issues. Note that for RTL8125B there's no functional
change, just the open-coded version of the function is replaced.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/a8a9d896-fbe6-41f2-bf87-666567d3cdb3@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Align the EEE config for RTL8125A/RTL8125B with vendor driver r8125.
This should help to avoid compatibility issues.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/044c925e-8669-4b98-87df-95b4056f4f5f@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iIsEABYIADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZyP6TxUcZGFuaWVsQGlv
Z2VhcmJveC5uZXQACgkQ2yufC7HISINz7QD/RTuJAzPJXPQmjdzMj7pepjnSQH4K
DnOc1soDqjJPSFkBAMlklDCZqSsFoNtNxagbyILrYQBC/MsV9jngimK46DEN
=pDzC
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-10-31
We've added 13 non-merge commits during the last 16 day(s) which contain
a total of 16 files changed, 710 insertions(+), 668 deletions(-).
The main changes are:
1) Optimize and homogenize bpf_csum_diff helper for all archs and also
add a batch of new BPF selftests for it, from Puranjay Mohan.
2) Rewrite and migrate the test_tcp_check_syncookie.sh BPF selftest
into test_progs so that it can be run in BPF CI, from Alexis Lothoré.
3) Two BPF sockmap selftest fixes, from Zijian Zhang.
4) Small XDP synproxy BPF selftest cleanup to remove IP_DF check,
from Vincent Li.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftests/bpf: Add a selftest for bpf_csum_diff()
selftests/bpf: Don't mask result of bpf_csum_diff() in test_verifier
bpf: bpf_csum_diff: Optimize and homogenize for all archs
net: checksum: Move from32to16() to generic header
selftests/bpf: remove xdp_synproxy IP_DF check
selftests/bpf: remove test_tcp_check_syncookie
selftests/bpf: test MSS value returned with bpf_tcp_gen_syncookie
selftests/bpf: add ipv4 and dual ipv4/ipv6 support in btf_skc_cls_ingress
selftests/bpf: get rid of global vars in btf_skc_cls_ingress
selftests/bpf: add missing ns cleanups in btf_skc_cls_ingress
selftests/bpf: factorize conn and syncookies tests in a single runner
selftests/bpf: Fix txmsg_redir of test_txmsg_pull in test_sockmap
selftests/bpf: Fix msg_verify_data in test_sockmap
====================
Link: https://patch.msgid.link/20241031221543.108853-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rosen Penev says:
====================
ibm: emac: cleanup modules to use devm
simplifies probe and removes remove functions. These drivers are small.
====================
Link: https://patch.msgid.link/20241030203727.6039-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Moves the handling right before they are used and allows merging a
branch.
Also get rid of the error handling as devm_request_irq can handle that.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20241030203727.6039-13-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Avoids manual frees. Also replaced irq_of_parse_and_map with
platform_get_irq since it's simpler and does the same thing.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20241030203727.6039-12-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simplifies the probe function by a bit and allows removing the _remove
function such that devm now handles all cleanup.
printk gets converted to dev_err as np is now gone.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20241030203727.6039-10-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It seems that since inception, this driver never called mutex_destroy in
_remove. Use devm to handle this automatically.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20241030203727.6039-9-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simplifies the probe function by a bit and allows removing the _remove
function such that devm now handles all cleanup.
printk gets converted to dev_err as np is now gone.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20241030203727.6039-7-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It seems that since inception, this driver never called mutex_destroy in
_remove. Use devm to handle this automatically.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20241030203727.6039-6-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simplifies the probe function by a bit and allows removing the _remove
function such that devm now handles all cleanup.
printk gets converted to dev_err as np is now gone.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20241030203727.6039-4-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It seems that since inception, this driver never called mutex_destroy in
_remove. Use devm to handle this automatically.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20241030203727.6039-3-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the
dscp_t value to __u8 with inet_dscp_to_dsfield().
Then, when we'll convert .flowi4_tos to dscp_t, we'll just have to drop
the inet_dscp_to_dsfield() call.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/6be084229008dcfa7a4e2758befccfd2217a331e.1730294788.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the
dscp_t value to __u8 with inet_dscp_to_dsfield().
Then, when we'll convert .flowi4_tos to dscp_t, we'll just have to drop
the inet_dscp_to_dsfield() call.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/f48335504a05b3587e0081a9b4511e0761571ca5.1730292157.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pass the "mdio" child node directly to `macb_mdiobus_register` to avoid
performing the node lookup twice.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241030085224.2632426-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The driver and description indicate "snps,kbbe" is a boolean, not an
uint32.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://patch.msgid.link/20241101211331.24605-2-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The check on ret has already been performed a few statements earlier
and ret has not been re-assigned and so the re-checking is redundant.
Clean up the code by removing the redundant check.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://patch.msgid.link/20241031135042.3250614-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduce port_setup_tc callback in mt7530 dsa driver in order to enable
dsa ports rate shaping via hw Token Bucket Filter (TBF) for hw switched
traffic.
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241031-mt7530-tc-offload-v2-1-cb242ad954a0@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net_dim() is currently passed a struct dim_sample argument by value.
struct dim_sample is 24 bytes. Since this is greater 16 bytes, x86-64
passes it on the stack. All callers have already initialized dim_sample
on the stack, so passing it by value requires pushing a duplicated copy
to the stack. Either witing to the stack and immediately reading it, or
perhaps dereferencing addresses relative to the stack pointer in a chain
of push instructions, seems to perform quite poorly.
In a heavy TCP workload, mlx5e_handle_rx_dim() consumes 3% of CPU time,
94% of which is attributed to the first push instruction to copy
dim_sample on the stack for the call to net_dim():
// Call ktime_get()
0.26 |4ead2: call 4ead7 <mlx5e_handle_rx_dim+0x47>
// Pass the address of struct dim in %rdi
|4ead7: lea 0x3d0(%rbx),%rdi
// Set dim_sample.pkt_ctr
|4eade: mov %r13d,0x8(%rsp)
// Set dim_sample.byte_ctr
|4eae3: mov %r12d,0xc(%rsp)
// Set dim_sample.event_ctr
0.15 |4eae8: mov %bp,0x10(%rsp)
// Duplicate dim_sample on the stack
94.16 |4eaed: push 0x10(%rsp)
2.79 |4eaf1: push 0x10(%rsp)
0.07 |4eaf5: push %rax
// Call net_dim()
0.21 |4eaf6: call 4eafb <mlx5e_handle_rx_dim+0x6b>
To allow the caller to reuse the struct dim_sample already on the stack,
pass the struct dim_sample by reference to net_dim().
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Link: https://patch.msgid.link/20241031002326.3426181-2-csander@purestorage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make the start and end arguments to dim_calc_stats() const pointers
to clarify that the function does not modify their values.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com>
Link: https://patch.msgid.link/20241031002326.3426181-1-csander@purestorage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Under CONFIG_PROVE_RCU_LIST + CONFIG_RCU_EXPERT
hlist_for_each_entry_rcu() provides very helpful splats, which help
to find possible issues. I missed CONFIG_RCU_EXPERT=y in my testing
config the same as described in
a3e4bf7f96 ("configs/debug: make sure PROVE_RCU_LIST=y takes effect").
The fix itself is trivial: add the very same lockdep annotations
as were used to dereference ao_info from the socket.
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20241028152645.35a8be66@kernel.org/
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20241030-tcp-ao-hlist-lockdep-annotate-v1-1-bf641a64d7c6@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Breno Leitao says:
====================
selftest: netconsole: Enhance selftest to validate userdata transmission
The netconsole selftest has been extended to cover userdata, a
significant subsystem within netconsole. This patch introduces support
for testing userdata by appending a key-value pair and verifying its
successful transmission via netconsole/netpoll.
Additionally, this patchseries addresses a pending change in the subnet
configuration for the selftest.
v1: https://lore.kernel.org/20241025161415.238215-1-leitao@debian.org
====================
Link: https://patch.msgid.link/20241029090030.1793551-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Extend netcons_basic selftest to verify the userdata functionality by:
1. Creating a test key in the userdata configfs directory
2. Writing a known value to the key
3. Validating the key-value pair appears in the captured network output
This ensures the userdata feature is properly tested during selftests.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20241029090030.1793551-3-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lorenzo Bianconi says:
====================
Simplify Tx napi logic in airoha_eth driver
Simplify Tx napi logic relying on the packet index provided by
completion queue indicating the completed packet that can be removed
from the Tx DMA ring.
Read completion queue head and pending entry in airoha_qdma_tx_napi_poll().
====================
Link: https://patch.msgid.link/20241029-airoha-en7581-tx-napi-work-v1-0-96ad1686b946@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simplify Tx napi logic relying just on the packet index provided by
completion queue indicating the completed packet that can be removed
from the Tx DMA ring.
This is a preliminary patch to add Qdisc offload for airoha_eth driver.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20241029-airoha-en7581-tx-napi-work-v1-2-96ad1686b946@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In order to avoid any possible race, read completion queue head and
pending entry in airoha_qdma_tx_napi_poll routine instead of doing it in
airoha_irq_handler. Remove unused airoha_tx_irq_queue unused fields.
This is a preliminary patch to add Qdisc offload for airoha_eth driver.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20241029-airoha-en7581-tx-napi-work-v1-1-96ad1686b946@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Avoids having to use manual pointer manipulation.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241029233229.9385-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
These are the preferred way to copy ethtool strings.
Avoids incrementing pointers all over the place.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20241029234641.11448-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>