Commit Graph

1203143 Commits

Author SHA1 Message Date
Ido Schimmel
504fc6f4f7 vrf: Remove unnecessary RCU-bh critical section
dev_queue_xmit_nit() already uses rcu_read_lock() / rcu_read_unlock()
and nothing suggests that softIRQs should be disabled around it.
Therefore, remove the rcu_read_lock_bh() / rcu_read_unlock_bh()
surrounding it.

Tested using [1] with lockdep enabled.

[1]
 #!/bin/bash

 ip link add name vrf1 up type vrf table 100
 ip link add name veth0 type veth peer name veth1
 ip link set dev veth1 master vrf1
 ip link set dev veth0 up
 ip link set dev veth1 up
 ip address add 192.0.2.1/24 dev veth0
 ip address add 192.0.2.2/24 dev veth1
 ip rule add pref 32765 table local
 ip rule del pref 0
 tcpdump -i vrf1 -c 20 -w /dev/null &
 sleep 10
 ping -i 0.1 -c 10 -q 192.0.2.2

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230821142339.1889961-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 10:58:50 -07:00
Ido Schimmel
63c11dc2ca vxlan: vnifilter: Use GFP_KERNEL instead of GFP_ATOMIC
The function is not called from an atomic context so use GFP_KERNEL
instead of GFP_ATOMIC. The allocation of the per-CPU stats is already
performed with GFP_KERNEL.

Tested using test_vxlan_vnifiltering.sh with CONFIG_DEBUG_ATOMIC_SLEEP.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20230821141923.1889776-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 10:58:45 -07:00
Yue Haibing
a491add19f net: ethernet: ti: Remove unused declarations
Commit e8609e6947 ("net: ethernet: ti: am65-cpsw: Convert to PHYLINK")
removed am65_cpsw_nuss_adjust_link() but not its declaration.
Commit 84640e27f2 ("net: netcp: Add Keystone NetCP core ethernet driver")
declared but never implemented netcp_device_find_module().

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Roger Quadros <rogerq@kernel.org>
Link: https://lore.kernel.org/r/20230821134029.40084-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 10:33:21 -07:00
Yue Haibing
dff96d7c0c net: microchip: Remove unused declarations
Commit 264a9c5c9d ("net: sparx5: Remove unused GLAG handling in PGID")
removed sparx5_pgid_alloc_glag() but not its declaration.
Commit 27d293ccee ("net: microchip: sparx5: Add support for rule count by cookie")
removed vcap_rule_iter() but not its declaration.
Commit 8beef08f46 ("net: microchip: sparx5: Adding initial VCAP API support")
declared but never implemented vcap_api_set_client().

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230821135556.43224-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 10:31:47 -07:00
Yue Haibing
efa47e80c2 ionic: Remove unused declarations
Commit fbfb803153 ("ionic: Add hardware init and device commands")
declared but never implemented ionic_q_rewind()/ionic_set_dma_mask().
Commit 969f843946 ("ionic: sync the filters in the work task")
declared but never implemented ionic_rx_filters_need_sync().

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20230821134717.51936-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 10:30:06 -07:00
Yue Haibing
49e62a0462 net: mscc: ocelot: Remove unused declarations
Commit 6c30384eb1 ("net: mscc: ocelot: register devlink ports")
declared but never implemented ocelot_devlink_init() and
ocelot_devlink_teardown().
Commit 2096805497 ("net: mscc: ocelot: automatically detect VCAP constants")
declared but never implemented ocelot_detect_vcap_constants().
Commit 403ffc2c34 ("net: mscc: ocelot: add support for preemptible traffic classes")
declared but never implemented ocelot_port_update_preemptible_tcs().

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230821130218.19096-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 10:29:15 -07:00
Yue Haibing
73582f090f net: dsa: microchip: Remove unused declarations
Commit 91a98917a8 ("net: dsa: microchip: move switch chip_id detection to ksz_common")
removed ksz8_switch_detect() but not its declaration.
Commit 6ec23aaaac ("net: dsa: microchip: move ksz_dev_ops to ksz_common.c")
declared but never implemented other functions.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20230821125501.19624-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22 10:28:13 -07:00
Hangbin Liu
691b2bf149 bonding: update port speed when getting bond speed
Andrew reported a bonding issue that if we put an active-back bond on top
of a 802.3ad bond interface. When the 802.3ad bond's speed/duplex changed
dynamically. The upper bonding interface's speed/duplex can't be changed at
the same time, which will show incorrect speed.

Fix it by updating the port speed when calling ethtool.

Reported-by: Andrew Schorr <ajschorr@alumni.princeton.edu>
Closes: https://lore.kernel.org/netdev/ZEt3hvyREPVdbesO@Laptop-X1/
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/20230821101008.797482-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-22 15:10:35 +02:00
Zhengchao Shao
43c2817225 net: remove unnecessary input parameter 'how' in ifdown function
When the ifdown function in the dst_ops structure is referenced, the input
parameter 'how' is always true. In the current implementation of the
ifdown interface, ip6_dst_ifdown does not use the input parameter 'how',
xfrm6_dst_ifdown and xfrm4_dst_ifdown functions use the input parameter
'unregister'. But false judgment on 'unregister' in xfrm6_dst_ifdown and
xfrm4_dst_ifdown is false, so remove the input parameter 'how' in ifdown
function.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230821084104.3812233-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-22 13:19:02 +02:00
GONG, Ruiqi
3a198c95c9 alx: fix OOB-read compiler warning
The following message shows up when compiling with W=1:

In function ‘fortify_memcpy_chk’,
    inlined from ‘alx_get_ethtool_stats’ at drivers/net/ethernet/atheros/alx/ethtool.c:297:2:
./include/linux/fortify-string.h:592:4: error: call to ‘__read_overflow2_field’
declared with attribute warning: detected read beyond size of field (2nd parameter);
maybe use struct_group()? [-Werror=attribute-warning]
  592 |    __read_overflow2_field(q_size_field, size);
      |    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to get alx stats altogether, alx_get_ethtool_stats() reads
beyond hw->stats.rx_ok. Fix this warning by directly copying hw->stats,
and refactor the unnecessarily complicated BUILD_BUG_ON btw.

Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230821013218.1614265-1-gongruiqi@huaweicloud.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-22 12:30:15 +02:00
Daniel Golle
90308679c2 net: pcs: lynxi: implement pcs_disable op
When switching from 10GBase-R/5GBase-R/USXGMII to one of the interface
modes provided by mtk-pcs-lynxi we need to make sure to always perform
a full configuration of the PHYA.

Implement pcs_disable op which resets the stored interface mode to
PHY_INTERFACE_MODE_NA to trigger a full reconfiguration once the LynxI
PCS driver had previously been deselected in favor of another PCS
driver such as the to-be-added driver for the USXGMII PCS found in
MT7988.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/f23d1a60d2c9d2fb72e32dcb0eaa5f7e867a3d68.1692327891.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-21 19:08:57 -07:00
Jakub Kicinski
7eb6deb3f5 Revert "pds_core: Fix some kernel-doc comments"
This reverts commit cb39c35783.
Patch was applied to hastily, the problem is already fixed
in Alex's vfio tree:
https://lore.kernel.org/all/20230821112237.105872b5.alex.williamson@redhat.com/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-21 12:11:35 -07:00
Yang Li
cb39c35783 pds_core: Fix some kernel-doc comments
Fix some kernel-doc comments to silence the warnings:

drivers/net/ethernet/amd/pds_core/auxbus.c:18: warning: Function parameter or member 'pf' not described in 'pds_client_register'
drivers/net/ethernet/amd/pds_core/auxbus.c:18: warning: Excess function parameter 'pf_pdev' description in 'pds_client_register'
drivers/net/ethernet/amd/pds_core/auxbus.c:58: warning: Function parameter or member 'pf' not described in 'pds_client_unregister'
drivers/net/ethernet/amd/pds_core/auxbus.c:58: warning: Excess function parameter 'pf_pdev' description in 'pds_client_unregister'

Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-21 07:48:34 +01:00
Eric Dumazet
bc1fb82ae1 net: annotate data-races around sk->sk_lingertime
sk_getsockopt() runs locklessly. This means sk->sk_lingertime
can be read while other threads are changing its value.

Other reads also happen without socket lock being held,
and must be annotated.

Remove preprocessor logic using BITS_PER_LONG, compilers
are smart enough to figure this by themselves.

v2: fixed a clang W=1 (-Wtautological-constant-out-of-range-compare) warning
    (Jakub)

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-21 07:41:57 +01:00
Hangbin Liu
b4672c7337 IPv4: add extack info for IPv4 address add/delete
Add extack info for IPv4 address add/delete, which would be useful for
users to understand the problem without having to read kernel code.

No extack message for the ifa_local checking in __inet_insert_ifa() as
it has been checked in find_matching_ifa().

Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-21 07:35:59 +01:00
Furong Xu
669a55560e net: stmmac: Check more MAC HW features for XGMAC Core 3.20
1. XGMAC Core does not have hash_filter definition, it uses
vlhash(VLAN Hash Filtering) instead, skip hash_filter when XGMAC.
2. Show exact size of Hash Table instead of raw register value.
3. Show full description of safety features defined by Synopsys Databook.
4. When safety feature is configured with no parity, or ECC only,
keep FSM Parity Checking disabled.

Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20 18:19:20 +01:00
David S. Miller
43bc9bd67e Merge branch 'ipv6-update-route-when-delete-saddr'
Hangbin Liu says:

====================
ipv6: update route when delete source address

Currently, when remove an address, the IPv6 route will not remove the
prefer source address when the address is bond to other device. Fix this
issue and add related tests as Ido and David suggested.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20 15:27:22 +01:00
Hangbin Liu
429b55b441 selftests: fib_test: add a test case for IPv6 source address delete
Add a test case for IPv6 source address delete.

As David suggested, add tests:
- Single device using src address
- Two devices with the same source address
- VRF with single device using src address
- VRF with two devices using src address

As Ido points out, in IPv6, the preferred source address is looked up in
the same VRF as the first nexthop device. This will give us similar results
to IPv4 if the route is installed in the same VRF as the nexthop device, but
not when the nexthop device is enslaved to a different VRF. So add tests:
- src address and nexthop dev in same VR
- src address and nexthop device in different VRF

The link local address delete logic is different from the global address.
It should only affect the associate device it bonds to. So add tests cases
for link local address testing.

Here is the test result:

IPv6 delete address route tests
    Single device using src address
    TEST: Prefsrc removed when src address removed on other device      [ OK ]
    Two devices with the same source address
    TEST: Prefsrc not removed when src address exist on other device    [ OK ]
    TEST: Prefsrc removed when src address removed on all devices       [ OK ]
    VRF with single device using src address
    TEST: Prefsrc removed when src address removed on other device      [ OK ]
    VRF with two devices using src address
    TEST: Prefsrc not removed when src address exist on other device    [ OK ]
    TEST: Prefsrc removed when src address removed on all devices       [ OK ]
    src address and nexthop dev in same VRF
    TEST: Prefsrc removed from VRF when source address deleted          [ OK ]
    TEST: Prefsrc in default VRF not removed                            [ OK ]
    TEST: Prefsrc not removed from VRF when source address exist        [ OK ]
    TEST: Prefsrc in default VRF removed                                [ OK ]
    src address and nexthop device in different VRF
    TEST: Prefsrc not removed from VRF when nexthop dev in diff VRF     [ OK ]
    TEST: Prefsrc not removed in default VRF                            [ OK ]
    TEST: Prefsrc removed from VRF when nexthop dev in diff VRF         [ OK ]
    TEST: Prefsrc removed in default VRF                                [ OK ]
    Table ID 0
    TEST: Prefsrc removed from default VRF when source address deleted  [ OK ]
    Link local source route
    TEST: Prefsrc not removed when delete ll addr from other dev        [ OK ]
    TEST: Prefsrc removed when delete ll addr                           [ OK ]
    TEST: Prefsrc not removed when delete ll addr from other dev        [ OK ]
    TEST: Prefsrc removed even ll addr still exist on other dev         [ OK ]

Tests passed:  19
Tests failed:   0

Suggested-by: Ido Schimmel <idosch@idosch.org>
Suggested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20 15:27:22 +01:00
Hangbin Liu
b358f57f7d ipv6: do not match device when remove source route
After deleting an IPv6 address on an interface and cleaning up the
related preferred source entries, it is important to ensure that all
routes associated with the deleted address are properly cleared. The
current implementation of rt6_remove_prefsrc() only checks the preferred
source addresses bound to the current device. However, there may be
routes that are bound to other devices but still utilize the same
preferred source address.

To address this issue, it is necessary to also delete entries that are
bound to other interfaces but share the same source address with the
current device. Failure to delete these entries would leave routes that
are bound to the deleted address unclear. Here is an example reproducer
(I have omitted unrelated routes):

+ ip link add dummy1 type dummy
+ ip link add dummy2 type dummy
+ ip link set dummy1 up
+ ip link set dummy2 up
+ ip addr add 1:2:3:4::5/64 dev dummy1
+ ip route add 7:7:7:0::1 dev dummy1 src 1:2:3:4::5
+ ip route add 7:7:7:0::2 dev dummy2 src 1:2:3:4::5
+ ip -6 route show
1:2:3:4::/64 dev dummy1 proto kernel metric 256 pref medium
7:7:7::1 dev dummy1 src 1:2:3:4::5 metric 1024 pref medium
7:7:7::2 dev dummy2 src 1:2:3:4::5 metric 1024 pref medium
+ ip addr del 1:2:3:4::5/64 dev dummy1
+ ip -6 route show
7:7:7::1 dev dummy1 metric 1024 pref medium
7:7:7::2 dev dummy2 src 1:2:3:4::5 metric 1024 pref medium

As Ido reminds, in IPv6, the preferred source address is looked up in
the same VRF as the first nexthop device, which is different with IPv4.
So, while removing the device checking, we also need to add an
ipv6_chk_addr() check to make sure the address does not exist on the other
devices of the rt nexthop device's VRF.

After fix:
+ ip addr del 1:2:3:4::5/64 dev dummy1
+ ip -6 route show
7:7:7::1 dev dummy1 metric 1024 pref medium
7:7:7::2 dev dummy2 metric 1024 pref medium

Reported-by: Thomas Haller <thaller@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2170513
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20 15:27:21 +01:00
Hangbin Liu
c4cf2bc0d2 selftests: vrf_route_leaking: remove ipv6_ping_frag from default testing
As the initial commit 1a01727676 ("selftests: Add VRF route leaking
tests") said, the IPv6 MTU test fails as source address selection
picking ::1. Every time we run the selftest this one report failed.
There seems not much meaning  to keep reporting a failure for 3 years
that no one plan to fix/update. Let't just skip this one first. We can
add it back when the issue fixed.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20 15:25:10 +01:00
Patrick Rohr
5cb249686e net: release reference to inet6_dev pointer
addrconf_prefix_rcv returned early without releasing the inet6_dev
pointer when the PIO lifetime is less than accept_ra_min_lft.

Fixes: 5027d54a9c ("net: change accept_ra_min_rtr_lft to affect all RA lifetimes")
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: David Ahern <dsahern@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Patrick Rohr <prohr@google.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20 15:23:57 +01:00
Eric Dumazet
0f158b32a9 net: selectively purge error queue in IP_RECVERR / IPV6_RECVERR
Setting IP_RECVERR and IPV6_RECVERR options to zero currently
purges the socket error queue, which was probably not expected
for zerocopy and tx_timestamp users.

I discovered this issue while preparing commit 6b5f43ea08
("inet: move inet->recverr to inet->inet_flags"), I presume this
change does not need to be backported to stable kernels.

Add skb_errqueue_purge() helper to purge error messages only.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20 15:17:47 +01:00
David S. Miller
412a75dc61 Merge branch 'fixed_phy_register-return-value'
Ruan Jinjie says:

====================
net: Return PTR_ERR() for fixed_phy_register()

fixed_phy_register() returns not only -EIO or -ENODEV, but also
-EPROBE_DEFER, -EINVAL and -EBUSY. The Best practice is to return these
error codes with PTR_ERR().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20 15:13:27 +01:00
Ruan Jinjie
294f48e9b2 net: lan743x: Return PTR_ERR() for fixed_phy_register()
fixed_phy_register() returns -EPROBE_DEFER, -EINVAL and -EBUSY,
etc, in addition to -EIO. The Best practice is to return these
error codes with PTR_ERR().

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20 15:13:27 +01:00
Ruan Jinjie
acf50d1adb net: bcmgenet: Return PTR_ERR() for fixed_phy_register()
fixed_phy_register() returns -EPROBE_DEFER, -EINVAL and -EBUSY,
etc, in addition to -ENODEV. The Best practice is to return these
error codes with PTR_ERR().

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Acked-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20 15:13:27 +01:00
Ruan Jinjie
d6499f0b7c net: bgmac: Return PTR_ERR() for fixed_phy_register()
fixed_phy_register() returns -EPROBE_DEFER, -EINVAL and -EBUSY,
etc, in addition to -ENODEV. The best practice is to return
these error codes with PTR_ERR().

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20 15:13:27 +01:00
Russell King (Oracle)
b22eef6864 net: dsa: realtek: add phylink_get_caps implementation
The user ports use RSGMII, but we don't have that, and DT doesn't
specify a phy interface mode, so phylib defaults to GMII. These support
1G, 100M and 10M with flow control. It is unknown whether asymetric
pause is supported at all speeds.

The CPU port uses MII/GMII/RGMII/REVMII by hardware pin strapping,
and support speeds specific to each, with full duplex only supported
in some modes. Flow control may be supported again by hardware pin
strapping, and theoretically is readable through a register but no
information is given in the datasheet for that.

So, we do a best efforts - and be lenient.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-20 11:38:43 +01:00
David S. Miller
85c786340a Merge branch 'vcap_get_rule-return-value'
Ruan Jinjie says:

====================
net: Update and fix return value check for vcap_get_rule()

As Simon Horman suggests, update vcap_get_rule() to always
return an ERR_PTR() and update the error detection conditions to
use IS_ERR(), which would be more cleaner.

So se IS_ERR() to update the return value and fix the issue
in lan966x_ptp_add_trap().

Changes in v2:
- Update vcap_get_rule() to always return an ERR_PTR().
- Update the return value fix in lan966x_ptp_add_trap().
- Update the return value check in sparx5_tc_free_rule_resources().
====================

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 19:29:23 +01:00
Ruan Jinjie
95b358e4d9 net: microchip: sparx5: Update return value check for vcap_get_rule()
As Simon Horman suggests, update vcap_get_rule() to always
return an ERR_PTR() and update the error detection conditions to
use IS_ERR(), so use IS_ERR() to check the return value.

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Suggested-by: Simon Horman <horms@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 19:29:23 +01:00
Ruan Jinjie
ab104318f6 net: lan966x: Fix return value check for vcap_get_rule()
As Simon Horman suggests, update vcap_get_rule() to always
return an ERR_PTR() and update the error detection conditions to
use IS_ERR(), so use IS_ERR() to fix the return value issue.

Fixes: 72df3489fb ("net: lan966x: Add ptp trap rules")
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Suggested-by: Simon Horman <horms@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 19:29:23 +01:00
Ruan Jinjie
093db9cda7 net: microchip: vcap api: Always return ERR_PTR for vcap_get_rule()
As Simon Horman suggests, update vcap_get_rule() to always
return an ERR_PTR() and update the error detection conditions to
use IS_ERR(), which would be more cleaner in this case.

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Suggested-by: Simon Horman <horms@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 19:29:23 +01:00
Russell King (Oracle)
44a696de72 net: mdio: xgene: remove useless xgene_mdio_status
xgene_mdio_status is declared static, and is only written once by the
driver. It appears to have been this way since the driver was first
added to the kernel tree. No other users can be found, so let's remove
it.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 19:25:27 +01:00
Jiri Pirko
f65f305ae0 tools: ynl-gen: use temporary file for rendering
Currently any error during render leads to output an empty file.
That is quite annoying when using tools/net/ynl/ynl-regen.sh
which git greps files with content of "YNL-GEN.." and therefore ignores
empty files. So once you fail to regen, you have to checkout the file.

Avoid that by rendering to a temporary file first, only at the end
copy the content to the actual destination.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 19:24:38 +01:00
Kurt Kanzenbach
58f2ffdedf stmmac: intel: Enable correction of MAC propagation delay
All captured timestamps should be corrected by PHY, MAC and CDC introduced
latency/errors. The CDC correction is already used. Enable MAC propagation delay
correction as well which is available since commit 26cfb838aa ("net: stmmac:
correct MAC propagation delay").

Before:
|ptp4l[390.458]: rms    7 max   21 freq   +177 +/-  14 delay   357 +/-   1

After:
|ptp4l[620.012]: rms    7 max   20 freq   +195 +/-  14 delay   345 +/-   1

Tested on Intel Elkhart Lake.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Johannes Zink <j.zink@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 19:23:07 +01:00
Eric Dumazet
4025d3e73a net: add skb_queue_purge_reason and __skb_queue_purge_reason
skb_queue_purge() and __skb_queue_purge() become wrappers
around the new generic functions.

New SKB_DROP_REASON_QUEUE_PURGE drop reason is added,
but users can start adding more specific reasons.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 15:30:15 +01:00
David S. Miller
5b0a1414e0 Merge branch 'smc-features'
Guangguan Wang says:

====================
net/smc: several features's implementation for smc v2.1

This patch set implement several new features in SMC v2.1(https://
www.ibm.com/support/pages/node/7009315), including vendor unique
experimental options, max connections per lgr negotiation, max links
per lgr negotiation.

v1 - v2:
 - rename field fce_v20 to fce_v2_base in struct
   smc_clc_first_contact_ext_v2x
 - use smc_get_clc_first_contact_ext in smc_connect
   _rdma_v2_prepare
 - adding comment about field vendor_oui in struct
   smc_clc_msg_smcd
 - remove comment about SMC_CONN_PER_LGR_MAX in smc_
   clc_srv_v2x_features_validate
 - rename smc_clc_clnt_v2x_features_validate

RFC v2 - v1:
 - more description in commit message
 - modify SMC_CONN_PER_LGR_xxx and SMC_LINKS_ADD_LNK_xxx
   macro defination and usage
 - rename field release_ver to release_nr
 - remove redundant release version check in client
 - explicitly set the rc value in smc_llc_cli/srv_add_link

RFC v1 - RFC v2:
 - Remove ini pointer NULL check and fix code style in
   smc_clc_send_confirm_accept.
 - Optimize the max_conns check in smc_clc_xxx_v2x_features_validate.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 12:46:53 +01:00
Guangguan Wang
bbed596c74 net/smc: Extend SMCR v2 linkgroup netlink attribute
Add SMC_NLA_LGR_R_V2_MAX_CONNS and SMC_NLA_LGR_R_V2_MAX_LINKS
to SMCR v2 linkgroup netlink attribute SMC_NLA_LGR_R_V2 for
linkgroup's detail info showing.

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 12:46:53 +01:00
Guangguan Wang
69b888e3bb net/smc: support max links per lgr negotiation in clc handshake
Support max links per lgr negotiation in clc handshake for SMCR v2.1,
which is one of smc v2.1 features. Server makes decision for the final
value of max links based on the client preferred max links and
self-preferred max links. Here use the minimum value of the client
preferred max links and server preferred max links.

Client                                       Server
     Proposal(max links(client preferred))
     -------------------------------------->

     Accept(max links(accepted value))
accepted value=min(client preferred, server preferred)
     <-------------------------------------

      Confirm(max links(accepted value))
     ------------------------------------->

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 12:46:53 +01:00
Guangguan Wang
7f0620b994 net/smc: support max connections per lgr negotiation
Support max connections per lgr negotiation for SMCR v2.1,
which is one of smc v2.1 features. Server makes decision for
the final value of max conns based on the client preferred
max conns and self-preferred max conns. Here use the minimum
value of client preferred max conns and server preferred max
conns.

Client                                     Server
     Proposal(max conns(client preferred))
     ------------------------------------>

     Accept(max conns(accepted value))
accepted value=min(client preferred, server preferred)
     <-----------------------------------

     Confirm(max conns(accepted value))
     ----------------------------------->

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 12:46:52 +01:00
Guangguan Wang
6ac1e6563f net/smc: support smc v2.x features validate
Support SMC v2.x features validate for SMC v2.1. This is the frame
code for SMC v2.x features validate, and will take effects only when
the negotiated release version is v2.1 or later.

For Server, v2.x features' validation should be done in smc_clc_srv_
v2x_features_validate when receiving v2.1 or later CLC Proposal Message,
such as max conns, max links negotiation, the decision of the final
value of max conns and max links should be made in this function.
And final check for server when receiving v2.1 or later CLC Confirm
Message should be done in smc_clc_v2x_features_confirm_check.

For client, v2.x features' validation should be done in smc_clc_clnt_
v2x_features_validate when receiving v2.1 or later CLC Accept Message,
for example, the decision to accpt the accepted value or to decline
should be made in this function.

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 12:46:52 +01:00
Guangguan Wang
7290178a82 net/smc: add vendor unique experimental options area in clc handshake
Add vendor unique experimental options area in clc handshake. In clc
accept and confirm msg, vendor unique experimental options use the
16-Bytes reserved field, which defined in struct smc_clc_fce_gid_ext
in previous version. Because of the struct smc_clc_first_contact_ext
is widely used and limit the scope of modification, this patch moves
the 16-Bytes reserved field out of struct smc_clc_fce_gid_ext, and
followed with the struct smc_clc_first_contact_ext in a new struct
names struct smc_clc_first_contact_ext_v2x.

For SMC-R first connection, in previous version, the struct smc_clc_
first_contact_ext and the 16-Bytes reserved field has already been
included in clc accept and confirm msg. Thus, this patch use struct
smc_clc_first_contact_ext_v2x instead of the struct smc_clc_first_
contact_ext and the 16-Bytes reserved field in SMC-R clc accept and
confirm msg is compatible with previous version.

For SMC-D first connection, in previous version, only the struct smc_
clc_first_contact_ext is included in clc accept and confirm msg, and
the 16-Bytes reserved field is not included. Thus, when the negotiated
smc release version is the version before v2.1, we still use struct
smc_clc_first_contact_ext for compatible consideration. If the negotiated
smc release version is v2.1 or later, use struct smc_clc_first_contact_
ext_v2x instead.

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 12:46:52 +01:00
Guangguan Wang
1e700948c9 net/smc: support smc release version negotiation in clc handshake
Support smc release version negotiation in clc handshake based on
SMC v2, where no negotiation process for different releases, but
for different versions. The latest smc release version was updated
to v2.1. And currently there are two release versions of SMCv2, v2.0
and v2.1. In the release version negotiation, client sends the preferred
release version by CLC Proposal Message, server makes decision for which
release version to use based on the client preferred release version and
self-supported release version (here choose the minimum release version
of the client preferred and server latest supported), then the decision
returns to client by CLC Accept Message. Client confirms the decision by
CLC Confirm Message.

Client                                    Server
      Proposal(preferred release version)
     ------------------------------------>

      Accept(accpeted release version)
 min(client preferred, server latest supported)
     <------------------------------------

      Confirm(accpeted release version)
     ------------------------------------>

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-19 12:46:52 +01:00
Yue Haibing
cb49ec0349 net: freescale: Remove unused declarations
Commit 5d93cfcf73 ("net: dpaa: Convert to phylink") removed
fman_set_mac_active_pause()/fman_get_pause_cfg() but not declarations.
Commit 48257c4f16 ("Add fs_enet ethernet network driver, for several
embedded platforms.") declared but never implemented
fs_enet_platform_init() and fs_enet_platform_cleanup().

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Sean Anderson <sean.anderson@seco.com>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230817134159.38484-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 19:39:49 -07:00
Eric Dumazet
726e9e8b94 tcp: refine skb->ooo_okay setting
Enabling BIG TCP on a low end platform apparently increased
chances of getting flows locked on one busy TX queue.

A similar problem was handled in commit 9b462d02d6
("tcp: TCP Small Queues and strange attractors"),
but the strategy worked for either bulk flows,
or 'large enough' RPC. BIG TCP changed how large
RPC needed to be to enable the work around:
If RPC fits in a single skb, TSQ never triggers.

Root cause for the problem is a busy TX queue,
with delayed TX completions.

This patch changes how we set skb->ooo_okay to detect
the case TX completion was not done, but incoming ACK
already was processed and emptied rtx queue.

Update the comment to explain the tricky details.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230817182353.2523746-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 19:29:36 -07:00
Jakub Kicinski
fc720399ff Merge branch 'bnxt_en-update-for-net-next'
Michael Chan says:

====================
bnxt_en: Update for net-next

This patchset contains 2 features:

- The page pool implementation for the normal RX path (non-XDP) for
paged buffers in the aggregation ring.

- Saving of the ring error counters across reset.
====================

Link: https://lore.kernel.org/r/20230817231911.165035-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 19:14:01 -07:00
Michael Chan
8becd1961c bnxt_en: Add tx_resets ring counter
Add a new tx_resets ring counter.  This counter will be saved as
tx_total_resets across any reset.  Since we currently do a full reset
in bnxt_sched_reset_txr(), the per ring counter will always be cleared
during reset.  Only the tx_total_resets count will be meaningful and we
only display this under ethtool -S.

Link: https://lore.kernel.org/netdev/CACKFLimD-bKmJ1tGZOLYRjWzEwxkri-Mw7iFme1x2Dr0twdCeg@mail.gmail.com/
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230817231911.165035-7-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 19:13:59 -07:00
Michael Chan
a080b47a04 bnxt_en: Display the ring error counters under ethtool -S
The existing driver displays the sum of 4 ring counters under ethtool -S.
These counters are in the array bnxt_sw_func_stats.  These counters are
summed at the time of ethtool -S and will be lost when the device is reset.

Replace these counters with the new total ring error counters added in the
last patch.  These new counters are saved before reset.  ethtool -S will
now display the sum of the saved counters plus the current counters.

Link: https://lore.kernel.org/netdev/CACKFLimD-bKmJ1tGZOLYRjWzEwxkri-Mw7iFme1x2Dr0twdCeg@mail.gmail.com/
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230817231911.165035-6-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 19:13:59 -07:00
Michael Chan
4c70dbe3c0 bnxt_en: Save ring error counters across reset
Currently, the ring counters are stored in the per ring datastructure.
During reset, all the rings are freed together with the associated
datastructures.  As a result, all the ring error counters will be reset
to zero.

Add logic to keep track of the total error counts of all the rings
and save them before reset (including ifdown).  The next patch will
display these total ring error counters under ethtool -S.

Link: https://lore.kernel.org/netdev/CACKFLimD-bKmJ1tGZOLYRjWzEwxkri-Mw7iFme1x2Dr0twdCeg@mail.gmail.com/
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230817231911.165035-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 19:13:58 -07:00
Michael Chan
d38c19b13b bnxt_en: Increment rx_resets counter in bnxt_disable_napi()
If we are doing a complete reset with irq_re_init set to true in
bnxt_close_nic(), all the ring structures will be freed.  New
structures will be allocated in bnxt_open_nic().  The current code
increments rx_resets counter in bnxt_enable_napi() if bnapi->in_reset
is true.  In a complete reset, bnapi->in_reset will never be true
since the structure is just allocated.

Increment the rx_resets counter in bnxt_disable_napi() instead.  This
will allow us to save all the ring error counters including the
rx_resets counters in the next patch.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230817231911.165035-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 19:13:58 -07:00
Somnath Kotur
578fcfd26e bnxt_en: Let the page pool manage the DMA mapping
Use the page pool's ability to maintain DMA mappings for us.
This avoids re-mapping of the recycled pages.

Link: https://lore.kernel.org/netdev/20230728231829.235716-4-michael.chan@broadcom.com/
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230817231911.165035-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 19:13:58 -07:00