Commit Graph

1311499 Commits

Author SHA1 Message Date
Eric Dumazet
78a0cb2f45 net: add debug check in skb_reset_inner_mac_header()
Make sure (skb->data - skb->head) can fit in skb->inner_mac_header

This needs CONFIG_DEBUG_NET=y.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20241105174403.850330-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-06 17:29:14 -08:00
Eric Dumazet
1732e4bedb net: add debug check in skb_reset_inner_network_header()
Make sure (skb->data - skb->head) can fit in skb->inner_network_header

This needs CONFIG_DEBUG_NET=y.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20241105174403.850330-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-06 17:29:14 -08:00
Eric Dumazet
cfe8394e06 net: add debug check in skb_reset_inner_transport_header()
Make sure (skb->data - skb->head) can fit in skb->inner_transport_header

This needs CONFIG_DEBUG_NET=y.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20241105174403.850330-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-06 17:29:14 -08:00
Eric Dumazet
1e4033b53d net: skb_reset_mac_len() must check if mac_header was set
Recent discussions show that skb_reset_mac_len() should be more careful.

We expect the MAC header being set.

If not, clear skb->mac_len and fire a warning for CONFIG_DEBUG_NET=y builds.

If after investigations we find that not having a MAC header was okay,
we can remove the warning.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/netdev/CANn89iJZGH+yEfJxfPWa3Hm7jxb-aeY2Up4HufmLMnVuQXt38A@mail.gmail.com/T/
Cc: En-Wei Wu <en-wei.wu@canonical.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20241105174403.850330-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-06 17:29:14 -08:00
Jakub Kicinski
3545f9b72f Merge branch 'ipv6-fix-hangup-on-device-removal'
Paolo Abeni says:

====================
ipv6: fix hangup on device removal

This addresses the infamous unregister_netdevice splat in net selftests;
the actual fix is carried by the first patch, while the 2nd one
addresses a related problem in the relevant test that was patially
hiding the problem.

Targeting net-next as the issue is quite old and I feel a little lost
in the fib info/nh jungle.
====================

Link: https://patch.msgid.link/cover.1730828007.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-06 17:27:40 -08:00
Paolo Abeni
52ed077aa6 selftests: net: really check for bg process completion
A recent refactor transformed the check for process completion
in a true statement, due to a typo.

As a result, the relevant test-case is unable to catch the
regression it was supposed to detect.

Restore the correct condition.

Fixes: 691bb4e49c ("selftests: net: avoid just another constant wait")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/0e6f213811f8e93a235307e683af8225cc6277ae.1730828007.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-06 17:27:35 -08:00
Paolo Abeni
eb02688c5c ipv6: release nexthop on device removal
The CI is hitting some aperiodic hangup at device removal time in the
pmtu.sh self-test:

unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 6
ref_tracker: veth_A-R1@ffff888013df15d8 has 1/5 users at
	dst_init+0x84/0x4a0
	dst_alloc+0x97/0x150
	ip6_dst_alloc+0x23/0x90
	ip6_rt_pcpu_alloc+0x1e6/0x520
	ip6_pol_route+0x56f/0x840
	fib6_rule_lookup+0x334/0x630
	ip6_route_output_flags+0x259/0x480
	ip6_dst_lookup_tail.constprop.0+0x5c2/0x940
	ip6_dst_lookup_flow+0x88/0x190
	udp_tunnel6_dst_lookup+0x2a7/0x4c0
	vxlan_xmit_one+0xbde/0x4a50 [vxlan]
	vxlan_xmit+0x9ad/0xf20 [vxlan]
	dev_hard_start_xmit+0x10e/0x360
	__dev_queue_xmit+0xf95/0x18c0
	arp_solicit+0x4a2/0xe00
	neigh_probe+0xaa/0xf0

While the first suspect is the dst_cache, explicitly tracking the dst
owing the last device reference via probes proved such dst is held by
the nexthop in the originating fib6_info.

Similar to commit f5b51fe804 ("ipv6: route: purge exception on
removal"), we need to explicitly release the originating fib info when
disconnecting a to-be-removed device from a live ipv6 dst: move the
fib6_info cleanup into ip6_dst_ifdown().

Tested running:

./pmtu.sh cleanup_ipv6_exception

in a tight loop for more than 400 iterations with no spat, running an
unpatched kernel  I observed a splat every ~10 iterations.

Fixes: f88d8ea67f ("ipv6: Plumb support for nexthop object in a fib6_info")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/604c45c188c609b732286b47ac2a451a40f6cf6d.1730828007.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-06 17:27:35 -08:00
Florian Westphal
a84e8c05f5 selftests: netfilter: nft_queue.sh: fix warnings with socat 1.8.0.0
Updated to a more recent socat release and saw this:
 socat E xioopen_ipdgram_listen(): unknown address family 0
 socat W address is opened in read-write mode but only supports read-only

First error is avoided via pf=ipv4 option, second one via -u
(unidirectional) mode.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20241104142821.2608-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:59:37 -08:00
Florian Westphal
fc49b80496 selftests: netfilter: run conntrack_dump_flush in netns
This test will fail if the initial namespace has conntrack
active due to unexpected number of flows returned on dump:

  conntrack_dump_flush.c:451:test_flush_by_zone:Expected ret (7) == 2 (2)
  test_flush_by_zone: Test failed
  FAIL  conntrack_dump_flush.test_flush_by_zone
  not ok 2 conntrack_dump_flush.test_flush_by_zone

Add a wrapper that unshares this program to avoid this problem.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20241104142529.2352-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:58:53 -08:00
Matthieu Baerts (NGI0)
f2c71c49da mptcp: remove unneeded lock when listing scheds
mptcp_get_available_schedulers() needs to iterate over the schedulers'
list only to read the names: it doesn't modify anything there.

In this case, it is enough to hold the RCU read lock, no need to combine
this with the associated spin lock as it was done since its introduction
in commit 73c900aa36 ("mptcp: add net.mptcp.available_schedulers").

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241104-net-next-mptcp-sched-unneeded-lock-v2-1-2ccc1e0c750c@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:54:39 -08:00
Jakub Kicinski
dc0f314bc9 Merge branch 'add-the-dwmac-driver-support-for-t-head-th1520-soc'
Drew Fustini says:

====================
Add the dwmac driver support for T-HEAD TH1520 SoC

This series adds support for dwmac gigabit ethernet in the T-Head TH1520
RISC-V SoC used on boards like BeagleV Ahead and the LicheePi 4A.

The gigabit ethernet on these boards does need pinctrl support to mux
the necessary pads. The pinctrl-th1520 driver, pinctrl binding, and
related dts patches are in linux-next. However, they are not yet in
net-next/main.

Therefore, I am dropping the dts patch for v5 as it will not build on
net-next/main due to the lack of the padctrl0_apsys pin controller node
in next-next/main version th1520.dtsi. It does exist in linux-next [1]
and the two patches in this series allow the ethernet ports to work
correctly on the LPi4A and Ahead when applied to linux-next.

The dwmac-thead driver in this series does not need the pinctrl-th1520
driver to build. Nor does the thead,th1520-gmac.yaml binding need the
pinctrl binding to pass the schema check.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/riscv/boot/dts/thead/th1520.dtsi
====================

Link: https://patch.msgid.link/20241103-th1520-gmac-v7-0-ef094a30169c@tenstorrent.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:50:14 -08:00
Jisheng Zhang
33a1a01e3a net: stmmac: Add glue layer for T-HEAD TH1520 SoC
Add dwmac glue driver to support the DesignWare-based GMAC controllers
on the T-HEAD TH1520 SoC.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Drew Fustini <dfustini@tenstorrent.com>
Link: https://patch.msgid.link/20241103-th1520-gmac-v7-2-ef094a30169c@tenstorrent.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:50:09 -08:00
Jisheng Zhang
f920ce04c3 dt-bindings: net: Add T-HEAD dwmac support
Add documentation to describe the DesginWare-based GMAC controllers in
the T-HEAD TH1520 SoC.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Drew Fustini <dfustini@tenstorrent.com>
Link: https://patch.msgid.link/20241103-th1520-gmac-v7-1-ef094a30169c@tenstorrent.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:50:04 -08:00
Heiner Kallweit
83cb4b470c r8169: remove leftover locks after reverted change
After e31a9fedc7 ("Revert "r8169: disable ASPM during NAPI poll"")
these locks aren't needed any longer.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/680f2606-ac7d-4ced-8694-e5033855da9b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:47:12 -08:00
Jakub Kicinski
2eed720933 Merge branch 'add-support-for-synopsis-designware-version-3-72a'
Lothar Rubusch says:

====================
Add support for Synopsis DesignWare version 3.72a

Add compatibility and dt-binding for Synopsis DesignWare version 3.72a.
The dwmac is used on some older Altera/Intel SoCs such as Arria10.
Updating compatibles in the driver and bindings for the DT improves the
binding check coverage for such SoCs.
====================

Link: https://patch.msgid.link/20241102114122.4631-1-l.rubusch@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:45:19 -08:00
Lothar Rubusch
8bed89232a dt-bindings: net: snps,dwmac: add support for Arria10
The hard processor system (HPS) on the Intel/Altera Arria10 provides
three Ethernet Media Access Controller (EMAC) peripherals. Each EMAC
can be used to transmit and receive data at 10/100/1000 Mbps over
ethernet connections in compliance with the IEEE 802.3 specification.
The EMACs on the Arria10 are instances of the Synopsis DesignWare
Universal 10/100/1000 Ethernet MAC, version 3.72a.

Support the Synopsis DesignWare version 3.72a, which is used in Intel's
Arria10 SoC, since it was missing.

Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20241102114122.4631-3-l.rubusch@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:45:18 -08:00
Lothar Rubusch
ffda5c6287 net: stmmac: add support for dwmac 3.72a
The dwmac 3.72a is an ip version that can be found on Intel/Altera Arria10
SoCs. Going by the hardware features "snps,multicast-filter-bins" and
"snps,perfect-filter-entries" shall be supported. Thus add a
compatibility flag, and extend coverage of the driver for the 3.72a.

Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241102114122.4631-2-l.rubusch@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:45:17 -08:00
Rosen Penev
7a4ea5da4d net: hisilicon: hns: use ethtool string helpers
The latter is the preferred way to copy ethtool strings.

Avoids manually incrementing the pointer. Cleans up the code quite well.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20241101220023.290926-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:38:29 -08:00
Aaron Conole
7d1c2d517f openvswitch: Pass on secpath details for internal port rx.
Clearing the secpath for internal ports will cause packet drops when
ipsec offload or early SW ipsec decrypt are used.  Systems that rely
on these will not be able to actually pass traffic via openvswitch.

There is still an open issue for a flow miss packet - this is because
we drop the extensions during upcall and there is no facility to
restore such data (and it is non-trivial to add such functionality
to the upcall interface).  That means that when a flow miss occurs,
there will still be packet drops.  With this patch, when a flow is
found then traffic which has an associated xfrm extension will
properly flow.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://patch.msgid.link/20241101204732.183840-1-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:38:25 -08:00
Heiner Kallweit
2cd02f2fdd r8169: improve initialization of RSS registers on RTL8125/RTL8126
Replace the register addresses with the names used in r8125/r8126
vendor driver, and consider that RSS_CTRL_8125 is a 32 bit register.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/3bf2f340-b369-4174-97bf-fd38d4217492@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:36:16 -08:00
Jakub Kicinski
33d005b26f Merge branch 'a-pile-of-sfc-deadcode'
Dr. David Alan Gilbert says:

====================
A pile of sfc deadcode

This is a collection of deadcode removal in the sfc
drivers;  the split is vaguely where I found them in
the tree, with some left over.

This has been build tested and booted on an x86 VM,
but I fon't have the hardware to test; however
it's all full function removal.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
====================

Link: https://patch.msgid.link/20241102151625.39535-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:35:14 -08:00
Dr. David Alan Gilbert
d3e80070b5 sfc: Remove more unused functions
efx_ticks_to_usecs(), efx_reconfigure_port(), efx_ptp_get_mode(), and
efx_tx_get_copy_buffer_limited() are unused.
They seem to be partially due to the later splits to Siena, but
some seem unused for longer.

Remove them.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://patch.msgid.link/20241102151625.39535-5-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:35:11 -08:00
Dr. David Alan Gilbert
5254fdfc74 sfc: Remove unused mcdi functions
efx_mcdi_flush_rxqs(), efx_mcdi_rpc_async_quiet(),
efx_mcdi_rpc_finish_quiet(), and efx_mcdi_wol_filter_get_magic()
are unused.
I think these are fall out from the split into Siena
that happened in
commit 4d49e5cd4b ("sfc/siena: Rename functions in mcdi headers to avoid
conflicts with sfc")
and
commit d48523cb88 ("sfc: Copy shared files needed for Siena (part 2)")

Remove them.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://patch.msgid.link/20241102151625.39535-4-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:35:11 -08:00
Dr. David Alan Gilbert
70e58249a6 sfc: Remove unused efx_mae_mport_vf
efx_mae_mport_vf() has been unused since
commit 5227adff37 ("sfc: add mport lookup based on driver's mport data")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://patch.msgid.link/20241102151625.39535-3-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:35:11 -08:00
Dr. David Alan Gilbert
cc4914d904 sfc: Remove falcon deadcode
ef4_farch_dimension_resources(), ef4_nic_fix_nodesc_drop_stat(),
ef4_ticks_to_usecs() and ef4_tx_get_copy_buffer_limited() were
copied over from efx_ equivalents in 2016 but never used by
commit 5a6681e22c ("sfc: separate out SFC4000 ("Falcon") support into new
sfc-falcon driver")

EF4_MAX_FLUSH_TIME is also unused.

Remove them.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://patch.msgid.link/20241102151625.39535-2-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:35:11 -08:00
Maurice Lambert
84bfbfbbd3 netlink: typographical error in nlmsg_type constants definition
This commit fix a typographical error in netlink nlmsg_type constants definition in the include/uapi/linux/rtnetlink.h at line 177. The definition is RTM_NEWNVLAN RTM_NEWVLAN instead of RTM_NEWVLAN RTM_NEWVLAN.

Signed-off-by: Maurice Lambert <mauricelambert434@gmail.com>
Fixes: 8dcea18708 ("net: bridge: vlan: add rtm definitions and dump support")
Link: https://patch.msgid.link/20241103223950.230300-1-mauricelambert434@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:33:55 -08:00
Vadim Fedorenko
6c0828d00f bnxt_en: replace PTP spinlock with seqlock
We can see high contention on ptp_lock while doing RX timestamping
on high packet rates over several queues. Spinlock is not effecient
to protect timecounter for RX timestamps when reads are the most
usual operations and writes are only occasional. It's better to use
seqlock in such cases.

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Link: https://patch.msgid.link/20241103215108.557531-2-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:33:26 -08:00
Vadim Fedorenko
bb2ef9b92b bnxt_en: cache only 24 bits of hw counter
This hardware can provide only 48 bits of cycle counter. We can leave
only 24 bits in the cache to extend RX timestamps from 32 bits to 48
bits. Lower 8 bits of the cached value will be used to check for
roll-over while extending to full 48 bits.
This change makes cache writes atomic even on 32 bit platforms and we
can simply use READ_ONCE()/WRITE_ONCE() pair and remove spinlock. The
configuration structure will be also reduced by 4 bytes.

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Link: https://patch.msgid.link/20241103215108.557531-1-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 17:33:26 -08:00
Matthieu Baerts (NGI0)
f72aa1b276 selftests: net: include lib/sh/*.sh with lib.sh
Recently, the net/lib.sh file has been modified to include defer.sh from
net/lib/sh/ directory. The Makefile from net/lib has been modified
accordingly, but not the ones from the sub-targets using net/lib.sh.

Because of that, the new file is not installed as expected when
installing the Forwarding, MPTCP, and Netfilter targets, e.g.

  # make -C tools/testing/selftests TARGETS=net/mptcp install \
        INSTALL_PATH=/tmp/kself
  # cd /tmp/kself/
  # ./run_kselftest.sh -c net/mptcp
    TAP version 13
    1..7
    # timeout set to 1800
    # selftests: net/mptcp: mptcp_connect.sh
    # ./../lib.sh: line 5: /tmp/kself/net/lib/sh/defer.sh: No such file
      or directory
    # (...)

This can be fixed simply by adding all the .sh files from net/lib/sh
directory to the TEST_INCLUDES variable in the different Makefile's.

Fixes: a6e263f125 ("selftests: net: lib: Introduce deferred commands")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20241104-net-next-selftests-lib-sh-deps-v1-1-7c9f7d939fc2@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 16:46:39 -08:00
Vadim Fedorenko
0452a2d8b8 mlx5_en: use read sequence for gettimex64
The gettimex64() doesn't modify values in timecounter, that's why there
is no need to update sequence counter. Reduce the contention on sequence
lock for multi-thread PHC reading use-case.

Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241014170103.2473580-1-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-05 15:47:14 -08:00
Paolo Abeni
ccb35037c4 Merge branch 'net-lan969x-add-vcap-functionality'
Daniel Machon says:

====================
net: lan969x: add VCAP functionality

== Description:

This series is the third of a multi-part series, that prepares and adds
support for the new lan969x switch driver.

The upstreaming efforts is split into multiple series (might change a
bit as we go along):

        1) Prepare the Sparx5 driver for lan969x (merged)

        2) Add support for lan969x (same basic features as Sparx5
           provides excl. FDMA and VCAP, merged).

    --> 3) Add lan969x VCAP functionality.

        4) Add RGMII and FDMA functionality.

== VCAP support:

The Versatile Content-Aware Processor (VCAP) is a content-aware packet
processor that allows wirespeed packet inspection for rich
implementation of, for example, advanced VLAN and QoS classification and
manipulations, IP source guarding, longest prefix matching for Layer-3
routing, and security features for wireline and wireless applications.
This is all achieved by programming rules into the VCAP.

When a VCAP is enabled, every frame passing through the switch is
analyzed and multiple keys are created based on the contents of the
frame. The frame is examined to determine the frame type (for example,
IPv4 TCP frame), so that the frame information is extracted according to
the frame type, port-specific configuration, and classification results
from the basic classification. Keys are applied to the VCAP and when
there is a match between a key and a rule in the VCAP, the rule is then
applied to the frame from which the key was extracted.

After this series is applied, the lan969x driver will support the same
VCAP functionality as Sparx5.

== Patch breakdown:

Patch #1 exposes some VCAP symbols for lan969x.

Patch #2 replaces VCAP uses of SPX5_PORTS with n_ports from the match
data.

Patch #3 adds new VCAP constants to match data

Patch #4 removes the is_sparx5() check to now initialize the VCAP API on
lan969x.

Patch #5 adds the auto-generated VCAP data for lan969x.

Patch #6 adds the VCAP configuration data for lan969x.

Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
====================

Link: https://patch.msgid.link/20241101-sparx5-lan969x-switch-driver-3-v1-0-3c76f22f4bfa@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-05 13:31:10 +01:00
Daniel Machon
1091487dc7 net: lan969x: add VCAP configuration data
Add configuration data (for consumption by the VCAP API) for the four
VCAP's that we are going to support. The following VCAP's will be
supported:

 - VCAP CLM: (also known as IS0) is part of the analyzer and enables
   frame classification using VCAP functionality.

 - VCAP IS2: is part of ANA_ACL and enables access control lists, using
   VCAP functionality.

 - VCAP ES0: is part of the rewriter and enables rewriting of frames
   using VCAP functionality.

 - VCAP ES2: is part of EACL and enables egress access control lists
   using VCAP functionality

The two VCAP's: CLM and IS2 use shared resources from the SUPER VCAP.
The SUPER VCAP is a shared pool of 6 blocks that can be distributed
freely among CLM and IS2.  Each block in the pool has 3,072 addresses
with entries, actions, and counters. ES0 and ES2 does not use shared
resources.

In the configuration data for lan969x CLM uses blocks 2-4 with a total
of 6 lookups. IS2 uses blocks 0-1 with a total of 4 lookups.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-05 13:31:08 +01:00
Daniel Machon
7ef750e490 net: lan969x: add autogenerated VCAP information
Platform VCAP data for each VCAP instance is auto-generated using an
internal Microchip tool. The generated VCAP data contains information
about keyfields, keyfield sets, actionfields, actionfield sets and
typegroups, which in combination are used to encode and decode rules in
the VCAP.

Add the auto-generated VCAP file lan969x_vcap_ag_api.c and assign the
two structs: lan969x_vcaps and lan969x_vcap_stats to the match data.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-05 13:31:08 +01:00
Daniel Machon
d4c97e39bf net: sparx5: execute sparx5_vcap_init() on lan969x
The is_sparx5() check was introduced in an earlier series, to make sure
the sparx5_vcap_init() was not executed on lan969x, as it was not
implemented there yet. Now that it is, remove that check.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-05 13:31:08 +01:00
Daniel Machon
8caa21e4e4 net: sparx5: add new VCAP constants to match data
In preparation for lan969x VCAP support, add the following three new
VCAP constants to match data:

    - vcaps_cfg (contains configuration data for each VCAP).

    - vcaps (contains auto-generated information about VCAP keys and
      actions).

    - vcap_stats: (contains auto-generated string names of all the keys
      and actions)

Add these constants to the Sparx5 match data constants and use them to
initialize the VCAP's in sparx5_vcap_init().

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-05 13:31:08 +01:00
Daniel Machon
8f5a812eff net: sparx5: replace SPX5_PORTS with n_ports
The Sparx5 VCAP implementation uses the SPX5_PORTS symbol to iterate over
the 65 front ports of Sparx5. Replace the use with the n_ports constant
from the match data, which translates to 65 of Sparx5 and 30 on lan969x.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-05 13:31:08 +01:00
Daniel Machon
9bdb67b53f net: sparx5: expose some sparx5 VCAP symbols
In preparation for lan969x VCAP support, expose the following symbols for
use by the lan969x VCAP implementation:

- The symbols SPARX5_*_LOOKUPS defines the number of lookups in each
  VCAP instance.  These are the same for lan969x. Move them to the
  header file.

- The struct sparx5_vcap_inst encapsulates information about a single
  VCAP instance. Move this struct to the header file and declare the
  sparx5_vcap_inst_cfg as extern.

Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-05 13:31:08 +01:00
Paolo Abeni
7af3a6558c Merge branch 'virtio_net-enable-premapped-mode-by-default'
Xuan Zhuo says:

====================
virtio_net: enable premapped mode by default

v1:
    1. fix some small problems
    2. remove commit "virtio_net: introduce vi->mode"

In the last linux version, we disabled this feature to fix the
regress[1].

The patch set is try to fix the problem and re-enable it.

More info: http://lore.kernel.org/all/20240820071913.68004-1-xuanzhuo@linux.alibaba.com

[1]: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
====================

Link: https://patch.msgid.link/20241029084615.91049-1-xuanzhuo@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-05 11:39:26 +01:00
Xuan Zhuo
fb22437c1b virtio_net: rx remove premapped failover code
Now, the premapped mode can be enabled unconditionally.

So we can remove the failover code for merge and small mode.

The virtnet_rq_xxx() helper would be only used if the mode is using pre
mapping. A check is added to prevent misusing of these API.

Tested-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-05 11:37:41 +01:00
Xuan Zhuo
47008bb51c virtio_net: enable premapped mode for merge and small by default
Currently, the virtio core will perform a dma operation for each
buffer. Although, the same page may be operated multiple times.

In premapped mod, we can perform only one dma operation for the pages of
the alloc frag. This is beneficial for the iommu device.

kernel command line: intel_iommu=on iommu.passthrough=0

       |  strict=0  | strict=1
Before |  775496pps | 428614pps
After  | 1109316pps | 742853pps

In the 6.11, we disabled this feature because a regress [1].

Now, we fix the problem and re-enable it.

[1]: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com

Tested-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-05 11:37:40 +01:00
Xuan Zhuo
a33f3df850 virtio_net: big mode skip the unmap check
The virtio-net big mode did not enable premapped mode,
so we did not need to check the unmap. And the subsequent
commit will remove the failover code for failing enable
premapped for merge and small mode. So we need to remove
the checking do_dma code in the big mode path.

Tested-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-05 11:37:40 +01:00
Xuan Zhuo
6aacd14844 virtio-net: fix overflow inside virtnet_rq_alloc
When the frag just got a page, then may lead to regression on VM.
Specially if the sysctl net.core.high_order_alloc_disable value is 1,
then the frag always get a page when do refill.

Which could see reliable crashes or scp failure (scp a file 100M in size
to VM).

The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
of a new frag. When the frag size is larger than PAGE_SIZE,
everything is fine. However, if the frag is only one page and the
total size of the buffer and virtnet_rq_dma is larger than one page, an
overflow may occur.

The commit f9dac92ba9 ("virtio_ring: enable premapped mode whatever
use_dma_api") introduced this problem. And we reverted some commits to
fix this in last linux version. Now we try to enable it and fix this
bug directly.

Here, when the frag size is not enough, we reduce the buffer len to fix
this problem.

Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
Tested-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-05 11:37:40 +01:00
Jakub Kicinski
c688a96c43 Merge branch 'fix-sparse-warnings-in-dpaa_eth-driver'
Vladimir Oltean says:

====================
Fix sparse warnings in dpaa_eth driver

This is a follow-up of the discussion at:
https://lore.kernel.org/oe-kbuild-all/20241028-sticky-refined-lionfish-b06c0c@leitao/
where I said I would take care of the sparse warnings uncovered by
Breno's COMPILE_TEST change for the dpaa_eth driver.

There was one warning that I decided to treat as an actual bug:
https://lore.kernel.org/netdev/20241029163105.44135-1-vladimir.oltean@nxp.com/
and what remains here are those warnings which I consider harmless.

I would like Christophe to ack the entire series to be taken through
netdev. I find it weird that the qbman driver, whose major API consumer
is netdev, is maintained by a different group. In this case, the buggy
qm_sg_entry_get_off() function is defined in qbman but exclusively
called in netdev.
====================

Link: https://patch.msgid.link/20241029164317.50182-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-04 18:44:45 -08:00
Vladimir Oltean
0a746cf8bb net: dpaa_eth: extract hash using __be32 pointer in rx_default_dqrr()
Sparse provides the following output:

warning: cast to restricted __be32

This is a harmless warning due to the fact that we dereference the hash
stored in the FD using an incorrect type annotation. Suppress the
warning by using the correct __be32 type instead of u32. No functional
change.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Link: https://patch.msgid.link/20241029164317.50182-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-04 18:44:43 -08:00
Vladimir Oltean
81f8ee2823 net: dpaa_eth: add assertions about SGT entry offsets in sg_fd_to_skb()
Multi-buffer frame descriptors (FDs) point to a buffer holding a
scatter/gather table (SGT), which is a finite array of fixed-size
entries, the last of which has qm_sg_entry_is_final(&sgt[i]) == true.

Each SGT entry points to a buffer holding pieces of the frame.
DPAARM.pdf explains in the figure called "Internal and External Margins,
Scatter/Gather Frame Format" that the SGT table is located within its
buffer at the same offset as the frame data start is located within the
first packet buffer.

                                 +------------------------+
    Scatter/Gather Buffer        |        First Buffer    |   Last Buffer
      ^ +------------+ ^       +-|---->^ +------------+   +->+------------+
      | |            | | ICEOF | |     | |            |      |////////////|
      | +------------+ v       | |     | |            |      |////////////|
 BSM  | |/ part of //|         | |BSM  | |            |      |////////////|
      | |/ Internal /|         | |     | |            |      |////////////|
      | |/ Context //|         | |     | |            |      |// Frame ///|
      | +------------+         | |     | |            | ...  |/ content //|
      | |            |         | |     | |            |      |////////////|
      | |            |         | |     | |            |      |////////////|
      v +------------+         | |     v +------------+      |////////////|
        | Scatter/ //| sgt[0]--+ |       |// Frame ///|      |////////////|
        | Gather List| ...       |       |/ content //|      +------------+ ^
        |////////////| sgt[N]----+       |////////////|      |            | | BEM
        |////////////|                   |////////////|      |            | |
        +------------+                   +------------+      +------------+ v

BSM = Buffer Start Margin, BEM = Buffer End Margin, both are configured
by dpaa_eth_init_rx_port() for the RX FMan port relevant here.

sg_fd_to_skb() runs in the calling context of rx_default_dqrr() -
the NAPI receive callback - which only expects to receive contiguous
(qm_fd_contig) or scatter/gather (qm_fd_sg) frame descriptors.
Everything else is irrelevant codewise.

The processing done by sg_fd_to_skb() is weird because it does not
conform to the expectations laid out by the aforementioned figure.
Namely, it parses the OFFSET field only for SGT entries with i != 0
(codewise, skb != NULL). In those cases, OFFSET should always be 0.
Also, it does not parse the OFFSET field for the sgt[0] case, the only
case where the buffer offset is meaningful in this context. There, it
uses the fd_off, aka the offset to the Scatter/Gather List in the
Scatter/Gather Buffer from the figure. By equivalence, they should both
be equal to the BSM (in turn, equal to priv->rx_headroom).

This can actually be explained due to the bug which we had in
qm_sg_entry_get_off() until the previous change:

- qm_sg_entry_get_off() did not actually _work_ for sgt[0]. It returned
  zero even with a non-zero offset, so fd_off had to be used as a fill-in.

- qm_sg_entry_get_off() always returned zero for sgt[i>0], and that
  resulted in no user-visible bug, because the buffer offset _was
  supposed_ to be zero for those buffers. So remove it from calculations.

Add assertions about the OFFSET field in both cases (first or subsequent
SGT entries) to make it absolutely obvious when something is not well
handled.

Similar logic can be seen in the driver for the architecturally similar
DPAA2, where dpaa2_eth_build_frag_skb() calls dpaa2_sg_get_offset() only
for i == 0. For the rest, there is even a comment stating the same thing:

	 * Data in subsequent SG entries is stored from the
	 * beginning of the buffer, so we don't need to add the
	 * sg_offset.

Tested on LS1046A.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Link: https://patch.msgid.link/20241029164317.50182-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-04 18:44:43 -08:00
Vladimir Oltean
a12fcef429 soc: fsl_qbman: use be16_to_cpu() in qm_sg_entry_get_off()
struct qm_sg_entry :: offset is a 13-bit field, declared as __be16.

When using be32_to_cpu(), a wrong value will be calculated on little
endian systems (Arm), because type promotion from 16-bit to 32-bit,
which is done before the byte swap and always in the CPU native
endianness, changes the value of the scatter/gather list entry offset in
big-endian interpretation (adds two zero bytes in the LSB interpretation).
The result of the byte swap is ANDed with GENMASK(12, 0), so the result
is always zero, because only those bytes added by type promotion remain
after the application of the bit mask.

The impact of the bug is that scatter/gather frames with a non-zero
offset into the buffer are treated by the driver as if they had a zero
offset. This is all in theory, because in practice, qm_sg_entry_get_off()
has a single caller, where the bug is inconsequential, because at that
call site the buffer offset will always be zero, as will be explained in
the subsequent change.

Flagged by sparse:

warning: cast to restricted __be32
warning: cast from restricted __be16

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Link: https://patch.msgid.link/20241029164317.50182-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-04 18:44:43 -08:00
Rosen Penev
d2068805f6 net: ena: remove devm from ethtool
There's no need for devm bloat here. In addition, these are freed right
before the function exits.

Also swapped kcalloc order for consistency.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Shay Agroskin <shayagr@amazon.com>
Link: https://patch.msgid.link/20241101214828.289752-2-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-04 18:21:52 -08:00
David Woodhouse
18ec5491a4 ptp: Remove 'default y' for VMCLOCK PTP device
The VMCLOCK device gives support for accurate timekeeping even across
live migration, unlike the KVM PTP clock. To help ensure that users can
always use ptp_vmclock where it's available in preference to ptp_kvm,
set it to 'default PTP_1588_CLOCK_VMCLOCK' instead of 'default y'.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://patch.msgid.link/89955b74d225129d6e3d79b53aa8d81d1b50560f.camel@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-04 18:18:10 -08:00
Dr. David Alan Gilbert
6a7d68f727 net: ena: Remove deadcode
ena_com_get_dev_basic_stats() has been unused since 2017's
commit d81db24056 ("net/ena: refactor ena_get_stats64 to be atomic
context safe")

ena_com_get_offload_settings() has been unused since the original
commit of ENA back in 2016 in
commit 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic
Network Adapters (ENA)")

Remove them.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: David Arinzon <darinzon@amazon.com>
Link: https://patch.msgid.link/20241102220142.80285-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-04 18:17:37 -08:00
Dr. David Alan Gilbert
b356b91708 net: ena: Remove autopolling mode
This manually reverts
commit a4e262cde3 ("net: ena: allow automatic fallback to polling mode")

which is unused.

(I did it manually because there are other minor comment
and function changes surrounding it).
Build tested only.

Suggested-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://patch.msgid.link/20241103194149.293456-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-04 18:12:56 -08:00