A future change will convert the DMA API implementation from the
architecture specific arch/s390/pci/pci_dma.c to using the common code
drivers/iommu/dma-iommu.c which the utilizes the same IOMMU hardware
through the s390-iommu driver. Unlike the s390 specific DMA API this
requires devices to correctly set the coherent mask to be allowed to use
IOVAs >2^32 in dma_alloc_coherent(). This was however not done for ISM
devices. ISM requires such addresses since currently the DMA aperture
for PCI devices starts at 2^32 and all calls to dma_alloc_coherent()
would thus fail.
Link: https://lore.kernel.org/all/20230310-dma_iommu-v9-1-65bb8edd2beb@linux.ibm.com/
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://lore.kernel.org/r/20230524075411.3734141-1-schnelle@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Antoine Tenart says:
====================
net: tcp: make txhash use consistent for IPv4
Series is divided in two parts. First two commits make the txhash (used
for the skb hash in TCP) to be consistent for all IPv4/TCP packets (IPv6
doesn't have the same issue). Last commit improve a hash-related doc.
One example is when using OvS with dp_hash, which uses skb->hash, to
select a path. We'd like packets from the same flow to be consistent, as
well as the hash being stable over time when using net.core.txrehash=0.
Same applies for kernel ECMP which also can use skb->hash.
====================
Link: https://lore.kernel.org/r/20230523161453.196094-1-atenart@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The net.core.txrehash documentation mentions this knob is for listening
sockets only, while sk_rethink_txhash can be called on SYN and RTO
retransmits on all TCP sockets.
Remove the listening socket part.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When using IPv4/TCP, skb->hash comes from sk->sk_txhash except in
TIME_WAIT and SYN_RECV where it's not set in the reply skb from
ip_send_unicast_reply. Those packets will have a mismatched hash with
others from the same flow as their hashes will be 0. IPv6 does not have
the same issue as the hash is set from the socket txhash in those cases.
This commits sets the hash in the reply skb from ip_send_unicast_reply,
which makes the IPv4 code behaving like IPv6.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Commit c67b85558f ("ipv6: tcp: send consistent autoflowlabel in
TIME_WAIT state") made the socket txhash also available in TIME_WAIT
sockets but for IPv6 only. Make it available for IPv4 too as we'll use
it in later commits.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add two helper functions for calling PCS methods. phylink_pcs_config()
allows us to handle PCS configuration specifics in one location, rather
than the two call sites. phylink_pcs_link_up() gives us consistency.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1q1TzK-007Exd-Rs@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Epping says:
====================
net: phy: mscc: support VSC8501
this updated series of patches adds support for the VSC8501 Ethernet
PHY and fixes support for the VSC8502 PHY in cases where no other
software (like U-Boot) has initialized the PHY after power up.
The first patch simply adds the VSC8502 to the MODULE_DEVICE_TABLE,
where I guess it was unintentionally missing. I have no hardware to
test my change.
The second patch adds the VSC8501 PHY with exactly the same driver
implementation as the existing VSC8502.
The (new) third patch removes phydev locking from
vsc85xx_rgmii_set_skews(), as discussed for v2 of the patch set.
The (now) fourth patch fixes the initialization for VSC8501 and VSC8502.
I have tested this patch with VSC8501 on hardware in RGMII mode only.
https://ww1.microchip.com/downloads/aemDocuments/documents/UNG/ProductDocuments/DataSheets/VSC8501-03_Datasheet_60001741A.PDFhttps://ww1.microchip.com/downloads/aemDocuments/documents/UNG/ProductDocuments/DataSheets/VSC8502-03_Datasheet_60001742B.pdf
Table 4-42 "RGMII CONTROL, ADDRESS 20E2 (0X14)" Bit 11 for each of
them.
By default the RX_CLK is disabled for these PHYs. In cases where no
other software, like U-Boot, enabled the clock, this results in no
received packets being handed to the MAC.
The patch enables this clock output.
According to Microchip support (case number 01268776) this applies
to all modes (RGMII, GMII, and MII).
Other PHYs sharing the same register map and code, like
VSC8530/31/40/41 have the clock enabled and the relevant bit 11 is
reserved and read-only for them. As per previous discussion the
patch still clears the bit on these PHYs, too, possibly more easily
supporting other future PHYs implementing this functionality.
For the VSC8572 family of PHYs, having a different register map,
no such changes are applied.
====================
Link: https://lore.kernel.org/r/20230523153108.18548-1-david.epping@missinglinkelectronics.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
By default the VSC8501 and VSC8502 RGMII/GMII/MII RX_CLK output is
disabled. To allow packet forwarding towards the MAC it needs to be
enabled.
For other PHYs supported by this driver the clock output is enabled
by default.
Fixes: d316986331 ("net: phy: mscc: add support for VSC8502")
Signed-off-by: David Epping <david.epping@missinglinkelectronics.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Holding the struct phy_device (phydev) lock is unnecessary when
accessing phydev->interface in the PHY driver .config_init method,
which is the only place that vsc85xx_rgmii_set_skews() is called from.
The phy_modify_paged() function implements required MDIO bus level
locking, which can not be achieved by a phydev lock.
Signed-off-by: David Epping <david.epping@missinglinkelectronics.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The VSC8501 PHY can use the same driver implementation as the VSC8502.
Adding the PHY ID and copying the handler functions of VSC8502 is
sufficient to operate it.
Signed-off-by: David Epping <david.epping@missinglinkelectronics.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The mscc driver implements support for VSC8502, so its ID should be in
the MODULE_DEVICE_TABLE for automatic loading.
Signed-off-by: David Epping <david.epping@missinglinkelectronics.com>
Fixes: d316986331 ("net: phy: mscc: add support for VSC8502")
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Chuck Lever says:
====================
Bug fixes for net/handshake
Paolo observed that there is a possible leak of sock->file. I
haven't looked into that yet, but it seems to be separate from
the fixes in this series, so no need to hold these up.
====================
The submissions mentions net-next but it means netdev (perhaps
merge window left over when trees are converged). In any case,
it should have gone into net, but was instead applied to net-next
as commit deb2e484ba ("Merge branch 'net-handshake-fixes'").
These are fixes tho, and Chuck needs them to make progress with
the client so double-merging them into net... it is what it is :(
Link: https://lore.kernel.org/r/168381978252.84244.1933636428135211300.stgit@91.116.238.104.host.secureserver.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Enable the upper layer protocol to specify the SNI peername. This
avoids the need for tlshd to use a DNS lookup, which can return a
hostname that doesn't match the incoming certificate's SubjectName.
Fixes: 2fd5532044 ("net/handshake: Add a kernel API for requesting a TLSv1.3 handshake")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If user space never calls DONE, sock->file's reference count remains
elevated. Enable sock->file to be freed eventually in this case.
Reported-by: Jakub Kacinski <kuba@kernel.org>
Fixes: 3b3009ea8a ("net/handshake: Create a NETLINK service for handling handshake requests")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: 3b3009ea8a ("net/handshake: Create a NETLINK service for handling handshake requests")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
trace_handshake_cmd_done_err() simply records the pointer in @req,
so initializing it to NULL is sufficient and safe.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: 3b3009ea8a ("net/handshake: Create a NETLINK service for handling handshake requests")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If get_unused_fd_flags() fails, we ended up calling fput(sock->file)
twice.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Fixes: 3b3009ea8a ("net/handshake: Create a NETLINK service for handling handshake requests")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
handshake_req_submit() now verifies that the socket has a file.
Fixes: 3b3009ea8a ("net/handshake: Create a NETLINK service for handling handshake requests")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZG4AiAAKCRDbK58LschI
g+xlAQCmefGbDuwPckZLnomvt6gl4bkIjs7kc1ySbG9QBnaInwD/WyrJaQIPijuD
qziHPAyx+MEgPseFU1b7Le35SZ66IwM=
=s4R1
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-05-24
We've added 19 non-merge commits during the last 10 day(s) which contain
a total of 20 files changed, 738 insertions(+), 448 deletions(-).
The main changes are:
1) Batch of BPF sockmap fixes found when running against NGINX TCP tests,
from John Fastabend.
2) Fix a memleak in the LRU{,_PERCPU} hash map when bucket locking fails,
from Anton Protopopov.
3) Init the BPF offload table earlier than just late_initcall,
from Jakub Kicinski.
4) Fix ctx access mask generation for 32-bit narrow loads of 64-bit fields,
from Will Deacon.
5) Remove a now unsupported __fallthrough in BPF samples,
from Andrii Nakryiko.
6) Fix a typo in pkg-config call for building sign-file,
from Jeremy Sowden.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, sockmap: Test progs verifier error with latest clang
bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops
bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer
bpf, sockmap: Test shutdown() correctly exits epoll and recv()=0
bpf, sockmap: Build helper to create connected socket pair
bpf, sockmap: Pull socket helpers out of listen test for general use
bpf, sockmap: Incorrectly handling copied_seq
bpf, sockmap: Wake up polling after data copy
bpf, sockmap: TCP data stall on recv before accept
bpf, sockmap: Handle fin correctly
bpf, sockmap: Improved check for empty queue
bpf, sockmap: Reschedule is now done through backlog
bpf, sockmap: Convert schedule_work into delayed_work
bpf, sockmap: Pass skb ownership through read_skb
bpf: fix a memory leak in the LRU and LRU_PERCPU hash maps
bpf: Fix mask generation for 32-bit narrow loads of 64-bit fields
samples/bpf: Drop unnecessary fallthrough
bpf: netdev: init the offload table earlier
selftests/bpf: Fix pkg-config call building sign-file
====================
Link: https://lore.kernel.org/r/20230524170839.13905-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King says:
====================
net: pcs: xpcs: cleanups for clause 73 support
This series cleans up xpcs code, moving much of the clause 73 code
out of the driver into places where others can make use of it.
Specifically, we add a helper to convert a clause 73 advertisement
to ethtool link modes to mdio.h, and a helper to resolve the clause
73 negotiation state to phylink, which includes the pause modes.
In doing this cleanup, several issues were identified with the
original xpcs implementation:
1) it masks the link partner advertisement with its own advertisement
so userspace can't see what the full link partner advertisement
was.
2) it was always setting pause modes irrespective of the advertisements
on either end of the link.
3) it was reading the STAT1 registers multiple times. Reading STAT1
has the side effect of unlatching the link-down status, so
multiple reads should be avoided.
This patch series addresses the first two first by addressing the
issues, and then by moving over to the new helpers. The third issue
is solved by restructuring the xpcs code.
====================
Link: https://lore.kernel.org/r/ZGyR/jDyYTYzRklg@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Avoid reading the STAT1 registers more than once while getting the PCS
state, as this register contains latching-low bits that are lost after
the first read.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use phylink_resolve_c73() to resolve the clause 73 autonegotiation
result.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
xpcs was indicating symmetric pause should be enabled regardless of
the advertisements by either party. Fix this to use
linkmode_resolve_pause() now that we're no longer obliterating the
link partner's advertisement by logically anding it with our own.
This is transitional, the function will be entirely replaced with
phylink_resolve_c73() in the following patch.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
lp_advertising is supposed to reflect the link partner's advertisement
unmodified by the local advertisement, but xpcs bitwise ands it with
the local advertisement prior to calculating the resolution of the
negotiation.
Fix this by moving the bitwise and to xpcs_resolve_lpa_c73() so it can
place the results in a temporary bitmap before passing that to
ixpcs_get_max_usxgmii_speed().
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert xpcs clause 73 reading to use the newly introduced
mii_c73_to_linkmode() helper to translate the link partner
advertisement to an ethtool bitmap.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Read the clause 73 link partner advertisement in a loop and then
translate to the ethtool modes.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a function to resolve clause 73 negotiation according to the
priority resolution function described in clause 73.3.6.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Phylink had two chunks of code virtually the same for resolving the
negotiated pause modes. Factor this down to one function.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add a helper to convert a clause 73 advertisement to an ethtool bitmap.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jiri Pirko says:
====================
devlink: small port_new/del() cleanup
This patchset cleans up couple of leftovers after recent devlink locking
changes. Previously, both port_new/dev() commands were called without
holding instance lock. Currently all devlink commands are called with
instance lock held.
The first patch just removes redundant port notification.
The second one removes couple of outdated comments.
The last patch changes port_dev() to have devlink_port pointer as an arg
instead of port_index, which makes it similar to the rest of port
related ops.
====================
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Historically there was a reason why port_dev() along with for example
port_split() did get port_index instead of the devlink_port pointer.
With the locking changes that were done which ensured devlink instance
mutex is hold for every command, the port ops could get devlink_port
pointer directly. Change the forgotten port_dev() op to be as others
and pass devlink_port pointer instead of port_index.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All commands are called holding instance lock. Remove the outdated
comment that says otherwise.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The notification about created port is send from devl_port_register()
function called from ops->port_new(). No need to send it again here,
so remove the call and the helper function.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Donald Hunter says:
====================
tools: ynl: Add byte-order support for struct members
This patchset adds support to ynl for handling byte-order in struct
members. The first patch is a refactor to use predefined Struct() objects
instead of generating byte-order specific formats on the fly. The second
patch adds byte-order handling for struct members.
====================
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for byte-order in struct members in the genetlink-legacy
spec.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use a dict of predefined Struct() objects to decode scalar types in native,
big or little endian format. This removes the repetitive code for the
scalar variants and ensures all the signed variants are supported.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
optlen is fetched without checking whether there is more than one byte to parse.
It can lead to out-of-bounds access.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Fixes: c61a404325 ("[IPV6]: Find option offset by type.")
Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmRsUT8ACgkQSD+KveBX
+j61Zwf9GyvzrD29Lmu0/BTsLAnf7GAyJi/SMzXJ09Tp1dAYSWmF2DE3fzKvNoQ/
VT2udSKbZ96b2N9SGF396KZaV8gHxg23IAzILia1JDPd4Pn7YaNymAIWGU7vn+Tq
ErG7atPVnJV5R1H6SwO2KpOClG7jOjUPMF87uDCl2g+IpYNgjKa9hcnt5bguztC2
KBW/sV7BCYVWOUrmlSe1hH2Fn4djhga3i4JBIzjp55Dz1voIu5SHsT13Ou2/UuiC
1RDqBTJ9WvnviAxICbI96TLMJTFnDo9HFGHPIQRhZ6k25PIuWX6GLMKaceVlfCd+
BZvRG+PNOsDR9a9tFCjMBfx7KE0bMw==
=dwjN
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2023-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-fixes-2023-05-22
This series provides bug fixes for the mlx5 driver.
Please pull and let me know if there is any problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When taking a network interface down (or removing a SFP module) after
the PHY has encountered an error, phy_stop() complains incorrectly
that it was called from HALTED state.
The reason this is incorrect is that the network driver will have
called phy_start() when the interface was brought up, and the fact
that the PHY has a problem bears no relationship to the administrative
state of the interface. Taking the interface administratively down
(which calls phy_stop()) is always the right thing to do after a
successful phy_start() call, whether or not the PHY has encountered
an error.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault says:
====================
ipv4: Remove RTO_ONLINK from udp, ping and raw sockets.
udp_sendmsg(), ping_v4_sendmsg() and raw_sendmsg() use similar patterns
for restricting their route lookup to on-link hosts. Although they use
slightly different code, they all use RTO_ONLINK to override the least
significant bit of their tos value.
RTO_ONLINK is used to restrict the route scope even when the scope is
set to RT_SCOPE_UNIVERSE. Therefore it isn't necessary: we can properly
set the scope to RT_SCOPE_LINK instead.
Removing RTO_ONLINK will allow to convert .flowi4_tos to dscp_t in the
future, thus allowing to properly separate the DSCP from the ECN bits
in the networking stack.
This patch series defines a common helper to figure out what's the
scope of the route lookup. This unifies the way udp, ping and raw
sockets get their routing scope and removes their dependency on
RTO_ONLINK.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Use ip_sendmsg_scope() to properly initialise the scope in
flowi4_init_output(), instead of overriding tos with the RTO_ONLINK
flag. The objective is to eventually remove RTO_ONLINK, which will
allow converting .flowi4_tos to dscp_t.
Now that the scope is determined by ip_sendmsg_scope(), we need to
check its result to set the 'connected' variable.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use ip_sendmsg_scope() to properly initialise the scope in
flowi4_init_output(), instead of overriding tos with the RTO_ONLINK
flag. The objective is to eventually remove RTO_ONLINK, which will
allow converting .flowi4_tos to dscp_t.
The MSG_DONTROUTE and SOCK_LOCALROUTE cases were already handled by
raw_sendmsg() (SOCK_LOCALROUTE was handled by the RT_CONN_FLAGS*()
macros called by get_rtconn_flags()). However, opt.is_strictroute
wasn't taken into account. Therefore, a side effect of this patch is to
now honour opt.is_strictroute, and thus align raw_sendmsg() with
ping_v4_sendmsg() and udp_sendmsg().
Since raw_sendmsg() was the only user of get_rtconn_flags(), we can now
remove this function.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define a new helper to figure out the correct route scope to use on TX,
depending on socket configuration, ancillary data and send flags.
Use this new helper to properly initialise the scope in
flowi4_init_output(), instead of overriding tos with the RTO_ONLINK
flag.
The objective is to eventually remove RTO_ONLINK, which will allow
converting .flowi4_tos to dscp_t.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit c6d96df9fa ("net: ethernet: mtk_eth_soc: drop generic vlan rx
offload, only use DSA untagging") makes VLAN RX offloading to be only used
on the SoCs without the MTK_NETSYS_V2 ability (which are not just MT7621
and MT7622). The commit disables the proper handling of special tagged
(DSA) frames, added with commit 87e3df4961 ("net-next: ethernet:
mediatek: add CDM able to recognize the tag for DSA"), for non
MTK_NETSYS_V2 SoCs when it finds a MAC that does not use DSA. So if the
other MAC uses DSA, the CDMQ component transmits DSA tagged frames to the
CPU improperly. This issue can be observed on frames with TCP, for example,
a TCP speed test using iperf3 won't work.
The commit disables the proper handling of special tagged (DSA) frames
because it assumes that these SoCs don't use more than one MAC, which is
wrong. Although I made Frank address this false assumption on the patch log
when they sent the patch on behalf of Felix, the code still made changes
with this assumption.
Therefore, the proper handling of special tagged (DSA) frames must be kept
enabled in all circumstances as it doesn't affect non DSA tagged frames.
Hardware DSA untagging, introduced with the commit 2d7605a729 ("net:
ethernet: mtk_eth_soc: enable hardware DSA untagging"), and VLAN RX
offloading are operations on the two CDM components of the frame engine,
CDMP and CDMQ, which connect to Packet DMA (PDMA) and QoS DMA (QDMA) and
are between the MACs and the CPU. These operations apply to all MACs of the
SoC so if one MAC uses DSA and the other doesn't, the hardware DSA
untagging operation will cause the CDMP component to transmit non DSA
tagged frames to the CPU improperly.
Since the VLAN RX offloading feature configuration was dropped, VLAN RX
offloading can only be used along with hardware DSA untagging. So, for the
case above, we need to disable both features and leave it to the CPU,
therefore software, to untag the DSA and VLAN tags.
So the correct way to handle this is:
For all SoCs:
Enable the proper handling of special tagged (DSA) frames
(MTK_CDMQ_IG_CTRL).
For non MTK_NETSYS_V2 SoCs:
Enable hardware DSA untagging (MTK_CDMP_IG_CTRL).
Enable VLAN RX offloading (MTK_CDMP_EG_CTRL).
When a non MTK_NETSYS_V2 SoC MAC does not use DSA:
Disable hardware DSA untagging (MTK_CDMP_IG_CTRL).
Disable VLAN RX offloading (MTK_CDMP_EG_CTRL).
Fixes: c6d96df9fa ("net: ethernet: mtk_eth_soc: drop generic vlan rx offload, only use DSA untagging")
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We had a good run, but after 4 weeks of use we heard someone
asking about pw-bot commands. Let's explain its existence
in the docs. It's not a complete documentation but hopefully
it's enough for the casual contributor. The project and scope
are in flux so the details would likely become out of date,
if we were to document more in depth.
Link: https://lore.kernel.org/all/20230522140057.GB18381@nucnuc.mle/
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230522230903.1853151-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for using IPv6 Big TCP on DQ which can handle large TSO/GRO
packets. See https://lwn.net/Articles/895398/. This can improve the
throughput and CPU usage.
Perf test result:
ip -d link show $DEV
gso_max_size 185000 gso_max_segs 65535 tso_max_size 262143 tso_max_segs 65535 gro_max_size 185000
For performance, tested with neper using 9k MTU on hardware that supports 200Gb/s line rate.
In single streams when line rate is not saturated, we expect throughput improvements.
When the networking is performing at line rate, we expect cpu usage improvements.
Tcp_stream (unidirectional stream test, T=thread, F=flow):
skb=180kb, T=1, F=1, no zerocopy: throughput average=64576.88 Mb/s, sender stime=8.3, receiver stime=10.68
skb=64kb, T=1, F=1, no zerocopy: throughput average=64862.54 Mb/s, sender stime=9.96, receiver stime=12.67
skb=180kb, T=1, F=1, yes zerocopy: throughput average=146604.97 Mb/s, sender stime=10.61, receiver stime=5.52
skb=64kb, T=1, F=1, yes zerocopy: throughput average=131357.78 Mb/s, sender stime=12.11, receiver stime=12.25
skb=180kb, T=20, F=100, no zerocopy: throughput average=182411.37 Mb/s, sender stime=41.62, receiver stime=79.4
skb=64kb, T=20, F=100, no zerocopy: throughput average=182892.02 Mb/s, sender stime=57.39, receiver stime=72.69
skb=180kb, T=20, F=100, yes zerocopy: throughput average=182337.65 Mb/s, sender stime=27.94, receiver stime=39.7
skb=64kb, T=20, F=100, yes zerocopy: throughput average=182144.20 Mb/s, sender stime=47.06, receiver stime=39.01
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Coco Li <lixiaoyan@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230522201552.3585421-1-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 50749f2dd6 ("tcp/udp: Fix memleaks of sk and zerocopy skbs with
TX timestamp.") added a call to skb_orphan_frags_rx() to fix leaks with
zerocopy skbs. But it ended up adding a leak of its own. When
skb_orphan_frags_rx() fails, the function just returns, leaking the skb
it just cloned. Free it before returning.
This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.
Fixes: 50749f2dd6 ("tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.")
Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230522153020.32422-1-ptyadav@amazon.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>