Commit Graph

1074669 Commits

Author SHA1 Message Date
Radu Bulie
c4680c9785 dpaa2-eth: Update SINGLE_STEP register access
DPAA2 MAC supports 1588 one step timestamping.
If this option is enabled then for each transmitted PTP event packet,
the 1588 SINGLE_STEP register is accessed to modify the following fields:

-offset of the correction field inside the PTP packet
-UDP checksum update bit,  in case the PTP event packet has
 UDP encapsulation

These values can change any time, because there may be multiple
PTP clients connected, that receive various 1588 frame types:
- L2 only frame
- UDP / Ipv4
- UDP / Ipv6
- other

The current implementation uses dpni_set_single_step_cfg to update the
SINLGE_STEP register.
Using an MC command  on the Tx datapath for each transmitted 1588 message
introduces high delays, leading to low throughput and consequently to a
small number of supported PTP clients. Besides these, the nanosecond
correction field from the PTP packet will contain the high delay from the
driver which together with the originTimestamp will render timestamp
values that are unacceptable in a GM clock implementation.

This patch updates the Tx datapath for 1588 messages when single step
timestamp is enabled and provides direct access to SINGLE_STEP register,
eliminating the  overhead caused by the dpni_set_single_step_cfg
MC command. MC version >= 10.32 implements this functionality.
If the MC version does not have support for returning the
single step register base address, the driver will use
dpni_set_single_step_cfg command for updates operations.

All the delay introduced by dpni_set_single_step_cfg
function will be eliminated (if MC version has support for returning the
base address of the single step register), improving the egress driver
performance for PTP packets when single step timestamping is enabled.

Before these changes the maximum throughput for 1588 messages with
single step hardware timestamp enabled was around 2000pps.
After the updates the throughput increased up to 32.82 Mbps / 46631.02 pps.

Signed-off-by: Radu Bulie <radu-andrei.bulie@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19 16:27:17 +00:00
Radu Bulie
9572594ecf dpaa2-eth: Update dpni_get_single_step_cfg command
dpni_get_single_step_cfg is an MC firmware command used for
retrieving the contents of SINGLE_STEP 1588 register available
in a DPMAC.

This patch adds a new version of this command that returns as an extra
argument the physical base address of the aforementioned register.
The address will be used to directly modify the contents of the
SINGLE_STEP register instead of invoking the MC command
dpni_set_single_step_cgf. The former approach introduced huge delays on
the TX datapath when one step PTP events were transmitted. This led to low
throughput and high latencies observed in the PTP correction field.

Signed-off-by: Radu Bulie <radu-andrei.bulie@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19 16:27:16 +00:00
Eric Dumazet
8a4fc54b07 net: get rid of rtnl_lock_unregistering()
After recent patches, and in particular commits
 faab39f63c ("net: allow out-of-order netdev unregistration") and
 e5f80fcf86 ("ipv6: give an IPv6 dev to blackhole_netdev")
we no longer need the barrier implemented in rtnl_lock_unregistering().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19 16:24:03 +00:00
Volodymyr Mytnyk
b3ae2d350d net: prestera: flower: fix destroy tmpl in chain
Fix flower destroy template callback to release template
only for specific tc chain instead of all chain tempaltes.

The issue was intruduced by previous commit that introduced
multi-chain support.

Fixes: fa5d824ce5 ("net: prestera: acl: add multi-chain support offload")
Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19 16:22:03 +00:00
Eric Dumazet
36a29fb6b2 bridge: switch br_net_exit to batch mode
cleanup_net() is competing with other rtnl users.

Instead of calling br_net_exit() for each netns,
call br_net_exit_batch() once.

This gives cleanup_net() ability to group more devices
and call unregister_netdevice_many() only once for all bridge devices.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19 16:20:12 +00:00
David S. Miller
a7cc3464e6 Merge branch 'mctp-i2c'
Matt Johnston says:

====================
MCTP I2C driver

This patch series adds a netdev driver providing MCTP transport over
I2C.

I think I've addressed all the points raised in v5. It now has
mctp_i2c_unregister() to run things in the correct order, waiting for
the worker thread and I2C rx to complete.

Cheers,
Matt

--

v6:
 - Changed netdev register/unregister/free to avoid races. Ensure that
   netif functions are not used by irq handler/threads after unregister.
 - Fix incoming I2C hwaddr that was previously incorrect (left
   shifted 1 bit)
 - Add a check that byte_count wire header matches the length received
 - Renamed I2C driver to mctp-i2c-interface
 - Removed __func__ from print messages, added missing newlines
 - Removed sysfs mctp_current_mux file which was used for debug
 - Renamed curr_lock to sel_lock
 - Tidied comment formatting
 - Fix newline in Kconfig
v5:
 - Fix incorrect format string
v4:
 - Switch to __i2c_transfer() rather than __i2c_smbus_xfer(), drop 255 byte
   smbus patches
 - Use wait_event_idle() for the sleeping TX thread
 - Use dev_addr_set()
v3:
 - Added Reviewed-bys for npcm7xx
 - Resend with net-next open
v2:
 - Simpler Kconfig condition for i2c-mux dependency, from Randy Dunlap
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19 16:18:50 +00:00
Matt Johnston
f5b8abf9fc mctp i2c: MCTP I2C binding driver
Provides MCTP network transport over an I2C bus, as specified in
DMTF DSP0237. All messages between nodes are sent as SMBus Block Writes.

Each I2C bus to be used for MCTP is flagged in devicetree by a
'mctp-controller' property on the bus node. Each flagged bus gets a
mctpi2cX net device created based on the bus number. A
'mctp-i2c-controller' I2C client needs to be added under the adapter. In
an I2C mux situation the mctp-i2c-controller node must be attached only
to the root I2C bus. The I2C client will handle incoming I2C slave block
write data for subordinate busses as well as its own bus.

In configurations without devicetree a driver instance can be attached
to a bus using the I2C slave new_device mechanism.

The MCTP core will hold/release the MCTP I2C device while responses
are pending (a 6 second timeout or once a socket is closed, response
received etc). While held the MCTP I2C driver will lock the I2C bus so
that the correct I2C mux remains selected while responses are received.

(Ideally we would just lock the mux to keep the current bus selected for
the response rather than a full I2C bus lock, but that isn't exposed in
the I2C mux API)

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Reviewed-by: Wolfram Sang <wsa@kernel.org> # I2C transport parts
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19 16:18:49 +00:00
Matt Johnston
6881e493b0 dt-bindings: net: New binding mctp-i2c-controller
Used to define a local endpoint to communicate with MCTP peripherals
attached to an I2C bus. This I2C endpoint can communicate with remote
MCTP devices on the I2C bus.

In the example I2C topology below (matching the second yaml example) we
have MCTP devices on busses i2c1 and i2c6. MCTP-supporting busses are
indicated by the 'mctp-controller' DT property on an I2C bus node.

A mctp-i2c-controller I2C client DT node is placed at the top of the
mux topology, since only the root I2C adapter will support I2C slave
functionality.
                                               .-------.
                                               |eeprom |
    .------------.     .------.               /'-------'
    | adapter    |     | mux  --@0,i2c5------'
    | i2c1       ----.*|      --@1,i2c6--.--.
    |............|    \'------'           \  \  .........
    | mctp-i2c-  |     \                   \  \ .mctpB  .
    | controller |      \                   \  '.0x30   .
    |            |       \  .........        \  '.......'
    | 0x50       |        \ .mctpA  .         \ .........
    '------------'         '.0x1d   .          '.mctpC  .
                            '.......'          '.0x31   .
                                                '.......'
(mctpX boxes above are remote MCTP devices not included in the DT at
present, they can be hotplugged/probed at runtime. A DT binding for
specific fixed MCTP devices could be added later if required)

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19 16:18:49 +00:00
Mobashshera Rasool
4b340a5a72 net: ip6mr: add support for passing full packet on wrong mif
This patch adds support for MRT6MSG_WRMIFWHOLE which is used to pass
full packet and real vif id when the incoming interface is wrong.
While the RP and FHR are setting up state we need to be sending the
registers encapsulated with all the data inside otherwise we lose it.
The RP then decapsulates it and forwards it to the interested parties.
Currently with WRONGMIF we can only be sending empty register packets
and will lose that data.
This behaviour can be enabled by using MRT_PIM with
val == MRT6MSG_WRMIFWHOLE. This doesn't prevent MRT6MSG_WRONGMIF from
happening, it happens in addition to it, also it is controlled by the same
throttling parameters as WRONGMIF (i.e. 1 packet per 3 seconds currently).
Both messages are generated to keep backwards compatibily and avoid
breaking someone who was enabling MRT_PIM with val == 4, since any
positive val is accepted and treated the same.

Signed-off-by: Mobashshera Rasool <mobash.rasool.linux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19 16:05:54 +00:00
Alexander Lobakin
7e1b54d077 i40e: remove dead stores on XSK hotpath
The 'if (ntu == rx_ring->count)' block in i40e_alloc_rx_buffers_zc()
was previously residing in the loop, but after introducing the
batched interface it is used only to wrap-around the NTU descriptor,
thus no more need to assign 'xdp'.

'cleaned_count' in i40e_clean_rx_irq_zc() was previously being
incremented in the loop, but after commit f12738b6ec
("i40e: remove unnecessary cleaned_count updates") it gets
assigned only once after it, so the initialization can be dropped.

Fixes: 6aab0bb0c5 ("i40e: Use the xsk batched rx allocation interface")
Fixes: f12738b6ec ("i40e: remove unnecessary cleaned_count updates")
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19 12:34:04 +00:00
Jakub Kicinski
bbcf340d9d Merge branch 'add-checks-for-incoming-packet-addresses'
Jeremy Kerr says:

====================
Add checks for incoming packet addresses

This series adds a couple of checks for valid addresses on incoming MCTP
packets. We introduce a couple of helpers in 1/2, and use them in the
ingress path in 2/2.
====================

Link: https://lore.kernel.org/r/20220218042554.564787-1-jk@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-18 21:24:33 -08:00
Jeremy Kerr
86cdfd63f2 mctp: add address validity checking for packet receive
This change adds some basic sanity checks for the source and dest
headers of packets on initial receive.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-18 21:24:29 -08:00
Jeremy Kerr
cb196b7259 mctp: replace mctp_address_ok with more fine-grained helpers
Currently, we have mctp_address_ok(), which checks if an EID is in the
"valid" range of 8-254 inclusive. However, 0 and 255 may also be valid
addresses, depending on context. 0 is the NULL EID, which may be set
when physical addressing is used. 255 is valid as a destination address
for broadcasts.

This change renames mctp_address_ok to mctp_address_unicast, and adds
similar helpers for broadcast and null EIDs, which will be used in an
upcoming commit.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-18 21:24:28 -08:00
Jacques de Laval
47f0bd5032 net: Add new protocol attribute to IP addresses
This patch adds a new protocol attribute to IPv4 and IPv6 addresses.
Inspiration was taken from the protocol attribute of routes. User space
applications like iproute2 can set/get the protocol with the Netlink API.

The attribute is stored as an 8-bit unsigned integer.

The protocol attribute is set by kernel for these categories:

- IPv4 and IPv6 loopback addresses
- IPv6 addresses generated from router announcements
- IPv6 link local addresses

User space may pass custom protocols, not defined by the kernel.

Grouping addresses on their origin is useful in scenarios where you want
to distinguish between addresses based on who added them, e.g. kernel
vs. user space.

Tagging addresses with a string label is an existing feature that could be
used as a solution. Unfortunately the max length of a label is
15 characters, and for compatibility reasons the label must be prefixed
with the name of the device followed by a colon. Since device names also
have a max length of 15 characters, only -1 characters is guaranteed to be
available for any origin tag, which is not that much.

A reference implementation of user space setting and getting protocols
is available for iproute2:

9a6ea18bd7

Signed-off-by: Jacques de Laval <Jacques.De.Laval@westermo.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220217150202.80802-1-Jacques.De.Laval@westermo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-18 21:20:06 -08:00
Jakub Kicinski
6e2e59eaee Merge branch 'ionic-driver-updates'
Shannon Nelson says:

====================
ionic: driver updates

These are a couple of checkpatch cleanup patches, a bug fix,
and something to alleviate memory pressure in tight places.
====================

Link: https://lore.kernel.org/r/20220217220252.52293-1-snelson@pensando.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-18 20:37:17 -08:00
Shannon Nelson
ecea8bb429 ionic: clean up comments and whitespace
Fix up some checkpatch complaints that have crept in:
doubled words words, mispellled words, doubled lines.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-18 20:37:14 -08:00
Shannon Nelson
799c230e93 ionic: prefer strscpy over strlcpy
Replace strlcpy with strscpy to clean up a checkpatch complaint.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-18 20:37:14 -08:00
Brett Creeley
116dce0ff0 ionic: Use vzalloc for large per-queue related buffers
Use vzalloc for per-queue info structs that don't need any
DMA mapping to help relieve memory pressure found when used
in our limited SOC environment.

Signed-off-by: Brett Creeley <brett@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-18 20:37:14 -08:00
Shannon Nelson
12b1b997c0 ionic: catch transition back to RUNNING with fw_generation 0
In some graceful updates that get initially triggered by the
RESET event, especially with older firmware, the fw_generation
bits don't change but the fw_status is seen to go to 0 then back
to 1.  However, the driver didn't perform the restart, remained
waiting for fw_generation to change, and got left in limbo.

This is because the clearing of idev->fw_status_ready to 0
didn't happen correctly as it was buried in the transition
trigger: since the transition down was triggered not here
but in the RESET event handler, the clear to 0 didn't happen,
so the transition back to 1 wasn't detected.

Fix this particular case by bringing the setting of
idev->fw_status_ready back out to where it was before.

Fixes: 398d1e37f9 ("ionic: add FW_STOPPING state")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-18 20:37:14 -08:00
Eric Dumazet
86213f80da net: avoid quadratic behavior in netdev_wait_allrefs_any()
If the list of devices has N elements, netdev_wait_allrefs_any()
is called N times, and linkwatch_forget_dev() is called N*(N-1)/2 times.

Fix this by calling linkwatch_forget_dev() only once per device.

Fixes: faab39f63c ("net: allow out-of-order netdev unregistration")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220218065430.2613262-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-18 07:28:31 -08:00
Eric Dumazet
086d49058c ipv6: annotate some data-races around sk->sk_prot
IPv6 has this hack changing sk->sk_prot when an IPv6 socket
is 'converted' to an IPv4 one with IPV6_ADDRFORM option.

This operation is only performed for TCP and UDP, knowing
their 'struct proto' for the two network families are populated
in the same way, and can not disappear while a reader
might use and dereference sk->sk_prot.

If we think about it all reads of sk->sk_prot while
either socket lock or RTNL is not acquired should be using READ_ONCE().

Also note that other layers like MPTCP, XFRM, CHELSIO_TLS also
write over sk->sk_prot.

BUG: KCSAN: data-race in inet6_recvmsg / ipv6_setsockopt

write to 0xffff8881386f7aa8 of 8 bytes by task 26932 on cpu 0:
 do_ipv6_setsockopt net/ipv6/ipv6_sockglue.c:492 [inline]
 ipv6_setsockopt+0x3758/0x3910 net/ipv6/ipv6_sockglue.c:1019
 udpv6_setsockopt+0x85/0x90 net/ipv6/udp.c:1649
 sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3489
 __sys_setsockopt+0x209/0x2a0 net/socket.c:2180
 __do_sys_setsockopt net/socket.c:2191 [inline]
 __se_sys_setsockopt net/socket.c:2188 [inline]
 __x64_sys_setsockopt+0x62/0x70 net/socket.c:2188
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff8881386f7aa8 of 8 bytes by task 26911 on cpu 1:
 inet6_recvmsg+0x7a/0x210 net/ipv6/af_inet6.c:659
 ____sys_recvmsg+0x16c/0x320
 ___sys_recvmsg net/socket.c:2674 [inline]
 do_recvmmsg+0x3f5/0xae0 net/socket.c:2768
 __sys_recvmmsg net/socket.c:2847 [inline]
 __do_sys_recvmmsg net/socket.c:2870 [inline]
 __se_sys_recvmmsg net/socket.c:2863 [inline]
 __x64_sys_recvmmsg+0xde/0x160 net/socket.c:2863
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0xffffffff85e0e980 -> 0xffffffff85e01580

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 26911 Comm: syz-executor.3 Not tainted 5.17.0-rc2-syzkaller-00316-g0457e5153e0e-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-18 11:53:28 +00:00
Cédric Le Goater
7ea0c16a74 net/ibmvnic: Cleanup workaround doing an EOI after partition migration
There were a fair amount of changes to workaround a firmware bug leaving
a pending interrupt after migration of the ibmvnic device :

commit 2df5c60e19 ("net/ibmvnic: Ignore H_FUNCTION return from H_EOI
       		    to tolerate XIVE mode")
commit 284f87d2f3 ("Revert "net/ibmvnic: Fix EOI when running in
       		    XIVE mode"")
commit 11d49ce9f7 ("net/ibmvnic: Fix EOI when running in XIVE mode.")
commit f23e0643cd ("ibmvnic: Clear pending interrupt after device reset")

Here is the final one taking into account the XIVE interrupt mode.

Cc: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Cc: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-18 11:47:48 +00:00
jeffreyji
aaae162aeb teaming: deliver link-local packets with the link they arrive on
skb is ignored if team port is disabled. We want the skb to be delivered
if it's an link layer packet.

Issue is already fixed for bonding in
commit b89f04c61e ("bonding: deliver link-local packets with skb->dev set to link that packets arrived on")

changelog:

v2: change LLDP -> link layer in comments/commit descrip, comment format

Signed-off-by: jeffreyji <jeffreyji@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-18 11:40:52 +00:00
David S. Miller
a3b355c778 Merge branch 'qca8k-phylink'
Russell King says:

====================
net: dsa: qca8k: convert to phylink_pcs and mark as non-legacy

This series adds support into DSA for the mac_select_pcs method, and
converts qca8k to make use of this, eventually marking qca8k as non-
legacy.

Patch 1 adds DSA support for mac_select_pcs.
Patch 2 and patch 3 moves code around in qca8k to make patch 4 more
readable.
Patch 4 does a simple conversion to phylink_pcs.
Patch 5 moves the serdes configuration to phylink_pcs.
Patch 6 marks qca8k as non-legacy.

v2: fix dsa_phylink_mac_select_pcs() formatting and double-blank line
in patch 5
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-18 11:28:33 +00:00
Russell King (Oracle)
d9cbacf057 net: dsa: qca8k: mark as non-legacy
The qca8k driver does not make use of the speed, duplex, pause or
advertisement in its phylink_mac_config() implementation, so it can be
marked as a non-legacy driver.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-18 11:28:33 +00:00
Russell King (Oracle)
7544b3ff74 net: dsa: qca8k: move pcs configuration
Move the PCS configuration to qca8k_pcs_config().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-18 11:28:33 +00:00
Russell King (Oracle)
9612a8f915 net: dsa: qca8k: convert to use phylink_pcs
Convert the qca8k driver to use the phylink_pcs support to talk to the
SGMII PCS.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-18 11:28:33 +00:00
Russell King (Oracle)
10728cd796 net: dsa: qca8k: move qca8k_phylink_mac_link_state()
Move qca8k_phylink_mac_link_state() to separate the code movement from
code changes.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-18 11:28:32 +00:00
Russell King (Oracle)
3ce855f040 net: dsa: qca8k: move qca8k_setup()
Move qca8k_setup() to be later in the file to avoid needing prototypes
for called functions.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-18 11:28:32 +00:00
Russell King (Oracle)
bde018222c net: dsa: add support for phylink mac_select_pcs()
Add DSA support for the phylink mac_select_pcs() method so DSA drivers
can return provide phylink with the appropriate PCS for the PHY
interface mode.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-18 11:28:32 +00:00
Tom Rix
8aba73ef44 net: ethernet: xilinx: cleanup comments
Remove the second 'the'.
Replacements:
endiannes to endianness
areconnected to are connected
Mamagement to Management
undoccumented to undocumented
Xilink to Xilinx
strucutre to structure

Change kernel-doc comment style to c style for
/* Management ...

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-18 11:11:10 +00:00
Gal Pressman
8467fadc11 net: gro: Fix a 'directive in macro's argument list' sparse warning
Following the cited commit, sparse started complaining about:
../include/net/gro.h:58:1: warning: directive in macro's argument list
../include/net/gro.h:59:1: warning: directive in macro's argument list

Fix that by moving the defines out of the struct_group() macro.

Fixes: de5a1f3ce4 ("net: gro: minor optimization for dev_gro_receive()")
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Acked-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-18 11:00:25 +00:00
Xu Wang
129c77b569 s390/qeth: Remove redundant 'flush_workqueue()' calls
'destroy_workqueue()' already drains the queue before destroying it, so
there is no need to flush it explicitly.

Remove the redundant 'flush_workqueue()' calls.

Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://lore.kernel.org/r/20220216075155.940-1-vulab@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 20:52:25 -08:00
Vladimir Oltean
d2b1d186ce net: dsa: delete unused exported symbols for ethtool PHY stats
Introduced in commit cf96357303 ("net: dsa: Allow providing PHY
statistics from CPU port"), it appears these were never used.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220216193726.2926320-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 20:07:09 -08:00
Eric Dumazet
f20cfd662a net: add sanity check in proto_register()
prot->memory_allocated should only be set if prot->sysctl_mem
is also set.

This is a followup of commit 2520611151 ("crypto: af_alg - get
rid of alg_memory_allocated").

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220216171801.3604366-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 20:06:06 -08:00
Christophe JAILLET
60f8ad2392 net: ll_temac: Use GFP_KERNEL instead of GFP_ATOMIC when possible
XTE_MAX_JUMBO_FRAME_SIZE is over 9000 bytes and the default value for
'rx_bd_num' is RX_BD_NUM_DEFAULT	(i.e. 1024)

So this loop allocates more than 9 Mo of memory.

Previous memory allocations in this function already use GFP_KERNEL, so
use __netdev_alloc_skb_ip_align() and an explicit GFP_KERNEL instead of a
implicit GFP_ATOMIC.

This gives more opportunities of successful allocation.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/694abd65418b2b3974106a82d758e3474c65ae8f.1645042560.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 20:04:06 -08:00
Christophe JAILLET
6b48bece87 net: nixge: Use GFP_KERNEL instead of GFP_ATOMIC when possible
NIXGE_MAX_JUMBO_FRAME_SIZE is over 9000 bytes and RX_BD_NUM 128.

So this loop allocates more than 1 Mo of memory.

Previous memory allocations in this function already use GFP_KERNEL, so
use __netdev_alloc_skb_ip_align() and an explicit GFP_KERNEL instead of a
implicit GFP_ATOMIC.

This gives more opportunities of successful allocation.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/28d2c8e05951ad02a57eb48333672947c8bb4f81.1645043881.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 20:03:39 -08:00
Jakub Kicinski
3ad8ba6a3e Merge branch 'mptcp-selftest-fine-tuning-and-cleanup'
Mat Martineau says:

====================
mptcp: Selftest fine-tuning and cleanup

Patch 1 adjusts the mptcp selftest timeout to account for slow machines
running debug builds.

Patch 2 simplifies one test function.

Patches 3-6 do some cleanup, like deleting unused variables and avoiding
extra work when only printing usage information.

Patch 7 improves the checksum tests by utilizing existing checksum MIBs.
====================

Link: https://lore.kernel.org/r/20220218030311.367536-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 20:00:02 -08:00
Geliang Tang
24720d7452 selftests: mptcp: add csum mib check for mptcp_connect
This patch added the data checksum error mib counters check for the
script mptcp_connect.sh when the data checksum is enabled.

In do_transfer(), got the mib counters twice, before and after running
the mptcp_connect commands. The latter minus the former is the actual
number of the data checksum mib counter.

The output looks like this:

ns1 MPTCP -> ns2 (dead:beef:1::2:10007) MPTCP   (duration    86ms) [ OK ]
ns1 MPTCP -> ns2 (10.0.2.1:10008      ) MPTCP   (duration    66ms) [ FAIL ]
server got 1 data checksum error[s]

Fixes: 94d66ba1d8 ("selftests: mptcp: enable checksum in mptcp_connect.sh")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/255
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 19:59:59 -08:00
Matthieu Baerts
87154755d9 selftests: mptcp: join: check for tools only if needed
To allow showing the 'help' menu even if these tools are not available.

While at it, also avoid launching the command then checking $?. Instead,
the check is directly done in the 'if'.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 19:59:59 -08:00
Matthieu Baerts
93827ad58f selftests: mptcp: join: create tmp files only if needed
These tmp files will only be created when a test will be launched.

This avoid 'dd' output when '-h' is used for example.

While at it, also avoid creating netns that will be removed when
starting the first test.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 19:59:59 -08:00
Matthieu Baerts
0a40e273be selftests: mptcp: join: remove unused vars
Shellcheck found that these variables were set but never used.

Note that rndh is no longer prefixed with '0-' but it doesn't change
anything.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 19:59:59 -08:00
Matthieu Baerts
22514d5296 selftests: mptcp: join: exit after usage()
With an error if it is an unknown option.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 19:59:59 -08:00
Geliang Tang
bccefb7624 selftests: mptcp: simplify pm_nl_change_endpoint
This patch simplified pm_nl_change_endpoint(), using id-based address
lookups only. And dropped the fragile way of parsing 'addr' and 'id'
from the output of pm_nl_show_endpoints().

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 19:59:58 -08:00
Matthieu Baerts
d17b968b98 selftests: mptcp: increase timeout to 20 minutes
With the increase number of tests, one CI instance, using a debug kernel
config and not recent hardware, takes around 10 minutes to execute the
slowest MPTCP test: mptcp_join.sh.

Even if most CIs don't take that long to execute these tests --
typically max 10 minutes to run all selftests -- it will help some of
them if the timeout is increased.

The timeout could be disabled but it is always good to have an extra
safeguard, just in case.

Please note that on slow public CIs with kernel debug settings, it has
been observed it can easily take up to 45 minutes to execute all tests
in this very slow environment with other jobs running in parallel.
The slowest test, mptcp_join.sh takes ~30 minutes in this case.

In such environments, the selftests timeout set in the 'settings' file
is disabled because this environment is known as being exceptionnally
slow. It has been decided not to take such exceptional environments into
account and set the timeout to 20min.

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 19:59:58 -08:00
Jakub Kicinski
a3fc4b1d09 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
bpf-next 2022-02-17

We've added 29 non-merge commits during the last 8 day(s) which contain
a total of 34 files changed, 1502 insertions(+), 524 deletions(-).

The main changes are:

1) Add BTFGen support to bpftool which allows to use CO-RE in kernels without
   BTF info, from Mauricio Vásquez, Rafael David Tinoco, Lorenzo Fontana and
   Leonardo Di Donato. (Details: https://lpc.events/event/11/contributions/948/)

2) Prepare light skeleton to be used in both kernel module and user space
   and convert bpf_preload.ko to use light skeleton, from Alexei Starovoitov.

3) Rework bpftool's versioning scheme and align with libbpf's version number;
   also add linked libbpf version info to "bpftool version", from Quentin Monnet.

4) Add minimal C++ specific additions to bpftool's skeleton codegen to
   facilitate use of C skeletons in C++ applications, from Andrii Nakryiko.

5) Add BPF verifier sanity check whether relative offset on kfunc calls overflows
   desc->imm and reject the BPF program if the case, from Hou Tao.

6) Fix libbpf to use a dynamically allocated buffer for netlink messages to
   avoid receiving truncated messages on some archs, from Toke Høiland-Jørgensen.

7) Various follow-up fixes to the JIT bpf_prog_pack allocator, from Song Liu.

8) Various BPF selftest and vmtest.sh fixes, from Yucong Sun.

9) Fix bpftool pretty print handling on dumping map keys/values when no BTF
   is available, from Jiri Olsa and Yinjun Zhang.

10) Extend XDP frags selftest to check for invalid length, from Lorenzo Bianconi.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (29 commits)
  bpf: bpf_prog_pack: Set proper size before freeing ro_header
  selftests/bpf: Fix crash in core_reloc when bpftool btfgen fails
  selftests/bpf: Fix vmtest.sh to launch smp vm.
  libbpf: Fix memleak in libbpf_netlink_recv()
  bpftool: Fix C++ additions to skeleton
  bpftool: Fix pretty print dump for maps without BTF loaded
  selftests/bpf: Test "bpftool gen min_core_btf"
  bpftool: Gen min_core_btf explanation and examples
  bpftool: Implement btfgen_get_btf()
  bpftool: Implement "gen min_core_btf" logic
  bpftool: Add gen min_core_btf command
  libbpf: Expose bpf_core_{add,free}_cands() to bpftool
  libbpf: Split bpf_core_apply_relo()
  bpf: Reject kfunc calls that overflow insn->imm
  selftests/bpf: Add Skeleton templated wrapper as an example
  bpftool: Add C++-specific open/load/etc skeleton wrappers
  selftests/bpf: Fix GCC11 compiler warnings in -O2 mode
  bpftool: Fix the error when lookup in no-btf maps
  libbpf: Use dynamically allocated buffer when receiving netlink messages
  libbpf: Fix libbpf.map inheritance chain for LIBBPF_0.7.0
  ...
====================

Link: https://lore.kernel.org/r/20220217232027.29831-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 17:23:52 -08:00
Song Liu
d24d2a2b0a bpf: bpf_prog_pack: Set proper size before freeing ro_header
bpf_prog_pack_free() uses header->size to decide whether the header
should be freed with module_memfree() or the bpf_prog_pack logic.
However, in kvmalloc() failure path of bpf_jit_binary_pack_alloc(),
header->size is not set yet. As a result, bpf_prog_pack_free() may treat
a slice of a pack as a standalone kvmalloc'd header and call
module_memfree() on the whole pack. This in turn causes use-after-free by
other users of the pack.

Fix this by setting ro_header->size before freeing ro_header.

Fixes: 33c9805860 ("bpf: Introduce bpf_jit_binary_pack_[alloc|finalize|free]")
Reported-by: syzbot+2f649ec6d2eea1495a8f@syzkaller.appspotmail.com
Reported-by: syzbot+ecb1e7e51c52f68f7481@syzkaller.appspotmail.com
Reported-by: syzbot+87f65c75f4a72db05445@syzkaller.appspotmail.com
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220217183001.1876034-1-song@kernel.org
2022-02-17 13:15:36 -08:00
David S. Miller
2aed49da6c Merge branch 'prestera-route-offloading'
Yevhen Orlov says:

====================
net: marvell: prestera: add basic routes offloading

Add support for blackhole and local routes for Marvell Prestera driver.
Subscribe on fib notifications and handle add/del.

Add features:
 - Support route adding.
   e.g.: "ip route add blackhole 7.7.1.1/24"
   e.g.: "ip route add local 9.9.9.9 dev sw1p30"
 - Support "rt_trap", "rt_offload", "rt_offload_failed" flags
 - Handle case, when route in "local" table overlaps route in "main" table
   example:
        ip ro add blackhole 7.7.7.7
        ip ro add local 7.7.7.7 dev sw1p30
        # blackhole route will be deoffloaded. rt_offload flag disappeared

Limitations:
 - Only "blackhole" and "local" routes supported. "nexthop" routes is TRAP
   for now and will be implemented soon.
 - Only "local" and "main" tables supported
====================

Co-developed-by: Taras Chornyi <tchornyi@marvell.com>
Signed-off-by: Taras Chornyi <tchornyi@marvell.com>
Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17 20:45:31 +00:00
Yevhen Orlov
4394fbcb78 net: marvell: prestera: handle fib notifications
For now we support only TRAP or DROP, so we can offload only "local" or
"blackhole" routes.
Nexthop routes is TRAP for now. Will be implemented soon.

Co-developed-by: Taras Chornyi <tchornyi@marvell.com>
Signed-off-by: Taras Chornyi <tchornyi@marvell.com>
Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17 20:45:01 +00:00
Yevhen Orlov
16de3db120 net: marvell: prestera: add hardware router objects accounting for lpm
Add new router_hw object "fib_node". For now it support only DROP and
TRAP mode.

Co-developed-by: Taras Chornyi <tchornyi@marvell.com>
Signed-off-by: Taras Chornyi <tchornyi@marvell.com>
Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17 20:45:01 +00:00