Rather than using the platform device pointer field in the IPA
pointer, pass a platform device pointer to ipa_smp2p_init(). Use
that pointer throughout that function.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than using the platform device pointer field in the IPA
pointer, pass a platform device pointer to ipa_smp2p_irq_init().
Use that pointer throughout that function (without assuming it's
the same as the IPA platform device pointer).
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than using the platform device pointer field in the IPA
pointer, pass a platform device pointer to ipa_mem_init(). Use
that pointer throughout that function.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than using the platform device pointer field in the IPA
pointer, pass a platform device pointer to ipa_reg_init(). Use
that pointer throughout that function.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create a new function ipa_interrupt_init() that is called at probe
time to allocate and initialize the IPA interrupt data structure.
Create ipa_interrupt_exit() as its inverse.
This follows the normal IPA driver pattern of *_init() functions
doing things that can be done before access to hardware is required.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the return type of ipa_interrupt_config() to be an error
code rather than an IPA interrupt structure pointer, and assign the
the pointer within that function.
Change ipa_interrupt_deconfig() to take the IPA pointer as argument
and have it invalidate the ipa->interrupt pointer.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthieu Baerts says:
====================
mptcp: add TCP_NOTSENT_LOWAT sockopt support
Patch 3 does the magic of adding TCP_NOTSENT_LOWAT support, all the
other ones are minor cleanup seen along when working on the new
feature.
Note that this feature relies on the existing accounting for snd_nxt.
Such accounting is not 110% accurate as it tracks the most recent
sequence number queued to any subflow, and not the actual sequence
number sent on the wire. Paolo experimented a lot, trying to implement
the latter, and in the end it proved to be both "too complex" and "not
necessary".
The complexity raises from the need for additional lock and a lot of
refactoring to introduce such protections without adding significant
overhead. Additionally, snd_nxt is currently used and exposed with the
current semantic by the internal packet scheduling. Introducing a
different tracking will still require us to keep the old one.
More interestingly, a more accurate tracking could be not strictly
necessary: as the MPTCP socket enqueues data to the subflows only up
to the available send window, any enqueue data is sent on the wire
instantly, without any blocking operation short or a drop in the tx
path at the nft or TC layer.
====================
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Most TCP-level socket options get an integer from user space, and
set the corresponding field under the msk-level socket lock.
Reduce the code duplication moving such operations in the common code.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for such socket option storing the user-space provided
value in a new msk field, and using such data to implement the
_mptcp_stream_memory_free() helper, similar to the TCP one.
To avoid adding more indirect calls in the fast path, open-code
a variant of sk_stream_memory_free() in mptcp_sendmsg() and add
direct calls to the mptcp stream memory free helper where possible.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/464
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mptcp_get_int_option() helper is needless open-coded in a
couple of places, replace the duplicate code with the helper
call.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit 5cf92bbadc ("mptcp: re-enable sndbuf autotune"), the
MPTCP_NOSPACE bit is redundant: it is always set and cleared together with
SOCK_NOSPACE.
Let's drop the first and always relay on the latter, dropping a bunch
of useless code.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not set rtnl_link_stats64 fields to zero, since they are zeroed
before ops->ndo_get_stats64 is called in core dev_get_stats() function.
Also, simplify the data collection by removing the temporary variable.
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
With commit 34d21de99c ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf"), stats allocation could be done on net core
instead of this driver.
With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.
Remove the allocation in the nlmon driver and leverage the network
core allocation.
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The last patch which added support to extend the firmware shared
data to add channel data information has introduced a bug due to
the reserved space not adjusted accordingly.
This patch fixes the issue and also adds BUILD_BUG to avoid this
regression error.
Fixes: 997814491c ("Octeontx2-af: Fetch MAC channel info from firmware")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct net_device poll_dev in struct igc_q_vector was added
in one of the initial commits, but never used.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ziwei Xiao says:
====================
gve: Add header split support
Currently, the ethtool's ringparam has added a new field
tcp-data-split for enabling and disabling header split. These three
patches will utilize that ethtool flag to support header split in GVE
driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
To record the stats of header split packets, three stats are added in
the driver's ethtool stats.
- rx_hsplit_pkt is the split packets count with header split
- rx_hsplit_bytes is the received header bytes count with header split
- rx_hsplit_unsplit_pkt is the unsplit packet count due to header buffer
overflow or zero header length when header split is enabled
Currently, it's entering the stats_update critical section more than
once per packet. We have plans to avoid that in the future change to let
all the stats_update happen in one place at the end of
`gve_rx_poll_dqo`.
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add header buffers and ethtool support to enable header split via the
tcp-data-split flag in ethtool's ringparam config. A coherent dma memory
is allocated for the header buffers. There is one header buffer per ring
entry by calculating the offset to the header-buffers starting address.
The header buffer is always copied directly into the skb and payload is
always added as frags. When there is a header buffer overflow or the
header length is 0, the driver places the whole unsplit packet in frags.
When toggling header split, the driver will call gve_adjust_config to
set its queues appropriately. If header split is enabled by the user and
the max packet buffer size is no less than 4KB, driver will set the
packet buffer size as 4KB to support TCP_ZEROCOPY_RECEIVE. Otherwise the
driver will use the default 2KB as the packet buffer size.
`ethtool -G <dev> tcp-data-split on/off` is the command to toggle header
split.
`ethtool -g <dev>` will show the status of header split with the field
of `tcp-data-split`.
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To enable header split via ethtool, we first need to query the device to
get the max rx buffer size and header buffer size. Add a device option
to get these values and store them in the driver. If the header buffer
size received from the device is non-zero, it means header split is
supported in the device.
Currently the max rx buffer size will only be used when header split is
enabled which will set the data_buffer_size_dqo to be the max rx buffer
size. Also change the data_buffer_size_dqo from int to u16 since we are
modifying it and making it to be consistent with max_rx_buffer_size.
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With commit 34d21de99c ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf"), stats allocation could be done on net core
instead of in this driver.
With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.
Remove the allocation in the ip6_tunnel driver and leverage the network
core allocation instead.
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson says:
====================
ionic: code cleanup and performance tuning
Brett has been performance testing and code tweaking and has
come up with several improvements for our fast path operations.
In a simple single thread / single queue iperf case on a 1500 MTU
connection we see an improvement from 74.2 to 86.7 Gbits/sec.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The MODULE_AUTHOR macro is supposed to be a person
not a company.
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clean up complaints from an xmastree.py scan.
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the kernel's CQE dim table to align better with the
driver's use of completion queues, and use the tx moderation
when using Tx interrupts.
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An earlier change moved the hwstamp queue check into a helper
function with an unlikely(). However, it makes more sense for
the caller to decide if it's likely() or unlikely(), so make
the change to support that.
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To help make sure we're only accessing things we really need
to access we can cut down on the q->lif->netdev references by
using q->dev which is already in cache.
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of using q->lif->netdev, just pass the netdev when it's
locally defined.
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If there is a lot of transmit traffic the driver can get into a
situation that the device is starved due to the doorbell never
being rung. This can happen if xmit_more is set constantly
and __netdev_tx_sent_queue() keeps returning false. Fix this
by checking if the queue needs to be stopped right before
calling __netdev_tx_sent_queue(). Use MAX_SKB_FRAGS + 1 as the
stop condition because that's the maximum number of frags
supported for non-TSO transmit.
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver currently calls netdev_tx_completed_queue() for every
Tx completion. However, this API is only meant to be called once
per NAPI if any Tx work is done. Make the necessary changes to
support calling netdev_tx_completed_queue() only once per NAPI.
Also, use the __netdev_tx_sent_queue() API, which supports the
xmit_more functionality.
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of napi_consume_skb so that skb recycling
can happen by way of the napi_skb_cache.
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Perf was showing some hot spots in ionic_tx_descs_needed()
for TSO traffic. Rework the function to return sooner where
possible.
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cut down the number of default Tx and Rx descriptors to save
initial memory requirements.
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the driver attempts to wake the Tx queue
for every descriptor processed. However, this is
overkill and can cause thrashing since Tx xmit can be
running concurrently on a different CPU than Tx clean.
Fix this by refactoring Tx cq servicing into its own
function so the Tx wake code can run after processing
all Tx descriptors.
The driver isn't using the expected memory barriers
to make sure the stop/start bits are coherent. Fix
this by making sure to use the correct memory barriers.
Also, the driver is using the wake API during Tx
xmit even though it's already scheduled. Fix this by
using the start API during Tx xmit.
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
active-low was lifted to the common schema for leds, but it went
unnoticed that the leds-multicolour binding had "additionalProperties:
false" where the other users had "unevaluatedProperties: false", thereby
disallowing active-low for multicolour leds. Explicitly permit it again.
Fixes: c94d178313 ("dt-bindings: net: phy: Make LED active-low property common")
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Lee Jones <lee@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 3e2f544dd8 ("net: get stats64 if device if driver is
configured") moved the callback to dev_get_tstats64() to net core, so,
unless the driver is doing some custom stats collection, it does not
need to set .ndo_get_stats64.
Since this driver is now relying in NETDEV_PCPU_STAT_TSTATS, then, it
doesn't need to set the dev_get_tstats64() generic .ndo_get_stats64
function pointer.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
With commit 34d21de99c ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf"), stats allocation could be done on net core
instead of this driver.
With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.
Remove the allocation in the bareudp driver and leverage the network
core allocation.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
net: better use of skb helpers
First patch is a pure cleanup.
Second patch adds a DEBUG_NET_WARN_ON_ONCE() in skb_network_header_len(),
this could help to discover old bugs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
(skb_transport_header(skb) - skb_network_header(skb))
can be replaced by skb_network_header_len(skb)
Add a DEBUG_NET_WARN_ON_ONCE() in skb_network_header_len()
to catch cases were the transport_header was not set.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a cleanup patch, making code a bit more concise.
1) Use skb_network_offset(skb) in place of
(skb_network_header(skb) - skb->data)
2) Use -skb_network_offset(skb) in place of
(skb->data - skb_network_header(skb))
3) Use skb_transport_offset(skb) in place of
(skb_transport_header(skb) - skb->data)
4) Use skb_inner_transport_offset(skb) in place of
(skb_inner_transport_header(skb) - skb->data)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com> # for sfc
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZeEKVAAKCRDbK58LschI
g7oYAQD5Jlv4fIVTvxvfZrTTZ2tU+OsPa75mc8SDKwpash3YygEA8kvESy8+t6pg
D6QmSf1DIZdFoSp/bV+pfkNWMeR8gwg=
=mTAj
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-02-29
We've added 119 non-merge commits during the last 32 day(s) which contain
a total of 150 files changed, 3589 insertions(+), 995 deletions(-).
The main changes are:
1) Extend the BPF verifier to enable static subprog calls in spin lock
critical sections, from Kumar Kartikeya Dwivedi.
2) Fix confusing and incorrect inference of PTR_TO_CTX argument type
in BPF global subprogs, from Andrii Nakryiko.
3) Larger batch of riscv BPF JIT improvements and enabling inlining
of the bpf_kptr_xchg() for RV64, from Pu Lehui.
4) Allow skeleton users to change the values of the fields in struct_ops
maps at runtime, from Kui-Feng Lee.
5) Extend the verifier's capabilities of tracking scalars when they
are spilled to stack, especially when the spill or fill is narrowing,
from Maxim Mikityanskiy & Eduard Zingerman.
6) Various BPF selftest improvements to fix errors under gcc BPF backend,
from Jose E. Marchesi.
7) Avoid module loading failure when the module trying to register
a struct_ops has its BTF section stripped, from Geliang Tang.
8) Annotate all kfuncs in .BTF_ids section which eventually allows
for automatic kfunc prototype generation from bpftool, from Daniel Xu.
9) Several updates to the instruction-set.rst IETF standardization
document, from Dave Thaler.
10) Shrink the size of struct bpf_map resp. bpf_array,
from Alexei Starovoitov.
11) Initial small subset of BPF verifier prepwork for sleepable bpf_timer,
from Benjamin Tissoires.
12) Fix bpftool to be more portable to musl libc by using POSIX's
basename(), from Arnaldo Carvalho de Melo.
13) Add libbpf support to gcc in CORE macro definitions,
from Cupertino Miranda.
14) Remove a duplicate type check in perf_event_bpf_event,
from Florian Lehner.
15) Fix bpf_spin_{un,}lock BPF helpers to actually annotate them
with notrace correctly, from Yonghong Song.
16) Replace the deprecated bpf_lpm_trie_key 0-length array with flexible
array to fix build warnings, from Kees Cook.
17) Fix resolve_btfids cross-compilation to non host-native endianness,
from Viktor Malik.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits)
selftests/bpf: Test if shadow types work correctly.
bpftool: Add an example for struct_ops map and shadow type.
bpftool: Generated shadow variables for struct_ops maps.
libbpf: Convert st_ops->data to shadow type.
libbpf: Set btf_value_type_id of struct bpf_map for struct_ops.
bpf: Replace bpf_lpm_trie_key 0-length array with flexible array
bpf, arm64: use bpf_prog_pack for memory management
arm64: patching: implement text_poke API
bpf, arm64: support exceptions
arm64: stacktrace: Implement arch_bpf_stack_walk() for the BPF JIT
bpf: add is_async_callback_calling_insn() helper
bpf: introduce in_sleepable() helper
bpf: allow more maps in sleepable bpf programs
selftests/bpf: Test case for lacking CFI stub functions.
bpf: Check cfi_stubs before registering a struct_ops type.
bpf: Clarify batch lookup/lookup_and_delete semantics
bpf, docs: specify which BPF_ABS and BPF_IND fields were zero
bpf, docs: Fix typos in instruction-set.rst
selftests/bpf: update tcp_custom_syncookie to use scalar packet offset
bpf: Shrink size of struct bpf_map/bpf_array.
...
====================
Link: https://lore.kernel.org/r/20240301001625.8800-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet says:
====================
inet: no longer use RTNL to protect inet_dump_ifaddr()
This series convert inet so that a dump of addresses (ip -4 addr)
no longer requires RTNL.
====================
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) inet_dump_ifaddr() can can run under RCU protection
instead of RTNL.
2) properly return 0 at the end of a dump, avoiding an
an extra recvmsg() system call.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the following patch, inet_base_seq() will no longer be called
with RTNL held.
Add READ_ONCE()/WRITE_ONCE() annotations in dev_base_seq_inc()
and inet_base_seq().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ifa->ifa_flags can be read locklessly.
Add appropriate READ_ONCE()/WRITE_ONCE() annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ifa->ifa_preferred_lft can be read locklessly.
Add appropriate READ_ONCE()/WRITE_ONCE() annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ifa->ifa_valid_lft can be read locklessly.
Add appropriate READ_ONCE()/WRITE_ONCE() annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ifa->ifa_tstamp can be read locklessly.
Add appropriate READ_ONCE()/WRITE_ONCE() annotations.
Do the same for ifa->ifa_cstamp to prepare upcoming changes.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Wei says:
====================
netdevsim: link and forward skbs between ports
This patchset adds the ability to link two netdevsim ports together and
forward skbs between them, similar to veth. The goal is to use netdevsim
for testing features e.g. zero copy Rx using io_uring.
This feature was tested locally on QEMU, and a selftest is included.
I ran netdev selftests CI style and all tests but the following passed:
- gro.sh
- l2tp.sh
- ip_local_port_range.sh
gro.sh fails because virtme-ng mounts as read-only and it tries to write
to log.txt. This issue was reported to virtme-ng upstream.
l2tp.sh and ip_local_port_range.sh both fail for me on net-next/main as
well.
---
v13->v14:
- implement ndo_get_iflink()
- fix returning 0 if peer is already linked during linking or not linked
during unlinking
- bump dropped counter if nsim_ipsec_tx() fails and generally reorder
nsim_start_xmit()
- fix overflowing lines and indentations
v12->v13:
- wait for socat listening port to be ready before sending data in
selftest
v11->v12:
- fix leaked netns refs
- fix rtnetlink.sh kci_test_ipsec_offload() selftest
v10->v11:
- add udevadm settle after creating netdevsims in selftest
v9->v10:
- fix not freeing skb when not there is no peer
- prevent possible id clashes in selftest
- cleanup selftest on error paths
v8->v9:
- switch to getting netns using fd rather than id
- prevent linking a netdevsim to itself
- update tests
v7->v8:
- fix not dereferencing RCU ptr using rcu_dereference()
- remove unused variables in selftest
v6->v7:
- change link syntax to netnsid:ifidx
- replace dev_get_by_index() with __dev_get_by_index()
- check for NULL peer when linking
- add a sysfs attribute for unlinking
- only update Tx stats if not dropped
- update selftest
v5->v6:
- reworked to link two netdevsims using sysfs attribute on the bus
device instead of debugfs due to deadlock possibility if a netdevsim
is removed during linking
- removed unnecessary patch maintaining a list of probed nsim_devs
- updated selftest
v4->v5:
- reduce nsim_dev_list_lock critical section
- fixed missing mutex unlock during unwind ladder
- rework nsim_dev_peer_write synchronization to take devlink lock as
well as rtnl_lock
- return err msgs to user during linking if port doesn't exist or
linking to self
- update tx stats outside of RCU lock
v3->v4:
- maintain a mutex protected list of probed nsim_devs instead of using
nsim_bus_dev
- fixed synchronization issues by taking rtnl_lock
- track tx_dropped skbs
v2->v3:
- take lock when traversing nsim_bus_dev_list
- take device ref when getting a nsim_bus_dev
- return 0 if nsim_dev_peer_read cannot find the port
- address code formatting
- do not hard code values in selftests
- add Makefile for selftests
v1->v2:
- renamed debugfs file from "link" to "peer"
- replaced strstep() with sscanf() for consistency
- increased char[] buf sz to 22 for copying id + port from user
- added err msg w/ expected fmt when linking as a hint to user
- prevent linking port to itself
- protect peer ptr using RCU
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
I cleared IFF_NOARP flag from netdevsim dev->flags in order to support
skb forwarding. This breaks the rtnetlink.sh selftest
kci_test_ipsec_offload() test because ipsec does not connect to peers it
cannot transmit to.
Fix the issue by adding a neigh entry manually. ipsec_offload test now
successfully pass.
Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Connect two netdevsim ports in different namespaces together, then send
packets between them using socat.
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Maciek Machnikowski <maciek@machnikowski.net>
Signed-off-by: David S. Miller <davem@davemloft.net>