Separate the CBDR data memory alloc standalone. It is convenient for
other part loading, for example the ENETC QOS part.
Reported-and-suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Po Liu <po.liu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei says:
====================
dpaa2-eth: add support for software TSO
This series adds support for driver level TSO in the dpaa2-eth driver.
The first 5 patches lay the ground work for the actual feature:
rearrange some variable declaration, cleaning up the interraction with
the S/G Table buffer cache etc.
The 6th patch adds the actual driver level software TSO support by using
the usual tso_build_hdr()/tso_build_data() APIs and creates the S/G FDs.
With this patch set we can see the following improvement in a TCP flow
running on a single A72@2.2GHz of the LX2160A SoC:
before: 6.38Gbit/s
after: 8.48Gbit/s
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Once we added support in the dpaa2-eth for driver level software TSO we
observed the following situation: if the EQCR CI (consumer index) is
read from the cache-enabled area we sometimes end up with a computed
value of available enqueue entries bigger than the size of the ring.
This eventually will lead to the multiple enqueue of the same FD which
will determine the same FD to end up on the Tx confirmation path and the
same skb being freed twice.
Just read the consumer index from the cache inhibited area so that we
avoid this situation.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for driver level TSO in the enetc driver using
the TSO API.
There is not much to say about this specific implementation. We are
using the usual tso_build_hdr(), tso_build_data() to create each data
segment, we create an array of S/G FDs where the first S/G entry is
referencing the header data and the remaining ones the data portion.
For the S/G Table buffer we use the same cache of buffers used on the
other non-GSO cases - dpaa2_eth_sgt_get() and dpaa2_eth_sgt_recycle().
We cannot keep a DMA coherent buffer for all the TSO headers because the
DPAA2 architecture does not work in a ring based fashion so we just
allocate a buffer each time.
Even with these limitations we get the following improvement in TCP
termination on the LX2160A SoC, on a single A72 core running at 2.2GHz.
before: 6.38Gbit/s
after: 8.48Gbit/s
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up until now, the __dpaa2_eth_tx function used a single FD on the stack
to construct the structure to be enqueued. Since we are now preparing
the ground work to add support for TSO done in software at the driver
level, the same function needs to work with an array of FDs and enqueue
as many as the build_*_fd functions create.
Make the necessary adjustments in order to do this. These include:
keeping an array of FDs in a percpu structure, cleaning up the necessary
FDs before populating it and then, retrying the enqueue process up till
all the generated FDs were enqueued or until we reach the maximum number
retries.
This patch does not change the fact that only a single FD will result
from a __dpaa2_eth_tx call but rather just creates the necessary changes
for the next patch.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of allocating memory for an S/G table each time a nonlinear skb
is processed, and then freeing it on the Tx confirmation path, use the
S/G table cache in order to reuse the memory.
For this to work we have to change the size of the cached buffers so
that it can hold the maximum number of scatterlist entries.
Other than that, each allocate/free call is replaced by a call to the
dpaa2_eth_sgt_get/dpaa2_eth_sgt_recycle functions, introduced in the
previous patch.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dpaa2-eth driver uses in certain circumstances a buffer cache for
the S/G tables needed in case of a S/G FD. At the moment, the
interraction with the cache is open-coded and couldn't be reused easily.
Add two new functions - dpaa2_eth_sgt_get and dpaa2_eth_sgt_recycle -
which help with code reusability.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of allocating memory and then manually aligning it to the
desired value use napi_alloc_frag_align() directly to streamline the
process.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the next patches we'll be moving things arroung in the mentioned
function and also add some new variable declarations. Before all this,
cleanup the variable declaration order.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam says:
====================
Priority flow control support for RVU netdev
In network congestion, instead of pausing all traffic on link
PFC allows user to selectively pause traffic according to its
class. This series of patches add support of PFC for RVU netdev
drivers.
Patch1 adds support to disable pause frames by default as
with PFC user can enable either PFC or 802.3 pause frames.
Patch2&3 adds resource management support for flow control
and configures necessary registers for PFC.
Patch4 adds dcb ops registration for netdev drivers.
V2 changes:
Fix compilation error by exporting required symbols 'otx2_config_pause_frm'
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Data centric bridging designed to eliminate packet loss due to
queue overflow by adding enhancements to ethernet network such as
proprity flow control etc. This patch adds support for management
of Priority flow control(PFC) on Octeontx2 and CN10K interfaces.
To enable PFC for all priorities
dcb pfc set dev eth0 prio-pfc all:on/off
To enable PFC on selected priorites
dcb pfc set dev eth0 prio-pfc 0:on/off 1:on/off ..7:on/off
With the ntuple commands user can map Priority to receive queues.
On queue overflow NIX will assert backpressure such that PFC pause frames
are genarated with mapped priority.
To map priority 7 to Queue 1
ethtool -U eth0 flow-type ether dst xx:xx:xx:xx:xx:xx vlan 0xe00a
m 0x1fff queue 1
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CN10K MAC block (RPM) and Octeontx2 MAC block (CGX) both supports
PFC flow control and 802.3X flow control pause frames.
Each MAC block supports max 4 LMACS and AF driver assigns same
(MAC,LMAC) to PF and its VFs. As PF and its share same (MAC,LMAC)
pair we need resource management to address below scenarios
1. Maintain PFC and 8023X pause frames mutually exclusive.
2. Reject disable flow control request if other PF or Vfs
enabled it.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prirority based flow control (802.1Qbb) mechanism is similar to
ethernet pause frames (802.3x) instead pausing all traffic on a link,
PFC allows user to selectively pause traffic according to its class.
Oceteontx2 MAC block (CGX) and CN10K Mac block (RPM) both supports
PFC. As upper layer mbox handler is same for both the MACs, this
patch configures PFC by calling apporopritate callbacks.
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current implementation is such that 802.3x pause frames are
enabled by default. As CGX and RPM blocks support PFC
(priority flow control) also, instead of driver enabling one
between them enable them upon request from PF or its VFs.
Also add support to disable pause frames in driver unbind.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeremy Kerr says:
====================
MCTP tag control interface
This series implements a small interface for userspace-controlled
message tag allocation for the MCTP protocol. Rather than leaving the
kernel to allocate per-message tag values, userspace can explicitly
allocate (and release) message tags through two new ioctls:
SIOCMCTPALLOCTAG and SIOCMCTPDROPTAG.
In order to do this, we first introduce some minor changes to the tag
handling, including a couple of new tests for the route input paths.
As always, any comments/queries/etc are most welcome.
v2:
- make mctp_lookup_prealloc_tag static
- minor checkpatch formatting fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This change adds a couple of new ioctls for mctp sockets:
SIOCMCTPALLOCTAG and SIOCMCTPDROPTAG. These ioctls provide facilities
for explicit allocation / release of tags, overriding the automatic
allocate-on-send/release-on-reply and timeout behaviours. This allows
userspace more control over messages that may not fit a simple
request/response model.
In order to indicate a pre-allocated tag to the sendmsg() syscall, we
introduce a new flag to the struct sockaddr_mctp.smctp_tag value:
MCTP_TAG_PREALLOC.
Additional changes from Jeremy Kerr <jk@codeconstruct.com.au>.
Contains a fix that was:
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, we require an exact match on an incoming packet's dest
address, and the key's local_addr field.
In a future change, we may want to set up a key before packets are
routed, meaning we have no local address to match on.
This change allows key lookups to match on local_addr = MCTP_ADDR_ANY.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, we have a couple of paths that check that an EID matches, or
the match value is MCTP_ADDR_ANY.
Rather than open coding this, add a little helper.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change adds a few more tests to check the key/tag lookups on route
input. We add a specific entry to the keys lists, route a packet with
specific header values, and check for key match/mismatch.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a definition for the tag-owner flag, which has TO as a standard
abbreviation. We'll want to add a helper for the actual tag value in a
future change.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-queue
Tony Nguyen says:
====================
40GbE Intel Wired LAN Driver Updates 2022-02-08
Joe Damato says:
This patch set makes several updates to the i40e driver stats collection
and reporting code to help users of i40e get a better sense of how the
driver is performing and interacting with the rest of the kernel.
These patches include some new stats (like waived and busy) which were
inspired by other drivers that track stats using the same nomenclature.
The new stats and an existing stat, rx_reuse, are now accessible with
ethtool to make harvesting this data more convenient for users.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
netvsc_device_remove() calls vunmap() inside which should not be
called in the interrupt context. Current code calls hv_unmap_memory()
in the free_netvsc_device() which is rcu callback and maybe called
in the interrupt context. This will trigger BUG_ON(in_interrupt())
in the vunmap(). Fix it via moving hv_unmap_memory() to netvsc_device_
remove().
Fixes: 846da38de0 ("net: netvsc: Add Isolation VM support for netvsc driver")
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
1GbE Intel Wired LAN Driver Updates 2022-02-07
Corinna Vinschen says:
Fix the kernel warning "Missing unregister, handled but fix driver"
when running, e.g.,
$ ethtool -G eth0 rx 1024
on igc. Remove memset hack from igb and align igb code to igc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Document Gigabit Ethernet IP found on RZ/G2UL SoC. Gigabit Ethernet
Interface is identical to one found on the RZ/G2L SoC. No driver changes
are required as generic compatible string "renesas,rzg2l-gbeth" will be
used as a fallback.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Document Gigabit Ethernet IP found on RZ/V2L SoC. Gigabit Ethernet
Interface is identical to one found on the RZ/G2L SoC. No driver changes
are required as generic compatible string "renesas,rzg2l-gbeth" will be
used as a fallback.
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
netlink_dump structure has a union of 'long args[6]' and a context
buffer as scratch space.
Convert ctnetlink to use a structure, its easier to read than the
raw 'args' usage which comes with no type checks and no readable names.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This is a follow up of the previous patch that enables to get
skb->priority. It's now posssible to set it also.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Florian Westphal <fw@strlen.de>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Allow up to 16-byte comparisons with a new cmp fast version. Use two
64-bit words and calculate the mask representing the bits to be
compared. Make sure the comparison is 64-bit aligned and avoid
out-of-bound memory access on registers.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Instead of two exported functions, export a single option structure.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
For updating eache missed value we can use cmpxchg.
This also avoids need to disable BH.
kernel robot reported build failure on v1 because not all arches support
cmpxchg for u16, so extend this to u32.
This doesn't increase struct size, existing padding is used.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Hengqi Chen says:
====================
Add new macro BPF_KPROBE_SYSCALL, which provides easy access to syscall
input arguments. See [0] and [1] for background.
[0]: https://github.com/libbpf/libbpf-bootstrap/issues/57
[1]: https://github.com/libbpf/libbpf/issues/425
v2->v3:
- Use PT_REGS_SYSCALL_REGS
- Move selftest to progs/bpf_syscall_macro.c
v1->v2:
- Use PT_REGS_PARM2_CORE_SYSCALL instead
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Add syscall-specific variant of BPF_KPROBE named BPF_KPROBE_SYSCALL ([0]).
The new macro hides the underlying way of getting syscall input arguments.
With the new macro, the following code:
SEC("kprobe/__x64_sys_close")
int BPF_KPROBE(do_sys_close, struct pt_regs *regs)
{
int fd;
fd = PT_REGS_PARM1_CORE(regs);
/* do something with fd */
}
can be written as:
SEC("kprobe/__x64_sys_close")
int BPF_KPROBE_SYSCALL(do_sys_close, int fd)
{
/* do something with fd */
}
[0] Closes: https://github.com/libbpf/libbpf/issues/425
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220207143134.2977852-2-hengqi.chen@gmail.com
Ilya Leoshkevich says:
====================
libbpf now has macros to access syscall arguments in an
architecture-agnostic manner, but unfortunately they have a number of
issues on non-Intel arches, which this series aims to fix.
v1: https://lore.kernel.org/bpf/20220201234200.1836443-1-iii@linux.ibm.com/
v1 -> v2:
* Put orig_gpr2 in place of args[1] on s390 (Vasily).
* Fix arm64, powerpc and riscv (Heiko).
v2: https://lore.kernel.org/bpf/20220204041955.1958263-1-iii@linux.ibm.com/
v2 -> v3:
* Undo args[1] change (Andrii).
* Rename PT_REGS_SYSCALL to PT_REGS_SYSCALL_REGS (Andrii).
* Split the riscv patch (Andrii).
v3: https://lore.kernel.org/bpf/20220204145018.1983773-1-iii@linux.ibm.com/
v3 -> v4:
* Undo arm64's and s390's user_pt_regs changes.
* Use struct pt_regs when vmlinux.h is available (Andrii).
* Use offsetofend for accessing orig_gpr2 and orig_x0 (Andrii).
* Move libbpf's copy of offsetofend to a new header.
* Fix riscv's __PT_FP_REG.
* Use PT_REGS_SYSCALL_REGS in test_probe_user.c.
* Test bpf_syscall_macro with userspace headers.
* Use Naveen's suggestions and code in patches 5 and 6.
* Add warnings to arm64's and s390's ptrace.h (Andrii).
v4: https://lore.kernel.org/bpf/20220208051635.2160304-1-iii@linux.ibm.com/
v4 -> v5:
* Go back to v3.
* Do not touch arch headers.
* Use CO-RE struct flavors to access orig_x0 and orig_gpr2.
* Fail compilation if non-CO-RE macros are used to access the first
syscall parameter on arm64 and s390.
* Fix accessing frame pointer on riscv.
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
On s390, the first syscall argument should be accessed via orig_gpr2
(see arch/s390/include/asm/syscall.h). Currently gpr[2] is used
instead, leading to bpf_syscall_macro test failure.
orig_gpr2 cannot be added to user_pt_regs, since its layout is a part
of the ABI. Therefore provide access to it only through
PT_REGS_PARM1_CORE_SYSCALL() by using a struct pt_regs flavor.
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220209021745.2215452-11-iii@linux.ibm.com
On arm64, the first syscall argument should be accessed via orig_x0
(see arch/arm64/include/asm/syscall.h). Currently regs[0] is used
instead, leading to bpf_syscall_macro test failure.
orig_x0 cannot be added to struct user_pt_regs, since its layout is a
part of the ABI. Therefore provide access to it only through
PT_REGS_PARM1_CORE_SYSCALL() by using a struct pt_regs flavor.
Reported-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220209021745.2215452-10-iii@linux.ibm.com