If a DCBx update occurs while QPs are open, stop sending EDPMs
until all QPs are closed.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Configure the device according to the DCBx results so that EDPMs
issued by RoCE honor flow control.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
iWARP requires the chains to allocate/free their PBL memory
independently, so add the infrastructure to provide it externally.
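As a rough illustration (struct and field names as assumed here, not
necessarily the final API), the external-PBL hook could look like:

	/* Sketch: an externally provided PBL that the chain
	 * allocation consumes instead of allocating its own.
	 */
	struct qed_chain_ext_pbl {
		dma_addr_t p_pbl_phys;	/* DMA address of the PBL */
		void *p_pbl_virt;	/* CPU mapping of the PBL */
	};

A caller such as iWARP would pass this descriptor into the chain
allocation, which then skips its own PBL DMA allocation and leaves
the PBL's lifetime to the caller.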
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During module initialisation there is a possible race
(basically a race between the ULD and the LLD) where neither
the ULD nor the LLD notifies the uP about where to route the
ctrl queue completions. The LLD skips notifying the uP because
the rdma queues were not created by then (it leaves that to
the ULD). As the ULD comes up, it also skips notifying the uP
because the FULL_INIT_DONE flag is not set yet (the ULD assumes
that the interface is not up yet).
Consequently, this race between the ULD and the LLD leaves the
uP unnotified about where to send the ctrl queue completions,
leading to iWARP RI_RES WR failure.
Here is the race:
CPU 0                                   CPU 1

- allocates nic rx queues
- t4_sge_alloc_ctrl_txq()
  (if rdma rsp queues exist,
   tell uP to route ctrl queue
   compl to rdma rspq)
                                        - acquires the mutex_lock
                                        - allocates rdma response
                                          queues
                                        - if FULL_INIT_DONE set,
                                          tell uP to route ctrl
                                          queue compl to rdma rspq
                                        - relinquishes mutex_lock
- acquires the mutex_lock
- enable_rx()
- set FULL_INIT_DONE
- relinquishes mutex_lock
This patch fixes the above issue.
Fixes: e7519f9926f1 ("cxgb4: avoid enabling napi twice to the same queue")
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
CC: Stable <stable@vger.kernel.org> # 4.9+
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a general-use per-vNIC mailbox area and use it for VLAN filtering
support. Initially the protocol is hardcoded to 802.1Q.
Signed-off-by: Pablo Cascón <pablo.cascon@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid a NULL dereference in setup_sge_queues() when the adapter is
in non-offload mode.
Fixes: 0fbc81b3ad ("chcr/cxgb4i/cxgbit/RDMA/cxgb4: Allocate resources dynamically for all cxgb4 ULD's")
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each Octeon output ring can DMA packets to host memory in two modes:
info-pointer mode and buffer-pointer-only mode. In info-pointer mode,
Octeon takes two buffer pointers for each packet and places the length
of the packet, along with a specified number of bytes from the
beginning of the packet, into one buffer and the rest of the packet
into a separate buffer. In buffer-pointer-only mode, Octeon takes a
single buffer pointer and places the length of the packet at the
beginning of the buffer, followed by the packet data.
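A hedged sketch of the buffer-pointer-only layout described above
(struct and field names hypothetical):

	/* Hypothetical layout: a single host buffer in which Octeon
	 * writes the packet length first, then the packet data
	 * immediately after it.
	 */
	struct octeon_recv_buf {
		u64 pkt_len;	/* packet length, written by Octeon */
		u8 data[];	/* packet bytes follow the length */
	};

In info-pointer mode the same information is split across two buffers
(length plus the leading bytes in one, the remainder of the packet in
another), which is why that mode costs an extra DMA setup per packet.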
This patch switches all Octeon output rings from info-pointer mode to
buffer-pointer-only mode. This results in fewer DMA setups and cache line
snoops.
Signed-off-by: Prasad Kanneganti <pkanneganti@cavium.com>
Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Satanand Burla <satananda.burla@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an implementation to support ethtool -K ethX rx-vlan-filter on/off.
Rename OCTNET_CMD_ENABLE_VLAN_FILTER command to OCTNET_CMD_VLAN_FILTER_CTL
and add OCTNET_CMD_VLAN_FILTER_ENABLE and OCTNET_CMD_VLAN_FILTER_DISABLE
parameters so that it can be used to enable or disable the filter.
Signed-off-by: Prasad Kanneganti <prasad.kanneganti@cavium.com>
Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'mlx5-updates-2017-06-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox mlx5 updates and cleanups 2017-06-16
mlx5-updates-2017-06-16
This series provides some updates and cleanups for the mlx5 core and
netdevice driver.
From Eli Cohen, add a missing event string.
From Or Gerlitz, some checkpatch cleanups.
From Moni Shoua, disable HW-level LAG when SRIOV is enabled.
From Tariq Toukan, a code-reuse cleanup in the aRFS flow.
From Itay Aveksis, a typo fix.
From Gal Pressman, ethtool statistics updates and "update stats" deferred-work optimizations.
From Majd Dibbiny, fast unload support on kernel shutdown.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When booting into the kdump/kexec kernel, pHyp and VIOS
are not prepared for the initialization CRQ request, and
a failover transport event is generated. This is not
handled correctly.
At this point in initialization the driver is still in
the 'probing' state and cannot handle a full reset of the
driver, as is normally done for a failover transport event.
To correct this, we catch driver resets while still in the
'probing' state and return EAGAIN. This results in the
driver tearing down the main CRQ and calling ibmvnic_init()
again.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cosmetic patch simplifying the SMI read and write error paths. It also
aligns their error paths with those of the xSMI functions.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds xMDIO xSMI interface support to the mvmdio driver.
This interface is used by the Ethernet controllers on Marvell 370, 7k
and 8k (as of now). The xSMI interface supported by this driver
complies with IEEE 802.3 clause 45, and is used by 10GbE devices.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a check to the SMI read and write operations to ensure the
MII_ADDR_C45 bit isn't set. This will be needed as soon as xSMI
support is added to the mvmdio driver.
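A minimal sketch of the check, assuming the usual mdiobus accessor
shape (MII_ADDR_C45 is the standard clause 45 flag bit in the register
argument):

	static int orion_mdio_smi_read(struct mii_bus *bus, int mii_id,
				       int regnum)
	{
		/* The SMI (clause 22) path cannot address clause 45
		 * registers, so refuse them here.
		 */
		if (regnum & MII_ADDR_C45)
			return -EOPNOTSUPP;

		/* ... existing clause 22 access follows ... */
		return 0;
	}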
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Put the two poll intervals (min and max) in the driver's ops
structure. This is needed to add xMDIO support later.
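Together with the is_done indirection introduced by the companion
patch below, the ops structure might look roughly like this (a
sketch; field names assumed):

	struct orion_mdio_ops {
		/* sketch; per-bus-flavour completion check */
		int (*is_done)(struct orion_mdio_dev *dev);
		unsigned int poll_interval_min;	/* in us */
		unsigned int poll_interval_max;	/* in us */
	};

Each bus flavour (SMI, and later xSMI) then provides its own instance
with its own completion check and timing.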
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce an ops structure to add an indirection on the is_done
function; this is needed to add xMDIO support later.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MDIO layer already provides per-bus locking, so there's no need for
MDIO bus drivers to do their own internal locking. Remove this.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cosmetic patch to use the GENMASK helper for masks.
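For instance (the register field name here is hypothetical),
GENMASK(15, 0) expands to 0x0000ffff and makes the field's width and
position explicit:

	-#define MVMDIO_SMI_DATA_MASK	0x0000ffff
	+#define MVMDIO_SMI_DATA_MASK	GENMASK(15, 0)	/* bits 15..0 */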
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cosmetic patch replacing spaces with tabs for defined values.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cosmetic fix reordering headers alphabetically.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to qede to report bpf_prog ID during XDP_QUERY_PROG.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Mintz Yuval <Yuval.Mintz@cavium.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to nfp to report bpf_prog ID during XDP_QUERY_PROG.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to ixgbe to report bpf_prog ID during XDP_QUERY_PROG.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to thunderx to report bpf_prog ID during XDP_QUERY_PROG.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Sunil Goutham <sgoutham@cavium.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to bnxt to report bpf_prog ID during XDP_QUERY_PROG.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to mlx5e to report bpf_prog ID during XDP_QUERY_PROG.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to mlx4 to report bpf_prog ID during XDP_QUERY_PROG.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.
Make these functions return void * and remove all the casts across
the tree, adding a (u8 *) cast only where the unsigned char pointer
was used directly, all done with the following spatch:
@@
expression SKB, LEN;
typedef u8;
identifier fn = { skb_push, __skb_push, skb_push_rcsum };
@@
- *(fn(SKB, LEN))
+ *(u8 *)fn(SKB, LEN)
@@
expression E, SKB, LEN;
identifier fn = { skb_push, __skb_push, skb_push_rcsum };
type T;
@@
- E = ((T *)(fn(SKB, LEN)))
+ E = fn(SKB, LEN)
@@
expression SKB, LEN;
identifier fn = { skb_push, __skb_push, skb_push_rcsum };
@@
- fn(SKB, LEN)[0]
+ *(u8 *)fn(SKB, LEN)
Note that the last part there converts from push(...)[0] to the
more idiomatic *(u8 *)push(...).
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.
Make these functions (skb_put, __skb_put and pskb_put) return void *
and remove all the casts across the tree, adding a (u8 *) cast only
where the unsigned char pointer was used directly, all done with the
following spatch:
@@
expression SKB, LEN;
typedef u8;
identifier fn = { skb_put, __skb_put };
@@
- *(fn(SKB, LEN))
+ *(u8 *)fn(SKB, LEN)
@@
expression E, SKB, LEN;
identifier fn = { skb_put, __skb_put };
type T;
@@
- E = ((T *)(fn(SKB, LEN)))
+ E = fn(SKB, LEN)
which actually doesn't cover pskb_put since there are only three
users overall.
A handful of stragglers were converted manually, notably a macro in
drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
instances in net/bluetooth/hci_sock.c. In the former file, I also
had to fix one whitespace problem spatch introduced.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A common pattern with skb_put() is to just want to memcpy()
some data into the new space; introduce skb_put_data() for
this.
An spatch similar to the one for skb_put_zero() converts many
of the places using it:
@@
identifier p, p2;
expression len, skb, data;
type t, t2;
@@
(
-p = skb_put(skb, len);
+p = skb_put_data(skb, data, len);
|
-p = (t)skb_put(skb, len);
+p = skb_put_data(skb, data, len);
)
(
p2 = (t2)p;
-memcpy(p2, data, len);
|
-memcpy(p, data, len);
)
@@
type t, t2;
identifier p, p2;
expression skb, data;
@@
t *p;
...
(
-p = skb_put(skb, sizeof(t));
+p = skb_put_data(skb, data, sizeof(t));
|
-p = (t *)skb_put(skb, sizeof(t));
+p = skb_put_data(skb, data, sizeof(t));
)
(
p2 = (t2)p;
-memcpy(p2, data, sizeof(*p));
|
-memcpy(p, data, sizeof(*p));
)
@@
expression skb, len, data;
@@
-memcpy(skb_put(skb, len), data, len);
+skb_put_data(skb, data, len);
(again, manually post-processed to retain some comments)
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There were many places that my previous spatch didn't find,
as pointed out by yuan linyu in various patches.
The following spatch found many more and also removes the
now unnecessary casts:
@@
identifier p, p2;
expression len;
expression skb;
type t, t2;
@@
(
-p = skb_put(skb, len);
+p = skb_put_zero(skb, len);
|
-p = (t)skb_put(skb, len);
+p = skb_put_zero(skb, len);
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, len);
|
-memset(p, 0, len);
)
@@
type t, t2;
identifier p, p2;
expression skb;
@@
t *p;
...
(
-p = skb_put(skb, sizeof(t));
+p = skb_put_zero(skb, sizeof(t));
|
-p = (t *)skb_put(skb, sizeof(t));
+p = skb_put_zero(skb, sizeof(t));
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, sizeof(*p));
|
-memset(p, 0, sizeof(*p));
)
@@
expression skb, len;
@@
-memset(skb_put(skb, len), 0, len);
+skb_put_zero(skb, len);
Apply it to the tree (with one manual fixup to keep the
comment in vxlan.c, which spatch removed.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some code reordering, functionally equivalent.
- The !tx_info->inl check is evaluated anyway in both flows
  (common case/end case). Run it first, as this might finish
  the flow earlier.
- The dma_unmap calls are identical in both flows; move them
  out of the if block into the common area.
Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Gain is too small to be measurable, no degradation sensed.
Results are similar for IPv4 and IPv6.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define LOG_TXBB_SIZE, the log of TXBB_SIZE, and use it with a shift
operation instead of a multiplication by TXBB_SIZE.
The operations are equivalent, as TXBB_SIZE is a power of two.
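Since TXBB_SIZE is a power of two, both forms below compute the same
byte offset; a sketch (helper names illustrative, TXBB_SIZE being 64
in this driver):

	#define TXBB_SIZE	64	/* basic block size */
	#define LOG_TXBB_SIZE	6	/* ilog2(TXBB_SIZE) */

	/* Equivalent ways to turn a TXBB count into a byte offset: */
	static inline u32 txbbs_to_bytes_mul(u32 cnt)
	{
		return cnt * TXBB_SIZE;
	}

	static inline u32 txbbs_to_bytes_shl(u32 cnt)
	{
		return cnt << LOG_TXBB_SIZE;
	}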
Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Gain is too small to be measurable, no degradation sensed.
Results are similar for IPv4 and IPv6.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Increase the default TX ring size (from 512 to 1024) to match
the RX ring size.
This gives the XDP TX ring a better chance to keep up with the
rate of its RX ring in case of a high load of XDP_TX actions.
Tested:
The ethtool counter rx_xdp_tx_full used to increase; after applying
this patch it stopped.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of having their own NAPIs, XDP TX completion queues are now
polled within the corresponding RX NAPI.
This prevents any possible race on the TX ring prod/cons indices
between the context that issues the transmits (the RX NAPI) and the
context that handles the completions (previously a separate NAPI).
This also improves performance, as it decreases the number of NAPIs
running on a CPU, saving the overhead of syncing and switching
between the contexts.
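A hedged sketch of the resulting poll flow (simplified signatures;
the xdp_tx_cq linkage is an assumption for illustration): the RX NAPI
handler reaps XDP TX completions first, so both rings are only ever
touched from one context:

	static int mlx4_en_poll_rx_cq(struct napi_struct *napi, int budget)
	{
		struct mlx4_en_cq *cq = container_of(napi, struct mlx4_en_cq,
						     napi);

		/* Poll the XDP TX completion queue here instead of in
		 * its own NAPI, so the TX prod/cons indices are never
		 * raced. (xdp_tx_cq linkage assumed for the sketch.)
		 */
		if (cq->xdp_tx_cq)
			mlx4_en_process_tx_cq(cq->xdp_tx_cq);

		/* RX processing below may issue new XDP_TX transmits. */
		return mlx4_en_process_rx_cq(cq, budget);
	}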
Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Single queue no-RSS optimization ON.
XDP_TX packet rate:
-------------------------------------
| Before | After | Gain |
IPv4 | 12.0 Mpps | 13.8 Mpps | 15% |
IPv6 | 12.0 Mpps | 13.8 Mpps | 15% |
-------------------------------------
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several performance improvements in the XDP TX datapath,
including:
- Ring a single doorbell for the XDP TX ring per NAPI budget,
  instead of ringing it per a lower threshold (previously 8);
  see the sketch below. This includes removing the flow of
  immediate doorbell ringing in case of a full TX ring.
- Compiler branch predictor hints.
- Calculate values at compile time rather than at runtime.
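A minimal sketch of the deferred doorbell (field and helper names
assumed): XDP_TX transmits during the poll only mark the ring, and
the doorbell is rung once per NAPI budget:

	/* Transmit path: queue the descriptor, defer the doorbell.
	 * (doorbell_pending field assumed for the sketch.)
	 */
	ring->doorbell_pending = true;

	/* End of the RX NAPI poll: one doorbell for the whole budget. */
	if (ring->doorbell_pending) {
		mlx4_en_xmit_doorbell(ring);
		ring->doorbell_pending = false;
	}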
Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Single queue no-RSS optimization ON.
XDP_TX packet rate:
-------------------------------------
| Before | After | Gain |
IPv4 | 10.3 Mpps | 12.0 Mpps | 17% |
IPv6 | 10.3 Mpps | 12.0 Mpps | 17% |
-------------------------------------
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several small code and performance improvements in the stack TX
datapath, including:
- Compiler branch predictor hints.
- Minimize variable scope.
- Move tx_info non-inline flow handling to a separate function.
- Calculate data_offset at compile time rather than at runtime
  (for the !lso_header_size branch).
- Avoid the ternary operator ("?") when the value can be preset in a
  matching branch.
Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Gain is too small to be measurable, no degradation sensed.
Results are similar for IPv4 and IPv6.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several small performance improvements in TX CQ polling,
including:
- Compiler branch predictor hints.
- Minimize variable scope.
- A more proper check of the CQ type.
- Use a boolean instead of an int for a binary indication.
Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Packet-rate tests for both regular stack and XDP use cases:
No noticeable gain, no degradation.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several small performance improvements in the RX datapath,
including:
- Compiler branch predictor hints.
- Replace a multiplication with a shift operation.
- Minimize variable scope.
- Write-prefetch for the packet header.
- Avoid the ternary operator ("?") when the value can be preset in a
  matching branch.
- Save a branch by updating the RX ring doorbell within
  mlx4_en_refill_rx_buffers(), which now returns void.
Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Single queue no-RSS optimization ON
(enable by ethtool -L <interface> rx 1).
XDP_DROP packet rate:
Same (28.1 Mpps), lower CPU utilization (from ~100% to ~92%).
Drop packets in TC:
-------------------------------------
| Before | After | Gain |
IPv4 | 4.14 Mpps | 4.18 Mpps | 1% |
-------------------------------------
XDP_TX packet rate:
-------------------------------------
| Before | After | Gain |
IPv4 | 10.1 Mpps | 10.3 Mpps | 2% |
IPv6 | 10.1 Mpps | 10.3 Mpps | 2% |
-------------------------------------
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the owner argument, as it is obsolete and unused.
This also saves the overhead of calculating its value in the data path.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for flushing all HW resources with one FW command and
skipping all of the driver's heavy unload flows on kernel shutdown.
There's no need to free all the SW context, since a fresh kernel
will be loaded afterwards.
The FW resources, however, should be closed; otherwise we will have
leakage in the FW. To accelerate this flow, we execute one command
at the beginning that tells the FW that the driver isn't going to
close any of the FW resources and asks the FW to clean up everything
itself.
Once the command completes, it's safe to close the PCI resources and
finish the routine.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add a new interface for command execution that allows the
caller to wait for the command's completion in a busy-wait
loop (polling mode).
This is useful when we want to execute a command in polling mode
while the driver is working in events mode for the rest of
the commands.
This interface will be used in the downstream patches.
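The new entry point is expected to mirror the existing
mlx5_cmd_exec() shape, along these lines (signature as assumed here):

	/* Execute a command and busy-wait (poll) for its completion,
	 * even while the command interface otherwise runs in events
	 * mode. (Exact signature assumed for this sketch.)
	 */
	int mlx5_cmd_exec_polling(struct mlx5_core_dev *dev,
				  void *in, int in_size,
				  void *out, int out_size);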
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Unlike ethtool stats, the get_stats ndo provides information cached
by the update-stats work running in the background, without updating
it explicitly.
We cannot update all the counters inside the ndo, because some
updates require firmware commands that cannot be performed under a
spinlock.
The update_stats work does not need to update ALL counters, since
only some of them are needed by ndo_get_stats.
This patch allows a minimal run of update_stats via an extra
parameter, updating only the necessary counters and cutting 13
firmware commands from each iteration of the work.
Work duration prior to this patch: ~4200us.
Work duration after this patch: ~700us (17% of the original time).
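A hedged sketch of the split (parameter and helper names
hypothetical): the periodic work passes false and refreshes only what
ndo_get_stats reads, while explicit requests such as ethtool take the
full path:

	void mlx5e_update_stats(struct mlx5e_priv *priv, bool full)
	{
		if (full)
			/* the ~13 extra FW queries skipped by the
			 * periodic work (hypothetical helper)
			 */
			mlx5e_update_remaining_stats(priv);

		/* counters ndo_get_stats needs (hypothetical helper) */
		mlx5e_update_ndo_stats(priv);
	}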
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Move "query queue counter out of buffer" helper function out of
qp.c to en_main.c, since mlx5e netdev driver is the only one to use it.
Also allocate the output buffer on the stack instead of the heap, to reduce
number of heap allocs on update_stats work.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Allocating buffers on the heap every 200ms is something we should
avoid; let's use buffers located on the stack instead.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
Rename rx_symbol_errors_phy to rx_pcs_symbol_err_phy, in order to
prevent confusion with the rx_symbol_err_phy counter.
rx_pcs_symbol_err_phy counts the number of symbol errors detected on
the PCS (regardless of traffic) that either weren't corrected by the
FEC correction algorithm or occurred while the FEC algorithm was not
active on this interface.
rx_symbol_err_phy refers to errors at the packet level (a physical
error during a packet receive).
Fixes: 5db0a4f64c ("net/mlx5e: Expose physical layer statistical...")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: kernel-team@fb.com
For better code reuse and readability, use the existing
function arfs_get_tt() to map arfs_type to mlx5e_traffic_types,
instead of duplicating the switch-case logic.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
LAG cannot work if virtual functions are present; therefore, if LAG
is configured, an attempt to create virtual functions will fail. This
gives precedence to LAG over SRIOV, which is not the desired
behavior, as users might want to use the bonding/teaming driver and
also work with SRIOV. In that case we don't want to force an order of
actions: first create the virtual functions and only then configure a
bonding/teaming net device.
To fix this, if LAG is configured during a request to create virtual
functions, remove it and continue.
We ignore ENODEV when trying to forbid LAG; this makes sense
because "No such device" means that LAG is forbidden anyway.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>