Handling flow encap entry should be inside tc del flow
and is only relevant for offloaded eswitch TC rules.
Fixes: 11a457e9b6c1 ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the function that deletes offloaded TC rule to get
struct mlx5e_tc_flow instance which contains both the flow
handle and flow attributes. This is a cleanup needed for
downstream patches, it doesn't change any functionality.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the reverse unwinding principle, on delete time we should
first handle deletion of the steering rule and later handle the vlan
deletion from the eswitch.
Fixes: 8b32580df1 ("net/mlx5e: Add TC vlan action for SRIOV offloads")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We will never find a flow with the same cookie as cls_flower always
allocates a new flow and the cookie is the allocated memory address.
Fixes: e3a2b7ed01 ("net/mlx5e: Support offload cls_flower with drop action")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In Striding RQ implementation, we used a single UMR
(User-Mode Memory Registration) memory key for all RQs.
When the product of RQs number*size gets high, we hit a
limitation of u16 field size in FW.
Here we move to using a UMR memory key per RQ, so we can
scale to any number of rings, with the maximum buffer
size in each.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In next patch we are going to create a UMR MKey per RQ, we need
mlx5e_create_umr_mkey declared before mlx5e_create_rq.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new type of struct mlx5_frag_buf which is used to allocate fragmented
buffers rather than contiguous, and make the Completion Queues (CQs) use
it as they are big (default of 2MB per CQ in Striding RQ).
This fixes the failures of type:
"mlx5e_open_locked: mlx5e_open_channels failed, -12"
due to dma_zalloc_coherent insufficient contiguous coherent memory to
satisfy the driver's request when the user tries to setup more or larger
rings.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In alloc_cmd_box(), pci_pool_alloc() followed by memset will be
replaced by pci_pool_zalloc()
Signed-off-by: Souptick joarder <jrdr.linux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In mlx4_alloc_cmd_mailbox(), pci_pool_alloc() followed by memset will be
replaced by pci_pool_zalloc()
Signed-off-by: Souptick joarder <jrdr.linux@gmail.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We call bus->init() before allocating 'lag.mapping'. Change the order of
operations in removal path to reflect that.
This makes the error path of mlxsw_core_bus_device_register() symmetric
with mlxsw_core_bus_device_unregister().
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without this rollback, the thermal zone is still registered during the
error path, whereas its private data is freed upon the destruction of
the underlying bus device due to the use of devm_kzalloc(). This results
in use after free.
Fix this by calling mlxsw_thermal_fini() from the appropriate place in
the error path.
Fixes: a50c1e3565 ("mlxsw: core: Implement thermal zone")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The shared buffer pools are containers whose size is used to calculate
the maximum usage for packets from / to a specific port / {port, PG/TC},
when dynamic threshold is employed.
While it's perfectly fine for the sum of the pools to exceed the maximum
size of the shared buffer, a single pool cannot.
Add a check when the pool size is set and forbid sizes larger than the
maximum size of the shared buffer.
Without the patch:
$ devlink sb pool set pci/0000:03:00.0 pool 0 size 999999999 thtype
dynamic
// No error is returned
With the patch:
$ devlink sb pool set pci/0000:03:00.0 pool 0 size 999999999 thtype
dynamic
devlink answers: Invalid argument
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to be able to limit the size of shared buffer pools, so query
the maximum size from the device during init.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The newly added switchib driver fails to link if MLXSW_PCI=m:
drivers/net/ethernet/mellanox/mlxsw/mlxsw_switchib.o: In function^Cmlxsw_sib_module_exit':
switchib.c:(.exit.text+0x8): undefined reference to `mlxsw_pci_driver_unregister'
switchib.c:(.exit.text+0x10): undefined reference to `mlxsw_pci_driver_unregister'
drivers/net/ethernet/mellanox/mlxsw/mlxsw_switchib.o: In function `mlxsw_sib_module_init':
switchib.c:(.init.text+0x28): undefined reference to `mlxsw_pci_driver_register'
switchib.c:(.init.text+0x38): undefined reference to `mlxsw_pci_driver_register'
switchib.c:(.init.text+0x48): undefined reference to `mlxsw_pci_driver_unregister'
The other two such sub-drivers have a dependency, so add the same one
here. In theory we could allow this driver if MLXSW_PCI is disabled,
but it's probably not worth it.
Fixes: d1ba526384 ("mlxsw: switchib: Introduce SwitchIB and SwitchIB silicon driver")
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx4 stats are chaotic because a deferred work queue is responsible
to update them every 250 ms.
Even sampling stats every one second with "sar -n DEV 1" gives
variations like the following :
lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
07:39:22 eth0 146877.00 3265554.00 9467.15 4828168.50
07:39:23 eth0 146587.00 3260329.00 9448.15 4820445.98
07:39:24 eth0 146894.00 3259989.00 9468.55 4819943.26
07:39:25 eth0 110368.00 2454497.00 7113.95 3629012.17 <<>>
07:39:26 eth0 146563.00 3257502.00 9447.25 4816266.23
07:39:27 eth0 145678.00 3258292.00 9389.79 4817414.39
07:39:28 eth0 145268.00 3253171.00 9363.85 4809852.46
07:39:29 eth0 146439.00 3262185.00 9438.97 4823172.48
07:39:30 eth0 146758.00 3264175.00 9459.94 4826124.13
07:39:31 eth0 146843.00 3256903.00 9465.44 4815381.97
Average: eth0 142827.50 3179259.70 9206.30 4700578.16
This patch allows rx/tx bytes/packets counters being folded at the
time we need stats.
We now can fetch stats every 1 ms if we want to check NIC behavior
on a small time window. It is also easier to detect anomalies.
lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
07:42:50 eth0 142915.00 3177696.00 9212.06 4698270.42
07:42:51 eth0 143741.00 3200232.00 9265.15 4731593.02
07:42:52 eth0 142781.00 3171600.00 9202.92 4689260.16
07:42:53 eth0 143835.00 3192932.00 9271.80 4720761.39
07:42:54 eth0 141922.00 3165174.00 9147.64 4679759.21
07:42:55 eth0 142993.00 3207038.00 9216.78 4741653.05
07:42:56 eth0 141394.06 3154335.64 9113.85 4663731.73
07:42:57 eth0 141850.00 3161202.00 9144.48 4673866.07
07:42:58 eth0 143439.00 3180736.00 9246.05 4702755.35
07:42:59 eth0 143501.00 3210992.00 9249.99 4747501.84
Average: eth0 142835.66 3182165.93 9206.98 4704874.08
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The user can now override the automatic driver decision using the
rx_cqe_compress flag, which is the preference for CQE compression.
The flag is initialized with the automatic driver decision.
Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pflags is a configuration parameter for the netdev, naturally it belongs
to priv->params.
Also introduce MLX5E_GET_PFLAG
Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the self diagnostic tests to support loopback test.
The loopback test doesn't require the offline flag, it will use the
generic dev_queue_xmit and a dedicated packet_type to capture and verify
mlx5e selftest loopback packets.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The self diagnostics test implementaion include the following features:
1. Link Test: Check that link is in up state.
2. Speed Test: Check that link was negotiated correctly.
3. Health Test: Check the device health.
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use setdcbx interface to set the DCBX mode to firmware or os.
If setdcbx is called with mode value of zero, the DCBX mode
is set to firmware.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DBCX by default is controlled by firmware where dcbx capability bit
is set. In this mode, firmware is responsible for reading/sending the
TLV packets from/to the remote partner.
This patch sets up the infrastructure to move between HOST/FW DCBX
control mode.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add set/query commands for DCBX_PARAM register
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Issue description:
Current implementation saves the ETS settings from user in
a temporal soft copy and returns this settings when user
queries the ETS settings.
With the new DCBX firmware, the ETS settings can be changed
by firmware when the DCBX is in firmware controlled mode. Therefore,
user will obtain wrong values from the temporal soft copy.
Solution:
1. Read the ETS settings directly from firmware.
2. For tc_tsa:
a. Initialize tc_tsa to vendor IEEE_8021QAZ_TSA_VENDOR at netdev
creation.
b. When reading ETS setting from FW, if the traffic class bandwidth
is less than 100, set tc_tsa to IEEE_8021QAZ_TSA_ETS. This
implementation solves the scenarios when the DCBX is in FW control
and willing bit is on which means the ETS setting is dictated
by remote switch.
Also check ETS capability where needed.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add DCBX CEE API interface for ConnectX-4. Configurations are stored in
a temporary structure and are applied to the card's firmware when
the CEE's setall callback function is called.
Note:
priority group in CEE is equivalent to traffic class in ConnectX-4
hardware spec.
bw allocation per priority in CEE is not supported because ConnectX-4
only supports bw allocation per traffic class.
user priority in CEE does not have an equivalent term in ConnectX-4.
Therefore, user priority to priority mapping in CEE is not supported.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure firmware supports qos before exposing the DCB API.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Per RX ring packets/bytes counters are not protected by global
priv->stats_lock.
Better not confuse the reader, and use READ_ONCE() to show we read
these counters without surrounding synchronization.
Interrupt moderation is best effort, and we do not really care of
ultra precise counters.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
udplite conflict is resolved by taking what 'net-next' did
which removed the backlog receive method assignment, since
it is no longer necessary.
Two entries were added to the non-priv ethtool operations
switch statement, one in 'net' and one in 'net-next, so
simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Configure policers and connect them to trap groups.
While many trap groups share policer's configuration they don't share
the actual policer because each trap group represents a different
flow / protocol and we don't want one of them to be able to exceed its
rate on behalf of another.
For example, if STP and LLDP gets to send 128 packets/sec each, if we
put them in one 256 packets/sec policer, one can send 200 packets while
the other only 50.
Note that IP2ME covers lots of flows, so it's limit is set to match the
cpu ability to handle data.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The QPCR register is used to create and control policers.
A policer can discard or change the color of packets that are
trapped by a specific trap.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new resource to resources query: max cpu policers which tells us how
many policers can be used to limit the data rate to the cpu port.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trap groups can be used to control traps priority, both in terms of
which trap "wins" if a packet matches two traps (priority) and in terms
of packets from which trap group will be scheduled to the cpu first (tc).
They can also be used to set rate limiters (policers) on them (will be
added in the next patches).
Currently, we support two trap groups. In Spectrum we want a better
resolution, so every protocol / flow will have a different trap group,
so we can control its parameters separately. Once the policers will be
implemented, it will also allow us limit the rate of each protocol by
itself.
This patch change the trap group list to include:
* the emad trap group, which is shared for all the devices.
* Switchx2's trap groups, which are a copy of the current trap groups.
* Spectrum's new trap groups, in order to match the above guidelines.
(Switchib is using only the emad trap group, so it require no changes).
This patch also includes new configuration for Spectrum's trap groups,
with primary priority order within them.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a trap for BGP protocol that was previously trapped by the generic trap
for IP2ME. This trap will allow us to have better control (over priority
and rate) of the traffic.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trap groups have many options which we currently set to default values.
In the next patches we will use many of them with non-default values.
Some of these options have no default value, so this patch sets them as
params for the trap group set function. Others almost always use the same
values, so the set function will use this default values. In the rare cases
when they will need to be with other values, these values can be set
directly (using the macros for fields in registers).
Parameters without default value:
TC - the traffic class for packets that hit this trap group.
(old default is the max tc)
priority - if one packet hits multiple trap groups, the group with the
higher priority will "catch" it. (old default is 0)
policer - limit rate policer (old default is disabled)
Default parameters:
swid - switch id, relevant for the emad trap only, ignored on Spectrum.
(new default is 0)
rdq - CPU receive descriptor queue (new default is identical to trap
group id)
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the max number of trap groups to resource query.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the emad trap init was done in the core. In the future we will
want to add some changes to the traps groups, according to device type.
This commit create a driver function to create the trap group for the
emad, so later it can be changed by devices. It also changes the emad
registration to use the new generic functions.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, we set the trap group to pre-determined option, based on whether
it is an rx or event trap.
This commit adds a possibility to chose the trap group, so it can be set
to different values in the following patches.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change trap setting function so instead of determining the trap group by
trap id, it gets it as a parameter (so later we can have different trap
groups for Spectrum and Switchx2).
Add "is_ctrl" parameter to the trap setting function. It control whether
the trapped packets wait in a designated control buffer or in their
default one. This parameter is ignored by Switchx2 and Switchib.
Add these parameters to the traps array in Spectrum, Switchx2 and
Switchib.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the event handling in Switchib to be comptible with Spectrum and
Switchx2.
Use the generic listener struct for the events. Init and fini them by loop
(and not by calling each event by its name).
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the events to use the generic listener struct.
Merge the event list into the trap list, so the same functions will
handle both.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the events to use the generic listener struct.
Merge the event list into the trap list, so the same functions will
handle both.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create a macro for creating the generic listener struct for events,
similar to the one for rx traps.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reorganize the traps to use the new generic listener struct and
functions. Use macros to shorten the traps list.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace the old rx listener struct definitions by the generic ones.
Use the new generic registering / unregistering functions for them.
Add some macros to organize the trap list.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In Spectrum, there is a macro to arrange the traps list.
This macro is useful for everyone who is using rx traps.
Create a similar macro in core.h for creating the generic listener struct
for rx traps.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have 2 types of HW traps to handle, rx traps and events.
The registration workflow for both is very similar. So it only make
sense to create one function to handle both.
This patch creates a struct to hold the data for both cases. It also
creates a registration and an un-registration functions that get this
generic struct as input.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 99724c18fc ("mlxsw: spectrum: Introduce support for
router interfaces") we no longer rely on flooding traffic to the CPU in
order to trap packets intended for the host itself. Therefore, the FDB
MC trap can be removed.
Remove traps for protocols that are not supported yet.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drop duplicate header delay.h from mlx5/core/main.c.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Acked-by: Matan Barak <matanb@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We verified that MLX5_FLOW_CONTEXT_ACTION_COUNT was set on the first
line of the function so we don't need to check again here.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A flow should be offloaded only if the matches are
allowed according to min inline mode.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>