Set up the infrastructure for managing Rx filters. We can't ask the
hardware for what filters it has, so we keep a local list of filters
that we've pushed into the HW.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
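As an illustration of the local filter list described in the Rx filter commit above, here is a minimal sketch of a driver-side record of the filters that have been pushed to hardware. The class name, the filter "kind" strings and the id/key layout are hypothetical and are not taken from the ionic driver.

```python
from dataclasses import dataclass, field


@dataclass
class RxFilterList:
    """Driver-side record of filters already pushed to hardware.

    Hypothetical model: the hardware cannot be queried for its filters,
    so every successful push is mirrored here.
    """
    by_id: dict = field(default_factory=dict)    # filter_id -> (kind, key)
    by_key: dict = field(default_factory=dict)   # (kind, key) -> filter_id

    def save(self, filter_id, kind, key):
        # Record a filter after the hardware has accepted it.
        self.by_id[filter_id] = (kind, key)
        self.by_key[(kind, key)] = filter_id

    def find(self, kind, key):
        # Return the hardware filter id, or None if it was never pushed.
        return self.by_key.get((kind, key))

    def remove(self, filter_id):
        # Forget a filter; returns True if it was known.
        entry = self.by_id.pop(filter_id, None)
        if entry is None:
            return False
        self.by_key.pop(entry, None)
        return True
```

Keeping two indexes lets the sketch answer both "is this MAC or VLAN already programmed?" and "which filter does this hardware id refer to?" without asking the device.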
Set up the initial NDO structure and callbacks for netdev
to use, and register the netdev. This will allow us to do
a few basic operations on the device, but no traffic yet.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The AdminQ is fine for sending messages and requests to the NIC,
but we also need to have events published from the NIC to the
driver. The NotifyQ handles this for us, using the same interrupt
as AdminQ.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add AdminQ specific message requests and completion handling.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most of the NIC configuration happens through the AdminQ message
queue. NAPI is used for basic interrupt handling and message
queue management. These routines are set up to be shared among
different types of queues when used in slow-path handling.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ionic interrupt model is based on interrupt control blocks
accessed through the PCI BAR. Doorbell registers are used by
the driver to signal to the NIC that requests are waiting on
the message queues. Interrupts are used by the NIC to signal
to the driver that answers are waiting on the completion queues.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The LIF is the Logical Interface, which represents the external
connections. The NIC can multiplex many LIFs to a single port,
but in most setups, LIF0 is the primary control for the port.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The port management commands apply to the physical port
associated with the PCI device, which might be shared among
several logical interfaces.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ionic device has a small set of PCI registers, including a
device control and data space, and a large set of message
commands.
Also adds new DEVLINK_INFO_VERSION_GENERIC tags for
ASIC_ID, ASIC_REV, and FW.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a basic driver framework for the Pensando IONIC
network device. There is no functionality right now other than
the ability to load and unload.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Starting with firmware version MC10.18.0, a new counter for in-flight
Tx frames is offered. Use it when bringing down the interface to
determine when all pending Tx frames have been processed by hardware
instead of sleeping for a fixed amount of time.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
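The change from a fixed sleep to polling the in-flight Tx counter, as described in the commit above, can be sketched roughly as follows. This is only a model of the idea: read_pending stands in for whatever reads the firmware counter, and the poll count and interval are assumed values, not the ones the driver uses.

```python
import time


def wait_for_tx_drain(read_pending, max_polls=50, poll_interval_s=0.01,
                      sleep=time.sleep):
    """Poll the pending-Tx counter until it reaches zero.

    read_pending: hypothetical callable returning the number of Tx frames
                  still in flight.
    Returns True once the counter reads zero, False if it never does
    within max_polls polls.
    """
    for _ in range(max_polls):
        if read_pending() == 0:
            return True
        sleep(poll_interval_s)
    # One last look after the final sleep.
    return read_pending() == 0
```

Bounding the number of polls keeps the interface-down path from hanging if the counter never reaches zero.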
Recent firmware versions expose more DPNI counters.
Export relevant ones via ethtool -S.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As we prepare to read more pages from the DPNI stat counters,
reorganize the code a bit to make it easier to extend.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2019-09-03
This series contains updates to ice driver only.
Anirudh adds the ability for the driver to handle EMP resets correctly
by adding the logic to the existing ice_reset_subtask().
Jeb fixes up the logic to properly free up the resources for a switch
rule whether or not it was successful in the removal.
Brett fixes up the reporting of ITR values to let the user know odd ITR
values are not allowed. Fixes the driver to only disable VLAN pruning
on VLAN deletion when the VLAN being deleted is the last VLAN on the VF
VSI.
Chinh updates the driver to determine the TSA value from the priority
value when in CEE mode.
Bruce aligns the driver with the hardware specification by ensuring that
a PF reset is done as part of the unload logic. Also update the driver
unloading field, based on the latest hardware specification, which
allows us to remove an unnecessary endian conversion. Moves #defines
based on their need in the code.
Jesse adds the current state of auto-negotiation in the link up message.
In addition, adds additional information to inform the user of an issue
with the topology/configuration of the link.
Usha updates the driver to allow the maximum TCs that the firmware
supports, rather than hard coding to a set value.
Dave updates the DCB initialization flow to handle the case of an actual
error during DCB init. Updated the driver to report the current stats,
even when the netdev is down, which aligns with our other drivers.
Mitch fixes the VF reset code flows to ensure that it properly calls
ice_dis_vsi_txq() to notify the firmware that the VF is being reset.
Michal fixes the driver so the DCB is not enabled when the SW LLDP is
activated, which was causing a communication issue with other NICs. The
problem lies in that DCB was being enabled without checking the number
of TCs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Abstract:
--------
Mellanox ConnectX devices support packet matching, packet modification and
redirection. These functionalities are also referred to as flow-steering.
To configure a steering rule, the rule is written to the device owned
memory; this memory is accessed and cached by the device when processing
a packet.
Steering rules are constructed from multiple steering entries (STE).
Rules are configured using the Firmware command interface. The Firmware
processes the given driver command and translates it to STEs, then
writes them to the device memory in the current steering tables.
This process is slow due to the architecture of the command interface and
the processing complexity of each rule.
The highlight of this patchset is to cut out the middleman (the firmware)
and program steering rules into the device directly from the driver, with
no firmware intervention whatsoever.
Motivation:
-----------
Software (driver managed) steering allows for high rule insertion rates
compared to the FW steering described above. This is achieved by using
internal RDMA writes to the device owned memory instead of the slow
command interface to program steering rules.
Software (driver managed) steering doesn't depend on new FW
for new steering functionality; new implementations can be done in the
driver, skipping the FW layer.
Performance:
------------
The insertion rate on a single core using the new approach allows
programming ~300K rules per sec. (Done via direct raw test to the new mlx5
sw steering layer, without any kernel layer involved).
Test: TC L2 rules
33K/s with Software steering (this patchset).
5K/s with FW and current driver.
This will improve OVS based solution performance.
Architecture and implementation details:
----------------------------------------
Software steering will be dynamically selected via devlink device
parameter. Example:
$ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
pci/0000:06:00.0:
name flow_steering_mode type driver-specific
values:
cmode runtime value smfs
The mlx5 software steering module, a.k.a. DR (Direct Rule), is implemented
and contained in the mlx5/core/steering directory and controlled by the
MLX5_SW_STEERING kconfig flag.
mlx5 core steering layer (fs_core) already provides a shim layer for
implementing different steering mechanisms, software steering will
leverage that as seen at the end of this series.
When Software Steering for a specific steering domain
(NIC/RDMA/Vport/ESwitch, etc ..) is supported, it will cause rules
targeting this domain to be created using SW steering instead of FW.
The implementation includes:
Domain - The steering domain is the object in which all other objects
reside. It holds the memory allocator, send engine, locks and other shared
data needed by lower objects such as table, matcher, rule, action.
Each domain can contain multiple tables. A domain is equivalent to the
namespaces (e.g. NIC/RDMA/Vport/ESwitch, etc.) currently implemented
in mlx5_core fs_core (flow steering core).
Table - Table objects hold multiple matchers. Each table has a level,
used to prevent processing loops. Packets are directed to a table once
it is set as the root table; this is done by fs_core using a FW command.
A packet is processed inside the table matcher by matcher until a
successful hit; otherwise the packet performs the default action.
Matcher - Matcher objects specify the field mask used for matching when
processing a packet. A matcher belongs to a table. Each matcher can hold
multiple rules, each rule with different matching values corresponding
to the matcher mask. Each matcher has a priority used for rule
processing order inside the table.
Action - Action objects are created to specify different steering actions
such as count, reformat (encapsulate, decapsulate, ...), modify
header, forward to table and many other actions. When creating a rule
a sequence of actions can be provided to be executed on a successful
match.
Rule - Rule objects are used to specify a specific match on packets as
well as the actions that should be executed. A rule belongs to a
matcher.
STE - This layer holds the specific STE format for the device and
converts the requested rule to STEs. Each rule is constructed of an STE
chain; multiple rules construct a steering graph. Each node in the graph
is a hash table containing multiple STEs. The index of each STE in the
hash table is calculated using a CRC32 hash function.
Memory pool - Used for managing and caching device owned memory for rule
insertion. The memory is allocated using the DM (device memory) API.
Communication with device - layer for standard RDMA operation using RC QP
to configure the device steering.
Command utility - This module holds all of the FW commands that are
required for SW steering to function.
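To make the STE hash-table indexing mentioned in the list above more concrete, here is a rough sketch of computing a table index from masked match bytes with CRC32. It is an illustration under assumptions: zlib.crc32 is a stand-in for the device's CRC32 variant, the power-of-two table size is assumed, and the function names and the choice of hashed bytes are hypothetical.

```python
import zlib


def apply_mask(value: bytes, mask: bytes) -> bytes:
    """Keep only the header bits selected by the matcher mask."""
    return bytes(v & m for v, m in zip(value, mask))


def ste_index(masked_match: bytes, table_size: int) -> int:
    """Map masked match bytes to a slot in a hash table of STEs.

    Assumption: table_size is a power of two, so the CRC can be reduced
    with a bit mask. The real device hash may differ from zlib.crc32.
    """
    if table_size <= 0 or table_size & (table_size - 1):
        raise ValueError("table_size must be a positive power of two")
    return zlib.crc32(masked_match) & (table_size - 1)
```

Because the mask is applied before hashing, two packets that differ only in unmasked bits land in the same slot, which is what lets one rule cover both.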
Patch planning and files:
-------------------------
1) The first patch adds support for flow steering actions to the fs_cmd
shim layer.
2) The next 12 patches add one file per Software steering
functionality/module as described above. (See patches with title: DR, *)
3) Add CONFIG_MLX5_SW_STEERING for software steering support and enable
build with the new files
4) Next two patches will add the support for software steering in mlx5
steering shim layer
net/mlx5: Add API to set the namespace steering mode
net/mlx5: Add direct rule fs_cmd implementation
5) The last two patches add the new devlink parameter to select the mlx5
steering mode; it is valid only for switchdev mode for now.
Two modes are supported:
1. DMFS - Device managed flow steering
2. SMFS - Software/Driver managed flow steering.
In the DMFS mode, the HW steering entities are created through the
FW. In the SMFS mode these entities are created through the driver
directly.
The driver will use the devlink steering mode only if the steering
domain supports it. For now SMFS will manage only the switchdev
eswitch steering domain.
User command examples:
- Set SMFS flow steering mode::
$ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime
- Read device flow steering mode::
$ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
pci/0000:06:00.0:
name flow_steering_mode type driver-specific
values:
cmode runtime value smfs
Merge tag 'mlx5-updates-2019-09-01-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2019-09-01 (Software steering support)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently if the VF adds a VLAN, VLAN pruning will be enabled for that VSI.
Also, when a VLAN gets deleted it will disable VLAN pruning even if other
VLAN(s) exist for the VF. Fix this by only disabling VLAN pruning on the
VF VSI when removing the last VLAN (i.e. vf->num_vlan == 0).
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
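The fixed pruning behaviour from the commit above can be modelled in a few lines. This is a sketch only: the class and attribute names are hypothetical, and a set of VLAN IDs stands in for the driver's vf->num_vlan counter.

```python
class VfVsi:
    """Toy model of a VF VSI's VLAN pruning state (names are hypothetical)."""

    def __init__(self):
        self.vlans = set()
        self.pruning_enabled = False

    def add_vlan(self, vid):
        # Adding any VLAN enables pruning for the VSI.
        self.vlans.add(vid)
        self.pruning_enabled = True

    def del_vlan(self, vid):
        # Only disable pruning once the last VLAN is gone.
        self.vlans.discard(vid)
        if not self.vlans:
            self.pruning_enabled = False
```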
Remove code that enables DCB in initialization when SW LLDP is
activated. The DCB flag is already set or reset in ice_init_pf_dcb
based on the number of TCs, so there is no need to overwrite it.
Setting DCB without checking the number of TCs can cause communication
problems with other cards. The host card sends packets with a VLAN
priority tag, but the client card doesn't strip this tag and ping
doesn't work.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
There is currently a check in get_ndo_stats that
returns before updating stats if the VSI is down
or there are no Tx or Rx queues. This causes the
netdev to report zero stats when the netdev is down.
Remove the check so that the behavior of reporting
stats is the same as it was in IXGBE.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The call to ice_dis_vsi_txq() acts as the notification to the firmware
that the VF is being reset. Because of this, we need to make this call
every time we reset, regardless of whatever else we do to stop the Tx
queues.
Without this change, VF resets would fail to complete on interfaces that
were up and running.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In the init path for DCB, the call to ice_init_dcb()
can return a non-zero value for either an actual
error, or due to the FW lldp engine being stopped.
We are currently treating all non-zero values only as
an indication that the FW LLDP engine is stopped.
Check for an actual error in the DCB init flow.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch limits the max TCs set by the driver to the value provided by
the firmware as per the capabilities of the device. Otherwise, hard coding
to 8 TC max would fail the device configurations with more than 4 ports.
Signed-off-by: Usha Ketineni <usha.k.ketineni@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Conventionally, if the #defines/other are not needed by other header
files being included, #includes are done first followed by #defines
and other stuff. Move the #defines after the #includes to follow this
convention.
Suggested-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The driver needs to inform the user if there is an issue
with the topology / configuration of the link.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Print the state of auto-negotiation when printing the Link
up message. Adds new text to the "NIC Link is up" line like
Autoneg: <True | False>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
According to recent specification versions, the field in the Queue Shutdown
AdminQ command consisting of the "driver unloading" indication is not a 4
byte field (it is byte.bit 16.0). Change it to a byte and remove the
unnecessary endian conversion.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
According to the specification, a PF Reset must be done as part of the
driver unload flow.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In CEE mode, the TSA information can be derived from the reported
priority value.
Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently if the user sets an odd value for [tx|rx]-usecs we align the
value because the hardware only understands ITR values in multiples of
2. This seems misleading because we are essentially telling the user
that the ITR value is odd, when in fact we have changed it internally.
Fix this by reporting that setting odd ITR values is not allowed.
Also, while making changes to ice_set_rc_coalesce() I noticed a bit of
code/error duplication. Make the necessary changes to remove the
duplication.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
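The ITR change described above (reject odd values instead of silently aligning them) amounts to a small validation step. The following is a hedged sketch of that check; the function name and the use of ValueError are illustrative and do not reflect how the driver reports the error.

```python
ITR_GRANULARITY_US = 2  # per the commit: hardware takes ITR in multiples of 2


def check_itr_usecs(usecs: int) -> int:
    """Return usecs unchanged if the hardware can represent it exactly.

    Raises ValueError for negative or odd values rather than rounding
    them, so the user never sees a value different from what is programmed.
    """
    if usecs < 0:
        raise ValueError("usecs must not be negative")
    if usecs % ITR_GRANULARITY_US:
        raise ValueError("odd ITR values are not allowed")
    return usecs
```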
We don't free s_rule if ice_aq_sw_rules() returns a non-zero status. If
it returned a zero status, s_rule would be freed right after, so this
implies it should be freed within the scope of the function regardless.
Signed-off-by: Jeb Cramer <jeb.j.cramer@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ice_reset_subtask needs to handle EMP resets as well, as EMP resets
can be triggered by the firmware. This patch adds the logic to do
this.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add new parameter (flow_steering_mode) to control the flow steering
mode of the driver.
Two modes are supported:
1. DMFS - Device managed flow steering
2. SMFS - Software/Driver managed flow steering.
In the DMFS mode, the HW steering entities are created through the
FW. In the SMFS mode these entities are created through the driver
directly.
The driver will use the devlink steering mode only if the steering
domain supports it. For now SMFS will manage only the switchdev eswitch
steering domain.
User command examples:
- Set SMFS flow steering mode::
$ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime
- Read device flow steering mode::
$ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
pci/0000:06:00.0:
name flow_steering_mode type driver-specific
values:
cmode runtime value smfs
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
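The mode-selection rule in the commit above ("use the devlink steering mode only if the steering domain supports it") can be sketched as a small decision function. The names are hypothetical, and falling back to DMFS when SMFS is unsupported is an assumption drawn from the wording, not a statement about the driver's code.

```python
from enum import Enum


class SteeringMode(Enum):
    DMFS = "dmfs"   # device (firmware) managed flow steering
    SMFS = "smfs"   # software/driver managed flow steering


def effective_mode(requested: SteeringMode,
                   domain_supports_smfs: bool) -> SteeringMode:
    """Pick the mode a steering domain actually uses.

    Assumption: a domain that cannot do SMFS keeps using DMFS even when
    the devlink parameter requests SMFS.
    """
    if requested is SteeringMode.SMFS and domain_supports_smfs:
        return SteeringMode.SMFS
    return SteeringMode.DMFS
```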
If the flow steering mode of the driver is SMFS (Software Managed
Flow Steering), use the DR (SW steering) API to create the steering
objects.
In addition, add a call to set the peer namespace when switchdev gets
a devcom pair event. It is required to support VF LAG in SMFS.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add API to set the flow steering root namespace mode.
Setting a new mode should be done before any steering operation
is executed on the namespace.
This API is going to be used by steering users such as switchdev.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add support to create flow steering objects
via direct rule API (SW steering).
A new layer, fs_dr, is added. This layer translates the commands that
fs_core sends to the FW into direct rule API calls. If a feature is
not supported by direct rule, -EOPNOTSUPP is returned.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add new mlx5 Kconfig flag to allow selecting software steering
support and compile all the steering files only if the flag is
selected.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Expose APIs for direct rule managing to increase insertion rate by
bypassing the firmware.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
SW steering is capable of doing many steering functionalities
but there are still some functionalities which are not exposed
to upper layers and therefore performed by the FW.
This is the support for recalculating checksum using a hairpin QP.
The recalculation is required after a modify TTL action which skips
the needed CS calculation in HW.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Rules are the actual objects that tie matchers, header values and
actions. Each rule belongs to a matcher, which can hold multiple rules
sharing the same mask. Each rule is a specific set of values and
actions.
When a packet reaches a matcher it is matched against the
matcher's rules. If a rule matches, its actions are executed.
Each rule object contains a set of STEs, where each STE is a
definition of match values and actions defined by the rule.
This file handles the rule operations and processing.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
On rule creation a set of actions can be provided. The actions describe
what to do with the packet in case of a match. It is possible to provide
a set of actions which will be executed in order.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
A matcher defines which packet fields are matched when a packet arrives.
A matcher is part of a table and can contain one or more rules, where
each rule defines specific values for the matcher's mask definition.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tables are objects which are used for storing matchers. Each table
belongs to a domain and is defined by the domain type. When a packet
reaches the table it is processed by each of its matchers until a
successful match. Tables can hold multiple matchers ordered by matcher
priority. Each table has a level.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
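The table and matcher behaviour described in the commits above (matchers ordered by priority, each holding rules under one mask, first hit wins) can be illustrated with a toy model. Everything here is a sketch: packet fields are reduced to a single integer, actions to strings, and treating a lower priority value as "processed first" is an assumption.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Matcher:
    priority: int
    mask: int
    rules: dict = field(default_factory=dict)   # masked value -> action

    def add_rule(self, value, action):
        # A rule is a specific value under the matcher's mask.
        self.rules[value & self.mask] = action

    def lookup(self, packet_fields):
        return self.rules.get(packet_fields & self.mask)


@dataclass
class Table:
    level: int
    default_action: str = "miss"
    matchers: list = field(default_factory=list)

    def add_matcher(self, matcher):
        # Keep matchers sorted by priority (lower value first, assumed).
        keys = [m.priority for m in self.matchers]
        self.matchers.insert(bisect.bisect_right(keys, matcher.priority),
                             matcher)

    def process(self, packet_fields):
        # Walk matcher by matcher until the first successful hit.
        for matcher in self.matchers:
            action = matcher.lookup(packet_fields)
            if action is not None:
                return action
        return self.default_action
```

In this model a packet that hits no matcher falls through to the table's default action, mirroring the miss path described in the series.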
Domain is the frame for all of the dr (direct rule) objects.
There are different domain types which also affect the objects under that
domain. Each domain can hold multiple tables which can hold multiple
matchers and so on. This means that all of the dr (direct rule) objects
exist under a specific domain. The domain object also holds the
resources needed for other objects such as memory management and
communication with the device.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Steering Entry (STE) object is the basic building block of the steering
map. There are several types of STEs. Each rule can be constructed of
multiple STEs. Each STE dictates which fields of the packet's header are
being matched as well as the information about the next step in map (hit
and miss pointers). The hardware gets a packet and tries to match it
against the STEs, going to either the hit pointer or the miss pointer.
This file handles the STE operations.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Inserting or deleting a rule is done by RDMA read/write operation to SW
ICM device memory. This file provides the support for executing these
operations. It includes allocating the needed resources and providing an
API for writing steering entries to the memory.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
ICM device memory is used for writing steering rules (STEs) to the NIC.
An ICM memory pool allocator was implemented to manage the required
memory. The pool consists of buckets, a bucket per chunk size.
Once a bucket is empty, we cut a row of memory from the most recently
allocated MR; if the MR size is not sufficient, we allocate a new MR.
The HW design requires that a chunk's memory address be aligned to the
chunk size, which is why the MR is managed with a row size that
ensures memory alignment.
Current design is greedy in memory but provides quick allocation times
in steady state.
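The alignment argument above can be checked with a short userspace sketch.
It assumes (this is not stated in the message) that chunk sizes are powers
of two, one bucket per size, and that a row is as large as the largest
chunk and starts at a row-size-aligned address. All toy_* names and
TOY_MIN_CHUNK are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MIN_CHUNK 64u   /* assumed smallest chunk size in bytes */

/* Bucket index -> chunk size; one bucket per power-of-two size (assumption). */
static uint64_t toy_chunk_size(unsigned int bucket)
{
	return (uint64_t)TOY_MIN_CHUNK << bucket;
}

/* Address of chunk `idx` inside a row that starts at `row_base`. */
static uint64_t toy_chunk_addr(uint64_t row_base, uint64_t row_size,
			       unsigned int bucket, uint64_t idx)
{
	uint64_t size = toy_chunk_size(bucket);

	assert(size <= row_size);           /* a chunk must fit in a row */
	assert(row_base % row_size == 0);   /* rows are row-size aligned */
	assert(idx < row_size / size);      /* stay inside the row */
	return row_base + idx * size;
}

/* True when `addr` is a multiple of `size`. */
static int toy_is_aligned(uint64_t addr, uint64_t size)
{
	return addr % size == 0;
}
```

Because the row base is a multiple of the row size, and the row size is a
multiple of every chunk size, each chunk carved this way starts at a
multiple of its own size.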
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add direct rule command utilities, which consist of all the FW
commands that are executed to provide the SW steering functionality.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add the internal header file that contains various type
definitions that will be used in coming patches as well as
the internal function declarations.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add flow steering actions: modify header and packet reformat
to the fs_cmd shim layer. This allows each namespace to define
possibly different functionality for alloc/dealloc action commands.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Every mvpp2 unit can use up to 8 buffers mapped by the BM (the HW buffer
manager). The HW will place the frames in the buffer pool depending on the
frame size: short (< 128 bytes), long (< 1664) or jumbo (up to 9856).
As any unit can have up to 4 ports, the driver allocates only 2 pools,
one for short and one for long frames, and shares them between ports.
When the first port MTU is set higher than 1664 bytes, a third pool is
allocated for jumbo frames.
This shared allocation makes it impossible to use percpu allocators,
and creates contention between HW queues.
If possible, i.e. if the number of possible CPUs is less than 8 and jumbo
frames are not used, switch to a new scheme: allocate 8 per-cpu pools for
short and long frames and bind every pool to an RXQ.
When the first port MTU is set higher than 1664 bytes, the allocation
scheme is reverted to the old behaviour (3 shared pools), and when all
ports MTU are lowered, the per-cpu buffers are allocated again.
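The size thresholds and the switch between the two schemes can be
summarised in a small userspace sketch. The toy_* names are hypothetical
and not taken from the driver; the CPU limit simply follows the wording
of this message.

```c
#include <assert.h>

enum toy_pool { TOY_POOL_SHORT, TOY_POOL_LONG, TOY_POOL_JUMBO, TOY_POOL_NONE };

#define TOY_SHORT_MAX        128    /* short frames: < 128 bytes */
#define TOY_LONG_MAX         1664   /* long frames: < 1664 bytes */
#define TOY_JUMBO_MAX        9856   /* jumbo frames: up to 9856 bytes */
#define TOY_PERCPU_CPU_LIMIT 8      /* threshold as worded in the message */

/* Pick the pool type a frame of `len` bytes would land in. */
static enum toy_pool toy_pool_for_frame(int len)
{
	assert(len >= 0);
	if (len < TOY_SHORT_MAX)
		return TOY_POOL_SHORT;
	if (len < TOY_LONG_MAX)
		return TOY_POOL_LONG;
	if (len <= TOY_JUMBO_MAX)
		return TOY_POOL_JUMBO;
	return TOY_POOL_NONE;
}

/* Per-cpu pools are usable only with few CPUs and no jumbo-sized MTU. */
static int toy_use_percpu_pools(int possible_cpus, const int *port_mtu,
				int nports)
{
	assert(nports >= 0);
	if (possible_cpus >= TOY_PERCPU_CPU_LIMIT)
		return 0;
	for (int i = 0; i < nports; i++)
		if (port_mtu[i] > TOY_LONG_MAX)
			return 0;   /* one jumbo port reverts to shared pools */
	return 1;
}
```

Re-evaluating toy_use_percpu_pools() on every MTU change mirrors the
described behaviour of reverting to shared pools and later going back to
per-cpu pools.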
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor mvpp2_bm_pool_create(), mvpp2_bm_pool_destroy() and
mvpp2_bm_pools_init() so that they accept a struct device instead
of a struct platform_device, as they just need platform_device->dev.
Removing this dependency makes the BM code more reusable in contexts
where we don't have a pointer to the platform_device.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In function sun8i_dwmac_set_syscon(), local variable "val" could
be uninitialized if function regmap_field_read() returns -EINVAL.
However, it will be used directly in the if statement, which
is potentially unsafe.
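One common way to close this kind of hole is to check the return value
before the output variable is examined. The sketch below is a userspace
illustration only: toy_field_read() is a hypothetical stand-in for
regmap_field_read(), and the actual patch may take a different approach.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for regmap_field_read(): on failure it returns a
 * negative errno and leaves *val untouched, which is the hazard. */
static int toy_field_read(int fail, unsigned int *val)
{
	assert(val != NULL);
	if (fail)
		return -EINVAL;
	*val = 0x1;
	return 0;
}

/* Returns a negative errno on read failure, otherwise 1 or 0 depending on
 * whether the field is non-zero. */
static int toy_set_syscon(int fail)
{
	unsigned int val;
	int ret;

	ret = toy_field_read(fail, &val);
	if (ret)
		return ret;   /* bail out before val is ever examined */

	return val ? 1 : 0;
}
```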
Signed-off-by: Yizhuo <yzhai003@ucr.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Take only FIB events that are happening in init_net into account. No other
namespaces are supported.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge mlx5-next patches needed for upcoming mlx5 software steering.
1) Alex adds HW bits and definitions required for SW steering
2) Ariel moves device memory management to mlx5_core (From mlx5_ib)
3) Maor, Cleanups and fixups for eswitch mode and RoCE
4) Mark, Set only stag for match untagged packets
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
cvlan_tag enabled in match criteria and disabled in
match value means both S & C tags don't exist (untagged of both).
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Move the check if RoCE steering is initialized to the
disable RoCE function, it will ensure that we disable
RoCE only if we succeeded in enabling it before.
Fixes: 80f09dfc23 ("net/mlx5: Eswitch, enable RoCE loopback traffic")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Move the device memory allocation and deallocation commands
for SW ICM memory to mlx5_core to expose this API to all
mlx5_core users.
This comes as preparation for supporting SW steering in kernel
where it will be required to allocate and register device
memory for direct rule insertion.
In addition, an API to register this device memory for future
remote access operations is introduced using the create_mkey
commands.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c: In function 'hclge_restore_vlan_table':
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:8016:18: warning:
variable 'qos' set but not used [-Wunused-but-set-variable]
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 70a214903d ("net: hns3: reduce the parameters of some functions")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pointer reg_info is being initialized with a value that is never read and
is being re-assigned a little later on. The assignment is redundant
and hence can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 99cd149efe ("sgiseeq: replace use of dma_cache_wback_inv"),
a call to 'get_zeroed_page()' has been turned into a call to
'dma_alloc_coherent()'. Only the remove function has been updated to turn
the corresponding 'free_page()' into 'dma_free_attrs()'.
The error handling path of the probe function has not been updated.
Fix it now.
Rename the corresponding label to something more in line.
Fixes: 99cd149efe ("sgiseeq: replace use of dma_cache_wback_inv")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
RTL8125 uses a different register for VLAN offloading config,
therefore don't set bit RxVlan.
Fixes: f1bce4ad2f ("r8169: add support for RTL8125")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calls to 'pci_free_irq_vectors()' are missing both in the error handling
path of the probe function, and in the remove function.
Add them.
Fixes: 19971f5ea0 ("enetc: add PTP clock driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change enables the use of SW timestamping on the Raspberry Pi 4.
bcmgenet's transmit function bcmgenet_xmit() implements software
timestamping. However the SOF_TIMESTAMPING_TX_SOFTWARE capability was
missing and only SOF_TIMESTAMPING_RX_SOFTWARE was announced. By using
ethtool_op_get_ts_info() as get_ts_info() in bcmgenet_ethtool_ops, the
SOF_TIMESTAMPING_TX_SOFTWARE capability is announced.
Similar to commit a8f5cb9e79 ("smsc95xx: use ethtool_op_get_ts_info()")
Signed-off-by: Ryan M. Collins <rmc032@bucknell.edu>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On embedded environments with hard memory limits it is a normal, although
rare, case that an skb can't be allocated on the rx path under high traffic.
In such OOM cases napi_complete_done() was not called, so the napi object
was left in an invalid state, as if it were still "scheduled".
The kernel does not re-schedule the poll of that napi object.
Consequently, the kernel cannot remove that object, and the system hangs on
`ifconfig down` waiting for a poll.
We fix this by gracefully closing the napi poll routine with a correct
invocation of napi_complete_done().
This was reproduced with artificially failing the allocation of skb to
simulate an "out of memory" error case and check that traffic does
not get stuck.
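The failure mode can be mimicked in userspace. This is a simplified sketch,
not the driver's code: toy_napi, toy_napi_complete_done() and toy_poll() are
hypothetical, and the allocation failure is injected through a flag, much
like the artificial failure used for reproduction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature of a napi object: only the scheduled state. */
struct toy_napi {
	bool scheduled;
};

static void toy_napi_complete_done(struct toy_napi *napi)
{
	napi->scheduled = false;
}

/* Process up to `budget` of `pending` packets. `alloc_fails` simulates an
 * skb allocation failure on the first packet. */
static int toy_poll(struct toy_napi *napi, int budget, int pending,
		    bool alloc_fails)
{
	int work_done = 0;

	assert(budget > 0);
	while (work_done < budget && pending > 0) {
		if (alloc_fails)
			break;   /* stop processing, but still finish the poll */
		work_done++;
		pending--;
	}

	/* Less work than the budget means this poll round is over, error or
	 * not, so the napi object must be completed. */
	if (work_done < budget)
		toy_napi_complete_done(napi);

	return work_done;
}
```

The point of the sketch is that the error path falls through to the
completion call instead of returning early and leaving the object
scheduled.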
Fixes: 970a2e9864 ("net: ethernet: aquantia: Vector operations")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Declaring a threaded irq handler should also indicate that the irq is
oneshot. It is indeed oneshot, because the HW implements irq automasking
on trigger.
Not declaring this causes some kernel configurations to fail
on interface up, because request_threaded_irq() returns an error code.
The issue was originally hidden on a normal x86_64 configuration with
the latest kernel because, depending on the interrupt controller, the irq
driver added the ONESHOT flag on its own.
The issue was observed on older kernels (4.14) where no such logic exists.
Fixes: 4c83f170b3 ("net: aquantia: link status irq handling")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Reported-by: Michael Symolkin <Michael.Symolkin@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of device reconfiguration the driver may reset the device in a way
that is invisible to other modules, the vlan module in particular. The vlans
are then not removed and re-created, and the vlan filters are not configured
in the device.
This patch reapplies the vlan filters at device start.
Fixes: 7975d2aff5 ("net: aquantia: add support of rx-vlan-filter offload")
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a limit condition of vlans on the interface before setting vlan
promiscuous mode.
Fixes: 48dd73d08d ("net: aquantia: fix vlans not working over bridged network")
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to the absence of a check against the rx flow rule when vlan 0 is being
removed, another rule could be removed instead of the rule with vlan 0.
Fixes: 7975d2aff5 ("net: aquantia: add support of rx-vlan-filter offload")
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds driver support for configuring grc dump config flags, and
dumping the grc data via ethtool get/set-dump interfaces.
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch adds driver support for configuring the grc dump config flags.
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add driver support for dumping the config id attributes via ethtool dump
interfaces.
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch adds driver support for reading the config id attributes from NVM
flash partition.
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2019-08-31
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix 32-bit zero-extension during constant blinding which
has been causing a regression on ppc64, from Naveen.
2) Fix a latency bug in nfp driver when updating stack index
register, from Jiong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new function bnxt_get_registered_vfs() to handle the work
of getting the number of registered VFs under #ifdef CONFIG_BNXT_SRIOV.
The main code will call this function and will always work correctly
whether CONFIG_BNXT_SRIOV is set or not.
Fixes: 230d1f0de7 ("bnxt_en: Handle firmware reset.")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Relax the requirements to the XSK frame size to allow it to be smaller
than a page and even not a power of two. The current implementation can
work in this mode, both with Striding RQ and without it.
The code that checks `mtu + headroom <= XSK frame size` is modified
accordingly. Any frame size between 2048 and PAGE_SIZE is accepted.
Functions that worked with pages only now work with XSK frames, even if
their size is different from PAGE_SIZE.
With XSK queues, regardless of the frame size, Striding RQ uses the
stride size of PAGE_SIZE, and UMR MTTs are posted using starting
addresses of frames, but PAGE_SIZE as page size. MTU guarantees that no
packet data will overlap with other frames. UMR MTT size is made equal
to the stride size of the RQ, because UMEM frames may come in random
order, and we need to handle them one by one. PAGE_SIZE is just a power
of two that is bigger than any allowed XSK frame size, and also it
doesn't require making additional changes to the code.
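The relaxed constraint boils down to two range checks. The following is a
minimal userspace sketch, assuming the page size is passed in by the
caller; toy_xsk_frame_ok() and TOY_MIN_XSK_FRAME are hypothetical names,
and the real driver check involves more parameters than shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MIN_XSK_FRAME 2048u   /* lower bound named in the message */

/* A frame size is acceptable when it lies in [2048, page_size] and still
 * has room for the MTU plus headroom. It need not be a power of two. */
static bool toy_xsk_frame_ok(uint32_t frame_size, uint32_t page_size,
			     uint32_t mtu, uint32_t headroom)
{
	assert(page_size >= TOY_MIN_XSK_FRAME);

	if (frame_size < TOY_MIN_XSK_FRAME || frame_size > page_size)
		return false;

	/* Widen before adding so the sum cannot wrap. */
	return (uint64_t)mtu + headroom <= frame_size;
}
```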
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'obi' values
directly without having to mask and add the headroom.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'old_bi' values
directly without having to mask and add the headroom.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Each get_next and lookup call requires a round trip to the device.
However, the device is capable of giving us a few entries back,
instead of just one.
In this patch we ask for a small yet reasonable number of entries
(4) on every get_next call, and on subsequent get_next/lookup calls
check this little cache for a hit. The cache is only kept for 250us,
and is invalidated on every operation which may modify the map
(e.g. delete or update call). Note that operations may be performed
simultaneously, so we have to keep track of operations in flight.
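A rough userspace sketch of such a cache is shown below. The structure
and function names are hypothetical, time is passed in explicitly so the
behaviour is easy to follow, and the in-flight tracking is reduced to a
simple counter; the driver's actual bookkeeping may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_CACHE_ENTRIES 4     /* entries requested per get_next */
#define TOY_CACHE_TTL_US  250   /* lifetime of the cache */

struct toy_cache {
	uint32_t keys[TOY_CACHE_ENTRIES];
	uint32_t vals[TOY_CACHE_ENTRIES];
	unsigned int count;
	uint64_t filled_at_us;
	unsigned int blockers;   /* modifying operations currently in flight */
};

/* Store a batch returned by the device, unless a modifying op is in flight. */
static void toy_cache_fill(struct toy_cache *c, const uint32_t *keys,
			   const uint32_t *vals, unsigned int n,
			   uint64_t now_us)
{
	assert(n <= TOY_CACHE_ENTRIES);
	if (c->blockers)
		return;
	for (unsigned int i = 0; i < n; i++) {
		c->keys[i] = keys[i];
		c->vals[i] = vals[i];
	}
	c->count = n;
	c->filled_at_us = now_us;
}

/* Returns true and sets *val on a fresh hit; false means "ask the device". */
static bool toy_cache_lookup(const struct toy_cache *c, uint32_t key,
			     uint64_t now_us, uint32_t *val)
{
	if (c->blockers || now_us - c->filled_at_us > TOY_CACHE_TTL_US)
		return false;
	for (unsigned int i = 0; i < c->count; i++) {
		if (c->keys[i] == key) {
			*val = c->vals[i];
			return true;
		}
	}
	return false;
}

/* Update/delete paths bracket their work with these two calls. */
static void toy_modify_begin(struct toy_cache *c)
{
	c->blockers++;
	c->count = 0;   /* invalidate whatever was cached */
}

static void toy_modify_end(struct toy_cache *c)
{
	assert(c->blockers > 0);
	c->blockers--;
}
```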
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
If control channel MTU is too low to support map operations a warning
will be printed. This is not enough, we want to make sure probe fails
in such scenario, as this would clearly be a faulty configuration.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
New local variable "struct flow_block_offload *f" was added to
mlx5e_setup_tc() in recent rtnl lock removal patches. The variable is used
in code that is only compiled when CONFIG_MLX5_ESWITCH is enabled. This
results in a compilation warning about an unused variable when
CONFIG_MLX5_ESWITCH is not set. Move the variable definition from the
beginning of the mlx5e_setup_tc() function into the eswitch-specific code block.
Fixes: c9f14470d0 ("net: sched: add API for registering unlocked offload block callbacks")
Reported-by: tanhuazhong <tanhuazhong@huawei.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 190f73ab4c ("net: stmmac: setup higher frequency clk support for EHL & TGL")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The devicetree binding lists the phy regulator as optional. As such, the
driver should not bail out if it can't find a regulator. Instead it
should just skip the remaining regulator related code and continue
on normally.
Skip the remainder of phy_power_on() if a regulator supply isn't
available. This also gets rid of the bogus return code.
Fixes: 2e12f53663 ("net: stmmac: dwmac-rk: Use standard devicetree property for phy regulator")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In xgbe_mod_init(), we should do cleanup if some error occurs.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: efbaa82833 ("amd-xgbe: Add support to handle device renaming")
Fixes: 47f164deab ("amd-xgbe: Add PCI device support")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Health show command example and output:
  $ devlink health show pci/0000:af:00.0 reporter fw_fatal
  pci/0000:af:00.0:
    name fw_fatal
      state healthy error 1 recover 1 grace_period 0 auto_recover true
Fatal events from firmware or missing periodic heartbeats will
be reported and recovery will be handled.
We also turn on the support flags when we register with the firmware to
enable this health and recovery feature in the firmware.
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This call will handle fatal firmware errors by forcing a reset on the
firmware. The master function driver will carry out the forced reset.
The sequence will go through the same bnxt_fw_reset_task() workqueue.
This fatal reset differs from the non-fatal reset at the beginning
stages. From the BNXT_FW_RESET_STATE_ENABLE_DEV state onwards where
the firmware is coming out of reset, it is practically identical to the
non-fatal reset.
The next patch will add the periodic heartbeat check and the devlink
reporter to report the fatal event and to initiate the bnxt_fw_exception()
call.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This state handles driver initiated chip reset during error recovery.
Only the master function will perform this step during error recovery.
The next patch will add code to initiate this reset from the master
function.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a flag to mark that the firmware has encountered a fatal condition.
The driver will not send any more firmware messages and will return
error to the caller. Fix up some clean up functions to continue
and not abort when the firmware message function returns error.
This is preparation work to fully handle firmware error recovery
under fatal conditions.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Retain the VF MAC address, default VLAN, TX rate control, trust settings
of VFs after firmware reset.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add devlink health reporter for the firmware reset event. Once we get
the notification from firmware about the impending reset, the driver
will report this to devlink and the call to bnxt_fw_reset() will be
initiated to complete the reset sequence.
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the bnxt_fw_reset() main function to handle firmware reset. This
is triggered by firmware to initiate an orderly reset, for example
when a non-fatal exception condition has been detected. bnxt_fw_reset()
will first wait for all VFs to shut down and then start the
bnxt_fw_reset_task() work queue to go through the sequence of reset,
re-probe, and re-initialization.
The next patch will add the devlink reporter to start the sequence and
call bnxt_fw_reset().
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This event from firmware signals a coordinated reset initiated by the
firmware. It may be triggered by some error conditions encountered
in the firmware or other orderly reset conditions.
We store the parameters from this event. Subsequent patches will
add logic to handle reset itself using devlink reporters.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create new FW devlink_health_reporter, to know the current health
status of FW.
Command example and output:
  $ devlink health show pci/0000:af:00.0 reporter fw
  pci/0000:af:00.0:
    name fw
      state healthy error 0 recover 0
   FW status: Healthy; Reset count: 1
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new flag will be set in subsequent patches when firmware is
going through reset. If bnxt_close() is called while the new flag
is set, the FW reset sequence will have to be aborted because the
NIC is prematurely closed before FW reset has completed. We also
reject SRIOV configurations while FW reset is in progress.
v2: No longer drop rtnl_lock() in close and wait for FW reset to complete.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handle the async event from the firmware that enables firmware health
monitoring. Store initial health metrics.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pre-map the GRC registers for periodic firmware health monitoring.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call the new firmware API HWRM_ERROR_RECOVERY_QCFG if it is supported
to discover the firmware health and recovery capabilities and settings.
This feature allows the driver to reset the chip if firmware crashes and
becomes unresponsive.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During IF_UP, newer firmware has a new status flag that indicates that
firmware has reset. Add new function bnxt_fw_init_one() to re-probe the
firmware and re-setup VF resources on the PF if necessary. If the
re-probe fails, set a flag to prevent bnxt_open() from proceeding again.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When VFs need to be reconfigured dynamically after firmware reset, the
configuration sequence on the PF needs to be changed to register the VF
buffers first. Otherwise, some VF firmware commands may not succeed as
there may not be PF buffers ready for the re-directed firmware commands.
This sequencing did not matter much before when we only supported
the normal bring-up of VFs.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor the hardware/firmware configuration portion in
bnxt_sriov_enable() into a new function bnxt_cfg_hw_sriov(). This
new function can be called after a firmware reset to reconfigure the
VFs previously enabled.
v2: straight refactor of the code. Reordering done in the next patch.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for the new firmware reset feature, some of the logic
in bnxt_init_one() and related functions will be called again after
firmware has reset. Reset some of the flags and capabilities so that
everything that can change can be re-initialized. Refactor some
functions to probe firmware versions and capabilities. Check some
buffers before allocating as they may have been allocated previously.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the silent parameter is set, suppress all messages when there is
no response from firmware. When polling for firmware to come out of
reset, no response may be normal and we want to suppress the error
messages. Also, don't poll for the firmware DMA response if Bus Master
is disabled. This is in preparation for error recovery when firmware
may be in error or reset state or Bus Master is disabled.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are 4 functions handling message forwarding for SR-IOV. They
check for non-zero firmware response code and then return -1. There
is no need to do this anymore. The main messaging function will
now return standard error code. Since we don't need to examine the
response, we can use the hwrm_send_message() variant which will
take the mutex automatically.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The main firmware messaging function returns the firmware defined error
code and many callers have to convert to standard error code for proper
propagation to userspace. Convert bnxt_hwrm_do_send_msg() to return
standard error code so we can do away with all the special error code
handling by the many callers.
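The idea of translating once, inside the messaging function, can be
sketched as follows. The firmware codes and both function names are
hypothetical placeholders, not the driver's actual definitions or mapping.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical firmware-defined result codes. */
enum toy_fw_err {
	TOY_FW_OK = 0,
	TOY_FW_ACCESS_DENIED,
	TOY_FW_INVALID_PARAMS,
	TOY_FW_NO_RESOURCES,
	TOY_FW_UNSUPPORTED,
};

/* Translate a firmware code into a standard negative errno. */
static int toy_fw_to_errno(int fw_err)
{
	switch (fw_err) {
	case TOY_FW_OK:
		return 0;
	case TOY_FW_ACCESS_DENIED:
		return -EACCES;
	case TOY_FW_INVALID_PARAMS:
		return -EINVAL;
	case TOY_FW_NO_RESOURCES:
		return -ENOSPC;
	case TOY_FW_UNSUPPORTED:
		return -EOPNOTSUPP;
	default:
		return -EIO;   /* anything unknown becomes a generic I/O error */
	}
}

/* Stand-in for the central send routine: `fw_err` plays the role of the
 * code found in the firmware response. */
static int toy_send_msg(int fw_err)
{
	int rc = toy_fw_to_errno(fw_err);

	assert(rc <= 0);   /* callers only ever see 0 or a negative errno */
	return rc;
}
```

With the conversion done in one place, callers can propagate the return
value to userspace unchanged.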
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace the non-standard -1 code with -EBUSY when there is no firmware
response after waiting for the maximum timeout.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The same message is printed 3 times in the code, so use a common function
to do that.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netif_stop_queue()/netif_wake_queue() aren't needed for changing
multicast filters.
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
emcr in private struct wasn't always protected by spinlock.
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The half/full duplex settings for inter packet gap counters/timer were
reversed.
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace open-coded checksum folding with csum_fold.
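For reference, checksum folding reduces a 32-bit one's-complement
accumulator to 16 bits and inverts it. The following is a portable
userspace sketch of that operation; toy_csum_fold() is a hypothetical
name, and the kernel's csum_fold() is architecture-specific and uses its
own checksum types.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator into 16 bits and invert it. */
static uint16_t toy_csum_fold(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);   /* add the carries back in */
	sum = (sum & 0xffff) + (sum >> 16);   /* the first add may carry again */
	assert(sum <= 0xffff);
	return (uint16_t)~sum;
}
```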
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace the homegrown DMA memory allocation, which only works on
SGI-IP27 machines, with the generic dma allocations.
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move common code for rx buffer setup into ioc3_alloc_skb and deal
with allocation failures. Also clean up allocation size calculation.
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do tx ring cleaning and freeing of rx buffers when the chip is shut down,
and allocate buffers before bringing the chip up.
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
ioc3_init did everything from reset to init rings to starting the chip.
This change moves chip start out into a new function as preparation
for easier handling of receive buffer allocation failures.
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that allocation of descriptor memory is done once in probe,
handling of the tx ring is completely done by ioc3_clean_tx_ring. So
we remove the remaining tx ring actions from ioc3_alloc_rings
and ioc3_free_rings and rename them to ioc3_[alloc|free]_rx_bufs
to better describe what they are doing.
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move clearing of the descriptor valid bit into ioc3_alloc_rings. This
makes ioc3_clean_rx_ring obsolete.
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Memory for descriptor rings was allocated/freed when the interface was
brought up/down. Since the size of the rings is not changeable by
hardware, we now allocate the rings during probe and free them when
the device is removed.
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Descriptor ring sizes of the IOC3 are more or less fixed. To make it
clearer where there is a relation to ring sizes, use defines.
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before massaging the driver further, fix oddities found by checkpatch, like
- wrong indentation
- comment formatting
- use of printk instead of netdev_xxx/pr_xxx
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Break up the big ioc3 register struct into functional pieces to
make use in sub-function drivers more straightforward. And while
doing that get rid of all volatile access by using readX/writeX.
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct spider_net_card {
...
struct spider_net_descr darray[0];
};
Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes.
So, replace the following form:
sizeof(struct spider_net_card) + (tx_descriptors + rx_descriptors) * sizeof(struct spider_net_descr)
with:
struct_size(card, darray, tx_descriptors + rx_descriptors)
Notice that, in this case, variable alloc_size is not necessary, hence it
is removed.
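As a rough userspace illustration of what the helper guards against,
the sketch below mimics the size computation with an overflow check.
The names flex_size, struct card and struct descr are made up for this
example and the GCC/Clang overflow builtins are assumed; the kernel's
struct_size() is defined in <linux/overflow.h> and likewise saturates
to SIZE_MAX on overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for spider_net_descr and spider_net_card. */
struct descr {
	uint32_t buf_addr;
	uint32_t buf_size;
};

struct card {
	int num_descr;
	struct descr darray[];	/* flexible array member */
};

/*
 * Userspace approximation of struct_size(card, darray, n):
 * header size plus n array elements, saturating to SIZE_MAX
 * if either the multiplication or the addition overflows.
 */
size_t flex_size(size_t n)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, sizeof(struct descr), &bytes) ||
	    __builtin_add_overflow(bytes, sizeof(struct card), &bytes))
		return SIZE_MAX;
	return bytes;
}
```

An open-coded "sizeof(hdr) + n * sizeof(elem)" would silently wrap for
a huge n, while the saturated result makes the allocation fail instead.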
Building: allmodconfig powerpc.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds EEE support for RTL8125 based on the vendor driver.
EEE is supported for 100Mbps and 1Gbps. Realtek recommended not
enabling EEE for 2.5Gbps yet due to potential compatibility issues. Also,
ethtool doesn't yet support controlling EEE for 2.5Gbps and 5Gbps.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds PHY initialization magic copied from the r8125 vendor
driver. In addition it supports loading the firmware for chip version
RTL_GIGA_MAC_VER_61.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for 2.5Gbps chip RTL8125, it's partially based on the
r8125 vendor driver. Tested with a Delock 89531 PCIe card against a
Netgear GS110MX Multi-Gig switch. Firmware isn't strictly needed,
but on some systems there may be compatibility issues w/o firmware.
Firmware has been submitted to linux-firmware.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On RTL8125 this bit is always cleared after send. Therefore check for
tx_skb->skb being set, which is functionally equivalent.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RTL8125 uses a different register number for IntrMask.
To not have side effects by reading a random register let's
use a register that is the same on all supported chip families.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RTL8125 doesn't support the same coalescing registers, therefore move
this initialization to the 8168/8169-specific init.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For RTL8125 we will have to read the MAC address also from another
register range, therefore create a small helper.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend helper rtl_is_8168evl_up to properly work once we add
mac version numbers >51 for RTL8125.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RTL8125 uses a 32 bit interrupt mask even though only bits in the
lower 16 bits are used. Change interrupt mask size to u32 to be
prepared and reintroduce helper rtl_get_events.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'mlx5-updates-2019-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2019-08-22
Misc updates for mlx5e net device driver
1) Maxim and Tariq add the support for LAG TX port affinity distribution
When VF LAG is enabled, VFs netdevs will round-robin the TX affinity
of their tx queues among the different LAG ports.
2) Aya adds the support for ip-in-ip RSS.
3) Marina adds the support for ip-in-ip TX TSO and checksum offloads.
4) Moshe adds a device internal drop counter to mlx5 ethtool stats.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The current loopback mode is to add 0x1F to the SMAC address
as the DMAC address and enable the promiscuous mode.
However, if the VF address is the same as the DMAC address,
the loopback test fails.
Loopback can be enabled in three places: SSU, MAC, and serdes.
By default, SSU loopback is enabled, so if the SMAC and the DMAC
are the same, the packets are looped back in the SSU. If SSU loopback
is disabled, packets can reach MAC even if SMAC is the same as DMAC.
Therefore, this patch disables the SSU loopback before the loopback
test. In this way, the SMAC and DMAC can be the same, and the
promiscuous mode does not need to be enabled. And this is not
valid in version 0x20.
This patch also uses a macro to replace 0x1F.
Fixes: c39c4d98dc ("net: hns3: Add mac loopback selftest support in hns3 driver")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the reset interrupt is cleared first. So if a reset fails
and the interrupt status register still shows a reset interrupt,
it means a new reset is pending.
Fixes: 72e2fb0799 ("net: hns3: clear reset interrupt status in hclge_irq_handle()")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the loopback test supports only mac selftest and serdes
selftest. This patch adds phy selftest.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the hardware or IMP gets a specified error, it may need the client
to take some special actions.
This patch implements the hns3 client's process_hw_error.
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch optimizes the waiting time for TQP reset.
Signed-off-by: Zhongzhu Liu <liuzhongzhu@huawei.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes some incorrect types in assignments reported by sparse.
The sparse warnings are as below:
- warning : restricted __le16 degrades to integer
- warning : cast from restricted __le32
- warning : expected restricted __le32
- warning : cast from restricted __be32
- warning : cast from restricted __be16
- warning : cast to restricted __le16
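Warnings of this kind typically appear when a little-endian wire field
is used as if it were a host-order integer. The sketch below is a
userspace illustration only, with made-up helper names get_le16 and
put_le16; in the kernel the conversion would go through helpers such as
le16_to_cpu() and cpu_to_le16() on __le16 typed fields.

```c
#include <stdint.h>

/* Decode a 16-bit little-endian field byte by byte; works on any host. */
uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Encode a host-order value into little-endian wire format. */
void put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}
```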
Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In hclge_dcb.c, these pairs of calls:
hclge_notify_client(hdev, HNAE3_DOWN_CLIENT);
hclge_notify_client(hdev, HNAE3_UNINIT_CLIENT);
and
hclge_notify_client(hdev, HNAE3_INIT_CLIENT);
hclge_notify_client(hdev, HNAE3_UP_CLIENT);
are called many times, so make them into a function.
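A minimal sketch of the refactoring pattern, not the driver code: the
enum, the stub notify_client() and the helper names notify_down_uninit
and notify_init_up are all hypothetical and only show how each pair
collapses into one call that stops at the first failure.

```c
/* Hypothetical stand-ins for the HNAE3_*_CLIENT notification types. */
enum notify_type { DOWN_CLIENT, UNINIT_CLIENT, INIT_CLIENT, UP_CLIENT };

/* Stub notifier: records the call order and reports success (0). */
int notify_client(enum notify_type type, int *log, int *n)
{
	log[(*n)++] = type;
	return 0;
}

/* DOWN followed by UNINIT, returning the first error if any. */
int notify_down_uninit(int *log, int *n)
{
	int ret = notify_client(DOWN_CLIENT, log, n);

	if (ret)
		return ret;
	return notify_client(UNINIT_CLIENT, log, n);
}

/* INIT followed by UP, returning the first error if any. */
int notify_init_up(int *log, int *n)
{
	int ret = notify_client(INIT_CLIENT, log, n);

	if (ret)
		return ret;
	return notify_client(UP_CLIENT, log, n);
}
```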
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To better identify abnormal conditions, this patch modifies or
adds some logs to show driver status more accurately.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Zhongzhu Liu <liuzhongzhu@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch simplifies the parameters of some functions by deleting
unused parameters.
Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch replaces kstrtouint()'s parameter base with 0 in the
hclge_dbg_dump_tm_mac(), which makes it more flexible. Also
uses a macro to replace string "dump tm map", since it has been
used multiple times.
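With base 0 the parser detects the base from the prefix ("0x" for hex,
a leading "0" for octal, decimal otherwise). The userspace sketch below
shows the same convention with strtoul(); parse_uint_auto is a made-up
name and this is only an approximation of kstrtouint(), which is
stricter about the accepted input.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse an unsigned int with automatic base detection.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
int parse_uint_auto(const char *s, unsigned int *out)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(s, &end, 0);	/* base 0: detect from prefix */
	if (errno || end == s || *end != '\0' || v > UINT_MAX)
		return -1;
	*out = (unsigned int)v;
	return 0;
}
```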
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch uses macros to replace some magic numbers.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To make the code more readable, this patch uses an array to
keep the information about the registers to dump, and then uses
it to parse the parameter cmd_buf which is passed into
hclge_dbg_dump_reg_cmd().
Also replaces parameter "base" of kstrtouint with 0 in the
hclge_dbg_dump_reg_common(), which makes it more flexible.
Signed-off-by: Zhongzhu Liu <liuzhongzhu@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Starting with firmware version MC10.18.0, we have support for
L2 flow control. Asymmetrical configuration (Rx or Tx only) is
supported, but not pause frame autonegotiation.
Pause frame configuration is done via ethtool. By default, we start
with flow control enabled on both Rx and Tx. Changes are propagated
to hardware through firmware commands, using two flags (PAUSE,
ASYM_PAUSE) to specify Rx and Tx pause configuration, as follows:
PAUSE | ASYM_PAUSE | Rx pause | Tx pause
----------------------------------------
  0   |     0      | disabled | disabled
  0   |     1      | disabled | enabled
  1   |     0      | enabled  | enabled
  1   |     1      | enabled  | disabled
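The table above can be expressed compactly: Rx pause follows PAUSE, and
Tx pause is PAUSE xor ASYM_PAUSE. A small sketch, with the struct and
function names chosen for this example only:

```c
#include <stdbool.h>

struct pause_cfg {
	bool rx;
	bool tx;
};

/* Map the two link flags to Rx/Tx pause as listed in the table. */
struct pause_cfg decode_pause(bool pause, bool asym_pause)
{
	struct pause_cfg cfg = {
		.rx = pause,
		.tx = pause != asym_pause,	/* logical xor */
	};

	return cfg;
}
```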
The hardware can automatically send pause frames when the number
of buffers in the pool goes below a predefined threshold. Due to
this, flow control is incompatible with Rx frame queue taildrop
(both mechanisms target the case when processing of ingress
frames can't keep up with the Rx rate; for large frames, the number
of buffers in the pool may never get low enough to trigger pause
frames as long as taildrop is enabled). So we set pause frame
generation and Rx FQ taildrop as mutually exclusive.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Whenever a link state change occurs, we get notified and save
the new link settings in the device's private data. In ethtool
get_link_ksettings, use the stored state instead of interrogating
the firmware each time.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
We only support fixed-link for now, so there is no point in
offering users the option to change link settings via ethtool.
Functionally there is no change, since firmware prevents us from
changing link parameters anyway.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
On Spectrum-1, timestamped PTP packets and the corresponding timestamps need to
be kept in caches until both are available, at which point they are matched up
and packets forwarded as appropriate. However, not all packets will ever see
their timestamp, and not all timestamps will ever see their packet. It is
necessary to dispose of such abandoned entries, so a garbage collector was
introduced in commit 5d23e41597 ("mlxsw: spectrum: PTP: Garbage-collect
unmatched entries").
If these GC events happen often, it is a sign of a problem. However because this
whole mechanism is taking place behind the scenes, there is no direct way to
determine whether garbage collection took place.
Therefore to fix this, on Spectrum-1 only, expose four artificial ethtool
counters for the GC events: GCd timestamps and packets, in TX and RX directions.
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new version supports extended error reporting from firmware via a
new TLV in the EMAD packet. Similar to netlink extended ack.
It also fixes an issue in the PCI code that can result in false AER
errors under high Tx rate.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After 50G-1-lane and 100G-2-lanes link modes were introduced, the driver
is facing situations in which the hardware auto negotiates not only on
speed and type, but also on number of lanes.
Prevent auto negotiation on number of lanes by allowing only port speeds
that can be supported on a given port according to its width.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 275e928f19 ("mlxsw: spectrum: Prevent force of 56G") prevented
the driver from setting a speed of 56G when auto-negotiation is off.
This is the only speed supported by mlxsw that cannot be set when
auto-negotiation is off, which makes it difficult to write generic
tests.
Further, the speed is not supported by newer ASICs such as Spectrum-2
and to the best of our knowledge it is not used by current users.
Therefore, remove 56G support from mlxsw.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar workaround for the suspend/resume problem is needed for yet
another set of ASUS machines, the P6X models. Like the previous fix, the BIOS
doesn't provide the standard DMI_SYS_* entry, so again DMI_BOARD_*
entries are used instead.
Reported-and-tested-by: SteveM <swm@swm1.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent code changes to NFP allowed the offload of neighbour entries to FW
when the next hop device was an internal port. This allows for offload of
tunnel encap when the end-point IP address is applied to such a port.
Unfortunately, the neighbour event handler still rejects events that are
not associated with a repr dev and so the firmware neighbour table may get
out of sync for internal ports.
Fix this by allowing internal port neighbour events to be correctly
processed.
Fixes: 45756dfeda ("nfp: flower: allow tunnels to output to internal port")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Internal port TC offload is implemented through user-space applications
(such as OvS) by adding filters at egress via TC clsact qdiscs. Indirect
block offload support in the NFP driver accepts both ingress qdisc binds
and egress binds if the device is an internal port. However, clsact sends
bind notification for both ingress and egress block binds which can lead
to the driver registering multiple callbacks and receiving multiple
notifications of new filters.
Fix this by rejecting ingress block bind callbacks when the port is
internal and only adding filter callbacks for egress binds.
Fixes: 4d12ba4278 ("nfp: flower: allow offloading of matches on 'internal' ports")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2019-08-26
This series contains updates to ice driver only.
Usha fixes the statistics reported on 4 port NICs which were reporting
the incorrect statistics due to using the incorrect port identifier.
Victor fixes an issue when trying to traverse to the first node of a
requested layer by adding a sibling head pointer for each layer per
traffic class.
Anirudh cleans up the locking and logic for enabling and disabling
VSI's to make it more consistent. Updates the driver to do dynamic
allocation of queue management bitmaps and arrays, rather than
statically allocating them which consumes more memory than required.
Refactor the logic in ice_ena_msix_range() for clarity and add
additional checks for when requested resources exceed what is available.
Jesse updates the debugging print statements to make it more useful when
dealing with link and PHY related issues.
Krzysztof adds a local variable to the VSI rebuild path to improve
readability.
Akeem limits the reporting of MDD events from VFs so that the kernel
log is not clogged up with MDD events which are duplicate or potentially
false positives. Fixed a reset issue that would result in the system
getting into a state that could only be resolved by a reboot by
testing if the VF is in a disabled state during a reset.
Michal adds a check to avoid trying to access memory that has not been
allocated by checking the number of queue pairs.
Jake fixes a static analysis warning due to a cast of a u8 to unsigned
long, so just update ice_is_tc_ena() to take an unsigned long so that a
cast is not necessary.
Colin Ian King fixes a potential infinite loop where a u8 is being
compared to an int.
Maciej refactors the queue handling functions that work on queue arrays
so that the logic can be done for a single queue.
Paul adds support for VFs to enable and disable single queues.
Henry fixed the order of operations in ice_remove() which was trying to
use adminq operations that were already disabled.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the ibmvnic driver will not schedule device resets
if the device is being removed, but does not check the device
state before the reset is actually processed. This leads to a race
where a reset is scheduled with a valid device state but is
processed after the driver has been removed, resulting in an oops.
Fix this by checking the device state before processing a queued
reset event.
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the compatibility string for SiFive FU540-C000 as per the new
string updated in the binding doc.
Reference:
https://lore.kernel.org/netdev/CAJ2_jOFEVZQat0Yprg4hem4jRrqkB72FKSeQj4p8P5KA-+rgww@mail.gmail.com/
Signed-off-by: Yash Shah <yash.shah@sifive.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Tested-by: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use generic function for checking tunnel stateless offload capability
instead of separate macros.
Signed-off-by: Marina Varshaver <marinav@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add support for inner header RSS on IP-in-IP and IPv6 tunneled packets.
Add rules to the steering table regarding outer IP header, with
IPv4/6->IP-in-IP. Tunneled packets with protocol numbers: 0x4 (IP-in-IP)
and 0x29 (IPv6) are RSS-ed on the inner IP header.
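For illustration, the protocol check implied by the paragraph above can
be sketched as below. The macro and function names are hypothetical and
this is not how the steering rules are programmed in the driver; it only
shows which outer protocol numbers select inner-header RSS here.

```c
#include <stdbool.h>
#include <stdint.h>

#define PROTO_IPIP 0x04	/* IPv4 encapsulated in IP */
#define PROTO_IPV6 0x29	/* IPv6 encapsulated in IP (41 decimal) */

/* True if the outer IP protocol field marks an IP-in-IP style tunnel. */
bool is_ip_tunnel_proto(uint8_t outer_proto)
{
	return outer_proto == PROTO_IPIP || outer_proto == PROTO_IPV6;
}
```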
Separate FW dependencies between flow table inner IP capabilities and
GRE offload support. Allowing this feature even if GRE offload is not
supported. Tested with multi stream TCP traffic tunneled with IPnIP.
Verified that:
Without this patch, only a single RX ring was processing the traffic.
With this patch, multiple RX rings were processing the traffic.
Verified with and without GRE offload support.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Move function which indicates whether tunnel inner flow table is
supported from en.h to en_fs.c. It fits better right after tunnel
protocol rules definitions.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add the following packet drop counter:
Device out of buffer - counts packets which were dropped due to full
device internal receive queue.
This counter will be shown on ethtool as a new counter called
dev_out_of_buffer.
The counter is read from FW by command QUERY_VNIC_ENV.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When the VF LAG is in use, round-robin the TX affinity of channels among
the different ports, if supported by the firmware. Create a set of TISes
per port, while doing round-robin of the channels over the different
sets. Let all SQs of a channel share the same set of TISes.
If lag_tx_port_affinity HCA cap bit is supported, num_lag_ports > 1 and
we aren't the LACP owner (PF in the regular use), assign the affinities,
otherwise use tx_affinity == 0 in TIS context to let the FW assign the
affinities itself. The TISes of the LACP owner are mapped only to the
native physical port.
For VFs, the starting port for round-robin is determined by its vhca_id,
because a VF may have only one channel if attached to a single-core VM.
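The round-robin described above can be sketched as follows. This is an
illustration under assumptions: the function name, the 'can_assign'
flag (standing in for the capability and LACP owner checks) and the
1-based port numbering are not taken from the driver; only the use of
0 to let the FW choose comes from the text.

```c
/*
 * Pick a TX affinity for channel 'ch_ix'.
 * 'start' is the first port to use (for a VF, derived from vhca_id).
 * Returns 0 to let the firmware choose, else a 1-based port number.
 */
unsigned int tx_affinity(unsigned int ch_ix, unsigned int start,
			 unsigned int num_ports, int can_assign)
{
	if (!can_assign || num_ports <= 1)
		return 0;
	return (start + ch_ix) % num_ports + 1;
}
```

Because the starting port differs per VF, two single-channel VFs can
still end up on different ports.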
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
For better modularity and code sharing.
Function internal change to be introduced in the next patches.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Fix a typo in 'mlx5e_refomrat_wol_mode_mlx5_to_linux' and
'mlx5e_refomrat_wol_mode_linux_to_mlx5' function names:
"refomrat" -> "reformat".
Fixes: 928cfe8745 ("net/mlx5e: Wake On LAN support")
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5 HW spec and bits updates:
1) Aya exposes IP-in-IP capability in mlx5_core.
2) Maxim exposes lag tx port affinity capabilities.
3) Moshe adds VNIC_ENV internal rq counter bits.
4) ODP capabilities for DC transport
Misc updates:
5) Saeed, two compiler warnings cleanups
6) Add XRQ legacy commands opcodes
7) Use refcount_t for refcount
8) fix a -Wstringop-truncation warning
Michael Guralnik says:
====================
The series adds support for on-demand paging for DC transport.
As DC is a mlx-only transport, the capabilities are exposed to the user
using DEVX objects and later on through mlx5dv_query_device.
====================
Based on the mlx5-next branch from
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for
dependencies
* branch 'mlx5-odp-dc':
IB/mlx5: Add page fault handler for DC initiator WQE
IB/mlx5: Remove check of FW capabilities in ODP page fault handling
net/mlx5: Set ODP capabilities for DC transport to max
Move ASPM definitions and function prototypes from include/linux/pci-aspm.h
to include/linux/pci.h so users only need to include <linux/pci.h>:
PCIE_LINK_STATE_L0S
PCIE_LINK_STATE_L1
PCIE_LINK_STATE_CLKPM
pci_disable_link_state()
pci_disable_link_state_locked()
pcie_no_aspm()
No functional changes intended.
Link: https://lore.kernel.org/r/20190827095620.11213-1-kw@linux.com
Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
In mlx5_core initialization, query max ODP capabilities for DC transport
from FW and set as current capabilities.
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
EHL DW EQOS is running on a 200MHz clock. Setting up stmmac-clk,
ptp clock and ptp_max_adj to 200MHz.
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added EHL RGMII 1Gbps PCI ID. Different MII and speed will have
different PCI ID.
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added TGL SGMII 1Gbps PCI ID. Different MII and speed will have
different PCI ID.
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added EHL SGMII 1Gbps PCI ID. Different MII and speed will have
different PCI ID.
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/mediatek/mtk_eth_soc.c: In function mtk_handle_irq:
drivers/net/ethernet/mediatek/mtk_eth_soc.c:1951:6: warning: variable status set but not used [-Wunused-but-set-variable]
Fixes: 296c912075 ("net: ethernet: mediatek: Add MT7628/88 SoC support")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Reviewed-by: Stefan Roese <sr@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Re-add SGMII support but now with PHYLINK API support
So the SGMII changes are more clear
* Move SGMII block setup from mtk_gmac_sgmii_path_setup() to
mtk_mac_config()
* Merge mtk_setup_hw_path() into mtk_mac_config()
* Remove mediatek,physpeed property, fixed-link supports now any speed so
speed = <2500>; is now valid with PHYLINK
* Demagic SGMII register values
* Use phylink state to setup fixed-link mode
Signed-off-by: René van Dorst <opensource@vdorst.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This converts the basics to the PHYLINK API.
SGMII support is not in this patch.
Signed-off-by: René van Dorst <opensource@vdorst.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In file included from ./arch/powerpc/include/asm/paca.h:15,
from ./arch/powerpc/include/asm/current.h:13,
from ./include/linux/thread_info.h:21,
from ./include/asm-generic/preempt.h:5,
from ./arch/powerpc/include/generated/asm/preempt.h:1,
from ./include/linux/preempt.h:78,
from ./include/linux/spinlock.h:51,
from ./include/linux/wait.h:9,
from ./include/linux/completion.h:12,
from ./include/linux/mlx5/driver.h:37,
from drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h:6,
from drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c:33:
In function 'strncpy',
inlined from 'mlx5_fw_tracer_save_trace' at drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c:549:2,
inlined from 'mlx5_tracer_print_trace' at drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c:574:2:
./include/linux/string.h:305:9: warning: '__builtin_strncpy' output may be truncated copying 256 bytes from a string of length 511 [-Wstringop-truncation]
return __builtin_strncpy(p, q, size);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix it by using strscpy_pad(), introduced in commit 458a3bf82d
("lib/string: Add strscpy_pad() function"). It always NUL-terminates
the string and zero-pads the rest of the destination, which avoids
possibly leaking data through the ring buffer, where a non-admin
account might enable these events through perf.
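A userspace sketch of the copy semantics relied on here, with copy_pad
as a made-up stand-in: the destination is always NUL-terminated and any
unused tail is zeroed, so stale bytes cannot leak. The kernel helper
reports truncation with -E2BIG; this sketch simply returns -1.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy 'src' into 'dst' of 'size' bytes, always NUL-terminating and
 * zero-filling the remainder. Returns the number of characters copied,
 * or -1 if the source had to be truncated (or size is 0).
 */
long copy_pad(char *dst, const char *src, size_t size)
{
	size_t len, n;

	if (size == 0)
		return -1;
	len = strlen(src);
	n = len < size ? len : size - 1;
	memcpy(dst, src, n);
	memset(dst + n, 0, size - n);	/* terminator plus padding */
	return len < size ? (long)n : -1;
}
```

By contrast, strncpy() leaves the destination unterminated when the
source is at least as long as the buffer, which is what the warning
points at.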
Fixes: fd1483fe1f ("net/mlx5: Add support for FW reporter dump")
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The order of operations was incorrect in ice_remove(). The code would
try to use adminq operations after the adminq was disabled. This caused
all adminq calls to fail and possibly time out waiting.
Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The current implementation of ice_ena_msix_range is difficult to read
and has subtle issues. This patch reworks the said function for
clarity and correctness.
More specifically,
1. Add more checks to bail out if 'needed' is greater than 'v_left'.
2. Simplify fallback logic
3. Do not set pf->num_avail_sw_msix in ice_ena_msix_range as it
gets overwritten by ice_init_interrupt_scheme.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes a critical reset issue that results in a server
reboot. When an admin changes VF configuration on the host, for example
changing a VF to trusted/non-trusted mode, the PF driver sends a reset
notification to the AVF driver while also continuing with its reset
flow. However, the AVF driver schedules another reset due to the
notification, which causes two concurrent resets and triggers a lockup
in the FW on the AQ call to delete the VSI.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The total number of queues available on the device is divided between
multiple physical functions (PF) in the firmware and provided to the
driver when it gets function capabilities from the firmware. Thus
each PF knows how many Tx/Rx queues it has. These queues are then
doled out to different VSIs (for LAN traffic, SR-IOV VF traffic, etc.)
To track usage of these queues at the PF level, the driver uses two
bitmaps avail_txqs and avail_rxqs. At the VSI level (i.e. struct ice_vsi
instances) the driver uses two arrays txq_map and rxq_map, to track
ownership of VSIs' queues in avail_txqs and avail_rxqs respectively.
The aforementioned bitmaps and arrays should be allocated dynamically,
because the number of queues supported by a PF is only available once
function capabilities have been queried. The current static allocation
consumes way more memory than required.
This patch removes the DECLARE_BITMAP for avail_txqs and avail_rxqs
and instead uses bitmap_zalloc to allocate the bitmaps during init.
Similarly txq_map and rxq_map are now allocated in ice_vsi_alloc_arrays.
As a result ICE_MAX_TXQS and ICE_MAX_RXQS defines are no longer needed.
Also as txq_map and rxq_map are now allocated and freed, some code
reordering was required in ice_vsi_rebuild for correct functioning.
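As a userspace approximation of the idea, the bitmap below is sized at
run time from the queue count instead of from a compile-time maximum.
The helper names are made up for this sketch; in the kernel the
allocation would use bitmap_zalloc() as stated above.

```c
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Allocate zeroed storage for 'nbits' bits; caller frees with free(). */
unsigned long *bitmap_alloc_zeroed(size_t nbits)
{
	size_t words = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;

	return calloc(words ? words : 1, sizeof(unsigned long));
}

/* Mark queue 'bit' as in use. */
void bitmap_set_bit(unsigned long *map, size_t bit)
{
	map[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

/* Return 1 if queue 'bit' is in use, 0 otherwise. */
int bitmap_test_bit(const unsigned long *map, size_t bit)
{
	return (int)((map[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1);
}
```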
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The VF driver can call VIRTCHNL_OP_[ENABLE|DISABLE]_QUEUES separately
for each queue. Add support for virtchnl_queue_select.[tx|rx]_queues
bitmap which is used to indicate which queues to enable and disable.
Add tracking of the per-queue VF Tx/Rx enable state to avoid enabling
already enabled queues and disabling already disabled queues. Add a count
of total queues enabled and clear ICE_VF_STATE_QS_ENA when the count is zero.
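A minimal sketch of this bookkeeping, assuming one bit per Tx queue (Rx
would be symmetric). The struct and function names are hypothetical and
the qs_ena flag only stands in for ICE_VF_STATE_QS_ENA; this is not the
driver's implementation.

```c
#include <stdbool.h>

/* Hypothetical per-VF state: one bit per Tx queue plus a running total. */
struct vf_qstate {
	unsigned long txq_ena;		/* bit n set: Tx queue n is enabled */
	unsigned int num_qs_ena;	/* total queues currently enabled */
	bool qs_ena;			/* stands in for ICE_VF_STATE_QS_ENA */
};

/* Count the set bits in v. */
unsigned int vf_count_bits(unsigned long v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/* Enable queues selected in mask, skipping ones already enabled.
 * Returns the number of queues newly enabled.
 */
int vf_ena_txqs(struct vf_qstate *vf, unsigned long mask)
{
	unsigned long todo = mask & ~vf->txq_ena;
	unsigned int n = vf_count_bits(todo);

	vf->txq_ena |= todo;
	vf->num_qs_ena += n;
	vf->qs_ena = vf->num_qs_ena > 0;
	return (int)n;
}

/* Disable queues selected in mask, skipping ones already disabled.
 * Returns the number of queues newly disabled.
 */
int vf_dis_txqs(struct vf_qstate *vf, unsigned long mask)
{
	unsigned long todo = mask & vf->txq_ena;
	unsigned int n = vf_count_bits(todo);

	vf->txq_ena &= ~todo;
	vf->num_qs_ena -= n;
	if (vf->num_qs_ena == 0)
		vf->qs_ena = false;
	return (int)n;
}
```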
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Signed-off-by: Peng Huang <peng.huang@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Refactor the queue handling functions that walk the queue arrays so that
the logic for a single queue is pulled out and called for each ring while
traversing the ring array. This means that when disabling Tx rings we no
longer fill up the q_ids, q_teids and q_handles arrays. Also drop the
'offset' parameter; the value from the VSI's txq_map is stored in
ring->reg_idx, which removes the need for it. Introduce ice_vsi_cfg_txq,
ice_vsi_stop_tx_ring and ice_vsi_ctrl_rx_ring as the functions holding
the pulled-out logic.
Several pieces of Tx queue metadata (q_id, q_handle, q_teid and others)
need to be set up during Tx queue disablement, so also add a helper
structure that wraps them up and a function that fills it in.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The loop counter of a for-loop is a u8 however this is being compared
to an int upper bound and this can lead to an infinite loop if the
upper bound is greater than 255 since the loop counter will wrap back
to zero. Fix this potential issue by making the loop counter an int.
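The wrap-around can be reproduced in a small standalone sketch. The
function names and the max_steps guard are hypothetical and exist only so
the demonstration terminates; they are not part of the driver.

```c
#include <stdint.h>

/* Loop with a u8 counter. Returns the number of iterations run, or -1 if
 * max_steps iterations were reached, i.e. the loop would not have ended.
 */
int loop_u8(int upper, int max_steps)
{
	int steps = 0;
	uint8_t i;

	/* i is promoted to int for the compare but wraps at 255 on i++ */
	for (i = 0; i < upper; i++)
		if (++steps >= max_steps)
			return -1;
	return steps;
}

/* Same loop with an int counter, as in the fix. */
int loop_int(int upper, int max_steps)
{
	int steps = 0;
	int i;

	for (i = 0; i < upper; i++)
		if (++steps >= max_steps)
			return -1;
	return steps;
}
```

With an upper bound of 300 the u8 variant only stops because of the
guard, while the int variant finishes after 300 iterations.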
Addresses-Coverity: ("Infinite loop")
Fixes: c7aeb4d1b9 ("ice: Disable VFs until reset is completed")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ice_is_tc_ena is used to check whether a given traffic class is
enabled. Because there are only 8 traffic classes, the function took
a u8 bitmap. This causes problems because it is cast to an unsigned
long causing a static analysis warning regarding Out-of-bounds read.
Fix this by simply updating ice_is_tc_ena to take an unsigned long.
Passing a u8 to this function should implicitly convert the value.
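A minimal sketch of the by-value approach, assuming the warning came from
reading a one-byte bitmap through an unsigned long pointer. The name
tc_is_ena is hypothetical and this is not the ice_is_tc_ena source.

```c
#include <limits.h>
#include <stdbool.h>

/* Test bit 'tc' of a bitmap passed by value. A u8 argument is widened by
 * the usual integer conversions, so no pointer cast is involved.
 */
bool tc_is_ena(unsigned long bitmap, unsigned int tc)
{
	if (tc >= sizeof(bitmap) * CHAR_BIT)
		return false;
	return (bitmap >> tc) & 1UL;
}
```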
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Check num_queue_pairs to avoid accessing unallocated entries of
vsi->tx_rings/vsi->rx_rings. Without this validation, vsi->alloc_txq and
vsi->alloc_rxq can be set to a value smaller than ICE_MAX_BASE_QS_PER_VF
and this command can then be sent with num_queue_pairs greater than
vsi->alloc_txq/vsi->alloc_rxq. This leads to access to unallocated memory.
In a VF VSI, alloc_txq and alloc_rxq should be the same. Take the minimum
of the two because it reads more clearly.
Also add validation for the ring_len parameter. It should be greater than 32
and a multiple of 32. An incorrect value leads to hung traffic on the PF.
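The two checks can be sketched as below. The bounds are taken literally
from the text above; the helper names are hypothetical and the driver may
use different constants and helpers.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_LEN_MULTIPLE 32	/* hypothetical name, value from the text */

/* ring_len must be greater than 32 and a multiple of 32. */
bool ring_len_ok(uint32_t ring_len)
{
	return ring_len > 32 && ring_len % RING_LEN_MULTIPLE == 0;
}

/* num_queue_pairs must not exceed the smaller of the allocated counts. */
bool num_qps_ok(uint16_t num_queue_pairs, uint16_t alloc_txq,
		uint16_t alloc_rxq)
{
	uint16_t limit = alloc_txq < alloc_rxq ? alloc_txq : alloc_rxq;

	return num_queue_pairs <= limit;
}
```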
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In case of MDD events on a VF, don't clog the kernel log with an unlimited
number of VF MDD event messages such as "VF 0 has had 1018 MDD events since
last boot". Limit the log messages to 30, based on an experiment in which a
malicious packet was sent once and the number of events reported before the
device stopped observing MDD events was counted.
Also remove the defunct macro "ICE_DFLT_NUM_MDD_EVENTS_ALLOWED", which
tracked the number of MDD events allowed before disabling the interface.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When a VSI is accessed inside the ice_for_each_vsi macro in the rebuild
path (ice_vsi_rebuild_all() and ice_vsi_replay_all()), it is referred to
as pf->vsi[i]. Introduce local variables to improve readability.
Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add some verbose debugging for dyndbg to help us when
we are having issues with link and/or PHY.
While there, shorten some strings used by locals that
were causing long line wrapping.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
1. ndo_open and ndo_stop are implemented by ice_open and ice_stop
respectively. When enabling/disabling VSIs, just call
ice_open/ice_stop instead of ndo_open/ndo_stop.
2. Rework logic around rtnl_lock/rtnl_unlock
3. In ice_ena_vsi, remove an unnecessary stack variable and return
0 instead of err when __ICE_NEEDS_RESTART is not set.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
A bug in the previous code meant it never traversed all the children to
get the first node of the requested layer. Add a sibling head pointer that
points to the first node of each layer per TC. This makes traversal easier
and quicker, and also removes the recursion.
Signed-off-by: Victor Raj <victor.raj@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes an issue where port and PFC statistics counters are
incremented on the wrong port with 4x25G cards.
Read the GLPRT port registers using the lport parameter instead of pf_id
to update the statistics. Otherwise the pf_ids are flipped for ports 2 and
3 when read from the HW register PF_FUNC_RID, which is expected per the
hardware specification.
Signed-off-by: Usha Ketineni <usha.k.ketineni@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add MODULE_FIRMWARE entries for AMDA0058 boards.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the call to dma_sync_single_for_cpu after calling napi_alloc_skb.
This avoids calling dma_sync_single_for_cpu w/o handing control back
to device if the memory allocation should fail.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend struct flow_block_offload with an "unlocked_driver_cb" flag to allow
registering and unregistering block hardware offload callbacks that do not
require the caller to hold the rtnl lock. Extend tcf_block with an additional
lockeddevcnt counter that is incremented for each non-unlocked driver
callback attached to a device. This counter is needed to conditionally
obtain the rtnl lock before calling hardware callbacks in following patches.
Register mlx5 tc block offload callbacks as "unlocked".
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFP uses Local Memory to model the stack. LM_addr can be used as the base
of a region of 16 32-bit words of Local Memory. If the stack offset is
beyond the current region, the local index needs to be updated. The update
needs at least three cycles to take effect, so the sequence normally
looks like:
  local_csr_wr[ActLMAddr3, gprB_5]
  nop
  nop
  nop
If the local index switch happens on a narrow load, then the instruction
preparing the value to zero the high 32 bits of the destination register
can be counted as one cycle, and the sequence can then be something like:
  local_csr_wr[ActLMAddr3, gprB_5]
  nop
  nop
  immed[gprB_5, 0]
However, the zero extension optimization can eliminate the zeroing of the
high 32 bits, in which case the IMMED insn above is not available and
the first sequence needs to be generated.
Fixes: 0b4de1ff19 ("nfp: bpf: eliminate zero extension code-gen")
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Bring in the latest mlx5-next branch as the RDMA RX RoCE Steering
Support patch series requires it (first two patches are in mlx5-next,
final patch in RDMA tree).
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/cirrus/cs89x0.c: In function 'cs89x0_platform_probe':
drivers/net/ethernet/cirrus/cs89x0.c:1847:20: warning:
variable 'lp' set but not used [-Wunused-but-set-variable]
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 6751edeb87 ("cirrus: cs89x0: Use managed interfaces")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit ee641b0cdb.
It is actually not clear whether this register read is needed for
its HW side effects.
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/mediatek/mtk_eth_soc.c: In function mtk_handle_irq:
drivers/net/ethernet/mediatek/mtk_eth_soc.c:1951:6: warning: variable status set but not used [-Wunused-but-set-variable]
Fixes: 296c912075 ("net: ethernet: mediatek: Add MT7628/88 SoC support")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2019-08-23
This series contains updates to ice driver only.
Dave adds logic for the necessary bits to be set in the VSI context for
the PF_VSI and the TX_descriptors for control packets egressing the
PF_VSI. Updated the logic to detect both DCBx and LLDP states in the
firmware engine to account for situations where DCBx is enabled and LLDP
is disabled. Fixed the driver to treat the DCBx state of "NOT_STARTED"
as a valid state and not to assume "is_fw_lldp" is true automatically.
Since "enable-fw-lldp" flag was confusing and cumbersome, change the
flag to "fw-lldp-agent" with a value of on or off to help clarify
whether the LLDP agent is running or not.
Brett fixes an issue where synchronize_irq() was being called from the
host for VFs, which should not be done.
Michal fixed an issue when rebuilding the DCBx configuration while in
IEEE mode versus CEE mode, so add a check before copying the
configuration value to ensure we are only in CEE mode.
Jake fixes the PF to reject any VF request to setup head writeback since
the support has been deprecated.
Mitch adds an additional check to ensure the VF is active before sending
out an error message that a message was unable to be sent to a
particular VF.
Chinh updates the driver to use "topology" mode when checking the PHY
for status, since this mode provides us the current module type that is
available. Stops the driver from clearing the auto_fec_enable bit, which
was blocking a user from forcing non-spec compliant FEC configurations.
Amruth does a refactor on the code to first check, then assign in the
virtual channel space.
Bruce updates the driver to actually update the stats when a user runs
the ethtool command 'ethtool -S <iface>' instead of providing a snapshot
of the stats that may be from a second ago.
Akeem fixes up the adding/removing of VSI MAC filters for VFs, so that
VFs cannot add/remove a filter from another VSI. We now track the
number of filters added right from when the VF resources get allocated
and won't get into MAC filter mis-match issue in the switch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent commit added logic to determine the appropriate statistics block
size to allocate and the size is stored in bp->hw_ring_stats_size. But
if the firmware spec is older than 1.6.0, it is 0 and not initialized.
This causes the allocation to fail with size 0 and bnxt_open() to
abort. Fix it by always initializing bp->hw_ring_stats_size to the
legacy default size value.
Fixes: 4e74850663 ("bnxt_en: Allocate the larger per-ring statistics block for 57500 chips.")
Reported-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'mlx5-fixes-2019-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-08-22
This series introduces some fixes to mlx5 driver.
1) From Moshe, two fixes for firmware health reporter
2) From Eran, two ktls fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h:542:30:
warning: meta_data_key_info defined but not used [-Wunused-const-variable=]
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h:553:30:
warning: tuple_key_info defined but not used [-Wunused-const-variable=]
The two variables are only used in hclge_main.c,
so just move the definitions over there.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit f072218cca.
As reported by Aaro this patch causes network problems on
MIPS Loongson platform. Therefore revert it.
Fixes: f072218cca ("r8169: remove not needed call to dma_sync_single_for_device")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
The return code value could be non-deterministic in case of a wrong size
read. With this patch, if such an error occurs, rc is set to -EIO.
In addition, mlx5_hv_config_common() supports reading of
HV_CONFIG_BLOCK_SIZE_MAX bytes only; fix it to return an error early on
bad input.
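The pattern can be sketched as follows. All names are hypothetical,
BLOCK_SIZE_MAX only stands in for HV_CONFIG_BLOCK_SIZE_MAX with an
arbitrary value, and the choice of -EINVAL for bad input is an assumption;
the two toy readers exist only to exercise both paths.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE_MAX 128	/* arbitrary stand-in value */

typedef int (*block_read_fn)(void *buf, size_t len, size_t *bytes_returned);

/* Read one config block; every exit path sets a definite return code. */
int config_read(block_read_fn read_block, void *buf, size_t len)
{
	size_t got = 0;
	int rc;

	if (len > BLOCK_SIZE_MAX)
		return -EINVAL;		/* reject bad input up front */

	rc = read_block(buf, len, &got);
	if (rc)
		return rc;
	if (got != len)
		return -EIO;		/* wrong size read */
	return 0;
}

/* Toy back end that returns the full block. */
int full_read(void *buf, size_t len, size_t *bytes_returned)
{
	memset(buf, 0xab, len);
	*bytes_returned = len;
	return 0;
}

/* Toy back end that reports success but returns only half the data. */
int short_read(void *buf, size_t len, size_t *bytes_returned)
{
	(void)buf;
	*bytes_returned = len / 2;
	return 0;
}
```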
Fixes: 913d14e866 ("net/mlx5: Add wrappers for HyperV PCIe operations")
Reported-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a VSI is not using a unicast filter or did not configure that
particular unicast filter, the driver should not allow a rogue VSI to
remove it.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
A VSI, especially a VF, could request to add or remove a filter for another
VSI; the driver should guard against such a request and disallow it.
However, instead of returning an error for such a malicious request, the
driver can simply return success.
In addition, we are not correctly tracking the number of MAC filters
configured per VF, which leads to issues updating VF MAC filters whenever
they are removed and re-configured by bringing the VF interface down and
up. Also, since a VF could send a request to update multiple MAC filters at
once, the driver should program those filters individually in the switch in
order to determine which action resulted in an error, and communicate
accordingly to the VF.
With these changes, we now track the number of filters added right from
when VF resource allocation is done, and can properly add filters for
both trusted and non-trusted VFs without MAC filter mismatch issues in
the switch.
Also refactor the code so that the driver can use a new function to add or
remove MAC filters.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Users expect ethtool statistics to be updated on-demand when invoking
'ethtool -S <iface>' instead of providing a snapshot of statistics taken
once a second (the frequency of the watchdog task where stats are currently
updated). Update stats every time 'ethtool -S <iface>' is run.
Also, fix an indentation style issue and an unnecessary local variable
initialization in ice_get_ethtool_stats() discovered while investigating
the subject issue.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Move the assignment to local variables after validation.
Remove unnecessary checks in ice_vc_process_vf_msg() as the respective
functions are now performing the checks.
Signed-off-by: "Amruth G.P" <amruth.gouda.parameshwarappa@intel.com>
Signed-off-by: Nitesh B Venkatesh <nitesh.b.venkatesh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The driver should never clear the auto_fec_enable bit.
Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When checking the PHY for status, by specification, the driver
should be using "topology" mode when querying the module type.
Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In some circumstances, VF devices can be deactivated while a message is
in flight. In that case, a series of scary error messages will be
printed in the log. Since these are actually harmless, check for this
case and suppress them. No harm, no foul.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The current flag name of "enable-fw-lldp" is a bit cumbersome.
Change priv-flag name to "fw-lldp-agent" with a value of on or
off. This is more straight-forward in meaning.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The virtchnl interface provides a mechanism for a VF driver to request
head writeback support. This feature is deprecated as of AVF 1.0, but
older versions of a VF driver may still attempt to request the mode.
Since the ice hardware does not support head writeback, we should not
accept Tx queue configuration which attempts to enable it.
Currently, the driver simply assumes that the headwb_enabled bit will
never be set.
If a VF driver does request head writeback, the configuration will
return successfully, even though head writeback is not enabled. This
leaves the VF driver in a non-functional state since it assumes it is
operating in head writeback mode.
Fix the PF driver to reject any attempt to setup headwb_enabled.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When rebuilding DCB, desired_dcbx_cfg was copied to local_dcbx_cfg, but
if the DCBX mode is IEEE, desired_dcbx_cfg is not initialized from the DCBX
config in FW. Change the logic to copy the config value only if the mode is
set to CEE.
If the driver copies desired_dcbx_cfg to local_dcbx_cfg in IEEE mode, there
is a problem with globr: the system freezes after two or more globr.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When a port is not cabled but DCBx is enabled in the
firmware, the status of DCBx will be NOT_STARTED. This
is a valid state when DCBx is enabled in FW and should not
automatically be treated as is_fw_lldp being true.
Add the code to treat NOT_STARTED as another valid state.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently we call synchronize_irq() from the host for VFs. This is
not correct, so don't allow it.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently, only the DCBx status is taken into account to
determine if FW LLDP is possible. But there are NVM versions
coming out with DCBx enabled and FW LLDP disabled. This
causes errors where the driver sees that DCBx is not
disabled, then tries to register for LLDP MIB change
events, and fails.
Change the logic to detect both DCBx and LLDP states in the
FW engine.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
For control packets (i.e. LLDP packets) to be able to egress
from the main VSI, a bit has to be set in the TX_descriptor.
This should only be done for the main VSI and only if the
FW LLDP agent is disabled. A bit to allow this also has to
be set in the VSI context.
Add the logic to add the necessary bits in the VSI context
for the PF_VSI and the TX_descriptors for control packets
egressing the PF_VSI.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The dev_kfree_skb() function also performs input parameter validation.
Thus the test around the shown calls is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
IEEE 802.3ae clause 45 defines a modified MDIO protocol that uses a two
staged access model in order to increase the address space.
This patch adds support for C45 MDIO read and write accesses, which are
used whenever the MII_ADDR_C45 flag in the regnum argument is set.
In case it is not set, C22 accesses are used as before.
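How a regnum argument might be split into its C45 parts is sketched below.
The three macros mirror, to the best of my knowledge, the definitions in
the kernel's linux/mdio.h of that era; struct mdio_target and mdio_decode
are hypothetical, and the hardware-specific address and data cycles are
left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define MII_ADDR_C45		(1u << 30)
#define MII_DEVADDR_C45_SHIFT	16
#define MII_REGADDR_C45_MASK	0xffffu

/* Decoded access target (hypothetical helper type). */
struct mdio_target {
	bool c45;		/* true: clause 45, false: clause 22 */
	unsigned int dev_addr;	/* C45 device address (5 bits) */
	unsigned int reg_addr;	/* 16-bit C45 or 5-bit C22 register */
};

struct mdio_target mdio_decode(uint32_t regnum)
{
	struct mdio_target t;

	t.c45 = (regnum & MII_ADDR_C45) != 0;
	if (t.c45) {
		t.dev_addr = (regnum >> MII_DEVADDR_C45_SHIFT) & 0x1f;
		t.reg_addr = regnum & MII_REGADDR_C45_MASK;
	} else {
		t.dev_addr = 0;
		t.reg_addr = regnum & 0x1f;
	}
	return t;
}
```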
Signed-off-by: Marco Hartmann <marco.hartmann@nxp.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If qed_mcp_send_drv_version() fails, no cleanup is executed, leading to
memory leaks. To fix this issue, introduce the label 'err4' to perform the
cleanup work before returning the error.
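The unwinding pattern can be sketched in isolation. The names below are
hypothetical, and send_rc merely models the result of the last init step
(the role qed_mcp_send_drv_version() plays above); this is not the qed
code.

```c
#include <errno.h>
#include <stdlib.h>

struct dev_ctx {
	void *stream;
	void *fw_buf;
};

/* Each failure jumps to the label that releases everything acquired so
 * far, so a late failure no longer returns without cleaning up.
 */
int dev_start(struct dev_ctx *c, int send_rc)
{
	int rc;

	c->stream = malloc(64);
	if (!c->stream)
		return -ENOMEM;

	c->fw_buf = malloc(64);
	if (!c->fw_buf) {
		rc = -ENOMEM;
		goto err_stream;
	}

	rc = send_rc;		/* models the final step failing or not */
	if (rc)
		goto err_fw_buf;

	return 0;

err_fw_buf:
	free(c->fw_buf);
	c->fw_buf = NULL;
err_stream:
	free(c->stream);
	c->stream = NULL;
	return rc;
}
```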
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Acked-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The trap action should be copying the frame to CPU and
dropping it for forwarding, but current setting was just
copying frame to CPU.
Fixes: b596229448 ("net: mscc: ocelot: Add support for tcam")
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Acked-by: Allan W. Nielsen <allan.nielsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dump WQE shall not include Ethernet segment. Define mlx5e_dump_wqe to be
used for "Dump WQEs" instead of sharing it with the general mlx5e_tx_wqe
layout.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
For TLS WQEs, metadata info did not include num_bytes. Due to this issue,
tx_tls_dump_bytes counter did not increment.
Modify tx_fill_wi() to fill num bytes. When it is called for non-traffic
WQE, zero is expected.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When a fw fatal error occurs, poll health() first detects and reports a
fw error. Afterwards, it detects and reports the fw fatal error
itself.
That can cause a long delay in fw fatal error handling, which waits in a
queue for the fw error handling to finish. The fw error handler will
try to issue the fw core dump command, but fw in a fatal state may not
respond, so the driver waits for the command to time out.
Change the flow to detect and handle fw fatal errors first, and only look
for a fw error to handle if no fatal error was detected.
Fixes: d1bf0e2cc4 ("net/mlx5: Report devlink health on FW issues")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Crdump repeats itself every chunk of 256 bytes.
This is due to a bug where the offset was not advanced while copying the
data from the buffer to devlink_fmsg.
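A chunked copy with the offset advanced on both sides is sketched below,
with a plain output buffer standing in for devlink_fmsg. The names and the
output model are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define DUMP_CHUNK 256

/* Copy len bytes from src to out in chunks of at most DUMP_CHUNK bytes.
 * Returns the number of bytes copied.
 */
size_t dump_chunks(const unsigned char *src, size_t len, unsigned char *out)
{
	size_t offset = 0;

	while (offset < len) {
		size_t n = len - offset;

		if (n > DUMP_CHUNK)
			n = DUMP_CHUNK;
		/* Reading from src + offset is the point: always reading
		 * from src would repeat the first chunk over and over.
		 */
		memcpy(out + offset, src + offset, n);
		offset += n;
	}
	return offset;
}
```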
Fixes: 9b1f298236 ("net/mlx5: Add support for FW fatal reporter dump")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Fix a bug where the driver broke out of the loop and
reported an error without retrying first.
Signed-off-by: Marcin Formela <marcin.formela@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds a function to read NVM module data and uses it to
read current LLDP agent configuration from NVM API version 1.8.
Signed-off-by: Sylwia Wnuczko <sylwia.wnuczko@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The driver waits after issuing a reset and gives up when the reset takes
too long. Implement this by invoking the PF reset in a loop; after a defined
number of unsuccessful PF reset attempts, return an error.
Without this patch, PF reset fails when the NIC is in recovery mode.
Also make i40e_set_mac_type() public. It is required for recovery mode
handling; without it, recovery mode could not be detected in i40e_probe().
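A bounded retry loop of the kind described can be sketched as below. The
names are hypothetical, the wait between attempts is omitted, and
flaky_reset is a toy used only to exercise the loop.

```c
#include <errno.h>

typedef int (*reset_fn)(void *ctx);

/* Invoke do_reset up to max_tries times. Returns 0 on the first success,
 * otherwise the error from the last attempt.
 */
int pf_reset_retry(reset_fn do_reset, void *ctx, int max_tries)
{
	int rc = -EINVAL;	/* returned if max_tries <= 0 */

	for (int i = 0; i < max_tries; i++) {
		rc = do_reset(ctx);
		if (!rc)
			return 0;
	}
	return rc;
}

/* Toy reset: fails with -EBUSY until the counter in ctx reaches zero. */
int flaky_reset(void *ctx)
{
	int *fails_left = ctx;

	if (*fails_left > 0) {
		(*fails_left)--;
		return -EBUSY;
	}
	return 0;
}
```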
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch removes function i40e_update_dcb_config(). Instead of
i40e_update_dcb_config() we use i40e_init_dcb(), which implements the
correct NVM read.
Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Update the VSI pointer used in the i40e_set_vf_mac function.
If the VF is in reset state, the driver waits in i40e_set_vf_mac
for the reset to complete, yet after the reset the VSI pointer
that was passed into this function is no longer valid.
The patch updates the local VSI pointer directly from the pf->vsi array,
using the index stored in the VF structure (lan_vsi_idx).
Without this commit the driver might occasionally trigger a general
protection fault in the kernel and bring down the OS entirely.
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The stats structure for the VEB switch statistics is reset periodically,
but the tc_stats are not reset at the same time.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Upcoming FW increments the API version to 1.9 due to Extend PHY access AQ
command support. SW is ready for that support as well.
Signed-off-by: Piotr Azarewicz <piotr.azarewicz@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The check_recovery_mode function had a wrong if statement.
Now we check the proper FWS1B register values, which indicate
recovery mode. Recovery mode has 4 values for x710 and 2 for x722,
which is why 6 different flags are defined in the code.
The if statement now checks both the MAC type and the register value.
Without these changes the driver could show the wrong state.
Signed-off-by: Adrian Podlawski <adrian.podlawski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds "drop mode" parameter to set mac config AQ command.
This bit controls the behavior when a no-drop packet is blocking a TC
queue.
0 – The PF driver is notified.
1 – The blocking packet is dropped and then the PF driver is notified.
Signed-off-by: Sylwia Wnuczko <sylwia.wnuczko@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes following error reported by cppcheck:
(error) Shifting signed 32-bit value by 31 bits is undefined behaviour
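The usual remedy is to shift an unsigned value instead, so that bit 31 is
an ordinary value bit. A minimal sketch; BIT_U32 and top_bit_mask are
hypothetical names and not the macro changed by this patch.

```c
#include <stdint.h>

/* Shift an unsigned 32-bit 1 so that bit 31 is a value bit, not a sign
 * bit. Valid for n in the range 0..31.
 */
#define BIT_U32(n) ((uint32_t)1 << (n))

uint32_t top_bit_mask(void)
{
	return BIT_U32(31);
}
```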
Signed-off-by: Beilei Xing <beilei.xing@intel.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When i40e_configure_tx_ring(vsi->tx_rings[i]) returns an error, we should
exit from i40e_vsi_configure_tx and return the error, instead of continuing
to check whether XDP is enabled and configure the XDP transmit ring.
Signed-off-by: huhai <huhai@kylinos.cn>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Similar to the ixgbe issue fixed in:
655c914145 ("ixgbe: Check DDM existence in transceiver before access")
i40e has the same issue when reading the EEPROM of SFP modules that comply
with SFF-8472 but do not implement the Digital Diagnostic Monitoring (DDM)
interface described in it. The existence of this area is indicated by bit
6 of byte 92, which is set to 1 if implemented.
Without this patch, because this bit is not checked, i40e fails to read the
SFP module's EEPROM with the following message:
  ethtool -m enP51p1s0f0
  Cannot get Module EEPROM data: Input/output error
This happens because it fails to read the additional 256 bytes in which
the DDM data was assumed to exist.
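The check on bit 6 of byte 92 can be sketched as below. The macro and
function names are hypothetical; the 256 and 512 byte lengths follow from
the text (a 256-byte base area plus 256 additional bytes of DDM data).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SFP_DIAG_TYPE_OFFSET	92		/* byte 92 of the base area */
#define SFP_DDM_IMPLEMENTED	(1u << 6)	/* bit 6: DDM is present */
#define SFP_EEPROM_LEN_BASE	256
#define SFP_EEPROM_LEN_DDM	512

/* True if the module advertises the DDM area. */
bool sfp_has_ddm(const uint8_t *base_area)
{
	return (base_area[SFP_DIAG_TYPE_OFFSET] & SFP_DDM_IMPLEMENTED) != 0;
}

/* Report only as many EEPROM bytes as the module actually provides. */
size_t sfp_eeprom_len(const uint8_t *base_area)
{
	return sfp_has_ddm(base_area) ? SFP_EEPROM_LEN_DDM
				      : SFP_EEPROM_LEN_BASE;
}
```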
Signed-off-by: "Mauro S. M. Rodrigues" <maurosr@linux.vnet.ibm.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The functions i40e_aq_get_phy_abilities_resp() and i40e_set_fc() both
have a giant structure on the stack, which makes each one use stack frames
larger than 500 bytes.
As clang decides to inline one function into the other, we get a warning for
exceeding the frame size limit on 32-bit architectures:
drivers/net/ethernet/intel/i40e/i40e_common.c:1654:23: error: stack frame size of 1116 bytes in function 'i40e_set_fc' [-Werror,-Wframe-larger-than=]
When building with gcc, the inlining does not happen, but i40e_set_fc()
calls i40e_aq_get_phy_abilities_resp() anyway, so they add up on the
kernel stack just as much.
The parts that actually use large stacks don't overlap, so make sure
each one is a separate function, and mark them as noinline_for_stack to
prevent the compilers from combining them again.
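A userspace sketch of the idea, assuming a GCC or Clang toolchain. Here noinline_for_stack is redefined locally as the compiler's noinline attribute; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Local stand-in for the kernel annotation (GCC/Clang attribute). */
#define noinline_for_stack __attribute__((noinline))

/* Two large, unrelated scratch structures (sizes chosen for illustration). */
struct big_a { unsigned char data[512]; };
struct big_b { unsigned char data[512]; };

/* Each helper keeps its large local in its own stack frame. */
noinline_for_stack int step_a(int seed)
{
	struct big_a a;

	memset(&a, seed, sizeof(a));
	return a.data[0];
}

noinline_for_stack int step_b(int seed)
{
	struct big_b b;

	memset(&b, seed, sizeof(b));
	return b.data[0];
}

/*
 * The caller holds no large locals itself. Because the helpers are not
 * inlined and run one after the other, their frames do not add up.
 */
int do_both(int seed)
{
	int first = step_a(seed);

	return first + step_b(seed + 1);
}
```

The design choice is to keep each large structure in a function whose frame is released before the next large structure is needed.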
Fixes: 0a862b43ac ("i40e/i40evf: Add module_types and update_link_info")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The HV VHCA stats agent is responsible for running a periodic rx/tx
packets/bytes stats update. Currently the supported format is version
MLX5_HV_VHCA_STATS_VERSION. Block ID 1 is dedicated to statistics data
transfer from the VF to the PF.
The reporter fetches the statistics data from all opened channels, fills
a buffer with it and sends it to mlx5_hv_vhca_write_agent.
As the stats layer should include some metadata per block (sequence and
offset), the HV VHCA layer shall modify the buffer before actually sending it
over block 1.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The control agent is responsible for the control block (ID 0). It should
update the PF via this block about every capability change. In addition,
upon block 0 invalidate, it should activate all other supported agents
with data requests from the PF.
Upon agent create/destroy, the invalidate callback of the control agent
is called in order to update the PF driver about this change.
The control agent is an integral part of HV VHCA and will be created
and destroyed as part of the HV VHCA init/cleanup flow.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
HV VHCA is a layer which provides a PF to VF communication channel based on
the HyperV PCI config channel. It implements Mellanox's Inter VHCA control
communication protocol. The protocol contains a control block in order to
pass messages between the PF and VF drivers, and data blocks in order to
pass actual data.
The infrastructure is agent based. Each agent will be responsible for
contiguous buffer blocks in the VHCA config space. This infrastructure will
bind agents to their blocks, and those agents can only read/write
the buffer blocks assigned to them. Each agent will provide three
callbacks (control, invalidate, cleanup). Control will be invoked when
block-0 is invalidated with a command that concerns this agent. Invalidate
will be invoked if one of the blocks assigned to this agent was
invalidated. Cleanup will be invoked before the agent is freed in
order to clean all of its open resources or deferred works.
Block-0 serves as the control block. All execution commands from the PF
will be written by the PF over this block. The VF will ack those by
writing on block-0 as well. Its format is described by the struct
mlx5_hv_vhca_control_block layout.
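The agent model can be pictured with a small userspace sketch. All type and function names below are hypothetical and do not reflect the actual mlx5 structures; the sketch only shows agents bound to contiguous block ranges and an invalidation being routed to the owning agent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical agent: owns a contiguous block range, provides 3 callbacks. */
struct agent {
	unsigned int first_block;
	unsigned int num_blocks;
	void (*control)(struct agent *self, unsigned int command);
	void (*invalidate)(struct agent *self, unsigned int block);
	void (*cleanup)(struct agent *self);
	unsigned int invalidations;	/* demo counter */
};

/* Trivial callbacks used for demonstration. */
void demo_control(struct agent *self, unsigned int command)
{
	(void)self;
	(void)command;
}

void demo_invalidate(struct agent *self, unsigned int block)
{
	(void)block;
	self->invalidations++;
}

void demo_cleanup(struct agent *self)
{
	self->invalidations = 0;
}

/* Return the agent that owns 'block', or NULL if no agent is bound to it. */
struct agent *find_owner(struct agent *agents, size_t n, unsigned int block)
{
	for (size_t i = 0; i < n; i++) {
		struct agent *a = &agents[i];

		if (block >= a->first_block &&
		    block - a->first_block < a->num_blocks)
			return a;
	}
	return NULL;
}

/* Route a block invalidation to its owner; 0 if handled, -1 otherwise. */
int dispatch_invalidate(struct agent *agents, size_t n, unsigned int block)
{
	struct agent *a = find_owner(agents, n, block);

	if (!a || !a->invalidate)
		return -1;
	a->invalidate(a, block);
	return 0;
}
```

In this model the agent bound to block 0 plays the role of the control agent, and an invalidation of an unbound block is simply rejected.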
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add wrapper functions for HyperV PCIe read / write /
block_invalidate_register operations. This will be used as an
infrastructure in the downstream patch for software communication.
This will be enabled by default if CONFIG_PCI_HYPERV_INTERFACE is set.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a copy and paste error so we have "rx" where "tx" was intended
in the priv->tx[] array.
Fixes: f5cedc84a3 ("gve: Add transmit and receive support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To remove dependency on rtnl lock and prevent neigh update code from
accessing uninitialized flows when executing concurrently with tc, extend
mlx5e_tc_flow with 'init_done' completion. Modify helper
mlx5e_take_all_encap_flows() to wait for flow completion after obtaining
reference to it. Modify mlx5e_tc_encap_flows_del() and
mlx5e_tc_encap_flows_add() to skip flows that don't have OFFLOADED flag
set, which can happen if concurrent flow initialization failed.
This commit finishes neigh update refactoring for concurrent execution
started in previous change in this series.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In order to remove dependency on rtnl lock and allow neigh update workqueue
task to execute concurrently with tc, refactor mlx5e_rep_neigh_update() for
concurrent execution:
- Lock encap table when accessing encap entry to prevent concurrent
changes. To do this properly, the initial encap state check is moved from
mlx5e_rep_neigh_update() into mlx5e_rep_update_flows() to be performed
under encap_tbl_lock protection.
- Wait for encap to be fully initialized before accessing it by means of
'res_ready' completion.
- Add mlx5e_take_all_encap_flows() helper which is used to construct a
temporary list of flows and efi indexes that is used to access current
encap data in flow which can be attached to multiple encaps
simultaneously. Release the flows from temporary list after
encap_tbl_lock critical section. This is necessary because
mlx5e_flow_put() can't be called while holding encap_tbl_lock.
- Modify mlx5e_tc_encap_flows_add() and mlx5e_tc_encap_flows_del() to work
with user-provided list of flows built by mlx5e_take_all_encap_flows(),
instead of traversing encap flow list directly.
This is the first step in a complex neigh update refactoring, which is
finished by the following commit in this series.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In order to remove dependency on rtnl lock and allow neigh used value
update workqueue task to execute concurrently with tc, refactor
mlx5e_tc_update_neigh_used_value() for concurrent execution:
- Lock encap table when accessing encap entry to prevent concurrent
changes.
- Save offloaded encap flows to temporary list and release them after encap
entry is updated. Add mlx5e_put_encap_flow_list() helper which is
intended to be shared with neigh update code in following patch in this
series. This is necessary because mlx5e_flow_put() can't be called while
holding encap_tbl_lock.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Rcu-ify mlx5e_neigh_hash_entry->encap_list by changing operations on encap
list to their rcu counterparts and extending encap structure with rcu_head
to free the encap instances after rcu grace period. Use rcu read lock when
traversing encap list. Implement helper mlx5e_get_next_valid_encap()
function that is used by mlx5e_tc_update_neigh_used_value() to safely
iterate over valid entries of nhe->encap_list.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
To remove dependency on rtnl lock, always take neigh update encap lock when
modifying neigh update hash table and list. Originally, this lock was only
used to synchronize with netevent handler function, which is called from bh
context and cannot use rtnl lock for synchronization. Take lock in encap
entry attach function to prevent concurrent modifications of neigh update
hash table and list.
Taking the encap lock when creating a new nhe introduces a problem: we
need to allocate the new entry with the sleeping GFP_KERNEL flag while holding
a spinlock. However, since the previous patch in this series has already
converted the lookup in the netevent handler function to use the rcu read lock
instead of the encap lock, we can safely convert the lock type to mutex.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
To remove dependency on rtnl lock and to allow unlocked iteration over list
of neigh hash entries, extend nhe with rcu. Change operations on neigh list
to their rcu counterparts and free neigh hash entry with rcu timeout.
Introduce mlx5e_get_next_nhe() helper that is used to iterate over rcu
neigh list with reference to nhe taken.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Neigh entry has reference counter, however it is only used when scheduling
neigh update event. In all other cases reference to neigh entry is not
taken while working with it. Neigh code relies on synchronization provided
by rtnl lock and uses encap list size as implicit reference counter.
To remove dependency on rtnl lock, always take a reference to the neigh entry
while using it. Remove the neigh entry from the hash table and delete it only
when the reference counter reaches zero. This can result in spurious neigh
update events, when there is an event on an entry that has zero encaps attached.
However, such events are rare and properly handled by the neigh update handler.
Extend the encap entry with a reference to the neigh hash entry in order to be
able to release it directly when the encap is detached, instead of looking up
the nhe by key through the hash table. Extend nhe with a reference to the device
priv structure to guarantee correctness when nhe is used with stacked devices
(bond setup), in which case it is non-trivial to determine the correct device
when releasing the nhe.
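The reference-counting rule above can be sketched in userspace with C11 atomics. This is a simplified model under assumed names (struct neigh_entry, nhe_hold, nhe_put are hypothetical); the in_table flag stands in for hash table membership.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical entry: a counter plus a flag modelling table membership. */
struct neigh_entry {
	atomic_int refcnt;
	bool in_table;
};

/* A new entry starts with one reference, held by its creator. */
void nhe_init(struct neigh_entry *e)
{
	atomic_init(&e->refcnt, 1);
	e->in_table = true;
}

/* Take a reference unless the counter has already dropped to zero. */
bool nhe_hold(struct neigh_entry *e)
{
	int old = atomic_load(&e->refcnt);

	while (old != 0) {
		/* On failure 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&e->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Drop a reference. Only the caller that drops the last reference removes
 * the entry from the table. Returns true if the entry was removed.
 */
bool nhe_put(struct neigh_entry *e)
{
	if (atomic_fetch_sub(&e->refcnt, 1) == 1) {
		e->in_table = false;
		return true;
	}
	return false;
}
```

An entry with no users attached can still be referenced briefly by an in-flight event, which corresponds to the spurious updates mentioned above.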
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
As a preparation for following refactoring that removes rtnl lock
dependency from neigh hash entry handlers, extract code that enqueues neigh
update work into standalone function. This commit doesn't change
functionality.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In certain cases when the probe function fails, the error path calls
cpsw_remove_dt() before platform_set_drvdata() has been called. This is an
issue as cpsw_remove_dt() uses platform_get_drvdata() to retrieve the
cpsw_common data, which leads to a NULL pointer exception. This patch
fixes it by calling platform_set_drvdata() earlier in the probe.
Fixes: 83a8471ba2 ("net: ethernet: ti: cpsw: refactor probe to group common hw initialization")
Reported-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Register supported packet traps (layer 2 drops only, currently) and
associated trap group with devlink during driver initialization.
The amount of traffic generated by these packet drop traps is capped at
10Kpps to ensure the CPU is not overwhelmed by incoming packets.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Discard trap groups are defined in a different enum so that they could
all share the same policer ID: MLXSW_REG_HTGT_TRAP_GROUP_MAX + 1.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the trap IDs used to report layer 2 drops.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subsequent patches will add discard traps support in mlxsw. The driver
cannot configure such traps with a normal trap action, but needs to use
exception trap action, which also increments an error counter.
On the other hand, when these traps are initialized or set to drop
action, they should use the default drop action set by the firmware.
This guarantees that when the feature is disabled we get the exact same
behavior as before the feature was introduced.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up until now the action of a trap was never changed during its lifetime.
This is going to change by subsequent patches that will allow devlink to
control the action of certain traps.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Gunthorpe says:
====================
This is a collection of general cleanups for ODP to clarify some of the
flows around umem creation and use of the interval tree.
====================
The branch is based on v5.3-rc5 due to dependencies
* odp_fixes:
RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr
RDMA/mlx5: Use ib_umem_start instead of umem.address
RDMA/core: Make invalidate_range a device operation
RDMA/odp: Use kvcalloc for the dma_list and page_list
RDMA/odp: Check for overflow when computing the umem_odp end
RDMA/odp: Provide ib_umem_odp_release() to undo the allocs
RDMA/odp: Split creating a umem_odp from ib_umem_get
RDMA/odp: Make the three ways to create a umem_odp clear
RMDA/odp: Consolidate umem_odp initialization
RDMA/odp: Make it clearer when a umem is an implicit ODP umem
RDMA/odp: Iterate over the whole rbtree directly
RDMA/odp: Use the common interval tree library instead of generic
RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Use different namespaces for bypass and switchdev loopback because they
have different priorities and default table miss action requirement:
1. bypass: with multiple priorities support, and
MLX5_FLOW_TABLE_MISS_ACTION_DEF as the default table miss action;
2. switchdev loopback: with single priority support, and
MLX5_FLOW_TABLE_MISS_ACTION_SWITCH_DOMAIN as the default table miss
action.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Currently all the namespaces under the same steering domain share the same
default table miss action, however in some situations (e.g., RDMA RX)
different actions are required. This patch adds a per-namespace default
table miss action instead of using the miss action of the steering domain.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Now that the single target build descends into sub-directories in the
same way as the normal build, these dummy Makefiles are not needed
any more.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Merge tag 'mlx5-updates-2019-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2019-08-15
This patchset introduces changes in mlx5 devlink health reporters.
The highlight of these changes is adding a new reporter: RX reporter
mlx5 RX reporter: reports and recovers from timeouts and RX completion
error.
1) Perform TX reporter cleanup. In order to maintain the
code flow as similar as possible between RX and TX reporters, start the
set with cleanup.
2) Prepare for code sharing, generalize and move shared
functionality.
3) Refactor and extend TX reporter diagnostics information
to align the TX reporter diagnostics output with the RX reporter's
diagnostics output.
4) Add helper functions, then add the RX reporter (patch 11), which initially
supports only the diagnostics callback.
5) Change ICOSQ (Internal Operations Send Queue) open/close flow to
avoid race between interface down and completion error recovery.
6) Introduce recovery flows for RX ring population timeout on ICOSQ,
and for completion errors on ICOSQ and on RQ (Regular receive queues).
7) Include RX reporters in mlx5 documentation.
8) The last two patches of this series are trivial fixes for previously
submitted patches in this release cycle.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When we fail to add/delete MAC filters in the VF, the print doesn't
distinguish between the two. Fix that by printing whether we failed to
add or to delete the MAC filter.
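As an illustration, the message can name the failed operation explicitly. The function name and the message wording below are hypothetical and are not the driver's actual log text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format an error message that states which operation failed.
 * Returns the value of snprintf(), i.e. the length the full message needs.
 */
int format_mac_filter_error(char *buf, size_t len, bool add, int err)
{
	assert(buf != NULL && len > 0);

	return snprintf(buf, len, "Failed to %s MAC filter, error %d",
			add ? "add" : "delete", err);
}
```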
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
These queue variables are being assigned values that are type u16.
Change the local variables to match these types. Since these
represent queue counts, they should never be negative.
Signed-off-by: Pawel Kaminski <pawel.kaminski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In order to use some of the VF resource definitions in the SR-IOV specific
virtchnl header file, this patch moves the applicable code to the
ice_virtchnl_pf.h file, where it should have been defined
originally.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently we use the ICE_MBXQ_LEN for both the Mailbox send and receive
queues that are used to communicate with VFs. This is fine for the send
queue because the PF driver will lock the queue for every single send,
but for the Mailbox receive queue every VF is posting to its Mailbox
send queue and the hardware is then handing the message to the PF on its
Mailbox receive queue. This becomes a problem with many VFs because it
seems to overburden the Mailbox receive queue on the PF. Fix this by
increasing the Mailbox receive queue for the PF to 512 entries.
The number 512 was determined based on the number of VFs supported by
the device. We can have a total of 256 VFs so in the worst case this
allows the VFs to put 2 messages in the PFs Mailbox receive queue at the
same time.
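The sizing argument is simple arithmetic, shown here as a small sketch. The macro and function names are hypothetical; the values 256 and 512 come from the text above.

```c
#include <assert.h>

#define MAX_VFS     256u	/* total VFs supported by the device */
#define MBX_RQ_LEN  512u	/* new PF Mailbox receive queue length */

/* Worst-case number of messages each VF can have queued at the same time. */
unsigned int msgs_per_vf(unsigned int rq_len, unsigned int num_vfs)
{
	assert(num_vfs != 0);

	return rq_len / num_vfs;
}
```

With these values each of the 256 VFs can have 2 messages in the queue at once.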
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently there are a couple of places where the VF is waiting too long when
checking the status of registers. This is causing the AVF driver to
spin for longer than necessary in the __IAVF_STARTUP state. Sometimes
it causes the AVF to go into the __IAVF_COMM_FAILED state, which may retrigger
the __IAVF_STARTUP state. Try to reduce the chance of this happening by
removing unnecessary wait times in VF bringup/resets.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Register access for GLINT_DYN_CTL and GLINT_VECT2FUNC should be within
the PF space and not the absolute device space.
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
During rebuild ice_ena_vsi() is called to recover the VSI state.
This function assumes the PF VSI is always to be enabled, however,
it's possible that during reset/rebuild the interface can be
brought down. If this occurs, we can attempt to bring up the PF
VSI on a downed interface which can lead to various crashes. If
the interface is not running, do not bring up the associated VSI.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In some circumstances, the hardware will hand us a receive descriptor
which has no data attached, but is otherwise valid. The receive code was
improperly ignoring these descriptors, which resulted in an infinite loop.
To fix this, change the receive code to process all descriptors,
regardless of the size of the associated data. Add checks to the
memory-handling functions to allow for zero size.
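A simplified model of a receive loop that consumes every completed descriptor, including those with zero size. The ring layout and names are hypothetical; the real driver's descriptor format and buffer handling are not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor: a completion flag and a payload size. */
struct rx_desc {
	bool done;
	unsigned int size;
};

/*
 * Consume up to 'budget' completed descriptors starting at *next.
 * A descriptor with size 0 contributes no bytes but is still consumed,
 * so the ring index always advances and the loop cannot get stuck on it.
 * Returns the number of descriptors consumed.
 */
unsigned int clean_rx(struct rx_desc *ring, size_t ring_len, size_t *next,
		      unsigned int budget, unsigned int *bytes)
{
	unsigned int cleaned = 0;

	while (cleaned < budget) {
		struct rx_desc *d = &ring[*next];

		if (!d->done)
			break;

		*bytes += d->size;	/* adds nothing when size is 0 */
		d->done = false;	/* hand the slot back */
		*next = (*next + 1) % ring_len;
		cleaned++;
	}
	return cleaned;
}
```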
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes the set local MIB AQ call failures in the DCB rebuild path
by setting the defaults for the ETS recommended DCB configuration. Also, the
willing bits for the DCB configuration need to be set correctly. Resets
work fine in IEEE mode, as the ETS recommended DCB configuration is
populated, but not in CEE mode.
Without this patch, a PFR causes the kernel to hang in CEE mode.
Signed-off-by: Usha Ketineni <usha.k.ketineni@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently when busy polling is enabled we aren't setting/enabling
WB_ON_ITR in the driver. This doesn't break the driver, but it does
cause issues. If we don't enable WB_ON_ITR mode we will still get
write-backs from hardware during polling when a cache line has been
filled, but if a cache line is not filled we will not get the
write-back because WB_ON_ITR is not set. Fix this by enabling
WB_ON_ITR in the driver when interrupts are disabled.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When ETHTOOL_GLINKSETTINGS is defined, get pause param pause->autoneg
reports the SW configured setting; when it is not defined, get pause param
pause->autoneg reports the link status. Set pause param needs to compare
pause->autoneg with the same source as get pause param to block the user
from changing autoneg with the set pause param option; otherwise the user
may be incorrectly blocked from changing Rx|Tx pause settings.
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fix documentation of mlx5_eq_enable/disable to clean up compiler warnings.
drivers/net/ethernet/mellanox/mlx5/core//eq.c:334:
warning: Function parameter or member 'dev' not described in 'mlx5_eq_enable'
warning: Function parameter or member 'eq' not described in 'mlx5_eq_enable'
warning: Function parameter or member 'nb' not described in 'mlx5_eq_enable'
drivers/net/ethernet/mellanox/mlx5/core//eq.c:355:
warning: Function parameter or member 'dev' not described in 'mlx5_eq_disable'
warning: Function parameter or member 'eq' not described in 'mlx5_eq_disable'
warning: Function parameter or member 'nb' not described in 'mlx5_eq_disable'
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Previously, mlx5_cleanup_fc_stats() would clean up the flow counter
pool before releasing all the counters to it, which would result in
flow counter bulks not getting freed. Resolve this by changing the
order in which elements of fc_stats are cleaned up, so that the flow
counter pool is cleaned up after all the counters are released.
Also move cleanup actions for freeing the bulk query memory and
destroying the idr to the end of mlx5_cleanup_fc_stats().
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add support for report and recovery from error on completion on RQ by
setting the queue back to ready state. Handle only errors with a
syndrome indicating the RQ might enter error state and could be
recovered.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
To be aligned with the MPWQE handlers, handle RX WQEs with errors
for legacy RQs in the top RX handlers, just before calling skb_from_cqe().
CQE error handling will now be called at the same stage regardless of
the RQ type or netdev mode (NIC, Representor, IPoIB, etc.).
This will be useful for a downstream patch that improves error CQE
handling.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add support for report and recovery from rx timeout. On driver open we
post a NOP work request on the rx channels to trigger napi in order to
fill up the rx rings. In case napi wasn't scheduled due to a lost
interrupt, perform EQ recovery.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add support for report and recovery from error on completion on ICOSQ.
Deactivate RQ and flush, then deactivate ICOSQ. Set the queue back to
ready state (firmware) and reset the ICOSQ and the RQ (software
resources). Finally, activate the ICOSQ and the RQ.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Align ICOSQ open/close behaviour with RQ and SQ. Split open flow into
open and activate where open handles creation and activate enables the
queue. Do a symmetric thing in close flow: split into close and
deactivate.
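The split can be modelled as a small state machine. The states and function names below are hypothetical and only illustrate the symmetric open/activate and deactivate/close pairing, not the driver's implementation.

```c
#include <assert.h>

/* Hypothetical queue lifecycle states. */
enum q_state { Q_CLOSED, Q_OPENED, Q_ACTIVE };

struct queue {
	enum q_state state;
};

/* Each transition returns 0 on success, -1 if called in the wrong state. */

/* open: create the queue resources. */
int q_open(struct queue *q)
{
	if (q->state != Q_CLOSED)
		return -1;
	q->state = Q_OPENED;
	return 0;
}

/* activate: enable an already created queue. */
int q_activate(struct queue *q)
{
	if (q->state != Q_OPENED)
		return -1;
	q->state = Q_ACTIVE;
	return 0;
}

/* deactivate: stop the queue but keep its resources. */
int q_deactivate(struct queue *q)
{
	if (q->state != Q_ACTIVE)
		return -1;
	q->state = Q_OPENED;
	return 0;
}

/* close: release the resources of a queue that is not active. */
int q_close(struct queue *q)
{
	if (q->state != Q_OPENED)
		return -1;
	q->state = Q_CLOSED;
	return 0;
}
```

Separating the steps lets a recovery path deactivate and reactivate a queue without destroying and recreating it.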
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Introduce helper functions for creating and destroying reporters and for
updating channels. The rx reporter, added in the following patch, will use
these helpers too.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The following patches in the set enhance the diagnostics info of tx
reporter. Therefore, it is better to pass a pointer to the SQ for
further data extraction.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Prepare for code sharing with rx reporter, which is added in the
following patches in the set. Introduce a generic error_ctx for
agnostic recovery dispatch.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
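As a rough illustration of the generic error_ctx idea described above, the sketch below pairs a recovery callback with an opaque pointer so that one dispatch routine can serve both TX and RX queues. All names here (err_ctx, demo_sq, demo_rq, health_recover) are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic error context: a recovery callback plus an
 * opaque pointer to the queue it should act on. */
struct err_ctx {
	int (*recover)(void *ctx);
	void *ctx;
};

/* Illustrative stand-ins for TX and RX queue state. */
struct demo_sq { int recovered; };
struct demo_rq { int recovered; };

static int demo_sq_recover(void *ctx)
{
	struct demo_sq *sq = ctx;

	sq->recovered = 1;
	return 0;
}

static int demo_rq_recover(void *ctx)
{
	struct demo_rq *rq = ctx;

	rq->recovered = 1;
	return 0;
}

/* Shared dispatch: the caller does not need to know the queue type. */
static int health_recover(const struct err_ctx *err)
{
	assert(err != NULL && err->recover != NULL);
	return err->recover(err->ctx);
}
```

A TX reporter would fill the context with demo_sq_recover and a pointer to its SQ, an RX reporter with demo_rq_recover and its RQ, and both would call health_recover().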
Change from mlx5e_tx_reporter_* to mlx5e_reporter_tx_*. The rx reporter is
added in the following patches in the set, and the new naming convention is
more uniform.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Rename reporter.h -> health.h so patches in the set can use it for
health related functionality.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch restructures how VFs are configured and how resources are allocated.
Instead of freeing resources that were never allocated and resetting
empty VFs that have never been created, the new flow just allocates
resources for the number of requested VFs, based on availability.
During the VF initialization process, the global interrupt is disabled and
rearmed after getting MSIX vectors for the VFs. This allows immediate mailbox
communication instead of delaying it until later, which had caused VF-PF
communication to fall back to polling instead of using the actual interrupt.
The issue manifested when creating a higher number of VFs (128 VFs) per PF.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently we divide budget by the number of Rx queues per Rx ring
container in ice_napi_poll even if there is only 1. This is an
unnecessary divide for the normal case of 1 Rx ring per Rx ring
container. Fix this by using an unlikely() call in the case where we
actually need to divide.
Also, we will always set budget_per_ring even if there are no Rx rings
in the Rx ring container so we don't need to initialize it to 0.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
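The budget split described in the ice_napi_poll change above could be sketched roughly as follows. The function name, the floor of 1 for the per-ring share, and the use of __builtin_expect (a GCC/Clang builtin standing in for the kernel's unlikely()) are assumptions made for this example.

```c
#include <assert.h>

/* GCC/Clang branch hint, standing in for the kernel's unlikely(). */
#define unlikely(x) __builtin_expect(!!(x), 0)

static int budget_per_ring(int budget, int num_rx_rings)
{
	assert(budget > 0 && num_rx_rings >= 0);

	/* Rare case: several Rx rings share one container, so split. */
	if (unlikely(num_rx_rings > 1)) {
		int share = budget / num_rx_rings;

		/* Assumed floor so every ring can make progress. */
		return share > 0 ? share : 1;
	}

	/* Common case (0 or 1 ring): no division, and the value is
	 * always assigned, so no zero initializer is needed. */
	return budget;
}
```

With a budget of 64, a single ring keeps all 64 and four rings get 16 each.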
Currently in ice_get_tx_pending we try to read a Tx ring's tail. This is
then compared with the software based head (next_to_clean) to determine
if we have pending work. This will never work because reading of the Tx
ring's tail is no longer supported. Fix this by using the software based
tail (next_to_use) to determine if there is pending work.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
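A minimal sketch of the software-only pending check described in the ice_get_tx_pending change above, assuming a ring that tracks next_to_clean (head), next_to_use (tail) and a descriptor count. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative ring bookkeeping; field names follow the commit text. */
struct demo_tx_ring {
	uint16_t next_to_clean; /* software head */
	uint16_t next_to_use;   /* software tail */
	uint16_t count;         /* number of descriptors in the ring */
};

/* Number of descriptors queued but not yet cleaned, computed from
 * software state only (no hardware tail read). */
static uint32_t tx_pending(const struct demo_tx_ring *ring)
{
	uint16_t head = ring->next_to_clean;
	uint16_t tail = ring->next_to_use;

	assert(head < ring->count && tail < ring->count);

	if (head == tail)
		return 0;

	/* Account for the tail having wrapped past the end of the ring. */
	return head < tail ? (uint32_t)(tail - head)
			   : (uint32_t)(tail + ring->count - head);
}
```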
When processing FLOW_BLOCK_BIND command on indirect block, check that flow
block cb is not busy.
Fixes: 0d4fd02e71 ("net: flow_offload: add flow_block_cb_is_busy() and use it")
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the call to phy_attached_info() to the hns driver
to identify exactly which PHY driver is in use.
Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a Tx timestamp is requested, a pointer to the skb is stored in the
ravb_tstamp_skb struct. This was done without an skb_get. There exists
the possibility that the skb could be freed by ravb_tx_free (when
ravb_tx_free is called from ravb_start_xmit) before the timestamp was
processed, leading to a use-after-free bug.
Use skb_get when filling a ravb_tstamp_skb struct, and add appropriate
frees/consumes when a ravb_tstamp_skb struct is freed.
Fixes: c156633f13 ("Renesas Ethernet AVB driver proper")
Signed-off-by: Tho Vu <tho.vu.wh@rvc.renesas.com>
Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
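The lifetime problem fixed in the ravb change above can be modelled in user space with a plain reference count. This is only a sketch of the idea: demo_buf, buf_get and buf_put are hypothetical stand-ins for an skb, skb_get and the free/consume calls, not kernel APIs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted buffer standing in for an skb. */
struct demo_buf {
	int refcount;
	int *freed; /* set to 1 when the buffer is really released */
};

static struct demo_buf *buf_alloc(int *freed_flag)
{
	struct demo_buf *b = malloc(sizeof(*b));

	assert(b != NULL);
	b->refcount = 1;
	b->freed = freed_flag;
	return b;
}

/* Take an extra reference, as skb_get does for an skb. */
static struct demo_buf *buf_get(struct demo_buf *b)
{
	b->refcount++;
	return b;
}

/* Drop one reference; release the buffer on the last one. */
static void buf_put(struct demo_buf *b)
{
	assert(b->refcount > 0);
	if (--b->refcount == 0) {
		*b->freed = 1;
		free(b);
	}
}

/* Timestamp bookkeeping that holds its own reference. */
struct demo_tstamp { struct demo_buf *buf; };

static void tstamp_request(struct demo_tstamp *ts, struct demo_buf *b)
{
	ts->buf = buf_get(b);
}

static void tstamp_complete(struct demo_tstamp *ts)
{
	buf_put(ts->buf);
	ts->buf = NULL;
}
```

Because the timestamp entry owns a reference, the TX free path can drop its own reference first without the buffer disappearing before the timestamp is processed.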
This patch adds support for the MediaTek MT7628/88 SoCs to the common
MediaTek ethernet driver. This requires some minor changes and one
bigger change, as the MT7628 does not support QDMA (only PDMA).
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: René van Dorst <opensource@vdorst.com>
Cc: Daniel Golle <daniel@makrotopia.org>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename the NEXT_RX_DESP_IDX macro to NEXT_DESP_IDX, so that it can
also be used for TX ops. This will be used in the upcoming
MT7628/88 support (same functionality for RX and TX in this macro).
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: René van Dorst <opensource@vdorst.com>
Cc: Daniel Golle <daniel@makrotopia.org>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
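To illustrate why one index macro can serve both RX and TX rings, here is a sketch of a direction-neutral "next descriptor" helper. The macro body shown is an assumed shape (increment and mask with a power-of-two ring size), and ring_advance is a hypothetical wrapper added only for this example.

```c
#include <assert.h>

/* Assumed shape of the macro: advance index X by one in a ring of Y
 * descriptors, where Y is a power of two. Nothing here is RX-specific. */
#define NEXT_DESP_IDX(X, Y) (((X) + 1) & ((Y) - 1))

/* Hypothetical wrapper that checks the power-of-two precondition. */
static unsigned int ring_advance(unsigned int idx, unsigned int ring_size)
{
	/* The mask trick only wraps correctly for power-of-two sizes. */
	assert(ring_size != 0 && (ring_size & (ring_size - 1)) == 0);
	return NEXT_DESP_IDX(idx, ring_size);
}
```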
Currently all QDMA registers are named "MTK_QDMA_foo" in this driver
with one exception: MTK_QMTK_INT_STATUS. This patch renames
MTK_QMTK_INT_STATUS to MTK_QDMA_INT_STATUS so that all macros follow
this rule.
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: René van Dorst <opensource@vdorst.com>
Cc: Daniel Golle <daniel@makrotopia.org>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adaptive coalescing is managed per adapter, not per event queue, so there
is no need to store the 'enable' flag for each event queue.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tc transparently maps the software priority number to hardware. Update
it to pass the major priority, which is what most drivers expect. Update
the drivers too, so they no longer need to lshift the priority field of the
flow_cls_common_offload object. The stmmac driver is an exception: its
code assumes the tc software priority is fine, so lshift the value there
just to be conservative.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
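The priority change above can be pictured with a small sketch, under the assumption that tc keeps the priority in the upper 16 bits of a 32-bit word and that the "major priority" is that upper half. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the tc software priority lives in the upper 16 bits, so
 * the major priority handed to drivers is the word shifted right. */
static uint32_t tc_major_prio(uint32_t tp_prio)
{
	return tp_prio >> 16;
}

/* A driver that still wants the old encoding (the conservative choice
 * described for stmmac) shifts the major priority back. */
static uint32_t legacy_prio(uint32_t major)
{
	assert(major <= 0xffffu);
	return major << 16;
}
```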
The driver name gets exposed in sysfs under /sys/bus/pci/drivers,
so it should look like other devices. Change it to the common
format (instead of "Cavium PTP").
This is a trivial fix that was observed by accident because
Debian kernels were building this driver into the kernel (bug).
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no need to wait until a completion is received to unmap
TX descriptor buffers that have been passed to the hypervisor.
Instead, unmap them when the hypervisor call has completed. This patch
avoids the possibility that a buffer will not be unmapped because
a TX completion is lost or mishandled.
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Tested-by: Devesh K. Singh <devesh_singh@in.ibm.com>
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FW expects the driver to provide unique flow reference handles
for Tx or Rx flows. When a Tx flow and an Rx flow end up sharing
a reference handle, flow offload does not seem to work.
This can happen when two flows have their L2 fields
wildcarded but are in different directions.
Fix this by incorporating the flow direction into the L2 key.
v2: Move the dir field to the end of the bnxt_tc_l2_key struct to
fix the warning reported by kbuild test robot <lkp@intel.com>.
There is existing code that initializes the structure using a
nested initializer and will warn with the new u8 field added to
the beginning. The structure also packs more nicely when this new u8 is
added to the end of the structure [MChan].
Fixes: abd43a1352 ("bnxt_en: Support for 64-bit flow handle.")
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
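As a sketch of the L2 key fix above, the example below adds a direction byte at the end of a lookup key so that two fully wildcarded flows in opposite directions no longer compare equal. The struct layout and all names are illustrative and do not reproduce bnxt_tc_l2_key.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { DEMO_DIR_RX = 0, DEMO_DIR_TX = 1 };

/* Hypothetical L2 key; the direction field is placed last, as in v2
 * of the patch. */
struct demo_l2_key {
	uint8_t  dmac[6];
	uint8_t  smac[6];
	uint16_t vlan_tci;
	uint8_t  dir;
};

/* Build a wildcarded key (all match fields zero) for one direction. */
static void l2_key_init(struct demo_l2_key *k, uint8_t dir)
{
	assert(k != NULL);
	/* Zero everything, padding included, so memcmp is well defined. */
	memset(k, 0, sizeof(*k));
	k->dir = dir;
}

static int l2_key_equal(const struct demo_l2_key *a,
			const struct demo_l2_key *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

Without the dir field, the RX and TX keys built here would be byte-for-byte identical and would end up sharing one reference handle.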
Direction of the flow is determined using src_fid. For an RX flow,
src_fid is PF's fid and for TX flow, src_fid is VF's fid. Direction
of the flow must be specified, when getting statistics for that flow.
Currently, the direction of a DECAP flow is determined incorrectly: it
is initialized as TX instead of RX. As a result, stats are not reported
for this DECAP flow even though it is offloaded and carries traffic,
and the flow ages out.
This patch fixes the problem by determining the DECAP flow's direction
using the correct fid. Set the flow direction in all cases for consistency,
even if the 64-bit flow handle is not used.
Fixes: abd43a1352 ("bnxt_en: Support for 64-bit flow handle.")
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Older firmware may not support newly added NVM parameters.
Suppress the unnecessary error message that is
triggered when devlink calls the driver during initialization.
Fixes: 782a624d00 ("bnxt_en: Add bnxt_en initial params table and register it.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When FW returns FRAG_ERR in the response error code, the driver currently
resends the command only if the HWRM command itself returns success. Fix
the code to resend the NVM_INSTALL_UPDATE command with the DEFRAG install
flags whenever FW returns FRAG_ERR in its response error code.
Fixes: cb4d1d6261 ("bnxt_en: Retry failed NVM_INSTALL_UPDATE with defragmentation flag enabled.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
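The intended retry behaviour from the NVM_INSTALL_UPDATE fix above might look roughly like this. The error codes, the flag value, the callback type and the fake firmware are all hypothetical, chosen only to make the control flow concrete.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes and install flag for this sketch. */
enum { DEMO_OK = 0, DEMO_FRAG_ERR = 1, DEMO_OTHER_ERR = 2 };
#define DEMO_FLAG_DEFRAG 0x1u

typedef int (*install_fn)(unsigned int flags, void *priv);

/* Send the install command; if the firmware reports fragmentation,
 * resend once with the defragmentation flag set. */
static int nvm_install_with_retry(install_fn send, void *priv)
{
	int rc;

	assert(send != NULL);
	rc = send(0, priv);
	/* The retry is keyed on the error code itself, not on success. */
	if (rc == DEMO_FRAG_ERR)
		rc = send(DEMO_FLAG_DEFRAG, priv);
	return rc;
}

/* Fake firmware: fails with FRAG_ERR unless asked to defragment. */
struct demo_fw { int calls; unsigned int last_flags; };

static int demo_fw_send(unsigned int flags, void *priv)
{
	struct demo_fw *fw = priv;

	fw->calls++;
	fw->last_flags = flags;
	return (flags & DEMO_FLAG_DEFRAG) ? DEMO_OK : DEMO_FRAG_ERR;
}
```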
When both RX buffers and RX aggregation buffers have to be
replenished at the end of NAPI, post the RX aggregation buffers first
before RX buffers. Otherwise, we may run into a situation where
there are only RX buffers without RX aggregation buffers for a split
second. This will cause the hardware to abort the RX packet and
report buffer errors, which will cause unnecessary cleanup by the
driver.
Ringing the Aggregation ring doorbell first before the RX ring doorbell
will prevent some of these buffer errors. Use the same sequence during
ring initialization as well.
Fixes: 697197e5a1 ("bnxt_en: Re-structure doorbells.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During device shutdown, the VNIC clearing sequence needs to be modified
to free the VNIC first before freeing the RSS contexts. The current
code is doing the reverse and we can get mis-directed RX completions
to CP ring ID 0 when the RSS contexts are freed and zeroed. The clearing
of RSS contexts is not required with the new sequence.
Refactor the VNIC clearing logic into a new function bnxt_clear_vnic()
and do the chip specific VNIC clearing sequence.
Fixes: 7b3af4f75b ("bnxt_en: Add RSS support for 57500 chips.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the call to phy_attached_info() to the hns3 driver
to identify exactly which PHY driver and model are in use.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MAC TNL interrupt is used to collect statistics about
sudden link status changes while the netdev is running.
When the netdev is being stopped, however, the enabled MAC TNL interrupt
is unnecessary and may add noise to the statistics.
So this patch disables it before stopping the MAC.
Fixes: a63457878b ("net: hns3: Add handling of MAC tunnel interruption")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes the print level of the RAS error log from warning to
error, because a RAS error and its recovery process could cause application
failure. It also uses %u instead of %d when the parameter is unsigned.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A pointer parameter should be declared const to prevent
the value it points to from being modified unexpectedly.
An uninitialized variable must not be returned directly; the default
return value is 0 when there is no abnormal result.
This patch fixes these two errors, deletes a redundant
function declaration, and aligns one parameter.
Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some temporary variables do not need to be initialized because
they are always set before use, so this patch deletes the
initial values of these temporary variables.
Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huzhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To explain some code, this patch adds some comments, and modifies or
merges some comments to make them neater.
Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Zhongzhu Liu <liuzhongzhu@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 04f05230c5 ("bnx2x: Remove configured vlans as
part of unload sequence.") introduced a regression in the driver:
as part of a VF's reload flow, VLANs created on the VF
do not get re-configured in hardware, because the vlan metadata/info
was not being cleared for the VFs. This causes vlan PING to stop.
This patch clears the vlan metadata/info so that VLANs get
re-configured in the hardware in the VF's reload flow and
PING/traffic continues for VLANs created over the VFs.
Fixes: 04f05230c5 ("bnx2x: Remove configured vlans as part of unload sequence.")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Sudarsana Kalluru <skalluru@marvell.com>
Signed-off-by: Shahed Shaikh <shshaikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit adds support for the new need_wakeup feature of AF_XDP.
Applications can opt in by using the XDP_USE_NEED_WAKEUP bind() flag.
When this feature is enabled, some behavior changes:
RX side: If the Fill Ring is empty, instead of busy-polling, set the
flag to tell the application to kick the driver when it refills the Fill
Ring.
TX side: If there are pending completions or packets queued for
transmission, set the flag to tell the application that it can skip the
sendto() syscall and save time.
The performance testing was performed on a machine with the following
configuration:
- 24 cores of Intel Xeon E5-2620 v3 @ 2.40 GHz
- Mellanox ConnectX-5 Ex with 100 Gbit/s link
The results with retpoline disabled:
       |  without need_wakeup |   with need_wakeup   |
       |----------------------|----------------------|
       | one core | two cores | one core | two cores |
-------|----------|-----------|----------|-----------|
txonly |     20.1 |      33.5 |     29.0 |      34.2 |
rxdrop |    0.065 |      14.1 |     12.0 |      14.1 |
l2fwd  |    0.032 |       7.3 |      6.6 |       7.2 |
"One core" means the application and NAPI run on the same core. "Two
cores" means they are pinned to different cores.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
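The RX half of the need_wakeup handshake described above can be sketched in user space as a single flag bit shared between driver and application. The bit mirrors XDP_RING_NEED_WAKEUP from <linux/if_xdp.h> but is redefined under a demo name here; the ring struct and both helpers are hypothetical, and the TX side is left out of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Demo copy of the need_wakeup bit in a ring's flags word. */
#define DEMO_RING_NEED_WAKEUP (1u << 0)

/* Hypothetical shared ring state visible to both sides. */
struct demo_ring { uint32_t flags; };

/* Driver side, RX: if the Fill Ring ran dry, stop busy-polling and
 * ask the application for a kick once it has refilled the ring. */
static void driver_rx_update(struct demo_ring *fill,
			     unsigned int fill_entries)
{
	assert(fill != NULL);
	if (fill_entries == 0)
		fill->flags |= DEMO_RING_NEED_WAKEUP;
	else
		fill->flags &= ~DEMO_RING_NEED_WAKEUP;
}

/* Application side: only issue the wakeup syscall when asked to. */
static int app_needs_kick(const struct demo_ring *ring)
{
	return (ring->flags & DEMO_RING_NEED_WAKEUP) != 0;
}
```

An application would check app_needs_kick() after refilling the Fill Ring and skip the syscall whenever the flag is clear.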
Two XSK tasks are performed during NAPI polling that are not bound to
hardware interrupts: TXing packets and polling for frames in the Fill
Ring. They are special in that the hardware doesn't know about
these tasks, so it doesn't trigger interrupts if there is still some
work to be done. It is our driver's responsibility to ensure NAPI will be
rescheduled if needed.
Create a new function to handle these tasks and move the corresponding
code from mlx5e_napi_poll to the new function to improve modularity and
prepare for the changes in the following patch.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch adds support for the need_wakeup feature of AF_XDP. If the
application has told the kernel that it might sleep using the new bind
flag XDP_USE_NEED_WAKEUP, the driver will then set this flag if it has
no more buffers on the NIC Rx ring and yield to the application. For
Tx, it will set the flag if it has no outstanding Tx completion
interrupts and return to the application.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch adds support for the need_wakeup feature of AF_XDP. If the
application has told the kernel that it might sleep using the new bind
flag XDP_USE_NEED_WAKEUP, the driver will then set this flag if it has
no more buffers on the NIC Rx ring and yield to the application. For
Tx, it will set the flag if it has no outstanding Tx completion
interrupts and return to the application.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>