Commit Graph

352 Commits

Author SHA1 Message Date
Maor Gottlieb
45bf454ae8 net/mlx5e: Enabling aRFS mechanism
Accelerated RFS requires that ntuple filtering is enabled via
ethtool and driver supports ndo_rx_flow_steer.
When the ntuple filtering is enabled, we modify the l3_l4 ttc
rules to point on the aRFS flow tables and when the filtering
is disabled, we modify the l3_l4 ttc rules to point on the RSS
TIRs.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 16:29:12 -04:00
Maor Gottlieb
18c908e477 net/mlx5e: Add accelerated RFS support
Implement ndo_rx_flow_steer ndo.
A new flow steering rule will be composed from the
skb 4-tuple and added to the hardware aRFS flow table.

Each rule is stored in an internal hash table, if such
skb 4-tuple rule already exists we update the corresponding
hardware steering rule with the new destination.

For garbage collection rps_may_expire_flow will be
invoked for a limited amount of old rules upon any
ndo_rx_flow_steer invocation.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 16:29:11 -04:00
Maor Gottlieb
1cabe6b096 net/mlx5e: Create aRFS flow tables
Create the following four flow tables for aRFS usage:
1. IPv4 TCP - filtering 4-tuple of IPv4 TCP packets.
2. IPv6 TCP - filtering 4-tuple of IPv6 TCP packets.
3. IPv4 UDP - filtering 4-tuple of IPv4 UDP packets.
4. IPv6 UDP - filtering 4-tuple of IPv6 UDP packets.

Each flow table has two flow groups: one for the 4-tuple
filtering (full match)  and the other contains * rule for miss rule.

Full match rule means a hit for aRFS and packet will be forwarded
to the dedicated RQ/Core, miss rule packets will be forwarded to
default RSS hashing.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 16:29:11 -04:00
Maor Gottlieb
5a7b27eb9c net/mlx5: Initializing CPU reverse mapping
Allocating CPU rmap and add entry for each IRQ.
CPU rmap is used in aRFS to get the RX queue number
of the RX completion interrupts.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 16:29:11 -04:00
Maor Gottlieb
33cfaaa8f3 net/mlx5e: Split the main flow steering table
Currently, the main flow table is used for two purposes:
One is to do mac filtering and the other is to classify
the packet l3-l4 header in order to steer the packet to
the right RSS TIR.

This design is very complex, for each configured mac address we
have to add eleven rules (rule for each traffic type), the same if the
device is put to promiscuous/allmulti mode.
This scheme isn't scalable for future features like aRFS.

In order to simplify it, the main flow table is split to two flow
tables:
1. l2 table - filter the packet dmac address, if there is a match
we forward to the ttc flow table.

2. TTC (Traffic Type Classifier) table - classify the traffic
type of the packet and steer the packet to the right TIR.

In this new design, when new mac address is added, the driver adds
only one flow rule instead of eleven.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 16:29:11 -04:00
Maor Gottlieb
acff797cd1 net/mlx5e: Refactor mlx5e flow steering structs
Slightly refactor and re-order the flow steering structs,
tables and data-bases for better self-containment and
flexibility to add more future steering phases
(tables/rules/data bases) e.g: aRFS.

Changes:
1. Move the vlan DB and address DB into their table structs.
2. Rename steering table structs to unique format: mlx5e_*_table,
e.g: mlx5e_vlan_table.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 16:29:10 -04:00
Maor Gottlieb
13de6c106c net/mlx5: Support different attributes for priorities in namespace
Currently, namespace could be initialized only
with priorities with the same attributes.
Add support to initialize namespace with priorities
with different attributes(e.g. different number of levels).

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 16:29:10 -04:00
Maor Gottlieb
d63cd28608 net/mlx5: Add user chosen levels when allocating flow tables
Currently, consumers of the flow steering infrastructure can't
choose their own flow table levels and are limited to one
flow table per level. This just waste levels.
Instead, we introduce here the possibility to use multiple
flow tables in a level. The user is free to connect these
flow tables, while following the rule (FTEs in FT of level x
could only point to FTs of level y where y > x).

In addition this patch switch the order of the create/destroy
flow tables of the NIC(vlan and main).

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 16:29:09 -04:00
Maor Gottlieb
a257b94a18 net/mlx5: Set number of allowed levels in priority
Refactors the flow steering namespace creation,
by changing the name num_fts to num_levels.
When new flow table is created, the driver assign new level
to this flow table therefore the meaning is equivalent.
Since downstream patches will introduce the ability to create more
than one flow table per level, the name num_fts is no
longer accurate.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 16:29:09 -04:00
Maor Gottlieb
d745098ced net/mlx5: Introduce modify flow rule destination
This API is used for modifying the flow rule destination.
This is needed for modifying the pointed flow table by the
traffic type classifier rules to point on the aRFS tables.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 16:29:08 -04:00
Tariq Toukan
1da366964e net/mlx5e: Direct TIR per RQ
Introduce new TIRs for direct access per RQ.
Now we have 2 available kinds of TIRs:
	- indirect TIR per traffic type, each points to one RQT (RSS RQT)
          same as before.
	- New direct TIR per RQ, each points to RQT with a size of one
          that forwards packets to that RQ only.

Driver will open max channels (num cores) direct TIRs by default,
they will be filled with the actual RQs once channels are allocated.

Needed for downstream aRFS and ethtool direct steering functionalities.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 16:29:08 -04:00
Matthew Finlay
01a14098d3 net/mlx5e: Call vxlan_get_rx_port() with rtnl lock
Hold the rtnl lock when calling vxlan_get_rx_port().

Fixes: b7aade1548 ("vxlan: break dependency with netdev drivers")
Signed-off-by: Matthew Finlay <matt@mellanox.com>
Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 16:29:08 -04:00
Arnd Bergmann
6b87663fbe net/mlx5e: avoid stack overflow in mlx5e_open_channels
struct mlx5e_channel_param is a large structure that is allocated
on the stack of mlx5e_open_channels, and with a recent change
it has grown beyond the warning size for the maximum stack
that a single function should use:

mellanox/mlx5/core/en_main.c: In function 'mlx5e_open_channels':
mellanox/mlx5/core/en_main.c:1325:1: error: the frame size of 1072 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

The function is already using dynamic allocation and is not in
a fast path, so the easiest workaround is to use another kzalloc
for allocating the channel parameters.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: d3c9bc2743 ("net/mlx5e: Added ICO SQs")
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 16:46:59 -04:00
David S. Miller
c0cc53162a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor overlapping changes in the conflicts.

In the macsec case, the change of the default ID macro
name overlapped with the 64-bit netlink attribute alignment
fixes in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27 15:43:10 -04:00
Saeed Mahameed
1b223dd391 net/mlx5e: Fix checksum handling for non-stripped vlan packets
Now as rx-vlan offload can be disabled, packets can be received
with vlan tag not stripped, which means is_first_ethertype_ip will
return false, for that we need to check if the hardware reported
csum OK so we will report CHECKSUM_UNNECESSARY for those packets.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 15:58:03 -04:00
Gal Pressman
363501145e net/mlx5e: Add ethtool support for rxvlan-offload (vlan stripping)
Use ethtool -K <interface> rxvlan <on/off> to enable/disable
C-TAG vlan stripping by hardware.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 15:58:02 -04:00
Gal Pressman
bb64143eee net/mlx5e: Add ethtool support for dump module EEPROM
Add query MCIA, PMLP registers infrastructure and commands.
Add ethtool support for get_module_info() and get_module_eeprom()
callbacks.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 15:58:02 -04:00
Gal Pressman
da54d24ec3 net/mlx5e: Add ethtool support for interface identify (LED blinking)
Add the needed hardware command and mlx5_ifc structs for managing LED
control.
Add set_phys_id ethtool callback to support ethtool -p flag.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 15:58:02 -04:00
Eran Ben Elisha
94cb1ebbaf net/mlx5e: Add support for RXALL netdev feature
Introduce new access register named Ports Check Mask Register (PCMR) to
control all HW checks on port. With this register, the driver can
enable/disable Hardware FCS validation.

When RXALL is enabled/disabled using ndo_set_features, enable/disable
fcs check at HW.
User can change HW configuration using rx-all flag at ethtool.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 15:58:02 -04:00
Gal Pressman
0e405443e8 net/mlx5e: Improve set features ndo resiliency
In current mlx5e ndo_set_features implementation, setting some features
can success while others can fail. Today, we return one error code which
doesn't reflect the current features status of the netdev at the end of
the ndo callback.

Set netdev->features with features which were successfully set in order
to keep the current status in case of failure. For this purpose, define
new Macro to set/unset specific feature in netdev->features.

This patch introduces a mechanism that uses feature handlers for each
feature.
Set features will call a generic handler, which will then call a specific
handler in his turn and update netdev->features according to it's return
value. Each specific handler is responsible to perform driver specific
actions, and updating params if needed.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 15:58:01 -04:00
Gal Pressman
121fcdc84d net/mlx5e: Add link down events counter
Expose link_down_events counter through ethtool -S.
This counter is read from PPort statistics, then proccessed and stored as
a special handling software counter.
This counter is stored along software counters since it is the only PPort
counter that it's size is not 64 bits.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 15:58:01 -04:00
Gal Pressman
cf678570d5 net/mlx5e: Add per priority group to PPort counters
Expose counters providing information for each priority level (PCP) through
ethtool -S option and DCBNL.
This includes rx/tx bytes, frames, and pause counters.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 15:58:01 -04:00
Gal Pressman
8075cb7238 net/mlx5e: Rename VPort counters
VPort and software counters names are confusing and may be unclear, all
VPort counters now have a prefix of rx/tx_vport_*.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 15:58:01 -04:00
Gal Pressman
9218b44dcc net/mlx5e: Statistics handling refactoring
Redesign ethtool statistics handling and reporting in the driver:
1. Move counters to a separate file (en_stats.h).
2. Remove unnecessary dependencies between stats and strings.
3. Use counter descriptors which hold a name and offset for each counter,
   and will be used to decide which counters will be exposed.

For example when adding a new software counter to ethtool, instead of:
1. Add to stats struct.
2. Add to strings struct in the same order.
3. Change macro defining number of software counters.
The only thing needed is to link the new counter to a counter descriptor.

VPort counters are a set of hardware traffic counters created automatically
for each virtual port opened.
PPort counters are a set of counters describing per physical port
performance statistics.
These counters are gathered from hardware register and divided to groups
according to different protocols.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 15:58:01 -04:00
Gal Pressman
269e6b3af3 net/mlx5e: Report additional error statistics in get stats ndo
Provide rtnl_link_stats64 with information regarding physical errors to be
seen in ifconfig and ip tool.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 15:58:00 -04:00
Majd Dibbiny
5fc7197d3a net/mlx5: Add pci shutdown callback
This patch introduces kexec support for mlx5.
When switching kernels, kexec() calls shutdown, which unloads
the driver and cleans its resources.

In addition, remove unregister netdev from shutdown flow. This will
allow a clean shutdown, even if some netdev clients did not release their
reference from this netdev. Releasing The HW resources only is enough as
the kernel is shutting down

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24 14:51:39 -04:00
Eli Cohen
78228cbdeb net/mlx5_core: Remove static from local variable
The static is not required and breaks re-entrancy if it will be required.

Fixes: 2530236303 ("net/mlx5_core: Flow steering tree initialization")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24 14:51:39 -04:00
Saeed Mahameed
cd255efff9 net/mlx5e: Use vport MTU rather than physical port MTU
Set and report vport MTU rather than physical MTU,
Driver will set both vport and physical port mtu and will
rely on the query of vport mtu.

SRIOV VFs have to report their MTU to their vport manager (PF),
and this will allow them to work with any MTU they need
without failing the request.

Also for some cases where the PF is not a port owner, PF can
work with MTU less than the physical port mtu if set physical
port mtu didn't take effect.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24 14:51:39 -04:00
Saeed Mahameed
d8edd2469a net/mlx5e: Fix minimum MTU
Minimum MTU that can be set in Connectx4 device is 68.

This fixes the case where a user wants to set invalid MTU,
the driver will fail to satisfy this request and the interface
will stay down.

It is better to report an error and continue working with old
mtu.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24 14:51:39 -04:00
Saeed Mahameed
046339eaab net/mlx5e: Device's mtu field is u16 and not int
For set/query MTU port firmware commands the MTU field
is 16 bits, here I changed all the "int mtu" parameters
of the functions wrapping those firmware commands to be u16.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24 14:51:38 -04:00
Majd Dibbiny
64dbbdfef2 net/mlx5_core: Add ConnectX-5 to list of supported devices
Add the upcoming ConnectX-5 devices (PF and VF) to the list of
supported devices by the mlx5 driver.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24 14:51:38 -04:00
Rana Shahout
6e4c218946 net/mlx5e: Fix MLX5E_100BASE_T define
Bit 25 of eth_proto_capability in PTYS register is
1000Base-TT and not 100Base-T.

Fixes: f62b8bb8f2 ('net/mlx5: Extend mlx5_core to
support ConnectX-4 Ethernet functionality')
Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24 14:51:38 -04:00
Maor Gottlieb
c3f9bf628b net/mlx5_core: Fix soft lockup in steering error flow
In the error flow of adding flow rule to auto-grouped flow
table, we call to tree_remove_node.

tree_remove_node locks the node's parent, however the node's parent
is already locked by mlx5_add_flow_rule and this causes a deadlock.
After this patch, if we failed to add the flow rule, we unlock the
flow table before calling to tree_remove_node.

fixes: f0d22d1874 ('net/mlx5_core: Introduce flow steering autogrouped
flow table')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reported-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-24 14:51:38 -04:00
Tariq Toukan
5498440756 net/mlx5e: Add ethtool counter for RX buffer allocation failures
Counts the number of RX buffer allocation failures and shows it
in ethtool statistics.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:06 -04:00
Saeed Mahameed
e20a0db304 net/mlx5e: Delay skb->data access
Move mlx5e_handle_csum and eth_type_trans to the end of
mlx5e_build_rx_skb to gain some more time before accessing
skb->data, to reduce cache misses.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:06 -04:00
Tariq Toukan
1bfec31627 net/mlx5e: Remove redundant barrier
The bit-op operation one line before is an explicit barrier
by itself.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:05 -04:00
Tariq Toukan
c5adb96f6c net/mlx5e: Use napi_alloc_skb for RX SKB allocations
Instead of netdev_alloc_skb, we use the napi_alloc_skb function
which is designated to allocate skbuff's for RX in a
channel-specific NAPI instance, and implies the IP packet alignment.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:05 -04:00
Tariq Toukan
bc77b240b3 net/mlx5e: Add fragmented memory support for RX multi packet WQE
If the allocation of a linear (physically continuous) MPWQE fails,
we allocate a fragmented MPWQE.

This is implemented via device's UMR (User Memory Registration)
which allows to register multiple memory fragments into ConnectX
hardware as a continuous buffer.
UMR registration is an asynchronous operation and is done via
ICO SQs.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:05 -04:00
Tariq Toukan
d3c9bc2743 net/mlx5e: Added ICO SQs
Added ICO (Internal Control Operations) SQ per channel to be used
for driver internal operations such as memory registration for
fragmented memory and nop requests upon ifconfig up.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:05 -04:00
Tariq Toukan
461017cb00 net/mlx5e: Support RX multi-packet WQE (Striding RQ)
Introduce the feature of multi-packet WQE (RX Work Queue Element)
referred to as (MPWQE or Striding RQ), in which WQEs are larger
and serve multiple packets each.

Every WQE consists of many strides of the same size, every received
packet is aligned to a beginning of a stride and is written to
consecutive strides within a WQE.

In the regular approach, each regular WQE is big enough to be capable
of serving one received packet of any size up to MTU or 64K in case of
device LRO is enabled, making it very wasteful when dealing with
small packets or device LRO is enabled.

For its flexibility, MPWQE allows a better memory utilization
(implying improvements in CPU utilization and packet rate) as packets
consume strides according to their size, preserving the rest of
the WQE to be available for other packets.

MPWQE default configuration:
	Num of WQEs	= 16
	Strides Per WQE = 2048
	Stride Size	= 64 byte

The default WQEs memory footprint went from 1024*mtu (~1.5MB) to
16 * 2048 * 64 = 2MB per ring.
However, HW LRO can now be supported at no additional cost in memory
footprint, and hence we turn it on by default and get an even better
performance.

Performance tested on ConnectX4-Lx 50G.
To isolate the feature under test, the numbers below were measured with
HW LRO turned off. We verified that the performance just improves when
LRO is turned back on.

* Netperf single TCP stream:
- BW raised by 10-15% for representative packet sizes:
  default, 64B, 1024B, 1478B, 65536B.

* Netperf multi TCP stream:
- No degradation, line rate reached.

* Pktgen: packet rate raised by 2-10% for traffic of different message
sizes: 64B, 128B, 256B, 1024B, and 1500B.

* Pktgen: packet loss in bursts of small messages (64byte),
single stream:
- | num packets | packets loss before | packets loss after
  |     2K      |       ~ 1K          |       0
  |     8K      |       ~ 6K          |       0
  |     16K     |       ~13K          |       0
  |     32K     |       ~28K          |       0
  |     64K     |       ~57K          |     ~24K

As expected as the driver can receive as many small packets (<=64B) as
the number of total strides in the ring (default = 2048 * 16) vs. 1024
(default ring size regardless of packets size) before this feature.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:05 -04:00
Tariq Toukan
2f48af128d net/mlx5e: Use function pointers for RX data path handling
In preparation for Striding RQ feature, which will need its own
RX handlers.
This patch does not change any functionality.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:04 -04:00
Tariq Toukan
d8c9660dac net/mlx5e: Use only close NUMA node for default RSS
Distribute default RSS table uniformly over the rings of the
close NUMA node, instead of all available channels.
This way we enforce the preference of close rings over far ones.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:04 -04:00
Rana Shahout
593cf33829 net/mlx5e: Allocate set of queue counters per netdev
Connect all netdev RQs to this set of queue counters.
Also, add an "rx_out_of_buffer" counter to ethtool,
which indicates RX packet drops due to lack of receive
buffers.

Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:04 -04:00
Tariq Toukan
237cd21809 net/mlx5: Introduce device queue counters
A queue counter can collect several statistics for one or more
hardware queues (QPs, RQs, etc ..) that the counter is attached to.

For Ethernet it will provide an "out of buffer" counter which
collects the number of all packets that are dropped due to lack
of software buffers.

Here we add device commands to alloc/query/dealloc queue counters.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:09:04 -04:00
Linus Torvalds
b8ba452683 Round two of 4.6 merge window patches
- A few minor core fixups needed for the next patch series
 - The IB SRIOV series.  This has bounced around for several versions.
   Of note is the fact that the first patch in this series effects
   the net core.  It was directed to netdev and DaveM for each iteration
   of the series (three versions total).  Dave did not object, but did
   not respond either.  I've taken this as permission to move forward
   with the series.
 - The new Intel X722 iWARP driver
 - A huge set of updates to the Intel hfi1 driver.  Of particular interest
   here is that we have left the driver in staging since it still has an
   API that people object to.  Intel is working on a fix, but getting
   these patches in now helps keep me sane as the upstream and Intel's
   trees were over 300 patches apart.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW8HR9AAoJELgmozMOVy/dDYMP+wSBALhIdV/pqVzdLCGfIUbK
 H5agonm/3b/Oj74W30w2JYqXBFfZC2LGVJy6OwocJ3wK04v/KfZbA9G+QsOuh2hQ
 Db+tFn1eoltvzrcx3k/a7x6zHGC4YyxyH9OX2B3QfRsNHeE7PG9KGp5dfEs2OH1r
 WGp3jMLAsHf7o8uKpa0jyTEUEErATaTlG+YoaJ+BGHwurgCNy8ni+wAn+EAFiJ3w
 iEJhcXB6KY69vkLsrLYuT9xxJn4udFJ3QEk8xdPkpLKsu+6Ue5i/eNQ19VfbpZgR
 c6fTc8genfIv5S+fis+0P44u1oA7Kl2JT6IZYLi35gJ60ZmxTD+7GruWP3xX/wJ2
 zuR3sTj5fjcFWenk087RSIU/EK87ONPD4g9QPdZpf3FtgleTVKk3YDlqwjqf8pgv
 cO6gQ1BcOBnixJvhjNFiX1c2hvNhb3CkgObly1JBwhcCzZhLkV7BNFPbZuDHAeAx
 VqzNEUse4hupkgiiuiGgudcJ4fsSxMW37kyfX9QC/qyk6YVuUDbrekcWI+MAKot7
 5e5dHqFExpbn1Zgvc8yfvh88H2MUQAgaYwjanWF/qpppOPRd01nTisVQIOJn7s5C
 arcWzvocpQe0GL2UsvDoWwAABXznL3bnnAoCyTWOES2RhOOcw0Ibw46Jl8FQ8gnl
 2IRxQ+ltNEscb2cwi5wE
 =t2Ko
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull more rdma updates from Doug Ledford:
 "Round two of 4.6 merge window patches.

  This is a monster pull request.  I held off on the hfi1 driver updates
  (the hfi1 driver is intimately tied to the qib driver and the new
  rdmavt software library that was created to help both of them) in my
  first pull request.  The hfi1/qib/rdmavt update is probably 90% of
  this pull request.  The hfi1 driver is being left in staging so that
  it can be fixed up in regards to the API that Al and yourself didn't
  like.  Intel has agreed to do the work, but in the meantime, this
  clears out 300+ patches in the backlog queue and brings my tree and
  their tree closer to sync.

  This also includes about 10 patches to the core and a few to mlx5 to
  create an infrastructure for configuring SRIOV ports on IB devices.
  That series includes one patch to the net core that we sent to netdev@
  and Dave Miller with each of the three revisions to the series.  We
  didn't get any response to the patch, so we took that as implicit
  approval.

  Finally, this series includes Intel's new iWARP driver for their x722
  cards.  It's not nearly the beast as the hfi1 driver.  It also has a
  linux-next merge issue, but that has been resolved and it now passes
  just fine.

  Summary:

   - A few minor core fixups needed for the next patch series

   - The IB SRIOV series.  This has bounced around for several versions.
     Of note is the fact that the first patch in this series effects the
     net core.  It was directed to netdev and DaveM for each iteration
     of the series (three versions total).  Dave did not object, but did
     not respond either.  I've taken this as permission to move forward
     with the series.

   - The new Intel X722 iWARP driver

   - A huge set of updates to the Intel hfi1 driver.  Of particular
     interest here is that we have left the driver in staging since it
     still has an API that people object to.  Intel is working on a fix,
     but getting these patches in now helps keep me sane as the upstream
     and Intel's trees were over 300 patches apart"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (362 commits)
  IB/ipoib: Allow mcast packets from other VFs
  IB/mlx5: Implement callbacks for manipulating VFs
  net/mlx5_core: Implement modify HCA vport command
  net/mlx5_core: Add VF param when querying vport counter
  IB/ipoib: Add ndo operations for configuring VFs
  IB/core: Add interfaces to control VF attributes
  IB/core: Support accessing SA in virtualized environment
  IB/core: Add subnet prefix to port info
  IB/mlx5: Fix decision on using MAD_IFC
  net/core: Add support for configuring VF GUIDs
  IB/{core, ulp} Support above 32 possible device capability flags
  IB/core: Replace setting the zero values in ib_uverbs_ex_query_device
  net/mlx5_core: Introduce offload arithmetic hardware capabilities
  net/mlx5_core: Refactor device capability function
  net/mlx5_core: Fix caching ATOMIC endian mode capability
  ib_srpt: fix a WARN_ON() message
  i40iw: Replace the obsolete crypto hash interface with shash
  IB/hfi1: Add SDMA cache eviction algorithm
  IB/hfi1: Switch to using the pin query function
  IB/hfi1: Specify mm when releasing pages
  ...
2016-03-22 15:48:44 -07:00
Eli Cohen
1f324bff9b net/mlx5_core: Implement modify HCA vport command
Implement the modify HCA vport commands used to modify the parameters of
virtual HCA's ports.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 17:13:14 -04:00
Eli Cohen
2a4826fe74 net/mlx5_core: Add VF param when querying vport counter
Add a vf parameter to mlx5_core_query_vport_counter so we can call it to
query counters of virtual functions. Also update current users of the
API.

PFs may call mlx5_core_query_vport_counter with other_vport set to
indicate that they are querying a virtual function. The virtual
function to be queried is given by the vf parameter. Virtual function
numbering is zero based so the first VF is 0 and so on. When a PF
queries its own function, the other_vport parameter is cleared.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 17:13:14 -04:00
Sagi Grimberg
3f0393a575 net/mlx5_core: Introduce offload arithmetic hardware capabilities
Define the necessary hardware structures for the offload
arithmetic capabilities and read/cache them on driver load.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 16:32:35 -04:00
Leon Romanovsky
b06e7de8a9 net/mlx5_core: Refactor device capability function
Device capability function was called similar in all places.
It was called twice for every queried parameter, while the
difference between calls was in HCA capability mode only.

The change proposed unify these calls into one function.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 16:29:07 -04:00
Leon Romanovsky
91d9ed8443 net/mlx5_core: Fix caching ATOMIC endian mode capability
Add caching of maximum device capability of ATOMIC endian mode.

Fixes: f91e6d8941 ('net/mlx5_core: Add setting ATOMIC endian mode')
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 16:29:07 -04:00