Commit Graph

2290 Commits

Author SHA1 Message Date
David S. Miller
3efa70d78f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The conflict was an interaction between a bug fix in the
netvsc driver in 'net' and an optimization of the RX path
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 16:29:30 -05:00
Jiri Pirko
9bcdef3288 spectrum: acl_tcam: Fix catchall prio value
This fixes an issue reported by smatch:
mlxsw_sp_acl_tcam_chunk_create() warn: impossible condition '(priority == (-1)) => (0-u32max == u64max)'

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 22a677661f ("mlxsw: spectrum: Introduce ACL core with simple TCAM implementation")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 14:15:21 -05:00
David S. Miller
501ec18757 mlx5-updates-2017-01-31
This series includes some updates to mlx5 core and ethernet driver.
 
 We got one patch from Or to fix some static checker warnings.
 
 2nd patche from Dan came to add the support for 128B cache line
 in the HCA, which will configures the hardware to use 128B alignment only
 on systems with 128B cache lines, otherwise it will be kept as the current
 default of 64B.
 
 From me three patches to support no inline copy on TX on ConnectX-5 and
 later HCAs.  Starting with two small infrastructure changes and
 refactoring patches followed by two patches to add the actual support for
 both xmit ndo and XDP xmit routines.
 Last patch is a simple fix to return a mistakenly removed pointer from the
 SQ structure, which was remove in previous submission of mlx5 4K UAR.
 
 Saeed.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJYmQKaAAoJEEg/ir3gV/o+RBMH/RGHNw3yPB2MyWo28V3eabw+
 xl/SymiNOUgmq03ULYoc6xJpi9RCya7m/Kyce1M/M1gSz6LXubG2IDw9QsKV8lnc
 +5rwHCKjop6MdR3khsgqvWqGiKfQN0+QON5MjlPZB3/4u8qFcjauhfXpiX9naMO5
 aB/Sm9zRPwRnsEhy2AwPyZqOxe5boZzHqmZxpthIgPMtqbpBYNkTkooljsj/KqXf
 AO3y/mdGykELPF3lIHTE4X9zixx5s6MrlAYX2uGUrAojs2WVIBsq3iXI/J8X9zs/
 lg7to15WoMttR66vRZ120U6tx17OMmoxuAp+bmgZumabi/wDAZGSy5ELbH28WlY=
 =F+t/
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2017-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2017-01-31

This series includes some updates to mlx5 core and ethernet driver.

We got one patch from Or to fix some static checker warnings.

2nd patche from Dan came to add the support for 128B cache line
in the HCA, which will configures the hardware to use 128B alignment only
on systems with 128B cache lines, otherwise it will be kept as the current
default of 64B.

From me three patches to support no inline copy on TX on ConnectX-5 and
later HCAs.  Starting with two small infrastructure changes and
refactoring patches followed by two patches to add the actual support for
both xmit ndo and XDP xmit routines.
Last patch is a simple fix to return a mistakenly removed pointer from the
SQ structure, which was remove in previous submission of mlx5 4K UAR.

Saeed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 13:44:08 -05:00
Benjamin Poirier
bd4ce941c8 mlx4: Invoke softirqs after napi_reschedule
mlx4 may schedule napi from a workqueue. Afterwards, softirqs are not run
in a deterministic time frame and the following message may be logged:
NOHZ: local_softirq_pending 08

The problem is the same as what was described in commit ec13ee8014
("virtio_net: invoke softirqs after __napi_schedule") and this patch
applies the same fix to mlx4.

Fixes: 07841f9d94 ("net/mlx4_en: Schedule napi when RX buffers allocation fails")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 12:50:43 -05:00
Arnd Bergmann
8d1fb01df8 mlxsw: add psample dependency for spectrum
When PSAMPLE is a loadable module, spectrum must not be built-in:

drivers/net/built-in.o: In function `mlxsw_sp_rx_listener_sample_func':
spectrum.c:(.text+0xe357e): undefined reference to `psample_sample_packet'

This adds a Kconfig dependency to enforce usable configurations.

Fixes: 98d0f7b9ac ("mlxsw: spectrum: Add packet sample offloading support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 11:44:12 -05:00
Arnd Bergmann
321fa4ffd9 net/mlx5e: fix another maybe-uninitialized false-positive
In commit abeffce ("net/mlx5e: Fix a -Wmaybe-uninitialized warning"), I fixed a
gcc warning for the ipv4 offload handling. Now we get the same warning for the
added ipv6 support:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:815:40: warning: 'out_dev' may be used uninitialized in this function [-Wmaybe-uninitialized]

We can apply the same workaround here as well.

Fixes: ce99f6b97f ("net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 16:35:12 -05:00
Dan Carpenter
73cfb2a2e4 net/mlx4_en: fix a condition
There is a "||" vs "|" typo here so we test 0x1 instead of 0x6.

Fixes: 1f8176f735 ("net/mlx4_en: Check the enabling pptx/pprx flags in SET_PORT wrapper flow")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 12:01:06 -05:00
Ido Schimmel
fd76d9105b mlxsw: spectrum_router: Fix typo in comment
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:57 -05:00
Ido Schimmel
01b1aa359d mlxsw: spectrum_router: Don't read 'nud_state' without lock
We periodically ask the neighbouring system to try and resolve
neighbours that are used for nexthops, but aren't currently resolved.

However, 'nud_state' is protected by the neighbour lock, so we shouldn't
access it without taking it. Instead, we can simply check the
'connected' field of the neighbour entry, which we update upon
NEIGH_UPDATE events.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:57 -05:00
Ido Schimmel
8a0b727526 mlxsw: spectrum_router: Remove redundant check
We only add neighbour entries that are also used for nexthops to
'nexthop_neighs_list', so when iterating over this list there's no need
to check that the entry is indeed used for nexthops.

Remove the redundant check.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:57 -05:00
Ido Schimmel
5c8802f14a mlxsw: spectrum_router: Simplify neighbour reflection
Up until now we had two interfaces for neighbour related configuration:
ndo_neigh_{construct,destroy} and NEIGH_UPDATE netevents. The ndos were
used to add and remove neighbours from the driver's cache, whereas the
netevent was used to reflect the neighbours into the device's tables.

However, if the NUD state of a neighbour isn't NUD_VALID or if the
neighbour is dead, then there's really no reason for us to keep it
inside our cache. The only exception to this rule are neighbours that
are also used for nexthops, which we periodically refresh to get them
resolved.

We can therefore eliminate the ndo entry point into the driver and
simplify the code, making it similar to the FIB reflection, which is
based solely on events. This also helps us avoid a locking issue, in
which the RIF cache was traversed without proper locking during
insertion into the neigh entry cache.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:56 -05:00
Ido Schimmel
de04b6a358 mlxsw: spectrum_router: Remove unused variable
Since commit 33b1341cd1 ("mlxsw: spectrum_router: Fix handling of
neighbour structure") we no longer use destination IP for neighbour
lookup, so remove it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:56 -05:00
Ido Schimmel
e60234ddb5 mlxsw: spectrum_router: Use ordered workqueue for neigh updates
We currently associate each neighbour entry with a work item, so it's
not possible to have multiple events queued for the same neighbour
entry. However, this is about to be changed so that the neighbour entry
is only resolved when the work item is scheduled.

The above can result in a mismatch between the kernel's and the device's
neighbour table, unless the associated work items are processed in the
order in which they were submitted.

Do that by migrating the NEIGH_UPDATE work items to be processed in the
ordered workqueue which was recently introduced in mlxsw in commit
a3832b3189 ("mlxsw: core: Create an ordered workqueue for FIB
offload").

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:56 -05:00
Ido Schimmel
a0e4761d9b mlxsw: core: Queue work immediately instead of delaying it
We always use zero delay before queueing a work on the ordered workqueue
('mlxsw_owq'), so use work_struct directly instead of delayable work.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:25:55 -05:00
Saeed Mahameed
8ca967ab67 net/mlx5e: Bring back bfreg uar map dedicated pointer
4K Uar series modified the mlx5e driver to use the new bfreg API,
and mistakenly removed the sq->uar_map iomem data path dedicated
pointer, which was meant to be read from xmit path for cache locality
utilization.

Fix that by returning that pointer to the SQ struct.

Fixes: 7309cb4ad71e ("IB/mlx5: Support 4k UAR for libmlx5")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2017-02-06 18:20:18 +02:00
Saeed Mahameed
b70149dd7d net/mlx5e: XDP Tx, no inline copy on ConnectX-5
ConnectX-5 and later HW generations will report min inline mode ==
MLX5_INLINE_MODE_NONE, which means driver is not required to copy packet
headers to inline fields of TX WQE.

Avoid copy to inline segment in XDP TX routine when HW inline mode doesn't
require it.

This will improve CPU utilization and boost XDP TX performance.

Tested with xdp2 single flow:
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
HCA: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]

Before: 7.4Mpps
After:  7.8Mpps
Improvement: 5%

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2017-02-06 18:20:18 +02:00
Saeed Mahameed
a6f402e499 net/mlx5e: Tx, no inline copy on ConnectX-5
ConnectX-5 and later HW generations will report min inline mode ==
MLX5_INLINE_MODE_NONE, which means driver is not required to copy packet
headers to inline fields of TX WQE.

When inline is not required, vlan insertion will be handled in the
TX descriptor rather than copy to inline.

For LSO case driver is still required to copy headers, for the HW to
duplicate on wire.

This will improve CPU utilization and boost TX performance.

Tested with pktgen burst single flow:
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
HCA: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]

Before: 15.1Mpps
After:  17.2Mpps
Improvement: 14%

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2017-02-06 18:20:17 +02:00
Saeed Mahameed
2b31f7ae5f net/mlx5: TX WQE update
Add new TX WQE fields for Connect-X5 vlan insertion support,
type and vlan_tci, when type = MLX5_ETH_WQE_INSERT_VLAN the
HW will insert the vlan and prio fields (vlan_tci) to the packet.

Those bits and the inline header fields are mutually exclusive, and
valid only when:
MLX5_CAP_ETH(mdev, wqe_inline_mode) == MLX5_CAP_INLINE_MODE_NOT_REQUIRED
and MLX5_CAP_ETH(mdev, wqe_vlan_insert),
who will be set in ConnectX-5 and later HW generations.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2017-02-06 18:20:16 +02:00
Daniel Jurgens
f32f5bd2eb net/mlx5: Configure cache line size for start and end padding
There is a hardware feature that will pad the start or end of a DMA to
be cache line aligned to avoid RMWs on the last cache line. The default
cache line size setting for this feature is 64B. This change configures
the hardware to use 128B alignment on systems with 128B cache lines.

In addition we lower bound MPWRQ stride by HCA cacheline in mlx5e,
MPWRQ stride should be at least the HCA cacheline, the current default
is 64B and in case HCA_CAP.cach_line_128byte capability is set, MPWRQ RX
stride will automatically be aligned to 128B.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-02-06 18:17:25 +02:00
Elad Raz
e158e5ef24 mlxsw: reg: Fix HTGT register length
HTGT register length is limited to 32 bytes and not 256 bytes.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:07:21 -05:00
Jiri Pirko
7aa0f5aa90 mlxsw: spectrum: Implement TC flower offload
Extend the existing setup_tc ndo call and allow to offload cls_flower
rules. Only limited set of dissector keys and actions are supported now.
Use previously introduced ACL infrastructure to offload cls_flower rules
to be processed in the HW.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:35:43 -05:00
Jiri Pirko
22a677661f mlxsw: spectrum: Introduce ACL core with simple TCAM implementation
Add ACL core infrastructure for Spectrum ASIC. This infra provides an
abstraction layer over specific HW implementations. There are two basic
objects used. One is "rule" and the second is "ruleset" which serves as a
container of multiple rules. In general, within one ruleset the rules are
allowed to have multiple priorities and masks. Each ruleset is bound to
either ingress or egress a of port netdevice.

The initial TCAM implementation is very simple and limited. It utilizes
parman lsort manager to take care of TCAM region layout.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:35:43 -05:00
Jiri Pirko
8708ecf01d mlxsw: resources: Add ACL related resources
Add couple of resource limits related to ACL.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:35:42 -05:00
Jiri Pirko
b876b9aaad mlxsw: spectrum: Introduce basic set of flexible key blocks
Introduce basic set of Spectrum flexible key blocks. It contains blocks
needed to carry all elements defined so far.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:35:41 -05:00
Jiri Pirko
4cda7d8d70 mlxsw: core: Introduce flexible actions support
Each entry which is matched during ACL lookup points to an action set.
This action set contains up to three separate actions. If more actions
are needed to be chained, the extended set is created to hold them
in KVD linear area.

This patch implements handling of sets and encoding of actions.
Currectly, only two actions are supported. Drop and forward. Forward
action uses PBS pointer to KVD linear area, so the action code needs to
take care of this as well.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:35:41 -05:00
Jiri Pirko
3f1a84e696 mlxsw: core: Introduce flexible keys support
Hardware supports matching on so called "flexible keys". The idea is to
assemble an optimal key to use for matching according to the fields in
packet (elements) requested by user. Certain sets of elements are
combined into pre-defined blocks. There is a picker to find needed blocks.
Keys consist of 1..n blocks.

Alongside with that, an initial portion of elements is introduced in order
to be able to offload basic cls_flower rules.

Picked keys are cached so multiple rules could share them.

There is an encode function provided that takes care of encoding key and
mask values according to given key.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:35:41 -05:00
Jiri Pirko
e3426e12fe mlxsw: reg: Add Policy-Engine Extended Flexible Action Register
PEFA register is used for accessing an extended flexible action entry
in the central KVD Linear Database.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:35:40 -05:00
Jiri Pirko
d120649d86 mlxsw: reg: Add Policy-Engine Policy Based Switching Register
The PPBS register retrieves and sets Policy Based Switching Table entries.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:35:40 -05:00
Jiri Pirko
937b682cc0 mlxsw: reg: Add Policy-Engine Rules Copy Register
The PRCR register is used for accessing rules within a TCAM region.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:35:40 -05:00
Jiri Pirko
af7170eee6 mlxsw: reg: Add Policy-Engine Port Binding Table
The PPBT is used for configuration of the Port Binding Table.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:35:39 -05:00
Jiri Pirko
0171cdec03 mlxsw: reg: Add Policy-Engine TCAM Entry Register Version 2
The PTCE-V2 register is used for accessing rules within a TCAM region.
It is a new version of PTCE in order to support wider key, mask and
action within a TCAM region.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:35:39 -05:00
Jiri Pirko
d9c2661e1c mlxsw: reg: Add Policy-Engine TCAM Allocation Register
The PTAR register is used for allocation of regions in the TCAM.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:35:39 -05:00
Jiri Pirko
10fabef513 mlxsw: reg: Add Policy-Engine ACL Group Table register
The PAGT register is used for configuration of the ACL Group Table.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:35:38 -05:00
Jiri Pirko
3279da4c88 mlxsw: reg: Add Policy-Engine ACL Register
The PACL register is used for configuration of the ACL.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:35:38 -05:00
Jiri Pirko
d5e556c6a1 mlxsw: item: Add helpers for getting pointer into payload for char buffer item
Sometimes it is handy to get a pointer to a char buffer item and use it
direcly to write/read data. So add these helpers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:35:38 -05:00
Jiri Pirko
2946fde9fd mlxsw: item: Add 8bit item helpers
Item heplers for 8bit values are needed, let's add them.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:35:37 -05:00
Martin KaFai Lau
770f82253d mlx4: xdp_prog becomes inactive after ethtool '-L' or '-G'
After calling mlx4_en_try_alloc_resources (e.g. by changing the
number of rx-queues with ethtool -L), the existing xdp_prog becomes
inactive.

The bug is that the xdp_prog ptr has not been carried over from
the old rx-queues to the new rx-queues

Fixes: 47a38e1550 ("net/mlx4_en: add support for fast rx drop bpf program")
Cc: Brenden Blanco <bblanco@plumgrid.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02 21:27:05 -05:00
Martin KaFai Lau
f32b20e89e mlx4: Fix memory leak after mlx4_en_update_priv()
In mlx4_en_update_priv(), dst->tx_ring[t] and dst->tx_cq[t]
are over-written by src->tx_ring[t] and src->tx_cq[t] without
first calling kfree.

One of the reproducible code paths is by doing 'ethtool -L'.

The fix is to do the kfree in mlx4_en_free_resources().

Here is the kmemleak report:
unreferenced object 0xffff880841211800 (size 2048):
  comm "ethtool", pid 3096, jiffies 4294716940 (age 528.353s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81930718>] kmemleak_alloc+0x28/0x50
    [<ffffffff8120b213>] kmem_cache_alloc_trace+0x103/0x260
    [<ffffffff8170e0a8>] mlx4_en_try_alloc_resources+0x118/0x1a0
    [<ffffffff817065a9>] mlx4_en_set_ringparam+0x169/0x210
    [<ffffffff818040c5>] dev_ethtool+0xae5/0x2190
    [<ffffffff8181b898>] dev_ioctl+0x168/0x6f0
    [<ffffffff817d7a72>] sock_do_ioctl+0x42/0x50
    [<ffffffff817d819b>] sock_ioctl+0x21b/0x2d0
    [<ffffffff81247a73>] do_vfs_ioctl+0x93/0x6a0
    [<ffffffff812480f9>] SyS_ioctl+0x79/0x90
    [<ffffffff8193d7ea>] entry_SYSCALL_64_fastpath+0x18/0xad
    [<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff880841213000 (size 2048):
  comm "ethtool", pid 3096, jiffies 4294716940 (age 528.353s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81930718>] kmemleak_alloc+0x28/0x50
    [<ffffffff8120b213>] kmem_cache_alloc_trace+0x103/0x260
    [<ffffffff8170e0cb>] mlx4_en_try_alloc_resources+0x13b/0x1a0
    [<ffffffff817065a9>] mlx4_en_set_ringparam+0x169/0x210
    [<ffffffff818040c5>] dev_ethtool+0xae5/0x2190
    [<ffffffff8181b898>] dev_ioctl+0x168/0x6f0
    [<ffffffff817d7a72>] sock_do_ioctl+0x42/0x50
    [<ffffffff817d819b>] sock_ioctl+0x21b/0x2d0
    [<ffffffff81247a73>] do_vfs_ioctl+0x93/0x6a0
    [<ffffffff812480f9>] SyS_ioctl+0x79/0x90
    [<ffffffff8193d7ea>] entry_SYSCALL_64_fastpath+0x18/0xad
    [<ffffffffffffffff>] 0xffffffffffffffff

(gdb) list *mlx4_en_try_alloc_resources+0x118
0xffffffff8170e0a8 is in mlx4_en_try_alloc_resources (drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2145).
2140                    if (!dst->tx_ring_num[t])
2141                            continue;
2142
2143                    dst->tx_ring[t] = kzalloc(sizeof(struct mlx4_en_tx_ring *) *
2144                                              MAX_TX_RINGS, GFP_KERNEL);
2145                    if (!dst->tx_ring[t])
2146                            goto err_free_tx;
2147
2148                    dst->tx_cq[t] = kzalloc(sizeof(struct mlx4_en_cq *) *
2149                                            MAX_TX_RINGS, GFP_KERNEL);
(gdb) list *mlx4_en_try_alloc_resources+0x13b
0xffffffff8170e0cb is in mlx4_en_try_alloc_resources (drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2150).
2145                    if (!dst->tx_ring[t])
2146                            goto err_free_tx;
2147
2148                    dst->tx_cq[t] = kzalloc(sizeof(struct mlx4_en_cq *) *
2149                                            MAX_TX_RINGS, GFP_KERNEL);
2150                    if (!dst->tx_cq[t]) {
2151                            kfree(dst->tx_ring[t]);
2152                            goto err_free_tx;
2153                    }
2154            }

Fixes: ec25bc04ed ("net/mlx4_en: Add resilience in low memory systems")
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02 21:27:05 -05:00
David S. Miller
e2160156bf Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
All merge conflicts were simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02 16:54:00 -05:00
Jack Morgenstein
d585df1c5c net/mlx4_core: Avoid command timeouts during VF driver device shutdown
Some Hypervisors detach VFs from VMs by instantly causing an FLR event
to be generated for a VF.

In the mlx4 case, this will cause that VF's comm channel to be disabled
before the VM has an opportunity to invoke the VF device's "shutdown"
method.

The result is that the VF driver on the VM will experience a command
timeout during the shutdown process when the Hypervisor does not deliver
a command-completion event to the VM.

To avoid FW command timeouts on the VM when the driver's shutdown method
is invoked, we detect the absence of the VF's comm channel at the very
start of the shutdown process. If the comm-channel has already been
disabled, we cause all FW commands during the device shutdown process to
immediately return success (and thus avoid all command timeouts).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 15:45:27 -05:00
Shaker Daibes
1f8176f735 net/mlx4_en: Check the enabling pptx/pprx flags in SET_PORT wrapper flow
Make sure pptx/pprx mask flag is set using new fields upon set port
request. In addition, move this code into a helper function for better
code readability.

Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 15:26:43 -05:00
Shaker Daibes
bf1f939683 net/mlx4_en: Check the enabling mtu flag in SET_PORT wrapper flow
Make sure MTU mask flag is set using new field upon set port
request. In addition, move this code into a helper function for better
code readability.

Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 15:26:43 -05:00
Shaker Daibes
40fb4fc1e1 net/mlx4_en: Pass user MTU value to Firmware at set port command
When starting the port, driver will inform Firmware about the actual MTU
which does not include implicit headers, such as FCS or VLAN tags.

Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 15:26:43 -05:00
Ariel Levkovich
297e1cf29e net/mlx4_en: Adding support of turning off link autonegotiation via ethtool
This feature will allow the user to disable auto negotiation
on the port for mlx4 devices while setting the speed is limited
to 1GbE speeds.
Other speeds will not be accepted in autoneg off mode.

This functionality is permitted providing that the firmware
is compatible with this feature.
The above is determined by querying a new dedicated capability
bit in the device.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 15:26:42 -05:00
Alaa Hleihel
4b5e5b7ece net/mlx4_core: Get num_tc using netdev_get_num_tc
Avoid reading num_tc directly from struct net_device, but use
the helper function netdev_get_num_tc.

Fixes: bc6a4744b8 ("net/mlx4_en: num cores tx rings for every UP")
Fixes: f5b6345ba8 ("net/mlx4_en: User prio mapping gets corrupted when changing number of channels")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 15:26:42 -05:00
Matan Barak
ae5a2e29d1 net/mlx4_core: Add resource alloc/dealloc debugging
In order to aid debugging of functions that take a resource but
don't put it, add the last function name that successfully grabbed
this resource.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 15:26:42 -05:00
Yishai Hadas
3835336401 net/mlx4_core: Device revision support
The device revision field returned by the NodeInfo MAD is incorrect
on ConnectX3 devices.

This patch is driver side handling to complete a FW fix added at 2.11.1172.
INIT_HCA - bit at offset 0x0C.12 is set to 1 so that FW will report
correct device revision.

Older FW versions won't be affected from turning on that bit,
no capability bit is needed.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 15:26:42 -05:00
Tariq Toukan
72b8eaab24 net/mlx4: Replace ENOSYS with better fitting error codes
Conform the following warning:
WARNING: ENOSYS means 'invalid syscall nr' and nothing else.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 15:26:41 -05:00
Moshe Shemesh
d15118af26 net/mlx5e: Check ets capability before ets query FW command
On dcbnl callback getpgtccfgtx, the driver should check the ets
capability before ets query command is sent to firmware.
It is valid to return from this void function without changing in/out
parameters, as these parameters are initialized to
DCB_ATTR_VALUE_UNDEFINED.

Fixes: 3a6a931dfb ("net/mlx5e: Support DCBX CEE API")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-29 23:31:26 +02:00
Gal Pressman
a100ff3eef net/mlx5e: Fix update of hash function/key via ethtool
Modifying TIR hash should change selected fields bitmask in addition to
the function and key.

Formerly, Only on ethool mlx5e_set_rxfh "ethtoo -X" we would not set this
field resulting in zeroing of its value, which means no packet fields are
used for RX RSS hash calculation thus causing all traffic to arrive in
RQ[0].

On driver load out of the box we don't have this issue, since the TIR
hash is fully created from scratch.

Tested:
ethtool -X ethX hkey  <new key>
ethtool -X ethX hfunc <new func>
ethtool -X ethX equal <new indirection table>

All cases are verified with TCP Multi-Stream traffic over IPv4 & IPv6.

Fixes: bdfc028de1 ("net/mlx5e: Fix ethtool RX hash func configuration change")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-29 23:31:18 +02:00