Commit Graph

2236 Commits

Author SHA1 Message Date
Yishai Hadas
3835336401 net/mlx4_core: Device revision support
The device revision field returned by the NodeInfo MAD is incorrect
on ConnectX3 devices.

This patch is driver side handling to complete a FW fix added at 2.11.1172.
INIT_HCA - bit at offset 0x0C.12 is set to 1 so that FW will report
correct device revision.

Older FW versions won't be affected from turning on that bit,
no capability bit is needed.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 15:26:42 -05:00
Tariq Toukan
72b8eaab24 net/mlx4: Replace ENOSYS with better fitting error codes
Conform the following warning:
WARNING: ENOSYS means 'invalid syscall nr' and nothing else.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 15:26:41 -05:00
David S. Miller
4e8f2fc1a5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two trivial overlapping changes conflicts in MPLS and mlx5.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-28 10:33:06 -05:00
Daniel Borkmann
a67edbf4fb bpf: add initial bpf tracepoints
This work adds a number of tracepoints to paths that are either
considered slow-path or exception-like states, where monitoring or
inspecting them would be desirable.

For bpf(2) syscall, tracepoints have been placed for main commands
when they succeed. In XDP case, tracepoint is for exceptions, that
is, f.e. on abnormal BPF program exit such as unknown or XDP_ABORTED
return code, or when error occurs during XDP_TX action and the packet
could not be forwarded.

Both have been split into separate event headers, and can be further
extended. Worst case, if they unexpectedly should get into our way in
future, they can also removed [1]. Of course, these tracepoints (like
any other) can be analyzed by eBPF itself, etc. Example output:

  # ./perf record -a -e bpf:* sleep 10
  # ./perf script
  sock_example  6197 [005]   283.980322:      bpf:bpf_map_create: map type=ARRAY ufd=4 key=4 val=8 max=256 flags=0
  sock_example  6197 [005]   283.980721:       bpf:bpf_prog_load: prog=a5ea8fa30ea6849c type=SOCKET_FILTER ufd=5
  sock_example  6197 [005]   283.988423:   bpf:bpf_prog_get_type: prog=a5ea8fa30ea6849c type=SOCKET_FILTER
  sock_example  6197 [005]   283.988443: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[06 00 00 00] val=[00 00 00 00 00 00 00 00]
  [...]
  sock_example  6197 [005]   288.990868: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[01 00 00 00] val=[14 00 00 00 00 00 00 00]
       swapper     0 [005]   289.338243:    bpf:bpf_prog_put_rcu: prog=a5ea8fa30ea6849c type=SOCKET_FILTER

  [1] https://lwn.net/Articles/705270/

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 13:17:47 -05:00
David S. Miller
716dcaebed mlx5-updates-2017-24-01
The first seven patches from Or Gerlitz in this series further enhances
 the mlx5 SRIOV switchdev mode to support offloading IPv6 tunnels using the
 TC tunnel key set (encap) and unset (decap) actions.
 
 Or Gerlitz says:
 ========================
 As part of doing this change, few cleanups are done in the IPv4 code,
 later we move to use the full tunnel key info provided to the driver as
 the key for our internal hashing which is used to identify cases where
 the same tunnel is used for encapsulating multiple flows. As done in the
 IPv4 case, the control path for offloading IPv6 tunnels uses route/neigh
 lookups and construction of the IPv6 tunnel headers on the encap path and
 matching on the outer hears in the decap path.
 
 The last patch of the series enlarges the HW FDB size for the switchdev mode,
 so it has now room to contain offloaded flows as many as min(max number
 of HW flow counters supported, max HW table size supported).
 ========================
 
 Next to Or's series you can find several patches handling several topics.
 
 From Mohamad, add support for SRIOV VF min rate guarantee by using the
 TSAR BW share weights mechanism.
 
 From Or, Two patches to enable Eth VFs to query their min-inline value for
 user-space.
 for that we move a mlx5 low level min inline helper function from mlx5
 ethernet driver into the core driver and then use it in mlx5_ib to expose
 the inline mode to rdma applications through libmlx5.
 
 From Kamal Heib, Reduce memory consumption on kdump kernel.
 
 From Shaker Daibes, code reuse in CQE compression control logic
 
 Thanks,
 Saeed.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJYh7FNAAoJEEg/ir3gV/o+TjsIAL1e92+5eutBS9ZvhMARi+Tc
 c2V9V8bG8W1RWWTvx1G0aU4nNjWsr5L8Q8gzqpwhrQITBfgpWd+hlnxQCucyhxC3
 AC1qQ+AKREe/C+25D+WJRq34/61ZHEH2rbKZvpZ1O8SuicVPbcvJ9eM+wOEDxwwX
 u5C5kWQ0HRtCcnFiiOYkB+0CQPH7m3+ZzZek+jDowrexHMSE+yl8ZNtaSTX9c9QN
 bE2cPiCVZd7ufKPIwY8LWHBryyl7sh5P+NqzD633OeiqP/pkZsW9A+czyt+d330f
 6XTKOS1PCD+TfHE0sZJT4VMCjICMHrOFbNRZuwcxJQ6NfmwIJZfskX4NLbyGQTI=
 =vF7U
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2017-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2017-24-01

The first seven patches from Or Gerlitz in this series further enhances
the mlx5 SRIOV switchdev mode to support offloading IPv6 tunnels using the
TC tunnel key set (encap) and unset (decap) actions.

Or Gerlitz says:
========================
As part of doing this change, few cleanups are done in the IPv4 code,
later we move to use the full tunnel key info provided to the driver as
the key for our internal hashing which is used to identify cases where
the same tunnel is used for encapsulating multiple flows. As done in the
IPv4 case, the control path for offloading IPv6 tunnels uses route/neigh
lookups and construction of the IPv6 tunnel headers on the encap path and
matching on the outer hears in the decap path.

The last patch of the series enlarges the HW FDB size for the switchdev mode,
so it has now room to contain offloaded flows as many as min(max number
of HW flow counters supported, max HW table size supported).
========================

Next to Or's series you can find several patches handling several topics.

From Mohamad, add support for SRIOV VF min rate guarantee by using the
TSAR BW share weights mechanism.

From Or, Two patches to enable Eth VFs to query their min-inline value for
user-space.
for that we move a mlx5 low level min inline helper function from mlx5
ethernet driver into the core driver and then use it in mlx5_ib to expose
the inline mode to rdma applications through libmlx5.

From Kamal Heib, Reduce memory consumption on kdump kernel.

From Shaker Daibes, code reuse in CQE compression control logic
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 12:49:58 -05:00
Shaker Daibes
5eb0249b43 net/mlx5e: CQE compression control code reuse
This patch is intended for code reuse of mlx5e_modify_rx_cqe_compression
function.

Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24 21:14:08 +02:00
Kamal Heib
b4e029da29 net/mlx5e: Reduce memory consumption on kdump kernel
Reduce memory consumption on kdump kernel by decreasing the number of
channels to 1 and the size of RQs and SQs to the minimal values.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24 21:14:07 +02:00
Or Gerlitz
8c7245a60e net/mlx5: Push min-inline mode resolution helper into the core
So we can use that from the IB driver too in downstream patches.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24 21:14:05 +02:00
Mohamad Haj Yahia
c9497c9890 net/mlx5: Add support for setting VF min rate
Add support for SRIOV VF min rate guarantee by using the TSAR BW share
weights mechanism.

The TSAR BW share vport attribute represents the weight of that vport
among the other vports weights which means that the actual vport BW
percentage is the same vport weight percentage among the total vports
weights sum.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24 21:14:04 +02:00
Or Gerlitz
264d7bf3c1 net/mlx5: E-Switch, Enlarge the FDB size for the switchdev mode
The E-Switch FDB size was hard coded to 8k. Change it to be

  min(max eswitch table size, max flow counters * num flow groups)

where the max values are read from the firmware and the number of
flow groups is hard-coded as before this change.

We don't know upfront the division of flows to group. This setup allows
each group to be of size up to the where we want to support (we mandate
pairing of flows with counters for offloading). Thus, we don't expect
multiple occurences for a group which in turn adds steering hops.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Tested-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24 21:14:03 +02:00
Or Gerlitz
ce99f6b97f net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels
Add the missing parts for offloading IPv6 tunnels. This includes
route and neigh lookups and construnction of the IPv6 tunnel headers.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24 21:14:02 +02:00
Or Gerlitz
9a941117fb net/mlx5e: Maximize ip tunnel key usage on the TC offloading path
Use more fields out of the tunnel key (e.g the tunnel source IP address)
provided by upper layers for the route lookup done on the encap offload path.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24 21:14:01 +02:00
Or Gerlitz
76f7444dd5 net/mlx5e: Use the full tunnel key info for encapsulation offload house-keeping
Currently we use subset of the input tunnel key fields (id, ip daddr,
dst port) which are provided by upper layers to indentify flows that should
go through the same encapsulation and maintain the HW encapsulation table.

This is redundant and can get us wrong.

Instead, keep a copy of the ip tunnel info provided by the user
through TC and have the tunnel key part as the key to our internal hash.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24 21:14:00 +02:00
Or Gerlitz
75c33da827 net/mlx5e: TC ipv4 tunnel encap offload cosmetic changes
Move around some settings of variables as pre-step to make things
more robust and clear for the ipv6 case in down-stream patch.
This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24 21:13:59 +02:00
Or Gerlitz
19f4440141 net/mlx5e: Add TC offloads matching on IPv6 encapsulation headers
Enhance the parsing of offloaded TC rules to set HW matching on outer
IPv6 encapsulation headers. This effectively adds support for TC tunnel
key release action (decapsulation) of SRIOV offloads over IPv6 tunnels.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24 21:13:58 +02:00
Or Gerlitz
073ff3c8e6 net/mlx5: Use exact encap header size for the FW input buffer
The current code is allocating the max encap size supported by
the firmware and not the size requested by the caller, fix that.

Also, spare a warning when the size of the encapsulation headers
is bigger from what is supported by the firmware.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24 21:13:57 +02:00
Yotam Gigi
98d0f7b9ac mlxsw: spectrum: Add packet sample offloading support
Using the MPSC register, add the functions that configure port-based
packet sampling in hardware and the necessary datatypes in the
mlxsw_sp_port struct. In addition, add the necessary trap for sampled
packets and integrate with matchall offloading to allow offloading of the
sample tc action.

The current offload support is for the tc command:

tc filter add dev <DEV> parent ffff: \
	  matchall skip_sw \
	  action sample rate <RATE> group <GROUP> [trunc <SIZE>]

Where only ingress qdiscs are supported, and only a combination of
matchall classifier and sample action will lead to activating hardware
packet sampling.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24 13:44:28 -05:00
Yotam Gigi
0677d6828b mlxsw: reg: add the Monitoring Packet Sampling Configuration Register
The MPSC register allows to configure ingress packet sampling on specific
port of the mlxsw device. The sampled packets are then trapped via
PKT_SAMPLE trap.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24 13:44:28 -05:00
Ido Schimmel
a59b7e0246 mlxsw: spectrum_router: Correctly reallocate adjacency entries
mlxsw_sp_nexthop_group_mac_update() is called in one of two cases:

1) When the MAC of a nexthop needs to be updated
2) When the size of a nexthop group has changed

In the second case the adjacency entries for the nexthop group need to
be reallocated from the adjacency table. In this case we must write to
the entries the MAC addresses of all the nexthops that should be
offloaded and not only those whose MAC changed. Otherwise, these entries
would be filled with garbage data, resulting in packet loss.

Fixes: a7ff87acd9 ("mlxsw: spectrum_router: Implement next-hop routing")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24 13:42:45 -05:00
Geliang Tang
3704eb6f6f net/mlx4: use rb_entry()
To make the code clearer, use rb_entry() instead of container_of() to
deal with rbtree.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-22 16:46:13 -05:00
David S. Miller
7d982567f4 mlx5 and mlx5e updates 2017-01-19
This series includes some updates for mlx5 core and mlx5e netdevice driver.
 
 From Leon, a small fix that remove an unnecessary print.
 
 From Eli Cohen, a fix to the FW version printout in case of internal error.
 
 From Eugenia Emantayev, two patches, the 1st adds mlx5 1pps (pulse per
 second) mlx5 infrastructure support and the 2nd adds the necessary bits
 for mlx5e ptp logic and structures.
 
 From Mohamad, add support for s-tagged packet receive when in promiscuous
 mode.
 
 Form Gal Pressman, MCAM (Management capabilities mask register) and PCAM
 (Ports capabilities mask register) registers infrastructure, those
 registers are needed in order to query the different statistics registers
 support in FW, in order for the driver to enable/disable query and
 reporting them back to user.  On top of this infrastructure we've exposed
 new set of statistics groups:
    - MPCNT: Physical layer statistical counters (For symbol errors)
    - PPCNT: PCIe performance counters
 
 In addition to the statistics capabilities series we've moved the mlx5 HCA
 capabilities fields to a dedicated struct under the driver private data.
 
 At the end a small patch to update & query statistics in the most desired
 order.
 
 Thanks,
 Saeed.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJYgT2XAAoJEEg/ir3gV/o++x8H/2adUTYToQVH9T9KdeHYcpYj
 LtN36/8WFL5MliMpK0DcmuKe9k45ukN5bJGUDhwndJBsHJledBoFw3C6k4vZl0Qw
 NiP4t165xmwYQrqI75KVeeGqNWl6LanozZzJVsOM48mSjOXClPnz5BFR4UgL5gTh
 q60VmqpSeBjT0EQfT18s1DZCdUY6UUK1XgmgNnFsHUhO/iWVPlNEwItblC2N/YWA
 p7lGUAJmAQvDN2sejzz0ElcCieY8yA+cZHgalW0KZK961RCwIl1GECw5xEcLLGxN
 O88jaDvTuJpZtl0IOsBi9dwZHx64dx1a+wkFGv+GA6eTgiQ5kPgb2Jdhy26jf9g=
 =Hy/5
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2017-01-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 and mlx5e updates 2017-01-19

This series includes some updates for mlx5 core and mlx5e netdevice driver.

From Leon, a small fix that remove an unnecessary print.

From Eli Cohen, a fix to the FW version printout in case of internal error.

From Eugenia Emantayev, two patches, the 1st adds mlx5 1pps (pulse per
second) mlx5 infrastructure support and the 2nd adds the necessary bits
for mlx5e ptp logic and structures.

From Mohamad, add support for s-tagged packet receive when in promiscuous
mode.

Form Gal Pressman, MCAM (Management capabilities mask register) and PCAM
(Ports capabilities mask register) registers infrastructure, those
registers are needed in order to query the different statistics registers
support in FW, in order for the driver to enable/disable query and
reporting them back to user.  On top of this infrastructure we've exposed
new set of statistics groups:
   - MPCNT: Physical layer statistical counters (For symbol errors)
   - PPCNT: PCIe performance counters

In addition to the statistics capabilities series we've moved the mlx5 HCA
capabilities fields to a dedicated struct under the driver private data.

At the end a small patch to update & query statistics in the most desired
order.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20 14:22:27 -05:00
Eric Dumazet
e048fc50d7 net/mlx5e: Do not recycle pages from emergency reserve
A driver using dev_alloc_page() must not reuse a page allocated from
emergency memory reserve.

Otherwise all packets using this page will be immediately dropped,
unless for very specific sockets having SOCK_MEMALLOC bit set.

This issue might be hard to debug, because only a fraction of received
packets would be dropped.

Fixes: 4415a0319f ("net/mlx5e: Implement RX mapped page cache for page recycle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20 12:41:46 -05:00
Eric Dumazet
dceeab0e52 mlx4: support __GFP_MEMALLOC for rx
Commit 04aeb56a17 ("net/mlx4_en: allocate non 0-order pages for RX
ring with __GFP_NOMEMALLOC") added code that appears to be not needed at
that time, since mlx4 never used __GFP_MEMALLOC allocations anyway.

As using memory reserves is a must in some situations (swap over NFS or
iSCSI), this patch adds this flag.

Note that this driver does not reuse pages (yet) so we do not have to
add anything else.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-19 23:35:12 -05:00
Saeed Mahameed
3dd69e3dd2 net/mlx5e: Reorder update stats
Reorder update stats flow to update most important counters last,
to get more accurate results.

New update order:
	- PCIe counters
	- Port counters
	- Vport counters
	- Queue counters
	- Software counters

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
2017-01-19 23:20:04 +02:00
Gal Pressman
701052c578 net/mlx5: Move cached hca caps to designated caps struct
The caps structure consists of hca caps and port/management caps,
all under one roof.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-19 23:20:03 +02:00
Gal Pressman
0f7f348192 net/mlx5e: Expose PCIe statistics to ethtool
This patch exposes PCIe performance counters, queried with
ethtool -S <devname>.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-19 23:20:02 +02:00
Gal Pressman
5db0a4f64c net/mlx5e: Expose physical layer statistical counters to ethtool
Use ethtool -S to query physical layer statistical counters including:
- rx_symbol_errors_phy: Number of symbol errors that were not corrected
  by FEC correction algorithm or that FEC was not active on this interface.

- rx_corrected_bits_phy: Number of corrected bits according to active
  FEC (RS/FC).

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-19 23:20:01 +02:00
Gal Pressman
71862561f3 net/mlx5: Query and cache PCAM, MCAM registers on initialization
On load_one, we now cache our capabilities registers internally, similar
to QUERY_HCA_CAP. Capabilities can later be queried using macros
introduced in this patch.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-19 23:19:59 +02:00
Gal Pressman
c835ad6468 net/mlx5: Implement PCAM, MCAM access register commands
Introduced registers will expose capabilities of new registers and
features related to port/management.
Driver will query MCAM and PCAM in order to avoid failing on old
firmwares with lack of support.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-19 23:19:58 +02:00
Mohamad Haj Yahia
8a271746a2 net/mlx5e: Receive s-tagged packets in promiscuous mode
Today when the driver enter to promiscuous mode or vlan
filter is disabled, we add flow rule to receive any c-taggd
packets, therefore s-tagged packets are dropped.
In order to receive s-tagged packets as well we need to add
flow rule to receive any s-tagged packet.

Fixes: 7cb21b794b ('net/mlx5e: Rename en_flow_table.c to en_fs.c')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-19 23:19:56 +02:00
Mohamad Haj Yahia
105433659d net/mlx5: Add support to s-tag in mlx5 firmware interface
Add svlan_tag and rename vlan_tag to cvlan_tag in flow table entry
match param.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-01-19 23:19:55 +02:00
Eugenia Emantayev
ee7f12205a net/mlx5e: Implement 1PPS support
This patch enables the 1PPS IN and 1PPS OUT support according
to the advertised HCA capability. Single pin may be configured
to one of the above mutual exclusive functions via standard
Linux tools and APIs. For example, testptp open source application.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-19 23:19:54 +02:00
Eugenia Emantayev
f9a1ef720e net/mlx5: Add MTPPS and MTPPSE registers infrastructure
Implement query and set functionality for MTPPS and MTPPSE registers.
MTPPS (Management Pulse Per Second) provides the device PPS capabilities,
configures the PPS in and out modules and holds the PPS in time stamp.
Query MTPPS is supported only when HCA_CAP.pps is set and modify is supported
when HCA_CAP.pps_modify is set.

MTPPSE (Management Pulse Per Second Event) configures the different event
generation modes for PPS. Supported when HCA_CAP.pps is set.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-19 23:19:53 +02:00
Eli Cohen
712bfef609 net/mlx5: Fix version printout in case of health issue
Firmware representation of the firmware version on the health buffer has
changed for newer device. The representation in the initialization
segment does not and will not change. In addition, we print the health
buffer firmware version as a raw hex number.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-19 23:19:52 +02:00
Leon Romanovsky
f82eed4523 net/mlx5: Remove information print after attempt to load mlx5_ib module
Infiniband part of mlx5 driver can be compiled as a module
or as a part of bzImage (compiled in). In the second case,
the call to request_module will return an error -ENOENT.
It will cause to a misleading print "failed request module
on mlx5_ib".

This patch removes this print, In order to comply with mlx4.

Fixes: f66f049fb7 ("net/mlx5_core: Request the mlx5 IB module on driver load")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-19 23:19:51 +02:00
Arnd Bergmann
ad05df399f net/mlx5e: Remove unused variable
A cleanup removed the only user of this variable

mlx5/core/en_ethtool.c: In function 'mlx5e_set_channels':
mlx5/core/en_ethtool.c:546:6: error: unused variable 'ncv' [-Werror=unused-variable]

Let's remove the declaration as well.

Fixes: 639e9e9416 ("net/mlx5e: Remove unnecessary checks when setting num channels")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-19 11:14:29 -05:00
Eran Ben Elisha
639e9e9416 net/mlx5e: Remove unnecessary checks when setting num channels
Boundaries checks for the number of RX and TX should be checked by the
caller and not in the driver.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18 14:58:24 -05:00
Eran Ben Elisha
e91ef71dfe net/mlx4_en: Remove unnecessary checks when setting num channels
Boundaries checks for the number of RX, TX, other and combined channels
should be checked by the caller and not in the driver.

In addition, remove wrong memset on get channels as it overrides the cmd
field in the requester struct.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18 14:58:23 -05:00
Martin KaFai Lau
d8bec2b29a net/mlx5e: Support bpf_xdp_adjust_head()
This patch adds bpf_xdp_adjust_head() support to mlx5e.

1. rx_headroom is added to struct mlx5e_rq.  It uses
   an existing 4 byte hole in the struct.
2. The adjusted data length is checked against
   MLX5E_XDP_MIN_INLINE and MLX5E_SW2HW_MTU(rq->netdev->mtu).
3. The macro MLX5E_SW2HW_MTU is moved from en_main.c to en.h.
   MLX5E_HW2SW_MTU is also moved to en.h for symmetric reason
   but it is not a must.

v2:
- Keep the xdp specific logic in mlx5e_xdp_handle()
- Update dma_len after the sanity checks in mlx5e_xmit_xdp_frame()

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18 11:31:13 -05:00
David S. Miller
580bdf5650 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-01-17 15:19:37 -05:00
Jack Morgenstein
9577b174cd net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV
When running SRIOV, warnings for SRQ LIMIT events flood the Hypervisor's
message log when (correct, normally operating) apps use SRQ LIMIT events
as a trigger to post WQEs to SRQs.

Add more information to the existing debug printout for SRQ_LIMIT, and
output the warning messages only for the SRQ CATAS ERROR event.

Fixes: acba2420f9 ("mlx4_core: Add wrapper functions and comm channel and slave event support to EQs")
Fixes: e0debf9cb5 ("mlx4_core: Reduce warning message for SRQ_LIMIT event to debug level")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-16 15:08:29 -05:00
Jack Morgenstein
7c3945bc20 net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions
Save the qp context flags byte containing the flag disabling vlan stripping
in the RESET to INIT qp transition, rather than in the INIT to RTR
transition. Per the firmware spec, the flags in this byte are active
in the RESET to INIT transition.

As a result of saving the flags in the incorrect qp transition, when
switching dynamically from VGT to VST and back to VGT, the vlan
remained stripped (as is required for VST) and did not return to
not-stripped (as is required for VGT).

Fixes: f0f829bf42 ("net/mlx4_core: Add immediate activate for VGT->VST->VGT")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-16 15:08:28 -05:00
Jack Morgenstein
291c566a28 net/mlx4_core: Fix racy CQ (Completion Queue) free
In function mlx4_cq_completion() and mlx4_cq_event(), the
radix_tree_lookup requires a rcu_read_lock.
This is mandatory: if another core frees the CQ, it could
run the radix_tree_node_rcu_free() call_rcu() callback while
its being used by the radix tree lookup function.

Additionally, in function mlx4_cq_event(), since we are adding
the rcu lock around the radix-tree lookup, we no longer need to take
the spinlock. Also, the synchronize_irq() call for the async event
eliminates the need for incrementing the cq reference count in
mlx4_cq_event().

Other changes:
1. In function mlx4_cq_free(), replace spin_lock_irq with spin_lock:
   we no longer take this spinlock in the interrupt context.
   The spinlock here, therefore, simply protects against different
   threads simultaneously invoking mlx4_cq_free() for different cq's.

2. In function mlx4_cq_free(), we move the radix tree delete to before
   the synchronize_irq() calls. This guarantees that we will not
   access this cq during any subsequent interrupts, and therefore can
   safely free the CQ after the synchronize_irq calls. The rcu_read_lock
   in the interrupt handlers only needs to protect against corrupting the
   radix tree; the interrupt handlers may access the cq outside the
   rcu_read_lock due to the synchronize_irq calls which protect against
   premature freeing of the cq.

3. In function mlx4_cq_event(), we change the mlx_warn message to mlx4_dbg.

4. We leave the cq reference count mechanism in place, because it is
   still needed for the cq completion tasklet mechanism.

Fixes: 6d90aa5cf1 ("net/mlx4_core: Make sure there are no pending async events when freeing CQ")
Fixes: 225c7b1fee ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-16 15:08:28 -05:00
Arnd Bergmann
abeffce90c net/mlx5e: Fix a -Wmaybe-uninitialized warning
As found by Olof's build bot, we gain a harmless warning about a
potential uninitialized variable reference in mlx5:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c: In function 'parse_tc_fdb_actions':
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:769:13: warning: 'out_dev' may be used uninitialized in this function [-Wmaybe-uninitialized]
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:811:21: note: 'out_dev' was declared here

This was introduced through the addition of an 'IS_ERR/PTR_ERR' pair
that gcc is unfortunately unable to completely figure out.

The problem being gcc cannot tell that if(IS_ERR()) in
mlx5e_route_lookup_ipv4() is equivalent to checking if(err) later,
so it assumes that 'out_dev' is used after the 'return PTR_ERR(rt)'.

The PTR_ERR_OR_ZERO() case by comparison is fairly easy to detect
by gcc, so it can't get that wrong, so it no longer warns.

Hadar Hen Zion already attempted to fix the warning earlier by adding fake
initializations, but that ended up not fully addressing all warnings, so
I'm reverting it now that it is no longer needed.

Link: http://arm-soc.lixom.net/buildlogs/mainline/v4.10-rc3-98-gcff3b2c/
Fixes: a42485eb0e ("net/mlx5e: TC ipv4 tunnel encap offload error flow fixes")
Fixes: a757d108dc ("net/mlx5e: Fix kbuild warnings for uninitialized parameters")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-16 14:48:19 -05:00
Eric Dumazet
8cf699ec84 mlx4: do not call napi_schedule() without care
Disable BH around the call to napi_schedule() to avoid following warning

[   52.095499] NOHZ: local_softirq_pending 08
[   52.421291] NOHZ: local_softirq_pending 08
[   52.608313] NOHZ: local_softirq_pending 08

Fixes: 8d59de8f7b ("net/mlx4_en: Process all completions in RX rings after port goes up")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-16 11:44:10 -05:00
Elad Raz
28e46a0f2e mlxsw: pci: Fix EQE structure definition
The event_data starts from address 0x00-0x0C and not from 0x08-0x014. This
leads to duplication with other fields in the Event Queue Element such as
sub-type, cqn and owner.

Fixes: eda6500a98 ("mlxsw: Add PCI bus implementation")
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 09:25:55 -05:00
Arkadi Sharshevsky
400fc0106d mlxsw: switchx2: Fix memory leak at skb reallocation
During transmission the skb is checked for headroom in order to
add vendor specific header. In case the skb needs to be re-allocated,
skb_realloc_headroom() is called to make a private copy of the original,
but doesn't release it. Current code assumes that the original skb is
released during reallocation and only releases it at the error path
which causes a memory leak.

Fix this by adding the original skb release to the main path.

Fixes: d003462a50 ("mlxsw: Simplify mlxsw_sx_port_xmit function")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 09:25:55 -05:00
Arkadi Sharshevsky
36bf38d158 mlxsw: spectrum: Fix memory leak at skb reallocation
During transmission the skb is checked for headroom in order to
add vendor specific header. In case the skb needs to be re-allocated,
skb_realloc_headroom() is called to make a private copy of the original,
but doesn't release it. Current code assumes that the original skb is
released during reallocation and only releases it at the error path
which causes a memory leak.

Fix this by adding the original skb release to the main path.

Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 09:25:55 -05:00
David S. Miller
02ac5d1487 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two AF_* families adding entries to the lockdep tables
at the same time.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 14:43:39 -05:00
Daniel Jurgens
5e44fca504 net/mlx5: Only cancel recovery work when cleaning up device
Do not attempt to drain the health workqueue when unloading the device in
the recovery flow, this can cause a deadlock when the recovery work
tries to cancel itself with sync.

Because the work is no longer unconditionally canceled when unloading, it
must be explicitly canceled in the AER flow.

fixes: 689a248df8 ("net/mlx5: Cancel recovery work in remove flow")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:34:01 -05:00