Commit Graph

2103 Commits

Author SHA1 Message Date
Or Gerlitz
776b12b674 net/mlx5: Put elements related to offloaded TC rule in one struct
Put the representors related to the source and dest vports and the
action in struct mlx5_esw_flow_attr which is used while setting the FDB rule.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:12 -04:00
Or Gerlitz
e33dfe316c net/mlx5: E-Switch, Allow fine tuning of eswitch vport push/pop vlan
The HW can be programmed to push vlan, pop vlan or both.

A factorization step towards using the push/pop capabilties in the
eswitch offloads mode. This patch doesn't add new functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:12 -04:00
Or Gerlitz
bac9b6aa1d net/mlx5: E-Switch, Set vport representor fields explicitly on registration
The structure we use for the eswitch vport representor (mlx5_eswitch_rep)
has some fields which are set from upper layers in the driver when they
register the rep. Use explicit setting on registration time for them and
avoid global memcpy. This patch doesn't add new functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:12 -04:00
Or Gerlitz
9deb2241f1 net/mlx5: E-Switch, Set the vport when registering the uplink rep
Set the vport value in the PF entry to be that of the uplink so
we can use it blindly over the tc / eswitch offload code without
translating it each time we deal with the uplink representor.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:12 -04:00
David S. Miller
d6989d4bbe Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-09-23 06:46:57 -04:00
Saeed Mahameed
35b510e257 net/mlx5e: XDP TX xmit more
Previously we rang XDP SQ doorbell on every forwarded XDP packet.

Here we introduce a xmit more like mechanism that will queue up more
than one packet into SQ (up to RX napi budget) w/o notifying the hardware.

Once RX napi budget is consumed and we exit napi RX loop, we will
flush (doorbell) all XDP looped packets in case there are such.

XDP forward packet rate:

Comparing XDP with and w/o xmit more (bulk transmit):

RX Cores    XDP TX       XDP TX (xmit more)
---------------------------------------------------
1           6.5Mpps      12.4Mpps
2          13.2Mpps      24.2Mpps
4          25.2Mpps      36.3Mpps*
8          36.3Mpps*     36.3Mpps*

*My xmitter was limited to 36.3Mpps, so it is the bottleneck.
It seems that receive side can handle more.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22 02:51:41 -04:00
Saeed Mahameed
b5503b994e net/mlx5e: XDP TX forwarding support
Adding support for XDP_TX forwarding from xdp program.
Using XDP, now user can loop packets out of the same port.

We create a dedicated TX SQ for each channel that will serve
XDP programs that return XDP_TX action to loop packets back to
the wire directly from the channel RQ RX path.

For that RX pages will now need to be mapped bi-directionally,
and on XDP_TX action we will sync the page back to device then
queue it into SQ for transmission.  The XDP xmit frame function will
report back to the RX path if the page was consumed (transmitted), if so,
RX path will forget about that page as if it were released to the stack.
Later on, on XDP TX completion, the page will be released back to the
page cache.

For simplicity this patch will hit a doorbell on every XDP TX packet.

Next patch will introduce a xmit more like mechanism that will
queue up more than one packet into SQ w/o notifying the hardware,
once RX napi loop is done we will hit doorbell once for all XDP TX
packets form the previous loop.  This should drastically improve
XDP TX performance.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22 02:51:41 -04:00
Saeed Mahameed
f10b7cc770 net/mlx5e: Have a clear separation between different SQ types
Make a clear separate between Regular SQ (TXQ) and ICO SQ creation,
destruction and union their mutual information structures.

Don't allocate redundant TXQ skb/wqe_info/dma_fifo arrays for ICO SQ.
And have a different SQ edge for ICO SQ than TXQ SQ, to be more
accurate.

In preparation for XDP TX support.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22 02:51:41 -04:00
Rana Shahout
86994156c7 net/mlx5e: XDP fast RX drop bpf programs support
Add support for the BPF_PROG_TYPE_PHYS_DEV hook in mlx5e driver.

When XDP is on we make sure to change channels RQs type to
MLX5_WQ_TYPE_LINKED_LIST rather than "striding RQ" type to
ensure "page per packet".

On XDP set, we fail if HW LRO is set and request from user to turn it
off.  Since on ConnectX4-LX HW LRO is always on by default, this will be
annoying, but we prefer not to enforce LRO off from XDP set function.

Full channels reset (close/open) is required only when setting XDP
on/off.

When XDP set is called just to exchange programs, we will update
each RQ xdp program on the fly and for synchronization with current
data path RX activity of that RQ, we temporally disable that RQ and
ensure RX path is not running, quickly update and re-enable that RQ,
for that we do:
	- rq.state = disabled
	- napi_synnchronize
	- xchg(rq->xdp_prg)
	- rq.state = enabled
	- napi_schedule // Just in case we've missed an IRQ

Packet rate performance testing was done with pktgen 64B packets and on
TX side and, TC drop action on RX side compared to XDP fast drop.

CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

Comparison is done between:
	1. Baseline, Before this patch with TC drop action
	2. This patch with TC drop action
	3. This patch with XDP RX fast drop

RX Cores  Baseline(TC drop)    TC drop    XDP fast Drop
--------------------------------------------------------------
1            5.3Mpps           5.3Mpps     16.5Mpps
2           10.2Mpps          10.2Mpps     31.3Mpps
4           20.5Mpps          19.9Mpps     36.3Mpps*

*My xmitter was limited to 36.3Mpps, so it is the bottleneck.
It seems that receive side can handle more.

Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22 02:51:41 -04:00
Saeed Mahameed
2fc4bfb725 net/mlx5e: Dynamic RQ type infrastructure
Add two helper functions to allow dynamic changes of RQ type.

mlx5e_set_rq_priv_params and mlx5e_set_rq_type_params will be
used on netdev creation to determine the default RQ type.

This will be needed later for downstream patches of XDP support.
When enabling XDP we will dynamically move from striding RQ to
linked list RQ type.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22 02:51:40 -04:00
Saeed Mahameed
e4b8550807 net/mlx5e: Slightly reduce hardware LRO size
Before this patch LRO size was 64K, now with build_skb requires
extra room, headroom + sizeof(skb_shared_info) added to the data
buffer will make  wqe size or page_frag_size slightly larger than
64K which will demand order 5 page instead of order 4 in 4K page systems.

We take those extra bytes from hardware LRO data size in order to not
increase the required page order for when hardware LRO is enabled.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22 02:51:40 -04:00
Saeed Mahameed
21c59685dd net/mlx5e: Union RQ RX info per RQ type
We have two types of RX RQs, and they use two separate sets of
info arrays and structures in RX data path function.  Today those
structures are mutually exclusive per RQ type, hence one kind is
allocated on RQ creation according to the RQ type.

For better cache locality and to minimalize the
sizeof(struct mlx5e_rq), in this patch we define them as a union.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22 02:51:40 -04:00
Saeed Mahameed
1bfecfca56 net/mlx5e: Build RX SKB on demand
For non-striding RQ configuration before this patch we had a ring
with pre-allocated SKBs and mapped the SKB->data buffers for
device.

For robustness and better RX data buffers management, we allocate a
page per packet and build_skb around it.

This patch (which is a prerequisite for XDP) will actually reduce
performance for normal stack usage, because we are now hitting a bottleneck
in the page allocator. We use the page-cache to restore or even improve
performance in comparison to the old RX scheme.

Packet rate performance testing was done with pktgen 64B packets on xmit
side and TC ingress dropping action on RX side.

CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

Comparison is done between:
 1.Baseline, before 'net/mlx5e: Build RX SKB on demand'
 2.Build SKB with RX page cache (This patch)

RX Cores  Baseline    Build SKB+page-cache    Improvement
-----------------------------------------------------------
1          4.16Mpps       5.33Mpps                28%
2          7.16Mpps      10.24Mpps                43%
4         13.61Mpps      20.51Mpps                51%
8         25.32Mpps      32.00Mpps                26%

All respective cores were 100% utilized.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22 02:51:40 -04:00
Nicolas Pitre
efee95f42b ptp_clock: future-proofing drivers against PTP subsystem becoming optional
Drivers must be ready to accept NULL from ptp_clock_register() if the
PTP clock subsystem is configured out.

This patch documents that and ensures that all drivers cope well
with a NULL return.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Eugenia Emantayev <eugenia@mellanox.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22 02:18:33 -04:00
Kamal Heib
fba1296624 net/mlx4_core: Fix to clean devlink resources
This patch cleans devlink resources by calling devlink_port_unregister()
to avoid the following issues:

- Kernel panic when triggering reset flow.
- Memory leak due to unfreed resources in mlx4_init_port_info().

Fixes: 09d4d087cd ("mlx4: Implement devlink interface")
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-22 01:41:27 -04:00
Jack Morgenstein
a7e1f04905 net/mlx4_core: Fix deadlock when switching between polling and event fw commands
When switching from polling-based fw commands to event-based fw
commands, there is a race condition which could cause a fw command
in another task to hang: that task will keep waiting for the polling
sempahore, but may never be able to acquire it. This is due to
mlx4_cmd_use_events, which "down"s the sempahore back to 0.

During driver initialization, this is not a problem, since no other
tasks which invoke FW commands are active.

However, there is a problem if the driver switches to polling mode
and then back to event mode during normal operation.

The "test_interrupts" feature does exactly that.
Running "ethtool -t <eth device> offline" causes the PF driver to
temporarily switch to polling mode, and then back to event mode.
(Note that for VF drivers, such switching is not performed).

Fix this by adding a read-write semaphore for protection when
switching between modes.

Fixes: 225c7b1fee ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 21:52:43 -04:00
Leon Romanovsky
30353bfc43 net/mlx4_core: Use RCU to perform radix tree lookup for SRQ
Radix tree lookup can be performed without locking.

Fixes: 225c7b1fee ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 21:52:43 -04:00
Kamal Heib
57c970c2e8 net/mlx4_en: Fix wrong indentation
Use tabs instead of spaces before if statement, no functional change.

Fixes: e7c1c2c462 ("mlx4_en: Added self diagnostics test implementation")
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 21:52:43 -04:00
Tariq Toukan
de3d6fa81e net/mlx4_en: Add branch prediction hints in RX data-path
Add likely/unlikely hints to improve branch predictions
in the RX data-path.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 21:52:43 -04:00
Nogah Frankel
8f8a62d462 mlxsw: spectrum: Implement max rif resource
Replace max rif const with using the result from resource query.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 01:00:59 -04:00
Nogah Frankel
274df7fb77 mlxsw: pci: Add max router interface resource
Add the max number of rif (router interfaces) to resource query.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 01:00:59 -04:00
Nogah Frankel
e44d49cbbc mlxsw: pci: Add some miscellaneous resources
Add max system ports, max regions and max vlan groups to resource query.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 01:00:59 -04:00
Nogah Frankel
9497c042bf mlxsw: spectrum: Implement max virtual routers resource
Replace max virtual routers const with the result from
the resource query.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 01:00:59 -04:00
Nogah Frankel
b8a09f0a09 mlxsw: pci: Add max virtual routers resource
Add the max number of virtual routers to resource query.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 01:00:59 -04:00
Nogah Frankel
403547d38d mlxsw: profile: Add KVD resources to profile config
Use resources from resource query to determine values for
the profile configuration.
Add KVD determined section sizes to the resources struct.
Change the profile struct and value to match this changes.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 01:00:58 -04:00
Nogah Frankel
2acd10c51b mlxsw: pci: Add KVD size relate resources
Add KVD size, and minimum sizes for the single and double
sections resources to resources query.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 01:00:58 -04:00
Nogah Frankel
ce0bd2b0c5 mlxsw: spectrum: lag resources- use resources data instead of consts
Use max lag and max ports in lag resources as the result of resource query
instead of using const to save them.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 01:00:58 -04:00
Nogah Frankel
9f7f797c1d mlxsw: pci: Add lag related resources to resources query
Add max lag and max ports in lag resources to resources query.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 01:00:58 -04:00
Or Gerlitz
4bdcc6ca21 mlxsw: spectrum: Make offloads stats functions static
The offloads stats functions are local to this file, make them static.

Fixes: fc1bbb0f18 ('mlxsw: spectrum: Implement offload stats ndo [..]')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-21 00:45:59 -04:00
Jesper Dangaard Brouer
5737f6c926 mlx4: add missed recycle opportunity for XDP_TX on TX failure
Correct drop handling for XDP_TX on TX failure, were recently added in
commit 95357907ae ("mlx4: fix XDP_TX is acting like XDP_PASS on TX
ring full").

The change missed an opportunity for recycling the RX page, instead of
going through the page allocator, like the regular XDP_DROP action does.
This patch cease the opportunity, by going through the XDP_DROP case.

Fixes: 95357907ae ("mlx4: fix XDP_TX is acting like XDP_PASS on TX ring full")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-20 04:58:17 -04:00
Ido Schimmel
1a9234e66e mlxsw: spectrum: Fix sparse warnings
drivers/net/ethernet/mellanox/mlxsw//spectrum.c:251:28: warning: symbol
'mlxsw_sp_span_entry_find' was not declared. Should it be static?
drivers/net/ethernet/mellanox/mlxsw//spectrum.c:265:28: warning: symbol
'mlxsw_sp_span_entry_get' was not declared. Should it be static?
drivers/net/ethernet/mellanox/mlxsw//spectrum.c:367:56: warning: mixing
different enum types
drivers/net/ethernet/mellanox/mlxsw//spectrum.c:367:56:     int enum
mlxsw_sp_span_type  versus
drivers/net/ethernet/mellanox/mlxsw//spectrum.c:367:56:     int enum
mlxsw_reg_mpar_i_e
...
drivers/net/ethernet/mellanox/mlxsw//spectrum_buffers.c:598:32: warning:
mixing different enum types
drivers/net/ethernet/mellanox/mlxsw//spectrum_buffers.c:598:32:     int
enum mlxsw_reg_sbxx_dir  versus
drivers/net/ethernet/mellanox/mlxsw//spectrum_buffers.c:598:32:     int
enum devlink_sb_pool_type
drivers/net/ethernet/mellanox/mlxsw//spectrum_buffers.c:600:39: warning:
mixing different enum types
drivers/net/ethernet/mellanox/mlxsw//spectrum_buffers.c:600:39:     int
enum mlxsw_reg_sbpr_mode  versus
drivers/net/ethernet/mellanox/mlxsw//spectrum_buffers.c:600:39:     int
enum devlink_sb_threshold_type
...
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:255:54: warning:
mixing different enum types
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:255:54:     int
enum mlxsw_sp_l3proto  versus
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:255:54:     int
enum mlxsw_reg_ralxx_protocol
...
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:1749:6: warning:
symbol 'mlxsw_sp_fib_entry_put' was not declared. Should it be static?

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-20 04:32:50 -04:00
Elad Raz
18c2d2c113 mlxsw: Change the RX LAG hash function from XOR to CRC
Change the RX hash function from XOR to CRC in order to have better
distribution of the traffic.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-20 04:32:50 -04:00
Or Gerlitz
6c419ba8e2 net/mlx5: E-Switch, Handle mode change failures
E-switch mode changes involve creating HW tables, potentially allocating
netdevices, etc, and things can fail. Add an attempt to rollback to the
existing mode when changing to the new mode fails. Only if rollback fails,
getting proper SRIOV functionality requires module unload or sriov
disablement/enablement.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-19 22:10:16 -04:00
Or Gerlitz
4eea37d7b9 net/mlx5: E-Switch, Fix error flow in the SRIOV e-switch init code
When enablement of the SRIOV e-switch in certain mode (switchdev or legacy)
fails, we must set the mode to none. Otherwise, we'll run into double free
based crashes when further attempting to deal with the e-switch (such
as when disabling sriov or unloading the driver).

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-19 22:10:15 -04:00
Roi Dayan
babd6134a5 net/mlx5: Fix flow counter bulk command out mailbox allocation
The FW command output length should be only the length of struct
mlx5_cmd_fc_bulk out field. Failing to do so will cause the memcpy
call which is invoked later in the driver to write over wrong memory
address and corrupt kernel memory which results in random crashes.

This bug was found using the kernel address sanitizer (kasan).

Fixes: a351a1b03b ('net/mlx5: Introduce bulk reading of flow counters')
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-19 22:10:15 -04:00
Baoyou Xie
766a0e978f net/mlx5: clean function declarations in eswitch.c up
We get 2 warnings when building kernel with W=1:
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:463:5: warning: no previous prototype for 'esw_offloads_init' [-Wmissing-prototypes]
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:521:6: warning: no previous prototype for 'esw_offloads_cleanup' [-Wmissing-prototypes]

In fact, both functions are declared in
drivers/net/ethernet/mellanox/mlx5/core/eswitch.c,but should be
declared in a header file, thus can be recognized in other file.

So this patch moves the declarations into
drivers/net/ethernet/mellanox/mlx5/core/eswitch.h

Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-19 21:52:10 -04:00
Jesper Dangaard Brouer
95357907ae mlx4: fix XDP_TX is acting like XDP_PASS on TX ring full
The XDP_TX action can fail transmitting the frame in case the TX ring
is full or port is down.  In case of TX failure it should drop the
frame, and not as now call 'break' which is the same as XDP_PASS.

Fixes: 9ecc2d8617 ("net/mlx4_en: add xdp forwarding and data write support")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-19 01:32:19 -04:00
Nogah Frankel
fc1bbb0f18 mlxsw: spectrum: Implement offload stats ndo and expose HW stats by default
Change the default statistics ndo to return HW statistics
(like the one returned by ethtool_ops).
The HW stats are collected to a cache by delayed work every 1 sec.
Implement the offload stat ndo.
Add a function to get SW statistics, to be called from this function.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-18 22:33:42 -04:00
Tariq Toukan
4415a0319f net/mlx5e: Implement RX mapped page cache for page recycle
Instead of reallocating and mapping pages for RX data-path,
recycle already used pages in a per ring cache.

Performance tests:
The following results were measured on a freshly booted system,
giving optimal baseline performance, as high-order pages are yet to
be fragmented and depleted.

We ran pktgen single-stream benchmarks, with iptables-raw-drop:

Single stride, 64 bytes:
* 4,739,057 - baseline
* 4,749,550 - order0 no cache
* 4,786,899 - order0 with cache
1% gain

Larger packets, no page cross, 1024 bytes:
* 3,982,361 - baseline
* 3,845,682 - order0 no cache
* 4,127,852 - order0 with cache
3.7% gain

Larger packets, every 3rd packet crosses a page, 1500 bytes:
* 3,731,189 - baseline
* 3,579,414 - order0 no cache
* 3,931,708 - order0 with cache
5.4% gain

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-17 09:51:40 -04:00
Tariq Toukan
a5a0c59016 net/mlx5e: Introduce API for RX mapped pages
Manage the allocation and deallocation of mapped RX pages only
through dedicated API functions.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-17 09:51:40 -04:00
Tariq Toukan
7e42667170 net/mlx5e: Single flow order-0 pages for Striding RQ
To improve the memory consumption scheme, we omit the flow that
demands and splits high-order pages in Striding RQ, and stay
with a single Striding RQ flow that uses order-0 pages.

Moving to fragmented memory allows the use of larger MPWQEs,
which reduces the number of UMR posts and filler CQEs.

Moving to a single flow allows several optimizations that improve
performance, especially in production servers where we would
anyway fallback to order-0 allocations:
- inline functions that were called via function pointers.
- improve the UMR post process.

This patch alone is expected to give a slight performance reduction.
However, the new memory scheme gives the possibility to use a page-cache
of a fair size, that doesn't inflate the memory footprint, which will
dramatically fix the reduction and even give a performance gain.

Performance tests:
The following results were measured on a freshly booted system,
giving optimal baseline performance, as high-order pages are yet to
be fragmented and depleted.

We ran pktgen single-stream benchmarks, with iptables-raw-drop:

Single stride, 64 bytes:
* 4,739,057 - baseline
* 4,749,550 - this patch
no reduction

Larger packets, no page cross, 1024 bytes:
* 3,982,361 - baseline
* 3,845,682 - this patch
3.5% reduction

Larger packets, every 3rd packet crosses a page, 1500 bytes:
* 3,731,189 - baseline
* 3,579,414 - this patch
4% reduction

Fixes: 461017cb00 ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)")
Fixes: bc77b240b3 ("net/mlx5e: Add fragmented memory support for RX multi packet WQE")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-17 09:51:40 -04:00
Sebastian Ott
2a292822f0 net/mlx4_en: fix off by one in error handling
If an error occurs in mlx4_init_eq_table the index used in the
err_out_unmap label is one too big which results in a panic in
mlx4_free_eq. This patch fixes the index in the error path.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-16 04:16:22 -04:00
Ido Schimmel
b9d66a36aa mlxsw: spectrum: Add support for new ethtool API
Remove the deprecated {get,set}_settings callbacks and instead add
{get,set}_link_ksettings along with support for newly available speeds.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-13 12:16:34 -04:00
Ido Schimmel
91bdc7a43a mlxsw: spectrum: Indicate support of multiple port types
The device can support multiple port types, so don't return on first
match.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-13 12:16:33 -04:00
Ido Schimmel
0213424ada mlxsw: spectrum: Report port type according to operational speed
In case port isn't operational we shouldn't report the port type, but
instead return PORT_OTHER. This is consistent with most other drivers
that return PORT_OTHER when media type can't be determined.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-13 12:16:33 -04:00
Ido Schimmel
4149b97f72 mlxsw: spectrum: Report link partner's advertised speeds
If autonegotiation was performed successfully, then we should report the
link partner's advertised speeds instead of the operational speed of the
port.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-13 12:16:33 -04:00
Ido Schimmel
0c83f88c02 mlxsw: spectrum: Correctly report autonegotiation
Up until now the device always reported autonegotiation to be off
although it was on by default.

Allow the user to disable / enable autonegotiation and report its status
correctly.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-13 12:16:33 -04:00
David S. Miller
b20b378d49 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/mediatek/mtk_eth_soc.c
	drivers/net/ethernet/qlogic/qed/qed_dcbx.c
	drivers/net/phy/Kconfig

All conflicts were cases of overlapping commits.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12 15:52:44 -07:00
Moshe Shemesh
7a61fc86af net/mlx4_en: Fix panic on xmit while port is down
When port is down, tx drop counter update is not needed.
Updating the counter in this case can cause a kernel
panic as when the port is down, ring can be NULL.

Fixes: 63a664b7e9 ("net/mlx4_en: fix tx_dropped bug")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-11 19:40:26 -07:00
Tariq Toukan
564ed9b187 net/mlx4_en: Fixes for DCBX
This patch adds a capability check before enabling DCBX.
In addition, it re-organizes the relevant data structures,
and fixes a typo in a define.

Fixes: af7d518526 ("net/mlx4_en: Add DCB PFC support through CEE netlink commands")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-11 19:40:26 -07:00
Kamal Heib
c677071741 net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_state()
mlx4_en_dcbnl_set_state() returns u8, the return value from
mlx4_en_setup_tc() could be negative in case of failure, so fix that.

Fixes: af7d518526 ("net/mlx4_en: Add DCB PFC support through CEE netlink commands")
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-11 19:40:25 -07:00
Kamal Heib
74a9e90544 net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_all()
mlx4_en_dcbnl_set_all() returns u8, so return value can't be negative in
case of failure.

Fixes: af7d518526 ("net/mlx4_en: Add DCB PFC support through CEE netlink commands")
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Rana Shahout <ranas@mellanox.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-11 19:40:25 -07:00
Mohamad Haj Yahia
f1ee87fe55 net/mlx5: Organize device list API in one place
Hide the exposed (external) mlx5_dev_list and mlx5_intf_mutex and expose
an organized modular API to manage and manipulate mlx5 devices list.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10 21:21:50 -07:00
Mohamad Haj Yahia
9df30601c8 net/mlx5e: Restore vlan filter after seamless reset
When detaching the mlx5e interface clear all the vlans rules from the
vlan flow table.
When attaching it back restore all the active vlans rules to the HW.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10 21:21:50 -07:00
Mohamad Haj Yahia
26e59d8077 net/mlx5e: Implement mlx5e interface attach/detach callbacks
Needed to support seamless and lightweight PCI/Internal error recovery.
Implement the attach/detach interface callbacks.
In attach callback we only allocate HW resources.
In detach callback we only deallocate HW resources.
All SW/kernel objects initialzing/destroying is kept in add/remove
callbacks.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10 21:21:50 -07:00
Mohamad Haj Yahia
1ab2068a4c net/mlx5: Implement vports admin state backup/restore
Save the user configuration in the vport sturct.
Restore saved old configuration upon vport enable.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10 21:21:50 -07:00
Mohamad Haj Yahia
c2d6e31a00 net/mlx5: Align sriov/eswitch modules with the new load/unload flow.
Init/cleanup sriov/eswitch in the core software context init/cleanup
flows.
Attach/detach sriov/eswitch in the core load/unload flows.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10 21:21:50 -07:00
Mohamad Haj Yahia
62a9b90ad8 net/mlx5: Implement eswitch attach/detach flows
Needed for lightweight and modular internal/pci error handling.
Implement eswitch attach function which allocates/starts hw related
resources.
Implement eswitch detach function which releases/stops hw related
resources.
Init/cleanup function only handle eswitch software context allocation
and destruction.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10 21:21:49 -07:00
Mohamad Haj Yahia
acab721b5d net/mlx5: Implement SRIOV attach/detach flows
Needed for lightweight and modular internal/pci error handling.
Implement sriov attach function which enables pre-saved number of vfs on
the device side.
Implement sriov detach function which disable the current vfs on the
device side.
Init/cleanup function only handles sriov software context allocation and
destruction.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10 21:21:49 -07:00
Mohamad Haj Yahia
59211bd3b6 net/mlx5: Split the load/unload flow into hardware and software flows
Gather all software context creating/destroying in one function and call
it once in the first load and in the last unload.
load/unload functions will now receive indication if we need to
create/destroy the software contexts.
In internal/pci error do the unload/load flows without releasing the
software objects.
In this way we perserve the sw core state and it help us restoring old
driver state after PCI error/shutdown.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10 21:21:49 -07:00
Mohamad Haj Yahia
737a234bb6 net/mlx5: Introduce attach/detach to interface API
Add attach/detach callbacks to interface API.
This is crucial for implementing seamless reset flow which releases the
hardware and it's resources upon detach while keeping software
structures and state (e.g netdev) then reset and reallocate the hardware
needed resources upon attach.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10 21:21:49 -07:00
Mohamad Haj Yahia
6b6adee3da net/mlx5: SRIOV core code refactoring
Simplify the code and makes it look modular and symmetric.
Split sriov enable/disable to two levels: device level and pci level.
When user enable/disable sriov (via sriov_configure driver callback) we
will enable/disable both device and pci sriov.
When driver load/unload we will enable/disable (on demand) only device
sriov while keeping the PCI sriov enabled for next driver load.
On internal/pci error, VFs will be kept enabled on PCI and the reset
is done only in device level.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10 21:21:49 -07:00
Mohamad Haj Yahia
d62292e850 net/mlx5: Skip waiting for vf pages in internal error
In case of device in internal error state there is no need to wait for
vf pages since they will be reclaimed manually later in the unload flow.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10 21:21:48 -07:00
Ido Schimmel
3247ff2b31 mlxsw: spectrum: Set port type before setting its address
During port init, we currently set the port's type to Ethernet after
setting its MAC address. However, the hardware documentation states this
should be the other way around.

Align the driver with the hardware documentation and set the port's MAC
address after setting its type.

Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09 16:56:53 -07:00
Jiri Pirko
40d2590455 mlxsw: spectrum_router: Fix error path in mlxsw_sp_router_init
When neigh_init fails, we have to do proper cleanup including
router_fini call.

Fixes: 6cf3c971dc ("mlxsw: spectrum_router: Add private neigh table")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09 16:56:53 -07:00
Gal Pressman
cd17d230dd net/mlx5e: Fix parsing of vlan packets when updating lro header
Currently vlan tagged packets were not parsed correctly
and assumed to be regular IPv4/IPv6 packets.
We should check for 802.1Q/802.1ad tags and update the lro header
accordingly.
This fixes the use case where LRO is on and rxvlan is off
(vlan stripping is off).

Fixes: e586b3b0ba ('net/mlx5: Ethernet Datapath files')
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 16:15:29 -07:00
Gal Pressman
4e39883d9c net/mlx5e: Fix global PFC counters replication
Currently when reading global PFC statistics we left the counter
iterator out of the equation and we ended up reading the same counter
over and over again.

Instead of reading the counter at index 0 on every iteration we now read
the counter at index (i).

Fixes: e989d5a532 ('net/mlx5e: Expose flow control counters to ethtool')
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 16:15:29 -07:00
Gal Pressman
7abc211077 net/mlx5e: Prevent casting overflow
On 64 bits architectures unsigned long is longer than u32,
casting to unsigned long will result in overflow.
We need to first allocate an unsigned long variable, then assign the
wanted value.

Fixes: 665bc53969 ('net/mlx5e: Use new ethtool get/set link ksettings API')
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 16:15:29 -07:00
Tariq Toukan
0dbf657c39 net/mlx5e: Fix xmit_more counter race issue
Update the xmit_more counter before notifying the HW,
to prevent a possible use-after-free of the skb.

Fixes: c8cf78fe10 ("net/mlx5e: Add ethtool counter for TX xmit_more")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 16:15:28 -07:00
Brenden Blanco
326fe02d1e net/mlx4_en: protect ring->xdp_prog with rcu_read_lock
Depending on the preempt mode, the bpf_prog stored in xdp_prog may be
freed despite the use of call_rcu inside bpf_prog_put. The situation is
possible when running in PREEMPT_RCU=y mode, for instance, since the rcu
callback for destroying the bpf prog can run even during the bh handling
in the mlx4 rx path.

Several options were considered before this patch was settled on:

Add a napi_synchronize loop in mlx4_xdp_set, which would occur after all
of the rings are updated with the new program.
This approach has the disadvantage that as the number of rings
increases, the speed of update will slow down significantly due to
napi_synchronize's msleep(1).

Add a new rcu_head in bpf_prog_aux, to be used by a new bpf_prog_put_bh.
The action of the bpf_prog_put_bh would be to then call bpf_prog_put
later. Those drivers that consume a bpf prog in a bh context (like mlx4)
would then use the bpf_prog_put_bh instead when the ring is up. This has
the problem of complexity, in maintaining proper refcnts and rcu lists,
and would likely be harder to review. In addition, this approach to
freeing must be exclusive with other frees of the bpf prog, for instance
a _bh prog must not be referenced from a prog array that is consumed by
a non-_bh prog.

The placement of rcu_read_lock in this patch is functionally the same as
putting an rcu_read_lock in napi_poll. Actually doing so could be a
potentially controversial change, but would bring the implementation in
line with sk_busy_loop (though of course the nature of those two paths
is substantially different), and would also avoid future copy/paste
problems with future supporters of XDP. Still, this patch does not take
that opinionated option.

Testing was done with kernels in either PREEMPT_RCU=y or
CONFIG_PREEMPT_VOLUNTARY=y+PREEMPT_RCU=n modes, with neither exhibiting
any drawback. With PREEMPT_RCU=n, the extra call to rcu_read_lock did
not show up in the perf report whatsoever, and with PREEMPT_RCU=y the
overhead of rcu_read_lock (according to perf) was the same before/after.
In the rx path, rcu_read_lock is eventually called for every packet
from netif_receive_skb_internal, so the napi poll call's rcu_read_lock
is easily amortized.

v2:
Remove extra rcu_read_lock in mlx4_en_process_rx_cq body
Annotate xdp_prog with __rcu, and convert all usages to rcu_assign or
rcu_dereference[_protected] as appropriate.
Add explicit mutex lock around rcu_assign instead of xchg loop.

Fixes: d576acf0a2 ("net/mlx4_en: add page recycle to prepare rx ring for tx support")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-06 13:39:33 -07:00
Ido Schimmel
aad8b6bae7 mlxsw: spectrum: Use existing flood setup when adding VLANs
When a VLAN is added on a bridge port we should use the existing unicast
flood configuration of the port instead of assuming it's enabled.

Fixes: 0293038e0c ("mlxsw: spectrum: Add support for flood control")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01 09:44:56 -07:00
Ido Schimmel
f1de7a28d5 mlxsw: spectrum: Don't take multiple references on a FID
In commit 14d39461b3 ("mlxsw: spectrum: Use per-FID struct for the
VLAN-aware bridge") I added a per-FID struct, which member ports can
take a reference on upon VLAN membership configuration.

However, sometimes only the VLAN flags (e.g. egress untagged) are
toggled without changing the VLAN membership. In these cases we
shouldn't take another reference on the FID.

Fixes: 14d39461b3 ("mlxsw: spectrum: Use per-FID struct for the VLAN-aware bridge")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01 09:44:56 -07:00
Jiri Pirko
e732263849 mlxsw: spectrum_router: Fix netevent notifier registration
Currently the notifier is registered for every asic instance, however the
same block. Fix this by moving the registration to module init.

Fixes: c723c735fa ("mlxsw: spectrum_router: Periodically update the kernel's neigh table")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01 09:44:56 -07:00
Jiri Pirko
de7d62952b mlxsw: spectrum: Fix error path in mlxsw_sp_module_init
Add forgotten notifier unregister.

Fixes: 99724c18fc ("mlxsw: spectrum: Introduce support for router interfaces")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01 09:44:56 -07:00
Jiri Pirko
7146da3181 mlxsw: spectrum_router: Fix fib entry update path
Originally, I expected that there would be needed to call update
operation in case RALUE record action is changed. However, that is not
needed since write operation takes care of that nicely. Remove prepared
construct and always call the write operation.

Fixes: 61c503f976 ("mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01 09:44:56 -07:00
Jiri Pirko
5b004412e2 mlxsw: spectrum_router: Fix failure caused by double fib removal from HW
In mlxsw we squash tables 254 and 255 together into HW. Kernel adds/dels
/32 ip to/from both 254 and 255. On del path, that causes the same
prefix being removed twice. Fix this by introducing reference counting
for private mlxsw fib entries. That required a bit of code reshuffle.
Also put dev into fib entry key so the same prefix could be represented
once per every router interface.

Fixes: 61c503f976 ("mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01 09:44:55 -07:00
David S. Miller
6abdd5f593 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
All three conflicts were cases of simple overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-30 00:54:02 -04:00
Maor Gottlieb
e5835f2833 net/mlx5: Increase number of ethtool steering priorities
Ethtool has 11 flow tables, each flow table has its own priority.
Increase the number of priorities to be aligned with the number of flow
tables.

Fixes: 1174fce8d1 ('net/mlx5e: Support l3/l4 flow type specs in ethtool flow steering')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:16 -04:00
Eran Ben Elisha
1722b9694e net/mlx5: Add error prints when validate ETS failed
Upon set ETS failure due to user invalid input, add error prints to
specify the exact error to the user.

Fixes: cdcf11212b ('net/mlx5e: Validate BW weight values of ETS')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:16 -04:00
Kamal Heib
bf50082c15 net/mlx5e: Fix memory leak if refreshing TIRs fails
Free 'in' command object also when mlx5_core_modify_tir fails.

Fixes: 724b2aa151 ("net/mlx5e: TIRs management refactoring")
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:15 -04:00
Tariq Toukan
c8cf78fe10 net/mlx5e: Add ethtool counter for TX xmit_more
Add a counter in ethtool for the number of times that
TX xmit_more was used.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:15 -04:00
Eran Ben Elisha
cc8e9ebf95 net/mlx5e: Fix ethtool -g/G rx ring parameter report with striding RQ
The driver RQ has two possible configurations: striding RQ and
non-striding RQ.  Until this patch, the driver always reported the
number of hardware WQEs (ring descriptors). For non striding RQ
configuration, this was OK since we have one WQE per pending packet
For striding RQ, multiple packets can fit into one WQE. For better
user experience we normalize the rx_pending parameter (size of wqe/mtu)
as the average ring size in case of striding RQ.

Fixes: 461017cb00 ('net/mlx5e: Support RX multi-packet WQE ...')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:15 -04:00
Saeed Mahameed
6e8dd6d6f4 net/mlx5e: Don't wait for SQ completions on close
Instead of asking the firmware to flush the SQ (Send Queue) via
asynchronous completions when moved to error, we handle SQ flush
manually (mlx5e_free_tx_descs) same as we did when SQ flush got
timed out or on tx_timeout.

This will reduce SQs flush time and speedup interface down procedure.

Moved mlx5e_free_tx_descs to the end of en_tx.c for tx
critical code locality.

Fixes: 29429f3300 ('net/mlx5e: Timeout if SQ doesn't flush during close')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:15 -04:00
Saeed Mahameed
8484f9ed13 net/mlx5e: Don't post fragmented MPWQE when RQ is disabled
ICO (Internal control operations) SQ (Send Queue) is closed/disabled
after RQ (Receive Queue).  After RQ is closed an ICO SQ completion
might post a fragmented MPWQE (Multi Packet Work Queue Element) into
that RQ.

As on regular RQ post, check if we are allowed to post to that
RQ (RQ is enabled). Cleanup in-progress UMR MPWQE on mlx5e_free_rx_descs
if needed.

Fixes: bc77b240b3 ('net/mlx5e: Add fragmented memory support for RX multi packet WQE')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:15 -04:00
Saeed Mahameed
f2fde18c52 net/mlx5e: Don't wait for RQ completions on close
This will significantly reduce receive queue flush time on interface
down.

Instead of asking the firmware to flush the RQ (Receive Queue) via
asynchronous completions when moved to error, we handle RQ flush
manually (mlx5e_free_rx_descs) same as we did when RQ flush got timed
out.

This will reduce RQs flush time and speedup interface down procedure
(ifconfig down) from 6 sec to 0.3 sec on a 48 cores system.

Moved mlx5e_free_rx_descs en_main.c where it is needed, to keep en_rx.c
free form non critical data path code for better code locality.

Fixes: 6cd392a082 ('net/mlx5e: Handle RQ flush in error cases')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:15 -04:00
Saeed Mahameed
fe4c988bdd net/mlx5e: Limit UMR length to the device's limitation
ConnectX-4 UMR (User Memory Region) MTT translation table offset in WQE
is limited to U16_MAX, before this patch we ignored that limitation and
requested the maximum possible UMR translation length that the netdev
might need (MAX channels * MAX pages per channel).
In case of a system with #cores > 32 and when linear WQE allocation fails,
falling back to using UMR WQEs will cause the RQ (Receive Queue) to get
stuck.

Here we limit UMR length to min(U16_MAX, max required pages) (while
considering the required alignments) on driver load, by default U16_MAX is
sufficient since the default RX rings value guarantees that we are in
range, dynamically (on set_ringparam/set_channels) we will check if the
new required UMR length (num mtts) is still in range, if not, fail the
request.

Fixes: bc77b240b3 ('net/mlx5e: Add fragmented memory support for RX multi packet WQE')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:24:15 -04:00
Ido Schimmel
1c6c6d221e mlxsw: spectrum: Mirror certain packets to CPU
Instead of trapping certain packets to the CPU and then relying on it to
flood them we can instead make the device mirror them.

The following packet types are mirrored:

* DHCP: Broadcast packets that should be flooded by the device, but also
trapped in case CPU is running the DHCP server.

* IGMP query: Multicast packets that need to be forwarded to other
bridge ports, but also trapped so that receiving netdev will be marked
as a router port by the bridge driver.

* ARP request: Broadcast packets that should be forwarded to other
bridge ports, but also trapped in case requested IP is of the local
machine.

* ARP response: Unicast packets that should be forwarded by the bridge
but also trapped in case response is directed at us.

Set the trap action of such packets to mirror and mark them using
'offload_fwd_mark' to prevent the bridge driver from forwarding them
itself.

Note that OSPF packets are also marked despite their action being trap.
The reason for this is that the device traps such packets in the
pipeline after they were already flooded.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-26 13:13:37 -07:00
Ido Schimmel
63a811417d mlxsw: spectrum: Allow different traps to have different actions
Up until now we only trapped packets to CPU, but we are going to allow
some packets to be mirrored (trap & forward) to CPU.

Extend the Rx listener with 'action' member.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-26 13:13:36 -07:00
Ido Schimmel
93393b339d mlxsw: spectrum: Simplify traps definition
Instead of copying & pasting the same struct initialization for every
Rx listener, just use a macro.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-26 13:13:36 -07:00
Doug Ledford
0c41284c83 Mellanox ConnectX-4/Connect-IB shared code (SW part)
* net/mlx5: Add sniffer namespaces
 * net/mlx5: Introduce sniffer steering hardware capabilities
 * net/mlx5: Configure IB devices according to LAG state
 * net/mlx5: Vport LAG creation support
 * net/mlx5: Add LAG flow steering namespace
 * net/mlx5: LAG demux flow table support
 * net/mlx5: LAG and SRIOV cannot be used together
 * net/mlx5e: Avoid port remapping of mlx5e netdev TISes
 * net/mlx5: Get RoCE netdev
 * net/mlx5: Implement RoCE LAG feature
 * net/mlx5: Add HW interfaces used by LAG
 * net/mlx5: Separate query_port_proto_oper for IB and EN
 * net/mlx5: Expose mlx5e_link_mode
 * net/mlx5: Update struct mlx5_ifc_xrqc_bits
 * net/mlx5: Modify RQ bitmask from mlx5 ifc
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXu20hAAoJEORje4g2cliniO8P/0nMxLemOxY63u7P6DqT+UZQ
 +LN62W+/iLicNayKkt8mtcjnDm768YcF3ADvx73vRvKEeUyyEqT5ChMA59eicf70
 rrumfNXB/kfBOaPh5rFWf4Tn8WWpKW+0559drm80NslFZF9jjF9pwv5QGg7xISb7
 fYLcDQWn+5fYDuZzYsSu8zZKUEyGN0AugdjfxT5OHfh4rw+6oqGDb2fhH6LdkD8q
 j3Qx1cPmdQQnjJ5veXJFJT5qHFDqJlNmy85s4l99ItdWD/bcU29ue3Q3vNf7+lHp
 XoJB4ZRWG7sf98yXYXnOUt3iGUMdSJzpLfZqh/Nx9U1LZpdJ8lmBf7pRuR1hpPIN
 yDitcz+CMcFVr2WxvwWaUPhRE7SJsZxxr6tQISgRicYcFVyy9e7mLjABMtkh9vEn
 CXXqiDGUb/27HqTi9ha5qRiLoeT8yFpOCkINL4omV2FJKoUEbC+Jbq5P0mjnPpS1
 ZdzTOzWCtkDQGtLbi+nCIF5SVTv7CCDU+6VpGZPmk6M4/ednwajhxGPsbw6bRpna
 ck5SglGO8dFAaUv1UVRq04PIt7Lj2FRakP7sHWx3tc9XEP8syLX0OEiVB+ZN3yRn
 y2TlpsREk7AqDdRulwM4qfuNd4AxaDklXyS3C79RiJtenYO4GUGrJ6J6ryesLg8u
 tGKVV3fXEr2Hve6cTkpu
 =+m21
 -----END PGP SIGNATURE-----

Merge tag 'shared-for-4.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma into mlx5-shared

Mellanox ConnectX-4/Connect-IB shared code (SW part)

* net/mlx5: Add sniffer namespaces
* net/mlx5: Introduce sniffer steering hardware capabilities
* net/mlx5: Configure IB devices according to LAG state
* net/mlx5: Vport LAG creation support
* net/mlx5: Add LAG flow steering namespace
* net/mlx5: LAG demux flow table support
* net/mlx5: LAG and SRIOV cannot be used together
* net/mlx5e: Avoid port remapping of mlx5e netdev TISes
* net/mlx5: Get RoCE netdev
* net/mlx5: Implement RoCE LAG feature
* net/mlx5: Add HW interfaces used by LAG
* net/mlx5: Separate query_port_proto_oper for IB and EN
* net/mlx5: Expose mlx5e_link_mode
* net/mlx5: Update struct mlx5_ifc_xrqc_bits
* net/mlx5: Modify RQ bitmask from mlx5 ifc
2016-08-25 10:01:23 -04:00
Ido Schimmel
0f7a4d8a9d mlxsw: spectrum: Don't set learning when creating vPorts
Before commit 99724c18fc ("mlxsw: spectrum: Introduce support for
router interfaces") we used to assign vFIDs to the created vPorts. Since
these vPorts were used for slow path traffic we had to disable learning
for them, as it doesn't make sense to have it enabled.

This is no longer the case and now vPorts are either used for router
interfaces (for which learning is disabled by the firmware) or bridge
ports (for which learning is explicitly enabled by the driver).

Therefore, we can remove the learning configuration upon vPort creation.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:41:13 -07:00
Ido Schimmel
81f77bc006 mlxsw: spectrum: Remove unnecessary check in FDB processing
We now offload the learning configuration to the device and don't rely
on the driver to decide whether to learn the FDB record, so remove the
check.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:41:12 -07:00
Ido Schimmel
89b548f0cf mlxsw: spectrum: Offload learning to the switch ASIC
Up until now we simply stored the learning configuration of a bridge
port in the driver and decided whether to learn a new FDB record based
on this value.

However, this is sub-optimal in cases where learning is disabled on the
bridge port, as the device repeatedly generates learning notifications
for the same record.

Instead, offload the learning configuration to the device, thereby
preventing it from generating notifications when learning is disabled.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:41:12 -07:00
Ido Schimmel
584d73df06 mlxsw: spectrum: Configure learning for VLAN-aware bridge port
We are going to prevent the device from generating learning
notifications for a port that was configured with learning disabled.

Since learning configuration is done per {Port, VID} we need to apply
the port's learning configuration for any VID that is added to the
bridge port's VLAN filter list.

When a VID is added to the VLAN filter list of a VLAN-aware bridge port,
configure the {Port, VID} learning status according to the port's
configuration. When the VID is removed, disable learning for the {Port,
VID}.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:41:12 -07:00
Ido Schimmel
640be7b717 mlxsw: spectrum: Don't abort on first error when removing VLANs
When removing VLANs from the VLAN-aware bridge we shouldn't abort on the
first error, as we'll otherwise have resources that will never be freed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:41:12 -07:00
Ido Schimmel
f7a8f6cec3 mlxsw: spectrum: Make VLAN deletion function symmetric
Commit 05978481e7 ("mlxsw: spectrum: Create PVID vPort before
registering netdevice") removed __mlxsw_sp_port_vlans_del() from the
init sequence of the driver, which forced it to be non-symmetric with
regards to __mlxsw_sp_port_vlans_add().

Make both functions symmetric as the constraint no longer exists.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:41:12 -07:00
Ido Schimmel
1803e0fb7e mlxsw: spectrum: Limit number of FDB records per learning session
Up until now a learning session ended whenever the number of queried
records was zero. This turned out to be problematic in situations where
a large number of MACs (48K) had to be processed by the switch driver,
as RTNL mutex is held during the learning session.

Instead, limit the number of FDB records that can be processed in a
session to 64. This means that every time the device is queried for
learning notifications (currently, every 100ms), up to 64 records will
be processed by the switch driver.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:41:11 -07:00
Yotam Gigi
51af96b534 mlxsw: router: Enable neighbors to be created on stacked devices
Make the function mlxsw_router_neigh_construct search the rif according
to the neighbour dev other than the dev that was passed to the ndo, thus
allowing creating neigbhours upon stacked devices.

Fixes: 6cf3c971dc ("mlxsw: spectrum_router: Add private neigh table")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:39:04 -07:00
Ido Schimmel
f888f58795 mlxsw: spectrum: Add missing flood to router port
In case we have a layer 3 interface on top of a bridge (VLAN / FID RIF),
then we should flood the following packet types to the router:

* Broadcast: If DIP is the broadcast address of the interface, then we
need to be able to get it to CPU by trapping it following route lookup.

* Reserved IP multicast (224.0.0.X): Some control packets (e.g. OSPF)
use this range and are trapped in the router block.

Fixes: 99f44bb352 ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:39:03 -07:00
David S. Miller
fff84d2a39 Mellanox ConnectX-4/Connect-IB shared code (SW part)
* net/mlx5: Add sniffer namespaces
 * net/mlx5: Introduce sniffer steering hardware capabilities
 * net/mlx5: Configure IB devices according to LAG state
 * net/mlx5: Vport LAG creation support
 * net/mlx5: Add LAG flow steering namespace
 * net/mlx5: LAG demux flow table support
 * net/mlx5: LAG and SRIOV cannot be used together
 * net/mlx5e: Avoid port remapping of mlx5e netdev TISes
 * net/mlx5: Get RoCE netdev
 * net/mlx5: Implement RoCE LAG feature
 * net/mlx5: Add HW interfaces used by LAG
 * net/mlx5: Separate query_port_proto_oper for IB and EN
 * net/mlx5: Expose mlx5e_link_mode
 * net/mlx5: Update struct mlx5_ifc_xrqc_bits
 * net/mlx5: Modify RQ bitmask from mlx5 ifc
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXu20hAAoJEORje4g2cliniO8P/0nMxLemOxY63u7P6DqT+UZQ
 +LN62W+/iLicNayKkt8mtcjnDm768YcF3ADvx73vRvKEeUyyEqT5ChMA59eicf70
 rrumfNXB/kfBOaPh5rFWf4Tn8WWpKW+0559drm80NslFZF9jjF9pwv5QGg7xISb7
 fYLcDQWn+5fYDuZzYsSu8zZKUEyGN0AugdjfxT5OHfh4rw+6oqGDb2fhH6LdkD8q
 j3Qx1cPmdQQnjJ5veXJFJT5qHFDqJlNmy85s4l99ItdWD/bcU29ue3Q3vNf7+lHp
 XoJB4ZRWG7sf98yXYXnOUt3iGUMdSJzpLfZqh/Nx9U1LZpdJ8lmBf7pRuR1hpPIN
 yDitcz+CMcFVr2WxvwWaUPhRE7SJsZxxr6tQISgRicYcFVyy9e7mLjABMtkh9vEn
 CXXqiDGUb/27HqTi9ha5qRiLoeT8yFpOCkINL4omV2FJKoUEbC+Jbq5P0mjnPpS1
 ZdzTOzWCtkDQGtLbi+nCIF5SVTv7CCDU+6VpGZPmk6M4/ednwajhxGPsbw6bRpna
 ck5SglGO8dFAaUv1UVRq04PIt7Lj2FRakP7sHWx3tc9XEP8syLX0OEiVB+ZN3yRn
 y2TlpsREk7AqDdRulwM4qfuNd4AxaDklXyS3C79RiJtenYO4GUGrJ6J6ryesLg8u
 tGKVV3fXEr2Hve6cTkpu
 =+m21
 -----END PGP SIGNATURE-----

Merge tag 'shared-for-4.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma

Saeed Mahameed says:

====================
Mellanox mlx5 core driver updates 2016-08-24

This series contains some low level and API updates for mlx5 core
driver interface and mlx5_ifc.h, plus mlx5 LAG core driver support,
to be shared as base code for net-next and rdma mlx5 4.9 submissions.

From Alex and Artemy, Update mlx5_ifc for modify RQ and XRC bits.

From Noa, Expose mlx5 link modes so they can be used in RDMA tree for rdma tools.

From Aviv, LAG support needed for RDMA.
    - Add needed hardware structures, layouts and interface
    - mlx5 core driver LAG implementation
    - Introduce mlx5 core driver LAG API for mlx5_ib

From Maor, add two low level patches for mlx5 hardware sniffer QP
infrastructure bits and capabilities, plus added the namespace for sniffer
steering tables.  Needed for RDMA subtree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 09:35:35 -07:00
David S. Miller
01555e6449 Mellanox ConnectX-4/Connect-IB shared code (HW part)
* net/mlx5: Introduce alloc_encap and dealloc_encap commands
 * net/mlx5: Update mlx5_ifc.h for vxlan encap/decap
 * net/mlx5: Enable setting minimum inline header mode for VFs
 * net/mlx5: Improve driver log messages
 * net/mlx5: Unify and improve command interface
 * {net,IB}/mlx5: Modify QP commands via mlx5 ifc
 * {net,IB}/mlx5: QP/XRCD commands via mlx5 ifc
 * {net,IB}/mlx5: MKey/PSV commands via mlx5 ifc
 * {net,IB}/mlx5: CQ commands via mlx5 ifc
 * net/mlx5: EQ commands via mlx5 ifc
 * net/mlx5: Pages management commands via mlx5 ifc
 * net/mlx5: MCG commands via mlx5 ifc
 * net/mlx5: PD and UAR commands via mlx5 ifc
 * net/mlx5: Access register and MAD IFC commands via mlx5 ifc
 * net/mlx5: Init/Teardown hca commands via mlx5 ifc
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXu2zqAAoJEORje4g2clin0dQP/3SF9+4lxVaWRDnhutwIdaxd
 GDWDCEcp1x8oC1ylKEfQW57tTG8mk6pEFD5xEZSAJMGjGm5zR8QnaIS9eiPTdDkf
 QIReMP9XJUUVDqXZ8F207PwVbgB4IkHB2VPyl2Sar1HULe6Mn3nAS40A1QfYpVzs
 cYC3SFOPuLsTDZkIVQrZzKvX4WVHjcyj0tAkXkutWQ+K8cPXmpx49+ngrzVm6xnw
 j6THx3kOAEwozW5NxMC7V6DOD7KfLWzPi96BLZ2h4eQynpgJnSLOCar3zyBPH5g3
 KAk99tVjD1kp+HreSNzCd+oP8Zqrw+RBt3WlrGX2GvQ0V7XIJrpkRbLDgWhbBjej
 O1ln/xr5pqLSKgxz41LsFlrLWbOgG7r4N212iMNv3rArb9e11tqZCAbR0OzX5vZ6
 fl2W7moYRB2273Y+MnB/e1e8xf7PEIppWnyvyPrzCz1lSdzw1BzLqz5tWz2nc1dB
 yQWosVTf4xTa3OQHhUqw6CbhpRpywQZx1ZhmAzZ7+hQ90Z4hwPWWXIx7MNa4g2sJ
 toUamuonbnib3wBLQzzW2ktTbdJUx8OTF5aiVNC06QG8KAvXeUAP2Ho95Am3JpLJ
 XZ14ZP0NxOFaGgOSDRxEVKuhnUnXuIG57NSgQpMD5rjSieMl+msasrydP8X4+qny
 HlwA4nwt2bHf9k7Cg1iM
 =QIAc
 -----END PGP SIGNATURE-----

Merge tag 'shared-for-4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma

Saeed Mahameed says:

====================
Mellanox mlx5 core driver updates 2016-08-20

This series contains several low level and API updates for mlx5 core
commands interface and mlx5_ifc.h to be shared as base code for net-next and
rdma mlx5 4.9 submissions.

From Saeed, ten patches that refactors old layouts of firmware commands which
were manually generated before we introduced the mlx5_ifc, now all of the firmware
commands inbox/outbox layouts moved to use mlx5_ifc and we remove the old
manually generated structures.  Plus to those ten patches, we add two patches
that unifies mlx5 commands execution interface and improve the driver log messages
in that area.

From Hadar and Ilya, added the needed hardware bits and infrastructure for
minimum inline headers setting and encap/decap commands and capabilities,
needed for E-Switch offloads.

This series applies on top latest net-next and rdma/master, and smoothly merges with
the latest "Mellanox 100G mlx5 fixes 2016-08-16" series already applied into net branch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 11:08:23 -07:00
Doug Ledford
124c13439b Mellanox ConnectX-4/Connect-IB shared code (HW part)
* net/mlx5: Introduce alloc_encap and dealloc_encap commands
 * net/mlx5: Update mlx5_ifc.h for vxlan encap/decap
 * net/mlx5: Enable setting minimum inline header mode for VFs
 * net/mlx5: Improve driver log messages
 * net/mlx5: Unify and improve command interface
 * {net,IB}/mlx5: Modify QP commands via mlx5 ifc
 * {net,IB}/mlx5: QP/XRCD commands via mlx5 ifc
 * {net,IB}/mlx5: MKey/PSV commands via mlx5 ifc
 * {net,IB}/mlx5: CQ commands via mlx5 ifc
 * net/mlx5: EQ commands via mlx5 ifc
 * net/mlx5: Pages management commands via mlx5 ifc
 * net/mlx5: MCG commands via mlx5 ifc
 * net/mlx5: PD and UAR commands via mlx5 ifc
 * net/mlx5: Access register and MAD IFC commands via mlx5 ifc
 * net/mlx5: Init/Teardown hca commands via mlx5 ifc
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXu2zqAAoJEORje4g2clin0dQP/3SF9+4lxVaWRDnhutwIdaxd
 GDWDCEcp1x8oC1ylKEfQW57tTG8mk6pEFD5xEZSAJMGjGm5zR8QnaIS9eiPTdDkf
 QIReMP9XJUUVDqXZ8F207PwVbgB4IkHB2VPyl2Sar1HULe6Mn3nAS40A1QfYpVzs
 cYC3SFOPuLsTDZkIVQrZzKvX4WVHjcyj0tAkXkutWQ+K8cPXmpx49+ngrzVm6xnw
 j6THx3kOAEwozW5NxMC7V6DOD7KfLWzPi96BLZ2h4eQynpgJnSLOCar3zyBPH5g3
 KAk99tVjD1kp+HreSNzCd+oP8Zqrw+RBt3WlrGX2GvQ0V7XIJrpkRbLDgWhbBjej
 O1ln/xr5pqLSKgxz41LsFlrLWbOgG7r4N212iMNv3rArb9e11tqZCAbR0OzX5vZ6
 fl2W7moYRB2273Y+MnB/e1e8xf7PEIppWnyvyPrzCz1lSdzw1BzLqz5tWz2nc1dB
 yQWosVTf4xTa3OQHhUqw6CbhpRpywQZx1ZhmAzZ7+hQ90Z4hwPWWXIx7MNa4g2sJ
 toUamuonbnib3wBLQzzW2ktTbdJUx8OTF5aiVNC06QG8KAvXeUAP2Ho95Am3JpLJ
 XZ14ZP0NxOFaGgOSDRxEVKuhnUnXuIG57NSgQpMD5rjSieMl+msasrydP8X4+qny
 HlwA4nwt2bHf9k7Cg1iM
 =QIAc
 -----END PGP SIGNATURE-----

Merge tag 'shared-for-4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma into mlx5-shared

Mellanox ConnectX-4/Connect-IB shared code (HW part)

* net/mlx5: Introduce alloc_encap and dealloc_encap commands
* net/mlx5: Update mlx5_ifc.h for vxlan encap/decap
* net/mlx5: Enable setting minimum inline header mode for VFs
* net/mlx5: Improve driver log messages
* net/mlx5: Unify and improve command interface
* {net,IB}/mlx5: Modify QP commands via mlx5 ifc
* {net,IB}/mlx5: QP/XRCD commands via mlx5 ifc
* {net,IB}/mlx5: MKey/PSV commands via mlx5 ifc
* {net,IB}/mlx5: CQ commands via mlx5 ifc
* net/mlx5: EQ commands via mlx5 ifc
* net/mlx5: Pages management commands via mlx5 ifc
* net/mlx5: MCG commands via mlx5 ifc
* net/mlx5: PD and UAR commands via mlx5 ifc
* net/mlx5: Access register and MAD IFC commands via mlx5 ifc
* net/mlx5: Init/Teardown hca commands via mlx5 ifc
2016-08-23 11:52:02 -04:00
Markus Elfring
6f0b826da4 mlx5/core: Use memdup_user() rather than duplicating its implementation
* Reuse existing functionality from memdup_user() instead of keeping
  duplicate source code.

  This issue was detected by using the Coccinelle software.

* Return directly if this copy operation failed.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-22 17:05:38 -07:00
Jiri Pirko
8912862f06 mlxsw: spectrum_buffers: Fix pool value handling in mlxsw_sp_sb_tc_pool_bind_set
Pool index has to be converted by get_pool helper to work correctly for
egress pool. In mlxsw the egress pool index starts from 0.

Fixes: 0f433fa0ec ("mlxsw: spectrum_buffers: Implement shared buffer configuration")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 18:01:56 -07:00
Or Gerlitz
f96750f8d6 net/mlx5: E-Switch, Avoid ACLs in the offloads mode
When we are in the switchdev/offloads mode, HW matching is done as
dictated by the offloaded rules and hence we don't need to enable
the ACLs mechanism used by the legacy mode.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 16:09:56 -07:00
Or Gerlitz
1a8ee6f25b net/mlx5: E-Switch, Set the send-to-vport rules in the correct table
While adding actual offloading support to the new switchdev mode, we didn't
change the setup of the send-to-vport rules to put them in the slow path
table, fix that.

Fixes: 1033665e63 ('net/mlx5: E-Switch, Use two priorities for SRIOV offloads mode')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 16:09:56 -07:00
Or Gerlitz
ef78618b9d net/mlx5: E-Switch, Return the correct devlink e-switch mode
Since mlx5 has also the NONE e-switch mode, we must translate from mlx5
mode to devlink mode on the devlink eswitch mode get call, do that.

While here, remove the mlx5_ prefix from the static function helpers
that deal with the mode to comply with the rest of the code.

Fixes: c930a3ad74 ('net/mlx5e: Add devlink based SRIOV mode change')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 16:09:56 -07:00
Hadar Hen Zion
dbe413e3bb net/mlx5e: Retrieve the switchdev id from the firmware only once
Avoid firmware command execution each time the switchdev HW ID attr get
call is made. We do that by reading the ID (PF NIC MAC) only once at
load time and store it on the representor structure.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 16:09:56 -07:00
Hadar Hen Zion
1dbd0d373a net/mlx5e: Use correct flow dissector key on flower offloading
The wrong key is used when extracting the address type field set by
the flower offload code. We have to use the control key and not the
basic key, fix that.

Fixes: e3a2b7ed01 ('net/mlx5e: Support offload cls_flower with drop action')
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 16:09:56 -07:00
Amir Vadai
6c3b4f9086 net/mlx5: Update last-use statistics for flow rules
Set lastuse statistic, when number of packets is changed compared to
last query. This was wrongly dropped when bulk counter reading was added.

Fixes: a351a1b03b ('net/mlx5: Introduce bulk reading of flow counters')
Signed-off-by: Amir Vadai <amirva@mellanox.com>
Reported-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 16:09:55 -07:00
Paul Blakey
2c0f8ce1b5 net/mlx5: Added missing check of msg length in verifying its signature
Set and verify signature calculates the signature for each of the
mailbox nodes, even for those that are unused (from cache). Added
a missing length check to set and verify only those which are used.

While here, also moved the setting of msg's nodes token to where we
already go over them. This saves a pass because checksum is disabled,
and the only useful thing remaining that set signature does is setting
the token.

Fixes: e126ba97db ('mlx5: Add driver for Mellanox Connect-IB
adapters')
Signed-off-by: Paul Blakey <paulb@mellanox.com>

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 16:09:55 -07:00
Mohamad Haj Yahia
1061c90f52 net/mlx5: Fix pci error recovery flow
When PCI error is detected we should save the state of the pci prior to
disabling it.

Also when receiving pci slot reset call we need to verify that the
device is responsive.

Fixes: 89d44f0a6c ('net/mlx5_core: Add pci error handlers to mlx5_core
driver')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 16:09:55 -07:00
Tariq Toukan
506753b0b4 net/mlx5e: Optimization for MTU change
Avoid unnecessary interface down/up operations upon an MTU change
when it does not affect the rings configuration.

Fixes: 461017cb00 ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 16:09:55 -07:00
Saeed Mahameed
13f9bba7cd net/mlx5e: Set port MTU on netdev creation rather on open
Port mtu shouldn't be written to hardware on every single interface
open.
Here we set it only when needed, on change_mtu and netdevice creation.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-19 16:09:55 -07:00
Maor Gottlieb
87d22483ce net/mlx5: Add sniffer namespaces
Add sniffer TX and RX namespaces to receive ingoing and outgoing
traffic.

Each outgoing/incoming packet is duplicated and steered to the sniffer
TX/RX namespace in addition to the regular flow.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:59 +03:00
Aviv Heller
917b41aab7 net/mlx5: Configure IB devices according to LAG state
When mlx5_ib is loaded, we would like each card's IB devices
to be added according to its LAG state (one IB device, instead of
two, is to be added if LAG is active).

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:58 +03:00
Aviv Heller
3bc34f3bcb net/mlx5: Vport LAG creation support
Add interfaces for issuing CREATE_VPORT_LAG and
DESTROY_VPORT_LAG commands.

Used for receiving PF1's eth traffic on PF0's
root ft.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:57 +03:00
Aviv Heller
3e75d4ebaa net/mlx5: Add LAG flow steering namespace
This namespace is used for LAG demux flowtable.

The idea is to position the LAG demux ft between
bypass and kernel flowtables, allowing raw-eth
traffic from both ports to be received by the PF0
IB device.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:57 +03:00
Aviv Heller
aaff1bea16 net/mlx5: LAG demux flow table support
Add interfaces to allow the creation and destruction of a
LAG demux flow table.

It is a special flow table used during LAG for redirecting
non user-mode packets from PF0 to PF1 root ft, if a packet was
received on phys port two.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:56 +03:00
Aviv Heller
edb31b1686 net/mlx5: LAG and SRIOV cannot be used together
Until support will be added for RoCE LAG SRIOV.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:56 +03:00
Aviv Heller
db60b80273 net/mlx5e: Avoid port remapping of mlx5e netdev TISes
TISes belonging to the mlx5e NIC should not be
subject to port remap.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:55 +03:00
Aviv Heller
6a32047a44 net/mlx5: Get RoCE netdev
Used by IB driver for determining the IB bond
device's netdev, when LAG is active.

Returns PF0's netdev if mode is not active-backup,
or the PF netdev of the active slave when mode is
active-backup.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:54 +03:00
Aviv Heller
7907f23adc net/mlx5: Implement RoCE LAG feature
Available on dual port cards only, this feature keeps
track, using netdev LAG events, of the bonding
and link status of each port's PF netdev.

When both of the card's PF netdevs are enslaved to the
same bond/team master, and only them, LAG state
is active.

During LAG, only one IB device is present for both ports.

In addition to the above, this commit includes FW commands
used for managing the LAG, new facilities for adding and removing
a single device by interface, and port remap functionality according to
bond events.

Please note that this feature is currently used only for mimicking
Ethernet bonding for RoCE - netdevs functionality is not altered,
and their bonding continues to be managed solely by bond/team driver.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:54 +03:00
Aviv Heller
84df61ebc6 net/mlx5: Add HW interfaces used by LAG
Exposed LAG commands enum and layouts:
- CREATE_LAG
  HW enters LAG mode:
  RoCE traffic from port two is received on PF0 core dev.
  Allows to set tx_affinity (tx port) for QPs and TISes.
  Allows to port remap QPs and TISes, overriding their
  tx_affinity behavior.

- MODIFY_LAG
  Remap QPs and TISes to another port.

- QUERY_LAG
  Query whether LAG mode is active.

- DESTROY_LAG
  HW exits LAG mode, returning to non-LAG behavior.

- CREATE_VPORT_LAG
  Merge Ethernet flow steering, such that traffic received on port
  two jumps to PF0 root flow table.

  Available only in LAG mode.

- DESTROY_VPORT_LAG
  Ethernet flow steering returns to non-LAG behavior.

Caps added:
- lag_master
  Driver is in charge of managing the LAG.
  This is currently the only option.

- num_lag_ports
  LAG is supported only if this field's value is 2.

Other fields:
- QP/TIS tx port affinity
  During LAG, this field controls on which port a QP or TIS resides.

- TIS strict tx affinity
  When this field is set, the TIS will not be subject to port remap by
  CREATE_LAG/MODIFY_LAG.

- LAG demux flow table
  Flow table used for redirecting non user-space traffic back to
  PF1 root flow table, if the packet was received on port two.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:53 +03:00
Noa Osherovich
d5beb7f2af net/mlx5: Separate query_port_proto_oper for IB and EN
Replaced mlx5_query_port_proto_oper with separate functions per link
type. The functions should take different arguments so no point in
trying to unite them.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:52 +03:00
Noa Osherovich
8cca30a7f9 net/mlx5: Expose mlx5e_link_mode
The mlx5e_link_mode enumeration will also be used in mlx5_ib for RoCE.
This patch moves the enumeration to the mlx5 driver port header file.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:52 +03:00
Alex Vesker
83b502a12e net/mlx5: Modify RQ bitmask from mlx5 ifc
Use mlx5 ifc MODIFY_BITMASK_VSD in mlx5e_modify_rq_vsd and expose counter
set capability bit in hca caps structure.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18 18:49:08 +03:00
Linus Torvalds
184ca82348 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Buffers powersave frame test is reversed in cfg80211, fix from Felix
    Fietkau.

 2) Remove bogus WARN_ON in openvswitch, from Jarno Rajahalme.

 3) Fix some tg3 ethtool logic bugs, and one that would cause no
    interrupts to be generated when rx-coalescing is set to 0.  From
    Satish Baddipadige and Siva Reddy Kallam.

 4) QLCNIC mailbox corruption and napi budget handling fix from Manish
    Chopra.

 5) Fix fib_trie logic when walking the trie during /proc/net/route
    output than can access a stale node pointer.  From David Forster.

 6) Several sctp_diag fixes from Phil Sutter.

 7) PAUSE frame handling fixes in mlxsw driver from Ido Schimmel.

 8) Checksum fixup fixes in bpf from Daniel Borkmann.

 9) Memork leaks in nfnetlink, from Liping Zhang.

10) Use after free in rxrpc, from David Howells.

11) Use after free in new skb_array code of macvtap driver, from Jason
    Wang.

12) Calipso resource leak, from Colin Ian King.

13) mediatek bug fixes (missing stats sync init, etc.) from Sean Wang.

14) Fix bpf non-linear packet write helpers, from Daniel Borkmann.

15) Fix lockdep splats in macsec, from Sabrina Dubroca.

16) hv_netvsc bug fixes from Vitaly Kuznetsov, mostly to do with VF
    handling.

17) Various tc-action bug fixes, from CONG Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
  net_sched: allow flushing tc police actions
  net_sched: unify the init logic for act_police
  net_sched: convert tcf_exts from list to pointer array
  net_sched: move tc offload macros to pkt_cls.h
  net_sched: fix a typo in tc_for_each_action()
  net_sched: remove an unnecessary list_del()
  net_sched: remove the leftover cleanup_a()
  mlxsw: spectrum: Allow packets to be trapped from any PG
  mlxsw: spectrum: Unmap 802.1Q FID before destroying it
  mlxsw: spectrum: Add missing rollbacks in error path
  mlxsw: reg: Fix missing op field fill-up
  mlxsw: spectrum: Trap loop-backed packets
  mlxsw: spectrum: Add missing packet traps
  mlxsw: spectrum: Mark port as active before registering it
  mlxsw: spectrum: Create PVID vPort before registering netdevice
  mlxsw: spectrum: Remove redundant errors from the code
  mlxsw: spectrum: Don't return upon error in removal path
  i40e: check for and deal with non-contiguous TCs
  ixgbe: Re-enable ability to toggle VLAN filtering
  ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths
  ...
2016-08-17 17:26:58 -07:00
WANG Cong
22dc13c837 net_sched: convert tcf_exts from list to pointer array
As pointed out by Jamal, an action could be shared by
multiple filters, so we can't use list to chain them
any more after we get rid of the original tc_action.
Instead, we could just save pointers to these actions
in tcf_exts, since they are refcount'ed, so convert
the list to an array of pointers.

The "ugly" part is the action API still accepts list
as a parameter, I just introduce a helper function to
convert the array of pointers to a list, instead of
relying on the C99 feature to iterate the array.

Fixes: a85a970af2 ("net_sched: move tc_action into tcf_common")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:27:51 -04:00
Ido Schimmel
9ffcc3725f mlxsw: spectrum: Allow packets to be trapped from any PG
When packets enter the device they are classified to a priority group
(PG) buffer based on their PCP value. After their egress port and
traffic class are determined they are moved to the switch's shared
buffer and await transmission, if:

(Ingress{Port}.Usage < Thres && Ingress{Port,PG}.Usage < Thres &&
 Egress{Port}.Usage < Thres && Egress{Port,TC}.Usage < Thres)
||
(Ingress{Port}.Usage < Min || Ingress{Port,PG} < Min ||
 Egress{Port}.Usage < Min || Egress{Port,TC}.Usage < Min)

Packets scheduled to transmission through CPU port (trapped to CPU) use
traffic class 7, which has a zero maximum and minimum quotas. However,
when such packets arrive from PG 0 they are admitted to the shared
buffer as PG 0 has a non-zero minimum quota.

Allow all packets to be trapped to the CPU - regardless of the PG they
were classified to - by assigning a 10KB minimum quota for CPU port and
TC7.

Fixes: 8e8dfe9fdf ("mlxsw: spectrum: Add IEEE 802.1Qaz ETS support")
Reported-by: Tamir Winetroub <tamirw@mellanox.com>
Tested-by: Tamir Winetroub <tamirw@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:18:28 -04:00
Ido Schimmel
8168287b5d mlxsw: spectrum: Unmap 802.1Q FID before destroying it
Before destroying the 802.1Q FID we should first remove the VID-to-FID
mapping. This makes mlxsw_sp_fid_destroy() symmetric with regards to
mlxsw_sp_fid_create().

Fixes: 14d39461b3 ("mlxsw: spectrum: Use per-FID struct for the VLAN-aware bridge")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:18:27 -04:00
Ido Schimmel
0583272d91 mlxsw: spectrum: Add missing rollbacks in error path
While going over the code I noticed we are missing two rollbacks in the
port's creation error path. Add them and adjust the place of one of them
in the port's removal sequence so that both are symmetric.

Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:18:27 -04:00
Jiri Pirko
0e7df1a290 mlxsw: reg: Fix missing op field fill-up
Ralue pack function needs to set op, otherwise it is 0 for add always.

Fixes: d5a1c749d2 ("mlxsw: reg: Add Router Algorithmic LPM Unicast Entry Register definition")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:18:27 -04:00
Ido Schimmel
a94a614fa2 mlxsw: spectrum: Trap loop-backed packets
One of the conditions to generate an ICMP Redirect Message is that "the
packet is being forwarded out the same physical interface that it was
received from" (RFC 1812).

Therefore, we need to be able to trap such packets and let the kernel
decide what to do with them.

For each RIF, enable the loop-back filter, which will raise the LBERROR
trap whenever the ingress RIF equals the egress RIF.

Fixes: 99724c18fc ("mlxsw: spectrum: Introduce support for router interfaces")
Reported-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:18:27 -04:00
Elad Raz
c20b80187a mlxsw: spectrum: Add missing packet traps
Add the following traps:

1) MTU Error: Trap packets whose size is bigger than the egress RIF's
MTU. If DF bit isn't set, traffic will continue to be routed in slow
path.

2) TTL Error: Trap packets whose TTL expired. This allows traceroute to
work properly.

3) OSPF packets.

Fixes: 7b27ce7bb9 ("mlxsw: spectrum: Add traps needed for router implementation")
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:18:27 -04:00
Ido Schimmel
2f25844c23 mlxsw: spectrum: Mark port as active before registering it
Commit bbf2a4757b ("mlxsw: spectrum: Initialize ports at the end of
init sequence") moved ports initialization to the end of the init
sequence, which means ports are the first to be removed during fini.

Since the FDB delayed work is still active when ports are removed it's
possible for it to process FDB notifications of inactive ports,
resulting in a warning message.

Fix that by marking ports as inactive only after unregistering them. The
NETDEV_UNREGISTER event will invoke bridge's driver port removal
sequence that will cause the FDB (and FDB notifications) to be flushed.

Fixes: bbf2a4757b ("mlxsw: spectrum: Initialize ports at the end of init sequence")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:18:27 -04:00
Ido Schimmel
05978481e7 mlxsw: spectrum: Create PVID vPort before registering netdevice
After registering a netdevice it's possible for user space applications
to configure an IP address on it. From the driver's perspective, this
means a router interface (RIF) should be created for the PVID vPort.

Therefore, we must create the PVID vPort before registering the
netdevice.

Fixes: 99724c18fc ("mlxsw: spectrum: Introduce support for router interfaces")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:18:27 -04:00
Ido Schimmel
fa66d7e3fe mlxsw: spectrum: Remove redundant errors from the code
Currently, when device configuration fails we emit errors to the kernel
log despite the fact we already get these from the EMAD transaction
layer, so remove them.

In addition to being unnecessary, removing these error messages will
allow us to reuse mlxsw_sp_port_add_vid() to create the PVID vPort
before registering the netdevice.

Fixes: 99724c18fc ("mlxsw: spectrum: Introduce support for router interfaces")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:18:27 -04:00
Ido Schimmel
7a35583ec5 mlxsw: spectrum: Don't return upon error in removal path
When removing a VLAN filter from the device we shouldn't return upon the
first error we encounter, as otherwise we'll have resources that will
never be freed nor used.

Instead, we should keep trying to free as much resources as possible in
a best effort mode.

Remove the error message as well, since we already get these from the
EMAD transaction code.

Fixes: 99724c18fc ("mlxsw: spectrum: Introduce support for router interfaces")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:18:27 -04:00
Ilya Lesokhin
575ddf5888 net/mlx5: Introduce alloc_encap and dealloc_encap commands
Implement low-level commands to support vxlan encapsulation.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-17 17:46:01 +03:00
Hadar Hen Zion
9def7121be net/mlx5: Enable setting minimum inline header mode for VFs
Implement the low-level part of the PF side in setting minimum
inline header mode for VFs.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-17 17:45:59 +03:00
Saeed Mahameed
2974ab6e8b net/mlx5: Improve driver log messages
Remove duplicate pci dev name printing in mlx5_core_err.
Use mlx5_core_{warn,info,err} where possible to have the pci info in the
driver log messages.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Parvi Kaustubhi <parvik@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-17 17:45:59 +03:00
Saeed Mahameed
c4f287c4a6 net/mlx5: Unify and improve command interface
Now as all commands use mlx5 ifc interface, instead of doing two calls
for executing a command we embed command status checking into
mlx5_cmd_exec to simplify the interface.

Also we do here some cleanup for redundant software structures
(inbox/outbox) and functions and improved command failure output.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-17 17:45:58 +03:00
Saeed Mahameed
1a412fb1ca {net,IB}/mlx5: Modify QP commands via mlx5 ifc
Prior to this patch we assumed that modify QP commands have the
same layout.

In ConnectX-4 for each QP transition there is a specific command
and their layout can vary.

e.g: 2err/2rst commands don't have QP context in their layout and before
this patch we posted the QP context in those commands.

Fortunately the FW only checks the suffix of the commands and executes
them, while ignoring all invalid data sent after the valid command
layout.

This patch removes mlx5_modify_qp_mbox_in and changes
mlx5_core_qp_modify to receive the required transition and QP context
with opt_param_mask if needed.  This way the caller is not required to
provide the command inbox layout and it will be generated automatically.

mlx5_core_qp_modify will generate the command inbox/outbox layouts
according to the requested transition and will fill the requested
parameters.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-17 17:45:58 +03:00
Saeed Mahameed
09a7d9eca1 {net,IB}/mlx5: QP/XRCD commands via mlx5 ifc
Remove old representation of manually created QP/XRCD commands layout
amd use mlx5_ifc canonical structures and defines.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-17 17:45:57 +03:00
Vincent
eb8fc32354 mlxsw: spectrum_router: Fix use after free
In mlxsw_sp_router_fib4_add_info_destroy(), the fib_entry pointer is used
after it has been freed by mlxsw_sp_fib_entry_destroy(). Use a temporary
variable to fix this.

Fixes: 61c503f976 ("mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops")
Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
Cc: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-14 21:32:05 -07:00
Saeed Mahameed
ec22eb5310 {net,IB}/mlx5: MKey/PSV commands via mlx5 ifc
Remove old representation of manually created MKey/PSV commands layout,
and use mlx5_ifc canonical structures and defines.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-14 14:39:18 +03:00
Saeed Mahameed
2782778663 {net,IB}/mlx5: CQ commands via mlx5 ifc
Remove old representation of manually created CQ commands layout,
and use mlx5_ifc canonical structures and defines.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-14 14:39:15 +03:00
Saeed Mahameed
73b626c182 net/mlx5: EQ commands via mlx5 ifc
Remove old representation of manually created EQ commands layout,
and use mlx5_ifc canonical structures and defines.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-14 14:39:12 +03:00
Saeed Mahameed
a533ed5e17 net/mlx5: Pages management commands via mlx5 ifc
Remove old representation of manually created Pages management
commands layout, and use mlx5_ifc canonical structures and defines.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-14 14:39:08 +03:00
Saeed Mahameed
20bb566bda net/mlx5: MCG commands via mlx5 ifc
Remove old representation of manually created MCG commands layout
and use mlx5_ifc canonical structures and defines.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-14 14:39:04 +03:00
Saeed Mahameed
732ef5ad8f net/mlx5: PD and UAR commands via mlx5 ifc
Remove old representation of manually created PD/UAR commands layouts
and use mlx5_ifc canonical structures and defines.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-14 14:39:00 +03:00
Saeed Mahameed
20ed51c643 net/mlx5: Access register and MAD IFC commands via mlx5 ifc
Remove old representation of manually created ACCESS_REG/MAD_IFC
commands layout and use mlx5_ifc canonical structures and defines.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-14 14:38:57 +03:00
Saeed Mahameed
04ed5ad5db net/mlx5: Init/Teardown hca commands via mlx5 ifc
Remove old representation of manually created Init/Teardown hca
commands layout and use mlx5_ifc canonical structures and defines.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-14 14:38:38 +03:00
Ido Schimmel
4de34eb574 mlxsw: spectrum: Add missing DCB rollback in error path
We correctly execute mlxsw_sp_port_dcb_fini() when port is removed, but
I missed its rollback in the error path of port creation, so add it.

Fixes: f00817df2b ("mlxsw: spectrum: Introduce support for Data Center Bridging (DCB)")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08 12:57:27 -07:00
Ido Schimmel
07d50cae06 mlxsw: spectrum: Do not override PAUSE settings
The PFCC register is used to configure both PAUSE and PFC frames.
Therefore, when PFC frames are disabled we must make sure we don't
mistakenly also disable PAUSE frames (which might be enabled).

Fix this by packing the PFCC register with the current PAUSE settings.

Note that this register is also accessed via ethtool ops, but there we
are guaranteed to have PFC disabled.

Fixes: d81a6bdb87 ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08 12:57:27 -07:00
Ido Schimmel
b489a2000f mlxsw: spectrum: Do not assume PAUSE frames are disabled
When ieee_setpfc() gets called, PAUSE frames are not necessarily
disabled on the port.

Check if PAUSE frames are disabled or enabled and configure the port's
headroom buffer accordingly.

Fixes: d81a6bdb87 ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08 12:57:27 -07:00
Linus Torvalds
0cda611386 Round one of 4.8 code
- Updates/fixes for iw_cxgb4 driver
 - Updates/fixes for mlx5 driver
 - Add flow steering and RSS API
 - Add hardware stats to mlx4 and mlx5 drivers
 - Add firmware version API for RDMA driver use
 - Add the rxe driver (this is a software RoCE driver that makes any
   Ethernet device a RoCE device)
 - Fixes for i40iw driver
 - Support for send only multicast joins in the cma layer
 - Other minor fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXo1vCAAoJELgmozMOVy/d0HcQAJqMi7siD9cSaMViYbu812pq
 3kNkHZbLNB/947uShDPhhFAWFXU0nRxEnTNSvYxRo+nxnDE/9hEEXpx8OzzKLNU+
 GXyDeHsEEriSFcaSne5Tak/QuiFm3PJv73ttXQROCtHG7KxLG9ieVbfusz42Xwiu
 5R21qfp6PZEOC+j7L/fTZh/kEN3cfaDYrGnCgmU3z0ka9xG5Qe2/+uWGNkuioRA5
 phFUR4MS+1n/VrnxPHrLXTrqv3sw8YfCfRImaXSBrxFVMqhno+cDDtEJQCRnmNrq
 7KcJO2KqDMl/QqsjxdwqojNpUTh2t7SeOeQuzUsfXl15yyyetq2Zu7ZurkCGjNtQ
 NtTt6hv5eXq3mNuBmOPKYDDgakSYyYjS0zueoi8wFFqIeSYxRJv4wx4xoeJ/Bsz8
 2LplpaPMQaTM65FhzYXGhYNBKaRkqjL9ihbIl1OcLNvfXAqLElfONM17/Yc/hgVw
 xfDtvNFrZcl7/exIpBBNOnxwbs4h78vvXsXoBiVoN7V/hBnMzDhkiBHNxNCfZXA0
 REGs/cnyy6cpiJOnVCWs77NqL75oK/qb1mEwe1M+A2kaxe/tLixUdYXo/zclDPm8
 3DLTL9lCgJIBIEiZT4q/alxLK+yUKD+SHtQT3lmF2Bfsmv/I38Uy55SXAiFO4yOq
 kwy96TvYtT43SkyNmmBf
 =oZOO
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull base rdma updates from Doug Ledford:
 "Round one of 4.8 code: while this is mostly normal, there is a new
  driver in here (the driver was hosted outside the kernel for several
  years and is actually a fairly mature and well coded driver).  It
  amounts to 13,000 of the 16,000 lines of added code in here.

  Summary:

   - Updates/fixes for iw_cxgb4 driver
   - Updates/fixes for mlx5 driver
   - Add flow steering and RSS API
   - Add hardware stats to mlx4 and mlx5 drivers
   - Add firmware version API for RDMA driver use
   - Add the rxe driver (this is a software RoCE driver that makes any
     Ethernet device a RoCE device)
   - Fixes for i40iw driver
   - Support for send only multicast joins in the cma layer
   - Other minor fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits)
  Soft RoCE driver
  IB/core: Support for CMA multicast join flags
  IB/sa: Add cached attribute containing SM information to SA port
  IB/uverbs: Fix race between uverbs_close and remove_one
  IB/mthca: Clean up error unwind flow in mthca_reset()
  IB/mthca: NULL arg to pci_dev_put is OK
  IB/hfi1: NULL arg to sc_return_credits is OK
  IB/mlx4: Add diagnostic hardware counters
  net/mlx4: Query performance and diagnostics counters
  net/mlx4: Add diagnostic counters capability bit
  Use smaller 512 byte messages for portmapper messages
  IB/ipoib: Report SG feature regardless of HW UD CSUM capability
  IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct
  IB/hfi1: Disable by default
  IB/rdmavt: Disable by default
  IB/mlx5: Fix port counter ID association to QP offset
  IB/mlx5: Fix iteration overrun in GSI qps
  i40iw: Add NULL check for puda buffer
  i40iw: Change dup_ack_thresh to u8
  i40iw: Remove unnecessary check for moving CQ head
  ...
2016-08-04 20:10:31 -04:00
Doug Ledford
7f1d25b47d Merge branches 'misc' and 'rxe' into k.o/for-4.8-1 2016-08-04 11:13:47 -04:00
Mark Bloch
bfaf31687c net/mlx4: Query performance and diagnostics counters
Add a function to query diagnostics counters from the firmware.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:34 -04:00
Mark Bloch
c7c122ed67 net/mlx4: Add diagnostic counters capability bit
Add a bit that indicates if the firmware supports per port
diagnostic counters.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03 21:03:33 -04:00
Linus Torvalds
731c7d3a20 Merge tag 'drm-for-v4.8' of git://people.freedesktop.org/~airlied/linux
Merge drm updates from Dave Airlie:
 "This is the main drm pull request for 4.8.

  I'm down with a cold at the moment so hopefully this isn't in too bad
  a state, I finished pulling stuff last week mostly (nouveau fixes just
  went in today), so only this message should be influenced by illness.
  Apologies to anyone who's major feature I missed :-)

  Core:
        Lockless GEM BO freeing
        Non-blocking atomic work
        Documentation changes (rst/sphinx)
        Prep for new fencing changes
        Simple display helpers
        Master/auth changes
        Register/unregister rework
        Loads of trivial patches/fixes.

  New stuff:
        ARM Mali display driver (not the 3D chip)
        sii902x RGB->HDMI bridge

  Panel:
        Support for new panels
        Improved backlight support

  Bridge:
        Convert ADV7511 to bridge driver
        ADV7533 support
        TC358767 (DSI/DPI to eDP) encoder chip support

  i915:
        BXT support enabled by default
        GVT-g infrastructure
        GuC command submission and fixes
        BXT workarounds
        SKL/BKL workarounds
        Demidlayering device registration
        Thundering herd fixes
        Missing pci ids
        Atomic updates

  amdgpu/radeon:
        ATPX improvements for better dGPU power control on PX systems
        New power features for CZ/BR/ST
        Pipelined BO moves and evictions in TTM
        GPU scheduler improvements
        GPU reset improvements
        Overclocking on dGPUs with amdgpu
        Polaris powermanagement enabled

  nouveau:
        GK20A/GM20B volt and clock improvements.
        Initial support for GP100/GP104 GPUs, GP104 will not yet support
        acceleration due to NVIDIA having not released firmware for them as of yet.

  exynos:
        Exynos5433 SoC with IOMMU support.

  vc4:
        Shader validation for branching

  imx-drm:
        Atomic mode setting conversion
        Reworked DMFC FIFO allocation
        External bridge support

  analogix-dp:
        RK3399 eDP support
        Lots of fixes.

  rockchip:
        Lots of small fixes.

  msm:
        DT bindings cleanups
        Shrinker and madvise support
        ASoC HDMI codec support

  tegra:
        Host1x driver cleanups
        SOR reworking for DP support
        Runtime PM support

  omapdrm:
        PLL enhancements
        Header refactoring
        Gamma table support

  arcgpu:
        Simulator support

  virtio-gpu:
        Atomic modesetting fixes.

  rcar-du:
        Misc fixes.

  mediatek:
        MT8173 HDMI support

  sti:
        ASOC HDMI codec support
        Minor fixes

  fsl-dcu:
        Suspend/resume support
        Bridge support

  amdkfd:
        Minor fixes.

  etnaviv:
        Enable GPU clock gating

  hisilicon:
        Vblank and other fixes"

* tag 'drm-for-v4.8' of git://people.freedesktop.org/~airlied/linux: (1575 commits)
  drm/nouveau/gr/nv3x: fix instobj write offsets in gr setup
  drm/nouveau/acpi: fix lockup with PCIe runtime PM
  drm/nouveau/acpi: check for function 0x1B before using it
  drm/nouveau/acpi: return supported DSM functions
  drm/nouveau/acpi: ensure matching ACPI handle and supported functions
  drm/nouveau/fbcon: fix font width not divisible by 8
  drm/amd/powerplay: remove enable_clock_power_gatings_tasks from initialize and resume events
  drm/amd/powerplay: move clockgating to after ungating power in pp for uvd/vce
  drm/amdgpu: add query device id and revision id into system info entry at CGS
  drm/amdgpu: add new definition in bif header
  drm/amd/powerplay: rename smum header guards
  drm/amdgpu: enable UVD context buffer for older HW
  drm/amdgpu: fix default UVD context size
  drm/amdgpu: fix incorrect type of info_id
  drm/amdgpu: make amdgpu_cgs_call_acpi_method as static
  drm/amdgpu: comment out unused defaults_staturn_pro static const structure to fix the build
  drm/amdgpu: enable UVD VM only on polaris
  drm/amdgpu: increase timeout of IB test
  drm/amdgpu: add destroy session when generate VCE destroy msg.
  drm/amd: fix deadlock of job_list_lock V2
  ...
2016-08-01 21:44:08 -04:00
Bhaktipriya Shridhar
0a91605cda net/mlx5_core/health: Remove deprecated create_singlethread_workqueue
The workqueue health->wq was used as per device private health thread.
This was done to perform delayed work.

The workqueue has a single workitem(&health->work) and
hence doesn't require ordering. It is involved in handling the health of
the device and is not being used on a memory reclaim path.
Hence, the singlethreaded workqueue has been replaced with the use of
system_wq.

Work item has been flushed in mlx5_health_cleanup() to ensure that
there are no pending tasks while disconnecting the driver.

Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-26 15:18:56 -07:00
Dave Airlie
5e580523d9 Linux 4.7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXlRXSAAoJEHm+PkMAQRiGG/gH/0Z8O4zWOsrwO+X1mRToRDBH
 joFOjAmCVe83T1VpF5LYNB+9+owL/dEDt6+ZIswnhH7AfQPjs4RqwS4PcuMbCDVO
 +mDm0PmfcKaYcQZrB2Z2OwIzRNnfCTVcsDPhIHwuIHk0m4z/xuGZonD8KoAj0+tO
 3yJF6sbE1KubDVjOb+lmZZSP3cXA0pDXrNhkYhE4Tsr8fiihGjeXSNJ8t2zPLjxo
 W3MPqo0rzDvQsOwoF4TWHHagVaFSJlhLBBgqu33fI7uO3jtfQD2G8wG68JCND1j3
 qbMoBfTLFV/yQmSIJUt0Wv1axaCcwnjpweEB35A/GEeZ0mNB1rDdoBeI1eKEQkc=
 =DGFC
 -----END PGP SIGNATURE-----

Backmerge tag 'v4.7' into drm-next

Linux 4.7

As requested by Daniel Vetter as the conflicts were getting messy.
2016-07-26 17:26:29 +10:00
Alex Vesker
9b022a6e0f net/mlx4_core: Check device state before unregistering it
Verify that the device state is registered before un-registering it.
This check is required to prevent an OOPS on flows that do
re-registration of the device and its previous state was
unregistered.

Fixes: 225c7b1fee ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25 18:00:25 -07:00
Ido Schimmel
86cb13e4ec mlxsw: spectrum: Fix compilation error when CLS_ACT isn't set
When CONFIG_NET_CLS_ACT isn't set 'struct tcf_exts' has no member named
'actions' and we therefore must not access it. Otherwise compilation
fails.

Fix this by introducing a new macro similar to tc_no_actions(), which
always returns 'false' if CONFIG_NET_CLS_ACT isn't set.

Fixes: 763b4b70af ("mlxsw: spectrum: Add support in matchall mirror TC offloading")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25 17:57:33 -07:00
Hadar Hen Zion
cff92d7c7e net/mlx5e: Query minimum required header copy during xmit
Add support for query the minimum inline mode from the Firmware.
It is required for correct TX steering according to L3/L4 packet
headers.

Each send queue (SQ) has inline mode that defines the minimal required
headers that needs to be copied into the SQ WQE.
The driver asks the Firmware for the wqe_inline_mode device capability
value.  In case the device capability defined as "vport context" the
driver must check the reported min inline mode from the vport context
before creating its SQs.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25 17:53:40 -07:00
Hadar Hen Zion
ae76715d15 net/mlx5e: Check the minimum inline header mode before xmit
Each send queue (SQ) has inline mode that defines the minimal required
inline headers in the SQ WQE.
Before sending each packet check that the minimum required headers
on the WQE are copied.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25 17:53:40 -07:00
Yotam Gigi
763b4b70af mlxsw: spectrum: Add support in matchall mirror TC offloading
This patch offloads port mirroring directives to hw using the matchall TC
with action mirror. It includes both the implementation of the
ndo_setup_tc function for the spectrum driver and the spectrum hardware
offload configuration code.

The hardware offload code is basically two new functions which are capable
of adding and removing a new mirror ports pair. It is done using the MPAT,
MPAR and SBIB registers:
 - A new Switch-Port Analyzer (SPAN) entry is added using MPAT to the 'to'
   port.
 - The 'to' port is bound to the SPAN entry using MPAR register.
 - In case of egress SPAN, the 'to' port gets a new internal shared
   buffer using SBIB register.

In addition, a new database was added to the mlxsw_sp struct to store all
the SPAN entries and their bound ports list. The number of supported SPAN
entries is determined by resource query.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 23:12:00 -07:00
Yotam Gigi
230190548b mlxsw: reg: Add the Monitoring Port Analyzer register
The MPAR register is used to bind ports to a SPAN entry (which was
created using MPAT register) and thus mirror their traffic (ingress /
egress) to a different port.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 23:11:59 -07:00
Yotam Gigi
43a4685620 mlxsw: reg: Add Monitoring Port Analyzer Table register
The MPAT register is used to query and configure the Switch Port Analyzer
(SPAN) table. This register is used to configure a port as a mirror output
port, while after that a mirrored input port can be bound using MPAR
register.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 23:11:59 -07:00
Yotam Gigi
51ae8cc662 mlxsw: reg: Add Shared Buffer Internal Buffer register
The SBIB register configures per port buffer for internal use. This
register is used to configure an egress mirror buffer on the egress port
which does the mirroring.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 23:11:59 -07:00
Nogah Frankel
ded821c8d3 mlxsw: pci: Add max span resources to resources query
Add max span resources to resources query.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 23:11:59 -07:00
Nogah Frankel
57d316ba20 mlxsw: pci: Add resources query implementation.
Add resources query implementation. If exists, query the HW for its
builtin resources instead of having them as consts in the code.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 23:11:58 -07:00
David S. Miller
de0ba9a0d8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Just several instances of overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 00:53:32 -04:00
Brenden Blanco
cb7386d37e net/mlx4_en: use READ_ONCE when freeing xdp_prog
For consistency, and in order to hint at the synchronous nature of the
xdp_prog field, use READ_ONCE in the destroy path of the ring. All
occurrences should now use either READ_ONCE or xchg.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20 22:07:23 -07:00
Saeed Mahameed
882b0f2fba net/mlx5e: Fix del vxlan port command buffer memset
memset the command buffers rather than the pointers to them.

Fixes: b3f63c3d5e ("net/mlx5e: Add netdev support for VXLAN tunneling")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20 15:29:50 -07:00
Ido Schimmel
df4750e84e mlxsw: spectrum: Expose per-tc counters via ethtool
Expose the transmit queue length of each traffic class and the amount of
unicast packets discarded due to insufficient room in the shared buffer.

The first counter allows us to debug user priority to traffic class
mapping, whereas the drop counter is useful when determining shared buffer
configuration.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20 14:53:56 -07:00
Ido Schimmel
7ed674bc3c mlxsw: spectrum: Expose per-priority counters via ethtool
Expose per-priority bytes / packets / PFC packets counters via ethtool.

These counters are very useful when debugging QoS functionality and
provide a better insight into the device's forwarding plane.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20 14:53:56 -07:00
Wei Yongjun
44fafdaa75 net/mlx5: Use PTR_ERR_OR_ZERO() to simplify the code
Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR.

Generated by: scripts/coccinelle/api/ptr_ret.cocci

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20 14:46:00 -07:00
Brenden Blanco
9ecc2d8617 net/mlx4_en: add xdp forwarding and data write support
A user will now be able to loop packets back out of the same port using
a bpf program attached to xdp hook. Updates to the packet contents from
the bpf program is also supported.

For the packet write feature to work, the rx buffers are now mapped as
bidirectional when the page is allocated. This occurs only when the xdp
hook is active.

When the program returns a TX action, enqueue the packet directly to a
dedicated tx ring, so as to avoid completely any locking. This requires
the tx ring to be allocated 1:1 for each rx ring, as well as the tx
completion running in the same softirq.

Upon tx completion, this dedicated tx ring recycles pages without
unmapping directly back to the original rx ring. In steady state tx/drop
workload, effectively 0 page allocs/frees will occur.

In order to separate out the paths between free and recycle, a
free_tx_desc func pointer is introduced that is optionally updated
whenever recycle_ring is activated. By default the original free
function is always initialized.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:33 -07:00
Brenden Blanco
224e92e02a net/mlx4_en: break out tx_desc write into separate function
In preparation for writing the tx descriptor from multiple functions,
create a helper for both normal and blueflame access.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:33 -07:00
Brenden Blanco
d576acf0a2 net/mlx4_en: add page recycle to prepare rx ring for tx support
The mlx4 driver by default allocates order-3 pages for the ring to
consume in multiple fragments. When the device has an xdp program, this
behavior will prevent tx actions since the page must be re-mapped in
TODEVICE mode, which cannot be done if the page is still shared.

Start by making the allocator configurable based on whether xdp is
running, such that order-0 pages are always used and never shared.

Since this will stress the page allocator, add a simple page cache to
each rx ring. Pages in the cache are left dma-mapped, and in drop-only
stress tests the page allocator is eliminated from the perf report.

Note that setting an xdp program will now require the rings to be
reconfigured.

Before:
 26.91%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_process_rx_cq
 17.88%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_alloc_frags
  6.00%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_free_frag
  4.49%  ksoftirqd/0  [kernel.vmlinux]  [k] get_page_from_freelist
  3.21%  swapper      [kernel.vmlinux]  [k] intel_idle
  2.73%  ksoftirqd/0  [kernel.vmlinux]  [k] bpf_map_lookup_elem
  2.57%  swapper      [mlx4_en]         [k] mlx4_en_process_rx_cq

After:
 31.72%  swapper      [kernel.vmlinux]       [k] intel_idle
  8.79%  swapper      [mlx4_en]              [k] mlx4_en_process_rx_cq
  7.54%  swapper      [kernel.vmlinux]       [k] poll_idle
  6.36%  swapper      [mlx4_core]            [k] mlx4_eq_int
  4.21%  swapper      [kernel.vmlinux]       [k] tasklet_action
  4.03%  swapper      [kernel.vmlinux]       [k] cpuidle_enter_state
  3.43%  swapper      [mlx4_en]              [k] mlx4_en_prepare_rx_desc
  2.18%  swapper      [kernel.vmlinux]       [k] native_irq_return_iret
  1.37%  swapper      [kernel.vmlinux]       [k] menu_select
  1.09%  swapper      [kernel.vmlinux]       [k] bpf_map_lookup_elem

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:32 -07:00
Brenden Blanco
47a38e1550 net/mlx4_en: add support for fast rx drop bpf program
Add support for the BPF_PROG_TYPE_XDP hook in mlx4 driver.

In tc/socket bpf programs, helpers linearize skb fragments as needed
when the program touches the packet data. However, in the pursuit of
speed, XDP programs will not be allowed to use these slower functions,
especially if it involves allocating an skb.

Therefore, disallow MTU settings that would produce a multi-fragment
packet that XDP programs would fail to access. Future enhancements could
be done to increase the allowable MTU.

The xdp program is present as a per-ring data structure, but as of yet
it is not possible to set at that granularity through any ndo.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:32 -07:00
Eugenia Emantayev
ec25bc04ed net/mlx4_en: Add resilience in low memory systems
This patch fixes the lost of Ethernet port on low memory system,
when driver frees its resources and fails to allocate new resources.
Issue could happen while changing number of channels, rings size or
changing the timestamp configuration.
This fix is necessary because of removing vmap use in the code.
When vmap was in use driver could allocate non-contiguous memory
and make it contiguous with vmap. Now it could fail to allocate
a large chunk of contiguous memory and lose the port.
Current code tries to allocate new resources and then upon success
frees the old resources.

Fixes: 73898db043 ('net/mlx4: Avoid wrong virtual mappings')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 16:44:11 -07:00
Eugenia Emantayev
30f56e3ced net/mlx4_en: Move filters cleanup to a proper location
Filters cleanup should be done once before destroying net device,
since filters list is contained in the private data.

Fixes: 1eb8c695bd ('net/mlx4_en: Add accelerated RFS support')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 16:44:11 -07:00
Ido Schimmel
11719a58bd mlxsw: spectrum: Prevent invalid ingress buffer mapping
Packets entering the switch are mapped to a Switch Priority (SP)
according to their PCP value (untagged frames are mapped to SP 0).

The packets are classified to a priority group (PG) buffer in the port's
headroom according to their SP.

The switch maintains another mapping (SP to IEEE priority), which is
used to generate PFC frames for lossless PGs. This mapping is
initialized to IEEE = SP % 8.

Therefore, when mapping SP 'x' to PG 'y' we create a situation in which
an IEEE priority is mapped to two different PGs:

IEEE 'x' ---> SP 'x' ---> PG 'y'
IEEE 'x' ---> SP 'x + 8' ---> PG '0' (default)

Which is invalid, as a flow can use only one PG buffer.

Fix this by mapping both SP 'x' and 'x + 8' to the same PG buffer.

Fixes: 8e8dfe9fdf ("mlxsw: spectrum: Add IEEE 802.1Qaz ETS support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 14:49:51 -07:00
Ido Schimmel
28f5275e4a mlxsw: spectrum: Prevent overwrite of DCB capability fields
The number of supported traffic classes that can have ETS and PFC
simultaneously enabled is not subject to user configuration, so make
sure we always initialize them to the correct values following a set
operation.

Fixes: 8e8dfe9fdf ("mlxsw: spectrum: Add IEEE 802.1Qaz ETS support")
Fixes: d81a6bdb87 ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 14:49:51 -07:00
Ido Schimmel
7347180dca mlxsw: spectrum: Don't emit errors when PFC is disabled
We can't have PAUSE frames and PFC both enabled on the same port, but
the fact that ieee_setpfc() was called doesn't necessarily mean PFC is
enabled.

Only emit errors when PAUSE frames and PFC are enabled simultaneously.

Fixes: d81a6bdb87 ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 14:49:51 -07:00
Ido Schimmel
c3f1576810 mlxsw: spectrum: Indicate support for autonegotiation
The device supports link autonegotiation, so let the user know about it
by indicating support via ethtool ops.

Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 14:49:50 -07:00
Ido Schimmel
6277d46b10 mlxsw: spectrum: Force link training according to admin state
When setting a new speed we need to disable and enable the port for the
changes to take effect. We currently only do that if the operational
state of the port is up. However, setting a new speed following link
training failure will require us to explicitly set the port down and then
up.

Instead, disable and enable the port based on its administrative state.

Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 14:49:50 -07:00
Christophe Jaillet
a1e3e7372c mlxsw: spectrum_router: Return -ENOENT in case of error
'vr' should be a valid pointer here, so returning 'PTR_ERR(vr)' is wrong.
Return an explicit error code (-ENOENT) instead.

Fixes: 61c503f976 ("mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 22:14:54 -07:00
Or Gerlitz
d957b4e383 net/mlx5e: Add TC offload support for the VF representors netdevice
The VF representors support only TC filter/action offloads
(not mqprio) and this is enabled for them by default.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 13:34:29 -07:00
Or Gerlitz
adb4c123f8 net/mlx5e: Add TC HW support for FDB (SRIOV e-switch) offloads
Enhance the TC offload code such that when the eswitch exists and it's
mode being SRIOV offloads, we do TC actions parsing and setup targeted
for eswitch. Next, we add the offloaded flow to the HW e-switch (fdb).

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 13:34:29 -07:00
Or Gerlitz
03a9d11e6e net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads
Add the setup code that parses the TC actions needed to support offloading drop
and mirred/redirect for SRIOV e-switch. We can redirect between two devices if
they belong to the same HW switch, compare the switchdev HW ID attribute to
enforce that.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 13:34:29 -07:00
Or Gerlitz
5c40348c69 net/mlx5e: Adjustments in the TC offload code towards reuse for SRIOV
Towards reusing the TC offloads code for an SRIOV use-case, change some of the
helper functions to have _nic in their names so it's clear what's NIC unique
and what's general. Also group together the NIC related helpers so we can easily
branch per the use-case in downstream patch.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 13:34:29 -07:00
Or Gerlitz
3d80d1a2f5 net/mlx5: E-Switch, Add API to configure rules for the offloaded mode
This allows for upper levels in the driver, e.g the TC offload code to add
e-switch offloaded steering rules. The caller provides the rule spec for
matching, action, source and destination vports.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 13:34:28 -07:00
Or Gerlitz
1033665e63 net/mlx5: E-Switch, Use two priorities for SRIOV offloads mode
In the offloads mode, some slow path rules are added by the driver (e.g
send-to-vport), while offloaded rules are to be added from upper layers.

The slow path rules have lower priority and we don't want matching on
offloaded rules to suffer from extra steering hops related to the slow
path rules.

We use two priorities, one for offloaded rules (fast path), and one for
the control rules (slow path). To allow for that, we enable two priorities
for the FDB namespace in the FS core code.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 13:34:28 -07:00
Or Gerlitz
5513028787 net/mlx5e: Offload TC flow counters only when supported
Currenly, the code that programs the flow actions into the firmware
doesn't check if was actually asked to offload the statistics, fix that.

Fixes: aad7e08d39 ('net/mlx5e: Hardware offloaded flower filter statistics support')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 13:34:28 -07:00
Amir Vadai
a351a1b03b net/mlx5: Introduce bulk reading of flow counters
This commit utilize the ability of ConnectX-4 to bulk read flow counters.
Few bulk counter queries could be done instead of issuing thousands of
firmware commands per second to get statistics of all flows set to HW,
such as those programmed when we offload tc filters.

Counters are stored sorted by hardware id, and queried in blocks (id +
number of counters).

Due to hardware requirement, start of block and number of counters in a
block must be four aligned.

Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 13:34:28 -07:00
Amir Vadai
29cc667907 net/mlx5: Store counters in rbtree instead of list
In order to use bulk counters, we need to have counters sorted by id.

Signed-off-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-14 13:34:28 -07:00
Mohamad Haj Yahia
c3b7c5c950 net/mlx5e: start/stop all tx queues upon open/close netdev
Start all tx queues (including inactive ones) when opening the netdev.
Stop all tx queues (including inactive ones) when closing the netdev.

This is a workaround for the tx timeout watchdog false alarm issue in
which the netdev watchdog is polling all the tx queues which may include
inactive queues and thus once lowering the real tx queues number
(ethtool -L) it will generate tx timeout watchdog false alarms.

Fixes: 3947ca1859 ('net/mlx5e: Implement ndo_tx_timeout callback')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13 11:38:16 -07:00
Daniel Jurgens
2c1ccc9937 net/mlx5e: Fix TX Timeout to detect queues stuck on BQL
Change netif_tx_queue_stopped to netif_xmit_stopped.  This will show
when queues are stopped due to byte queue limits.

Fixes: 3947ca1859 ('net/mlx5e: Implement ndo_tx_timeout callback')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13 11:38:16 -07:00
Jiri Pirko
b38a75d2d3 mlxsw: core: Trace EMAD messages
Trace EMAD messages going down to HW and up from HW. Devlink needs to be
registered before EMAD init so the trace function can be called
with valid devlink handle.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
v1->v2:
- Use trace_devlink_hwmsg directly
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-12 14:20:18 -07:00
David S. Miller
30d0844bdc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/mellanox/mlx5/core/en.h
	drivers/net/ethernet/mellanox/mlx5/core/en_main.c
	drivers/net/usb/r8152.c

All three conflicts were overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-06 10:35:22 -07:00
Or Gerlitz
eae033c1b8 net/mlx5: Avoid setting unused var when modifying vport node GUID
GCC complains on unused-but-set-variable, clean this up.

Fixes: 23898c763f ('net/mlx5: E-Switch, Modify node guid on vf set MAC')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 11:52:42 -07:00
Yotam Gigi
0b2361d9d9 mlxsw: Add the unresolved next-hops probes
Now, the driver sends arp probes for all unresolved neighbours that are
currently a nexthop for some route on the system. The job is set
periodically every 5 seconds.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:31 -07:00
Yotam Gigi
b2157149b0 mlxsw: spectrum_router: Add the nexthop neigh activity update
For nexthop neighbours we need to make kernel to think there is a traffic
flowing to them preventing it from going to stale state. Otherwise
kernel would stale it and eventually the neigh would be removed from HW
and nexthop as well. That would reduce ECMP group in HW.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:30 -07:00
Jiri Pirko
a7ff87acd9 mlxsw: spectrum_router: Implement next-hop routing
Implement next-hop routing offload including ECMP. To make it possible,
introduce next-hop group entity. This entity keeps track of resolved
neighbours and updates HW adjacency table accordingly. Note that HW
next-hops are stored in this adjacency table, in form of MAC.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:30 -07:00
Jiri Pirko
a59f0b312a mlxsw: reg: Add Router Algorithmic LPM ECMP Update Register
The RALEU register is used to mass update remote action adjacency index
and ecmp size.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:30 -07:00
Yotam Gigi
089f981683 mlxsw: reg: Add Router Adjacency Table register
The RATR register is used to configure the Router Adjacency (next-hop)
Table.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:30 -07:00
Jiri Pirko
b090ef0686 mlxsw: Introduce simplistic KVD linear area manager
This is a very simple manager for KVD linear area. Currently, the
allocator will either allocate a single entry from pre-defined sub-area,
or in case more than one entry is needed, it will allocate 32-entry chunk
in other pre-defined sub-area.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:30 -07:00
Jiri Pirko
c602242761 mlxsw: spectrum: Define sizes of KVD areas
Override the defaults and define the area sizes ourselves.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:30 -07:00
Jiri Pirko
489107bda1 mlxsw: Add KVD sizes configuration into profile
Up until now we only used hash-based tables in the device, but we are
going to use the linear table for remote routes adjacency lists.

Add the configuration fields that control the size of the linear table.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:29 -07:00
Yotam Gigi
a6bf9e933d mlxsw: spectrum_router: Offload neighbours based on NUD state change
Listen to any NEIGH_UPDATE events sent and program the device
accordingly. If NUD state is VALID and neighbour isn't yet offloaded,
then program it into the device's table. Otherwise, just edit its
parameters.

If NUD state machine transitioned neighbour out of VALID state and it's
present in the device's table, then remove it.

Note that the device is programmed in delayed work, as the netevent
notification chain is atomic and prevents us from going to sleep.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:29 -07:00
Yotam Gigi
c723c735fa mlxsw: spectrum_router: Periodically update the kernel's neigh table
As previously explained, the driver should periodically poll the device
for neighbours activity according to the configured DELAY_PROBE_TIME.
This will prevent active neighbours from staying in STALE state for long
periods of time.

During init configure the polling interval according to the
DELAY_PROBE_TIME used in the default table. In addition, register a
netevent notification block, so that the interval is updated whenever
DELAY_PROBE_TIME changes.

Using the computed interval schedule a delayed work, which will update
the kernel via neigh_event_send() on any active neighbour since the last
delayed work.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:29 -07:00
Yotam Gigi
7cf2c205d7 mlxsw: reg: Add Router Algorithmic LPM Unicast Host Table Dump register
The RAUHTD register allows dumping entries from the Router Unicast Host
Table.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:29 -07:00
Yotam Gigi
4457b3df3f mlxsw: reg: Add Router Algorithmic LPM Unicast Host Table register
The RAUHT register is used to configure and query the Unicast Host Table
in devices that implement the Algorithmic LPM. In other words, it is
used to configure neighbour entries in the device.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:28 -07:00
Jiri Pirko
6cf3c971dc mlxsw: spectrum_router: Add private neigh table
We need to hold some private data for every neigh entry. It would be
possible to do it using neigh_priv_len/ndo_neigh_construct/
ndo_neigh_destroy however only for the port device itself. That would not
work for stacked devices like bridge/team/bond. So introduce a private
neigh table. Hook onto ndos neigh_construct/destroy and add/remove
table entry according to that.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 09:06:28 -07:00
Gal Pressman
e989d5a532 net/mlx5e: Expose flow control counters to ethtool
Just like per prio counters, the global flow counters are queried from
per priority counters register.
Global flow control counters are stored in priority 0 PFC counters.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 00:06:03 -07:00
Gal Pressman
fe6b9bd9eb net/mlx5e: Expose RDMA VPort counters to ethtool
Add the needed descriptors to expose RoCE RDMA counters.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 00:06:03 -07:00
Maor Gottlieb
f913a72aa0 net/mlx5e: Add support to get ethtool flow rules
Enhance the existing get_rxnfc callback:
1. Get flow rule of specific ID.
2. Get all flow rules.
3. Get number of rules.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 00:06:03 -07:00
Maor Gottlieb
1174fce8d1 net/mlx5e: Support l3/l4 flow type specs in ethtool flow steering
Add support to add flow steering rules with ethtool
of L3/L4 flow types (ip4/tcp4/udp4).
Those rules will be in higher priority than l2 flow rules, in order
to prefer more specific rules.

Mask is not supported for l3/l4 flow types.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 00:06:02 -07:00
Maor Gottlieb
6dc6071cfc net/mlx5e: Add ethtool flow steering support
Implement etrhtool set_rxnfc callback to support ethtool flow spec
direct steering. This patch adds only the support of ether flow type
spec. L3/L4 flow specs support will be added in downstream patches.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 00:06:02 -07:00
Maor Gottlieb
0da2d66666 net/mlx5: Properly remove all steering objects
Instead of explicitly cleaning up the well known parts of the steering
tree, we use the generic tree structure to traverse for cleanup.
No functional changes.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 00:06:02 -07:00
Maor Gottlieb
fba53f7b57 net/mlx5: Introduce mlx5_flow_steering structure
Instead of having all steering private name spaces and
steering module fields flat in mlx5_core_priv, we wrap
them in mlx5_flow_steering for better modularity and
API exposure.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 00:06:02 -07:00
Maor Gottlieb
c5bb17302e net/mlx5: Refactor mlx5_add_flow_rule
Reduce the set of arguments passed to mlx5_add_flow_rule
by introducing flow_spec structure.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-05 00:06:02 -07:00
Ido Schimmel
99f44bb352 mlxsw: spectrum: Enable L3 interfaces on top of bridge devices
As with the previously introduced L3 interfaces, listen to 'inetaddr'
notifications sent for bridges devices configured on top of the port
netdevs and create / destroy router interfaces (RIFs) accordingly.
This also includes VLAN devices configured on top of the VLAN-aware
bridge.

The RIFs will be destroyed either when the last IP address is removed or
when the underlying FID is is destroyed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 18:25:16 -07:00
Ido Schimmel
701b186ebf mlxsw: spectrum: Configure FIDs based on bridge events
Before introducing support for L3 interfaces on top of the VLAN-aware
bridge we need to add some missing infrastructure.

Such an interface can either be the bridge device itself or a VLAN
device on top of it. In the first case the router interface (RIF) is
associated with FID 1, which is created whenever the first port netdev
joins the bridge. We currently assume the default PVID is 1 and that
it's already created, as it seems reasonable. This can be extended in
the future.

However, in the second case it's entirely possible we've yet to create a
matching FID. This can happen if the VLAN device was configured before
making any bridge port member in the VLAN.

Prevent such ordering problems by using the VLAN device's CHANGEUPPER
event to configure the FID. Make the VLAN device hold a reference to the
FID and prevent it from being destroyed even if none of the port netdevs
is using it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 18:25:16 -07:00
Ido Schimmel
3ba2ebf4a2 mlxsw: spectrum: Unsplit the vFID range
Previous commit deprecated the vFIDs used to get traffic to the CPU
('port_vfids'). Thus, we now use the vFIDs as god intended and the
artificial split is no longer needed.

Rename functions and variables to reflect that.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 18:25:15 -07:00
Ido Schimmel
99724c18fc mlxsw: spectrum: Introduce support for router interfaces
Up until now we only supported bridged interfaces. Packets ingressing
through the switch ports were either classified to FIDs (in the case of
the VLAN-aware bridge) or vFIDs (in the case of VLAN-unaware bridges).
The packets were then forwarded according to the FDB. Routing was done
entirely in slowpath, by splitting the vFID range in two and using the
lower 0.5K vFIDs as dummy bridges that simply flooded all incoming
traffic to the CPU.

Instead, allow packets to be routed in the device by creating router
interfaces (RIFs) that will direct them to the router block.
Specifically, the RIFs introduced here are Sub-port RIFs used for VLAN
devices and port netdevs. Packets ingressing from the {Port / LAG ID, VID}
with which the RIF was programmed with will be assigned to a special
kind of FIDs called rFIDs and from there directed to the router.

Create a RIF whenever the first IPv4 address was programmed on a VLAN /
LAG / port netdev. Destroy it upon removal of the last IPv4 address.
Receive these notifications by registering for the 'inetaddr'
notification chain. A non-zero (10) priority is used for the
notification block, so that RIFs will be created before routes are
offloaded via FIB code.

Note that another trigger for RIF destruction are CHANGEUPPER
notifications causing the underlying FID's reference count to go down to
zero. This can happen, for example, when a VLAN netdev with an IP address
is put under bridge. While this configuration doesn't make sense it does
cause the device and the kernel to get out of sync when the netdev is
unbridged. We intend to address this in the future, hopefully in current
cycle.

Finally, Remove the lower 0.5K vFIDs, as they are deprecated by the RIFs,
which will trap packets according to their DIP.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 18:25:15 -07:00
Ido Schimmel
6e095fd4eb mlxsw: spectrum: Edit RIF properties based on netdev events
We are just about to introduce router interfaces (RIFs), but before that
we need to be able update the device with the correct RIF attributes
whenever they change for the netdev the RIF is backing. Two such
attributes are MTU and MAC.

The MAC is used both to set the source MAC of packets egressing from the
RIF and also to program an FDB rule that will direct packets to the
router block.

Use the existing netdevice notification block and respond to CHANGEADDR
and CHANGEMTU accordingly. Store both attributes in the RIF struct
in case we need to revert to old attributes following a failed update.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 18:25:15 -07:00
Jiri Pirko
7ce856aaaf mlxsw: spectrum: Add couple of lower device helper functions
Add functions that iterate over lower devices and find port device.
As a dependency add netdev_for_each_all_lower_dev and
netdev_for_each_all_lower_dev_rcu macro with
netdev_all_lower_get_next and netdev_all_lower_get_next_rcu shelpers.

Also, add functions to return mlxsw struct according to lower device
found and mlxsw_port struct with a reference to lower device.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 18:25:15 -07:00
Jiri Pirko
61c503f976 mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops
Implement ipv4 FIB entries addition and removal. Initially, we support
local and broadcast routes using "ip2me" trap action.
Also, unicast routes without nexthop are supported using "local" action.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 18:25:15 -07:00
Jiri Pirko
d5a1c749d2 mlxsw: reg: Add Router Algorithmic LPM Unicast Entry Register definition
Serves for adding, updating and removing fib entries.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 18:25:15 -07:00
Jiri Pirko
6b75c4807d mlxsw: spectrum_router: Add virtual router management
Virtual router is a construct used inside HW. In this implementation
we map kernel tables to virtual routers one to one. Introduce management
logic to create virtual routers when needed and destroy in case they are
no longer in use. According to that, call into LPM tree management.
Each virtual router is always bound to one LPM tree.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 18:25:14 -07:00
Jiri Pirko
53342023ee mlxsw: spectrum_router: Implement LPM trees management
Introduce basic LPM tree management allowing to share the trees in
between tables if the used prefixes in the tables are the same.
Build the tree structure according to the used prefixes. Although it is
not optimal for many use cases, this initial implementation does only
simple linear left-tree. More advanced structures will be introduced
later on, possibly including mechanisms to change trees on the fly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 18:25:14 -07:00
Jiri Pirko
20ae4053e9 mlxsw: reg: Add Router Algorithmic LPM Tree Binding Register definition
This register is used to bind virtual router and protocol to an
allocated LPM tree.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 18:25:14 -07:00
Jiri Pirko
a9823359c6 mlxsw: reg: Add Router Algorithmic LPM Structure Tree Register definition
Serves to build LPM tree structure.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 18:25:14 -07:00
Jiri Pirko
6f9fc3cee4 mlxsw: reg: Add Router Algorithmic LPM Tree Allocation Register definition
Register serves for allocation and deallocation of LPM search tree.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 18:25:14 -07:00
Jiri Pirko
5e9c16cc83 mlxsw: spectrum_router: Implement private fib
Shadow FIB is needed in order to hold additional information for FIB
entries and keep track of used prefixes. That is needed for the LPM tree
construction to be introduced later on in this set.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 18:25:14 -07:00
Christophe Jaillet
5d4de16c6d net/mlx4: Fix some indent inconsistancy
Silent a few smatch warnings about indentation.
This include the removal of a 'return' statement in 'resource_tracker.c'.
This 'return' will still be performed when breaking out of the
corresponding 'switch' block.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 15:22:33 -07:00
Jiri Pirko
7b27ce7bb9 mlxsw: spectrum: Add traps needed for router implementation
ip2me:
To instruct HW to send trapped ip2me traffic to kernel, we have to add
this trap. Selection ip2me traffic is introduced later on in this set.

ARPs:
We are going to stop flooding to CPU port when netdev isn't bridged and
only get packets destined to the netdev's IP address and certain control
packets.

Add traps for ARP request (broadcast) and response (unicast) in order to
get these to the CPU and resolve neighbours.

host miss:
If a packet is routed through a directly connected route and its
destination IP is not in the device's neighbour table, then we need to
trap it to CPU. This will cause the host to resolve the MAC of the
neighbour, which will be eventually programmed to the device's table.

router ingress:
In order to trap packets in router part.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02 15:21:18 -04:00
Ido Schimmel
10f00aa1c4 mlxsw: spectrum: Use action 'discard' when removing traps
When removing packet traps we should use action 'discard' instead of
'forward', as some trap IDs we'll add cannot be configured with the
later. However, result is the same, as packets are not trapped to the
CPU.

In the future we will be able to reverse the operation properly by
detaching the trap group from the CPU.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02 15:21:18 -04:00
Ido Schimmel
3dc266896d mlxsw: reg: Add Router Interface Table Register
Add the Router Interface Table Register (RITR), which allows us to
create and configure router interfaces (RIFs).

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02 15:21:18 -04:00
Ido Schimmel
d82d8c060f mlxsw: reg: Add FDB action to forward to router
Incoming packets are directed to the router when they match an FDB
entry with action forward to IP router.

Add this action, which was mistakenly named "TRAP".

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02 15:21:17 -04:00
Ido Schimmel
fa3054f5a8 mlxsw: spectrum: Add router interface struct
When enabling the router in the device we will represent L3 netdevs
using router interfaces (RIFs). These will be specified whenever
programming routes or neighbours on the netdev.

Introduce the basic RIF infrastructure which allows one to lookup a RIF
by its netdev. Later patches in the series will extend this, but the
basic routines are needed now in order to direct traffic to CPU.

Pointers to the RIF structs are stored in an array indexed by the RIF's
number. This will allow us to efficiently update the kernel's neighbour
table when regularly dumping the device's table.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02 15:21:17 -04:00
Ido Schimmel
464dce1884 mlxsw: spectrum_router: Add basic ipv4 router initialization
Create a skeleton router file and do basic HW initialization of router.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02 15:21:17 -04:00
Ido Schimmel
bbf2a4757b mlxsw: spectrum: Initialize ports at the end of init sequence
During ports initialization a net device is registered for each
available port, which implies the port is usable. However, a port is
only usable after the different parts of the device (e.g. flooding,
buffers) are initialized. This is especially important now, when we must
initialize the router before the ports, as otherwise the device can't be
initialized.

Solve that by initializing the switch ports at the end of init sequence.

Also, remove an unnecessary warning about port up/down events, which
would otherwise be invoked whenever removing the driver, as ports are
removed before unregistering the listener for these events.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02 15:21:17 -04:00
Ido Schimmel
69c407aaf9 mlxsw: reg: Add Router General Configuration Register
Add the Router General Configuration Register (RGCR), which allows us to
enable the router in the device and configure its various parameters.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02 15:21:17 -04:00