Commit Graph

799803 Commits

Author SHA1 Message Date
Heiner Kallweit
e782410ed2 r8169: improve spurious interrupt detection
Improve detection of spurious interrupts by checking against the
interrupt mask as currently set in the chip.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 11:22:48 -08:00
Yangtao Li
b09026c691 cxgb4: remove DEFINE_SIMPLE_DEBUGFS_FILE()
We already have the DEFINE_SHOW_ATTRIBUTE. There is no need to define
such a macro, so remove DEFINE_SIMPLE_DEBUGFS_FILE. Also use the
DEFINE_SHOW_ATTRIBUTE macro to simplify some code.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 11:21:22 -08:00
Yangtao Li
70f98d7c7d ipconfig: convert to DEFINE_SHOW_ATTRIBUTE
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 11:21:22 -08:00
David S. Miller
a6b981079c Merge branch 'hns3-Add-more-commands-to-Debugfs-in-HNS3-driver'
Salil Mehta says:

====================
net: hns3: Add more commands to Debugfs in HNS3 driver

This patch-set adds few more debugfs commands to HNS3 Ethernet
Driver. Support has been added to query info related to below
items:
1. Packet buffer descriptor ("echo bd info [queue no] [bd index] > cmd")
2. Manager table("echo dump mng tbl > cmd")
3. Dfx status register("echo dump reg ssu [prt id] > cmd")
4. Dcb status register("echo dump reg dcb [port id] > cmd")
5. Queue map ("echo queue map [queue no] > cmd")
6. Tm map ("echo tm map [queue no] > cmd")

NOTE: Above commands are *read-only* and are only intended to
query the information from the SoC(and dump inside the kernel,
for now) and in no way tries to perform write operations for
the purpose of configuration etc.

Change Log:
V1-->V2:
1. Addressed the GCC-8.2 compiler issue reported by David S. Miller.
   Link: https://lkml.org/lkml/2018/12/14/1298
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 10:54:18 -08:00
liuzhongzhu
82e00b86a5 net: hns3: Add "tm map" status information query function
This patch prints dcb register status  information by module.

debugfs command:
root@(none)# echo dump tm map 100 > cmd
queue_id | qset_id | pri_id | tc_id
0100     | 0065    | 08     | 00
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 10:54:18 -08:00
liuzhongzhu
0c29d1912b net: hns3: Add "queue map" information query function
This patch prints queue map information.

debugfs command:
echo dump queue map > cmd

Sample Command:
root@(none)# echo queue map > cmd
 local queue id | global queue id | vector id
          0              32             769
          1              33             770
          2              34             771
          3              35             772
          4              36             773
          5              37             774
          6              38             775
          7              39             776
          8              40             777
          9              41             778
         10              42             779
         11              43             780
         12              44             781
         13              45             782
         14              46             783
         15              47             784
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 10:54:18 -08:00
liuzhongzhu
c0ebebb9cc net: hns3: Add "dcb register" status information query function
This patch prints dcb register status  information by module.

debugfs command:
root@(none)# echo dump reg dcb > cmd
 roce_qset_mask: 0x0
 nic_qs_mask: 0x0
 qs_shaping_pass: 0x0
 qs_bp_sts: 0x0
 pri_mask: 0x0
 pri_cshaping_pass: 0x0
 pri_pshaping_pass: 0x0
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 10:54:18 -08:00
liuzhongzhu
27cf979a15 net: hns3: Add "status register" information query function
This patch prints status register information by module.

debugfs command:
echo dump reg [mode name] > cmd

Sample Command:
root@(none)# echo dump reg bios common > cmd
 BP_CPU_STATE: 0x0
 DFX_MSIX_INFO_NIC_0: 0xc000
 DFX_MSIX_INFO_NIC_1: 0xf
 DFX_MSIX_INFO_NIC_2: 0x2
 DFX_MSIX_INFO_NIC_3: 0x2
 DFX_MSIX_INFO_ROC_0: 0xc000
 DFX_MSIX_INFO_ROC_1: 0x0
 DFX_MSIX_INFO_ROC_2: 0x0
 DFX_MSIX_INFO_ROC_3: 0x0
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 10:54:18 -08:00
liuzhongzhu
7737f1fbb5 net: hns3: Add "manager table" information query function
This patch prints manager table information.

debugfs command:
echo dump mng tbl > cmd

Sample Command:
root@(none)# echo dump mng tbl > cmd
 entry|mac_addr         |mask|ether|mask|vlan|mask|i_map|i_dir|e_type
 00   |01:00:5e:00:00:01|0   |00000|0   |0000|0   |00   |00   |0
 01   |c2:f1:c5:82:68:17|0   |00000|0   |0000|0   |00   |00   |0
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 10:54:18 -08:00
liuzhongzhu
122bedc56a net: hns3: Add "bd info" query function
This patch prints Sending and receiving
package descriptor information.

debugfs command:
echo dump bd info 1 > cmd

Sample Command:
root@(none)# echo bd info 1 > cmd
hns3 0000:7d:00.0: TX Queue Num: 0, BD Index: 0
hns3 0000:7d:00.0: (TX) addr: 0x0
hns3 0000:7d:00.0: (TX)vlan_tag: 0
hns3 0000:7d:00.0: (TX)send_size: 0
hns3 0000:7d:00.0: (TX)vlan_tso: 0
hns3 0000:7d:00.0: (TX)l2_len: 0
hns3 0000:7d:00.0: (TX)l3_len: 0
hns3 0000:7d:00.0: (TX)l4_len: 0
hns3 0000:7d:00.0: (TX)vlan_tag: 0
hns3 0000:7d:00.0: (TX)tv: 0
hns3 0000:7d:00.0: (TX)vlan_msec: 0
hns3 0000:7d:00.0: (TX)ol2_len: 0
hns3 0000:7d:00.0: (TX)ol3_len: 0
hns3 0000:7d:00.0: (TX)ol4_len: 0
hns3 0000:7d:00.0: (TX)paylen: 0
hns3 0000:7d:00.0: (TX)vld_ra_ri: 0
hns3 0000:7d:00.0: (TX)mss: 0
hns3 0000:7d:00.0: RX Queue Num: 0, BD Index: 120
hns3 0000:7d:00.0: (RX)addr: 0xffee7000
hns3 0000:7d:00.0: (RX)pkt_len: 0
hns3 0000:7d:00.0: (RX)size: 0
hns3 0000:7d:00.0: (RX)rss_hash: 0
hns3 0000:7d:00.0: (RX)fd_id: 0
hns3 0000:7d:00.0: (RX)vlan_tag: 0
hns3 0000:7d:00.0: (RX)o_dm_vlan_id_fb: 0
hns3 0000:7d:00.0: (RX)ot_vlan_tag: 0
hns3 0000:7d:00.0: (RX)bd_base_info: 0

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 10:54:17 -08:00
David S. Miller
b9948e1113 Merge branch 'net-prefer-listeners-bound-to-an-address'
Peter Oskolkov says:

====================
net: prefer listeners bound to an address

A relatively common use case is to have several IPs configured
on a host, and have different listeners for each of them. We would
like to add a "catch all" listener on addr_any, to match incoming
connections not served by any of the listeners bound to a specific
address.

However, port-only lookups can match addr_any sockets when sockets
listening on specific addresses are present if so_reuseport flag
is set. This patchset eliminates lookups into port-only hashtable,
as lookups by (addr,port) tuple are easily available.

In a future patchset I plan to explore whether it is possible
to remove port-only hashtables completely: additional refactoring
will be required, as some non-lookup code uses the hashtables.
====================

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:55:21 -08:00
Peter Oskolkov
6254e5c6a8 selftests: net: test that listening sockets match on address properly
This patch adds a selftest that verifies that a socket listening
on a specific address is chosen in preference over sockets
that listen on any address. The test covers UDP/UDP6/TCP/TCP6.

It is based on, and similar to, reuseport_dualstack.c selftest.

Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:55:20 -08:00
Peter Oskolkov
0ee58dad5b net: tcp6: prefer listeners bound to an address
A relatively common use case is to have several IPs configured
on a host, and have different listeners for each of them. We would
like to add a "catch all" listener on addr_any, to match incoming
connections not served by any of the listeners bound to a specific
address.

However, port-only lookups can match addr_any sockets when sockets
listening on specific addresses are present if so_reuseport flag
is set. This patch eliminates lookups into port-only hashtable,
as lookups by (addr,port) tuple are easily available.

In addition, compute_score() is tweaked to _not_ match
addr_any sockets to specific addresses, as hash collisions
could result in the unwanted behavior described above.

Tested: the patch compiles; full test in the last patch in this
patchset. Existing reuseport_* selftests also pass.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:55:20 -08:00
Peter Oskolkov
d9fbc7f643 net: tcp: prefer listeners bound to an address
A relatively common use case is to have several IPs configured
on a host, and have different listeners for each of them. We would
like to add a "catch all" listener on addr_any, to match incoming
connections not served by any of the listeners bound to a specific
address.

However, port-only lookups can match addr_any sockets when sockets
listening on specific addresses are present if so_reuseport flag
is set. This patch eliminates lookups into port-only hashtable,
as lookups by (addr,port) tuple are easily available.

In addition, compute_score() is tweaked to _not_ match
addr_any sockets to specific addresses, as hash collisions
could result in the unwanted behavior described above.

Tested: the patch compiles; full test in the last patch in this
patchset. Existing reuseport_* selftests also pass.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:55:20 -08:00
Peter Oskolkov
23b0269e58 net: udp6: prefer listeners bound to an address
A relatively common use case is to have several IPs configured
on a host, and have different listeners for each of them. We would
like to add a "catch all" listener on addr_any, to match incoming
connections not served by any of the listeners bound to a specific
address.

However, port-only lookups can match addr_any sockets when sockets
listening on specific addresses are present if so_reuseport flag
is set. This patch eliminates lookups into port-only hashtable,
as lookups by (addr,port) tuple are easily available.

In addition, compute_score() is tweaked to _not_ match
addr_any sockets to specific addresses, as hash collisions
could result in the unwanted behavior described above.

Tested: the patch compiles; full test in the last patch in this
patchset. Existing reuseport_* selftests also pass.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:55:20 -08:00
Peter Oskolkov
4cdeeee925 net: udp: prefer listeners bound to an address
A relatively common use case is to have several IPs configured
on a host, and have different listeners for each of them. We would
like to add a "catch all" listener on addr_any, to match incoming
connections not served by any of the listeners bound to a specific
address.

However, port-only lookups can match addr_any sockets when sockets
listening on specific addresses are present if so_reuseport flag
is set. This patch eliminates lookups into port-only hashtable,
as lookups by (addr,port) tuple are easily available.

In addition, compute_score() is tweaked to _not_ match
addr_any sockets to specific addresses, as hash collisions
could result in the unwanted behavior described above.

Tested: the patch compiles; full test in the last patch in this
patchset. Existing reuseport_* selftests also pass.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:55:20 -08:00
yupeng
8e2ea53a83 add snmp counters document
Add explainations for some general IP counters, SACK and DSACK related
counters

Signed-off-by: yupeng <yupeng0921@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:50:14 -08:00
David S. Miller
384aee46ca Merge branch 'neighbor-More-gc_list-changes'
David Ahern says:

====================
neighbor: More gc_list changes

More gc_list changes and cleanups.

The first 2 patches are bug fixes from the first gc_list change.
Specifically, fix the locking order to be consistent - table lock
followed by neighbor lock, and then entries in the FAILED state
should always be candidates for forced_gc without waiting for any
time span (return to the eviction logic prior to the separate gc_list).

Patch 3 removes 2 now unnecessary arguments to neigh_del.

Patch 4 moves a helper from a header file to core code in preparation
for Patch 5 which removes NTF_EXT_LEARNED entries from the gc_list.
These entries are already exempt from forced_gc; patch 5 removes them
from consideration and makes them on par with PERMANENT entries given
that they are also managed by userspace.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:44:47 -08:00
David Ahern
e997f8a20a neighbor: Remove externally learned entries from gc_list
Externally learned entries are similar to PERMANENT entries in the
sense they are managed by userspace and can not be garbage collected.
As such remove them from the gc_list, remove the flags check from
neigh_forced_gc and skip threshold checks in neigh_alloc. As with
PERMANENT entries, this allows unlimited number of NTF_EXT_LEARNED
entries.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:44:47 -08:00
David Ahern
526f1b587c neighbor: Move neigh_update_ext_learned to core file
neigh_update_ext_learned has one caller in neighbour.c so does not need
to be defined in the header. Move it and in the process remove the
intialization of ndm_flags and just set it based on the flags check.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:44:47 -08:00
David Ahern
7e6f182bec neighbor: Remove state and flags arguments to neigh_del
neigh_del now only has 1 caller, and the state and flags arguments
are both 0. Remove them and simplify neigh_del.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:44:47 -08:00
David Ahern
758a7f0b32 neighbor: Fix state check in neigh_forced_gc
PERMANENT entries are not on the gc_list so the state check is now
redundant. Also, the move to not purge entries until after 5 seconds
should not apply to FAILED entries; those can be removed immediately
to make way for newer ones. This restores the previous logic prior to
the gc_list.

Fixes: 58956317c8 ("neighbor: Improve garbage collection")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:44:47 -08:00
David Ahern
9c29a2f55e neighbor: Fix locking order for gc_list changes
Lock checker noted an inverted lock order between neigh_change_state
(neighbor lock then table lock) and neigh_periodic_work (table lock and
then neighbor lock) resulting in:

[  121.057652] ======================================================
[  121.058740] WARNING: possible circular locking dependency detected
[  121.059861] 4.20.0-rc6+ #43 Not tainted
[  121.060546] ------------------------------------------------------
[  121.061630] kworker/0:2/65 is trying to acquire lock:
[  121.062519] (____ptrval____) (&n->lock){++--}, at: neigh_periodic_work+0x237/0x324
[  121.063894]
[  121.063894] but task is already holding lock:
[  121.064920] (____ptrval____) (&tbl->lock){+.-.}, at: neigh_periodic_work+0x194/0x324
[  121.066274]
[  121.066274] which lock already depends on the new lock.
[  121.066274]
[  121.067693]
[  121.067693] the existing dependency chain (in reverse order) is:
...

Fix by renaming neigh_change_state to neigh_update_gc_list, changing
it to only manage whether an entry should be on the gc_list and taking
locks in the same order as neigh_periodic_work. Invoke at the end of
neigh_update only if diff between old or new states has the PERMANENT
flag set.

Fixes: 8cc196d6ef ("neighbor: gc_list changes should be protected by table lock")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:44:47 -08:00
Cong Wang
aeb3fecde8 net_sched: fold tcf_block_cb_call() into tc_setup_cb_call()
After commit 69bd48404f ("net/sched: Remove egdev mechanism"),
tc_setup_cb_call() is nearly identical to tcf_block_cb_call(),
so we can just fold tcf_block_cb_call() into tc_setup_cb_call()
and remove its unused parameter 'exts'.

Fixes: 69bd48404f ("net/sched: Remove egdev mechanism")
Cc: Oz Shlomo <ozsh@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 15:32:19 -08:00
Wen Yang
390de19404 net/ibmvnic: Remove tests of member address
The driver was checking for non-NULL address.
- adapter->napi[i]

This is pointless as these will be always non-NULL, since the
'dapter->napi' is allocated in init_napi().
It is safe to get rid of useless checks for addresses to fix the
coccinelle warning:
>>drivers/net/ethernet/ibm/ibmvnic.c: test of a variable/field address
Since such statements always return true, they are redundant.

Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Thomas Falcon <tlfalcon@linux.ibm.com>
CC: John Allen <jallen@linux.ibm.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: linuxppc-dev@lists.ozlabs.org
CC: netdev@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 13:36:38 -08:00
Prashant Bhole
6342ca6447 tun: replace get_cpu_ptr with this_cpu_ptr when bh disabled
tun_xdp_one() runs with local bh disabled. So there is no need to
disable preemption by calling get_cpu_ptr while updating stats. This
patch replaces the use of get_cpu_ptr() with this_cpu_ptr() as a
micro-optimization. Also removes related put_cpu_ptr call.

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 13:36:26 -08:00
Aviv Heller
9582466640 net/mlx5: Handle LAG FW commands failure gracefully
When create_lag or destroy_lag FW commands fail, display an appropriate
error message, and try to recover, if possible.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:56 -08:00
Aviv Heller
7c34ec19e1 net/mlx5: Make RoCE and SR-IOV LAG modes explicit
With the introduction of SR-IOV LAG, checking whether LAG is active
is no longer good enough, since RoCE and SR-IOV LAG each entails
different behavior by both the core and infiniband drivers.

This patch introduces facilities to discern LAG type, in addition to
mlx5_lag_is_active(). These are implemented in such a way as to allow
more complex mode combinations in the future.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:55 -08:00
Aviv Heller
292612d68c net/mlx5: Rename mlx5_lag_is_bonded() to __mlx5_lag_is_active()
The new name better represents the function's aim, and sets a precedent
for a '__' prefix for internal, non-locked versions of LAG APIs.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:54 -08:00
Rabie Loulou
8aaca1976e net/mlx5: Allow co-enablement of uplink LAG and SRIOV
Enable setting uplink LAG if sriov is enabled on both ports in switchdev
mode.

Once the sriov mode is changed from switchdev for any of the ports, the
LAG instance is disabled.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:54 -08:00
Rabie Loulou
eff849b2c6 net/mlx5: Allow/disallow LAG according to pre-req only
Remove the lag forbid/allow functions, change the lag prereq check to
run in the do-bond logic, so every change in the prereq state will
cause LAG to be disabled/enabled accordingly after the next do-bond run.

Add lag update function, so every component which changes the prereq
state and want the LAG to re-calc the conditions can call the update
function.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:54 -08:00
Rabie Loulou
3b5ff59fd8 net/mlx5: Adjustments for the activate LAG logic to run under sriov
When HW lag is set/unset, roce must not be enabled on the port, as such
we wrap such changes with roce enable/disable either directly or through
re-creation of IB device.

Currently, lag and sriov are mutually exclusive, so by definition this
code doesn't run under sriov.

Towards changing this exclusion, we need to make sure that roce will not
be enabled on the eswitch manager port under sriov since this is
requirement of the switchdev mode.

We are going strict here and avoiding this all together under sriov.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:53 -08:00
Aviv Heller
1418ddd96a net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG
Under uplink LAG, packets that match a flow might arrive on both uplink
ports and transmitted through both as part of supporting aggregation and
high-availability.

When the netdevs representing the uplinks are set into LAG (bonding,
teaming), duplicate the TC flow offloading into each of the per-uplink
e-switches.

Duplication is not required if the source is the bond device, since in
this case it is assumed that the bond and the uplink netdevs share the
same TC block, and thus duplication will occur naturally by the stack.

Note that under encapsulation scheme, both flows will use the same
neighbour and hence both will contribute to the last-used feedback
towards the stack.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:53 -08:00
Rabie Loulou
7ba58ba7ba net/mlx5e: Offload TC e-switch rules with egress LAG device
When parsing TC FDB actions, if the egress device is a bond/team
net-device which enslaved the uplink representor of the e-switch,
use the uplink representor as the destination in the HW rule.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:53 -08:00
Rabie Loulou
491c37e49b net/mlx5e: In case of LAG, one switch parent id is used for all representors
When the uplink representors are put into lag, set all the
representors (VFs and uplinks) of the same NIC to return the same
switchdev id.

Currently, the route lookup code on the encapsulation offload path
assumes that same switchdev id for the source and dest devices means
that the dest is also mlx5 HW netdev. This doesn't hold anymore when we
align the switchdev Id of the uplinks to be same, which in turn causes
the bond/team to return that id to the caller. As such, enhance the
relevant check to take into account the uplink lag case.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:52 -08:00
Shahar Klein
f9392795e2 net/mlx5e: Enhance flow counter scheme for offloaded TC eswitch rules
Assign a counter dev attribute according to device capability and use
it for management of counters related to offloaded eswitch TC flows.

With upcoming support for uplink LAG, we have two HW rules per one
logical SW (TC) rule. Although the HW supports attaching one counter
to multiple rules, we are allocating counter per HW rule.

We need this separation for two reasons:

1. "flow eswitch" counter affinity HW require the counter to be
allocated on the device where the eswitch rule is set.

2. for some use-cases (multi-path routing) each HW flow relates to
different neighbour, hence our neigh update logic must have a per-rule
HW accountant in order to provide the proper feedback to the kernel.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:52 -08:00
Roi Dayan
04de7dda73 net/mlx5e: Infrastructure for duplicated offloading of TC flows
Under uplink LAG or multipath schemes, traffic that matches one flow
might arrive on both uplink ports and transmitted through both
as part of supporting aggregation and high-availability.

To cope with the fact that the SW model might use logical SW port
(e.g uplink team or bond) but we have two HW ports with e-switch on
each, there are cases where in order to offload a SW TC rule we
need to duplicate it to two HW flows.

Since each HW rule has its own counter we also aggregate the counter
of both rules when a flow stats query is executed from user-space.

Introduce the changes for the different elements (add/delete/stats),
currently nothing is duplicated.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:52 -08:00
Roi Dayan
ac004b8321 net/mlx5e: E-Switch, Add peer miss rules
In the sriov offloads mode, packets that are not matched by any
other rule are sent towards the e-switch vport manager for further
processing.

Under upcoming patches (e.g for uplink LAG), packets sent from VF
vports belonging to esw0 (e-switch related to PF0) might end up in
esw1 (e-switch related to PF1) due to muxing logic applied by the
FW.

In such a case we still want the missed packet to be sent to the
"base" esw manager vport in order to present the control plane a
consistent view of the source (VF reresentor) port.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:51 -08:00
Aviv Heller
fadd59fc50 net/mlx5: Introduce inter-device communication mechanism
This introduces devcom, a generic mechanism for performing operations
on both physical functions of the same Connect-X card.

The first user of this API is merged eswitch, which will be introduced
in subsequent patches.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:51 -08:00
Arnd Bergmann
c2c79a32fb hamradio, ppp: change semaphore to completion
ppp and hamradio have copies of the same code that uses a semaphore
in place of a completion for historic reasons. Make it use the
proper interface instead in all copies.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 13:27:10 -08:00
Arnd Bergmann
2aa55dccf8 hns3: prevent building without CONFIG_INET
We now get a link failure when CONFIG_INET is disabled, since
tcp_gro_complete is unavailable:

drivers/net/ethernet/hisilicon/hns3/hns3_enet.o: In function `hns3_set_gro_param':
hns3_enet.c:(.text+0x230c): undefined reference to `tcp_gro_complete'

Add an explicit CONFIG_INET dependency here to avoid the broken
configuration.

Fixes: a6d53b97a2 ("net: hns3: Adds GRO params to SKB for the stack")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 13:17:50 -08:00
Saeed Mahameed
64e4cf0dab Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
mlx5-next shared branch with rdma subtree to avoid mlx5 rdma v.s. netdev
conflicts.

Highlights:
1) Lag refactroing and flow counter affinity bits.
2) mlx5 core cleanups

By Roi Dayan (2) and others
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Fold the modify lag code into function
  net/mlx5: Add lag affinity info to log
  net/mlx5: Split the activate lag function into two routines
  net/mlx5: E-Switch, Introduce flow counter affinity
  IB/mlx5: Unify e-switch representors load approach between uplink and VFs
  net/mlx5: Use lowercase 'X' for hex values
  net/mlx5: Remove duplicated include from eswitch.c

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 11:15:25 -08:00
Shahar Klein
4c283e6155 net/mlx5: Fold the modify lag code into function
Handle the code of modifying the lag affinity within a separate function.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 09:58:57 -08:00
Roi Dayan
3cfe432e1b net/mlx5: Add lag affinity info to log
Report the initial LAG port affinity upon LAG creation.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 09:58:57 -08:00
Roi Dayan
8252cf728c net/mlx5: Split the activate lag function into two routines
Split the activate lag function in order to be symmetric with
the deactivate lag function.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 09:58:57 -08:00
Shahar Klein
8bb957d255 net/mlx5: E-Switch, Introduce flow counter affinity
This dictates the device affinity for eswitch flow counters, set by the FW
according to the HW device capabilities.

Under "source eswitch" affinity, the counter should be allocated on the
device related to the source vport in the match. This covers both non
merged e-switch mode as well as old FW that does not advertise this cap.

Under "flow eswitch" affinity, the counter should be allocated on the
device where the eswitch rule is set.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 09:58:57 -08:00
Mark Bloch
06cc74af05 IB/mlx5: Unify e-switch representors load approach between uplink and VFs
When in switchdev mode and the add function is called by the core
level driver, make sure we only register the callbacks, but don't
create the mlx5 IB device or initialize anything. With this change
all the IB devices in switchdev mode are created only once the load
callback is invoked by the e-switch core sub-module. This follows
the design paradigm under which the all the Eth representors must
be loaded before any of IB reprs is loaded.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 09:58:57 -08:00
Saeed Mahameed
4c8b85187c net/mlx5: Use lowercase 'X' for hex values
Apparently gcc is cool with upper case '0X' but it is not commonly used.
Replace '0X' with lowercase '0x' in mlx5_ifc.h file.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 09:58:57 -08:00
David S. Miller
522185d5cb Merge branch 'Introduce-NETDEV_PRE_CHANGEADDR'
Petr Machata says:

====================
Introduce NETDEV_PRE_CHANGEADDR

Spectrum devices have a limitation that all router interfaces need to
have the same address prefix. In Spectrum-1, the requirement is for the
initial 38 bits of all RIFs to be the same, in Spectrum-2 the limit is
36 bits. Currently violations of this requirement are not diagnosed. At
the same time, if the condition is not upheld, the mismatched MAC
address ends up overwriting the common prefix, and all RIF MAC addresses
silently change to the new prefix.

It is therefore desirable to be able at least to diagnose the issue, and
better to reject attempts to change MAC addresses in ways that is
incompatible with the device.

Currently MAC address changes are notified through emission of
NETDEV_CHANGEADDR, which is done after the change. Extending this
message to allow vetoing is certainly possible, but several other
notification types have instead adopted a simple two-stage approach:
first a "pre" notification is sent to make sure all interested parties
are OK with the change that's about to be done. Then the change is done,
and afterwards a "post" notification is sent.

This dual approach is easier to use: when the change is vetoed, nothing
has changed yet, and it's therefore unnecessary to roll anything back.
Therefore this patchset introduces it for NETDEV_CHANGEADDR as well.

One prominent path to emitting NETDEV_CHANGEADDR is through
dev_set_mac_address(). Therefore in patch #1, give this function an
extack argument, so that a textual reason for rejection (or a warning)
can be communicated back to the user.

In patch #2, add the new notification type. In patch #3, have dev.c emit
the notification for instances of dev_addr change, or addition of an
address to dev_addrs list.

In patches #4 and #5, extend the bridge driver to handle and emit the
new notifier.

In patch #6, change IPVLAN to emit the new notifier.

Likewise for bonding driver in patches #7 and #8. Note that the team
driver doesn't need this treatment, as it goes through
dev_set_mac_address().

In patches #9, #10 and #11 adapt mlxsw to veto MAC addresses on router
interfaces, if they violate the requirement that all RIF MAC addresses
have the same prefix.

Finally in patches #12 and #13, add a test for vetoing of a direct
change of a port device MAC, and indirect change of a bridge MAC.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13 18:41:39 -08:00
Petr Machata
9651ee10ce selftests: mlxsw: Test FID RIF MAC vetoing
When a FID RIF is created for a bridge with IP address, its MAC address
must obey the same requirements as other RIFs. Test that attempts to
change the address incompatibly by attaching a device are vetoed with
extack.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13 18:41:39 -08:00