Commit Graph

738271 Commits

Author SHA1 Message Date
Karsten Graul
be6d467b99 net/smc: remove unused fields from smc structures
The daddr field holds the destination IPv4 address. The field was set but
never used and can be removed. The addr field was a left-over from an
earlier version of non-blocking connects and can be removed.
The result of the call to kernel_getpeername is not used, the call can be
removed. Non-blocking connects are working, so remove restriction comment.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:21:31 -05:00
Karsten Graul
696cd30169 net/smc: move netinfo function to file smc_clc.c
The function smc_netinfo_by_tcpsk() belongs to CLC handling.
Move it to smc_clc.c and rename to smc_clc_netinfo_by_tcpsk.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:21:31 -05:00
Stefan Raspl
0f6271264a net/smc: cleanup smc_llc.h and smc_clc.h headers
Remove structures used internal only from headers.
And remove an extra function parameter.

Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:21:31 -05:00
David S. Miller
3c5aa0bc9c Merge branch 'ipv4-ipv6-mcast-align'
Yuval Mintz says:

====================
ipmr, ip6mr: Align multicast routing for IPv4 & IPv6

Historically ip6mr was based [cut-n-paste] on ipmr and the two have not
diverged too much. Apparently as ipv4 multicast routing is more common
than its ipv6 brethren modifications since then are mostly one-way,
affecting ipmr while leaving ip6mr unchanged.

This series is meant to re-factor both ipmr and ip6mr into having common
structures [and some functionality], adding 2 new common files -
mroute_base.h and ipmr_base.c.

The series begins by bringing ip6mr up to speed to some of the changes
applied in the past to ipmr [#2, #3].
It is then possible to re-factor a lot of the common structures -
vif devices [#1], mr_table [#4] mfc_cache [#6], and use the common
structures in both ipmr and ip6mr.

The rest of the patches re-factor some choice flows used by both ipmr
and ip6mr and eliminates duplicity.

This series would later allow for easy extension of ipmr offloading
to support ip6mr offloading as well, as almost all structures
related to the offloading would be shared between the two protocols.

Changes from previous versions
------------------------------
v2:
  - #6 Corrected reporting logic when hitting an unresolved cache
  - #7 Addressed kernel doc style [Thanks Nikolay]

RFC -> v1:
  - Corrected support for CONFIG_IP{,V6}_MROUTE_MULTIPLE_TABLES
  - Addressed a couple of kbuild test robot issues
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:24 -05:00
Yuval Mintz
7b0db85737 ipmr, ip6mr: Unite dumproute flows
The various MFC entries are being held in the same kind of mr_tables
for both ipmr and ip6mr, and their traversal logic is identical.
Also, with the exception of the addresses [and other small tidbits]
the major bulk of the nla setting is identical.

Unite as much of the dumping as possible between the two.
Notice this requires creating an mr_table iterator for each, as the
for-each preprocessor macro can't be used by the common logic.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Yuval Mintz
889cd83cbe ip6mr: Remove MFC_NOTIFY and refactor flags
MFC_NOTIFY exists in ip6mr, probably as some legacy code
[was already removed for ipmr in commit
06bd6c0370 ("net: ipmr: remove unused MFC_NOTIFY flag and make the flags enum").
Remove it from ip6mr as well, and move the enum into a common file;
Notice MFC_OFFLOAD is currently only used by ipmr.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Yuval Mintz
3feda6b46f ipmr, ip6mr: Unite vif seq functions
Same as previously done with the mfc seq, the logic for the vif seq is
refactored to be shared between ipmr and ip6mr.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Yuval Mintz
c8d6196803 ipmr, ip6mr: Unite mfc seq logic
With the exception of the final dump, ipmr and ip6mr have the exact same
seq logic for traversing a given mr_table. Refactor that code and make
it common.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Yuval Mintz
845c9a7ae7 ipmr, ip6mr: Unite logic for searching in MFC cache
ipmr and ip6mr utilize the exact same methods for searching the
hashed resolved connections, difference being only in the construction
of the hash comparison key.

In order to unite the flow, introduce an mr_table operation set that
would contain the protocol specific information required for common
flows, in this case - the hash parameters and a comparison key
representing a (*,*) route.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Yuval Mintz
494fff5637 ipmr, ip6mr: Make mfc_cache a common structure
mfc_cache and mfc6_cache are almost identical - the main difference is
in the origin/group addresses and comparison-key. Make a common
structure encapsulating most of the multicast routing logic  - mr_mfc
and convert both ipmr and ip6mr into using it.

For easy conversion [casting, in this case] mr_mfc has to be the first
field inside every multicast routing abstraction utilizing it.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Yuval Mintz
0bbbf0e7d0 ipmr, ip6mr: Unite creation of new mr_table
Now that both ipmr and ip6mr are using the same mr_table structure,
we can have a common function to allocate & initialize a new instance.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Yuval Mintz
b70432f731 mroute*: Make mr_table a common struct
Following previous changes to ip6mr, mr_table and mr6_table are
basically the same [up to mr6_table having additional '6' suffixes to
its variable names].
Move the common structure definition into a common header; This
requires renaming all references in ip6mr to variables that had the
distinct suffix.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Yuval Mintz
87c418bf13 ip6mr: Align hash implementation to ipmr
Since commit 8fb472c09b ("ipmr: improve hash scalability") ipmr has
been using rhashtable as a basis for its mfc routes, but ip6mr is
currently still using the old private MFC hash implementation.

Align ip6mr to the current ipmr implementation.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Yuval Mintz
8571ab479a ip6mr: Make mroute_sk rcu-based
In ipmr the mr_table socket is handled under RCU. Introduce the same
for ip6mr.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Yuval Mintz
6853f21f76 ipmr,ipmr6: Define a uniform vif_device
The two implementations have almost identical structures - vif_device and
mif_device. As a step toward uniforming the mr_tables, eliminate the
mif_device and relocate the vif_device definition into a new common
header file.

Also, introduce a common initializing function for setting most of the
vif_device fields in a new common source file. This requires modifying
the ipv{4,6] Kconfig and ipv4 makefile as we're introducing a new common
config option - CONFIG_IP_MROUTE_COMMON.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
David S. Miller
a25724b05a Merge branch 'fib_rules-support-sport-dport-and-proto-match'
Roopa Prabhu says:

====================
fib_rules: support sport, dport and proto match

This series extends fib rule match support to include sport, dport
and ip proto match (to complete the 5-tuple match support).
Common use-cases of Policy based routing in the data center require
5-tuple match. The last 2 patches in the series add a call to flow dissect
in the fwd path if required by the installed fib rules (controlled by a flag).

v1:
  - Fix errors reported by kbuild and feedback on RFC
  - extend port match uapi to accomodate port ranges

v2:
  - address comments from Nikolay, David Ahern and Paolo (Thanks!)

Pending things I will submit separate patches for:
  - extack for fib rules
  - fib rules test (as requested by david ahern)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 22:45:05 -05:00
Roopa Prabhu
5e5d6fed37 ipv6: route: dissect flow in input path if fib rules need it
Dissect flow in fwd path if fib rules require it. Controlled by
a flag to avoid penatly for the common case. Flag is set when fib
rules with sport, dport and proto match that require flow dissect
are installed. Also passes the dissected hash keys to the multipath
hash function when applicable to avoid dissecting the flow again.
icmp packets will continue to use inner header for hash
calculations.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 22:44:44 -05:00
Roopa Prabhu
e37b1e978b ipv6: route: dissect flow in input path if fib rules need it
Dissect flow in fwd path if fib rules require it. Controlled by
a flag to avoid penatly for the common case. Flag is set when fib
rules with sport, dport and proto match that require flow dissect
are installed. Also passes the dissected hash keys to the multipath
hash function when applicable to avoid dissecting the flow again.
icmp packets will continue to use inner header for hash
calculations (Thanks to Nikolay Aleksandrov for some review here).

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 22:44:44 -05:00
Roopa Prabhu
bb0ad1987e ipv6: fib6_rules: support for match on sport, dport and ip proto
support to match on src port, dst port and ip protocol.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 22:44:43 -05:00
Roopa Prabhu
4a2d73a4fb ipv4: fib_rules: support match on sport, dport and ip proto
support to match on src port, dst port and ip protocol.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 22:44:43 -05:00
Roopa Prabhu
bfff486265 net: fib_rules: support for match on ip_proto, sport and dport
uapi for ip_proto, sport and dport range match
in fib rules.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 22:44:43 -05:00
Heiner Kallweit
2927499157 r8169: fix interrupt number after adding support for MSI-X interrupts
In case of MSI-X the interrupt number may differ from pcidev->irq.
Fix this by using pci_irq_vector().

Fixes: 6c6aa15fde ("r8169: improve interrupt handling")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 14:46:44 -05:00
David S. Miller
a6a8f0196d Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2018-02-28

This series contains updates to fm10k only.

Jake provides all the changes in this series, starting with making the
function header comments consistent and to align with how the kernel
documentation expects it.  Also cleaned up code comment as well as bump
the driver version.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:34:20 -05:00
David S. Miller
540749293a Merge branch 'selftests-forwarding-Add-VRF-based-tests'
Ido Schimmel says:

====================
selftests: forwarding: Add VRF-based tests

One of the nice things about network namespaces is that they allow one
to easily create and test complex environments.

Unfortunately, these namespaces can not be used with actual switching
ASICs, as their ports can not be migrated to other network namespaces
(NETIF_F_NETNS_LOCAL) and most of them probably do not support the
L1-separation provided by namespaces.

However, a similar kind of flexibility can be achieved by using VRFs and
by looping the switch ports together. For example:

                             br0
                              +
               vrf-h1         |           vrf-h2
                 +        +---+----+        +
                 |        |        |        |
    192.0.2.1/24 +        +        +        + 192.0.2.2/24
               swp1     swp2     swp3     swp4
                 +        +        +        +
                 |        |        |        |
                 +--------+        +--------+

The VRFs act as lightweight namespaces representing hosts connected to
the switch.

This approach for testing switch ASICs has several advantages over the
traditional method that requires multiple physical machines, to name a
few:

1. Only the device under test (DUT) is being tested without noise from
other system.

2. Ability to easily provision complex topologies. Testing bridging
between 4-ports LAGs or 8-way ECMP requires many physical links that are
not always available. With the VRF-based approach one merely needs to
loopback more ports.

These tests are written with switch ASICs in mind, but they can be run
on any Linux box using veth pairs to emulate physical loopbacks.

v2:
* Order local variables declaration according to function arguments
  order (Petr)

v1:
* Change location to net/forwarding instead of forwarding/
* Add ability to pause on failure
* Add ability to pause on cleanup
* Make configuration file optional
* Make ping/ping6/mz configurable
* Add more tc tests
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:25:49 -05:00
Jiri Pirko
4908e24b81 selftests: forwarding: Introduce basic shared blocks tests
Test shared block infrastructure. This is a basic test that shares TC
block in between 2 clsact qdiscs.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:25:49 -05:00
Jiri Pirko
b13f245e84 selftests: forwarding: Introduce basic tc chains tests
Tests chains matching and goto chain action.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:25:48 -05:00
Jiri Pirko
bc13af291e selftests: forwarding: Introduce tc actions tests
Add first part of actions tests. This patch only contains tests of gact
ok/drop/trap and mirred redirect egress.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:25:48 -05:00
Jiri Pirko
07e5c75184 selftests: forwarding: Introduce tc flower matching tests
Add first part of flower tests. This patch only contains dst/src ip/mac
matching.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:25:48 -05:00
Jiri Pirko
781fe631fa selftests: forwarding: Allow to get netdev interfaces names from commandline
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:25:48 -05:00
Jiri Pirko
4e4272d2a6 selftests: forwarding: Add MAC get helper
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:25:48 -05:00
Jiri Pirko
2f19f2125d selftests: forwarding: Add tc offload check helper
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:25:48 -05:00
Ido Schimmel
4fb20ae137 selftests: forwarding: Test IPv6 weighted nexthops
Have one host generate 16K IPv6 echo requests with a random flow label
and check that they are distributed between both multipath links
according to the provided weights.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:25:47 -05:00
Ido Schimmel
3d578d8795 selftests: forwarding: Test IPv4 weighted nexthops
Use different weights for the multipath route configured on the first
router and check that the different flows generated by the first host
are distributed according to the provided weights.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:25:47 -05:00
Ido Schimmel
937eeb3482 selftests: forwarding: Create test topology for multipath routing
Create a topology with two hosts, each directly connected to a different
router. Both routers are connected using two links, enabling multipath
routing.

Test IPv4 and IPv6 ping using default MTU and large MTU.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:25:47 -05:00
Ido Schimmel
7b7bc87555 selftests: forwarding: Add a test for basic IPv4 and IPv6 routing
Configure two hosts which are directly connected to the same router and
test IPv4 and IPv6 ping. Use a large MTU and check that ping is
unaffected.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:25:47 -05:00
Ido Schimmel
236dd50bf6 selftests: forwarding: Add a test for flooded traffic
Add test cases for unknown unicast and unregistered multicast flooding.

For each traffic type, turn off flooding on one bridged port and inject
a packet of the specified type through the second bridged port. Make
sure the packet was not received by checking the ACL counters on the
other end. Later, turn on flooding and make sure the packet was
received.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:25:47 -05:00
Ido Schimmel
d4deb01467 selftests: forwarding: Add a test for FDB learning
Send a packet with a specific destination MAC, make sure it was learned
on the ingress port and then aged-out.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:25:47 -05:00
Ido Schimmel
73bae6736b selftests: forwarding: Add initial testing framework
Add initial framework to test packet forwarding functionality. The tests
can run on actual devices using loop-backed cables or using veth pairs.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:25:47 -05:00
Paolo Abeni
8230819494 ipvlan: use per device spinlock to protect addrs list updates
This changeset moves ipvlan address under RCU protection, using
a per ipvlan device spinlock to protect list mutation and RCU
read access to protect list traversal.

Also explicitly use RCU read lock to traverse the per port
ipvlans list, so that we can now perform a full address lookup
without asserting the RTNL lock.

Overall this allows the ipvlan driver to check fully for duplicate
addresses - before this commit ipv6 addresses assigned by autoconf
via prefix delegation where accepted without any check - and avoid
the following rntl assertion failure still in the same code path:

 RTNL: assertion failed at drivers/net/ipvlan/ipvlan_core.c (124)
 WARNING: CPU: 15 PID: 0 at drivers/net/ipvlan/ipvlan_core.c:124 ipvlan_addr_busy+0x97/0xa0 [ipvlan]
 Modules linked in: ipvlan(E) ixgbe
 CPU: 15 PID: 0 Comm: swapper/15 Tainted: G            E    4.16.0-rc2.ipvlan+ #1782
 Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
 RIP: 0010:ipvlan_addr_busy+0x97/0xa0 [ipvlan]
 RSP: 0018:ffff881ff9e03768 EFLAGS: 00010286
 RAX: 0000000000000000 RBX: ffff881fdf2a9000 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: 00000000000000f6 RDI: 0000000000000300
 RBP: ffff881fdf2a8000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000001 R11: ffff881ff9e034c0 R12: ffff881fe07bcc00
 R13: 0000000000000001 R14: ffffffffa02002b0 R15: 0000000000000001
 FS:  0000000000000000(0000) GS:ffff881ff9e00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fc5c1a4f248 CR3: 000000207e012005 CR4: 00000000001606e0
 Call Trace:
  <IRQ>
  ipvlan_addr6_event+0x6c/0xd0 [ipvlan]
  notifier_call_chain+0x49/0x90
  atomic_notifier_call_chain+0x6a/0x100
  ipv6_add_addr+0x5f9/0x720
  addrconf_prefix_rcv_add_addr+0x244/0x3c0
  addrconf_prefix_rcv+0x2f3/0x790
  ndisc_router_discovery+0x633/0xb70
  ndisc_rcv+0x155/0x180
  icmpv6_rcv+0x4ac/0x5f0
  ip6_input_finish+0x138/0x6a0
  ip6_input+0x41/0x1f0
  ipv6_rcv+0x4db/0x8d0
  __netif_receive_skb_core+0x3d5/0xe40
  netif_receive_skb_internal+0x89/0x370
  napi_gro_receive+0x14f/0x1e0
  ixgbe_clean_rx_irq+0x4ce/0x1020 [ixgbe]
  ixgbe_poll+0x31a/0x7a0 [ixgbe]
  net_rx_action+0x296/0x4f0
  __do_softirq+0xcf/0x4f5
  irq_exit+0xf5/0x110
  do_IRQ+0x62/0x110
  common_interrupt+0x91/0x91
  </IRQ>

 v1 -> v2: drop unneeded in_softirq check in ipvlan_addr6_validator_event()

Fixes: e9997c2938 ("ipvlan: fix check for IP addresses in control path")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:20:13 -05:00
Paolo Abeni
cccc200fca ipvlan: egress mcast packets are not exceptional
Currently, if IPv6 is enabled on top of an ipvlan device in l3
mode, the following warning message:

 Dropped {multi|broad}cast of type= [86dd]

is emitted every time that a RS is generated and dmseg is soon
filled with irrelevant messages. Replace pr_warn with pr_debug,
to preserve debuggability, without scaring the sysadmin.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:18:51 -05:00
David S. Miller
30bbb983d8 Merge branch 'mlxsw-mq-red-offload'
Jiri Pirko says:

====================
mlxsw: Offload multi-queue RED support

Nogah says:

Support a two level hierarchy of offloaded qdiscs in mlxsw, with sch_prio
being the root qdisc and sch_red as the children.

                +----------+
                | sch_prio |
                +----+-----+
                     |
                     |
    +----------------------------------+
    |                |                 |
    |                |                 |
    |                |                 |
+---v---+       +----v---+       +-----v--+
|sch_red|       |sch_red |       |sch_red |
+-------+       +--------+       +--------+

When setting sch_prio as the root qdisc on a physical port, mlxsw will
offload it. When adding it with sch_red as a child qdisc, it will offload
it as well.
Relocating child qdisc or connecting them to more then one child will
result in unoffloading them. Relocating child qdisc more then once is
highly unrecommended and might cause a miss match between the kernel
configuration and the offloaded one. The offloaded configuration will be
aligned with the one shown in the show command.
Changing the priomap parameter of sch_prio might cause a band that its
configuration was changed and it has offloaded sch_red set on it, to lose
some stats data as if sch_red was unoffloaded and offloaded again. However,
it won't affect the data on this band that will have sch_red continuously.

Patch 1 adds support for setting RED as the child of root qdisc.
Patches 2-4 add support for RED bstasts for offloaded child qdiscs.
Patches 5-6 handle backlog related changes for offloaded child qdiscs.
Patches 7-8 update PRIO in mlxsw to be able to have RED as child on its
bands.
Patch 9 adds offload handles for PRIO graft operations. In mlxsw it will
cause the driver to stop offloading the child in question.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:06:02 -05:00
Nogah Frankel
32dc5efc6c mlxsw: spectrum: qdiscs: prio: Handle graft command
Handle graft command for an offloaded sch_prio.
Grafting a qdisc to any place other than under its original parent is not
supported by mlxsw and will cause the grafted qdisc to stop being
offloaded.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:06:01 -05:00
Nogah Frankel
b9c7a7acc7 net: sch: prio: Add offload ability for grafting a child
Offload sch_prio graft command for capable drivers.
Warn in case of a failure, unless the graft was done as part of a destroy
operation (the new qdisc is a noop) or if all the qdiscs (the parent, the
old child, and the new one) are not offloaded.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:06:01 -05:00
Nogah Frankel
98ceb7b6d6 mlxsw: spectrum: qdiscs: prio: Delete child qdiscs when removing bands
When the number the bands of sch_prio is decreased, child qdiscs on the
deleted bands would get deleted as well.
This change and deletions are being done under sch_tree_lock of the
sch_prio qdisc. Part of the destruction of qdisc is unoffloading it, if
it is offloaded. Un-offloading can't be done inside this lock.
Move the offload command to be done before reducing the number of bands,
so unoffloading of the qdiscs that are about to be deleted could be done
outside of the lock.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:06:01 -05:00
Nogah Frankel
23f2b4048c mlxsw: spectrum: Update sch_prio stats to include sch_red related drops
sch_prio as root qdisc should count all the drops its children have. Since
it is possible for it to have sch_red children, it needs to count RED early
drops.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:06:01 -05:00
Nogah Frankel
fd5ac14a1a net: sch: Don't warn on missmatching qlen and backlog for offloaded qdiscs
Offloaded qdiscs are allowed to expose only parts of their statistics.
It means that if backlog is being exposed and qlen is not, it might trigger
a warning in qdisc_tree_reduce_backlog.
Do not warn in case the qdisc that was removed was an offloaded one.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:06:00 -05:00
Nogah Frankel
cc6e5c13af mlxsw: spectrum: qdiscs: Update backlog handling of a child qdiscs
When removing a child qdisc its backlog will be decreased from the parent
backlog. The driver backlog count should do the same.
When the parent changes its configuration, the child might need to clean
its stats. However, the backlog can't be cleaned with the rest of the
stats, because it reflects a momentary value that needs to be synced with
the core, not the history of the qdisc.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:06:00 -05:00
Nogah Frankel
04cc0bf5d6 mlxsw: spectrum: qdiscs: Collect stats for sch_red based on priomap
Priority counters count packets according to their packet priority.
Collect the stats for sch_red based on these counters, so the qdisc bstats
will be the sum of counters matching the priorities marked in the qdisc
priomap.
Changing the mapping of the priorities to bands while traffic is running
can result in losing the stats of the bands qdiscs from their last dump
call to this change, as if the qdisc was unoffloaded and re-offloaded. It
will not affect the traffic behaviour according to sch_red.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:06:00 -05:00
Nogah Frankel
1631ab2e8d mlxsw: spectrum: qdiscs: Add priority map per qdisc
Add priority map per qdisc, to indicate which priorities are being
directed through this qdisc.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:06:00 -05:00
Nogah Frankel
2f88047ec4 mlxsw: spectrum: Add priority counters
Add TX packets and bytes counters per switch priority per port.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:06:00 -05:00