Commit Graph

16221 Commits

Author SHA1 Message Date
Joe Perches
dedecb6d42 i40evf: Move some i40evf_reset_task code to separate function
The i40evf_reset_task function is a couple hundred lines and it has
a separable block that disables VF.  Move that block to a new
i40evf_disable_vf function to shorten i40evf_reset_task a bit.

Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-02 23:08:48 -08:00
Tushar Dave
2f7679ee2e i40e: fix panic on SPARC while changing num of desc
On SPARC, writel() should not be used to write directly to a memory
address but only to a memory mapped I/O address; otherwise it causes
a data access exception.

Commit 147e81ec75 ("i40e: Test memory before ethtool alloc
succeeds") introduced code that uses a memory address to fake the HW
tail address. Attempting to write to that address using writel()
causes a kernel panic on SPARC. The issue is reproduced while changing
the number of descriptors using ethtool.

This change resolves the panic by using a HW read-only memory mapped
I/O register to fake the HW tail address instead of a memory address.

e.g.
> ethtool -G eth2 tx 2048 rx 2048
i40e 0000:03:00.2 eth2: Changing Tx descriptor count from 512 to 2048.
i40e 0000:03:00.2 eth2: Changing Rx descriptor count from 512 to 2048
sun4v_data_access_exception: ADDR[fff8001f9734a000] CTX[0000]
TYPE[0004], going.
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
ethtool(3273): Dax [#1]
CPU: 9 PID: 3273 Comm: ethtool Tainted: G            E
4.8.0-linux-net_temp+ #7
task: fff8001f96d7a660 task.stack: fff8001f97348000
TSTATE: 0000009911001601 TPC: 00000000103189e4 TNPC: 00000000103189e8 Y:
00000000    Tainted: G            E
TPC: <i40e_alloc_rx_buffers+0x124/0x260 [i40e]>
g0: fff8001f4eb64000 g1: 00000000000007ff g2: fff8001f9734b92c g3:
00203e0000000000
g4: fff8001f96d7a660 g5: fff8001fa6704000 g6: fff8001f97348000 g7:
0000000000000001
o0: 0006000046706928 o1: 00000000db3e2000 o2: fff8001f00000000 o3:
0000000000002000
o4: 0000000000002000 o5: 0000000000000001 sp: fff8001f9734afc1 ret_pc:
0000000010318a64
RPC: <i40e_alloc_rx_buffers+0x1a4/0x260 [i40e]>
l0: fff8001f4e8bffe0 l1: fff8001f4e8cffe0 l2: 00000000000007ff l3:
00000000ff000000
l4: 0000000000ff0000 l5: 000000000000ff00 l6: 0000000000cda6a8 l7:
0000000000e822f0
i0: fff8001f96380000 i1: 0000000000000000 i2: 00203edb00000000 i3:
0006000046706928
i4: 0000000002086320 i5: 0000000000e82370 i6: fff8001f9734b071 i7:
00000000103062d4
I7: <i40e_set_ringparam+0x3b4/0x540 [i40e]>
Call Trace:
 [00000000103062d4] i40e_set_ringparam+0x3b4/0x540 [i40e]
 [000000000094e2f8] dev_ethtool+0x898/0xbe0
 [0000000000965570] dev_ioctl+0x250/0x300
 [0000000000923800] sock_do_ioctl+0x40/0x60
 [000000000092427c] sock_ioctl+0x7c/0x280
 [00000000005ef040] vfs_ioctl+0x20/0x60
 [00000000005ef5d4] do_vfs_ioctl+0x194/0x4c0
 [00000000005ef974] SyS_ioctl+0x74/0xa0
 [0000000000406214] linux_sparc_syscall+0x34/0x44
Disabling lock debugging due to kernel taint
Caller[00000000103062d4]: i40e_set_ringparam+0x3b4/0x540 [i40e]
Caller[000000000094e2f8]: dev_ethtool+0x898/0xbe0
Caller[0000000000965570]: dev_ioctl+0x250/0x300
Caller[0000000000923800]: sock_do_ioctl+0x40/0x60
Caller[000000000092427c]: sock_ioctl+0x7c/0x280
Caller[00000000005ef040]: vfs_ioctl+0x20/0x60
Caller[00000000005ef5d4]: do_vfs_ioctl+0x194/0x4c0
Caller[00000000005ef974]: SyS_ioctl+0x74/0xa0
Caller[0000000000406214]: linux_sparc_syscall+0x34/0x44
Caller[0000000000107154]: 0x107154
Instruction DUMP: e43620c8
 e436204a  c45e2038
<c2a083a0> 82102000
 81cfe008  90086001
 82102000  81cfe008

Kernel panic - not syncing: Fatal exception

Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-02 23:06:40 -08:00
Piotr Raczynski
64f5ead95a i40e: Add protocols over MCTP to i40e_aq_discover_capabilities
Add logical_id to I40E_AQ_CAP_ID_MNG_MODE capability starting from major
version 2.

Change-ID: Idb29214b172ea5c70cbd45a99e6745c0215af7e4
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-02 22:59:04 -08:00
Jacob Keller
0b7c8b5d54 i40e: fix trivial typo in naming of i40e_sync_filters_subtask
A comment incorrectly referred to i40e_vsi_sync_filters_subtask which
does not actually exist. Reference the correct function instead.

Change-ID: I6bd805c605741ffb6fe34377259bb0d597edfafd
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-02 22:56:29 -08:00
Michal Kosiarz
91dc1e5d3d i40e: Add Clause22 implementation
Some external PHYs require Clause22 method for accessing registers.
This patch also adds some defines to support blink led on devices using
10GBaseT PHY.

Change-ID: I868a4326911900f6c89e7e522fda4968b0825f14
Signed-off-by: Michal Kosiarz <michal.kosiarz@intel.com>
Signed-off-by: Matt Jared <matthew.a.jared@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-02 22:49:39 -08:00
Jacob Keller
d182a5ca1f i40e: avoid duplicate private flags definitions
Separate the global private flags and the regular private flags per
interface into two arrays. Future additions of private flags will not
need to be duplicated which may lead to buggy code. Also rename
"i40e_priv_flags_strings_gl" to "i40e_gl_priv_flags_strings" for
clarity, as it reads more naturally.

Change-ID: I68caef3c9954eb7da342d7f9d20f2873186f2758
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-02 22:40:58 -08:00
Jacob Keller
6a112785fd i40e: remove second check of VLAN_N_VID in i40e_vlan_rx_add_vid
Replace a check of magic number 4095 with VLAN_N_VID. This
makes it obvious that a later check against VLAN_N_VID is
always true and can be removed.

Change-ID: I28998f127a61a529480ce63d8a07e266f6c63b7b
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-02 22:38:47 -08:00
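To illustrate why the second check in the commit above becomes removable, here is a minimal Python sketch (not the driver code). It assumes VLAN_N_VID is 4096, as defined in the kernel's VLAN header; the function name `add_vid` and its return strings are hypothetical.

```python
VLAN_N_VID = 4096  # number of possible VLAN IDs (0..4095)

def add_vid(vid):
    """Hypothetical model of the early range check on a VLAN ID."""
    # Previously written with the magic number: if vid > 4095
    if vid >= VLAN_N_VID:
        return "rejected"
    # Past this point vid < VLAN_N_VID always holds, so a second
    # "if vid < VLAN_N_VID" guard here would always be true.
    return "added"
```

Writing the first check in terms of VLAN_N_VID makes the two conditions visibly complementary, which is the observation the commit relies on.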
Jacob Keller
7429c0bd01 i40e: remove error_param_int label from i40e_vc_config_promiscuous_mode_msg
This label is unnecessary, as we are jumping to a block that checks
aq_ret and then immediately skipping it and returning. So just jump
straight to error_param and remove this unnecessary label.

Also use goto error_param even in the last check for style consistency.

Change-ID: If487c7d10c4048e37c594e5eca167693aaed45f6
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-02 22:30:44 -08:00
Alexander Duyck
06fc016c43 i40evf: Be much more verbose about what we can and cannot offload
This change makes it so that we are much more robust about defining what we
can and cannot offload.  Previously we were performing no checks.  This
should bring us up to parity with the i40e PF driver.

In addition the device only supports GSO as long as the MSS is 64 or
greater.  We were not checking this so an MSS less than that was resulting
in Tx hangs.

Change-ID: If533553ec92fc6ba694eab6ac81fdaf3004f3592
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-02 22:20:48 -08:00
Alexander Duyck
f114dca253 i40e: Be much more verbose about what we can and cannot offload
This change makes it so that we are much more robust about defining what we
can and cannot offload.  Previously we were just checking for the L4 tunnel
header length, however there are other fields we should be verifying as
there are multiple scenarios in which we cannot perform hardware offloads.

In addition the device only supports GSO as long as the MSS is 64 or
greater.  We were not checking this so an MSS less than that was resulting
in Tx hangs.

Change-ID: I5e2fd5f3075c73601b4b36327b771c64fcb6c31b
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
2016-12-02 22:19:03 -08:00
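The MSS rule stated in the two offload commits above can be sketched as follows. This is an illustrative Python model only, not the i40e or i40evf code: the function name `can_offload_gso` and the constant name `MIN_GSO_MSS` are hypothetical, and the other header-length checks the commits mention are left out.

```python
MIN_GSO_MSS = 64  # minimum MSS the device supports for GSO, per the commits

def can_offload_gso(is_gso, mss):
    """Return True when the packet may be handed to hardware as-is."""
    # Packets that are not GSO are unaffected by the MSS limit.
    if not is_gso:
        return True
    # A GSO packet with an MSS below 64 would hang the Tx queue,
    # so it has to be segmented in software instead.
    return mss >= MIN_GSO_MSS
```

For example, `can_offload_gso(True, 63)` is False, while `can_offload_gso(True, 64)` is True.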
Marcin Wojtas
2636ac3cc2 net: mvneta: Add network support for Armada 3700 SoC
Armada 3700 is a new ARMv8 SoC from Marvell using the same network
controller as the older Armada 370/38x/XP. There are however some
differences that needed to be taken into account when adding support
for it:

* open default MBUS window to 4GB of DRAM - Armada 3700 SoC's Mbus
  configuration for network controller has to be done on two levels:
  global and per-port. The first one is inherited from the
  bootloader. The latter can be opened in a default way, leaving
  arbitration to the bus controller.  Hence filled mbus_dram_target_info
  structure is not needed

* make per-CPU operation optional - Recent patches adding RSS and XPS
  support for Armada 38x/XP enabled per-CPU operation of the controller
  by default. Contrary to older SoCs, the Armada 3700 SoC's network
  controller is not capable of per-CPU processing due to interrupt lines'
  connectivity.  This patch restores non-per-CPU operation, which is now
  optional and depends on neta_armada3700 flag value in mvneta_port
  structure. In order not to complicate the code, separate interrupt
  subroutine is implemented.

For now, on the Armada 3700, RSS is disabled as the current
implementation depends on the per-CPU interrupts.

[gregory.clement@free-electrons.com: extract from a larger patch, replace
some ifdef and port to net-next for v4.10]

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:52:01 -05:00
Gregory CLEMENT
f34dacccb4 net: mvneta: Only disable mvneta_bm for 64-bits
Actually only the mvneta_bm support is not 64-bits compatible.
The mvneta code itself can run on 64-bits architecture.

Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:52:01 -05:00
Marcin Wojtas
8d5047cf9c net: mvneta: Convert to be 64 bits compatible
Prepare the mvneta driver in order to be usable on the 64 bits platform
such as the Armada 3700.

[gregory.clement@free-electrons.com]: this patch was extracted from a larger
one to ease review and maintenance.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:52:00 -05:00
Gregory CLEMENT
f88bee1c4b net: mvneta: Use cacheable memory to store the rx buffer virtual address
Until now the virtual address of the received buffer was stored in the
cookie field of the rx descriptor. However, this field is only 32 bits
wide, which prevents using the driver on a 64-bit architecture.

With this patch the virtual address is stored in an array not shared with
the hardware (no more need to use the DMA API). Thanks to this, the
accesses can be cached, unlike accesses to the rx descriptor member.

The change is done in the swbm path only, because the hwbm uses the cookie
field. This also means that currently the hwbm is not usable in 64-bit.

Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Reviewed-by: Jisheng Zhang <jszhang@marvell.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:52:00 -05:00
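The problem described in the commit above, and the side-array remedy, can be modelled roughly as below. This is a Python sketch under stated assumptions, not the mvneta code: `store_in_cookie`, `RxQueue`, and `buf_virt_addr` are hypothetical names, and the example address is made up.

```python
COOKIE_MASK = 0xFFFFFFFF  # the descriptor cookie field is 32 bits wide

def store_in_cookie(virt_addr):
    """Model what a 32-bit field retains of an address."""
    return virt_addr & COOKIE_MASK

class RxQueue:
    """Hypothetical rx queue keeping addresses in a driver-private array."""

    def __init__(self, size):
        # One slot per descriptor; never shared with the hardware.
        self.buf_virt_addr = [None] * size

    def set_buf(self, index, virt_addr):
        self.buf_virt_addr[index] = virt_addr

    def get_buf(self, index):
        return self.buf_virt_addr[index]
```

A 64-bit address loses its upper half when passed through `store_in_cookie`, whereas the array indexed by descriptor number returns it unchanged.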
Gregory CLEMENT
e9f6499965 net: mvneta: Do not allocate buffer in rxq init with HWBM
For HWBM all buffers are allocated in mvneta_bm_construct() and in runtime
they are put into descriptors by hardware. There is no need to fill them
at this point.

Suggested-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:52:00 -05:00
Gregory CLEMENT
ac83b7ddf2 net: mvneta: Optimize rx path for small frame
For small frames, reuse the phys_addr variable instead of accessing the
uncacheable value in the rx descriptor.

Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:52:00 -05:00
Eric Dumazet
7f7bf1606f mlx4: fix use-after-free in mlx4_en_fold_software_stats()
My recent commit to get more precise rx/tx counters in ndo_get_stats64()
can lead to crashes at device dismantle, as Jesper found out.

We must prevent mlx4_en_fold_software_stats() trying to access
tx/rx rings if they are deleted.

Fix this by adding a test against priv->port_up in
mlx4_en_fold_software_stats()

Calling mlx4_en_fold_software_stats() from mlx4_en_stop_port()
allows us to eventually broadcast the latest/current counters to
rtnetlink monitors.

Fixes: 40931b8511 ("mlx4: give precise rx/tx bytes/packets counters")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-and-bisected-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@dev.mellanox.co.il>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:33:32 -05:00
Sunil Goutham
bd3ad7d3a1 net: thunderx: Fix transmit queue timeout issue
The transmit queue timeout issue is seen in two cases:
- Due to a race condition between setting stop_queue at xmit()
  and checking for stopped_queue in the NAPI poll routine, at times
  transmission from a SQ comes to a halt. This is fixed
  by using barriers and by adding a check for SQ free descriptors,
  in case the SQ is stopped and there are only CQE_RX, i.e. no CQE_TX.
- Contrary to an assumption, a HW erratum where HW doesn't stop transmission
  even though there are not enough CQEs available for a CQE_TX is
  not fixed in T88 pass 2.x. This results in a Qset error with
  'CQ_WR_FULL' stalling transmission. This is fixed by adjusting
  RXQ's RED levels for CQ level such that there is always enough
  space left for CQE_TXs.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:32:59 -05:00
Hadar Hen Zion
ebe06875ff net/mlx5e: Support adding ingress tc rule when egress device flag is set
When ndo_setup_tc is called with an egress_dev flag set, it means that
the ndo call was executed on the mirred action (egress) device and not
on the ingress device.

In order to support this kind of ndo_setup_tc call, and insert the
correct decap rule to the hardware, the uplink device on the same eswitch
should be found.

Currently, we use this resolution between the mirred device and the
uplink on the same eswitch to offload vxlan shared device decap rules.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:28:38 -05:00
Hadar Hen Zion
726293f1f8 net/mlx5e: Save the representor netdevice as part of the representor
Replace the representor private data with a net_device pointer holding the
representor netdevice, instead of a void pointer holding mlx5e_priv.

It will be used by a new eswitch service function, returning the uplink representor
netdevice.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:28:37 -05:00
Hadar Hen Zion
718f13e72b net/mlx5e: Bring back representor's ndos that were accidentally removed
The VF Representor udp tunnel ndo entries were removed by mistake,
return them.

Fixes: 370bad0f9a ('net/mlx5e: Support HW (offloaded) and SW counters for SRIOV switchdev mode')
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 13:28:37 -05:00
Yuval Mintz
1d6cff4fca qed: Add iSCSI out of order packet handling.
This patch adds out of order packet handling for hardware offloaded
iSCSI. Out of order packet handling requires driver buffer allocation
and assistance.

Signed-off-by: Arun Easi <arun.easi@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 12:44:38 -05:00
Yuval Mintz
fc831825f9 qed: Add support for hardware offloaded iSCSI.
This adds the backbone required for the various HW initializations
which are necessary for the iSCSI driver (qedi) for QLogic FastLinQ
4xxxx line of adapters - FW notification, resource initializations, etc.

Signed-off-by: Arun Easi <arun.easi@cavium.com>
Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 12:44:37 -05:00
Rasmus Villemoes
b14945ac3e net: atarilance: use %8ph for printing hex string
This is already using the %pM printf extension; might as well also use
%ph to make the code smaller.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 12:03:35 -05:00
Arnd Bergmann
d709b2a186 net/mlx5e: skip loopback selftest with !CONFIG_INET
When CONFIG_INET is disabled, the new selftest results in a link
error:

drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.o: In function `mlx5e_test_loopback':
en_selftest.c:(.text.mlx5e_test_loopback+0x2ec): undefined reference to `ip_send_check'
en_selftest.c:(.text.mlx5e_test_loopback+0x34c): undefined reference to `udp4_hwcsum'

This hides the specific test in that configuration.

Fixes: 0952da791c ("net/mlx5e: Add support for loopback selftest")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 11:55:57 -05:00
Daniel Borkmann
366cbf2f46 bpf, xdp: drop rcu_read_lock from bpf_prog_run_xdp and move to caller
After 326fe02d1e ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock"),
the rcu_read_lock() in bpf_prog_run_xdp() is superfluous, since callers
need to hold rcu_read_lock() already to make sure BPF program doesn't
get released in the background.

Thus, drop it from bpf_prog_run_xdp(), as it can otherwise be misleading.
Still keeping the bpf_prog_run_xdp() is useful as it allows for grepping
in XDP supported drivers and to keep the typecheck on the context intact.
For mlx4, this means we don't have a double rcu_read_lock() anymore. nfp can
just make use of bpf_prog_run_xdp(), too. For qede, just move rcu_read_lock()
out of the helper. When the driver gets atomic replace support, this will
move to call-sites eventually.

mlx5 needs actual fixing as it has the same issue as described already in
326fe02d1e ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock"),
that is, we're under RCU bh at this time, BPF programs are released via
call_rcu(), and call_rcu() != call_rcu_bh(), so we need to properly mark
read side as programs can get xchg()'ed in mlx5e_xdp_set() without queue
reset.

Fixes: 86994156c7 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 11:06:24 -05:00
Roi Dayan
5067b60207 net/mlx5e: Remove flow encap entry in the correct place
Handling flow encap entry should be inside tc del flow
and is only relevant for offloaded eswitch TC rules.

Fixes: 11a457e9b6c1 ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 10:47:03 -05:00
Roi Dayan
961e8979ec net/mlx5e: Refactor tc del flow to accept mlx5e_tc_flow instance
Change the function that deletes offloaded TC rule to get
struct mlx5e_tc_flow instance which contains both the flow
handle and flow attributes. This is a cleanup needed for
downstream patches, it doesn't change any functionality.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 10:47:02 -05:00
Roi Dayan
86a33ae1ca net/mlx5e: Correct cleanup order when deleting offloaded TC rules
According to the reverse unwinding principle, on delete time we should
first handle deletion of the steering rule and later handle the vlan
deletion from the eswitch.

Fixes: 8b32580df1 ("net/mlx5e: Add TC vlan action for SRIOV offloads")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 10:47:02 -05:00
Roi Dayan
53636068d8 net/mlx5e: Remove redundant hashtable lookup in configure flower
We will never find a flow with the same cookie as cls_flower always
allocates a new flow and the cookie is the allocated memory address.

Fixes: e3a2b7ed01 ("net/mlx5e: Support offload cls_flower with drop action")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 10:47:02 -05:00
Tariq Toukan
ec8b9981ad net/mlx5e: Create UMR MKey per RQ
In Striding RQ implementation, we used a single UMR
(User-Mode Memory Registration) memory key for all RQs.
When the product of RQs number*size gets high, we hit a
limitation of u16 field size in FW.

Here we move to using a UMR memory key per RQ, so we can
scale to any number of rings, with the maximum buffer
size in each.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 10:47:02 -05:00
Tariq Toukan
3608ae77c0 net/mlx5e: Move function mlx5e_create_umr_mkey
In the next patch we are going to create a UMR MKey per RQ, so we need
mlx5e_create_umr_mkey declared before mlx5e_create_rq.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 10:47:01 -05:00
Tariq Toukan
1c1b522808 net/mlx5e: Implement Fragmented Work Queue (WQ)
Add new type of struct mlx5_frag_buf which is used to allocate fragmented
buffers rather than contiguous, and make the Completion Queues (CQs) use
it as they are big (default of 2MB per CQ in Striding RQ).

This fixes failures of the type:
"mlx5e_open_locked: mlx5e_open_channels failed, -12"
which occur when dma_zalloc_coherent cannot find enough contiguous
coherent memory to satisfy the driver's request as the user tries to
set up more or larger rings.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 10:47:01 -05:00
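The idea behind a fragmented buffer, as described in the commit above, can be sketched like this. It is a simplified Python model and not the mlx5 implementation: the fragment size of 4096 bytes and the names `alloc_frag_buf` and `locate` are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # fragment size assumed for this sketch

def alloc_frag_buf(size, frag_size=PAGE_SIZE):
    """Allocate 'size' bytes as a list of independent fragments."""
    nfrags = (size + frag_size - 1) // frag_size  # round up
    # Each fragment is a separate allocation, so no single large
    # contiguous region is required.
    return [bytearray(frag_size) for _ in range(nfrags)]

def locate(offset, frag_size=PAGE_SIZE):
    """Translate a linear byte offset into (fragment index, offset in fragment)."""
    return divmod(offset, frag_size)
```

Under these assumptions a 2MB queue becomes 512 page-sized allocations, and each access pays one extra index computation to find its fragment.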
Neill Whillans
3b80456433 net: ethernet: altera_tse: add support for SGMII PCS
Add support for the (optional) SGMII PCS functionality of the Altera
TSE MAC. If the phy-mode is set to 'sgmii' then we attempt to discover
and initialise the PCS so that the MAC can communicate to the PHY.

The PCS IP block provides a scratch register for testing presence of
the PCS, which is mapped into one of the two MDIO spaces present in
the MAC's register space.  Once we have determined that the scratch
register is functioning, we attempt to initialise the PCS to
auto-negotiate an SGMII link with the PHY. There is no need to monitor
or manage the SGMII link beyond this, since the normal PHY MDIO will
then be used to monitor the media layer.

The Altera TSE MAC has only one way in which it can be configured with an
SGMII PCS, and as such, this patch only looks to the phy-mode to select
whether or not to attempt to initialise the PCS registers.  During
initialisation, we report the PCS's equivalent of a PHY ID register.
This can be parameterised during the IP instantiation and is often left
as '0x00000000' which is not an error.

Signed-off-by: Neill Whillans <neill.whillans@codethink.co.uk>
Reviewed-by: Daniel Silverstone <daniel.silverstone@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 10:36:47 -05:00
Edward Cree
99831b1ea5 sfc: remove RESET_TYPE_RX_RECOVERY
It's no longer used now that Falcon is gone.

Also remove a reference in a comment to an ioctl that doesn't exist.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-01 15:39:38 -05:00
Edward Cree
d7d6cabaa1 sfc: don't select SFC_FALCON
Easy enough for Falcon users to enable it when making oldconfig.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-01 15:39:38 -05:00
Edward Cree
edd96fa0de sfc: fix debug message format string in efx_farch_handle_rx_not_ok
Defalconisation removed one of the string arguments, but missed the
corresponding %s.

Fixes: 5a6681e22c ("sfc: separate out SFC4000 ("Falcon") support into new sfc-falcon driver")

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-01 15:39:37 -05:00
Souptick Joarder
fec668d36d ethernet :mellanox :mlx5: Replace pci_pool_alloc by pci_pool_zalloc
In alloc_cmd_box(), pci_pool_alloc() followed by memset will be
replaced by pci_pool_zalloc()

Signed-off-by: Souptick joarder <jrdr.linux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30 14:56:37 -05:00
Souptick Joarder
77d1337bf6 ethernet :mellanox :mlx4: Replace pci_pool_alloc by pci_pool_zalloc
In mlx4_alloc_cmd_mailbox(), pci_pool_alloc() followed by memset will be
replaced by pci_pool_zalloc()

Signed-off-by: Souptick joarder <jrdr.linux@gmail.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30 14:56:36 -05:00
Ivan Khoronzhuk
8feb0a1965 net: ethernet: ti: cpsw: split tx budget between channels
Split device budget between channels according to channel rate.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30 14:37:14 -05:00
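Splitting a budget according to channel rate, as the commit above describes, might look roughly like the following. This is a hedged Python sketch rather than the cpsw algorithm: the function name `split_budget`, the equal split when no rates are set, and giving the rounding remainder to the first channel are all assumptions of the sketch.

```python
def split_budget(budget, rates):
    """Split an integer budget between channels in proportion to rate."""
    if not rates:
        return []
    total = sum(rates)
    if total == 0:
        # No rates configured: share equally (assumption of this sketch).
        rates = [1] * len(rates)
        total = len(rates)
    shares = [budget * r // total for r in rates]
    # Integer division can leave a remainder; hand it to the first
    # channel so the shares always add up to the full budget.
    shares[0] += budget - sum(shares)
    return shares
```

With a budget of 64 and rates of 100, 200, 50 and 30 Mb/s, `split_budget(64, [100, 200, 50, 30])` returns `[18, 33, 8, 5]`.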
Ivan Khoronzhuk
342934a558 net: ethernet: ti: cpsw: optimize end of poll cycle
Check budget fullness only after it's updated and update
channel mask only once to keep budget balance between channels.
It's also needed for further changes.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30 14:37:14 -05:00
Ivan Khoronzhuk
83fcad0c98 net: ethernet: ti: cpsw: add .ndo to set per-queue rate
This patch allows rate limiting tx queues for the cpsw interface.
The rate is set in absolute Mb/s units and cannot exceed the speed
the interface is connected at.

The rate for a tx queue can be tested with:

ethtool -L eth0 rx 4 tx 4

echo 100 > /sys/class/net/eth0/queues/tx-0/tx_maxrate
echo 200 > /sys/class/net/eth0/queues/tx-1/tx_maxrate
echo 50 > /sys/class/net/eth0/queues/tx-2/tx_maxrate
echo 30 > /sys/class/net/eth0/queues/tx-3/tx_maxrate

tc qdisc add dev eth0 root handle 1: multiq

tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip \
    dport 5001 0xffff action skbedit queue_mapping 0

tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip \
    dport 5002 0xffff action skbedit queue_mapping 1

tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip \
    dport 5003 0xffff action skbedit queue_mapping 2

tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip \
    dport 5004 0xffff action skbedit queue_mapping 3

iperf -c 192.168.2.1 -b 110M -p 5001 -f m -t 60
iperf -c 192.168.2.1 -b 215M -p 5002 -f m -t 60
iperf -c 192.168.2.1 -b 55M -p 5003 -f m -t 60
iperf -c 192.168.2.1 -b 32M -p 5004 -f m -t 60

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30 14:37:14 -05:00
Ivan Khoronzhuk
8f32b90981 net: ethernet: ti: davinci_cpdma: add set rate for a channel
The cpdma has 8 rate limited tx channels. This patch adds the
ability for the cpdma driver to use 8 tx h/w shapers. If at least one
channel is not rate limited then it must have a higher number. This
is because the rate limited channels have to have higher priority
than channels that are not rate limited. The channel priority is
already set in low-hi direction, so that when a new channel is added
with ethtool and it doesn't have a rate yet, it cannot affect the rate
limited channels. It can be useful for TSN streams and in other cases
where h/w rate limited channels are needed.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30 14:37:13 -05:00
Ivan Khoronzhuk
0fc6432cc7 net: ethernet: ti: davinci_cpdma: add weight function for channels
The weight of a channel is needed to split descriptors between
channels. The weight can depend on maximum rate of channels, maximum
rate of an interface or other reasons. The channel weight is in
percentage and is independent for rx and tx channels.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30 14:37:13 -05:00
Mintz, Yuval
cb6aeb0792 qede: Add support for XDP_TX
Add support for forwarding via XDP. Once the eBPF is attached,
driver would allocate & configure a designated transmission queue
meant solely for forwarding packets. Said queue would share the
receive-queue's interrupt line, and would have its own Tx statistics.

Infrastructure changes required for this [spread-out through the code]:
 - Determine the DMA direction of the receive buffers based on the presence
of the eBPF program.
 - Turn the sw Tx ring into a union, as regular/XDP queues have different
needs for releasing resources after completion [regular requires the SKB,
XDP requires the transmitted page].

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30 14:32:05 -05:00
Mintz, Yuval
496e051709 qede: Add basic XDP support
Add support for the ndo_xdp callback. This patch would support XDP_PASS,
XDP_DROP and XDP_ABORTED commands.

This also adds a per Rx queue statistic which counts number of packets
which didn't reach the stack [due to XDP].

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30 14:32:05 -05:00
Mintz, Yuval
9eb22357d5 qede: Better utilize the qede_[rt]x_queue
Improve the cacheline usage of both queues by reordering -
This reduces the cachelines required for egress datapath processing
from 3 to 2 and those required by ingress datapath processing by 2.

It also changes a couple of datapath related functions that currently
require either the fastpath or the qede_dev, changing them to be based
on the tx/rx queue instead.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30 14:32:05 -05:00
Mintz, Yuval
8a47253065 qede: Don't check netdevice for rx-hash
Receive-hashing is a fixed feature, so there's no need to check
during the ingress datapath whether it's set or not.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30 14:32:04 -05:00
Mintz, Yuval
3da7a37ae6 qed*: Handle-based L2-queues.
The driver needs to maintain several FW/HW-indices for each one of
its queues. Currently, that mapping is done by the QED where it uses
an rx/tx array of so-called hw-cids, populating them whenever a new
queue is opened and clearing them upon destruction of said queues.

This maintenance is far from ideal - there's no real reason why
QED needs to maintain such a data-structure. It becomes even worse
when considering the fact that the PF's queues and its child VFs' queues
are all mapped into the same data-structure.
As a by-product, the set of parameters an interface needs to supply for
queue APIs is non-trivial, and some of the variables in the API
structures have different meaning depending on their exact place
in the configuration flow.

This patch re-organizes the way L2 queues are configured and maintained.
In short:
  - Required parameters for queue init are now well-defined.
  - Qed would allocate a queue-cid based on parameters.
    Upon initialization success, it would return a handle to caller.
  - Queue-handle would be maintained by entity requesting queue-init,
    not necessarily qed.
  - All further queue-APIs [update, destroy] would use the opaque
    handle as reference for the queue instead of various indices.

The possible owners of such handles:
  - PF queues [qede] - complete handles based on provided configuration.
  - VF queues [qede] - fw-context-less handles, containing only relative
    information; Only the PF-side would need the absolute indices
    for configuration, so they're omitted here.
  - VF queues [qed, PF-side] - complete handles based on VF initialization.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30 14:32:04 -05:00
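The handle-based flow listed in the commit above can be modelled as below. This is a Python sketch of the general pattern only, not the qed API: the class names `QueueHandle` and `Qed`, the method names, and the parameter keys are all hypothetical.

```python
import itertools

class QueueHandle:
    """Opaque handle returned by queue init; callers only pass it back."""

    def __init__(self, cid, params):
        self._cid = cid              # allocated queue-cid
        self._params = dict(params)  # well-defined init parameters
        self._alive = True

class Qed:
    """Hypothetical core module; it keeps no per-queue table itself."""

    def __init__(self):
        self._cids = itertools.count()

    def queue_init(self, params):
        # Allocate a queue-cid and return a handle to the caller,
        # who is responsible for keeping it.
        return QueueHandle(next(self._cids), params)

    def queue_update(self, handle, **changes):
        if not handle._alive:
            raise ValueError("queue already destroyed")
        handle._params.update(changes)

    def queue_destroy(self, handle):
        handle._alive = False
```

The point of the pattern is that later calls take only the handle, so the caller never has to supply the various indices again.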
Mintz, Yuval
567b3c127a qede: Revise state locking scheme
As qede utilizes an internal-reload sequence as result of various
configuration changes, the netif state wouldn't always accurately describe
the status of the configuration.
To compensate, we're storing an internal state of the device, which should
only be accessed under the qede_lock.

This patch fixes and improves several state/lock interactions:
  - The internal state should only be checked while locked.
  - While holding lock, it's preferable to check state rather than
    the netdevice's state.
  - The reload sequence is not 'atomic' - unload and subsequent load
    are not in the same critical section.

This also adds the 'locked' variant for the reload, which would later be
used by XDP - useful in the case where the correct sequence is 'lock,
check state and re-configure if good', instead of allowing the reload
itself to make the decision regarding the configurability of the device.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30 14:32:04 -05:00
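The 'lock, check state and re-configure if good' sequence from the commit above can be illustrated with a small Python model. It is a sketch under assumptions, not the qede code: the class `Device`, its state strings, and the counter standing in for the unload and load steps are hypothetical.

```python
import threading

class Device:
    """Hypothetical device with an internal state guarded by a lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.state = "closed"
        self.reloads = 0

    def _reload_locked(self):
        # Caller must already hold self.lock.
        self.reloads += 1  # stands in for unload followed by load

    def reconfigure(self):
        """Lock, check the internal state, and reload only if open."""
        with self.lock:
            if self.state != "open":
                return False
            self._reload_locked()
            return True
```

Because the state check and the reload happen inside the same critical section, the state cannot change between the decision and the action.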