Commit Graph

20440 Commits

Author SHA1 Message Date
Amritha Nambiar
5ecae4120a i40e: Refactor VF BW rate limiting
This patch refactors the BW rate limiting for Tx traffic
on the VF to be reused in the next patch for rate limiting Tx
traffic for the VSIs on the PF as well.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-13 14:07:32 -07:00
Amritha Nambiar
a9ce82f744 i40e: Enable 'channel' mode in mqprio for TC configs
The i40e driver is modified to enable the new mqprio hardware
offload mode and factor the TCs and queue configuration by
creating channel VSIs. In this mode, the priority to traffic
class mapping and the user specified queue ranges are used
to configure the traffic classes by setting the mode option to
'channel'.

Example:
  map 0 0 0 0 1 2 2 3 queues 2@0 2@2 1@4 1@5\
  hw 1 mode channel

qdisc mqprio 8038: root  tc 4 map 0 0 0 0 1 2 2 3 0 0 0 0 0 0 0 0
             queues:(0:1) (2:3) (4:4) (5:5)
             mode:channel
             shaper:dcb

The HW channels created are removed and all the queue configuration
is set to default when the qdisc is detached from the root of the
device.

This patch also disables setting up channels via ethtool (ethtool -L)
when the TCs are configured using mqprio scheduler.

The patch also limits setting ethtool Rx flow hash indirection
(ethtool -X eth0 equal N) to max queues configured via mqprio.
The Rx flow hash indirection input through ethtool should be
validated so that it is within in the queue range configured via
tc/mqprio. The bound checking is achieved by reporting the current
rss size to the kernel when queues are configured via mqprio.

Example:
  map 0 0 0 1 0 2 3 0 queues 2@0 4@2 8@6 11@14\
  hw 1 mode channel

Cannot set RX flow hash configuration: Invalid argument

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-13 14:06:37 -07:00
Amritha Nambiar
8f88b3034d i40e: Add infrastructure for queue channel support
This patch sets up the infrastructure for offloading TCs and
queue configurations to the hardware by creating HW channels(VSI).
A new channel is created for each of the traffic class
configuration offloaded via mqprio framework except for the first TC
(TC0). TC0 for the main VSI is also reconfigured as per user provided
queue parameters. Queue counts that are not power-of-2 are handled by
reconfiguring RSS by reprogramming LUTs using the queue count value.
This patch also handles configuring the TX rings for the channels,
setting up the RX queue map for channel.

Also, the channels so created are removed and all the queue
configuration is set to default when the qdisc is detached from the
root of the device.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-13 13:38:17 -07:00
Amritha Nambiar
ff42418812 i40e: Add macro for PF reset bit
Introduce a macro for the bit setting the PF reset flag and
update its usages. This makes it easier to use this flag
in functions to be introduced in future without encountering
checkpatch issues related to alignment and line over 80
characters.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-13 13:29:48 -07:00
Geert Uytterhoeven
ab104615e0 ravb: Consolidate clock handling
The module clock is used for two purposes:
  - Wake-on-LAN (WoL), which is optional,
  - gPTP Timer Increment (GTI) configuration, which is mandatory.

As the clock is needed for GTI configuration anyway, WoL is always
available.  Hence remove duplication and repeated obtaining of the clock
by making GTI use the stored clock for WoL use.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-12 23:00:50 -07:00
Rafał Miłecki
12acd13691 net: bgmac: enable master mode for BCM54210E and B50212E PHYs
There are 4 very similar PHYs:
0x600d84a1: BCM54210E (rev B0)
0x600d84a2: BCM54210E (rev B1)
0x600d84a5: B50212E (rev B0)
0x600d84a6: B50212E (rev B1)
that need setting master mode manually. It's because they run in slave
mode by default with Automatic Slave/Master configuration disabled which
can lead to unreliable connection with massive ping loss.

So far it was reported for a board with BCM47189 SoC and B50212E B1 PHY
connected to the bgmac supported ethernet device. Telling PHY driver to
setup PHY properly solves this issue.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-12 22:59:38 -07:00
Christos Gkekas
47f2546412 vxge: Clean up unused variables in vxge-traffic
Delete unused channel variables in vxge-traffic.

Signed-off-by: Christos Gkekas <chris.gekas@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-12 12:24:53 -07:00
Florian Fainelli
723934fb79 net: systemport: Turn on ACB at the SYSTEMPORT level
Now that we have established the queue mapping between the switch port
egress queues and the SYSTEMPORT egress queues, we can turn on Advanced
Congestion Buffering (ACB) at the SYSTEMPORT level. This enables the
Ethernet MAC controller to get out of band flow control information
directly from the switch port and queue that it monitors such that its
internal TDMA can be appropriately backpressured.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-12 12:10:02 -07:00
Florian Fainelli
d156576362 net: systemport: Establish lower/upper queue mapping
Establish a queue mapping between the DSA slave network device queues
created that correspond to switch port queues, and the transmit queue
that SYSTEMPORT manages.

We need to configure the SYSTEMPORT transmit queue with the switch port number
and switch port queue number in order for the switch and SYSTEMPORT hardware to
utilize the out of band congestion notification. This hardware mechanism works
by looking at the switch port egress queue and determines whether there is
enough buffers for this queue, with that class of service for a successful
transmission and if not, backpressures the SYSTEMPORT queue that is being used.

For this to work, we implement a notifier which looks at the
DSA_PORT_REGISTER event.  When DSA network devices are registered, the
framework calls the DSA notifiers when that happens, extracts the number
of queues for these devices and their associated port number, remembers
that in the driver private structure and linearly maps those queues to
TX rings/queues that we manage.

This scheme works because DSA slave network deviecs always transmit
through SYSTEMPORT so when DSA slave network devices are
destroyed/brought down, the corresponding SYSTEMPORT queues are no
longer used. Also, by design of the DSA framework, the master network
device (SYSTEMPORT) is registered first.

For faster lookups we use an array of up to DSA_MAX_PORTS * number of
queues per port, and then map pointers to bcm_sysport_tx_ring such that
our ndo_select_queue() implementation can just index into that array to
locate the corresponding ring index.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-12 12:10:02 -07:00
Timur Tabi
3f7832c26c Revert "net: qcom/emac: enforce DMA address restrictions"
This reverts commit df1ec1b9d0.

It turns out that memory allocated via dma_alloc_coherent is always
aligned to the size of the buffer, so there's no way the RRD and RFD
can ever be in separate 32-bit regions.

Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-12 10:50:10 -07:00
Tariq Toukan
f025fd6061 net/mlx4_en: XDP_TX, assign constant values of TX descs on ring creaion
In XDP_TX, some fields in tx_info and tx_desc are constants across
all entries of the different XDP_TX rings.
Assign values to these fields on ring creation time, rather than in
data-path.

Patchset performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Single queue no-RSS optimization ON.

XDP_TX packet rate:
------------------------------
Before    | After     | Gain |
13.7 Mpps | 14.0 Mpps | %2.2 |
------------------------------

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 20:21:23 -07:00
Tariq Toukan
f6f0aa9741 net/mlx4_en: Obsolete call to generic write_desc in XDP xmit flow
Function mlx4_en_tx_write_desc() is not optimized to use of XDP xmit.
Use the relevant parts inline instead.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 20:21:23 -07:00
Tariq Toukan
5dad61b838 net/mlx4_en: Replace netdev parameter with priv in XDP xmit function
The struct net_device parameter was passed only to extract
struct mlx4_en_priv out of it.
Here we pass the priv parameter directly.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 20:21:23 -07:00
Jiri Pirko
717503b9cf net: sched: convert cls_flower->egress_dev users to tc_setup_cb_egdev infra
The only user of cls_flower->egress_dev is mlx5. So do the conversion
there alongside with the code originating the call in cls_flower
function fl_hw_replace_filter to the newly introduced egress device
callback infrastucture.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 20:15:43 -07:00
Subash Abhinov Kasiviswanathan
60d58f971c net: qualcomm: rmnet: Implement bridge mode
Add support to bridge two devices which can send multiplexing and
aggregation (MAP) data. This is done only when the data itself is
not going to be consumed in the stack but is being passed on to a
different endpoint. This is mainly used for testing.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 20:05:30 -07:00
Subash Abhinov Kasiviswanathan
3352e6c457 net: qualcomm: rmnet: Convert the muxed endpoint to hlist
Rather than using a static array, use a hlist to store the muxed
endpoints and use the mux id to query the rmnet_device.
This is useful as usually very few mux ids are used.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Dan Williams <dcbw@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 20:05:30 -07:00
Subash Abhinov Kasiviswanathan
5451237ff7 net: qualcomm: rmnet: Remove duplicate setting of rmnet_devices
The rmnet_devices information is already stored in muxed_ep, so
storing this in rmnet_devices[] again is redundant.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 20:05:30 -07:00
Subash Abhinov Kasiviswanathan
56470c927f net: qualcomm: rmnet: Remove duplicate setting of rmnet private info
The end point is set twice in the local_ep as well as the mux_id and
the real_dev in the rmnet private structure. Remove the local_ep.
While these elements are equivalent, rmnet_endpoint will be
used only as part of the rmnet_port for muxed scenarios in VND mode.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 20:05:30 -07:00
Subash Abhinov Kasiviswanathan
9148963201 net: qualcomm: rmnet: Move rmnet_mode to rmnet_port
Mode information on the real device makes it easier to route packets
to rmnet device or bridged device based on the configuration.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 20:05:29 -07:00
Subash Abhinov Kasiviswanathan
1281726ec3 net: qualcomm: rmnet: Remove some unused defines
Most of these constants were used in the initial patchset where
custom netlink configuration was used and hence are no longer relevant.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 20:05:29 -07:00
Subash Abhinov Kasiviswanathan
d8bbb07adb net: qualcomm: rmnet: Remove existing logic for bridge mode
This will be rewritten in the following patches.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 20:05:29 -07:00
Timur Tabi
740d6f188f net: qcom/emac: clean up some TX/RX error messages
Some of the error messages that are printed by the interrupt handlers
are poorly written.  For example, many don't include a device prefix,
so there's no indication that they are EMAC errors.

Also use rate limiting for all messages that could be printed from
interrupt context.

Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 16:01:56 -07:00
Timur Tabi
df1ec1b9d0 net: qcom/emac: enforce DMA address restrictions
The EMAC has a restriction that the upper 32 bits of the base addresses
for the RFD and RRD rings must be the same.  The ensure that restriction,
we allocate twice the space for the RRD and locate it at an appropriate
address.

We also re-arrange the allocations so that invalid addresses are even
less likely.

Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 16:01:56 -07:00
Timur Tabi
3958ffcd85 net: qcom/emac: remove unused address arrays
The EMAC is capable of multiple TX and RX rings, but the driver only
supports one ring for each.  One function had some left-over unused
code that supports multiple rings, but all it did was make the code
harder to read.

Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 16:01:56 -07:00
Timur Tabi
d7e6b34756 net: qcom/emac: specify the correct DMA mask
The 64/32-bit DMA mask hackery in the EMAC driver is not actually necessary,
and is technically not accurate.  The EMAC hardware is limted to a 45-bit
DMA address.  Although no EMAC-enabled system can have that much DDR,
an IOMMU could possible provide a larger address.  Rather than play games
with the DMA mappings, the driver should provide a correct value and
trust the DMA/IOMMU layers to do the right thing.

Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 16:01:56 -07:00
Wei Yongjun
7822b0836d net: hns3: make local functions static
Fixes the following sparse warnings:

drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c:464:5: warning:
 symbol 'hns3_change_all_ring_bd_num' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c:477:5: warning:
 symbol 'hns3_set_ringparam' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 15:21:28 -07:00
David S. Miller
f44dea3421 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2017-10-10

This series contains updates to e1000e and igb.

Benjamin Poirier provides several fixes for e1000e, starting with a
correction to the return status which was always returning success even
if it was not successful.  Fixed code comments to reflect the actual
code behavior.  Fixed the conditional test for the correct return
value.  Fixed a potential race condition reported by Lennart Sorensen,
where the single flag get_link_status is used to signal two different
states.

Sasha fixes a buffer overrun for i219 devices, where the chipset had
reduced the round-trip latency for the LAN controller DMA accesses
which in some high performance cases caused a buffer overrun while
processing the DMA transactions.

Willem de Bruijn changes the default behavior of e1000e to use the
burst mode settings by default unless the user specifies the
receive interrupt delay (RxIntDelay).

Florian Fainelli updates the driver to differentiate between when
e1000e_put_txbuf() is called from normal reclamation or when a
DMA mapping failure to make the driver more "drop monitor friendly".

Christophe JAILLET fixes a potential NULL pointer dereference by
properly returning -ENOMEM on memory allocation failures.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-10 13:20:16 -07:00
Inbar Karmy
80a8dc75ee net/mlx4_en: Increase number of default RX rings
Remove limitation of netif_get_num_default_rss_queues()
from logic of RX rings default number.

Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-10 13:11:22 -07:00
Inbar Karmy
b8d394367a net/mlx4_en: Limit the number of RX rings
Limit the number of RX rings by the number of cores
in the system.

Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-10 13:11:22 -07:00
Inbar Karmy
7e1dc5e926 net/mlx4_en: Limit the number of TX rings
Limit the number of TX rings per UP by the number of cores
in the system.

Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-10 13:11:22 -07:00
Lipeng
abf11d04fd net: hns3: fix the ring count for ETHTOOL_GRXRINGS
This patch fix the ring count for ETHTOOL_GRXRINGS. Ring count
not TC size should be return for command "ethtool -n ethx".

Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-10 13:09:14 -07:00
Lipeng
07d2995425 net: hns3: add support for ETHTOOL_GRXFH
This patch add support for ethtool's ETHTOOL_GRXFH in hns3_get_rxnfc().

Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-10 13:09:14 -07:00
Lipeng
f7db940afc net: hns3: add support for set_rxnfc
This patch supports the ethtool's set_rxnfc().

Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-10 13:09:14 -07:00
Lipeng
5668abda09 net: hns3: add support for set_ringparam
This patch supports the ethtool's set_ringparam().

Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-10 13:09:13 -07:00
Lipeng
ee83f77645 net: hns3: fixes the ring index in hns3_fini_ring
This patch fixes the ring index in hns3_fini_ring.

Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-10 13:09:13 -07:00
Ganesh Goudar
652faa98ec cxgb4: add new T5 pci device id's
Add 0x50aa and 0x50ab T5 device id's.

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-10 12:51:07 -07:00
Ganesh Goudar
96ac18f14a cxgb4: Add support for new flash parts
Add support for new flash parts identification, and
also cleanup the flash Part identifying and decoding
code.

Based on the original work of Casey Leedom <leedom@chelsio.com>

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-10 12:51:07 -07:00
Christophe JAILLET
18eb86362a igb: check memory allocation failure
Check memory allocation failures and return -ENOMEM in such cases, as
already done for other memory allocations in this function.

This avoids NULL pointers dereference.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: Aaron Brown <aaron.f.brown@intel.com
Acked-by: PJ Waskiewicz <peter.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-10 09:01:11 -07:00
Florian Fainelli
377b62736c e1000e: Be drop monitor friendly
e1000e_put_txbuf() can be called from normal reclamation path as well as
when a DMA mapping failure, so we need to differentiate these two cases
when freeing SKBs to be drop monitor friendly. e1000e_tx_hwtstamp_work()
and e1000_remove() are processing TX timestamped SKBs and those should
not be accounted as drops either.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-10 09:00:56 -07:00
Willem de Bruijn
48072ae1ec e1000e: apply burst mode settings only on default
Devices that support FLAG2_DMA_BURST have different default values
for RDTR and RADV. Apply burst mode default settings only when no
explicit value was passed at module load.

The RDTR default is zero. If the module is loaded for low latency
operation with RxIntDelay=0, do not override this value with a burst
default of 32.

Move the decision to apply burst values earlier, where explicitly
initialized module variables can be distinguished from defaults.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-10 09:00:48 -07:00
Sasha Neftin
b10effb92e e1000e: fix buffer overrun while the I219 is processing DMA transactions
Intel® 100/200 Series Chipset platforms reduced the round-trip
latency for the LAN Controller DMA accesses, causing in some high
performance cases a buffer overrun while the I219 LAN Connected
Device is processing the DMA transactions. I219LM and I219V devices
can fall into unrecovered Tx hang under very stressfully UDP traffic
and multiple reconnection of Ethernet cable. This Tx hang of the LAN
Controller is only recovered if the system is rebooted. Slightly slow
down DMA access by reducing the number of outstanding requests.
This workaround could have an impact on TCP traffic performance
on the platform. Disabling TSO eliminates performance loss for TCP
traffic without a noticeable impact on CPU performance.

Please, refer to I218/I219 specification update:
https://www.intel.com/content/www/us/en/embedded/products/networking/
ethernet-connection-i218-family-documentation.html

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Reviewed-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-10 09:00:38 -07:00
Benjamin Poirier
4aea7a5c5e e1000e: Avoid receiver overrun interrupt bursts
When e1000e_poll() is not fast enough to keep up with incoming traffic, the
adapter (when operating in msix mode) raises the Other interrupt to signal
Receiver Overrun.

This is a double problem because 1) at the moment e1000_msix_other()
assumes that it is only called in case of Link Status Change and 2) if the
condition persists, the interrupt is repeatedly raised again in quick
succession.

Ideally we would configure the Other interrupt to not be raised in case of
receiver overrun but this doesn't seem possible on this adapter. Instead,
we handle the first part of the problem by reverting to the practice of
reading ICR in the other interrupt handler, like before commit 16ecba59bc
("e1000e: Do not read ICR in Other interrupt"). Thanks to commit
0a8047ac68 ("e1000e: Fix msi-x interrupt automask") which cleared IAME
from CTRL_EXT, reading ICR doesn't interfere with RxQ0, TxQ0 interrupts
anymore. We handle the second part of the problem by not re-enabling the
Other interrupt right away when there is overrun. Instead, we wait until
traffic subsides, napi polling mode is exited and interrupts are
re-enabled.

Reported-by: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>
Fixes: 16ecba59bc ("e1000e: Do not read ICR in Other interrupt")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-10 08:59:22 -07:00
Benjamin Poirier
19110cfbb3 e1000e: Separate signaling for link check/link up
Lennart reported the following race condition:

\ e1000_watchdog_task
    \ e1000e_has_link
        \ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link
            /* link is up */
            mac->get_link_status = false;

                            /* interrupt */
                            \ e1000_msix_other
                                hw->mac.get_link_status = true;

        link_active = !hw->mac.get_link_status
        /* link_active is false, wrongly */

This problem arises because the single flag get_link_status is used to
signal two different states: link status needs checking and link status is
down.

Avoid the problem by using the return value of .check_for_link to signal
the link status to e1000e_has_link().

Reported-by: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-10 08:35:01 -07:00
Benjamin Poirier
d3509f8bc7 e1000e: Fix return value test
All the helpers return -E1000_ERR_PHY.

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-10 08:33:24 -07:00
Benjamin Poirier
65a29da1f5 e1000e: Fix wrong comment related to link detection
Reading e1000e_check_for_copper_link() shows that get_link_status is set to
false after link has been detected. Therefore, it stays TRUE until then.

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-10 08:27:07 -07:00
Benjamin Poirier
c4c40e51f9 e1000e: Fix error path in link detection
In case of error from e1e_rphy(), the loop will exit early and "success"
will be set to true erroneously.

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-10 08:17:00 -07:00
David S. Miller
d93fa2ba64 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-10-09 20:11:09 -07:00
David S. Miller
9f7be893ab Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2017-10-09

This series contains updates to i40e and i40evf only.

Jake fixes missed flag conversion from u64 to u32.  Fixes a deafult ITR
value issue where the driver defaults to an ITR value of half the
expected value (in terms of minimum microseconds between interrupts).  So
fix this by changing the default values to be calculated using the
ITR_REG_TO_USEC() macro which indicates that we are converting from the
register units into microseconds. Updates the drivers to bump the tail in
increments of 8 and double the number of descriptors we will bundle into
one tail bump when receiving.  With the recent kernel support for
enabling XPS and QoS at the same time, we no longer need to worry about
the number of traffic classes when enabling XPS.

Lihong converts the use of hash_for_each() to hash_for_each_safe() to
safely remove a hash entry.  Adds a check for the return value for
find_first_bit() in the case that it returns the size passed to search.

Alan fixes a bug in which filters are erroneously removed if they are
removed and then added again.  So make sure that when adding a filter, if
we find it already existed in our list, make sure it is not marked to be
removed.

Jayaprakash adds the retrying of PHY reads when the I2C is busy for a
maximum period of 500ms.

Rami fixes code comment typo.

Stefano Brivio simplifies the code by removing the use of a local
return code variable and simply return the results of the read function.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09 18:12:03 -07:00
David S. Miller
0349a86c85 Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
10GbE Intel Wired LAN Driver Updates 2017-10-09

This series contains updates to ixgbe only.

Emil fixes an issue where the semaphore bits could be stuck after a reset
or a crash, by adding the clearing of software resource bits in the
software/firmware synchronization register.  Added error checks when we
attempt to identify and initialize the PHY to prevent a crash.  Fixed a
few issues in the logic of ixgbe_clean_test_rings() which was exposed by
a previous commit that was causing a crash in ethtool diagnostics.

Bhumika Goyal fixes a couple of instances which were overlooked when we
made ixgbe_mac_operations constant.

Shannon Nelson fixes an issue to restore normal operations after the
last MACVLAN offload is removed, otherwise we get stuck in a single queue
operations.

The infamous Jesper Dangaard Brouer adds a counter which counts the
number of times the recycle fails and the real page allocator is invoked.

Alex updates the adaptive ITR algorithm to better support the needs of the
network.  This attempt to make it so that our ITR algorithm will try to
prevent either starving a socket buffer for memory in the case of
transmit, or overrunning an receive socket buffer on receive.  We should
function better with new features like XDP which can handle small packets
at high rates without needing to lock us into NAPI polling mode.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09 16:38:52 -07:00
Stefano Brivio
2c4d36b708 i40e: Avoid some useless variables and initializers in NVM functions
Fixes: 09f79fd49d ("i40e: avoid NVM acquire deadlock during NVM update")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-09 14:42:17 -07:00