Commit Graph

5006 Commits

Author SHA1 Message Date
Ben Shelton
30d84397af ice: Do not check INTEVENT bit for OICR interrupts
According to the hardware spec, checking the INTEVENT bit isn't a
reliable way to detect if an OICR interrupt has occurred. This is
because this bit can be cleared by the hardware/firmware before the
interrupt service routine has run. So instead, just check for OICR
events every time.

Fixes: 940b61af02 ("ice: Initialize PF and setup miscellaneous interrupt")
Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24 09:03:23 -07:00
Anirudh Venkataramanan
34357a90d5 ice: Fix incorrect comment for action type
Action type 5 defines large action generic values. Fix comment to
reflect that better.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24 08:56:56 -07:00
Anirudh Venkataramanan
d332a38c95 ice: Fix initialization for num_nodes_added
ice_sched_add_nodes_to_layer is used recursively, and so we start
with num_nodes_added being 0. This way, in case of an error or if
num_nodes is NULL, the function just returns 0 to indicate that no
nodes were added.

Fixes: 5513b920a4 ("ice: Update Tx scheduler tree for VSI multi-Tx queue support")
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24 08:55:42 -07:00
Vinicius Costa Gomes
2707df9773 igb: Fix the transmission mode of queue 0 for Qav mode
When Qav mode is enabled, queue 0 should be kept on Stream Reservation
mode. From the i210 datasheet, section 8.12.19:

"Note: Queue0 QueueMode must be set to 1b when TransmitMode is set to
Qav." ("QueueMode 1b" represents the Stream Reservation mode)

The solution is to give queue 0 the all the credits it might need, so
it has priority over queue 1.

A situation where this can happen is when cbs is "installed" only on
queue 1, leaving queue 0 alone. For example:

$ tc qdisc replace dev enp2s0 handle 100: parent root mqprio num_tc 3 \
     	   map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

$ tc qdisc replace dev enp2s0 parent 100:2 cbs locredit -1470 \
     	   hicredit 30 sendslope -980000 idleslope 20000 offload 1

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24 08:53:19 -07:00
Colin Ian King
39035bfdc3 ixgbevf: ensure xdp_ring resources are free'd on error exit
The current error handling for failed resource setup for xdp_ring
data is a break out of the loop and returning 0 indicated everything
was OK, when in fact it is not.  Fix this by exiting via the
error exit label err_setup_tx that will clean up the resources
correctly and return and error status.

Detected by CoverityScan, CID#1466879 ("Logically dead code")

Fixes: 21092e9ce8 ("ixgbevf: Add support for XDP_TX action")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24 08:20:40 -07:00
Jesper Dangaard Brouer
44fa2dbd47 xdp: transition into using xdp_frame for ndo_xdp_xmit
Changing API ndo_xdp_xmit to take a struct xdp_frame instead of struct
xdp_buff.  This brings xdp_return_frame and ndp_xdp_xmit in sync.

This builds towards changing the API further to become a bulk API,
because xdp_buff is not a queue-able object while xdp_frame is.

V4: Adjust for commit 59655a5b6c ("tuntap: XDP_TX can use native XDP")
V7: Adjust for commit d9314c474d ("i40e: add support for XDP_REDIRECT")

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-17 10:50:30 -04:00
Jesper Dangaard Brouer
039930945a xdp: transition into using xdp_frame for return API
Changing API xdp_return_frame() to take struct xdp_frame as argument,
seems like a natural choice. But there are some subtle performance
details here that needs extra care, which is a deliberate choice.

When de-referencing xdp_frame on a remote CPU during DMA-TX
completion, result in the cache-line is change to "Shared"
state. Later when the page is reused for RX, then this xdp_frame
cache-line is written, which change the state to "Modified".

This situation already happens (naturally) for, virtio_net, tun and
cpumap as the xdp_frame pointer is the queued object.  In tun and
cpumap, the ptr_ring is used for efficiently transferring cache-lines
(with pointers) between CPUs. Thus, the only option is to
de-referencing xdp_frame.

It is only the ixgbe driver that had an optimization, in which it can
avoid doing the de-reference of xdp_frame.  The driver already have
TX-ring queue, which (in case of remote DMA-TX completion) have to be
transferred between CPUs anyhow.  In this data area, we stored a
struct xdp_mem_info and a data pointer, which allowed us to avoid
de-referencing xdp_frame.

To compensate for this, a prefetchw is used for telling the cache
coherency protocol about our access pattern.  My benchmarks show that
this prefetchw is enough to compensate the ixgbe driver.

V7: Adjust for commit d9314c474d ("i40e: add support for XDP_REDIRECT")
V8: Adjust for commit bd658dda42 ("net/mlx5e: Separate dma base address
and offset in dma_sync call")

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-17 10:50:29 -04:00
Jesper Dangaard Brouer
8d5d885275 xdp: rhashtable with allocator ID to pointer mapping
Use the IDA infrastructure for getting a cyclic increasing ID number,
that is used for keeping track of each registered allocator per
RX-queue xdp_rxq_info.  Instead of using the IDR infrastructure, which
uses a radix tree, use a dynamic rhashtable, for creating ID to
pointer lookup table, because this is faster.

The problem that is being solved here is that, the xdp_rxq_info
pointer (stored in xdp_buff) cannot be used directly, as the
guaranteed lifetime is too short.  The info is needed on a
(potentially) remote CPU during DMA-TX completion time . In an
xdp_frame the xdp_mem_info is stored, when it got converted from an
xdp_buff, which is sufficient for the simple page refcnt based recycle
schemes.

For more advanced allocators there is a need to store a pointer to the
registered allocator.  Thus, there is a need to guard the lifetime or
validity of the allocator pointer, which is done through this
rhashtable ID map to pointer. The removal and validity of of the
allocator and helper struct xdp_mem_allocator is guarded by RCU.  The
allocator will be created by the driver, and registered with
xdp_rxq_info_reg_mem_model().

It is up-to debate who is responsible for freeing the allocator
pointer or invoking the allocator destructor function.  In any case,
this must happen via RCU freeing.

Use the IDA infrastructure for getting a cyclic increasing ID number,
that is used for keeping track of each registered allocator per
RX-queue xdp_rxq_info.

V4: Per req of Jason Wang
- Use xdp_rxq_info_reg_mem_model() in all drivers implementing
  XDP_REDIRECT, even-though it's not strictly necessary when
  allocator==NULL for type MEM_TYPE_PAGE_SHARED (given it's zero).

V6: Per req of Alex Duyck
- Introduce rhashtable_lookup() call in later patch

V8: Address sparse should be static warnings (from kbuild test robot)

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-17 10:50:29 -04:00
Jesper Dangaard Brouer
b411ef1102 i40e: convert to use generic xdp_frame and xdp_return_frame API
Also convert driver i40e, which very recently got XDP_REDIRECT support
in commit d9314c474d ("i40e: add support for XDP_REDIRECT").

V7: This patch got added in V7 of this patchset.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-17 10:50:28 -04:00
Jesper Dangaard Brouer
189ead81a8 ixgbe: use xdp_return_frame API
Extend struct ixgbe_tx_buffer to store the xdp_mem_info.

Notice that this could be optimized further by putting this into
a union in the struct ixgbe_tx_buffer, but this patchset
works towards removing this again.  Thus, this is not done.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-17 10:50:27 -04:00
Linus Torvalds
c18bb396d3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) The sockmap code has to free socket memory on close if there is
    corked data, from John Fastabend.

 2) Tunnel names coming from userspace need to be length validated. From
    Eric Dumazet.

 3) arp_filter() has to take VRFs properly into account, from Miguel
    Fadon Perlines.

 4) Fix oops in error path of tcf_bpf_init(), from Davide Caratti.

 5) Missing idr_remove() in u32_delete_key(), from Cong Wang.

 6) More syzbot stuff. Several use of uninitialized value fixes all
    over, from Eric Dumazet.

 7) Do not leak kernel memory to userspace in sctp, also from Eric
    Dumazet.

 8) Discard frames from unused ports in DSA, from Andrew Lunn.

 9) Fix DMA mapping and reset/failover problems in ibmvnic, from Thomas
    Falcon.

10) Do not access dp83640 PHY registers prematurely after reset, from
    Esben Haabendal.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
  vhost-net: set packet weight of tx polling to 2 * vq size
  net: thunderx: rework mac addresses list to u64 array
  inetpeer: fix uninit-value in inet_getpeer
  dp83640: Ensure against premature access to PHY registers after reset
  devlink: convert occ_get op to separate registration
  ARM: dts: ls1021a: Specify TBIPA register address
  net/fsl_pq_mdio: Allow explicit speficition of TBIPA address
  ibmvnic: Do not reset CRQ for Mobility driver resets
  ibmvnic: Fix failover case for non-redundant configuration
  ibmvnic: Fix reset scheduler error handling
  ibmvnic: Zero used TX descriptor counter on reset
  ibmvnic: Fix DMA mapping mistakes
  tipc: use the right skb in tipc_sk_fill_sock_diag()
  sctp: sctp_sockaddr_af must check minimal addr length for AF_INET6
  net: dsa: Discard frames from unused ports
  sctp: do not leak kernel memory to user space
  soreuseport: initialise timewait reuseport field
  ipv4: fix uninit-value in ip_route_output_key_hash_rcu()
  dccp: initialize ireq->ir_mark
  net: fix uninit-value in __hw_addr_add_ex()
  ...
2018-04-09 17:04:10 -07:00
Linus Torvalds
3c0d551e02 pci-v4.17-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlrHeY8UHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxhLRAAndV/0NDyWZU0eZNM6twri2SEFnF7
 E4ar+YthxDxxJG4TLJbIA12jc5NgHZy4WuttDa6Jb99KreBXIHJFlNi/V/tme6zf
 +yXUuxWae7wJzBiaay57VqLGSc80gt/LTgjLa1siwQqjTbO3wSXR6JJXNaE9FtQ4
 /jL61t8bD1Peb5cWTpt9p0hrnKI0/pHwASdReyFS4F/HDKdvpof7BxE/OU3HSxxA
 XKC2v6RjY4S93vkzvApDXQ+vhKquVRK7/ojyTXQUO/GIzcARprO7H4k62N4ar0x/
 qbXLkR8IMkwA8ecsNmcL92ftb/cXoHfd+wdK8WpijqzF4kW4SdteVWbIhUzI0gbr
 0gjDYIzjplvH3pZGv/qvx+8sFtAP95OdPjuAAW2qJ9TCVfmiS8naNFCvcxg87RhD
 gjyQD3If1X7F8wy309lhq7VNyRexTHgIMgTXHyFvuZMzn/Qe1huL2XCwDcEAg/OX
 AvU2iuSE5tWAh7gIUMF/aWi3uoeJUyyoru5ZR//gqdFfx9YxpSimO1UDXnpPi8SR
 Iz/jzHJc0aWGYdQ9l6HiSbJF3P/QQcWYs9igt0A7BRGB05SPdWCh7sSO70FJa8ME
 f4WID5/qEiaH26kiSRX4cUqpc8Amk8bT0DXw2OT57qy3JM0ZdV5ENQX11pSpr9hv
 uLEf0DU7AEmdvzQ=
 =T++R
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

 - move pci_uevent_ers() out of pci.h (Michael Ellerman)

 - skip ASPM common clock warning if BIOS already configured it (Sinan
   Kaya)

 - fix ASPM Coverity warning about threshold_ns (Gustavo A. R. Silva)

 - remove last user of pci_get_bus_and_slot() and the function itself
   (Sinan Kaya)

 - add decoding for 16 GT/s link speed (Jay Fang)

 - add interfaces to get max link speed and width (Tal Gilboa)

 - add pcie_bandwidth_capable() to compute max supported link bandwidth
   (Tal Gilboa)

 - add pcie_bandwidth_available() to compute bandwidth available to
   device (Tal Gilboa)

 - add pcie_print_link_status() to log link speed and whether it's
   limited (Tal Gilboa)

 - use PCI core interfaces to report when device performance may be
   limited by its slot instead of doing it in each driver (Tal Gilboa)

 - fix possible cpqphp NULL pointer dereference (Shawn Lin)

 - rescan more of the hierarchy on ACPI hotplug to fix Thunderbolt/xHCI
   hotplug (Mika Westerberg)

 - add support for PCI I/O port space that's neither directly accessible
   via CPU in/out instructions nor directly mapped into CPU physical
   memory space. This is fairly intrusive and includes minor changes to
   interfaces used for I/O space on most platforms (Zhichang Yuan, John
   Garry)

 - add support for HiSilicon Hip06/Hip07 LPC I/O space (Zhichang Yuan,
   John Garry)

 - use PCI_EXP_DEVCTL2_COMP_TIMEOUT in rapidio/tsi721 (Bjorn Helgaas)

 - remove possible NULL pointer dereference in of_pci_bus_find_domain_nr()
   (Shawn Lin)

 - report quirk timings with dev_info (Bjorn Helgaas)

 - report quirks that take longer than 10ms (Bjorn Helgaas)

 - add and use Altera Vendor ID (Johannes Thumshirn)

 - tidy Makefiles and comments (Bjorn Helgaas)

 - don't set up INTx if MSI or MSI-X is enabled to align cris, frv,
   ia64, and mn10300 with x86 (Bjorn Helgaas)

 - move pcieport_if.h to drivers/pci/pcie/ to encapsulate it (Frederick
   Lawler)

 - merge pcieport_if.h into portdrv.h (Bjorn Helgaas)

 - move workaround for BIOS PME issue from portdrv to PCI core (Bjorn
   Helgaas)

 - completely disable portdrv with "pcie_ports=compat" (Bjorn Helgaas)

 - remove portdrv link order dependency (Bjorn Helgaas)

 - remove support for unused VC portdrv service (Bjorn Helgaas)

 - simplify portdrv feature permission checking (Bjorn Helgaas)

 - remove "pcie_hp=nomsi" parameter (use "pci=nomsi" instead) (Bjorn
   Helgaas)

 - remove unnecessary "pcie_ports=auto" parameter (Bjorn Helgaas)

 - use cached AER capability offset (Frederick Lawler)

 - don't enable DPC if BIOS hasn't granted AER control (Mika Westerberg)

 - rename pcie-dpc.c to dpc.c (Bjorn Helgaas)

 - use generic pci_mmap_resource_range() instead of powerpc and xtensa
   arch-specific versions (David Woodhouse)

 - support arbitrary PCI host bridge offsets on sparc (Yinghai Lu)

 - remove System and Video ROM reservations on sparc (Bjorn Helgaas)

 - probe for device reset support during enumeration instead of runtime
   (Bjorn Helgaas)

 - add ACS quirk for Ampere (née APM) root ports (Feng Kan)

 - add function 1 DMA alias quirk for Marvell 88SE9220 (Thomas
   Vincent-Cross)

 - protect device restore with device lock (Sinan Kaya)

 - handle failure of FLR gracefully (Sinan Kaya)

 - handle CRS (config retry status) after device resets (Sinan Kaya)

 - skip various config reads for SR-IOV VFs as an optimization
   (KarimAllah Ahmed)

 - consolidate VPD code in vpd.c (Bjorn Helgaas)

 - add Tegra dependency on PCI_MSI_IRQ_DOMAIN (Arnd Bergmann)

 - add DT support for R-Car r8a7743 (Biju Das)

 - fix a PCI_EJECT vs PCI_BUS_RELATIONS race condition in Hyper-V host
   bridge driver that causes a general protection fault (Dexuan Cui)

 - fix Hyper-V host bridge hang in MSI setup on 1-vCPU VMs with SR-IOV
   (Dexuan Cui)

 - fix Hyper-V host bridge hang when ejecting a VF before setting up MSI
   (Dexuan Cui)

 - make several structures static (Fengguang Wu)

 - increase number of MSI IRQs supported by Synopsys DesignWare bridges
   from 32 to 256 (Gustavo Pimentel)

 - implemented multiplexed IRQ domain API and remove obsolete MSI IRQ
   API from DesignWare drivers (Gustavo Pimentel)

 - add Tegra power management support (Manikanta Maddireddy)

 - add Tegra loadable module support (Manikanta Maddireddy)

 - handle 64-bit BARs correctly in endpoint support (Niklas Cassel)

 - support optional regulator for HiSilicon STB (Shawn Guo)

 - use regulator bulk API for Qualcomm apq8064 (Srinivas Kandagatla)

 - support power supplies for Qualcomm msm8996 (Srinivas Kandagatla)

* tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (123 commits)
  MAINTAINERS: Add John Garry as maintainer for HiSilicon LPC driver
  HISI LPC: Add ACPI support
  ACPI / scan: Do not enumerate Indirect IO host children
  ACPI / scan: Rename acpi_is_serial_bus_slave() for more general use
  HISI LPC: Support the LPC host on Hip06/Hip07 with DT bindings
  of: Add missing I/O range exception for indirect-IO devices
  PCI: Apply the new generic I/O management on PCI IO hosts
  PCI: Add fwnode handler as input param of pci_register_io_range()
  PCI: Remove __weak tag from pci_register_io_range()
  MAINTAINERS: Add missing /drivers/pci/cadence directory entry
  fm10k: Report PCIe link properties with pcie_print_link_status()
  net/mlx5e: Use pcie_bandwidth_available() to compute bandwidth
  net/mlx5: Report PCIe link properties with pcie_print_link_status()
  net/mlx4_core: Report PCIe link properties with pcie_print_link_status()
  PCI: Add pcie_print_link_status() to log link speed and whether it's limited
  PCI: Add pcie_bandwidth_available() to compute bandwidth available to device
  misc: pci_endpoint_test: Handle 64-bit BARs properly
  PCI: designware-ep: Make dw_pcie_ep_reset_bar() handle 64-bit BARs properly
  PCI: endpoint: Make sure that BAR_5 does not have 64-bit flag set when clearing
  PCI: endpoint: Make epc->ops->clear_bar()/pci_epc_clear_bar() take struct *epf_bar
  ...
2018-04-06 18:31:06 -07:00
Anirudh Venkataramanan
cba5957d7e ice: Bug fixes in ethtool code
1) Return correct size from ice_get_regs_len.
2) Fix incorrect use of ARRAY_SIZE in ice_get_regs.

Fixes: fcea6f3da5 (ice: Add stats and ethtool support)
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-06 07:00:09 -07:00
Wei Yongjun
63bb4e1ebd ice: Fix error return code in ice_init_hw()
Fix to return error code ICE_ERR_NO_MEMORY from the alloc error
handling case instead of 0, as done elsewhere in this function.

Fixes: dc49c77236 ("ice: Get MAC/PHY/link info and scheduler topology")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-06 07:00:09 -07:00
Bjorn Helgaas
170648fda9 fm10k: Report PCIe link properties with pcie_print_link_status()
Previously the driver used pcie_get_minimum_link() to warn when the NIC
is in a slot that can't supply as much bandwidth as the NIC could use.

pcie_get_minimum_link() can be misleading because it finds the slowest link
and the narrowest link (which may be different links) without considering
the total bandwidth of each link.  For a path with a 16 GT/s x1 link and a
2.5 GT/s x16 link, it returns 2.5 GT/s x1, which corresponds to 250 MB/s of
bandwidth, not the true available bandwidth of about 1969 MB/s for a
16 GT/s x1 link.

Use pcie_print_link_status() to report PCIe link speed and possible
limitations instead of implementing this in the driver itself.  This finds
the slowest link in the path to the device by computing the total bandwidth
of each link and compares that with the capabilities of the device.

Note that the driver previously used dev_warn() to suggest using a
different slot, but pcie_print_link_status() uses dev_info() because if the
platform has no faster slot available, the user can't do anything about the
warning and may not want to be bothered with it.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
2018-04-03 08:58:34 -05:00
David S. Miller
13d5a30a1e Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2018-03-26

This series contains updates to i40e only.

Jake provides several patches which remove the need for cmpxchg64(),
starting with moving I40E_FLAG_[UDP]_FILTER_SYNC from pf->flags to pf->state
since they are modified during run time possibly when the RTNL lock is not
held so they should be a state bits and not flags.  Moved additional
"flags" which should be state fields, into pf->state.  Ensure we hold
the RTNL lock for the entire sequence of preparing for reset and when
resuming, which will protect the flags related to interrupt scheme under
RTNL lock so that their modification is properly threaded.  Finally,
cleanup the use of cmpxchg64() since it is no longer needed.  Cleaned up
the holes in the feature flags created my moving some flags to the state
field.

Björn Töpel adds XDP_REDIRECT support as well as tweaking the page
counting for XDP_REDIRECT so that it will function properly.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27 09:56:13 -04:00
David S. Miller
34fd03b9e6 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2018-03-26

This patch series adds the ice driver, which will support the Intel(R)
E800 Series of network devices.

This is the first phase in the release of this driver where we implement
basic transmit and receive. The idea behind the multi-phase release is to
aid in code review as well as testing. Subsequent phases will implement
advanced features (like SR-IOV, tunnelling, flow director, QoS, etc.) that
build upon the previous phase(s). Each phase will be submitted as a patch
series.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26 18:20:02 -04:00
Björn Töpel
d9314c474d i40e: add support for XDP_REDIRECT
The driver now acts upon the XDP_REDIRECT return action. Two new ndos
are implemented, ndo_xdp_xmit and ndo_xdp_flush.

XDP_REDIRECT action enables XDP program to redirect frames to other
netdevs.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 14:17:10 -07:00
Björn Töpel
8ce29c679a i40e: tweak page counting for XDP_REDIRECT
This commit tweaks the page counting for XDP_REDIRECT to function
properly. XDP_REDIRECT support will be added in a future commit.

The current page counting scheme assumes that the reference count
cannot decrease until the received frame is sent to the upper layers
of the networking stack. This assumption does not hold for the
XDP_REDIRECT action, since a page (pointed out by xdp_buff) can have
its reference count decreased via the xdp_do_redirect call.

To work around that, we now start off by a large page count and then
don't allow a refcount less than two.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 14:10:04 -07:00
Jacob Keller
8f769dd14a i40e: re-number feature flags to remove gaps
Remove the gaps created by the recent refactor of various feature flags
that have moved to the state field. Use only a u32 now that we have
fewer than 32 flags in the field.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 14:00:12 -07:00
Jacob Keller
886ff146a7 i40e: stop using cmpxchg flow in i40e_set_priv_flags()
Now that the only places which modify flags are either (a) during
initialization prior to creating a netdevice, or (b) while holding the
rtnl lock, we no longer need the cmpxchg64 call in i40e_set_priv_flags.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 13:56:45 -07:00
Jacob Keller
f0ee70a042 i40e: hold the RTNL lock while changing interrupt schemes
When we suspend and resume, we need to clear and re-enable the interrupt
scheme. This was previously not done while holding the RTNL lock, which
could be problematic, because we are actually destroying and re-creating
queues.

Hold the RTNL lock for the entire sequence of preparing for reset, and
when resuming. This additionally protects the flags related to interrupt
scheme under RTNL lock so that their modification is properly threaded.

This is part of a larger effort to remove the need for cmpxchg64 in
i40e_set_priv_flags().

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 13:52:53 -07:00
Jacob Keller
5f76a704b8 i40e: move client flags into state bits
The iWarp client flags are all potentially changed when the RTNL lock is
not held, so they should not be part of the pf->flags variable. Instead,
move them into the state field so that we can use atomic bit operations.

This is part of a larger effort to remove cmpxchg64 in
i40e_set_priv_flags()

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 13:50:56 -07:00
Jacob Keller
0605c45ce5 i40e: move I40E_FLAG_TEMP_LINK_POLLING to state field
This flag is modified outside of the RTNL lock and thus should not be
part of the pf->flags variable.

Use a state bit instead, so that we can use atomic bit operations.

This is part of a larger effort to remove cmpxchg64 in
i40e_set_priv_flags()

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 13:48:10 -07:00
Jacob Keller
134201aead i40e: move AUTO_DISABLED flags into the state field
The two Flow Directory auto disable flags are used at run time to mark
when the flow director features needed to be disabled. Thus the flags
could change even when the RTNL lock is not held.

They also have some code constructions which really should be
test_and_set or test_and_clear using atomic bit operations.

Create new state fields to mark this, and stop including them in
pf->flags.

This is part of a larger effort to remove the need for cmpxchg64 in
i40e_set_priv_flags().

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 13:46:30 -07:00
Jacob Keller
41898c66ef i40e: move I40E_FLAG_UDP_FILTER_SYNC to the state field
This flag is modified during run time, possibly even when the RTNL lock
is not held. Additionally it has a few places which should be using
test_and_set or test_and_clear atomic bit operations.

Create a new state bit __I40E_UDP_SYNC_PENDING and use it instead of the
ole I40E_FLAG_UDP_FILTER_SYNC flag.

This is part of a larger effort to remove the need for using cmpxchg64
in i40e_set_priv_flags.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 13:44:02 -07:00
Jacob Keller
bfe040c385 i40e: move I40E_FLAG_FILTER_SYNC to a state bit
The I40E_FLAG_FILTER_SYNC flag is modified during run time possibly when
the RTNL lock is not held. Thus, it should not be part of pf->flags, and
instead should be using atomic bit operations in the pf->state field.

Create a __I40E_MACVLAN_SYNC_PENDING state bit, and use it instead of
the old I40E_FLAG_FILTER_SYNC flag.

This is part of a larger effort to remove the need for cmpxchg64 in
i40e_set_priv_flags().

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 13:09:44 -07:00
Anirudh Venkataramanan
e94d447866 ice: Implement filter sync, NDO operations and bump version
This patch implements multiple pieces of functionality:

1. Added ice_vsi_sync_filters, which is called through the service task
   to push filter updates to the hardware.

2. Add support to enable/disable promiscuous mode on an interface.
   Enabling/disabling promiscuous mode on an interface results in
   addition/removal of a promisc filter rule through ice_vsi_sync_filters.

3. Implement handlers for ndo_set_mac_address, ndo_change_mtu,
   ndo_poll_controller and ndo_set_rx_mode.

This patch also marks the end of the driver addition by bumping up the
driver version.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 12:41:38 -07:00
Anirudh Venkataramanan
0b28b702e7 ice: Support link events, reset and rebuild
Link events are posted to a PF's admin receive queue (ARQ). This patch
adds the ability to detect and process link events.

This patch also adds the ability to process resets.

The driver can process the following resets:
    1) EMP Reset (EMPR)
    2) Global Reset (GLOBR)
    3) Core Reset (CORER)
    4) Physical Function Reset (PFR)

EMPR is the largest level of reset that the driver can handle. An EMPR
resets the manageability block and also the data path, including PHY and
link for all the PFs. The affected PFs are notified of this event through
a miscellaneous interrupt.

GLOBR is a subset of EMPR. It does everything EMPR does except that it
doesn't reset the manageability block.

CORER is a subset of GLOBR. It does everything GLOBR does but doesn't
reset PHY and link.

PFR is a subset of CORER and affects only the given physical function.
In other words, PFR can be thought of as a CORER for a single PF. Since
only the issuing PF is affected, a PFR doesn't result in the miscellaneous
interrupt being triggered.

All the resets have the following in common:
1) Tx/Rx is halted and all queues are stopped.
2) All the VSIs and filters programmed for the PF are lost and have to be
   reprogrammed.
3) Control queue interfaces are reset and have to be reprogrammed.

In the rebuild flow, control queues are reinitialized, VSIs are reallocated
and filters are restored.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 12:32:17 -07:00
Anirudh Venkataramanan
5513b920a4 ice: Update Tx scheduler tree for VSI multi-Tx queue support
This patch adds the ability for a VSI to use multiple Tx queues. More
specifically, the patch
    1) Provides the ability to update the Tx scheduler tree in the
       firmware. The driver can configure the Tx scheduler tree by
       adding/removing multiple Tx queues per TC per VSI.

    2) Allows a VSI to reconfigure its Tx queues during runtime.

    3) Synchronizes the Tx scheduler update operations using locks.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 12:21:42 -07:00
Anirudh Venkataramanan
fcea6f3da5 ice: Add stats and ethtool support
This patch implements a watchdog task to get packet statistics from
the device.

This patch also adds support for the following ethtool operations:

ethtool devname
ethtool -s devname [msglvl N] [msglevel type on|off]
ethtool -g|--show-ring devname
ethtool -G|--set-ring devname [rx N] [tx N]
ethtool -i|--driver devname
ethtool -d|--register-dump devname [raw on|off] [hex on|off] [file name]
ethtool -k|--show-features|--show-offload devname
ethtool -K|--features|--offload devname feature on|off
ethtool -P|--show-permaddr devname
ethtool -S|--statistics devname
ethtool -a|--show-pause devname
ethtool -A|--pause devname [autoneg on|off] [rx on|off] [tx on|off]
ethtool -r|--negotiate devname

CC: Andrew Lunn <andrew@lunn.ch>
CC: Jakub Kicinski <kubakici@wp.pl>
CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 12:11:19 -07:00
Anirudh Venkataramanan
d76a60ba7a ice: Add support for VLANs and offloads
This patch adds support for VLANs. When a VLAN is created a switch filter
is added to direct the VLAN traffic to the corresponding VSI. When a VLAN
is deleted, the filter is deleted as well.

This patch also adds support for the following hardware offloads.
    1) VLAN tag insertion/stripping
    2) Receive Side Scaling (RSS)
    3) Tx checksum and TCP segmentation
    4) Rx checksum

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 11:54:49 -07:00
Anirudh Venkataramanan
2b245cb294 ice: Implement transmit and NAPI support
This patch implements ice_start_xmit (the handler for ndo_start_xmit) and
related functions. ice_start_xmit ultimately calls ice_tx_map, where the
Tx descriptor is built and posted to the hardware by bumping the ring tail.

This patch also implements ice_napi_poll, which is invoked when there's an
interrupt on the VSI's queues. The interrupt can be due to either a
completed Tx or an Rx event. In case of a completed Tx/Rx event, resources
are reclaimed. Additionally, in case of an Rx event, the skb is fetched
and passed up to the network stack.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 11:27:05 -07:00
Anirudh Venkataramanan
cdedef59de ice: Configure VSIs for Tx/Rx
This patch configures the VSIs to be able to send and receive
packets by doing the following:

1) Initialize flexible parser to extract and include certain
   fields in the Rx descriptor.

2) Add Tx queues by programming the Tx queue context (implemented in
   ice_vsi_cfg_txqs). Note that adding the queues also enables (starts)
   the queues.

3) Add Rx queues by programming Rx queue context (implemented in
   ice_vsi_cfg_rxqs). Note that this only adds queues but doesn't start
   them. The rings will be started by calling ice_vsi_start_rx_rings on
   interface up.

4) Configure interrupts for VSI queues.

5) Implement ice_open and ice_stop.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 11:18:36 -07:00
Anirudh Venkataramanan
9daf8208dd ice: Add support for switch filter programming
A VSI needs traffic directed towards it. This is done by programming
filter rules on the switch (embedded vSwitch) element in the hardware,
which connects the VSI to the ingress/egress port.

This patch introduces data structures and functions necessary to add
remove or update switch rules on the switch element. This is a pretty low
level function that is generic enough to add a whole range of filters.

This patch also introduces two top level functions ice_add_mac and
ice_remove mac which through a series of intermediate helper functions
eventually call ice_aq_sw_rules to add/delete simple MAC based filters.
It's worth noting that one invocation of ice_add_mac/ice_remove_mac
is capable of adding/deleting multiple MAC filters.

Also worth noting is the fact that the driver maintains a list of currently
active filters, so every filter addition/removal causes an update to this
list. This is done for a couple of reasons:

1) If two VSIs try to add the same filters, we need to detect it and do
   things a little differently (i.e. use VSI lists, described below) as
   the same filter can't be added more than once.

2) In the event of a hardware reset we can simply walk through this list
   and restore the filters.

VSI Lists:
In a multi-VSI situation, it's possible that multiple VSIs want to add the
same filter rule. For example, two VSIs that want to receive broadcast
traffic would both add a filter for destination MAC ff:ff:ff:ff:ff:ff.
This can become cumbersome to maintain and so this is handled using a
VSI list.

A VSI list is resource that can be allocated in the hardware using the
ice_aq_alloc_free_res admin queue command. Simply put, a VSI list can
be thought of as a subscription list containing a set of VSIs to which
the packet should be forwarded, should the filter match.

For example, if VSI-0 has already added a broadcast filter, and VSI-1
wants to do the same thing, the filter creation flow will detect this,
allocate a VSI list and update the switch rule so that broadcast traffic
will now be forwarded to the VSI list which contains VSI-0 and VSI-1.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 11:00:08 -07:00
Anirudh Venkataramanan
3a858ba392 ice: Add support for VSI allocation and deallocation
This patch introduces data structures and functions to alloc/free
VSIs. The driver represents a VSI using the ice_vsi structure.

Some noteworthy points about VSI allocation:

1) A VSI is allocated in the firmware using the "add VSI" admin queue
   command (implemented as ice_aq_add_vsi). The firmware returns an
   identifier for the allocated VSI. The VSI context is used to program
   certain aspects (loopback, queue map, etc.) of the VSI's configuration.

2) A VSI is deleted using the "free VSI" admin queue command (implemented
   as ice_aq_free_vsi).

3) The driver represents a VSI using struct ice_vsi. This is allocated
   and initialized as part of the ice_vsi_alloc flow, and deallocated
   as part of the ice_vsi_delete flow.

4) Once the VSI is created, a netdev is allocated and associated with it.
   The VSI's ring and vector related data structures are also allocated
   and initialized.

5) A VSI's queues can either be contiguous or scattered. To do this, the
   driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with
   the firmware's VSI queue allocation imap. If the VSI can't get a
   contiguous queue allocation, it will fallback to scatter. This is
   implemented in ice_vsi_get_qs which is called as part of the VSI setup
   flow. In the release flow, the VSI's queues are released and the bitmap
   is updated to reflect this by ice_vsi_put_qs.

CC: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 10:44:27 -07:00
Anirudh Venkataramanan
940b61af02 ice: Initialize PF and setup miscellaneous interrupt
This patch continues the initialization flow as follows:

1) Allocate and initialize necessary fields (like vsi, num_alloc_vsi,
   irq_tracker, etc) in the ice_pf instance.

2) Setup the miscellaneous interrupt handler. This also known as the
   "other interrupt causes" (OIC) handler and is used to handle non
   hotpath interrupts (like control queue events, link events,
   exceptions, etc.

3) Implement a background task to process admin queue receive (ARQ)
   events received by the driver.

CC: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 10:34:49 -07:00
Anirudh Venkataramanan
dc49c77236 ice: Get MAC/PHY/link info and scheduler topology
This patch adds code to continue the initialization flow as follows:

1) Get PHY/link information and store it
2) Get default scheduler tree topology and store it
3) Get the MAC address associated with the port and store it

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 10:24:54 -07:00
Anirudh Venkataramanan
9c20346b63 ice: Get switch config, scheduler config and device capabilities
This patch adds to the initialization flow by getting switch
configuration, scheduler configuration and device capabilities.

Switch configuration:
On boot, an L2 switch element is created in the firmware per physical
function. Each physical function is also mapped to a port, to which its
switch element is connected. In other words, this switch can be visualized
as an embedded vSwitch that can connect a physical function's virtual
station interfaces (VSIs) to the egress/ingress port. Egress/ingress
filters will be eventually created and applied on this switch element.
As part of the initialization flow, the driver gets configuration data
from this switch element and stores it.

Scheduler configuration:
The Tx scheduler is a subsystem responsible for setting and enforcing QoS.
As part of the initialization flow, the driver queries and stores the
default scheduler configuration for the given physical function.

Device capabilities:
As part of initialization, the driver has to determine what the device is
capable of (ex. max queues, VSIs, etc). This information is obtained from
the firmware and stored by the driver.

CC: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 10:14:57 -07:00
Anirudh Venkataramanan
f31e4b6fe2 ice: Start hardware initialization
This patch implements multiple pieces of the initialization flow
as follows:

1) A reset is issued to ensure a clean device state, followed
   by initialization of admin queue interface.

2) Once the admin queue interface is up, clear the PF config
   and transition the device to non-PXE mode.

3) Get the NVM configuration stored in the device's non-volatile
   memory (NVM) using ice_init_nvm.

CC: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 09:59:08 -07:00
Anirudh Venkataramanan
7ec59eeac8 ice: Add support for control queues
A control queue is a hardware interface which is used by the driver
to interact with other subsystems (like firmware, PHY, etc.). It is
implemented as a producer-consumer ring. More specifically, an
"admin queue" is a type of control queue used to interact with the
firmware.

This patch introduces data structures and functions to initialize
and teardown control/admin queues. Once the admin queue is initialized,
the driver uses it to get the firmware version.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 09:44:56 -07:00
Joe Perches
d3757ba4c1 ethernet: Use octal not symbolic permissions
Prefer the direct use of octal for permissions.

Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace
and some typing.

Miscellanea:

o Whitespace neatening around these conversions.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26 12:07:49 -04:00
Anirudh Venkataramanan
837f08fdec ice: Add basic driver framework for Intel(R) E800 Series
This patch adds a basic driver framework for the Intel(R) E800 Ethernet
Series of network devices. There is no functionality right now other than
the ability to load.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 08:28:02 -07:00
Björn Töpel
ed93a39871 ixgbe: tweak page counting for XDP_REDIRECT
The current page counting scheme assumes that the reference count
cannot decrease until the received frame is sent to the upper layers
of the networking stack. This assumption does not hold for the
XDP_REDIRECT action, since a page (pointed out by xdp_buff) can have
its reference count decreased via the xdp_do_redirect call.

To work around that, we now start off by a large page count and then
don't allow a refcount less than two.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-23 15:23:51 -07:00
Tony Nguyen
2eb34bafb3 ixgbevf: Add XDP queue stats reporting
XDP stats are included in TX stats, however, they are not
reported in TX queue stats since they are setup on different
queues.  Add reporting for XDP queue stats to provide
consistency between the total stats and per queue stats.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-23 15:22:11 -07:00
Tony Nguyen
be8333322e ixgbevf: Add support for meta data
Add support for XDP meta data when using build skb.

Based on commit 366a88fe2f ("bpf, ixgbe: add meta data support")

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-23 15:20:57 -07:00
Tony Nguyen
efecfd5f80 ixgbevf: Delay tail write for XDP packets
Current XDP implementation hits the tail on every XDP_TX; change the
driver to only hit the tail after packet processing is complete.

Based on
commit 7379f97a4f ("ixgbe: delay tail write to every 'n' packets")

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-23 15:13:45 -07:00
Tony Nguyen
21092e9ce8 ixgbevf: Add support for XDP_TX action
This implements the XDP_TX action which is modeled on the ixgbe
implementation. However instead of using CPU id to determine which XDP
queue to use, this uses the received RX queue index, which is similar
to i40e. Doing this eliminates the restriction that number of CPUs not
exceed number of XDP queues that ixgbe has.

Also, based on the number of queues available, the number of TX queues
may be reduced when an XDP program is loaded in order to accommodate the
XDP queues.

Based largely on
commit 33fdc82f08 ("ixgbe: add support for XDP_TX action")

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-23 15:12:15 -07:00
Tony Nguyen
c7aec59657 ixgbevf: Add XDP support for pass and drop actions
Implement XDP_PASS and XDP_DROP based on the ixgbe implementation.

Based largely on commit 9247080816 ("ixgbe: add XDP support for pass and
drop actions").

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-23 15:08:06 -07:00
Shannon Nelson
70da6824c3 ixgbe: enable TSO with IPsec offload
Fix things up to support TSO offload in conjunction
with IPsec hw offload.  This raises throughput with
IPsec offload on to nearly line rate.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-23 15:04:24 -07:00
Shannon Nelson
1db685e676 ixgbe: no need for esp trailer if GSO
There is no need to calculate the trailer length if we're doing
a GSO/TSO, as there is no trailer added to the packet data.
Also, don't bother clearing the flags field as it was already
cleared earlier.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-23 14:55:10 -07:00
Shannon Nelson
871dd09bdb ixgbe: remove unneeded ipsec test in TX path
Since the ipsec data fields will be zero anyway in the non-ipsec
case, we can remove the conditional jump.

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-23 14:51:57 -07:00
Shannon Nelson
2137aec017 ixgbe: no need for ipsec csum feature check
With the patch
commit f8aa2696b4af ("esp: check the NETIF_F_HW_ESP_TX_CSUM bit before segmenting")
we no longer need to protect ourself from checksum
offload requests on IPsec packets, so we can remove
the check in our .ndo_features_check callback.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-23 14:50:41 -07:00
Paul Greenwalt
9bf1e20f65 ixgbe: fix read-modify-write in x550 phy setup
Replaced an assignment operation with an OR operation.

The variable assignment was overwriting the value read from the PHY
register. The OR operation sets only the intended register bits.

The bits that were being overwritten are reserved, so the assignment had no
functional impact.

Reported by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-23 14:49:07 -07:00
Paul Greenwalt
1aa37845f7 ixgbe: add status reg reads to ixgbe_check_remove
Add status register reads and delay between reads to ixgbe_check_remove.
Registers can read 0xFFFFFFFF during PCI reset, which causes the driver
to remove the adapter. The additional status register reads can reduce the
chance of this race condition.

If the status register is not 0xFFFFFFFF, then ixgbe_check_remove returns
the value of the register being read.

Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-23 14:18:26 -07:00
Jeff Kirsher
ae06c70b13 intel: add SPDX identifiers to all the Intel drivers
Add the SPDX identifiers to all the Intel wired LAN driver files, as
outlined in Documentation/process/license-rules.rst.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 12:18:21 -04:00
David S. Miller
03fe2debbb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Fun set of conflict resolutions here...

For the mac80211 stuff, these were fortunately just parallel
adds.  Trivially resolved.

In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the
function phy_disable_interrupts() earlier in the file, whilst in
'net-next' the phy_error() call from this function was removed.

In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the
'rt_table_id' member of rtable collided with a bug fix in 'net' that
added a new struct member "rt_mtu_locked" which needs to be copied
over here.

The mlxsw driver conflict consisted of net-next separating
the span code and definitions into separate files, whilst
a 'net' bug fix made some changes to that moved code.

The mlx5 infiniband conflict resolution was quite non-trivial,
the RDMA tree's merge commit was used as a guide here, and
here are their notes:

====================

    Due to bug fixes found by the syzkaller bot and taken into the for-rc
    branch after development for the 4.17 merge window had already started
    being taken into the for-next branch, there were fairly non-trivial
    merge issues that would need to be resolved between the for-rc branch
    and the for-next branch.  This merge resolves those conflicts and
    provides a unified base upon which ongoing development for 4.17 can
    be based.

    Conflicts:
            drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f95
            (IB/mlx5: Fix cleanup order on unload) added to for-rc and
            commit b5ca15ad7e (IB/mlx5: Add proper representors support)
            add as part of the devel cycle both needed to modify the
            init/de-init functions used by mlx5.  To support the new
            representors, the new functions added by the cleanup patch
            needed to be made non-static, and the init/de-init list
            added by the representors patch needed to be modified to
            match the init/de-init list changes made by the cleanup
            patch.
    Updates:
            drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
            prototypes added by representors patch to reflect new function
            names as changed by cleanup patch
            drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
            stage list to match new order from cleanup patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 11:31:58 -04:00
Paweł Jabłoński
12d80bca0b i40e: Fix the polling mechanism of GLGEN_RSTAT.DEVSTATE
This fixes the polling mechanism of GLGEN_RSTAT.DEVSTATE in the
PF Reset path when Global Reset is in progress. While the driver
is polling for the end of the PF Reset and the Global Reset is
triggered, abandon the PF Reset path and prepare for the
upcoming Global Reset.

Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-19 09:57:52 -07:00
Jacob Keller
6b9a9c26ef i40evf: remove flags that are never used
These flags were defined, but there is no use within the driver code, so
we don't need to keep them.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-19 09:56:27 -07:00
Patryk Małek
3f8c843727 i40e: Prevent setting link speed on I40E_DEV_ID_25G_B
Setting link settings on backplane devices shouldn't be allowed.
This patch adds one more device id to the list which we check
that against.

Signed-off-by: Patryk Małek <patryk.malek@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-19 09:55:00 -07:00
Doug Dziggel
85925cd0b8 i40e: Fix incorrect return types
Fix return types from i40e_status to enum i40e_status_code.

Signed-off-by: Doug Dziggel <douglas.a.dziggel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-19 09:52:32 -07:00
Jacob Keller
35bea90436 i40e: add doxygen comment for new mode parameter
A recent patch updated the signature for i40e_aq_set_switch_config() to
add a new parameter 'mode'. It forgot to document the parameter in the
doxygen function header comment. Add the parameter to the function
description now.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-19 09:50:51 -07:00
Shiraz Saleem
ddbb8d5dd9 i40e: Close client on suspend and restore client MSIx on resume
During suspend client MSIx vectors are freed while they are still
in use causing a crash on entering S3.

Fix this calling client close before freeing up its MSIx vectors.
Also update the client MSIx vectors on resume before client
open is called.

Fixes commit b980c0634f ("i40e: shutdown all IRQs and disable MSI-X
when suspended")

Reported-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-19 09:46:09 -07:00
Patryk Małek
88244a48d2 i40e: Prevent setting link speed on KX_X722
Setting link settings on backplane devices shouldn't be allowed.
This patch adds one more device id to the list which we check
that against.

Signed-off-by: Patryk Małek <patryk.malek@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-19 09:44:23 -07:00
Jan Sokolowski
32c23b47db i40e: Properly check allowed advertisement capabilities
The i40e_set_link_ksettings and i40e_get_link_ksettings use different
codepaths to check available and supported advertisement modes. This
creates scenarios where it's possible to set a mode that's not allowed,
resulting in a link down.

Fix setting advertisement in i40e_set_link_ksettings by calling
i40e_get_link_ksettings to check what modes are allowed.

Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-19 09:42:20 -07:00
Alexander Duyck
640a8af584 i40evf: Reorder configure_clsflower to avoid deadlock on error
While doing some code review I noticed that we can get into a state where
we exit with the "IN_CRITICAL_TASK" bit set while notifying the PF of
flower filters. This patch is meant to address that plus tweak the ordering
of the while loop waiting on it slightly so that we don't wait an extra
period after we have failed for the last time.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-19 09:28:03 -07:00
Jacob Keller
089915f0f2 i40e: restore TCPv4 input set when re-enabling ATR
When we re-enable ATR we need to restore the input set for TCPv4
filters, in order for ATR to function correctly. We already do this for
the normal case of re-enabling ATR when disabling ntuple support.
However, when re-enabling ATR after the last TCPv4 filter is removed (but
when ntuple support is still active), we did not restore the TCPv4
filter input set.

This can cause problems if the TCPv4 filters from FDir had changed the
input set, as ATR will no longer behave as expected.

When clearing the ATR auto-disable flag, make sure we restore the TCPv4
input set to avoid this.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-14 12:34:35 -07:00
Mariusz Stachura
bc2a3a6cdb i40e: fix for wrong partition id calculation on OCP mezz cards
This patch overwrites number of ports for X722 devices with support
for OCP PHY mezzanine.
The old method with checking if port is disabled in the PRTGEN_CNF
register cannot be used in this case. When the OCP is removed, ports
were seen as disabled, which resulted in wrong calculation of partition
id, that caused WoL to be disabled on certain ports.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-14 12:32:53 -07:00
Jacob Keller
01c9695284 i40e: factor out re-enable functions for ATR and SB
A future patch needs to expand on the logic for re-enabling ATR. Doing
so would cause some code to break the 80-character line limit.

To reduce the level of indentation, factor out helper functions for
re-enabling ATR and SB rules.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-14 12:30:47 -07:00
Jacob Keller
6ac6d5a7ff i40e: track filter type statistics when deleting invalid filters
When hardware has trouble with a particular filter, we delete it from
the list. Unfortunately, we did not properly update the per-filter
statistic when doing so.

Create a helper function to handle this, and properly reduce the
necessary counter so that it tracks the number of active filters
properly.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-14 12:29:08 -07:00
Filip Sadowski
03ce7b1d23 i40e: Fix permission check for VF MAC filters
When VF requests adding of MAC filters the checking is done against number
of already present MAC filters not adding them at the same time. It makes
it possible to add a bunch of filters at once possibly exceeding
acceptable limit of I40E_VC_MAX_MAC_ADDR_PER_VF filters.

This happens because when checking vf->num_mac, we do not check how many
filters are being requested at once. Modify the check function to ensure
that it knows how many filters are being requested. This allows the
check to ensure that the total number of filters in a single request
does not cause us to go over the limit.

Additionally, move the check to within the lock to ensure that the
vf->num_mac is checked while holding the lock to maintain consistency.
We could have simply moved the call to i40e_vf_check_permission to
within the loop, but this could cause a request to be non-atomic, and
add some but not all the addresses, while reporting an error code. We
want to avoid this behavior so that users are not confused about which
filters have or have not been added.

Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-14 12:23:11 -07:00
Jacob Keller
2972b00797 i40e: Cleanup i40e_vlan_rx_register
We used to use the function i40e_vlan_rx_register as a way to hook
into the now defunct .ndo_vlan_rx_register netdev hook. This was
removed but we kept the function around because we still used it
internally to control enabling or disabling of VLAN stripping.

As pointed out in upstream review, VLAN stripping is only used in a
single location and the previous function is quite small, just inline
it into i40e_restore_vlan() rather than carrying the function
separately.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-14 12:22:21 -07:00
Paweł Jabłoński
028daf8011 i40e: Fix attach VF to VM issue
Fix for "Resource temporarily unavailable" problem when virsh is
trying to attach a device to VM. When the VF driver is loaded on
host and virsh is trying to attach it to the VM and set a MAC
address, it ends with a race condition between i40e_reset_vf and
i40e_ndo_set_vf_mac functions. The bug is fixed by adding polling
in i40e_ndo_set_vf_mac function For when the VF is in Reset mode.

Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-14 12:22:17 -07:00
Gustavo A R Silva
20dd9147c9 i40evf/i40evf_main: Fix variable assignment in i40evf_parse_cls_flower
It seems this is a copy-paste error and that the proper variable to use
in this particular case is _src_ instead of _dst_.

Addresses-Coverity-ID: 1465282 ("Copy-paste error")
Fixes: 0075fa0fad ("i40evf: Add support to apply cloud filters")
Signed-off-by: Gustavo A R Silva <garsilva@embeddedor.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-14 12:22:13 -07:00
Corentin Labbe
1f71e2b11c i40e: remove i40e_fcoe files
i40e_fcoe support was removed via commit 9eed69a914 ("i40e: Drop FCoE code from core driver files")
But this left files in place but un-compilable.
Let's finish the cleaning.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-14 12:21:38 -07:00
Paul Greenwalt
b03254d797 ixgbe: fix disabling hide VLAN on VF reset
If port VLAN is enabled, set PFQDE.HIDE_VLAN during VF reset.

Setting only PFQDE.PFQDE during VF reset was clearing PFQDE.HIDE_VLAN.

Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-12 12:54:59 -07:00
Benjamin Poirier
e2710dbf0d e1000e: Fix link check race condition
Alex reported the following race condition:

/* link goes up... interrupt... schedule watchdog */
\ e1000_watchdog_task
	\ e1000e_has_link
		\ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link
			\ e1000e_phy_has_link_generic(..., &link)
				link = true

					 /* link goes down... interrupt */
					 \ e1000_msix_other
						 hw->mac.get_link_status = true

			/* link is up */
			mac->get_link_status = false

		link_active = true
		/* link_active is true, wrongly, and stays so because
		 * get_link_status is false */

Avoid this problem by making sure that we don't set get_link_status = false
after having checked the link.

It seems this problem has been present since the introduction of e1000e.

Link: https://lkml.org/lkml/2018/1/29/338
Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-12 12:05:39 -07:00
Benjamin Poirier
3016e0a0c9 Revert "e1000e: Separate signaling for link check/link up"
This reverts commit 19110cfbb3.
This reverts commit 4110e02eb4.
This reverts commit d3604515c9eda464a92e8e67aae82dfe07fe3c98.

Commit 19110cfbb3 ("e1000e: Separate signaling for link check/link up")
changed what happens to the link status when there is an error which
happens after "get_link_status = false" in the copper check_for_link
callbacks. Previously, such an error would be ignored and the link
considered up. After that commit, any error implies that the link is down.

Revert commit 19110cfbb3 ("e1000e: Separate signaling for link check/link
up") and its followups. After reverting, the race condition described in
the log of commit 19110cfbb3 is reintroduced. It may still be triggered
by LSC events but this should keep the link down in case the link is
electrically unstable, as discussed. The race may no longer be
triggered by RXO events because commit 4aea7a5c5e ("e1000e: Avoid
receiver overrun interrupt bursts") restored reading icr in the Other
handler.

Link: https://lkml.org/lkml/2018/3/1/789
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-12 11:34:05 -07:00
Arnd Bergmann
954b54dea0 ixgbevf: fix unused variable warning
The new ixgbevf_set_rx_buffer_len() function causes a harmless warnings
in configurations with large page size:

drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c: In function 'ixgbevf_set_rx_buffer_len':
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:1758:15: error: unused variable 'max_frame' [-Werror=unused-variable]

This rephrases the code so that the compiler can see the use of that
variable, making it slightly easier to read in the process.

Fixes: f15c5ba5b6 ("ixgbevf: add support for using order 1 pages to receive large frames")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-12 11:05:11 -07:00
Tonghao Zhang
e656805c1b ixgbe: Add receive length error counter
ixgbe enabled rlec counter and the rx_error used it.
We can export the counter directly via ethtool -S ethX.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-12 11:03:32 -07:00
Shannon Nelson
5c4aa45863 ixgbe: remove unneeded ipsec state free callback
With commit 7f05b467a7 ("xfrm: check for xdo_dev_state_free")
we no longer need to add an empty callback function
to the driver, so now let's remove the useless code.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-12 10:59:42 -07:00
Shannon Nelson
0ae418e6e2 ixgbe: fix ipsec trailer length
Fix up the Tx trailer length calculation.  We can't believe the
trailer len from the xstate information because it was calculated
before the packet was put together and padding added.  This bit
of code finds the padding value in the trailer, adds it to the
authentication length, and saves it so later we can put it into
the Tx descriptor to tell the device where to stop the checksum
calculation.

Fixes: 5925947047 ("ixgbe: process the Tx ipsec offload")
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-12 10:44:59 -07:00
Shannon Nelson
68c1fb2d30 ixgbe: check for 128-bit authentication
Make sure the Security Association is using
a 128-bit authentication, since that's the only
size that the hardware offload supports.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-12 10:36:04 -07:00
David S. Miller
3eac60d173 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2018-03-05

This series contains updates to igb only.

Corinna Vinschen adds the support for trusted VFs into the igb driver.

Mika fixes an issue where PCIe device is physically unplugged can cause
a kernel crash.  This issue is that netif_device_detach() is called in
these cases, which prevents netif_unregister() from bringing the device
down properly.

Christophe JAILLET fixes an issue with igb where HWTSTAMP_TX_ON was
being handled like a bit mask and not a value.

v2: dropped the e1000e fix from the series since I will be pushing it
    through David Miller's net tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-07 12:05:56 -05:00
David S. Miller
0f3e9c97eb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
All of the conflicts were cases of overlapping changes.

In net/core/devlink.c, we have to make care that the
resouce size_params have become a struct member rather
than a pointer to such an object.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-06 01:20:46 -05:00
Pierre-Yves Kerbrat
aea3fca005 e1000e: allocate ring descriptors with dma_zalloc_coherent
Descriptor rings were not initialized at zero when allocated
When area contained garbage data, it caused skb_over_panic in
e1000_clean_rx_irq (if data had E1000_RXD_STAT_DD bit set)

This patch makes use of dma_zalloc_coherent to make sure the
ring is memset at 0 to prevent the area from containing garbage.

Following is the signature of the panic:
IODDR0@0.0: skbuff: skb_over_panic: text:80407b20 len:64010 put:64010 head:ab46d800 data:ab46d842 tail:0xab47d24c end:0xab46df40 dev:eth0
IODDR0@0.0: BUG: failure at net/core/skbuff.c:105/skb_panic()!
IODDR0@0.0: Kernel panic - not syncing: BUG!
IODDR0@0.0:
IODDR0@0.0: Process swapper/0 (pid: 0, threadinfo=81728000, task=8173cc00 ,cpu: 0)
IODDR0@0.0: SP = <815a1c0c>
IODDR0@0.0: Stack:      00000001
IODDR0@0.0: b2d89800 815e33ac
IODDR0@0.0: ea73c040 00000001
IODDR0@0.0: 60040003 0000fa0a
IODDR0@0.0: 00000002
IODDR0@0.0:
IODDR0@0.0: 804540c0 815a1c70
IODDR0@0.0: b2744000 602ac070
IODDR0@0.0: 815a1c44 b2d89800
IODDR0@0.0: 8173cc00 815a1c08
IODDR0@0.0:
IODDR0@0.0:     00000006
IODDR0@0.0: 815a1b50 00000000
IODDR0@0.0: 80079434 00000001
IODDR0@0.0: ab46df40 b2744000
IODDR0@0.0: b2d89800
IODDR0@0.0:
IODDR0@0.0: 0000fa0a 8045745c
IODDR0@0.0: 815a1c88 0000fa0a
IODDR0@0.0: 80407b20 b2789f80
IODDR0@0.0: 00000005 80407b20
IODDR0@0.0:
IODDR0@0.0:
IODDR0@0.0: Call Trace:
IODDR0@0.0: [<804540bc>] skb_panic+0xa4/0xa8
IODDR0@0.0: [<80079430>] console_unlock+0x2f8/0x6d0
IODDR0@0.0: [<80457458>] skb_put+0xa0/0xc0
IODDR0@0.0: [<80407b1c>] e1000_clean_rx_irq+0x2dc/0x3e8
IODDR0@0.0: [<80407b1c>] e1000_clean_rx_irq+0x2dc/0x3e8
IODDR0@0.0: [<804079c8>] e1000_clean_rx_irq+0x188/0x3e8
IODDR0@0.0: [<80407b1c>] e1000_clean_rx_irq+0x2dc/0x3e8
IODDR0@0.0: [<80468b48>] __dev_kfree_skb_any+0x88/0xa8
IODDR0@0.0: [<804101ac>] e1000e_poll+0x94/0x288
IODDR0@0.0: [<8046e9d4>] net_rx_action+0x19c/0x4e8
IODDR0@0.0:   ...
IODDR0@0.0: Maximum depth to print reached. Use kstack=<maximum_depth_to_print> To specify a custom value (where 0 means to display the full backtrace)
IODDR0@0.0: ---[ end Kernel panic - not syncing: BUG!

Signed-off-by: Pierre-Yves Kerbrat <pkerbrat@kalray.eu>
Signed-off-by: Marius Gligor <mgligor@kalray.eu>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-05 13:34:59 -08:00
Benjamin Poirier
4e7dc08e57 e1000e: Fix check_for_link return value with autoneg off
When autoneg is off, the .check_for_link callback functions clear the
get_link_status flag and systematically return a "pseudo-error". This means
that the link is not detected as up until the next execution of the
e1000_watchdog_task() 2 seconds later.

Fixes: 19110cfbb3 ("e1000e: Separate signaling for link check/link up")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-05 10:15:51 -08:00
Benjamin Poirier
116f4a640b e1000e: Avoid missed interrupts following ICR read
The 82574 specification update errata 12 states that interrupts may be
missed if ICR is read while INT_ASSERTED is not set. Avoid that problem by
setting all bits related to events that can trigger the Other interrupt in
IMS.

The Other interrupt is raised for such events regardless of whether or not
they are set in IMS. However, only when they are set is the INT_ASSERTED
bit also set in ICR.

By doing this, we ensure that INT_ASSERTED is always set when we read ICR
in e1000_msix_other() and steer clear of the errata. This also ensures that
ICR will automatically be cleared on read, therefore we no longer need to
clear bits explicitly.

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-05 10:08:20 -08:00
Benjamin Poirier
361a954e6a e1000e: Fix queue interrupt re-raising in Other interrupt
Restores the ICS write for Rx/Tx queue interrupts which was present before
commit 16ecba59bc ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1)
but was not restored in commit 4aea7a5c5e
("e1000e: Avoid receiver overrun interrupt bursts", v4.15-rc1).

This re-raises the queue interrupts in case the txq or rxq bits were set in
ICR and the Other interrupt handler read and cleared ICR before the queue
interrupt was raised.

Fixes: 4aea7a5c5e ("e1000e: Avoid receiver overrun interrupt bursts")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-05 10:06:39 -08:00
Benjamin Poirier
1f0ea19722 Partial revert "e1000e: Avoid receiver overrun interrupt bursts"
This partially reverts commit 4aea7a5c5e.

We keep the fix for the first part of the problem (1) described in the log
of that commit, that is to read ICR in the other interrupt handler. We
remove the fix for the second part of the problem (2), Other interrupt
throttling.

Bursts of "Other" interrupts may once again occur during rxo (receive
overflow) traffic conditions. This is deemed acceptable in the interest of
avoiding unforeseen fallout from changes that are not strictly necessary.
As discussed, the e1000e driver should be in "maintenance mode".

Link: https://www.spinics.net/lists/netdev/msg480675.html
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-05 10:03:52 -08:00
Benjamin Poirier
745d0bd3af e1000e: Remove Other from EIAC
It was reported that emulated e1000e devices in vmware esxi 6.5 Build
7526125 do not link up after commit 4aea7a5c5e ("e1000e: Avoid receiver
overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
icr=0x80000004 (_INT_ASSERTED | _LSC) in the same situation.

Some experimentation showed that this flaw in vmware e1000e emulation can
be worked around by not setting Other in EIAC. This is how it was before
16ecba59bc ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).

Fixes: 4aea7a5c5e ("e1000e: Avoid receiver overrun interrupt bursts")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-05 09:30:14 -08:00
Christophe JAILLET
0a6f2f05a2 igb: Fix a test with HWTSTAMP_TX_ON
'HWTSTAMP_TX_ON' should be handled as a value, not as a bit mask.
The modified code should behave the same, because HWTSTAMP_TX_ON is 1
and no other possible values of 'tx_type' would match the test.
However, this is more future-proof, should other values be allowed one day.

See 'struct hwtstamp_config' in 'include/uapi/linux/net_tstamp.h'

This fixes a warning reported by smatch:
   igb_xmit_frame_ring() warn: bit shifter 'HWTSTAMP_TX_ON' used for logical '&'

Fixes: 26bd4e2db0 ("igb: protect TX timestamping from API misuse")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-05 09:23:37 -08:00
Mika Westerberg
17a0b9add6 igb: Do not call netif_device_detach() when PCIe link goes missing
When the driver notices that PCIe link is gone by reading 0xffffffff
from a register it clears hw->hw_addr and then calls netif_device_detach().
This happens when the PCIe device is physically unplugged for example
the user disconnected the Thunderbolt cable.

However, netif_device_detach() prevents netif_unregister() from bringing
the device down properly including tearing down MSI-X vectors. This
triggers following crash during the driver removal:

  igb 0000:0b:00.0 enp11s0f0: PCIe link lost, device now detached
  ------------[ cut here ]------------
  kernel BUG at drivers/pci/msi.c:352!
  invalid opcode: 0000 [#1] PREEMPT SMP PTI
  ...
  Call Trace:
   pci_disable_msix+0xc9/0xf0
   igb_reset_interrupt_capability+0x58/0x60 [igb]
   igb_remove+0x90/0x100 [igb]
   pci_device_remove+0x31/0xa0
   device_release_driver_internal+0x152/0x210
   pci_stop_bus_device+0x78/0xa0
   pci_stop_bus_device+0x38/0xa0
   pci_stop_bus_device+0x38/0xa0
   pci_stop_bus_device+0x26/0xa0
   pci_stop_bus_device+0x38/0xa0
   pci_stop_and_remove_bus_device+0x9/0x20
   trim_stale_devices+0xee/0x130
   ? _raw_spin_unlock_irqrestore+0xf/0x30
   trim_stale_devices+0x8f/0x130
   ? _raw_spin_unlock_irqrestore+0xf/0x30
   trim_stale_devices+0xa1/0x130
   ? get_slot_status+0x8b/0xc0
   acpiphp_check_bridge.part.7+0xf9/0x140
   acpiphp_hotplug_notify+0x170/0x1f0
   ...

To prevent the crash do not call netif_device_detach() in igb_rd32().
This should be fine because hw->hw_addr is set to NULL preventing future
hardware access of the now missing device.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=198181
Reported-by: Ferenc Boldog <ferenc.boldog@gmail.com>
Reported-by: Nikolay Bogoychev <nheart@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-05 09:21:31 -08:00
Corinna Vinschen
1b8b062a99 igb: add VF trust infrastructure
* Add a per-VF value to know if a VF is trusted, by default don't
  trust VFs.

* Implement netdev op to trust VFs (igb_ndo_set_vf_trust) and add
  trust status to ndo_get_vf_config output.

* Allow a trusted VF to change MAC and MAC filters even if MAC
  has been administratively set.

Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-05 08:35:05 -08:00
Jacob Keller
e9d328d3b7 fm10k: bump version number
We're aligned with latest version released on SourceForge, so update the
version number to match.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-28 07:22:29 -08:00
Jacob Keller
7d6707a9da fm10k: fix incorrect warning for function prototype
Recent kernels now complain about incorrect function prototype comments,
in order to ensure comments are accurate to the function. However, it
incorrectly associates the comment above the fm10k_pci_tbl[] as
a function header comment. Fix this by removing the extra "*" in the
comment. This normally indicates that the function is a doxygen style
function header comment.

Once removed, the logic no longer kicks in and the following warning is
fixed:

  warning: cannot understand function prototype: 'const struct pci_device_id fm10k_pci_tbl[] = '

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-28 07:22:29 -08:00
Jacob Keller
363656eb5e fm10k: fix function doxygen comments
Several function header comments had incorrect function parameter
definitions. Recent versions of the upstream kernel have started to warn
about these issues. Fix up the comments which do not match in order to
resolve these new warnings.

While fixing these, update the copyright year also.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-28 07:22:29 -08:00
David S. Miller
c1de13bb93 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2018-02-26

This series contains updates to i40e and i40evf only.

Mariusz adds a new ethtool private flag for forcing true link state with
the requested changes from Jakub Kicinski.

Paweł fixes an issue where we were double locking the same resource
which would generate a kernel panic after bringing an interface up for
i40evf.

Alan modifies both drivers to use software values to determine if there
are packets stalled on the ring with the added benefit of being less CPU
intensive since we do not need to reach into the hardware to get the
values.

Colin Ian King provides a few fixes detected by Coverity, first was to
pass a struct by reference versus by value to be more efficient.  Then
verify the VSI pointer is not NULL before trying to dereference it.
Cleaned up redundant checks that always return true.

Dan Carpenter fixes over indented lines of code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 12:56:36 -05:00
Dan Carpenter
5dd3691c98 i40e: remove some stray indenting
These two lines are indented too far.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-26 12:40:39 -08:00
Colin Ian King
deb9a9ad3e i40evf: remove redundant array comparisons to 0 checks
The checks to see if key->dst.s6_addr and key->src.s6_addr are null
pointers are redundant because these are constant size arrays and
so the checks always return true.  Fix this by removing the redundant
checks.   Also replace filter->f with vf, allowing wide lines to be
condensed and to rejoin some split wide lines.

Detected by CoverityScan, CID#1465279 ("Array compared to 0")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-26 12:38:24 -08:00
Colin Ian King
46345b38e9 i40e: check that pointer VSI is not null before dereferencing it
Function i40e_find_vsi_from_id can potentially return null, hence
VSI may be null, so defensively check it is non-null before
dereferencing it to check the seid.

Fixes: e284fc2804 ("i40e: Add and delete cloud filter")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Avinash Dayanand <avinash.dayanand@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-26 12:36:46 -08:00
Colin Ian King
e85c1b8234 i40evf: pass struct virtchnl_filter by reference rather than by value
Passing struct virtchnl_filter f by value requires a 272 byte copy
on x86_64, so instead pass it by reference is much more efficient. Also
adjust some lines that are over 80 chars.

Detected by CoverityScan, CID#1465285 ("Big parameter passed by value")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-26 12:35:01 -08:00
Alan Brady
04d4105174 i40e/i40evf: use SW variables for hang detection
The i40e_detect_recover_hung function uses the i40e_get_tx_pending
function to determine if there are packets stalled on the ring.
i40e_get_tx_pending calculates the pending packets using the head
writeback value and HW tail.  If the queue is stopped and we lose the
interrupt to update our next_to_clean then we a) won't get another
interrupt to clean because queue is stopped b) we won't catch the
problem with i40e_detect_recover_hung because the HW values look like
there's no packets waiting to be transmitted.  Using the SW values we
can catch the issue because next_to_clean will be out of sync with head
writeback.

This has the added benefit being less CPU intensive because we don't
need to reach into the hardware to get the values.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-26 12:33:27 -08:00
Paweł Jabłoński
8cd5fe62cc i40evf: Fix double locking the same resource
Removes the locking of adapter->mac_vlan_list_lock resource in
i40evf_add_filter(). The locking part is moved above i40evf_add_filter().
i40evf_add_filter(), called by i40evf_addr_sync(), was trying to lock the
resource again and double locking generated a kernel panic after bringing
an interface up.

Fixes: 8946b56354 ("i40evf: use __dev_[um]c_sync routines in
       .set_rx_mode")
Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-26 12:29:41 -08:00
Mariusz Stachura
c3880bd159 i40e: link_down_on_close private flag support
This patch introduces new ethtool private flag used for
forcing true link state. Function i40e_force_link_state that implements
this functionality was added, it sets phy_type = 0 in order to
work-around firmware's LESM. False positive error messages were
suppressed.

The ndo_open() should not succeed if there were issues with forcing link
state to be UP.

Added I40E_PHY_TYPES_BITMASK define with all phy types OR-ed together in
one bitmask.  Added after phy type definition, so it will be hard to
forget to include new phy types to the bitmask.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-26 11:48:06 -08:00
Emil Tantilov
0c5661ecc5 ixgbe: fix crash in build_skb Rx code path
Add check for build_skb enabled ring in ixgbe_dma_sync_frag().
In that case &skb_shinfo(skb)->frags[0] may not always be set which
can lead to a crash. Instead we derive the page offset from skb->data.

Fixes: 42073d91a2
("ixgbe: Have the CPU take ownership of the buffers sooner")
CC: stable <stable@vger.kernel.org>
Reported-by: Ambarish Soman <asoman@redhat.com>
Suggested-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-26 13:44:24 -05:00
Colin Ian King
93a6a37c69 ixgbevf: remove redundant initialization of variable 'dma'
Variable dma is initialized with a value that is never read, later
on it is re-assigned a new value, hence the initialization is redundant
and can be removed.

Cleans up clang warning:
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:584:13: warning: Value
stored to 'dma' during its initialization is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-26 09:38:50 -08:00
Emil Tantilov
6d9c02171a ixgbevf: add build_skb support
Add support for build_skb() similar to:
commit 6f429223b3 ("ixgbe: Add support for build_skb")

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-26 09:36:24 -08:00
Emil Tantilov
925f5690ff ixgbevf: break out Rx buffer page management
Based on commit e014272672 ("igb: Break out Rx buffer page management")

Consolidate Rx code paths to reduce duplication when we expand them in
the future.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-26 09:34:50 -08:00
Emil Tantilov
21c046e448 ixgbevf: allocate the rings as part of q_vector
Make it so that all rings allocations are made as part of q_vector.
The advantage to this is that we can keep all of the memory related to
a single interrupt in one page.

The goal is to bring the logic of handling rings closer to ixgbe.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-26 09:32:46 -08:00
Emil Tantilov
5cc0f1c0dc ixgbevf: make sure all frames fit minimum size requirements
Similar to commit a50c29dd09
("ixgbe: Make certain that all frames fit minimum size requirements")

Make sure that any packet we attempt to transmit will meet minimum
size requirements.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-26 09:30:15 -08:00
Emil Tantilov
1ab37e12e3 ixgbevf: add support for padding packet
Following the logic from commit 2de6aa3a66
("ixgbe: Add support for padding packet")

Add support for providing a buffer with headroom and tail room
to allow for shared info, NET_SKB_PAD, and NET_IP_ALIGN.  With this
combined with the DMA changes we can start using build_skb to build frames
around an incoming Rx buffer instead of having to memcpy the headers.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-26 09:29:49 -08:00
Emil Tantilov
f2d00eca27 ixgbevf: setup queue counts
Add calls for netif_set_real_num_t/rx_queues() in ixgbevf_open().
Make sure that calls to ixgbevf_open() are rtnl protected and improve
the error handling when setting up multiple queues.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-26 09:27:07 -08:00
Emil Tantilov
f15c5ba5b6 ixgbevf: add support for using order 1 pages to receive large frames
Based on commit 8649aaef40
("igb: Add support for using order 1 pages to receive large frames")

Add support for using 3K buffers in order 1 page. We are reserving 1K for
now to have space available for future tail room and head room when we
enable build_skb support.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-26 09:25:03 -08:00
Emil Tantilov
bc04347f5b ixgbevf: add ethtool private flag for legacy Rx
Introduce legacy-rx private flag that will allow switching between the
old and new (build_skb based) Rx code paths. The implementation is the
same as in commit e08912985b
("igb: Add support for ethtool private flag to allow use of legacy Rx")

This provides a means of validating the legacy Rx path in the event that
we are forced to fall back.  At some point in the future when we are
convinced we don't need it anymore we might be able to drop the legacy-rx
flag.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-26 09:20:35 -08:00
Emil Tantilov
9913db03d7 ixgbevf: use page_address offset from page
Based on commit 3456fd5342
("igb: Use page_address offset from page instead of masking virtual address")

Update the handling of page addresses so that we always refer to them using
a void pointer, and try to use the consistent name of va indicating we are
working with a virtual address.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-26 09:16:15 -08:00
Jacob Keller
6704a3abf4 ixgbe: prevent ptp_rx_hang from running when in FILTER_ALL mode
On hardware which supports timestamping all packets, the timestamps are
recorded in the packet buffer, and the driver no longer uses or reads
the registers. This makes the logic for checking and clearing Rx
timestamp hangs meaningless.

If we run the ixgbe_ptp_rx_hang() function in this case, then the driver
will continuously spam the log output with "Clearing Rx timestamp hang".
These messages are spurious, and confusing to end users.

The original code in commit a9763f3cb5 ("ixgbe: Update PTP to support
X550EM_x devices", 2015-12-03) did have a flag PTP_RX_TIMESTAMP_IN_REGISTER
which was intended to be used to avoid the Rx timestamp hang check,
however it did not actually check the flag before calling the function.

Do so now in order to stop the checks and prevent the spurious log
messages.

Fixes: a9763f3cb5 ("ixgbe: Update PTP to support X550EM_x devices", 2015-12-03)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-26 09:11:30 -08:00
Tonghao Zhang
60f4b64549 ixgbe: Avoid to write the RETA table when unnecessary
If indir == 0 in the ixgbe_set_rxfh(), it is unnecessary
to write the HW. Because redirection table is not changed.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-26 09:09:15 -08:00
Colin Ian King
8f611fb046 ixgbe: remove redundant initialization of 'pool'
Variable pool is being assigned zero and then in the following for-loop
is it being set to zero again. Remove the redundant first assignment.

Cleans up clang warning:
drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c:61:2: warning: Value stored
to 'pool' is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-26 08:28:14 -08:00
Avinash Dayanand
e284fc2804 i40e: Add and delete cloud filter
This patch provides support to add or delete cloud filter for queue
channels created for ADq on VF.
We are using the HW's cloud filter feature and programming it to act
as a TC filter applied to a group of queues.

There are two possible modes for a VF when applying a cloud filter
1. Basic Mode:	Intended to apply filters that don't need a VF to be
		Trusted. This would include the following
		  Dest MAC + L4 port
		  Dest MAC + VLAN + L4 port
2. Advanced Mode: This mode is only for filters with combination that
		  requires VF to be Trusted.
		  Dest IP + L4 port

When cloud filters are applied on a trusted VF and for some reason
the same VF is later made as untrusted then all cloud filters
will be deleted. All cloud filters has to be re-applied in
such a case.
Cloud filters are also deleted when queue channel is deleted.

Testing-Hints:
=============
1. Adding Basic Mode filter should be possible on a VF in
   Non-Trusted mode.
2. In Advanced mode all filters should be able to be created.

Steps:
======
1. Enable ADq and create TCs using TC mqprio command
2. Apply cloud filter.
3. Turn-off the spoof check.
4. Pass traffic.

Example:
========
1. tc qdisc add dev enp4s2 root mqprio num_tc 4 map 0 0 0 0 1 2 2 3\
	queues 2@0 2@2 1@4 1@5 hw 1 mode channel
2. tc qdisc add dev enp4s2 ingress
3. ethtool -K enp4s2 hw-tc-offload on
4. ip link set ens261f0 vf 0 spoofchk off
5. tc filter add dev enp4s2 protocol ip parent ffff: prio 1 flower\
	dst_ip 192.168.3.5/32 ip_proto udp dst_port 25 skip_sw hw_tc 2

Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-14 09:43:22 -08:00
Harshitha Ramamurthy
0075fa0fad i40evf: Add support to apply cloud filters
This patch enables a tc filter to be applied as a cloud
filter for the VF. This patch adds functions which parse the
tc filter, extract the necessary fields needed to configure the
filter and package them in a virtchnl message to be sent to the
PF to apply them.

Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-14 09:43:22 -08:00
Avinash Dayanand
0c483bd4b8 i40e: Service request to configure bandwidth for ADq on a VF
This patch handles the request from ADq enabled VF to allocate
bandwidth to each traffic class which means for each VSI.

Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-14 09:43:22 -08:00
Harshitha Ramamurthy
591532d614 i40evf: Add support to configure bw via tc tool
This patch adds support to configure bandwidth for the traffic
classes via tc tool. The required information is passed to the PF
which is used in the process of setting up the traffic classes.

Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-14 09:43:22 -08:00
Avinash Dayanand
c4998aa302 i40e: Delete queue channel for ADq on VF
This patch takes care of freeing up all the VSIs, queues and
other ADq related software and hardware resources, when a user
requests for deletion of ADq on VF.

Example command:
tc qdisc del dev eth0 root

Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-14 09:43:22 -08:00
Avinash Dayanand
5e97ce6348 i40evf: Alloc queues for ADq on VF
This patch allocates number of queues requested by the user as a part
of TC command when ADq is enabled on a VF.

In order to be consistent in design with PF implementation of ADq,
don't allow to set channels via ethtool from VF when ADq is already
enabled. This means the users will not be able to change the number of
queues/channels via ethtool for a VF when ADq is ON. In order to be
able to use set channels, users will be required to disable ADq first
and then try setting the channels again.

When ADq is enabled on VF, it goes through a reset during which VSIs
and queues are re-configured. Meanwhile if we receive link status
message from PF even before the queues are re-configured, just ignore
this link up message.

Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-14 09:43:22 -08:00
Avinash Dayanand
c27eac4816 i40e: Enable ADq and create queue channel/s on VF
This patch enables ADq and creates queue channels on a VF. An ADq
enabled VF can have up to 4 VSIs and each one of them represents
a traffic class and this is termed as a queue channel. Each of these
VSIs can have up to 4 queues. This patch services the request for
enabling ADq and adds queue channel based on the TC mqprio info
provided by the user in the VF.

Initially a check is made to see if spoof check is OFF, if not ADq
will not be enabled. PF notifies VF for a reset in order to complete
the creation of ADq resources i.e. creation of additional VSIs and
allocation of queues as per TC information, all in the reset path.

Steps:
======
1. Turn off the spoof check
2. Enable ADq using tc mqprio command with or without rate limit.
3. Pass traffic.

Example:
========
% ip link set dev eth0 vf 0 spoofchk off
% tc qdisc add dev $iface root mqprio num_tc 4 map\
	0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 queues\
	4@0 4@4 4@8 4@8 hw 1 mode channel

Expected results:
=================
1. Total number of queues for the VF should be sum of queues of all TCs.
2. Traffic flow should be normal without errors.

Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-14 09:43:22 -08:00
Harshitha Ramamurthy
d5b33d0244 i40evf: add ndo_setup_tc callback to i40evf
This patch introduces the callback to the ndo_setup_tc function
in the VF driver. We add a wrapper function to make room for the
upcoming cloud filter patches which add calls to different functions
from setup_tc.

First, we add support for capability exchange for ADQ between the
PF and VF. Next, we add support to take in the mqprio configuration
and configure queues as per the traffic classes, rate limit and the
priorities specified by the user. This is done by passing the channel
config to the PF driver through a virtchannel message.

The flags and bits added, track if ADq is enabled, set max number of
traffic classes to 4 and provide ability to negotiate capability with
the PF.

Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-14 09:43:22 -08:00
Avinash Dayanand
836ce5ed72 i40evf: Fix link up issue when queues are disabled
One of the previous patch fixes the link up issue by ignoring it if
i40evf is not in __I40EVF_RUNNING state. However this doesn't fix the
race condition when queues are disabled esp for ADq on VF. Hence check
if all queues are enabled before starting all queues.

Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-14 09:43:21 -08:00
Harshitha Ramamurthy
693acdd0f1 i40evf: Make VF reset warning message more clear
When the PF resets the VF, the VF puts out a warning message
indicating that the VF received a reset message from the PF.
Make this message more clear so that we do not mistakenly
think that the PF is undergoing a reset.

Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-13 11:40:11 -08:00
Jacob Keller
8946b56354 i40evf: use __dev_[um]c_sync routines in .set_rx_mode
Similar to changes done to the PF driver in commit 6622f5cdba ("i40e:
make use of __dev_uc_sync and __dev_mc_sync"), replace our
home-rolled method for updating the internal status of MAC filters with
__dev_uc_sync and __dev_mc_sync.

These new functions use internal state within the netdev struct in order
to efficiently break the question of "which filters in this list need to
be added or removed" into singular "add this filter" and "delete this
filter" requests.

This vastly improves our handling of .set_rx_mode especially with large
number of MAC filters being added to the device, and even results in
a simpler .set_rx_mode handler.

Under some circumstances, such as when attached to a bridge, we may
receive a request to delete our own permanent address. Prevent deletion
of this address during i40evf_addr_unsync so that we don't accidentally
stop receiving traffic.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-13 11:40:10 -08:00
Dave Ertman
7b63435a50 i40e: i40e: Change ethtool check from MAC to HW flag
The MAC, FW Version and NPAR check used to determine
if shutting off the FW LLDP engine is supported is not
using the usual feature check mechanism.

This patch fixes the problem by moving the feature check
to i40e_sw_init in order to set a flag in pf->hw_features
that ethtool will use for priv_flags disable operation.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-13 11:40:10 -08:00
Alan Brady
7363115efb i40e: do not force filter failure in overflow promiscuous
Broadcast filters can now cause overflow promiscuous to trigger when
adding "too many" VLANs to all the ports of a device and the driver
needs a way to exit overflow promiscuous once triggered.

Currently the driver looks to see if there are "too many" filters and/or
we have any failed filters to determine when it is safe to exit overflow
promiscuous.  If we trigger overflow promiscuous with broadcast filters,
any new filters added will be "auto-failed" until we exit overflow
promiscuous.  Since the user can't manually remove the failed broadcast
filters for VLANs (nor should we expect the user to do such), there is
no way to exit overflow promiscuous without reloading the driver.

The easiest way to do this is to remove the shortcut to "auto-fail"
filters in overflow promiscuous.  If the user removes the VLANs, the
failed filters will be removed and since we're no longer "auto-failing"
new filters, we'll eventually get a good set of filters and exit
overflow promiscuous.

This has the side benefit of making filter state more explicit in that
if a filter says it's failed we know for a fact it failed and not just
assuming it will if we're in overflow promiscuous.  This is nice because
if the user removes some filters and then adds some, even if we're in
overflow promiscuous, the filter might succeed; we were just assuming it
won't because the user hasn't rectified other existing failed filters.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-13 11:40:10 -08:00
Alan Brady
cc6a96a419 i40e: refactor promisc_changed in i40e_sync_vsi_filters
This code here is quite complex and easy to screw up.  Let's see if we
can't improve the readability and maintainability a bit.  This refactors
out promisc_changed into two variables 'old_overflow' and 'new_overflow'
which makes it a bit clearer when we're concerned about when and how
overflow promiscuous is changed.  This also makes so that we no longer
need to pass a boolean pointer to i40e_aqc_add_filters.  Instead we can
simply check if we changed the overflow promiscuous flag since the
function start.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-13 11:40:10 -08:00
Harshitha Ramamurthy
fbd5eb54c2 i40evf: Use an iterator of the same type as the list
When iterating through the linked list of VLAN filters, make the
iterator the same type as that of the linked list.

Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-13 11:40:10 -08:00
Alan Brady
a48350c29b i40e: broadcast filters can trigger overflow promiscuous
When adding a bunch of VLANs to all the ports on a device, it's possible
to run out of space for broadcast filters.  The driver should trigger
overflow promiscuous in this circumstance to prevent traffic from being
unexpectedly dropped.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-13 11:40:10 -08:00
Mitch Williams
7be78aa444 i40e: don't leak memory addresses
Could a Bad Person do Bad Things to a server if they found these
addresses printed in the log? Who knows? But let's not take that risk.

Remove pointers from a bunch of printks. In some cases, I was able to
adjust the message to indicate whether or not the value was null. In
others, I just removed the entire message as there was really no hope of
saving it.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-13 11:40:10 -08:00
Wei Yongjun
03f431b33a i40evf: use GFP_ATOMIC under spin lock
A spin lock is taken here so we should use GFP_ATOMIC.

Fixes: 504398f0a7 ("i40evf: use spinlock to protect (mac|vlan)_filter_list")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-13 11:40:10 -08:00
Wei Yongjun
3758d2c74d i40e: Make local function i40e_get_link_speed static
Fixes the following sparse warning:

drivers/net/ethernet/intel/i40e/i40e_main.c:5440:5: warning:
 symbol 'i40e_get_link_speed' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-13 11:40:10 -08:00
Alexander Duyck
a0073a4b8b i40e/i40evf: Add support for new mechanism of updating adaptive ITR
This patch replaces the existing mechanism for determining the correct
value to program for adaptive ITR with yet another new and more
complicated approach.

The basic idea from a 30K foot view is that this new approach will push the
Rx interrupt moderation up so that by default it starts in low latency and
is gradually pushed up into a higher latency setup as long as doing so
increases the number of packets processed, if the number of packets drops
to 4 to 1 per packet we will reset and just base our ITR on the size of the
packets being received. For Tx we leave it floating at a high interrupt
delay and do not pull it down unless we start processing more than 112
packets per interrupt. If we start exceeding that we will cut our interrupt
rates in half until we are back below 112.

The side effect of these patches are that we will be processing more
packets per interrupt. This is both a good and a bad thing as it means we
will not be blocking processing in the case of things like pktgen and XDP,
but we will also be consuming a bit more CPU in the cases of things such as
network throughput tests using netperf.

One delta from this versus the ixgbe version of the changes is that I have
made the interrupt moderation a bit more aggressive when we are in bulk
mode by moving our "goldilocks zone" up from 48 to 96 to 56 to 112. The
main motivation behind moving this is to address the fact that we need to
update less frequently, and have more fine grained control due to the
separate Tx and Rx ITR times.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 11:50:10 -08:00
Alexander Duyck
556fdfd6e6 i40e/i40evf: Split container ITR into current_itr and target_itr
This patch is mostly prep-work for replacing the current approach to
programming the dynamic aka adaptive ITR. Specifically here what we are
doing is splitting the Tx and Rx ITR each into two separate values.

The first value current_itr represents the current value of the register.

The second value target_itr represents the desired value of the register.

The general plan by doing this is to allow for deferring the update of the
ITR value under certain circumstances. For now we will work with what we
have, but in the future I hope to change the behavior so that we always
only update one ITR at a time using some simple logic to determine which
ITR requires an update.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 11:43:50 -08:00
Alexander Duyck
d4942d581a i40evf: Correctly populate rxitr_idx and txitr_idx
While testing code for the recent ITR changes I found that updating the Tx
ITR appeared to have no effect with everything defaulting to the Rx ITR. A
bit of digging narrowed it down the fact that we were asking the PF to
associate all causes with ITR 0 as we weren't populating the itr_idx values
for either Rx or Tx.

To correct it I have added the configuration for these values to this
patch. In addition I did some minor clean-up to just add a local pointer
for the vector map instead of dereferencing it based off of the index
repeatedly. In my opinion this makes the resultant code a bit more readable
and saves us a few characters.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 11:33:46 -08:00
Alexander Duyck
92418fb147 i40e/i40evf: Use usec value instead of reg value for ITR defines
Instead of using the register value for the defines when setting up the
ring ITR we can just use the actual values and avoid the use of shifts and
macros to translate between the values we have and the values we want.

This helps to make the code more readable as we can quickly translate from
one value to the other.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 11:29:47 -08:00
Alexander Duyck
4ff17929e6 i40e/i40evf: Don't bother setting the CLEARPBA bit
The CLEARPBA bit in the dynamic interrupt control register actually has
no effect either way on the hardware. As per errata 28 in the XL710
specification update the interrupt is actually cleared any time the
register is written with the INTENA_MSK bit set to 0. As such the act of
toggling the enable bit actually will trigger the interrupt being
cleared and could lead to potential lost events if auto-masking is
not enabled.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 10:56:16 -08:00
Alexander Duyck
8b99b1179c i40e/i40evf: Clean-up of bits related to using q_vector->reg_idx
This patch is a further clean-up related to the change over to using
q_vector->reg_idx when accessing the ITR registers. Specifically the code
appears to have several other spots where we were computing the register
offset manually and this resulted in errors in a few spots.

Specifically in the i40evf functions for mapping queues to vectors it
appears we may have had an off by 1 error since (v_idx - 1) for the first
q_vector with an index of 0 would result in us returning -1 if I am not
mistaken.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 10:54:25 -08:00
Alan Brady
fe09ed0ee1 i40e: use changed_flags to check I40E_FLAG_DISABLE_FW_LLDP
Currently in i40e_set_priv_flags we use new_flags to check for the
I40E_FLAG_DISABLE_FW_LLDP flag.  This is an issue for a few a reasons.
DISABLE_FW_LLDP is persistent across reboots/driver reloads.  This means
we need some way to detect if FW LLDP is enabled on init.  We do this by
trying to init_dcb and if it fails with EPERM we know LLDP is disabled
in FW.

This could be a problem on older FW versions or NPAR enabled PFs because
there are situations where the FW could disable LLDP, but they do _not_
support using this flag to change it.  If we do end up in this
situation, the flag will be set, then when the user tries to change any
priv flags, the driver thinks the user is trying to disable FW LLDP on a
FW that doesn't support it and essentially forbids any priv flag
changes.

The fix is simple, instead of checking if this flag is set, we should be
checking if the user is trying to _change_ the flag on unsupported FW
versions.

This patch also adds a comment explaining that the cmpxchg is the point
of no return.  Once we put the new flags into pf->flags we can't back
out.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 10:51:54 -08:00
Paweł Jabłoński
17b4d25c12 i40e: Warn when setting link-down-on-close while in MFP
This patch adds a warning message when the link-down-on-close flag is
setting on. The warning is printed only on MFP devices

Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 10:43:02 -08:00
Filip Sadowski
1fa51a650e i40e: Add delay after EMP reset for firmware to recover
This patch adds necessary delay for 4.33 firmware to recover after
EMP reset. Without this patch driver occasionally reinitializes
structures too quickly to communicate with firmware after EMP reset
causing AdminQ to timeout.

Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 10:38:38 -08:00
Alexander Duyck
71dc371993 i40e/i40evf: Clean up logic for adaptive ITR
The logic for dynamic ITR update is confusing at best as there were odd
paths chosen for how to find the rings associated with a given queue based
on the vector index and other inconsistencies throughout the code.

This patch is an attempt to clean up the logic so that we can more easily
understand what is going on. Specifically if there is a Rx or Tx ring that
is enabled in dynamic mode on the q_vector it is allowed to override the
other side of the interrupt moderation. While it isn't correct all this
patch is doing is cleaning up the logic for now so that when we come
through and fix it we can more easily identify that this is wrong.

The other big change made here is that we replace references to:
	vsi->rx_rings[q_vector->v_idx]->itr_setting
with:
	q_vector->rx.ring->itr_setting

The general idea is we can avoid the long pointer chase since just
accessing q_vector->rx.ring is a single pointer access versus having to
chase down vsi->rx_rings, and then finding the pointer in the array, and
finally chasing down the itr_setting from there.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 10:35:49 -08:00
Alexander Duyck
40588ca651 i40e/i40evf: Only track one ITR setting per ring instead of Tx/Rx
The rings are already split out into Tx and Rx rings so it doesn't make
sense to have any single ring store both a Tx and Rx itr_setting value.
Since that is the case drop the pair in favor of storing just a single ITR
value.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 10:27:12 -08:00
Alan Brady
11a350c965 i40e: fix typo in function description
'bufer' should be 'buffer'

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 09:55:34 -08:00