Commit Graph

933654 Commits

Author SHA1 Message Date
Aleksandr Loktionov
3dbdd6c2f7 i40e: Add support for 5Gbps cards
Make possible for the i40e driver to bind to the new v710 for 5GBASE-T
NICs.

Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-06-25 22:25:13 -07:00
Jeff Kirsher
34a2a3b83e net/intel: remove driver versions from Intel drivers
As with other networking drivers, remove the unnecessary driver version
from the Intel drivers. The ethtool driver information and module version
will then report the kernel version instead.

For ixgbe, i40e and ice drivers, the driver passes the driver version to
the firmware to confirm that we are up and running.  So we now pass the
value of UTS_RELEASE to the firmware.  This adminq call is required per
the HAS document.  The Device then sends an indication to the BMC that the
PF driver is present. This is done using Host NC Driver Status Indication
in NC-SI Get Link command or via the Host Network Controller Driver Status
Change AEN.

What the BMC may do with this information is implementation-dependent, but
this is a standard NC-SI 1.1 command we honor per the HAS.

CC: Bruce Allan <bruce.w.allan@intel.com>
CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Alek Loktionov <aleksandr.loktionov@intel.com>
CC: Kevin Liedtke <kevin.d.liedtke@intel.com>
CC: Aaron Rowden <aaron.f.rowden@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Co-developed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2020-06-25 22:25:13 -07:00
Todd Fujinaka
3a2c6ced90 i40e: Add a check to see if MFS is set
A customer was chain-booting to provision his systems and one of the
steps was setting MFS. MFS isn't cleared by normal warm reboots
(clearing requires a GLOBR) and there was no indication of why Jumbo
Frame receives were failing.

Add a warning if MFS is set to anything lower than the default.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-06-25 22:25:13 -07:00
Piotr Kwapulinski
fffeeddfcf i40e: detect and log info about pre-recovery mode
Detect and log information about pre-recovery mode when firmware
transitions to a recovery mode.
When a firmware transitions to a recovery mode it stores a number
of unexpected EMP resets in one of its registers. The number of EMP
resets ranging from 0x21 to 0x2A indicates that FW transitions
to recovery mode. Use these values to emit log entry about transition
process. Previously the pre-recovery mode may not have been detected
and there was no log entry when NIC was in pre-recovery mode.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-06-25 22:25:13 -07:00
Piotr Kwapulinski
91c534b5e3 i40e: make PF wait reset loop reliable
Use jiffies to limit max waiting time for PF reset to succeed.
Previous wait loop was unreliable. It required unreasonably long time
to wait for PF reset after reboot when NIC was about to enter
recovery mode

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-06-25 22:25:13 -07:00
Jesse Brandeburg
3c98f9ee6b i40e: remove unused defines
Remove all the unused defines as they are just dead weight.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-06-25 22:25:13 -07:00
Shiraz Saleem
fe21b6c3a6 i40e: Move client header location
Move i40e_client.h to include/linux/net/intel/*
since its shared between i40iw and i40e.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-06-25 22:25:13 -07:00
David S. Miller
7bed145516 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Minor overlapping changes in xfrm_device.c, between the double
ESP trailing bug fix setting the XFRM_INIT flag and the changes
in net-next preparing for bonding encryption support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 19:29:51 -07:00
Linus Torvalds
4a21185cda Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Don't insert ESP trailer twice in IPSEC code, from Huy Nguyen.

 2) The default crypto algorithm selection in Kconfig for IPSEC is out
    of touch with modern reality, fix this up. From Eric Biggers.

 3) bpftool is missing an entry for BPF_MAP_TYPE_RINGBUF, from Andrii
    Nakryiko.

 4) Missing init of ->frame_sz in xdp_convert_zc_to_xdp_frame(), from
    Hangbin Liu.

 5) Adjust packet alignment handling in ax88179_178a driver to match
    what the hardware actually does. From Jeremy Kerr.

 6) register_netdevice can leak in the case one of the notifiers fail,
    from Yang Yingliang.

 7) Use after free in ip_tunnel_lookup(), from Taehee Yoo.

 8) VLAN checks in sja1105 DSA driver need adjustments, from Vladimir
    Oltean.

 9) tg3 driver can sleep forever when we get enough EEH errors, fix from
    David Christensen.

10) Missing {READ,WRITE}_ONCE() annotations in various Intel ethernet
    drivers, from Ciara Loftus.

11) Fix scanning loop break condition in of_mdiobus_register(), from
    Florian Fainelli.

12) MTU limit is incorrect in ibmveth driver, from Thomas Falcon.

13) Endianness fix in mlxsw, from Ido Schimmel.

14) Use after free in smsc95xx usbnet driver, from Tuomas Tynkkynen.

15) Missing bridge mrp configuration validation, from Horatiu Vultur.

16) Fix circular netns references in wireguard, from Jason A. Donenfeld.

17) PTP initialization on recovery is not done properly in qed driver,
    from Alexander Lobakin.

18) Endian conversion of L4 ports in filters of cxgb4 driver is wrong,
    from Rahul Lakkireddy.

19) Don't clear bound device TX queue of socket prematurely otherwise we
    get problems with ktls hw offloading, from Tariq Toukan.

20) ipset can do atomics on unaligned memory, fix from Russell King.

21) Align ethernet addresses properly in bridging code, from Thomas
    Martitz.

22) Don't advertise ipv4 addresses on SCTP sockets having ipv6only set,
    from Marcelo Ricardo Leitner.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (149 commits)
  rds: transport module should be auto loaded when transport is set
  sch_cake: fix a few style nits
  sch_cake: don't call diffserv parsing code when it is not needed
  sch_cake: don't try to reallocate or unshare skb unconditionally
  ethtool: fix error handling in linkstate_prepare_data()
  wil6210: account for napi_gro_receive never returning GRO_DROP
  hns: do not cast return value of napi_gro_receive to null
  socionext: account for napi_gro_receive never returning GRO_DROP
  wireguard: receive: account for napi_gro_receive never returning GRO_DROP
  vxlan: fix last fdb index during dump of fdb with nhid
  sctp: Don't advertise IPv4 addresses if ipv6only is set on the socket
  tc-testing: avoid action cookies with odd length.
  bpf: tcp: bpf_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
  tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
  net: dsa: sja1105: fix tc-gate schedule with single element
  net: dsa: sja1105: recalculate gating subschedule after deleting tc-gate rules
  net: dsa: sja1105: unconditionally free old gating config
  net: dsa: sja1105: move sja1105_compose_gating_subschedule at the top
  net: macb: free resources on failure path of at91ether_open()
  net: macb: call pm_runtime_put_sync on failure path
  ...
2020-06-25 18:27:40 -07:00
Kevin Darbyshire-Bryant
b8392808eb sch_cake: add RFC 8622 LE PHB support to CAKE diffserv handling
Change tin mapping on diffserv3, 4 & 8 for LE PHB support, in essence
making LE a member of the Bulk tin.

Bulk has the least priority and minimum of 1/16th total bandwidth in the
face of higher priority traffic.

NB: Diffserv 3 & 4 swap tin 0 & 1 priorities from the default order as
found in diffserv8, in case anyone is wondering why it looks a bit odd.

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
[ reword commit message slightly ]
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:31:35 -07:00
Rao Shoaib
4c342f778f rds: transport module should be auto loaded when transport is set
This enhancement auto loads transport module when the transport
is set via SO_RDS_TRANSPORT socket option.

Reviewed-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:26:25 -07:00
David S. Miller
6aeaf26222 Merge branch 'sched-A-couple-of-fixes-for-sch_cake'
Toke Høiland-Jørgensen says:

====================
sched: A couple of fixes for sch_cake

This series contains a couple of fixes for diffserv handling in sch_cake that
provide a nice speedup (with a somewhat pedantic nit fix tacked on to the end).

Not quite sure about whether this should go to stable; it does provide a nice
speedup, but it's not strictly a fix in the "correctness" sense. I lean towards
including this in stable as well, since our most important consumer of that
(OpenWrt) is likely to backport the series anyway.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:24:05 -07:00
Toke Høiland-Jørgensen
3f608f0c41 sch_cake: fix a few style nits
I spotted a few nits when comparing the in-tree version of sch_cake with
the out-of-tree one: A redundant error variable declaration shadowing an
outer declaration, and an indentation alignment issue. Fix both of these.

Fixes: 046f6fd5da ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:24:05 -07:00
Toke Høiland-Jørgensen
8c95eca0bb sch_cake: don't call diffserv parsing code when it is not needed
As a further optimisation of the diffserv parsing codepath, we can skip it
entirely if CAKE is configured to neither use diffserv-based
classification, nor to zero out the diffserv bits.

Fixes: c87b4ecdbe ("sch_cake: Make sure we can write the IP header before changing DSCP bits")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:24:05 -07:00
Ilya Ponetayev
9208d2863a sch_cake: don't try to reallocate or unshare skb unconditionally
cake_handle_diffserv() tries to linearize mac and network header parts of
skb and to make it writable unconditionally. In some cases it leads to full
skb reallocation, which reduces throughput and increases CPU load. Some
measurements of IPv4 forward + NAPT on MIPS router with 580 MHz single-core
CPU was conducted. It appears that on kernel 4.9 skb_try_make_writable()
reallocates skb, if skb was allocated in ethernet driver via so-called
'build skb' method from page cache (it was discovered by strange increase
of kmalloc-2048 slab at first).

Obtain DSCP value via read-only skb_header_pointer() call, and leave
linearization only for DSCP bleaching or ECN CE setting. And, as an
additional optimisation, skip diffserv parsing entirely if it is not needed
by the current configuration.

Fixes: c87b4ecdbe ("sch_cake: Make sure we can write the IP header before changing DSCP bits")
Signed-off-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
[ fix a few style issues, reflow commit message ]
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:24:05 -07:00
David S. Miller
7d22bc4530 Merge branch 'net-phy-mscc-multiple-improvements'
Antoine Tenart says:

====================
net: phy: mscc: multiple improvements

This series contains various improvements to the MSCC PHY driver, fixing
sparse and smatch warnings, using functions provided by the PHY core,
and improving the driver consistency and maintenance.

I don't think any of those improvements and fixes is worth backporting
to stable trees.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:22:11 -07:00
Antoine Tenart
b4368d2b5b net: phy: mscc: improve vsc8514/8584_config_init consistency
All PHY read and write return values are checked for errors in
vsc8514_config_init and vsc8584_config_init, except for one. Fix this.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:22:11 -07:00
Antoine Tenart
d4a76dc74d net: phy: mscc: remove useless page configuration in the config init
In the middle of vsc8584_config_init and vsc8514_config_init, the page
is set to 'standard'. This is the default value, and the page isn't set
to another value before. Those pages configuration can be safely
removed.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:22:11 -07:00
Antoine Tenart
6119dda34e net: phy: mscc: restore the base page in vsc8514/8584_config_init
In the vsc8584_config_init and vsc8514_config_init, the base page is set
to 'GPIO', configuration is done, and the page is never explicitly
restored to the standard page. No bug was triggered as it turns out
helpers called in those config_init functions do modify the base page,
and set it back to standard. But that is dangerous and any modification
to those functions would introduce bugs. This patch fixes this, to
improve maintenance, by restoring the base page to 'standard' once
'GPIO' accesses are completed.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:22:11 -07:00
Antoine Tenart
d9608aacd3 net: phy: mscc: do not access the MDIO bus lock directly
This patch improves the MSCC driver by using the provided
phy_lock_mdio_bus and phy_unlock_mdio_bus helpers instead of locking and
unlocking the MDIO bus lock directly. The patch is only cosmetic but
should improve maintenance and consistency.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:22:11 -07:00
Antoine Tenart
b9dccf91b3 net: phy: mscc: ptp: fix a typo in a comment
This patch fixes a typo in a comment, s/Ths/This/. The patch is cosmetic
only.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:22:11 -07:00
Antoine Tenart
b487032ee6 net: phy: mscc: ptp: fix a smatch error
The following error was reported by smatch:
vsc85xx_ts_read_csr() error: uninitialized symbol 'blk_hw'.

In practice this is very unlikely, as all the block identifiers given to
this functions are handled and described in an enum. The smatch error is
fixed by doing what is already done in vsc85xx_ts_write_csr: using the
"PROCESSOR" block by default.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:22:11 -07:00
Antoine Tenart
b16a213b4d net: phy: mscc: fix a possible double unlock
On vsc8584_ptp_init failure we jump to the 'err' label, which unlocks
the MDIO bus lock. But vsc8584_ptp_init isn't called with the MDIO bus
lock taken, which could result in a double unlock. Fix this.

Fixes: ab2bf93393 ("net: phy: mscc: 1588 block initialization")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:22:11 -07:00
Antoine Tenart
853bede848 net: phy: mscc: macsec: fix sparse warnings
This patch fixes the following sparse warnings when building MACsec
support in the MSCC PHY driver.

  mscc_macsec.c:393:42: warning: cast from restricted sci_t
  mscc_macsec.c:395:42: warning: restricted sci_t degrades to integer
  mscc_macsec.c:402:42: warning: restricted __be16 degrades to integer
  mscc_macsec.c:608:34: warning: cast from restricted sci_t
  mscc_macsec.c:610:34: warning: restricted sci_t degrades to integer

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:22:10 -07:00
Michal Kubecek
1ae71d997a ethtool: fix error handling in linkstate_prepare_data()
When getting SQI or maximum SQI value fails in linkstate_prepare_data(), we
must not return without calling ethnl_ops_complete(dev) as that could
result in imbalance between ethtool_ops ->begin() and ->complete() calls.

Fixes: 8066021915 ("ethtool: provide UAPI for PHY Signal Quality Index (SQI)")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:17:16 -07:00
Linus Torvalds
42e9c85f5c tracing: Four small fixes
- Fixed a ringbuffer bug for nested events having time go backwards
  - Fix a config dependency for boot time tracing to depend on synthetic
    events instead of histograms.
  - Fix trigger format parsing to handle multiple spaces
  - Fix bootconfig to handle failures in multiple events
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXvUjBBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qmLiAQD47/1T01ilYeXqJ+EG235aeQssvRa7
 RSmIAoMP+V6kHQD9G2RjnWkb3BcrdNk9zoi0LpnuMl95m5OuaMzE4PPO+ws=
 =Zbx8
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Four small fixes:

   - Fix a ringbuffer bug for nested events having time go backwards

   - Fix a config dependency for boot time tracing to depend on
     synthetic events instead of histograms.

   - Fix trigger format parsing to handle multiple spaces

   - Fix bootconfig to handle failures in multiple events"

* tag 'trace-v5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/boottime: Fix kprobe multiple events
  tracing: Fix event trigger to accept redundant spaces
  tracing/boot: Fix config dependency for synthedic event
  ring-buffer: Zero out time extend if it is nested and not absolute
2020-06-25 16:16:49 -07:00
David S. Miller
0e00c05fa7 Merge branch 'napi_gro_receive-caller-return-value-cleanups'
Jason A. Donenfeld says:

====================
napi_gro_receive caller return value cleanups

In 6570bc79c0 ("net: core: use listified Rx for GRO_NORMAL in
napi_gro_receive()"), the GRO_NORMAL case stopped calling
netif_receive_skb_internal, checking its return value, and returning
GRO_DROP in case it failed. Instead, it calls into
netif_receive_skb_list_internal (after a bit of indirection), which
doesn't return any error. Therefore, napi_gro_receive will never return
GRO_DROP, making handling GRO_DROP dead code.

I emailed the author of 6570bc79c0 on netdev [1] to see if this change
was intentional, but the dlink.ru email address has been disconnected,
and looking a bit further myself, it seems somewhat infeasible to start
propagating return values backwards from the internal machinations of
netif_receive_skb_list_internal.

Taking a look at all the callers of napi_gro_receive, it appears that
three are checking the return value for the purpose of comparing it to
the now never-happening GRO_DROP, and one just casts it to (void), a
likely historical leftover. Every other of the 120 callers does not
bother checking the return value.

And it seems like these remaining 116 callers are doing the right thing:
after calling napi_gro_receive, the packet is now in the hands of the
upper layers of the newtworking, and the device driver itself has no
business now making decisions based on what the upper layers choose to
do. Incrementing stats counters on GRO_DROP seems like a mistake, made
by these three drivers, but not by the remaining 117.

It would seem, therefore, that after rectifying these four callers of
napi_gro_receive, that I should go ahead and just remove returning the
value from napi_gro_receive all together. However, napi_gro_receive has
a function event tracer, and being able to introspect into the
networking stack to see how often napi_gro_receive is returning whatever
interesting GRO status (aside from _DROP) remains an interesting
data point worth keeping for debugging.

So, this series simply gets rid of the return value checking for the
four useless places where that check never evaluates to anything
meaningful.

[1] https://lore.kernel.org/netdev/20200624210606.GA1362687@zx2c4.com/
====================

Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:16:21 -07:00
Jason A. Donenfeld
045790b7bc wil6210: account for napi_gro_receive never returning GRO_DROP
The napi_gro_receive function no longer returns GRO_DROP ever, making
handling GRO_DROP dead code. This commit removes that dead code.
Further, it's not even clear that device drivers have any business in
taking action after passing off received packets; that's arguably out of
their hands. In this case, too, the non-gro path didn't bother checking
the return value. Plus, this had some clunky debugging functions that
duplicated code from elsewhere and was generally pretty messy. So, this
commit cleans that all up too.

Fixes: 6570bc79c0 ("net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:16:21 -07:00
Jason A. Donenfeld
93ab48a97a hns: do not cast return value of napi_gro_receive to null
Basically no drivers care about the return value here, and there's no
__must_check that would make casting to void sensible, so remove it.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:16:21 -07:00
Jason A. Donenfeld
e5e7d8052f socionext: account for napi_gro_receive never returning GRO_DROP
The napi_gro_receive function no longer returns GRO_DROP ever, making
handling GRO_DROP dead code. This commit removes that dead code.
Further, it's not even clear that device drivers have any business in
taking action after passing off received packets; that's arguably out of
their hands.

Fixes: 6570bc79c0 ("net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:16:21 -07:00
Jason A. Donenfeld
df08126e38 wireguard: receive: account for napi_gro_receive never returning GRO_DROP
The napi_gro_receive function no longer returns GRO_DROP ever, making
handling GRO_DROP dead code. This commit removes that dead code.
Further, it's not even clear that device drivers have any business in
taking action after passing off received packets; that's arguably out of
their hands.

Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Fixes: 6570bc79c0 ("net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:16:21 -07:00
Roopa Prabhu
b18e9834f7 vxlan: fix last fdb index during dump of fdb with nhid
This patch fixes last saved fdb index in fdb dump handler when
handling fdb's with nhid.

Fixes: 1274e1cc42 ("vxlan: ecmp support for mac fdb entries")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:12:34 -07:00
Marcelo Ricardo Leitner
471e39df96 sctp: Don't advertise IPv4 addresses if ipv6only is set on the socket
If a socket is set ipv6only, it will still send IPv4 addresses in the
INIT and INIT_ACK packets. This potentially misleads the peer into using
them, which then would cause association termination.

The fix is to not add IPv4 addresses to ipv6only sockets.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:11:33 -07:00
Briana Oursler
b6186d413b tc-testing: avoid action cookies with odd length.
Update odd length cookie hexstrings in csum.json, tunnel_key.json and
bpf.json to be even length to comply with check enforced in commit
0149dabf2a1b ("tc: m_actions: check cookie hexstring len") in iproute2.

Signed-off-by: Briana Oursler <briana.oursler@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:10:45 -07:00
David S. Miller
3b0e7dc0be Merge branch 'tcp_cubic-fix-spurious-HYSTART_DELAY-on-RTT-decrease'
Neal Cardwell says:

====================
tcp_cubic: fix spurious HYSTART_DELAY on RTT decrease

This series fixes a long-standing bug in the TCP CUBIC
HYSTART_DELAY mechanim recently reported by Mirja Kuehlewind. The
code can cause a spurious exit of slow start in some particular
cases: upon an RTT decrease that happens on the 9th or later ACK
in a round trip. This series fixes the original Hystart code and
also the recent BPF implementation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:08:47 -07:00
Neal Cardwell
7d21d54d62 bpf: tcp: bpf_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
Apply the fix from:
 "tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT"
to the BPF implementation of TCP CUBIC congestion control.

Repeating the commit description here for completeness:

Mirja Kuehlewind reported a bug in Linux TCP CUBIC Hystart, where
Hystart HYSTART_DELAY mechanism can exit Slow Start spuriously on an
ACK when the minimum rtt of a connection goes down. From inspection it
is clear from the existing code that this could happen in an example
like the following:

o The first 8 RTT samples in a round trip are 150ms, resulting in a
  curr_rtt of 150ms and a delay_min of 150ms.

o The 9th RTT sample is 100ms. The curr_rtt does not change after the
  first 8 samples, so curr_rtt remains 150ms. But delay_min can be
  lowered at any time, so delay_min falls to 100ms. The code executes
  the HYSTART_DELAY comparison between curr_rtt of 150ms and delay_min
  of 100ms, and the curr_rtt is declared far enough above delay_min to
  force a (spurious) exit of Slow start.

The fix here is simple: allow every RTT sample in a round trip to
lower the curr_rtt.

Fixes: 6de4a9c430 ("bpf: tcp: Add bpf_cubic example")
Reported-by: Mirja Kuehlewind <mirja.kuehlewind@ericsson.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:08:47 -07:00
Neal Cardwell
b344579ca8 tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT
Mirja Kuehlewind reported a bug in Linux TCP CUBIC Hystart, where
Hystart HYSTART_DELAY mechanism can exit Slow Start spuriously on an
ACK when the minimum rtt of a connection goes down. From inspection it
is clear from the existing code that this could happen in an example
like the following:

o The first 8 RTT samples in a round trip are 150ms, resulting in a
  curr_rtt of 150ms and a delay_min of 150ms.

o The 9th RTT sample is 100ms. The curr_rtt does not change after the
  first 8 samples, so curr_rtt remains 150ms. But delay_min can be
  lowered at any time, so delay_min falls to 100ms. The code executes
  the HYSTART_DELAY comparison between curr_rtt of 150ms and delay_min
  of 100ms, and the curr_rtt is declared far enough above delay_min to
  force a (spurious) exit of Slow start.

The fix here is simple: allow every RTT sample in a round trip to
lower the curr_rtt.

Fixes: ae27e98a51 ("[TCP] CUBIC v2.3")
Reported-by: Mirja Kuehlewind <mirja.kuehlewind@ericsson.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:08:47 -07:00
David S. Miller
29a30bac56 Merge branch 'Fixes-for-SJA1105-DSA-tc-gate-action'
Vladimir Oltean says:

====================
Fixes for SJA1105 DSA tc-gate action

This small series fixes 2 bugs in the tc-gate implementation:
1. The TAS state machine keeps getting rescheduled even after removing
   tc-gate actions on all ports.
2. tc-gate actions with only one gate control list entry are installed
   to hardware with an incorrect interval of zero, which makes the
   switch erroneously drop those packets (since the configuration is
   invalid).

To keep the code palatable, a forward-declaration was avoided by moving
some code around in patch 1/4. I hope that isn't too much of an issue.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:06:56 -07:00
Vladimir Oltean
43ce887c50 net: dsa: sja1105: fix tc-gate schedule with single element
The sja1105_gating_cfg_time_to_interval function does this, as per the
comments:

/* The gate entries contain absolute times in their e->interval field. Convert
 * that to proper intervals (i.e. "0, 5, 10, 15" to "5, 5, 5, 5").
 */

To perform that task, it iterates over gating_cfg->entries, at each step
updating the interval of the _previous_ entry. So one interval remains
to be updated at the end of the loop: the last one (since it isn't
"prev" for anyone else).

But there was an erroneous check, that the last element's interval
should not be updated if it's also the only element. I'm not quite sure
why that check was there, but it's clearly incorrect, as a tc-gate
schedule with a single element would get an e->interval of zero,
regardless of the duration requested by the user. The switch wouldn't
even consider this configuration as valid: it will just drop all traffic
that matches the rule.

Fixes: 834f8933d5 ("net: dsa: sja1105: implement tc-gate using time-triggered virtual links")
Reported-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:06:56 -07:00
Vladimir Oltean
82f6896a25 net: dsa: sja1105: recalculate gating subschedule after deleting tc-gate rules
Currently, tas_data->enabled would remain true even after deleting all
tc-gate rules from the switch ports, which would cause the
sja1105_tas_state_machine to get unnecessarily scheduled.

Also, if there were any errors which would prevent the hardware from
enabling the gating schedule, the sja1105_tas_state_machine would
continuously detect and print that, spamming the kernel log, even if the
rules were subsequently deleted.

The rules themselves are _not_ active, because sja1105_init_scheduling
does enough of a job to not install the gating schedule in the static
config. But the virtual link rules themselves are still present.

So call the functions that remove the tc-gate configuration from
priv->tas_data.gating_cfg, so that tas_data->enabled can be set to
false, and sja1105_tas_state_machine will stop from being scheduled.

Fixes: 834f8933d5 ("net: dsa: sja1105: implement tc-gate using time-triggered virtual links")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:06:56 -07:00
Vladimir Oltean
026bdb2b96 net: dsa: sja1105: unconditionally free old gating config
Currently sja1105_compose_gating_subschedule is not prepared to be
called for the case where we want to recompute the global tc-gate
configuration after we've deleted those actions on a port.

After deleting the tc-gate actions on the last port, max_cycle_time
would become zero, and that would incorrectly prevent
sja1105_free_gating_config from getting called.

So move the freeing function above the check for the need to apply a new
configuration.

Fixes: 834f8933d5 ("net: dsa: sja1105: implement tc-gate using time-triggered virtual links")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:06:56 -07:00
Vladimir Oltean
e39109f596 net: dsa: sja1105: move sja1105_compose_gating_subschedule at the top
It turns out that sja1105_compose_gating_subschedule must also be called
from sja1105_vl_delete, to recalculate the overall tc-gate
configuration. Currently this is not possible without introducing a
forward declaration. So move the function at the top of the file, along
with its dependencies.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:06:56 -07:00
David S. Miller
d0a45b5ca5 Merge branch 'RGMII-Internal-delay-common-property'
Dan Murphy says:

====================
RGMII Internal delay common property

The RGMII internal delay is a common setting found in most RGMII capable PHY
devices.  It was found that many vendor specific device tree properties exist
to do the same function. This creates a common property to be used for PHY's
that have internal delays for the Rx and Tx paths.

If the internal delay is tunable then the caller needs to pass the internal
delay array and the return will be the index in the array that was found in
the firmware node.

If the internal delay is fixed then the caller only needs to indicate which
delay to return.  There is no need for a fixed delay to add device properties
since the value is not configurable. Per the ethernet-controller.yaml the
interface type indicates that the PHY should provide the delay.

This series contains examples of both a configurable delay and a fixed delay.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:05:21 -07:00
Dan Murphy
8095295292 net: phy: DP83822: Add setting the fixed internal delay
The DP83822 can be configured to use the RGMII interface. There are
independent fixed 3.5ns clock shift (aka internal delay) for the TX and RX
paths. This allow either one to be set if the MII interface is RGMII and
the value is set in the firmware node.

Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:05:21 -07:00
Dan Murphy
736b25afe2 net: dp83869: Add RGMII internal delay configuration
Add RGMII internal delay configuration for Rx and Tx.

Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:05:21 -07:00
Dan Murphy
2fb305c37d dt-bindings: net: Add RGMII internal delay for DP83869
Add the internal delay values into the header and update the binding
with the internal delay properties.

Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:05:21 -07:00
Dan Murphy
92252eec91 net: phy: Add a helper to return the index for of the internal delay
Add a helper function that will return the index in the array for the
passed in internal delay value.  The helper requires the array, size and
delay value.

The helper will then return the index for the exact match or return the
index for the index to the closest smaller value.

Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:05:21 -07:00
Dan Murphy
9150069bf5 dt-bindings: net: Add tx and rx internal delays
tx-internal-delays and rx-internal-delays are a common setting for RGMII
capable devices.

These properties are used when the phy-mode or phy-controller is set to
rgmii-id, rgmii-rxid or rgmii-txid.  These modes indicate to the
controller that the PHY will add the internal delay for the connection.

Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:05:20 -07:00
David S. Miller
17604218c8 Merge branch 'dpaa2-eth-small-updates'
Ioana Ciornei says:

====================
dpaa2-eth: small updates

This patch set adds some updates to the dpaa2-eth driver: trimming of
the frame queue debugfs counters, cleanup of the remaining sparse
warnings and some other small fixes such as a recursive header include.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:03:39 -07:00
Ioana Ciornei
05e190467d dpaa2-eth: fix misspelled function parameters in dpni_[set/get]_taildrop
Two of the function parameters (qtype and index) were misspelled in the
associated descriptions of dpni_[set/get]_taildrop which led to sparse
warnings. Fix this by using the exact same names as present in the
function definition.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25 16:03:39 -07:00