Commit Graph

887085 Commits

Author SHA1 Message Date
Russell King
0dea4d039a net: sfp: report error on failure to read sfp soft status
Report a rate-limited error if we fail to read the SFP soft status,
and preserve the current status in that case. This avoids I2C bus
errors from triggering a link flap.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19 17:26:07 -08:00
David S. Miller
d8e419da04 Merge branch 'phylib-consolidation'
Russell King says:

====================
phylib consolidation

Over the last few releases, there has been a push to clean up and
consolidate the phylib code. Some cases have been missed, and this
series catches those cases.

1. Remove redundant .aneg_done initialisers; calling genphy_aneg_done()
   for clause 22 PHYs is the default when .aneg_done is not set.

2. Some PHY drivers manually set phydev->pause and phydev->asym_pause,
   but we have a helper for this - phy_resolve_aneg_pause(), introduced
   in 2d880b8709 ("net: phy: extract pause mode").  Use this in the
   lxt, marvell and uPD60620 drivers.

   Incidentally, this brings up the question whether marvell fiber mode
   is correctly interpreting and advertising the pause parameters.

3. Add a genphy_check_and_restart_aneg() helper, which complements the
   clause 45 version of this. This will be useful for PHY drivers that
   open code this logic (e.g. marvell.c)

4. Add a genphy_read_status_fixed() helper to read the fixed-mode
   status from a clause 22 PHY.  lxt and marvell both contain copies
   of this code, so convert them over.

5. Arrange marvell driver to use genphy_read_lpa() for copper mode.
   This needs some rearrangement of the code in
   marvell_read_status_page_an(), but preserves using the PHY specific
   status register to derive the current negotiation results.

6. Simplify the marvell driver so we can use the
   genphy_read_status_fixed() helper directly rather than
   marvell_read_status_page_fixed().

7. Use positive logic in the marvell driver to determine the link
   state, and get rid of the REGISTER_LINK_STATUS definition; we
   already have a definition for this.

8. The marvell driver reads the PHY specific status register multiple
   times when determining the status: once in marvell_update_link()
   and again in marvell_read_status_page_an(). This is a waste;
   rearrange to read the status register once, and pass its value into
   marvell_read_status_page_an().  We preserve using
   genphy_update_link() for the copper side.

9. The marvell driver was using private clause 37 definitions, but we
   have clause 37 definitions in uapi/linux/mii.h. Use the generic
   definitions.

10. Switch the marvell driver to use phy_modify_changed() to modify
    the fiber advertisement.

11. Switch the marvell driver to use genphy_check_and_restart_aneg()
    introduced above rather than open-coding this functionality.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19 12:52:34 -08:00
Russell King
b5abac2d2d net: phy: marvell: use genphy_check_and_restart_aneg()
Use the helper to check and restart autonegotiation for the marvell
fiber page negotiation setting.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19 12:52:34 -08:00
Russell King
9f4bae704f net: phy: marvell: use phy_modify_changed()
Use phy_modify_changed() to change the fiber advertisement register
rather than open coding this functionality.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19 12:52:34 -08:00
Russell King
20ecf424d0 net: phy: marvell: use existing clause 37 definitions
Use existing clause 37 advertising/link partner definitions rather than
private ones for the advertisement registers.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19 12:52:34 -08:00
Russell King
d2004e27eb net: phy: marvell: consolidate phy status reading
marvell_read_status_page_an() always reads the PHY status register, but
marvell_update_link() has already done this.  Rather than wastefully
reading the register twice in quick succession, read it once in
marvell_read_status_page() and use the result for both.

This makes marvell_update_link() rather pointless, so move it into
marvell_read_status_page().

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19 12:52:34 -08:00
Russell King
760fa78f35 net: phy: marvell: use positive logic for link state
Rather than using negative logic:

	if (there is no link)
		set link = 0
	else
		set link = 1

use the more natural positive logic:

	if (there is link)
		set link = 1
	else
		set link = 0

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19 12:52:34 -08:00
Russell King
98f92831c5 net: phy: marvell: initialise link partner state earlier
Move the initialisation of the link partner state earlier, inside
marvell_read_status_page(), so we don't have the same initialisation
scattered amongst the other files.  This is in a similar place to
the genphy implementation, so would result in the same behaviour if
a PHY read error occurs.

This allows us to get rid of marvell_read_status_page_fixed(), which
became a pointless wrapper around genphy_read_status_fixed().

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19 12:52:34 -08:00
Russell King
fcf1f59afc net: phy: marvell: rearrange to use genphy_read_lpa()
Rearrange the Marvell PHY driver to use genphy_read_lpa() rather than
open-coding this functionality.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19 12:52:34 -08:00
Russell King
0efc286a92 net: phy: provide and use genphy_read_status_fixed()
There are two drivers and generic code which contain exactly the same
code to read the status of a PHY operating without autonegotiation
enabled. Rather than duplicate this code, provide a helper to read
this information.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19 12:52:34 -08:00
Russell King
2a10ab043a net: phy: add genphy_check_and_restart_aneg()
Add a helper for restarting autonegotiation(), similar to the clause 45
variant.  Use it in __genphy_config_aneg()

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19 12:52:34 -08:00
Russell King
af006240c6 net: phy: use phy_resolve_aneg_pause()
Several drivers code their own version of this, working from the LPA
register, after setting the ethtool link partner advertisement bitmask.
Use the generic function instead.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19 12:52:34 -08:00
Russell King
c48f16b42a net: phy: remove redundant .aneg_done initialisers
Remove initialisers that set .aneg_done to genphy_aneg_done - this is
the default for clause 22 PHYs, so the initialiser is redundant.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19 12:52:34 -08:00
Jose Abreu
a1ec57c020 net: stmmac: tc: Fix TAPRIO division operation
For ARCHs that don't support 64 bits division we need to use the
helpers.

Fixes: b60189e039 ("net: stmmac: Integrate EST with TAPRIO scheduler API")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 15:12:33 -08:00
David S. Miller
6bff001702 Merge branch 'ETS-qdisc'
Petr Machata says:

====================
Add a new Qdisc, ETS

The IEEE standard 802.1Qaz (and 802.1Q-2014) specifies four principal
transmission selection algorithms: strict priority, credit-based shaper,
ETS (bandwidth sharing), and vendor-specific. All these have their
corresponding knobs in DCB. But DCB does not have interfaces to configure
RED and ECN, unlike Qdiscs.

In the Qdisc land, strict priority is implemented by PRIO. Credit-based
transmission selection algorithm can then be modeled by having e.g. TBF or
CBS Qdisc below some of the PRIO bands. ETS would then be modeled by
placing a DRR Qdisc under the last PRIO band.

The problem with this approach is that DRR on its own, as well as the
combination of PRIO and DRR, are tricky to configure and tricky to offload
to 802.1Qaz-compliant hardware. This is due to several reasons:

- As any classful Qdisc, DRR supports adding classifiers to decide in which
  class to enqueue packets. Unlike PRIO, there's however no fallback in the
  form of priomap. A way to achieve classification based on packet priority
  is e.g. like this:

    # tc filter add dev swp1 root handle 1: \
		basic match 'meta(priority eq 0)' flowid 1:10

  Expressing the priomap in this manner however forces drivers to deep dive
  into the classifier block to parse the individual rules.

  A possible solution would be to extend the classes with a "defmap" a la
  split / defmap mechanism of CBQ, and introduce this as a last resort
  classification. However, unlike priomap, this doesn't have the guarantee
  of covering all priorities. Traffic whose priority is not covered is
  dropped by DRR as unclassified. But ASICs tend to implement dropping in
  the ACL block, not in scheduling pipelines. The need to treat these
  configurations correctly (if only to decide to not offload at all)
  complicates a driver.

  It's not clear how to retrofit priomap with all its benefits to DRR
  without changing it beyond recognition.

- The interplay between PRIO and DRR is also causing problems. 802.1Qaz has
  all ETS TCs as a last resort. Switch ASICs that support ETS at all are
  likely to handle ETS traffic this way as well. However, the Linux model
  is more generic, allowing the DRR block in any band. Drivers would need
  to be careful to handle this case correctly, otherwise the offloaded
  model might not match the slow-path one.

  In a similar vein, PRIO and DRR need to agree on the list of priorities
  assigned to DRR. This is doubly problematic--the user needs to take care
  to keep the two in sync, and the driver needs to watch for any holes in
  DRR coverage and treat the traffic correctly, as discussed above.

  Note that at the time that DRR Qdisc is added, it has no classes, and
  thus any priorities assigned to that PRIO band are not covered. Thus this
  case is surprisingly rather common, and needs to be handled gracefully by
  the driver.

- Similarly due to DRR flexibility, when a Qdisc (such as RED) is attached
  below it, it is not immediately clear which TC the class represents. This
  is unlike PRIO with its straightforward classid scheme. When DRR is
  combined with PRIO, the relationship between classes and TCs gets even
  more murky.

  This is a problem for users as well: the TC mapping is rather important
  for (devlink) shared buffer configuration and (ethtool) counters.

So instead, this patch set introduces a new Qdisc, which is based on
802.1Qaz wording. It is PRIO-like in how it is configured, meaning one
needs to specify how many bands there are, how many are strict and how many
are ETS, quanta for the latter, and priomap.

The new Qdisc operates like the PRIO / DRR combo would when configured as
per the standard. The strict classes, if any, are tried for traffic first.
When there's no traffic in any of the strict queues, the ETS ones (if any)
are treated in the same way as in DRR.

The chosen interface makes the overall system both reasonably easy to
configure, and reasonably easy to offload. The extra code to support ETS in
mlxsw (which already supports PRIO) is about 150 lines, of which perhaps 20
lines is bona fide new business logic.

Credit-based shaping transmission selection algorithm can be configured by
adding a CBS Qdisc under one of the strict bands (e.g. TBF can be used to a
similar effect as well). As a non-work-conserving Qdisc, CBS can't be
hooked under the ETS bands. This is detected and handled identically to DRR
Qdisc at runtime. Note that offloading CBS is not subject of this patchset.

The patchset proceeds in four stages:

- Patches #1-#3 are cleanups.
- Patches #4 and #5 contain the new Qdisc.
- Patches #6 and #7 update mlxsw to offload the new Qdisc.
- Patches #8-#10 add selftests for ETS.

Examples:

- Add a Qdisc with 6 bands, 3 strict and 3 ETS with 45%-30%-25% weights:

    # tc qdisc add dev swp1 root handle 1: \
	ets strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5

- Tweak quantum of one of the classes of the previous Qdisc:

    # tc class ch dev swp1 classid 1:4 ets quantum 1000
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 1000 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5
    # tc class ch dev swp1 classid 1:3 ets quantum 1000
    Error: Strict bands do not have a configurable quantum.

- Purely strict Qdisc with 1:1 mapping between priorities and TCs:

    # tc qdisc add dev swp1 root handle 1: \
	ets strict 8 priomap 7 6 5 4 3 2 1 0
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 8 strict 8 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7

- Use "bands" to specify number of bands explicitly. Underspecified bands
  are implicitly ETS and their quantum is taken from MTU. The following
  thus gives each band the same weight:

    # tc qdisc add dev swp1 root handle 1: \
	ets bands 8 priomap 7 6 5 4 3 2 1 0
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 8 quanta 1514 1514 1514 1514 1514 1514 1514 1514 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7

v2:
- This addresses points raised by David Miller.
- Patch #4:
    - sch_ets.c: Add a comment with description of the Qdisc and the
      dequeuing algorithm.
    - Kconfig: Add a high-level description to the help blurb.

v1:
- No changes, first upstream submission after RFC.

v3 (internal):
- This addresses review from Jiri Pirko.
- Patch #3:
    - Rename to _HR_ instead of to _HIERARCHY_.
- Patch #4:
    - pkt_sched.h: Keep all the TCA_ETS_ constants in one enum.
    - pkt_sched.h: Rename TCA_ETS_BANDS to _NBANDS, _STRICT to _NSTRICT,
      _BAND_QUANTUM to _QUANTA_BAND and _PMAP_BAND to _PRIOMAP_BAND.
    - sch_ets.c: Update to reflect the above changes. Add a new policy,
      ets_class_policy, which is used when parsing class changes.
      Currently that policy is the same as the quanta policy, but that
      might change.
    - sch_ets.c: Move MTU handling from ets_quantum_parse() to the one
      caller that makes use of it.
    - sch_ets.c: ets_qdisc_priomap_parse(): WARN_ON_ONCE on invalid
      attribute instead of returning an extack.
- Patch #6:
    - __mlxsw_sp_qdisc_ets_replace(): Pass the weights argument to this
      function in this patch already. Drop the weight computation.
    - mlxsw_sp_qdisc_prio_replace(): Rename "quanta" to "zeroes" and
      pass for the abovementioned "weights".
    - mlxsw_sp_qdisc_prio_graft(): Convert to a wrapper around
      __mlxsw_sp_qdisc_ets_graft(), instead of invoking the latter
      directly from mlxsw_sp_setup_tc_prio().
    - Update to follow the _HIERARCHY_ -> _HR_ renaming.
- Patch #7:
    - __mlxsw_sp_qdisc_ets_replace(): The "weights" argument passing and
      weight computation removal are now done in a previous patch.
    - mlxsw_sp_setup_tc_ets(): Drop case TC_ETS_REPLACE, which is handled
      earlier in the function.
- Patch #3 (iproute2):
    - Add an example output to the commit message.
    - tc-ets.8: Fix output of two examples.
    - tc-ets.8: Describe default values of "bands", "quanta".
    - q_ets.c: A number of fixes in error messages.
    - q_ets.c: Comment formatting: /*padding*/ -> /* padding */
    - q_ets.c: parse_nbands: Move duplicate checking to callers.
    - q_ets.c: Don't accept both "quantum" and "quanta" as equivalent.

v2 (internal):
- This addresses review from Ido Schimmel and comments from Alexander
  Kushnarov.
- Patch #2:
    - s/coment/comment in the commit message.
- Patch #4:
    - sch_ets: ets_class_is_strict(), ets_class_id(): Constify an argument
    - ets_class_find(): RXTify
- Patch #3 (iproute2):
    - tc-ets.8: some spelling fixes
    - tc-ets.8: add another example
    - tc.8: add an ETS to "CLASSFUL QDISCS" section

v1 (internal):
- This addresses RFC reviews from Ido Schimmel and Roman Mashak, bugs found
  by Alexander Petrovskiy and myself, and other improvements.
- Patch #2:
    - Expand the explanation with an explicit example.
- Patch #4:
    - Kconfig: s/sch_drr/sch_ets/
    - sch_ets: Reorder includes to be in alphabetical order
    - sch_ets: ets_quantum_parse(): Rename the return-pointer argument
      from pquantum to quantum, and use it directly, not going through a
      local temporary.
    - sch_ets: ets_qdisc_quanta_parse(): Convert syntax of function
      argument "quanta" from an array to a pointer.
    - sch_ets: ets_qdisc_priomap_parse(): Likewise with "priomap".
    - sch_ets: ets_qdisc_quanta_parse(), ets_qdisc_priomap_parse(): Invoke
      __nla_validate_nested directly instead of nl80211_validate_nested().
    - sch_ets: ets_qdisc_quanta_parse(): WARN_ON_ONCE on invalid attribute
      instead of returning an extack.
    - sch_ets: ets_qdisc_change(): Make the last band the default one for
      unmentioned priomap priorities.
    - sch_ets: Fix a panic when an offloaded child in a bandwidth-sharing
      band notified its ETS parent.
    - sch_ets: When ungrafting, add the newly-created invisible FIFO to
      the Qdisc hash
- Patch #5:
    - pkt_cls.h: Note that quantum=0 signifies a strict band.
    - Fix error path handling when ets_offload_dump() fails.
- Patch #6:
    - __mlxsw_sp_qdisc_ets_replace(): Convert syntax of function arguments
      "quanta" and "priomap" from arrays to pointers.
- Patch #7:
    - __mlxsw_sp_qdisc_ets_replace(): Convert syntax of function argument
      "weights" from an array to a pointer.
- Patch #9:
    - mlxsw/sch_ets.sh: Add a comment explaining packet prioritization.
    - Adjust the whole suite to allow testing of traffic classifiers
      in addition to testing priomap.
- Patch #10:
    - Add a number of new tests to test default priomap band, overlarge
      number of bands, zeroes in quanta, and altogether missing quanta.
- Patch #1 (iproute2):
    - State motivation for inclusion of this patch in the patcheset in the
      commit message.
- Patch #3 (iproute2):
    - tc-ets.8: it is now December
    - tc-ets.8: explain inactivity WRT using non-WC Qdiscs under ETS band
    - tc-ets.8: s/flow/band in explanation of quantum
    - tc-ets.8: explain what happens with priorities not covered by priomap
    - tc-ets.8: default priomap band is now the last one
    - q_ets.c: ets_parse_opt(): Remove unnecessary initialization of
      priomap and quanta.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 13:32:30 -08:00
Petr Machata
82c664b69c selftests: qdiscs: Add test coverage for ETS Qdisc
Add TDC coverage for the new ETS Qdisc.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 13:32:30 -08:00
Petr Machata
ddd3fd750f selftests: forwarding: sch_ets: Add test coverage for ETS Qdisc
This tests the newly-added ETS Qdisc. It runs two to three streams of
traffic, each with a different priority. ETS Qdisc is supposed to allocate
bandwidth according to the DRR algorithm and given weights. After running
the traffic for a while, counters are compared for each stream to check
that the expected ratio is in fact observed.

In order for the DRR process to kick in, a traffic bottleneck must exist in
the first place. In slow path, such bottleneck can be implemented by
wrapping the ETS Qdisc inside a TBF or other shaper. This might however
make the configuration unoffloadable. Instead, on HW datapath, the
bottleneck would be set up by lowering port speed and configuring shared
buffer suitably.

Therefore the test is structured as a core component that implements the
testing, with two wrapper scripts that implement the details of slow path
resp. fast path configuration.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 13:32:30 -08:00
Petr Machata
4cf9b8f992 selftests: forwarding: Move start_/stop_traffic from mlxsw to lib.sh
These two functions are used for starting several streams of traffic, and
then stopping them later. They will be handy for the test coverage of ETS
Qdisc. Move them from mlxsw-specific qos_lib.sh to the generic lib.sh.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 13:32:30 -08:00
Petr Machata
19f405b988 mlxsw: spectrum_qdisc: Support offloading of ETS Qdisc
Handle TC_SETUP_QDISC_ETS, add a new ops structure for the ETS Qdisc.
Invoke the extended prio handlers implemented in the previous patch. For
stats ops, invoke directly the prio callbacks, which are not sensitive to
differences between PRIO and ETS.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 13:32:30 -08:00
Petr Machata
7917f52ae1 mlxsw: spectrum_qdisc: Generalize PRIO offload to support ETS
Thanks to the similarity between PRIO and ETS it is possible to simply
reuse most of the code for offloading PRIO Qdisc. Extract the common
functionality into separate functions, making the current PRIO handlers
thin API adapters.

Extend the new functions to pass quanta for individual bands, which allows
configuring a subset of bands as WRR. Invoke mlxsw_sp_port_ets_set() as
appropriate to de/configure WRR-ness and weight of individual bands.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 13:32:30 -08:00
Petr Machata
d35eb52bd2 net: sch_ets: Make the ETS qdisc offloadable
Add hooks at appropriate points to make it possible to offload the ETS
Qdisc.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 13:32:29 -08:00
Petr Machata
dcc68b4d80 net: sch_ets: Add a new Qdisc
Introduces a new Qdisc, which is based on 802.1Q-2014 wording. It is
PRIO-like in how it is configured, meaning one needs to specify how many
bands there are, how many are strict and how many are dwrr, quanta for the
latter, and priomap.

The new Qdisc operates like the PRIO / DRR combo would when configured as
per the standard. The strict classes, if any, are tried for traffic first.
When there's no traffic in any of the strict queues, the ETS ones (if any)
are treated in the same way as in DRR.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 13:32:29 -08:00
Petr Machata
9cf9b925d5 mlxsw: spectrum: Rename MLXSW_REG_QEEC_HIERARCY_* enumerators
These enums want to be named MLXSW_REG_QEEC_HIERARCHY_, but due to a typo
lack the second H. That is confusing and complicates searching.

But actually the enumerators should be named _HR_, because that is how
their enum type is called. So rename them as appropriate.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 13:32:29 -08:00
Petr Machata
5bc146c90e mlxsw: spectrum_qdisc: Clarify a comment
Expand the comment at mlxsw_sp_qdisc_prio_graft() to make the problem that
this function is trying to handle clearer.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 13:32:29 -08:00
Petr Machata
9586a992fb net: pkt_cls: Clarify a comment
The bit about negating HW backlog left me scratching my head. Clarify the
comment.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 13:32:29 -08:00
Kevin 'ldir' Darbyshire-Bryant
cbd22f172d sch_cake: drop unused variable tin_quantum_prio
Turns out tin_quantum_prio isn't used anymore and is a leftover from a
previous implementation of diffserv tins.  Since the variable isn't used
in any calculations it can be eliminated.

Drop variable and places where it was set.  Rename remaining variable
and consolidate naming of intermediate variables that set it.

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 13:27:33 -08:00
David S. Miller
dcbe4e9575 Merge branch 's390-next'
Julian Wiedmann says:

====================
s390/qeth: features 2019-12-18

please apply the following patch series to your net-next tree.
Nothing major, just the usual mix of small improvements and cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:34:57 -08:00
Julian Wiedmann
334b49de12 s390/qeth: make use of napi_schedule_irqoff()
qeth_qdio_start_poll() is called from the qdio layer's IRQ handler,
while IRQs are masked.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:34:57 -08:00
Julian Wiedmann
52f82bf16b s390/qeth: consolidate helpers for capability checking
Convert the old code to use struct qeth_ipa_caps, and while at it remove
all unused helper macros.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:34:57 -08:00
Julian Wiedmann
adee2592b6 s390/qeth: stop yielding the ip_lock during IPv4 registration
As commit df2a2a5225 ("s390/qeth: convert IP table spinlock to mutex")
converted the ip_lock to a mutex, we no longer have to yield it while
the subsequent IO sleep-waits for completion.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:34:56 -08:00
Julian Wiedmann
b6beb62a52 s390/qeth: don't raise NETDEV_REBOOT event from L3 offline path
This is a leftover from back when a recovery action didn't go through
dev_close(), and was meant to shoot down all remaining af_iucv sockets
on the interface.

Now that the offline path always calls dev_close(), the
NETDEV_GOING_DOWN event from __dev_close_many() is sufficient and this
hack can be removed.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:34:56 -08:00
Julian Wiedmann
490df97142 s390/qeth: remove open-coded inet_make_mask()
Use inet_make_mask() to replace some complicated bit-fiddling.

Also use the right data types to replace some raw memcpy calls with
proper assignments.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:34:56 -08:00
Julian Wiedmann
2390166a6b s390/qeth: clean up L3 sysfs code
Consolidate some duplicated code for adding RXIP/VIPA addresses, and
move the locking to where it's actually needed.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:34:56 -08:00
Julian Wiedmann
e6b1b7da24 s390/qeth: overhaul L3 IP address dump code
The current code that dumps the RXIP/VIPA/IPATO addresses via sysfs
first checks whether the buffer still provides sufficient space to hold
another formatted address.
But the maximum length of an formatted IPv4 address is 15 characters,
not 12. So we underestimate the max required length and if the buffer
was previously filled to _just_ the right level, a formatted address can
end up being truncated.

Revamp these code paths to use the _actually_ required length of the
formatted IP address, and while at it suppress a gratuitous newline.

Also use scnprintf() to format the output. In case of a truncation, this
would allow us to return the number of characters that were actually
written.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:34:56 -08:00
Julian Wiedmann
7359393f3c s390/qeth: wake up all waiters from qeth_irq()
card->wait_q is shared by different users, for different wake-up
conditions. qeth_irq() can potentially trigger multiple of these
conditions:
1) A change to channel->irq_pending, which qeth_send_control_data() is
   waiting for.
2) A change to card->state, which qeth_clear_channel() and
   qeth_halt_channel() are waiting for.

As qeth_irq() does only a single wake_up(), we might miss to wake up
a second eligible waiter. Luckily all waiters are guarded with a
timeout, so this situation should recover on its own eventually.

To make things work robustly, add an additional wake_up() for changes
to channel->state. And extract a helper that updates
channel->irq_pending along with the needed wake_up().

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:34:56 -08:00
Julian Wiedmann
871602b107 s390/qeth: only handle IRQs while device is online
A qeth device that's offline should not be receiving any IRQs - all
pending IOs have been terminated, and we avoid starting any new ones.

So rather than immediately registering the IRQ handler when the device
is probed, only register it while the device is online.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:34:56 -08:00
David S. Miller
3a74a62d3c Merge branch 'stmmac-taprio'
Jose Abreu says:

====================
net: stmmac: TSN support using TAPRIO API

This series adds TSN support (EST and Frame Preemption) for stmmac driver.

1) Adds the HW specific support for EST in GMAC5+ cores.

2) Adds the HW specific support for EST in XGMAC3+ cores.

3) Integrates EST HW specific support with TAPRIO scheduler API.

4) Adds the Frame Preemption suppor on stmmac TAPRIO implementation.

5) Adds the HW specific support for Frame Preemption in GMAC5+ cores.

6) Adds the HW specific support for Frame Preemption in XGMAC3+ cores.

7) Adds support for HW debug counters for Frame Preemption available in
GMAC5+ cores.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:17:11 -08:00
Jose Abreu
ea77b8c813 net: stmmac: mmc: Add Frame Preemption counters on GMAC5+ cores
This can be useful for debug. Add these counters on GMAC5+ cores just
like we did for XGMAC.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:17:11 -08:00
Jose Abreu
f0e56c8d8f net: stmmac: xgmac3+: Add support for Frame Preemption
Adds the HW specific support for Frame Preemption on XGMAC3+ cores.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:17:11 -08:00
Jose Abreu
7c72827468 net: stmmac: gmac5+: Add support for Frame Preemption
Adds the HW specific support for Frame Preemption on GMAC5+ cores.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:17:11 -08:00
Jose Abreu
1ac1424154 net: stmmac: Add Frame Preemption support using TAPRIO API
Adds the support for Frame Preemption using TAPRIO API. This works along
with EST feature and allows to select if preemptable traffic shall be
sent during specific queues opening time.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:17:11 -08:00
Jose Abreu
b60189e039 net: stmmac: Integrate EST with TAPRIO scheduler API
Now that we have the EST code for XGMAC and QoS we can use it with the
TAPRIO scheduler. Integrate it into the main driver and use the API to
configure the EST feature.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:17:11 -08:00
Jose Abreu
8572aec3d0 net: stmmac: Add basic EST support for XGMAC
Adds the support for EST in XGMAC cores. This feature allows to offload
scheduling of queues opening time to the IP.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:17:11 -08:00
Jose Abreu
504723af0d net: stmmac: Add basic EST support for GMAC5+
Adds the support for EST in GMAC5+ cores. This feature allows to offload
scheduling of queues opening time to the IP.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:17:10 -08:00
David S. Miller
6dbb2e91f8 Merge branch 'stmmac-next'
Jose Abreu says:

====================
net: stmmac: Improvements for -next

Misc improvements for stmmac.

1) Adds more information regarding HW Caps in the DebugFS file.

2) Allows interrupts to be independently enabled or disabled so that we don't
have to schedule both TX and RX NAPIs.

3) Stops using a magic number in coalesce timer re-arm.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:14:08 -08:00
Jose Abreu
3755b21b04 net: stmmac: Always use TX coalesce timer value when rescheduling
When we have pending packets we re-arm the TX timer with a magic value.

This changes the re-arm of the timer from 10us to the user-defined
coalesce value. As we support different speeds, having a magic value of
10us can be either too short or to large depending on the speed so we
let user configure it. The default value of the timer is 1ms but it can
be reconfigured by ethtool.

Changes from v1:
- Reword commit message (Jakub)

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:14:08 -08:00
Jose Abreu
021bd5e369 net: stmmac: Let TX and RX interrupts be independently enabled/disabled
By using this mechanism we can get rid of the not so nice method of
scheduling TX NAPI when the RX was scheduled. No bandwidth reduction was
seen with this change.

Changes from v1:
- Remove useless comment (Jakub)
- Do not bind the TX clean to NAPI budget (Jakub)

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:14:08 -08:00
Jose Abreu
7d0b447a3f net: stmmac: Print more information in DebugFS DMA Capabilities file
DMA Capabilites have grown but the DebugFS that shows this info has not
been updated. Lets add the missing information.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18 12:14:08 -08:00
Paul Durrant
1f2565780e xen-netback: remove 'hotplug-status' once it has served its purpose
Removing the 'hotplug-status' node in netback_remove() is wrong; the script
may not have completed. Only remove the node once the watch has fired and
has been unregistered.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17 23:03:33 -08:00
Paul Durrant
f55c3188df xen-netback: switch state to InitWait at the end of netback_probe()...
...as the comment above the function states.

The switch to Initialising at the start of the function is somewhat bogus
as the toolstack will have set that initial state anyway. To behave
correctly, a backend should switch to InitWait once it has set up all
xenstore values that may be required by a initialising frontend. This
patch calls backend_switch_state() to make the transition at the
appropriate point.

NOTE: backend_switch_state() ignores errors from xenbus_switch_state()
      and so this patch removes an error path from netback_probe(). This
      means a failure to change state at this stage (in the absence of
      other failures) will leave the device instantiated. This is highly
      unlikley to happen as a failure to change state would indicate a
      failure to write to xenstore, and that will trigger other error
      paths. Also, a 'stuck' device can still be cleaned up using 'unbind'
      in any case.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17 23:03:33 -08:00