This adds the logic in the Felix DSA driver and Ocelot switch library.
For Ocelot switches, the DEST_IDX that is the output of the MAC table
lookup is a logical port (equal to physical port, if no LAG is used, or
a dynamically allocated number otherwise). The allocation we have in
place for LAG IDs is different from DSA's, so we can't use that:
- DSA allocates a continuous range of LAG IDs starting from 1
- Ocelot appears to require that physical ports and LAG IDs are in the
same space of [0, num_phys_ports), and additionally, ports that aren't
in a LAG must have physical port id == logical port id
The implication is that an FDB entry towards a LAG might need to be
deleted and reinstalled when the LAG ID changes.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the switchdev_handle_fdb_event_to_device() event replication helper
was created, my original thought was that FDB events on LAG interfaces
should most likely be special-cased, not just replicated towards all
switchdev ports beneath that LAG. So this replication helper currently
does not recurse through switchdev lower interfaces of LAG bridge ports,
but rather calls the lag_mod_cb() if that was provided.
No switchdev driver uses this helper for FDB events on LAG interfaces
yet, so that was an assumption which was yet to be tested. It is
certainly usable for that purpose, as my RFC series shows:
https://patchwork.kernel.org/project/netdevbpf/cover/20220210125201.2859463-1-vladimir.oltean@nxp.com/
however this approach is slightly convoluted because:
- the switchdev driver gets a "dev" that isn't its own net device, but
rather the LAG net device. It must call switchdev_lower_dev_find(dev)
in order to get a handle of any of its own net devices (the ones that
pass check_cb).
- in order for FDB entries on LAG ports to be correctly refcounted per
the number of switchdev ports beneath that LAG, we haven't escaped the
need to iterate through the LAG's lower interfaces. Except that is now
the responsibility of the switchdev driver, because the replication
helper just stopped half-way.
So, even though yes, FDB events on LAG bridge ports must be
special-cased, in the end it's simpler to let switchdev_handle_fdb_*
just iterate through the LAG port's switchdev lowers, and let the
switchdev driver figure out that those physical ports are under a LAG.
The switchdev_handle_fdb_event_to_device() helper takes a
"foreign_dev_check" callback so it can figure out whether @dev can
autonomously forward to @foreign_dev. DSA fills this method properly:
if the LAG is offloaded by another port in the same tree as @dev, then
it isn't foreign. If it is a software LAG, it is foreign - forwarding
happens in software.
Whether an interface is foreign or not decides whether the replication
helper will go through the LAG's switchdev lowers or not. Since the
lan966x doesn't properly fill this out, FDB events on software LAG
uppers will get called. By changing lan966x_foreign_dev_check(), we can
suppress them.
Whereas DSA will now start receiving FDB events for its offloaded LAG
uppers, so we need to return -EOPNOTSUPP, since we currently don't do
the right thing for them.
Cc: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The main purpose of this change is to create a data structure for a LAG
as seen by DSA. This is similar to what we have for bridging - we pass a
copy of this structure by value to ->port_lag_join and ->port_lag_leave.
For now we keep the lag_dev, id and a reference count in it. Future
patches will add a list of FDB entries for the LAG (these also need to
be refcounted to work properly).
The LAG structure is created using dsa_port_lag_create() and destroyed
using dsa_port_lag_destroy(), just like we have for bridging.
Because now, the dsa_lag itself is refcounted, we can simplify
dsa_lag_map() and dsa_lag_unmap(). These functions need to keep a LAG in
the dst->lags array only as long as at least one port uses it. The
refcounting logic inside those functions can be removed now - they are
called only when we should perform the operation.
dsa_lag_dev() is renamed to dsa_lag_by_id() and now returns the dsa_lag
structure instead of the lag_dev net_device.
dsa_lag_foreach_port() now takes the dsa_lag structure as argument.
dst->lags holds an array of dsa_lag structures.
dsa_lag_map() now also saves the dsa_lag->id value, so that linear
walking of dst->lags in drivers using dsa_lag_id() is no longer
necessary. They can just look at lag.id.
dsa_port_lag_id_get() is a helper, similar to dsa_port_bridge_num_get(),
which can be used by drivers to get the LAG ID assigned by DSA to a
given port.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make the intent of the code more clear by using the dedicated helper for
iterating over the ports of a switch.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The DSA LAG API will be changed to become more similar with the bridge
data structures, where struct dsa_bridge holds an unsigned int num,
which is generated by DSA and is one-based. We have a similar thing
going with the DSA LAG, except that isn't stored anywhere, it is
calculated dynamically by dsa_lag_id() by iterating through dst->lags.
The idea of encoding an invalid (or not requested) LAG ID as zero for
the purpose of simplifying checks in drivers means that the LAG IDs
passed by DSA to drivers need to be one-based too. So back-and-forth
conversion is needed when indexing the dst->lags array, as well as in
drivers which assume a zero-based index.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation of converting struct net_device *dp->lag_dev into a
struct dsa_lag *dp->lag, we need to rename, for consistency purposes,
all occurrences of the "lag" variable in qca8k to "lag_dev".
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation of converting struct net_device *dp->lag_dev into a
struct dsa_lag *dp->lag, we need to rename, for consistency purposes,
all occurrences of the "lag" variable in mv88e6xxx to "lag_dev".
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nguyen, Anthony L says:
====================
10GbE Intel Wired LAN Driver Updates 2022-02-23
Yang Li fixes incorrect indenting as reported by smatch for ixgbevf.
Piotr removes non-inclusive language from ixgbe driver.
* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ixgbe: Remove non-inclusive language
ixgbevf: clean up some inconsistent indenting
====================
Link: https://lore.kernel.org/r/20220223185424.2129067-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Obtaining a MAC address may be deferred in cases when the MAC is stored
in an NVMEM block, for example, and it may not be ready upon the first
retrieval attempt and return EPROBE_DEFER.
It is also possible that a port that does not rely on NVMEM has been
already created when getting the defer request. Thus, also the resources
allocated previously must be freed when doing a roll-back.
Fixes: 76723bca28 ("net: mv643xx_eth: add DT parsing support")
Signed-off-by: Mauri Sandberg <maukka@ext.kapsi.fi>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220223142337.41757-1-maukka@ext.kapsi.fi
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Revert of a patch that instead of fixing a AQ error when trying
to reset BW limit introduced several regressions related to
creation and managing TC. Currently there are errors when creating
a TC on both PF and VF.
Error log:
[17428.783095] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14
[17428.783107] i40e 0000:3b:00.1: Failed configuring TC map 0 for VSI 391
[17428.783254] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14
[17428.783259] i40e 0000:3b:00.1: Unable to configure TC map 0 for VSI 391
This reverts commit 3d2504663c.
Fixes: 3d2504663c (i40e: Fix reset bw limit when DCB enabled with 1 TC)
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220223175347.1690692-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 2afeec08ab.
The reasoning in the commit was wrong - the code expected to setup the
watch even if 'hotplug-status' didn't exist. In fact, it relied on the
watch being fired the first time - to check if maybe 'hotplug-status' is
already set to 'connected'. Not registering a watch for non-existing
path (which is the case if hotplug script hasn't been executed yet),
made the backend not waiting for the hotplug script to execute. This in
turns, made the netfront think the interface is fully operational, while
in fact it was not (the vif interface on xen-netback side might not be
configured yet).
This was a workaround for 'hotplug-status' erroneously being removed.
But since that is reverted now, the workaround is not necessary either.
More discussion at
https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe.org/T/#u
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Michael Brown <mbrown@fensystems.co.uk>
Link: https://lore.kernel.org/r/20220222001817.2264967-2-marmarek@invisiblethingslab.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit 1f2565780e.
The 'hotplug-status' node should not be removed as long as the vif
device remains configured. Otherwise the xen-netback would wait for
re-running the network script even if it was already called (in case of
the frontent re-connecting). But also, it _should_ be removed when the
vif device is destroyed (for example when unbinding the driver) -
otherwise hotplug script would not configure the device whenever it
re-appear.
Moving removal of the 'hotplug-status' node was a workaround for nothing
calling network script after xen-netback module is reloaded. But when
vif interface is re-created (on xen-netback unbind/bind for example),
the script should be called, regardless of who does that - currently
this case is not handled by the toolstack, and requires manual
script call. Keeping hotplug-status=connected to skip the call is wrong
and leads to not configured interface.
More discussion at
https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe.org/T/#u
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20220222001817.2264967-1-marmarek@invisiblethingslab.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch prepares the driver to use more than one RX-FIFO. Having a
bigger RX buffer is beneficial in high load situations, where the
system temporarily cannot keep up reading CAN frames from the chip.
Using a bigger RX buffer also allows to implement RX IRQ coalescing,
which will be added in a later patch series.
If using more than 1 RX-FIFO the driver has to figure out, which FIFOs
have RX'ed CAN frames pending. This is indicated by a set bit in the
RXIF register, which is positioned directly after the interrupt status
register INT. If more than 1 RX-FIFO is used, the driver reads both
registers in 1 transfer.
The mcp251xfd_handle_rxif() function iterates over all RX rings and
reads out the RX'ed CAN frames for for all pending FIFOs. To keep the
logic for the 1 RX-FIFO only case in mcp251xfd_handle_rxif() simple,
the driver marks that FIFO pending in mcp251xfd_ring_init().
The chip has a dedicated RX interrupt line to signal pending RX'ed
frames. If connected to an input GPIO and the driver will skip the
initial read of the interrupt status register (INT) and directly read
the pending RX'ed frames if the line is active. The driver assumes the
1st RX-FIFO pending (a read of the RXIF register would re-introduce
the skipped initial read of the INT register). Any other pending
RX-FIFO will be served in the main interrupt handler.
Link: https://lore.kernel.org/all/20220217103826.2299157-8-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The recent change of the order of the TX and RX FIFOs is not reflected
in the debug info of the FIFO setup. This patch adjust the order and
additionally prints the base address of each FIFO.
Since the mcp251xfd_ring_init() may fail due to wrongly configured
FIFOs, printing of the FIFO setup is moved there. In case of an error
it would not be printed in mcp251xfd_ring_init().
Link: https://lore.kernel.org/all/20220217103826.2299157-7-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
With this patch the usage of the on-chip RAM is checked. In the
current driver the FIFO setup is fixed and always fits into the RAM.
With an upcoming patch series the ring and FIFO setup will be more
dynamic. Although using more RAM than available should not happen, but
add this safety check, just in case.
Link: https://lore.kernel.org/all/20220217103826.2299157-6-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch actually changes the order of the TX and RX FIFOs.
This gives the opportunity to minimize the number of SPI transfers in
the IRQ handler. The read of the IRQ status register and RX FIFO
status registers can be combined into single SPI transfer. If the RX
ring uses FIFO 1, the overall length of the transfer is smaller than
in the original layout, where the RX FIFO comes after the TX FIFO.
Link: https://lore.kernel.org/all/20220217103826.2299157-5-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch improves the initialization of the TX and RX rings. The
initialization functions are now called with pointers to the next free
address (in the on chip RAM) and next free hardware FIFO. The rings
are initialized using these values and the pointers are modified to
point to the next free elements.
This means the order of the mcp251xfd_ring_init_*() functions
specifies the order of the rings in the hardware FIFO. This makes it
possible to change the order of the TX and RX FIFOs, which is done in
the next patch.
This gives the opportunity to minimize the number of SPI transfers in
the IRQ handler. The read of the IRQ status register and RX FIFO
status registers can be combined into single SPI transfer. If the RX
ring uses FIFO 1, the overall length of the transfer is smaller than
in the original layout, where the RX FIFO comes after the TX FIFO.
Link: https://lore.kernel.org/all/20220217103826.2299157-4-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch removes the hard coded assumption that the TX ring uses
hardware FIFO 1. This allows the hardware FIFO 1 to be used for RX and
the next free FIFO for TX.
This gives the opportunity to minimize the number of SPI transfers in
the IRQ handler. The read of the IRQ status register and RX FIFO
status registers can be combined into single SPI transfer. If the RX
ring uses FIFO 1, the overall length of the transfer is smaller than
in the original layout, where the RX FIFO comes after the TX FIFO.
Link: https://lore.kernel.org/all/20220217103826.2299157-2-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch prepares the mcp251xfd_chip_clock_init() function for PLL
support.
If the PLL is needed is must be switched on after chip reset. This
should be done in the mcp251xfd_chip_clock_init() function. Prepare
this function to wait for the OSC and PLL to be ready.
Link: https://lore.kernel.org/all/20220207131047.282110-14-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch changes the order of reading the Mode and Oscillator Ready
bits.
Instead of reading the Mode of the chip directly after reset, first
wait for the oscillator to get ready and the chip to fully start up.
Read the Mode after this.
Link: https://lore.kernel.org/all/20220207131047.282110-10-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The function mcp251xfd_chip_wait_for_osc_ready() polls the Oscillator
Control Register for the oscillator to get ready. By passing the
appropriate parameters (osc_reference and osc_mask) it can also poll
for PLL ready.
This patch adjusts the error message if the Oscillator and/or PLL fail
to get ready.
Link: https://lore.kernel.org/all/20220207131047.282110-9-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The function mcp251xfd_chip_wait_for_osc_ready() polls the Oscillator
Control Register for the oscillator to get ready.
This is the first register the driver reads from. Reading implausible
values (all bits set or unset) can be caused by the chip starting up
after power on, waking up after sleep, or by the chip not being preset
at all. Add check for implausible register content
mcp251xfd_reg_invalid() to the regmap_read_poll_timeout() loop.
In case of a regmap_read_poll_timeout() returns a fatal error (and not
a timeout), forward it to the caller.
As mcp251xfd_chip_wait_for_osc_ready() will be called after the probe
function has finished, (currently during ifup), move error message
about failed chip detection from there into the probe function.
Link: https://lore.kernel.org/all/20220207131047.282110-8-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The mcp251xfd_chip_stop() function tries the best to stop the chip and
put it into sleep mode. It continues, even if some intermediate steps
fail. As none of the callers use the return value, let this function
return void.
Link: https://lore.kernel.org/all/20220207131047.282110-6-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The mcp251xfd driver supports runtime PM enabled kernels, but also
works on !CONFIG_PM configurations.
This patch simplifies the runtime PM handling in the
mcp251xfd_unregister(). In the CONFIG_PM case, runtime PM has been
enabled in the mcp251xfd_probe() function, so we can disable it here.
For !CONFIG_PM builds call mcp251xfd_clks_and_vdd_disable() directly.
Link: https://lore.kernel.org/all/20220207131047.282110-4-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
MCP251XFD_REG_OSC is the first register the driver reads from. The
chip may be in deep sleep and the SPI transfer (i.e. the assertion of
the CS) will wake the chip up. This takes about 3ms. The CRC of this
transfer is wrong, or there isn't any chip at all, in this case the
CRC will be wrong, too. The driver ignores the CRC error and returns
the read data to the caller.
To avoid any confusion, this patch changes the
mcp251xfd_regmap_crc_read() function to only ignore the CRC error if
solely the OSC register is read. So when reading more than the OSC
registers at once, CRC errors are not ignored.
Link: https://lore.kernel.org/all/20220207131047.282110-3-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch renames mcp251xfd_osc_invalid() to mcp251xfd_reg_invalid(),
as it will be used for other registers than the "osc" register in a
later patch.
This patch also moves this function to more towards the beginning of
the file, to be available for other functions, too.
Link: https://lore.kernel.org/all/20220207131047.282110-2-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Completion Queue Entry(CQE) is a descriptor written
by hardware to notify software about the send and
receive completion status. The CQE can be of size
128 or 512 bytes. A 512 bytes CQE can hold more receive
fragments pointers compared to 128 bytes CQE. This
patch enables to modify CQE size using:
<ethtool -G cqe-size N>.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>