mirror of
https://github.com/torvalds/linux.git
synced 2024-11-27 06:31:52 +00:00
a563ae0ff6
1102 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Vladimir Oltean
|
43ba33b4f1 |
net: dsa: tag_ocelot_8021q: fix inability to inject STP BPDUs into BLOCKING ports
When setting up a bridge with stp_state 1, topology changes are not
detected and loops are not blocked. This is because the standard way of
transmitting a packet, based on VLAN IDs redirected by VCAP IS2 to the
right egress port, does not override the port STP state (in the case of
Ocelot switches, that's really the PGID_SRC masks).
To force a packet to be injected into a port that's BLOCKING, we must
send it as a control packet, which means in the case of this tagger to
send it using the manual register injection method. We already do this
for PTP frames, extend the logic to apply to any link-local MAC DA.
Fixes:
|
||
Vladimir Oltean
|
49f885b2d9 |
net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib
Michael reported that when using the "ocelot-8021q" tagging protocol,
the switch driver module must be manually loaded before the tagging
protocol can be loaded/is available.
This appears to be the same problem described here:
https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
where due to the fact that DSA tagging protocols make use of symbols
exported by the switch drivers, circular dependencies appear and this
breaks module autoloading.
The ocelot_8021q driver needs the ocelot_can_inject() and
ocelot_port_inject_frame() functions from the switch library. Previously
the wrong approach was taken to solve that dependency: shims were
provided for the case where the ocelot switch library was compiled out,
but that turns out to be insufficient, because the dependency when the
switch lib _is_ compiled is problematic too.
We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as
static inline functions, because these access I/O functions like
__ocelot_write_ix() which is called by ocelot_write_rix(). Making those
static inline basically means exposing the whole guts of the ocelot
switch library, not ideal...
We already have one tagging protocol driver which calls into the switch
driver during xmit but not using any exported symbol: sja1105_defer_xmit.
We can do the same thing here: create a kthread worker and one work item
per skb, and let the switch driver itself do the register accesses to
send the skb, and then consume it.
Fixes:
|
||
Vladimir Oltean
|
deab6b1cd9 |
net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driver
As explained here: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ DSA tagging protocol drivers cannot depend on symbols exported by switch drivers, because this creates a circular dependency that breaks module autoloading. The tag_ocelot.c file depends on the ocelot_ptp_rew_op() function exported by the common ocelot switch lib. This function looks at OCELOT_SKB_CB(skb) and computes how to populate the REW_OP field of the DSA tag, for PTP timestamping (the command: one-step/two-step, and the TX timestamp identifier). None of that requires deep insight into the driver, it is quite stateless, as it only depends upon the skb->cb. So let's make it a static inline function and put it in include/linux/dsa/ocelot.h, a file that despite its name is used by the ocelot switch driver for populating the injection header too - since commit |
||
Vladimir Oltean
|
4ac0567e40 |
net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver
It's nice to be able to test a tagging protocol with dsa_loop, but not
at the cost of losing the ability of building the tagging protocol and
switch driver as modules, because as things stand, there is a circular
dependency between the two. Tagging protocol drivers cannot depend on
switch drivers, that is a hard fact.
The reasoning behind the blamed patch was that accessing dp->priv should
first make sure that the structure behind that pointer is what we really
think it is.
Currently the "sja1105" and "sja1110" tagging protocols only operate
with the sja1105 switch driver, just like any other tagging protocol and
switch combination. The only way to mix and match them is by modifying
the code, and this applies to dsa_loop as well (by default that uses
DSA_TAG_PROTO_NONE). So while in principle there is an issue, in
practice there isn't one.
Until we extend dsa_loop to allow user space configuration, treat the
problem as a non-issue and just say that DSA ports found by tag_sja1105
are always sja1105 ports, which is in fact true. But keep the
dsa_port_is_sja1105 function so that it's easy to patch it during
testing, and rely on dead code elimination.
Fixes:
|
||
Vladimir Oltean
|
28da0555c3 |
net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver
The problem is that DSA tagging protocols really must not depend on the
switch driver, because this creates a circular dependency at insmod
time, and the switch driver will effectively not load when the tagging
protocol driver is missing.
The code was structured in the way it was for a reason, though. The DSA
driver-facing API for PTP timestamping relies on the assumption that
two-step TX timestamps are provided by the hardware in an out-of-band
manner, typically by raising an interrupt and making that timestamp
available inside some sort of FIFO which is to be accessed over
SPI/MDIO/etc.
So the API puts .port_txtstamp into dsa_switch_ops, because it is
expected that the switch driver needs to save some state (like put the
skb into a queue until its TX timestamp arrives).
On SJA1110, TX timestamps are provided by the switch as Ethernet
packets, so this makes them be received and processed by the tagging
protocol driver. This in itself is great, because the timestamps are
full 64-bit and do not require reconstruction, and since Ethernet is the
fastest I/O method available to/from the switch, PTP timestamps arrive
very quickly, no matter how bottlenecked the SPI connection is, because
SPI interaction is not needed at all.
DSA's code structure and strict isolation between the tagging protocol
driver and the switch driver break the natural code organization.
When the tagging protocol driver receives a packet which is classified
as a metadata packet containing timestamps, it passes those timestamps
one by one to the switch driver, which then proceeds to compare them
based on the recorded timestamp ID that was generated in .port_txtstamp.
The communication between the tagging protocol and the switch driver is
done through a method exported by the switch driver, sja1110_process_meta_tstamp.
To satisfy build requirements, we force a dependency to build the
tagging protocol driver as a module when the switch driver is a module.
However, as explained in the first paragraph, that causes the circular
dependency.
To solve this, move the skb queue from struct sja1105_private :: struct
sja1105_ptp_data to struct sja1105_private :: struct sja1105_tagger_data.
The latter is a data structure for which hacks have already been put
into place to be able to create persistent storage per switch that is
accessible from the tagging protocol driver (see sja1105_setup_ports).
With the skb queue directly accessible from the tagging protocol driver,
we can now move sja1110_process_meta_tstamp into the tagging driver
itself, and avoid exporting a symbol.
Fixes:
|
||
Alvin Šipraga
|
43a4b4dbd4 |
net: dsa: fix spurious error message when unoffloaded port leaves bridge
Flip the sign of a return value check, thereby suppressing the following
spurious error:
port 2 failed to notify DSA_NOTIFIER_BRIDGE_LEAVE: -EOPNOTSUPP
... which is emitted when removing an unoffloaded DSA switch port from a
bridge.
Fixes:
|
||
Vladimir Oltean
|
1951b3f19c |
net: dsa: hold rtnl_lock in dsa_switch_setup_tag_protocol
It was a documented fact that ds->ops->change_tag_protocol() offered
rtnetlink mutex protection to the switch driver, since there was an
ASSERT_RTNL right before the call in dsa_switch_change_tag_proto()
(initiated from sysfs).
The blamed commit introduced another call path for
ds->ops->change_tag_protocol() which does not hold the rtnl_mutex.
This is:
dsa_tree_setup
-> dsa_tree_setup_switches
-> dsa_switch_setup
-> dsa_switch_setup_tag_protocol
-> ds->ops->change_tag_protocol()
-> dsa_port_setup
-> dsa_slave_create
-> register_netdevice(slave_dev)
-> dsa_tree_setup_master
-> dsa_master_setup
-> dev->dsa_ptr = cpu_dp
The reason why the rtnl_mutex is held in the sysfs call path is to
ensure that, once the master and all the DSA interfaces are down (which
is required so that no packets flow), they remain down during the
tagging protocol change.
The above calling order illustrates the fact that it should not be risky
to change the initial tagging protocol to the one specified in the
device tree at the given time:
- packets cannot enter the dsa_switch_rcv() packet type handler since
netdev_uses_dsa() for the master will not yet return true, since
dev->dsa_ptr has not yet been populated
- packets cannot enter the dsa_slave_xmit() function because no DSA
interface has yet been registered
So from the DSA core's perspective, holding the rtnl_mutex is indeed not
necessary.
Yet, drivers may need to do things which need rtnl_mutex protection. For
example:
felix_set_tag_protocol
-> felix_setup_tag_8021q
-> dsa_tag_8021q_register
-> dsa_tag_8021q_setup
-> dsa_tag_8021q_port_setup
-> vlan_vid_add
-> ASSERT_RTNL
These drivers do not really have a choice to take the rtnl_mutex
themselves, since in the sysfs case, the rtnl_mutex is already held.
Fixes:
|
||
Vladimir Oltean
|
5bded8259e |
net: dsa: mv88e6xxx: isolate the ATU databases of standalone and bridged ports
Similar to commit |
||
Vladimir Oltean
|
c7709a02c1 |
net: dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware bridges using VID 0
The present code is structured this way due to an incomplete thought
process. In Documentation/networking/switchdev.rst we document that if a
bridge is VLAN-unaware, then the presence or lack of a pvid on a bridge
port (or on the bridge itself, for that matter) should not affect the
ability to receive and transmit tagged or untagged packets.
If the bridge on behalf of which we are sending this packet is
VLAN-aware, then the TX forwarding offload API ensures that the skb will
be VLAN-tagged (if the packet was sent by user space as untagged, it
will get transmitted town to the driver as tagged with the bridge
device's pvid). But if the bridge is VLAN-unaware, it may or may not be
VLAN-tagged. In fact the logic to insert the bridge's PVID came from the
idea that we should emulate what is being done in the VLAN-aware case.
But we shouldn't.
It appears that injecting packets using a VLAN ID of 0 serves the
purpose of forwarding the packets to the egress port with no VLAN tag
added or stripped by the hardware, and no filtering being performed.
So we can simply remove the superfluous logic.
One reason why this logic is broken is that when CONFIG_BRIDGE_VLAN_FILTERING=n,
we call br_vlan_get_pvid_rcu() but that returns an error and we do error
out, dropping all packets on xmit. Not really smart. This is also an
issue when the user deletes the bridge pvid:
$ bridge vlan del dev br0 vid 1 self
As mentioned, in both cases, packets should still flow freely, and they
do just that on any net device where the bridge is not offloaded, but on
mv88e6xxx they don't.
Fixes:
|
||
Vladimir Oltean
|
1bec0f0506 |
net: dsa: fix bridge_num not getting cleared after ports leaving the bridge
The dp->bridge_num is zero-based, with -1 being the encoding for an
invalid value. But dsa_bridge_num_put used to check for an invalid value
by comparing bridge_num with 0, which is of course incorrect.
The result is that the bridge_num will never get cleared by
dsa_bridge_num_put, and further port joins to other bridges will get a
bridge_num larger than the previous one, and once all the available
bridges with TX forwarding offload supported by the hardware get
exhausted, the TX forwarding offload feature is simply disabled.
In the case of sja1105, 7 iterations of the loop below are enough to
exhaust the TX forwarding offload bits, and further bridge joins operate
without that feature.
ip link add br0 type bridge vlan_filtering 1
while :; do
ip link set sw0p2 master br0 && sleep 1
ip link set sw0p2 nomaster && sleep 1
done
This issue is enough of an indication that having the dp->bridge_num
invalid encoding be a negative number is prone to bugs, so this will be
changed to a one-based value, with the dp->bridge_num of zero being the
indication of no bridge. However, that is material for net-next.
Fixes:
|
||
Andrew Lunn
|
b44d52a50b |
dsa: tag_dsa: Fix mask for trunked packets
A packet received on a trunk will have bit 2 set in Forward DSA tagged
frame. Bit 1 can be either 0 or 1 and is otherwise undefined and bit 0
indicates the frame CFI. Masking with 7 thus results in frames as
being identified as being from a trunk when in fact they are not. Fix
the mask to just look at bit 2.
Fixes:
|
||
Vladimir Oltean
|
5135e96a3d |
net: dsa: don't allocate the slave_mii_bus using devres
The Linux device model permits both the ->shutdown and ->remove driver
methods to get called during a shutdown procedure. Example: a DSA switch
which sits on an SPI bus, and the SPI bus driver calls this on its
->shutdown method:
spi_unregister_controller
-> device_for_each_child(&ctlr->dev, NULL, __unregister);
-> spi_unregister_device(to_spi_device(dev));
-> device_del(&spi->dev);
So this is a simple pattern which can theoretically appear on any bus,
although the only other buses on which I've been able to find it are
I2C:
i2c_del_adapter
-> device_for_each_child(&adap->dev, NULL, __unregister_client);
-> i2c_unregister_device(client);
-> device_unregister(&client->dev);
The implication of this pattern is that devices on these buses can be
unregistered after having been shut down. The drivers for these devices
might choose to return early either from ->remove or ->shutdown if the
other callback has already run once, and they might choose that the
->shutdown method should only perform a subset of the teardown done by
->remove (to avoid unnecessary delays when rebooting).
So in other words, the device driver may choose on ->remove to not
do anything (therefore to not unregister an MDIO bus it has registered
on ->probe), because this ->remove is actually triggered by the
device_shutdown path, and its ->shutdown method has already run and done
the minimally required cleanup.
This used to be fine until the blamed commit, but now, the following
BUG_ON triggers:
void mdiobus_free(struct mii_bus *bus)
{
/* For compatibility with error handling in drivers. */
if (bus->state == MDIOBUS_ALLOCATED) {
kfree(bus);
return;
}
BUG_ON(bus->state != MDIOBUS_UNREGISTERED);
bus->state = MDIOBUS_RELEASED;
put_device(&bus->dev);
}
In other words, there is an attempt to free an MDIO bus which was not
unregistered. The attempt to free it comes from the devres release
callbacks of the SPI device, which are executed after the device is
unregistered.
I'm not saying that the fact that MDIO buses allocated using devres
would automatically get unregistered wasn't strange. I'm just saying
that the commit didn't care about auditing existing call paths in the
kernel, and now, the following code sequences are potentially buggy:
(a) devm_mdiobus_alloc followed by plain mdiobus_register, for a device
located on a bus that unregisters its children on shutdown. After
the blamed patch, either both the alloc and the register should use
devres, or none should.
(b) devm_mdiobus_alloc followed by plain mdiobus_register, and then no
mdiobus_unregister at all in the remove path. After the blamed
patch, nobody unregisters the MDIO bus anymore, so this is even more
buggy than the previous case which needs a specific bus
configuration to be seen, this one is an unconditional bug.
In this case, DSA falls into category (a), it tries to be helpful and
registers an MDIO bus on behalf of the switch, which might be on such a
bus. I've no idea why it does it under devres.
It does this on probe:
if (!ds->slave_mii_bus && ds->ops->phy_read)
alloc and register mdio bus
and this on remove:
if (ds->slave_mii_bus && ds->ops->phy_read)
unregister mdio bus
I _could_ imagine using devres because the condition used on remove is
different than the condition used on probe. So strictly speaking, DSA
cannot determine whether the ds->slave_mii_bus it sees on remove is the
ds->slave_mii_bus that _it_ has allocated on probe. Using devres would
have solved that problem. But nonetheless, the existing code already
proceeds to unregister the MDIO bus, even though it might be
unregistering an MDIO bus it has never registered. So I can only guess
that no driver that implements ds->ops->phy_read also allocates and
registers ds->slave_mii_bus itself.
So in that case, if unregistering is fine, freeing must be fine too.
Stop using devres and free the MDIO bus manually. This will make devres
stop attempting to free a still registered MDIO bus on ->shutdown.
Fixes:
|
||
Vladimir Oltean
|
e5845aa0ea |
net: dsa: fix dsa_tree_setup error path
Since the blamed commit, dsa_tree_teardown_switches() was split into two
smaller functions, dsa_tree_teardown_switches and dsa_tree_teardown_ports.
However, the error path of dsa_tree_setup stopped calling dsa_tree_teardown_ports.
Fixes:
|
||
Vladimir Oltean
|
fd292c189a |
net: dsa: tear down devlink port regions when tearing down the devlink port on error
Commit |
||
Vladimir Oltean
|
0650bf52b3 |
net: dsa: be compatible with masters which unregister on shutdown
Lino reports that on his system with bcmgenet as DSA master and KSZ9897
as a switch, rebooting or shutting down never works properly.
What does the bcmgenet driver have special to trigger this, that other
DSA masters do not? It has an implementation of ->shutdown which simply
calls its ->remove implementation. Otherwise said, it unregisters its
network interface on shutdown.
This message can be seen in a loop, and it hangs the reboot process there:
unregister_netdevice: waiting for eth0 to become free. Usage count = 3
So why 3?
A usage count of 1 is normal for a registered network interface, and any
virtual interface which links itself as an upper of that will increment
it via dev_hold. In the case of DSA, this is the call path:
dsa_slave_create
-> netdev_upper_dev_link
-> __netdev_upper_dev_link
-> __netdev_adjacent_dev_insert
-> dev_hold
So a DSA switch with 3 interfaces will result in a usage count elevated
by two, and netdev_wait_allrefs will wait until they have gone away.
Other stacked interfaces, like VLAN, watch NETDEV_UNREGISTER events and
delete themselves, but DSA cannot just vanish and go poof, at most it
can unbind itself from the switch devices, but that must happen strictly
earlier compared to when the DSA master unregisters its net_device, so
reacting on the NETDEV_UNREGISTER event is way too late.
It seems that it is a pretty established pattern to have a driver's
->shutdown hook redirect to its ->remove hook, so the same code is
executed regardless of whether the driver is unbound from the device, or
the system is just shutting down. As Florian puts it, it is quite a big
hammer for bcmgenet to unregister its net_device during shutdown, but
having a common code path with the driver unbind helps ensure it is well
tested.
So DSA, for better or for worse, has to live with that and engage in an
arms race of implementing the ->shutdown hook too, from all individual
drivers, and do something sane when paired with masters that unregister
their net_device there. The only sane thing to do, of course, is to
unlink from the master.
However, complications arise really quickly.
The pattern of redirecting ->shutdown to ->remove is not unique to
bcmgenet or even to net_device drivers. In fact, SPI controllers do it
too (see dspi_shutdown -> dspi_remove), and presumably, I2C controllers
and MDIO controllers do it too (this is something I have not researched
too deeply, but even if this is not the case today, it is certainly
plausible to happen in the future, and must be taken into consideration).
Since DSA switches might be SPI devices, I2C devices, MDIO devices, the
insane implication is that for the exact same DSA switch device, we
might have both ->shutdown and ->remove getting called.
So we need to do something with that insane environment. The pattern
I've come up with is "if this, then not that", so if either ->shutdown
or ->remove gets called, we set the device's drvdata to NULL, and in the
other hook, we check whether the drvdata is NULL and just do nothing.
This is probably not necessary for platform devices, just for devices on
buses, but I would really insist for consistency among drivers, because
when code is copy-pasted, it is not always copy-pasted from the best
sources.
So depending on whether the DSA switch's ->remove or ->shutdown will get
called first, we cannot really guarantee even for the same driver if
rebooting will result in the same code path on all platforms. But
nonetheless, we need to do something minimally reasonable on ->shutdown
too to fix the bug. Of course, the ->remove will do more (a full
teardown of the tree, with all data structures freed, and this is why
the bug was not caught for so long). The new ->shutdown method is kept
separate from dsa_unregister_switch not because we couldn't have
unregistered the switch, but simply in the interest of doing something
quick and to the point.
The big question is: does the DSA switch's ->shutdown get called earlier
than the DSA master's ->shutdown? If not, there is still a risk that we
might still trigger the WARN_ON in unregister_netdevice that says we are
attempting to unregister a net_device which has uppers. That's no good.
Although the reference to the master net_device won't physically go away
even if DSA's ->shutdown comes afterwards, remember we have a dev_hold
on it.
The answer to that question lies in this comment above device_link_add:
* A side effect of the link creation is re-ordering of dpm_list and the
* devices_kset list by moving the consumer device and all devices depending
* on it to the ends of these lists (that does not happen to devices that have
* not been registered when this function is called).
so the fact that DSA uses device_link_add towards its master is not
exactly for nothing. device_shutdown() walks devices_kset from the back,
so this is our guarantee that DSA's shutdown happens before the master's
shutdown.
Fixes:
|
||
Vladimir Oltean
|
3c9cfb5269 |
net: update NXP copyright text
NXP Legal insists that the following are not fine: - Saying "NXP Semiconductors" instead of "NXP", since the company's registered name is "NXP" - Putting a "(c)" sign in the copyright string - Putting a comma in the copyright string The only accepted copyright string format is "Copyright <year-range> NXP". This patch changes the copyright headers in the networking files that were sent by me, or derived from code sent by me. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vladimir Oltean
|
a57d8c217a |
net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports
Sometimes when unbinding the mv88e6xxx driver on Turris MOX, these error
messages appear:
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 1 from fdb: -2
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 0 from fdb: -2
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 100 from fdb: -2
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 1 from fdb: -2
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 0 from fdb: -2
(and similarly for other ports)
What happens is that DSA has a policy "even if there are bugs, let's at
least not leak memory" and dsa_port_teardown() clears the dp->fdbs and
dp->mdbs lists, which are supposed to be empty.
But deleting that cleanup code, the warnings go away.
=> the FDB and MDB lists (used for refcounting on shared ports, aka CPU
and DSA ports) will eventually be empty, but are not empty by the time
we tear down those ports. Aka we are deleting them too soon.
The addresses that DSA complains about are host-trapped addresses: the
local addresses of the ports, and the MAC address of the bridge device.
The problem is that offloading those entries happens from a deferred
work item scheduled by the SWITCHDEV_FDB_DEL_TO_DEVICE handler, and this
races with the teardown of the CPU and DSA ports where the refcounting
is kept.
In fact, not only it races, but fundamentally speaking, if we iterate
through the port list linearly, we might end up tearing down the shared
ports even before we delete a DSA user port which has a bridge upper.
So as it turns out, we need to first tear down the user ports (and the
unused ones, for no better place of doing that), then the shared ports
(the CPU and DSA ports). In between, we need to ensure that all work
items scheduled by our switchdev handlers (which only run for user
ports, hence the reason why we tear them down first) have finished.
Fixes:
|
||
Vladimir Oltean
|
6a52e73368 |
net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup
DSA supports connecting to a phy-handle, and has a fallback to a non-OF
based method of connecting to an internal PHY on the switch's own MDIO
bus, if no phy-handle and no fixed-link nodes were present.
The -ENODEV error code from the first attempt (phylink_of_phy_connect)
is what triggers the second attempt (phylink_connect_phy).
However, when the first attempt returns a different error code than
-ENODEV, this results in an unbalance of calls to phylink_create and
phylink_destroy by the time we exit the function. The phylink instance
has leaked.
There are many other error codes that can be returned by
phylink_of_phy_connect. For example, phylink_validate returns -EINVAL.
So this is a practical issue too.
Fixes:
|
||
Linus Walleij
|
0e90dfa7a8 |
net: dsa: tag_rtl4_a: Fix egress tags
I noticed that only port 0 worked on the RTL8366RB since we
started to use custom tags.
It turns out that the format of egress custom tags is actually
different from ingress custom tags. While the lower bits just
contain the port number in ingress tags, egress tags need to
indicate destination port by setting the bit for the
corresponding port.
It was working on port 0 because port 0 added 0x00 as port
number in the lower bits, and if you do this the packet appears
at all ports, including the intended port. Ooops.
Fix this and all ports work again. Use the define for shifting
the "type A" into place while we're at it.
Tested on the D-Link DIR-685 by sending traffic to each of
the ports in turn. It works.
Fixes:
|
||
Vladimir Oltean
|
8ded916092 |
net: dsa: tag_sja1105: stop asking the sja1105 driver in sja1105_xmit_tpid
Introduced in commit
|
||
Vladimir Oltean
|
b0b8c67eaa |
net: dsa: sja1105: drop untagged packets on the CPU and DSA ports
The sja1105 driver is a bit special in its use of VLAN headers as DSA tags. This is because in VLAN-aware mode, the VLAN headers use an actual TPID of 0x8100, which is understood even by the DSA master as an actual VLAN header. Furthermore, control packets such as PTP and STP are transmitted with no VLAN header as a DSA tag, because, depending on switch generation, there are ways to steer these control packets towards a precise egress port other than VLAN tags. Transmitting control packets as untagged means leaving a door open for traffic in general to be transmitted as untagged from the DSA master, and for it to traverse the switch and exit a random switch port according to the FDB lookup. This behavior is a bit out of line with other DSA drivers which have native support for DSA tagging. There, it is to be expected that the switch only accepts DSA-tagged packets on its CPU port, dropping everything that does not match this pattern. We perhaps rely a bit too much on the switches' hardware dropping on the CPU port, and place no other restrictions in the kernel data path to avoid that. For example, sja1105 is also a bit special in that STP/PTP packets are transmitted using "management routes" (sja1105_port_deferred_xmit): when sending a link-local packet from the CPU, we must first write a SPI message to the switch to tell it to expect a packet towards multicast MAC DA 01-80-c2-00-00-0e, and to route it towards port 3 when it gets it. This entry expires as soon as it matches a packet received by the switch, and it needs to be reinstalled for the next packet etc. All in all quite a ghetto mechanism, but it is all that the sja1105 switches offer for injecting a control packet. The driver takes a mutex for serializing control packets and making the pairs of SPI writes of a management route and its associated skb atomic, but to be honest, a mutex is only relevant as long as all parties agree to take it. With the DSA design, it is possible to open an AF_PACKET socket on the DSA master net device, and blast packets towards 01-80-c2-00-00-0e, and whatever locking the DSA switch driver might use, it all goes kaput because management routes installed by the driver will match skbs sent by the DSA master, and not skbs generated by the driver itself. So they will end up being routed on the wrong port. So through the lens of that, maybe it would make sense to avoid that from happening by doing something in the network stack, like: introduce a new bit in struct sk_buff, like xmit_from_dsa. Then, somewhere around dev_hard_start_xmit(), introduce the following check: if (netdev_uses_dsa(dev) && !skb->xmit_from_dsa) kfree_skb(skb); Ok, maybe that is a bit drastic, but that would at least prevent a bunch of problems. For example, right now, even though the majority of DSA switches drop packets without DSA tags sent by the DSA master (and therefore the majority of garbage that user space daemons like avahi and udhcpcd and friends create), it is still conceivable that an aggressive user space program can open an AF_PACKET socket and inject a spoofed DSA tag directly on the DSA master. We have no protection against that; the packet will be understood by the switch and be routed wherever user space says. Furthermore: there are some DSA switches where we even have register access over Ethernet, using DSA tags. So even user space drivers are possible in this way. This is a huge hole. However, the biggest thing that bothers me is that udhcpcd attempts to ask for an IP address on all interfaces by default, and with sja1105, it will attempt to get a valid IP address on both the DSA master as well as on sja1105 switch ports themselves. So with IP addresses in the same subnet on multiple interfaces, the routing table will be messed up and the system will be unusable for traffic until it is configured manually to not ask for an IP address on the DSA master itself. It turns out that it is possible to avoid that in the sja1105 driver, at least very superficially, by requesting the switch to drop VLAN-untagged packets on the CPU port. With the exception of control packets, all traffic originated from tag_sja1105.c is already VLAN-tagged, so only STP and PTP packets need to be converted. For that, we need to uphold the equivalence between an untagged and a pvid-tagged packet, and to remember that the CPU port of sja1105 uses a pvid of 4095. Now that we drop untagged traffic on the CPU port, non-aggressive user space applications like udhcpcd stop bothering us, and sja1105 effectively becomes just as vulnerable to the aggressive kind of user space programs as other DSA switches are (ok, users can also create 8021q uppers on top of the DSA master in the case of sja1105, but in future patches we can easily deny that, but it still doesn't change the fact that VLAN-tagged packets can still be injected over raw sockets). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vladimir Oltean
|
58adf9dcb1 |
net: dsa: let drivers state that they need VLAN filtering while standalone
As explained in commit
|
||
Vladimir Oltean
|
06cfb2df7e |
net: dsa: don't advertise 'rx-vlan-filter' when not needed
There have been multiple independent reports about dsa_slave_vlan_rx_add_vid being called (and consequently calling the drivers' .port_vlan_add) when it isn't needed, and sometimes (not always) causing problems in the process. Case 1: mv88e6xxx_port_vlan_prepare is stubborn and only accepts VLANs on bridged ports. That is understandably so, because standalone mv88e6xxx ports are VLAN-unaware, and VTU entries are said to be a scarce resource. Otherwise said, the following fails lamentably on mv88e6xxx: ip link add br0 type bridge vlan_filtering 1 ip link set lan3 master br0 ip link add link lan10 name lan10.1 type vlan id 1 [485256.724147] mv88e6085 d0032004.mdio-mii:12: p10: hw VLAN 1 already used by port 3 in br0 RTNETLINK answers: Operation not supported This has become a worse issue since commit |
||
Vladimir Oltean
|
67b5fb5db7 |
net: dsa: properly fall back to software bridging
If the driver does not implement .port_bridge_{join,leave}, then we must fall back to standalone operation on that port, and trigger the error path of dsa_port_bridge_join. This sets dp->bridge_dev = NULL. In turn, having a non-NULL dp->bridge_dev when there is no offloading support makes the following things go wrong: - dsa_default_offload_fwd_mark make the wrong decision in setting skb->offload_fwd_mark. It should set skb->offload_fwd_mark = 0 for ports that don't offload the bridge, which should instruct the bridge to forward in software. But this does not happen, dp->bridge_dev is incorrectly set to point to the bridge, so the bridge is told that packets have been forwarded in hardware, which they haven't. - switchdev objects (MDBs, VLANs) should not be offloaded by ports that don't offload the bridge. Standalone ports should behave as packet-in, packet-out and the bridge should not be able to manipulate the pvid of the port, or tag stripping on egress, or ingress filtering. This should already work fine because dsa_slave_port_obj_add has: case SWITCHDEV_OBJ_ID_PORT_VLAN: if (!dsa_port_offloads_bridge_port(dp, obj->orig_dev)) return -EOPNOTSUPP; err = dsa_slave_vlan_add(dev, obj, extack); but since dsa_port_offloads_bridge_port works based on dp->bridge_dev, this is again sabotaging us. All the above work in case the port has an unoffloaded LAG interface, so this is well exercised code, we should apply it for plain unoffloaded bridge ports too. Reported-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vladimir Oltean
|
09dba21b43 |
net: dsa: don't call switchdev_bridge_port_unoffload for unoffloaded bridge ports
For ports that have a NULL dp->bridge_dev, dsa_port_to_bridge_port()
also returns NULL as expected.
Issue #1 is that we are performing a NULL pointer dereference on brport_dev.
Issue #2 is that these are ports on which switchdev_bridge_port_offload
has not been called, so we should not call switchdev_bridge_port_unoffload
on them either.
Both issues are addressed by checking against a NULL brport_dev in
dsa_port_pre_bridge_leave and exiting early.
Fixes:
|
||
Vladimir Oltean
|
f5e165e72b |
net: dsa: track unique bridge numbers across all DSA switch trees
Right now, cross-tree bridging setups work somewhat by mistake. In the case of cross-tree bridging with sja1105, all switch instances need to agree upon a common VLAN ID for forwarding a packet that belongs to a certain bridging domain. With TX forwarding offload, the VLAN ID is the bridge VLAN for VLAN-aware bridging, and the tag_8021q TX forwarding offload VID (a VLAN which has non-zero VBID bits) for VLAN-unaware bridging. The VBID for VLAN-unaware bridging is derived from the dp->bridge_num value calculated by DSA independently for each switch tree. If ports from one tree join one bridge, and ports from another tree join another bridge, DSA will assign them the same bridge_num, even though the bridges are different. If cross-tree bridging is supported, this is an issue. Modify DSA to calculate the bridge_num globally across all switch trees. This has the implication for a driver that the dp->bridge_num value that DSA will assign to its ports might not be contiguous, if there are boards with multiple DSA drivers instantiated. Additionally, all bridge_num values eat up towards each switch's ds->num_fwd_offloading_bridges maximum, which is potentially unfortunate, and can be seen as a limitation introduced by this patch. However, that is the lesser evil for now. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vladimir Oltean
|
994d2cbb08 |
net: dsa: tag_sja1105: be dsa_loop-safe
Add support for tag_sja1105 running on non-sja1105 DSA ports, by making sure that every time we dereference dp->priv, we check the switch's dsa_switch_ops (otherwise we access a struct sja1105_port structure that is in fact something else). This adds an unconditional build-time dependency between sja1105 being built as module => tag_sja1105 must also be built as module. This was there only for PTP before. Some sane defaults must also take place when not running on sja1105 hardware. These are: - sja1105_xmit_tpid: the sja1105 driver uses different VLAN protocols depending on VLAN awareness and switch revision (when an encapsulated VLAN must be sent). Default to 0x8100. - sja1105_rcv_meta_state_machine: this aggregates PTP frames with their metadata timestamp frames. When running on non-sja1105 hardware, don't do that and accept all frames unmodified. - sja1105_defer_xmit: calls sja1105_port_deferred_xmit in sja1105_main.c which writes a management route over SPI. When not running on sja1105 hardware, bypass the SPI write and send the frame as-is. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vladimir Oltean
|
b2b8913341 |
net: dsa: tag_8021q: fix notifiers broadcast when they shouldn't, and vice versa
During the development of the blamed patch, the "bool broadcast"
argument of dsa_port_tag_8021q_vlan_{add,del} was originally called
"bool local", and the meaning was the exact opposite.
Due to a rookie mistake where the patch was modified at the last minute
without retesting, the instances of dsa_port_tag_8021q_vlan_{add,del}
are called with the wrong values. During setup and teardown, cross-chip
notifiers should not be broadcast to all DSA trees, while during
bridging, they should.
Fixes:
|
||
Jakub Kicinski
|
f4083a752a |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h |
||
Vladimir Oltean
|
724395f4dc |
net: dsa: tag_8021q: don't broadcast during setup/teardown
Currently, on my board with multiple sja1105 switches in disjoint trees
described in commit
|
||
Vladimir Oltean
|
ab97462beb |
net: dsa: print more information when a cross-chip notifier fails
Currently this error message does not say a lot: [ 32.693498] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT [ 32.699725] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT [ 32.705931] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT [ 32.712139] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT [ 32.718347] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT [ 32.724554] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT but in this form, it is immediately obvious (at least to me) what the problem is, even without further looking at the code: [ 12.345566] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 1088 deletion: -ENOENT [ 12.353804] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 2112 deletion: -ENOENT [ 12.362019] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 1089 deletion: -ENOENT [ 12.370246] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 2113 deletion: -ENOENT [ 12.378466] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 1090 deletion: -ENOENT [ 12.386683] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 2114 deletion: -ENOENT Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vladimir Oltean
|
a72808b658 |
net: dsa: create a helper for locating EtherType DSA headers on TX
Create a similar helper for locating the offset to the DSA header relative to skb->data, and make the existing EtherType header taggers to use it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vladimir Oltean
|
5d928ff486 |
net: dsa: create a helper for locating EtherType DSA headers on RX
It seems that protocol tagging driver writers are always surprised about the formula they use to reach their EtherType header on RX, which becomes apparent from the fact that there are comments in multiple drivers that mention the same information. Create a helper that returns a void pointer to skb->data - 2, as well as centralize the explanation why that is the case. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vladimir Oltean
|
6bef794da6 |
net: dsa: create a helper which allocates space for EtherType DSA headers
Hide away the memmove used by DSA EtherType header taggers to shift the MAC SA and DA to the left to make room for the header, after they've called skb_push(). The call to skb_push() is still left explicit in drivers, to be symmetric with dsa_strip_etype_header, and because not all callers can be refactored to do it (for example, brcm_tag_xmit_ll has common code for a pre-Ethernet DSA tag and an EtherType DSA tag). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vladimir Oltean
|
f1dacd7aea |
net: dsa: create a helper that strips EtherType DSA headers on RX
All header taggers open-code a memmove that is fairly not all that obvious, and we can hide the details behind a helper function, since the only thing specific to the driver is the length of the header tag. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vladimir Oltean
|
c35b57ceff |
net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge
The blamed commit added a new field to struct switchdev_notifier_fdb_info,
but did not make sure that all call paths set it to something valid.
For example, a switchdev driver may emit a SWITCHDEV_FDB_ADD_TO_BRIDGE
notifier, and since the 'is_local' flag is not set, it contains junk
from the stack, so the bridge might interpret those notifications as
being for local FDB entries when that was not intended.
To avoid that now and in the future, zero-initialize all
switchdev_notifier_fdb_info structures created by drivers such that all
newly added fields to not need to touch drivers again.
Fixes:
|
||
Leon Romanovsky
|
919d13a7e4 |
devlink: Set device as early as possible
All kernel devlink implementations call to devlink_alloc() during initialization routine for specific device which is used later as a parent device for devlink_register(). Such late device assignment causes to the situation which requires us to call to device_register() before setting other parameters, but that call opens devlink to the world and makes accessible for the netlink users. Any attempt to move devlink_register() to be the last call generates the following error due to access to the devlink->dev pointer. [ 8.758862] devlink_nl_param_fill+0x2e8/0xe50 [ 8.760305] devlink_param_notify+0x6d/0x180 [ 8.760435] __devlink_params_register+0x2f1/0x670 [ 8.760558] devlink_params_register+0x1e/0x20 The simple change of API to set devlink device in the devlink_alloc() instead of devlink_register() fixes all this above and ensures that prior to call to devlink_register() everything already set. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vladimir Oltean
|
bee7c577e6 |
net: dsa: avoid fast ageing twice when port leaves a bridge
Drivers that support both the toggling of address learning and dynamic FDB flushing (mv88e6xxx, b53, sja1105) currently need to fast-age a port twice when it leaves a bridge: - once, when del_nbp() calls br_stp_disable_port() which puts the port in the BLOCKING state - twice, when dsa_port_switchdev_unsync_attrs() calls dsa_port_clear_brport_flags() which disables address learning The knee-jerk reaction might be to say "dsa_port_clear_brport_flags does not need to fast-age the port at all", but the thing is, we still need both code paths to flush the dynamic FDB entries in different situations. When a DSA switch port leaves a bonding/team interface that is (still) a bridge port, no del_nbp() will be called, so we rely on dsa_port_clear_brport_flags() function to restore proper standalone port functionality with address learning disabled. So the solution is just to avoid double the work when both code paths are called in series. Luckily, DSA already caches the STP port state, so we can skip flushing the dynamic FDB when we disable address learning and the STP state is one where no address learning takes place at all. Under that condition, not flushing the FDB is safe because there is supposed to not be any dynamic FDB entry at all (they were flushed during the transition towards that state, and none were learned in the meanwhile). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vladimir Oltean
|
a4ffe09fc2 |
net: dsa: still fast-age ports joining a bridge if they can't configure learning
Commit
|
||
Vladimir Oltean
|
9264e4ad26 |
net: dsa: flush the dynamic FDB of the software bridge when fast ageing a port
Currently, when DSA performs fast ageing on a port, 'bridge fdb' shows us that the 'self' entries (corresponding to the hardware bridge, as printed by dsa_slave_fdb_dump) are deleted, but the 'master' entries (corresponding to the software bridge) aren't. Indeed, searching through the bridge driver, neither the brport_attr_learning handler nor the IFLA_BRPORT_LEARNING handler call br_fdb_delete_by_port. However, br_stp_disable_port does, which is one of the paths which DSA uses to trigger a fast ageing process anyway. There is, however, one other very promising caller of br_fdb_delete_by_port, and that is the bridge driver's handler of the SWITCHDEV_FDB_FLUSH_TO_BRIDGE atomic notifier. Currently the s390/qeth HiperSockets card driver is the only user of this. I can't say I understand that driver's architecture or interaction with the bridge, but it appears to not be a switchdev driver in the traditional sense of the word. Nonetheless, the mechanism it provides is a useful way for DSA to express the fact that it performs fast ageing too, in a way that does not change the existing behavior for other drivers. Cc: Alexandra Winter <wintera@linux.ibm.com> Cc: Julian Wiedmann <jwi@linux.ibm.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vladimir Oltean
|
4eab90d973 |
net: dsa: don't fast age bridge ports with learning turned off
On topology changes, stations that were dynamically learned on ports that are no longer part of the active topology must be flushed - this is described by clause "17.11 Updating learned station location information" of IEEE 802.1D-2004. However, when address learning on the bridge port is turned off in the first place, there is nothing to flush, so skip a potentially expensive operation. We can finally do this now since DSA is aware of the learning state of its bridged ports. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vladimir Oltean
|
045c45d1f5 |
net: dsa: centralize fast ageing when address learning is turned off
Currently DSA leaves it down to device drivers to fast age the FDB on a port when address learning is disabled on it. There are 2 reasons for doing that in the first place: - when address learning is disabled by user space, through IFLA_BRPORT_LEARNING or the brport_attr_learning sysfs, what user space typically wants to achieve is to operate in a mode with no dynamic FDB entry on that port. But if the port is already up, some addresses might have been already learned on it, and it seems silly to wait for 5 minutes for them to expire until something useful can be done. - when a port leaves a bridge and becomes standalone, DSA turns off address learning on it. This also has the nice side effect of flushing the dynamically learned bridge FDB entries on it, which is a good idea because standalone ports should not have bridge FDB entries on them. We let drivers manage fast ageing under this condition because if DSA were to do it, it would need to track each port's learning state, and act upon the transition, which it currently doesn't. But there are 2 reasons why doing it is better after all: - drivers might get it wrong and not do it (see b53_port_set_learning) - we would like to flush the dynamic entries from the software bridge too, and letting drivers do that would be another pain point So track the port learning state and trigger a fast age process automatically within DSA. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vladimir Oltean
|
39f3210154 |
net: dsa: don't fast age standalone ports
DSA drives the procedure to flush dynamic FDB entries from a port based
on the change of STP state: whenever we go from a state where address
learning is enabled (LEARNING, FORWARDING) to a state where it isn't
(LISTENING, BLOCKING, DISABLED), we need to flush the existing dynamic
entries.
However, there are cases when this is not needed. Internally, when a
DSA switch interface is not under a bridge, DSA still keeps it in the
"FORWARDING" STP state. And when that interface joins a bridge, the
bridge will meticulously iterate that port through all STP states,
starting with BLOCKING and ending with FORWARDING. Because there is a
state transition from the standalone version of FORWARDING into the
temporary BLOCKING bridge port state, DSA calls the fast age procedure.
Since commit
|
||
Vladimir Oltean
|
c73c57081b |
net: dsa: don't disable multicast flooding to the CPU even without an IGMP querier
Commit |
||
Vladimir Oltean
|
7df4e74494 |
net: dsa: stop syncing the bridge mcast_router attribute at join time
Qingfang points out that when a bridge with the default settings is
created and a port joins it:
ip link add br0 type bridge
ip link set swp0 master br0
DSA calls br_multicast_router() on the bridge to see if the br0 device
is a multicast router port, and if it is, it enables multicast flooding
to the CPU port, otherwise it disables it.
If we look through the multicast_router_show() sysfs or at the
IFLA_BR_MCAST_ROUTER netlink attribute, we see that the default mrouter
attribute for the bridge device is "1" (MDB_RTR_TYPE_TEMP_QUERY).
However, br_multicast_router() will return "0" (MDB_RTR_TYPE_DISABLED),
because an mrouter port in the MDB_RTR_TYPE_TEMP_QUERY state may not be
actually _active_ until it receives an actual IGMP query. So, the
br_multicast_router() function should really have been called
br_multicast_router_active() perhaps.
When/if an IGMP query is received, the bridge device will transition via
br_multicast_mark_router() into the active state until the
ip4_mc_router_timer expires after an multicast_querier_interval.
Of course, this does not happen if the bridge is created with an
mcast_router attribute of "2" (MDB_RTR_TYPE_PERM).
The point is that in lack of any IGMP query messages, and in the default
bridge configuration, unregistered multicast packets will not be able to
reach the CPU port through flooding, and this breaks many use cases
(most obviously, IPv6 ND, with its ICMP6 neighbor solicitation multicast
messages).
Leave the multicast flooding setting towards the CPU port down to a driver
level decision.
Fixes:
|
||
Vladimir Oltean
|
f8b17a0bd9 |
net: dsa: tag_sja1105: optionally build as module when switch driver is module if PTP is enabled
TX timestamps are sent by SJA1110 as Ethernet packets containing
metadata, so they are received by the tagging driver but must be
processed by the switch driver - the one that is stateful since it
keeps the TX timestamp queue.
This means that there is an sja1110_process_meta_tstamp() symbol
exported by the switch driver which is called by the tagging driver.
There is a shim definition for that function when the switch driver is
not compiled, which does nothing, but that shim is not effective when
the tagging protocol driver is built-in and the switch driver is a
module, because built-in code cannot call symbols exported by modules.
So add an optional dependency between the tagger and the switch driver,
if PTP support is enabled in the switch driver. If PTP is not enabled,
sja1110_process_meta_tstamp() will translate into the shim "do nothing
with these meta frames" function.
Fixes:
|
||
Vladimir Oltean
|
2c0b03258b |
net: dsa: give preference to local CPU ports
Be there an "H" switch topology, where there are 2 switches connected as follows: eth0 eth1 | | CPU port CPU port | DSA link | sw0p0 sw0p1 sw0p2 sw0p3 sw0p4 -------- sw1p4 sw1p3 sw1p2 sw1p1 sw1p0 | | | | | | user user user user user user port port port port port port basically one where each switch has its own CPU port for termination, but there is also a DSA link in case packets need to be forwarded in hardware between one switch and another. DSA insists to see this as a daisy chain topology, basically registering all network interfaces as sw0p0@eth0, ... sw1p0@eth0 and disregarding eth1 as a valid DSA master. This is only half the story, since when asked using dsa_port_is_cpu(), DSA will respond that sw1p1 is a CPU port, however one which has no dp->cpu_dp pointing to it. So sw1p1 is enabled, but not used. Furthermore, be there a driver for switches which support only one upstream port. This driver iterates through its ports and checks using dsa_is_upstream_port() whether the current port is an upstream one. For switch 1, two ports pass the "is upstream port" checks: - sw1p4 is an upstream port because it is a routing port towards the dedicated CPU port assigned using dsa_tree_setup_default_cpu() - sw1p1 is also an upstream port because it is a CPU port, albeit one that is disabled. This is because dsa_upstream_port() returns: if (!cpu_dp) return port; which means that if @dp does not have a ->cpu_dp pointer (which is a characteristic of CPU ports themselves as well as unused ports), then @dp is its own upstream port. So the driver for switch 1 rightfully says: I have two upstream ports, but I don't support multiple upstream ports! So let me error out, I don't know which one to choose and what to do with the other one. Generally I am against enforcing any default policy in the kernel in terms of user to CPU port assignment (like round robin or such) but this case is different. To solve the conundrum, one would have to: - Disable sw1p1 in the device tree or mark it as "not a CPU port" in order to comply with DSA's view of this topology as a daisy chain, where the termination traffic from switch 1 must pass through switch 0. This is counter-productive because it wastes 1Gbps of termination throughput in switch 1. - Disable the DSA link between sw0p4 and sw1p4 and do software forwarding between switch 0 and 1, and basically treat the switches as part of disjoint switch trees. This is counter-productive because it wastes 1Gbps of autonomous forwarding throughput between switch 0 and 1. - Treat sw0p4 and sw1p4 as user ports instead of DSA links. This could work, but it makes cross-chip bridging impossible. In this setup we would need to have 2 separate bridges, br0 spanning the ports of switch 0, and br1 spanning the ports of switch 1, and the "DSA links treated as user ports" sw0p4 (part of br0) and sw1p4 (part of br1) are the gateway ports between one bridge and another. This is hard to manage from a user's perspective, who wants to have a unified view of the switching fabric and the ability to transparently add ports to the same bridge. VLANs would also need to be explicitly managed by the user on these gateway ports. So it seems that the only reasonable thing to do is to make DSA prefer CPU ports that are local to the switch. Meaning that by default, the user and DSA ports of switch 0 will get assigned to the CPU port from switch 0 (sw0p1) and the user and DSA ports of switch 1 will get assigned to the CPU port from switch 1. The way this solves the problem is that sw1p4 is no longer an upstream port as far as switch 1 is concerned (it no longer views sw0p1 as its dedicated CPU port). So here we are, the first multi-CPU port that DSA supports is also perhaps the most uneventful one: the individual switches don't support multiple CPUs, however the DSA switch tree as a whole does have multiple CPU ports. No user space assignment of user ports to CPU ports is desirable, necessary, or possible. Ports that do not have a local CPU port (say there was an extra switch hanging off of sw0p0) default to the standard implementation of getting assigned to the first CPU port of the DSA switch tree. Is that good enough? Probably not (if the downstream switch was hanging off of switch 1, we would most certainly prefer its CPU port to be sw1p1), but in order to support that use case too, we would need to traverse the dst->rtable in search of an optimum dedicated CPU port, one that has the smallest number of hops between dp->ds and dp->cpu_dp->ds. At the moment, the DSA routing table structure does not keep the number of hops between dl->dp and dl->link_dp, and while it is probably deducible, there is zero justification to write that code now. Let's hope DSA will never have to support that use case. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vladimir Oltean
|
0e8eb9a16e |
net: dsa: rename teardown_default_cpu to teardown_cpu_ports
There is nothing specific to having a default CPU port to what dsa_tree_teardown_default_cpu() does. Even with multiple CPU ports, it would do the same thing: iterate through the ports of this switch tree and reset the ->cpu_dp pointer to NULL. So rename it accordingly. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vladimir Oltean
|
421297efe6 |
net: dsa: tag_sja1105: consistently fail with arbitrary input
Dan Carpenter's smatch tests report that the "vid" variable, populated by sja1105_vlan_rcv when an skb is received by the tagger that has a VLAN ID which cannot be decoded by tag_8021q, may be uninitialized when used here: if (source_port == -1 || switch_id == -1) skb->dev = dsa_find_designated_bridge_port_by_vid(netdev, vid); The sja1105 driver, by construction, sets up the switch in a way that all data plane packets sent towards the CPU port are VLAN-tagged. So it is practically impossible, in a functional system, for a packet to be processed by sja1110_rcv() which is not a control packet and does not have a VLAN header either. However, it would be nice if the sja1105 tagging driver could consistently do something valid, for example fail, even if presented with packets that do not hold valid sja1105 tags. Currently it is a bit hard to argue that it does that, given the fact that a data plane packet with no VLAN tag will trigger a call to dsa_find_designated_bridge_port_by_vid with a vid argument that is an uninitialized stack variable. To fix this, we can initialize the u16 vid variable with 0, a value that can never be a bridge VLAN, so dsa_find_designated_bridge_port_by_vid will always return a NULL skb->dev. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20210802195137.303625-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Vladimir Oltean
|
29a097b774 |
net: dsa: remove the struct packet_type argument from dsa_device_ops::rcv()
No tagging driver uses this. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> |