linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-25 13:41:51 +00:00

Author	SHA1	Message	Date
Vladimir Oltean	4f03dcc6b9	net: dsa: existing DSA masters cannot join upper interfaces All the traffic to/from a DSA master is supposed to be distributed among its DSA switch upper interfaces, so we should not allow other upper device kinds. An exception to this is DSA_TAG_PROTO_NONE (switches with no DSA tags), and in that case it is actually expected to create e.g. VLAN interfaces on the master. But for those, netdev_uses_dsa(master) returns false, so the restriction doesn't apply. The motivation for this change is to allow LAG interfaces of DSA masters to be DSA masters themselves. We want to restrict the user's degrees of freedom by 1: the LAG should already have all DSA masters as lowers, and while lower ports of the LAG can be removed, none can be added after the fact. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-23 11:39:22 +02:00
Vladimir Oltean	920a33cd72	net: bridge: move DSA master bridging restriction to DSA When DSA gains support for multiple CPU ports in a LAG, it will become mandatory to monitor the changeupper events for the DSA master. In fact, there are already some restrictions to be imposed in that area, namely that a DSA master cannot be a bridge port except in some special circumstances. Centralize the restrictions at the level of the DSA layer as a preliminary step. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-23 11:39:22 +02:00
Vladimir Oltean	0498277ee1	net: dsa: don't stop at NOTIFY_OK when calling ds->ops->port_prechangeupper dsa_slave_prechangeupper_sanity_check() is supposed to enforce some adjacency restrictions, and calls ds->ops->port_prechangeupper if the driver implements it. We convert the error code from the port_prechangeupper() call to a notifier code, and 0 is converted to NOTIFY_OK, but the caller of dsa_slave_prechangeupper_sanity_check() stops at any notifier code different from NOTIFY_DONE. Avoid this by converting back the notifier code to an error code, so that both NOTIFY_OK and NOTIFY_DONE will be seen as 0. This allows more parallel sanity check functions to be added. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-23 11:39:22 +02:00
Vladimir Oltean	4c3f80d22b	net: dsa: walk through all changeupper notifier functions Traditionally, DSA has had a single netdev notifier handling function for each device type. For the sake of code cleanliness, we would like to introduce more handling functions which do one thing, but the conditions for entering these functions start to overlap. Example: a handling function which tracks whether any bridges contain both DSA and non-DSA interfaces. Either this is placed before dsa_slave_changeupper(), case in which it will prevent that function from executing, or we place it after dsa_slave_changeupper(), case in which we will prevent it from executing. The other alternative is to ignore errors from the new handling function (not ideal). To support this usage, we need to change the pattern. In the new model, we enter all notifier handling sub-functions, and exit with NOTIFY_DONE if there is nothing to do. This allows the sub-functions to be relatively free-form and independent from each other. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-08-23 11:39:22 +02:00
Vladimir Oltean	b18e04e362	net: dsa: tag_8021q: remove old comment regarding dsa_8021q_netdev_ops Since commit `129bd7ca8a` ("net: dsa: Prevent usage of NET_DSA_TAG_8021Q as tagging protocol"), dsa_8021q_netdev_ops no longer exists, so remove the comment that talks about it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220818143808.2808393-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-22 18:11:58 -07:00
Wolfram Sang	e4d44b3d27	dsa: move from strlcpy with unused retval to strscpy Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220818210216.8419-1-wsa+renesas@sang-engineering.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-22 18:06:35 -07:00
Vladimir Oltean	e09e987315	net: dsa: make phylink-related OF properties mandatory on DSA and CPU ports Early DSA drivers were kind of simplistic in that they assumed a fairly narrow hardware layout. User ports would have integrated PHYs at an internal MDIO address that is derivable from the port number, and shared (DSA and CPU) ports would have an MII-style (serial or parallel) connection to another MAC. Phylib and then phylink were used to drive the internal PHYs, and this needed little to no description through the platform data structures. Bringing up the shared ports at the maximum supported link speed was the responsibility of the drivers. As a result of this, when these early drivers were converted from platform data to the new DSA OF bindings, there was no link information translated into the first DT bindings. https://lore.kernel.org/all/YtXFtTsf++AeDm1l@lunn.ch/ Later, phylink was adopted for shared ports as well, and today we have a workaround in place, introduced by commit `a20f997010` ("net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed"). There, DSA checks for the presence of phy-handle/fixed-link/managed OF properties, and if missing, phylink registration would be skipped. This is because phylink is optional for some drivers (the shared ports already work without it), but the process of starting to register a port with phylink is irreversible: if phylink_create() fails to find the fwnode properties it needs, it bails out and it leaves the ports inoperational (because phylink expects ports to be initially down, so DSA necessarily takes them down, and doesn't know how to put them back up again). DSA being a common framework, new drivers opt into this workaround willy-nilly, but the ideal behavior from the DSA core's side would have been to not interfere with phylink's process of failing at all. This isn't possible because of regression concerns with pre-phylink DT blobs, but at least DSA should put a stop to the proliferation of more of such cases that rely on the workaround to skip phylink registration, and sanitize the environment that new drivers work in. To that end, create a list of compatible strings for which the workaround is preserved, and don't apply the workaround for any drivers outside that list (this includes new drivers). In some cases, we make the assumption that even existing drivers don't rely on DSA's workaround, and we do this by looking at the device trees in which they appear. We can't fully know what is the situation with downstream DT blobs, but we can guess the overall trend by studying the DT blobs that were submitted upstream. If there are upstream blobs that have lacking descriptions, we take it as very likely that there are many more downstream blobs that do so too. If all upstream blobs have complete descriptions, we take that as a hint that the driver is a candidate for enforcing strict DT bindings (considering that most bindings are copy-pasted). If there are no upstream DT blobs, we take the conservative route of allowing the workaround, unless the driver maintainer instructs us otherwise. The driver situation is as follows: ar9331 ~~~~~~ compatible strings: - qca,ar9331-switch 1 occurrence in mainline device trees, part of SoC dtsi (arch/mips/boot/dts/qca/ar9331.dtsi), description is not problematic. Verdict: opt into strict DT bindings and out of workarounds. b53 ~~~ compatible strings: - brcm,bcm5325 - brcm,bcm53115 - brcm,bcm53125 - brcm,bcm53128 - brcm,bcm5365 - brcm,bcm5389 - brcm,bcm5395 - brcm,bcm5397 - brcm,bcm5398 - brcm,bcm53010-srab - brcm,bcm53011-srab - brcm,bcm53012-srab - brcm,bcm53018-srab - brcm,bcm53019-srab - brcm,bcm5301x-srab - brcm,bcm11360-srab - brcm,bcm58522-srab - brcm,bcm58525-srab - brcm,bcm58535-srab - brcm,bcm58622-srab - brcm,bcm58623-srab - brcm,bcm58625-srab - brcm,bcm88312-srab - brcm,cygnus-srab - brcm,nsp-srab - brcm,omega-srab - brcm,bcm3384-switch - brcm,bcm6328-switch - brcm,bcm6368-switch - brcm,bcm63xx-switch I've found at least these mainline DT blobs with problems: arch/arm/boot/dts/bcm47094-linksys-panamera.dts - lacks phy-mode arch/arm/boot/dts/bcm47189-tenda-ac9.dts - lacks phy-mode and fixed-link arch/arm/boot/dts/bcm47081-luxul-xap-1410.dts arch/arm/boot/dts/bcm47081-luxul-xwr-1200.dts arch/arm/boot/dts/bcm47081-buffalo-wzr-600dhp2.dts - lacks phy-mode and fixed-link arch/arm/boot/dts/bcm47094-luxul-xbr-4500.dts arch/arm/boot/dts/bcm4708-smartrg-sr400ac.dts arch/arm/boot/dts/bcm4708-luxul-xap-1510.dts arch/arm/boot/dts/bcm953012er.dts arch/arm/boot/dts/bcm4708-netgear-r6250.dts arch/arm/boot/dts/bcm4708-buffalo-wzr-1166dhp-common.dtsi arch/arm/boot/dts/bcm4708-luxul-xwc-1000.dts arch/arm/boot/dts/bcm47094-luxul-abr-4500.dts - lacks phy-mode and fixed-link arch/arm/boot/dts/bcm53016-meraki-mr32.dts - lacks phy-mode Verdict: opt into DSA workarounds. bcm_sf2 ~~~~~~~ compatible strings: - brcm,bcm4908-switch - brcm,bcm7445-switch-v4.0 - brcm,bcm7278-switch-v4.0 - brcm,bcm7278-switch-v4.8 A single occurrence in mainline (arch/arm64/boot/dts/broadcom/bcm4908/bcm4908.dtsi), part of a SoC dtsi, valid description. Florian Fainelli explains that most of the bcm_sf2 device trees lack a full description for the internal IMP ports. Verdict: opt the BCM4908 into strict DT bindings, and opt the rest into the workarounds. Note that even though BCM4908 has strict DT bindings, it still does not register with phylink on the IMP port due to it implementing ->adjust_link(). hellcreek ~~~~~~~~~ compatible strings: - hirschmann,hellcreek-de1soc-r1 No occurrence in mainline device trees. Kurt Kanzenbach explains that the downstream device trees lacked phy-mode and fixed link, and needed work, but were fixed in the meantime. Verdict: opt into strict DT bindings and out of workarounds. lan9303 ~~~~~~~ compatible strings: - smsc,lan9303-mdio - smsc,lan9303-i2c 1 occurrence in mainline device trees: arch/arm/boot/dts/imx53-kp-hsc.dts - no phy-mode, no fixed-link Verdict: opt out of strict DT bindings and into workarounds. lantiq_gswip ~~~~~~~~~~~~ compatible strings: - lantiq,xrx200-gswip - lantiq,xrx300-gswip - lantiq,xrx330-gswip No occurrences in mainline device trees. Martin Blumenstingl confirms that the downstream OpenWrt device trees lack a proper fixed-link and need work, and that the incomplete description can even be seen in the example from Documentation/devicetree/bindings/net/dsa/lantiq-gswip.txt. Verdict: opt out of strict DT bindings and into workarounds. microchip ksz ~~~~~~~~~~~~~ compatible strings: - microchip,ksz8765 - microchip,ksz8794 - microchip,ksz8795 - microchip,ksz8863 - microchip,ksz8873 - microchip,ksz9477 - microchip,ksz9897 - microchip,ksz9893 - microchip,ksz9563 - microchip,ksz8563 - microchip,ksz9567 - microchip,lan9370 - microchip,lan9371 - microchip,lan9372 - microchip,lan9373 - microchip,lan9374 5 occurrences in mainline device trees, all descriptions are valid. But we had a snafu for the ksz8795 and ksz9477 drivers where the phy-mode property would be expected to be located directly under the 'switch' node rather than under a port OF node. It was fixed by commit `edecfa98f6` ("net: dsa: microchip: look for phy-mode in port nodes"). The driver still has compatibility with the old DT blobs. The lan937x support was added later than the above snafu was fixed, and even though it has support for the broken DT blobs by virtue of sharing a common probing function, I'll take it that its DT blobs are correct. Verdict: opt lan937x into strict DT bindings, and the others out. mt7530 ~~~~~~ compatible strings - mediatek,mt7621 - mediatek,mt7530 - mediatek,mt7531 Multiple occurrences in mainline device trees, one is part of an SoC dtsi (arch/mips/boot/dts/ralink/mt7621.dtsi), all descriptions are fine. Verdict: opt into strict DT bindings and out of workarounds. mv88e6060 ~~~~~~~~~ compatible string: - marvell,mv88e6060 no occurrences in mainline, nobody knows anybody who uses it. Verdict: opt out of strict DT bindings and into workarounds. mv88e6xxx ~~~~~~~~~ compatible strings: - marvell,mv88e6085 - marvell,mv88e6190 - marvell,mv88e6250 Device trees that have incomplete descriptions of CPU or DSA ports: arch/arm64/boot/dts/freescale/imx8mq-zii-ultra.dtsi - lacks phy-mode arch/arm64/boot/dts/marvell/cn9130-crb.dtsi - lacks phy-mode and fixed-link arch/arm/boot/dts/vf610-zii-ssmb-spu3.dts - lacks phy-mode arch/arm/boot/dts/kirkwood-mv88f6281gtw-ge.dts - lacks phy-mode arch/arm/boot/dts/vf610-zii-spb4.dts - lacks phy-mode arch/arm/boot/dts/vf610-zii-cfu1.dts - lacks phy-mode arch/arm/boot/dts/vf610-zii-dev-rev-c.dts - lacks phy-mode on CPU port, fixed-link on DSA ports arch/arm/boot/dts/vf610-zii-dev-rev-b.dts - lacks phy-mode on CPU port arch/arm/boot/dts/armada-381-netgear-gs110emx.dts - lacks phy-mode arch/arm/boot/dts/vf610-zii-scu4-aib.dts - lacks fixed-link on xgmii DSA ports and/or in-band-status on 2500base-x DSA ports, and phy-mode on CPU port arch/arm/boot/dts/imx6qdl-gw5904.dtsi - lacks phy-mode and fixed-link arch/arm/boot/dts/armada-385-clearfog-gtr-l8.dts - lacks phy-mode and fixed-link arch/arm/boot/dts/vf610-zii-ssmb-dtu.dts - lacks phy-mode arch/arm/boot/dts/kirkwood-dir665.dts - lacks phy-mode arch/arm/boot/dts/kirkwood-rd88f6281.dtsi - lacks phy-mode arch/arm/boot/dts/orion5x-netgear-wnr854t.dts - lacks phy-mode and fixed-link arch/arm/boot/dts/armada-388-clearfog.dts - lacks phy-mode arch/arm/boot/dts/armada-xp-linksys-mamba.dts - lacks phy-mode arch/arm/boot/dts/armada-385-linksys.dtsi - lacks phy-mode arch/arm/boot/dts/imx6q-b450v3.dts arch/arm/boot/dts/imx6q-b850v3.dts - has a phy-handle but not a phy-mode? arch/arm/boot/dts/armada-370-rd.dts - lacks phy-mode arch/arm/boot/dts/kirkwood-linksys-viper.dts - lacks phy-mode arch/arm/boot/dts/imx51-zii-rdu1.dts - lacks phy-mode arch/arm/boot/dts/imx51-zii-scu2-mezz.dts - lacks phy-mode arch/arm/boot/dts/imx6qdl-zii-rdu2.dtsi - lacks phy-mode arch/arm/boot/dts/armada-385-clearfog-gtr-s4.dts - lacks phy-mode and fixed-link Verdict: opt out of strict DT bindings and into workarounds. ocelot ~~~~~~ compatible strings: - mscc,vsc9953-switch - felix (arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi) is a PCI device, has no compatible string 2 occurrences in mainline, both are part of SoC dtsi and complete. Verdict: opt into strict DT bindings and out of workarounds. qca8k ~~~~~ compatible strings: - qca,qca8327 - qca,qca8328 - qca,qca8334 - qca,qca8337 5 occurrences in mainline device trees, none of the descriptions are problematic. Verdict: opt into strict DT bindings and out of workarounds. realtek ~~~~~~~ compatible strings: - realtek,rtl8366rb - realtek,rtl8365mb 2 occurrences in mainline, both descriptions are fine, additionally rtl8365mb.c has a comment "The device tree firmware should also specify the link partner of the extension port - either via a fixed-link or other phy-handle." Verdict: opt into strict DT bindings and out of workarounds. rzn1_a5psw ~~~~~~~~~~ compatible strings: - renesas,rzn1-a5psw One single occurrence, part of SoC dtsi (arch/arm/boot/dts/r9a06g032.dtsi), description is fine. Verdict: opt into strict DT bindings and out of workarounds. sja1105 ~~~~~~~ Driver already validates its port OF nodes in sja1105_parse_ports_node(). Verdict: opt into strict DT bindings and out of workarounds. vsc73xx ~~~~~~~ compatible strings: - vitesse,vsc7385 - vitesse,vsc7388 - vitesse,vsc7395 - vitesse,vsc7398 2 occurrences in mainline device trees, both descriptions are fine. Verdict: opt into strict DT bindings and out of workarounds. xrs700x ~~~~~~~ compatible strings: - arrow,xrs7003e - arrow,xrs7003f - arrow,xrs7004e - arrow,xrs7004f no occurrences in mainline, we don't know. Verdict: opt out of strict DT bindings and into workarounds. Because there is a pattern where newly added switches reuse existing drivers more often than introducing new ones, I've opted for deciding who gets to opt into the workaround based on an OF compatible match table in the DSA core. The alternative would have been to add another boolean property to struct dsa_switch, like configure_vlan_while_not_filtering. But this avoids situations where sometimes driver maintainers obfuscate what goes on by sharing a common probing function, and therefore making new switches inherit old quirks. Side note, we also warn about missing properties for drivers that rely on the workaround. This isn't an indication that we'll break compatibility with those DT blobs any time soon, but is rather done to raise awareness about the change, for future DT blob authors. Cc: Rob Herring <robh+dt@kernel.org> Cc: Frank Rowand <frowand.list@gmail.com> Acked-by: Alvin Šipraga <alsi@bang-olufsen.dk> # realtek Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-22 17:45:47 -07:00
Vladimir Oltean	770375ff33	net: dsa: rename dsa_port_link_{,un}register_of There is a subset of functions that applies only to shared (DSA and CPU) ports, yet this is difficult to comprehend by looking at their code alone. These are dsa_port_link_register_of(), dsa_port_link_unregister_of(), and the functions that only these 2 call. Rename this class of functions to dsa_shared_port_* to make this fact more evident, even if this goes against the apparent convention that function names in port.c must start with dsa_port_. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-22 17:45:47 -07:00
Vladimir Oltean	da2c398e59	net: dsa: avoid dsa_port_link_{,un}register_of() calls with platform data dsa_port_link_register_of() and dsa_port_link_unregister_of() are not written with the fact in mind that they can be called with a dp->dn that is NULL (as evidenced even by the _of suffix in their name), but this is exactly what happens. How this behaves will differ depending on whether the backing driver implements ->adjust_link() or not. If it doesn't, the "if (of_phy_is_fixed_link(dp->dn) \|\| phy_np)" condition will return false, and dsa_port_link_register_of() will do nothing and return 0. If the driver does implement ->adjust_link(), the "if (of_phy_is_fixed_link(dp->dn))" condition will return false (dp->dn is NULL) and we will call dsa_port_setup_phy_of(). This will call dsa_port_get_phy_device(), which will also return NULL, and we will also do nothing and return 0. It is hard to maintain this code and make future changes to it in this state, so just suppress calls to these 2 functions if dp->dn is NULL. The only functional effect is that if the driver does implement ->adjust_link(), we'll stop printing this to the console: Using legacy PHYLIB callbacks. Please migrate to PHYLINK! but instead we'll always print: [ 8.539848] dsa-loop fixed-0:1f: skipping link registration for CPU port 5 This is for the better anyway, since "using legacy phylib callbacks" was misleading information - we weren't issuing _any_ callbacks due to dsa_port_get_phy_device() returning NULL. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-22 17:45:47 -07:00
Vladimir Oltean	211987f3ac	net: dsa: don't warn in dsa_port_set_state_now() when driver doesn't support it ds->ops->port_stp_state_set() is, like most DSA methods, optional, and if absent, the port is supposed to remain in the forwarding state (as standalone). Such is the case with the mv88e6060 driver, which does not offload the bridge layer. DSA warns that the STP state can't be changed to FORWARDING as part of dsa_port_enable_rt(), when in fact it should not. The error message is also not up to modern standards, so take the opportunity to make it more descriptive. Fixes: `fd36454131` ("net: dsa: change scope of STP state setter") Reported-by: Sergei Antonov <saproj@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Sergei Antonov <saproj@gmail.com> Link: https://lore.kernel.org/r/20220816201445.1809483-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-17 21:58:22 -07:00
Xie Shaowen	062cf5ebc2	net: dsa: Fix spelling mistakes and cleanup code fix follow spelling misktakes: desconstructed ==> deconstructed enforcment ==> enforcement Reported-by: Hacash Robot <hacashRobot@santino.com> Signed-off-by: Xie Shaowen <studentxswpy@163.com> Link: https://lore.kernel.org/r/20220730092254.3102875-1-studentxswpy@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-01 12:23:06 -07:00
Jakub Kicinski	272ac32f56	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 18:21:16 -07:00
Vladimir Oltean	c7560d1203	net: dsa: fix reference counting for LAG FDBs Due to an invalid conflict resolution on my side while working on 2 different series (LAG FDBs and FDB isolation), dsa_switch_do_lag_fdb_add() does not store the database associated with a dsa_mac_addr structure. So after adding an FDB entry associated with a LAG, dsa_mac_addr_find() fails to find it while deleting it, because &a->db is zeroized memory for all stored FDB entries of lag->fdbs, and dsa_switch_do_lag_fdb_del() returns -ENOENT rather than deleting the entry. Fixes: `c26933639b` ("net: dsa: request drivers to perform FDB isolation") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220723012411.1125066-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-25 19:37:06 -07:00
Jakub Kicinski	6e0e846ee2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-21 13:03:39 -07:00
Vladimir Oltean	1699b4d502	net: dsa: fix NULL pointer dereference in dsa_port_reset_vlan_filtering The "ds" iterator variable used in dsa_port_reset_vlan_filtering() -> dsa_switch_for_each_port() overwrites the "dp" received as argument, which is later used to call dsa_port_vlan_filtering() proper. As a result, switches which do enter that code path (the ones with vlan_filtering_is_global=true) will dereference an invalid dp in dsa_port_reset_vlan_filtering() after leaving a VLAN-aware bridge. Use a dedicated "other_dp" iterator variable to avoid this from happening. Fixes: `d0004a020b` ("net: dsa: remove the "dsa_to_port in a loop" antipattern from the core") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 20:14:23 -07:00
Vladimir Oltean	4db2a5ef4c	net: dsa: fix dsa_port_vlan_filtering when global The blamed refactoring commit changed a "port" iterator with "other_dp", but still looked at the slave_dev of the dp outside the loop, instead of other_dp->slave from the loop. As a result, dsa_port_vlan_filtering() would not call dsa_slave_manage_vlan_filtering() except for the port in cause, and not for all switch ports as expected. Fixes: `d0004a020b` ("net: dsa: remove the "dsa_to_port in a loop" antipattern from the core") Reported-by: Lucian Banu <Lucian.Banu@westermo.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 20:14:23 -07:00
Prasanna Vengateshan	092f875131	net: dsa: tag_ksz: add tag handling for Microchip LAN937x The Microchip LAN937X switches have a tagging protocol which is very similar to KSZ tagging. So that the implementation is added to tag_ksz.c and reused common APIs Signed-off-by: Prasanna Vengateshan <prasanna.vengateshan@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-02 16:34:05 +01:00
Oleksij Rempel	3d410403a5	net: dsa: add get_pause_stats support Add support for pause stats Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-29 20:17:11 -07:00
Clément Léger	a08d6a6dc8	net: dsa: add Renesas RZ/N1 switch tag driver The switch that is present on the Renesas RZ/N1 SoC uses a specific VLAN value followed by 6 bytes which contains forwarding configuration. Signed-off-by: Clément Léger <clement.leger@bootlin.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-27 11:37:55 +01:00
Clément Léger	67f38b1c73	net: dsa: add support for ethtool get_rmon_stats() Add support to allow dsa drivers to specify the .get_rmon_stats() operation. Signed-off-by: Clément Léger <clement.leger@bootlin.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-27 11:37:55 +01:00
Clément Léger	1c6e8088d9	net: dsa: allow port_bridge_join() to override extack message Some drivers might report that they are unable to bridge ports by returning -EOPNOTSUPP, but still wants to override extack message. In order to do so, in dsa_slave_changeupper(), if port_bridge_join() returns -EOPNOTSUPP, check if extack message is set and if so, do not override it. Signed-off-by: Clément Léger <clement.leger@bootlin.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-06-27 11:37:54 +01:00
Eric Dumazet	9962acefbc	net: adopt u64_stats_t in struct pcpu_sw_netstats As explained in commit `316580b69d` ("u64_stats: provide u64_stats_t type") we should use u64_stats_t and related accessors to avoid load/store tearing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-09 21:53:11 -07:00
Luiz Angelo Daros de Luca	fe7324b932	net: dsa: OF-ware slave_mii_bus If found, register the DSA internally allocated slave_mii_bus with an OF "mdio" child object. It can save some drivers from creating their custom internal MDIO bus. Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-05-23 12:27:53 +01:00
Vladimir Oltean	bacf93b056	net: dsa: remove port argument from ->change_tag_protocol() DSA has not supported (and probably will not support in the future either) independent tagging protocols per CPU port. Different switch drivers have different requirements, some may need to replicate some settings for each CPU port, some may need to apply some settings on a single CPU port, while some may have to configure some global settings and then some per-CPU-port settings. In any case, the current model where DSA calls ->change_tag_protocol for each CPU port turns out to be impractical for drivers where there are global things to be done. For example, felix calls dsa_tag_8021q_register(), which makes no sense per CPU port, so it suppresses the second call. Let drivers deal with replication towards all CPU ports, and remove the CPU port argument from the function prototype. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-12 16:38:55 -07:00
Vladimir Oltean	72c3b0c735	net: dsa: felix: manage host flooding using a specific driver callback At the time - commit `7569459a52` ("net: dsa: manage flooding on the CPU ports") - not introducing a dedicated switch callback for host flooding made sense, because for the only user, the felix driver, there was nothing different to do for the CPU port than set the flood flags on the CPU port just like on any other bridge port. There are 2 reasons why this approach is not good enough, however. (1) Other drivers, like sja1105, support configuring flooding as a function of {ingress port, egress port}, whereas the DSA ->port_bridge_flags() function only operates on an egress port. So with that driver we'd have useless host flooding from user ports which don't need it. (2) Even with the felix driver, support for multiple CPU ports makes it difficult to piggyback on ->port_bridge_flags(). The way in which the felix driver is going to support host-filtered addresses with multiple CPU ports is that it will direct these addresses towards both CPU ports (in a sort of multicast fashion), then restrict the forwarding to only one of the two using the forwarding masks. Consequently, flooding will also be enabled towards both CPU ports. However, ->port_bridge_flags() gets passed the index of a single CPU port, and that leaves the flood settings out of sync between the 2 CPU ports. This is to say, it's better to have a specific driver method for host flooding, which takes the user port as argument. This solves problem (1) by allowing the driver to do different things for different user ports, and problem (2) by abstracting the operation and letting the driver do whatever, rather than explicitly making the DSA core point to the CPU port it thinks needs to be touched. This new method also creates a problem, which is that cross-chip setups are not handled. However I don't have hardware right now where I can test what is the proper thing to do, and there isn't hardware compatible with multi-switch trees that supports host flooding. So it remains a problem to be tackled in the future. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-12 16:38:55 -07:00
Jakub Kicinski	9b19e57a3c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Build issue in drivers/net/ethernet/sfc/ptp.c `54fccfdd7c` ("sfc: efx_default_channel_type APIs can be static") `49e6123c65` ("net: sfc: fix memory leak due to ptp channel") https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-12 16:15:30 -07:00
Vladimir Oltean	630fd4822a	net: dsa: flush switchdev workqueue on bridge join error path There is a race between switchdev_bridge_port_offload() and the dsa_port_switchdev_sync_attrs() call right below it. When switchdev_bridge_port_offload() finishes, FDB entries have been replayed by the bridge, but are scheduled for deferred execution later. However dsa_port_switchdev_sync_attrs -> dsa_port_can_apply_vlan_filtering() may impose restrictions on the vlan_filtering attribute and refuse offloading. When this happens, the delayed FDB entries will dereference dp->bridge, which is a NULL pointer because we have stopped the process of offloading this bridge. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Workqueue: dsa_ordered dsa_slave_switchdev_event_work pc : dsa_port_bridge_host_fdb_del+0x64/0x100 lr : dsa_slave_switchdev_event_work+0x130/0x1bc Call trace: dsa_port_bridge_host_fdb_del+0x64/0x100 dsa_slave_switchdev_event_work+0x130/0x1bc process_one_work+0x294/0x670 worker_thread+0x80/0x460 ---[ end trace 0000000000000000 ]--- Error: dsa_core: Must first remove VLAN uppers having VIDs also present in bridge. Fix the bug by doing what we do on the normal bridge leave path as well, which is to wait until the deferred FDB entries complete executing, then exit. The placement of dsa_flush_workqueue() after switchdev_bridge_port_unoffload() guarantees that both the FDB additions and deletions on rollback are waited for. Fixes: `d7d0d423db` ("net: dsa: flush switchdev workqueue when leaving the bridge") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220507134550.1849834-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-09 18:08:04 -07:00
Vladimir Oltean	fe5233b0ba	net: dsa: delete dsa_port_walk_{fdbs,mdbs} All the users of these functions are gone, delete them before they gain new ones. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-05-06 21:00:12 -07:00
Jakub Kicinski	0e55546b18	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net include/linux/netdevice.h net/core/dev.c `6510ea973d` ("net: Use this_cpu_inc() to increment net->core_stats") `794c24e992` ("net-core: rx_otherhost_dropped to core_stats") https://lore.kernel.org/all/20220428111903.5f4304e0@canb.auug.org.au/ drivers/net/wan/cosa.c `d48fea8401` ("net: cosa: fix error check return value of register_chrdev()") `89fbca3307` ("net: wan: remove support for COSA and SRP synchronous serial boards") https://lore.kernel.org/all/20220428112130.1f689e5e@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-04-28 13:02:01 -07:00
Marcin Wojtas	df1cc21152	net: dsa: remove unused headers Reduce a number of included headers to a necessary minimum. Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-25 12:08:40 +01:00
Vladimir Oltean	7c762e70c5	net: dsa: flood multicast to CPU when slave has IFF_PROMISC Certain DSA switches can eliminate flooding to the CPU when none of the ports have the IFF_ALLMULTI or IFF_PROMISC flags set. This is done by synthesizing a call to dsa_port_bridge_flags() for the CPU port, a call which normally comes from the bridge driver via switchdev. The bridge port flags and IFF_PROMISC\|IFF_ALLMULTI have slightly different semantics, and due to inattention/lack of proper testing, the IFF_PROMISC flag allows unknown unicast to be flooded to the CPU, but not unknown multicast. This must be fixed by setting both BR_FLOOD (unicast) and BR_MCAST_FLOOD in the synthesized dsa_port_bridge_flags() call, since IFF_PROMISC means that packets should not be filtered regardless of their MAC DA. Fixes: `7569459a52` ("net: dsa: manage flooding on the CPU ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-25 11:46:24 +01:00
Miaoqian Lin	fc06b2867f	net: dsa: Add missing of_node_put() in dsa_port_link_register_of The device_node pointer is returned by of_parse_phandle() with refcount incremented. We should use of_node_put() on it when done. of_node_put() will check for NULL value. Fixes: `a20f997010` ("net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-22 13:06:58 +01:00
Paolo Abeni	f70925bf99	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net drivers/net/ethernet/microchip/lan966x/lan966x_main.c `d08ed85256` ("net: lan966x: Make sure to release ptp interrupt") `c834963932` ("net: lan966x: Add FDMA functionality") Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-04-22 09:56:00 +02:00
Vladimir Oltean	be6ff9665d	net: dsa: don't emit targeted cross-chip notifiers for MTU change A cross-chip notifier with "targeted_match=true" is one that matches only the local port of the switch that emitted it. In other words, passing through the cross-chip notifier layer serves no purpose. Eliminate this concept by calling directly ds->ops->port_change_mtu instead of emitting a targeted cross-chip notifier. This leaves the DSA_NOTIFIER_MTU event being emitted only for MTU updates on the CPU port, which need to be reflected also across all DSA links. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-20 10:34:34 +01:00
Vladimir Oltean	4715029fa7	net: dsa: drop dsa_slave_priv from dsa_slave_change_mtu We can get a hold of the "ds" pointer directly from "dp", no need for the dsa_slave_priv. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-20 10:34:34 +01:00
Vladimir Oltean	cf1c39d3b3	net: dsa: avoid one dsa_to_port() in dsa_slave_change_mtu We could retrieve the cpu_dp pointer directly from the "dp" we already have, no need to resort to dsa_to_port(ds, port). This change also removes the need for an "int port", so that is also deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-20 10:34:34 +01:00
Vladimir Oltean	b2033a05a7	net: dsa: use dsa_tree_for_each_user_port in dsa_slave_change_mtu Use the more conventional iterator over user ports instead of explicitly ignoring them, and use the more conventional name "other_dp" instead of "dp_iter", for readability. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-20 10:34:34 +01:00
Vladimir Oltean	726816a129	net: dsa: make cross-chip notifiers more efficient for host events To determine whether a given port should react to the port targeted by the notifier, dsa_port_host_vlan_match() and dsa_port_host_address_match() look at the positioning of the switch port currently executing the notifier relative to the switch port for which the notifier was emitted. To maintain stylistic compatibility with the other match functions from switch.c, the host address and host VLAN match functions take the notifier information about targeted port, switch and tree indices as argument. However, these functions only use that information to retrieve the struct dsa_port *targeted_dp, which is an invariant for the outer loop that calls them. So it makes more sense to calculate the targeted dp only once, and pass it to them as argument. But furthermore, the targeted dp is actually known at the time the call to dsa_port_notify() is made. It is just that we decide to only save the indices of the port, switch and tree in the notifier structure, just to retrace our steps and find the dp again using dsa_switch_find() and dsa_to_port(). But both the above functions are relatively expensive, since they need to iterate through lists. It appears more straightforward to make all notifiers just pass the targeted dp inside their info structure, and have the code that needs the indices to look at info->dp->index instead of info->port, or info->dp->ds->index instead of info->sw_index, or info->dp->ds->dst->index instead of info->tree_index. For the sake of consistency, all cross-chip notifiers are converted to pass the "dp" directly. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-20 10:34:34 +01:00
Vladimir Oltean	8e9e678e47	net: dsa: move reset of VLAN filtering to dsa_port_switchdev_unsync_attrs In dsa_port_switchdev_unsync_attrs() there is a comment that resetting the VLAN filtering isn't done where it is expected. And since commit `108dc8741c` ("net: dsa: Avoid cross-chip syncing of VLAN filtering"), there is no reason to handle this in switch.c either. Therefore, move the logic to port.c, and adapt it slightly to the data structures and naming conventions from there. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-20 10:34:34 +01:00
Kurt Kanzenbach	0763120b09	net: dsa: hellcreek: Calculate checksums in tagger In case the checksum calculation is offloaded to the DSA master network interface, it will include the switch trailing tag. As soon as the switch strips that tag on egress, the calculated checksum is wrong. Therefore, add the checksum calculation to the tagger (if required) before adding the switch tag. This way, the hellcreek code works with all DSA master interfaces regardless of their declared feature set. Fixes: `01ef09caad` ("net: dsa: Add tag handling for Hirschmann Hellcreek switches") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220415103320.90657-1-kurt@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-04-19 09:49:38 +02:00
Vladimir Oltean	762c2998c9	Revert "net: dsa: setup master before ports" This reverts commit `11fd667dac`. dsa_slave_change_mtu() updates the MTU of the DSA master and of the associated CPU port, but only if it detects a change to the master MTU. The blamed commit in the Fixes: tag below addressed a regression where dsa_slave_change_mtu() would return early and not do anything due to ds->ops->port_change_mtu() not being implemented. However, that commit also had the effect that the master MTU got set up to the correct value by dsa_master_setup(), but the associated CPU port's MTU did not get updated. This causes breakage for drivers that rely on the ->port_change_mtu() DSA call to account for the tagging overhead on the CPU port, and don't set up the initial MTU during the setup phase. Things actually worked before because they were in a fragile equilibrium where dsa_slave_change_mtu() was called before dsa_master_setup() was. So dsa_slave_change_mtu() could actually detect a change and update the CPU port MTU too. Restore the code to the way things used to work by reverting the reorder of dsa_tree_setup_master() and dsa_tree_setup_ports(). That change did not have a concrete motivation going for it anyway, it just looked better. Fixes: `066dfc4290` ("Revert "net: dsa: stop updating master MTU from master.c"") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-13 12:38:06 +01:00
Vladimir Oltean	066dfc4290	Revert "net: dsa: stop updating master MTU from master.c" This reverts commit `a1ff94c297`. Switch drivers that don't implement ->port_change_mtu() will cause the DSA master to remain with an MTU of 1500, since we've deleted the other code path. In turn, this causes a regression for those systems, where MTU-sized traffic can no longer be terminated. Revert the change taking into account the fact that rtnl_lock() is now taken top-level from the callers of dsa_master_setup() and dsa_master_teardown(). Also add a comment in order for it to be absolutely clear why it is still needed. Fixes: `a1ff94c297` ("net: dsa: stop updating master MTU from master.c") Reported-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-01 11:59:01 +01:00
Jakub Kicinski	89695196f0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Merge in overtime fixes, no conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-23 10:53:49 -07:00
Vladimir Oltean	5077e2c8cf	net: dsa: fix missing host-filtered multicast addresses DSA ports are stacked devices, so they use dev_mc_add() to sync their address list to their lower interface (DSA master). But they are also hardware devices, so they program those addresses to hardware using the __dev_mc_add() sync and unsync callbacks. Unfortunately both cannot work at the same time, and it seems that the multicast addresses which are already present on the DSA master, like 33:33:00:00:00:01 (added by addrconf.c as in6addr_linklocal_allnodes) are synced to the master via dev_mc_sync(), but not to hardware by __dev_mc_sync(). This happens because both the dev_mc_sync() -> __hw_addr_sync_one() code path, as well as __dev_mc_sync() -> __hw_addr_sync_dev(), operate on the same variable: ha->sync_cnt, in a way that causes the "sync" method (dsa_slave_sync_mc) to no longer be called. To fix the issue we need to work with the API in the way in which it was intended to be used, and therefore, call dev_uc_add() and friends for each individual hardware address, from the sync and unsync callbacks. Fixes: `5e8a1e03aa` ("net: dsa: install secondary unicast and multicast addresses as host FDB/MDB") Link: https://lore.kernel.org/netdev/20220321163213.lrn5sk7m6grighbl@skbuf/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220322003701.2056895-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-22 22:19:35 -07:00
Vladimir Oltean	8fd36358ce	net: dsa: fix panic on shutdown if multi-chip tree failed to probe DSA probing is atypical because a tree of devices must probe all at once, so out of N switches which call dsa_tree_setup_routing_table() during probe, for (N - 1) of them, "complete" will return false and they will exit probing early. The Nth switch will set up the whole tree on their behalf. The implication is that for (N - 1) switches, the driver binds to the device successfully, without doing anything. When the driver is bound, the ->shutdown() method may run. But if the Nth switch has failed to initialize the tree, there is nothing to do for the (N - 1) driver instances, since the slave devices have not been created, etc. Moreover, dsa_switch_shutdown() expects that the calling @ds has been in fact initialized, so it jumps at dereferencing the various data structures, which is incorrect. Avoid the ensuing NULL pointer dereferences by simply checking whether the Nth switch has previously set "ds->setup = true" for the switch which is currently shutting down. The entire setup is serialized under dsa2_mutex which we already hold. Fixes: `0650bf52b3` ("net: dsa: be compatible with masters which unregister on shutdown") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220318195443.275026-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-21 22:31:28 -07:00
Vladimir Oltean	0148bb50b8	net: dsa: pass extack to dsa_switch_ops :: port_mirror_add() Drivers might have error messages to propagate to user space, most common being that they support a single mirror port. Propagate the netlink extack so that they can inform user space in a verbal way of their limitations. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-17 17:42:47 -07:00
Tobias Waldekranz	7414af30b7	net: dsa: Handle MST state changes Add the usual trampoline functionality from the generic DSA layer down to the drivers for MST state changes. When a state changes to disabled/blocking/listening, make sure to fast age any dynamic entries in the affected VLANs (those controlled by the MSTI in question). Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-17 16:49:59 -07:00
Tobias Waldekranz	8e6598a7b0	net: dsa: Pass VLAN MSTI migration notifications to driver Add the usual trampoline functionality from the generic DSA layer down to the drivers for VLAN MSTI migrations. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-17 16:49:59 -07:00
Tobias Waldekranz	332afc4c8c	net: dsa: Validate hardware support for MST When joining a bridge where MST is enabled, we validate that the proper offloading support is in place, otherwise we fallback to software bridging. When then mode is changed on a bridge in which we are members, we refuse the change if offloading is not supported. At the moment we only check for configurable learning, but this will be further restricted as we support more MST related switchdev events. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-17 16:49:58 -07:00
Jakub Kicinski	e243f39685	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-17 13:56:58 -07:00
Miaoqian Lin	cb0b430b4e	net: dsa: Add missing of_node_put() in dsa_port_parse_of The device_node pointer is returned by of_parse_phandle() with refcount incremented. We should use of_node_put() on it when done. Fixes: `6d4e5c570c` ("net: dsa: get port type at parse time") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20220316082602.10785-1-linmq006@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-03-17 13:13:27 +01:00
Tobias Waldekranz	a860352e9d	net: dsa: Never offload FDB entries on standalone ports If a port joins a bridge that it can't offload, it will fallback to standalone mode and software bridging. In this case, we never want to offload any FDB entries to hardware either. Previously, for host addresses, we would eventually end up in dsa_port_bridge_host_fdb_add, which would unconditionally dereference dp->bridge and cause a segfault. Fixes: `c26933639b` ("net: dsa: request drivers to perform FDB isolation") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220315233033.1468071-1-tobias@waldekranz.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-16 19:36:56 -07:00
Vladimir Oltean	47d75f7822	net: dsa: report and change port dscp priority using dcbnl Similar to the port-based default priority, IEEE 802.1Q-2018 allows the Application Priority Table to define QoS classes (0 to 7) per IP DSCP value (0 to 63). In the absence of an app table entry for a packet with DSCP value X, QoS classification for that packet falls back to other methods (VLAN PCP or port-based default). The presence of an app table for DSCP value X with priority Y makes the hardware classify the packet to QoS class Y. As opposed to the default-prio where DSA exposes only a "set" in dsa_switch_ops (because the port-based default is the fallback, it always exists, either implicitly or explicitly), for DSCP priorities we expose an "add" and a "del". The addition of a DSCP entry means trusting that DSCP priority, the deletion means ignoring it. Drivers that already trust (at least some) DSCP values can describe their configuration in dsa_switch_ops :: port_get_dscp_prio(), which is called for each DSCP value from 0 to 63. Again, there can be more than one dcbnl app table entry for the same DSCP value, DSA chooses the one with the largest configured priority. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-14 10:36:15 +00:00
Vladimir Oltean	d538eca85c	net: dsa: report and change port default priority using dcbnl The port-based default QoS class is assigned to packets that lack a VLAN PCP (or the port is configured to not trust the VLAN PCP), an IP DSCP (or the port is configured to not trust IP DSCP), and packets on which no tc-skbedit action has matched. Similar to other drivers, this can be exposed to user space using the DCB Application Priority Table. IEEE 802.1Q-2018 specifies in Table D-8 - Sel field values that when the Selector is 1, the Protocol ID value of 0 denotes the "Default application priority. For use when application priority is not otherwise specified." The way in which the dcbnl integration in DSA has been designed has to do with its requirements. Andrew Lunn explains that SOHO switches are expected to come with some sort of pre-configured QoS profile, and that it is desirable for this to come pre-loaded into the DSA slave interfaces' DCB application priority table. In the dcbnl design, this is possible because calls to dcb_ieee_setapp() can be initiated by anyone including being self-initiated by this device driver. However, what makes this challenging to implement in DSA is that the DSA core manages the net_devices (effectively hiding them from drivers), while drivers manage the hardware. The DSA core has no knowledge of what individual drivers' QoS policies are. DSA could export to drivers a wrapper over dcb_ieee_setapp() and these could call that function to pre-populate the app priority table, however drivers don't have a good moment in time to do this. The dsa_switch_ops :: setup() method gets called before the net_devices are created (dsa_slave_create), and so is dsa_switch_ops :: port_setup(). What remains is dsa_switch_ops :: port_enable(), but this gets called upon each ndo_open. If we add app table entries on every open, we'd need to remove them on close, to avoid duplicate entry errors. But if we delete app priority entries on close, what we delete may not be the initial, driver pre-populated entries, but rather user-added entries. So it is clear that letting drivers choose the timing of the dcb_ieee_setapp() call is inappropriate. The alternative which was chosen is to introduce hardware-specific ops in dsa_switch_ops, and effectively hide dcbnl details from drivers as well. For pre-populating the application table, dsa_slave_dcbnl_init() will call ds->ops->port_get_default_prio() which is supposed to read from hardware. If the operation succeeds, DSA creates a default-prio app table entry. The method is called as soon as the slave_dev is registered, but before we release the rtnl_mutex. This is done such that user space sees the app table entries as soon as it sees the interface being registered. The fact that we populate slave_dev->dcbnl_ops with a non-NULL pointer changes behavior in dcb_doit() from net/dcb/dcbnl.c, which used to return -EOPNOTSUPP for any dcbnl operation where netdev->dcbnl_ops is NULL. Because there are still dcbnl-unaware DSA drivers even if they have dcbnl_ops populated, the way to restore the behavior is to make all dcbnl_ops return -EOPNOTSUPP on absence of the hardware-specific dsa_switch_ops method. The dcbnl framework absurdly allows there to be more than one app table entry for the same selector and protocol (in other words, more than one port-based default priority). In the iproute2 dcb program, there is a "replace" syntactical sugar command which performs an "add" and a "del" to hide this away. But we choose the largest configured priority when we call ds->ops->port_set_default_prio(), using __fls(). When there is no default-prio app table entry left, the port-default priority is restored to 0. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210113154139.1803705-2-olteanv@gmail.com/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-14 10:36:15 +00:00
Jakub Kicinski	1e8a3f0d2a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net net/dsa/dsa2.c commit `afb3cc1a39` ("net: dsa: unlock the rtnl_mutex when dsa_master_setup() fails") commit `e83d565378` ("net: dsa: replay master state events in dsa_tree_{setup,teardown}_master") https://lore.kernel.org/all/20220307101436.7ae87da0@canb.auug.org.au/ drivers/net/ethernet/intel/ice/ice.h commit `97b0129146` ("ice: Fix error with handling of bonding MTU") commit `43113ff734` ("ice: add TTY for GNSS module for E810T device") https://lore.kernel.org/all/20220310112843.3233bcf1@canb.auug.org.au/ drivers/staging/gdm724x/gdm_lte.c commit `fc7f750dc9` ("staging: gdm724x: fix use after free in gdm_lte_rx()") commit `4bcc4249b4` ("staging: Use netif_rx().") https://lore.kernel.org/all/20220308111043.1018a59d@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-10 17:16:56 -08:00
Luiz Angelo Daros de Luca	3126b731ce	net: dsa: tag_rtl8_4: fix typo in modalias name DSA_TAG_PROTO_RTL8_4L is not defined. It should be DSA_TAG_PROTO_RTL8_4T. Fixes: `cd87fecded` ("net: dsa: tag_rtl8_4: add rtl8_4t trailing variant") Reported-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220309175641.12943-1-luizluca@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-09 20:36:24 -08:00
Vladimir Oltean	7e580490ac	net: dsa: felix: avoid early deletion of host FDB entries The Felix driver declares FDB isolation but puts all standalone ports in VID 0. This is mostly problem-free as discussed with Alvin here: https://patchwork.kernel.org/project/netdevbpf/cover/20220302191417.1288145-1-vladimir.oltean@nxp.com/#24763870 however there is one catch. DSA still thinks that FDB entries are installed on the CPU port as many times as there are user ports, and this is problematic when multiple user ports share the same MAC address. Consider the default case where all user ports inherit their MAC address from the DSA master, and then the user runs: ip link set swp0 address 00:01:02:03:04:05 The above will make dsa_slave_set_mac_address() call dsa_port_standalone_host_fdb_add() for 00:01:02:03:04:05 in port 0's standalone database, and dsa_port_standalone_host_fdb_del() for the old address of swp0, again in swp0's standalone database. Both the ->port_fdb_add() and ->port_fdb_del() will be propagated down to the felix driver, which will end up deleting the old MAC address from the CPU port. But this is still in use by other user ports, so we end up breaking unicast termination for them. There isn't a problem in the fact that DSA keeps track of host standalone addresses in the individual database of each user port: some drivers like sja1105 need this. There also isn't a problem in the fact that some drivers choose the same VID/FID for all standalone ports. It is just that the deletion of these host addresses must be delayed until they are known to not be in use any longer, and only the driver has this knowledge. Since DSA keeps these addresses in &cpu_dp->fdbs and &cpu_db->mdbs, it is just a matter of walking over those lists and see whether the same MAC address is present on the CPU port in the port db of another user port. I have considered reusing the generic dsa_port_walk_fdbs() and dsa_port_walk_mdbs() schemes for this, but locking makes it difficult. In the ->port_fdb_add() method and co, &dp->addr_lists_lock is held, but dsa_port_walk_fdbs() also acquires that lock. Also, even assuming that we introduce an unlocked variant of the address iterator, we'd still need some relatively complex data structures, and a void *ctx in the dsa_fdb_walk_cb_t which we don't currently pass, such that drivers are able to figure out, after iterating, whether the same MAC address is or isn't present in the port db of another port. All the above, plus the fact that I expect other drivers to follow the same model as felix where all standalone ports use the same FID, made me conclude that a generic method provided by DSA is necessary: dsa_fdb_present_in_other_db() and the mdb equivalent. Felix calls this from the ->port_fdb_del() handler for the CPU port, when the database was classified to either a port db, or a LAG db. For symmetry, we also call this from ->port_fdb_add(), because if the address was installed once, then installing it a second time serves no purpose: it's already in hardware in VID 0 and it affects all standalone ports. This change moves dsa_db_equal() from switch.c to dsa.c, since it now has one more caller. Fixes: `54c3198460` ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-09 11:12:10 +00:00
Vladimir Oltean	e2d0576f0c	net: dsa: be mostly no-op in dsa_slave_set_mac_address when down Since the slave unicast address is synced to hardware and to the DSA master during dsa_slave_open(), this means that a call to dsa_slave_set_mac_address() while the slave interface is down will result to a call to dsa_port_standalone_host_fdb_del() and to dev_uc_del() for the MAC address while there was no previous dsa_port_standalone_host_fdb_add() or dev_uc_add(). This is a partial revert of the blamed commit below, which was too aggressive. Fixes: `35aae5ab91` ("net: dsa: remove workarounds for changing master promisc/allmulti only while up") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-09 11:12:09 +00:00
Vladimir Oltean	fe95784fb1	net: dsa: move port lists initialization to dsa_port_touch &cpu_db->fdbs and &cpu_db->mdbs may be uninitialized lists during some call paths of felix_set_tag_protocol(). There was an attempt to avoid calling dsa_port_walk_fdbs() during setup by using a "bool change" in the felix driver, but this doesn't work when the tagging protocol is defined in the device tree, and a change is triggered by DSA at pseudo-runtime: dsa_tree_setup_switches -> dsa_switch_setup -> dsa_switch_setup_tag_protocol -> ds->ops->change_tag_protocol dsa_tree_setup_ports -> dsa_port_setup -> &dp->fdbs and &db->mdbs only get initialized here So it seems like the only way to fix this is to move the initialization of these lists earlier. dsa_port_touch() is called from dsa_switch_touch_ports() which is called from dsa_switch_parse_of(), and this runs completely before dsa_tree_setup(). Similarly, dsa_switch_release_ports() runs after dsa_tree_teardown(). Fixes: `f9cef64fa2` ("net: dsa: felix: migrate host FDB and MDB entries when changing tag proto") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-09 11:12:09 +00:00
Vladimir Oltean	0832cd9f1f	net: dsa: warn if port lists aren't empty in dsa_port_teardown There has been recent work towards matching each switchdev object addition with a corresponding deletion. Therefore, having elements in the fdbs, mdbs, vlans lists at the time of a shared (DSA, CPU) port's teardown is indicative of a bug somewhere else, and not something that is to be expected. We shouldn't try to silently paper over that. Instead, print a warning and a stack trace. This change is a prerequisite for moving the initialization/teardown of these lists. Make it clear that clearing the lists isn't needed. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-09 11:12:09 +00:00
Tobias Waldekranz	6c43a920a5	net: dsa: tag_dsa: Fix tx from VLAN uppers on non-filtering bridges In this situation (VLAN filtering disabled on br0): br0.10 / br0 / \ swp0 swp1 When a frame is transmitted from the VLAN upper, the bridge will send it down to one of the switch ports with forward offloading enabled. This will cause tag_dsa to generate a FORWARD tag. Before this change, that tag would have it's VID set to 10, even though VID 10 is not loaded in the VTU. Before the blamed commit, the frame would trigger a VTU miss and be forwarded according to the PVT configuration. Now that all fabric ports are in 802.1Q secure mode, the frame is dropped instead. Therefore, restrict the condition under which we rewrite an 802.1Q tag to a DSA tag. On standalone port's, reuse is always safe since we will always generate FROM_CPU tags in that case. For bridged ports though, we must ensure that VLAN filtering is enabled, which in turn guarantees that the VID in question is loaded into the VTU. Fixes: `d352b20f41` ("net: dsa: mv88e6xxx: Improve multichip isolation of standalone ports") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Tested-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220307110548.812455-1-tobias@waldekranz.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-03-08 11:12:28 +01:00
Tom Rix	cd5169841c	net: dsa: return success if there was nothing to do Clang static analysis reports this representative issue dsa.c:486:2: warning: Undefined or garbage value returned to caller return err; ^~~~~~~~~~ err is only set in the loop. If the loop is empty, garbage will be returned. So initialize err to 0 to handle this noop case. Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-07 12:25:59 +00:00
Vladimir Oltean	afb3cc1a39	net: dsa: unlock the rtnl_mutex when dsa_master_setup() fails After the blamed commit, dsa_tree_setup_master() may exit without calling rtnl_unlock(), fix that. Fixes: `c146f9bc19` ("net: dsa: hold rtnl_mutex when calling dsa_master_{setup,teardown}") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-06 10:55:54 +00:00
Luiz Angelo Daros de Luca	cd87fecded	net: dsa: tag_rtl8_4: add rtl8_4t trailing variant Realtek switches supports the same tag both before ethertype or between payload and the CRC. Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-05 11:04:25 +00:00
Jakub Kicinski	80901bff81	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net net/batman-adv/hard-interface.c commit `690bb6fb64` ("batman-adv: Request iflink once in batadv-on-batadv check") commit `6ee3c393ee` ("batman-adv: Demote batadv-on-batadv skip error message") https://lore.kernel.org/all/20220302163049.101957-1-sw@simonwunderlich.de/ net/smc/af_smc.c commit `4d08b7b57e` ("net/smc: Fix cleanup when register ULP fails") commit `462791bbfa` ("net/smc: add sysctl interface for SMC") https://lore.kernel.org/all/20220302112209.355def40@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-03 11:55:12 -08:00
Vladimir Oltean	e1bec7fa1c	net: dsa: make dsa_tree_change_tag_proto actually unwind the tag proto change The blamed commit said one thing but did another. It explains that we should restore the "return err" to the original "goto out_unwind_tagger", but instead it replaced it with "goto out_unlock". When DSA_NOTIFIER_TAG_PROTO fails after the first switch of a multi-switch tree, the switches would end up not using the same tagging protocol. Fixes: `0b0e2ff103` ("net: dsa: restore error path of dsa_tree_change_tag_proto") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220303154249.1854436-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-03 08:39:12 -08:00
Vladimir Oltean	f9cef64fa2	net: dsa: felix: migrate host FDB and MDB entries when changing tag proto The "ocelot" and "ocelot-8021q" tagging protocols make use of different hardware resources, and host FDB entries have different destination ports in the switch analyzer module, practically speaking. So when the user requests a tagging protocol change, the driver must migrate all host FDB and MDB entries from the NPI port (in fact CPU port module) towards the same physical port, but this time used as a regular port. It is pointless for the felix driver to keep a copy of the host addresses, when we can create and export DSA helpers for walking through the addresses that it already needs to keep on the CPU port, for refcounting purposes. felix_classify_db() is moved up to avoid a forward declaration. We pass "bool change" because dp->fdbs and dp->mdbs are uninitialized lists when felix_setup() first calls felix_set_tag_protocol(), so we need to avoid calling dsa_port_walk_fdbs() during probe time. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 14:15:31 +00:00
Vladimir Oltean	7569459a52	net: dsa: manage flooding on the CPU ports DSA can treat IFF_PROMISC and IFF_ALLMULTI on standalone user ports as signifying whether packets with an unknown MAC DA will be received or not. Since known MAC DAs are handled by FDB/MDB entries, this means that promiscuity is analogous to including/excluding the CPU port from the flood domain of those packets. There are two ways to signal CPU flooding to drivers. The first (chosen here) is to synthesize a call to ds->ops->port_bridge_flags() for the CPU port, with a mask of BR_FLOOD \| BR_MCAST_FLOOD. This has the effect of turning on egress flooding on the CPU port regardless of source. The alternative would be to create a new ds->ops->port_host_flood() which is called per user port. Some switches (sja1105) have a flood domain that is managed per {ingress port, egress port} pair, so it would make more sense for this kind of switch to not flood the CPU from port A if just port B requires it. Nonetheless, the sja1105 has other quirks that prevent it from making use of unicast filtering, and without a concrete user making use of this feature, I chose not to implement it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 14:15:31 +00:00
Vladimir Oltean	499aa9e1b3	net: dsa: install the primary unicast MAC address as standalone port host FDB To be able to safely turn off CPU flooding for standalone ports, we need to ensure that the dev_addr of each DSA slave interface is installed as a standalone host FDB entry for compatible switches. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 14:15:31 +00:00
Vladimir Oltean	5e8a1e03aa	net: dsa: install secondary unicast and multicast addresses as host FDB/MDB In preparation of disabling flooding towards the CPU in standalone ports mode, identify the addresses requested by upper interfaces and use the new API for DSA FDB isolation to request the hardware driver to offload these as FDB or MDB objects. The objects belong to the user port's database, and are installed pointing towards the CPU port. Because dev_uc_add()/dev_mc_add() is VLAN-unaware, we offload to the port standalone database addresses with VID 0 (also VLAN-unaware). So this excludes switches with global VLAN filtering from supporting unicast filtering, because there, it is possible for a port of a switch to join a VLAN-aware bridge, and this changes the VLAN awareness of standalone ports, requiring VLAN-aware standalone host FDB entries. For the same reason, hellcreek, which requires VLAN awareness in standalone mode, is also exempted from unicast filtering. We create "standalone" variants of dsa_port_host_fdb_add() and dsa_port_host_mdb_add() (and the _del coresponding functions). We also create a separate work item type for handling deferred standalone host FDB/MDB entries compared to the switchdev one. This is done for the purpose of clarity - the procedure for offloading a bridge FDB entry is different than offloading a standalone one, and the switchdev event work handles only FDBs anyway, not MDBs. Deferral is needed for standalone entries because ndo_set_rx_mode runs in atomic context. We could probably optimize things a little by first queuing up all entries that need to be offloaded, and scheduling the work item just once, but the data structures that we can pass through __dev_uc_sync() and __dev_mc_sync() are limiting (there is nothing like a void *priv), so we'd have to keep the list of queued events somewhere in struct dsa_switch, and possibly a lock for it. Too complicated for now. Adding the address to the master is handled by dev_uc_sync(), adding it to the hardware is handled by __dev_uc_sync(). So this is the reason why dsa_port_standalone_host_fdb_add() does not call dev_uc_add(). Not that it had the rtnl_mutex anyway - ndo_set_rx_mode has it, but is atomic. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 14:15:31 +00:00
Vladimir Oltean	68d6d71eaf	net: dsa: rename the host FDB and MDB methods to contain the "bridge" namespace We are preparing to add API in port.c that adds FDB and MDB entries that correspond to the port's standalone database. Rename the existing methods to make it clear that the FDB and MDB entries offloaded come from the bridge database. Since the function names lengthen in dsa_slave_switchdev_event_work(), we place "addr" and "vid" in temporary variables, to shorten those. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 14:15:31 +00:00
Vladimir Oltean	35aae5ab91	net: dsa: remove workarounds for changing master promisc/allmulti only while up Lennert Buytenhek explains in commit `df02c6ff2e` ("dsa: fix master interface allmulti/promisc handling"), dated Nov 2008, that changing the promiscuity of interfaces that are down (here the master) is broken. This fact regarding promisc/allmulti has changed since commit `b6c40d68ff` ("net: only invoke dev->change_rx_flags when device is UP") by Vlad Yasevich, dated Nov 2013. Therefore, DSA now has unnecessary complexity to handle master state transitions from down to up. In fact, syncing the unicast and multicast addresses can happen completely asynchronously to the administrative state changes. This change reduces that complexity by effectively fully reverting commit `df02c6ff2e` ("dsa: fix master interface allmulti/promisc handling"). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 14:15:31 +00:00
Vladimir Oltean	0b0e2ff103	net: dsa: restore error path of dsa_tree_change_tag_proto When the DSA_NOTIFIER_TAG_PROTO returns an error, the user space process which initiated the protocol change exits the kernel processing while still holding the rtnl_mutex. So any other process attempting to lock the rtnl_mutex would deadlock after such event. The error handling of DSA_NOTIFIER_TAG_PROTO was inadvertently changed by the blamed commit, introducing this regression. We must still call rtnl_unlock(), and we must still call DSA_NOTIFIER_TAG_PROTO for the old protocol. The latter is due to the limiting design of notifier chains for cross-chip operations, which don't have a built-in error recovery mechanism - we should look into using notifier_call_chain_robust for that. Fixes: `dc452a471d` ("net: dsa: introduce tagger-owned storage for private and shared data") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220228141715.146485-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-01 18:26:21 -08:00
Vladimir Oltean	06b9cce426	net: dsa: pass extack to .port_bridge_join driver methods As FDB isolation cannot be enforced between VLAN-aware bridges in lack of hardware assistance like extra FID bits, it seems plausible that many DSA switches cannot do it. Therefore, they need to reject configurations with multiple VLAN-aware bridges from the two code paths that can transition towards that state: - joining a VLAN-aware bridge - toggling VLAN awareness on an existing bridge The .port_vlan_filtering method already propagates the netlink extack to the driver, let's propagate it from .port_bridge_join too, to make sure that the driver can use the same function for both. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-27 11:06:14 +00:00
Vladimir Oltean	c26933639b	net: dsa: request drivers to perform FDB isolation For DSA, to encourage drivers to perform FDB isolation simply means to track which bridge does each FDB and MDB entry belong to. It then becomes the driver responsibility to use something that makes the FDB entry from one bridge not match the FDB lookup of ports from other bridges. The top-level functions where the bridge is determined are: - dsa_port_fdb_{add,del} - dsa_port_host_fdb_{add,del} - dsa_port_mdb_{add,del} - dsa_port_host_mdb_{add,del} aka the pre-crosschip-notifier functions. Changing the API to pass a reference to a bridge is not superfluous, and looking at the passed bridge argument is not the same as having the driver look at dsa_to_port(ds, port)->bridge from the ->port_fdb_add() method. DSA installs FDB and MDB entries on shared (CPU and DSA) ports as well, and those do not have any dp->bridge information to retrieve, because they are not in any bridge - they are merely the pipes that serve the user ports that are in one or multiple bridges. The struct dsa_bridge associated with each FDB/MDB entry is encapsulated in a larger "struct dsa_db" database. Although only databases associated to bridges are notified for now, this API will be the starting point for implementing IFF_UNICAST_FLT in DSA. There, the idea is to install FDB entries on the CPU port which belong to the corresponding user port's port database. These are supposed to match only when the port is standalone. It is better to introduce the API in its expected final form than to introduce it for bridges first, then to have to change drivers which may have made one or more assumptions. Drivers can use the provided bridge.num, but they can also use a different numbering scheme that is more convenient. DSA must perform refcounting on the CPU and DSA ports by also taking into account the bridge number. So if two bridges request the same local address, DSA must notify the driver twice, once for each bridge. In fact, if the driver supports FDB isolation, DSA must perform refcounting per bridge, but if the driver doesn't, DSA must refcount host addresses across all bridges, otherwise it would be telling the driver to delete an FDB entry for a bridge and the driver would delete it for all bridges. So introduce a bool fdb_isolation in drivers which would make all bridge databases passed to the cross-chip notifier have the same number (0). This makes dsa_mac_addr_find() -> dsa_db_equal() say that all bridge databases are the same database - which is essentially the legacy behavior. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-27 11:06:14 +00:00
Vladimir Oltean	b6362bdf75	net: dsa: tag_8021q: rename dsa_8021q_bridge_tx_fwd_offload_vid The dsa_8021q_bridge_tx_fwd_offload_vid is no longer used just for bridge TX forwarding offload, it is the private VLAN reserved for VLAN-unaware bridging in a way that is compatible with FDB isolation. So just rename it dsa_tag_8021q_bridge_vid. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-27 11:06:14 +00:00
Vladimir Oltean	04b67e18ce	net: dsa: tag_8021q: merge RX and TX VLANs In the old Shared VLAN Learning mode of operation that tag_8021q previously used for forwarding, we needed to have distinct concepts for an RX and a TX VLAN. An RX VLAN could be installed on all ports that were members of a given bridge, so that autonomous forwarding could still work, while a TX VLAN was dedicated for precise packet steering, so it just contained the CPU port and one egress port. Now that tag_8021q uses Independent VLAN Learning and imprecise RX/TX all over, those lines have been blurred and we no longer have the need to do precise TX towards a port that is in a bridge. As for standalone ports, it is fine to use the same VLAN ID for both RX and TX. This patch changes the tag_8021q format by shifting the VLAN range it reserves, and halving it. Previously, our DIR bits were encoding the VLAN direction (RX/TX) and were set to either 1 or 2. This meant that tag_8021q reserved 2K VLANs, or 50% of the available range. Change the DIR bits to a hardcoded value of 3 now, which makes tag_8021q reserve only 1K VLANs, and a different range now (the last 1K). This is done so that we leave the old format in place in case we need to return to it. In terms of code, the vid_is_dsa_8021q_rxvlan and vid_is_dsa_8021q_txvlan functions go away. Any vid_is_dsa_8021q is both a TX and an RX VLAN, and they are no longer distinct. For example, felix which did different things for different VLAN types, now needs to handle the RX and the TX logic for the same VLAN. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-27 11:06:14 +00:00
Vladimir Oltean	d7f9787a76	net: dsa: tag_8021q: add support for imprecise RX based on the VBID The sja1105 switch can't populate the PORT field of the tag_8021q header when sending a frame to the CPU with a non-zero VBID. Similar to dsa_find_designated_bridge_port_by_vid() which performs imprecise RX for VLAN-aware bridges, let's introduce a helper in tag_8021q for performing imprecise RX based on the VLAN that it has allocated for a VLAN-unaware bridge. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-27 11:06:13 +00:00
Vladimir Oltean	91495f21fc	net: dsa: tag_8021q: replace the SVL bridging with VLAN-unaware IVL bridging For VLAN-unaware bridging, tag_8021q uses something perhaps a bit too tied with the sja1105 switch: each port uses the same pvid which is also used for standalone operation (a unique one from which the source port and device ID can be retrieved when packets from that port are forwarded to the CPU). Since each port has a unique pvid when performing autonomous forwarding, the switch must be configured for Shared VLAN Learning (SVL) such that the VLAN ID itself is ignored when performing FDB lookups. Without SVL, packets would always be flooded, since FDB lookup in the source port's VLAN would never find any entry. First of all, to make tag_8021q more palatable to switches which might not support Shared VLAN Learning, let's just use a common VLAN for all ports that are under the same bridge. Secondly, using Shared VLAN Learning means that FDB isolation can never be enforced. But if all ports under the same VLAN-unaware bridge share the same VLAN ID, it can. The disadvantage is that the CPU port can no longer perform precise source port identification for these packets. But at least we have a mechanism which has proven to be adequate for that situation: imprecise RX (dsa_find_designated_bridge_port_by_vid), which is what we use for termination on VLAN-aware bridges. The VLAN ID that VLAN-unaware bridges will use with tag_8021q is the same one as we were previously using for imprecise TX (bridge TX forwarding offload). It is already allocated, it is just a matter of using it. Note that because now all ports under the same bridge share the same VLAN, the complexity of performing a tag_8021q bridge join decreases dramatically. We no longer have to install the RX VLAN of a newly joining port into the port membership of the existing bridge ports. The newly joining port just becomes a member of the VLAN corresponding to that bridge, and the other ports are already members of it from when they joined the bridge themselves. So forwarding works properly. This means that we can unhook dsa_tag_8021q_bridge_{join,leave} from the cross-chip notifier level dsa_switch_bridge_{join,leave}. We can put these calls directly into the sja1105 driver. With this new mode of operation, a port controlled by tag_8021q can have two pvids whereas before it could only have one. The pvid for standalone operation is different from the pvid used for VLAN-unaware bridging. This is done, again, so that FDB isolation can be enforced. Let tag_8021q manage this by deleting the standalone pvid when a port joins a bridge, and restoring it when it leaves it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-27 11:06:13 +00:00
Vladimir Oltean	e212fa7c54	net: dsa: support FDB events on offloaded LAG interfaces This change introduces support for installing static FDB entries towards a bridge port that is a LAG of multiple DSA switch ports, as well as support for filtering towards the CPU local FDB entries emitted for LAG interfaces that are bridge ports. Conceptually, host addresses on LAG ports are identical to what we do for plain bridge ports. Whereas FDB entries _towards_ a LAG can't simply be replicated towards all member ports like we do for multicast, or VLAN. Instead we need new driver API. Hardware usually considers a LAG to be a "logical port", and sets the entire LAG as the forwarding destination. The physical egress port selection within the LAG is made by hashing policy, as usual. To represent the logical port corresponding to the LAG, we pass by value a copy of the dsa_lag structure to all switches in the tree that have at least one port in that LAG. To illustrate why a refcounted list of FDB entries is needed in struct dsa_lag, it is enough to say that: - a LAG may be a bridge port and may therefore receive FDB events even while it isn't yet offloaded by any DSA interface - DSA interfaces may be removed from a LAG while that is a bridge port; we don't want FDB entries lingering around, but we don't want to remove entries that are still in use, either For all the cases below to work, the idea is to always keep an FDB entry on a LAG with a reference count equal to the DSA member ports. So: - if a port joins a LAG, it requests the bridge to replay the FDB, and the FDB entries get created, or their refcount gets bumped by one - if a port leaves a LAG, the FDB replay deletes or decrements refcount by one - if an FDB is installed towards a LAG with ports already present, that entry is created (if it doesn't exist) and its refcount is bumped by the amount of ports already present in the LAG echo "Adding FDB entry to bond with existing ports" ip link del bond0 ip link add bond0 type bond mode 802.3ad ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up ip link del br0 ip link add br0 type bridge ip link set bond0 master br0 bridge fdb add dev bond0 00:01:02:03:04:05 master static ip link del br0 ip link del bond0 echo "Adding FDB entry to empty bond" ip link del bond0 ip link add bond0 type bond mode 802.3ad ip link del br0 ip link add br0 type bridge ip link set bond0 master br0 bridge fdb add dev bond0 00:01:02:03:04:05 master static ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up ip link del br0 ip link del bond0 echo "Adding FDB entry to empty bond, then removing ports one by one" ip link del bond0 ip link add bond0 type bond mode 802.3ad ip link del br0 ip link add br0 type bridge ip link set bond0 master br0 bridge fdb add dev bond0 00:01:02:03:04:05 master static ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up ip link set swp1 nomaster ip link set swp2 nomaster ip link del br0 ip link del bond0 Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:31:44 -08:00
Vladimir Oltean	93c798230a	net: dsa: call SWITCHDEV_FDB_OFFLOADED for the orig_dev When switchdev_handle_fdb_event_to_device() replicates a FDB event emitted for the bridge or for a LAG port and DSA offloads that, we should notify back to switchdev that the FDB entry on the original device is what was offloaded, not on the DSA slave devices that the event is replicated on. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:31:43 -08:00
Vladimir Oltean	e35f12e993	net: dsa: remove "ds" and "port" from struct dsa_switchdev_event_work By construction, the struct net_device *dev passed to dsa_slave_switchdev_event_work() via struct dsa_switchdev_event_work is always a DSA slave device. Therefore, it is redundant to pass struct dsa_switch and int port information in the deferred work structure. This can be retrieved at all times from the provided struct net_device via dsa_slave_to_port(). For the same reason, we can drop the dsa_is_user_port() check in dsa_fdb_offload_notify(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:31:43 -08:00
Vladimir Oltean	ec638740fc	net: switchdev: remove lag_mod_cb from switchdev_handle_fdb_event_to_device When the switchdev_handle_fdb_event_to_device() event replication helper was created, my original thought was that FDB events on LAG interfaces should most likely be special-cased, not just replicated towards all switchdev ports beneath that LAG. So this replication helper currently does not recurse through switchdev lower interfaces of LAG bridge ports, but rather calls the lag_mod_cb() if that was provided. No switchdev driver uses this helper for FDB events on LAG interfaces yet, so that was an assumption which was yet to be tested. It is certainly usable for that purpose, as my RFC series shows: https://patchwork.kernel.org/project/netdevbpf/cover/20220210125201.2859463-1-vladimir.oltean@nxp.com/ however this approach is slightly convoluted because: - the switchdev driver gets a "dev" that isn't its own net device, but rather the LAG net device. It must call switchdev_lower_dev_find(dev) in order to get a handle of any of its own net devices (the ones that pass check_cb). - in order for FDB entries on LAG ports to be correctly refcounted per the number of switchdev ports beneath that LAG, we haven't escaped the need to iterate through the LAG's lower interfaces. Except that is now the responsibility of the switchdev driver, because the replication helper just stopped half-way. So, even though yes, FDB events on LAG bridge ports must be special-cased, in the end it's simpler to let switchdev_handle_fdb_* just iterate through the LAG port's switchdev lowers, and let the switchdev driver figure out that those physical ports are under a LAG. The switchdev_handle_fdb_event_to_device() helper takes a "foreign_dev_check" callback so it can figure out whether @dev can autonomously forward to @foreign_dev. DSA fills this method properly: if the LAG is offloaded by another port in the same tree as @dev, then it isn't foreign. If it is a software LAG, it is foreign - forwarding happens in software. Whether an interface is foreign or not decides whether the replication helper will go through the LAG's switchdev lowers or not. Since the lan966x doesn't properly fill this out, FDB events on software LAG uppers will get called. By changing lan966x_foreign_dev_check(), we can suppress them. Whereas DSA will now start receiving FDB events for its offloaded LAG uppers, so we need to return -EOPNOTSUPP, since we currently don't do the right thing for them. Cc: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:31:43 -08:00
Vladimir Oltean	dedd6a009f	net: dsa: create a dsa_lag structure The main purpose of this change is to create a data structure for a LAG as seen by DSA. This is similar to what we have for bridging - we pass a copy of this structure by value to ->port_lag_join and ->port_lag_leave. For now we keep the lag_dev, id and a reference count in it. Future patches will add a list of FDB entries for the LAG (these also need to be refcounted to work properly). The LAG structure is created using dsa_port_lag_create() and destroyed using dsa_port_lag_destroy(), just like we have for bridging. Because now, the dsa_lag itself is refcounted, we can simplify dsa_lag_map() and dsa_lag_unmap(). These functions need to keep a LAG in the dst->lags array only as long as at least one port uses it. The refcounting logic inside those functions can be removed now - they are called only when we should perform the operation. dsa_lag_dev() is renamed to dsa_lag_by_id() and now returns the dsa_lag structure instead of the lag_dev net_device. dsa_lag_foreach_port() now takes the dsa_lag structure as argument. dst->lags holds an array of dsa_lag structures. dsa_lag_map() now also saves the dsa_lag->id value, so that linear walking of dst->lags in drivers using dsa_lag_id() is no longer necessary. They can just look at lag.id. dsa_port_lag_id_get() is a helper, similar to dsa_port_bridge_num_get(), which can be used by drivers to get the LAG ID assigned by DSA to a given port. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:31:43 -08:00
Vladimir Oltean	3d4a0a2a46	net: dsa: make LAG IDs one-based The DSA LAG API will be changed to become more similar with the bridge data structures, where struct dsa_bridge holds an unsigned int num, which is generated by DSA and is one-based. We have a similar thing going with the DSA LAG, except that isn't stored anywhere, it is calculated dynamically by dsa_lag_id() by iterating through dst->lags. The idea of encoding an invalid (or not requested) LAG ID as zero for the purpose of simplifying checks in drivers means that the LAG IDs passed by DSA to drivers need to be one-based too. So back-and-forth conversion is needed when indexing the dst->lags array, as well as in drivers which assume a zero-based index. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:31:42 -08:00
Vladimir Oltean	46a76724e4	net: dsa: rename references to "lag" as "lag_dev" In preparation of converting struct net_device dp->lag_dev into a struct dsa_lag dp->lag, we need to rename, for consistency purposes, all occurrences of the "lag" variable in the DSA core to "lag_dev". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 21:31:42 -08:00
Jakub Kicinski	aaa25a2fa7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net tools/testing/selftests/net/mptcp/mptcp_join.sh `34aa6e3bcc` ("selftests: mptcp: add ip mptcp wrappers") `857898eb4b` ("selftests: mptcp: add missing join check") `6ef84b1517` ("selftests: mptcp: more robust signal race test") https://lore.kernel.org/all/20220221131842.468893-1-broonie@kernel.org/ drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/act.h drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c `fb7e76ea3f` ("net/mlx5e: TC, Skip redundant ct clear actions") `c63741b426` ("net/mlx5e: Fix MPLSoUDP encap to use MPLS action information") `09bf979232` ("net/mlx5e: TC, Move pedit_headers_action to parse_attr") `84ba8062e3` ("net/mlx5e: Test CT and SAMPLE on flow attr") `efe6f961cd` ("net/mlx5e: CT, Don't set flow flag CT for ct clear flow") `3b49a7edec` ("net/mlx5e: TC, Reject rules with multiple CT actions") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 17:54:25 -08:00
Hans Schultz	b9e8b58fd2	net: dsa: Include BR_PORT_LOCKED in the list of synced brport flags Ensures that the DSA switch driver gets notified of changes to the BR_PORT_LOCKED flag as well, for the case when a DSA port joins or leaves a LAG that is a bridge port. Signed-off-by: Hans Schultz <schultz.hans+netdev@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-23 12:52:34 +00:00
Alvin Šipraga	342b641919	net: dsa: fix panic when removing unoffloaded port from bridge If a bridged port is not offloaded to the hardware - either because the underlying driver does not implement the port_bridge_{join,leave} ops, or because the operation failed - then its dp->bridge pointer will be NULL when dsa_port_bridge_leave() is called. Avoid dereferncing NULL. This fixes the following splat when removing a port from a bridge: Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000000 Internal error: Oops: 96000004 [#1] PREEMPT_RT SMP CPU: 3 PID: 1119 Comm: brctl Tainted: G O 5.17.0-rc4-rt4 #1 Call trace: dsa_port_bridge_leave+0x8c/0x1e4 dsa_slave_changeupper+0x40/0x170 dsa_slave_netdevice_event+0x494/0x4d4 notifier_call_chain+0x80/0xe0 raw_notifier_call_chain+0x1c/0x24 call_netdevice_notifiers_info+0x5c/0xac __netdev_upper_dev_unlink+0xa4/0x200 netdev_upper_dev_unlink+0x38/0x60 del_nbp+0x1b0/0x300 br_del_if+0x38/0x114 add_del_if+0x60/0xa0 br_ioctl_stub+0x128/0x2dc br_ioctl_call+0x68/0xb0 dev_ifsioc+0x390/0x554 dev_ioctl+0x128/0x400 sock_do_ioctl+0xb4/0xf4 sock_ioctl+0x12c/0x4e0 __arm64_sys_ioctl+0xa8/0xf0 invoke_syscall+0x4c/0x110 el0_svc_common.constprop.0+0x48/0xf0 do_el0_svc+0x28/0x84 el0_svc+0x1c/0x50 el0t_64_sync_handler+0xa8/0xb0 el0t_64_sync+0x17c/0x180 Code: f9402f00 f0002261 f9401302 913cc021 (a9401404) ---[ end trace 0000000000000000 ]--- Fixes: `d3eed0e57d` ("net: dsa: keep the bridge_dev and bridge_num as part of the same structure") Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220221203539.310690-1-alvin@pqrs.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-22 17:03:02 -08:00
Russell King (Oracle)	1054457006	net: phy: phylink: fix DSA mac_select_pcs() introduction Vladimir Oltean reports that probing on DSA drivers that aren't yet populating supported_interfaces now fails. Fix this by allowing phylink to detect whether DSA actually provides an underlying mac_select_pcs() implementation. Reported-by: Vladimir Oltean <olteanv@gmail.com> Fixes: `bde018222c` ("net: dsa: add support for phylink mac_select_pcs()") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/E1nMCD6-00A0wC-FG@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-22 16:57:08 -08:00
Vladimir Oltean	8940e6b669	net: dsa: avoid call to __dev_set_promiscuity() while rtnl_mutex isn't held If the DSA master doesn't support IFF_UNICAST_FLT, then the following call path is possible: dsa_slave_switchdev_event_work -> dsa_port_host_fdb_add -> dev_uc_add -> __dev_set_rx_mode -> __dev_set_promiscuity Since the blamed commit, dsa_slave_switchdev_event_work() no longer holds rtnl_lock(), which triggers the ASSERT_RTNL() from __dev_set_promiscuity(). Taking rtnl_lock() around dev_uc_add() is impossible, because all the code paths that call dsa_flush_workqueue() do so from contexts where the rtnl_mutex is already held - so this would lead to an instant deadlock. dev_uc_add() in itself doesn't require the rtnl_mutex for protection. There is this comment in __dev_set_rx_mode() which assumes so: /* Unicast addresses changes may only happen under the rtnl, * therefore calling __dev_set_promiscuity here is safe. */ but it is from commit `4417da668c` ("[NET]: dev: secondary unicast address support") dated June 2007, and in the meantime, commit `f1f28aa351` ("netdev: Add addr_list_lock to struct net_device."), dated July 2008, has added &dev->addr_list_lock to protect this instead of the global rtnl_mutex. Nonetheless, __dev_set_promiscuity() does assume rtnl_mutex protection, but it is the uncommon path of what we typically expect dev_uc_add() to do. So since only the uncommon path requires rtnl_lock(), just check ahead of time whether dev_uc_add() would result into a call to __dev_set_promiscuity(), and handle that condition separately. DSA already configures the master interface to be promiscuous if the tagger requires this. We can extend this to also cover the case where the master doesn't handle dev_uc_add() (doesn't support IFF_UNICAST_FLT), and on the premise that we'd end up making it promiscuous during operation anyway, either if a DSA slave has a non-inherited MAC address, or if the bridge notifies local FDB entries for its own MAC address, the address of a station learned on a foreign port, etc. Fixes: `0faf890fc5` ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work") Reported-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-19 21:18:33 +00:00
Russell King (Oracle)	ccfbf44d4c	net: dsa: remove pcs_poll With drivers converted over to using phylink PCS, there is no need for the struct dsa_switch member "pcs_poll" to exist anymore - there is a flag in the struct phylink_pcs which indicates whether this PCS needs to be polled which supersedes this. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-19 16:41:50 +00:00
Russell King (Oracle)	bde018222c	net: dsa: add support for phylink mac_select_pcs() Add DSA support for the phylink mac_select_pcs() method so DSA drivers can return provide phylink with the appropriate PCS for the PHY interface mode. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-18 11:28:32 +00:00
Vladimir Oltean	d2b1d186ce	net: dsa: delete unused exported symbols for ethtool PHY stats Introduced in commit `cf96357303` ("net: dsa: Allow providing PHY statistics from CPU port"), it appears these were never used. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220216193726.2926320-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-17 20:07:09 -08:00
Jakub Kicinski	6b5567b1b2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-17 11:44:20 -08:00
Mans Rullgard	017b355bbd	net: dsa: lan9303: handle hwaccel VLAN tags Check for a hwaccel VLAN tag on rx and use it if present. Otherwise, use __skb_vlan_pop() like the other tag parsers do. This fixes the case where the VLAN tag has already been consumed by the master. Fixes: `a1292595e0` ("net: dsa: add new DSA switch driver for the SMSC-LAN9303") Signed-off-by: Mans Rullgard <mans@mansr.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220216124634.23123-1-mans@mansr.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-17 09:32:06 -08:00
Vladimir Oltean	29940ce32a	net: dsa: tag_ocelot_8021q: calculate TX checksum in software for deferred packets DSA inherits NETIF_F_CSUM_MASK from master->vlan_features, and the expectation is that TX checksumming is offloaded and not done in software. Normally the DSA master takes care of this, but packets handled by ocelot_defer_xmit() are a very special exception, because they are actually injected into the switch through register-based MMIO. So the DSA master is not involved at all for these packets => no one calculates the checksum. This allows PTP over UDP to work using the ocelot-8021q tagging protocol. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-17 14:06:51 +00:00
Vladimir Oltean	c862033595	net: dsa: tag_8021q: only call skb_push/skb_pull around __skb_vlan_pop __skb_vlan_pop() needs skb->data to point at the mac_header, while skb_vlan_tag_present() and skb_vlan_tag_get() don't, because they don't look at skb->data at all. So we can avoid uselessly moving around skb->data for the case where the VLAN tag was offloaded by the DSA master. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220215204722.2134816-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-16 20:35:35 -08:00
Vladimir Oltean	164f861bd4	net: dsa: offload bridge port VLANs on foreign interfaces DSA now explicitly handles VLANs installed with the 'self' flag on the bridge as host VLANs, instead of just replicating every bridge port VLAN also on the CPU port and never deleting it, which is what it did before. However, this leaves a corner case uncovered, as explained by Tobias Waldekranz: https://patchwork.kernel.org/project/netdevbpf/patch/20220209213044.2353153-6-vladimir.oltean@nxp.com/#24735260 Forwarding towards a bridge port VLAN installed on a bridge port foreign to DSA (separate NIC, Wi-Fi AP) used to work by virtue of the fact that DSA itself needed to have at least one port in that VLAN (therefore, it also had the CPU port in said VLAN). However, now that the CPU port may not be member of all VLANs that user ports are members of, we need to ensure this isn't the case if software forwarding to a foreign interface is required. The solution is to treat bridge port VLANs on standalone interfaces in the exact same way as host VLANs. From DSA's perspective, there is no difference between local termination and software forwarding; packets in that VLAN must reach the CPU in both cases. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-16 11:21:05 +00:00
Vladimir Oltean	134ef2388e	net: dsa: add explicit support for host bridge VLANs Currently, DSA programs VLANs on shared (DSA and CPU) ports each time it does so on user ports. This is good for basic functionality but has several limitations: - the VLAN group which must reach the CPU may be radically different from the VLAN group that must be autonomously forwarded by the switch. In other words, the admin may want to isolate noisy stations and avoid traffic from them going to the control processor of the switch, where it would just waste useless cycles. The bridge already supports independent control of VLAN groups on bridge ports and on the bridge itself, and when VLAN-aware, it will drop packets in software anyway if their VID isn't added as a 'self' entry towards the bridge device. - Replaying host FDB entries may depend, for some drivers like mv88e6xxx, on replaying the host VLANs as well. The 2 VLAN groups are approximately the same in most regular cases, but there are corner cases when timing matters, and DSA's approximation of replicating VLANs on shared ports simply does not work. - If a user makes the bridge (implicitly the CPU port) join a VLAN by accident, there is no way for the CPU port to isolate itself from that noisy VLAN except by rebooting the system. This is because for each VLAN added on a user port, DSA will add it on shared ports too, but for each VLAN deletion on a user port, it will remain installed on shared ports, since DSA has no good indication of whether the VLAN is still in use or not. Now that the bridge driver emits well-balanced SWITCHDEV_OBJ_ID_PORT_VLAN addition and removal events, DSA has a simple and straightforward task of separating the bridge port VLANs (these have an orig_dev which is a DSA slave interface, or a LAG interface) from the host VLANs (these have an orig_dev which is a bridge interface), and to keep a simple reference count of each VID on each shared port. Forwarding VLANs must be installed on the bridge ports and on all DSA ports interconnecting them. We don't have a good view of the exact topology, so we simply install forwarding VLANs on all DSA ports, which is what has been done until now. Host VLANs must be installed primarily on the dedicated CPU port of each bridge port. More subtly, they must also be installed on upstream-facing and downstream-facing DSA ports that are connecting the bridge ports and the CPU. This ensures that the mv88e6xxx's problem (VID of host FDB entry may be absent from VTU) is still addressed even if that switch is in a cross-chip setup, and it has no local CPU port. Therefore: - user ports contain only bridge port (forwarding) VLANs, and no refcounting is necessary - DSA ports contain both forwarding and host VLANs. Refcounting is necessary among these 2 types. - CPU ports contain only host VLANs. Refcounting is also necessary. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-16 11:21:05 +00:00
Vladimir Oltean	a2614140dc	net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN mv88e6xxx is special among DSA drivers in that it requires the VTU to contain the VID of the FDB entry it modifies in mv88e6xxx_port_db_load_purge(), otherwise it will return -EOPNOTSUPP. Sometimes due to races this is not always satisfied even if external code does everything right (first deletes the FDB entries, then the VLAN), because DSA commits to hardware FDB entries asynchronously since commit `c9eb3e0f87` ("net: dsa: Add support for learning FDB through notification"). Therefore, the mv88e6xxx driver must close this race condition by itself, by asking DSA to flush the switchdev workqueue of any FDB deletions in progress, prior to exiting a VLAN. Fixes: `c9eb3e0f87` ("net: dsa: Add support for learning FDB through notification") Reported-by: Rafael Richter <rafael.richter@gin.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-14 13:31:12 +00:00
Vladimir Oltean	ddb44bdcde	net: dsa: remove lockdep class for DSA slave address list Since commit `2f1e8ea726` ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"), suggested by Cong Wang, the DSA interfaces and their master have different dev->nested_level, which makes netif_addr_lock() stop complaining about potentially recursive locking on the same lock class. So we no longer need DSA slave interfaces to have their own lockdep class. Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-11 11:17:33 +00:00
Vladimir Oltean	8db2bc790d	net: dsa: remove lockdep class for DSA master address list Since commit `2f1e8ea726` ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"), suggested by Cong Wang, the DSA interfaces and their master have different dev->nested_level, which makes netif_addr_lock() stop complaining about potentially recursive locking on the same lock class. So we no longer need DSA masters to have their own lockdep class. Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-11 11:17:32 +00:00
Vladimir Oltean	45b987d5ed	net: dsa: remove ndo_get_phys_port_name and ndo_get_port_parent_id There are no legacy ports, DSA registers a devlink instance with ports unconditionally for all switch drivers. Therefore, delete the old-style ndo operations used for determining bridge forwarding domains. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-11 11:17:32 +00:00
Jakub Kicinski	5b91c5cc0e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-10 17:29:56 -08:00
Vladimir Oltean	ee534378f0	net: dsa: fix panic when DSA master device unbinds on shutdown Rafael reports that on a system with LX2160A and Marvell DSA switches, if a reboot occurs while the DSA master (dpaa2-eth) is up, the following panic can be seen: systemd-shutdown[1]: Rebooting. Unable to handle kernel paging request at virtual address 00a0000800000041 [00a0000800000041] address between user and kernel address ranges Internal error: Oops: 96000004 [#1] PREEMPT SMP CPU: 6 PID: 1 Comm: systemd-shutdow Not tainted 5.16.5-00042-g8f5585009b24 #32 pc : dsa_slave_netdevice_event+0x130/0x3e4 lr : raw_notifier_call_chain+0x50/0x6c Call trace: dsa_slave_netdevice_event+0x130/0x3e4 raw_notifier_call_chain+0x50/0x6c call_netdevice_notifiers_info+0x54/0xa0 __dev_close_many+0x50/0x130 dev_close_many+0x84/0x120 unregister_netdevice_many+0x130/0x710 unregister_netdevice_queue+0x8c/0xd0 unregister_netdev+0x20/0x30 dpaa2_eth_remove+0x68/0x190 fsl_mc_driver_remove+0x20/0x5c __device_release_driver+0x21c/0x220 device_release_driver_internal+0xac/0xb0 device_links_unbind_consumers+0xd4/0x100 __device_release_driver+0x94/0x220 device_release_driver+0x28/0x40 bus_remove_device+0x118/0x124 device_del+0x174/0x420 fsl_mc_device_remove+0x24/0x40 __fsl_mc_device_remove+0xc/0x20 device_for_each_child+0x58/0xa0 dprc_remove+0x90/0xb0 fsl_mc_driver_remove+0x20/0x5c __device_release_driver+0x21c/0x220 device_release_driver+0x28/0x40 bus_remove_device+0x118/0x124 device_del+0x174/0x420 fsl_mc_bus_remove+0x80/0x100 fsl_mc_bus_shutdown+0xc/0x1c platform_shutdown+0x20/0x30 device_shutdown+0x154/0x330 __do_sys_reboot+0x1cc/0x250 __arm64_sys_reboot+0x20/0x30 invoke_syscall.constprop.0+0x4c/0xe0 do_el0_svc+0x4c/0x150 el0_svc+0x24/0xb0 el0t_64_sync_handler+0xa8/0xb0 el0t_64_sync+0x178/0x17c It can be seen from the stack trace that the problem is that the deregistration of the master causes a dev_close(), which gets notified as NETDEV_GOING_DOWN to dsa_slave_netdevice_event(). But dsa_switch_shutdown() has already run, and this has unregistered the DSA slave interfaces, and yet, the NETDEV_GOING_DOWN handler attempts to call dev_close_many() on those slave interfaces, leading to the problem. The previous attempt to avoid the NETDEV_GOING_DOWN on the master after dsa_switch_shutdown() was called seems improper. Unregistering the slave interfaces is unnecessary and unhelpful. Instead, after the slaves have stopped being uppers of the DSA master, we can now reset to NULL the master->dsa_ptr pointer, which will make DSA start ignoring all future notifier events on the master. Fixes: `0650bf52b3` ("net: dsa: be compatible with masters which unregister on shutdown") Reported-by: Rafael Richter <rafael.richter@gin.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-09 13:21:39 +00:00
Ansuel Smith	31eb6b4386	net: dsa: tag_qca: add support for handling mgmt and MIB Ethernet packet Add connect/disconnect helper to assign private struct to the DSA switch. Add support for Ethernet mgmt and MIB if the DSA driver provide an handler to correctly parse and elaborate the data. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-02 14:43:59 +00:00
Ansuel Smith	18be654a43	net: dsa: tag_qca: add define for handling MIB packet Add struct to correctly parse a mib Ethernet packet. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-02 14:43:59 +00:00
Ansuel Smith	c2ee8181fd	net: dsa: tag_qca: add define for handling mgmt Ethernet packet Add all the required define to prepare support for mgmt read/write in Ethernet packet. Any packet of this type has to be dropped as the only use of these special packet is receive ack for an mgmt write request or receive data for an mgmt read request. A struct is used that emulates the Ethernet header but is used for a different purpose. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-02 14:43:59 +00:00
Ansuel Smith	101c04c346	net: dsa: tag_qca: enable promisc_on_master flag Ethernet MDIO packets are non-standard and DSA master expects the first 6 octets to be the MAC DA. To address these kind of packet, enable promisc_on_master flag for the tagger. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-02 14:43:59 +00:00
Ansuel Smith	3ec762fb13	net: dsa: tag_qca: move define to include linux/dsa Move tag_qca define to include dir linux/dsa as the qca8k require access to the tagger define to support in-band mdio read/write using ethernet packet. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-02 14:43:59 +00:00
Ansuel Smith	6b04582992	net: dsa: tag_qca: convert to FIELD macro Convert driver to FIELD macro to drop redundant define. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-02 14:43:59 +00:00
Vladimir Oltean	e83d565378	net: dsa: replay master state events in dsa_tree_{setup,teardown}_master In order for switch driver to be able to make simple and reliable use of the master tracking operations, they must also be notified of the initial state of the DSA master, not just of the changes. This is because they might enable certain features only during the time when they know that the DSA master is up and running. Therefore, this change explicitly checks the state of the DSA master under the same rtnl_mutex as we were holding during the dsa_master_setup() and dsa_master_teardown() call. The idea being that if the DSA master became operational in between the moment in which it became a DSA master (dsa_master_setup set dev->dsa_ptr) and the moment when we checked for the master being up, there is a chance that we would emit a ->master_state_change() call with no actual state change. We need to avoid that by serializing the concurrent netdevice event with us. If the netdevice event started before, we force it to finish before we begin, because we take rtnl_lock before making netdev_uses_dsa() return true. So we also handle that early event and do nothing on it. Similarly, if the dev_open() attempt is concurrent with us, it will attempt to take the rtnl_mutex, but we're holding it. We'll see that the master flag IFF_UP isn't set, then when we release the rtnl_mutex we'll process the NETDEV_UP notifier. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-02 14:43:59 +00:00
Vladimir Oltean	295ab96f47	net: dsa: provide switch operations for tracking the master state Certain drivers may need to send management traffic to the switch for things like register access, FDB dump, etc, to accelerate what their slow bus (SPI, I2C, MDIO) can already do. Ethernet is faster (especially in bulk transactions) but is also more unreliable, since the user may decide to bring the DSA master down (or not bring it up), therefore severing the link between the host and the attached switch. Drivers needing Ethernet-based register access already should have fallback logic to the slow bus if the Ethernet method fails, but that fallback may be based on a timeout, and the I/O to the switch may slow down to a halt if the master is down, because every Ethernet packet will have to time out. The driver also doesn't have the option to turn off Ethernet-based I/O momentarily, because it wouldn't know when to turn it back on. Which is where this change comes in. By tracking NETDEV_CHANGE, NETDEV_UP and NETDEV_GOING_DOWN events on the DSA master, we should know the exact interval of time during which this interface is reliably available for traffic. Provide this information to switches so they can use it as they wish. An helper is added dsa_port_master_is_operational() to check if a master port is operational. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-02 14:43:59 +00:00
Tobias Waldekranz	108dc8741c	net: dsa: Avoid cross-chip syncing of VLAN filtering Changes to VLAN filtering are not applicable to cross-chip notifications. On a system like this: .-----. .-----. .-----. \| sw1 +---+ sw2 +---+ sw3 \| '-1-2-' '-1-2-' '-1-2-' Before this change, upon sw1p1 leaving a bridge, a call to dsa_port_vlan_filtering would also be made to sw2p1 and sw3p1. In this scenario: .---------. .-----. .-----. \| sw1 +---+ sw2 +---+ sw3 \| '-1-2-3-4-' '-1-2-' '-1-2-' When sw1p4 would leave a bridge, dsa_port_vlan_filtering would be called for sw2 and sw3 with a non-existing port - leading to array out-of-bounds accesses and crashes on mv88e6xxx. Fixes: `d371b7c92d` ("net: dsa: Unset vlan_filtering when ports leave the bridge") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-25 11:45:39 +00:00
Tobias Waldekranz	381a730182	net: dsa: Move VLAN filtering syncing out of dsa_switch_bridge_leave Most of dsa_switch_bridge_leave was, in fact, dealing with the syncing of VLAN filtering for switches on which that is a global setting. Separate the two phases to prepare for the cross-chip related bugfix in the following commit. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-25 11:45:39 +00:00
Vladimir Oltean	11fd667dac	net: dsa: setup master before ports It is said that as soon as a network interface is registered, all its resources should have already been prepared, so that it is available for sending and receiving traffic. One of the resources needed by a DSA slave interface is the master. dsa_tree_setup -> dsa_tree_setup_ports -> dsa_port_setup -> dsa_slave_create -> register_netdevice -> dsa_tree_setup_master -> dsa_master_setup -> sets up master->dsa_ptr, which enables reception Therefore, there is a short period of time after register_netdevice() during which the master isn't prepared to pass traffic to the DSA layer (master->dsa_ptr is checked by eth_type_trans). Same thing during unregistration, there is a time frame in which packets might be missed. Note that this change opens us to another race: dsa_master_find_slave() will get invoked potentially earlier than the slave creation, and later than the slave deletion. Since dp->slave starts off as a NULL pointer, the earlier calls aren't a problem, but the later calls are. To avoid use-after-free, we should zeroize dp->slave before calling dsa_slave_destroy(). In practice I cannot really test real life improvements brought by this change, since in my systems, netdevice creation races with PHY autoneg which takes a few seconds to complete, and that masks quite a few races. Effects might be noticeable in a setup with fixed links all the way to an external system. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-06 11:59:10 +00:00
Vladimir Oltean	1e3f407f3c	net: dsa: first set up shared ports, then non-shared ports After commit `a57d8c217a` ("net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports"), the port setup and teardown procedure became asymmetric. The fact of the matter is that user ports need the shared ports to be up before they can be used for CPU-initiated termination. And since we register net devices for the user ports, those won't be functional until we also call the setup for the shared (CPU, DSA) ports. But we may do that later, depending on the port numbering scheme of the hardware we are dealing with. It just makes sense that all shared ports are brought up before any user port is. I can't pinpoint any issue due to the current behavior, but let's change it nonetheless, for consistency's sake. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-06 11:59:10 +00:00
Vladimir Oltean	c146f9bc19	net: dsa: hold rtnl_mutex when calling dsa_master_{setup,teardown} DSA needs to simulate master tracking events when a binding is first with a DSA master established and torn down, in order to give drivers the simplifying guarantee that ->master_state_change calls are made only when the master's readiness state to pass traffic changes. master_state_change() provide a operational bool that DSA driver can use to understand if DSA master is operational or not. To avoid races, we need to block the reception of NETDEV_UP/NETDEV_CHANGE/NETDEV_GOING_DOWN events in the netdev notifier chain while we are changing the master's dev->dsa_ptr (this changes what netdev_uses_dsa(dev) reports). The dsa_master_setup() and dsa_master_teardown() functions optionally require the rtnl_mutex to be held, if the tagger needs the master to be promiscuous, these functions call dev_set_promiscuity(). Move the rtnl_lock() from that function and make it top-level. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-06 11:59:10 +00:00
Vladimir Oltean	a1ff94c297	net: dsa: stop updating master MTU from master.c At present there are two paths for changing the MTU of the DSA master. The first is: dsa_tree_setup -> dsa_tree_setup_ports -> dsa_port_setup -> dsa_slave_create -> dsa_slave_change_mtu -> dev_set_mtu(master) The second is: dsa_tree_setup -> dsa_tree_setup_master -> dsa_master_setup -> dev_set_mtu(dev) So the dev_set_mtu() call from dsa_master_setup() has been effectively superseded by the dsa_slave_change_mtu(slave_dev, ETH_DATA_LEN) that is done from dsa_slave_create() for each user port. The later function also updates the master MTU according to the largest user port MTU from the tree. Therefore, updating the master MTU through a separate code path isn't needed. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-06 11:59:09 +00:00
Vladimir Oltean	e31dbd3b6a	net: dsa: merge rtnl_lock sections in dsa_slave_create Currently dsa_slave_create() has two sequences of rtnl_lock/rtnl_unlock in a row. Remove the rtnl_unlock() and rtnl_lock() in between, such that the operation can execute slighly faster. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-06 11:59:09 +00:00
Vladimir Oltean	904e112ad4	net: dsa: reorder PHY initialization with MTU setup in slave.c In dsa_slave_create() there are 2 sections that take rtnl_lock(): MTU change and netdev registration. They are separated by PHY initialization. There isn't any strict ordering requirement except for the fact that netdev registration should be last. Therefore, we can perform the MTU change a bit later, after the PHY setup. A future change will then be able to merge the two rtnl_lock sections into one. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-06 11:59:09 +00:00
Vladimir Oltean	a68dc7b938	net: dsa: remove cross-chip support for HSR The cross-chip notifiers for HSR are bypass operations, meaning that even though all switches in a tree are notified, only the switch specified in the info structure is targeted. We can eliminate the unnecessary complexity by deleting the cross-chip notifier logic and calling the ds->ops straight from port.c. Cc: George McCollister <george.mccollister@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: George McCollister <george.mccollister@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-05 15:04:51 +00:00
Vladimir Oltean	cad69019f2	net: dsa: remove cross-chip support for MRP The cross-chip notifiers for MRP are bypass operations, meaning that even though all switches in a tree are notified, only the switch specified in the info structure is targeted. We can eliminate the unnecessary complexity by deleting the cross-chip notifier logic and calling the ds->ops straight from port.c. Cc: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-05 15:04:50 +00:00
Vladimir Oltean	ff91e1b684	net: dsa: fix incorrect function pointer check for MRP ring roles The cross-chip notifier boilerplate code meant to check the presence of ds->ops->port_mrp_add_ring_role before calling it, but checked ds->ops->port_mrp_add instead, before calling ds->ops->port_mrp_add_ring_role. Therefore, a driver which implements one operation but not the other would trigger a NULL pointer dereference. There isn't any such driver in DSA yet, so there is no reason to backport the change. Issue found through code inspection. Cc: Horatiu Vultur <horatiu.vultur@microchip.com> Fixes: `c595c4330d` ("net: dsa: add MRP support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-05 15:04:50 +00:00
Vladimir Oltean	258030acc9	net: dsa: make dsa_switch :: num_ports an unsigned int Currently, num_ports is declared as size_t, which is defined as __kernel_ulong_t, therefore it occupies 8 bytes of memory. Even switches with port numbers in the range of tens are exotic, so there is no need for this amount of storage. Additionally, because the max_num_bridges member right above it is also 4 bytes, it means the compiler needs to add padding between the last 2 fields. By reducing the size, we don't need that padding and can reduce the struct size. Before: pahole -C dsa_switch net/dsa/slave.o struct dsa_switch { struct device * dev; /* 0 8 / struct dsa_switch_tree dst; /* 8 8 / unsigned int index; / 16 4 / u32 setup:1; / 20: 0 4 / u32 vlan_filtering_is_global:1; / 20: 1 4 / u32 needs_standalone_vlan_filtering:1; / 20: 2 4 / u32 configure_vlan_while_not_filtering:1; / 20: 3 4 / u32 untag_bridge_pvid:1; / 20: 4 4 / u32 assisted_learning_on_cpu_port:1; / 20: 5 4 / u32 vlan_filtering:1; / 20: 6 4 / u32 pcs_poll:1; / 20: 7 4 / u32 mtu_enforcement_ingress:1; / 20: 8 4 / / XXX 23 bits hole, try to pack / struct notifier_block nb; / 24 24 / / XXX last struct has 4 bytes of padding / void priv; /* 48 8 / void tagger_data; /* 56 8 / / --- cacheline 1 boundary (64 bytes) --- / struct dsa_chip_data cd; /* 64 8 / const struct dsa_switch_ops ops; /* 72 8 / u32 phys_mii_mask; / 80 4 / / XXX 4 bytes hole, try to pack / struct mii_bus slave_mii_bus; /* 88 8 / unsigned int ageing_time_min; / 96 4 / unsigned int ageing_time_max; / 100 4 / struct dsa_8021q_context tag_8021q_ctx; /* 104 8 / struct devlink devlink; /* 112 8 / unsigned int num_tx_queues; / 120 4 / unsigned int num_lag_ids; / 124 4 / / --- cacheline 2 boundary (128 bytes) --- / unsigned int max_num_bridges; / 128 4 / / XXX 4 bytes hole, try to pack / size_t num_ports; / 136 8 / / size: 144, cachelines: 3, members: 27 / / sum members: 132, holes: 2, sum holes: 8 / / sum bitfield members: 9 bits, bit holes: 1, sum bit holes: 23 bits / / paddings: 1, sum paddings: 4 / / last cacheline: 16 bytes / }; After: pahole -C dsa_switch net/dsa/slave.o struct dsa_switch { struct device dev; /* 0 8 / struct dsa_switch_tree dst; /* 8 8 / unsigned int index; / 16 4 / u32 setup:1; / 20: 0 4 / u32 vlan_filtering_is_global:1; / 20: 1 4 / u32 needs_standalone_vlan_filtering:1; / 20: 2 4 / u32 configure_vlan_while_not_filtering:1; / 20: 3 4 / u32 untag_bridge_pvid:1; / 20: 4 4 / u32 assisted_learning_on_cpu_port:1; / 20: 5 4 / u32 vlan_filtering:1; / 20: 6 4 / u32 pcs_poll:1; / 20: 7 4 / u32 mtu_enforcement_ingress:1; / 20: 8 4 / / XXX 23 bits hole, try to pack / struct notifier_block nb; / 24 24 / / XXX last struct has 4 bytes of padding / void priv; /* 48 8 / void tagger_data; /* 56 8 / / --- cacheline 1 boundary (64 bytes) --- / struct dsa_chip_data cd; /* 64 8 / const struct dsa_switch_ops ops; /* 72 8 / u32 phys_mii_mask; / 80 4 / / XXX 4 bytes hole, try to pack / struct mii_bus slave_mii_bus; /* 88 8 / unsigned int ageing_time_min; / 96 4 / unsigned int ageing_time_max; / 100 4 / struct dsa_8021q_context tag_8021q_ctx; /* 104 8 / struct devlink devlink; /* 112 8 / unsigned int num_tx_queues; / 120 4 / unsigned int num_lag_ids; / 124 4 / / --- cacheline 2 boundary (128 bytes) --- / unsigned int max_num_bridges; / 128 4 / unsigned int num_ports; / 132 4 / / size: 136, cachelines: 3, members: 27 / / sum members: 128, holes: 1, sum holes: 4 / / sum bitfield members: 9 bits, bit holes: 1, sum bit holes: 23 bits / / paddings: 1, sum paddings: 4 / / last cacheline: 8 bytes */ }; Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-05 14:46:23 +00:00
David S. Miller	e63a023489	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-12-30 The following pull-request contains BPF updates for your net-next tree. We've added 72 non-merge commits during the last 20 day(s) which contain a total of 223 files changed, 3510 insertions(+), 1591 deletions(-). The main changes are: 1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii. 2) Beautify and de-verbose verifier logs, from Christy. 3) Composable verifier types, from Hao. 4) bpf_strncmp helper, from Hou. 5) bpf.h header dependency cleanup, from Jakub. 6) get_func_[arg\|ret\|arg_cnt] helpers, from Jiri. 7) Sleepable local storage, from KP. 8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:35:40 +00:00
Jakub Kicinski	b6459415b3	net: Don't include filter.h from net/sock.h sock.h is pretty heavily used (5k objects rebuilt on x86 after it's touched). We can drop the include of filter.h from it and add a forward declaration of struct sk_filter instead. This decreases the number of rebuilt objects when bpf.h is touched from ~5k to ~1k. There's a lot of missing includes this was masking. Primarily in networking tho, this time. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org	2021-12-29 08:48:14 -08:00
Jakub Kicinski	8b3f913322	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net include/net/sock.h commit `8f905c0e73` ("inet: fully convert sk->sk_rx_dst to RCU rules") commit `43f51df417` ("net: move early demux fields close to sk_refcnt") https://lore.kernel.org/all/20211222141641.0caa0ab3@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-23 16:09:58 -08:00
Xiaoliang Yang	ae2778a647	net: dsa: tag_ocelot: use traffic class to map priority on injected header For Ocelot switches, the CPU injected frames have an injection header where it can specify the QoS class of the packet and the DSA tag, now it uses the SKB priority to set that. If a traffic class to priority mapping is configured on the netdevice (with mqprio for example ...), it won't be considered for CPU injected headers. This patch make the QoS class aligned to the priority to traffic class mapping if it exists. Fixes: `8dce89aa5f` ("net: dsa: ocelot: add tagger for Ocelot/Felix switches") Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: Marouen Ghodhbane <marouen.ghodhbane@nxp.com> Link: https://lore.kernel.org/r/20211223072211.33130-1-xiaoliang.yang_1@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-23 09:44:59 -08:00
Vladimir Oltean	7f2973149c	net: dsa: make tagging protocols connect to individual switches from a tree On the NXP Bluebox 3 board which uses a multi-switch setup with sja1105, the mechanism through which the tagger connects to the switch tree is broken, due to improper DSA code design. At the time when tag_ops->connect() is called in dsa_port_parse_cpu(), DSA hasn't finished "touching" all the ports, so it doesn't know how large the tree is and how many ports it has. It has just seen the first CPU port by this time. As a result, this function will call the tagger's ->connect method too early, and the tagger will connect only to the first switch from the tree. This could be perhaps addressed a bit more simply by just moving the tag_ops->connect(dst) call a bit later (for example in dsa_tree_setup), but there is already a design inconsistency at present: on the switch side, the notification is on a per-switch basis, but on the tagger side, it is on a per-tree basis. Furthermore, the persistent storage itself is per switch (ds->tagger_data). And the tagger connect and disconnect procedures (at least the ones that exist currently) could see a fair bit of simplification if they didn't have to iterate through the switches of a tree. To fix the issue, this change transforms tag_ops->connect(dst) into tag_ops->connect(ds) and moves it somewhere where we already iterate over all switches of a tree. That is in dsa_switch_setup_tag_protocol(), which is a good placement because we already have there the connection call to the switch side of things. As for the dsa_tree_bind_tag_proto() method (called from the code path that changes the tag protocol), things are a bit more complicated because we receive the tree as argument, yet when we unwind on errors, it would be nice to not call tag_ops->disconnect(ds) where we didn't previously call tag_ops->connect(ds). We didn't have this problem before because the tag_ops connection operations passed the entire dst before, and this is more fine grained now. To solve the error rewind case using the new API, we have to create yet one more cross-chip notifier for disconnection, and stay connected with the old tag protocol to all the switches in the tree until we've succeeded to connect with the new one as well. So if something fails half way, the whole tree is still connected to the old tagger. But there may still be leaks if the tagger fails to connect to the 2nd out of 3 switches in a tree: somebody needs to tell the tagger to disconnect from the first switch. Nothing comes for free, and this was previously handled privately by the tagging protocol driver before, but now we need to emit a disconnect cross-chip notifier for that, because DSA has to take care of the unwind path. We assume that the tagging protocol has connected to a switch if it has set ds->tagger_data to something, otherwise we avoid calling its disconnection method in the error rewind path. The rest of the changes are in the tagging protocol drivers, and have to do with the replacement of dst with ds. The iteration is removed and the error unwind path is simplified, as mentioned above. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-14 12:45:16 +00:00
Vladimir Oltean	e2f01bfe14	net: dsa: tag_sja1105: fix zeroization of ds->priv on tag proto disconnect The method was meant to zeroize ds->tagger_data but got the wrong pointer. Fix this. Fixes: `c79e84866d` ("net: dsa: tag_sja1105: convert to tagger-owned data") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-14 12:45:15 +00:00
Vladimir Oltean	950a419d9d	net: dsa: tag_sja1105: split sja1105_tagger_data into private and public sections The sja1105 driver messes with the tagging protocol's state when PTP RX timestamping is enabled/disabled. This is fundamentally necessary because the tagger needs to know what to do when it receives a PTP packet. If RX timestamping is enabled, then a metadata follow-up frame is expected, and this holds the (partial) timestamp. So the tagger plays hide-and-seek with the network stack until it also gets the metadata frame, and then presents a single packet, the timestamped PTP packet. But when RX timestamping isn't enabled, there is no metadata frame expected, so the hide-and-seek game must be turned off and the packet must be delivered right away to the network stack. Considering this, we create a pseudo isolation by devising two tagger methods callable by the switch: one to get the RX timestamping state, and one to set it. Since we can't export symbols between the tagger and the switch driver, these methods are exposed through function pointers. After this change, the public portion of the sja1105_tagger_data contains only function pointers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:34 +00:00
Vladimir Oltean	fcbf979a5b	Revert "net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver" This reverts commit `6d709cadfd`. The above change was done to avoid calling symbols exported by the switch driver from the tagging protocol driver. With the tagger-owned storage model, we have a new option on our hands, and that is for the switch driver to provide a data consumer handler in the form of a function pointer inside the ->connect_tag_protocol() method. Having a function pointer avoids the problems of the exported symbols approach. By creating a handler for metadata frames holding TX timestamps on SJA1110, we are able to eliminate an skb queue from the tagger data, and replace it with a simple, and stateless, function pointer. This skb queue is now handled exclusively by sja1105_ptp.c, which makes the code easier to follow, as it used to be before the reverted patch. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:34 +00:00
Vladimir Oltean	c79e84866d	net: dsa: tag_sja1105: convert to tagger-owned data Currently, struct sja1105_tagger_data is a part of struct sja1105_private, and is used by the sja1105 driver to populate dp->priv. With the movement towards tagger-owned storage, the sja1105 driver should not be the owner of this memory. This change implements the connection between the sja1105 switch driver and its tagging protocol, which means that sja1105_tagger_data no longer stays in dp->priv but in ds->tagger_data, and that the sja1105 driver now only populates the sja1105_port_deferred_xmit callback pointer. The kthread worker is now the responsibility of the tagger. The sja1105 driver also alters the tagger's state some more, especially with regard to the PTP RX timestamping state. This will be fixed up a bit in further changes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:33 +00:00
Vladimir Oltean	bfcf142522	net: dsa: sja1105: make dp->priv point directly to sja1105_tagger_data The design of the sja1105 tagger dp->priv is that each port has a separate struct sja1105_port, and the sp->data pointer points to a common struct sja1105_tagger_data. We have removed all per-port members accessible by the tagger, and now only struct sja1105_tagger_data remains. Make dp->priv point directly to this. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:33 +00:00
Vladimir Oltean	d38049bbe7	net: dsa: sja1105: bring deferred xmit implementation in line with ocelot-8021q When the ocelot-8021q driver was converted to deferred xmit as part of commit `8d5f7954b7` ("net: dsa: felix: break at first CPU port during init and teardown"), the deferred implementation was deliberately made subtly different from what sja1105 has. The implementation differences lied on the following observations: - There might be a race between these two lines in tag_sja1105.c: skb_queue_tail(&sp->xmit_queue, skb_get(skb)); kthread_queue_work(sp->xmit_worker, &sp->xmit_work); and the skb dequeue logic in sja1105_port_deferred_xmit(). For example, the xmit_work might be already queued, however the work item has just finished walking through the skb queue. Because we don't check the return code from kthread_queue_work, we don't do anything if the work item is already queued. However, nobody will take that skb and send it, at least until the next timestampable skb is sent. This creates additional (and avoidable) TX timestamping latency. To close that race, what the ocelot-8021q driver does is it doesn't keep a single work item per port, and a skb timestamping queue, but rather dynamically allocates a work item per packet. - It is also unnecessary to have more than one kthread that does the work. So delete the per-port kthread allocations and replace them with a single kthread which is global to the switch. This change brings the two implementations in line by applying those observations to the sja1105 driver as well. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:33 +00:00
Vladimir Oltean	35d9768021	net: dsa: tag_ocelot: convert to tagger-owned data The felix driver makes very light use of dp->priv, and the tagger is effectively stateless. dp->priv is practically only needed to set up a callback to perform deferred xmit of PTP and STP packets using the ocelot-8021q tagging protocol (the main ocelot tagging protocol makes no use of dp->priv, although this driver sets up dp->priv irrespective of actual tagging protocol in use). struct felix_port (what used to be pointed to by dp->priv) is removed and replaced with a two-sided structure. The public side of this structure, visible to the switch driver, is ocelot_8021q_tagger_data. The private side is ocelot_8021q_tagger_private, and the latter structure physically encapsulates the former. The public half of the tagger data structure can be accessed through a helper of the same name (ocelot_8021q_tagger_data) which also sanity-checks the protocol currently in use by the switch. The public/private split was requested by Andrew Lunn. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:33 +00:00
Vladimir Oltean	dc452a471d	net: dsa: introduce tagger-owned storage for private and shared data Ansuel is working on register access over Ethernet for the qca8k switch family. This requires the qca8k tagging protocol driver to receive frames which aren't intended for the network stack, but instead for the qca8k switch driver itself. The dp->priv is currently the prevailing method for passing data back and forth between the tagging protocol driver and the switch driver. However, this method is riddled with caveats. The DSA design allows in principle for any switch driver to return any protocol it desires in ->get_tag_protocol(). The dsa_loop driver can be modified to do just that. But in the current design, the memory behind dp->priv has to be allocated by the switch driver, so if the tagging protocol is paired to an unexpected switch driver, we may end up in NULL pointer dereferences inside the kernel, or worse (a switch driver may allocate dp->priv according to the expectations of a different tagger). The latter possibility is even more plausible considering that DSA switches can dynamically change tagging protocols in certain cases (dsa <-> edsa, ocelot <-> ocelot-8021q), and the current design lends itself to mistakes that are all too easy to make. This patch proposes that the tagging protocol driver should manage its own memory, instead of relying on the switch driver to do so. After analyzing the different in-tree needs, it can be observed that the required tagger storage is per switch, therefore a ds->tagger_data pointer is introduced. In principle, per-port storage could also be introduced, although there is no need for it at the moment. Future changes will replace the current usage of dp->priv with ds->tagger_data. We define a "binding" event between the DSA switch tree and the tagging protocol. During this binding event, the tagging protocol's ->connect() method is called first, and this may allocate some memory for each switch of the tree. Then a cross-chip notifier is emitted for the switches within that tree, and they are given the opportunity to fix up the tagger's memory (for example, they might set up some function pointers that represent virtual methods for consuming packets). Because the memory is owned by the tagger, there exists a ->disconnect() method for the tagger (which is the place to free the resources), but there doesn't exist a ->disconnect() method for the switch driver. This is part of the design. The switch driver should make minimal use of the public part of the tagger data, and only after type-checking it using the supplied "proto" argument. In the code there are in fact two binding events, one is the initial event in dsa_switch_setup_tag_protocol(). At this stage, the cross chip notifier chains aren't initialized, so we call each switch's connect() method by hand. Then there is dsa_tree_bind_tag_proto() during dsa_tree_change_tag_proto(), and here we have an old protocol and a new one. We first connect to the new one before disconnecting from the old one, to simplify error handling a bit and to ensure we remain in a valid state at all times. Co-developed-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:33 +00:00
Russell King (Oracle)	0a9f0794d9	net: dsa: mark DSA phylink as legacy_pre_march2020 The majority of DSA drivers do not make use of the PCS support, and thus operate in legacy mode. In order to preserve this behaviour in future, we need to set the legacy_pre_march2020 flag so phylink knows this may require the legacy calls. There are some DSA drivers that do make use of PCS support, and these will continue operating as before - legacy_pre_march2020 will not prevent split-PCS support enabling the newer phylink behaviour. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-09 11:21:03 -08:00
Vladimir Oltean	857fdd74fb	net: dsa: eliminate dsa_switch_ops :: port_bridge_tx_fwd_{,un}offload We don't really need new switch API for these, and with new switches which intend to add support for this feature, it will become cumbersome to maintain. The change consists in restructuring the two drivers that implement this offload (sja1105 and mv88e6xxx) such that the offload is enabled and disabled from the ->port_bridge_{join,leave} methods instead of the old ->port_bridge_tx_fwd_{,un}offload. The only non-trivial change is that mv88e6xxx_map_virtual_bridge_to_pvt() has been moved to avoid a forward declaration, and the mv88e6xxx_reg_lock() calls from inside it have been removed, since locking is now done from mv88e6xxx_port_bridge_{join,leave}. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-08 14:31:16 -08:00
Vladimir Oltean	b079922ba2	net: dsa: add a "tx_fwd_offload" argument to ->port_bridge_join This is a preparation patch for the removal of the DSA switch methods ->port_bridge_tx_fwd_offload() and ->port_bridge_tx_fwd_unoffload(). The plan is for the switch to report whether it offloads TX forwarding directly as a response to the ->port_bridge_join() method. This change deals with the noisy portion of converting all existing function prototypes to take this new boolean pointer argument. The bool is placed in the cross-chip notifier structure for bridge join, and a reference to it is provided to drivers. In the next change, DSA will then actually look at this value instead of calling ->port_bridge_tx_fwd_offload(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-08 14:31:16 -08:00
Vladimir Oltean	d3eed0e57d	net: dsa: keep the bridge_dev and bridge_num as part of the same structure The main desire behind this is to provide coherent bridge information to the fast path without locking. For example, right now we set dp->bridge_dev and dp->bridge_num from separate code paths, it is theoretically possible for a packet transmission to read these two port properties consecutively and find a bridge number which does not correspond with the bridge device. Another desire is to start passing more complex bridge information to dsa_switch_ops functions. For example, with FDB isolation, it is expected that drivers will need to be passed the bridge which requested an FDB/MDB entry to be offloaded, and along with that bridge_dev, the associated bridge_num should be passed too, in case the driver might want to implement an isolation scheme based on that number. We already pass the {bridge_dev, bridge_num} pair to the TX forwarding offload switch API, however we'd like to remove that and squash it into the basic bridge join/leave API. So that means we need to pass this pair to the bridge join/leave API. During dsa_port_bridge_leave, first we unset dp->bridge_dev, then we call the driver's .port_bridge_leave with what used to be our dp->bridge_dev, but provided as an argument. When bridge_dev and bridge_num get folded into a single structure, we need to preserve this behavior in dsa_port_bridge_leave: we need a copy of what used to be in dp->bridge. Switch drivers check bridge membership by comparing dp->bridge_dev with the provided bridge_dev, but now, if we provide the struct dsa_bridge as a pointer, they cannot keep comparing dp->bridge to the provided pointer, since this only points to an on-stack copy. To make this obvious and prevent driver writers from forgetting and doing stupid things, in this new API, the struct dsa_bridge is provided as a full structure (not very large, contains an int and a pointer) instead of a pointer. An explicit comparison function needs to be used to determine bridge membership: dsa_port_offloads_bridge(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-08 14:31:16 -08:00
Vladimir Oltean	6a43cba303	net: dsa: export bridging offload helpers to drivers Move the static inline helpers from net/dsa/dsa_priv.h to include/net/dsa.h, so that drivers can call functions such as dsa_port_offloads_bridge_dev(), which will be necessary after the transition to a more complex bridge structure. More functions than are needed right now are being moved, but this is done for uniformity. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-08 14:31:16 -08:00
Vladimir Oltean	936db8a2db	net: dsa: rename dsa_port_offloads_bridge to dsa_port_offloads_bridge_dev Currently the majority of dsa_port_bridge_dev_get() calls in drivers is just to check whether a port is under the bridge device provided as argument by the DSA API. We'd like to change that DSA API so that a more complex structure is provided as argument. To keep things more generic, and considering that the new complex structure will be provided by value and not by reference, direct comparisons between dp->bridge and the provided bridge will be broken. The generic way to do the checking would simply be to do something like dsa_port_offloads_bridge(dp, &bridge). But there's a problem, we already have a function named that way, which actually takes a bridge_dev net_device as argument. Rename it so that we can use dsa_port_offloads_bridge for something else. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-08 14:31:15 -08:00
Vladimir Oltean	36cbf39b56	net: dsa: hide dp->bridge_dev and dp->bridge_num in the core behind helpers The location of the bridge device pointer and number is going to change. It is not going to be kept individually per port, but in a common structure allocated dynamically and which will have lockdep validation. Create helpers to access these elements so that we have a migration path to the new organization. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-08 14:31:15 -08:00
Vladimir Oltean	947c8746e2	net: dsa: assign a bridge number even without TX forwarding offload The service where DSA assigns a unique bridge number for each forwarding domain is useful even for drivers which do not implement the TX forwarding offload feature. For example, drivers might use the dp->bridge_num for FDB isolation. So rename ds->num_fwd_offloading_bridges to ds->max_num_bridges, and calculate a unique bridge_num for all drivers that set this value. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-08 14:31:14 -08:00
Vladimir Oltean	3f9bb0301d	net: dsa: make dp->bridge_num one-based I have seen too many bugs already due to the fact that we must encode an invalid dp->bridge_num as a negative value, because the natural tendency is to check that invalid value using (!dp->bridge_num). Latest example can be seen in commit `1bec0f0506` ("net: dsa: fix bridge_num not getting cleared after ports leaving the bridge"). Convert the existing users to assume that dp->bridge_num == 0 is the encoding for invalid, and valid bridge numbers start from 1. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-08 14:31:14 -08:00
Russell King (Oracle)	5938bce4b6	net: dsa: support use of phylink_generic_validate() Support the use of phylink_generic_validate() when there is no phylink_validate method given in the DSA switch operations and mac_capabilities have been set in the phylink_config structure by the DSA switch driver. This gives DSA switch drivers the option to use this if they provide the supported_interfaces and mac_capabilities, while still giving them an option to override the default implementation if necessary. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-01 18:58:00 -08:00
Russell King (Oracle)	072eea6c22	net: dsa: replace phylink_get_interfaces() with phylink_get_caps() Phylink needs slightly more information than phylink_get_interfaces() allows us to get from the DSA drivers - we need the MAC capabilities. Replace the phylink_get_interfaces() method with phylink_get_caps() to allow DSA drivers to fill in the phylink_config MAC capabilities field as well. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-01 18:58:00 -08:00
Russell King (Oracle)	21bd64bd71	net: dsa: consolidate phylink creation The code in port.c and slave.c creating the phylink instance is very similar - let's consolidate this into a single function. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-01 18:58:00 -08:00
Leon Romanovsky	4c897cfc46	devlink: Simplify devlink resources unregister call The devlink_resources_unregister() used second parameter as an entry point for the recursive removal of devlink resources. None of the callers outside of devlink core needed to use this field, so let's remove it. As part of this removal, the "struct devlink_resource" was moved from .h to .c file as it is not possible to use in any place in the code except devlink.c. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-30 12:23:32 +00:00
Vladimir Oltean	92f62485b3	net: dsa: felix: fix broken VLAN-tagged PTP under VLAN-aware bridge Normally it is expected that the dsa_device_ops :: rcv() method finishes parsing the DSA tag and consumes it, then never looks at it again. But commit `c0bcf53766` ("net: dsa: ocelot: add hardware timestamping support for Felix") added support for RX timestamping in a very unconventional way. On this switch, a partial timestamp is available in the DSA header, but the driver got away with not parsing that timestamp right away, but instead delayed that parsing for a little longer: dsa_switch_rcv(): nskb = cpu_dp->rcv(skb, dev); <------------- not here -> ocelot_rcv() ... skb = nskb; skb_push(skb, ETH_HLEN); skb->pkt_type = PACKET_HOST; skb->protocol = eth_type_trans(skb, skb->dev); ... if (dsa_skb_defer_rx_timestamp(p, skb)) <--- but here -> felix_rxtstamp() return 0; When in felix_rxtstamp(), this driver accounted for the fact that eth_type_trans() happened in the meanwhile, so it got a hold of the extraction header again by subtracting (ETH_HLEN + OCELOT_TAG_LEN) bytes from the current skb->data. This worked for quite some time but was quite fragile from the very beginning. Not to mention that having DSA tag parsing split in two different files, under different folders (net/dsa/tag_ocelot.c vs drivers/net/dsa/ocelot/felix.c) made it quite non-obvious for patches to come that they might break this. Finally, the blamed commit does the following: at the end of ocelot_rcv(), it checks whether the skb payload contains a VLAN header. If it does, and this port is under a VLAN-aware bridge, that VLAN ID might not be correct in the sense that the packet might have suffered VLAN rewriting due to TCAM rules (VCAP IS1). So we consume the VLAN ID from the skb payload using __skb_vlan_pop(), and take the classified VLAN ID from the DSA tag, and construct a hwaccel VLAN tag with the classified VLAN, and the skb payload is VLAN-untagged. The big problem is that __skb_vlan_pop() does: memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN); __skb_pull(skb, VLAN_HLEN); aka it moves the Ethernet header 4 bytes to the right, and pulls 4 bytes from the skb headroom (effectively also moving skb->data, by definition). So for felix_rxtstamp()'s fragile logic, all bets are off now. Instead of having the "extraction" pointer point to the DSA header, it actually points to 4 bytes _inside_ the extraction header. Corollary, the last 4 bytes of the "extraction" header are in fact 4 stale bytes of the destination MAC address from the Ethernet header, from prior to the __skb_vlan_pop() movement. So of course, RX timestamps are completely bogus when the system is configured in this way. The fix is actually very simple: just don't structure the code like that. For better or worse, the DSA PTP timestamping API does not offer a straightforward way for drivers to present their RX timestamps, but other drivers (sja1105) have established a simple mechanism to carry their RX timestamp from dsa_device_ops :: rcv() all the way to dsa_switch_ops :: port_rxtstamp() and even later. That mechanism is to simply save the partial timestamp to the skb->cb, and complete it later. Question: why don't we simply populate the skb's struct skb_shared_hwtstamps from ocelot_rcv(), and bother with this complication of propagating the timestamp to felix_rxtstamp()? Answer: dsa_switch_ops :: port_rxtstamp() answers the question whether PTP packets need sleepable context to retrieve the full RX timestamp. Currently felix_rxtstamp() answers "no, thanks" to that question, and calls ocelot_ptp_gettime64() from softirq atomic context. This is understandable, since Felix VSC9959 is a PCIe memory-mapped switch, so hardware access does not require sleeping. But the felix driver is preparing for the introduction of other switches where hardware access is over a slow bus like SPI or MDIO: https://lore.kernel.org/lkml/20210814025003.2449143-1-colin.foster@in-advantage.com/ So I would like to keep this code structure, so the rework needed when that driver will need PTP support will be minimal (answer "yes, I need deferred context for this skb's RX timestamp", then the partial timestamp will still be found in the skb->cb. Fixes: `ea440cd2d9` ("net: dsa: tag_ocelot: use VLAN information from tagging header when available") Reported-by: Po Liu <po.liu@nxp.com> Cc: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-03 14:22:00 +00:00
Marek Behún	c07c6e8eb4	net: dsa: populate supported_interfaces member Add a new DSA switch operation, phylink_get_interfaces, which should fill in which PHY_INTERFACE_MODE_* are supported by given port. Use this before phylink_create() to fill phylinks supported_interfaces member, allowing phylink to determine which PHY_INTERFACE_MODEs are supported. Signed-off-by: Marek Behún <kabel@kernel.org> [tweaked patch and description to add more complete support -- rmk] Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:06:32 +00:00
Vladimir Oltean	716a30a97a	net: switchdev: merge switchdev_handle_fdb_{add,del}_to_device To reduce code churn, the same patch makes multiple changes, since they all touch the same lines: 1. The implementations for these two are identical, just with different function pointers. Reduce duplications and name the function pointers "mod_cb" instead of "add_cb" and "del_cb". Pass the event as argument. 2. Drop the "const" attribute from "orig_dev". If the driver needs to check whether orig_dev belongs to itself and then call_switchdev_notifiers(orig_dev, SWITCHDEV_FDB_OFFLOADED), it can't, because call_switchdev_notifiers takes a non-const struct net_device *. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-27 14:54:02 +01:00
Vladimir Oltean	425d19cede	net: dsa: stop calling dev_hold in dsa_slave_fdb_event Now that we guarantee that SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE events have finished executing by the time we leave our bridge upper interface, we've established a stronger boundary condition for how long the dsa_slave_switchdev_event_work() might run. As such, it is no longer possible for DSA slave interfaces to become unregistered, since they are still bridge ports. So delete the unnecessary dev_hold() and dev_put(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-26 15:07:35 +01:00
Vladimir Oltean	d7d0d423db	net: dsa: flush switchdev workqueue when leaving the bridge DSA is preparing to offer switch drivers an API through which they can associate each FDB entry with a struct net_device *bridge_dev. This can be used to perform FDB isolation (the FDB lookup performed on the ingress of a standalone, or bridged port, should not find an FDB entry that is present in the FDB of another bridge). In preparation of that work, DSA needs to ensure that by the time we call the switch .port_fdb_add and .port_fdb_del methods, the dp->bridge_dev pointer is still valid, i.e. the port is still a bridge port. This is not guaranteed because the SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE API requires drivers that must have sleepable context to handle those events to schedule the deferred work themselves. DSA does this through the dsa_owq. It can happen that a port leaves a bridge, del_nbp() flushes the FDB on that port, SWITCHDEV_FDB_DEL_TO_DEVICE is notified in atomic context, DSA schedules its deferred work, but del_nbp() finishes unlinking the bridge as a master from the port before DSA's deferred work is run. Fundamentally, the port must not be unlinked from the bridge until all FDB deletion deferred work items have been flushed. The bridge must wait for the completion of these hardware accesses. An attempt has been made to address this issue centrally in switchdev by making SWITCHDEV_FDB_DEL_TO_DEVICE deferred (=> blocking) at the switchdev level, which would offer implicit synchronization with del_nbp: https://patchwork.kernel.org/project/netdevbpf/cover/20210820115746.3701811-1-vladimir.oltean@nxp.com/ but it seems that any attempt to modify switchdev's behavior and make the events blocking there would introduce undesirable side effects in other switchdev consumers. The most undesirable behavior seems to be that switchdev_deferred_process_work() takes the rtnl_mutex itself, which would be worse off than having the rtnl_mutex taken individually from drivers which is what we have now (except DSA which has removed that lock since commit `0faf890fc5` ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work")). So to offer the needed guarantee to DSA switch drivers, I have come up with a compromise solution that does not require switchdev rework: we already have a hook at the last moment in time when the bridge is still an upper of ours: the NETDEV_PRECHANGEUPPER handler. We can flush the dsa_owq manually from there, which makes all FDB deletions synchronous. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-26 15:07:35 +01:00
Vladimir Oltean	0faf890fc5	net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work After talking with Ido Schimmel, it became clear that rtnl_lock is not actually required for anything that is done inside the SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE deferred work handlers. The reason why it was probably added by Arkadi Sharshevsky in commit `c9eb3e0f87` ("net: dsa: Add support for learning FDB through notification") was to offer the same locking/serialization guarantees as .ndo_fdb_{add,del} and avoid reworking any drivers. DSA has implemented .ndo_fdb_add and .ndo_fdb_del until commit `b117e1e8a8` ("net: dsa: delete dsa_legacy_fdb_add and dsa_legacy_fdb_del") - that is to say, until fairly recently. But those methods have been deleted, so now we are free to drop the rtnl_lock as well. Note that exposing DSA switch drivers to an unlocked method which was previously serialized by the rtnl_mutex is a potentially dangerous affair. Driver writers couldn't ensure that their internal locking scheme does the right thing even if they wanted. We could err on the side of paranoia and introduce a switch-wide lock inside the DSA framework, but that seems way overreaching. Instead, we could check as many drivers for regressions as we can, fix those first, then let this change go in once it is assumed to be fairly safe. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-25 12:59:42 +01:00
Vladimir Oltean	338a3a4745	net: dsa: introduce locking for the address lists on CPU and DSA ports Now that the rtnl_mutex is going away for dsa_port_{host_,}fdb_{add,del}, no one is serializing access to the address lists that DSA keeps for the purpose of reference counting on shared ports (CPU and cascade ports). It can happen for one dsa_switch_do_fdb_del to do list_del on a dp->fdbs element while another dsa_switch_do_fdb_{add,del} is traversing dp->fdbs. We need to avoid that. Currently dp->mdbs is not at risk, because dsa_switch_do_mdb_{add,del} still runs under the rtnl_mutex. But it would be nice if it would not depend on that being the case. So let's introduce a mutex per port (the address lists are per port too) and share it between dp->mdbs and dp->fdbs. The place where we put the locking is interesting. It could be tempting to put a DSA-level lock which still serializes calls to .port_fdb_{add,del}, but it would still not avoid concurrency with other driver code paths that are currently under rtnl_mutex (.port_fdb_dump, .port_fast_age). So it would add a very false sense of security (and adding a global switch-wide lock in DSA to resynchronize with the rtnl_lock is also counterproductive and hard). So the locking is intentionally done only where the dp->fdbs and dp->mdbs lists are traversed. That means, from a driver perspective, that .port_fdb_add will be called with the dp->addr_lists_lock mutex held on the CPU port, but not held on user ports. This is done so that driver writers are not encouraged to rely on any guarantee offered by dp->addr_lists_lock. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-25 12:59:42 +01:00
Vladimir Oltean	232deb3f95	net: dsa: avoid refcount warnings when ->port_{fdb,mdb}_del returns error At present, when either of ds->ops->port_fdb_del() or ds->ops->port_mdb_del() return a non-zero error code, we attempt to save the day and keep the data structure associated with that switchdev object, as the deletion procedure did not complete. However, the way in which we do this is suspicious to the checker in lib/refcount.c, who thinks it is buggy to increment a refcount that became zero, and that this is indicative of a use-after-free. Fixes: `161ca59d39` ("net: dsa: reference count the MDB entries at the cross-chip notifier level") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-25 12:59:41 +01:00
David S. Miller	2d7e73f09f	Revert "Merge branch 'dsa-rtnl'" This reverts commit `965e6b262f`, reversing changes made to `4d98bb0d7e`.	2021-10-25 12:59:25 +01:00
Vladimir Oltean	5cdfde49a0	net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work After talking with Ido Schimmel, it became clear that rtnl_lock is not actually required for anything that is done inside the SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE deferred work handlers. The reason why it was probably added by Arkadi Sharshevsky in commit `c9eb3e0f87` ("net: dsa: Add support for learning FDB through notification") was to offer the same locking/serialization guarantees as .ndo_fdb_{add,del} and avoid reworking any drivers. DSA has implemented .ndo_fdb_add and .ndo_fdb_del until commit `b117e1e8a8` ("net: dsa: delete dsa_legacy_fdb_add and dsa_legacy_fdb_del") - that is to say, until fairly recently. But those methods have been deleted, so now we are free to drop the rtnl_lock as well. Note that exposing DSA switch drivers to an unlocked method which was previously serialized by the rtnl_mutex is a potentially dangerous affair. Driver writers couldn't ensure that their internal locking scheme does the right thing even if they wanted. We could err on the side of paranoia and introduce a switch-wide lock inside the DSA framework, but that seems way overreaching. Instead, we could check as many drivers for regressions as we can, fix those first, then let this change go in once it is assumed to be fairly safe. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-24 13:47:44 +01:00
Vladimir Oltean	d3bd892437	net: dsa: introduce locking for the address lists on CPU and DSA ports Now that the rtnl_mutex is going away for dsa_port_{host_,}fdb_{add,del}, no one is serializing access to the address lists that DSA keeps for the purpose of reference counting on shared ports (CPU and cascade ports). It can happen for one dsa_switch_do_fdb_del to do list_del on a dp->fdbs element while another dsa_switch_do_fdb_{add,del} is traversing dp->fdbs. We need to avoid that. Currently dp->mdbs is not at risk, because dsa_switch_do_mdb_{add,del} still runs under the rtnl_mutex. But it would be nice if it would not depend on that being the case. So let's introduce a mutex per port (the address lists are per port too) and share it between dp->mdbs and dp->fdbs. The place where we put the locking is interesting. It could be tempting to put a DSA-level lock which still serializes calls to .port_fdb_{add,del}, but it would still not avoid concurrency with other driver code paths that are currently under rtnl_mutex (.port_fdb_dump, .port_fast_age). So it would add a very false sense of security (and adding a global switch-wide lock in DSA to resynchronize with the rtnl_lock is also counterproductive and hard). So the locking is intentionally done only where the dp->fdbs and dp->mdbs lists are traversed. That means, from a driver perspective, that .port_fdb_add will be called with the dp->addr_lists_lock mutex held on the CPU port, but not held on user ports. This is done so that driver writers are not encouraged to rely on any guarantee offered by dp->addr_lists_lock. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-24 13:47:44 +01:00
David S. Miller	bdfa75ad70	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Lots of simnple overlapping additions. With a build fix from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-22 11:41:16 +01:00
Vladimir Oltean	992e5cc7be	net: dsa: tag_8021q: make dsa_8021q_{rx,tx}_vid take dp as argument Pass a single argument to dsa_8021q_rx_vid and dsa_8021q_tx_vid that contains the necessary information from the two arguments that are currently provided: the switch and the port number. Also rename those functions so that they have a dsa_port_* prefix, since they operate on a struct dsa_port *. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-21 12:44:07 +01:00
Vladimir Oltean	5068887a4f	net: dsa: tag_sja1105: do not open-code dsa_switch_for_each_port Find the remaining iterators over dst->ports that only filter for the ports belonging to a certain switch, and replace those with the dsa_switch_for_each_port helper that we have now. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-21 12:44:07 +01:00
Vladimir Oltean	fac6abd5f1	net: dsa: convert cross-chip notifiers to iterate using dp The majority of cross-chip switch notifiers need to filter in some way over the type of ports: some install VLANs etc on all cascade ports. The difference is that the matching function, which filters by port type, is separate from the function where the iteration happens. So this patch needs to refactor the matching functions' prototypes as well, to take the dp as argument. In a future patch/series, I might convert dsa_towards_port to return a struct dsa_port *dp too, but at the moment it is a bit entangled with dsa_routing_port which is also used by mv88e6xxx and they both return an int port. So keep dsa_towards_port the way it is and convert it into a dp using dsa_to_port. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-21 12:44:07 +01:00
Vladimir Oltean	57d77986e7	net: dsa: remove gratuitous use of dsa_is_{user,dsa,cpu}_port Find the occurrences of dsa_is_{user,dsa,cpu}_port where a struct dsa_port *dp was already available in the function scope, and replace them with the dsa_port_is_{user,dsa,cpu} equivalent function which uses that dp directly and does not perform another hidden dsa_to_port(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-21 12:44:07 +01:00
Vladimir Oltean	65c563a677	net: dsa: do not open-code dsa_switch_for_each_port Find the remaining iterators over dst->ports that only filter for the ports belonging to a certain switch, and replace those with the dsa_switch_for_each_port helper that we have now. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-21 12:44:06 +01:00
Vladimir Oltean	d0004a020b	net: dsa: remove the "dsa_to_port in a loop" antipattern from the core Ever since Vivien's conversion of the ds->ports array into a dst->ports list, and the introduction of dsa_to_port, iterations through the ports of a switch became quadratic whenever dsa_to_port was needed. dsa_to_port can either be called directly, or indirectly through the dsa_is_{user,cpu,dsa,unused}_port helpers. Use the newly introduced dsa_switch_for_each_port() iteration macro that works with the iterator variable being a struct dsa_port *dp directly, and not an int i. It is an expensive variable to go from i to dp, but cheap to go from dp to i. This macro iterates through the entire ds->dst->ports list and filters by the ports belonging just to the switch provided as argument. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-21 12:44:06 +01:00
Christophe JAILLET	ba69fd9101	net: dsa: Fix an error handling path in 'dsa_switch_parse_ports_of()' If we return before the end of the 'for_each_child_of_node()' iterator, the reference taken on 'port' must be released. Add the missing 'of_node_put()' calls. Fixes: `83c0afaec7` ("net: dsa: Add new binding implementation") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/15d5310d1d55ad51c1af80775865306d92432e03.1634587046.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-19 15:41:16 -07:00
Alvin Šipraga	1521d5adfc	net: dsa: tag_rtl8_4: add realtek 8 byte protocol 4 tag This commit implements a basic version of the 8 byte tag protocol used in the Realtek RTL8365MB-VC unmanaged switch, which carries with it a protocol version of 0x04. The implementation itself only handles the parsing of the EtherType value and Realtek protocol version, together with the source or destination port fields. The rest is left unimplemented for now. The tag format is described in a confidential document provided to my company by Realtek Semiconductor Corp. Permission has been granted by the vendor to publish this driver based on that material, together with an extract from the document describing the tag format and its fields. It is hoped that this will help future implementors who do not have access to the material but who wish to extend the functionality of drivers for chips which use this protocol. In addition, two possible values of the REASON field are specified, based on experiments on my end. Realtek does not specify what value this field can take. Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-18 14:02:56 +01:00
Alvin Šipraga	9cb8edda21	net: dsa: move NET_DSA_TAG_RTL4_A to right place in Kconfig/Makefile Move things around a little so that this tag driver is alphabetically ordered. The Kconfig file is sorted based on the tristate text. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-18 14:02:55 +01:00
Alvin Šipraga	487d3855b6	net: dsa: allow reporting of standard ethtool stats for slave devices Jakub pointed out that we have a new ethtool API for reporting device statistics in a standardized way, via .get_eth_{phy,mac,ctrl}_stats. Add a small amount of plumbing to allow DSA drivers to take advantage of this when exposing statistics. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-18 14:02:55 +01:00
Jakub Kicinski	e15f5972b8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net tools/testing/selftests/net/ioam6.sh `7b1700e009` ("selftests: net: modify IOAM tests for undef bits") `bf77b1400a` ("selftests: net: Test for the IOAM encapsulation with IPv6") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-14 16:50:14 -07:00
Vladimir Oltean	39e222bfd7	net: dsa: unregister cross-chip notifier after ds->ops->teardown To be symmetric with the error unwind path of dsa_switch_setup(), call dsa_switch_unregister_notifier() after ds->ops->teardown. The implication is that ds->ops->teardown cannot emit cross-chip notifiers. For example, currently the dsa_tag_8021q_unregister() call from sja1105_teardown() does not propagate to the entire tree due to this reason. However I cannot find an actual issue caused by this, observed using code inspection. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20211012123735.2545742-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-13 13:36:01 -07:00
Vladimir Oltean	43ba33b4f1	net: dsa: tag_ocelot_8021q: fix inability to inject STP BPDUs into BLOCKING ports When setting up a bridge with stp_state 1, topology changes are not detected and loops are not blocked. This is because the standard way of transmitting a packet, based on VLAN IDs redirected by VCAP IS2 to the right egress port, does not override the port STP state (in the case of Ocelot switches, that's really the PGID_SRC masks). To force a packet to be injected into a port that's BLOCKING, we must send it as a control packet, which means in the case of this tagger to send it using the manual register injection method. We already do this for PTP frames, extend the logic to apply to any link-local MAC DA. Fixes: `7c83a7c539` ("net: dsa: add a second tagger for Ocelot switches based on tag_8021q") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-12 17:35:19 -07:00
Vladimir Oltean	49f885b2d9	net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib Michael reported that when using the "ocelot-8021q" tagging protocol, the switch driver module must be manually loaded before the tagging protocol can be loaded/is available. This appears to be the same problem described here: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ where due to the fact that DSA tagging protocols make use of symbols exported by the switch drivers, circular dependencies appear and this breaks module autoloading. The ocelot_8021q driver needs the ocelot_can_inject() and ocelot_port_inject_frame() functions from the switch library. Previously the wrong approach was taken to solve that dependency: shims were provided for the case where the ocelot switch library was compiled out, but that turns out to be insufficient, because the dependency when the switch lib _is_ compiled is problematic too. We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as static inline functions, because these access I/O functions like __ocelot_write_ix() which is called by ocelot_write_rix(). Making those static inline basically means exposing the whole guts of the ocelot switch library, not ideal... We already have one tagging protocol driver which calls into the switch driver during xmit but not using any exported symbol: sja1105_defer_xmit. We can do the same thing here: create a kthread worker and one work item per skb, and let the switch driver itself do the register accesses to send the skb, and then consume it. Fixes: `0a6f17c6ae` ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Reported-by: Michael Walle <michael@walle.cc> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-12 17:35:18 -07:00
Vladimir Oltean	deab6b1cd9	net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driver As explained here: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ DSA tagging protocol drivers cannot depend on symbols exported by switch drivers, because this creates a circular dependency that breaks module autoloading. The tag_ocelot.c file depends on the ocelot_ptp_rew_op() function exported by the common ocelot switch lib. This function looks at OCELOT_SKB_CB(skb) and computes how to populate the REW_OP field of the DSA tag, for PTP timestamping (the command: one-step/two-step, and the TX timestamp identifier). None of that requires deep insight into the driver, it is quite stateless, as it only depends upon the skb->cb. So let's make it a static inline function and put it in include/linux/dsa/ocelot.h, a file that despite its name is used by the ocelot switch driver for populating the injection header too - since commit `40d3f295b5` ("net: mscc: ocelot: use common tag parsing code with DSA"). With that function declared as static inline, its body is expanded inside each call site, so the dependency is broken and the DSA tagger can be built without the switch library, upon which the felix driver depends. Fixes: `39e5308b32` ("net: mscc: ocelot: support PTP Sync one-step timestamping") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-12 17:35:18 -07:00
Vladimir Oltean	4ac0567e40	net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver It's nice to be able to test a tagging protocol with dsa_loop, but not at the cost of losing the ability of building the tagging protocol and switch driver as modules, because as things stand, there is a circular dependency between the two. Tagging protocol drivers cannot depend on switch drivers, that is a hard fact. The reasoning behind the blamed patch was that accessing dp->priv should first make sure that the structure behind that pointer is what we really think it is. Currently the "sja1105" and "sja1110" tagging protocols only operate with the sja1105 switch driver, just like any other tagging protocol and switch combination. The only way to mix and match them is by modifying the code, and this applies to dsa_loop as well (by default that uses DSA_TAG_PROTO_NONE). So while in principle there is an issue, in practice there isn't one. Until we extend dsa_loop to allow user space configuration, treat the problem as a non-issue and just say that DSA ports found by tag_sja1105 are always sja1105 ports, which is in fact true. But keep the dsa_port_is_sja1105 function so that it's easy to patch it during testing, and rely on dead code elimination. Fixes: `994d2cbb08` ("net: dsa: tag_sja1105: be dsa_loop-safe") Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-12 17:33:36 -07:00
Vladimir Oltean	28da0555c3	net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver The problem is that DSA tagging protocols really must not depend on the switch driver, because this creates a circular dependency at insmod time, and the switch driver will effectively not load when the tagging protocol driver is missing. The code was structured in the way it was for a reason, though. The DSA driver-facing API for PTP timestamping relies on the assumption that two-step TX timestamps are provided by the hardware in an out-of-band manner, typically by raising an interrupt and making that timestamp available inside some sort of FIFO which is to be accessed over SPI/MDIO/etc. So the API puts .port_txtstamp into dsa_switch_ops, because it is expected that the switch driver needs to save some state (like put the skb into a queue until its TX timestamp arrives). On SJA1110, TX timestamps are provided by the switch as Ethernet packets, so this makes them be received and processed by the tagging protocol driver. This in itself is great, because the timestamps are full 64-bit and do not require reconstruction, and since Ethernet is the fastest I/O method available to/from the switch, PTP timestamps arrive very quickly, no matter how bottlenecked the SPI connection is, because SPI interaction is not needed at all. DSA's code structure and strict isolation between the tagging protocol driver and the switch driver break the natural code organization. When the tagging protocol driver receives a packet which is classified as a metadata packet containing timestamps, it passes those timestamps one by one to the switch driver, which then proceeds to compare them based on the recorded timestamp ID that was generated in .port_txtstamp. The communication between the tagging protocol and the switch driver is done through a method exported by the switch driver, sja1110_process_meta_tstamp. To satisfy build requirements, we force a dependency to build the tagging protocol driver as a module when the switch driver is a module. However, as explained in the first paragraph, that causes the circular dependency. To solve this, move the skb queue from struct sja1105_private :: struct sja1105_ptp_data to struct sja1105_private :: struct sja1105_tagger_data. The latter is a data structure for which hacks have already been put into place to be able to create persistent storage per switch that is accessible from the tagging protocol driver (see sja1105_setup_ports). With the skb queue directly accessible from the tagging protocol driver, we can now move sja1110_process_meta_tstamp into the tagging driver itself, and avoid exporting a symbol. Fixes: `566b18c8b7` ("net: dsa: sja1105: implement TX timestamping for SJA1110") Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-12 17:33:36 -07:00
Alvin Šipraga	43a4b4dbd4	net: dsa: fix spurious error message when unoffloaded port leaves bridge Flip the sign of a return value check, thereby suppressing the following spurious error: port 2 failed to notify DSA_NOTIFIER_BRIDGE_LEAVE: -EOPNOTSUPP ... which is emitted when removing an unoffloaded DSA switch port from a bridge. Fixes: `d371b7c92d` ("net: dsa: Unset vlan_filtering when ports leave the bridge") Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20211012112730.3429157-1-alvin@pqrs.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-12 16:11:23 -07:00
Vladimir Oltean	1951b3f19c	net: dsa: hold rtnl_lock in dsa_switch_setup_tag_protocol It was a documented fact that ds->ops->change_tag_protocol() offered rtnetlink mutex protection to the switch driver, since there was an ASSERT_RTNL right before the call in dsa_switch_change_tag_proto() (initiated from sysfs). The blamed commit introduced another call path for ds->ops->change_tag_protocol() which does not hold the rtnl_mutex. This is: dsa_tree_setup -> dsa_tree_setup_switches -> dsa_switch_setup -> dsa_switch_setup_tag_protocol -> ds->ops->change_tag_protocol() -> dsa_port_setup -> dsa_slave_create -> register_netdevice(slave_dev) -> dsa_tree_setup_master -> dsa_master_setup -> dev->dsa_ptr = cpu_dp The reason why the rtnl_mutex is held in the sysfs call path is to ensure that, once the master and all the DSA interfaces are down (which is required so that no packets flow), they remain down during the tagging protocol change. The above calling order illustrates the fact that it should not be risky to change the initial tagging protocol to the one specified in the device tree at the given time: - packets cannot enter the dsa_switch_rcv() packet type handler since netdev_uses_dsa() for the master will not yet return true, since dev->dsa_ptr has not yet been populated - packets cannot enter the dsa_slave_xmit() function because no DSA interface has yet been registered So from the DSA core's perspective, holding the rtnl_mutex is indeed not necessary. Yet, drivers may need to do things which need rtnl_mutex protection. For example: felix_set_tag_protocol -> felix_setup_tag_8021q -> dsa_tag_8021q_register -> dsa_tag_8021q_setup -> dsa_tag_8021q_port_setup -> vlan_vid_add -> ASSERT_RTNL These drivers do not really have a choice to take the rtnl_mutex themselves, since in the sysfs case, the rtnl_mutex is already held. Fixes: `deff710703` ("net: dsa: Allow default tag protocol to be overridden from DT") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-09 13:44:05 +01:00
Vladimir Oltean	5bded8259e	net: dsa: mv88e6xxx: isolate the ATU databases of standalone and bridged ports Similar to commit `6087175b79` ("net: dsa: mt7530: use independent VLAN learning on VLAN-unaware bridges"), software forwarding between an unoffloaded LAG port (a bonding interface with an unsupported policy) and a mv88e6xxx user port directly under a bridge is broken. We adopt the same strategy, which is to make the standalone ports not find any ATU entry learned on a bridge port. Theory: the mv88e6xxx ATU is looked up by FID and MAC address. There are as many FIDs as VIDs (4096). The FID is derived from the VID when possible (the VTU maps a VID to a FID), with a fallback to the port based default FID value when not (802.1Q Mode is disabled on the port, or the classified VID isn't present in the VTU). The mv88e6xxx driver makes the following use of FIDs and VIDs: - the port's DefaultVID (to which untagged & pvid-tagged packets get classified) is 0 and is absent from the VTU, so this kind of packets is processed in FID 0, the default FID assigned by mv88e6xxx_setup_port. - every time a bridge VLAN is created, mv88e6xxx_port_vlan_join() -> mv88e6xxx_atu_new() associates a FID with that VID which increases linearly starting from 1. Like this: bridge vlan add dev lan0 vid 100 # FID 1 bridge vlan add dev lan1 vid 100 # still FID 1 bridge vlan add dev lan2 vid 1024 # FID 2 The FID allocation made by the driver is sub-optimal for the following reasons: (a) A standalone port has a DefaultPVID of 0 and a default FID of 0 too. A VLAN-unaware bridged port has a DefaultPVID of 0 and a default FID of 0 too. The difference is that the bridged ports may learn ATU entries, while the standalone port has the requirement that it must not, and must not find them either. Standalone ports must not use the same FID as ports belonging to a bridge. All standalone ports can use the same FID, since the ATU will never have an entry in that FID. (b) Multiple VLAN-unaware bridges will all use a DefaultPVID of 0 and a default FID of 0 on all their ports. The FDBs will not be isolated between these bridges. Every VLAN-unaware bridge must use the same FID on all its ports, different from the FID of other bridge ports. (c) Each bridge VLAN uses a unique FID which is useful for Independent VLAN Learning, but the same VLAN ID on multiple VLAN-aware bridges will result in the same FID being used by mv88e6xxx_atu_new(). The correct behavior is for VLAN 1 in br0 to have a different FID compared to VLAN 1 in br1. This patch cannot fix all the above. Traditionally the DSA framework did not care about this, and the reality is that DSA core involvement is needed for the aforementioned issues to be solved. The only thing we can solve here is an issue which does not require API changes, and that is issue (a), aka use a different FID for standalone ports vs ports under VLAN-unaware bridges. The first step is deciding what VID and FID to use for standalone ports, and what VID and FID for bridged ports. The 0/0 pair for standalone ports is what they used up till now, let's keep using that. For bridged ports, there are 2 cases: - VLAN-aware ports will never end up using the port default FID, because packets will always be classified to a VID in the VTU or dropped otherwise. The FID is the one associated with the VID in the VTU. - On VLAN-unaware ports, we _could_ leave their DefaultVID (pvid) at zero (just as in the case of standalone ports), and just change the port's default FID from 0 to a different number (say 1). However, Tobias points out that there is one more requirement to cater to: cross-chip bridging. The Marvell DSA header does not carry the FID in it, only the VID. So once a packet crosses a DSA link, if it has a VID of zero it will get classified to the default FID of that cascade port. Relying on a port default FID for upstream cascade ports results in contradictions: a default FID of 0 breaks ATU isolation of bridged ports on the downstream switch, a default FID of 1 breaks standalone ports on the downstream switch. So not only must standalone ports have different FIDs compared to bridged ports, they must also have different DefaultVID values. IEEE 802.1Q defines two reserved VID values: 0 and 4095. So we simply choose 4095 as the DefaultVID of ports belonging to VLAN-unaware bridges, and VID 4095 maps to FID 1. For the xmit operation to look up the same ATU database, we need to put VID 4095 in DSA tags sent to ports belonging to VLAN-unaware bridges too. All shared ports are configured to map this VID to the bridging FID, because they are members of that VLAN in the VTU. Shared ports don't need to have 802.1QMode enabled in any way, they always parse the VID from the DSA header, they don't need to look at the 802.1Q header. We install VID 4095 to the VTU in mv88e6xxx_setup_port(), with the mention that mv88e6xxx_vtu_setup() which was located right below that call was flushing the VTU so those entries wouldn't be preserved. So we need to relocate the VTU flushing prior to the port initialization during ->setup(). Also note that this is why it is safe to assume that VID 4095 will get associated with FID 1: the user ports haven't been created, so there is no avenue for the user to create a bridge VLAN which could otherwise race with the creation of another FID which would otherwise use up the non-reserved FID value of 1. [ Currently mv88e6xxx_port_vlan_join() doesn't have the option of specifying a preferred FID, it always calls mv88e6xxx_atu_new(). ] mv88e6xxx_port_db_load_purge() is the function to access the ATU for FDB/MDB entries, and it used to determine the FID to use for VLAN-unaware FDB entries (VID=0) using mv88e6xxx_port_get_fid(). But the driver only called mv88e6xxx_port_set_fid() once, during probe, so no surprises, the port FID was always 0, the call to get_fid() was redundant. As much as I would have wanted to not touch that code, the logic is broken when we add a new FID which is not the port-based default. Now the port-based default FID only corresponds to standalone ports, and FDB/MDB entries belong to the bridging service. So while in the future, when the DSA API will support FDB isolation, we will have to figure out the FID based on the bridge number, for now there's a single bridging FID, so hardcode that. Lastly, the tagger needs to check, when it is transmitting a VLAN untagged skb, whether it is sending it towards a bridged or a standalone port. When we see it is bridged we assume the bridge is VLAN-unaware. Not because it cannot be VLAN-aware but: - if we are transmitting from a VLAN-aware bridge we are likely doing so using TX forwarding offload. That code path guarantees that skbs have a vlan hwaccel tag in them, so we would not enter the "else" branch of the "if (skb->protocol == htons(ETH_P_8021Q))" condition. - if we are transmitting on behalf of a VLAN-aware bridge but with no TX forwarding offload (no PVT support, out of space in the PVT, whatever), we would indeed be transmitting with VLAN 4095 instead of the bridge device's pvid. However we would be injecting a "From CPU" frame, and the switch won't learn from that - it only learns from "Forward" frames. So it is inconsequential for address learning. And VLAN 4095 is absolutely enough for the frame to exit the switch, since we never remove that VLAN from any port. Fixes: `57e661aae6` ("net: dsa: mv88e6xxx: Link aggregation support") Reported-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-08 15:47:46 -07:00
Vladimir Oltean	c7709a02c1	net: dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware bridges using VID 0 The present code is structured this way due to an incomplete thought process. In Documentation/networking/switchdev.rst we document that if a bridge is VLAN-unaware, then the presence or lack of a pvid on a bridge port (or on the bridge itself, for that matter) should not affect the ability to receive and transmit tagged or untagged packets. If the bridge on behalf of which we are sending this packet is VLAN-aware, then the TX forwarding offload API ensures that the skb will be VLAN-tagged (if the packet was sent by user space as untagged, it will get transmitted town to the driver as tagged with the bridge device's pvid). But if the bridge is VLAN-unaware, it may or may not be VLAN-tagged. In fact the logic to insert the bridge's PVID came from the idea that we should emulate what is being done in the VLAN-aware case. But we shouldn't. It appears that injecting packets using a VLAN ID of 0 serves the purpose of forwarding the packets to the egress port with no VLAN tag added or stripped by the hardware, and no filtering being performed. So we can simply remove the superfluous logic. One reason why this logic is broken is that when CONFIG_BRIDGE_VLAN_FILTERING=n, we call br_vlan_get_pvid_rcu() but that returns an error and we do error out, dropping all packets on xmit. Not really smart. This is also an issue when the user deletes the bridge pvid: $ bridge vlan del dev br0 vid 1 self As mentioned, in both cases, packets should still flow freely, and they do just that on any net device where the bridge is not offloaded, but on mv88e6xxx they don't. Fixes: `d82f8ab0d8` ("net: dsa: tag_dsa: offload the bridge forwarding process") Reported-by: Andrew Lunn <andrew@lunn.ch> Link: https://patchwork.kernel.org/project/netdevbpf/patch/20211003155141.2241314-1-andrew@lunn.ch/ Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210928233708.1246774-1-vladimir.oltean@nxp.com/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-08 15:47:45 -07:00
Vladimir Oltean	1bec0f0506	net: dsa: fix bridge_num not getting cleared after ports leaving the bridge The dp->bridge_num is zero-based, with -1 being the encoding for an invalid value. But dsa_bridge_num_put used to check for an invalid value by comparing bridge_num with 0, which is of course incorrect. The result is that the bridge_num will never get cleared by dsa_bridge_num_put, and further port joins to other bridges will get a bridge_num larger than the previous one, and once all the available bridges with TX forwarding offload supported by the hardware get exhausted, the TX forwarding offload feature is simply disabled. In the case of sja1105, 7 iterations of the loop below are enough to exhaust the TX forwarding offload bits, and further bridge joins operate without that feature. ip link add br0 type bridge vlan_filtering 1 while :; do ip link set sw0p2 master br0 && sleep 1 ip link set sw0p2 nomaster && sleep 1 done This issue is enough of an indication that having the dp->bridge_num invalid encoding be a negative number is prone to bugs, so this will be changed to a one-based value, with the dp->bridge_num of zero being the indication of no bridge. However, that is material for net-next. Fixes: `f5e165e72b` ("net: dsa: track unique bridge numbers across all DSA switch trees") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-08 15:47:45 -07:00
Jakub Kicinski	9fe1155233	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-07 15:24:06 -07:00
Andrew Lunn	b44d52a50b	dsa: tag_dsa: Fix mask for trunked packets A packet received on a trunk will have bit 2 set in Forward DSA tagged frame. Bit 1 can be either 0 or 1 and is otherwise undefined and bit 0 indicates the frame CFI. Masking with 7 thus results in frames as being identified as being from a trunk when in fact they are not. Fix the mask to just look at bit 2. Fixes: `5b60dadb71` ("net: dsa: tag_dsa: Support reception of packets from LAG devices") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-04 13:43:55 +01:00
Jakub Kicinski	e35b8d7dbb	net: use eth_hw_addr_set() instead of ether_addr_copy() Convert from ether_addr_copy() to eth_hw_addr_set(): @@ expression dev, np; @@ - ether_addr_copy(dev->dev_addr, np) + eth_hw_addr_set(dev, np) Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-02 14:18:25 +01:00
Vladimir Oltean	5ca721c54d	net: dsa: tag_ocelot: set the classified VLAN during xmit Currently, all packets injected into Ocelot switches are classified to VLAN 0, regardless of whether they are VLAN-tagged or not. This is because the switch only looks at the VLAN TCI from the DSA tag. VLAN 0 is then stripped on egress due to REW_TAG_CFG_TAG_CFG. There are 2 cases really, below is the explanation for ocelot_port_set_native_vlan: - Port is VLAN-aware, we set REW_TAG_CFG_TAG_CFG to 1 (egress-tag all frames except VID 0 and the native VLAN) if a native VLAN exists, or to 3 otherwise (tag all frames, including VID 0). - Port is VLAN-unaware, we set REW_TAG_CFG_TAG_CFG to 0 (port tagging disabled, classified VLAN never appears in the packet). One can already see an inconsistency: when a native VLAN exists, VID 0 is egress-untagged, but when it doesn't, VID 0 is egress-tagged. So when we do this: ip link add br0 type bridge vlan_filtering 1 ip link set swp0 master br0 bridge vlan del dev swp0 vid 1 bridge vlan add dev swp0 vid 1 pvid # but not untagged and we ping through swp0, packets will look like this: MAC > 33:33:00:00:00:02, ethertype 802.1Q (0x8100): vlan 0, p 0, ethertype 802.1Q (0x8100), vlan 1, p 0, ethertype IPv6 (0x86dd), ICMP6, router solicitation, length 16 So VID 1 frames (sent that way by the Linux bridge) are encapsulated in a VID 0 header - the classified VLAN of the packets as far as the hw is concerned. To avoid that, what we really need to do is stop injecting packets using the classified VLAN of 0. This patch strips the VLAN header from the skb payload, if that VLAN exists and if the port is under a VLAN-aware bridge. Then it copies that VLAN header into the DSA injection frame header. A positive side effect is that VCAP ES0 VLAN rewriting rules now work for packets injected from the CPU into a port that's under a VLAN-aware bridge, and we are able to match those packets by the VLAN ID that was sent by the network stack, and not by VLAN ID 0. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-02 14:15:57 +01:00
Mianhan Liu	ca4b0649be	net/dsa/tag_ksz.c: remove superfluous headers tag_ksz.c hasn't use any macro or function declared in linux/slab.h. Thus, these files can be removed from tag_ksz.c safely without affecting the compilation of the ./net/dsa module Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-29 11:38:21 +01:00
Mianhan Liu	6f8b64f86e	net/dsa/tag_8021q.c: remove superfluous headers tag_8021q.c hasn't use any macro or function declared in linux/if_bridge.h. Thus, these files can be removed from tag_8021q.c safely without affecting the compilation of the ./net/dsa module Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-29 11:38:21 +01:00
Leon Romanovsky	bd936bd53b	net: dsa: Move devlink registration to be last devlink command This change prevents from users to access device before devlink is fully configured. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-27 16:32:00 +01:00
Jakub Kicinski	2fcd14d0f7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net net/mptcp/protocol.c `977d293e23` ("mptcp: ensure tx skbs always have the MPTCP ext") `efe686ffce` ("mptcp: ensure tx skbs always have the MPTCP ext") same patch merged in both trees, keep net-next. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-09-23 11:19:49 -07:00
Vladimir Oltean	f5aef42415	net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver It's nice to be able to test a tagging protocol with dsa_loop, but not at the cost of losing the ability of building the tagging protocol and switch driver as modules, because as things stand, there is a circular dependency between the two. Tagging protocol drivers cannot depend on switch drivers, that is a hard fact. The reasoning behind the blamed patch was that accessing dp->priv should first make sure that the structure behind that pointer is what we really think it is. Currently the "sja1105" and "sja1110" tagging protocols only operate with the sja1105 switch driver, just like any other tagging protocol and switch combination. The only way to mix and match them is by modifying the code, and this applies to dsa_loop as well (by default that uses DSA_TAG_PROTO_NONE). So while in principle there is an issue, in practice there isn't one. Until we extend dsa_loop to allow user space configuration, treat the problem as a non-issue and just say that DSA ports found by tag_sja1105 are always sja1105 ports, which is in fact true. But keep the dsa_port_is_sja1105 function so that it's easy to patch it during testing, and rely on dead code elimination. Fixes: `994d2cbb08` ("net: dsa: tag_sja1105: be dsa_loop-safe") Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-23 12:45:07 +01:00
Vladimir Oltean	6d709cadfd	net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver The problem is that DSA tagging protocols really must not depend on the switch driver, because this creates a circular dependency at insmod time, and the switch driver will effectively not load when the tagging protocol driver is missing. The code was structured in the way it was for a reason, though. The DSA driver-facing API for PTP timestamping relies on the assumption that two-step TX timestamps are provided by the hardware in an out-of-band manner, typically by raising an interrupt and making that timestamp available inside some sort of FIFO which is to be accessed over SPI/MDIO/etc. So the API puts .port_txtstamp into dsa_switch_ops, because it is expected that the switch driver needs to save some state (like put the skb into a queue until its TX timestamp arrives). On SJA1110, TX timestamps are provided by the switch as Ethernet packets, so this makes them be received and processed by the tagging protocol driver. This in itself is great, because the timestamps are full 64-bit and do not require reconstruction, and since Ethernet is the fastest I/O method available to/from the switch, PTP timestamps arrive very quickly, no matter how bottlenecked the SPI connection is, because SPI interaction is not needed at all. DSA's code structure and strict isolation between the tagging protocol driver and the switch driver break the natural code organization. When the tagging protocol driver receives a packet which is classified as a metadata packet containing timestamps, it passes those timestamps one by one to the switch driver, which then proceeds to compare them based on the recorded timestamp ID that was generated in .port_txtstamp. The communication between the tagging protocol and the switch driver is done through a method exported by the switch driver, sja1110_process_meta_tstamp. To satisfy build requirements, we force a dependency to build the tagging protocol driver as a module when the switch driver is a module. However, as explained in the first paragraph, that causes the circular dependency. To solve this, move the skb queue from struct sja1105_private :: struct sja1105_ptp_data to struct sja1105_private :: struct sja1105_tagger_data. The latter is a data structure for which hacks have already been put into place to be able to create persistent storage per switch that is accessible from the tagging protocol driver (see sja1105_setup_ports). With the skb queue directly accessible from the tagging protocol driver, we can now move sja1110_process_meta_tstamp into the tagging driver itself, and avoid exporting a symbol. Fixes: `566b18c8b7` ("net: dsa: sja1105: implement TX timestamping for SJA1110") Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-23 12:45:07 +01:00
Leon Romanovsky	db4278c55f	devlink: Make devlink_register to be void devlink_register() can't fail and always returns success, but all drivers are obligated to check returned status anyway. This adds a lot of boilerplate code to handle impossible flow. Make devlink_register() void and simplify the drivers that use that API call. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Simon Horman <simon.horman@corigine.com> Acked-by: Vladimir Oltean <olteanv@gmail.com> # dsa Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-22 14:15:12 +01:00
Vladimir Oltean	5135e96a3d	net: dsa: don't allocate the slave_mii_bus using devres The Linux device model permits both the ->shutdown and ->remove driver methods to get called during a shutdown procedure. Example: a DSA switch which sits on an SPI bus, and the SPI bus driver calls this on its ->shutdown method: spi_unregister_controller -> device_for_each_child(&ctlr->dev, NULL, __unregister); -> spi_unregister_device(to_spi_device(dev)); -> device_del(&spi->dev); So this is a simple pattern which can theoretically appear on any bus, although the only other buses on which I've been able to find it are I2C: i2c_del_adapter -> device_for_each_child(&adap->dev, NULL, __unregister_client); -> i2c_unregister_device(client); -> device_unregister(&client->dev); The implication of this pattern is that devices on these buses can be unregistered after having been shut down. The drivers for these devices might choose to return early either from ->remove or ->shutdown if the other callback has already run once, and they might choose that the ->shutdown method should only perform a subset of the teardown done by ->remove (to avoid unnecessary delays when rebooting). So in other words, the device driver may choose on ->remove to not do anything (therefore to not unregister an MDIO bus it has registered on ->probe), because this ->remove is actually triggered by the device_shutdown path, and its ->shutdown method has already run and done the minimally required cleanup. This used to be fine until the blamed commit, but now, the following BUG_ON triggers: void mdiobus_free(struct mii_bus bus) { / For compatibility with error handling in drivers. */ if (bus->state == MDIOBUS_ALLOCATED) { kfree(bus); return; } BUG_ON(bus->state != MDIOBUS_UNREGISTERED); bus->state = MDIOBUS_RELEASED; put_device(&bus->dev); } In other words, there is an attempt to free an MDIO bus which was not unregistered. The attempt to free it comes from the devres release callbacks of the SPI device, which are executed after the device is unregistered. I'm not saying that the fact that MDIO buses allocated using devres would automatically get unregistered wasn't strange. I'm just saying that the commit didn't care about auditing existing call paths in the kernel, and now, the following code sequences are potentially buggy: (a) devm_mdiobus_alloc followed by plain mdiobus_register, for a device located on a bus that unregisters its children on shutdown. After the blamed patch, either both the alloc and the register should use devres, or none should. (b) devm_mdiobus_alloc followed by plain mdiobus_register, and then no mdiobus_unregister at all in the remove path. After the blamed patch, nobody unregisters the MDIO bus anymore, so this is even more buggy than the previous case which needs a specific bus configuration to be seen, this one is an unconditional bug. In this case, DSA falls into category (a), it tries to be helpful and registers an MDIO bus on behalf of the switch, which might be on such a bus. I've no idea why it does it under devres. It does this on probe: if (!ds->slave_mii_bus && ds->ops->phy_read) alloc and register mdio bus and this on remove: if (ds->slave_mii_bus && ds->ops->phy_read) unregister mdio bus I _could_ imagine using devres because the condition used on remove is different than the condition used on probe. So strictly speaking, DSA cannot determine whether the ds->slave_mii_bus it sees on remove is the ds->slave_mii_bus that _it_ has allocated on probe. Using devres would have solved that problem. But nonetheless, the existing code already proceeds to unregister the MDIO bus, even though it might be unregistering an MDIO bus it has never registered. So I can only guess that no driver that implements ds->ops->phy_read also allocates and registers ds->slave_mii_bus itself. So in that case, if unregistering is fine, freeing must be fine too. Stop using devres and free the MDIO bus manually. This will make devres stop attempting to free a still registered MDIO bus on ->shutdown. Fixes: `ac3a68d566` ("net: phy: don't abuse devres in devm_mdiobus_register()") Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-21 13:52:16 +01:00
Vladimir Oltean	e5845aa0ea	net: dsa: fix dsa_tree_setup error path Since the blamed commit, dsa_tree_teardown_switches() was split into two smaller functions, dsa_tree_teardown_switches and dsa_tree_teardown_ports. However, the error path of dsa_tree_setup stopped calling dsa_tree_teardown_ports. Fixes: `a57d8c217a` ("net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-21 10:59:39 +01:00
Vladimir Oltean	fd292c189a	net: dsa: tear down devlink port regions when tearing down the devlink port on error Commit `86f8b1c01a` ("net: dsa: Do not make user port errors fatal") decided it was fine to ignore errors on certain ports that fail to probe, and go on with the ports that do probe fine. Commit `fb6ec87f72` ("net: dsa: Fix type was not set for devlink port") noticed that devlink_port_type_eth_set(dlp, dp->slave); does not get called, and devlink notices after a timeout of 3600 seconds and prints a WARN_ON. So it went ahead to unregister the devlink port. And because there exists an UNUSED port flavour, we actually re-register the devlink port as UNUSED. Commit `08156ba430` ("net: dsa: Add devlink port regions support to DSA") added devlink port regions, which are set up by the driver and not by DSA. When we trigger the devlink port deregistration and reregistration as unused, devlink now prints another WARN_ON, from here: devlink_port_unregister: WARN_ON(!list_empty(&devlink_port->region_list)); So the port still has regions, which makes sense, because they were set up by the driver, and the driver doesn't know we're unregistering the devlink port. Somebody needs to tear them down, and optionally (actually it would be nice, to be consistent) set them up again for the new devlink port. But DSA's layering stays in our way quite badly here. The options I've considered are: 1. Introduce a function in devlink to just change a port's type and flavour. No dice, devlink keeps a lot of state, it really wants the port to not be registered when you set its parameters, so changing anything can only be done by destroying what we currently have and recreating it. 2. Make DSA cache the parameters passed to dsa_devlink_port_region_create, and the region returned, keep those in a list, then when the devlink port unregister needs to take place, the existing devlink regions are destroyed by DSA, and we replay the creation of new regions using the cached parameters. Problem: mv88e6xxx keeps the region pointers in chip->ports[port].region, and these will remain stale after DSA frees them. There are many things DSA can do, but updating mv88e6xxx's private pointers is not one of them. 3. Just let the driver do it (i.e. introduce a very specific method called ds->ops->port_reinit_as_unused, which unregisters its devlink port devlink regions, then the old devlink port, then registers the new one, then the devlink port regions for it). While it does work, as opposed to the others, it's pretty horrible from an API perspective and we can do better. 4. Introduce a new pair of methods, ->port_setup and ->port_teardown, which in the case of mv88e6xxx must register and unregister the devlink port regions. Call these 2 methods when the port must be reinitialized as unused. Naturally, I went for the 4th approach. Fixes: `08156ba430` ("net: dsa: Add devlink port regions support to DSA") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-19 13:05:44 +01:00

... 2 3 4 5 6 ...

1440 Commits