linux

Author	SHA1	Message	Date
Jakub Kicinski	aaa25a2fa7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net tools/testing/selftests/net/mptcp/mptcp_join.sh `34aa6e3bcc` ("selftests: mptcp: add ip mptcp wrappers") `857898eb4b` ("selftests: mptcp: add missing join check") `6ef84b1517` ("selftests: mptcp: more robust signal race test") https://lore.kernel.org/all/20220221131842.468893-1-broonie@kernel.org/ drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/act.h drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c `fb7e76ea3f` ("net/mlx5e: TC, Skip redundant ct clear actions") `c63741b426` ("net/mlx5e: Fix MPLSoUDP encap to use MPLS action information") `09bf979232` ("net/mlx5e: TC, Move pedit_headers_action to parse_attr") `84ba8062e3` ("net/mlx5e: Test CT and SAMPLE on flow attr") `efe6f961cd` ("net/mlx5e: CT, Don't set flow flag CT for ct clear flow") `3b49a7edec` ("net/mlx5e: TC, Reject rules with multiple CT actions") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 17:54:25 -08:00
Hans Schultz	b9e8b58fd2	net: dsa: Include BR_PORT_LOCKED in the list of synced brport flags Ensures that the DSA switch driver gets notified of changes to the BR_PORT_LOCKED flag as well, for the case when a DSA port joins or leaves a LAG that is a bridge port. Signed-off-by: Hans Schultz <schultz.hans+netdev@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-23 12:52:34 +00:00
Alvin Šipraga	342b641919	net: dsa: fix panic when removing unoffloaded port from bridge If a bridged port is not offloaded to the hardware - either because the underlying driver does not implement the port_bridge_{join,leave} ops, or because the operation failed - then its dp->bridge pointer will be NULL when dsa_port_bridge_leave() is called. Avoid dereferncing NULL. This fixes the following splat when removing a port from a bridge: Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000000 Internal error: Oops: 96000004 [#1] PREEMPT_RT SMP CPU: 3 PID: 1119 Comm: brctl Tainted: G O 5.17.0-rc4-rt4 #1 Call trace: dsa_port_bridge_leave+0x8c/0x1e4 dsa_slave_changeupper+0x40/0x170 dsa_slave_netdevice_event+0x494/0x4d4 notifier_call_chain+0x80/0xe0 raw_notifier_call_chain+0x1c/0x24 call_netdevice_notifiers_info+0x5c/0xac __netdev_upper_dev_unlink+0xa4/0x200 netdev_upper_dev_unlink+0x38/0x60 del_nbp+0x1b0/0x300 br_del_if+0x38/0x114 add_del_if+0x60/0xa0 br_ioctl_stub+0x128/0x2dc br_ioctl_call+0x68/0xb0 dev_ifsioc+0x390/0x554 dev_ioctl+0x128/0x400 sock_do_ioctl+0xb4/0xf4 sock_ioctl+0x12c/0x4e0 __arm64_sys_ioctl+0xa8/0xf0 invoke_syscall+0x4c/0x110 el0_svc_common.constprop.0+0x48/0xf0 do_el0_svc+0x28/0x84 el0_svc+0x1c/0x50 el0t_64_sync_handler+0xa8/0xb0 el0t_64_sync+0x17c/0x180 Code: f9402f00 f0002261 f9401302 913cc021 (a9401404) ---[ end trace 0000000000000000 ]--- Fixes: `d3eed0e57d` ("net: dsa: keep the bridge_dev and bridge_num as part of the same structure") Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220221203539.310690-1-alvin@pqrs.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-22 17:03:02 -08:00
Russell King (Oracle)	1054457006	net: phy: phylink: fix DSA mac_select_pcs() introduction Vladimir Oltean reports that probing on DSA drivers that aren't yet populating supported_interfaces now fails. Fix this by allowing phylink to detect whether DSA actually provides an underlying mac_select_pcs() implementation. Reported-by: Vladimir Oltean <olteanv@gmail.com> Fixes: `bde018222c` ("net: dsa: add support for phylink mac_select_pcs()") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/E1nMCD6-00A0wC-FG@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-22 16:57:08 -08:00
Vladimir Oltean	8940e6b669	net: dsa: avoid call to __dev_set_promiscuity() while rtnl_mutex isn't held If the DSA master doesn't support IFF_UNICAST_FLT, then the following call path is possible: dsa_slave_switchdev_event_work -> dsa_port_host_fdb_add -> dev_uc_add -> __dev_set_rx_mode -> __dev_set_promiscuity Since the blamed commit, dsa_slave_switchdev_event_work() no longer holds rtnl_lock(), which triggers the ASSERT_RTNL() from __dev_set_promiscuity(). Taking rtnl_lock() around dev_uc_add() is impossible, because all the code paths that call dsa_flush_workqueue() do so from contexts where the rtnl_mutex is already held - so this would lead to an instant deadlock. dev_uc_add() in itself doesn't require the rtnl_mutex for protection. There is this comment in __dev_set_rx_mode() which assumes so: /* Unicast addresses changes may only happen under the rtnl, * therefore calling __dev_set_promiscuity here is safe. */ but it is from commit `4417da668c` ("[NET]: dev: secondary unicast address support") dated June 2007, and in the meantime, commit `f1f28aa351` ("netdev: Add addr_list_lock to struct net_device."), dated July 2008, has added &dev->addr_list_lock to protect this instead of the global rtnl_mutex. Nonetheless, __dev_set_promiscuity() does assume rtnl_mutex protection, but it is the uncommon path of what we typically expect dev_uc_add() to do. So since only the uncommon path requires rtnl_lock(), just check ahead of time whether dev_uc_add() would result into a call to __dev_set_promiscuity(), and handle that condition separately. DSA already configures the master interface to be promiscuous if the tagger requires this. We can extend this to also cover the case where the master doesn't handle dev_uc_add() (doesn't support IFF_UNICAST_FLT), and on the premise that we'd end up making it promiscuous during operation anyway, either if a DSA slave has a non-inherited MAC address, or if the bridge notifies local FDB entries for its own MAC address, the address of a station learned on a foreign port, etc. Fixes: `0faf890fc5` ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work") Reported-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-19 21:18:33 +00:00
Russell King (Oracle)	ccfbf44d4c	net: dsa: remove pcs_poll With drivers converted over to using phylink PCS, there is no need for the struct dsa_switch member "pcs_poll" to exist anymore - there is a flag in the struct phylink_pcs which indicates whether this PCS needs to be polled which supersedes this. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-19 16:41:50 +00:00
Russell King (Oracle)	bde018222c	net: dsa: add support for phylink mac_select_pcs() Add DSA support for the phylink mac_select_pcs() method so DSA drivers can return provide phylink with the appropriate PCS for the PHY interface mode. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-18 11:28:32 +00:00
Vladimir Oltean	d2b1d186ce	net: dsa: delete unused exported symbols for ethtool PHY stats Introduced in commit `cf96357303` ("net: dsa: Allow providing PHY statistics from CPU port"), it appears these were never used. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220216193726.2926320-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-17 20:07:09 -08:00
Jakub Kicinski	6b5567b1b2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-17 11:44:20 -08:00
Mans Rullgard	017b355bbd	net: dsa: lan9303: handle hwaccel VLAN tags Check for a hwaccel VLAN tag on rx and use it if present. Otherwise, use __skb_vlan_pop() like the other tag parsers do. This fixes the case where the VLAN tag has already been consumed by the master. Fixes: `a1292595e0` ("net: dsa: add new DSA switch driver for the SMSC-LAN9303") Signed-off-by: Mans Rullgard <mans@mansr.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220216124634.23123-1-mans@mansr.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-17 09:32:06 -08:00
Vladimir Oltean	29940ce32a	net: dsa: tag_ocelot_8021q: calculate TX checksum in software for deferred packets DSA inherits NETIF_F_CSUM_MASK from master->vlan_features, and the expectation is that TX checksumming is offloaded and not done in software. Normally the DSA master takes care of this, but packets handled by ocelot_defer_xmit() are a very special exception, because they are actually injected into the switch through register-based MMIO. So the DSA master is not involved at all for these packets => no one calculates the checksum. This allows PTP over UDP to work using the ocelot-8021q tagging protocol. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-17 14:06:51 +00:00
Vladimir Oltean	c862033595	net: dsa: tag_8021q: only call skb_push/skb_pull around __skb_vlan_pop __skb_vlan_pop() needs skb->data to point at the mac_header, while skb_vlan_tag_present() and skb_vlan_tag_get() don't, because they don't look at skb->data at all. So we can avoid uselessly moving around skb->data for the case where the VLAN tag was offloaded by the DSA master. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220215204722.2134816-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-16 20:35:35 -08:00
Vladimir Oltean	164f861bd4	net: dsa: offload bridge port VLANs on foreign interfaces DSA now explicitly handles VLANs installed with the 'self' flag on the bridge as host VLANs, instead of just replicating every bridge port VLAN also on the CPU port and never deleting it, which is what it did before. However, this leaves a corner case uncovered, as explained by Tobias Waldekranz: https://patchwork.kernel.org/project/netdevbpf/patch/20220209213044.2353153-6-vladimir.oltean@nxp.com/#24735260 Forwarding towards a bridge port VLAN installed on a bridge port foreign to DSA (separate NIC, Wi-Fi AP) used to work by virtue of the fact that DSA itself needed to have at least one port in that VLAN (therefore, it also had the CPU port in said VLAN). However, now that the CPU port may not be member of all VLANs that user ports are members of, we need to ensure this isn't the case if software forwarding to a foreign interface is required. The solution is to treat bridge port VLANs on standalone interfaces in the exact same way as host VLANs. From DSA's perspective, there is no difference between local termination and software forwarding; packets in that VLAN must reach the CPU in both cases. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-16 11:21:05 +00:00
Vladimir Oltean	134ef2388e	net: dsa: add explicit support for host bridge VLANs Currently, DSA programs VLANs on shared (DSA and CPU) ports each time it does so on user ports. This is good for basic functionality but has several limitations: - the VLAN group which must reach the CPU may be radically different from the VLAN group that must be autonomously forwarded by the switch. In other words, the admin may want to isolate noisy stations and avoid traffic from them going to the control processor of the switch, where it would just waste useless cycles. The bridge already supports independent control of VLAN groups on bridge ports and on the bridge itself, and when VLAN-aware, it will drop packets in software anyway if their VID isn't added as a 'self' entry towards the bridge device. - Replaying host FDB entries may depend, for some drivers like mv88e6xxx, on replaying the host VLANs as well. The 2 VLAN groups are approximately the same in most regular cases, but there are corner cases when timing matters, and DSA's approximation of replicating VLANs on shared ports simply does not work. - If a user makes the bridge (implicitly the CPU port) join a VLAN by accident, there is no way for the CPU port to isolate itself from that noisy VLAN except by rebooting the system. This is because for each VLAN added on a user port, DSA will add it on shared ports too, but for each VLAN deletion on a user port, it will remain installed on shared ports, since DSA has no good indication of whether the VLAN is still in use or not. Now that the bridge driver emits well-balanced SWITCHDEV_OBJ_ID_PORT_VLAN addition and removal events, DSA has a simple and straightforward task of separating the bridge port VLANs (these have an orig_dev which is a DSA slave interface, or a LAG interface) from the host VLANs (these have an orig_dev which is a bridge interface), and to keep a simple reference count of each VID on each shared port. Forwarding VLANs must be installed on the bridge ports and on all DSA ports interconnecting them. We don't have a good view of the exact topology, so we simply install forwarding VLANs on all DSA ports, which is what has been done until now. Host VLANs must be installed primarily on the dedicated CPU port of each bridge port. More subtly, they must also be installed on upstream-facing and downstream-facing DSA ports that are connecting the bridge ports and the CPU. This ensures that the mv88e6xxx's problem (VID of host FDB entry may be absent from VTU) is still addressed even if that switch is in a cross-chip setup, and it has no local CPU port. Therefore: - user ports contain only bridge port (forwarding) VLANs, and no refcounting is necessary - DSA ports contain both forwarding and host VLANs. Refcounting is necessary among these 2 types. - CPU ports contain only host VLANs. Refcounting is also necessary. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-16 11:21:05 +00:00
Vladimir Oltean	a2614140dc	net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN mv88e6xxx is special among DSA drivers in that it requires the VTU to contain the VID of the FDB entry it modifies in mv88e6xxx_port_db_load_purge(), otherwise it will return -EOPNOTSUPP. Sometimes due to races this is not always satisfied even if external code does everything right (first deletes the FDB entries, then the VLAN), because DSA commits to hardware FDB entries asynchronously since commit `c9eb3e0f87` ("net: dsa: Add support for learning FDB through notification"). Therefore, the mv88e6xxx driver must close this race condition by itself, by asking DSA to flush the switchdev workqueue of any FDB deletions in progress, prior to exiting a VLAN. Fixes: `c9eb3e0f87` ("net: dsa: Add support for learning FDB through notification") Reported-by: Rafael Richter <rafael.richter@gin.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-14 13:31:12 +00:00
Vladimir Oltean	ddb44bdcde	net: dsa: remove lockdep class for DSA slave address list Since commit `2f1e8ea726` ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"), suggested by Cong Wang, the DSA interfaces and their master have different dev->nested_level, which makes netif_addr_lock() stop complaining about potentially recursive locking on the same lock class. So we no longer need DSA slave interfaces to have their own lockdep class. Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-11 11:17:33 +00:00
Vladimir Oltean	8db2bc790d	net: dsa: remove lockdep class for DSA master address list Since commit `2f1e8ea726` ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"), suggested by Cong Wang, the DSA interfaces and their master have different dev->nested_level, which makes netif_addr_lock() stop complaining about potentially recursive locking on the same lock class. So we no longer need DSA masters to have their own lockdep class. Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-11 11:17:32 +00:00
Vladimir Oltean	45b987d5ed	net: dsa: remove ndo_get_phys_port_name and ndo_get_port_parent_id There are no legacy ports, DSA registers a devlink instance with ports unconditionally for all switch drivers. Therefore, delete the old-style ndo operations used for determining bridge forwarding domains. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-11 11:17:32 +00:00
Jakub Kicinski	5b91c5cc0e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-10 17:29:56 -08:00
Vladimir Oltean	ee534378f0	net: dsa: fix panic when DSA master device unbinds on shutdown Rafael reports that on a system with LX2160A and Marvell DSA switches, if a reboot occurs while the DSA master (dpaa2-eth) is up, the following panic can be seen: systemd-shutdown[1]: Rebooting. Unable to handle kernel paging request at virtual address 00a0000800000041 [00a0000800000041] address between user and kernel address ranges Internal error: Oops: 96000004 [#1] PREEMPT SMP CPU: 6 PID: 1 Comm: systemd-shutdow Not tainted 5.16.5-00042-g8f5585009b24 #32 pc : dsa_slave_netdevice_event+0x130/0x3e4 lr : raw_notifier_call_chain+0x50/0x6c Call trace: dsa_slave_netdevice_event+0x130/0x3e4 raw_notifier_call_chain+0x50/0x6c call_netdevice_notifiers_info+0x54/0xa0 __dev_close_many+0x50/0x130 dev_close_many+0x84/0x120 unregister_netdevice_many+0x130/0x710 unregister_netdevice_queue+0x8c/0xd0 unregister_netdev+0x20/0x30 dpaa2_eth_remove+0x68/0x190 fsl_mc_driver_remove+0x20/0x5c __device_release_driver+0x21c/0x220 device_release_driver_internal+0xac/0xb0 device_links_unbind_consumers+0xd4/0x100 __device_release_driver+0x94/0x220 device_release_driver+0x28/0x40 bus_remove_device+0x118/0x124 device_del+0x174/0x420 fsl_mc_device_remove+0x24/0x40 __fsl_mc_device_remove+0xc/0x20 device_for_each_child+0x58/0xa0 dprc_remove+0x90/0xb0 fsl_mc_driver_remove+0x20/0x5c __device_release_driver+0x21c/0x220 device_release_driver+0x28/0x40 bus_remove_device+0x118/0x124 device_del+0x174/0x420 fsl_mc_bus_remove+0x80/0x100 fsl_mc_bus_shutdown+0xc/0x1c platform_shutdown+0x20/0x30 device_shutdown+0x154/0x330 __do_sys_reboot+0x1cc/0x250 __arm64_sys_reboot+0x20/0x30 invoke_syscall.constprop.0+0x4c/0xe0 do_el0_svc+0x4c/0x150 el0_svc+0x24/0xb0 el0t_64_sync_handler+0xa8/0xb0 el0t_64_sync+0x178/0x17c It can be seen from the stack trace that the problem is that the deregistration of the master causes a dev_close(), which gets notified as NETDEV_GOING_DOWN to dsa_slave_netdevice_event(). But dsa_switch_shutdown() has already run, and this has unregistered the DSA slave interfaces, and yet, the NETDEV_GOING_DOWN handler attempts to call dev_close_many() on those slave interfaces, leading to the problem. The previous attempt to avoid the NETDEV_GOING_DOWN on the master after dsa_switch_shutdown() was called seems improper. Unregistering the slave interfaces is unnecessary and unhelpful. Instead, after the slaves have stopped being uppers of the DSA master, we can now reset to NULL the master->dsa_ptr pointer, which will make DSA start ignoring all future notifier events on the master. Fixes: `0650bf52b3` ("net: dsa: be compatible with masters which unregister on shutdown") Reported-by: Rafael Richter <rafael.richter@gin.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-09 13:21:39 +00:00
Ansuel Smith	31eb6b4386	net: dsa: tag_qca: add support for handling mgmt and MIB Ethernet packet Add connect/disconnect helper to assign private struct to the DSA switch. Add support for Ethernet mgmt and MIB if the DSA driver provide an handler to correctly parse and elaborate the data. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-02 14:43:59 +00:00
Ansuel Smith	18be654a43	net: dsa: tag_qca: add define for handling MIB packet Add struct to correctly parse a mib Ethernet packet. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-02 14:43:59 +00:00
Ansuel Smith	c2ee8181fd	net: dsa: tag_qca: add define for handling mgmt Ethernet packet Add all the required define to prepare support for mgmt read/write in Ethernet packet. Any packet of this type has to be dropped as the only use of these special packet is receive ack for an mgmt write request or receive data for an mgmt read request. A struct is used that emulates the Ethernet header but is used for a different purpose. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-02 14:43:59 +00:00
Ansuel Smith	101c04c346	net: dsa: tag_qca: enable promisc_on_master flag Ethernet MDIO packets are non-standard and DSA master expects the first 6 octets to be the MAC DA. To address these kind of packet, enable promisc_on_master flag for the tagger. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-02 14:43:59 +00:00
Ansuel Smith	3ec762fb13	net: dsa: tag_qca: move define to include linux/dsa Move tag_qca define to include dir linux/dsa as the qca8k require access to the tagger define to support in-band mdio read/write using ethernet packet. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-02 14:43:59 +00:00
Ansuel Smith	6b04582992	net: dsa: tag_qca: convert to FIELD macro Convert driver to FIELD macro to drop redundant define. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-02 14:43:59 +00:00
Vladimir Oltean	e83d565378	net: dsa: replay master state events in dsa_tree_{setup,teardown}_master In order for switch driver to be able to make simple and reliable use of the master tracking operations, they must also be notified of the initial state of the DSA master, not just of the changes. This is because they might enable certain features only during the time when they know that the DSA master is up and running. Therefore, this change explicitly checks the state of the DSA master under the same rtnl_mutex as we were holding during the dsa_master_setup() and dsa_master_teardown() call. The idea being that if the DSA master became operational in between the moment in which it became a DSA master (dsa_master_setup set dev->dsa_ptr) and the moment when we checked for the master being up, there is a chance that we would emit a ->master_state_change() call with no actual state change. We need to avoid that by serializing the concurrent netdevice event with us. If the netdevice event started before, we force it to finish before we begin, because we take rtnl_lock before making netdev_uses_dsa() return true. So we also handle that early event and do nothing on it. Similarly, if the dev_open() attempt is concurrent with us, it will attempt to take the rtnl_mutex, but we're holding it. We'll see that the master flag IFF_UP isn't set, then when we release the rtnl_mutex we'll process the NETDEV_UP notifier. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-02 14:43:59 +00:00
Vladimir Oltean	295ab96f47	net: dsa: provide switch operations for tracking the master state Certain drivers may need to send management traffic to the switch for things like register access, FDB dump, etc, to accelerate what their slow bus (SPI, I2C, MDIO) can already do. Ethernet is faster (especially in bulk transactions) but is also more unreliable, since the user may decide to bring the DSA master down (or not bring it up), therefore severing the link between the host and the attached switch. Drivers needing Ethernet-based register access already should have fallback logic to the slow bus if the Ethernet method fails, but that fallback may be based on a timeout, and the I/O to the switch may slow down to a halt if the master is down, because every Ethernet packet will have to time out. The driver also doesn't have the option to turn off Ethernet-based I/O momentarily, because it wouldn't know when to turn it back on. Which is where this change comes in. By tracking NETDEV_CHANGE, NETDEV_UP and NETDEV_GOING_DOWN events on the DSA master, we should know the exact interval of time during which this interface is reliably available for traffic. Provide this information to switches so they can use it as they wish. An helper is added dsa_port_master_is_operational() to check if a master port is operational. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-02 14:43:59 +00:00
Tobias Waldekranz	108dc8741c	net: dsa: Avoid cross-chip syncing of VLAN filtering Changes to VLAN filtering are not applicable to cross-chip notifications. On a system like this: .-----. .-----. .-----. \| sw1 +---+ sw2 +---+ sw3 \| '-1-2-' '-1-2-' '-1-2-' Before this change, upon sw1p1 leaving a bridge, a call to dsa_port_vlan_filtering would also be made to sw2p1 and sw3p1. In this scenario: .---------. .-----. .-----. \| sw1 +---+ sw2 +---+ sw3 \| '-1-2-3-4-' '-1-2-' '-1-2-' When sw1p4 would leave a bridge, dsa_port_vlan_filtering would be called for sw2 and sw3 with a non-existing port - leading to array out-of-bounds accesses and crashes on mv88e6xxx. Fixes: `d371b7c92d` ("net: dsa: Unset vlan_filtering when ports leave the bridge") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-25 11:45:39 +00:00
Tobias Waldekranz	381a730182	net: dsa: Move VLAN filtering syncing out of dsa_switch_bridge_leave Most of dsa_switch_bridge_leave was, in fact, dealing with the syncing of VLAN filtering for switches on which that is a global setting. Separate the two phases to prepare for the cross-chip related bugfix in the following commit. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-25 11:45:39 +00:00
Vladimir Oltean	11fd667dac	net: dsa: setup master before ports It is said that as soon as a network interface is registered, all its resources should have already been prepared, so that it is available for sending and receiving traffic. One of the resources needed by a DSA slave interface is the master. dsa_tree_setup -> dsa_tree_setup_ports -> dsa_port_setup -> dsa_slave_create -> register_netdevice -> dsa_tree_setup_master -> dsa_master_setup -> sets up master->dsa_ptr, which enables reception Therefore, there is a short period of time after register_netdevice() during which the master isn't prepared to pass traffic to the DSA layer (master->dsa_ptr is checked by eth_type_trans). Same thing during unregistration, there is a time frame in which packets might be missed. Note that this change opens us to another race: dsa_master_find_slave() will get invoked potentially earlier than the slave creation, and later than the slave deletion. Since dp->slave starts off as a NULL pointer, the earlier calls aren't a problem, but the later calls are. To avoid use-after-free, we should zeroize dp->slave before calling dsa_slave_destroy(). In practice I cannot really test real life improvements brought by this change, since in my systems, netdevice creation races with PHY autoneg which takes a few seconds to complete, and that masks quite a few races. Effects might be noticeable in a setup with fixed links all the way to an external system. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-06 11:59:10 +00:00
Vladimir Oltean	1e3f407f3c	net: dsa: first set up shared ports, then non-shared ports After commit `a57d8c217a` ("net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports"), the port setup and teardown procedure became asymmetric. The fact of the matter is that user ports need the shared ports to be up before they can be used for CPU-initiated termination. And since we register net devices for the user ports, those won't be functional until we also call the setup for the shared (CPU, DSA) ports. But we may do that later, depending on the port numbering scheme of the hardware we are dealing with. It just makes sense that all shared ports are brought up before any user port is. I can't pinpoint any issue due to the current behavior, but let's change it nonetheless, for consistency's sake. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-06 11:59:10 +00:00
Vladimir Oltean	c146f9bc19	net: dsa: hold rtnl_mutex when calling dsa_master_{setup,teardown} DSA needs to simulate master tracking events when a binding is first with a DSA master established and torn down, in order to give drivers the simplifying guarantee that ->master_state_change calls are made only when the master's readiness state to pass traffic changes. master_state_change() provide a operational bool that DSA driver can use to understand if DSA master is operational or not. To avoid races, we need to block the reception of NETDEV_UP/NETDEV_CHANGE/NETDEV_GOING_DOWN events in the netdev notifier chain while we are changing the master's dev->dsa_ptr (this changes what netdev_uses_dsa(dev) reports). The dsa_master_setup() and dsa_master_teardown() functions optionally require the rtnl_mutex to be held, if the tagger needs the master to be promiscuous, these functions call dev_set_promiscuity(). Move the rtnl_lock() from that function and make it top-level. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-06 11:59:10 +00:00
Vladimir Oltean	a1ff94c297	net: dsa: stop updating master MTU from master.c At present there are two paths for changing the MTU of the DSA master. The first is: dsa_tree_setup -> dsa_tree_setup_ports -> dsa_port_setup -> dsa_slave_create -> dsa_slave_change_mtu -> dev_set_mtu(master) The second is: dsa_tree_setup -> dsa_tree_setup_master -> dsa_master_setup -> dev_set_mtu(dev) So the dev_set_mtu() call from dsa_master_setup() has been effectively superseded by the dsa_slave_change_mtu(slave_dev, ETH_DATA_LEN) that is done from dsa_slave_create() for each user port. The later function also updates the master MTU according to the largest user port MTU from the tree. Therefore, updating the master MTU through a separate code path isn't needed. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-06 11:59:09 +00:00
Vladimir Oltean	e31dbd3b6a	net: dsa: merge rtnl_lock sections in dsa_slave_create Currently dsa_slave_create() has two sequences of rtnl_lock/rtnl_unlock in a row. Remove the rtnl_unlock() and rtnl_lock() in between, such that the operation can execute slighly faster. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-06 11:59:09 +00:00
Vladimir Oltean	904e112ad4	net: dsa: reorder PHY initialization with MTU setup in slave.c In dsa_slave_create() there are 2 sections that take rtnl_lock(): MTU change and netdev registration. They are separated by PHY initialization. There isn't any strict ordering requirement except for the fact that netdev registration should be last. Therefore, we can perform the MTU change a bit later, after the PHY setup. A future change will then be able to merge the two rtnl_lock sections into one. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-06 11:59:09 +00:00
Vladimir Oltean	a68dc7b938	net: dsa: remove cross-chip support for HSR The cross-chip notifiers for HSR are bypass operations, meaning that even though all switches in a tree are notified, only the switch specified in the info structure is targeted. We can eliminate the unnecessary complexity by deleting the cross-chip notifier logic and calling the ds->ops straight from port.c. Cc: George McCollister <george.mccollister@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: George McCollister <george.mccollister@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-05 15:04:51 +00:00
Vladimir Oltean	cad69019f2	net: dsa: remove cross-chip support for MRP The cross-chip notifiers for MRP are bypass operations, meaning that even though all switches in a tree are notified, only the switch specified in the info structure is targeted. We can eliminate the unnecessary complexity by deleting the cross-chip notifier logic and calling the ds->ops straight from port.c. Cc: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-05 15:04:50 +00:00
Vladimir Oltean	ff91e1b684	net: dsa: fix incorrect function pointer check for MRP ring roles The cross-chip notifier boilerplate code meant to check the presence of ds->ops->port_mrp_add_ring_role before calling it, but checked ds->ops->port_mrp_add instead, before calling ds->ops->port_mrp_add_ring_role. Therefore, a driver which implements one operation but not the other would trigger a NULL pointer dereference. There isn't any such driver in DSA yet, so there is no reason to backport the change. Issue found through code inspection. Cc: Horatiu Vultur <horatiu.vultur@microchip.com> Fixes: `c595c4330d` ("net: dsa: add MRP support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-05 15:04:50 +00:00
Vladimir Oltean	258030acc9	net: dsa: make dsa_switch :: num_ports an unsigned int Currently, num_ports is declared as size_t, which is defined as __kernel_ulong_t, therefore it occupies 8 bytes of memory. Even switches with port numbers in the range of tens are exotic, so there is no need for this amount of storage. Additionally, because the max_num_bridges member right above it is also 4 bytes, it means the compiler needs to add padding between the last 2 fields. By reducing the size, we don't need that padding and can reduce the struct size. Before: pahole -C dsa_switch net/dsa/slave.o struct dsa_switch { struct device * dev; /* 0 8 / struct dsa_switch_tree dst; /* 8 8 / unsigned int index; / 16 4 / u32 setup:1; / 20: 0 4 / u32 vlan_filtering_is_global:1; / 20: 1 4 / u32 needs_standalone_vlan_filtering:1; / 20: 2 4 / u32 configure_vlan_while_not_filtering:1; / 20: 3 4 / u32 untag_bridge_pvid:1; / 20: 4 4 / u32 assisted_learning_on_cpu_port:1; / 20: 5 4 / u32 vlan_filtering:1; / 20: 6 4 / u32 pcs_poll:1; / 20: 7 4 / u32 mtu_enforcement_ingress:1; / 20: 8 4 / / XXX 23 bits hole, try to pack / struct notifier_block nb; / 24 24 / / XXX last struct has 4 bytes of padding / void priv; /* 48 8 / void tagger_data; /* 56 8 / / --- cacheline 1 boundary (64 bytes) --- / struct dsa_chip_data cd; /* 64 8 / const struct dsa_switch_ops ops; /* 72 8 / u32 phys_mii_mask; / 80 4 / / XXX 4 bytes hole, try to pack / struct mii_bus slave_mii_bus; /* 88 8 / unsigned int ageing_time_min; / 96 4 / unsigned int ageing_time_max; / 100 4 / struct dsa_8021q_context tag_8021q_ctx; /* 104 8 / struct devlink devlink; /* 112 8 / unsigned int num_tx_queues; / 120 4 / unsigned int num_lag_ids; / 124 4 / / --- cacheline 2 boundary (128 bytes) --- / unsigned int max_num_bridges; / 128 4 / / XXX 4 bytes hole, try to pack / size_t num_ports; / 136 8 / / size: 144, cachelines: 3, members: 27 / / sum members: 132, holes: 2, sum holes: 8 / / sum bitfield members: 9 bits, bit holes: 1, sum bit holes: 23 bits / / paddings: 1, sum paddings: 4 / / last cacheline: 16 bytes / }; After: pahole -C dsa_switch net/dsa/slave.o struct dsa_switch { struct device dev; /* 0 8 / struct dsa_switch_tree dst; /* 8 8 / unsigned int index; / 16 4 / u32 setup:1; / 20: 0 4 / u32 vlan_filtering_is_global:1; / 20: 1 4 / u32 needs_standalone_vlan_filtering:1; / 20: 2 4 / u32 configure_vlan_while_not_filtering:1; / 20: 3 4 / u32 untag_bridge_pvid:1; / 20: 4 4 / u32 assisted_learning_on_cpu_port:1; / 20: 5 4 / u32 vlan_filtering:1; / 20: 6 4 / u32 pcs_poll:1; / 20: 7 4 / u32 mtu_enforcement_ingress:1; / 20: 8 4 / / XXX 23 bits hole, try to pack / struct notifier_block nb; / 24 24 / / XXX last struct has 4 bytes of padding / void priv; /* 48 8 / void tagger_data; /* 56 8 / / --- cacheline 1 boundary (64 bytes) --- / struct dsa_chip_data cd; /* 64 8 / const struct dsa_switch_ops ops; /* 72 8 / u32 phys_mii_mask; / 80 4 / / XXX 4 bytes hole, try to pack / struct mii_bus slave_mii_bus; /* 88 8 / unsigned int ageing_time_min; / 96 4 / unsigned int ageing_time_max; / 100 4 / struct dsa_8021q_context tag_8021q_ctx; /* 104 8 / struct devlink devlink; /* 112 8 / unsigned int num_tx_queues; / 120 4 / unsigned int num_lag_ids; / 124 4 / / --- cacheline 2 boundary (128 bytes) --- / unsigned int max_num_bridges; / 128 4 / unsigned int num_ports; / 132 4 / / size: 136, cachelines: 3, members: 27 / / sum members: 128, holes: 1, sum holes: 4 / / sum bitfield members: 9 bits, bit holes: 1, sum bit holes: 23 bits / / paddings: 1, sum paddings: 4 / / last cacheline: 8 bytes */ }; Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-05 14:46:23 +00:00
David S. Miller	e63a023489	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-12-30 The following pull-request contains BPF updates for your net-next tree. We've added 72 non-merge commits during the last 20 day(s) which contain a total of 223 files changed, 3510 insertions(+), 1591 deletions(-). The main changes are: 1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii. 2) Beautify and de-verbose verifier logs, from Christy. 3) Composable verifier types, from Hao. 4) bpf_strncmp helper, from Hou. 5) bpf.h header dependency cleanup, from Jakub. 6) get_func_[arg\|ret\|arg_cnt] helpers, from Jiri. 7) Sleepable local storage, from KP. 8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-31 14:35:40 +00:00
Jakub Kicinski	b6459415b3	net: Don't include filter.h from net/sock.h sock.h is pretty heavily used (5k objects rebuilt on x86 after it's touched). We can drop the include of filter.h from it and add a forward declaration of struct sk_filter instead. This decreases the number of rebuilt objects when bpf.h is touched from ~5k to ~1k. There's a lot of missing includes this was masking. Primarily in networking tho, this time. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org	2021-12-29 08:48:14 -08:00
Jakub Kicinski	8b3f913322	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net include/net/sock.h commit `8f905c0e73` ("inet: fully convert sk->sk_rx_dst to RCU rules") commit `43f51df417` ("net: move early demux fields close to sk_refcnt") https://lore.kernel.org/all/20211222141641.0caa0ab3@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-23 16:09:58 -08:00
Xiaoliang Yang	ae2778a647	net: dsa: tag_ocelot: use traffic class to map priority on injected header For Ocelot switches, the CPU injected frames have an injection header where it can specify the QoS class of the packet and the DSA tag, now it uses the SKB priority to set that. If a traffic class to priority mapping is configured on the netdevice (with mqprio for example ...), it won't be considered for CPU injected headers. This patch make the QoS class aligned to the priority to traffic class mapping if it exists. Fixes: `8dce89aa5f` ("net: dsa: ocelot: add tagger for Ocelot/Felix switches") Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: Marouen Ghodhbane <marouen.ghodhbane@nxp.com> Link: https://lore.kernel.org/r/20211223072211.33130-1-xiaoliang.yang_1@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-23 09:44:59 -08:00
Vladimir Oltean	7f2973149c	net: dsa: make tagging protocols connect to individual switches from a tree On the NXP Bluebox 3 board which uses a multi-switch setup with sja1105, the mechanism through which the tagger connects to the switch tree is broken, due to improper DSA code design. At the time when tag_ops->connect() is called in dsa_port_parse_cpu(), DSA hasn't finished "touching" all the ports, so it doesn't know how large the tree is and how many ports it has. It has just seen the first CPU port by this time. As a result, this function will call the tagger's ->connect method too early, and the tagger will connect only to the first switch from the tree. This could be perhaps addressed a bit more simply by just moving the tag_ops->connect(dst) call a bit later (for example in dsa_tree_setup), but there is already a design inconsistency at present: on the switch side, the notification is on a per-switch basis, but on the tagger side, it is on a per-tree basis. Furthermore, the persistent storage itself is per switch (ds->tagger_data). And the tagger connect and disconnect procedures (at least the ones that exist currently) could see a fair bit of simplification if they didn't have to iterate through the switches of a tree. To fix the issue, this change transforms tag_ops->connect(dst) into tag_ops->connect(ds) and moves it somewhere where we already iterate over all switches of a tree. That is in dsa_switch_setup_tag_protocol(), which is a good placement because we already have there the connection call to the switch side of things. As for the dsa_tree_bind_tag_proto() method (called from the code path that changes the tag protocol), things are a bit more complicated because we receive the tree as argument, yet when we unwind on errors, it would be nice to not call tag_ops->disconnect(ds) where we didn't previously call tag_ops->connect(ds). We didn't have this problem before because the tag_ops connection operations passed the entire dst before, and this is more fine grained now. To solve the error rewind case using the new API, we have to create yet one more cross-chip notifier for disconnection, and stay connected with the old tag protocol to all the switches in the tree until we've succeeded to connect with the new one as well. So if something fails half way, the whole tree is still connected to the old tagger. But there may still be leaks if the tagger fails to connect to the 2nd out of 3 switches in a tree: somebody needs to tell the tagger to disconnect from the first switch. Nothing comes for free, and this was previously handled privately by the tagging protocol driver before, but now we need to emit a disconnect cross-chip notifier for that, because DSA has to take care of the unwind path. We assume that the tagging protocol has connected to a switch if it has set ds->tagger_data to something, otherwise we avoid calling its disconnection method in the error rewind path. The rest of the changes are in the tagging protocol drivers, and have to do with the replacement of dst with ds. The iteration is removed and the error unwind path is simplified, as mentioned above. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-14 12:45:16 +00:00
Vladimir Oltean	e2f01bfe14	net: dsa: tag_sja1105: fix zeroization of ds->priv on tag proto disconnect The method was meant to zeroize ds->tagger_data but got the wrong pointer. Fix this. Fixes: `c79e84866d` ("net: dsa: tag_sja1105: convert to tagger-owned data") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-14 12:45:15 +00:00
Vladimir Oltean	950a419d9d	net: dsa: tag_sja1105: split sja1105_tagger_data into private and public sections The sja1105 driver messes with the tagging protocol's state when PTP RX timestamping is enabled/disabled. This is fundamentally necessary because the tagger needs to know what to do when it receives a PTP packet. If RX timestamping is enabled, then a metadata follow-up frame is expected, and this holds the (partial) timestamp. So the tagger plays hide-and-seek with the network stack until it also gets the metadata frame, and then presents a single packet, the timestamped PTP packet. But when RX timestamping isn't enabled, there is no metadata frame expected, so the hide-and-seek game must be turned off and the packet must be delivered right away to the network stack. Considering this, we create a pseudo isolation by devising two tagger methods callable by the switch: one to get the RX timestamping state, and one to set it. Since we can't export symbols between the tagger and the switch driver, these methods are exposed through function pointers. After this change, the public portion of the sja1105_tagger_data contains only function pointers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:34 +00:00
Vladimir Oltean	fcbf979a5b	Revert "net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver" This reverts commit `6d709cadfd`. The above change was done to avoid calling symbols exported by the switch driver from the tagging protocol driver. With the tagger-owned storage model, we have a new option on our hands, and that is for the switch driver to provide a data consumer handler in the form of a function pointer inside the ->connect_tag_protocol() method. Having a function pointer avoids the problems of the exported symbols approach. By creating a handler for metadata frames holding TX timestamps on SJA1110, we are able to eliminate an skb queue from the tagger data, and replace it with a simple, and stateless, function pointer. This skb queue is now handled exclusively by sja1105_ptp.c, which makes the code easier to follow, as it used to be before the reverted patch. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:34 +00:00
Vladimir Oltean	c79e84866d	net: dsa: tag_sja1105: convert to tagger-owned data Currently, struct sja1105_tagger_data is a part of struct sja1105_private, and is used by the sja1105 driver to populate dp->priv. With the movement towards tagger-owned storage, the sja1105 driver should not be the owner of this memory. This change implements the connection between the sja1105 switch driver and its tagging protocol, which means that sja1105_tagger_data no longer stays in dp->priv but in ds->tagger_data, and that the sja1105 driver now only populates the sja1105_port_deferred_xmit callback pointer. The kthread worker is now the responsibility of the tagger. The sja1105 driver also alters the tagger's state some more, especially with regard to the PTP RX timestamping state. This will be fixed up a bit in further changes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:33 +00:00
Vladimir Oltean	bfcf142522	net: dsa: sja1105: make dp->priv point directly to sja1105_tagger_data The design of the sja1105 tagger dp->priv is that each port has a separate struct sja1105_port, and the sp->data pointer points to a common struct sja1105_tagger_data. We have removed all per-port members accessible by the tagger, and now only struct sja1105_tagger_data remains. Make dp->priv point directly to this. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:33 +00:00
Vladimir Oltean	d38049bbe7	net: dsa: sja1105: bring deferred xmit implementation in line with ocelot-8021q When the ocelot-8021q driver was converted to deferred xmit as part of commit `8d5f7954b7` ("net: dsa: felix: break at first CPU port during init and teardown"), the deferred implementation was deliberately made subtly different from what sja1105 has. The implementation differences lied on the following observations: - There might be a race between these two lines in tag_sja1105.c: skb_queue_tail(&sp->xmit_queue, skb_get(skb)); kthread_queue_work(sp->xmit_worker, &sp->xmit_work); and the skb dequeue logic in sja1105_port_deferred_xmit(). For example, the xmit_work might be already queued, however the work item has just finished walking through the skb queue. Because we don't check the return code from kthread_queue_work, we don't do anything if the work item is already queued. However, nobody will take that skb and send it, at least until the next timestampable skb is sent. This creates additional (and avoidable) TX timestamping latency. To close that race, what the ocelot-8021q driver does is it doesn't keep a single work item per port, and a skb timestamping queue, but rather dynamically allocates a work item per packet. - It is also unnecessary to have more than one kthread that does the work. So delete the per-port kthread allocations and replace them with a single kthread which is global to the switch. This change brings the two implementations in line by applying those observations to the sja1105 driver as well. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:33 +00:00
Vladimir Oltean	35d9768021	net: dsa: tag_ocelot: convert to tagger-owned data The felix driver makes very light use of dp->priv, and the tagger is effectively stateless. dp->priv is practically only needed to set up a callback to perform deferred xmit of PTP and STP packets using the ocelot-8021q tagging protocol (the main ocelot tagging protocol makes no use of dp->priv, although this driver sets up dp->priv irrespective of actual tagging protocol in use). struct felix_port (what used to be pointed to by dp->priv) is removed and replaced with a two-sided structure. The public side of this structure, visible to the switch driver, is ocelot_8021q_tagger_data. The private side is ocelot_8021q_tagger_private, and the latter structure physically encapsulates the former. The public half of the tagger data structure can be accessed through a helper of the same name (ocelot_8021q_tagger_data) which also sanity-checks the protocol currently in use by the switch. The public/private split was requested by Andrew Lunn. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:33 +00:00
Vladimir Oltean	dc452a471d	net: dsa: introduce tagger-owned storage for private and shared data Ansuel is working on register access over Ethernet for the qca8k switch family. This requires the qca8k tagging protocol driver to receive frames which aren't intended for the network stack, but instead for the qca8k switch driver itself. The dp->priv is currently the prevailing method for passing data back and forth between the tagging protocol driver and the switch driver. However, this method is riddled with caveats. The DSA design allows in principle for any switch driver to return any protocol it desires in ->get_tag_protocol(). The dsa_loop driver can be modified to do just that. But in the current design, the memory behind dp->priv has to be allocated by the switch driver, so if the tagging protocol is paired to an unexpected switch driver, we may end up in NULL pointer dereferences inside the kernel, or worse (a switch driver may allocate dp->priv according to the expectations of a different tagger). The latter possibility is even more plausible considering that DSA switches can dynamically change tagging protocols in certain cases (dsa <-> edsa, ocelot <-> ocelot-8021q), and the current design lends itself to mistakes that are all too easy to make. This patch proposes that the tagging protocol driver should manage its own memory, instead of relying on the switch driver to do so. After analyzing the different in-tree needs, it can be observed that the required tagger storage is per switch, therefore a ds->tagger_data pointer is introduced. In principle, per-port storage could also be introduced, although there is no need for it at the moment. Future changes will replace the current usage of dp->priv with ds->tagger_data. We define a "binding" event between the DSA switch tree and the tagging protocol. During this binding event, the tagging protocol's ->connect() method is called first, and this may allocate some memory for each switch of the tree. Then a cross-chip notifier is emitted for the switches within that tree, and they are given the opportunity to fix up the tagger's memory (for example, they might set up some function pointers that represent virtual methods for consuming packets). Because the memory is owned by the tagger, there exists a ->disconnect() method for the tagger (which is the place to free the resources), but there doesn't exist a ->disconnect() method for the switch driver. This is part of the design. The switch driver should make minimal use of the public part of the tagger data, and only after type-checking it using the supplied "proto" argument. In the code there are in fact two binding events, one is the initial event in dsa_switch_setup_tag_protocol(). At this stage, the cross chip notifier chains aren't initialized, so we call each switch's connect() method by hand. Then there is dsa_tree_bind_tag_proto() during dsa_tree_change_tag_proto(), and here we have an old protocol and a new one. We first connect to the new one before disconnecting from the old one, to simplify error handling a bit and to ensure we remain in a valid state at all times. Co-developed-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:33 +00:00
Russell King (Oracle)	0a9f0794d9	net: dsa: mark DSA phylink as legacy_pre_march2020 The majority of DSA drivers do not make use of the PCS support, and thus operate in legacy mode. In order to preserve this behaviour in future, we need to set the legacy_pre_march2020 flag so phylink knows this may require the legacy calls. There are some DSA drivers that do make use of PCS support, and these will continue operating as before - legacy_pre_march2020 will not prevent split-PCS support enabling the newer phylink behaviour. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-09 11:21:03 -08:00
Vladimir Oltean	857fdd74fb	net: dsa: eliminate dsa_switch_ops :: port_bridge_tx_fwd_{,un}offload We don't really need new switch API for these, and with new switches which intend to add support for this feature, it will become cumbersome to maintain. The change consists in restructuring the two drivers that implement this offload (sja1105 and mv88e6xxx) such that the offload is enabled and disabled from the ->port_bridge_{join,leave} methods instead of the old ->port_bridge_tx_fwd_{,un}offload. The only non-trivial change is that mv88e6xxx_map_virtual_bridge_to_pvt() has been moved to avoid a forward declaration, and the mv88e6xxx_reg_lock() calls from inside it have been removed, since locking is now done from mv88e6xxx_port_bridge_{join,leave}. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-08 14:31:16 -08:00
Vladimir Oltean	b079922ba2	net: dsa: add a "tx_fwd_offload" argument to ->port_bridge_join This is a preparation patch for the removal of the DSA switch methods ->port_bridge_tx_fwd_offload() and ->port_bridge_tx_fwd_unoffload(). The plan is for the switch to report whether it offloads TX forwarding directly as a response to the ->port_bridge_join() method. This change deals with the noisy portion of converting all existing function prototypes to take this new boolean pointer argument. The bool is placed in the cross-chip notifier structure for bridge join, and a reference to it is provided to drivers. In the next change, DSA will then actually look at this value instead of calling ->port_bridge_tx_fwd_offload(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-08 14:31:16 -08:00
Vladimir Oltean	d3eed0e57d	net: dsa: keep the bridge_dev and bridge_num as part of the same structure The main desire behind this is to provide coherent bridge information to the fast path without locking. For example, right now we set dp->bridge_dev and dp->bridge_num from separate code paths, it is theoretically possible for a packet transmission to read these two port properties consecutively and find a bridge number which does not correspond with the bridge device. Another desire is to start passing more complex bridge information to dsa_switch_ops functions. For example, with FDB isolation, it is expected that drivers will need to be passed the bridge which requested an FDB/MDB entry to be offloaded, and along with that bridge_dev, the associated bridge_num should be passed too, in case the driver might want to implement an isolation scheme based on that number. We already pass the {bridge_dev, bridge_num} pair to the TX forwarding offload switch API, however we'd like to remove that and squash it into the basic bridge join/leave API. So that means we need to pass this pair to the bridge join/leave API. During dsa_port_bridge_leave, first we unset dp->bridge_dev, then we call the driver's .port_bridge_leave with what used to be our dp->bridge_dev, but provided as an argument. When bridge_dev and bridge_num get folded into a single structure, we need to preserve this behavior in dsa_port_bridge_leave: we need a copy of what used to be in dp->bridge. Switch drivers check bridge membership by comparing dp->bridge_dev with the provided bridge_dev, but now, if we provide the struct dsa_bridge as a pointer, they cannot keep comparing dp->bridge to the provided pointer, since this only points to an on-stack copy. To make this obvious and prevent driver writers from forgetting and doing stupid things, in this new API, the struct dsa_bridge is provided as a full structure (not very large, contains an int and a pointer) instead of a pointer. An explicit comparison function needs to be used to determine bridge membership: dsa_port_offloads_bridge(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-08 14:31:16 -08:00
Vladimir Oltean	6a43cba303	net: dsa: export bridging offload helpers to drivers Move the static inline helpers from net/dsa/dsa_priv.h to include/net/dsa.h, so that drivers can call functions such as dsa_port_offloads_bridge_dev(), which will be necessary after the transition to a more complex bridge structure. More functions than are needed right now are being moved, but this is done for uniformity. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-08 14:31:16 -08:00
Vladimir Oltean	936db8a2db	net: dsa: rename dsa_port_offloads_bridge to dsa_port_offloads_bridge_dev Currently the majority of dsa_port_bridge_dev_get() calls in drivers is just to check whether a port is under the bridge device provided as argument by the DSA API. We'd like to change that DSA API so that a more complex structure is provided as argument. To keep things more generic, and considering that the new complex structure will be provided by value and not by reference, direct comparisons between dp->bridge and the provided bridge will be broken. The generic way to do the checking would simply be to do something like dsa_port_offloads_bridge(dp, &bridge). But there's a problem, we already have a function named that way, which actually takes a bridge_dev net_device as argument. Rename it so that we can use dsa_port_offloads_bridge for something else. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-08 14:31:15 -08:00
Vladimir Oltean	36cbf39b56	net: dsa: hide dp->bridge_dev and dp->bridge_num in the core behind helpers The location of the bridge device pointer and number is going to change. It is not going to be kept individually per port, but in a common structure allocated dynamically and which will have lockdep validation. Create helpers to access these elements so that we have a migration path to the new organization. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-08 14:31:15 -08:00
Vladimir Oltean	947c8746e2	net: dsa: assign a bridge number even without TX forwarding offload The service where DSA assigns a unique bridge number for each forwarding domain is useful even for drivers which do not implement the TX forwarding offload feature. For example, drivers might use the dp->bridge_num for FDB isolation. So rename ds->num_fwd_offloading_bridges to ds->max_num_bridges, and calculate a unique bridge_num for all drivers that set this value. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-08 14:31:14 -08:00
Vladimir Oltean	3f9bb0301d	net: dsa: make dp->bridge_num one-based I have seen too many bugs already due to the fact that we must encode an invalid dp->bridge_num as a negative value, because the natural tendency is to check that invalid value using (!dp->bridge_num). Latest example can be seen in commit `1bec0f0506` ("net: dsa: fix bridge_num not getting cleared after ports leaving the bridge"). Convert the existing users to assume that dp->bridge_num == 0 is the encoding for invalid, and valid bridge numbers start from 1. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-08 14:31:14 -08:00
Russell King (Oracle)	5938bce4b6	net: dsa: support use of phylink_generic_validate() Support the use of phylink_generic_validate() when there is no phylink_validate method given in the DSA switch operations and mac_capabilities have been set in the phylink_config structure by the DSA switch driver. This gives DSA switch drivers the option to use this if they provide the supported_interfaces and mac_capabilities, while still giving them an option to override the default implementation if necessary. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-01 18:58:00 -08:00
Russell King (Oracle)	072eea6c22	net: dsa: replace phylink_get_interfaces() with phylink_get_caps() Phylink needs slightly more information than phylink_get_interfaces() allows us to get from the DSA drivers - we need the MAC capabilities. Replace the phylink_get_interfaces() method with phylink_get_caps() to allow DSA drivers to fill in the phylink_config MAC capabilities field as well. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-01 18:58:00 -08:00
Russell King (Oracle)	21bd64bd71	net: dsa: consolidate phylink creation The code in port.c and slave.c creating the phylink instance is very similar - let's consolidate this into a single function. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-01 18:58:00 -08:00
Leon Romanovsky	4c897cfc46	devlink: Simplify devlink resources unregister call The devlink_resources_unregister() used second parameter as an entry point for the recursive removal of devlink resources. None of the callers outside of devlink core needed to use this field, so let's remove it. As part of this removal, the "struct devlink_resource" was moved from .h to .c file as it is not possible to use in any place in the code except devlink.c. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-30 12:23:32 +00:00
Vladimir Oltean	92f62485b3	net: dsa: felix: fix broken VLAN-tagged PTP under VLAN-aware bridge Normally it is expected that the dsa_device_ops :: rcv() method finishes parsing the DSA tag and consumes it, then never looks at it again. But commit `c0bcf53766` ("net: dsa: ocelot: add hardware timestamping support for Felix") added support for RX timestamping in a very unconventional way. On this switch, a partial timestamp is available in the DSA header, but the driver got away with not parsing that timestamp right away, but instead delayed that parsing for a little longer: dsa_switch_rcv(): nskb = cpu_dp->rcv(skb, dev); <------------- not here -> ocelot_rcv() ... skb = nskb; skb_push(skb, ETH_HLEN); skb->pkt_type = PACKET_HOST; skb->protocol = eth_type_trans(skb, skb->dev); ... if (dsa_skb_defer_rx_timestamp(p, skb)) <--- but here -> felix_rxtstamp() return 0; When in felix_rxtstamp(), this driver accounted for the fact that eth_type_trans() happened in the meanwhile, so it got a hold of the extraction header again by subtracting (ETH_HLEN + OCELOT_TAG_LEN) bytes from the current skb->data. This worked for quite some time but was quite fragile from the very beginning. Not to mention that having DSA tag parsing split in two different files, under different folders (net/dsa/tag_ocelot.c vs drivers/net/dsa/ocelot/felix.c) made it quite non-obvious for patches to come that they might break this. Finally, the blamed commit does the following: at the end of ocelot_rcv(), it checks whether the skb payload contains a VLAN header. If it does, and this port is under a VLAN-aware bridge, that VLAN ID might not be correct in the sense that the packet might have suffered VLAN rewriting due to TCAM rules (VCAP IS1). So we consume the VLAN ID from the skb payload using __skb_vlan_pop(), and take the classified VLAN ID from the DSA tag, and construct a hwaccel VLAN tag with the classified VLAN, and the skb payload is VLAN-untagged. The big problem is that __skb_vlan_pop() does: memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN); __skb_pull(skb, VLAN_HLEN); aka it moves the Ethernet header 4 bytes to the right, and pulls 4 bytes from the skb headroom (effectively also moving skb->data, by definition). So for felix_rxtstamp()'s fragile logic, all bets are off now. Instead of having the "extraction" pointer point to the DSA header, it actually points to 4 bytes _inside_ the extraction header. Corollary, the last 4 bytes of the "extraction" header are in fact 4 stale bytes of the destination MAC address from the Ethernet header, from prior to the __skb_vlan_pop() movement. So of course, RX timestamps are completely bogus when the system is configured in this way. The fix is actually very simple: just don't structure the code like that. For better or worse, the DSA PTP timestamping API does not offer a straightforward way for drivers to present their RX timestamps, but other drivers (sja1105) have established a simple mechanism to carry their RX timestamp from dsa_device_ops :: rcv() all the way to dsa_switch_ops :: port_rxtstamp() and even later. That mechanism is to simply save the partial timestamp to the skb->cb, and complete it later. Question: why don't we simply populate the skb's struct skb_shared_hwtstamps from ocelot_rcv(), and bother with this complication of propagating the timestamp to felix_rxtstamp()? Answer: dsa_switch_ops :: port_rxtstamp() answers the question whether PTP packets need sleepable context to retrieve the full RX timestamp. Currently felix_rxtstamp() answers "no, thanks" to that question, and calls ocelot_ptp_gettime64() from softirq atomic context. This is understandable, since Felix VSC9959 is a PCIe memory-mapped switch, so hardware access does not require sleeping. But the felix driver is preparing for the introduction of other switches where hardware access is over a slow bus like SPI or MDIO: https://lore.kernel.org/lkml/20210814025003.2449143-1-colin.foster@in-advantage.com/ So I would like to keep this code structure, so the rework needed when that driver will need PTP support will be minimal (answer "yes, I need deferred context for this skb's RX timestamp", then the partial timestamp will still be found in the skb->cb. Fixes: `ea440cd2d9` ("net: dsa: tag_ocelot: use VLAN information from tagging header when available") Reported-by: Po Liu <po.liu@nxp.com> Cc: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-03 14:22:00 +00:00
Marek Behún	c07c6e8eb4	net: dsa: populate supported_interfaces member Add a new DSA switch operation, phylink_get_interfaces, which should fill in which PHY_INTERFACE_MODE_* are supported by given port. Use this before phylink_create() to fill phylinks supported_interfaces member, allowing phylink to determine which PHY_INTERFACE_MODEs are supported. Signed-off-by: Marek Behún <kabel@kernel.org> [tweaked patch and description to add more complete support -- rmk] Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:06:32 +00:00
Vladimir Oltean	716a30a97a	net: switchdev: merge switchdev_handle_fdb_{add,del}_to_device To reduce code churn, the same patch makes multiple changes, since they all touch the same lines: 1. The implementations for these two are identical, just with different function pointers. Reduce duplications and name the function pointers "mod_cb" instead of "add_cb" and "del_cb". Pass the event as argument. 2. Drop the "const" attribute from "orig_dev". If the driver needs to check whether orig_dev belongs to itself and then call_switchdev_notifiers(orig_dev, SWITCHDEV_FDB_OFFLOADED), it can't, because call_switchdev_notifiers takes a non-const struct net_device *. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-27 14:54:02 +01:00
Vladimir Oltean	425d19cede	net: dsa: stop calling dev_hold in dsa_slave_fdb_event Now that we guarantee that SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE events have finished executing by the time we leave our bridge upper interface, we've established a stronger boundary condition for how long the dsa_slave_switchdev_event_work() might run. As such, it is no longer possible for DSA slave interfaces to become unregistered, since they are still bridge ports. So delete the unnecessary dev_hold() and dev_put(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-26 15:07:35 +01:00
Vladimir Oltean	d7d0d423db	net: dsa: flush switchdev workqueue when leaving the bridge DSA is preparing to offer switch drivers an API through which they can associate each FDB entry with a struct net_device *bridge_dev. This can be used to perform FDB isolation (the FDB lookup performed on the ingress of a standalone, or bridged port, should not find an FDB entry that is present in the FDB of another bridge). In preparation of that work, DSA needs to ensure that by the time we call the switch .port_fdb_add and .port_fdb_del methods, the dp->bridge_dev pointer is still valid, i.e. the port is still a bridge port. This is not guaranteed because the SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE API requires drivers that must have sleepable context to handle those events to schedule the deferred work themselves. DSA does this through the dsa_owq. It can happen that a port leaves a bridge, del_nbp() flushes the FDB on that port, SWITCHDEV_FDB_DEL_TO_DEVICE is notified in atomic context, DSA schedules its deferred work, but del_nbp() finishes unlinking the bridge as a master from the port before DSA's deferred work is run. Fundamentally, the port must not be unlinked from the bridge until all FDB deletion deferred work items have been flushed. The bridge must wait for the completion of these hardware accesses. An attempt has been made to address this issue centrally in switchdev by making SWITCHDEV_FDB_DEL_TO_DEVICE deferred (=> blocking) at the switchdev level, which would offer implicit synchronization with del_nbp: https://patchwork.kernel.org/project/netdevbpf/cover/20210820115746.3701811-1-vladimir.oltean@nxp.com/ but it seems that any attempt to modify switchdev's behavior and make the events blocking there would introduce undesirable side effects in other switchdev consumers. The most undesirable behavior seems to be that switchdev_deferred_process_work() takes the rtnl_mutex itself, which would be worse off than having the rtnl_mutex taken individually from drivers which is what we have now (except DSA which has removed that lock since commit `0faf890fc5` ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work")). So to offer the needed guarantee to DSA switch drivers, I have come up with a compromise solution that does not require switchdev rework: we already have a hook at the last moment in time when the bridge is still an upper of ours: the NETDEV_PRECHANGEUPPER handler. We can flush the dsa_owq manually from there, which makes all FDB deletions synchronous. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-26 15:07:35 +01:00
Vladimir Oltean	0faf890fc5	net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work After talking with Ido Schimmel, it became clear that rtnl_lock is not actually required for anything that is done inside the SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE deferred work handlers. The reason why it was probably added by Arkadi Sharshevsky in commit `c9eb3e0f87` ("net: dsa: Add support for learning FDB through notification") was to offer the same locking/serialization guarantees as .ndo_fdb_{add,del} and avoid reworking any drivers. DSA has implemented .ndo_fdb_add and .ndo_fdb_del until commit `b117e1e8a8` ("net: dsa: delete dsa_legacy_fdb_add and dsa_legacy_fdb_del") - that is to say, until fairly recently. But those methods have been deleted, so now we are free to drop the rtnl_lock as well. Note that exposing DSA switch drivers to an unlocked method which was previously serialized by the rtnl_mutex is a potentially dangerous affair. Driver writers couldn't ensure that their internal locking scheme does the right thing even if they wanted. We could err on the side of paranoia and introduce a switch-wide lock inside the DSA framework, but that seems way overreaching. Instead, we could check as many drivers for regressions as we can, fix those first, then let this change go in once it is assumed to be fairly safe. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-25 12:59:42 +01:00
Vladimir Oltean	338a3a4745	net: dsa: introduce locking for the address lists on CPU and DSA ports Now that the rtnl_mutex is going away for dsa_port_{host_,}fdb_{add,del}, no one is serializing access to the address lists that DSA keeps for the purpose of reference counting on shared ports (CPU and cascade ports). It can happen for one dsa_switch_do_fdb_del to do list_del on a dp->fdbs element while another dsa_switch_do_fdb_{add,del} is traversing dp->fdbs. We need to avoid that. Currently dp->mdbs is not at risk, because dsa_switch_do_mdb_{add,del} still runs under the rtnl_mutex. But it would be nice if it would not depend on that being the case. So let's introduce a mutex per port (the address lists are per port too) and share it between dp->mdbs and dp->fdbs. The place where we put the locking is interesting. It could be tempting to put a DSA-level lock which still serializes calls to .port_fdb_{add,del}, but it would still not avoid concurrency with other driver code paths that are currently under rtnl_mutex (.port_fdb_dump, .port_fast_age). So it would add a very false sense of security (and adding a global switch-wide lock in DSA to resynchronize with the rtnl_lock is also counterproductive and hard). So the locking is intentionally done only where the dp->fdbs and dp->mdbs lists are traversed. That means, from a driver perspective, that .port_fdb_add will be called with the dp->addr_lists_lock mutex held on the CPU port, but not held on user ports. This is done so that driver writers are not encouraged to rely on any guarantee offered by dp->addr_lists_lock. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-25 12:59:42 +01:00
Vladimir Oltean	232deb3f95	net: dsa: avoid refcount warnings when ->port_{fdb,mdb}_del returns error At present, when either of ds->ops->port_fdb_del() or ds->ops->port_mdb_del() return a non-zero error code, we attempt to save the day and keep the data structure associated with that switchdev object, as the deletion procedure did not complete. However, the way in which we do this is suspicious to the checker in lib/refcount.c, who thinks it is buggy to increment a refcount that became zero, and that this is indicative of a use-after-free. Fixes: `161ca59d39` ("net: dsa: reference count the MDB entries at the cross-chip notifier level") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-25 12:59:41 +01:00
David S. Miller	2d7e73f09f	Revert "Merge branch 'dsa-rtnl'" This reverts commit `965e6b262f`, reversing changes made to `4d98bb0d7e`.	2021-10-25 12:59:25 +01:00
Vladimir Oltean	5cdfde49a0	net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work After talking with Ido Schimmel, it became clear that rtnl_lock is not actually required for anything that is done inside the SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE deferred work handlers. The reason why it was probably added by Arkadi Sharshevsky in commit `c9eb3e0f87` ("net: dsa: Add support for learning FDB through notification") was to offer the same locking/serialization guarantees as .ndo_fdb_{add,del} and avoid reworking any drivers. DSA has implemented .ndo_fdb_add and .ndo_fdb_del until commit `b117e1e8a8` ("net: dsa: delete dsa_legacy_fdb_add and dsa_legacy_fdb_del") - that is to say, until fairly recently. But those methods have been deleted, so now we are free to drop the rtnl_lock as well. Note that exposing DSA switch drivers to an unlocked method which was previously serialized by the rtnl_mutex is a potentially dangerous affair. Driver writers couldn't ensure that their internal locking scheme does the right thing even if they wanted. We could err on the side of paranoia and introduce a switch-wide lock inside the DSA framework, but that seems way overreaching. Instead, we could check as many drivers for regressions as we can, fix those first, then let this change go in once it is assumed to be fairly safe. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-24 13:47:44 +01:00
Vladimir Oltean	d3bd892437	net: dsa: introduce locking for the address lists on CPU and DSA ports Now that the rtnl_mutex is going away for dsa_port_{host_,}fdb_{add,del}, no one is serializing access to the address lists that DSA keeps for the purpose of reference counting on shared ports (CPU and cascade ports). It can happen for one dsa_switch_do_fdb_del to do list_del on a dp->fdbs element while another dsa_switch_do_fdb_{add,del} is traversing dp->fdbs. We need to avoid that. Currently dp->mdbs is not at risk, because dsa_switch_do_mdb_{add,del} still runs under the rtnl_mutex. But it would be nice if it would not depend on that being the case. So let's introduce a mutex per port (the address lists are per port too) and share it between dp->mdbs and dp->fdbs. The place where we put the locking is interesting. It could be tempting to put a DSA-level lock which still serializes calls to .port_fdb_{add,del}, but it would still not avoid concurrency with other driver code paths that are currently under rtnl_mutex (.port_fdb_dump, .port_fast_age). So it would add a very false sense of security (and adding a global switch-wide lock in DSA to resynchronize with the rtnl_lock is also counterproductive and hard). So the locking is intentionally done only where the dp->fdbs and dp->mdbs lists are traversed. That means, from a driver perspective, that .port_fdb_add will be called with the dp->addr_lists_lock mutex held on the CPU port, but not held on user ports. This is done so that driver writers are not encouraged to rely on any guarantee offered by dp->addr_lists_lock. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-24 13:47:44 +01:00
David S. Miller	bdfa75ad70	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Lots of simnple overlapping additions. With a build fix from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-22 11:41:16 +01:00
Vladimir Oltean	992e5cc7be	net: dsa: tag_8021q: make dsa_8021q_{rx,tx}_vid take dp as argument Pass a single argument to dsa_8021q_rx_vid and dsa_8021q_tx_vid that contains the necessary information from the two arguments that are currently provided: the switch and the port number. Also rename those functions so that they have a dsa_port_* prefix, since they operate on a struct dsa_port *. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-21 12:44:07 +01:00
Vladimir Oltean	5068887a4f	net: dsa: tag_sja1105: do not open-code dsa_switch_for_each_port Find the remaining iterators over dst->ports that only filter for the ports belonging to a certain switch, and replace those with the dsa_switch_for_each_port helper that we have now. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-21 12:44:07 +01:00
Vladimir Oltean	fac6abd5f1	net: dsa: convert cross-chip notifiers to iterate using dp The majority of cross-chip switch notifiers need to filter in some way over the type of ports: some install VLANs etc on all cascade ports. The difference is that the matching function, which filters by port type, is separate from the function where the iteration happens. So this patch needs to refactor the matching functions' prototypes as well, to take the dp as argument. In a future patch/series, I might convert dsa_towards_port to return a struct dsa_port *dp too, but at the moment it is a bit entangled with dsa_routing_port which is also used by mv88e6xxx and they both return an int port. So keep dsa_towards_port the way it is and convert it into a dp using dsa_to_port. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-21 12:44:07 +01:00
Vladimir Oltean	57d77986e7	net: dsa: remove gratuitous use of dsa_is_{user,dsa,cpu}_port Find the occurrences of dsa_is_{user,dsa,cpu}_port where a struct dsa_port *dp was already available in the function scope, and replace them with the dsa_port_is_{user,dsa,cpu} equivalent function which uses that dp directly and does not perform another hidden dsa_to_port(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-21 12:44:07 +01:00
Vladimir Oltean	65c563a677	net: dsa: do not open-code dsa_switch_for_each_port Find the remaining iterators over dst->ports that only filter for the ports belonging to a certain switch, and replace those with the dsa_switch_for_each_port helper that we have now. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-21 12:44:06 +01:00
Vladimir Oltean	d0004a020b	net: dsa: remove the "dsa_to_port in a loop" antipattern from the core Ever since Vivien's conversion of the ds->ports array into a dst->ports list, and the introduction of dsa_to_port, iterations through the ports of a switch became quadratic whenever dsa_to_port was needed. dsa_to_port can either be called directly, or indirectly through the dsa_is_{user,cpu,dsa,unused}_port helpers. Use the newly introduced dsa_switch_for_each_port() iteration macro that works with the iterator variable being a struct dsa_port *dp directly, and not an int i. It is an expensive variable to go from i to dp, but cheap to go from dp to i. This macro iterates through the entire ds->dst->ports list and filters by the ports belonging just to the switch provided as argument. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-21 12:44:06 +01:00
Christophe JAILLET	ba69fd9101	net: dsa: Fix an error handling path in 'dsa_switch_parse_ports_of()' If we return before the end of the 'for_each_child_of_node()' iterator, the reference taken on 'port' must be released. Add the missing 'of_node_put()' calls. Fixes: `83c0afaec7` ("net: dsa: Add new binding implementation") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/15d5310d1d55ad51c1af80775865306d92432e03.1634587046.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-19 15:41:16 -07:00
Alvin Šipraga	1521d5adfc	net: dsa: tag_rtl8_4: add realtek 8 byte protocol 4 tag This commit implements a basic version of the 8 byte tag protocol used in the Realtek RTL8365MB-VC unmanaged switch, which carries with it a protocol version of 0x04. The implementation itself only handles the parsing of the EtherType value and Realtek protocol version, together with the source or destination port fields. The rest is left unimplemented for now. The tag format is described in a confidential document provided to my company by Realtek Semiconductor Corp. Permission has been granted by the vendor to publish this driver based on that material, together with an extract from the document describing the tag format and its fields. It is hoped that this will help future implementors who do not have access to the material but who wish to extend the functionality of drivers for chips which use this protocol. In addition, two possible values of the REASON field are specified, based on experiments on my end. Realtek does not specify what value this field can take. Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-18 14:02:56 +01:00
Alvin Šipraga	9cb8edda21	net: dsa: move NET_DSA_TAG_RTL4_A to right place in Kconfig/Makefile Move things around a little so that this tag driver is alphabetically ordered. The Kconfig file is sorted based on the tristate text. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-18 14:02:55 +01:00
Alvin Šipraga	487d3855b6	net: dsa: allow reporting of standard ethtool stats for slave devices Jakub pointed out that we have a new ethtool API for reporting device statistics in a standardized way, via .get_eth_{phy,mac,ctrl}_stats. Add a small amount of plumbing to allow DSA drivers to take advantage of this when exposing statistics. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-18 14:02:55 +01:00
Jakub Kicinski	e15f5972b8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net tools/testing/selftests/net/ioam6.sh `7b1700e009` ("selftests: net: modify IOAM tests for undef bits") `bf77b1400a` ("selftests: net: Test for the IOAM encapsulation with IPv6") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-14 16:50:14 -07:00
Vladimir Oltean	39e222bfd7	net: dsa: unregister cross-chip notifier after ds->ops->teardown To be symmetric with the error unwind path of dsa_switch_setup(), call dsa_switch_unregister_notifier() after ds->ops->teardown. The implication is that ds->ops->teardown cannot emit cross-chip notifiers. For example, currently the dsa_tag_8021q_unregister() call from sja1105_teardown() does not propagate to the entire tree due to this reason. However I cannot find an actual issue caused by this, observed using code inspection. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20211012123735.2545742-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-13 13:36:01 -07:00
Vladimir Oltean	43ba33b4f1	net: dsa: tag_ocelot_8021q: fix inability to inject STP BPDUs into BLOCKING ports When setting up a bridge with stp_state 1, topology changes are not detected and loops are not blocked. This is because the standard way of transmitting a packet, based on VLAN IDs redirected by VCAP IS2 to the right egress port, does not override the port STP state (in the case of Ocelot switches, that's really the PGID_SRC masks). To force a packet to be injected into a port that's BLOCKING, we must send it as a control packet, which means in the case of this tagger to send it using the manual register injection method. We already do this for PTP frames, extend the logic to apply to any link-local MAC DA. Fixes: `7c83a7c539` ("net: dsa: add a second tagger for Ocelot switches based on tag_8021q") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-12 17:35:19 -07:00
Vladimir Oltean	49f885b2d9	net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib Michael reported that when using the "ocelot-8021q" tagging protocol, the switch driver module must be manually loaded before the tagging protocol can be loaded/is available. This appears to be the same problem described here: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ where due to the fact that DSA tagging protocols make use of symbols exported by the switch drivers, circular dependencies appear and this breaks module autoloading. The ocelot_8021q driver needs the ocelot_can_inject() and ocelot_port_inject_frame() functions from the switch library. Previously the wrong approach was taken to solve that dependency: shims were provided for the case where the ocelot switch library was compiled out, but that turns out to be insufficient, because the dependency when the switch lib _is_ compiled is problematic too. We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as static inline functions, because these access I/O functions like __ocelot_write_ix() which is called by ocelot_write_rix(). Making those static inline basically means exposing the whole guts of the ocelot switch library, not ideal... We already have one tagging protocol driver which calls into the switch driver during xmit but not using any exported symbol: sja1105_defer_xmit. We can do the same thing here: create a kthread worker and one work item per skb, and let the switch driver itself do the register accesses to send the skb, and then consume it. Fixes: `0a6f17c6ae` ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Reported-by: Michael Walle <michael@walle.cc> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-12 17:35:18 -07:00
Vladimir Oltean	deab6b1cd9	net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driver As explained here: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ DSA tagging protocol drivers cannot depend on symbols exported by switch drivers, because this creates a circular dependency that breaks module autoloading. The tag_ocelot.c file depends on the ocelot_ptp_rew_op() function exported by the common ocelot switch lib. This function looks at OCELOT_SKB_CB(skb) and computes how to populate the REW_OP field of the DSA tag, for PTP timestamping (the command: one-step/two-step, and the TX timestamp identifier). None of that requires deep insight into the driver, it is quite stateless, as it only depends upon the skb->cb. So let's make it a static inline function and put it in include/linux/dsa/ocelot.h, a file that despite its name is used by the ocelot switch driver for populating the injection header too - since commit `40d3f295b5` ("net: mscc: ocelot: use common tag parsing code with DSA"). With that function declared as static inline, its body is expanded inside each call site, so the dependency is broken and the DSA tagger can be built without the switch library, upon which the felix driver depends. Fixes: `39e5308b32` ("net: mscc: ocelot: support PTP Sync one-step timestamping") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-12 17:35:18 -07:00
Vladimir Oltean	4ac0567e40	net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver It's nice to be able to test a tagging protocol with dsa_loop, but not at the cost of losing the ability of building the tagging protocol and switch driver as modules, because as things stand, there is a circular dependency between the two. Tagging protocol drivers cannot depend on switch drivers, that is a hard fact. The reasoning behind the blamed patch was that accessing dp->priv should first make sure that the structure behind that pointer is what we really think it is. Currently the "sja1105" and "sja1110" tagging protocols only operate with the sja1105 switch driver, just like any other tagging protocol and switch combination. The only way to mix and match them is by modifying the code, and this applies to dsa_loop as well (by default that uses DSA_TAG_PROTO_NONE). So while in principle there is an issue, in practice there isn't one. Until we extend dsa_loop to allow user space configuration, treat the problem as a non-issue and just say that DSA ports found by tag_sja1105 are always sja1105 ports, which is in fact true. But keep the dsa_port_is_sja1105 function so that it's easy to patch it during testing, and rely on dead code elimination. Fixes: `994d2cbb08` ("net: dsa: tag_sja1105: be dsa_loop-safe") Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-12 17:33:36 -07:00
Vladimir Oltean	28da0555c3	net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver The problem is that DSA tagging protocols really must not depend on the switch driver, because this creates a circular dependency at insmod time, and the switch driver will effectively not load when the tagging protocol driver is missing. The code was structured in the way it was for a reason, though. The DSA driver-facing API for PTP timestamping relies on the assumption that two-step TX timestamps are provided by the hardware in an out-of-band manner, typically by raising an interrupt and making that timestamp available inside some sort of FIFO which is to be accessed over SPI/MDIO/etc. So the API puts .port_txtstamp into dsa_switch_ops, because it is expected that the switch driver needs to save some state (like put the skb into a queue until its TX timestamp arrives). On SJA1110, TX timestamps are provided by the switch as Ethernet packets, so this makes them be received and processed by the tagging protocol driver. This in itself is great, because the timestamps are full 64-bit and do not require reconstruction, and since Ethernet is the fastest I/O method available to/from the switch, PTP timestamps arrive very quickly, no matter how bottlenecked the SPI connection is, because SPI interaction is not needed at all. DSA's code structure and strict isolation between the tagging protocol driver and the switch driver break the natural code organization. When the tagging protocol driver receives a packet which is classified as a metadata packet containing timestamps, it passes those timestamps one by one to the switch driver, which then proceeds to compare them based on the recorded timestamp ID that was generated in .port_txtstamp. The communication between the tagging protocol and the switch driver is done through a method exported by the switch driver, sja1110_process_meta_tstamp. To satisfy build requirements, we force a dependency to build the tagging protocol driver as a module when the switch driver is a module. However, as explained in the first paragraph, that causes the circular dependency. To solve this, move the skb queue from struct sja1105_private :: struct sja1105_ptp_data to struct sja1105_private :: struct sja1105_tagger_data. The latter is a data structure for which hacks have already been put into place to be able to create persistent storage per switch that is accessible from the tagging protocol driver (see sja1105_setup_ports). With the skb queue directly accessible from the tagging protocol driver, we can now move sja1110_process_meta_tstamp into the tagging driver itself, and avoid exporting a symbol. Fixes: `566b18c8b7` ("net: dsa: sja1105: implement TX timestamping for SJA1110") Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-12 17:33:36 -07:00
Alvin Šipraga	43a4b4dbd4	net: dsa: fix spurious error message when unoffloaded port leaves bridge Flip the sign of a return value check, thereby suppressing the following spurious error: port 2 failed to notify DSA_NOTIFIER_BRIDGE_LEAVE: -EOPNOTSUPP ... which is emitted when removing an unoffloaded DSA switch port from a bridge. Fixes: `d371b7c92d` ("net: dsa: Unset vlan_filtering when ports leave the bridge") Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20211012112730.3429157-1-alvin@pqrs.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-12 16:11:23 -07:00
Vladimir Oltean	1951b3f19c	net: dsa: hold rtnl_lock in dsa_switch_setup_tag_protocol It was a documented fact that ds->ops->change_tag_protocol() offered rtnetlink mutex protection to the switch driver, since there was an ASSERT_RTNL right before the call in dsa_switch_change_tag_proto() (initiated from sysfs). The blamed commit introduced another call path for ds->ops->change_tag_protocol() which does not hold the rtnl_mutex. This is: dsa_tree_setup -> dsa_tree_setup_switches -> dsa_switch_setup -> dsa_switch_setup_tag_protocol -> ds->ops->change_tag_protocol() -> dsa_port_setup -> dsa_slave_create -> register_netdevice(slave_dev) -> dsa_tree_setup_master -> dsa_master_setup -> dev->dsa_ptr = cpu_dp The reason why the rtnl_mutex is held in the sysfs call path is to ensure that, once the master and all the DSA interfaces are down (which is required so that no packets flow), they remain down during the tagging protocol change. The above calling order illustrates the fact that it should not be risky to change the initial tagging protocol to the one specified in the device tree at the given time: - packets cannot enter the dsa_switch_rcv() packet type handler since netdev_uses_dsa() for the master will not yet return true, since dev->dsa_ptr has not yet been populated - packets cannot enter the dsa_slave_xmit() function because no DSA interface has yet been registered So from the DSA core's perspective, holding the rtnl_mutex is indeed not necessary. Yet, drivers may need to do things which need rtnl_mutex protection. For example: felix_set_tag_protocol -> felix_setup_tag_8021q -> dsa_tag_8021q_register -> dsa_tag_8021q_setup -> dsa_tag_8021q_port_setup -> vlan_vid_add -> ASSERT_RTNL These drivers do not really have a choice to take the rtnl_mutex themselves, since in the sysfs case, the rtnl_mutex is already held. Fixes: `deff710703` ("net: dsa: Allow default tag protocol to be overridden from DT") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-09 13:44:05 +01:00
Vladimir Oltean	5bded8259e	net: dsa: mv88e6xxx: isolate the ATU databases of standalone and bridged ports Similar to commit `6087175b79` ("net: dsa: mt7530: use independent VLAN learning on VLAN-unaware bridges"), software forwarding between an unoffloaded LAG port (a bonding interface with an unsupported policy) and a mv88e6xxx user port directly under a bridge is broken. We adopt the same strategy, which is to make the standalone ports not find any ATU entry learned on a bridge port. Theory: the mv88e6xxx ATU is looked up by FID and MAC address. There are as many FIDs as VIDs (4096). The FID is derived from the VID when possible (the VTU maps a VID to a FID), with a fallback to the port based default FID value when not (802.1Q Mode is disabled on the port, or the classified VID isn't present in the VTU). The mv88e6xxx driver makes the following use of FIDs and VIDs: - the port's DefaultVID (to which untagged & pvid-tagged packets get classified) is 0 and is absent from the VTU, so this kind of packets is processed in FID 0, the default FID assigned by mv88e6xxx_setup_port. - every time a bridge VLAN is created, mv88e6xxx_port_vlan_join() -> mv88e6xxx_atu_new() associates a FID with that VID which increases linearly starting from 1. Like this: bridge vlan add dev lan0 vid 100 # FID 1 bridge vlan add dev lan1 vid 100 # still FID 1 bridge vlan add dev lan2 vid 1024 # FID 2 The FID allocation made by the driver is sub-optimal for the following reasons: (a) A standalone port has a DefaultPVID of 0 and a default FID of 0 too. A VLAN-unaware bridged port has a DefaultPVID of 0 and a default FID of 0 too. The difference is that the bridged ports may learn ATU entries, while the standalone port has the requirement that it must not, and must not find them either. Standalone ports must not use the same FID as ports belonging to a bridge. All standalone ports can use the same FID, since the ATU will never have an entry in that FID. (b) Multiple VLAN-unaware bridges will all use a DefaultPVID of 0 and a default FID of 0 on all their ports. The FDBs will not be isolated between these bridges. Every VLAN-unaware bridge must use the same FID on all its ports, different from the FID of other bridge ports. (c) Each bridge VLAN uses a unique FID which is useful for Independent VLAN Learning, but the same VLAN ID on multiple VLAN-aware bridges will result in the same FID being used by mv88e6xxx_atu_new(). The correct behavior is for VLAN 1 in br0 to have a different FID compared to VLAN 1 in br1. This patch cannot fix all the above. Traditionally the DSA framework did not care about this, and the reality is that DSA core involvement is needed for the aforementioned issues to be solved. The only thing we can solve here is an issue which does not require API changes, and that is issue (a), aka use a different FID for standalone ports vs ports under VLAN-unaware bridges. The first step is deciding what VID and FID to use for standalone ports, and what VID and FID for bridged ports. The 0/0 pair for standalone ports is what they used up till now, let's keep using that. For bridged ports, there are 2 cases: - VLAN-aware ports will never end up using the port default FID, because packets will always be classified to a VID in the VTU or dropped otherwise. The FID is the one associated with the VID in the VTU. - On VLAN-unaware ports, we _could_ leave their DefaultVID (pvid) at zero (just as in the case of standalone ports), and just change the port's default FID from 0 to a different number (say 1). However, Tobias points out that there is one more requirement to cater to: cross-chip bridging. The Marvell DSA header does not carry the FID in it, only the VID. So once a packet crosses a DSA link, if it has a VID of zero it will get classified to the default FID of that cascade port. Relying on a port default FID for upstream cascade ports results in contradictions: a default FID of 0 breaks ATU isolation of bridged ports on the downstream switch, a default FID of 1 breaks standalone ports on the downstream switch. So not only must standalone ports have different FIDs compared to bridged ports, they must also have different DefaultVID values. IEEE 802.1Q defines two reserved VID values: 0 and 4095. So we simply choose 4095 as the DefaultVID of ports belonging to VLAN-unaware bridges, and VID 4095 maps to FID 1. For the xmit operation to look up the same ATU database, we need to put VID 4095 in DSA tags sent to ports belonging to VLAN-unaware bridges too. All shared ports are configured to map this VID to the bridging FID, because they are members of that VLAN in the VTU. Shared ports don't need to have 802.1QMode enabled in any way, they always parse the VID from the DSA header, they don't need to look at the 802.1Q header. We install VID 4095 to the VTU in mv88e6xxx_setup_port(), with the mention that mv88e6xxx_vtu_setup() which was located right below that call was flushing the VTU so those entries wouldn't be preserved. So we need to relocate the VTU flushing prior to the port initialization during ->setup(). Also note that this is why it is safe to assume that VID 4095 will get associated with FID 1: the user ports haven't been created, so there is no avenue for the user to create a bridge VLAN which could otherwise race with the creation of another FID which would otherwise use up the non-reserved FID value of 1. [ Currently mv88e6xxx_port_vlan_join() doesn't have the option of specifying a preferred FID, it always calls mv88e6xxx_atu_new(). ] mv88e6xxx_port_db_load_purge() is the function to access the ATU for FDB/MDB entries, and it used to determine the FID to use for VLAN-unaware FDB entries (VID=0) using mv88e6xxx_port_get_fid(). But the driver only called mv88e6xxx_port_set_fid() once, during probe, so no surprises, the port FID was always 0, the call to get_fid() was redundant. As much as I would have wanted to not touch that code, the logic is broken when we add a new FID which is not the port-based default. Now the port-based default FID only corresponds to standalone ports, and FDB/MDB entries belong to the bridging service. So while in the future, when the DSA API will support FDB isolation, we will have to figure out the FID based on the bridge number, for now there's a single bridging FID, so hardcode that. Lastly, the tagger needs to check, when it is transmitting a VLAN untagged skb, whether it is sending it towards a bridged or a standalone port. When we see it is bridged we assume the bridge is VLAN-unaware. Not because it cannot be VLAN-aware but: - if we are transmitting from a VLAN-aware bridge we are likely doing so using TX forwarding offload. That code path guarantees that skbs have a vlan hwaccel tag in them, so we would not enter the "else" branch of the "if (skb->protocol == htons(ETH_P_8021Q))" condition. - if we are transmitting on behalf of a VLAN-aware bridge but with no TX forwarding offload (no PVT support, out of space in the PVT, whatever), we would indeed be transmitting with VLAN 4095 instead of the bridge device's pvid. However we would be injecting a "From CPU" frame, and the switch won't learn from that - it only learns from "Forward" frames. So it is inconsequential for address learning. And VLAN 4095 is absolutely enough for the frame to exit the switch, since we never remove that VLAN from any port. Fixes: `57e661aae6` ("net: dsa: mv88e6xxx: Link aggregation support") Reported-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-08 15:47:46 -07:00
Vladimir Oltean	c7709a02c1	net: dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware bridges using VID 0 The present code is structured this way due to an incomplete thought process. In Documentation/networking/switchdev.rst we document that if a bridge is VLAN-unaware, then the presence or lack of a pvid on a bridge port (or on the bridge itself, for that matter) should not affect the ability to receive and transmit tagged or untagged packets. If the bridge on behalf of which we are sending this packet is VLAN-aware, then the TX forwarding offload API ensures that the skb will be VLAN-tagged (if the packet was sent by user space as untagged, it will get transmitted town to the driver as tagged with the bridge device's pvid). But if the bridge is VLAN-unaware, it may or may not be VLAN-tagged. In fact the logic to insert the bridge's PVID came from the idea that we should emulate what is being done in the VLAN-aware case. But we shouldn't. It appears that injecting packets using a VLAN ID of 0 serves the purpose of forwarding the packets to the egress port with no VLAN tag added or stripped by the hardware, and no filtering being performed. So we can simply remove the superfluous logic. One reason why this logic is broken is that when CONFIG_BRIDGE_VLAN_FILTERING=n, we call br_vlan_get_pvid_rcu() but that returns an error and we do error out, dropping all packets on xmit. Not really smart. This is also an issue when the user deletes the bridge pvid: $ bridge vlan del dev br0 vid 1 self As mentioned, in both cases, packets should still flow freely, and they do just that on any net device where the bridge is not offloaded, but on mv88e6xxx they don't. Fixes: `d82f8ab0d8` ("net: dsa: tag_dsa: offload the bridge forwarding process") Reported-by: Andrew Lunn <andrew@lunn.ch> Link: https://patchwork.kernel.org/project/netdevbpf/patch/20211003155141.2241314-1-andrew@lunn.ch/ Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210928233708.1246774-1-vladimir.oltean@nxp.com/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-08 15:47:45 -07:00
Vladimir Oltean	1bec0f0506	net: dsa: fix bridge_num not getting cleared after ports leaving the bridge The dp->bridge_num is zero-based, with -1 being the encoding for an invalid value. But dsa_bridge_num_put used to check for an invalid value by comparing bridge_num with 0, which is of course incorrect. The result is that the bridge_num will never get cleared by dsa_bridge_num_put, and further port joins to other bridges will get a bridge_num larger than the previous one, and once all the available bridges with TX forwarding offload supported by the hardware get exhausted, the TX forwarding offload feature is simply disabled. In the case of sja1105, 7 iterations of the loop below are enough to exhaust the TX forwarding offload bits, and further bridge joins operate without that feature. ip link add br0 type bridge vlan_filtering 1 while :; do ip link set sw0p2 master br0 && sleep 1 ip link set sw0p2 nomaster && sleep 1 done This issue is enough of an indication that having the dp->bridge_num invalid encoding be a negative number is prone to bugs, so this will be changed to a one-based value, with the dp->bridge_num of zero being the indication of no bridge. However, that is material for net-next. Fixes: `f5e165e72b` ("net: dsa: track unique bridge numbers across all DSA switch trees") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-08 15:47:45 -07:00

1 2 3 4 5 ...

1254 Commits