linux

Author	SHA1	Message	Date
Yajun Deng	9d85a6f44b	net: sched: cls_api: Fix the the wrong parameter The 4th parameter in tc_chain_notify() should be flags rather than seq. Let's change it back correctly. Fixes: `32a4f5ecd7` ("net: sched: introduce chain object to uapi") Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-22 00:48:03 -07:00
Vladimir Oltean	2b0a568849	net: switchdev: fix FDB entries towards foreign ports not getting propagated to us The newly introduced switchdev_handle_fdb_{add,del}_to_device helpers solved a problem but introduced another one. They have a severe design bug: they do not propagate FDB events on foreign interfaces to us, i.e. this use case: br0 / \ / \ / \ / \ swp0 eno0 (switchdev) (foreign) when an address is learned on eno0, what is supposed to happen is that this event should also be propagated towards swp0. Somehow I managed to convince myself that this did work correctly, but obviously it does not. The trouble with foreign interfaces is that we must reach a switchdev net_device pointer through a foreign net_device that has no direct upper/lower relationship with it. So we need to do exploratory searching through the lower interfaces of the foreign net_device's bridge upper (to reach swp0 from eno0, we must check its upper, br0, for lower interfaces that pass the check_cb and foreign_dev_check_cb). This is something that the previous code did not do, it just assumed that "dev" will become a switchdev interface at some point, somehow, probably by magic. With this patch, assisted address learning on the CPU port works again in DSA: ip link add br0 type bridge ip link set swp0 master br0 ip link set eno0 master br0 ip link set br0 up [ 46.708929] mscc_felix 0000:00:00.5 swp0: Adding FDB entry towards eno0, addr 00:04:9f:05:f4:ab vid 0 as host address Fixes: `8ca07176ab` ("net: switchdev: introduce a fanout helper for SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE") Reported-by: Eric Woudstra <ericwouds@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-22 00:45:40 -07:00
Randy Dunlap	98c5b13f3a	net: sparx5: fix unmet dependencies warning WARNING: unmet direct dependencies detected for PHY_SPARX5_SERDES Depends on [n]: (ARCH_SPARX5 \|\| COMPILE_TEST [=n]) && OF [=y] && HAS_IOMEM [=y] Selected by [y]: - SPARX5_SWITCH [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_MICROCHIP [=y] && NET_SWITCHDEV [=y] && HAS_IOMEM [=y] && OF [=y] Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Lars Povlsen <lars.povlsen@microchip.com> Cc: Steen Hegelund <Steen.Hegelund@microchip.com> Cc: UNGLinuxDriver@microchip.com Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-22 00:44:32 -07:00
David S. Miller	f796fcd613	Merge branch 'bridge-port-offload' Vladimir Oltean says: ==================== Let switchdev drivers offload and unoffload bridge ports at their own convenience This series introduces an explicit API through which switchdev drivers mark a bridge port as offloaded or not: - switchdev_bridge_port_offload() - switchdev_bridge_port_unoffload() Currently, the bridge assumes that a port is offloaded if dev_get_port_parent_id(dev, &ppid, recurse=true) returns something, but that is just an assumption that breaks some use cases (like a non-offloaded LAG interface on top of a switchdev port, bridged with other switchdev ports). Along with some consolidation of the bridge logic to assign a "switchdev offloading mark" to a port (now better called a "hardware domain"), this series allows the bridge driver side to no longer impose restrictions on that configuration. Right now, all switchdev drivers must be modified to use the explicit API, but more and more logic can then be placed centrally in the bridge and therefore ease the job of a switchdev driver writer in the future. For example, the first thing we can hook into the explicit switchdev offloading API calls are the switchdev object and FDB replay helpers. So far, these have only been used by DSA in "pull" mode (where the driver must ask for them). Adding the replay helpers to other drivers involves a lot of repetition. But by moving the helpers inside the bridge port offload/unoffload hook points, we can move the entire replay process to "push" mode (where the bridge provides them automatically). The explicit switchdev offloading API will see further extensions in the future. The patches were split from a larger series for easier review: https://patchwork.kernel.org/project/netdevbpf/cover/20210718214434.3938850-1-vladimir.oltean@nxp.com/ Changes in v6: - Make the switchdev replay helpers opt-in - Opt out of the replay helpers for mlxsw, rocker, prestera, sparx5, cpsw, am65-cpsw ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-22 00:26:43 -07:00
Vladimir Oltean	4e51bf44a0	net: bridge: move the switchdev object replay helpers to "push" mode Starting with commit `4f2673b3a2` ("net: bridge: add helper to replay port and host-joined mdb entries"), DSA has introduced some bridge helpers that replay switchdev events (FDB/MDB/VLAN additions and deletions) that can be lost by the switchdev drivers in a variety of circumstances: - an IP multicast group was host-joined on the bridge itself before any switchdev port joined the bridge, leading to the host MDB entries missing in the hardware database. - during the bridge creation process, the MAC address of the bridge was added to the FDB as an entry pointing towards the bridge device itself, but with no switchdev ports being part of the bridge yet, this local FDB entry would remain unknown to the switchdev hardware database. - a VLAN/FDB/MDB was added to a bridge port that is a LAG interface, before any switchdev port joined that LAG, leading to the hardware database missing those entries. - a switchdev port left a LAG that is a bridge port, while the LAG remained part of the bridge, and all FDB/MDB/VLAN entries remained installed in the hardware database of the switchdev port. Also, since commit `0d2cfbd41c` ("net: bridge: ignore switchdev events for LAG ports which didn't request replay"), DSA introduced a method, based on a const void ctx, to ensure that two switchdev ports under the same LAG that is a bridge port do not see the same MDB/VLAN entry being replayed twice by the bridge, once for every bridge port that joins the LAG. With so many ordering corner cases being possible, it seems unreasonable to expect a switchdev driver writer to get it right from the first try. Therefore, now that DSA has experimented with the bridge replay helpers for a little bit, we can move the code to the bridge driver where it is more readily available to all switchdev drivers. To convert the switchdev object replay helpers from "pull mode" (where the driver asks for them) to a "push mode" (where the bridge offers them automatically), the biggest problem is that the bridge needs to be aware when a switchdev port joins and leaves, even when the switchdev is only indirectly a bridge port (for example when the bridge port is a LAG upper of the switchdev). Luckily, we already have a hook for that, in the form of the newly introduced switchdev_bridge_port_offload() and switchdev_bridge_port_unoffload() calls. These offer a natural place for hooking the object addition and deletion replays. Extend the above 2 functions with: - pointers to the switchdev atomic notifier (for FDB replays) and the blocking notifier (for MDB and VLAN replays). - the "const void ctx" argument required for drivers to be able to disambiguate between which port is targeted, when multiple ports are lowers of the same LAG that is a bridge port. Most of the drivers pass NULL to this argument, except the ones that support LAG offload and have the proper context check already in place in the switchdev blocking notifier handler. Also unexport the replay helpers, since nobody except the bridge calls them directly now. Note that: (a) we abuse the terminology slightly, because FDB entries are not "switchdev objects", but we count them as objects nonetheless. With no direct way to prove it, I think they are not modeled as switchdev objects because those can only be installed by the bridge to the hardware (as opposed to FDB entries which can be propagated in the other direction too). This is merely an abuse of terms, FDB entries are replayed too, despite not being objects. (b) the bridge does not attempt to sync port attributes to newly joined ports, just the countable stuff (the objects). The reason for this is simple: no universal and symmetric way to sync and unsync them is known. For example, VLAN filtering: what to do on unsync, disable or leave it enabled? Similarly, STP state, ageing timer, etc etc. What a switchdev port does when it becomes standalone again is not really up to the bridge's competence, and the driver should deal with it. On the other hand, replaying deletions of switchdev objects can be seen a matter of cleanup and therefore be treated by the bridge, hence this patch. We make the replay helpers opt-in for drivers, because they might not bring immediate benefits for them: - nbp_vlan_init() is called _after_ netdev_master_upper_dev_link(), so br_vlan_replay() should not do anything for the new drivers on which we call it. The existing drivers where there was even a slight possibility for there to exist a VLAN on a bridge port before they join it are already guarded against this: mlxsw and prestera deny joining LAG interfaces that are members of a bridge. - br_fdb_replay() should now notify of local FDB entries, but I patched all drivers except DSA to ignore these new entries in commit `2c4eca3ef7` ("net: bridge: switchdev: include local flag in FDB notifications"). Driver authors can lift this restriction as they wish, and when they do, they can also opt into the FDB replay functionality. - br_mdb_replay() should fix a real issue which is described in commit `4f2673b3a2` ("net: bridge: add helper to replay port and host-joined mdb entries"). However most drivers do not offload the SWITCHDEV_OBJ_ID_HOST_MDB to see this issue: only cpsw and am65_cpsw offload this switchdev object, and I don't completely understand the way in which they offload this switchdev object anyway. So I'll leave it up to these drivers' respective maintainers to opt into br_mdb_replay(). So most of the drivers pass NULL notifier blocks for the replay helpers, except: - dpaa2-switch which was already acked/regression-tested with the helpers enabled (and there isn't much of a downside in having them) - ocelot which already had replay logic in "pull" mode - DSA which already had replay logic in "pull" mode An important observation is that the drivers which don't currently request bridge event replays don't even have the switchdev_bridge_port_{offload,unoffload} calls placed in proper places right now. This was done to avoid unnecessary rework for drivers which might never even add support for this. For driver writers who wish to add replay support, this can be used as a tentative placement guide: https://patchwork.kernel.org/project/netdevbpf/patch/20210720134655.892334-11-vladimir.oltean@nxp.com/ Cc: Vadym Kochan <vkochan@marvell.com> Cc: Taras Chornyi <tchornyi@marvell.com> Cc: Ioana Ciornei <ioana.ciornei@nxp.com> Cc: Lars Povlsen <lars.povlsen@microchip.com> Cc: Steen Hegelund <Steen.Hegelund@microchip.com> Cc: UNGLinuxDriver@microchip.com Cc: Claudiu Manoil <claudiu.manoil@nxp.com> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-22 00:26:23 -07:00
Vladimir Oltean	7105b50b7e	net: bridge: guard the switchdev replay helpers against a NULL notifier block There is a desire to make the object and FDB replay helpers optional when moving them inside the bridge driver. For example a certain driver might not offload host MDBs and there is no case where the replay helpers would be of immediate use to it. So it would be nice if we could allow drivers to pass NULL pointers for the atomic and blocking notifier blocks, and the replay helpers to do nothing in that case. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-22 00:26:23 -07:00
Vladimir Oltean	2f5dc00f7a	net: bridge: switchdev: let drivers inform which bridge ports are offloaded On reception of an skb, the bridge checks if it was marked as 'already forwarded in hardware' (checks if skb->offload_fwd_mark == 1), and if it is, it assigns the source hardware domain of that skb based on the hardware domain of the ingress port. Then during forwarding, it enforces that the egress port must have a different hardware domain than the ingress one (this is done in nbp_switchdev_allowed_egress). Non-switchdev drivers don't report any physical switch id (neither through devlink nor .ndo_get_port_parent_id), therefore the bridge assigns them a hardware domain of 0, and packets coming from them will always have skb->offload_fwd_mark = 0. So there aren't any restrictions. Problems appear due to the fact that DSA would like to perform software fallback for bonding and team interfaces that the physical switch cannot offload. +-- br0 ---+ / / \| \ / / \| \ / \| \| bond0 / \| \| / \ swp0 swp1 swp2 swp3 swp4 There, it is desirable that the presence of swp3 and swp4 under a non-offloaded LAG does not preclude us from doing hardware bridging beteen swp0, swp1 and swp2. The bandwidth of the CPU is often times high enough that software bridging between {swp0,swp1,swp2} and bond0 is not impractical. But this creates an impossible paradox given the current way in which port hardware domains are assigned. When the driver receives a packet from swp0 (say, due to flooding), it must set skb->offload_fwd_mark to something. - If we set it to 0, then the bridge will forward it towards swp1, swp2 and bond0. But the switch has already forwarded it towards swp1 and swp2 (not to bond0, remember, that isn't offloaded, so as far as the switch is concerned, ports swp3 and swp4 are not looking up the FDB, and the entire bond0 is a destination that is strictly behind the CPU). But we don't want duplicated traffic towards swp1 and swp2, so it's not ok to set skb->offload_fwd_mark = 0. - If we set it to 1, then the bridge will not forward the skb towards the ports with the same switchdev mark, i.e. not to swp1, swp2 and bond0. Towards swp1 and swp2 that's ok, but towards bond0? It should have forwarded the skb there. So the real issue is that bond0 will be assigned the same hardware domain as {swp0,swp1,swp2}, because the function that assigns hardware domains to bridge ports, nbp_switchdev_add(), recurses through bond0's lower interfaces until it finds something that implements devlink (calls dev_get_port_parent_id with bool recurse = true). This is a problem because the fact that bond0 can be offloaded by swp3 and swp4 in our example is merely an assumption. A solution is to give the bridge explicit hints as to what hardware domain it should use for each port. Currently, the bridging offload is very 'silent': a driver registers a netdevice notifier, which is put on the netns's notifier chain, and which sniffs around for NETDEV_CHANGEUPPER events where the upper is a bridge, and the lower is an interface it knows about (one registered by this driver, normally). Then, from within that notifier, it does a bunch of stuff behind the bridge's back, without the bridge necessarily knowing that there's somebody offloading that port. It looks like this: ip link set swp0 master br0 \| v br_add_if() calls netdev_master_upper_dev_link() \| v call_netdevice_notifiers \| v dsa_slave_netdevice_event \| v oh, hey! it's for me! \| v .port_bridge_join What we do to solve the conundrum is to be less silent, and change the switchdev drivers to present themselves to the bridge. Something like this: ip link set swp0 master br0 \| v br_add_if() calls netdev_master_upper_dev_link() \| v bridge: Aye! I'll use this call_netdevice_notifiers ^ ppid as the \| \| hardware domain for v \| this port, and zero dsa_slave_netdevice_event \| if I got nothing. \| \| v \| oh, hey! it's for me! \| \| \| v \| .port_bridge_join \| \| \| +------------------------+ switchdev_bridge_port_offload(swp0, swp0) Then stacked interfaces (like bond0 on top of swp3/swp4) would be treated differently in DSA, depending on whether we can or cannot offload them. The offload case: ip link set bond0 master br0 \| v br_add_if() calls netdev_master_upper_dev_link() \| v bridge: Aye! I'll use this call_netdevice_notifiers ^ ppid as the \| \| switchdev mark for v \| bond0. dsa_slave_netdevice_event \| Coincidentally (or not), \| \| bond0 and swp0, swp1, swp2 v \| all have the same switchdev hmm, it's not quite for me, \| mark now, since the ASIC but my driver has already \| is able to forward towards called .port_lag_join \| all these ports in hw. for it, because I have \| a port with dp->lag_dev == bond0. \| \| \| v \| .port_bridge_join \| for swp3 and swp4 \| \| \| +------------------------+ switchdev_bridge_port_offload(bond0, swp3) switchdev_bridge_port_offload(bond0, swp4) And the non-offload case: ip link set bond0 master br0 \| v br_add_if() calls netdev_master_upper_dev_link() \| v bridge waiting: call_netdevice_notifiers ^ huh, switchdev_bridge_port_offload \| \| wasn't called, okay, I'll use a v \| hwdom of zero for this one. dsa_slave_netdevice_event : Then packets received on swp0 will \| : not be software-forwarded towards v : swp1, but they will towards bond0. it's not for me, but bond0 is an upper of swp3 and swp4, but their dp->lag_dev is NULL because they couldn't offload it. Basically we can draw the conclusion that the lowers of a bridge port can come and go, so depending on the configuration of lowers for a bridge port, it can dynamically toggle between offloaded and unoffloaded. Therefore, we need an equivalent switchdev_bridge_port_unoffload too. This patch changes the way any switchdev driver interacts with the bridge. From now on, everybody needs to call switchdev_bridge_port_offload and switchdev_bridge_port_unoffload, otherwise the bridge will treat the port as non-offloaded and allow software flooding to other ports from the same ASIC. Note that these functions lay the ground for a more complex handshake between switchdev drivers and the bridge in the future. For drivers that will request a replay of the switchdev objects when they offload and unoffload a bridge port (DSA, dpaa2-switch, ocelot), we place the call to switchdev_bridge_port_unoffload() strategically inside the NETDEV_PRECHANGEUPPER notifier's code path, and not inside NETDEV_CHANGEUPPER. This is because the switchdev object replay helpers need the netdev adjacency lists to be valid, and that is only true in NETDEV_PRECHANGEUPPER. Cc: Vadym Kochan <vkochan@marvell.com> Cc: Taras Chornyi <tchornyi@marvell.com> Cc: Ioana Ciornei <ioana.ciornei@nxp.com> Cc: Lars Povlsen <lars.povlsen@microchip.com> Cc: Steen Hegelund <Steen.Hegelund@microchip.com> Cc: UNGLinuxDriver@microchip.com Cc: Claudiu Manoil <claudiu.manoil@nxp.com> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch: regression Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com> # ocelot-switch Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-22 00:26:23 -07:00
Tobias Waldekranz	8582661048	net: bridge: switchdev: recycle unused hwdoms Since hwdoms have only been used thus far for equality comparisons, the bridge has used the simplest possible assignment policy; using a counter to keep track of the last value handed out. With the upcoming transmit offloading, we need to perform set operations efficiently based on hwdoms, e.g. we want to answer questions like "has this skb been forwarded to any port within this hwdom?" Move to a bitmap-based allocation scheme that recycles hwdoms once all members leaves the bridge. This means that we can use a single unsigned long to keep track of the hwdoms that have received an skb. v1->v2: convert the typedef DECLARE_BITMAP(br_hwdom_map_t, BR_HWDOM_MAX) into a plain unsigned long. v2->v6: none Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-22 00:26:23 -07:00
Tobias Waldekranz	f7cf972f93	net: bridge: disambiguate offload_fwd_mark Before this change, four related - but distinct - concepts where named offload_fwd_mark: - skb->offload_fwd_mark: Set by the switchdev driver if the underlying hardware has already forwarded this frame to the other ports in the same hardware domain. - nbp->offload_fwd_mark: An idetifier used to group ports that share the same hardware forwarding domain. - br->offload_fwd_mark: Counter used to make sure that unique IDs are used in cases where a bridge contains ports from multiple hardware domains. - skb->cb->offload_fwd_mark: The hardware domain on which the frame ingressed and was forwarded. Introduce the term "hardware forwarding domain" ("hwdom") in the bridge to denote a set of ports with the following property: If an skb with skb->offload_fwd_mark set, is received on a port belonging to hwdom N, that frame has already been forwarded to all other ports in hwdom N. By decoupling the name from "offload_fwd_mark", we can extend the term's definition in the future - e.g. to add constraints that describe expected egress behavior - without overloading the meaning of "offload_fwd_mark". - nbp->offload_fwd_mark thus becomes nbp->hwdom. - br->offload_fwd_mark becomes br->last_hwdom. - skb->cb->offload_fwd_mark becomes skb->cb->src_hwdom. The slight change in naming here mandates a slight change in behavior of the nbp_switchdev_frame_mark() function. Previously, it only set this value in skb->cb for packets with skb->offload_fwd_mark true (ones which were forwarded in hardware). Whereas now we always track the incoming hwdom for all packets coming from a switchdev (even for the packets which weren't forwarded in hardware, such as STP BPDUs, IGMP reports etc). As all uses of skb->cb->offload_fwd_mark were already gated behind checks of skb->offload_fwd_mark, this will not introduce any functional change, but it paves the way for future changes where the ingressing hwdom must be known for frames coming from a switchdev regardless of whether they were forwarded in hardware or not (basically, if the skb comes from a switchdev, skb->cb->src_hwdom now always tracks which one). A typical example where this is relevant: the switchdev has a fixed configuration to trap STP BPDUs, but STP is not running on the bridge and the group_fwd_mask allows them to be forwarded. Say we have this setup: br0 / \| \ / \| \ swp0 swp1 swp2 A BPDU comes in on swp0 and is trapped to the CPU; the driver does not set skb->offload_fwd_mark. The bridge determines that the frame should be forwarded to swp{1,2}. It is imperative that forward offloading is _not_ allowed in this case, as the source hwdom is already "poisoned". Recording the source hwdom allows this case to be handled properly. v2->v3: added code comments v3->v6: none Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-22 00:26:23 -07:00
Vladimir Oltean	45035febc4	net: dpaa2-switch: refactor prechangeupper sanity checks Make more room for some extra code in the NETDEV_PRECHANGEUPPER handler by moving what already exists into a dedicated function. Cc: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-22 00:26:23 -07:00
Vladimir Oltean	123338d7d4	net: dpaa2-switch: use extack in dpaa2_switch_port_bridge_join We need to propagate the extack argument for dpaa2_switch_port_bridge_join to use it in a future patch, and it looks like there is already an error message there which is currently printed to the console. Move it over netlink so it is properly transmitted to user space. Cc: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-22 00:26:23 -07:00
David S. Miller	5ca096dbea	Merge branch 'ksz-dsa-fixes' Lino Sanfilippo says: ==================== Fixes for KSZ DSA switch These patches fix issues I encountered while using a KSZ9897 as a DSA switch with a broadcom GENET network device as the DSA master device. PATCH 1 fixes an invalid access to an SKB in case it is scattered. PATCH 2 fixes incorrect hardware checksum calculation caused by the DSA tag. Changes in v2: - instead of linearizing the SKBs only for KSZ switches ensure linearized SKBs for all tail taggers by clearing the feature flags NETIF_F_HW_SG and NETIF_F_FRAGLIST (suggested by Vladimir Oltean) The patches have been tested with a KSZ9897 and apply against net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 23:15:06 -07:00
Lino Sanfilippo	37120f23ac	net: dsa: tag_ksz: dont let the hardware process the layer 4 checksum If the checksum calculation is offloaded to the network device (e.g due to NETIF_F_HW_CSUM inherited from the DSA master device), the calculated layer 4 checksum is incorrect. This is since the DSA tag which is placed after the layer 4 data is considered as being part of the daa and thus errorneously included into the checksum calculation. To avoid this, always calculate the layer 4 checksum in software. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 23:14:49 -07:00
Lino Sanfilippo	21cf377a9c	net: dsa: ensure linearized SKBs in case of tail taggers The function skb_put() that is used by tail taggers to make room for the DSA tag must only be called for linearized SKBS. However in case that the slave device inherited features like NETIF_F_HW_SG or NETIF_F_FRAGLIST the SKB passed to the slaves transmit function may not be linearized. Avoid those SKBs by clearing the NETIF_F_HW_SG and NETIF_F_FRAGLIST flags for tail taggers. Furthermore since the tagging protocol can be changed at runtime move the code for setting up the slaves features into dsa_slave_setup_tagger(). Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 23:14:49 -07:00
Biju Das	9f061b9acb	ravb: Remove extra TAB Align the member description comments for struct ravb_desc by removing the extra TAB. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 22:55:59 -07:00
Biju Das	291d0a2c1f	ravb: Fix a typo in comment Fix the typo RX->TX in comment, as the code following the comment process TX and not RX. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 22:55:26 -07:00
Vladimir Oltean	e40cba9490	net: dsa: sja1105: make VID 4095 a bridge VLAN too This simple series of commands: ip link add br0 type bridge vlan_filtering 1 ip link set swp0 master br0 fails on sja1105 with the following error: [ 33.439103] sja1105 spi0.1: vlan-lookup-table needs to have at least the default untagged VLAN [ 33.447710] sja1105 spi0.1: Invalid config, cannot upload Warning: sja1105: Failed to change VLAN Ethertype. For context, sja1105 has 3 operating modes: - SJA1105_VLAN_UNAWARE: the dsa_8021q_vlans are committed to hardware - SJA1105_VLAN_FILTERING_FULL: the bridge_vlans are committed to hardware - SJA1105_VLAN_FILTERING_BEST_EFFORT: both the dsa_8021q_vlans and the bridge_vlans are committed to hardware Swapping out a VLAN list and another in happens in sja1105_build_vlan_table(), which performs a delta update procedure. That function is called from a few places, notably from sja1105_vlan_filtering() which is called from the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING handler. The above set of 2 commands fails when run on a kernel pre-commit `8841f6e63f` ("net: dsa: sja1105: make devlink property best_effort_vlan_filtering true by default"). So the priv->vlan_state transition that takes place is between VLAN-unaware and full VLAN filtering. So the dsa_8021q_vlans are swapped out and the bridge_vlans are swapped in. So why does it fail? Well, the bridge driver, through nbp_vlan_init(), first sets up the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING attribute, and only then proceeds to call nbp_vlan_add for the default_pvid. So when we swap out the dsa_8021q_vlans and swap in the bridge_vlans in the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING handler, there are no bridge VLANs (yet). So we have wiped the VLAN table clean, and the low-level static config checker complains of an invalid configuration. We _will_ add the bridge VLANs using the dynamic config interface, albeit later, when nbp_vlan_add() calls us. So it is natural that it fails. So why did it ever work? Surprisingly, it looks like I only tested this configuration with 2 things set up in a particular way: - a network manager that brings all ports up - a kernel with CONFIG_VLAN_8021Q=y It is widely known that commit `ad1afb0039` ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") installs VID 0 to every net device that comes up. DSA treats these VLANs as bridge VLANs, and therefore, in my testing, the list of bridge_vlans was never empty. However, if CONFIG_VLAN_8021Q is not enabled, or the port is not up when it joins a VLAN-aware bridge, the bridge_vlans list will be temporarily empty, and the sja1105_static_config_reload() call from sja1105_vlan_filtering() will fail. To fix this, the simplest thing is to keep VID 4095, the one used for CPU-injected control packets since commit `ed040abca4` ("net: dsa: sja1105: use 4095 as the private VLAN for untagged traffic"), in the list of bridge VLANs too, not just the list of tag_8021q VLANs. This ensures that the list of bridge VLANs will never be empty. Fixes: `ec5ae61076` ("net: dsa: sja1105: save/restore VLANs using a delta commit method") Reported-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 22:53:43 -07:00
Wei Wang	213ad73d06	tcp: disable TFO blackhole logic by default Multiple complaints have been raised from the TFO users on the internet stating that the TFO blackhole logic is too aggressive and gets falsely triggered too often. (e.g. https://blog.apnic.net/2021/07/05/tcp-fast-open-not-so-fast/) Considering that most middleboxes no longer drop TFO packets, we decide to disable the blackhole logic by setting /proc/sys/net/ipv4/tcp_fastopen_blackhole_timeout_set to 0 by default. Fixes: `cf1ef3f071` ("net/tcp_fastopen: Disable active side TFO in certain scenarios") Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 22:50:31 -07:00
Leon Romanovsky	c2255ff477	ionic: cleanly release devlink instance The failure to register devlink will leave the system with dangled devlink resource, which is not cleaned if devlink_port_register() fails. In order to remove access to ".registered" field of struct devlink_port, require both devlink_register and devlink_port_register to success and check it through device pointer. Fixes: `fbfb803153` ("ionic: Add hardware init and device commands") Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 14:35:52 -07:00
Nikolay Aleksandrov	58d913a326	net: bridge: multicast: add context support for host-joined groups Adding bridge multicast context support for host-joined groups is easy because we only need the proper timer value. We pass the already chosen context and use its timer value. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 14:34:47 -07:00
Nikolay Aleksandrov	6567cb438a	net: bridge: multicast: add mdb context support Choose the proper bridge multicast context when user-spaces is adding mdb entries. Currently we require the vlan to be configured on at least one device (port or bridge) in order to add an mdb entry if vlan mcast snooping is enabled (vlan snooping implies vlan filtering). Note that we always allow deleting an entry, regardless of the vlan state. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 14:34:47 -07:00
Joakim Zhang	dabb5db17c	ARM: dts: imx6qdl: move phy properties into phy device node This patch fixes issues found by dtbs_check: make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- dtbs_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/net/fsl,fec.yaml According to the Micrel PHY dt-binding: Documentation/devicetree/bindings/net/micrel-ksz90x1.txt, Add clock delay in an Ethernet OF device node is deprecated, so move these properties to PHY OF device node. Suggested-by: Rob Herring <robh@kernel.org> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 14:31:22 -07:00
Joakim Zhang	649502a337	dt-bindings: net: fsl,fec: improve the binding a bit This patch improves the yaml a bit according to Rob Herring comments: 1) normalize interrupt-names property, there is no reason to support random order. 2) validate each string in clock-names property. 3) add constraints for fsl,num-tx-queues/fsl,num-rx-queues property. 4) change additionalProperties to false in order to do strict checking. Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 14:31:22 -07:00
Xin Long	02dc2ee7c7	sctp: do not update transport pathmtu if SPP_PMTUD_ENABLE is not set Currently, in sctp_packet_config(), sctp_transport_pmtu_check() is called to update transport pathmtu with dst's mtu when dst's mtu has been changed by non sctp stack like xfrm. However, this should only happen when SPP_PMTUD_ENABLE is set, no matter where dst's mtu changed. This patch is to fix by checking SPP_PMTUD_ENABLE flag before calling sctp_transport_pmtu_check(). Thanks Jacek for reporting and looking into this issue. v1->v2: - add the missing "{" to fix the build error. Fixes: `69fec325a6` ('Revert "sctp: remove sctp_transport_pmtu_check"') Reported-by: Jacek Szafraniec <jacek.szafraniec@nokia.com> Tested-by: Jacek Szafraniec <jacek.szafraniec@nokia.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 14:17:58 -07:00
Linus Torvalds	3d5895cd35	s390 updates for 5.14-rc3 - fix / add expoline usage in "DMA" code - fix compat vdso Makefile to avoid permanent rebuild - fix ftrace_update_ftrace_func to avoid NULL pointer dereference - update defconfigs - trivial coding style fix -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmD4aBQACgkQIg7DeRsp bsJu3Q/9F5roAWqs+L6Us9tCd+rYvUsJYBhYeI7eQDM9eyb860IgJgCSmxj45WGH 798C0yO01u7DkzN21XOfl8GnnGYSEX7yN1IZkXm7k8rDauxqbk0tc/XsPRJM9wOT WckqSuKJrgZYxnEuiMQQgxv3aBYsQBWAG3z1spDE/PlUyfcBKTYrUrR55t8WZgQp UPE21ArH8wzx2sycxet5+m3UwttbMxdhN0VY9yEg1tFesVKX0CcjpwhGcX1DUPY1 +lJRCEjEvryJWg8xBdPXXBeUjV2U5h8035gi2um75IGe5LL+J3uwQlmkGgUXa635 /oWJ6ST1F7TyJ2wM9Nmcn+MjvwMHs+Sd8vfSgpX040GidNGMlZYL/1494zxQ91XS h51JB1b9m/8kP+RAtoYKEpK0k4ELWXWaI5zAh9gsKGisfH25pMOSxsuQIBpTijRu o9VDUtRXHkmqoQ3wurxrFzWZLVDneBGv5EOK7SxoKGdkpg1PDqkO+ZLarppMsx9g IcLmQwV3mQYQrBttAa061UkgvDxUFhrfUANJraEhI9283aEc8tzc+dRJkNJu48kA kHWHhgKqVysKiyppQO2XbidRMp2vVlVHuvB8e0sMbDsC7a7EH5LW2pRq/B+RCB+1 797j5B9EczsneISU2YNL10JWPMfHeZ8BYnkA5CWRqAbcYdDBAAI= =E8AN -----END PGP SIGNATURE----- Merge tag 's390-5.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens: - fix / add expoline usage in "DMA" code - fix compat vdso Makefile to avoid permanent rebuild - fix ftrace_update_ftrace_func to avoid NULL pointer dereference - update defconfigs - trivial coding style fix * tag 's390-5.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390: update defconfigs s390/cpumf: fix semicolon.cocci warnings s390/boot: fix use of expolines in the DMA code s390/ftrace: fix ftrace_update_ftrace_func implementation s390/defconfig: allow early device mapper disks s390/vdso32: add vdso32.lds to targets	2021-07-21 13:51:26 -07:00
Linus Torvalds	7b6ae471e5	spi: Fixes for v5.14 A collection of driver specific fixes, there was a bit of a kerfuffle with some last minute review on hte spi-cadence-quadspi division by zero change but otherwise nothing terribly remarkable here - important fixes if you have the hardware but nothing with too wide an impact. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmD4WjwACgkQJNaLcl1U h9CofQf/exao6MVAh2aMFv4A0UjQ4jI/wOWzhDY84ooxtTpcSlmAcPNR/yXt7yvc 2s7OcDOSRL8uO9agrnLINDzRrB/+Z8N7ra3sUwzzkNEe6YoOcLYW+GvFSYbyqqPQ w8Ij6xn05RINQ63WuwwNHNwxlNBcXAT/bkKUkuzAinQi91hehwVQrAgoaNmbDzvI B0NhSwVEYvlEsfvmVwOJN4VUDbFor31oE3hNvK685ATRvPEQssbQarloK4OsPajv hOKqDNY/UKggSEFSaeN8gExrylhqEsXk1r+p0S8kKfZnyoNkemPa9xK2QD5YjulE V9rDs90qjWoA1Rw90e61HvDbtDBgdQ== =74y9 -----END PGP SIGNATURE----- Merge tag 'spi-fix-v5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A collection of driver specific fixes, there was a bit of a kerfuffle with some last minute review on hte spi-cadence-quadspi division by zero change but otherwise nothing terribly remarkable here - important fixes if you have the hardware but nothing with too wide an impact" * tag 'spi-fix-v5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: spi-bcm2835: Fix deadlock spi: cadence: Correct initialisation of runtime PM again spi: cadence-quadspi: Disable Auto-HW polling spi: spi-cadence-quadspi: Fix division by zero warning spi: spi-cadence-quadspi: Revert "Fix division by zero warning" spi: spi-cadence-quadspi: Fix division by zero warning spi: mediatek: move devm_spi_register_master position spi: mediatek: fix fifo rx mode spi: atmel: Fix CS and initialization bug spi: stm32: fixes pm_runtime calls in probe/remove spi: imx: mx51-ecspi: Reinstate low-speed CONFIGREG delay spi: stm32h7: fix full duplex irq handler handling	2021-07-21 12:41:41 -07:00
Linus Torvalds	7c3d49b0b5	regulator: Fixes for v5.14 A few driver specific fixes that came in since the merge window, plus a change to mark the regulator-fixed-domain DT binding as deprecated in order to try to to discourage any new users while a better solution is put in place. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmD4WckACgkQJNaLcl1U h9DltAgAgxwqrdGPLt/0rGlhpI183L6x81fbLcbvVYwWiyUDqU1rSqZhR+UnxkOf 3tOX4B+4EJTNpArLRcoXD2uu1KP92AhOwv3uGKsJGibS0OC6AwV7adyK+qkBuZsS IJgyI63YfRGP1udWfklZle+CxK6JIRMILbT9oTMGoWrPnK3QfS2rNDapTtpfYRn1 PTIu0TqMQ6xKHg8XPbtmD4INbrbhxP0H3848g0ZhohGozEwggKSEwB1c55cbQc2J I3HpBMVk3sO0UrG6NBpX+fSj0BT4MwjNdFDgmstgwEn/31gSSDjAxaHnfZqWLJqJ cW6rlhw0eSqRXAjrqh6WfK5QiojRXw== =+jG2 -----END PGP SIGNATURE----- Merge tag 'regulator-fix-v5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "A few driver specific fixes that came in since the merge window, plus a change to mark the regulator-fixed-domain DT binding as deprecated in order to try to to discourage any new users while a better solution is put in place" * tag 'regulator-fix-v5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: hi6421: Fix getting wrong drvdata regulator: mtk-dvfsrc: Fix wrong dev pointer for devm_regulator_register regulator: fixed: Mark regulator-fixed-domain as deprecated regulator: bd9576: Fix testing wrong flag in check_temp_flag_mismatch regulator: hi6421v600: Fix getting wrong drvdata that causes boot failure regulator: rt5033: Fix n_voltages settings for BUCK and LDO regulator: rtmv20: Fix wrong mask for strobe-polarity-high	2021-07-21 12:37:49 -07:00
Linus Torvalds	b4e62aaf95	AFS fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmD4LMkACgkQ+7dXa6fL C2t4vg//eOjVirl2L9NSwW27btAv3tySeX7zN78i4IIy1KesVTV4An/yNYnZ1XDg yackjIIgRaw/pX+W/dWiQalRtAcYno4vFam874lzBq+Yourk9j+7mU7WDtkofRh4 UtuWAyARGVfLZdkJQp1Eb1YoZj0oro31Qa7eFeHQ/1Wlx6Wo1gdsRAdGfRpukLeC fTv/DXrpK+cC36ENZjP3VuhfsOAKIxKVqf7rLKiMBbeyAGW7jRs3fSTxgOJAgJN0 opj/WTGZf8vskq+u3uTbAv8dwK9CakqB6QEkDq22kZSVh+LSxKzIB6E/3wSXfhAj 1NAKUVQb4e6caXKk/vxFUhnyq8wWdzWW6ExVzH2DYSdNn442A6eo9g6d6qE7wIGK ui0sL1p7ilwV/1aBVA7imYM5bSS3HDpVFCW1+HCZpbVxnARDv87su8iOJGErWpz+ 5fDqifhm+Wrtrhoo05RnsYoygCf5elnz5iUu4MaOYKno1dFQrHBBJhFsv8PAduUq nZkr+q8HUFH7EYcUjidJiMR+NSiVpe8bCV4LoaGczY4j/AYc5rOfD1s3kW0tfiqG zqx+pGy5AXTwrZqzVf28I5T6cBCW1GxLnFLmSWZ4DsQdNGziYbPUTYDCJcJxfT0U cASaEXQSItfxlkVw+JIr+Hb/QXYEfoNEdwWLvUQ2LrPueJwH2V8= =Pn/w -----END PGP SIGNATURE----- Merge tag 'afs-fixes-20210721' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS fixes from David Howells: - Fix a tracepoint that causes one of the tracing subsystem query files to crash if the module is loaded - Fix afs_writepages() to take account of whether the storage rpc actually succeeded when updating the cyclic writeback counter - Fix some error code propagation/handling - Fix place where afs_writepages() was setting writeback_index to a file position rather than a page index * tag 'afs-fixes-20210721' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Remove redundant assignment to ret afs: Fix setting of writeback_index afs: check function return afs: Fix tracepoint string placement with built-in AFS	2021-07-21 11:51:59 -07:00
Arnd Bergmann	161dcc0242	net: ixp46x: fix ptp build failure The rework of the ixp46x cpu detection left the network driver in a half broken state: drivers/net/ethernet/xscale/ptp_ixp46x.c: In function 'ptp_ixp_init': drivers/net/ethernet/xscale/ptp_ixp46x.c:290:51: error: 'IXP4XX_TIMESYNC_BASE_VIRT' undeclared (first use in this function) 290 \| (struct ixp46x_ts_regs __iomem *) IXP4XX_TIMESYNC_BASE_VIRT; \| ^~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/xscale/ptp_ixp46x.c:290:51: note: each undeclared identifier is reported only once for each function it appears in drivers/net/ethernet/xscale/ptp_ixp46x.c: At top level: drivers/net/ethernet/xscale/ptp_ixp46x.c:323:1: error: data definition has no type or storage class [-Werror] 323 \| module_init(ptp_ixp_init); I have patches to complete the transition for a future release, but for the moment, add the missing include statements to get it to build again. Fixes: `09aa9aabdc` ("soc: ixp4xx: move cpu detection to linux/soc/ixp4xx/cpu.h") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 09:10:24 -07:00
Eric Dumazet	240bfd134c	tcp: tweak len/truesize ratio for coalesce candidates tcp_grow_window() is using skb->len/skb->truesize to increase tp->rcv_ssthresh which has a direct impact on advertized window sizes. We added TCP coalescing in linux-3.4 & linux-3.5: Instead of storing skbs with one or two MSS in receive queue (or OFO queue), we try to append segments together to reduce memory overhead. High performance network drivers tend to cook skb with 3 parts : 1) sk_buff structure (256 bytes) 2) skb->head contains room to copy headers as needed, and skb_shared_info 3) page fragment(s) containing the ~1514 bytes frame (or more depending on MTU) Once coalesced into a previous skb, 1) and 2) are freed. We can therefore tweak the way we compute len/truesize ratio knowing that skb->truesize is inflated by 1) and 2) soon to be freed. This is done only for in-order skb, or skb coalesced into OFO queue. The result is that low rate flows no longer pay the memory price of having low GRO aggregation factor. Same result for drivers not using GRO. This is critical to allow a big enough receiver window, typically tcp_rmem[2] / 2. We have been using this at Google for about 5 years, it is due time to make it upstream. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 09:05:57 -07:00
Nikolay Aleksandrov	54cb43199e	net: bridge: multicast: fix igmp/mld port context null pointer dereferences With the recent change to use bridge/port multicast context pointers instead of bridge/port I missed to convert two locations which pass the port pointer as-is, but with the new model we need to verify the port context is non-NULL first and retrieve the port from it. The first location is when doing querier selection when a query is received, the second location is when leaving a group. The port context will be null if the packets originated from the bridge device (i.e. from the host). The fix is simple just check if the port context exists and retrieve the port pointer from it. Fixes: `adc47037a7` ("net: bridge: multicast: use multicast contexts instead of bridge or port") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 09:04:03 -07:00
Leon Romanovsky	524df92c19	ionic: drop useless check of PCI driver data validity The driver core will call to .remove callback only if .probe succeeded and it will ensure that driver data has pointer to struct ionic. There is no need to check it again. Fixes: `fbfb803153` ("ionic: Add hardware init and device commands") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 09:03:07 -07:00
Eric Dumazet	739b2adf99	tcp: avoid indirect call in tcp_new_space() For tcp sockets, sk->sk_write_space is most probably sk_stream_write_space(). Other sk->sk_write_space() calls in TCP are slow path and do not deserve any change. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 09:02:11 -07:00
Andy Shevchenko	7f8b20d0de	net: wwan: iosm: Switch to use module_pci_driver() macro Eliminate some boilerplate code by using module_pci_driver() instead of init/exit, moving the salient bits from init into probe. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 09:01:05 -07:00
Dongliang Mu	dcb713d53e	usb: hso: remove the bailout parameter There are two invocation sites of hso_free_net_device. After refactoring hso_create_net_device, this parameter is useless. Remove the bailout in the hso_free_net_device and change the invocation sites of this function. Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 08:59:56 -07:00
Dongliang Mu	788e67f18d	usb: hso: fix error handling code of hso_create_net_device The current error handling code of hso_create_net_device is hso_free_net_device, no matter which errors lead to. For example, WARNING in hso_free_net_device [1]. Fix this by refactoring the error handling code of hso_create_net_device by handling different errors by different code. [1] https://syzkaller.appspot.com/bug?id=66eff8d49af1b28370ad342787413e35bbe76efe Reported-by: syzbot+44d53c7255bb1aea22d2@syzkaller.appspotmail.com Fixes: `5fcfb6d0bf` ("hso: fix bailout in error case of probe") Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 08:59:56 -07:00
Sukadev Bhattiprolu	bb55362bd6	ibmvnic: Remove the proper scrq flush Commit `65d6470d13` ("ibmvnic: clean pending indirect buffs during reset") intended to remove the call to ibmvnic_tx_scrq_flush() when the ->resetting flag is true and was tested that way. But during the final rebase to net-next, the hunk got applied to a block few lines below (which happened to have the same diff context) and the wrong call to ibmvnic_tx_scrq_flush() got removed. Fix that by removing the correct ibmvnic_tx_scrq_flush() and restoring the one that was incorrectly removed. Fixes: `65d6470d13` ("ibmvnic: clean pending indirect buffs during reset") Reported-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 08:57:41 -07:00
Wei Liu	f5a11c69b6	Revert "x86/hyperv: fix logical processor creation" This reverts commit `450605c28d`. Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-07-21 15:55:43 +00:00
Piotr Kwapulinski	1050713026	i40e: add support for PTP external synchronization clock Add support for external synchronization clock via GPIOs. 1PPS signals are handled via the dedicated 3 GPIOs: SDP3_2, SDP3_3 and GPIO_4. Previously it was not possible to use the external PTP synchronization clock. All possible HW configurations are supported. SDP3_2, SDP3_3, GPIO_4 off, off, off off, in_A, off off, out_A, off off, in_B, off off, out_B, off in_A, off, off in_A, in_B, off in_A, out_B, off out_A, off, off out_A, in_B, off in_B, off, off in_B, in_A, off in_B, out_A, off out_B, off, off out_B, in_A, off off, off, in_A off, out_A, in_A off, in_B, in_A off, out_B, in_A out_A, off, in_A out_A, in_B, in_A in_B, off, in_A in_B, out_A, in_A out_B, off, in_A off, off, out_A off, in_A, out_A off, in_B, out_A off, out_B, out_A in_A, off, out_A in_A, in_B, out_A in_A, out_B, out_A in_B, off, out_A in_B, in_A, out_A out_B, off, out_A out_B, in_A, out_A off, off, in_B off, in_A, in_B off, out_A, in_B off, out_B, in_B in_A, off, in_B in_A, out_B, in_B out_A, off, in_B out_B, off, in_B out_B, in_A, in_B off, off, out_B off, in_A, out_B off, out_A, out_B off, in_B, out_B in_A, off, out_B in_A, in_B, out_B out_A, off, out_B out_A, in_B, out_B in_B, off, out_B in_B, in_A, out_B in_B, out_A, out_B Tested with oscilloscope, 1PPS generator and ts2phc. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com> Tested-by: Ashish K <ashishx.k@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 08:53:54 -07:00
David S. Miller	3ddaed6b09	Merge branch 'pmtu-esp' Vadim Fedorenko ays: ==================== Fix PMTU for ESP-in-UDP encapsulation Bug 213669 uncovered regression in PMTU discovery for UDP-encapsulated routes and some incorrect usage in udp tunnel fields. This series fixes problems and also adds such case for selftests v3: - update checking logic to account SCTP use case v2: - remove refactor code that was in first patch - move checking logic to __udp{4,6}_lib_err_encap - add more tests, especially routed configuration ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 08:50:52 -07:00
Vadim Fedorenko	ece1278a9b	selftests: net: add ESP-in-UDP PMTU test The case of ESP in UDP encapsulation was not covered before. Add cases of local changes of MTU and difference on routed path. Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 08:50:35 -07:00
Vadim Fedorenko	9bfce73c89	udp: check encap socket in __udp_lib_err Commit `d26796ae58` ("udp: check udp sock encap_type in __udp_lib_err") added checks for encapsulated sockets but it broke cases when there is no implementation of encap_err_lookup for encapsulation, i.e. ESP in UDP encapsulation. Fix it by calling encap_err_lookup only if socket implements this method otherwise treat it as legal socket. Fixes: `d26796ae58` ("udp: check udp sock encap_type in __udp_lib_err") Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 08:49:31 -07:00
Xin Long	58acd10092	sctp: update active_key for asoc when old key is being replaced syzbot reported a call trace: BUG: KASAN: use-after-free in sctp_auth_shkey_hold+0x22/0xa0 net/sctp/auth.c:112 Call Trace: sctp_auth_shkey_hold+0x22/0xa0 net/sctp/auth.c:112 sctp_set_owner_w net/sctp/socket.c:131 [inline] sctp_sendmsg_to_asoc+0x152e/0x2180 net/sctp/socket.c:1865 sctp_sendmsg+0x103b/0x1d30 net/sctp/socket.c:2027 inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:821 sock_sendmsg_nosec net/socket.c:703 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:723 This is an use-after-free issue caused by not updating asoc->shkey after it was replaced in the key list asoc->endpoint_shared_keys, and the old key was freed. This patch is to fix by also updating active_key for asoc when old key is being replaced with a new one. Note that this issue doesn't exist in sctp_auth_del_key_id(), as it's not allowed to delete the active_key from the asoc. Fixes: `1b1e0bc994` ("sctp: add refcnt support for sh_key") Reported-by: syzbot+b774577370208727d12b@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 08:44:44 -07:00
Vadim Fedorenko	ac6627a28d	net: ipv4: Consolidate ipv4_mtu and ip_dst_mtu_maybe_forward Consolidate IPv4 MTU code the same way it is done in IPv6 to have code aligned in both address families Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 08:22:03 -07:00
Vadim Fedorenko	427faee167	net: ipv6: introduce ip6_dst_mtu_maybe_forward Replace ip6_dst_mtu_forward with ip6_dst_mtu_maybe_forward and reuse this code in ip6_mtu. Actually these two functions were almost duplicates, this change will simplify the maintaince of mtu calculation code. Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 08:22:02 -07:00
David S. Miller	7c804e91df	Merge branch 'ipv6-ioam' Justin Iurman says: ==================== Support for the IOAM Pre-allocated Trace with IPv6 v5: - Refine types, min/max and default values for new sysctls - Introduce a "_wide" sysctl for each "ioam6_id" sysctl - Add more validation on headers before processing data - RCU for sc <> ns pointers + appropriate accessors - Generic Netlink policies are now per op, not per family anymore - Address other comments/remarks from Jakub (thanks again) - Revert "__packed" to "__attribute__((packed))" for uapi headers - Add tests to cover the functionality added, as requested by David Ahern v4: - Address warnings from checkpatch (ignore errors related to unnamed bitfields in the first patch) - Use of hweight32 (thanks Jakub) - Remove inline keyword from static functions in C files and let the compiler decide what to do (thanks Jakub) v3: - Fix warning "unused label 'out_unregister_genl'" by adding conditional macro - Fix lwtunnel output redirect bug: dst cache useless in this case, use orig_output instead v2: - Fix warning with static for __ioam6_fill_trace_data - Fix sparse warning with __force when casting __be64 to __be32 - Fix unchecked dereference when removing IOAM namespaces or schemas - exthdrs.c: Don't drop by default (now: ignore) to match the act bits "00" - Add control plane support for the inline insertion (lwtunnel) - Provide uapi structures - Use __net_timestamp if skb->tstamp is empty - Add note about the temporary IANA allocation - Remove support for "removable" TLVs - Remove support for virtual/anonymous tunnel decapsulation In-situ Operations, Administration, and Maintenance (IOAM) records operational and telemetry information in a packet while it traverses a path between two points in an IOAM domain. It is defined in draft-ietf-ippm-ioam-data [1]. IOAM data fields can be encapsulated into a variety of protocols. The IPv6 encapsulation is defined in draft-ietf-ippm-ioam-ipv6-options [2], via extension headers. IOAM can be used to complement OAM mechanisms based on e.g. ICMP or other types of probe packets. This patchset implements support for the Pre-allocated Trace, carried by a Hop-by-Hop. Therefore, a new IPv6 Hop-by-Hop TLV option is introduced, see IANA [3]. The three other IOAM options are not included in this patchset (Incremental Trace, Proof-of-Transit and Edge-to-Edge). The main idea behind the IOAM Pre-allocated Trace is that a node pre-allocates some room in packets for IOAM data. Then, each IOAM node on the path will insert its data. There exist several interesting use- cases, e.g. Fast failure detection/isolation or Smart service selection. Another killer use-case is what we have called Cross-Layer Telemetry, see the demo video on its repository [4], that aims to make the entire stack (L2/L3 -> L7) visible for distributed tracing tools (e.g. Jaeger), instead of the current L5 -> L7 limited view. So, basically, this is a nice feature for the Linux Kernel. This patchset also provides support for the control plane part, but only for the inline insertion (host-to-host use case), through lightweight tunnels. Indeed, for in-transit traffic, the solution is to have an IPv6-in-IPv6 encapsulation, which brings some difficulties and still requires a little bit of work and discussion (ie anonymous tunnel decapsulation and multi egress resolution). - Patch 1: IPv6 IOAM headers definition - Patch 2: Data plane support for Pre-allocated Trace - Patch 3: IOAM Generic Netlink API - Patch 4: Support for IOAM injection with lwtunnels - Patch 5: Documentation for new IOAM sysctls - Patch 6: Test for the IOAM insertion with IPv6 [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options [3] https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2 [4] https://github.com/iurmanj/cross-layer-telemetry ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 08:21:39 -07:00
Justin Iurman	968691c777	selftests: net: Test for the IOAM insertion with IPv6 This test evaluates the IOAM insertion for IPv6 by checking the IOAM data integrity on the receiver. The topology is formed by 3 nodes: Alpha (sender), Beta (router in-between) and Gamma (receiver). An IOAM domain is configured from Alpha to Gamma only, which means not on the reverse path. When Gamma is the destination, Alpha adds an IOAM option (Pre-allocated Trace) inside a Hop-by-hop and fills the trace with its own IOAM data. Beta and Gamma also fill the trace. The IOAM data integrity is checked on Gamma, by comparing with the pre-defined IOAM configuration (see below). +-------------------+ +-------------------+ \| \| \| \| \| alpha netns \| \| gamma netns \| \| \| \| \| \| +-------------+ \| \| +-------------+ \| \| \| veth0 \| \| \| \| veth0 \| \| \| \| db01::2/64 \| \| \| \| db02::2/64 \| \| \| +-------------+ \| \| +-------------+ \| \| . \| \| . \| +-------------------+ +-------------------+ . . . . . . +----------------------------------------------------+ \| . . \| \| +-------------+ +-------------+ \| \| \| veth0 \| \| veth1 \| \| \| \| db01::1/64 \| ................ \| db02::1/64 \| \| \| +-------------+ +-------------+ \| \| \| \| beta netns \| \| \| +--------------------------+-------------------------+ ~~~~~~~~~~~~~~~~~~~~~~ \| IOAM configuration \| ~~~~~~~~~~~~~~~~~~~~~~ Alpha +-----------------------------------------------------------+ \| Type \| Value \| +-----------------------------------------------------------+ \| Node ID \| 1 \| +-----------------------------------------------------------+ \| Node Wide ID \| 11111111 \| +-----------------------------------------------------------+ \| Ingress ID \| 0xffff (default value) \| +-----------------------------------------------------------+ \| Ingress Wide ID \| 0xffffffff (default value) \| +-----------------------------------------------------------+ \| Egress ID \| 101 \| +-----------------------------------------------------------+ \| Egress Wide ID \| 101101 \| +-----------------------------------------------------------+ \| Namespace Data \| 0xdeadbee0 \| +-----------------------------------------------------------+ \| Namespace Wide Data \| 0xcafec0caf00dc0de \| +-----------------------------------------------------------+ \| Schema ID \| 777 \| +-----------------------------------------------------------+ \| Schema Data \| something that will be 4n-aligned \| +-----------------------------------------------------------+ Note: When Gamma is the destination, Alpha adds an IOAM Pre-allocated Trace option inside a Hop-by-hop, where 164 bytes are pre-allocated for the trace, with 123 as the IOAM-Namespace and with 0xfff00200 as the trace type (= all available options at this time). As a result, and based on IOAM configurations here, only both Alpha and Beta should be capable of inserting their IOAM data while Gamma won't have enough space and will set the overflow bit. Beta +-----------------------------------------------------------+ \| Type \| Value \| +-----------------------------------------------------------+ \| Node ID \| 2 \| +-----------------------------------------------------------+ \| Node Wide ID \| 22222222 \| +-----------------------------------------------------------+ \| Ingress ID \| 201 \| +-----------------------------------------------------------+ \| Ingress Wide ID \| 201201 \| +-----------------------------------------------------------+ \| Egress ID \| 202 \| +-----------------------------------------------------------+ \| Egress Wide ID \| 202202 \| +-----------------------------------------------------------+ \| Namespace Data \| 0xdeadbee1 \| +-----------------------------------------------------------+ \| Namespace Wide Data \| 0xcafec0caf11dc0de \| +-----------------------------------------------------------+ \| Schema ID \| 0xffffff (= None) \| +-----------------------------------------------------------+ \| Schema Data \| \| +-----------------------------------------------------------+ Gamma +-----------------------------------------------------------+ \| Type \| Value \| +-----------------------------------------------------------+ \| Node ID \| 3 \| +-----------------------------------------------------------+ \| Node Wide ID \| 33333333 \| +-----------------------------------------------------------+ \| Ingress ID \| 301 \| +-----------------------------------------------------------+ \| Ingress Wide ID \| 301301 \| +-----------------------------------------------------------+ \| Egress ID \| 0xffff (default value) \| +-----------------------------------------------------------+ \| Egress Wide ID \| 0xffffffff (default value) \| +-----------------------------------------------------------+ \| Namespace Data \| 0xdeadbee2 \| +-----------------------------------------------------------+ \| Namespace Wide Data \| 0xcafec0caf22dc0de \| +-----------------------------------------------------------+ \| Schema ID \| 0xffffff (= None) \| +-----------------------------------------------------------+ \| Schema Data \| \| +-----------------------------------------------------------+ Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 08:16:11 -07:00
Justin Iurman	de8e80a54c	ipv6: ioam: Documentation for new IOAM sysctls Add documentation for new IOAM sysctls: - ioam6_id and ioam6_id_wide: two per-namespace sysctls - ioam6_enabled, ioam6_id and ioam6_id_wide: three per-interface sysctls Example of IOAM configuration based on the following simple topology: _____ _____ _____ \| \| eth0 eth0 \| \| eth1 eth0 \| \| \| A \|.----------.\| B \|.----------.\| C \| \|_____\| \|_____\| \|_____\| 1) Node and interface IDs can be configured for IOAM: # IOAM ID of A = 1, IOAM ID of A.eth0 = 11 (A) sysctl -w net.ipv6.ioam6_id=1 (A) sysctl -w net.ipv6.conf.eth0.ioam6_id=11 # IOAM ID of B = 2, IOAM ID of B.eth0 = 21, IOAM ID of B.eth1 = 22 (B) sysctl -w net.ipv6.ioam6_id=2 (B) sysctl -w net.ipv6.conf.eth0.ioam6_id=21 (B) sysctl -w net.ipv6.conf.eth1.ioam6_id=22 # IOAM ID of C = 3, IOAM ID of C.eth0 = 31 (C) sysctl -w net.ipv6.ioam6_id=3 (C) sysctl -w net.ipv6.conf.eth0.ioam6_id=31 Note that "_wide" IDs equivalents can be configured the same way. 2) Each node can be configured to form an IOAM domain. For instance, we allow IOAM from A to C only (not the reverse path), i.e. enable IOAM on ingress for B.eth0 and C.eth0: (B) sysctl -w net.ipv6.conf.eth0.ioam6_enabled=1 (C) sysctl -w net.ipv6.conf.eth0.ioam6_enabled=1 3) An IOAM domain (e.g. ID=123) is defined and made known to each node: (A) ip ioam namespace add 123 (B) ip ioam namespace add 123 (C) ip ioam namespace add 123 4) Finally, an IOAM Pre-allocated Trace can be inserted in traffic sent by A when C (e.g. db02::2) is the destination: (A) ip -6 route add db02::2/128 encap ioam6 trace type 0x800000 ns 123 size 12 dev eth0 Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 08:14:33 -07:00
Justin Iurman	3edede08ff	ipv6: ioam: Support for IOAM injection with lwtunnels Add support for the IOAM inline insertion (only for the host-to-host use case) which is per-route configured with lightweight tunnels. The target is iproute2 and the patch is ready. It will be posted as soon as this patchset is merged. Here is an overview: $ ip -6 ro ad fc00::1/128 encap ioam6 trace type 0x800000 ns 1 size 12 dev eth0 This example configures an IOAM Pre-allocated Trace option attached to the fc00::1/128 prefix. The IOAM namespace (ns) is 1, the size of the pre-allocated trace data block is 12 octets (size) and only the first IOAM data (bit 0: hop_limit + node id) is included in the trace (type) represented as a bitfield. The reason why the in-transit (IPv6-in-IPv6 encapsulation) use case is not implemented is explained on the patchset cover. Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 08:14:33 -07:00
Justin Iurman	8c6f6fa677	ipv6: ioam: IOAM Generic Netlink API Add Generic Netlink commands to allow userspace to configure IOAM namespaces and schemas. The target is iproute2 and the patch is ready. It will be posted as soon as this patchset is merged. Here is an overview: $ ip ioam Usage: ip ioam { COMMAND \| help } ip ioam namespace show ip ioam namespace add ID [ data DATA32 ] [ wide DATA64 ] ip ioam namespace del ID ip ioam schema show ip ioam schema add ID DATA ip ioam schema del ID ip ioam namespace set ID schema { ID \| none } Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-21 08:14:33 -07:00

1 2 3 4 5 ...

1030188 Commits