linux

Author	SHA1	Message	Date
Geliang Tang	478d770008	mptcp: send out MP_FAIL when data checksum fails When a bad checksum is detected, set the send_mp_fail flag to send out the MP_FAIL option. Add a new function mptcp_has_another_subflow() to check whether there's only a single subflow. When multiple subflows are in use, close the affected subflow with a RST that includes an MP_FAIL option and discard the data with the bad checksum. Set the sk_state of the subsocket to TCP_CLOSE, then the flag MPTCP_WORK_CLOSE_SUBFLOW will be set in subflow_sched_work_if_closed, and the subflow will be closed. When a single subfow is in use, temporarily handled by sending MP_FAIL with a RST too. Signed-off-by: Geliang Tang <geliangtang@xiaomi.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-25 11:02:35 +01:00
Geliang Tang	5580d41b75	mptcp: MP_FAIL suboption receiving This patch added handling for receiving MP_FAIL suboption. Add a new members mp_fail and fail_seq in struct mptcp_options_received. When MP_FAIL suboption is received, set mp_fail to 1 and save the sequence number to fail_seq. Then invoke mptcp_pm_mp_fail_received to deal with the MP_FAIL suboption. Signed-off-by: Geliang Tang <geliangtang@xiaomi.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-25 11:02:34 +01:00
Geliang Tang	c25aeb4e09	mptcp: MP_FAIL suboption sending This patch added the MP_FAIL suboption sending support. Add a new flag named send_mp_fail in struct mptcp_subflow_context. If this flag is set, send out MP_FAIL suboption. Add a new member fail_seq in struct mptcp_out_options to save the data sequence number to put into the MP_FAIL suboption. An MP_FAIL option could be included in a RST or on the subflow-level ACK. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@xiaomi.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-25 11:02:34 +01:00
Paolo Abeni	1bff1e43a3	mptcp: optimize out option generation Currently we have several protocol constraints on MPTCP options generation (e.g. MPC and MPJ subopt are mutually exclusive) and some additional ones required by our implementation (e.g. almost all ADD_ADDR variant are mutually exclusive with everything else). We can leverage the above to optimize the out option generation: we check DSS/MPC/MPJ presence in a mutually exclusive way, avoiding many unneeded conditionals in the common cases. Additionally extend the existing constraints on ADD_ADDR opt on all subvariants, so that it becomes fully mutually exclusive with the above and we can skip another conditional statement for the common case. This change is also needed by the next patch. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-25 11:02:34 +01:00
Gilad Naaman	406f42fa0d	net-next: When a bond have a massive amount of VLANs with IPv6 addresses, performance of changing link state, attaching a VRF, changing an IPv6 address, etc. go down dramtically. The source of most of the slow down is the `dev_addr_lists.c` module, which mainatins a linked list of HW addresses. When using IPv6, this list grows for each IPv6 address added on a VLAN, since each IPv6 address has a multicast HW address associated with it. When performing any modification to the involved links, this list is traversed many times, often for nothing, all while holding the RTNL lock. Instead, this patch adds an auxilliary rbtree which cuts down traversal time significantly. Performance can be seen with the following script: #!/bin/bash ip netns del test \|\| true 2>/dev/null ip netns add test echo 1 \| ip netns exec test tee /proc/sys/net/ipv6/conf/all/keep_addr_on_down > /dev/null set -e ip -n test link add foo type veth peer name bar ip -n test link add b1 type bond ip -n test link add florp type vrf table 10 ip -n test link set bar master b1 ip -n test link set foo up ip -n test link set bar up ip -n test link set b1 up ip -n test link set florp up VLAN_COUNT=1500 BASE_DEV=b1 echo Creating vlans ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT); do ip -n test link add link $BASE_DEV name foo.\$i type vlan id \$i; done" echo Bringing them up ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT); do ip -n test link set foo.\$i up; done" echo Assiging IPv6 Addresses ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT); do ip -n test address add dev foo.\$i 2000::\$i/64; done" echo Attaching to VRF ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT); do ip -n test link set foo.\$i master florp; done" On an Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz machine, the performance before the patch is (truncated): Creating vlans real 108.35 Bringing them up real 4.96 Assiging IPv6 Addresses real 19.22 Attaching to VRF real 458.84 After the patch: Creating vlans real 5.59 Bringing them up real 5.07 Assiging IPv6 Addresses real 5.64 Attaching to VRF real 25.37 Cc: David S. Miller <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Lu Wei <luwei32@huawei.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Gilad Naaman <gnaaman@drivenets.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-25 10:29:07 +01:00
Kangmin Park	a37c5c2669	net: bridge: change return type of br_handle_ingress_vlan_tunnel br_handle_ingress_vlan_tunnel() is only referenced in br_handle_frame(). If br_handle_ingress_vlan_tunnel() is called and return non-zero value, goto drop in br_handle_frame(). But, br_handle_ingress_vlan_tunnel() always return 0. So, the routines that check the return value and goto drop has no meaning. Therefore, change return type of br_handle_ingress_vlan_tunnel() to void and remove if statement of br_handle_frame(). Signed-off-by: Kangmin Park <l4stpr0gr4m@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210823102118.17966-1-l4stpr0gr4m@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-24 16:51:09 -07:00
Xu Liu	fab60e29fc	bpf: Allow bpf_get_netns_cookie in BPF_PROG_TYPE_SK_MSG We'd like to be able to identify netns from sk_msg hooks to accelerate local process communication form different netns. Signed-off-by: Xu Liu <liuxu623@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210820071712.52852-2-liuxu623@gmail.com	2021-08-24 14:17:53 -07:00
Yufeng Mo	f3ccfda193	ethtool: extend coalesce setting uAPI with CQE mode In order to support more coalesce parameters through netlink, add two new parameter kernel_coal and extack for .set_coalesce and .get_coalesce, then some extra info can return to user with the netlink API. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-24 07:38:29 -07:00
Yufeng Mo	029ee6b143	ethtool: add two coalesce attributes for CQE mode Currently, there are many drivers who support CQE mode configuration, some configure it as a fixed when initialized, some provide an interface to change it by ethtool private flags. In order to make it more generic, add two new 'ETHTOOL_A_COALESCE_USE_CQE_TX' and 'ETHTOOL_A_COALESCE_USE_CQE_RX' coalesce attributes, then these parameters can be accessed by ethtool netlink coalesce uAPI. Also add an new structure kernel_ethtool_coalesce, then the new parameter can be added into this struct. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-24 07:38:28 -07:00
Yunsheng Lin	7fb9b66dc9	page_pool: use relaxed atomic for release side accounting There is no need to synchronize the account updating, so use the relaxed atomic to avoid some memory barrier in the data path. Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-24 10:46:31 +01:00
zhang kai	446e7f218b	ipv6: correct comments about fib6_node sernum correct comments in set and get fn_sernum Signed-off-by: zhang kai <zhangkaiheb@126.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-24 09:59:01 +01:00
Vladimir Oltean	58adf9dcb1	net: dsa: let drivers state that they need VLAN filtering while standalone As explained in commit `e358bef7c3` ("net: dsa: Give drivers the chance to veto certain upper devices"), the hellcreek driver uses some tricks to comply with the network stack expectations: it enforces port separation in standalone mode using VLANs. For untagged traffic, bridging between ports is prevented by using different PVIDs, and for VLAN-tagged traffic, it never accepts 8021q uppers with the same VID on two ports, so packets with one VLAN cannot leak from one port to another. That is almost fine, and has worked because hellcreek relied on an implicit behavior of the DSA core that was changed by the previous patch: the standalone ports declare the 'rx-vlan-filter' feature as 'on [fixed]'. Since most of the DSA drivers are actually VLAN-unaware in standalone mode, that feature was actually incorrectly reflecting the hardware/driver state, so there was a desire to fix it. This leaves the hellcreek driver in a situation where it has to explicitly request this behavior from the DSA framework. We configure the ports as follows: - Standalone: 'rx-vlan-filter' is on. An 8021q upper on top of a standalone hellcreek port will go through dsa_slave_vlan_rx_add_vid and will add a VLAN to the hardware tables, giving the driver the opportunity to refuse it through .port_prechangeupper. - Bridged with vlan_filtering=0: 'rx-vlan-filter' is off. An 8021q upper on top of a bridged hellcreek port will not go through dsa_slave_vlan_rx_add_vid, because there will not be any attempt to offload this VLAN. The driver already disables VLAN awareness, so that upper should receive the traffic it needs. - Bridged with vlan_filtering=1: 'rx-vlan-filter' is on. An 8021q upper on top of a bridged hellcreek port will call dsa_slave_vlan_rx_add_vid, and can again be vetoed through .port_prechangeupper. It is not actually completely fine, because if I follow through correctly, we can have the following situation: ip link add br0 type bridge vlan_filtering 0 ip link set lan0 master br0 # lan0 now becomes VLAN-unaware ip link set lan0 nomaster # lan0 fails to become VLAN-aware again, therefore breaking isolation This patch fixes that corner case by extending the DSA core logic, based on this requested attribute, to change the VLAN awareness state of the switch (port) when it leaves the bridge. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-24 09:30:58 +01:00
Vladimir Oltean	06cfb2df7e	net: dsa: don't advertise 'rx-vlan-filter' when not needed There have been multiple independent reports about dsa_slave_vlan_rx_add_vid being called (and consequently calling the drivers' .port_vlan_add) when it isn't needed, and sometimes (not always) causing problems in the process. Case 1: mv88e6xxx_port_vlan_prepare is stubborn and only accepts VLANs on bridged ports. That is understandably so, because standalone mv88e6xxx ports are VLAN-unaware, and VTU entries are said to be a scarce resource. Otherwise said, the following fails lamentably on mv88e6xxx: ip link add br0 type bridge vlan_filtering 1 ip link set lan3 master br0 ip link add link lan10 name lan10.1 type vlan id 1 [485256.724147] mv88e6085 d0032004.mdio-mii:12: p10: hw VLAN 1 already used by port 3 in br0 RTNETLINK answers: Operation not supported This has become a worse issue since commit `9b236d2a69` ("net: dsa: Advertise the VLAN offload netdev ability only if switch supports it"). Up to that point, the driver was returning -EOPNOTSUPP and DSA was reconverting that error to 0, making the 8021q upper think all is ok (but obviously the error message was there even prior to this change). After that change the -EOPNOTSUPP is propagated to vlan_vid_add, and it is a hard error. Case 2: Ports that don't offload the Linux bridge (have a dp->bridge_dev = NULL because they don't implement .port_bridge_{join,leave}). Understandably, a standalone port should not offload VLANs either, it should remain VLAN unaware and any VLAN should be a software VLAN (as long as the hardware is not quirky, that is). In fact, dsa_slave_port_obj_add does do the right thing and rejects switchdev VLAN objects coming from the bridge when that bridge is not offloaded: case SWITCHDEV_OBJ_ID_PORT_VLAN: if (!dsa_port_offloads_bridge_port(dp, obj->orig_dev)) return -EOPNOTSUPP; err = dsa_slave_vlan_add(dev, obj, extack); But it seems that the bridge is able to trick us. The __vlan_vid_add from br_vlan.c has: /* Try switchdev op first. In case it is not supported, fallback to * 8021q add. / err = br_switchdev_port_vlan_add(dev, v->vid, flags, extack); if (err == -EOPNOTSUPP) return vlan_vid_add(dev, br->vlan_proto, v->vid); So it says "no, no, you need this VLAN in your life!". And we, naive as we are, say "oh, this comes from the vlan_vid_add code path, it must be an 8021q upper, sure, I'll take that". And we end up with that bridge VLAN installed on our port anyway. But this time, it has the wrong flags: if the bridge was trying to install VLAN 1 as a pvid/untagged VLAN, failed via switchdev, retried via vlan_vid_add, we have this comment: / This API only allows programming tagged, non-PVID VIDs */ So what we do makes absolutely no sense. Backtracing a bit, we see the common pattern. We allow the network stack to think that our standalone ports are VLAN-aware, but they aren't, for the vast majority of switches. The quirky ones should not dictate the norm. The dsa_slave_vlan_rx_add_vid and dsa_slave_vlan_rx_kill_vid methods exist for drivers that need the 'rx-vlan-filter: on' feature in ethtool -k, which can be due to any of the following reasons: 1. vlan_filtering_is_global = true, and some ports are under a VLAN-aware bridge while others are standalone, and the standalone ports would otherwise drop VLAN-tagged traffic. This is described in commit `061f6a505a` ("net: dsa: Add ndo_vlan_rx_{add, kill}_vid implementation"). 2. the ports that are under a VLAN-aware bridge should also set this feature, for 8021q uppers having a VID not claimed by the bridge. In this case, the driver will essentially not even know that the VID is coming from the 8021q layer and not the bridge. 3. Hellcreek. This driver needs it because in standalone mode, it uses unique VLANs per port to ensure separation. For separation of untagged traffic, it uses different PVIDs for each port, and for separation of VLAN-tagged traffic, it never accepts 8021q uppers with the same vid on two ports. If a driver does not fall under any of the above 3 categories, there is no reason why it should advertise the 'rx-vlan-filter' feature, therefore no reason why it should offload the VLANs added through vlan_vid_add. This commit fixes the problem by removing the 'rx-vlan-filter' feature from the slave devices when they operate in standalone mode, and when they offload a VLAN-unaware bridge. The way it works is that vlan_vid_add will now stop its processing here: vlan_add_rx_filter_info: if (!vlan_hw_filter_capable(dev, proto)) return 0; So the VLAN will still be saved in the interface's VLAN RX filtering list, but because it does not declare VLAN filtering in its features, the 8021q module will return zero without committing that VLAN to hardware. This gives the drivers what they want, since it keeps the 8021q VLANs away from the VLAN table until VLAN awareness is enabled (point at which the ports are no longer standalone, hence in the mv88e6xxx case, the check in mv88e6xxx_port_vlan_prepare passes). Since the issue predates the existence of the hellcreek driver, case 3 will be dealt with in a separate patch. The main change that this patch makes is to no longer set NETIF_F_HW_VLAN_CTAG_FILTER unconditionally, but toggle it dynamically (for most switches, never). The second part of the patch addresses an issue that the first part introduces: because the 'rx-vlan-filter' feature is now dynamically toggled, and our .ndo_vlan_rx_add_vid does not get called when 'rx-vlan-filter' is off, we need to avoid bugs such as the following by replaying the VLANs from 8021q uppers every time we enable VLAN filtering: ip link add link lan0 name lan0.100 type vlan id 100 ip addr add 192.168.100.1/24 dev lan0.100 ping 192.168.100.2 # should work ip link add br0 type bridge vlan_filtering 0 ip link set lan0 master br0 ping 192.168.100.2 # should still work ip link set br0 type bridge vlan_filtering 1 ping 192.168.100.2 # should still work but doesn't As reported by Florian, some drivers look at ds->vlan_filtering in their .port_vlan_add() implementation. So this patch also makes sure that ds->vlan_filtering is committed before calling the driver. This is the reason why it is first committed, then restored on the failure path. Reported-by: Tobias Waldekranz <tobias@waldekranz.com> Reported-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-24 09:30:58 +01:00
Vladimir Oltean	67b5fb5db7	net: dsa: properly fall back to software bridging If the driver does not implement .port_bridge_{join,leave}, then we must fall back to standalone operation on that port, and trigger the error path of dsa_port_bridge_join. This sets dp->bridge_dev = NULL. In turn, having a non-NULL dp->bridge_dev when there is no offloading support makes the following things go wrong: - dsa_default_offload_fwd_mark make the wrong decision in setting skb->offload_fwd_mark. It should set skb->offload_fwd_mark = 0 for ports that don't offload the bridge, which should instruct the bridge to forward in software. But this does not happen, dp->bridge_dev is incorrectly set to point to the bridge, so the bridge is told that packets have been forwarded in hardware, which they haven't. - switchdev objects (MDBs, VLANs) should not be offloaded by ports that don't offload the bridge. Standalone ports should behave as packet-in, packet-out and the bridge should not be able to manipulate the pvid of the port, or tag stripping on egress, or ingress filtering. This should already work fine because dsa_slave_port_obj_add has: case SWITCHDEV_OBJ_ID_PORT_VLAN: if (!dsa_port_offloads_bridge_port(dp, obj->orig_dev)) return -EOPNOTSUPP; err = dsa_slave_vlan_add(dev, obj, extack); but since dsa_port_offloads_bridge_port works based on dp->bridge_dev, this is again sabotaging us. All the above work in case the port has an unoffloaded LAG interface, so this is well exercised code, we should apply it for plain unoffloaded bridge ports too. Reported-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-24 09:30:58 +01:00
Vladimir Oltean	09dba21b43	net: dsa: don't call switchdev_bridge_port_unoffload for unoffloaded bridge ports For ports that have a NULL dp->bridge_dev, dsa_port_to_bridge_port() also returns NULL as expected. Issue #1 is that we are performing a NULL pointer dereference on brport_dev. Issue #2 is that these are ports on which switchdev_bridge_port_offload has not been called, so we should not call switchdev_bridge_port_unoffload on them either. Both issues are addressed by checking against a NULL brport_dev in dsa_port_pre_bridge_leave and exiting early. Fixes: `2f5dc00f7a` ("net: bridge: switchdev: let drivers inform which bridge ports are offloaded") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-24 09:30:58 +01:00
Lorenzo Bianconi	f5a4c24e68	mac80211: introduce individual TWT support in AP mode Introduce TWT action frames parsing support to mac80211. Currently just individual TWT agreement are support in AP mode. Whenever the AP receives a TWT action frame from an associated client, after performing sanity checks, it will notify the underlay driver with requested parameters in order to check if they are supported and if there is enough room for a new agreement. The driver is expected to set the agreement result and report it to mac80211. Drivers supporting this have two new callbacks: - add_twt_setup (mandatory) - twt_teardown_request (optional) mac80211 will send an action frame reply according to the result reported by the driver. Tested-by: Peter Chiu <chui-hao.chiu@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/257512f2e22ba42b9f2624942a128dd8f141de4b.1629741512.git.lorenzo@kernel.org [use le16p_replace_bits(), minor cleanups, use (void *) casts, fix to use ieee80211_get_he_iftype_cap() correctly] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-08-24 10:30:43 +02:00
Yonglong Li	c233ef1390	mptcp: remove MPTCP_ADD_ADDR_IPV6 and MPTCP_ADD_ADDR_PORT MPTCP_ADD_ADDR_IPV6 and MPTCP_ADD_ADDR_PORT are not necessary, we can get these info from pm.local or pm.remote. Drop mptcp_pm_should_add_signal_ipv6 and mptcp_pm_should_add_signal_port too. Co-developed-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-24 09:28:29 +01:00
Yonglong Li	f462a44638	mptcp: build ADD_ADDR/echo-ADD_ADDR option according pm.add_signal According to the MPTCP_ADD_ADDR_SIGNAL or MPTCP_ADD_ADDR_ECHO flag, build the ADD_ADDR/ADD_ADDR_ECHO option. In mptcp_pm_add_addr_signal(), use opts->addr to save the announced ADD_ADDR or ADD_ADDR_ECHO address. Co-developed-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-24 09:28:28 +01:00
Yonglong Li	119c022096	mptcp: fix ADD_ADDR and RM_ADDR maybe flush addr_signal each other ADD_ADDR shares pm.addr_signal with RM_ADDR, so after RM_ADDR/ADD_ADDR has done, we should not clean ADD_ADDR/RM_ADDR's addr_signal. Co-developed-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-24 09:28:28 +01:00
Yonglong Li	18fc1a922e	mptcp: make MPTCP_ADD_ADDR_SIGNAL and MPTCP_ADD_ADDR_ECHO separate Use MPTCP_ADD_ADDR_SIGNAL only for the action of sending ADD_ADDR, and use MPTCP_ADD_ADDR_ECHO only for the action of sending ADD_ADDR echo. Use msk->pm.local to save the announced ADD_ADDR address only, and reuse msk->pm.remote to save the announced ADD_ADDR_ECHO address. To prepare for the next patch. Co-developed-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-24 09:28:28 +01:00
Yonglong Li	1f5e9e2f5f	mptcp: move drop_other_suboptions check under pm lock This patch moved the drop_other_suboptions check from mptcp_established_options_add_addr() into mptcp_pm_add_addr_signal(), do it under the PM lock to avoid the race between this check and mptcp_pm_add_addr_signal(). For this, added a new parameter for mptcp_pm_add_addr_signal() to get the drop_other_suboptions value. And drop the other suboptions after the option length check if drop_other_suboptions is true. Additionally, always drop the other suboption for TCP pure ack: that makes both the code simpler and the MPTCP behaviour more consistent. Co-developed-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-24 09:28:28 +01:00
Yajun Deng	faf482ca19	net: ipv4: Move ip_options_fragment() out of loop The ip_options_fragment() only called when iter->offset is equal to zero, so move it out of loop, and inline 'Copy the flags to each fragment.' As also, remove the unused parameter in ip_frag_ipcb(). Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-24 09:24:18 +01:00
Dave Marchevsky	6fc88c354f	bpf: Migrate cgroup_bpf to internal cgroup_bpf_attach_type enum Add an enum (cgroup_bpf_attach_type) containing only valid cgroup_bpf attach types and a function to map bpf_attach_type values to the new enum. Inspired by netns_bpf_attach_type. Then, migrate cgroup_bpf to use cgroup_bpf_attach_type wherever possible. Functionality is unchanged as attach_type_to_prog_type switches in bpf/syscall.c were preventing non-cgroup programs from making use of the invalid cgroup_bpf array slots. As a result struct cgroup_bpf uses 504 fewer bytes relative to when its arrays were sized using MAX_BPF_ATTACH_TYPE. bpf_cgroup_storage is notably not migrated as struct bpf_cgroup_storage_key is part of uapi and contains a bpf_attach_type member which is not meant to be opaque. Similarly, bpf_cgroup_link continues to report its bpf_attach_type member to userspace via fdinfo and bpf_link_info. To ease disambiguation, bpf_attach_type variables are renamed from 'type' to 'atype' when changed to cgroup_bpf_attach_type. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210819092420.1984861-2-davemarchevsky@fb.com	2021-08-23 17:50:24 -07:00
Jiang Wang	d359902d5c	af_unix: Fix NULL pointer bug in unix_shutdown Commit `94531cfcbe` ("af_unix: Add unix_stream_proto for sockmap") introduced a bug for af_unix SEQPACKET type. In unix_shutdown, the unhash function will call prot->unhash(), which is NULL for SEQPACKET. And kernel will panic. On ARM32, it will show following messages: (it likely affects x86 too). Fix the bug by checking the prot->unhash is NULL or not first. Kernel log: <--- cut here --- Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = 2fba1ffb *pgd=00000000 Internal error: Oops: 80000005 [#1] PREEMPT SMP THUMB2 Modules linked in: CPU: 1 PID: 1999 Comm: falkon Tainted: G W 5.14.0-rc5-01175-g94531cfcbe79-dirty #9240 Hardware name: NVIDIA Tegra SoC (Flattened Device Tree) PC is at 0x0 LR is at unix_shutdown+0x81/0x1a8 pc : [<00000000>] lr : [<c08f3311>] psr: 600f0013 sp : e45aff70 ip : e463a3c0 fp : beb54f04 r10: 00000125 r9 : e45ae000 r8 : c4a56664 r7 : 00000001 r6 : c4a56464 r5 : 00000001 r4 : c4a56400 r3 : 00000000 r2 : c5a6b180 r1 : 00000000 r0 : c4a56400 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 50c5387d Table: 05aa804a DAC: 00000051 Register r0 information: slab PING start c4a56400 pointer offset 0 Register r1 information: NULL pointer Register r2 information: slab task_struct start c5a6b180 pointer offset 0 Register r3 information: NULL pointer Register r4 information: slab PING start c4a56400 pointer offset 0 Register r5 information: non-paged memory Register r6 information: slab PING start c4a56400 pointer offset 100 Register r7 information: non-paged memory Register r8 information: slab PING start c4a56400 pointer offset 612 Register r9 information: non-slab/vmalloc memory Register r10 information: non-paged memory Register r11 information: non-paged memory Register r12 information: slab filp start e463a3c0 pointer offset 0 Process falkon (pid: 1999, stack limit = 0x9ec48895) Stack: (0xe45aff70 to 0xe45b0000) ff60: e45ae000 c5f26a00 00000000 00000125 ff80: c0100264 c07f7fa3 beb54f04 fffffff7 00000001 e6f3fc0e b5e5e9ec beb54ec4 ffa0: b5da0ccc c010024b b5e5e9ec beb54ec4 0000000f 00000000 00000000 beb54ebc ffc0: b5e5e9ec beb54ec4 b5da0ccc 00000125 beb54f58 00785238 beb5529c beb54f04 ffe0: b5da1e24 beb54eac b301385c b62b6ee8 600f0030 0000000f 00000000 00000000 [<c08f3311>] (unix_shutdown) from [<c07f7fa3>] (__sys_shutdown+0x2f/0x50) [<c07f7fa3>] (__sys_shutdown) from [<c010024b>] (__sys_trace_return+0x1/0x16) Exception stack(0xe45affa8 to 0xe45afff0) Fixes: `94531cfcbe` ("af_unix: Add unix_stream_proto for sockmap") Reported-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Dmitry Osipenko <digetx@gmail.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Link: https://lore.kernel.org/bpf/20210821180738.1151155-1-jiang.wang@bytedance.com	2021-08-23 14:56:01 +02:00
Vladimir Oltean	f5e165e72b	net: dsa: track unique bridge numbers across all DSA switch trees Right now, cross-tree bridging setups work somewhat by mistake. In the case of cross-tree bridging with sja1105, all switch instances need to agree upon a common VLAN ID for forwarding a packet that belongs to a certain bridging domain. With TX forwarding offload, the VLAN ID is the bridge VLAN for VLAN-aware bridging, and the tag_8021q TX forwarding offload VID (a VLAN which has non-zero VBID bits) for VLAN-unaware bridging. The VBID for VLAN-unaware bridging is derived from the dp->bridge_num value calculated by DSA independently for each switch tree. If ports from one tree join one bridge, and ports from another tree join another bridge, DSA will assign them the same bridge_num, even though the bridges are different. If cross-tree bridging is supported, this is an issue. Modify DSA to calculate the bridge_num globally across all switch trees. This has the implication for a driver that the dp->bridge_num value that DSA will assign to its ports might not be contiguous, if there are boards with multiple DSA drivers instantiated. Additionally, all bridge_num values eat up towards each switch's ds->num_fwd_offloading_bridges maximum, which is potentially unfortunate, and can be seen as a limitation introduced by this patch. However, that is the lesser evil for now. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-23 11:52:31 +01:00
Shreyansh Chouhan	9cf448c200	ip6_gre: add validation for csum_start Validate csum_start in gre_handle_offloads before we call _gre_xmit so that we do not crash later when the csum_start value is used in the lco_csum function call. This patch deals with ipv6 code. Fixes: Fixes: `b05229f442` ("gre6: Cleanup GREv6 transmit path, call common GRE functions") Reported-by: syzbot+ff8e1b9f2f36481e2efc@syzkaller.appspotmail.com Signed-off-by: Shreyansh Chouhan <chouhan.shreyansh630@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-22 21:24:41 +01:00
Shreyansh Chouhan	1d011c4803	ip_gre: add validation for csum_start Validate csum_start in gre_handle_offloads before we call _gre_xmit so that we do not crash later when the csum_start value is used in the lco_csum function call. This patch deals with ipv4 code. Fixes: `c544193214` ("GRE: Refactor GRE tunneling code.") Reported-by: syzbot+ff8e1b9f2f36481e2efc@syzkaller.appspotmail.com Signed-off-by: Shreyansh Chouhan <chouhan.shreyansh630@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-22 21:24:40 +01:00
Chuck Lever	3a12618059	SUNRPC: Server-side disconnect injection Disconnect injection stress-tests the ability for both client and server implementations to behave resiliently in the face of network instability. A file called /sys/kernel/debug/fail_sunrpc/ignore-server-disconnect enables administrators to turn off server-side disconnect injection while allowing other types of sunrpc errors to be injected. The default setting is that server-side disconnect injection is enabled (ignore=false). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-08-20 13:50:33 -04:00
Chuck Lever	a4ae308143	SUNRPC: Move client-side disconnect injection Disconnect injection stress-tests the ability for both client and server implementations to behave resiliently in the face of network instability. Convert the existing client-side disconnect injection infrastructure to use the kernel's generic error injection facility. The generic facility has a richer set of injection criteria. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-08-20 13:50:32 -04:00
Chuck Lever	c782af2500	SUNRPC: Add a /sys/kernel/debug/fail_sunrpc/ directory This directory will contain a set of administrative controls for enabling error injection for kernel RPC consumers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-08-20 13:50:32 -04:00
Jakub Kicinski	4af14dbaea	Minor updates: * BSS coloring support * MEI commands for Intel platforms * various fixes/cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmEfiUEACgkQB8qZga/f l8QnDRAAiLKYLSfxTCf73KT/cUFOXPAAPG8r23tpgb1h/GgXcTVSTccnBfDWZWnv cX2YknF2hP3JU4SCjV28io+IvftlYp7gMua8yN6ekMQ31Dzw3kkVQbrNdF7sWglQ XhLws8KrZ6UwFSxqjSg1cX2oAYb6AwkDy5z+1lX+PxBoH5duVlWzCYe3ohXnuZgR 8Y9iz1eIxCprt1aXSIHyyT96a7TTG552T+FjenZM2+wHDu4sXGJLWSSUkM6rXUQI bXV1gon50VaqfmbAfusgmhSmmCfHqlx7P0sBKxdJ17WuuBf3FtGLQX07EzBU/0vD t3jSv4x4tXWhnA+Lyxx89WT13pz89iPnI4OjpxaxE+4wTOeiGrTZcHNMMHrgyEcQ f1PiEwwsTt5PM4y/e1rhdpm2Mrw3VxDOQ+A/vHPUzoLv7ewehGw8IGw6spocA+QL 1YB3g6tiVPEIbfczTY+qJy8E4MRh93toO3O2DywE+UXxC3OR8Eo/tFB+yREzCnnN 6jAMGrYYDONKSiU3x9NRlV5luqPmUa1Rwjv+dkg+g7jES1hAjZa+oKtPffDzzGOn C8bsN8TEiMyHtmtgi5IZQTIS0/rqKVV+rgA0DHoTqRVzuInA1Pmrx0k4H6Vy7zGA HahTJVPoO4VHWTtmztwkbONfEe0hR0OE/MGHgEqzdkbpIqVkMBw= =q8bL -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-net-next-2021-08-20' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Minor updates: * BSS coloring support * MEI commands for Intel platforms * various fixes/cleanups * tag 'mac80211-next-for-net-next-2021-08-20' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: cfg80211: fix BSS color notify trace enum confusion mac80211: Fix insufficient headroom issue for AMSDU mac80211: add support for BSS color change nl80211: add support for BSS coloring mac80211: Use flex-array for radiotap header bitmap mac80211: radiotap: Use BIT() instead of shifts mac80211: Remove unnecessary variable and label mac80211: include <linux/rbtree.h> mac80211: Fix monitor MTU limit so that A-MSDUs get through mac80211: remove unnecessary NULL check in ieee80211_register_hw() mac80211: Reject zero MAC address in sta_info_insert_check() nl80211: vendor-cmd: add Intel vendor commands for iwlmei usage ==================== Link: https://lore.kernel.org/r/20210820105329.48674-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-20 10:09:22 -07:00
Nikolay Aleksandrov	2796d846d7	net: bridge: vlan: convert mcast router global option to per-vlan entry The per-vlan router option controls the port/vlan and host vlan entries' mcast router config. The global option controlled only the host vlan config, but that is unnecessary and incosistent as it's not really a global vlan option, but rather bridge option to control host router config, so convert BRIDGE_VLANDB_GOPTS_MCAST_ROUTER to BRIDGE_VLANDB_ENTRY_MCAST_ROUTER which can be used to control both host vlan and port vlan mcast router config. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-20 15:00:35 +01:00
Nikolay Aleksandrov	a53581d555	net: bridge: mcast: br_multicast_set_port_router takes multicast context as argument Change br_multicast_set_port_router to take port multicast context as its first argument so we can later use it to control port/vlan mcast router option. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-20 15:00:35 +01:00
Xiaolong Huang	7e78c597c3	net: qrtr: fix another OOB Read in qrtr_endpoint_post This check was incomplete, did not consider size is 0: if (len != ALIGN(size, 4) + hdrlen) goto err; if size from qrtr_hdr is 0, the result of ALIGN(size, 4) will be 0, In case of len == hdrlen and size == 0 in header this check won't fail and if (cb->type == QRTR_TYPE_NEW_SERVER) { /* Remote node endpoint can bridge other distant nodes / const struct qrtr_ctrl_pkt pkt = data + hdrlen; qrtr_node_assign(node, le32_to_cpu(pkt->server.node)); } will also read out of bound from data, which is hdrlen allocated block. Fixes: `194ccc8829` ("net: qrtr: Support decoding incoming v2 packets") Fixes: `ad9d24c942` ("net: qrtr: fix OOB Read in qrtr_endpoint_post") Signed-off-by: Xiaolong Huang <butterflyhuangxx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-20 14:41:03 +01:00
David S. Miller	e61fbee7be	bluetooth-next pull request for net-next: - Add support for Foxconn Mediatek Chip - Add support for LG LGSBWAC92/TWCM-K505D - hci_h5 flow control fixes and suspend support - Switch to use lock_sock for SCO and RFCOMM - Various fixes for extended advertising - Reword Intel's setup on btusb unifying the supported generations -----BEGIN PGP SIGNATURE----- iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmEe1xEZHGx1aXoudm9u LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKX/YEACMlYxmWJn2birrH5h4c+FA 6hzoDw+Kp+/Qo0FYPgWw6ady+cKuh50itKz8W050JR+n9eVdRehZ3Rlr/Yv2ol51 TSTjRKPbeDmtkGzC9h+dVBgkEERF88mF8FZiFXp+9vG/dfS4Lq2WdWzEFuYmfZyD ZMuI9PsepmprORVI37B1WjZfdUo2XeA9ZKHUVSesgarNg55mZ4T/WEFnEc8KH2rX HiqAeX+H2lt38ZEru7l5Jp6mNnzJJKLcnFjWMHXia865B8dHqC++goMXdJ8Tqcm8 NLs2W1RZgZocVwovwQ17bTiu41VnN7LdVpCig5RGcn1YtQUPcYzqBI971ixQCJUN 7vjqyMV3i+nLLD3FZmD+qYMYH/M2LaLH6fbaN0KBDlElCDHT7/Qu9N2nGreyiqKc uuEXVHbGou3sj/LkBpNKJOGtmNkUo0XN93/giu89ZHGc7BLN1tUJM9NYWaiO1TcD YiD0LO/lqmggCs9SQH0DBTUDNZ1vUDOzmVeD/tu/NqnixzSMseyqeThshZhxz6UT 7fBXvwixl+AhrN2lIxmS4WAtEwOPvaayUW8af7kESlrC4RoFvq+QaghT1D4NDpcq llYlg/gt97Wy3AnIsnvEjd0s+lxGN6byIOBgTfC4jAfPAYA4oxd7N+1vMPFyChUV MwatwE+IE1u2hjQWhMhVuA== =INb2 -----END PGP SIGNATURE----- Merge tag 'for-net-next-2021-08-19' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - Add support for Foxconn Mediatek Chip - Add support for LG LGSBWAC92/TWCM-K505D - hci_h5 flow control fixes and suspend support - Switch to use lock_sock for SCO and RFCOMM - Various fixes for extended advertising - Reword Intel's setup on btusb unifying the supported generations ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-20 12:16:05 +01:00
David S. Miller	815cc21d8d	This cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - update docs about move IRC channel away from freenode, by Sven Eckelmann - Switch to kstrtox.h for kstrtou64, by Sven Eckelmann - Update NULL checks, by Sven Eckelmann (2 patches) - remove remaining skb-copy calls for broadcast packets, by Linus Lüssing -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmEeeK8WHHN3QHNpbW9u d3VuZGVybGljaC5kZQAKCRChK+OYQpKeofhWD/0YwNndM/FFo/NHcO3GFDZx9eLM dFuO7zdMilzgg462q7+mgi0jXA2Kp50Y+JcCqS2XVRIsMgKTVABflgmSlUIOdDoC A3KKRVgQ1HNPD4WREaEV2CLvBdhR9wEI0jRHvZou7n/VWrfJcUHgdl9aDA2/ptlP NcuYCKC99HCQmvaBt4GZgOunYDeplmo2qLip2gpwJWf9/vkL7HiBe3HtQSh1HI2y EIn4SExZOFcxMmKeJMsYl35OZh9oFv7nTnpZBGyKjA+HS0pu03aaPNRGMjW/pdhF f7V61aDJBU0xU6PjWvUegY4VMInrjW8F10EEJck461J/B9PXjUHUaH8BXXuGBkRM 0kU0Cv21a3Ovz23lgnXSnXu/xjqq5/zZHjnGvyPAMMppAI5f73q/0THtv9iOu+Cz Qf/tYl0BIRir20ZWtddQ9x2W3+cBYPOYrf/tnmWqFhPddenn+xitwTysVA6fOykQ pVksQ5UVpDZasZI9Al+R2M0CBttn7tS/iu95PV9CMST8aRgUuU90yd2Ocg3rRDNQ iEor0AozmO879W460BFQcTILw+D7OdlErUV8H8VW4507imZ7JXGPwZTFxhjM2Xhx wUXo/o2sxt/ITSdtZAeQj8zOXQMtOi3KlXtTl8ZzyRT//YLWah0j4oBf0a8K62/y i1Pd5MgXDQAm8fHkBg== =sHz+ -----END PGP SIGNATURE----- Merge tag 'batadv-next-pullrequest-20210819' of git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - update docs about move IRC channel away from freenode, by Sven Eckelmann - Switch to kstrtox.h for kstrtou64, by Sven Eckelmann - Update NULL checks, by Sven Eckelmann (2 patches) - remove remaining skb-copy calls for broadcast packets, by Linus Lüssing ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-20 12:04:50 +01:00
Jakub Kicinski	f444fea789	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net drivers/ptp/Kconfig: `55c8fca1da` ("ptp_pch: Restore dependency on PCI") `e5f3155267` ("ethernet: fix PTP_1588_CLOCK dependencies") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-19 18:09:18 -07:00
Kangmin Park	61969ef867	Bluetooth: Fix return value in hci_dev_do_close() hci_error_reset() return without calling hci_dev_do_open() when hci_dev_do_close() return error value which is not 0. Also, hci_dev_close() return hci_dev_do_close() function's return value. But, hci_dev_do_close() return always 0 even if hdev->shutdown return error value. So, fix hci_dev_do_close() to save and return the return value of the hdev->shutdown when it is called. Signed-off-by: Kangmin Park <l4stpr0gr4m@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-08-19 17:28:40 +02:00
Pavel Skripkin	f41a4b2b5e	Bluetooth: add timeout sanity check to hci_inquiry Syzbot hit "task hung" bug in hci_req_sync(). The problem was in unreasonable huge inquiry timeout passed from userspace. Fix it by adding sanity check for timeout value to hci_inquiry(). Since hci_inquiry() is the only user of hci_req_sync() with user controlled timeout value, it makes sense to check timeout value in hci_inquiry() and don't touch hci_req_sync(). Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-and-tested-by: syzbot+be2baed593ea56c6a84c@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-08-19 17:27:39 +02:00
Kees Cook	a31e5a4158	Bluetooth: mgmt: Pessimize compile-time bounds-check After gaining __alloc_size hints, GCC thinks it can reach a memcpy() with eir_len == 0 (since it can't see into the rewrite of status). Instead, check eir_len == 0, avoiding this future warning: In function 'eir_append_data', inlined from 'read_local_oob_ext_data_complete' at net/bluetooth/mgmt.c:7210:12: ./include/linux/fortify-string.h:54:29: warning: '__builtin_memcpy' offset 5 is out of the bounds [0, 3] [-Warray-bounds] ... net/bluetooth/hci_request.h:133:2: note: in expansion of macro 'memcpy' 133 \| memcpy(&eir[eir_len], data, data_len); \| ^~~~~~ Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: linux-bluetooth@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-08-19 16:51:53 +02:00
Chuck Lever	729580ddc5	svcrdma: xpt_bc_xprt is already clear in __svc_rdma_free() svc_xprt_free() already "puts" the bc_xprt before calling the transport's "free" method. No need to do it twice. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-08-19 08:29:32 -04:00
Eli Cohen	74fc4f8287	net: Fix offloading indirect devices dependency on qdisc order creation Currently, when creating an ingress qdisc on an indirect device before the driver registered for callbacks, the driver will not have a chance to register its filter configuration callbacks. To fix that, modify the code such that it keeps track of all the ingress qdiscs that call flow_indr_dev_setup_offload(). When a driver calls flow_indr_dev_register(), go through the list of tracked ingress qdiscs and call the driver callback entry point so as to give it a chance to register its callback. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-19 13:19:30 +01:00
Eli Cohen	c1c5cb3aee	net/core: Remove unused field from struct flow_indr_dev rcu field is not used. Remove it. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-19 13:19:30 +01:00
Matthieu Baerts	67b12f792d	mptcp: full fully established support after ADD_ADDR If directly after an MP_CAPABLE 3WHS, the client receives an ADD_ADDR with HMAC from the server, it is enough to switch to a "fully established" mode because it has received more MPTCP options. It was then OK to enable the "fully_established" flag on the MPTCP socket. Still, best to check if the ADD_ADDR looks valid by looking if it contains an HMAC (no 'echo' bit). If an ADD_ADDR echo is received while we are not in "fully established" mode, it is strange and then we should not switch to this mode now. But that is not enough. On one hand, the path-manager has be notified the state has changed. On the other hand, the "fully_established" flag on the subflow socket should be turned on as well not to re-send the MP_CAPABLE 3rd ACK content with the next ACK. Fixes: `84dfe3677a` ("mptcp: send out dedicated ADD_ADDR packet") Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-19 12:16:54 +01:00
Paolo Abeni	a0eea5f10e	mptcp: fix memory leak on address flush The endpoint cleanup path is prone to a memory leak, as reported by syzkaller: BUG: memory leak unreferenced object 0xffff88810680ea00 (size 64): comm "syz-executor.6", pid 6191, jiffies 4295756280 (age 24.138s) hex dump (first 32 bytes): 58 75 7d 3c 80 88 ff ff 22 01 00 00 00 00 ad de Xu}<...."....... 01 00 02 00 00 00 00 00 ac 1e 00 07 00 00 00 00 ................ backtrace: [<0000000072a9f72a>] kmalloc include/linux/slab.h:591 [inline] [<0000000072a9f72a>] mptcp_nl_cmd_add_addr+0x287/0x9f0 net/mptcp/pm_netlink.c:1170 [<00000000f6e931bf>] genl_family_rcv_msg_doit.isra.0+0x225/0x340 net/netlink/genetlink.c:731 [<00000000f1504a2c>] genl_family_rcv_msg net/netlink/genetlink.c:775 [inline] [<00000000f1504a2c>] genl_rcv_msg+0x341/0x5b0 net/netlink/genetlink.c:792 [<0000000097e76f6a>] netlink_rcv_skb+0x148/0x430 net/netlink/af_netlink.c:2504 [<00000000ceefa2b8>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:803 [<000000008ff91aec>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] [<000000008ff91aec>] netlink_unicast+0x537/0x750 net/netlink/af_netlink.c:1340 [<0000000041682c35>] netlink_sendmsg+0x846/0xd80 net/netlink/af_netlink.c:1929 [<00000000df3aa8e7>] sock_sendmsg_nosec net/socket.c:704 [inline] [<00000000df3aa8e7>] sock_sendmsg+0x14e/0x190 net/socket.c:724 [<000000002154c54c>] ____sys_sendmsg+0x709/0x870 net/socket.c:2403 [<000000001aab01d7>] ___sys_sendmsg+0xff/0x170 net/socket.c:2457 [<00000000fa3b1446>] __sys_sendmsg+0xe5/0x1b0 net/socket.c:2486 [<00000000db2ee9c7>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<00000000db2ee9c7>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 [<000000005873517d>] entry_SYSCALL_64_after_hwframe+0x44/0xae We should not require an allocation to cleanup stuff. Rework the code a bit so that the additional RCU work is no more needed. Fixes: `1729cf186d` ("mptcp: create the listening socket for new port") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-19 12:16:54 +01:00
Alexey Dobriyan	c0891ac15f	isystem: ship and use stdarg.h Ship minimal stdarg.h (1 type, 4 macros) as <linux/stdarg.h>. stdarg.h is the only userspace header commonly used in the kernel. GPL 2 version of <stdarg.h> can be extracted from http://archive.debian.org/debian/pool/main/g/gcc-4.2/gcc-4.2_4.2.4.orig.tar.gz Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2021-08-19 09:02:55 +09:00
Gerd Rausch	fb4b1373dc	net/rds: dma_map_sg is entitled to merge entries Function "dma_map_sg" is entitled to merge adjacent entries and return a value smaller than what was passed as "nents". Subsequently "ib_map_mr_sg" needs to work with this value ("sg_dma_len") rather than the original "nents" parameter ("sg_len"). This old RDS bug was exposed and reliably causes kernel panics (using RDMA operations "rds-stress -D") on x86_64 starting with: commit `c588072bba` ("iommu/vt-d: Convert intel iommu driver to the iommu ops") Simply put: Linux 5.11 and later. Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Link: https://lore.kernel.org/r/60efc69f-1f35-529d-a7ef-da0549cad143@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-18 15:35:50 -07:00
Xu Liu	6cf1770d63	bpf: Allow bpf_get_netns_cookie in BPF_PROG_TYPE_SOCK_OPS We'd like to be able to identify netns from sockops hooks to accelerate local process communication form different netns. Signed-off-by: Xu Liu <liuxu623@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210818105820.91894-2-liuxu623@gmail.com	2021-08-19 00:30:01 +02:00
Linus Lüssing	808cfdfad5	batman-adv: bcast: remove remaining skb-copy calls We currently have two code paths for broadcast packets: A) self-generated, via batadv_interface_tx()-> batadv_send_bcast_packet(). B) received/forwarded, via batadv_recv_bcast_packet()-> batadv_forw_bcast_packet(). For A), self-generated broadcast packets: The only modifications to the skb data is the ethernet header which is added/pushed to the skb in batadv_send_broadcast_skb()->batadv_send_skb_packet(). However before doing so, batadv_skb_head_push() is called which calls skb_cow_head() to unshare the space for the to be pushed ethernet header. So for this case, it is safe to use skb clones. For B), received/forwarded packets: The same applies as in A) for the to be forwarded packets. Only the ethernet header is added. However after (queueing for) forwarding the packet in batadv_recv_bcast_packet()->batadv_forw_bcast_packet(), a packet is additionally decapsulated and is sent up the stack through batadv_recv_bcast_packet()->batadv_interface_rx(). Protocols higher up the stack are already required to check if the packet is shared and create a copy for further modifications. When the next (protocol) layer works correctly, it cannot happen that it tries to operate on the data behind the skb clone which is still queued up for forwarding. Co-authored-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2021-08-18 18:39:00 +02:00
Nick Richardson	7e5a3ef6b4	pktgen: Remove fill_imix_distribution() CONFIG_XFRM dependency Currently, the declaration of fill_imix_distribution() is dependent on CONFIG_XFRM. This is incorrect. Move fill_imix_distribution() declaration out of #ifndef CONFIG_XFRM block. Signed-off-by: Nick Richardson <richardsonnick@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-18 11:41:13 +01:00
Wei Wang	4b1327be9f	net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem() Add gfp_t mask as an input parameter to mem_cgroup_charge_skmem(), to give more control to the networking stack and enable it to change memcg charging behavior. In the future, the networking stack may decide to avoid oom-kills when fallbacks are more appropriate. One behavior change in mem_cgroup_charge_skmem() by this patch is to avoid force charging by default and let the caller decide when and if force charging is needed through the presence or absence of __GFP_NOFAIL. Signed-off-by: Wei Wang <weiwan@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-18 11:39:44 +01:00
kaixi.fan	01634047bf	ovs: clear skb->tstamp in forwarding path fq qdisc requires tstamp to be cleared in the forwarding path. Now ovs doesn't clear skb->tstamp. We encountered a problem with linux version 5.4.56 and ovs version 2.14.1, and packets failed to dequeue from qdisc when fq qdisc was attached to ovs port. Fixes: `fb420d5d91` ("tcp/fq: move back to CLOCK_MONOTONIC") Signed-off-by: kaixi.fan <fankaixi.li@bytedance.com> Signed-off-by: xiexiaohui <xiexiaohui.xxh@bytedance.com> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-18 11:31:13 +01:00
Yajun Deng	41467d2ff4	net: net_namespace: Optimize the code There is only one caller for ops_free(), so inline it. Separate net_drop_ns() and net_free(), so the net_free() can be called directly. Add free_exit_list() helper function for free net_exit_list. ==================== v2: - v1 does not apply, rebase it. ==================== Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-18 10:34:48 +01:00
Vladimir Oltean	994d2cbb08	net: dsa: tag_sja1105: be dsa_loop-safe Add support for tag_sja1105 running on non-sja1105 DSA ports, by making sure that every time we dereference dp->priv, we check the switch's dsa_switch_ops (otherwise we access a struct sja1105_port structure that is in fact something else). This adds an unconditional build-time dependency between sja1105 being built as module => tag_sja1105 must also be built as module. This was there only for PTP before. Some sane defaults must also take place when not running on sja1105 hardware. These are: - sja1105_xmit_tpid: the sja1105 driver uses different VLAN protocols depending on VLAN awareness and switch revision (when an encapsulated VLAN must be sent). Default to 0x8100. - sja1105_rcv_meta_state_machine: this aggregates PTP frames with their metadata timestamp frames. When running on non-sja1105 hardware, don't do that and accept all frames unmodified. - sja1105_defer_xmit: calls sja1105_port_deferred_xmit in sja1105_main.c which writes a management route over SPI. When not running on sja1105 hardware, bypass the SPI write and send the frame as-is. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-18 10:33:15 +01:00
Toke Høiland-Jørgensen	86b9bbd332	sch_cake: fix srchost/dsthost hashing mode When adding support for using the skb->hash value as the flow hash in CAKE, I accidentally introduced a logic error that broke the host-only isolation modes of CAKE (srchost and dsthost keywords). Specifically, the flow_hash variable should stay initialised to 0 in cake_hash() in pure host-based hashing mode. Add a check for this before using the skb->hash value as flow_hash. Fixes: `b0c19ed608` ("sch_cake: Take advantage of skb->hash where appropriate") Reported-by: Pete Heist <pete@heistp.net> Tested-by: Pete Heist <pete@heistp.net> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-18 10:14:00 +01:00
Yajun Deng	ec18e84554	net: procfs: add seq_puts() statement for dev_mcast Add seq_puts() statement for dev_mcast, make it more readable. As also, keep vertical alignment for {dev, ptype, dev_mcast} that under /proc/net. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-18 10:13:20 +01:00
Randy Dunlap	95d5e6759b	net: RxRPC: make dependent Kconfig symbols be shown indented Make all dependent RxRPC kconfig entries be dependent on AF_RXRPC so that they are presented (indented) after AF_RXRPC instead of being presented at the same level on indentation. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Marc Dionne <marc.dionne@auristor.com> Cc: linux-afs@lists.infradead.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-18 10:12:11 +01:00
Geliang Tang	1a0d6136c5	mptcp: local addresses fullmesh In mptcp_pm_nl_add_addr_received(), fill a temporary allocate array of all local address corresponding to the fullmesh endpoint. If such array is empty, keep the current behavior. Elsewhere loop on such array and create a subflow for each local address towards the given remote address Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@xiaomi.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-18 10:10:01 +01:00
Geliang Tang	2843ff6f36	mptcp: remote addresses fullmesh This patch added and managed a new per endpoint flag, named MPTCP_PM_ADDR_FLAG_FULLMESH. In mptcp_pm_create_subflow_or_signal_addr(), if such flag is set, instead of: remote_address((struct sock_common *)sk, &remote); fill a temporary allocated array of all known remote address. After releaseing the pm lock loop on such array and create a subflow for each remote address from the given local. Note that the we could still use an array even for non 'fullmesh' endpoint: with a single entry corresponding to the primary MPC subflow remote address. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@xiaomi.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-18 10:10:01 +01:00
Geliang Tang	ee285257a9	mptcp: drop flags and ifindex arguments This patch added a new helper mptcp_pm_get_flags_and_ifindex_by_id(), and used it in __mptcp_subflow_connect() to get the flags and ifindex values. Then the two arguments flags and ifindex of __mptcp_subflow_connect() can be dropped. Signed-off-by: Geliang Tang <geliangtang@xiaomi.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-18 10:10:01 +01:00
Johannes Berg	c448f0fd2c	cfg80211: fix BSS color notify trace enum confusion The wrong enum was used here, leading to warnings. Just use a u32 instead. Reported-by: kernel test robot <lkp@intel.com> Fixes: `0d2ab3aea5` ("nl80211: add support for BSS coloring") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-08-18 09:21:52 +02:00
J. Bruce Fields	5a47534462	rpc: fix gss_svc_init cleanup on failure The failure case here should be rare, but it's obviously wrong. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-08-17 11:47:53 -04:00
Chuck Lever	5c11720767	SUNRPC: Fix a NULL pointer deref in trace_svc_stats_latency() Some paths through svc_process() leave rqst->rq_procinfo set to NULL, which triggers a crash if tracing happens to be enabled. Fixes: `89ff87494c` ("SUNRPC: Display RPC procedure names instead of proc numbers") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-08-17 11:47:53 -04:00
Chuck Lever	07a92d009f	svcrdma: Convert rdma->sc_rw_ctxts to llist Relieve contention on sc_rw_ctxt_lock by converting rdma->sc_rw_ctxts to an llist. The goal is to reduce the average overhead of Send completions, because a transport's completion handlers are single-threaded on one CPU core. This change reduces CPU utilization of each Send completion by 2-3% on my server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-By: Tom Talpey <tom@talpey.com>	2021-08-17 11:47:53 -04:00
Chuck Lever	b6c2bfea09	svcrdma: Relieve contention on sc_send_lock. /proc/lock_stat indicates the the sc_send_lock is heavily contended when the server is under load from a single client. To address this, convert the send_ctxt free list to an llist. Returning an item to the send_ctxt cache is now waitless, which reduces the instruction path length in the single-threaded Send handler (svc_rdma_wc_send). The goal is to enable the ib_comp_wq worker to handle a higher RPC/RDMA Send completion rate given the same CPU resources. This change reduces CPU utilization of Send completion by 2-3% on my server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-By: Tom Talpey <tom@talpey.com>	2021-08-17 11:47:53 -04:00
Chuck Lever	6c8c84f525	svcrdma: Fewer calls to wake_up() in Send completion handler Because wake_up() takes an IRQ-safe lock, it can be expensive, especially to call inside of a single-threaded completion handler. What's more, the Send wait queue almost never has waiters, so most of the time, this is an expensive no-op. As always, the goal is to reduce the average overhead of each completion, because a transport's completion handlers are single- threaded on one CPU core. This change reduces CPU utilization of the Send completion thread by 2-3% on my server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-By: Tom Talpey <tom@talpey.com>	2021-08-17 11:47:53 -04:00
Chuck Lever	2f0f88f42f	SUNRPC: Add svc_rqst_replace_page() API Replacing a page in rq_pages[] requires a get_page(), which is a bus-locked operation, and a put_page(), which can be even more costly. To reduce the cost of replacing a page in rq_pages[], batch the put_page() operations by collecting "freed" pages in a pagevec, and then release those pages when the pagevec is full. This pagevec is also emptied when each RPC completes. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-08-17 11:47:52 -04:00
Johannes Berg	276e189f8e	mac80211: fix locking in ieee80211_restart_work() Ilan's change to move locking around accidentally lost the wiphy_lock() during some porting, add it back. Fixes: `45daaa1318` ("mac80211: Properly WARN on HW scan before restart") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20210817121210.47bdb177064f.Ib1ef79440cd27f318c028ddfc0c642406917f512@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-17 06:51:43 -07:00
Chih-Kang Chang	f50d2ff8f0	mac80211: Fix insufficient headroom issue for AMSDU ieee80211_amsdu_realloc_pad() fails to account for extra_tx_headroom, the original reserved headroom might be eaten. Add the necessary extra_tx_headroom. Fixes: `6e0456b545` ("mac80211: add A-MSDU tx support") Signed-off-by: Chih-Kang Chang <gary.chang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://lore.kernel.org/r/20210816085128.10931-2-pkshih@realtek.com [fix indentation] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-08-17 15:49:58 +02:00
John Crispin	5f9404abdf	mac80211: add support for BSS color change The color change announcement is very similar to how CSA works where we have an IE that includes a counter. When the counter hits 0, the new color is applied via an updated beacon. This patch makes the CSA counter functionality reusable, rather than implementing it again. This also allows for future reuse incase support for other counter IEs gets added. Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/057c1e67b82bee561ea44ce6a45a8462d3da6995.1625247619.git.lorenzo@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-08-17 11:58:45 +02:00
John Crispin	0d2ab3aea5	nl80211: add support for BSS coloring This patch adds support for BSS color collisions to the wireless subsystem. Add the required functionality to nl80211 that will notify about color collisions, triggering the color change and notifying when it is completed. Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/500b3582aec8fe2c42ef46f3117b148cb7cbceb5.1625247619.git.lorenzo@kernel.org [remove unnecessary NULL initialisation] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-08-17 11:58:21 +02:00
Nikolay Aleksandrov	affce9a774	net: bridge: mcast: toggle also host vlan state in br_multicast_toggle_vlan When changing vlan mcast state by br_multicast_toggle_vlan it iterates over all ports and enables/disables the port mcast ctx based on the new state, but I forgot to update the host vlan (bridge master vlan entry) with the new state so it will be left out. Also that function is not used outside of br_multicast.c, so make it static. Fixes: `f4b7002a70` ("net: bridge: add vlan mcast snooping knob") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-17 10:37:29 +01:00
Nikolay Aleksandrov	3f0d14efe2	net: bridge: mcast: use the correct vlan group helper When dereferencing the port vlan group we should use the rcu helper instead of the one relying on rtnl. In br_multicast_pg_to_port_ctx the entry cannot disappear as we hold the multicast lock and rcu as explained in the comment above it. For the same reason we're ok in br_multicast_start_querier. ============================= WARNING: suspicious RCU usage 5.14.0-rc5+ #429 Tainted: G W ----------------------------- net/bridge/br_private.h:1478 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by swapper/2/0: #0: ffff88822be85eb0 ((&p->timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x2da #1: ffff88810b32f260 (&br->multicast_lock){+.-.}-{3:3}, at: br_multicast_port_group_expired+0x28/0x13d [bridge] #2: ffffffff824f6c80 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire.constprop.0+0x0/0x22 [bridge] stack backtrace: CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Tainted: G W 5.14.0-rc5+ #429 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x45/0x59 nbp_vlan_group+0x3e/0x44 [bridge] br_multicast_pg_to_port_ctx+0xd6/0x10d [bridge] br_multicast_star_g_handle_mode+0xa1/0x2ce [bridge] ? netlink_broadcast+0xf/0x11 ? nlmsg_notify+0x56/0x99 ? br_mdb_notify+0x224/0x2e9 [bridge] ? br_multicast_del_pg+0x1dc/0x26d [bridge] br_multicast_del_pg+0x1dc/0x26d [bridge] br_multicast_port_group_expired+0xaa/0x13d [bridge] ? __grp_src_delete_marked.isra.0+0x35/0x35 [bridge] ? __grp_src_delete_marked.isra.0+0x35/0x35 [bridge] call_timer_fn+0x134/0x2da __run_timers+0x169/0x193 run_timer_softirq+0x19/0x2d __do_softirq+0x1bc/0x42a __irq_exit_rcu+0x5c/0xb3 irq_exit_rcu+0xa/0x12 sysvec_apic_timer_interrupt+0x5e/0x75 </IRQ> asm_sysvec_apic_timer_interrupt+0x12/0x20 RIP: 0010:default_idle+0xc/0xd Code: e8 14 40 71 ff e8 10 b3 ff ff 4c 89 e2 48 89 ef 31 f6 5d 41 5c e9 a9 e8 c2 ff cc cc cc cc 0f 1f 44 00 00 e8 7f 55 65 ff fb f4 <c3> 0f 1f 44 00 00 55 65 48 8b 2c 25 40 6f 01 00 53 f0 80 4d 02 20 RSP: 0018:ffff88810033bf00 EFLAGS: 00000206 RAX: ffffffff819cf828 RBX: ffff888100328000 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff819cfa2d RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001 R10: ffff8881008302c0 R11: 00000000000006db R12: 0000000000000000 R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000 ? __sched_text_end+0x4/0x4 ? default_idle_call+0x15/0x7b default_idle_call+0x4d/0x7b do_idle+0x124/0x2a2 cpu_startup_entry+0x1d/0x1f secondary_startup_64_no_verify+0xb0/0xbb Fixes: `74edfd483d` ("net: bridge: multicast: add helper to get port mcast context from port group") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-17 10:37:29 +01:00
Nikolay Aleksandrov	05d6f38ec0	net: bridge: vlan: account for router port lists when notifying When sending a global vlan notification we should account for the number of router ports when allocating the skb, otherwise we might end up losing notifications. Fixes: `dc002875c2` ("net: bridge: vlan: use br_rports_fill_info() to export mcast router ports") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-17 10:37:29 +01:00
Nikolay Aleksandrov	b92dace38f	net: bridge: vlan: enable mcast snooping for existing master vlans We always create a vlan with enabled mcast snooping, so when the user turns on per-vlan mcast contexts they'll get consistent behaviour with the current situation, but one place wasn't updated when a bridge/master vlan which already exists (created due to port vlans) is being added as real bridge vlan (BRIDGE_VLAN_INFO_BRENTRY). We need to enable mcast snooping for that vlan when that happens. Fixes: `7b54aaaf53` ("net: bridge: multicast: add vlan state initialization and control") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-17 10:37:28 +01:00
Jiang Wang	94531cfcbe	af_unix: Add unix_stream_proto for sockmap Previously, sockmap for AF_UNIX protocol only supports dgram type. This patch add unix stream type support, which is similar to unix_dgram_proto. To support sockmap, dgram and stream cannot share the same unix_proto anymore, because they have different implementations, such as unhash for stream type (which will remove closed or disconnected sockets from the map), so rename unix_proto to unix_dgram_proto and add a new unix_stream_proto. Also implement stream related sockmap functions. And add dgram key words to those dgram specific functions. Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-3-jiang.wang@bytedance.com	2021-08-16 18:43:39 -07:00
Jiang Wang	77462de14a	af_unix: Add read_sock for stream socket types To support sockmap for af_unix stream type, implement read_sock, which is similar to the read_sock for unix dgram sockets. Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-2-jiang.wang@bytedance.com	2021-08-16 18:42:31 -07:00
Luke Hsiao	e3faa49bce	tcp: enable data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD Since the original TFO server code was implemented in commit `168a8f5805` ("tcp: TCP Fast Open Server - main code path") the TFO server code has supported the sysctl bit flag TFO_SERVER_COOKIE_NOT_REQD. Currently, when the TFO_SERVER_ENABLE and TFO_SERVER_COOKIE_NOT_REQD sysctl bit flags are set, a server connection will accept a SYN with N bytes of data (N > 0) that has no TFO cookie, create a new fast open connection, process the incoming data in the SYN, and make the connection ready for accepting. After accepting, the connection is ready for read()/recvmsg() to read the N bytes of data in the SYN, ready for write()/sendmsg() calls and data transmissions to transmit data. This commit changes an edge case in this feature by changing this behavior to apply to (N >= 0) bytes of data in the SYN rather than only (N > 0) bytes of data in the SYN. Now, a server will accept a data-less SYN without a TFO cookie if TFO_SERVER_COOKIE_NOT_REQD is set. Caveat! While this enables a new kind of TFO (data-less empty-cookie SYN), some firewall rules setup may not work if they assume such packets are not legit TFOs and will filter them. Signed-off-by: Luke Hsiao <lukehsiao@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210816205105.2533289-1-luke.w.hsiao@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-16 17:25:49 -07:00
Andrii Nakryiko	fb7dd8bca0	bpf: Refactor BPF_PROG_RUN into a function Turn BPF_PROG_RUN into a proper always inlined function. No functional and performance changes are intended, but it makes it much easier to understand what's going on with how BPF programs are actually get executed. It's more obvious what types and callbacks are expected. Also extra () around input parameters can be dropped, as well as `__` variable prefixes intended to avoid naming collisions, which makes the code simpler to read and write. This refactoring also highlighted one extra issue. BPF_PROG_RUN is both a macro and an enum value (BPF_PROG_RUN == BPF_PROG_TEST_RUN). Turning BPF_PROG_RUN into a function causes naming conflict compilation error. So rename BPF_PROG_RUN into lower-case bpf_prog_run(), similar to bpf_prog_run_xdp(), bpf_prog_run_pin_on_cpu(), etc. All existing callers of BPF_PROG_RUN, the macro, are switched to bpf_prog_run() explicitly. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210815070609.987780-2-andrii@kernel.org	2021-08-17 00:45:07 +02:00
Kiran K	ecb71f2566	Bluetooth: Fix race condition in handling NOP command For NOP command, need to cancel work scheduled on cmd_timer, on receiving command status or commmand complete event. Below use case might lead to race condition multiple when NOP commands are queued sequentially: hci_cmd_work() { if (atomic_read(&hdev->cmd_cnt) { . . . atomic_dec(&hdev->cmd_cnt); hci_send_frame(hdev,...); schedule_delayed_work(&hdev->cmd_timer,...); } } On receiving event for first NOP, the work scheduled on hdev->cmd_timer is not cancelled and second NOP is dequeued and sent to controller. While waiting for an event for second NOP command, work scheduled on cmd_timer for the first NOP can get scheduled, resulting in sending third NOP command (sending back to back NOP commands). This might cause issues at controller side (like memory overrun, controller going unresponsive) resulting in hci tx timeouts, hardware errors etc. The fix to this issue is to cancel the delayed work scheduled on cmd_timer on receiving command status or command complete event for NOP command (this patch handles NOP command same as any other SIG command). Signed-off-by: Kiran K <kiran.k@intel.com> Reviewed-by: Chethan T N <chethan.tumkur.narayan@intel.com> Reviewed-by: Srivatsa Ravishankar <ravishankar.srivatsa@intel.com> Acked-by: Manish Mandlik <mmandlik@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-08-16 18:04:23 +02:00
Luiz Augusto von Dentz	7087c4f694	Bluetooth: Store advertising handle so it can be re-enabled This stores the advertising handle/instance into hci_conn so it is accessible when re-enabling the advertising once disconnected. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-08-16 17:53:48 +02:00
Luiz Augusto von Dentz	cafae4cd62	Bluetooth: Fix handling of LE Enhanced Connection Complete LE Enhanced Connection Complete contains the Local RPA used in the connection which must be used when set otherwise there could problems when pairing since the address used by the remote stack could be the Local RPA: BLUETOOTH CORE SPECIFICATION Version 5.2 \| Vol 4, Part E page 2396 'Resolvable Private Address being used by the local device for this connection. This is only valid when the Own_Address_Type (from the HCI_LE_Create_Connection, HCI_LE_Set_Advertising_Parameters, HCI_LE_Set_Extended_Advertising_Parameters, or HCI_LE_Extended_Create_Connection commands) is set to 0x02 or 0x03, and the Controller generated a resolvable private address for the local device using a non-zero local IRK. For other Own_Address_Type values, the Controller shall return all zeros.' Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-08-16 17:53:48 +02:00
Kai-Heng Feng	0ea53674d0	Bluetooth: Move shutdown callback before flushing tx and rx queue Commit `0ea9fd001a` ("Bluetooth: Shutdown controller after workqueues are flushed or cancelled") introduced a regression that makes mtkbtsdio driver stops working: [ 36.593956] Bluetooth: hci0: Firmware already downloaded [ 46.814613] Bluetooth: hci0: Execution of wmt command timed out [ 46.814619] Bluetooth: hci0: Failed to send wmt func ctrl (-110) The shutdown callback depends on the result of hdev->rx_work, so we should call it before flushing rx_work: -> btmtksdio_shutdown() -> mtk_hci_wmt_sync() -> __hci_cmd_send() -> wait for BTMTKSDIO_TX_WAIT_VND_EVT gets cleared -> btmtksdio_recv_event() -> hci_recv_frame() -> queue_work(hdev->workqueue, &hdev->rx_work) -> clears BTMTKSDIO_TX_WAIT_VND_EVT So move the shutdown callback before flushing TX/RX queue to resolve the issue. Reported-and-tested-by: Mattijs Korpershoek <mkorpershoek@baylibre.com> Tested-by: Hsin-Yi Wang <hsinyi@chromium.org> Cc: Guenter Roeck <linux@roeck-us.net> Fixes: `0ea9fd001a` ("Bluetooth: Shutdown controller after workqueues are flushed or cancelled") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2021-08-16 17:48:11 +02:00
Nikolay Aleksandrov	175e669247	net: bridge: mcast: account for ipv6 size when dumping querier state We need to account for the IPv6 attributes when dumping querier state. Fixes: 5e924fe6ccfd ("net: bridge: mcast: dump ipv6 querier state") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-16 13:58:00 +01:00
Nikolay Aleksandrov	cdda378bd8	net: bridge: mcast: drop sizeof for nest attribute's zero size This was a dumb error I made instead of writing nla_total_size(0) for a nest attribute, I wrote nla_total_size(sizeof(0)). Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 606433fe3e11 ("net: bridge: mcast: dump ipv4 querier state") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-16 13:58:00 +01:00
Nikolay Aleksandrov	f137b7d4ec	net: bridge: mcast: don't dump querier state if snooping is disabled A minor improvement to avoid dumping mcast ctx querier state if snooping is disabled for that context (either bridge or vlan). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-16 13:57:59 +01:00
Sagi Grimberg	2a14c9ae15	params: lift param_set_uint_minmax to common code It is a useful helper hence move it to common code so others can enjoy it. Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2021-08-16 14:42:22 +02:00
Xin Long	7387a72c5f	tipc: call tipc_wait_for_connect only when dlen is not 0 __tipc_sendmsg() is called to send SYN packet by either tipc_sendmsg() or tipc_connect(). The difference is in tipc_connect(), it will call tipc_wait_for_connect() after __tipc_sendmsg() to wait until connecting is done. So there's no need to wait in __tipc_sendmsg() for this case. This patch is to fix it by calling tipc_wait_for_connect() only when dlen is not 0 in __tipc_sendmsg(), which means it's called by tipc_connect(). Note this also fixes the failure in tipcutils/test/ptts/: # ./tipcTS & # ./tipcTC 9 (hang) Fixes: 36239dab6da7 ("tipc: fix implicit-connect for SYN+") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-16 11:20:56 +01:00
Vladimir Oltean	b2b8913341	net: dsa: tag_8021q: fix notifiers broadcast when they shouldn't, and vice versa During the development of the blamed patch, the "bool broadcast" argument of dsa_port_tag_8021q_vlan_{add,del} was originally called "bool local", and the meaning was the exact opposite. Due to a rookie mistake where the patch was modified at the last minute without retesting, the instances of dsa_port_tag_8021q_vlan_{add,del} are called with the wrong values. During setup and teardown, cross-chip notifiers should not be broadcast to all DSA trees, while during bridging, they should. Fixes: `724395f4dc` ("net: dsa: tag_8021q: don't broadcast during setup/teardown") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-16 11:14:18 +01:00
Rao Shoaib	19eed72107	af_unix: check socket state when queuing OOB edumazet@google.com pointed out that queue_oob does not check socket state after acquiring the lock. He also pointed to an incorrect usage of kfree_skb and an unnecessary setting of skb length. This patch addresses those issue. Signed-off-by: Rao Shoaib <Rao.Shoaib@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-16 11:12:37 +01:00
Kuniyuki Iwashima	2c860a43dd	bpf: af_unix: Implement BPF iterator for UNIX domain socket. This patch implements the BPF iterator for the UNIX domain socket. Currently, the batch optimisation introduced for the TCP iterator in the commit `04c7820b77` ("bpf: tcp: Bpf iter batching and lock_sock") is not used for the UNIX domain socket. It will require replacing the big lock for the hash table with small locks for each hash list not to block other processes. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210814015718.42704-2-kuniyu@amazon.co.jp	2021-08-15 00:13:32 -07:00
Nikolay Aleksandrov	ddc649d158	net: bridge: vlan: dump mcast ctx querier state Use the new mcast querier state dump infrastructure and export vlans' mcast context querier state embedded in attribute BRIDGE_VLANDB_GOPTS_MCAST_QUERIER_STATE. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-14 14:02:43 +01:00
Nikolay Aleksandrov	85b4108211	net: bridge: mcast: dump ipv6 querier state Add support for dumping global IPv6 querier state, we dump the state only if our own querier is enabled or there has been another external querier which has won the election. For the bridge global state we use a new attribute IFLA_BR_MCAST_QUERIER_STATE and embed the state inside. The structure is: [IFLA_BR_MCAST_QUERIER_STATE] `[BRIDGE_QUERIER_IPV6_ADDRESS] - ip address of the querier `[BRIDGE_QUERIER_IPV6_PORT] - bridge port ifindex where the querier was seen (set only if external querier) `[BRIDGE_QUERIER_IPV6_OTHER_TIMER] - other querier timeout IPv4 and IPv6 attributes are embedded at the same level of IFLA_BR_MCAST_QUERIER_STATE. If we didn't dump anything we cancel the nest and return. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-14 14:02:43 +01:00
Nikolay Aleksandrov	c7fa1d9b1f	net: bridge: mcast: dump ipv4 querier state Add support for dumping global IPv4 querier state, we dump the state only if our own querier is enabled or there has been another external querier which has won the election. For the bridge global state we use a new attribute IFLA_BR_MCAST_QUERIER_STATE and embed the state inside. The structure is: [IFLA_BR_MCAST_QUERIER_STATE] `[BRIDGE_QUERIER_IP_ADDRESS] - ip address of the querier `[BRIDGE_QUERIER_IP_PORT] - bridge port ifindex where the querier was seen (set only if external querier) `[BRIDGE_QUERIER_IP_OTHER_TIMER] - other querier timeout Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-14 14:02:43 +01:00
Nikolay Aleksandrov	c3fb3698f9	net: bridge: mcast: consolidate querier selection for ipv4 and ipv6 We can consolidate both functions as they share almost the same logic. This is easier to maintain and we have a single querier update function. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-14 14:02:43 +01:00
Nikolay Aleksandrov	67b746f94f	net: bridge: mcast: make sure querier port/address updates are consistent Use a sequence counter to make sure port/address updates can be read consistently without requiring the bridge multicast_lock. We need to zero out the port and address when the other querier has expired and we're about to select ourselves as querier. br_multicast_read_querier will be used later when dumping querier state. Updates are done only with the multicast spinlock and softirqs disabled, while reads are done from process context and from softirqs (due to notifications). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-14 14:02:43 +01:00
Nikolay Aleksandrov	bb18ef8e7e	net: bridge: mcast: record querier port device ifindex instead of pointer Currently when a querier port is detected its net_bridge_port pointer is recorded, but it's used only for comparisons so it's fine to have stale pointer, in order to dereference and use the port pointer a proper accounting of its usage must be implemented adding unnecessary complexity. To solve the problem we can just store the netdevice ifindex instead of the port pointer and retrieve the bridge port. It is a best effort and the device needs to be validated that is still part of that bridge before use, but that is small price to pay for avoiding querier reference counting for each port/vlan. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-14 14:02:43 +01:00
Leon Romanovsky	ed43fbac71	devlink: Clear whole devlink_flash_notify struct The { 0 } doesn't clear all fields in the struct, but tells to the compiler to set all fields to zero and doesn't touch any sub-fields if they exists. The {} is an empty initialiser that instructs to fully initialize whole struct including sub-fields, which is error-prone for future devlink_flash_notify extensions. Fixes: `6700acc5f1` ("devlink: collect flash notify params into a struct") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-14 13:59:10 +01:00
Leon Romanovsky	11a861d767	devlink: Use xarray to store devlink instances We can use xarray instead of linearly organized linked lists for the devlink instances. This will let us revise the locking scheme in favour of internal xarray locking that protects database. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-14 13:59:10 +01:00
Leon Romanovsky	437ebfd90a	devlink: Count struct devlink consumers The struct devlink itself is protected by internal lock and doesn't need global lock during operation. That global lock is used to protect addition/removal new devlink instances from the global list in use by all devlink consumers in the system. The future conversion of linked list to be xarray will allow us to actually delete that lock, but first we need to count all struct devlink users. The reference counting provides us a way to ensure that no new user space commands success to grab devlink instance which is going to be destroyed makes it is safe to access it without lock. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-14 13:59:10 +01:00
Leon Romanovsky	7ca973dc9f	devlink: Remove check of always valid devlink pointer Devlink objects are accessible only after they were registered and have valid devlink_*->devlink pointers. Remove that check and simplify respective fill functions as an outcome of such change. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-14 13:59:10 +01:00
Leon Romanovsky	cbf6ab672e	devlink: Simplify devlink_pernet_pre_exit call The devlink_pernet_pre_exit() will be called if net namespace exits. That routine is relevant for devlink instances that were assigned to that namespaces first. This assignment is possible only with the following command: "devlink reload DEV netns ...", which already checks reload support. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-14 13:59:10 +01:00
Paolo Abeni	0460ce229f	mptcp: backup flag from incoming MPJ ack option the parsed incoming backup flag is not propagated to the subflow itself, the client may end-up using it to send data. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/191 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-14 11:37:25 +01:00
Paolo Abeni	fc1b4e3b62	mptcp: add mibs for stale subflows processing This allows monitoring exceptional events like active backup scenarios. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-14 11:37:25 +01:00
Paolo Abeni	ff5a0b421c	mptcp: faster active backup recovery The msk can use backup subflows to transmit in-sequence data only if there are no other active subflow. On active backup scenario, the MPTCP connection can do forward progress only due to MPTCP retransmissions - rtx can pick backup subflows. This patch introduces a new flag flow MPTCP subflows: if the underlying TCP connection made no progresses for long time, and there are other less problematic subflows available, the given subflow become stale. Stale subflows are not considered active: if all non backup subflows become stale, the MPTCP scheduler can pick backup subflows for plain transmissions. Stale subflows can return in active state, as soon as any reply from the peer is observed. Active backup scenarios can now leverage the available b/w with no restrinction. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/207 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-14 11:37:25 +01:00
Paolo Abeni	6da14d74e2	mptcp: cleanup sysctl data and helpers Reorder the data in mptcp_pernet to avoid wasting space with no reasons and constify the access helpers. No functional changes intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-14 11:37:25 +01:00
Paolo Abeni	1e1d9d6f11	mptcp: handle pending data on closed subflow The PM can close active subflow, e.g. due to ingress RM_ADDR option. Such subflow could carry data still unacked at the MPTCP-level, both in the write and the rtx_queue, which has never reached the other peer. Currently the mptcp-level retransmission will deliver such data, but at a very low rate (at most 1 DSM for each MPTCP rtx interval). We can speed-up the recovery a lot, moving all the unacked in the tcp write_queue, so that it will be pushed again via other subflows, at the speed allowed by them. Also make available the new helper for later patches. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/207 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-14 11:37:25 +01:00
Paolo Abeni	71b7dec27f	mptcp: less aggressive retransmission strategy The current mptcp re-inject strategy is very aggressive, we have mptcp-level retransmissions even on single subflow connection, if the link in-use is lossy. Let's be a little more conservative: we do retransmit only if at least a subflow has write and rtx queue empty. Additionally use the backup subflows only if the active subflows are stale - no progresses in at least an rtx period and ignore stale subflows for rtx timeout update Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/207 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-14 11:37:25 +01:00
Paolo Abeni	33d41c9cd7	mptcp: more accurate timeout As reported by Maxim, we have a lot of MPTCP-level retransmissions when multilple links with different latencies are in use. This patch refactor the mptcp-level timeout accounting so that the maximum of all the active subflow timeout is used. To avoid traversing the subflow list multiple times, the update is performed inside the packet scheduler. Additionally clean-up a bit timeout handling. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-14 11:37:25 +01:00
Lukas Bulwahn	d8d9ba8dc9	net: 802: remove dead leftover after ipx driver removal Commit `7a2e838d28` ("staging: ipx: delete it from the tree") removes the ipx driver and the config IPX. Since then, there is some dead leftover in ./net/802/, that was once used by the IPX driver, but has no other user. Remove this dead leftover. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 16:30:35 -07:00
Changbin Du	afa79d08c6	net: in_irq() cleanup Replace the obsolete and ambiguos macro in_irq() with new macro in_hardirq(). Signed-off-by: Changbin Du <changbin.du@gmail.com> Link: https://lore.kernel.org/r/20210813145749.86512-1-changbin.du@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 14:09:19 -07:00
Rao Shoaib	876c14ad01	af_unix: fix holding spinlock in oob handling syzkaller found that OOB code was holding spinlock while calling a function in which it could sleep. Reported-by: syzbot+8760ca6c1ee783ac4abd@syzkaller.appspotmail.com Fixes: `314001f0bf` ("af_unix: Add OOB support") Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com> Link: https://lore.kernel.org/r/20210811220652.567434-1-Rao.Shoaib@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 10:31:22 -07:00
Jakub Kicinski	f4083a752a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h `9e26680733` ("bnxt_en: Update firmware call to retrieve TX PTP timestamp") `9e518f2580` ("bnxt_en: 1PPS functions to configure TSIO pins") `099fdeda65` ("bnxt_en: Event handler for PPS events") kernel/bpf/helpers.c include/linux/bpf-cgroup.h `a2baf4e8bb` ("bpf: Fix potentially incorrect results with bpf_get_local_storage()") `c7603cfa04` ("bpf: Add ambient BPF runtime context stored in current") drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c `5957cc557d` ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray") `2d0b41a376` ("net/mlx5: Refcount mlx5_irq with integer") MAINTAINERS `7b637cd52f` ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo") `7d901a1e87` ("net: phy: add Maxlinear GPY115/21x/24x driver") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-13 06:41:22 -07:00
Kees Cook	8c89f7b3d3	mac80211: Use flex-array for radiotap header bitmap In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memcpy(), memmove(), and memset(), avoid intentionally writing across neighboring fields. The it_present member of struct ieee80211_radiotap_header is treated as a flexible array (multiple u32s can be conditionally present). In order for memcpy() to reason (or really, not reason) about the size of operations against this struct, use of bytes beyond it_present need to be treated as part of the flexible array. Add a trailing flexible array and initialize its initial index via pointer arithmetic. Cc: Johannes Berg <johannes@sipsolutions.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210806215305.2875621-1-keescook@chromium.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-08-13 09:58:25 +02:00
Kees Cook	5cafd3784a	mac80211: radiotap: Use BIT() instead of shifts IEEE80211_RADIOTAP_EXT has a value of 31, which means if shift was ever cast to 64-bit, the result would become sign-extended. As a matter of robustness, just replace all the open-coded shifts with BIT(). Suggested-by: David Sterba <dsterba@suse.cz> Link: https://lore.kernel.org/lkml/20210728092323.GW5047@twin.jikos.cz/ Cc: Johannes Berg <johannes@sipsolutions.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210806215112.2874773-1-keescook@chromium.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-08-13 09:58:25 +02:00
dingsenjie	0323689d30	mac80211: Remove unnecessary variable and label The variable ret and label just used as return, so we delete it and use the return statement instead of the goto statement. Signed-off-by: dingsenjie <dingsenjie@yulong.com> Link: https://lore.kernel.org/r/20210805064349.202148-1-dingsenjie@163.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-08-13 09:58:25 +02:00
Johannes Berg	779969e3c8	mac80211: include <linux/rbtree.h> This is needed for the rbtree, and we shouldn't just rely on it getting included somewhere implicitly. Include it explicitly. Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20210715180234.512d64dee655.Ia51c29a9fb1e651e06bc00eabec90974103d333e@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-08-13 09:58:23 +02:00
Johan Almbladh	79f5962bae	mac80211: Fix monitor MTU limit so that A-MSDUs get through The maximum MTU was set to 2304, which is the maximum MSDU size. While this is valid for normal WLAN interfaces, it is too low for monitor interfaces. A monitor interface may receive and inject MPDU frames, and the maximum MPDU frame size is larger than 2304. The MPDU may also contain an A-MSDU frame, in which case the size may be much larger than the MTU limit. Since the maximum size of an A-MSDU depends on the PHY mode of the transmitting STA, it is not possible to set an exact MTU limit for a monitor interface. Now the maximum MTU for a monitor interface is unrestricted. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Link: https://lore.kernel.org/r/20210628123246.2070558-1-johan.almbladh@anyfinetworks.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-08-13 09:51:14 +02:00
Dan Carpenter	4a11174d6d	mac80211: remove unnecessary NULL check in ieee80211_register_hw() The address "&sband->iftype_data[i]" points to an array at the end of struct. It can't be NULL and so the check can be removed. Fixes: `bac2fd3d75` ("mac80211: remove use of ieee80211_get_he_sta_cap()") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/YNmgHi7Rh3SISdog@mwanda Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-08-13 09:51:03 +02:00
YueHaibing	deebea0ae3	mac80211: Reject zero MAC address in sta_info_insert_check() As commit `52dba8d7d5` ("mac80211: reject zero MAC address in add station") said, we don't consider all-zeroes to be a valid MAC address in most places, so also reject it here. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20210626130334.13624-1-yuehaibing@huawei.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-08-13 09:50:43 +02:00
Jakub Kicinski	a9a507013a	Merge tag 'ieee802154-for-davem-2021-08-12' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan Stefan Schmidt says: ==================== ieee802154 for net 2021-08-12 Mostly fixes coming from bot reports. Dongliang Mu tackled some syzkaller reports in hwsim again and Takeshi Misawa a memory leak in ieee802154 raw. * tag 'ieee802154-for-davem-2021-08-12' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan: net: Fix memory leak in ieee802154_raw_deliver ieee802154: hwsim: fix GPF in hwsim_new_edge_nl ieee802154: hwsim: fix GPF in hwsim_set_edge_lqi ==================== Link: https://lore.kernel.org/r/20210812183912.1663996-1-stefan@datenfreihafen.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-12 11:50:17 -07:00
Longpeng(Mike)	49b0b6ffe2	vsock/virtio: avoid potential deadlock when vsock device remove There's a potential deadlock case when remove the vsock device or process the RESET event: vsock_for_each_connected_socket: spin_lock_bh(&vsock_table_lock) ----------- (1) ... virtio_vsock_reset_sock: lock_sock(sk) --------------------- (2) ... spin_unlock_bh(&vsock_table_lock) lock_sock() may do initiative schedule when the 'sk' is owned by other thread at the same time, we would receivce a warning message that "scheduling while atomic". Even worse, if the next task (selected by the scheduler) try to release a 'sk', it need to request vsock_table_lock and the deadlock occur, cause the system into softlockup state. Call trace: queued_spin_lock_slowpath vsock_remove_bound vsock_remove_sock virtio_transport_release __vsock_release vsock_release __sock_release sock_close __fput ____fput So we should not require sk_lock in this case, just like the behavior in vhost_vsock or vmci. Fixes: `0ea9e1d3a9` ("VSOCK: Introduce virtio_transport.ko") Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20210812053056.1699-1-longpeng2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-12 10:57:27 -07:00
Vladimir Oltean	724395f4dc	net: dsa: tag_8021q: don't broadcast during setup/teardown Currently, on my board with multiple sja1105 switches in disjoint trees described in commit `f66a6a69f9` ("net: dsa: permit cross-chip bridging between all trees in the system"), rebooting the board triggers the following benign warnings: [ 12.345566] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 1088 deletion: -ENOENT [ 12.353804] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 2112 deletion: -ENOENT [ 12.362019] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 1089 deletion: -ENOENT [ 12.370246] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 2113 deletion: -ENOENT [ 12.378466] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 1090 deletion: -ENOENT [ 12.386683] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 2114 deletion: -ENOENT Basically switch 1 calls dsa_tag_8021q_unregister, and switch 1's TX and RX VLANs cannot be found on switch 2's CPU port. But why would switch 2 even attempt to delete switch 1's TX and RX tag_8021q VLANs from its CPU port? Well, because we use dsa_broadcast, and it is supposed that it had added those VLANs in the first place (because in dsa_port_tag_8021q_vlan_match, all CPU ports match regardless of their tree index or switch index). The two trees probe asynchronously, and when switch 1 probed, it called dsa_broadcast which did not notify the tree of switch 2, because that didn't probe yet. But during unbind, switch 2's tree _is_ probed, so it _is_ notified of the deletion. Before jumping to introduce a synchronization mechanism between the probing across disjoint switch trees, let's take a step back and see whether we _need_ to do that in the first place. The RX and TX VLANs of switch 1 would be needed on switch 2's CPU port only if switch 1 and 2 were part of a cross-chip bridge. And dsa_tag_8021q_bridge_join takes care precisely of that (but if probing was synchronous, the bridge_join would just end up bumping the VLANs' refcount, because they are already installed by the setup path). Since by the time the ports are bridged, all DSA trees are already set up, and we don't need the tag_8021q VLANs of one switch installed on the other switches during probe time, the answer is that we don't need to fix the synchronization issue. So make the setup and teardown code paths call dsa_port_notify, which notifies only the local tree, and the bridge code paths call dsa_broadcast, which let the other trees know as well. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-12 11:46:21 +01:00
Vladimir Oltean	ab97462beb	net: dsa: print more information when a cross-chip notifier fails Currently this error message does not say a lot: [ 32.693498] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT [ 32.699725] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT [ 32.705931] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT [ 32.712139] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT [ 32.718347] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT [ 32.724554] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT but in this form, it is immediately obvious (at least to me) what the problem is, even without further looking at the code: [ 12.345566] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 1088 deletion: -ENOENT [ 12.353804] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 2112 deletion: -ENOENT [ 12.362019] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 1089 deletion: -ENOENT [ 12.370246] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 2113 deletion: -ENOENT [ 12.378466] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 1090 deletion: -ENOENT [ 12.386683] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 2114 deletion: -ENOENT Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-12 11:46:21 +01:00
Nick Richardson	769afb3fda	pktgen: Add output for imix results The bps for imix mode is calculated by: sum(imix_entry.size) / time_elapsed The actual counts of each imix_entry are displayed under the "Current:" section of the interface output in the following format: imix_size_counts: size_1,count_1 size_2,count_2 ... size_n,count_n Example (count = 200000): imix_weights: 256,1 859,3 205,2 imix_size_counts: 256,32082 859,99796 205,68122 Result: OK: 17992362(c17964678+d27684) usec, 200000 (859byte,0frags) 11115pps 47Mb/sec (47977140bps) errors: 0 Summary of changes: Calculate bps based on imix counters when in IMIX mode. Add output for IMIX counters. Signed-off-by: Nick Richardson <richardsonnick@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-12 10:50:21 +01:00
Nick Richardson	9014903132	pktgen: Add imix distribution bins In order to represent the distribution of imix packet sizes, a pre-computed data structure is used. It features 100 (IMIX_PRECISION) "bins". Contiguous ranges of these bins represent the respective packet size of each imix entry. This is done to avoid the overhead of selecting the correct imix packet size based on the corresponding weights. Example: imix_weights 40,7 576,4 1500,1 total_weight = 7 + 4 + 1 = 12 pkt_size 40 occurs 7/total_weight = 58% of the time pkt_size 576 occurs 4/total_weight = 33% of the time pkt_size 1500 occurs 1/total_weight = 9% of the time We generate a random number between 0-100 and select the corresponding packet size based on the specified weights. Eg. random number = 358723895 % 100 = 65 Selects the packet size corresponding to index:65 in the pre-computed imix_distribution array. An example of the pre-computed array is below: The imix_distribution will look like the following: 0 -> 0 (index of imix_entry.size == 40) 1 -> 0 (index of imix_entry.size == 40) 2 -> 0 (index of imix_entry.size == 40) [...] -> 0 (index of imix_entry.size == 40) 57 -> 0 (index of imix_entry.size == 40) 58 -> 1 (index of imix_entry.size == 576) [...] -> 1 (index of imix_entry.size == 576) 90 -> 1 (index of imix_entry.size == 576) 91 -> 2 (index of imix_entry.size == 1500) [...] -> 2 (index of imix_entry.size == 1500) 99 -> 2 (index of imix_entry.size == 1500) Create and use "bin" representation of the imix distribution. Signed-off-by: Nick Richardson <richardsonnick@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-12 10:50:20 +01:00
Nick Richardson	52a62f8603	pktgen: Parse internet mix (imix) input Adds "imix_weights" command for specifying internet mix distribution. The command is in this format: "imix_weights size_1,weight_1 size_2,weight_2 ... size_n,weight_n" where the probability that packet size_i is picked is: weight_i / (weight_1 + weight_2 + .. + weight_n) The user may provide up to 100 imix entries (size_i,weight_i) in this command. The user specified imix entries will be displayed in the "Params" section of the interface output. Values for clone_skb > 0 is not supported in IMIX mode. Summary of changes: Add flag for enabling internet mix mode. Add command (imix_weights) for internet mix input. Return -ENOTSUPP when clone_skb > 0 in IMIX mode. Display imix_weights in Params. Create data structures to store imix entries and distribution. Signed-off-by: Nick Richardson <richardsonnick@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-12 10:50:20 +01:00
Hoang Le	86704993e6	Revert "tipc: Return the correct errno code" This reverts commit `0efea3c649` because of: - The returning -ENOBUF error is fine on socket buffer allocation. - There is side effect in the calling path tipc_node_xmit()->tipc_link_xmit() when checking error code returning. Fixes: `0efea3c649` ("tipc: Return the correct errno code") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-12 09:44:31 +01:00
Nikolay Aleksandrov	6c4110d9f4	net: bridge: vlan: fix global vlan option range dumping When global vlan options are equal sequentially we compress them in a range to save space and reduce processing time. In order to have the proper range end id we need to update range_end if the options are equal otherwise we get ranges with the same end vlan id as the start. Fixes: `743a53d963` ("net: bridge: vlan: add support for dumping global vlan options") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210810092139.11700-1-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-11 16:02:34 -07:00
Jeremy Kerr	83f0a0b728	mctp: Specify route types, require rtm_type in RTM_*ROUTE messages This change adds a 'type' attribute to routes, which can be parsed from a RTM_NEWROUTE message. This will help to distinguish local vs. peer routes in a future change. This means userspace will need to set a correct rtm_type in RTM_NEWROUTE and RTM_DELROUTE messages; we currently only accept RTN_UNICAST. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://lore.kernel.org/r/20210810023834.2231088-1-jk@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-11 16:01:17 -07:00
Eric Dumazet	b69dd5b378	net: igmp: increase size of mr_ifc_count Some arches support cmpxchg() on 4-byte and 8-byte only. Increase mr_ifc_count width to 32bit to fix this problem. Fixes: `4a2b285e7e` ("net: igmp: fix data-race in igmp_ifc_timer_expire()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20210811195715.3684218-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-11 15:54:10 -07:00
Neal Cardwell	6de035fec0	tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets Currently if BBR congestion control is initialized after more than 2B packets have been delivered, depending on the phase of the tp->delivered counter the tracking of BBR round trips can get stuck. The bug arises because if tp->delivered is between 2^31 and 2^32 at the time the BBR congestion control module is initialized, then the initialization of bbr->next_rtt_delivered to 0 will cause the logic to believe that the end of the round trip is still billions of packets in the future. More specifically, the following check will fail repeatedly: !before(rs->prior_delivered, bbr->next_rtt_delivered) and thus the connection will take up to 2B packets delivered before that check will pass and the connection will set: bbr->round_start = 1; This could cause many mechanisms in BBR to fail to trigger, for example bbr_check_full_bw_reached() would likely never exit STARTUP. This bug is 5 years old and has not been observed, and as a practical matter this would likely rarely trigger, since it would require transferring at least 2B packets, or likely more than 3 terabytes of data, before switching congestion control algorithms to BBR. This patch is a stable candidate for kernels as far back as v4.9, when tcp_bbr.c was added. Fixes: `0f8782ea14` ("tcp_bbr: add BBR congestion control") Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Kevin Yang <yyd@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210811024056.235161-1-ncardwell@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-11 15:00:15 -07:00
Willy Tarreau	6922110d15	net: linkwatch: fix failure to restore device state across suspend/resume After migrating my laptop from 4.19-LTS to 5.4-LTS a while ago I noticed that my Ethernet port to which a bond and a VLAN interface are attached appeared to remain up after resuming from suspend with the cable unplugged (and that problem still persists with 5.10-LTS). It happens that the following happens: - the network driver (e1000e here) prepares to suspend, calls e1000e_down() which calls netif_carrier_off() to signal that the link is going down. - netif_carrier_off() adds a link_watch event to the list of events for this device - the device is completely stopped. - the machine suspends - the cable is unplugged and the machine brought to another location - the machine is resumed - the queued linkwatch events are processed for the device - the device doesn't yet have the __LINK_STATE_PRESENT bit and its events are silently dropped - the device is resumed with its link down - the upper VLAN and bond interfaces are never notified that the link had been turned down and remain up - the only way to provoke a change is to physically connect the machine to a port and possibly unplug it. The state after resume looks like this: $ ip -br li \| egrep 'bond\|eth' bond0 UP e8:6a:64:64:64:64 <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> eth0 DOWN e8:6a:64:64:64:64 <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> eth0.2@eth0 UP e8:6a:64:64:64:64 <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> Placing an explicit call to netdev_state_change() either in the suspend or the resume code in the NIC driver worked around this but the solution is not satisfying. The issue in fact really is in link_watch that loses events while it ought not to. It happens that the test for the device being present was added by commit `124eee3f69` ("net: linkwatch: add check for netdevice being present to linkwatch_do_dev") in 4.20 to avoid an access to devices that are not present. Instead of dropping events, this patch proceeds slightly differently by postponing their handling so that they happen after the device is fully resumed. Fixes: `124eee3f69` ("net: linkwatch: add check for netdevice being present to linkwatch_do_dev") Link: https://lists.openwall.net/netdev/2018/03/15/62 Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20210809160628.22623-1-w@1wt.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-11 14:43:16 -07:00
Vladimir Oltean	a72808b658	net: dsa: create a helper for locating EtherType DSA headers on TX Create a similar helper for locating the offset to the DSA header relative to skb->data, and make the existing EtherType header taggers to use it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 14:44:58 +01:00
Vladimir Oltean	5d928ff486	net: dsa: create a helper for locating EtherType DSA headers on RX It seems that protocol tagging driver writers are always surprised about the formula they use to reach their EtherType header on RX, which becomes apparent from the fact that there are comments in multiple drivers that mention the same information. Create a helper that returns a void pointer to skb->data - 2, as well as centralize the explanation why that is the case. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 14:44:58 +01:00
Vladimir Oltean	6bef794da6	net: dsa: create a helper which allocates space for EtherType DSA headers Hide away the memmove used by DSA EtherType header taggers to shift the MAC SA and DA to the left to make room for the header, after they've called skb_push(). The call to skb_push() is still left explicit in drivers, to be symmetric with dsa_strip_etype_header, and because not all callers can be refactored to do it (for example, brcm_tag_xmit_ll has common code for a pre-Ethernet DSA tag and an EtherType DSA tag). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 14:44:58 +01:00
Vladimir Oltean	f1dacd7aea	net: dsa: create a helper that strips EtherType DSA headers on RX All header taggers open-code a memmove that is fairly not all that obvious, and we can hide the details behind a helper function, since the only thing specific to the driver is the length of the header tag. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 14:44:58 +01:00
Parav Pandit	9c4a7665b4	devlink: Add APIs to publish, unpublish individual parameter Enable drivers to publish/unpublish individual parameter. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 14:34:21 +01:00
Parav Pandit	b40c51efef	devlink: Add API to register and unregister single parameter Currently device configuration parameters can be registered as an array. Due to this a constant array must be registered. A single driver supporting multiple devices each with different device capabilities end up registering all parameters even if it doesn't support it. One possible workaround a driver can do is, it registers multiple single entry arrays to overcome such limitation. Better is to provide a API that enables driver to register/unregister a single parameter. This also further helps in two ways. (1) to reduce the memory of devlink_param_entry by avoiding in registering parameters which are not supported by the device. (2) avoid generating multiple parameter add, delete, publish, unpublish, init value notifications for such unsupported parameters Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 14:34:21 +01:00
Parav Pandit	699784f7b7	devlink: Create a helper function for one parameter registration Create and use a helper function for one parameter registration. Subsequent patch also will reuse this for driver facing routine to register a single parameter. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 14:34:21 +01:00
Parav Pandit	076b2a9dbb	devlink: Add new "enable_vnet" generic device param Add new device generic parameter to enable/disable creation of VDPA net auxiliary device and associated device functionality in the devlink instance. User who prefers to disable such functionality can disable it using below example. $ devlink dev param set pci/0000:06:00.0 \ name enable_vnet value false cmode driverinit $ devlink dev reload pci/0000:06:00.0 At this point devlink instance do not create auxiliary device for the VDPA net functionality. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 14:34:21 +01:00
Parav Pandit	8ddaabee3c	devlink: Add new "enable_rdma" generic device param Add new device generic parameter to enable/disable creation of RDMA auxiliary device and associated device functionality in the devlink instance. User who prefers to disable such functionality can disable it using below example. $ devlink dev param set pci/0000:06:00.0 \ name enable_rdma value false cmode driverinit $ devlink dev reload pci/0000:06:00.0 At this point devlink instance do not create auxiliary device for the RDMA functionality. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 14:34:21 +01:00
Parav Pandit	f13a5ad881	devlink: Add new "enable_eth" generic device param Add new device generic parameter to enable/disable creation of Ethernet auxiliary device and associated device functionality in the devlink instance. User who prefers to disable such functionality can disable it using below example. $ devlink dev param set pci/0000:06:00.0 \ name enable_eth value false cmode driverinit $ devlink dev reload pci/0000:06:00.0 At this point devlink instance do not create auxiliary device for the Ethernet functionality. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 14:34:21 +01:00
Nikolay Aleksandrov	dc002875c2	net: bridge: vlan: use br_rports_fill_info() to export mcast router ports Embed the standard multicast router port export by br_rports_fill_info() into a new global vlan attribute BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS. In order to have the same format for the global bridge mcast context and the per-vlan mcast context we need a double-nesting: - BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS - MDBA_ROUTER Currently we don't compare router lists, if any router port exists in the bridge mcast contexts we consider their option sets as different and export them separately. In addition we export the router port vlan id when dumping similar to the router port notification format. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov	e04d377ff6	net: bridge: mcast: use the proper multicast context when dumping router ports When we are dumping the router ports of a vlan mcast context we need to use the bridge/vlan and port/vlan's multicast contexts to check if IPv4/IPv6 router port is present and later to dump the vlan id. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov	a97df080b6	net: bridge: vlan: add support for mcast router global option Add support to change and retrieve global vlan multicast router state which is used for the bridge itself. We just need to pass multicast context to br_multicast_set_router instead of bridge device and the rest of the logic remains the same. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov	62938182c3	net: bridge: vlan: add support for mcast querier global option Add support to change and retrieve global vlan multicast querier state. We just need to pass multicast context to br_multicast_set_querier instead of bridge device and the rest of the logic remains the same. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov	cb486ce995	net: bridge: mcast: querier and query state affect only current context type It is a minor optimization and better behaviour to make sure querier and query sending routines affect only the matching multicast context depending if vlan snooping is enabled (vlan ctx vs bridge ctx). It also avoids sending unnecessary extra query packets. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov	4d5b4e84c7	net: bridge: mcast: move querier state to the multicast context We need to have the querier state per multicast context in order to have per-vlan control, so remove the internal option bit and move it to the multicast context. Also annotate the lockless reads of the new variable. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 13:34:41 +01:00
Nikolay Aleksandrov	941121ee22	net: bridge: vlan: add support for mcast startup query interval global option Add support to change and retrieve global vlan multicast startup query interval option. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-11 13:34:41 +01:00

1 2 3 4 5 ...

66492 Commits