linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-11 05:33:09 +00:00

Author	SHA1	Message	Date
Johannes Berg	8876c67e62	wifi: nl80211: require MLD address on link STA add/modify We always need the MLD address and link ID to add or modify the link STA, so require it in the API. Fixes: `577e5b8c39` ("wifi: cfg80211: add API to add/modify/remove a link station") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-22 14:28:13 +02:00
Johannes Berg	956b961337	wifi: mac80211: more station handling sanity checks Add more sanity checks to the API handling, we shouldn't be able to create a station without links, nor should we be able to add a link to a station that wasn't created as an MLD with links in the first place. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-22 14:28:11 +02:00
Johannes Berg	0ad49045f2	wifi: mac80211: fix link sta hash table handling There are two issues here: we unhash the link stations only directly before freeing the station they belong to, and we also don't unhash all the links correctly in all cases. Fix these issues. Fixes: `ba6ddab94f` ("wifi: mac80211: maintain link-sta hash table") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-22 14:28:10 +02:00
Johannes Berg	9aebce6c97	wifi: mac80211: validate link address doesn't change When modifying a link station, validate that the link address doesn't change, except the first time the link is created. Fixes: `b95eb7f0ee` ("wifi: cfg80211/mac80211: separate link params from station params") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-22 14:28:08 +02:00
Johannes Berg	6d8e0f84f8	wifi: mac80211: mlme: set sta.mlo to mlo state At this point, we've already changed link_id to be zero for a non-MLO connection, so use the 'mlo' variable rather than link ID to determine the MLO status of the station. Fixes: `bd363ee533` ("wifi: mac80211: mlme: set sta.mlo correctly") Fixes: `81151ce462` ("wifi: mac80211: support MLO authentication/association with one link") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-22 14:27:59 +02:00
Johannes Berg	0f13f3c322	wifi: mac80211: fast-xmit: handle non-MLO clients If there's a non-MLO client, the A2 must be set to the BSSID of the link since no translation will happen in lower layers and it's needed that way for encryption. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-22 14:27:58 +02:00
Johannes Berg	1f6389440c	wifi: mac80211: fix RX MLD address translation We should only translate addr3 here if it's the BSSID. Fixes: `42fb9148c0` ("wifi: mac80211: do link->MLD address translation on RX") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-22 14:27:48 +02:00
Johannes Berg	206c8c0680	wifi: mac80211: fix NULL pointer deref with non-MLD STA If we have a non-MLD STA on an AP MLD, we crash while adding the station. Fix that, in this case we need to use the STA's address also on the link data structure. Fixes: `f36fe0a2df` ("wifi: mac80211: fix up link station creation/insertion") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-22 14:27:47 +02:00
Johannes Berg	553a282cb2	wifi: mac80211: mlme: fix override calculation In my previous changes here, I neglected to take the old conn_flags into account that might still be present from the authentication, and thus ieee80211_setup_assoc_link() can misbehave, as well as the override calculation being wrong. Fix that by ORing in the old flags. Fixes: `1845c1d4a4` ("wifi: mac80211: mlme: refactor assoc link setup") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-22 14:27:45 +02:00
Johannes Berg	8a9be422f5	wifi: mac80211: tx: use AP address in some places for MLO In a few places we need to use the AP (MLD) address, not the deflink BSSID, the link address translation will happen later. To make that work properly for fast-xmit, set up the ap_addr in the vif.cfg earlier. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2022-07-22 14:27:41 +02:00
Alan Brady	16576a034c	ping: support ipv6 ping socket flow labels Ping sockets don't appear to make any attempt to preserve flow labels created and set by userspace using IPV6_FLOWINFO_SEND. Instead they are clobbered by autolabels (if enabled) or zero. Grab the flowlabel out of the msghdr similar to how rawv6_sendmsg does it and move the memset up so it doesn't get zeroed after. Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-22 12:40:27 +01:00
Lorenzo Bianconi	ef69aa3a98	net: netfilter: Add kfuncs to set and change CT status Introduce bpf_ct_set_status and bpf_ct_change_status kfunc helpers in order to set nf_conn field of allocated entry or update nf_conn status field of existing inserted entry. Use nf_ct_change_status_common to share the permitted status field changes between netlink and BPF side by refactoring ctnetlink_change_status. It is required to introduce two kfuncs taking nf_conn___init and nf_conn instead of sharing one because KF_TRUSTED_ARGS flag causes strict type checking. This would disallow passing nf_conn___init to kfunc taking nf_conn, and vice versa. We cannot remove the KF_TRUSTED_ARGS flag as we only want to accept refcounted pointers and not e.g. ct->master. Hence, bpf_ct_set_* kfuncs are meant to be used on allocated CT, and bpf_ct_change_* kfuncs are meant to be used on inserted or looked up CT entry. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-10-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-07-21 21:03:16 -07:00
Kumar Kartikeya Dwivedi	0b38923644	net: netfilter: Add kfuncs to set and change CT timeout Introduce bpf_ct_set_timeout and bpf_ct_change_timeout kfunc helpers in order to change nf_conn timeout. This is same as ctnetlink_change_timeout, hence code is shared between both by extracting it out to __nf_ct_change_timeout. It is also updated to return an error when it sees IPS_FIXED_TIMEOUT_BIT bit in ct->status, as that check was missing. It is required to introduce two kfuncs taking nf_conn___init and nf_conn instead of sharing one because KF_TRUSTED_ARGS flag causes strict type checking. This would disallow passing nf_conn___init to kfunc taking nf_conn, and vice versa. We cannot remove the KF_TRUSTED_ARGS flag as we only want to accept refcounted pointers and not e.g. ct->master. Apart from this, bpf_ct_set_timeout is only called for newly allocated CT so it doesn't need to inspect the status field just yet. Sharing the helpers even if it was possible would make timeout setting helper sensitive to order of setting status and timeout after allocation. Hence, bpf_ct_set_* kfuncs are meant to be used on allocated CT, and bpf_ct_change_* kfuncs are meant to be used on inserted or looked up CT entry. Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-9-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-07-21 21:03:16 -07:00
Lorenzo Bianconi	d7e79c97c0	net: netfilter: Add kfuncs to allocate and insert CT Introduce bpf_xdp_ct_alloc, bpf_skb_ct_alloc and bpf_ct_insert_entry kfuncs in order to insert a new entry from XDP and TC programs. Introduce bpf_nf_ct_tuple_parse utility routine to consolidate common code. We extract out a helper __nf_ct_set_timeout, used by the ctnetlink and nf_conntrack_bpf code, extract it out to nf_conntrack_core, so that nf_conntrack_bpf doesn't need a dependency on CONFIG_NF_CT_NETLINK. Later this helper will be reused as a helper to set timeout of allocated but not yet inserted CT entry. The allocation functions return struct nf_conn___init instead of nf_conn, to distinguish allocated CT from an already inserted or looked up CT. This is later used to enforce restrictions on what kfuncs allocated CT can be used with. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-8-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-07-21 21:03:16 -07:00
Kumar Kartikeya Dwivedi	aed8ee7feb	net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup Move common checks inside the common function, and maintain the only difference the two being how to obtain the struct net * from ctx. No functional change intended. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-7-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-07-21 21:03:16 -07:00
Kumar Kartikeya Dwivedi	56e948ffc0	bpf: Add support for forcing kfunc args to be trusted Teach the verifier to detect a new KF_TRUSTED_ARGS kfunc flag, which means each pointer argument must be trusted, which we define as a pointer that is referenced (has non-zero ref_obj_id) and also needs to have its offset unchanged, similar to how release functions expect their argument. This allows a kfunc to receive pointer arguments unchanged from the result of the acquire kfunc. This is required to ensure that kfunc that operate on some object only work on acquired pointers and not normal PTR_TO_BTF_ID with same type which can be obtained by pointer walking. The restrictions applied to release arguments also apply to trusted arguments. This implies that strict type matching (not deducing type by recursively following members at offset) and OBJ_RELEASE offset checks (ensuring they are zero) are used for trusted pointer arguments. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-5-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-07-21 21:03:09 -07:00
Kumar Kartikeya Dwivedi	a4703e3184	bpf: Switch to new kfunc flags infrastructure Instead of populating multiple sets to indicate some attribute and then researching the same BTF ID in them, prepare a single unified BTF set which indicates whether a kfunc is allowed to be called, and also its attributes if any at the same time. Now, only one call is needed to perform the lookup for both kfunc availability and its attributes. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-07-21 20:59:42 -07:00
Jaehee Park	b66eb3a6e4	net: ipv6: avoid accepting values greater than 2 for accept_untracked_na The accept_untracked_na sysctl changed from a boolean to an integer when a new knob '2' was added. This patch provides a safeguard to avoid accepting values that are not defined in the sysctl. When setting a value greater than 2, the user will get an 'invalid argument' warning. Fixes: `aaa5f515b1` ("net: ipv6: new accept_untracked_na option to accept na only if in-network") Signed-off-by: Jaehee Park <jhpark1013@gmail.com> Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Suggested-by: Roopa Prabhu <roopa@nvidia.com> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20220720183632.376138-1-jhpark1013@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-21 19:11:10 -07:00
Jakub Kicinski	dde06aaa89	tls: rx: release the sock lock on locking timeout Eric reports we should release the socket lock if the entire "grab reader lock" operation has failed. The callers assume they don't have to release it or otherwise unwind. Reported-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+16e72110feb2b653ef27@syzkaller.appspotmail.com Fixes: `4cbc325ed6` ("tls: rx: allow only one reader at a time") Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220720203701.2179034-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-21 18:58:11 -07:00
Luiz Augusto von Dentz	1f7435c8f6	Bluetooth: mgmt: Fix using hci_conn_abort This fixes using hci_conn_abort instead of using hci_conn_abort_sync. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:16:10 -07:00
Luiz Augusto von Dentz	a86ddbffa6	Bluetooth: Use bt_status to convert from errno If a command cannot be sent or there is a internal error an errno maybe set instead of a command status. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:15:55 -07:00
Luiz Augusto von Dentz	ca2045e059	Bluetooth: Add bt_status This adds bt_status which can be used to convert Unix errno to Bluetooth status. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:15:31 -07:00
Luiz Augusto von Dentz	1bbf4023cf	Bluetooth: hci_sync: Split hci_dev_open_sync This splits hci_dev_open_sync so each stage is handle by its own function so it is easier to identify each stage. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:15:17 -07:00
Manish Mandlik	7cf5c2978f	Bluetooth: hci_sync: Refactor remove Adv Monitor Make use of hci_cmd_sync_queue for removing an advertisement monitor. Signed-off-by: Manish Mandlik <mmandlik@google.com> Reviewed-by: Miao-chen Chou <mcchou@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:14:55 -07:00
Manish Mandlik	b747a83690	Bluetooth: hci_sync: Refactor add Adv Monitor Make use of hci_cmd_sync_queue for adding an advertisement monitor. Signed-off-by: Manish Mandlik <mmandlik@google.com> Reviewed-by: Miao-chen Chou <mcchou@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:14:32 -07:00
Zijun Hu	63b1a7dd38	Bluetooth: hci_sync: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING Core driver addtionally checks LMP feature bit "Erroneous Data Reporting" instead of quirk HCI_QUIRK_BROKEN_ERR_DATA_REPORTING to decide if HCI commands HCI_Read\|Write_Default_Erroneous_Data_Reporting are broken, so remove this unnecessary quirk. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Tested-by: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:14:10 -07:00
Zijun Hu	766ae2422b	Bluetooth: hci_sync: Check LMP feature bit instead of quirk BT core driver should addtionally check LMP feature bit "Erroneous Data Reporting" instead of quirk HCI_QUIRK_BROKEN_ERR_DATA_REPORTING set by BT device driver to decide if HCI commands HCI_Read\|Write_Default_Erroneous_Data_Reporting are broken. BLUETOOTH CORE SPECIFICATION Version 5.3 \| Vol 2, Part C \| page 587 This feature indicates whether the device is able to support the Packet_Status_Flag and the HCI commands HCI_Write_Default_- Erroneous_Data_Reporting and HCI_Read_Default_Erroneous_- Data_Reporting. the quirk was introduced by 'commit `cde1a8a992` ("Bluetooth: btusb: Fix and detect most of the Chinese Bluetooth controllers")' to mark HCI commands HCI_Read\|Write_Default_Erroneous_Data_Reporting broken by BT device driver, but the reason why these two HCI commands are broken is that feature "Erroneous Data Reporting" is not enabled by firmware, this scenario is illustrated by below log of QCA controllers with USB I/F: @ RAW Open: hcitool (privileged) version 2.22 < HCI Command: Read Local Supported Commands (0x04\|0x0002) plen 0 > HCI Event: Command Complete (0x0e) plen 68 Read Local Supported Commands (0x04\|0x0002) ncmd 1 Status: Success (0x00) Commands: 288 entries ...... Read Default Erroneous Data Reporting (Octet 18 - Bit 2) Write Default Erroneous Data Reporting (Octet 18 - Bit 3) ...... < HCI Command: Read Default Erroneous Data Reporting (0x03\|0x005a) plen 0 > HCI Event: Command Complete (0x0e) plen 4 Read Default Erroneous Data Reporting (0x03\|0x005a) ncmd 1 Status: Unknown HCI Command (0x01) < HCI Command: Read Local Supported Features (0x04\|0x0003) plen 0 > HCI Event: Command Complete (0x0e) plen 12 Read Local Supported Features (0x04\|0x0003) ncmd 1 Status: Success (0x00) Features: 0xff 0xfe 0x0f 0xfe 0xd8 0x3f 0x5b 0x87 3 slot packets ...... Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Tested-by: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:13:18 -07:00
Zijun Hu	0feb8af027	Bluetooth: hci_sync: Correct hci_set_event_mask_page_2_sync() event mask Event HCI_Truncated_Page_Complete should belong to central and HCI_Peripheral_Page_Response_Timeout should belong to peripheral, but hci_set_event_mask_page_2_sync() take these two events for wrong roles, so correct it by this change. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:13:03 -07:00
Luiz Augusto von Dentz	6828b58307	Bluetooth: hci_sync: Don't remove connected devices from accept list These devices are likely going to be reprogrammed when disconnected so this avoid a whole bunch of commands attempt to remove and the add back to the list. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Zhengping Jiang <jiangzp@google.com>	2022-07-21 17:09:34 -07:00
Luiz Augusto von Dentz	0900b1c62f	Bluetooth: hci_sync: Fix not updating privacy_mode When programming a new entry into the resolving list it shall default to network mode since the params may contain the mode programmed when the device was last added to the resolving list. Link: https://bugzilla.kernel.org/show_bug.cgi?id=209745 Fixes: `853b70b506` ("Bluetooth: hci_sync: Set Privacy Mode when updating the resolving list") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Zhengping Jiang <jiangzp@google.com>	2022-07-21 17:09:19 -07:00
Tamas Koczka	9f30de9e03	Bluetooth: Collect kcov coverage from hci_rx_work Annotate hci_rx_work() with kcov_remote_start() and kcov_remote_stop() calls, so remote KCOV coverage is collected while processing the rx_q queue which is the main incoming Bluetooth packet queue. Coverage is associated with the thread which created the packet skb. The collected extra coverage helps kernel fuzzing efforts in finding vulnerabilities. This change only has effect if the kernel is compiled with CONFIG_KCOV, otherwise kcov_ functions don't do anything. Signed-off-by: Tamas Koczka <poprdi@google.com> Tested-by: Aleksandr Nogikh <nogikh@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:09:06 -07:00
Zhengping Jiang	68253f3cd7	Bluetooth: hci_sync: Fix resuming scan after suspend resume After resuming, remove setting scanning_paused to false, because it is checked and set to false in hci_resume_scan_sync. Also move setting the value to false before updating passive scan, because the value is used when resuming passive scan. Fixes: `3b42055388` (Bluetooth: hci_sync: Fix attempting to suspend with unfiltered passive scan) Signed-off-by: Zhengping Jiang <jiangzp@google.com> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:08:52 -07:00
Zhengping Jiang	d7b2fdfb53	Bluetooth: mgmt: Fix refresh cached connection info Set the connection data before calling get_conn_info_sync, so it can be verified the connection is still connected, before refreshing cached values. Fixes: `47db6b4299` ("Bluetooth: hci_sync: Convert MGMT_OP_GET_CONN_INFO") Signed-off-by: Zhengping Jiang <jiangzp@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:07:50 -07:00
Luiz Augusto von Dentz	34a718bc86	Bluetooth: HCI: Fix not always setting Scan Response/Advertising Data The scan response and advertising data needs to be tracked on a per instance (adv_info) since when these instaces are removed so are their data, to fix that new flags are introduced which is used to mark when the data changes and then checked to confirm when the data needs to be synced with the controller. Tested-by: Tedd Ho-Jeong An <tedd.an@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:07:30 -07:00
Luiz Augusto von Dentz	dd7b8cdde0	Bluetooth: eir: Fix using strlen with hdev->{dev_name,short_name} Both dev_name and short_name are not guaranteed to be NULL terminated so this instead use strnlen and then attempt to determine if the resulting string needs to be truncated or not. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216018 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>	2022-07-21 17:07:16 -07:00
Xiaohui Zhang	a5133fe87e	Bluetooth: use memset avoid memory leaks Similar to the handling of l2cap_ecred_connect in commit `d3715b2333` ("Bluetooth: use memset avoid memory leaks"), we thought a patch might be needed here as well. Use memset to initialize structs to prevent memory leaks in l2cap_le_connect Signed-off-by: Xiaohui Zhang <xiaohuizhang@ruc.edu.cn> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2022-07-21 17:07:02 -07:00
Dan Carpenter	9111786492	Bluetooth: fix an error code in hci_register_dev() Preserve the error code from hci_register_suspend_notifier(). Don't return success. Fixes: d6bb2a91f95b ("Bluetooth: Unregister suspend with userchannel") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2022-07-21 17:06:49 -07:00
Abhishek Pandit-Subedi	359ee4f834	Bluetooth: Unregister suspend with userchannel When HCI_USERCHANNEL is used, unregister the suspend notifier when binding and register when releasing. The userchannel socket should be left alone after open is completed. Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2022-07-21 17:05:58 -07:00
Abhishek Pandit-Subedi	0acef50ba3	Bluetooth: Fix index added after unregister When a userchannel socket is released, we should check whether the hdev is already unregistered before sending out an IndexAdded. Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2022-07-21 17:05:42 -07:00
Schspa Shi	877afadad2	Bluetooth: When HCI work queue is drained, only queue chained work The HCI command, event, and data packet processing workqueue is drained to avoid deadlock in commit `76727c02c1` ("Bluetooth: Call drain_workqueue() before resetting state"). There is another delayed work, which will queue command to this drained workqueue. Which results in the following error report: Bluetooth: hci2: command 0x040f tx timeout WARNING: CPU: 1 PID: 18374 at kernel/workqueue.c:1438 __queue_work+0xdad/0x1140 Workqueue: events hci_cmd_timeout RIP: 0010:__queue_work+0xdad/0x1140 RSP: 0000:ffffc90002cffc60 EFLAGS: 00010093 RAX: 0000000000000000 RBX: ffff8880b9d3ec00 RCX: 0000000000000000 RDX: ffff888024ba0000 RSI: ffffffff814e048d RDI: ffff8880b9d3ec08 RBP: 0000000000000008 R08: 0000000000000000 R09: 00000000b9d39700 R10: ffffffff814f73c6 R11: 0000000000000000 R12: ffff88807cce4c60 R13: 0000000000000000 R14: ffff8880796d8800 R15: ffff8880796d8800 FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000c0174b4000 CR3: 000000007cae9000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? queue_work_on+0xcb/0x110 ? lockdep_hardirqs_off+0x90/0xd0 queue_work_on+0xee/0x110 process_one_work+0x996/0x1610 ? pwq_dec_nr_in_flight+0x2a0/0x2a0 ? rwlock_bug.part.0+0x90/0x90 ? _raw_spin_lock_irq+0x41/0x50 worker_thread+0x665/0x1080 ? process_one_work+0x1610/0x1610 kthread+0x2e9/0x3a0 ? kthread_complete_and_exit+0x40/0x40 ret_from_fork+0x1f/0x30 </TASK> To fix this, we can add a new HCI_DRAIN_WQ flag, and don't queue the timeout workqueue while command workqueue is draining. Fixes: `76727c02c1` ("Bluetooth: Call drain_workqueue() before resetting state") Reported-by: syzbot+63bed493aebbf6872647@syzkaller.appspotmail.com Signed-off-by: Schspa Shi <schspa@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2022-07-21 17:05:22 -07:00
Alain Michaud	629f66aaca	Bluetooth: clear the temporary linkkey in hci_conn_cleanup If a hardware error occurs and the connections are flushed without a disconnection_complete event being signaled, the temporary linkkeys are not flushed. This change ensures that any outstanding flushable linkkeys are flushed when the connection are flushed from the hash table. Additionally, this also makes use of test_and_clear_bit to avoid multiple attempts to delete the link key that's already been flushed. Signed-off-by: Alain Michaud <alainm@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2022-07-21 17:04:53 -07:00
Jakub Kicinski	6e0e846ee2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-21 13:03:39 -07:00
Jakub Kicinski	602ae008ab	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next: 1) Simplify nf_ct_get_tuple(), from Jackie Liu. 2) Add format to request_module() call, from Bill Wendling. 3) Add /proc/net/stats/nf_flowtable to monitor in-flight pending hardware offload objects to be processed, from Vlad Buslov. 4) Missing rcu annotation and accessors in the netfilter tree, from Florian Westphal. 5) Merge h323 conntrack helper nat hooks into single object, also from Florian. 6) A batch of update to fix sparse warnings treewide, from Florian Westphal. 7) Move nft_cmp_fast_mask() where it used, from Florian. 8) Missing const in nf_nat_initialized(), from James Yonan. 9) Use bitmap API for Maglev IPVS scheduler, from Christophe Jaillet. 10) Use refcount_inc instead of _inc_not_zero in flowtable, from Florian Westphal. 11) Remove pr_debug in xt_TPROXY, from Nathan Cancellor. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: xt_TPROXY: remove pr_debug invocations netfilter: flowtable: prefer refcount_inc netfilter: ipvs: Use the bitmap API to allocate bitmaps netfilter: nf_nat: in nf_nat_initialized(), use const struct nf_conn * netfilter: nf_tables: move nft_cmp_fast_mask to where its used netfilter: nf_tables: use correct integer types netfilter: nf_tables: add and use BE register load-store helpers netfilter: nf_tables: use the correct get/put helpers netfilter: x_tables: use correct integer types netfilter: nfnetlink: add missing __be16 cast netfilter: nft_set_bitmap: Fix spelling mistake netfilter: h323: merge nat hook pointers into one netfilter: nf_conntrack: use rcu accessors where needed netfilter: nf_conntrack: add missing __rcu annotations netfilter: nf_flow_table: count pending offload workqueue tasks net/sched: act_ct: set 'net' pointer when creating new nf_flow_table netfilter: conntrack: use correct format characters netfilter: conntrack: use fallthrough to cleanup ==================== Link: https://lore.kernel.org/r/20220720230754.209053-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-20 18:05:51 -07:00
Justin Stitt	aa8c7cdbae	netfilter: xt_TPROXY: remove pr_debug invocations pr_debug calls are no longer needed in this file. Pablo suggested "a patch to remove these pr_debug calls". This patch has some other beneficial collateral as it also silences multiple Clang -Wformat warnings that were present in the pr_debug calls. diff from v1 -> v2: * converted if statement one-liner style * x == NULL is now !x Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-21 00:56:00 +02:00
Florian Westphal	f02e7dc4cf	netfilter: flowtable: prefer refcount_inc With refcount_inc_not_zero, we'd also need a smp_rmb or similar, followed by a test of the CONFIRMED bit. However, the ct pointer is taken from skb->_nfct, its refcount must not be 0 (else, we'd already have a use-after-free bug). Use refcount_inc() instead to clarify the ct refcount is expected to be at least 1. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-21 00:55:39 +02:00
Christophe JAILLET	5787db7c90	netfilter: ipvs: Use the bitmap API to allocate bitmaps Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-21 00:55:39 +02:00
Oz Shlomo	c0f47c2822	net/sched: cls_api: Fix flow action initialization The cited commit refactored the flow action initialization sequence to use an interface method when translating tc action instances to flow offload objects. The refactored version skips the initialization of the generic flow action attributes for tc actions, such as pedit, that allocate more than one offload entry. This can cause potential issues for drivers mapping flow action ids. Populate the generic flow action fields for all the flow action entries. Fixes: `c54e1d920f` ("flow_offload: add ops to tc_action_ops for flow action setup") Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> ---- v1 -> v2: - coalese the generic flow action fields initialization to a single loop Reviewed-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:54:27 +01:00
Kuniyuki Iwashima	a11e5b3e7a	tcp: Fix data-races around sysctl_tcp_max_reordering. While reading sysctl_tcp_max_reordering, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `dca145ffaa` ("tcp: allow for bigger reordering level") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:14:50 +01:00
Kuniyuki Iwashima	2d17d9c738	tcp: Fix a data-race around sysctl_tcp_abort_on_overflow. While reading sysctl_tcp_abort_on_overflow, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:14:50 +01:00
Kuniyuki Iwashima	0b484c9191	tcp: Fix a data-race around sysctl_tcp_rfc1337. While reading sysctl_tcp_rfc1337, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:14:50 +01:00
Kuniyuki Iwashima	4e08ed41cb	tcp: Fix a data-race around sysctl_tcp_stdurg. While reading sysctl_tcp_stdurg, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:14:50 +01:00
Kuniyuki Iwashima	1a63cb91f0	tcp: Fix a data-race around sysctl_tcp_retrans_collapse. While reading sysctl_tcp_retrans_collapse, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:14:50 +01:00
Kuniyuki Iwashima	4845b5713a	tcp: Fix data-races around sysctl_tcp_slow_start_after_idle. While reading sysctl_tcp_slow_start_after_idle, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `35089bb203` ("[TCP]: Add tcp_slow_start_after_idle sysctl.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:14:50 +01:00
Kuniyuki Iwashima	7c6f2a86ca	tcp: Fix a data-race around sysctl_tcp_thin_linear_timeouts. While reading sysctl_tcp_thin_linear_timeouts, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `36e31b0af5` ("net: TCP thin linear timeouts") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:14:50 +01:00
Kuniyuki Iwashima	e7d2ef837e	tcp: Fix data-races around sysctl_tcp_recovery. While reading sysctl_tcp_recovery, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `4f41b1c58a` ("tcp: use RACK to detect losses") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:14:50 +01:00
Kuniyuki Iwashima	52e65865de	tcp: Fix a data-race around sysctl_tcp_early_retrans. While reading sysctl_tcp_early_retrans, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `eed530b6c6` ("tcp: early retransmit") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:14:49 +01:00
Kuniyuki Iwashima	3666f666e9	tcp: Fix data-races around sysctl knobs related to SYN option. While reading these knobs, they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers. - tcp_sack - tcp_window_scaling - tcp_timestamps Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:14:49 +01:00
Kuniyuki Iwashima	9b55c20f83	ip: Fix data-races around sysctl_ip_prot_sock. sysctl_ip_prot_sock is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. Fixes: `4548b683b7` ("Introduce a sysctl that modifies the value of PROT_SOCK.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:14:49 +01:00
Kuniyuki Iwashima	8895a9c2ac	ipv4: Fix data-races around sysctl_fib_multipath_hash_fields. While reading sysctl_fib_multipath_hash_fields, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `ce5c9c20d3` ("ipv4: Add a sysctl to control multipath hash fields") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:14:49 +01:00
Kuniyuki Iwashima	7998c12a08	ipv4: Fix data-races around sysctl_fib_multipath_hash_policy. While reading sysctl_fib_multipath_hash_policy, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `bf4e0a3db9` ("net: ipv4: add support for ECMP hash policy choice") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:14:49 +01:00
Kuniyuki Iwashima	87507bcb4f	ipv4: Fix a data-race around sysctl_fib_multipath_use_neigh. While reading sysctl_fib_multipath_use_neigh, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `a6db4494d2` ("net: ipv4: Consider failed nexthops in multipath routes") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:14:49 +01:00
David S. Miller	ef5621758a	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2022-07-20 1) Fix a policy refcount imbalance in xfrm_bundle_lookup. From Hangyu Hua. 2) Fix some clang -Wformat warnings. Justin Stitt ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:11:58 +01:00
Jakub Kicinski	7f9eee196e	Merge branch 'io_uring-zerocopy-send' of git://git.kernel.org/pub/scm/linux/kernel/git/kuba/linux Pavel Begunkov says: ==================== io_uring zerocopy send The patchset implements io_uring zerocopy send. It works with both registered and normal buffers, mixing is allowed but not recommended. Apart from usual request completions, just as with MSG_ZEROCOPY, io_uring separately notifies the userspace when buffers are freed and can be reused (see API design below), which is delivered into io_uring's Completion Queue. Those "buffer-free" notifications are not necessarily per request, but the userspace has control over it and should explicitly attaching a number of requests to a single notification. The series also adds some internal optimisations when used with registered buffers like removing page referencing. From the kernel networking perspective there are two main changes. The first one is passing ubuf_info into the network layer from io_uring (inside of an in kernel struct msghdr). This allows extra optimisations, e.g. ubuf_info caching on the io_uring side, but also helps to avoid cross-referencing and synchronisation problems. The second part is an optional optimisation removing page referencing for requests with registered buffers. Benchmarking UDP with an optimised version of the selftest (see [1]), which sends a bunch of requests, waits for completions and repeats. "+ flush" column posts one additional "buffer-free" notification per request, and just "zc" doesn't post buffer notifications at all. NIC (requests / second): IO size \| non-zc \| zc \| zc + flush 4000 \| 495134 \| 606420 (+22%) \| 558971 (+12%) 1500 \| 551808 \| 577116 (+4.5%) \| 565803 (+2.5%) 1000 \| 584677 \| 592088 (+1.2%) \| 560885 (-4%) 600 \| 596292 \| 598550 (+0.4%) \| 555366 (-6.7%) dummy (requests / second): IO size \| non-zc \| zc \| zc + flush 8000 \| 1299916 \| 2396600 (+84%) \| 2224219 (+71%) 4000 \| 1869230 \| 2344146 (+25%) \| 2170069 (+16%) 1200 \| 2071617 \| 2361960 (+14%) \| 2203052 (+6%) 600 \| 2106794 \| 2381527 (+13%) \| 2195295 (+4%) Previously it also brought a massive performance speedup compared to the msg_zerocopy tool (see [3]), which is probably not super interesting. There is also an additional bunch of refcounting optimisations that was omitted from the series for simplicity and as they don't change the picture drastically, they will be sent as follow up, as well as flushing optimisations closing the performance gap b/w two last columns. For TCP on localhost (with hacks enabling localhost zerocopy) and including additional overhead for receive: IO size \| non-zc \| zc 1200 \| 4174 \| 4148 4096 \| 7597 \| 11228 Using a real NIC 1200 bytes, zc is worse than non-zc ~5-10%, maybe the omitted optimisations will somewhat help, should look better for 4000, but couldn't test properly because of setup problems. Links: liburing (benchmark + tests): [1] https://github.com/isilence/liburing/tree/zc_v4 kernel repo: [2] https://github.com/isilence/linux/tree/zc_v4 RFC v1: [3] https://lore.kernel.org/io-uring/cover.1638282789.git.asml.silence@gmail.com/ RFC v2: https://lore.kernel.org/io-uring/cover.1640029579.git.asml.silence@gmail.com/ Net patches based: git@github.com:isilence/linux.git zc_v4-net-base or https://github.com/isilence/linux/tree/zc_v4-net-base API design overview: The series introduces an io_uring concept of notifactors. From the userspace perspective it's an entity to which it can bind one or more requests and then requesting to flush it. Flushing a notifier makes it impossible to attach new requests to it, and instructs the notifier to post a completion once all requests attached to it are completed and the kernel doesn't need the buffers anymore. Notifications are stored in notification slots, which should be registered as an array in io_uring. Each slot stores only one notifier at any particular moment. Flushing removes it from the slot and the slot automatically replaces it with a new notifier. All operations with notifiers are done by specifying an index of a slot it's currently in. When registering a notification the userspace specifies a u64 tag for each slot, which will be copied in notification completion entries as cqe::user_data. cqe::res is 0 and cqe::flags is equal to wrap around u32 sequence number counting notifiers of a slot. ==================== Link: https://lore.kernel.org/r/cover.1657643355.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-19 14:22:41 -07:00
Pavel Begunkov	eb315a7d13	tcp: support externally provided ubufs Teach tcp how to use external ubuf_info provided in msghdr and also prepare it for managed frags by sprinkling skb_zcopy_downgrade_managed() when it could mix managed and not managed frags. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-19 14:20:54 -07:00
Pavel Begunkov	1fd3ae8c90	ipv6/udp: support externally provided ubufs Teach ipv6/udp how to use external ubuf_info provided in msghdr and also prepare it for managed frags by sprinkling skb_zcopy_downgrade_managed() when it could mix managed and not managed frags. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-19 14:20:54 -07:00
Pavel Begunkov	c445f31b3c	ipv4/udp: support externally provided ubufs Teach ipv4/udp how to use external ubuf_info provided in msghdr and also prepare it for managed frags by sprinkling skb_zcopy_downgrade_managed() when it could mix managed and not managed frags. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-19 14:20:54 -07:00
Pavel Begunkov	753f1ca4e1	net: introduce managed frags infrastructure Some users like io_uring can do page pinning more efficiently, so we want a way to delegate referencing to other subsystems. For that add a new flag called SKBFL_MANAGED_FRAG_REFS. When set, skb doesn't hold page references and upper layers are responsivle to managing page lifetime. It's allowed to convert skbs from managed to normal by calling skb_zcopy_downgrade_managed(). The function will take all needed page references and clear the flag. It's needed, for instance, to avoid mixing managed modes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-19 14:20:54 -07:00
David Ahern	ebe73a284f	net: Allow custom iter handler in msghdr Add support for custom iov_iter handling to msghdr. The idea is that in-kernel subsystems want control over how an SG is split. Signed-off-by: David Ahern <dsahern@kernel.org> [pavel: move callback into msghdr] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-19 14:20:54 -07:00
Pavel Begunkov	7c701d92b2	skbuff: carry external ubuf_info in msghdr Make possible for network in-kernel callers like io_uring to pass in a custom ubuf_info by setting it in a new field of struct msghdr. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-19 14:20:42 -07:00
Zhengchao Shao	fd18942244	bpf: Don't redirect packets with invalid pkt_len Syzbot found an issue [1]: fq_codel_drop() try to drop a flow whitout any skbs, that is, the flow->head is null. The root cause, as the [2] says, is because that bpf_prog_test_run_skb() run a bpf prog which redirects empty skbs. So we should determine whether the length of the packet modified by bpf prog or others like bpf_prog_test is valid before forwarding it directly. LINK: [1] https://syzkaller.appspot.com/bug?id=0b84da80c2917757915afa89f7738a9d16ec96c5 LINK: [2] https://www.spinics.net/lists/netdev/msg777503.html Reported-by: syzbot+7a12909485b94426aceb@syzkaller.appspotmail.com Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220715115559.139691-1-shaozhengchao@huawei.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-07-19 09:50:54 -07:00
Vladimir Oltean	1699b4d502	net: dsa: fix NULL pointer dereference in dsa_port_reset_vlan_filtering The "ds" iterator variable used in dsa_port_reset_vlan_filtering() -> dsa_switch_for_each_port() overwrites the "dp" received as argument, which is later used to call dsa_port_vlan_filtering() proper. As a result, switches which do enter that code path (the ones with vlan_filtering_is_global=true) will dereference an invalid dp in dsa_port_reset_vlan_filtering() after leaving a VLAN-aware bridge. Use a dedicated "other_dp" iterator variable to avoid this from happening. Fixes: `d0004a020b` ("net: dsa: remove the "dsa_to_port in a loop" antipattern from the core") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 20:14:23 -07:00
Vladimir Oltean	4db2a5ef4c	net: dsa: fix dsa_port_vlan_filtering when global The blamed refactoring commit changed a "port" iterator with "other_dp", but still looked at the slave_dev of the dp outside the loop, instead of other_dp->slave from the loop. As a result, dsa_port_vlan_filtering() would not call dsa_slave_manage_vlan_filtering() except for the port in cause, and not for all switch ports as expected. Fixes: `d0004a020b` ("net: dsa: remove the "dsa_to_port in a loop" antipattern from the core") Reported-by: Lucian Banu <Lucian.Banu@westermo.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 20:14:23 -07:00
Jiri Pirko	f655dacb59	net: devlink: remove unused locked functions Remove locked versions of functions that are no longer used by anyone. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 20:10:48 -07:00
Jiri Pirko	012ec02ae4	netdevsim: convert driver to use unlocked devlink API during init/fini Prepare for devlink reload being called with devlink->lock held and convert the netdevsim driver to use unlocked devlink API during init and fini flows. Take devl_lock() in reload_down() and reload_up() ops in the meantime before reload cmd is converted to take the lock itself. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 20:10:48 -07:00
Jiri Pirko	eb0e9fa2c6	net: devlink: add unlocked variants of devlink_region_create/destroy() functions Add unlocked variants of devlink_region_create/destroy() functions to be used in drivers called-in with devlink->lock held. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 20:10:48 -07:00
Jiri Pirko	70a2ff8936	net: devlink: add unlocked variants of devlink_dpipe() functions Add unlocked variants of devlink_dpipe() functions to be used in drivers called-in with devlink->lock held. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 20:10:47 -07:00
Jiri Pirko	755cfa69c4	net: devlink: add unlocked variants of devlink_sb() functions Add unlocked variants of devlink_sb() functions to be used in drivers called-in with devlink->lock held. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 20:10:47 -07:00
Jiri Pirko	c223d6a4bf	net: devlink: add unlocked variants of devlink_resource() functions Add unlocked variants of devlink_resource() functions to be used in drivers called-in with devlink->lock held. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 20:10:46 -07:00
Jiri Pirko	852e85a704	net: devlink: add unlocked variants of devling_trap() functions Add unlocked variants of devl_trap() functions to be used in drivers called-in with devlink->lock held. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 20:10:46 -07:00
Moshe Shemesh	e26fde2f5b	net: devlink: avoid false DEADLOCK warning reported by lockdep Add a lock_class_key per devlink instance to avoid DEADLOCK warning by lockdep, while locking more than one devlink instance in driver code, for example in opening VFs flow. Kernel log: [ 101.433802] ============================================ [ 101.433803] WARNING: possible recursive locking detected [ 101.433810] 5.19.0-rc1+ #35 Not tainted [ 101.433812] -------------------------------------------- [ 101.433813] bash/892 is trying to acquire lock: [ 101.433815] ffff888127bfc2f8 (&devlink->lock){+.+.}-{3:3}, at: probe_one+0x3c/0x690 [mlx5_core] [ 101.433909] but task is already holding lock: [ 101.433910] ffff888118f4c2f8 (&devlink->lock){+.+.}-{3:3}, at: mlx5_core_sriov_configure+0x62/0x280 [mlx5_core] [ 101.433989] other info that might help us debug this: [ 101.433990] Possible unsafe locking scenario: [ 101.433991] CPU0 [ 101.433991] ---- [ 101.433992] lock(&devlink->lock); [ 101.433993] lock(&devlink->lock); [ 101.433995] * DEADLOCK * [ 101.433996] May be due to missing lock nesting notation [ 101.433996] 6 locks held by bash/892: [ 101.433998] #0: ffff88810eb50448 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0xf3/0x1d0 [ 101.434009] #1: ffff888114777c88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x20d/0x520 [ 101.434017] #2: ffff888102b58660 (kn->active#231){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x230/0x520 [ 101.434023] #3: ffff888102d70198 (&dev->mutex){....}-{3:3}, at: sriov_numvfs_store+0x132/0x310 [ 101.434031] #4: ffff888118f4c2f8 (&devlink->lock){+.+.}-{3:3}, at: mlx5_core_sriov_configure+0x62/0x280 [mlx5_core] [ 101.434108] #5: ffff88812adce198 (&dev->mutex){....}-{3:3}, at: __device_attach+0x76/0x430 [ 101.434116] stack backtrace: [ 101.434118] CPU: 5 PID: 892 Comm: bash Not tainted 5.19.0-rc1+ #35 [ 101.434120] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 101.434130] Call Trace: [ 101.434133] <TASK> [ 101.434135] dump_stack_lvl+0x57/0x7d [ 101.434145] __lock_acquire.cold+0x1df/0x3e7 [ 101.434151] ? register_lock_class+0x1880/0x1880 [ 101.434157] lock_acquire+0x1c1/0x550 [ 101.434160] ? probe_one+0x3c/0x690 [mlx5_core] [ 101.434229] ? lockdep_hardirqs_on_prepare+0x400/0x400 [ 101.434232] ? __xa_alloc+0x1ed/0x2d0 [ 101.434236] ? ksys_write+0xf3/0x1d0 [ 101.434239] __mutex_lock+0x12c/0x14b0 [ 101.434243] ? probe_one+0x3c/0x690 [mlx5_core] [ 101.434312] ? probe_one+0x3c/0x690 [mlx5_core] [ 101.434380] ? devlink_alloc_ns+0x11b/0x910 [ 101.434385] ? mutex_lock_io_nested+0x1320/0x1320 [ 101.434388] ? lockdep_init_map_type+0x21a/0x7d0 [ 101.434391] ? lockdep_init_map_type+0x21a/0x7d0 [ 101.434393] ? __init_swait_queue_head+0x70/0xd0 [ 101.434397] probe_one+0x3c/0x690 [mlx5_core] [ 101.434467] pci_device_probe+0x1b4/0x480 [ 101.434471] really_probe+0x1e0/0xaa0 [ 101.434474] __driver_probe_device+0x219/0x480 [ 101.434478] driver_probe_device+0x49/0x130 [ 101.434481] __device_attach_driver+0x1b8/0x280 [ 101.434484] ? driver_allows_async_probing+0x140/0x140 [ 101.434487] bus_for_each_drv+0x123/0x1a0 [ 101.434489] ? bus_for_each_dev+0x1a0/0x1a0 [ 101.434491] ? lockdep_hardirqs_on_prepare+0x286/0x400 [ 101.434494] ? trace_hardirqs_on+0x2d/0x100 [ 101.434498] __device_attach+0x1a3/0x430 [ 101.434501] ? device_driver_attach+0x1e0/0x1e0 [ 101.434503] ? pci_bridge_d3_possible+0x1e0/0x1e0 [ 101.434506] ? pci_create_resource_files+0xeb/0x190 [ 101.434511] pci_bus_add_device+0x6c/0xa0 [ 101.434514] pci_iov_add_virtfn+0x9e4/0xe00 [ 101.434517] ? trace_hardirqs_on+0x2d/0x100 [ 101.434521] sriov_enable+0x64a/0xca0 [ 101.434524] ? pcibios_sriov_disable+0x10/0x10 [ 101.434528] mlx5_core_sriov_configure+0xab/0x280 [mlx5_core] [ 101.434602] sriov_numvfs_store+0x20a/0x310 [ 101.434605] ? sriov_totalvfs_show+0xc0/0xc0 [ 101.434608] ? sysfs_file_ops+0x170/0x170 [ 101.434611] ? sysfs_file_ops+0x117/0x170 [ 101.434614] ? sysfs_file_ops+0x170/0x170 [ 101.434616] kernfs_fop_write_iter+0x348/0x520 [ 101.434619] new_sync_write+0x2e5/0x520 [ 101.434621] ? new_sync_read+0x520/0x520 [ 101.434624] ? lock_acquire+0x1c1/0x550 [ 101.434626] ? lockdep_hardirqs_on_prepare+0x400/0x400 [ 101.434630] vfs_write+0x5cb/0x8d0 [ 101.434633] ksys_write+0xf3/0x1d0 [ 101.434635] ? __x64_sys_read+0xb0/0xb0 [ 101.434638] ? lockdep_hardirqs_on_prepare+0x286/0x400 [ 101.434640] ? syscall_enter_from_user_mode+0x1d/0x50 [ 101.434643] do_syscall_64+0x3d/0x90 [ 101.434647] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 101.434650] RIP: 0033:0x7f5ff536b2f7 [ 101.434658] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 [ 101.434661] RSP: 002b:00007ffd9ea85d58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 101.434664] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f5ff536b2f7 [ 101.434666] RDX: 0000000000000002 RSI: 000055c4c279e230 RDI: 0000000000000001 [ 101.434668] RBP: 000055c4c279e230 R08: 000000000000000a R09: 0000000000000001 [ 101.434669] R10: 000055c4c283cbf0 R11: 0000000000000246 R12: 0000000000000002 [ 101.434670] R13: 00007f5ff543d500 R14: 0000000000000002 R15: 00007f5ff543d700 [ 101.434673] </TASK> Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 20:10:46 -07:00
Pavel Begunkov	2e07a521e1	skbuff: add SKBFL_DONT_ORPHAN flag We don't want to list every single ubuf_info callback in skb_orphan_frags(), add a flag controlling the behaviour. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 19:58:45 -07:00
Pavel Begunkov	1b4b2b09d4	skbuff: don't mix ubuf_info from different sources We should not append MSG_ZEROCOPY requests to skbuff with non MSG_ZEROCOPY ubuf_info, they might be not compatible. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 19:58:45 -07:00
Pavel Begunkov	773ba4fe91	ipv6: avoid partial copy for zc Even when zerocopy transmission is requested and possible, __ip_append_data() will still copy a small chunk of data just because it allocated some extra linear space (e.g. 128 bytes). It wastes CPU cycles on copy and iter manipulations and also misalignes potentially aligned data. Avoid such copies. And as a bonus we can allocate smaller skb. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 19:58:29 -07:00
Pavel Begunkov	8eb77cc739	ipv4: avoid partial copy for zc Even when zerocopy transmission is requested and possible, __ip_append_data() will still copy a small chunk of data just because it allocated some extra linear space (e.g. 148 bytes). It wastes CPU cycles on copy and iter manipulations and also misalignes potentially aligned data. Avoid such copies. And as a bonus we can allocate smaller skb. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-18 19:58:03 -07:00
Tetsuo Handa	3598cb6e18	wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop() lockdep complains use of uninitialized spinlock at ieee80211_do_stop() [1], for commit `f856373e2f` ("wifi: mac80211: do not wake queues on a vif that is being stopped") guards clear_bit() using fq.lock even before fq_init() from ieee80211_txq_setup_flows() initializes this spinlock. According to discussion [2], Toke was not happy with expanding usage of fq.lock. Since __ieee80211_wake_txqs() is called under RCU read lock, we can instead use synchronize_rcu() for flushing ieee80211_wake_txqs(). Link: https://syzkaller.appspot.com/bug?extid=eceab52db7c4b961e9d6 [1] Link: https://lkml.kernel.org/r/874k0zowh2.fsf@toke.dk [2] Reported-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: `f856373e2f` ("wifi: mac80211: do not wake queues on a vif that is being stopped") Tested-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com> Acked-by: Toke Høiland-Jørgensen <toke@kernel.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/9cc9b81d-75a3-3925-b612-9d0ad3cab82b@I-love.SAKURA.ne.jp	2022-07-18 15:01:14 +03:00
Kuniyuki Iwashima	021266ec64	tcp: Fix data-races around sysctl_tcp_fastopen_blackhole_timeout. While reading sysctl_tcp_fastopen_blackhole_timeout, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `cf1ef3f071` ("net/tcp_fastopen: Disable active side TFO in certain scenarios") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima	5a54213318	tcp: Fix data-races around sysctl_tcp_fastopen. While reading sysctl_tcp_fastopen, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `2100c8d2d9` ("net-tcp: Fast Open base") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima	79539f3474	tcp: Fix data-races around sysctl_max_syn_backlog. While reading sysctl_max_syn_backlog, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima	cbfc649558	tcp: Fix a data-race around sysctl_tcp_tw_reuse. While reading sysctl_tcp_tw_reuse, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima	39e24435a7	tcp: Fix data-races around some timeout sysctl knobs. While reading these sysctl knobs, they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers. - tcp_retries1 - tcp_retries2 - tcp_orphan_retries - tcp_fin_timeout Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima	46778cd16e	tcp: Fix data-races around sysctl_tcp_reordering. While reading sysctl_tcp_reordering, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima	4177f54589	tcp: Fix data-races around sysctl_tcp_migrate_req. While reading sysctl_tcp_migrate_req, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `f9ac779f88` ("net: Introduce net.ipv4.tcp_migrate_req.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima	f2e383b5bb	tcp: Fix data-races around sysctl_tcp_syncookies. While reading sysctl_tcp_syncookies, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima	20a3b1c0f6	tcp: Fix data-races around sysctl_tcp_syn(ack)?_retries. While reading sysctl_tcp_syn(ack)?_retries, they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima	f2f316e287	tcp: Fix data-races around keepalive sysctl knobs. While reading sysctl_tcp_keepalive_(time\|probes\|intvl), they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima	8ebcc62c73	igmp: Fix data-races around sysctl_igmp_qrv. While reading sysctl_igmp_qrv, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. This test can be packed into a helper, so such changes will be in the follow-up series after net is merged into net-next. qrv ?: READ_ONCE(net->ipv4.sysctl_igmp_qrv); Fixes: `a9fe8e2994` ("ipv4: implement igmp_qrv sysctl to tune igmp robustness variable") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima	6ae0f2e553	igmp: Fix data-races around sysctl_igmp_max_msf. While reading sysctl_igmp_max_msf, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 12:21:53 +01:00
Kuniyuki Iwashima	6305d821e3	igmp: Fix a data-race around sysctl_igmp_max_memberships. While reading sysctl_igmp_max_memberships, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 12:21:53 +01:00
Kuniyuki Iwashima	f6da2267e7	igmp: Fix data-races around sysctl_igmp_llm_reports. While reading sysctl_igmp_llm_reports, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. This test can be packed into a helper, so such changes will be in the follow-up series after net is merged into net-next. if (ipv4_is_local_multicast(pmc->multiaddr) && !READ_ONCE(net->ipv4.sysctl_igmp_llm_reports)) Fixes: `df2cf4a78e` ("IGMP: Inhibit reports for local multicast groups") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 12:21:53 +01:00
Tariq Toukan	f08d8c1bb9	net/tls: Fix race in TLS device down flow Socket destruction flow and tls_device_down function sync against each other using tls_device_lock and the context refcount, to guarantee the device resources are freed via tls_dev_del() by the end of tls_device_down. In the following unfortunate flow, this won't happen: - refcount is decreased to zero in tls_device_sk_destruct. - tls_device_down starts, skips the context as refcount is zero, going all the way until it flushes the gc work, and returns without freeing the device resources. - only then, tls_device_queue_ctx_destruction is called, queues the gc work and frees the context's device resources. Solve it by decreasing the refcount in the socket's destruction flow under the tls_device_lock, for perfect synchronization. This does not slow down the common likely destructor flow, in which both the refcount is decreased and the spinlock is acquired, anyway. Fixes: `e8f6979981` ("net/tls: Add generic NIC offload infrastructure") Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-18 11:33:07 +01:00

1 2 3 4 5 ...

70062 Commits