linux

Author	SHA1	Message	Date
Sebastian Andrzej Siewior	3fb4430e73	net: caif: Use netif_rx(). Since commit `baebdf48c3` ("net: dev: Makes sure netif_rx() can be invoked in any context.") the function netif_rx() can be used in preemptible/thread context as well as in interrupt context. Use netif_rx(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-04 12:02:19 +00:00
Sebastian Andrzej Siewior	4343b866aa	net: sgi-xp: Use netif_rx(). Since commit `baebdf48c3` ("net: dev: Makes sure netif_rx() can be invoked in any context.") the function netif_rx() can be used in preemptible/thread context as well as in interrupt context. Use netif_rx(). Cc: Robin Holt <robinmholt@gmail.com> Cc: Steve Wahl <steve.wahl@hpe.com> Cc: Mike Travis <mike.travis@hpe.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-04 12:02:19 +00:00
Sebastian Andrzej Siewior	aa4e5761bf	net: xtensa: Use netif_rx(). Since commit `baebdf48c3` ("net: dev: Makes sure netif_rx() can be invoked in any context.") the function netif_rx() can be used in preemptible/thread context as well as in interrupt context. Use netif_rx(). Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: linux-xtensa@linux-xtensa.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-04 12:02:19 +00:00
Sebastian Andrzej Siewior	21f95a88ea	docs: networking: Use netif_rx(). Since commit `baebdf48c3` ("net: dev: Makes sure netif_rx() can be invoked in any context.") the function netif_rx() can be used in preemptible/thread context as well as in interrupt context. Use netif_rx(). Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-04 12:02:19 +00:00
David S. Miller	4ee508ff78	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-03-03 Jacob Keller says: This series refactors the ice networking driver VF storage from a simple static array to a hash table. It also introduces krefs and proper locking and protection to prevent common use-after-free and concurrency issues. There are two motivations for this work. First is to make the ice driver more resilient by preventing a whole class of use-after-free bugs that can occur around concurrent access to VF structures while removing VFs. The second is to prepare the ice driver for future virtualization work to support Scalable IOV, an alternative VF implementation compared to Single Root IOV. The new VF implementation will allow for more dynamic VF creation and removal, necessitating a more robust implementation for VF storage that can't rely on the existing mechanisms to prevent concurrent access violations. The first few patches are cleanup and preparatory work needed to make the conversion to the hash table safe. Following this preparatory work is a patch to migrate the VF structures and variables to a new sub-structure for code clarity. Next introduce new interface functions to abstract the VF storage. Finally, the driver is actually converted to the hash table and kref implementation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-04 09:45:51 +00:00
David S. Miller	f2ecfa06af	Merge branch 'ocelot-felix-cleanups' Vladimir Oltean says: ==================== Cleanups for ocelot/felix drivers This patch set is an assorted collection of minor cleanups brought to the felix DSA driver and ocelot switch library. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-04 09:21:24 +00:00
Vladimir Oltean	162fbf6a2f	net: dsa: felix: remove redundant assignment in felix_8021q_cpu_port_deinit Due to an apparently incorrect conflict resolution on my part in commit `54c3198460` ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware"), "ocelot->ports[port]->is_dsa_8021q_cpu = false" was supposed to be replaced by "ocelot_port_unset_dsa_8021q_cpu(ocelot, port)" which does the same thing, and more. But now we have both, so the direct assignment is redundant. Remove it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-04 09:21:23 +00:00
Vladimir Oltean	5d3bb7dda4	net: dsa: felix: print error message in felix_check_xtr_pkt() Packet extraction failures over register-based MMIO are silent, and difficult to pinpoint. Add an error message to remedy this. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-04 09:21:23 +00:00
Vladimir Oltean	dbd032856b	net: dsa: felix: initialize "err" to 0 in felix_check_xtr_pkt() Automated tools complain that felix_check_xtr_pkt() has logic to drain the CPU queue on the reception of a PTP packet over Ethernet, yet it returns an uninitialized error code in the case where the CPU queue was empty. This is not likely to happen (/possible if hardware works correctly), but it isn't a fatal condition either. The PTP packet will be dequeued from the CPU queue when the next PTP packet arrives. So initialize "err" to 0 for the case where nothing was dequeued during this iteration. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-04 09:21:23 +00:00
Vladimir Oltean	d219b4b674	net: dsa: felix: drop the ptp_type argument from felix_check_xtr_pkt() The DSA ->port_rxtstamp() function is never called for PTP_CLASS_NONE: dsa_skb_defer_rx_timestamp: if (type == PTP_CLASS_NONE) return false; if (likely(ds->ops->port_rxtstamp)) return ds->ops->port_rxtstamp(ds, p->dp->index, skb, type); So practically, the argument is unused, so remove it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-04 09:21:23 +00:00
Vladimir Oltean	28c1305b0b	net: dsa: felix: remove ocelot->npi assignment from felix_8021q_cpu_port_init This assignment is redundant, since ocelot->npi has already been set to -1 by felix_npi_port_deinit(). Call path: felix_change_tag_protocol -> felix_del_tag_protocol(DSA_TAG_PROTO_OCELOT) -> felix_teardown_tag_npi -> felix_npi_port_deinit -> felix_set_tag_protocol(DSA_TAG_PROTO_OCELOT_8021Q) -> felix_setup_tag_8021q -> felix_8021q_cpu_port_init Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-04 09:21:23 +00:00
Vladimir Oltean	c3cde44f3c	net: mscc: ocelot: use pretty names for IPPROTO_UDP and IPPROTO_TCP Hardcoding these IP protocol numbers in is2_entry_set() obscures the purpose of the code, so replace the magic numbers with the definitions from linux/in.h. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-04 09:21:23 +00:00
Vladimir Oltean	c5a0edaeb9	net: mscc: ocelot: use list_for_each_entry in ocelot_vcap_block_remove_filter Simplify ocelot_vcap_block_remove_filter by using list_for_each_entry instead of list_for_each. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-04 09:21:23 +00:00
Dust Li	f9f52c3474	net/smc: fix document build WARNING from smc-sysctl.rst Stephen reported the following warning messages from smc-sysctl.rst Documentation/networking/smc-sysctl.rst:3: WARNING: Title overline too short. Documentation/networking/smc-sysctl.rst: WARNING: document isn't included in any toctree Fix the title overline and add smc-sysctl entry into Documentation/networking/index.rst Fixes: `12bbb0d163` ("net/smc: add sysctl for autocorking") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Link: https://lore.kernel.org/r/20220303113527.62047-1-dust.li@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-03 21:24:34 -08:00
Haowen Bai	2f5e65de04	net: marvell: Use min() instead of doing it manually Fix following coccicheck warning: drivers/net/ethernet/marvell/mv643xx_eth.c:1664:35-36: WARNING opportunity for min() Signed-off-by: Haowen Bai <baihaowen88@gmail.com> Link: https://lore.kernel.org/r/1646271529-7659-1-git-send-email-baihaowen88@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-03 21:06:43 -08:00
Jacob Keller	3d5985a185	ice: convert VF storage to hash table with krefs and RCU The ice driver stores VF structures in a simple array which is allocated once at the time of VF creation. The VF structures are then accessed from the array by their VF ID. The ID must be between 0 and the number of allocated VFs. Multiple threads can access this table: * .ndo operations such as .ndo_get_vf_cfg or .ndo_set_vf_trust * interrupts, such as due to messages from the VF using the virtchnl communication * processing such as device reset * commands to add or remove VFs The current implementation does not keep track of when all threads are done operating on a VF and can potentially result in use-after-free issues caused by one thread accessing a VF structure after it has been released when removing VFs. Some of these are prevented with various state flags and checks. In addition, this structure is quite static and does not support a planned future where virtualization can be more dynamic. As we begin to look at supporting Scalable IOV with the ice driver (as opposed to just supporting Single Root IOV), this structure is not sufficient. In the future, VFs will be able to be added and removed individually and dynamically. To allow for this, and to better protect against a whole class of use-after-free bugs, replace the VF storage with a combination of a hash table and krefs to reference track all of the accesses to VFs through the hash table. A hash table still allows efficient look up of the VF given its ID, but also allows adding and removing VFs. It does not require contiguous VF IDs. The use of krefs allows the cleanup of the VF memory to be delayed until after all threads have released their reference (by calling ice_put_vf). To prevent corruption of the hash table, a combination of RCU and the mutex table_lock are used. Addition and removal from the hash table use the RCU-aware hash macros. This allows simple read-only look ups that iterate to locate a single VF can be fast using RCU. Accesses which modify the hash table, or which can't take RCU because they sleep, will hold the mutex lock. By using this design, we have a stronger guarantee that the VF structure can't be released until after all threads are finished operating on it. We also pave the way for the more dynamic Scalable IOV implementation in the future. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-03-03 11:57:18 -08:00
Jakub Kicinski	80901bff81	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net net/batman-adv/hard-interface.c commit `690bb6fb64` ("batman-adv: Request iflink once in batadv-on-batadv check") commit `6ee3c393ee` ("batman-adv: Demote batadv-on-batadv skip error message") https://lore.kernel.org/all/20220302163049.101957-1-sw@simonwunderlich.de/ net/smc/af_smc.c commit `4d08b7b57e` ("net/smc: Fix cleanup when register ULP fails") commit `462791bbfa` ("net/smc: add sysctl interface for SMC") https://lore.kernel.org/all/20220302112209.355def40@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-03 11:55:12 -08:00
Linus Torvalds	b949c21fc2	Networking fixes for 5.17-rc7, including fixes from can, xfrm, wifi, bluetooth, and netfilter. Current release - regressions: - iwlwifi: don't advertise TWT support, prevent FW crash - xfrm: fix the if_id check in changelink - xen/netfront: destroy queues before real_num_tx_queues is zeroed - bluetooth: fix not checking MGMT cmd pending queue, make scanning work again Current release - new code bugs: - mptcp: make SIOCOUTQ accurate for fallback socket - bluetooth: access skb->len after null check - bluetooth: hci_sync: fix not using conn_timeout - smc: fix cleanup when register ULP fails - dsa: restore error path of dsa_tree_change_tag_proto - iwlwifi: fix build error for IWLMEI - iwlwifi: mvm: propagate error from request_ownership to the user Previous releases - regressions: - xfrm: fix pMTU regression when reported pMTU is too small - xfrm: fix TCP MSS calculation when pMTU is close to 1280 - bluetooth: fix bt_skb_sendmmsg not allocating partial chunks - ipv6: ensure we call ipv6_mc_down() at most once, prevent leaks - ipv6: prevent leaks in igmp6 when input queues get full - fix up skbs delta_truesize in UDP GRO frag_list - eth: e1000e: fix possible HW unit hang after an s0ix exit - eth: e1000e: correct NVM checksum verification flow - ptp: ocp: fix large time adjustments Previous releases - always broken: - tcp: make tcp_read_sock() more robust in presence of urgent data - xfrm: distinguishing SAs and SPs by if_id in xfrm_migrate - xfrm: fix xfrm_migrate issues when address family changes - dcb: flush lingering app table entries for unregistered devices - smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error - mac80211: fix EAPoL rekey fail in 802.3 rx path - mac80211: fix forwarded mesh frames AC & queue selection - netfilter: nf_queue: fix socket access races and bugs - batman-adv: fix ToCToU iflink problems and check the result belongs to the expected net namespace - can: gs_usb, etas_es58x: fix opened_channel_cnt's accounting - can: rcar_canfd: register the CAN device when fully ready - eth: igb, igc: phy: drop premature return leaking HW semaphore - eth: ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc(), prevent live lock when link goes down - eth: stmmac: only enable DMA interrupts when ready - eth: sparx5: move vlan checks before any changes are made - eth: iavf: fix races around init, removal, resets and vlan ops - ibmvnic: more reset flow fixes Misc: - eth: fix return value of __setup handlers Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmIhAo4ACgkQMUZtbf5S IrtLdQ/+LstJ6PQpRVa9Pu68Vu0pjwg0FiuBt/C4G//N46uvOvo5ub+Lx0JajZMt m4FBEFji2AXnfvbV/4WIZw+slcEfn2r1oprh0aqS5Ba+s3gQMbcl1C5daXcf7Tte tLFSVsxPBl2AEthps4YFSyMQyczrwVry20uBBgswTkDfyrN4uSLFBKmVQscEsRtQ dxt2AbxazStJM60Q+PI9Zfru3bXGEFgaG07z8RnTTvIJQFpYYsFUMIsee+30GYdc nQRAvrPwFBcSdwzaDf2WLe26MalJ1r7fXe9Mta1IMBFc/e0/8BQWZ6DT8x5n/snc gRJRL37E6V6QCtf80GLR7wR9/NkOxckeva3Z2yt6lOyUkVu4FStFA71iF4g9zH2W GfGw8ejD++suGR+YRqA8ou1vR69to+Q2M8VP+m75sdI0XU61oGquSPOUKyGQJOfx ndCtVW82FaAnQDTs0OBAdliPTCLkTONl0Bezr7htyAiEb8dcNMZESg/szabI+mZS ZKMu+rtw5DFMUsFx0ihAj5vE9mmbnsm/b1Mj+WjziOAD00p/WGu64ot7tMTJtv9B zkVNDbYwg1pFNIeiF/2FRXtsELad6VtUQJL2GQ0vkwas5jXAmymrtjuo5iiHZ9nR Oo9OdwhIFHYWbWTkGMW45uKX3MWTwz5Wne/xGiTpVpke7zhgTso= =TqqZ -----END PGP SIGNATURE----- Merge tag 'net-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from can, xfrm, wifi, bluetooth, and netfilter. Lots of various size fixes, the length of the tag speaks for itself. Most of the 5.17-relevant stuff comes from xfrm, wifi and bt trees which had been lagging as you pointed out previously. But there's also a larger than we'd like portion of fixes for bugs from previous releases. Three more fixes still under discussion, including and xfrm revert for uAPI error. Current release - regressions: - iwlwifi: don't advertise TWT support, prevent FW crash - xfrm: fix the if_id check in changelink - xen/netfront: destroy queues before real_num_tx_queues is zeroed - bluetooth: fix not checking MGMT cmd pending queue, make scanning work again Current release - new code bugs: - mptcp: make SIOCOUTQ accurate for fallback socket - bluetooth: access skb->len after null check - bluetooth: hci_sync: fix not using conn_timeout - smc: fix cleanup when register ULP fails - dsa: restore error path of dsa_tree_change_tag_proto - iwlwifi: fix build error for IWLMEI - iwlwifi: mvm: propagate error from request_ownership to the user Previous releases - regressions: - xfrm: fix pMTU regression when reported pMTU is too small - xfrm: fix TCP MSS calculation when pMTU is close to 1280 - bluetooth: fix bt_skb_sendmmsg not allocating partial chunks - ipv6: ensure we call ipv6_mc_down() at most once, prevent leaks - ipv6: prevent leaks in igmp6 when input queues get full - fix up skbs delta_truesize in UDP GRO frag_list - eth: e1000e: fix possible HW unit hang after an s0ix exit - eth: e1000e: correct NVM checksum verification flow - ptp: ocp: fix large time adjustments Previous releases - always broken: - tcp: make tcp_read_sock() more robust in presence of urgent data - xfrm: distinguishing SAs and SPs by if_id in xfrm_migrate - xfrm: fix xfrm_migrate issues when address family changes - dcb: flush lingering app table entries for unregistered devices - smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error - mac80211: fix EAPoL rekey fail in 802.3 rx path - mac80211: fix forwarded mesh frames AC & queue selection - netfilter: nf_queue: fix socket access races and bugs - batman-adv: fix ToCToU iflink problems and check the result belongs to the expected net namespace - can: gs_usb, etas_es58x: fix opened_channel_cnt's accounting - can: rcar_canfd: register the CAN device when fully ready - eth: igb, igc: phy: drop premature return leaking HW semaphore - eth: ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc(), prevent live lock when link goes down - eth: stmmac: only enable DMA interrupts when ready - eth: sparx5: move vlan checks before any changes are made - eth: iavf: fix races around init, removal, resets and vlan ops - ibmvnic: more reset flow fixes Misc: - eth: fix return value of __setup handlers" * tag 'net-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits) ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report() net: dsa: make dsa_tree_change_tag_proto actually unwind the tag proto change ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc() selftests: mlxsw: resource_scale: Fix return value selftests: mlxsw: tc_police_scale: Make test more robust net: dcb: disable softirqs in dcbnl_flush_dev() bnx2: Fix an error message sfc: extend the locking on mcdi->seqno net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe() tcp: make tcp_read_sock() more robust bpf, sockmap: Do not ignore orig_len parameter net: ipa: add an interconnect dependency net: fix up skbs delta_truesize in UDP GRO frag_list iwlwifi: mvm: return value for request_ownership nl80211: Update bss channel on channel switch for P2P_CLIENT iwlwifi: fix build error for IWLMEI ptp: ocp: Add ptp_ocp_adjtime_coarse for large adjustments batman-adv: Don't expect inter-netns unique iflink indices ...	2022-03-03 11:10:56 -08:00
Jacob Keller	fb916db1f0	ice: introduce VF accessor functions Before we switch the VF data structure storage mechanism to a hash, introduce new accessor functions to define the new interface. * ice_get_vf_by_id is a function used to obtain a reference to a VF from the table based on its VF ID * ice_has_vfs is used to quickly check if any VFs are configured * ice_get_num_vfs is used to get an exact count of how many VFs are configured We can drop the old ice_validate_vf_id function, since every caller was just going to immediately access the VF table to get a reference anyways. This way we simply use the single ice_get_vf_by_id to both validate the VF ID is within range and that there exists a VF with that ID. This change enables us to more easily convert the codebase to the hash table since most callers now properly use the interface. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-03-03 10:54:33 -08:00
Jacob Keller	000773c00f	ice: factor VF variables to separate structure We maintain a number of values for VFs within the ice_pf structure. This includes the VF table, the number of allocated VFs, the maximum number of supported SR-IOV VFs, the number of queue pairs per VF, the number of MSI-X vectors per VF, and a bitmap of the VFs with detected MDD events. We're about to add a few more variables to this list. Clean this up first by extracting these members out into a new ice_vfs structure defined in ice_virtchnl_pf.h Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-03-03 10:54:29 -08:00
Linus Torvalds	e58bd49da6	- Fix memory detection for MT7621 devices - Fix setnocoherentio kernel option - Fix warning when CONFIG_SCHED_CORE is enabled -----BEGIN PGP SIGNATURE----- iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAmIg8MAaHHRzYm9nZW5k QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHDWNA/+I9PXwkDfsis1bFWDC4tE FI7IQWPN4Cm03gVzuWIZqMzYicesvF8Kg5GmAWv0NPuuRa3wHAFqhvvm339kELXq YHVtcNUW4vMe5SEmKXZeL4S2k4tW/D7IZwwSgSsaAg4APz2fyJTogprnkb7crRi/ C82rONLj+ksePyG/duEfDa78l7dNrT127Jxw3H161XB+oYweEsi3SrXrOMo7lzel iZ4VMa5E63DSCxzSzMKQvj5YKr/2JFozVeKJe0X+zREbASzAabH2TIQUVOSUBzPq Bhv/4kxgUlZVDVsS18DcGI3mC9z9jH9rCmveUVjNofshqZpNiqFdAxyjU7jra4qc o4+Rk/O9lQXRQVMBMuzgEadGdC9yuBVjRiMXAQgPbNRQwMy7xfqiDpRZJkE99+VR HXikcKuIF1CZn33RgHmSGnE5QfBSol/9yrD3dC+439zi3l5fy3dkMJb/lNs/bWDi rW7pZ4d1p+DWZHmF8p5w0rWZzhbN+yXhjUM9j5Urs3l7dTRnSXJQ4Se6l5Nsmr9O s7a+HQYYZgrG/xPfSBExiBLVwg3JOkHC+HzMQPJQPRHAApxzYMIeDEh5MUIu+qVN /ShKPjcF+fsZIjqKmL+n6Wy8KZOCkrDXc4I3TiAfLUtjhoJc9yiCz+avhIYuaCON Dn3/wpfCNwYVM/DfBoKxcZ0= =VOCV -----END PGP SIGNATURE----- Merge tag 'mips-fixes-5.17_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Thomas Bogendoerfer: - Fix memory detection for MT7621 devices - Fix setnocoherentio kernel option - Fix warning when CONFIG_SCHED_CORE is enabled * tag 'mips-fixes-5.17_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: ralink: mt7621: use bitwise NOT instead of logical mips: setup: fix setnocoherentio() boolean setting MIPS: smp: fill in sibling and core maps earlier MIPS: ralink: mt7621: do memory detection on KSEG1	2022-03-03 10:38:28 -08:00
Linus Torvalds	4d5ae2340d	A few `lcd2s` fixes for `auxdisplay` from Andy Shevchenko. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmIgrbgACgkQGXyLc2ht IW2v7Q//YHd66M2nFmvfchfIS46x1kXeJo4hlTqJSnAZ2kWzafMMBcFaFb2rtlC6 Uqe5vEJiqCmOGv45p0s5GycIXbDlcrlAIhMu+FOrbYQmyuCT7ysSAOQdfn9j8Pwl Px5pc/UmeZrcz+WFkvyPvkLUjDgoH5dLt0rK7ZBO2MxTJUoFfZVZ/WhEuhT7JyKA aP79AYPbwXbvyC4LXuT1umxeuNOQr2npajeovkZ15geWjZ/z0YLQ2XCWvOrkVG0N XCYFezLaCebKzu2IwwYYv0KLmJNfdaHcbE5MDLbYxU5N6Uwaq+JnD+GshiPHqgGM 60TyrJiNx8Qaib9WBoAUXsGtGH8Bohwuyra89nA8jz6dWN3ko9uE6iJAmSY38Iz7 li4553fwSeRt9GDbNckqYO5jEmpUtIaX/eDgvxU6uOE05HhwzXAw3A2w+oV46tWO R7cr2sl8u7lCtyyuZQ/ETmqC1y2QlmS6W5pht9jCwcon7I+c2UAyG6yd1Mx+fIhh RixY39sV4oNxJmNibHfueKxXuQaiLxGqRS0KLmKEkmXhyPKfczWaPsic0UaW70QB W+WVhUHlQT4qjxWZkMDPaojOdynptP5qs/s94/2/YSLrC0MGkb1qFI0pMMhcCm4e yQnaSdtlNQ/Z2zdFWUFeGDB5bW+Bo1HnHoJkrt+GQdA8+ITf4qo= =yVdT -----END PGP SIGNATURE----- Merge tag 'auxdisplay-for-linus-v5.17-rc7' of git://github.com/ojeda/linux Pull auxdisplay fixes from Miguel Ojeda: "A few lcd2s fixes from Andy Shevchenko" * tag 'auxdisplay-for-linus-v5.17-rc7' of git://github.com/ojeda/linux: auxdisplay: lcd2s: Use proper API to free the instance of charlcd object auxdisplay: lcd2s: Fix memory leak in ->remove() auxdisplay: lcd2s: Fix lcd2s_redefine_char() feature	2022-03-03 10:31:09 -08:00
Eric Dumazet	2d3916f318	ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report() While investigating on why a synchronize_net() has been added recently in ipv6_mc_down(), I found that igmp6_event_query() and igmp6_event_report() might drop skbs in some cases. Discussion about removing synchronize_net() from ipv6_mc_down() will happen in a different thread. Fixes: `f185de28d9` ("mld: add new workqueues for process mld events") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Taehee Yoo <ap420073@gmail.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220303173728.937869-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-03 09:47:06 -08:00
Jacob Keller	c4c2c7db64	ice: convert ice_for_each_vf to include VF entry iterator The ice_for_each_vf macro is intended to be used to loop over all VFs. The current implementation relies on an iterator that is the index into the VF array in the PF structure. This forces all users to perform a look up themselves. This abstraction forces a lot of duplicate work on callers and leaks the interface implementation to the caller. Replace this with an implementation that includes the VF pointer the primary iterator. This version simplifies callers which just want to iterate over every VF, as they no longer need to perform their own lookup. The "i" iterator value is replaced with a new unsigned int "bkt" parameter, as this will match the necessary interface for replacing the VF array with a hash table. For now, the bkt is the VF ID, but in the future it will simply be the hash bucket index. Document that it should not be treated as a VF ID. This change aims to simplify switching from the array to a hash table. I considered alternative implementations such as an xarray but decided that the hash table was the simplest and most suitable implementation. I also looked at methods to hide the bkt iterator entirely, but I couldn't come up with a feasible solution that worked for hash table iterators. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-03-03 08:46:48 -08:00
Jacob Keller	19281e8668	ice: use ice_for_each_vf for iteration during removal When removing VFs, the driver takes a weird approach of assigning pf->num_alloc_vfs to 0 before iterating over the VFs using a temporary variable. This logic has been in the driver for a long time, and seems to have been carried forward from i40e. We want to refactor the way VFs are stored, and iterating over the data structure without the ice_for_each_vf interface impedes this work. The logic relies on implicitly using the num_alloc_vfs as a sort of "safe guard" for accessing VF data. While this sort of guard makes sense for Single Root IOV where all VFs are added at once, the data structures don't work for VFs which can be added and removed dynamically. We also have a separate state flag, ICE_VF_DEINIT_IN_PROGRESS which is a stronger protection against concurrent removal and access. Avoid the custom tmp iteration and replace it with the standard ice_for_each_vf iterator. Delay the assignment of num_alloc_vfs until after this loop finishes. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-03-03 08:46:48 -08:00
Jacob Keller	59e1f857e3	ice: remove checks in ice_vc_send_msg_to_vf The ice_vc_send_msg_to_vf function is used by the PF to send a response to a VF. This function has overzealous checks to ensure its not passed a NULL VF pointer and to ensure that the passed in struct ice_vf has a valid vf_id sub-member. These checks have existed since commit `1071a8358a` ("ice: Implement virtchnl commands for AVF support") and function as simple sanity checks. We are planning to refactor the ice driver to use a hash table along with appropriate locks in a future refactor. This change will modify how the ice_validate_vf_id function works. Instead of a simple >= check to ensure the VF ID is between some range, it will check the hash table to see if the specified VF ID is actually in the table. This requires that the function properly lock the table to prevent race conditions. The checks may seem ok at first glance, but they don't really provide much benefit. In order for ice_vc_send_msg_to_vf to have these checks fail, the callers must either (1) pass NULL as the VF, (2) construct an invalid VF pointer manually, or (3) be using a VF pointer which becomes invalid after they obtain it properly using ice_get_vf_by_id. For (1), a cursory glance over callers of ice_vc_send_msg_to_vf can show that in most cases the functions already operate assuming their VF pointer is valid, such as by derferencing vf->pf or other members. They obtain the VF pointer by accessing the VF array using the VF ID, which can never produce a NULL value (since its a simple address operation on the array it will not be NULL. The sole exception for (1) is that ice_vc_process_vf_msg will forward a NULL VF pointer to this function as part of its goto error handler logic. This requires some minor cleanup to simply exit immediately when an invalid VF ID is detected (Rather than use the same error flow as the rest of the function). For (2), it is unexpected for a flow to construct a VF pointer manually instead of accessing the VF array. Defending against this is likely to just hide bad programming. For (3), it is definitely true that VF pointers could become invalid, for example if a thread is processing a VF message while the VF gets removed. However, the correct solution is not to add additional checks like this which do not guarantee to prevent the race. Instead we plan to solve the root of the problem by preventing the possibility entirely. This solution will require the change to a hash table with proper locking and reference counts of the VF structures. When this is done, ice_validate_vf_id will require locking of the hash table. This will be problematic because all of the callers of ice_vc_send_msg_to_vf will already have to take the lock to obtain the VF pointer anyways. With a mutex, this leads to a double lock that could hang the kernel thread. Avoid this by removing the checks which don't provide much value, so that we can safely add the necessary protections properly. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-03-03 08:46:48 -08:00
Jacob Keller	44efe75f73	ice: move VFLR acknowledge during ice_free_vfs After removing all VFs, the driver clears the VFLR indication for VFs. This has been in ice since the beginning of SR-IOV support in the ice driver. The implementation was copied from i40e, and the motivation for the VFLR indication clearing is described in the commit `f7414531a0` ("i40e: acknowledge VFLR when disabling SR-IOV") The commit explains that we need to clear the VFLR indication because the virtual function undergoes a VFLR event. If we don't indicate that it is complete it can cause an issue when VFs are re-enabled due to a "phantom" VFLR. The register block read was added under a pci_vfs_assigned check originally. This was done because we added the check after calling pci_disable_sriov. This was later moved to disable SRIOV earlier in the flow so that the VF drivers could be torn down before we removed functionality. Move the VFLR acknowledge into the main loop that tears down VF resources. This avoids using the tmp value for iterating over VFs multiple times. The result will make it easier to refactor the VF array in a future change. It's possible we might want to modify this flow to also stop checking pci_vfs_assigned. However, it seems reasonable to keep this change: we should only clear the VFLR if we actually disabled SR-IOV. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-03-03 08:46:48 -08:00
Jacob Keller	294627a67e	ice: move clear_malvf call in ice_free_vfs The ice_mbx_clear_malvf function is used to clear the indication and count of how many times a VF was detected as malicious. During ice_free_vfs, we use this function to ensure that all removed VFs are reset to a clean state. The call currently is done at the end of ice_free_vfs() using a tmp value to iterate over all of the entries in the bitmap. This separate iteration using tmp is problematic for a planned refactor of the VF array data structure. To avoid this, lets move the call slightly higher into the function inside the loop where we teardown all of the VFs. This avoids one use of the tmp value used for iteration. We'll fix the other user in a future change. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-03-03 08:46:48 -08:00
Jacob Keller	cd0f4f3b2c	ice: pass num_vfs to ice_set_per_vf_res() We are planning to replace the simple array structure tracking VFs with a hash table. This change will also remove the "num_alloc_vfs" variable. Instead, new access functions to use the hash table as the source of truth will be introduced. These will generally be equivalent to existing checks, except during VF initialization. Specifically, ice_set_per_vf_res() cannot use the hash table as it will be operating prior to VF structures being inserted into the hash table. Instead of using pf->num_alloc_vfs, simply pass the num_vfs value in from the caller. Note that a sub-function of ice_set_per_vf_res, ice_determine_res, also implicitly depends on pf->num_alloc_vfs. Replace ice_determine_res with a simpler inline implementation based on rounddown_pow_of_two. Note that we must explicitly check that the argument is non-zero since it does not play well with zero as a value. Instead of using the function and while loop, simply calculate the number of queues we have available by dividing by num_vfs. Check if the desired queues are available. If not, round down to the nearest power of 2 that fits within our available queues. This matches the behavior of ice_determine_res but is easier to follow as simple in-line logic. Remove ice_determine_res entirely. With this change, we no longer depend on the pf->num_alloc_vfs during the initialization phase of VFs. This will allow us to safely remove it in a future planned refactor of the VF data structures. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-03-03 08:46:48 -08:00
Jacob Keller	b03d519d34	ice: store VF pointer instead of VF ID The VSI structure contains a vf_id field used to associate a VSI with a VF. This is used mainly for ICE_VSI_VF as well as partially for ICE_VSI_CTRL associated with the VFs. This API was designed with the idea that VFs are stored in a simple array that was expected to be static throughout most of the driver's life. We plan on refactoring VF storage in a few key ways: 1) converting from a simple static array to a hash table 2) using krefs to track VF references obtained from the hash table 3) use RCU to delay release of VF memory until after all references are dropped This is motivated by the goal to ensure that the lifetime of VF structures is accounted for, and prevent various use-after-free bugs. With the existing vsi->vf_id, the reference tracking for VFs would become somewhat convoluted, because each VSI maintains a vf_id field which will then require performing a look up. This means all these flows will require reference tracking and proper usage of rcu_read_lock, etc. We know that the VF VSI will always be backed by a valid VF structure, because the VSI is created during VF initialization and removed before the VF is destroyed. Rely on this and store a reference to the VF in the VSI structure instead of storing a VF ID. This will simplify the usage and avoid the need to perform lookups on the hash table in the future. For ICE_VSI_VF, it is expected that vsi->vf is always non-NULL after ice_vsi_alloc succeeds. Because of this, use WARN_ON when checking if a vsi->vf pointer is valid when dealing with VF VSIs. This will aid in debugging code which violates this assumption and avoid more disastrous panics. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-03-03 08:46:47 -08:00
Jacob Keller	df830543d6	ice: refactor unwind cleanup in eswitch mode The code for supporting eswitch mode and port representors on VFs uses an unwind based cleanup flow when handling errors. These flows are used to cleanup and get everything back to the state prior to attempting to switch from legacy to representor mode or back. The unwind iterations make sense, but complicate a plan to refactor the VF array structure. In the future we won't have a clean method of reversing an iteration of the VFs. Instead, we can change the cleanup flow to just iterate over all VF structures and clean up appropriately. First notice that ice_repr_add_for_all_vfs and ice_repr_rem_from_all_vfs have an additional step of re-assigning the VC ops. There is no good reason to do this outside of ice_repr_add and ice_repr_rem. It can simply be done as the last step of these functions. Second, make sure ice_repr_rem is safe to call on a VF which does not have a representor. Check if vf->repr is NULL first and exit early if so. Move ice_repr_rem_from_all_vfs above ice_repr_add_for_all_vfs so that we can call it from the cleanup function. In ice_eswitch.c, replace the unwind iteration with a call to ice_eswitch_release_reprs. This will go through all of the VFs and revert the VF back to the standard model without the eswitch mode. To make this safe, ensure this function checks whether or not the represent or has been moved. Rely on the metadata destination in vf->repr->dst. This must be NULL if the representor has not been moved to eswitch mode. Ensure that we always re-assign this value back to NULL after freeing it, and move the ice_eswitch_release_reprs so that it can be called from the setup function. With these changes, eswitch cleanup no longer uses an unwind flow that is problematic for the planned VF data structure change. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2022-03-03 08:46:47 -08:00
Vladimir Oltean	e1bec7fa1c	net: dsa: make dsa_tree_change_tag_proto actually unwind the tag proto change The blamed commit said one thing but did another. It explains that we should restore the "return err" to the original "goto out_unwind_tagger", but instead it replaced it with "goto out_unlock". When DSA_NOTIFIER_TAG_PROTO fails after the first switch of a multi-switch tree, the switches would end up not using the same tagging protocol. Fixes: `0b0e2ff103` ("net: dsa: restore error path of dsa_tree_change_tag_proto") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220303154249.1854436-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-03 08:39:12 -08:00
Maciej Fijalkowski	6c7273a266	ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc() Commit `c685c69fba` ("ixgbe: don't do any AF_XDP zero-copy transmit if netif is not OK") addressed the ring transient state when MEM_TYPE_XSK_BUFF_POOL was being configured which in turn caused the interface to through down/up. Maurice reported that when carrier is not ok and xsk_pool is present on ring pair, ksoftirqd will consume 100% CPU cycles due to the constant NAPI rescheduling as ixgbe_poll() states that there is still some work to be done. To fix this, do not set work_done to false for a !netif_carrier_ok(). Fixes: `c685c69fba` ("ixgbe: don't do any AF_XDP zero-copy transmit if netif is not OK") Reported-by: Maurice Baijens <maurice.baijens@ellips.com> Tested-by: Maurice Baijens <maurice.baijens@ellips.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-03 08:26:55 -08:00
Jakub Kicinski	312f2d500a	Merge branch 'selftests-mlxsw-a-couple-of-fixes' Ido Schimmel says: ==================== selftests: mlxsw: A couple of fixes Patch #1 fixes a breakage due to a change in iproute2 output. The real problem is not iproute2, but the fact that the check was not strict enough. Fixed by using JSON output instead. Targeting at net so that the test will pass as part of old and new kernels regardless of iproute2 version. Patch #2 fixes an issue uncovered by the first one. ==================== Link: https://lore.kernel.org/r/20220302161447.217447-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-03 08:14:06 -08:00
Amit Cohen	196f9bc050	selftests: mlxsw: resource_scale: Fix return value The test runs several test cases and is supposed to return an error in case at least one of them failed. Currently, the check of the return value of each test case is in the wrong place, which can result in the wrong return value. For example: # TESTS='tc_police' ./resource_scale.sh TEST: 'tc_police' [default] 968 [FAIL] tc police offload count failed Error: mlxsw_spectrum: Failed to allocate policer index. We have an error talking to the kernel Command failed /tmp/tmp.i7Oc5HwmXY:969 TEST: 'tc_police' [default] overflow 969 [ OK ] ... TEST: 'tc_police' [ipv4_max] overflow 969 [ OK ] $ echo $? 0 Fix this by moving the check to be done after each test case. Fixes: `059b18e21c` ("selftests: mlxsw: Return correct error code in resource scale test") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-03 08:14:01 -08:00
Amit Cohen	dc97520753	selftests: mlxsw: tc_police_scale: Make test more robust The test adds tc filters and checks how many of them were offloaded by grepping for 'in_hw'. iproute2 commit f4cd4f127047 ("tc: add skip_hw and skip_sw to control action offload") added offload indication to tc actions, producing the following output: $ tc filter show dev swp2 ingress ... filter protocol ipv6 pref 1000 flower chain 0 handle 0x7c0 eth_type ipv6 dst_ip 2001:db8:1::7bf skip_sw in_hw in_hw_count 1 action order 1: police 0x7c0 rate 10Mbit burst 100Kb mtu 2Kb action drop overhead 0b ref 1 bind 1 not_in_hw used_hw_stats immediate The current grep expression matches on both 'in_hw' and 'not_in_hw', resulting in incorrect results. Fix that by using JSON output instead. Fixes: `5061e77326` ("selftests: mlxsw: Add scale test for tc-police") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-03 08:14:01 -08:00
Vladimir Oltean	10b6bb62ae	net: dcb: disable softirqs in dcbnl_flush_dev() Ido Schimmel points out that since commit `52cff74eef` ("dcbnl : Disable software interrupts before taking dcb_lock"), the DCB API can be called by drivers from softirq context. One such in-tree example is the chelsio cxgb4 driver: dcb_rpl -> cxgb4_dcb_handle_fw_update -> dcb_ieee_setapp If the firmware for this driver happened to send an event which resulted in a call to dcb_ieee_setapp() at the exact same time as another DCB-enabled interface was unregistering on the same CPU, the softirq would deadlock, because the interrupted process was already holding the dcb_lock in dcbnl_flush_dev(). Fix this unlikely event by using spin_lock_bh() in dcbnl_flush_dev() as in the rest of the dcbnl code. Fixes: `91b0383fef` ("net: dcb: flush lingering app table entries for unregistered devices") Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220302193939.1368823-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-03 08:01:55 -08:00
Christophe JAILLET	8ccffe9ac3	bnx2: Fix an error message Fix an error message and report the correct failing function. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 14:48:40 +00:00
David S. Miller	25bf4df4d1	Merge branch 'ptp-ocp-next' Jonathan Lemon says: ==================== ptp: ocp: TOD and monitoring updates Add a series of patches for monitoring the status of the driver and adjusting TOD handling, especially around leap seconds. Add documentation for the new sysfs nodes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 14:42:46 +00:00
Jonathan Lemon	4db073174f	docs: ABI: Document new timecard sysfs nodes. Add documentation for the tod_correction, clock_status_drift, and clock_status_offset nodes. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 14:42:46 +00:00
Vadim Fedorenko	e68462a0d9	ptp: ocp: adjust utc_tai_offset to TOD info utc_tai_offset is used to correct IRIG, DCF and NMEA outputs and is set during initialisation but is not corrected during leap second announce event. Add watchdog code to control this correction. Signed-off-by: Vadim Fedorenko <vadfed@fb.com> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 14:42:46 +00:00
Vadim Fedorenko	44a412d13b	ptp: ocp: add tod_correction attribute TOD correction register is used to compensate for leap seconds in different domains. Export it as an attribute with write access. Signed-off-by: Vadim Fedorenko <vadfed@fb.com> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 14:42:46 +00:00
Vadim Fedorenko	2f23f486cf	ptp: ocp: Expose clock status drift and offset Monitoring of clock variance could be done through checking the offset and the drift updates that are applied to atomic clocks. Expose these values as attributes for the timecard. Signed-off-by: Vadim Fedorenko <vadfed@fb.com> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 14:42:46 +00:00
Vadim Fedorenko	9f492c4cb2	ptp: ocp: add TOD debug information TOD information is currently displayed only on module load, which doesn't provide updated information as the system runs. Create a debug file which provides the current TOD status information, and move the information display there. Signed-off-by: Vadim Fedorenko <vadfed@fb.com> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 14:42:46 +00:00
David S. Miller	01e2d15796	Merge branch 'skb-mono-delivery-time' Martin KaFai Lau says: ==================== Preserve mono delivery time (EDT) in skb->tstamp skb->tstamp was first used as the (rcv) timestamp. The major usage is to report it to the user (e.g. SO_TIMESTAMP). Later, skb->tstamp is also set as the (future) delivery_time (e.g. EDT in TCP) during egress and used by the qdisc (e.g. sch_fq) to make decision on when the skb can be passed to the dev. Currently, there is no way to tell skb->tstamp having the (rcv) timestamp or the delivery_time, so it is always reset to 0 whenever forwarded between egress and ingress. While it makes sense to always clear the (rcv) timestamp in skb->tstamp to avoid confusing sch_fq that expects the delivery_time, it is a performance issue [0] to clear the delivery_time if the skb finally egress to a fq@phy-dev. This set is to keep the mono delivery time and make it available to the final egress interface. Please see individual patch for the details. [0] (slide 22): https://linuxplumbersconf.org/event/11/contributions/953/attachments/867/1658/LPC_2021_BPF_Datapath_Extensions.pdf v6: - Add kdoc and use non-UAPI type in patch 6 (Jakub) v5: netdev: - Patch 3 in v4 is broken down into smaller patches 3, 4, and 5 in v5 - The mono_delivery_time bit clearing in __skb_tstamp_tx() is done in __net_timestamp() instead. This is patch 4 in v5. - Missed a skb_clear_delivery_time() for the 'skip_classify' case in dev.c in v4. That is fixed in patch 5 in v5 for correctness. The skb_clear_delivery_time() will be moved to a later stage in Patch 10, so it was an intermediate error in v4. - Added delivery time handling for nfnetlink_{log, queue}.c in patch 9 (Daniel) - Added delivery time handling in the IPv6 IOAM hop-by-hop option which has an experimental IANA assigned value 49 in patch 8 - Added delivery time handling in nf_conntrack for the ipv6 defrag case in patch 7 - Removed unlikely() from testing skb->mono_delivery_time (Daniel) bpf: - Remove the skb->tstamp dance in ingress. Depends on bpf insn rewrite to return 0 if skb->tstamp has delivery time in patch 11. It is to backward compatible with the existing tc-bpf@ingress in patch 11. - bpf_set_delivery_time() will also allow dtime == 0 and dtime_type == BPF_SKB_DELIVERY_TIME_NONE as argument in patch 12. v4: netdev: - Push the skb_clear_delivery_time() from ip_local_deliver() and ip6_input() to ip_local_deliver_finish() and ip6_input_finish() to accommodate the ipvs forward path. This is the notable change in v4 at the netdev side. - Patch 3/8 first does the skb_clear_delivery_time() after sch_handle_ingress() in dev.c and this will make the tc-bpf forward path work via the bpf_redirect_*() helper. - The next patch 4/8 (new in v4) will then postpone the skb_clear_delivery_time() from dev.c to the ip_local_deliver_finish() and ip6_input_finish() after taking care of the tstamp usage in the ip defrag case. This will make the kernel forward path also work, e.g. the ip[6]_forward(). - Fixed a case v3 which missed setting the skb->mono_delivery_time bit when sending TCP rst/ack in some cases (e.g. from a ctl_sk). That case happens at ip_send_unicast_reply() and tcp_v6_send_response(). It is fixed in patch 1/8 (and then patch 3/8) in v4. bpf: - Adding __sk_buff->delivery_time_type instead of adding __sk_buff->mono_delivery_time as in v3. The tc-bpf can stay with one __sk_buff->tstamp instead of having two 'time' fields while one is 0 and another is not. tc-bpf can use the new __sk_buff->delivery_time_type to tell what is stored in __sk_buff->tstamp. - bpf_skb_set_delivery_time() helper is added to set __sk_buff->tstamp from non mono delivery_time to mono delivery_time - Most of the convert_ctx_access() bpf insn rewrite in v3 is gone, so no new rewrite added for __sk_buff->tstamp. The only rewrite added is for reading the new __sk_buff->delivery_time_type. - Added selftests, test_tc_dtime.c v3: - Feedback from v2 is using shinfo(skb)->tx_flags could be racy. - Considered to reuse a few bits in skb->tstamp to represent different semantics, other than more code churns, it will break the bpf usecase which currently can write and then read back the skb->tstamp. - Went back to v1 idea on adding a bit to skb and address the feedbacks on v1: - Added one bit skb->mono_delivery_time to flag that the skb->tstamp has the mono delivery_time (EDT), instead of adding a bit to flag if the skb->tstamp has been forwarded or not. - Instead of resetting the delivery_time back to the (rcv) timestamp during recvmsg syscall which may be too late and not useful, the delivery_time reset in v3 happens earlier once the stack knows that the skb will be delivered locally. - Handled the tapping@ingress case by af_packet - No need to change the (rcv) timestamp to mono clock base as in v1. The added one bit to flag skb->mono_delivery_time is enough to keep the EDT delivery_time during forward. - Added logic to the bpf side to make the existing bpf running at ingress can still get the (rcv) timestamp when reading the __sk_buff->tstamp. New __sk_buff->mono_delivery_time is also added. Test is still needed to test this piece. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 14:38:49 +00:00
Martin KaFai Lau	c803475fd8	bpf: selftests: test skb->tstamp in redirect_neigh This patch adds tests on forwarding the delivery_time for the following cases - tcp/udp + ip4/ip6 + bpf_redirect_neigh - tcp/udp + ip4/ip6 + ip[6]_forward - bpf_skb_set_delivery_time - The old rcv timestamp expectation on tc-bpf@ingress Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 14:38:49 +00:00
Martin KaFai Lau	8d21ec0e46	bpf: Add __sk_buff->delivery_time_type and bpf_skb_set_skb_delivery_time() * __sk_buff->delivery_time_type: This patch adds __sk_buff->delivery_time_type. It tells if the delivery_time is stored in __sk_buff->tstamp or not. It will be most useful for ingress to tell if the __sk_buff->tstamp has the (rcv) timestamp or delivery_time. If delivery_time_type is 0 (BPF_SKB_DELIVERY_TIME_NONE), it has the (rcv) timestamp. Two non-zero types are defined for the delivery_time_type, BPF_SKB_DELIVERY_TIME_MONO and BPF_SKB_DELIVERY_TIME_UNSPEC. For UNSPEC, it can only happen in egress because only mono delivery_time can be forwarded to ingress now. The clock of UNSPEC delivery_time can be deduced from the skb->sk->sk_clockid which is how the sch_etf doing it also. * Provide forwarded delivery_time to tc-bpf@ingress: With the help of the new delivery_time_type, the tc-bpf has a way to tell if the __sk_buff->tstamp has the (rcv) timestamp or the delivery_time. During bpf load time, the verifier will learn if the bpf prog has accessed the new __sk_buff->delivery_time_type. If it does, it means the tc-bpf@ingress is expecting the skb->tstamp could have the delivery_time. The kernel will then read the skb->tstamp as-is during bpf insn rewrite without checking the skb->mono_delivery_time. This is done by adding a new prog->delivery_time_access bit. The same goes for writing skb->tstamp. * bpf_skb_set_delivery_time(): The bpf_skb_set_delivery_time() helper is added to allow setting both delivery_time and the delivery_time_type at the same time. If the tc-bpf does not need to change the delivery_time_type, it can directly write to the __sk_buff->tstamp as the existing tc-bpf has already been doing. It will be most useful at ingress to change the __sk_buff->tstamp from the (rcv) timestamp to a mono delivery_time and then bpf_redirect_*(). bpf only has mono clock helper (bpf_ktime_get_ns), and the current known use case is the mono EDT for fq, and only mono delivery time can be kept during forward now, so bpf_skb_set_delivery_time() only supports setting BPF_SKB_DELIVERY_TIME_MONO. It can be extended later when use cases come up and the forwarding path also supports other clock bases. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 14:38:49 +00:00
Martin KaFai Lau	7449197d60	bpf: Keep the (rcv) timestamp behavior for the existing tc-bpf@ingress The current tc-bpf@ingress reads and writes the __sk_buff->tstamp as a (rcv) timestamp which currently could either be 0 (not available) or ktime_get_real(). This patch is to backward compatible with the (rcv) timestamp expectation at ingress. If the skb->tstamp has the delivery_time, the bpf insn rewrite will read 0 for tc-bpf running at ingress as it is not available. When writing at ingress, it will also clear the skb->mono_delivery_time bit. /* BPF_READ: a = __sk_buff->tstamp / if (!skb->tc_at_ingress \|\| !skb->mono_delivery_time) a = skb->tstamp; else a = 0 / BPF_WRITE: __sk_buff->tstamp = a */ if (skb->tc_at_ingress) skb->mono_delivery_time = 0; skb->tstamp = a; [ A note on the BPF_CGROUP_INET_INGRESS which can also access skb->tstamp. At that point, the skb is delivered locally and skb_clear_delivery_time() has already been done, so the skb->tstamp will only have the (rcv) timestamp. ] If the tc-bpf@egress writes 0 to skb->tstamp, the skb->mono_delivery_time has to be cleared also. It could be done together during convert_ctx_access(). However, the latter patch will also expose the skb->mono_delivery_time bit as __sk_buff->delivery_time_type. Changing the delivery_time_type in the background may surprise the user, e.g. the 2nd read on __sk_buff->delivery_time_type may need a READ_ONCE() to avoid compiler optimization. Thus, in expecting the needs in the latter patch, this patch does a check on !skb->tstamp after running the tc-bpf and clears the skb->mono_delivery_time bit if needed. The earlier discussion on v4 [0]. The bpf insn rewrite requires the skb's mono_delivery_time bit and tc_at_ingress bit. They are moved up in sk_buff so that bpf rewrite can be done at a fixed offset. tc_skip_classify is moved together with tc_at_ingress. To get one bit for mono_delivery_time, csum_not_inet is moved down and this bit is currently used by sctp. [0]: https://lore.kernel.org/bpf/20220217015043.khqwqklx45c4m4se@kafai-mbp.dhcp.thefacebook.com/ Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 14:38:48 +00:00
Martin KaFai Lau	cd14e9b7b8	net: Postpone skb_clear_delivery_time() until knowing the skb is delivered locally The previous patches handled the delivery_time in the ingress path before the routing decision is made. This patch can postpone clearing delivery_time in a skb until knowing it is delivered locally and also set the (rcv) timestamp if needed. This patch moves the skb_clear_delivery_time() from dev.c to ip_local_deliver_finish() and ip6_input_finish(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 14:38:48 +00:00
Martin KaFai Lau	80fcec6751	net: Get rcv tstamp if needed in nfnetlink_{log, queue}.c If skb has the (rcv) timestamp available, nfnetlink_{log, queue}.c logs/outputs it to the userspace. When the locally generated skb is looping from egress to ingress over a virtual interface (e.g. veth, loopback...), skb->tstamp may have the delivery time before it is known that will be delivered locally and received by another sk. Like handling the delivery time in network tapping, use ktime_get_real() to get the (rcv) timestamp. The earlier added helper skb_tstamp_cond() is used to do this. false is passed to the second 'cond' arg such that doing ktime_get_real() or not only depends on the netstamp_needed_key static key. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 14:38:48 +00:00

1 2 3 4 5 ...

1075683 Commits