linux

Author	SHA1	Message	Date
Dongliang Mu	2961f562bb	usbnet: fix the indentation of one code snippet Every line of code should start with tab (8 characters) Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Link: https://lore.kernel.org/r/20210123051102.1091541-1-mudongliangabcd@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-26 17:06:47 -08:00
Nikolay Aleksandrov	3e841bacf7	net: bridge: multicast: fix br_multicast_eht_set_entry_lookup indentation Fix the messed up indentation in br_multicast_eht_set_entry_lookup(). Fixes: `baa74d39ca` ("net: bridge: multicast: add EHT source set handling functions") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210125082040.13022-1-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-26 16:26:50 -08:00
wengjianfeng	02c2694090	nfc: fix typo change 'regster' to 'register' Signed-off-by: wengjianfeng <wengjianfeng@yulong.com> Acked-by: Mark Greer <mgreer@animalcreek.com> Link: https://lore.kernel.org/r/20210123082550.3748-1-samirweng1979@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 19:35:26 -08:00
wengjianfeng	afe197f44e	nfc: fdp: fix typo issue change 'paquet' to 'packet' Signed-off-by: wengjianfeng <wengjianfeng@yulong.com> Link: https://lore.kernel.org/r/20210123074835.9448-1-samirweng1979@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 19:33:42 -08:00
Jakub Kicinski	6d70cd2a42	Merge branch 'bnxt_en-error-recovery-improvements' Michael Chan says: ==================== bnxt_en: Error recovery improvements. This series contains a number of improvements in the area of error recovery. Most error recovery scenarios are tightly coordinated with the firmware. A number of patches add retry logic to establish connection with the firmware if there are indications that the firmware is still alive and will likely transition back to the normal state. Some patches speed up the recovery process and make it more reliable. There are some cleanup patches as well. ==================== Link: https://lore.kernel.org/r/1611558501-11022-1-git-send-email-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 19:20:07 -08:00
Michael Chan	0da65f4932	bnxt_en: Do not process completion entries after fatal condition detected. Once the firmware fatal condition is detected, we should cease comminication with the firmware and hardware quickly even if there are many completion entries in the completion rings. This will speed up the recovery process and prevent further I/Os that may cause further exceptions. Do not proceed in the NAPI poll function if fatal condition is detected. Call napi_complete() and return without arming interrupts. Cleanup of all rings and reset are imminent. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 19:20:05 -08:00
Michael Chan	5863b10aa8	bnxt_en: Consolidate firmware reset event logging. Combine the three netdev_warn() calls into a single call, printed at the NETIF_MSG_HW log level. Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 19:20:04 -08:00
Michael Chan	4f036b2e75	bnxt_en: Improve firmware fatal error shutdown sequence. In the event of a fatal firmware error, firmware will notify the host and then it will proceed to do core reset when it sees that all functions have disabled Bus Master. To prevent Master Aborts and other hard errors, we need to quiesce all activities in addition to disabling Bus Master before the chip goes into core reset. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 19:20:04 -08:00
Michael Chan	38290e3729	bnxt_en: Modify bnxt_disable_int_sync() to be called more than once. In the event of a fatal firmware error, we want to disable IRQ early in the recovery sequence. This change will allow it to be called safely again as part of the normal shutdown sequence. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 19:20:04 -08:00
Michael Chan	e340a5c4fb	bnxt_en: Add a new BNXT_STATE_NAPI_DISABLED flag to keep track of NAPI state. Up until now, we don't need to keep track of this state because NAPI is always enabled once and disabled once during bring up and shutdown. For better error recovery in subsequent patches, we want to quiesce the device earlier during fatal error conditions. The normal shutdown sequence will disable NAPI again and the flag will prevent disabling NAPI twice. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 19:20:04 -08:00
Michael Chan	339eeb4bd9	bnxt_en: Add bnxt_fw_reset_timeout() helper. This code to check if we have reached the maximum wait time after firmware reset is used multiple times. Add a helper function to do this. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 19:20:04 -08:00
Vasundhara Volam	5d06eb5cb1	bnxt_en: Retry open if firmware is in reset. Firmware may be in the middle of reset when the driver tries to do ifup. In that case, firmware will return a special error code and the driver will retry 10 times with 50 msecs delay after each retry. Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 19:20:04 -08:00
Edwin Peer	6882c36cf8	bnxt_en: attempt to reinitialize after aborted reset Drawing a hard line on aborted resets prevents a NIC open in some scenarios that may otherwise be recoverable. For example, if a firmware recovery happened while a PF was down and an attempt was made to bring up an associated VF in this state, then it was impossible to ever bring up this VF without a rebind or reload of its driver. Attempt to reinitialize the firmware when an aborted reset (or failed init after a reset) is discovered during open - it may succeed. Also take care to allow the user to retry opening the NIC even after an aborted reset. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 19:20:04 -08:00
Edwin Peer	a44daa8fcb	bnxt_en: log firmware debug notifications Firmware is capable of generating asynchronous debug notifications. The event data is opaque to the driver and is simply logged. Debug notifications can be enabled by turning on hardware status messages using the ethtool msglvl interface. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 19:20:03 -08:00
Vasundhara Volam	881d8353b0	bnxt_en: Add an upper bound for all firmware command timeouts. The timeout period for firmware messages is passed to the driver from the firmware in the response of the first command. This timeout period is multiplied by a factor for certain long running commands such as NVRAM commands. In some cases, the timeout period can become really long and it can cause hung task warnings if firmware has crashed or is not responding. To avoid such long delays, cap all firmware commands to a max timeout value of 40 seconds. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 19:20:03 -08:00
Vasundhara Volam	3e3c09b0e9	bnxt_en: Move reading VPD info after successful handshake with fw. If firmware is in reset or in bad state, it won't be able to return VPD data. Move bnxt_vpd_read_info() until after bnxt_fw_init_one_p1() successfully returns. By then we would have established proper communications with the firmware. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 19:20:03 -08:00
Michael Chan	d1cbd1659c	bnxt_en: Retry sending the first message to firmware if it is under reset. The first HWRM_VER_GET message to firmware during probe may timeout if firmware is under reset. This can happen during hot-plug for example. On P5 and newer chips, we can check if firmware is in the boot stage by reading a status register. Retry 5 times if the status register shows that firmware is not ready and not in error state. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 19:20:03 -08:00
Edwin Peer	b187e4bae0	bnxt_en: handle CRASH_NO_MASTER during bnxt_open() Add missing support for handling NO_MASTER crashes while ports are administratively down (ifdown). On some SoC platforms, the driver needs to assist the firmware to recover from a crash via OP-TEE. This is performed in a similar fashion to what is done during driver probe. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 19:20:03 -08:00
Michael Chan	fe1b853572	bnxt_en: Define macros for the various health register states. Define macros to check for the various states in the lower 16 bits of the health register. Replace the C code that checks for these values with the newly defined macros. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 19:20:03 -08:00
Michael Chan	16db632304	bnxt_en: Update firmware interface to 1.10.2.11. Updates to backing store APIs, QoS profiles, and push buffer initial index support. Since the new HWRM_FUNC_BACKING_STORE_CFG message size has increased, we need to add some compat. logic to fall back to the smaller legacy size if firmware cannot accept the larger message size. The new fields added to the structure are not used yet. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 19:20:03 -08:00
Jakub Kicinski	ae189ccb1b	Merge branch 'dsa-add-mt7530-gpio-support' DENG Qingfang says: ==================== dsa: add MT7530 GPIO support MT7530's LED controller can be used as GPIO controller. Add support for it. ==================== Link: https://lore.kernel.org/r/20210125044322.6280-1-dqfext@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 18:29:15 -08:00
DENG Qingfang	429a0edeef	net: dsa: mt7530: MT7530 optional GPIO support MT7530's LED controller can drive up to 15 LED/GPIOs. Add support for GPIO control and allow users to use its GPIOs by setting gpio-controller property in device tree. Signed-off-by: DENG Qingfang <dqfext@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 18:29:04 -08:00
DENG Qingfang	974d5ba60d	dt-bindings: net: dsa: add MT7530 GPIO controller binding Add device tree binding to support MT7530 GPIO controller. Signed-off-by: DENG Qingfang <dqfext@gmail.com> Acked-by: Rob Herring <robh@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 18:28:14 -08:00
DENG Qingfang	4fd5979209	net: ethernet: mediatek: support setting MTU MT762x HW, except for MT7628, supports frame length up to 2048 (maximum length on GDM), so allow setting MTU up to 2030. Also set the default frame length to the hardware default 1518. Signed-off-by: DENG Qingfang <dqfext@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20210125042046.5599-1-dqfext@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 18:23:14 -08:00
Jiapeng Zhong	8d21c882ab	bridge: Use PTR_ERR_OR_ZERO instead if(IS_ERR(...)) + PTR_ERR coccicheck suggested using PTR_ERR_OR_ZERO() and looking at the code. Fix the following coccicheck warnings: ./net/bridge/br_multicast.c:1295:7-13: WARNING: PTR_ERR_OR_ZERO can be used. Reported-by: Abaci <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/1611542381-91178-1-git-send-email-abaci-bugfix@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 18:23:07 -08:00
Subbaraya Sundeep	b9b7421a01	octeontx2-af: Support ESP/AH RSS hashing Support SPI and sequence number fields of ESP/AH header to be hashed for RSS. By default ESP/AH fields are not considered for RSS and needs to be set explicitly as below: ethtool -U eth0 rx-flow-hash esp4 sdfn or ethtool -U eth0 rx-flow-hash ah4 sdfn or ethtool -U eth0 rx-flow-hash esp6 sdfn or ethtool -U eth0 rx-flow-hash ah6 sdfn To disable hashing of ESP fields: ethtool -U eth0 rx-flow-hash esp4 sd or ethtool -U eth0 rx-flow-hash ah4 sd or ethtool -U eth0 rx-flow-hash esp6 sd or ethtool -U eth0 rx-flow-hash ah6 sd Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Link: https://lore.kernel.org/r/1611378552-13288-1-git-send-email-sundeep.lkml@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 18:14:58 -08:00
Heiner Kallweit	24f97b6af9	tg3: improve PCI VPD access When working on the PCI VPD code I also tested with a Broadcom BCM95719 card. tg3 uses internal NVRAM access with this card, so I forced it to PCI VPD mode for testing. PCI VPD access fails (i + PCI_VPD_LRDT_TAG_SIZE + j > len) because only TG3_NVM_VPD_LEN (256) bytes are read, but PCI VPD has 400 bytes on this card. So add a constant TG3_NVM_PCI_VPD_MAX_LEN that defines the maximum PCI VPD size. The actual VPD size is returned by pci_read_vpd(). In addition it's not worth looping over pci_read_vpd(). If we miss the 125ms timeout per VPD dword read then definitely something is wrong, and if the tg3 module loading is killed then there's also not much benefit in retrying the VPD read. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/cb9e9113-0861-3904-87e0-d4c4ab3c8860@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 15:24:19 -08:00
Jakub Kicinski	a61e4b6076	Merge branch 'net-dsa-hellcreek-add-taprio-offloading' Kurt Kanzenbach says: ==================== net: dsa: hellcreek: Add TAPRIO offloading The switch has support for the 802.1Qbv Time Aware Shaper (TAS). Traffic schedules may be configured individually on each front port. Each port has eight egress queues. The traffic is mapped to a traffic class respectively via the PCP field of a VLAN tagged frame. Previous attempts: * https://lkml.kernel.org/netdev/20201121115703.23221-1-kurt@linutronix.de/ * https://lkml.kernel.org/netdev/20210116124922.32356-1-kurt@linutronix.de/ ==================== Link: https://lore.kernel.org/r/20210123105633.16753-1-kurt@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 21:25:18 -08:00
Kurt Kanzenbach	24dfc6eb39	net: dsa: hellcreek: Add TAPRIO offloading support The switch has support for the 802.1Qbv Time Aware Shaper (TAS). Traffic schedules may be configured individually on each front port. Each port has eight egress queues. The traffic is mapped to a traffic class respectively via the PCP field of a VLAN tagged frame. The TAPRIO Qdisc already implements that. Therefore, this interface can simply be reused. Add .port_setup_tc() accordingly. The activation of a schedule on a port is split into two parts: * Programming the necessary gate control list (GCL) * Setup delayed work for starting the schedule The hardware supports starting a schedule up to eight seconds in the future. The TAPRIO interface provides an absolute base time. Therefore, periodic delayed work is leveraged to check whether a schedule may be started or not. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 21:25:16 -08:00
Loic Poulain	b80b5dbf11	net: mhi: Set wwan device type The 'wwan' devtype is meant for devices that require additional configuration to be used, like WWAN specific APN setup over AT/QMI commands, rmnet link creation, etc. This is the case for MHI (Modem host Interface) netdev which targets modem/WWAN endpoints. Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Link: https://lore.kernel.org/r/1611328554-1414-1-git-send-email-loic.poulain@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 21:17:36 -08:00
Jakub Kicinski	acb4151f5d	Merge branch 'udp-allow-forwarding-of-plain-non-fraglisted-udp-gro-packets' Alexander Lobakin says: ==================== udp: allow forwarding of plain (non-fraglisted) UDP GRO packets This series allows to form UDP GRO packets in cases without sockets (for forwarding). To not change the current datapath, this is performed only when the new corresponding netdev feature is enabled via Ethtool (and fraglisted GRO is disabled). Prior to this point, only fraglisted UDP GRO was available. Plain UDP GRO shows better forwarding performance when a target NIC is capable of GSO UDP offload. Since v3 [2]: - rename introduced netdev feature to reflect that it targets forwarding and don't touch fraglisted GRO at all (Willem de Bruijn). Since v2 [1]: - convert to a series; - new: add new netdev_feature to explicitly enable/disable UDP GRO when there is no socket, defaults to off (Paolo Abeni). Since v1 [0]: - drop redundant 'if (sk)' check (Alexander Duyck); - add a ref in the commit message to one more commit that was an important step for UDP GRO forwarding. [0] https://lore.kernel.org/netdev/20210112211536.261172-1-alobakin@pm.me [1] https://lore.kernel.org/netdev/20210113103232.4761-1-alobakin@pm.me [2] https://lore.kernel.org/netdev/20210118193122.87271-1-alobakin@pm.me ==================== Link: https://lore.kernel.org/r/20210122181909.36340-1-alobakin@pm.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 20:18:21 -08:00
Alexander Lobakin	36707061d6	udp: allow forwarding of plain (non-fraglisted) UDP GRO packets Commit `9fd1ff5d2a` ("udp: Support UDP fraglist GRO/GSO.") actually not only added a support for fraglisted UDP GRO, but also tweaked some logics the way that non-fraglisted UDP GRO started to work for forwarding too. Commit `2e4ef10f58` ("net: add GSO UDP L4 and GSO fraglists to the list of software-backed types") added GSO UDP L4 to the list of software GSO to allow virtual netdevs to forward them as is up to the real drivers. Tests showed that currently forwarding and NATing of plain UDP GRO packets are performed fully correctly, regardless if the target netdevice has a support for hardware/driver GSO UDP L4 or not. Add the last element and allow to form plain UDP GRO packets if we are on forwarding path, and the new NETIF_F_GRO_UDP_FWD is enabled on a receiving netdevice. If both NETIF_F_GRO_FRAGLIST and NETIF_F_GRO_UDP_FWD are set, fraglisted GRO takes precedence. This keeps the current behaviour and is generally more optimal for now, as the number of NICs with hardware USO offload is relatively small. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 20:18:16 -08:00
Alexander Lobakin	6f1c0ea133	net: introduce a netdev feature for UDP GRO forwarding Introduce a new netdev feature, NETIF_F_GRO_UDP_FWD, to allow user to turn UDP GRO on and off for forwarding. Defaults to off to not change current datapath. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 20:16:24 -08:00
Jakub Kicinski	692347a931	Merge branch 'remove-unneeded-phy-time-stamping-option' Richard Cochran says: ==================== Remove unneeded PHY time stamping option. The NETWORK_PHY_TIMESTAMPING configuration option adds additional checks into the networking hot path, and it is only needed by two rather esoteric devices, namely the TI DP83640 PHYTER and the ZHAW InES 1588 IP core. Very few end users have these devices, and those that do have them are building specialized embedded systems. Unfortunately two unrelated drivers depend on this option, and two defconfigs enable it. It is probably my fault for not paying enough attention in reviews. This series corrects the gratuitous use of NETWORK_PHY_TIMESTAMPING. ==================== Link: https://lore.kernel.org/r/cover.1611198584.git.richardcochran@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 13:23:38 -08:00
Richard Cochran	04cbb740ce	net: mvpp2: Remove unneeded Kconfig dependency. The mvpp2 is an Ethernet driver, and it implements MAC style time stamping of PTP frames. It has no need of the expensive option to enable PHY time stamping. Remove the incorrect dependency. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 13:23:35 -08:00
Richard Cochran	57ba00774b	net: dsa: mv88e6xxx: Remove bogus Kconfig dependency. The mv88e6xxx is a DSA driver, and it implements DSA style time stamping of PTP frames. It has no need of the expensive option to enable PHY time stamping. Remove the bogus dependency. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Acked-by: Brandon Streiff <brandon.streiff@ni.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 13:23:31 -08:00
Jakub Kicinski	e7b76db362	Merge branch 'net-ipa-napi-poll-updates' Alex Elder says: ==================== net: ipa: NAPI poll updates While reviewing the IPA NAPI polling code in detail I found two problems. This series fixes those, and implements a few other improvements to this part of the code. The first two patches are minor bug fixes that avoid extra passes through the poll function. The third simplifies code inside the polling loop a bit. The last two update how interrupts are disabled; previously it was possible for another I/O completion condition to be recorded before NAPI got scheduled. ==================== Link: https://lore.kernel.org/r/20210121114821.26495-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 13:16:02 -08:00
Alex Elder	7bd9785f68	net: ipa: disable IEOB interrupts before clearing Currently in gsi_isr_ieob(), event ring IEOB interrupts are disabled one at a time. The loop disables the IEOB interrupt for all event rings represented in the event mask. Instead, just disable them all at once. Disable them all before clearing the interrupt condition. This guarantees we'll schedule NAPI for each event once, before another IEOB interrupt could be signaled. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 13:16:00 -08:00
Alex Elder	5725593e6f	net: ipa: repurpose gsi_irq_ieob_disable() Rename gsi_irq_ieob_disable() to be gsi_irq_ieob_disable_one(). Introduce a new function gsi_irq_ieob_disable() that takes a mask of events to disable rather than a single event id. This will be used in the next patch. Rename gsi_irq_ieob_enable() to be gsi_irq_ieob_enable_one() to be consistent. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 13:16:00 -08:00
Alex Elder	223f5b34b4	net: ipa: have gsi_channel_update() return a value Have gsi_channel_update() return the first transaction in the updated completed transaction list, or NULL if no new transactions have been added. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 13:16:00 -08:00
Alex Elder	148604e7ea	net: ipa: heed napi_complete() return value Pay attention to the return value of napi_complete(), completing polling only if it returns true. Just use napi rather than &channel->napi as the argument passed to napi_complete(). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 13:16:00 -08:00
Alex Elder	c80c4a1ea4	net: ipa: count actual work done in gsi_channel_poll() There is an off-by-one problem in gsi_channel_poll(). The count of transactions completed is incremented each time through the loop before determining whether there is any more work to do. As a result, if we exit the loop early the counter its value is one more than the number of transactions actually processed. Instead, increment the count after processing, to ensure it reflects the number of processed transactions. The result is more naturally described as a for loop rather than a while loop, so change that. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 13:15:59 -08:00
Jakub Kicinski	59a49d9617	Merge branch 'mlxsw-expose-number-of-physical-ports' Ido Schimmel says: ==================== mlxsw: Expose number of physical ports The switch ASIC has a limited capacity of physical ports that it can support. While each system is brought up with a different number of ports, this number can be increased via splitting up to the ASIC's limit. Expose physical ports as a devlink resource so that user space will have visibility into the maximum number of ports that can be supported and the current occupancy. With this resource it is possible, for example, to write generic (i.e., not platform dependent) tests for port splitting. Patch #1 adds the new resource and patch #2 adds a selftest. v2: * Add the physical ports resource as a generic devlink resource so that it could be re-used by other device drivers ==================== Link: https://lore.kernel.org/r/20210121131024.2656154-1-idosch@idosch.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 20:42:15 -08:00
Danielle Ratson	5154b1b826	selftests: mlxsw: Add a scale test for physical ports Query the maximum number of supported physical ports using devlink-resource and test that this number can be reached by splitting each of the splittable ports to its width. Test that an error is returned in case the maximum number is exceeded. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 20:42:13 -08:00
Danielle Ratson	321f7ab0d4	mlxsw: Register physical ports as a devlink resource The switch ASIC has a limited capacity of physical ('flavour physical' in devlink terminology) ports that it can support. While each system is brought up with a different number of ports, this number can be increased via splitting up to the ASIC's limit. Expose physical ports as a devlink resource so that user space will have visibility to the maximum number of ports that can be supported and the current occupancy. In addition, add a "Generic Resources" section in devlink-resource documentation so the different drivers will be aligned by the same resource name when exposing to user space. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 20:42:13 -08:00
Jakub Kicinski	351876424e	Merge branch 'htb-offload' Maxim Mikityanskiy says: ==================== HTB offload This series adds support for HTB offload to the HTB qdisc, and adds usage to mlx5 driver. The previous RFCs are available at [1], [2]. The feature is intended to solve the performance bottleneck caused by the single lock of the HTB qdisc, which prevents it from scaling well. The HTB algorithm itself is offloaded to the device, eliminating the need to take the root lock of HTB on every packet. Classification part is done in clsact (still in software) to avoid acquiring the lock, which imposes a limitation that filters can target only leaf classes. The speedup on Mellanox ConnectX-6 Dx was 14.2 times in the UDP multi-stream test, compared to software HTB implementation (more details in the mlx5 patch). [1]: https://www.spinics.net/lists/netdev/msg628422.html [2]: https://www.spinics.net/lists/netdev/msg663548.html v2 changes: Fixed sparse and smatch warnings. Formatted HTB patches to 80 chars per line. v3 changes: Fixed the CI failure on parisc with 16-bit xchg by replacing it with WRITE_ONCE. Fixed the capability bits in mlx5_ifc.h and the value of MLX5E_QOS_MAX_LEAF_NODES. v4 changes: Check if HTB is root when offloading. Add extack for hardware errors. Rephrase explanations of how it works in the commit message. Remove %hu from format strings. Add resiliency when leaf_del_last fails to create a new leaf node. ==================== Link: https://lore.kernel.org/r/20210119120815.463334-1-maximmi@mellanox.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 20:41:32 -08:00
Maxim Mikityanskiy	214baf2287	net/mlx5e: Support HTB offload This commit adds support for HTB offload in the mlx5e driver. Performance: NIC: Mellanox ConnectX-6 Dx CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (24 cores with HT) 100 Gbit/s line rate, 500 UDP streams @ ~200 Mbit/s each 48 traffic classes, flower used for steering No shaping (rate limits set to 4 Gbit/s per TC) - checking for max throughput. Baseline: 98.7 Gbps, 8.25 Mpps HTB: 6.7 Gbps, 0.56 Mpps HTB offload: 95.6 Gbps, 8.00 Mpps Limitations: 1. 256 leaf nodes, 3 levels of depth. 2. Granularity for ceil is 1 Mbit/s. Rates are converted to weights, and the bandwidth is split among the siblings according to these weights. Other parameters for classes are not supported. Ethtool statistics support for QoS SQs are also added. The counters are called qos_txN_, where N is the QoS queue number (starting from 0, the numeration is separate from the normal SQs), and is the counter name (the counters are the same as for the normal SQs). Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 20:41:29 -08:00
Maxim Mikityanskiy	8327158624	sch_htb: Stats for offloaded HTB This commit adds support for statistics of offloaded HTB. Bytes and packets counters for leaf and inner nodes are supported, the values are taken from per-queue qdiscs, and the numbers that the user sees should have the same behavior as the software (non-offloaded) HTB. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 20:41:29 -08:00
Maxim Mikityanskiy	d03b195b5a	sch_htb: Hierarchical QoS hardware offload HTB doesn't scale well because of contention on a single lock, and it also consumes CPU. This patch adds support for offloading HTB to hardware that supports hierarchical rate limiting. In the offload mode, HTB passes control commands to the driver using ndo_setup_tc. The driver has to replicate the whole hierarchy of classes and their settings (rate, ceil) in the NIC. Every modification of the HTB tree caused by the admin results in ndo_setup_tc being called. After this setup, the HTB algorithm is done completely in the NIC. An SQ (send queue) is created for every leaf class and attached to the hierarchy, so that the NIC can calculate and obey aggregated rate limits, too. In the future, it can be changed, so that multiple SQs will back a single leaf class. ndo_select_queue is responsible for selecting the right queue that serves the traffic class of each packet. The data path works as follows: a packet is classified by clsact, the driver selects a hardware queue according to its class, and the packet is enqueued into this queue's qdisc. This solution addresses two main problems of scaling HTB: 1. Contention by flow classification. Currently the filters are attached to the HTB instance as follows: # tc filter add dev eth0 parent 1:0 protocol ip flower dst_port 80 classid 1:10 It's possible to move classification to clsact egress hook, which is thread-safe and lock-free: # tc filter add dev eth0 egress protocol ip flower dst_port 80 action skbedit priority 1:10 This way classification still happens in software, but the lock contention is eliminated, and it happens before selecting the TX queue, allowing the driver to translate the class to the corresponding hardware queue in ndo_select_queue. Note that this is already compatible with non-offloaded HTB and doesn't require changes to the kernel nor iproute2. 2. Contention by handling packets. HTB is not multi-queue, it attaches to a whole net device, and handling of all packets takes the same lock. When HTB is offloaded, it registers itself as a multi-queue qdisc, similarly to mq: HTB is attached to the netdev, and each queue has its own qdisc. Some features of HTB may be not supported by some particular hardware, for example, the maximum number of classes may be limited, the granularity of rate and ceil parameters may be different, etc. - so, the offload is not enabled by default, a new parameter is used to enable it: # tc qdisc replace dev eth0 root handle 1: htb offload Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 20:41:29 -08:00
Maxim Mikityanskiy	4dd78a7373	net: sched: Add extack to Qdisc_class_ops.delete In a following commit, sch_htb will start using extack in the delete class operation to pass hardware errors in offload mode. This commit prepares for that by adding the extack parameter to this callback and converting usage of the existing qdiscs. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 20:41:29 -08:00

1 2 3 4 5 ...

983617 Commits