linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-01 16:41:39 +00:00

Author	SHA1	Message	Date
Peng Li	7ceb40b820	net: hns3: remove unused macro definition Some macros are defined but unused, so remove them. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 15:34:07 -08:00
Huazhong Tan	11ef971f5a	net: hns3: remove an unused parameter in hclge_vf_rate_param_check() Parameter vf in hclge_vf_rate_param_check() is unused now, so remove it. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 15:34:07 -08:00
Huazhong Tan	64749c9c38	net: hns3: remove redundant return value of hns3_uninit_all_ring() Since hns3_uninit_all_ring() only returns 0, remove this redundant return value and the function declaration in hns3_enet.h. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 15:34:07 -08:00
Peng Li	cad8dfe82a	net: hns3: change hclge_query_bd_num() param type The type of parameter mpf_bd_num and pf_bd_num in hclge_query_bd_num() should be u32* instead of int*, so change them. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 15:34:07 -08:00
Peng Li	6e7f109ee9	net: hns3: change hclge_parse_speed() param type The type of parameters in hclge_parse_speed() should be unsigned type, so change them. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 15:34:07 -08:00
Jiaran Zhang	c5aaf17618	net: hns3: modify some unmatched types print parameter Fix an issue where the format specifier passed to the formatted input and output functions does not match the actual type. Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 15:34:07 -08:00
Yufeng Mo	9393eb5034	net: hns3: clean up unnecessary parentheses in macro definitions In macro definitions, parentheses are unnecessary in some cases, such as the calling parameter of a function, the left variable of the equal sign, and so on. So remove these unnecessary parentheses according to these rules. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 15:34:07 -08:00
Peng Li	9d2a1cea69	net: hns3: remove the shaper param magic number To make the code more readable, this patch adds a definition for the magic number 126 used for the default shaper param ir_b, and rename macro DIVISOR_IR_B_126. No functional change. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 15:34:07 -08:00
Jian Shen	ae9e492a36	net: hns3: remove redundant client_setup_tc handle Since the real tx queue number and real rx queue number are always updated when the netdev opens, it's redundant to call hclge_client_setup_tc to do the same thing. So remove it. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 15:34:07 -08:00
Yonglong Liu	0256844d0f	net: hns3: clean up some incorrect variable types in hclge_dbg_dump_tm_map() queue_id, qset_id and other IDs are unsigned type, so modify the corresponding local variables' type in hclge_dbg_dump_tm_map() from signed to unsigned. kstrtouint() and the print format should be updated as well. Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 15:34:07 -08:00
Stefano Garzarella	1c5fae9c9a	vsock: fix locking in vsock_shutdown() In vsock_shutdown() we touched some socket fields without holding the socket lock, such as 'state' and 'sk_flags'. Also, after the introduction of multi-transport, we are accessing 'vsk->transport' in vsock_send_shutdown() without holding the lock and this call can be made while the connection is in progress, so the transport can change in the meantime. To avoid issues, we hold the socket lock when we enter in vsock_shutdown() and release it when we leave. Among the transports that implement the 'shutdown' callback, only hyperv_transport acquired the lock. Since the caller now holds it, we no longer take it. Fixes: `d021c34405` ("VSOCK: Introduce VM Sockets") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 15:31:22 -08:00
David S. Miller	adbb4fb028	Merge branch 'implement-kthread-based-napi-poll'

Wei Wang says:

====================
implement kthread based napi poll

The idea of moving the napi poll process out of softirq context to a kernel thread based context is not new. Paolo Abeni and Hannes Frederic Sowa proposed patches to move napi poll to kthread back in 2016. And Felix Fietkau has also proposed patches of similar ideas to use workqueue to process napi poll just a few weeks ago.

The main reason we'd like to push forward with this idea is that the scheduler has poor visibility into cpu cycles spent in softirq context, and is not able to make optimal scheduling decisions of the user threads. For example, we see in one application benchmark where network load is high, the CPUs handling network softirqs have ~80% cpu util. And user threads are still scheduled on those CPUs, despite other, more idle cpus being available in the system. And we see very high tail latencies. In this case, we have to explicitly pin away user threads from the CPUs handling network softirqs to ensure good performance.

With napi poll moved to kthread, the scheduler is in charge of scheduling both the kthreads handling network load, and the user threads, and is able to make better decisions. In the previous benchmark, if we do this and we pin the kthreads processing napi poll to specific CPUs, the scheduler is able to schedule user threads away from these CPUs automatically.

And the reason we prefer 1 kthread per napi, instead of 1 workqueue entity per host, is that kthread is more configurable than workqueue, and we could leverage existing tuning tools for threads, like taskset, chrt, etc to tune scheduling class and cpu set, etc. Another reason is that if we eventually want to provide a busy poll feature using kernel threads for napi poll, kthread seems to be more suitable than workqueue.
Furthermore, for large platforms with 2 NICs attached to 2 sockets, kthread is more flexible to be pinned to different sets of CPUs.

In this patch series, I revived Paolo and Hannes's patch in 2016 and made modifications. Then there are changes proposed by Felix, Jakub, Paolo and myself on top of those, with suggestions from Eric Dumazet.

In terms of performance, I ran tcp_rr tests with 1000 flows with various request/response sizes, with RFS/RPS disabled, and compared performance between softirq vs kthread vs workqueue (patchset proposed by Felix Fietkau). Host has 56 hyper threads and 100Gbps nic, 8 rx queues and only 1 numa node. All threads are unpinned.

         req/resp  QPS    50%tile  90%tile  99%tile  99.9%tile
softirq  1B/1B     2.75M  337us    376us    1.04ms   3.69ms
kthread  1B/1B     2.67M  371us    408us    455us    550us
workq    1B/1B     2.56M  384us    435us    673us    822us

softirq  5KB/5KB   1.46M  678us    750us    969us    2.78ms
kthread  5KB/5KB   1.44M  695us    789us    891us    1.06ms
workq    5KB/5KB   1.34M  720us    905us    1.06ms   1.57ms

softirq  1MB/1MB   11.0K  79ms     166ms    306ms    630ms
kthread  1MB/1MB   11.0K  75ms     177ms    303ms    596ms
workq    1MB/1MB   11.0K  79ms     180ms    303ms    587ms

When running workqueue implementation, I found the number of threads used is usually twice as much as kthread implementation. This probably introduces higher scheduling cost, which results in higher tail latencies in most cases.

I also ran an application benchmark, which performs fixed qps remote SSD read/write operations, with various sizes. Again, both with RFS/RPS disabled.
The result is as follows:

         op_size  QPS     50%tile  95%tile  99%tile  99.9%tile
softirq  4K       572.6K  385us    1.5ms    3.16ms   6.41ms
kthread  4K       572.6K  390us    803us    2.21ms   6.83ms
workq    4K       572.6K  384us    763us    3.12ms   6.87ms

softirq  64K      157.9K  736us    1.17ms   3.40ms   13.75ms
kthread  64K      157.9K  745us    1.23ms   2.76ms   9.87ms
workq    64K      157.9K  746us    1.23ms   2.76ms   9.96ms

softirq  1M       10.98K  2.03ms   3.10ms   3.7ms    11.56ms
kthread  1M       10.98K  2.13ms   3.21ms   4.02ms   13.3ms
workq    1M       10.98K  2.13ms   3.20ms   3.99ms   14.12ms

In this set of tests, the latency is dominated by the SSD operation. Also, the user threads are much busier compared to tcp_rr tests. We have to pin the kthreads/workqueue threads to limit to a few CPUs, to not disturb user threads, and provide some isolation.

Changes since v9:
Small change in napi_poll() in patch 1.
Split napi_kthread_stop() functionality to add separately in napi_disable() and netif_napi_del() in patch 2.
Add description for napi_set_threaded() and return dev->threaded when dev->napi_list is empty for threaded sysfs in patch 3.

Changes since v8:
Added description for threaded param in struct net_device in patch 2.

Changes since v7:
Break napi_set_threaded() into 2 parts, one to create kthread called from netif_napi_add(), the other to set threaded bit in napi_enable(), to get rid of inconsistency through all napi in 1 dev.
Added documentation for /sys/class/net/<dev>/threaded.

Changes since v6:
Added memory barrier in napi_set_threaded().
Changed /sys/class/net/<dev>/thread to a ternary value.
Change dev->threaded to a bit instead of bool.

Changes since v5:
Removed ASSERT_RTNL() from napi_set_threaded() and removed rtnl_lock() operation from napi_enable().

Changes since v4:
Recorded the threaded setting in dev and restore it in napi_enable().

Changes since v3:
Merged and rearranged patches in a logical order for easier review.
Changed sysfs control to be per device.

Changes since v2:
Corrected typo in patch 1, and updated the cover letter with more detailed and updated test results.

Changes since v1:
Replaced kthread_create() with kthread_run() in patch 5 as suggested by Felix Fietkau.

Changes since RFC:
Renamed the kthreads to be napi/<dev>-<napi_id> in patch 5 as suggested by Hannes Frederic Sowa.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 15:27:28 -08:00
Wei Wang	5fdd2f0e5c	net: add sysfs attribute to control napi threaded mode This patch adds a new sysfs attribute to the network device class. Said attribute provides a per-device control to enable/disable the threaded mode for all the napi instances of the given network device, without the need for a device up/down. User sets it to 1 or 0 to enable or disable threaded mode. Note: when switching between threaded and the current softirq based mode for a napi instance, it will not immediately take effect if the napi is currently being polled. The mode switch will happen for the next time napi_schedule() is called. Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Co-developed-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Wei Wang <weiwan@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 15:27:28 -08:00
Wei Wang	29863d41bb	net: implement threaded-able napi poll loop support This patch allows running each napi poll loop inside its own kernel thread. The kthread is created during netif_napi_add() if dev->threaded is set. And threaded mode is enabled in napi_enable(). We will provide a way to set dev->threaded and enable threaded mode without a device up/down in the following patch. Once threaded mode is enabled and the kthread is started, napi_schedule() will wake up that thread instead of scheduling the softirq. The threaded poll loop behaves much like net_rx_action, but it does not have to manipulate local irqs and uses an explicit scheduling point based on netdev_budget. Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Co-developed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Wei Wang <weiwan@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 15:27:28 -08:00
Felix Fietkau	898f8015ff	net: extract napi poll functionality to __napi_poll() This commit introduces a new function __napi_poll() which does the main logic of the existing napi_poll() function, and will be called by other functions in later commits. This idea and implementation is done by Felix Fietkau <nbd@nbd.name> and is proposed as part of the patch to move napi work to work_queue context. This commit by itself is a code restructure. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Wei Wang <weiwan@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 15:27:28 -08:00
David S. Miller	49c2547b82	Merge branch 'hns3-fixes' Huazhong Tan says: ==================== net: hns3: fixes for -net The parameters sent from vf may be unreliable. If these parameters are used directly, memory overwriting may occur. So this series adds some checks for this case. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 15:20:43 -08:00
Yufeng Mo	532cfc0df1	net: hns3: add a check for index in hclge_get_rss_key() The index is received from the VF; using it directly may cause an out-of-bounds access, so add a check for this index before using it in hclge_get_rss_key(). Fixes: `a638b1d8cc` ("net: hns3: fix get VF RSS issue") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 15:20:43 -08:00
Yufeng Mo	326334aad0	net: hns3: add a check for tqp_index in hclge_get_ring_chain_from_mbx() The tqp_index is received from the VF; using it directly may cause an out-of-bounds access, so add a check for this tqp_index before using it in hclge_get_ring_chain_from_mbx(). Fixes: `84e095d64e` ("net: hns3: Change PF to add ring-vect binding & resetQ to mailbox") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 15:20:43 -08:00
Yufeng Mo	67a69f84ca	net: hns3: add a check for queue_id in hclge_reset_vf_queue() The queue_id is received from the VF; using it directly may cause an out-of-bounds access, so add a check for this queue_id before using it in hclge_reset_vf_queue(). Fixes: `1a426f8b40` ("net: hns3: fix the VF queue reset flow error") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 15:20:43 -08:00
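
The three hns3 fixes above share one pattern: a value supplied by an untrusted peer is range-checked before it is used as an index. The following is a minimal sketch of that pattern, not the driver's code. The function name, the sizes, and the chunked-key layout are all assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, chosen only for illustration. */
#define DEMO_RSS_KEY_SIZE   40
#define DEMO_RSS_CHUNK_SIZE 16

/*
 * Copy one chunk of a key into resp, where index comes from an
 * untrusted peer (standing in for a VF mailbox message).
 * Returns 0 on success, or -EINVAL if the chunk would start past the key.
 */
static int demo_get_rss_key_chunk(const uint8_t *key, uint8_t index,
                                  uint8_t *resp, size_t *copied)
{
    size_t offset = (size_t)index * DEMO_RSS_CHUNK_SIZE;
    size_t len;

    /* Validate before touching memory: reject out-of-range indexes. */
    if (offset >= DEMO_RSS_KEY_SIZE)
        return -EINVAL;

    /* The last chunk may be shorter than a full chunk. */
    len = DEMO_RSS_KEY_SIZE - offset;
    if (len > DEMO_RSS_CHUNK_SIZE)
        len = DEMO_RSS_CHUNK_SIZE;

    memcpy(resp, key + offset, len);
    *copied = len;
    return 0;
}
```

Rejecting the request with an error, rather than clamping the index, keeps a misbehaving peer visible to the caller.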
Vladimir Oltean	eb4733d7cf	net: dsa: felix: implement port flushing on .phylink_mac_link_down There are several issues which may be seen when the link goes down while forwarding traffic, all of which can be attributed to the fact that the port flushing procedure from the reference manual was not closely followed. With flow control enabled on both the ingress port and the egress port, it may happen when a link goes down that Ethernet packets are in flight. In flow control mode, frames are held back and not dropped. When there is enough traffic in flight (example: iperf3 TCP), then the ingress port might enter congestion and never exit that state. This is a problem, because it is the egress port's link that went down, and that has caused the inability of the ingress port to send packets to any other port. This is solved by flushing the egress port's queues when it goes down. There is also a problem when performing stream splitting for IEEE 802.1CB traffic (not yet upstream, but a sort of multicast, basically). There, if one port from the destination ports mask goes down, splitting the stream towards the other destinations will no longer be performed. This can be traced down to this line: ocelot_port_writel(ocelot_port, 0, DEV_MAC_ENA_CFG); which should have been instead, as per the reference manual: ocelot_port_rmwl(ocelot_port, 0, DEV_MAC_ENA_CFG_RX_ENA, DEV_MAC_ENA_CFG); Basically only DEV_MAC_ENA_CFG_RX_ENA should be disabled, but not DEV_MAC_ENA_CFG_TX_ENA - I don't have further insight into why that is the case, but apparently multicasting to several ports will cause issues if at least one of them doesn't have DEV_MAC_ENA_CFG_TX_ENA set. 
I am not sure what the state of the Ocelot VSC7514 driver is, but probably not as bad as Felix/Seville, since VSC7514 uses phylib and has the following in ocelot_adjust_link: if (!phydev->link) return; therefore the port is not really put down when the link is lost, unlike the DSA drivers which use .phylink_mac_link_down for that. Nonetheless, I put ocelot_port_flush() in the common ocelot.c because it needs to access some registers from drivers/net/ethernet/mscc/ocelot_rew.h which are not exported in include/soc/mscc/ and a bugfix patch should probably not move headers around. Fixes: `bdeced75b1` ("net: dsa: felix: Add PCS operations for PHYLINK") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 11:41:11 -08:00
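
The difference between the two register accesses quoted in the felix commit above can be sketched with a plain variable standing in for the register. This is an illustration only: the demo_ helpers and the bit positions are assumptions, not the ocelot accessors or the real DEV_MAC_ENA_CFG layout.

```c
#include <stdint.h>

/* Hypothetical bit positions; the real values live in the driver headers. */
#define DEMO_MAC_ENA_CFG_RX_ENA (1u << 4)
#define DEMO_MAC_ENA_CFG_TX_ENA (1u << 0)

/* Plain write: every bit of the register is replaced by val. */
static void demo_writel(uint32_t *reg, uint32_t val)
{
    *reg = val;
}

/* Read-modify-write: only the bits selected by mask take their value from val. */
static void demo_rmwl(uint32_t *reg, uint32_t val, uint32_t mask)
{
    uint32_t cur = *reg;

    *reg = (cur & ~mask) | (val & mask);
}
```

Writing 0 with demo_writel() clears both enable bits, while demo_rmwl() with a mask of DEMO_MAC_ENA_CFG_RX_ENA clears only the RX enable bit and leaves the TX enable bit as it was, which is the behavior the commit message asks for.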
Rafał Miłecki	4feffeadbc	net: broadcom: bcm4908enet: add BCM4908 controller driver The BCM4908 SoC family uses an Ethernet controller that includes UniMAC but uses a different DMA engine (than other controllers) and requires different programming. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 11:34:53 -08:00
Rafał Miłecki	387d1c1819	dt-bindings: net: document BCM4908 Ethernet controller BCM4908 is a family of SoCs with integrated Ethernet controller. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 11:34:53 -08:00
David S. Miller	fc1a8db3d5	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2021-02-09 1) Support TSO on xfrm interfaces. From Eyal Birger. 2) Variable calculation simplifications in esp4/esp6. From Jiapeng Chong / Jiapeng Zhong. 3) Fix a return code in xfrm_do_migrate. From Zheng Yongjun. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 11:23:41 -08:00
Jay Vosburgh	8cf5d8cc3e	Documentation: networking: ip-sysctl: Document src_valid_mark sysctl Provide documentation for src_valid_mark sysctl, which was added in commit `28f6aeea3f` ("net: restore ip source validation"). Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 11:16:03 -08:00
Michael Walle	1e2e61af19	net: phy: broadcom: remove BCM5482 1000Base-BX support It is nowhere used in the kernel. It also seems to be lacking the proper fiber advertise flags. Remove it. Signed-off-by: Michael Walle <michael@walle.cc> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 11:12:39 -08:00
Michael Walle	f15008fbaa	net: phy: drop explicit genphy_read_status() op genphy_read_status() is already the default for the .read_status() op. Drop the unnecessary references. Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-09 11:10:25 -08:00
Eryk Rybak	613142b0bb	i40e: Log error for oversized MTU on device When attempting to link XDP prog with MTU larger than supported, user is not informed why XDP linking fails. Adding proper error message: "MTU too large to enable XDP". Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Eryk Rybak <eryk.roch.rybak@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-02-08 17:25:27 -08:00
Cristian Dumitrescu	f020fa1a79	i40e: consolidate handling of XDP program actions Consolidate the actions performed on the packet based on the XDP program result into a separate function that is easier to read and maintain. Simplify the i40e_construct_skb_zc function, so that the input xdp buffer is always freed, regardless of whether the output skb is successfully created or not. Simplify the behavior of the i40e_clean_rx_irq_zc function, so that the current packet descriptor is dropped when function i40e_construct_skb_zc returns an error, as opposed to re-processing the same descriptor on the next invocation. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-02-08 17:19:03 -08:00
Cristian Dumitrescu	d4178c31a5	i40e: remove the redundant buffer info updates For performance reasons, remove the redundant buffer info updates (*bi = NULL). The buffers ready to be cleaned can easily be tracked based on the ring next-to-clean variable, which is consistently updated. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-02-08 17:19:03 -08:00
Cristian Dumitrescu	f12738b6ec	i40e: remove unnecessary cleaned_count updates For performance reasons, remove the redundant updates of the cleaned_count variable, as its value can be computed based on the ring next-to-clean variable, which is consistently updated. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-02-08 17:19:03 -08:00
Cristian Dumitrescu	c8a8ca3408	i40e: remove unnecessary memory writes of the next to clean pointer For performance reasons, avoid writing the ring next-to-clean pointer value back to memory on every update, as it is not really necessary. Instead, simply read it at initialization into a local copy, update the local copy as necessary and write the local copy back to memory after the last update. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-02-08 17:19:03 -08:00
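
The next-to-clean optimization described in the i40e commit above can be sketched as follows: read the ring index once into a local copy, advance only the local copy inside the loop, and write it back once at the end. The ring structure, its size, and the function name are hypothetical and only illustrate the idea; they are not the i40e data structures.

```c
#include <stdint.h>

#define DEMO_RING_SIZE 8 /* hypothetical ring size */

struct demo_ring {
    uint16_t next_to_clean;        /* shared ring state, kept in memory */
    uint8_t  done[DEMO_RING_SIZE]; /* 1 if the descriptor is ready */
};

/* Clean up to budget ready descriptors; returns how many were cleaned. */
static unsigned int demo_clean_ring(struct demo_ring *ring, unsigned int budget)
{
    uint16_t ntc = ring->next_to_clean; /* read once into a local copy */
    unsigned int cleaned = 0;

    while (cleaned < budget && ring->done[ntc]) {
        ring->done[ntc] = 0;            /* stand-in for processing the descriptor */
        if (++ntc == DEMO_RING_SIZE)    /* wrap around, local copy only */
            ntc = 0;
        cleaned++;
    }

    ring->next_to_clean = ntc;          /* single write-back after the loop */
    return cleaned;
}
```

Keeping the index in a local variable lets the compiler hold it in a register for the whole loop, so the ring structure is written once per call instead of once per descriptor.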
David S. Miller	5ea3c72ccf	Merge branch 'route-offload-failure' net: Add support for route offload failure notifications Ido Schimmel says: ==================== This is a complementary series to the one merged in commit `389cb1ecc8` ("Merge branch 'add-notifications-when-route-hardware-flags-change'"). The previous series added RTM_NEWROUTE notifications to user space whenever a route was successfully installed in hardware or when its state in hardware changed. This allows routing daemons to delay advertisement of routes until they are installed in hardware. However, if route installation failed, a routing daemon will wait indefinitely for a notification that will never come. The aim of this series is to provide a failure notification via a new flag (RTM_F_OFFLOAD_FAILED) in the RTM_NEWROUTE message. Upon such a notification a routing daemon may decide to withdraw the route from the FIB. Series overview: Patch #1 adds the new RTM_F_OFFLOAD_FAILED flag Patches #2-#3 and #4-#5 add failure notifications to IPv4 and IPv6, respectively Patches #6-#8 teach netdevsim to fail route installation via a new knob in debugfs Patch #9 extends mlxsw to mark routes with the new flag Patch #10 adds test cases for the new notification over netdevsim ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-08 16:47:03 -08:00
Amit Cohen	9ee53e3753	selftests: netdevsim: Test route offload failure notifications Add cases to verify that when debugfs variable "fail_route_offload" is set, notification with "rt_offload_failed" flag is received. Extend the existing cases to verify that when sysctl "fib_notify_on_flag_change" is set to 2, the kernel emits notifications only for failed route installation. $ ./fib_notifications.sh TEST: IPv4 route addition [ OK ] TEST: IPv4 route deletion [ OK ] TEST: IPv4 route replacement [ OK ] TEST: IPv4 route offload failed [ OK ] TEST: IPv6 route addition [ OK ] TEST: IPv6 route deletion [ OK ] TEST: IPv6 route replacement [ OK ] TEST: IPv6 route offload failed [ OK ] Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-08 16:47:03 -08:00
Amit Cohen	a4cb1c02c3	mlxsw: spectrum_router: Set offload_failed flag When FIB_EVENT_ENTRY_{REPLACE, APPEND} are triggered and route insertion fails, FIB abort is triggered. After aborting, set the appropriate hardware flag to make the kernel emit RTM_NEWROUTE notification with RTM_F_OFFLOAD_FAILED flag. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-08 16:47:03 -08:00
Amit Cohen	134c753242	netdevsim: fib: Add debugfs to debug route offload failure Add "fail_route_offload" flag to disallow offloading routes. It is needed to test "offload failed" notifications. Create the flag as part of nsim_fib_create() under fib directory and set it to false by default. When FIB_EVENT_ENTRY_{REPLACE, APPEND} are triggered and "fail_route_offload" value is true, set the appropriate hardware flag to make the kernel emit RTM_NEWROUTE notification with RTM_F_OFFLOAD_FAILED flag. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-08 16:47:03 -08:00
Ido Schimmel	f57ab5b75f	netdevsim: dev: Initialize FIB module after debugfs Initialize the dummy FIB offload module after debugfs, so that the FIB module could create its own directory there. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-08 16:47:03 -08:00
Amit Cohen	484a4dfb75	netdevsim: fib: Do not warn if route was not found for several events The next patch will add the ability to fail route offload controlled by debugfs variable called "fail_route_offload". If we vetoed the addition, we might get a delete or append notification for a route we do not have. Therefore, do not warn if route was not found. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-08 16:47:03 -08:00
Amit Cohen	6fad361ae9	IPv6: Extend 'fib_notify_on_flag_change' sysctl Add the value '2' to 'fib_notify_on_flag_change' to allow sending notifications only for failed route installation. Separate value is added for such notifications because there are less of them, so they do not impact performance and some users will find them more important. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-08 16:47:03 -08:00
Amit Cohen	0c5fcf9e24	IPv6: Add "offload failed" indication to routes After installing a route to the kernel, user space receives an acknowledgment, which means the route was installed in the kernel, but not necessarily in hardware. The asynchronous nature of route installation in hardware can lead to a routing daemon advertising a route before it was actually installed in hardware. This can result in packet loss or mis-routed packets until the route is installed in hardware. To avoid such cases, previous patch set added the ability to emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags are changed, this behavior is controlled by sysctl. With the above mentioned behavior, it is possible to know from user-space if the route was offloaded, but if the offload fails there is no indication to user-space. Following a failure, a routing daemon will wait indefinitely for a notification that will never come. This patch adds an "offload_failed" indication to IPv6 routes, so that users will have better visibility into the offload process. 'struct fib6_info' is extended with new field that indicates if route offload failed. Note that the new field is added using unused bit and therefore there is no need to increase struct size. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-08 16:47:03 -08:00
Amit Cohen	648106c30a	IPv4: Extend 'fib_notify_on_flag_change' sysctl Add the value '2' to 'fib_notify_on_flag_change' to allow sending notifications only for failed route installation. Separate value is added for such notifications because there are less of them, so they do not impact performance and some users will find them more important. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-08 16:47:03 -08:00
Amit Cohen	36c5100e85	IPv4: Add "offload failed" indication to routes After installing a route to the kernel, user space receives an acknowledgment, which means the route was installed in the kernel, but not necessarily in hardware. The asynchronous nature of route installation in hardware can lead to a routing daemon advertising a route before it was actually installed in hardware. This can result in packet loss or mis-routed packets until the route is installed in hardware. To avoid such cases, previous patch set added the ability to emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags are changed, this behavior is controlled by sysctl. With the above mentioned behavior, it is possible to know from user-space if the route was offloaded, but if the offload fails there is no indication to user-space. Following a failure, a routing daemon will wait indefinitely for a notification that will never come. This patch adds an "offload_failed" indication to IPv4 routes, so that users will have better visibility into the offload process. 'struct fib_alias', and 'struct fib_rt_info' are extended with new field that indicates if route offload failed. Note that the new field is added using unused bit and therefore there is no need to increase structs size. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-08 16:47:03 -08:00
Amit Cohen	49fc251360	rtnetlink: Add RTM_F_OFFLOAD_FAILED flag The flag indicates to user space that route offload failed. Previous patch set added the ability to emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags are changed, but if the offload fails there is no indication to user-space. The flag will be used in subsequent patches by netdevsim and mlxsw to indicate to user space that route offload failed, so that users will have better visibility into the offload process. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-08 16:47:02 -08:00
Tony Nguyen	a851dfa8df	Documentation: ice: update documentation The ice documentation has not been updated since the initial commits of the driver. Update the documentation with features and information that are now available. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-02-08 16:27:01 -08:00
Tony Nguyen	741106f7bd	ice: Improve MSI-X fallback logic Currently if the driver is unable to get all the MSI-X vectors it wants, it falls back to the minimum configuration which equates to a single Tx/Rx traffic queue pair. Instead of using the minimum configuration, if given more vectors than the minimum, utilize those vectors for additional traffic queues after accounting for other interrupts. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>	2021-02-08 16:27:01 -08:00
Mitch Williams	fe6cd89050	ice: Fix trivial error message This message indicates an error on close, not open. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-02-08 16:27:01 -08:00
Bruce Allan	7a63dae0fa	ice: remove unnecessary casts Casting a void * rvalue in an assignment is unnecessary in C; remove the casts. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-02-08 16:27:01 -08:00
Chinh T Cao	fc2d1165d4	ice: Refactor DCB related variables out of the ice_port_info struct Refactor the DCB related variables out of the ice_port_info_struct. The goal is to make the ice_port_info struct cleaner. Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com> Co-developed-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-02-08 16:27:01 -08:00
Jesse Brandeburg	1d9f7ca324	ice: fix writeback enable logic The writeback enable logic was incorrectly implemented (due to misunderstanding what the side effects of the implementation would be during polling). Fix this logic issue, while implementing a new feature allowing the user to control the writeback frequency using the knobs for controlling interrupt throttling that we already have. Basically if you leave adaptive interrupts enabled, the writeback frequency will be varied even if busy_polling or if napi-poll is in use. If the interrupt rates are set to a fixed value by ethtool -C and adaptive is off, the driver will allow the user-set interrupt rate to guide how frequently the hardware will complete descriptors to the driver. Effectively the user will get a control over the hardware efficiency, allowing the choice between immediate interrupts or delayed up to a maximum of the interrupt rate, even when interrupts are disabled during polling. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Co-developed-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-02-08 16:27:01 -08:00
Ben Shelton	4f8a14976a	ice: Use PSM clock frequency to calculate RL profiles The core clock frequency is currently hardcoded at 446 MHz for the RL profile calculations. This causes issues since not all devices use that clock frequency. Read the GLGEN_CLKSTAT_SRC register to determine which PSM clock frequency is selected. This ensures that the rate limiter profile calculations will be correct. Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-02-08 16:27:01 -08:00
Kiran Patil	b126bd6bcd	ice: create scheduler aggregator node config and move VSIs Create set scheduler aggregator node and move for VSIs into respective scheduler node. Max children per aggregator node is 64. There are two types of aggregator node(s) created. 1. dedicated node for PF and _CTRL VSIs 2. dedicated node(s) for VFs. As part of reset and rebuild, aggregator nodes are recreated and VSIs are moved to respective aggregator node. Having related VSIs in respective tree avoid starvation between PF and VF w.r.t Tx bandwidth. Co-developed-by: Tarun Singh <tarun.k.singh@intel.com> Signed-off-by: Tarun Singh <tarun.k.singh@intel.com> Co-developed-by: Victor Raj <victor.raj@intel.com> Signed-off-by: Victor Raj <victor.raj@intel.com> Co-developed-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Kiran Patil <kiran.patil@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2021-02-08 16:27:01 -08:00
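
The two route offload commits above (36c5100e85 and 49fc251360) describe a user-space daemon that needs to tell "offloaded", "trapped", and "offload failed" apart from the flags carried in an RTM_NEWROUTE notification. A minimal sketch of that decision follows. The `SKETCH_*` names, the enum, and both helper functions are hypothetical and exist only for illustration; the numeric values are assumed to match `include/uapi/linux/rtnetlink.h` and should be checked against the headers actually in use. Checking the failure bit first is also an assumption of this sketch, not something the commits specify.

```c
#include <stdint.h>

/*
 * Local copies of the route flag bits, so the sketch builds without
 * kernel headers. Assumed to match RTM_F_OFFLOAD, RTM_F_TRAP and
 * RTM_F_OFFLOAD_FAILED in include/uapi/linux/rtnetlink.h.
 */
#define SKETCH_RTM_F_OFFLOAD        0x4000u
#define SKETCH_RTM_F_TRAP           0x8000u
#define SKETCH_RTM_F_OFFLOAD_FAILED 0x20000000u

/* Hypothetical classification a routing daemon might keep per route. */
enum route_hw_state {
	ROUTE_HW_PENDING,   /* acknowledged by the kernel, no hardware flag yet */
	ROUTE_HW_OFFLOADED, /* hardware forwards the traffic */
	ROUTE_HW_TRAPPED,   /* hardware traps packets to the CPU */
	ROUTE_HW_FAILED     /* offload failed; no further notification expected */
};

/* Map the rtm_flags word of a route notification to a state. */
static enum route_hw_state classify_route(uint32_t rtm_flags)
{
	/* Failure is checked first so it is never masked by another bit. */
	if (rtm_flags & SKETCH_RTM_F_OFFLOAD_FAILED)
		return ROUTE_HW_FAILED;
	if (rtm_flags & SKETCH_RTM_F_OFFLOAD)
		return ROUTE_HW_OFFLOADED;
	if (rtm_flags & SKETCH_RTM_F_TRAP)
		return ROUTE_HW_TRAPPED;
	return ROUTE_HW_PENDING;
}

/* A daemon should hold back advertisement until the route is in hardware. */
static int route_safe_to_advertise(uint32_t rtm_flags)
{
	return classify_route(rtm_flags) == ROUTE_HW_OFFLOADED;
}
```

A daemon would call `classify_route()` on the `rtm_flags` field of each received `struct rtmsg`. With the new flag, a `ROUTE_HW_FAILED` result lets it stop waiting instead of blocking indefinitely on a notification that will never arrive.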

985441 Commits