linux

Author	SHA1	Message	Date
Aya Levin	e78ab16459	devlink: Add DMAC filter generic packet trap Add packet trap that can report packets that were dropped due to destination MAC filtering. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 19:53:40 -08:00
Jakub Kicinski	5998dd0217	More updates: * many minstrel improvements, including removal of the old minstrel in favour of minstrel_ht * speed improvements on FQ * support for RX decapsulation (header conversion) offload * RTNL reduction: limit RTNL usage in the wireless stack mostly to where really needed (regulatory not yet) to reduce contention on it * various other small updates -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmAR1hoACgkQB8qZga/f l8QPVRAAjbF502gkrM2+VjC31xNrSVtAXn5LIsTHUUee1Kwhhe7VHNqrSYtR+poQ jdtBaaszlnkbBqThvIqD6AvbJgUeFhDB9Jcyynhbaq3wPcDiu7IuJikBzkN2Mhyq bzKu46jFPG6l31ItCKF9mToujlmcJrJVoS5mWJEi3QIWc9B3dwPsznO57tCVznmU 95zkhs6tSwgk9ENpDoulXxvpsnlnIzo8f1z0Lr4h9TbzvwzTEA2+by0hmnBk/vZN wjBJXNVMlimp+lmgVms/gQfuGG6NHtmtatqsu+iKCRbO5AjOlvvql2pjkf5yNZXP GD2LkOcM3DiNP1Mf3pP31du1cFBo7nq+P1/ChekmY1A6/xV3GXFsKofGDCSONu1n E5+X+6MFt1gGHAwh8yxmg2HUjpuq6fApnx656HqlYluXqeCRR0hcv0iJ3j9ApAVJ Qwn6cP1rjCXXHsbOdGQi/Vx0xmUjEzNw3+j9x5nC5+zM82joRKxFMqZA8revxRUj gjkiu6EC4ZWWBzTXY9+qITSgEO8b1y7emBgdKSAQplWDAQnaa8nPGBdQ7QmwPdpV vOSZ6Dd4dMKh8Qk7FWPHxYZJKoDz+i0a8g1m9CogC3TO8hNLRt6D37ffTdoOag2m 648kFBBS4SghKAjH6P4Nf6cfAPHpX/afivnSTpRK691cZ+2pajc= =gIbl -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-net-next-2021-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== More updates: * many minstrel improvements, including removal of the old minstrel in favour of minstrel_ht * speed improvements on FQ * support for RX decapsulation (header conversion) offload * RTNL reduction: limit RTNL usage in the wireless stack mostly to where really needed (regulatory not yet) to reduce contention on it * tag 'mac80211-next-for-net-next-2021-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: (24 commits) mac80211: minstrel_ht: fix regression in the max_prob_rate fix virt_wifi: fix deadlock on RTNL cfg80211: avoid holding the RTNL when calling the driver cfg80211: change netdev registration/unregistration semantics mac80211: minstrel_ht: fix rounding error in throughput calculation mac80211: minstrel_ht: increase stats update interval mac80211: minstrel_ht: fix max probability rate selection mac80211: minstrel_ht: improve sample rate selection mac80211: minstrel_ht: improve ampdu length estimation mac80211: minstrel_ht: remove old ewma based rate average code mac80211: remove legacy minstrel rate control mac80211: minstrel_ht: add support for OFDM rates on non-HT clients mac80211: minstrel_ht: clean up CCK code mac80211: introduce aql_enable node in debugfs cfg80211: Add phyrate conversion support for extended MCS in 60GHz band cfg80211: add VHT rate entries for MCS-10 and MCS-11 mac80211: reduce peer HE MCS/NSS to own capabilities mac80211: remove NSS number of 160MHz if not support 160MHz for HE mac80211_hwsim: add 6GHz channels mac80211: add LDPC encoding to ieee80211_parse_tx_radiotap ... ==================== Link: https://lore.kernel.org/r/20210127210915.135550-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 19:01:06 -08:00
Jakub Kicinski	df9d80470a	linux-can-next-for-5.12-20210127 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAmARLD8THG1rbEBwZW5n dXRyb25peC5kZQAKCRCpyVqK+u3vqdlXB/48nQ5I+Z1wnhPvbtvyH4tk9XSbJaTt 4HH+i3R5RUAzHcOmfm2PQHe9/DxiogOQAFv9Lo0t7HN449bM3LMHrhTCcJIrIRf9 VxFSk4H97wjHR0Zj6TlEe++CTUPUalCpkCluERwqYP9WXRRklXL1mju+WNKnMMl0 9fl4CvQDWjB2wNXXoZ1SVuoFxyeqiKQHJy9n3Wez8sQTIlguOZvm8glDQlyb4v+q rSxpCUrlpOVv6/11NqxQ7CfGdfTgLUi1a4greriwf1PjEXvDArXMjpDG3bo0kbgy 7Iv0U9GsvtzOPB+6XKxEFeYTKFaixyLugYBAadfvs0lVEIFP1mtlYvQs =pHI/ -----END PGP SIGNATURE----- Merge tag 'linux-can-next-for-5.12-20210127' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2021-01-27 The first two patches are by me and fix typos on the CAN gw protocol and the flexcan driver. The next patch is by Vincent Mailhol and targets the CAN driver infrastructure, it exports the function that converts the CAN state into a human readable string. A patch by me, which target the CAN driver infrastructure, too, makes the calculation in can_fd_len2dlc() more readable. A patch by Tom Rix fixes a checkpatch warning in the mcba_usb driver. The next seven patches target the mcp251xfd driver. Su Yanjun's patch replaces several hardcoded assumptions when calling regmap, by using regmap_get_val_bytes(). The remaining patches are by me. First an open coded check is replaced by an existing helper function, then in the TX path the padding for CAN-FD frames is cleaned up. The next two patches clean up the RTR frame handling in the RX and TX path. Then support for len8_dlc is added. The last patch adds BQL support. * tag 'linux-can-next-for-5.12-20210127' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: can: mcp251xfd: add BQL support can: mcp251xfd: add len8_dlc support can: mcp251xfd: mcp251xfd_tx_obj_from_skb(): don't copy data for RTR CAN frames in TX-path can: mcp251xfd: mcp251xfd_hw_rx_obj_to_skb(): don't copy data for RTR CAN frames in RX-path can: mcp251xfd: mcp251xfd_tx_obj_from_skb(): clean up padding of CAN-FD frames can: mcp251xfd: mcp251xfd_start_xmit(): use mcp251xfd_get_tx_free() to check TX is is full can: mcp251xfd: replace sizeof(u32) with val_bytes in regmap can: mcba_usb: remove h from printk format specifier can: length: can_fd_len2dlc(): make legnth calculation readable again can: dev: export can_get_state_str() function can: flexcan: fix typos can: gw: fix typo ==================== Link: https://lore.kernel.org/r/20210127092227.2775573-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 18:53:10 -08:00
Hoang Huu Le	2a9063b7ff	tipc: remove duplicated code in tipc_msg_create Remove a duplicate code checking for header size in tipc_msg_create() as it's already being done in tipc_msg_init(). Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au> Link: https://lore.kernel.org/r/20210127025123.6390-1-hoang.h.le@dektech.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 18:50:07 -08:00
Nikolay Aleksandrov	2dba407f99	net: bridge: multicast: make tracked EHT hosts limit configurable Add two new port attributes which make EHT hosts limit configurable and export the current number of tracked EHT hosts: - IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT: configure/retrieve current limit - IFLA_BRPORT_MCAST_EHT_HOSTS_CNT: current number of tracked hosts Setting IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT to 0 is currently not allowed. Note that we have to increase RTNL_SLAVE_MAX_TYPE to 38 minimum, I've increased it to 40 to have space for two more future entries. v2: move br_multicast_eht_set_hosts_limit() to br_multicast_eht.c, no functional change Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:40:35 -08:00
Nikolay Aleksandrov	89268b056e	net: bridge: multicast: add per-port EHT hosts limit Add a default limit of 512 for number of tracked EHT hosts per-port. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:40:35 -08:00
Masahiro Yamada	864e898ba3	net: remove redundant 'depends on NET' These Kconfig files are included from net/Kconfig, inside the if NET ... endif. Remove 'depends on NET', which we know it is already met. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20210125232026.106855-1-masahiroy@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:04:12 -08:00
Masahiro Yamada	d32f834cd6	net: l3mdev: use obj-$(CONFIG_NET_L3_MASTER_DEV) form in net/Makefile CONFIG_NET_L3_MASTER_DEV is a bool option. Change the ifeq conditional to the standard obj-$(CONFIG_NET_L3_MASTER_DEV) form. Use obj-y in net/l3mdev/Makefile because Kbuild visits this Makefile only when CONFIG_NET_L3_MASTER_DEV=y. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20210125231659.106201-4-masahiroy@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:03:52 -08:00
Masahiro Yamada	0cfd99b487	net: switchdev: use obj-$(CONFIG_NET_SWITCHDEV) form in net/Makefile CONFIG_NET_SWITCHDEV is a bool option. Change the ifeq conditional to the standard obj-$(CONFIG_NET_SWITCHDEV) form. Use obj-y in net/switchdev/Makefile because Kbuild visits this Makefile only when CONFIG_NET_SWITCHDEV=y. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20210125231659.106201-3-masahiroy@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:03:52 -08:00
Masahiro Yamada	1e328ed559	net: dcb: use obj-$(CONFIG_DCB) form in net/Makefile CONFIG_DCB is a bool option. Change the ifeq conditional to the standard obj-$(CONFIG_DCB) form. Use obj-y in net/dcb/Makefile because Kbuild visits this Makefile only when CONFIG_DCB=y. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20210125231659.106201-2-masahiroy@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:03:52 -08:00
Masahiro Yamada	8b5f4eb3ab	net: move CONFIG_NET guard to top Makefile When CONFIG_NET is disabled, nothing under the net/ directory is compiled. Move the CONFIG_NET guard to the top Makefile so the net/ directory is entirely skipped. When Kbuild visits net/Makefile, CONFIG_NET is obvioulsy 'y' because CONFIG_NET is a bool option. Clean up net/Makefile. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20210125231659.106201-1-masahiroy@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:03:52 -08:00
Masahiro Yamada	69783429cd	net: sysctl: remove redundant #ifdef CONFIG_NET CONFIG_NET is a bool option, and this file is compiled only when CONFIG_NET=y. Remove #ifdef CONFIG_NET, which we know it is always met. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20210125231421.105936-1-masahiroy@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:02:43 -08:00
Matthieu Baerts	1f2f1931b2	mptcp: pm nl: reduce variable scope To avoid confusions like when working on the previous patch, better to declare and assign this variable only where it is needed. Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 16:53:55 -08:00
Matthieu Baerts	7b9b0f7e12	mptcp: pm nl: support IPv4 mapped in v6 addresses On one side, we can allow the creation of subflows between v4 mapped in v6 and v4 addresses. For that we look for v4mapped addresses between the local address we want to select and the remote one. On the other side, we also properly deal with received v4mapped addresses, either announced ones or set via Netlink. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/122 Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Co-developed-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 16:53:52 -08:00
Matthieu Baerts	50a13bc394	mptcp: support MPJoin with IPv4 mapped in v6 sk With an IPv4 mapped in v6 socket, we were trying to call inet6_bind() with an IPv4 address resulting in a -EINVAL error because the given addr_len -- size of the address structure -- was too short. We now make sure to use address structures for the same family as the MPTCP socket for both the bind() and the connect(). It means we convert v4 addresses to v4 mapped in v6 or the opposite if needed. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/122 Co-developed-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 16:53:47 -08:00
Di Zhu	275b1e88ca	pktgen: fix misuse of BUG_ON() in pktgen_thread_worker() pktgen create threads for all online cpus and bond these threads to relevant cpu repecivtily. when this thread firstly be woken up, it will compare cpu currently running with the cpu specified at the time of creation and if the two cpus are not equal, BUG_ON() will take effect causing panic on the system. Notice that these threads could be migrated to other cpus before start running because of the cpu hotplug after these threads have created. so the BUG_ON() used here seems unreasonable and we can replace it with WARN_ON() to just printf a warning other than panic the system. Signed-off-by: Di Zhu <zhudi21@huawei.com> Link: https://lore.kernel.org/r/20210125124229.19334-1-zhudi21@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 16:46:37 -08:00
Felix Fietkau	d3b9b45f7e	mac80211: minstrel_ht: fix regression in the max_prob_rate fix Since mi->max_prob_rate is overwritten after the loop that calls minstrel_ht_set_best_prob_rate, the new best rate needs to be written to *dest Fixes: `a7fca4e403` ("mac80211: minstrel_ht: fix max probability rate selection") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20210126154409.6755-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-01-27 22:06:38 +01:00
Marc Kleine-Budde	12da7a1f3c	can: gw: fix typo This patch fixes a typo found by codespell. Fixes: `94c23097f9` ("can: gw: support modification of Classical CAN DLCs") Link: https://lore.kernel.org/r/20210127085529.2768537-3-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2021-01-27 10:01:46 +01:00
Praveen Chaudhary	6b2e04bc24	net: allow user to set metric on default route learned via Router Advertisement For IPv4, default route is learned via DHCPv4 and user is allowed to change metric using config etc/network/interfaces. But for IPv6, default route can be learned via RA, for which, currently a fixed metric value 1024 is used. Ideally, user should be able to configure metric on default route for IPv6 similar to IPv4. This patch adds sysctl for the same. Logs: For IPv4: Config in etc/network/interfaces: auto eth0 iface eth0 inet dhcp metric 4261413864 IPv4 Kernel Route Table: $ ip route list default via 172.21.47.1 dev eth0 metric 4261413864 FRR Table, if a static route is configured: [In real scenario, it is useful to prefer BGP learned default route over DHCPv4 default route.] Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF, I - IS-IS, B - BGP, P - PIM, E - EIGRP, N - NHRP, T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP, > - selected route, * - FIB route S>* 0.0.0.0/0 [20/0] is directly connected, eth0, 00:00:03 K 0.0.0.0/0 [254/1000] via 172.21.47.1, eth0, 6d08h51m i.e. User can prefer Default Router learned via Routing Protocol in IPv4. Similar behavior is not possible for IPv6, without this fix. After fix [for IPv6]: sudo sysctl -w net.ipv6.conf.eth0.net.ipv6.conf.eth0.ra_defrtr_metric=1996489705 IP monitor: [When IPv6 RA is received] default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489705 pref high Kernel IPv6 routing table $ ip -6 route list default via fe80::be16:65ff:feb3:ce8e dev eth0 proto ra metric 1996489705 expires 21sec hoplimit 64 pref high FRR Table, if a static route is configured: [In real scenario, it is useful to prefer BGP learned default route over IPv6 RA default route.] Codes: K - kernel route, C - connected, S - static, R - RIPng, O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP, > - selected route, * - FIB route S>* ::/0 [20/0] is directly connected, eth0, 00:00:06 K ::/0 [119/1001] via fe80::xx16:xxxx:feb3:ce8e, eth0, 6d07h43m If the metric is changed later, the effect will be seen only when next IPv6 RA is received, because the default route must be fully controlled by RA msg. Below metric is changed from 1996489705 to 1996489704. $ sudo sysctl -w net.ipv6.conf.eth0.ra_defrtr_metric=1996489704 net.ipv6.conf.eth0.ra_defrtr_metric = 1996489704 IP monitor: [On next IPv6 RA msg, Kernel deletes prev route and installs new route with updated metric] Deleted default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489705 expires 3sec hoplimit 64 pref high default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489704 pref high Signed-off-by: Praveen Chaudhary <pchaudhary@linkedin.com> Signed-off-by: Zhenggen Xu <zxu@linkedin.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20210125214430.24079-1-pchaudhary@linkedin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-26 18:39:45 -08:00
Nikolay Aleksandrov	3e841bacf7	net: bridge: multicast: fix br_multicast_eht_set_entry_lookup indentation Fix the messed up indentation in br_multicast_eht_set_entry_lookup(). Fixes: `baa74d39ca` ("net: bridge: multicast: add EHT source set handling functions") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210125082040.13022-1-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-26 16:26:50 -08:00
Johannes Berg	a05829a722	cfg80211: avoid holding the RTNL when calling the driver Currently, _everything_ in cfg80211 holds the RTNL, and if you have a slow USB device (or a few) you can get some bad lock contention on that. Fix that by re-adding a mutex to each wiphy/rdev as we had at some point, so we have locking for the wireless_dev lists and all the other things in there, and also so that drivers still don't have to worry too much about it (they still won't get parallel calls for a single device). Then, we can restrict the RTNL to a few cases where we add or remove interfaces and really need the added protection. Some of the global list management still also uses the RTNL, since we need to have it anyway for netdev management, but we only hold the RTNL for very short periods of time here. Link: https://lore.kernel.org/r/20210122161942.81df9f5e047a.I4a8e1a60b18863ea8c5e6d3a0faeafb2d45b2f40@changeid Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> [marvell driver issues] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2021-01-26 11:55:50 +01:00
Jiapeng Zhong	8d21c882ab	bridge: Use PTR_ERR_OR_ZERO instead if(IS_ERR(...)) + PTR_ERR coccicheck suggested using PTR_ERR_OR_ZERO() and looking at the code. Fix the following coccicheck warnings: ./net/bridge/br_multicast.c:1295:7-13: WARNING: PTR_ERR_OR_ZERO can be used. Reported-by: Abaci <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/1611542381-91178-1-git-send-email-abaci-bugfix@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 18:23:07 -08:00
Alexander Lobakin	36707061d6	udp: allow forwarding of plain (non-fraglisted) UDP GRO packets Commit `9fd1ff5d2a` ("udp: Support UDP fraglist GRO/GSO.") actually not only added a support for fraglisted UDP GRO, but also tweaked some logics the way that non-fraglisted UDP GRO started to work for forwarding too. Commit `2e4ef10f58` ("net: add GSO UDP L4 and GSO fraglists to the list of software-backed types") added GSO UDP L4 to the list of software GSO to allow virtual netdevs to forward them as is up to the real drivers. Tests showed that currently forwarding and NATing of plain UDP GRO packets are performed fully correctly, regardless if the target netdevice has a support for hardware/driver GSO UDP L4 or not. Add the last element and allow to form plain UDP GRO packets if we are on forwarding path, and the new NETIF_F_GRO_UDP_FWD is enabled on a receiving netdevice. If both NETIF_F_GRO_FRAGLIST and NETIF_F_GRO_UDP_FWD are set, fraglisted GRO takes precedence. This keeps the current behaviour and is generally more optimal for now, as the number of NICs with hardware USO offload is relatively small. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 20:18:16 -08:00
Alexander Lobakin	6f1c0ea133	net: introduce a netdev feature for UDP GRO forwarding Introduce a new netdev feature, NETIF_F_GRO_UDP_FWD, to allow user to turn UDP GRO on and off for forwarding. Defaults to off to not change current datapath. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 20:16:24 -08:00
Danielle Ratson	321f7ab0d4	mlxsw: Register physical ports as a devlink resource The switch ASIC has a limited capacity of physical ('flavour physical' in devlink terminology) ports that it can support. While each system is brought up with a different number of ports, this number can be increased via splitting up to the ASIC's limit. Expose physical ports as a devlink resource so that user space will have visibility to the maximum number of ports that can be supported and the current occupancy. In addition, add a "Generic Resources" section in devlink-resource documentation so the different drivers will be aligned by the same resource name when exposing to user space. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 20:42:13 -08:00
Maxim Mikityanskiy	8327158624	sch_htb: Stats for offloaded HTB This commit adds support for statistics of offloaded HTB. Bytes and packets counters for leaf and inner nodes are supported, the values are taken from per-queue qdiscs, and the numbers that the user sees should have the same behavior as the software (non-offloaded) HTB. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 20:41:29 -08:00
Maxim Mikityanskiy	d03b195b5a	sch_htb: Hierarchical QoS hardware offload HTB doesn't scale well because of contention on a single lock, and it also consumes CPU. This patch adds support for offloading HTB to hardware that supports hierarchical rate limiting. In the offload mode, HTB passes control commands to the driver using ndo_setup_tc. The driver has to replicate the whole hierarchy of classes and their settings (rate, ceil) in the NIC. Every modification of the HTB tree caused by the admin results in ndo_setup_tc being called. After this setup, the HTB algorithm is done completely in the NIC. An SQ (send queue) is created for every leaf class and attached to the hierarchy, so that the NIC can calculate and obey aggregated rate limits, too. In the future, it can be changed, so that multiple SQs will back a single leaf class. ndo_select_queue is responsible for selecting the right queue that serves the traffic class of each packet. The data path works as follows: a packet is classified by clsact, the driver selects a hardware queue according to its class, and the packet is enqueued into this queue's qdisc. This solution addresses two main problems of scaling HTB: 1. Contention by flow classification. Currently the filters are attached to the HTB instance as follows: # tc filter add dev eth0 parent 1:0 protocol ip flower dst_port 80 classid 1:10 It's possible to move classification to clsact egress hook, which is thread-safe and lock-free: # tc filter add dev eth0 egress protocol ip flower dst_port 80 action skbedit priority 1:10 This way classification still happens in software, but the lock contention is eliminated, and it happens before selecting the TX queue, allowing the driver to translate the class to the corresponding hardware queue in ndo_select_queue. Note that this is already compatible with non-offloaded HTB and doesn't require changes to the kernel nor iproute2. 2. Contention by handling packets. HTB is not multi-queue, it attaches to a whole net device, and handling of all packets takes the same lock. When HTB is offloaded, it registers itself as a multi-queue qdisc, similarly to mq: HTB is attached to the netdev, and each queue has its own qdisc. Some features of HTB may be not supported by some particular hardware, for example, the maximum number of classes may be limited, the granularity of rate and ceil parameters may be different, etc. - so, the offload is not enabled by default, a new parameter is used to enable it: # tc qdisc replace dev eth0 root handle 1: htb offload Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 20:41:29 -08:00
Maxim Mikityanskiy	4dd78a7373	net: sched: Add extack to Qdisc_class_ops.delete In a following commit, sch_htb will start using extack in the delete class operation to pass hardware errors in offload mode. This commit prepares for that by adding the extack parameter to this callback and converting usage of the existing qdiscs. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 20:41:29 -08:00
Arjun Roy	7eeba1706e	tcp: Add receive timestamp support for receive zerocopy. tcp_recvmsg() uses the CMSG mechanism to receive control information like packet receive timestamps. This patch adds CMSG fields to struct tcp_zerocopy_receive, and provides receive timestamps if available to the user. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 20:05:56 -08:00
Arjun Roy	925bba24e6	tcp: Remove CMSG magic numbers for tcp_recvmsg(). At present, tcp_recvmsg() uses flags to track if any CMSGs are pending and what those CMSGs are. These flags are currently magic numbers, used only within tcp_recvmsg(). To prepare for receive timestamp support in tcp receive zerocopy, gently refactor these magic numbers into enums. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 20:05:56 -08:00
Nikolay Aleksandrov	d5a1022283	net: bridge: multicast: mark IGMPv3/MLDv2 fast-leave deletes Mark groups which were deleted due to fast leave/EHT. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:57 -08:00
Nikolay Aleksandrov	e87e4b5caa	net: bridge: multicast: handle block pg delete for all cases A block report can result in empty source and host sets for both include and exclude groups so if there are no hosts left we can safely remove the group. Pull the block group handling so it can cover both cases and add a check if EHT requires the delete. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:57 -08:00
Nikolay Aleksandrov	c9739016a0	net: bridge: multicast: add EHT host filter_mode handling We should be able to handle host filter mode changing. For exclude mode we must create a zero-src entry so the group will be kept even without any S,G entries (non-zero source sets). That entry doesn't count to the entry limit and can always be created, its timer is refreshed on new exclude reports and if we change the host filter mode to include then it gets removed and we rely only on the non-zero source sets. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:57 -08:00
Nikolay Aleksandrov	b66bf55bbc	net: bridge: multicast: optimize TO_INCLUDE EHT timeouts This is an optimization specifically for TO_INCLUDE which sends queries for the older entries and thus lowers the S,G timers to LMQT. If we have the following situation for a group in either include or exclude mode: - host A was interested in srcs X and Y, but is timing out - host B sends TO_INCLUDE src Z, the bridge lowers X and Y's timeouts to LMQT - host B sends BLOCK src Z after LMQT time has passed => since host B is the last host we can delete the group, but if we still have host A's EHT entries for X and Y (i.e. if they weren't lowered to LMQT previously) then we'll have to wait another LMQT time before deleting the group, with this optimization we can directly remove it regardless of the group mode as there are no more interested hosts Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:57 -08:00
Nikolay Aleksandrov	ddc255d993	net: bridge: multicast: add EHT include and exclude handling Add support for IGMPv3/MLDv2 include and exclude EHT handling. Similar to how the reports are processed we have 2 cases when the group is in include or exclude mode, these are processed as follows: - group include - is_include: create missing entries - to_include: flush existing entries and create a new set from the report, obviously if the src set is empty then we delete the group - group exclude - is_exclude: create missing entries - to_exclude: flush existing entries and create a new set from the report, any empty source set entries are removed If the group is in a different mode then we just flush all entries reported by the host and we create a new set with the new mode entries created from the report. If the report is include type, the source list is empty and the group has empty sources' set then we remove it. Any source set entries which are empty are removed as well. If the group is in exclude mode it can exist without any S,G entries (allowing for all traffic to pass). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:57 -08:00
Nikolay Aleksandrov	474ddb37fa	net: bridge: multicast: add EHT allow/block handling Add support for IGMPv3/MLDv2 allow/block EHT handling. Similar to how the reports are processed we have 2 cases when the group is in include or exclude mode, these are processed as follows: - group include - allow: create missing entries - block: remove existing matching entries and remove the corresponding S,G entries if there are no more set host entries, then possibly delete the whole group if there are no more S,G entries - group exclude - allow - host include: create missing entries - host exclude: remove existing matching entries and remove the corresponding S,G entries if there are no more set host entries - block - host include: remove existing matching entries and remove the corresponding S,G entries if there are no more set host entries, then possibly delete the whole group if there are no more S,G entries - host exclude: create missing entries Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	dba6b0a5ca	net: bridge: multicast: add EHT host delete function Now that we can delete set entries, we can use that to remove EHT hosts. Since the group's host set entries exist only when there are related source set entries we just have to flush all source set entries joined by the host set entry and it will be automatically removed. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	baa74d39ca	net: bridge: multicast: add EHT source set handling functions Add EHT source set and set-entry create, delete and lookup functions. These allow to manipulate source sets which contain their own host sets with entries which joined that S,G. We're limiting the maximum number of tracked S,G entries per host to PG_SRC_ENT_LIMIT (currently 32) which is the current maximum of S,G entries for a group. There's a per-set timer which will be used to destroy the whole set later. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	5b16328879	net: bridge: multicast: add EHT host handling functions Add functions to create, destroy and lookup an EHT host. These are per-host entries contained in the eht_host_tree in net_bridge_port_group which are used to store a list of all sources (S,G) entries joined for that group by each host, the host's current filter mode and total number of joined entries. No functional changes yet, these would be used in later patches. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	8f07b83119	net: bridge: multicast: add EHT structures and definitions Add EHT structures for tracking hosts and sources per group. We keep one set for each host which has all of the host's S,G entries, and one set for each multicast source which has all hosts that have joined that S,G. For each host, source entry we record the filter_mode and we keep an expiry timer. There is also one global expiry timer per source set, it is updated with each set entry update, it will be later used to lower the set's timer instead of lowering each entry's timer separately. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	e7cfcf2c18	net: bridge: multicast: calculate idx position without changing ptr We need to preserve the srcs pointer since we'll be passing it for EHT handling later. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	0ad57c99e8	net: bridge: multicast: __grp_src_block_incl can modify pg Prepare __grp_src_block_incl() for being able to cause a notification due to changes. Currently it cannot happen, but EHT would change that since we'll be deleting sources immediately. Make sure that if the pg is deleted we don't return true as that would cause the caller to access freed pg. This patch shouldn't cause any functional change. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	54bea72196	net: bridge: multicast: pass host src address to IGMPv3/MLDv2 functions We need to pass the host address so later it can be used for explicit host tracking. No functional change. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	9e10b9e656	net: bridge: multicast: rename src_size to addr_size Rename src_size argument to addr_size in preparation for passing host address as an argument to IGMPv3/MLDv2 functions. No functional change. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Paolo Abeni	b19bc2945b	mptcp: implement delegated actions On MPTCP-level ack reception, the packet scheduler may select a subflow other then the current one. Prior to this commit we rely on the workqueue to trigger action on such subflow. This changeset introduces an infrastructure that allows any MPTCP subflow to schedule actions (MPTCP xmit) on others subflows without resorting to (multiple) process reschedule. A dummy NAPI instance is used instead. When MPTCP needs to trigger action an a different subflow, it enqueues the target subflow on the NAPI backlog and schedule such instance as needed. The dummy NAPI poll method walks the sockets backlog and tries to acquire the (BH) socket lock on each of them. If the socket is owned by the user space, the action will be completed by the sock release cb, otherwise push is started. This change leverages the delegated action infrastructure to avoid invoking the MPTCP worker to spool the pending data, when the packet scheduler picks a subflow other then the one currently processing the incoming MPTCP-level ack. Additionally we further refine the subflow selection invoking the packet scheduler for each chunk of data even inside __mptcp_subflow_push_pending(). v1 -> v2: - fix possible UaF at shutdown time, resetting sock ops after removing the ulp context Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:21:02 -08:00
Paolo Abeni	40dc9416cc	mptcp: schedule work for better snd subflow selection Otherwise the packet scheduler policy will not be enforced when pushing pending data at MPTCP-level ack reception time. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:21:02 -08:00
Paolo Abeni	ec369c3a33	mptcp: do not queue excessive data on subflows The current packet scheduler can enqueue up to sndbuf data on each subflow. If the send buffer is large and the subflows are not symmetric, this could lead to suboptimal aggregate bandwidth utilization. Limit the amount of queued data to the maximum send window. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:21:02 -08:00
Paolo Abeni	5cf92bbadc	mptcp: re-enable sndbuf autotune After commit `6e628cd3a8` ("mptcp: use mptcp release_cb for delayed tasks"), MPTCP never sets the flag bit SOCK_NOSPACE on its subflow. As a side effect, autotune never takes place, as it happens inside tcp_new_space(), which in turn is called only when the mentioned bit is set. Let's sendmsg() set the subflows NOSPACE bit when looking for more memory and use the subflow write_space callback to propagate the snd buf update and wake-up the user-space. Additionally, this allows dropping a bunch of duplicate code and makes the SNDBUF_LIMITED chrono relevant again for MPTCP subflows. Fixes: `6e628cd3a8` ("mptcp: use mptcp release_cb for delayed tasks") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:21:02 -08:00
Paolo Abeni	866f26f2a9	mptcp: always graft subflow socket to parent Currently, incoming subflows link to the parent socket, while outgoing ones link to a per subflow socket. The latter is not really needed, except at the initial connect() time and for the first subflow. Always graft the outgoing subflow to the parent socket and free the unneeded ones early. This allows some code cleanup, reduces the amount of memory used and will simplify the next patch Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:21:02 -08:00
Yousuk Seung	e7ed11ee94	tcp: add TTL to SCM_TIMESTAMPING_OPT_STATS This patch adds TCP_NLA_TTL to SCM_TIMESTAMPING_OPT_STATS that exports the time-to-live or hop limit of the latest incoming packet with SCM_TSTAMP_ACK. The value exported may not be from the packet that acks the sequence when incoming packets are aggregated. Exporting the time-to-live or hop limit value of incoming packets helps to estimate the hop count of the path of the flow that may change over time. Signed-off-by: Yousuk Seung <ysseung@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Link: https://lore.kernel.org/r/20210120204155.552275-1-ysseung@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 18:20:52 -08:00

1 2 3 4 5 ...

63752 Commits