linux

Author	SHA1	Message	Date
Vladimir Oltean	b8e997c490	net: dsa: introduce a separate cross-chip notifier type for host MDBs Commit `abd49535c3` ("net: dsa: execute dsa_switch_mdb_add only for routing port in cross-chip topologies") does a surprisingly good job even for the SWITCHDEV_OBJ_ID_HOST_MDB use case, where DSA simply translates a switchdev object received on dp into a cross-chip notifier for dp->cpu_dp. To visualize how that works, imagine the daisy chain topology below and consider a SWITCHDEV_OBJ_ID_HOST_MDB object emitted on sw2p0. How does the cross-chip notifier know to match on all the right ports (sw0p4, the dedicated CPU port, sw1p4, an upstream DSA link, and sw2p4, another upstream DSA link)? \| sw0p0 sw0p1 sw0p2 sw0p3 sw0p4 [ user ] [ user ] [ user ] [ dsa ] [ cpu ] [ ] [ ] [ ] [ ] [ x ] \| +---------+ \| sw1p0 sw1p1 sw1p2 sw1p3 sw1p4 [ user ] [ user ] [ user ] [ dsa ] [ dsa ] [ ] [ ] [ ] [ ] [ x ] \| +---------+ \| sw2p0 sw2p1 sw2p2 sw2p3 sw2p4 [ user ] [ user ] [ user ] [ user ] [ dsa ] [ ] [ ] [ ] [ ] [ x ] The answer is simple: the dedicated CPU port of sw2p0 is sw0p4, and dsa_routing_port returns the upstream port for all switches. That is fine, but there are other topologies where this does not work as well. There are trees with "H" topologies in the wild, where there are 2 or more switches with DSA links between them, but every switch has its dedicated CPU port. For these topologies, it seems stupid for the neighbor switches to install an MDB entry on the routing port, since these multicast addresses are fundamentally different than the usual ones we support (and that is the justification for this patch, to introduce the concept of a termination plane multicast MAC address, as opposed to a forwarding plane multicast MAC address). For example, when a SWITCHDEV_OBJ_ID_HOST_MDB would get added to sw0p0, without this patch, it would get treated as a regular port MDB on sw0p2 and it would match on the ports below (including the sw1p3 routing port). \| \| sw0p0 sw0p1 sw0p2 sw0p3 sw1p3 sw1p2 sw1p1 sw1p0 [ user ] [ user ] [ cpu ] [ dsa ] [ dsa ] [ cpu ] [ user ] [ user ] [ ] [ ] [ x ] [ ] ---- [ x ] [ ] [ ] [ ] With the patch, the host MDB notifier on sw0p0 matches only on the local switch, which is what we want for a termination plane address. \| \| sw0p0 sw0p1 sw0p2 sw0p3 sw1p3 sw1p2 sw1p1 sw1p0 [ user ] [ user ] [ cpu ] [ dsa ] [ dsa ] [ cpu ] [ user ] [ user ] [ ] [ ] [ x ] [ ] ---- [ ] [ ] [ ] [ ] Name this new matching function "dsa_switch_host_address_match" since we will be reusing it soon for host FDB entries as well. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-29 10:46:23 -07:00
Vladimir Oltean	63609c8fac	net: dsa: introduce dsa_is_upstream_port and dsa_switch_is_upstream_of In preparation for the new cross-chip notifiers for host addresses, let's introduce some more topology helpers which we are going to use to discern switches that are in our path towards the dedicated CPU port from switches that aren't. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-29 10:46:23 -07:00
Vladimir Oltean	b117e1e8a8	net: dsa: delete dsa_legacy_fdb_add and dsa_legacy_fdb_del We want to add reference counting for FDB entries in cross-chip topologies, and in order for that to have any chance of working and not be unbalanced (leading to entries which are never deleted), we need to ensure that higher layers are sane, because if they aren't, it's garbage in, garbage out. For example, if we add a bridge FDB entry twice, the bridge properly errors out: $ bridge fdb add dev swp0 00:01:02:03:04:07 master static $ bridge fdb add dev swp0 00:01:02:03:04:07 master static RTNETLINK answers: File exists However, the same thing cannot be said about the bridge bypass operations: $ bridge fdb add dev swp0 00:01:02:03:04:07 $ bridge fdb add dev swp0 00:01:02:03:04:07 $ bridge fdb add dev swp0 00:01:02:03:04:07 $ bridge fdb add dev swp0 00:01:02:03:04:07 $ echo $? 0 But one 'bridge fdb del' is enough to remove the entry, no matter how many times it was added. The bridge bypass operations are impossible to maintain in these circumstances and lack of support for reference counting the cross-chip notifiers is holding us back from making further progress, so just drop support for them. The only way left for users to install static bridge FDB entries is the proper one, using the "master static" flags. With this change, rtnl_fdb_add() falls back to calling ndo_dflt_fdb_add() which uses the duplicate-exclusive variant of dev_uc_add(): dev_uc_add_excl(). Because DSA does not (yet) declare IFF_UNICAST_FLT, this results in us going to promiscuous mode: $ bridge fdb add dev swp0 00:01:02:03:04:05 [ 28.206743] device swp0 entered promiscuous mode $ bridge fdb add dev swp0 00:01:02:03:04:05 RTNETLINK answers: File exists So even if it does not completely fail, there is at least some indication that it is behaving differently from before, and closer to user space expectations, I would argue (the lack of a "local\|static" specifier defaults to "local", or "host-only", so dev_uc_add() is a reasonable default implementation). If the generic implementation of .ndo_fdb_add provided by Vlad Yasevich is a proof of anything, it only proves that the implementation provided by DSA was always wrong, by not looking at "ndm->ndm_state & NUD_NOARP" (the "static" flag which means that the FDB entry points outwards) and "ndm->ndm_state & NUD_PERMANENT" (the "local" flag which means that the FDB entry points towards the host). It all used to mean the same thing to DSA. Update the documentation so that the users are not confused about what's going on. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-29 10:46:23 -07:00
Vladimir Oltean	f851a721a6	net: bridge: allow br_fdb_replay to be called for the bridge device When a port joins a bridge which already has local FDB entries pointing to the bridge device itself, we would like to offload those, so allow the "dev" argument to be equal to the bridge too. The code already does what we need in that case. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-29 10:46:23 -07:00
Tobias Waldekranz	6eb38bf8eb	net: bridge: switchdev: send FDB notifications for host addresses Treat addresses added to the bridge itself in the same way as regular ports and send out a notification so that drivers may sync it down to the hardware FDB. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-29 10:46:23 -07:00
Vladimir Oltean	3e19ae7c6f	net: bridge: use READ_ONCE() and WRITE_ONCE() compiler barriers for fdb->dst Annotate the writer side of fdb->dst: - fdb_create() - br_fdb_update() - fdb_add_entry() - br_fdb_external_learn_add() with WRITE_ONCE() and the reader side: - br_fdb_test_addr() - br_fdb_update() - fdb_fill_info() - fdb_add_entry() - fdb_delete_by_addr_and_port() - br_fdb_external_learn_add() - br_switchdev_fdb_notify() with compiler barriers such that the readers do not attempt to reload fdb->dst multiple times, leading to potentially different destination ports when the fdb entry is updated concurrently. This is especially important in read-side sections where fdb->dst is used more than once, but let's convert all accesses for the sake of uniformity. Suggested-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-29 10:46:22 -07:00
David S. Miller	84fe73996c	Merge branch 'do_once_lite' Tanner Love says: ==================== net: update netdev_rx_csum_fault() print dump only once First patch implements DO_ONCE_LITE to abstract uses of the ".data.once" trick. It is defined in its own, new header file -- rather than alongside the existing DO_ONCE in include/linux/once.h -- because include/linux/once.h includes include/linux/jump_label.h, and this causes the build to break for some architectures if include/linux/once.h is included in include/linux/printk.h or include/asm-generic/bug.h. Second patch uses DO_ONCE_LITE in netdev_rx_csum_fault to print dump only once. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 15:54:57 -07:00
Tanner Love	127d7355ab	net: update netdev_rx_csum_fault() print dump only once Printing this stack dump multiple times does not provide additional useful information, and consumes time in the data path. Printing once is sufficient. Changes v2: Format indentation properly Signed-off-by: Tanner Love <tannerlove@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 15:54:57 -07:00
Tanner Love	a358f40600	once: implement DO_ONCE_LITE for non-fast-path "do once" functionality Certain uses of "do once" functionality reside outside of fast path, and so do not require jump label patching via static keys, making existing DO_ONCE undesirable in such cases. Replace uses of __section(".data.once") with DO_ONCE_LITE(_IF)? This patch changes the return values of xfs_printk_once, printk_once, and printk_deferred_once. Before, they returned whether the print was performed, but now, they always return true. This is okay because the return values of the following macros are entirely ignored throughout the kernel: - xfs_printk_once - xfs_warn_once - xfs_notice_once - xfs_info_once - printk_once - pr_emerg_once - pr_alert_once - pr_crit_once - pr_err_once - pr_warn_once - pr_notice_once - pr_info_once - pr_devel_once - pr_debug_once - printk_deferred_once - orc_warn Changes v3: - Expand commit message to explain why changing return values of xfs_printk_once, printk_once, printk_deferred_once is benign v2: - Fix i386 build warnings Signed-off-by: Tanner Love <tannerlove@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 15:54:57 -07:00
Nathan Chancellor	b74ef9f9cb	net: sparx5: Do not use mac_addr uninitialized in mchp_sparx5_probe() Clang warns: drivers/net/ethernet/microchip/sparx5/sparx5_main.c:760:29: warning: variable 'mac_addr' is uninitialized when used here [-Wuninitialized] if (of_get_mac_address(np, mac_addr)) { ^~~~~~~~ drivers/net/ethernet/microchip/sparx5/sparx5_main.c:669:14: note: initialize the variable 'mac_addr' to silence this warning u8 *mac_addr; ^ = NULL 1 warning generated. mac_addr is only used to store the value retrieved from of_get_mac_address(), which is then copied into the base_mac member of the sparx5 struct using ether_addr_copy(). It is easier to just use the base_mac address directly, which avoids the warning and the extra copy. Fixes: `3cfa11bac9` ("net: sparx5: add the basic sparx5 driver") Link: https://github.com/ClangBuiltLinux/linux/issues/1413 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 15:50:21 -07:00
Vladimir Oltean	74e7feff0e	net: dsa: sja1105: fix dynamic access to L2 Address Lookup table for SJA1110 The SJA1105P/Q/R/S and SJA1110 may have the same layout for the command to read/write/search for L2 Address Lookup entries, but as explained in the comments at the beginning of the sja1105_dynamic_config.c file, the command portion of the buffer is at the end, and we need to obtain a pointer to it by adding the length of the entry to the buffer. Alas, the length of an L2 Address Lookup entry is larger in SJA1110 than it is for SJA1105P/Q/R/S, so we need to create a common helper to access the command buffer, and this receives as argument the length of the entry buffer. Fixes: `3e77e59bf8` ("net: dsa: sja1105: add support for the SJA1110 switch family") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 15:49:05 -07:00
Horatiu Vultur	f7458934b0	net: bridge: mrp: Update the Test frames for MRA According to the standard IEC 62439-2, in case the node behaves as MRA and needs to send Test frames on ring ports, then these Test frames need to have an Option TLV and a Sub-Option TLV which has the type AUTO_MGR. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 15:46:10 -07:00
David S. Miller	f0305e732a	bluetooth-next pull request for net-next: - Add support for QCA_ROME device (0cf3:e500) and RTL8822CE - Update management interface revision to 21 - Use of incluse language - Proper handling of HCI_LE_Advertising_Set_Terminated event - Recovery handing of HCI ncmd=0 - Various memory fixes -----BEGIN PGP SIGNATURE----- iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmDaAwwZHGx1aXoudm9u LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKQwaD/4oPoEH/jdEhkvVUpHJebm2 GYNa+zvW5PwqgK1wvdg9Ou0fpgLuHMH+XPyhVMciG3N6kpbAGnyrGTlpKyu8uqh3 EDBPYoU3Rx/0+gUCrEXZrkSSyds8SuzD+ALkEzHtg2ISkUGff0FiFyDbc7TXG4ue h1n87LgouCEq0kYEKZAbXnSKDt0aDBIeXGCMjZ/K4DctfeLZTTvzA8HMbX/ektzm DOcZdHD1i47ijsu7/PZhesnqw/RGmMXaXfMvO9uuXM4N5qrZdDdFB9eDXRhHTwwF 79P3bYYweLvQloMXKQytnGUElPqceMNznCB1LsvxxDqTJuEwqF5nVzJ3TJJJHon2 Cq3JP3QthlpomRqaKlf+raQy618u5yF3APef9ZrUeLABfTjy2CIh0IhKixW7SsP0 VH3b4z8ClRodRTOYJpEx0Ncs9krccbmNf73FCf1kzjPEP7oKLuuA37WcolQ+7fjd UiCLaKJ9flHFH3qeBzAii2vqe07Qy/l/SnG33nB1bDv4g0XnXI/Apb70z95jZcJL jUBmlWbWcxKV1euTuIGpWiZPfhZyTVN0D3mcLcdCJhxHj79ZlUZcA1WdA699Crbo Pc4icu/INzDybQ5oivqu78Ajdun3XGq/orlZfDtbjZNHoJpjNQTHhSNdPZhM9Bgr uvNnaggWfjCXeSZ6ljY/Nw== =2NdY -----END PGP SIGNATURE----- Merge tag 'for-net-next-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - Add support for QCA_ROME device (0cf3:e500) and RTL8822CE - Update management interface revision to 21 - Use of incluse language - Proper handling of HCI_LE_Advertising_Set_Terminated event - Recovery handing of HCI ncmd=0 - Various memory fixes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 15:35:50 -07:00
David S. Miller	e1289cfb63	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2021-06-28 The following pull-request contains BPF updates for your net-next tree. We've added 37 non-merge commits during the last 12 day(s) which contain a total of 56 files changed, 394 insertions(+), 380 deletions(-). The main changes are: 1) XDP driver RCU cleanups, from Toke Høiland-Jørgensen and Paul E. McKenney. 2) Fix bpf_skb_change_proto() IPv4/v6 GSO handling, from Maciej Żenczykowski. 3) Fix false positive kmemleak report for BPF ringbuf alloc, from Rustam Kovhaev. 4) Fix x86 JIT's extable offset calculation for PROBE_LDX NULL, from Ravi Bangoria. 5) Enable libbpf fallback probing with tracing under RHEL7, from Jonathan Edwards. 6) Clean up x86 JIT to remove unused cnt tracking from EMIT macro, from Jiri Olsa. 7) Netlink cleanups for libbpf to please Coverity, from Kumar Kartikeya Dwivedi. 8) Allow to retrieve ancestor cgroup id in tracing programs, from Namhyung Kim. 9) Fix lirc BPF program query to use user-provided prog_cnt, from Sean Young. 10) Add initial libbpf doc including generated kdoc for its API, from Grant Seltzer. 11) Make xdp_rxq_info_unreg_mem_model() more robust, from Jakub Kicinski. 12) Fix up bpfilter startup log-level to info level, from Gary Lin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 15:28:03 -07:00
Andreas Roeseler	1fd07f33c3	ipv6: ICMPV6: add response to ICMPV6 RFC 8335 PROBE messages This patch builds off of commit `2b246b2569` and adds functionality to respond to ICMPV6 PROBE requests. Add icmp_build_probe function to construct PROBE requests for both ICMPV4 and ICMPV6. Modify icmpv6_rcv to detect ICMPV6 PROBE messages and call the icmpv6_echo_reply handler. Modify icmpv6_echo_reply to build a PROBE response message based on the queried interface. This patch has been tested using a branch of the iputils git repo which can be found here: https://github.com/Juniper-Clinic-2020/iputils/tree/probe-request Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 14:29:45 -07:00
Yang Yingliang	83300c69e7	net: sparx5: fix error return code in sparx5_register_notifier_blocks() Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: `d6fce51419` ("net: sparx5: add switching support") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 14:20:23 -07:00
Yang Yingliang	8f4c38f758	net: sparx5: fix return value check in sparx5_create_targets() In case of error, the function devm_ioremap() returns NULL pointer not ERR_PTR(). The IS_ERR() test in the return value check should be replaced with NULL test. Fixes: `3cfa11bac9` ("net: sparx5: add the basic sparx5 driver") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 14:20:22 -07:00
Yang Yingliang	f00af5cc58	net: sparx5: check return value after calling platform_get_resource() It will cause null-ptr-deref if platform_get_resource() returns NULL, we need check the return value. Fixes: `3cfa11bac9` ("net: sparx5: add the basic sparx5 driver") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 14:20:22 -07:00
David S. Miller	4bec3cea34	mlx5-updates-2021-06-26 This series provides small updates to mlx5 driver. 1) Increase hairpin buffer size 2) Improve peroformance in SF allocation 3) Add IPsec support to uplink representor 4) Add stats for number of deleted kTLS TX offloaded connections 5) Add support for flow sampler in SW steering -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmDW19AACgkQSD+KveBX +j40vQgApjASekCSudKXPv0ZsmRt2gOb7CFteXaqT09/DFQt6bvLFmSE6xg49s0U U9JrRVd+gaxqP3CU+rse2u8z6kn5g9Z1IiEbT2F81i5SIiZsFIZT8w0jeGUxWT8l Wg4wNlei2C0TYr2GD5YHD9fZTqE3q7OzMqAFqq5bKK6dkdXhoBVj7qraSgiPGoe8 vvMSUNEbvXFKhwfAv2AbJBaAkB34NW7H0g8YqozKu0jgOpnO9fXVCq7BzLotL7uo ZQzZYP4hkS7qTUWvQt9C0Nflj1t3Z6M9SxQS4HB7eRkcXooHecrdDqifMewhO4G7 wGlqK68oFAHfBG08naBrORoKSLuGqA== =+PTn -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2021-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2021-06-26 This series provides small updates to mlx5 driver. 1) Increase hairpin buffer size 2) Improve peroformance in SF allocation 3) Add IPsec support to uplink representor 4) Add stats for number of deleted kTLS TX offloaded connections 5) Add support for flow sampler in SW steering ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 14:15:07 -07:00
David S. Miller	3095f512e3	Merge branch 'bridge-replay-helpers' Vladimir Oltean says: ==================== Cleanup for the bridge replay helpers This patch series brings some improvements to the logic added to the bridge and DSA to handle LAG interfaces sandwiched between a bridge and a DSA switch port. br0 / \ / \ bond0 swp2 / \ / \ swp0 swp1 In particular, it ensures that the switchdev object additions and deletions are well balanced per physical port. This is important for future work in the area of offloading local bridge FDB entries to hardware in the context of DSA requesting a replay of those entries at bridge join time (this will be submitted in a future patch series). Due to some difficulty ensuring that the deletion of local FDB entries pointing towards the bridge device itself is notified to switchdev in time (before the switchdev port disconnects from the bridge), this is potentially still not the final form in which the replay helpers will exist. I'm thinking about moving from the pull mode (in which DSA requests the replay) to a push mode (in which the bridge initiates the replay). Nonetheless, these preliminary changes are needed either way. The patch series also addresses some feedback from Nikolai which is long overdue by now (sorry). Switchdev driver maintainers were deliberately omitted due to the trivial nature of the driver changes (just a function prototype). Changes in v2: - fix build issue in patch 4 (function prototype mismatch) - move switchdev object unsync to the NETDEV_PRECHANGEUPPER code path ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 14:09:04 -07:00
Vladimir Oltean	7491894532	net: dsa: replay a deletion of switchdev objects for ports leaving a bridged LAG When a DSA switch port leaves a bonding interface that is under a bridge, there might be dangling switchdev objects on that port left behind, because the bridge is not aware that its lower interface (the bond) changed state in any way. Call the bridge replay helpers with adding=false before changing dp->bridge_dev to NULL, because we need to simulate to dsa_slave_port_obj_del() that these notifications were emitted by the bridge. We add this hook to the NETDEV_PRECHANGEUPPER event handler, because we are calling into switchdev (and the __switchdev_handle_port_obj_del fanout helpers expect the upper/lower adjacency lists to still be valid) and PRECHANGEUPPER is the last moment in time when they still are. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 14:09:03 -07:00
Vladimir Oltean	4ede74e73b	net: dsa: refactor the prechangeupper sanity checks into a dedicated function We need to add more logic to the DSA NETDEV_PRECHANGEUPPER event handler, more exactly we need to request an unsync of switchdev objects. In order to fit more code, refactor the existing logic into a helper. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 14:09:03 -07:00
Vladimir Oltean	7e8c18586d	net: bridge: allow the switchdev replay functions to be called for deletion When a switchdev port leaves a LAG that is a bridge port, the switchdev objects and port attributes offloaded to that port are not removed: ip link add br0 type bridge ip link add bond0 type bond mode 802.3ad ip link set swp0 master bond0 ip link set bond0 master br0 bridge vlan add dev bond0 vid 100 ip link set swp0 nomaster VLAN 100 will remain installed on swp0 despite it going into standalone mode, because as far as the bridge is concerned, nothing ever happened to its bridge port. Let's extend the bridge vlan, fdb and mdb replay functions to take a 'bool adding' argument, and make DSA and ocelot call the replay functions with 'adding' as false from the switchdev unsync path, for the switch port that leaves the bridge. Note that this patch in itself does not salvage anything, because in the current pull mode of operation, DSA still needs to call the replay helpers with adding=false. This will be done in another patch. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 14:09:03 -07:00
Vladimir Oltean	bdf123b455	net: bridge: constify variables in the replay helpers Some of the arguments and local variables for the newly added switchdev replay helpers can be const, so let's make them so. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 14:09:03 -07:00
Vladimir Oltean	0d2cfbd41c	net: bridge: ignore switchdev events for LAG ports which didn't request replay There is a slight inconvenience in the switchdev replay helpers added recently, and this is when: ip link add br0 type bridge ip link add bond0 type bond ip link set bond0 master br0 bridge vlan add dev bond0 vid 100 ip link set swp0 master bond0 ip link set swp1 master bond0 Since the underlying driver (currently only DSA) asks for a replay of VLANs when swp0 and swp1 join the LAG because it is bridged, what will happen is that DSA will try to react twice on the VLAN event for swp0. This is not really a huge problem right now, because most drivers accept duplicates since the bridge itself does, but it will become a problem when we add support for replaying switchdev object deletions. Let's fix this by adding a blank void *ctx in the replay helpers, which will be passed on by the bridge in the switchdev notifications. If the context is NULL, everything is the same as before. But if the context is populated with a valid pointer, the underlying switchdev driver (currently DSA) can use the pointer to 'see through' the bridge port (which in the example above is bond0) and 'know' that the event is only for a particular physical port offloading that bridge port, and not for all of them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 14:09:03 -07:00
Vladimir Oltean	69bfac968a	net: switchdev: add a context void pointer to struct switchdev_notifier_info In the case where the driver asks for a replay of a certain type of event (port object or attribute) for a bridge port that is a LAG, it may do so because this port has just joined the LAG. But there might already be other switchdev ports in that LAG, and it is preferable that those preexisting switchdev ports do not act upon the replayed event. The solution is to add a context to switchdev events, which is NULL most of the time (when the bridge layer initiates the call) but which can be set to a value controlled by the switchdev driver when a replay is requested. The driver can then check the context to figure out if all ports within the LAG should act upon the switchdev event, or just the ones that match the context. We have to modify all switchdev_handle_* helper functions as well as the prototypes in the drivers that use these helpers too, because these helpers hide the underlying struct switchdev_notifier_info from us and there is no way to retrieve the context otherwise. The context structure will be populated and used in later patches. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 14:09:03 -07:00
Vladimir Oltean	97558e880f	net: ocelot: delete call to br_fdb_replay Not using this driver, I did not realize it doesn't react to SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE notifications, but it implements just the bridge bypass operations (.ndo_fdb_{add,del}). So the call to br_fdb_replay just produces notifications that are ignored, delete it for now. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 14:09:03 -07:00
Vladimir Oltean	e887b2df62	net: bridge: include the is_local bit in br_fdb_replay Since commit `2c4eca3ef7` ("net: bridge: switchdev: include local flag in FDB notifications"), the bridge emits SWITCHDEV_FDB_ADD_TO_DEVICE events with the is_local flag populated (but we ignore it nonetheless). We would like DSA to start treating this bit, but it is still not populated by the replay helper, so add it there too. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 14:09:03 -07:00
Jakub Kicinski	a78cae2476	xdp: Move the rxq_info.mem clearing to unreg_mem_model() xdp_rxq_info_unreg() implicitly calls xdp_rxq_info_unreg_mem_model(). This may well be confusing to the driver authors, and lead to double free if they call xdp_rxq_info_unreg_mem_model() before xdp_rxq_info_unreg() (when mem model type == MEM_TYPE_PAGE_POOL). In fact error path of mvpp2_rxq_init() seems to currently do exactly that. The double free will result in refcount underflow in page_pool_destroy(). Make the interface a little more programmer friendly by clearing type and id so that xdp_rxq_info_unreg_mem_model() can be called multiple times. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210625221612.2637086-1-kuba@kernel.org	2021-06-28 23:07:59 +02:00
David S. Miller	a1b05634e1	Merge branch 'bnxt_en-ptp' Michael Chan says: ==================== bnxt_en: Add hardware PTP timestamping support on 575XX devices Add PTP RX and TX hardware timestamp support on 575XX devices. These devices use the two-step method to implement the IEEE-1588 timestamping support. v2: Add spinlock to serialize access to the timecounter. Use .do_aux_work() for the periodic timer reading and to get the TX timestamp from the firmware. Propagate error code from ptp_clock_register(). Make the 64-bit timer access safe on 32-bit CPUs. Read PHC using direct register access. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 13:41:06 -07:00
Michael Chan	93cb62d98e	bnxt_en: Enable hardware PTP support Call bnxt_ptp_init() to initialize and register with the clock driver to enable PTP support. Call bnxt_ptp_free() to unregister and clean up during shutdown. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 13:41:06 -07:00
Pavan Chebbi	83bb623c96	bnxt_en: Transmit and retrieve packet timestamps Setup the TXBD to enable TX timestamp if requested. At TX packet DMA completion, if we requested TX timestamp on that packet, we defer to .do_aux_work() to obtain the TX timestamp from the firmware before we free the TX SKB. v2: Use .do_aux_work() to get the TX timestamp from firmware. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 13:41:06 -07:00
Pavan Chebbi	7f5515d19c	bnxt_en: Get the RX packet timestamp If the RX packet is timestamped by the hardware, the RX completion record will contain the lower 32-bit of the timestamp. This needs to be combined with the upper 16-bit of the periodic timestamp that we get from the timer. The previous snapshot in ptp->old_timer is used to make sure that the snapshot is not ahead of the RX timestamp and we adjust for wrap-around if needed. v2: Make ptp->old_time read access safe on 32-bit CPUs. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 13:41:06 -07:00
Pavan Chebbi	390862f45c	bnxt_en: Get the full 48-bit hardware timestamp periodically From the bnxt_timer(), read the 48-bit hardware running clock periodically and store it in ptp->current_time. The previous snapshot of the clock will be stored in ptp->old_time. The old_time snapshot will be used in the next patches to compute the RX packet timestamps. v2: Use .do_aux_work() to read the timer periodically. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 13:41:06 -07:00
Michael Chan	118612d519	bnxt_en: Add PTP clock APIs, ioctls, and ethtool methods Add the clock APIs to set/get/adjust the hw clock, and the related ioctls and ethtool methods. v2: Propagate error code from ptp_clock_register(). Add spinlock to serialize access to the timecounter. The timecounter is accessed in process context and the RX datapath. Read the PHC using direct registers. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 13:41:05 -07:00
Michael Chan	ae5c42f0b9	bnxt_en: Get PTP hardware capability from firmware Store PTP hardware info in a structure if hardware and firmware support PTP. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 13:41:05 -07:00
Michael Chan	78eeadb8fe	bnxt_en: Update firmware interface to 1.10.2.47 Adding the PTP related firmware interface is the main change. There is also a name change for admin_mtu, requiring code fixup. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 13:41:05 -07:00
David S. Miller	2eeae3a5cb	Merge branch 'hns3-next' Guangbin Huang says: ==================== net: hns3: add new debugfs commands This series adds three new debugfs commands for the HNS3 ethernet driver. change log: V1 -> V2: 1. remove patch "net: hns3: add support for link diagnosis info in debugfs" and use ethtool extended link state to implement similar function according to Jakub Kicinski's opinion. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 13:34:58 -07:00
Jian Shen	d59daf6a4c	net: hns3: add support for dumping MAC umv counter in debugfs This patch adds support of dumping MAC umv counter in debugfs, which will be helpful for debugging. The display style is below: $ cat umv_info num_alloc_vport : 2 max_umv_size : 256 wanted_umv_size : 256 priv_umv_size : 85 share_umv_size : 86 vport(0) used_umv_num : 1 vport(1) used_umv_num : 1 Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 13:34:58 -07:00
Jian Shen	03a92fe8ce	net: hns3: add support for FD counter in debugfs Previously, the flow director counter is not enabled. To improve the maintainability for chechking whether flow director hit or not, enable flow director counter for each function, and add debugfs query inerface to query the counters for each function. The debugfs command is below: cat fd_counter func_id hit_times pf 0 vf0 0 vf1 0 Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 13:34:58 -07:00
David S. Miller	c948b46a7d	Merge branch 'tipc-next' Menglong Dong says: ==================== net: tipc: fix FB_MTU eat two pages and do some code cleanup In the first patch, FB_MTU is redefined to make sure data size will not exceed PAGE_SIZE. Besides, I removed the alignment for buf_size in tipc_buf_acquire, because skb_alloc_fclone will do the alignment job. In the second patch, I removed align() in msg.c and replace it with ALIGN(). Changes since V5: - remove blank line after Fixes in commit log in the first patch Changes since V4: - remove ONE_PAGE_SKB_SZ and replace it with one_page_mtu in the first patch. - fix some code style problems for the second patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 13:31:57 -07:00
Menglong Dong	d4cfb7fe57	net: tipc: replace align() with ALIGN in msg.c The function align() which is defined in msg.c is redundant, replace it with ALIGN() and introduce a BUF_ALIGN(). Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 13:31:57 -07:00
Menglong Dong	0c6de0c943	net: tipc: fix FB_MTU eat two pages FB_MTU is used in 'tipc_msg_build()' to alloc smaller skb when memory allocation fails, which can avoid unnecessary sending failures. The value of FB_MTU now is 3744, and the data size will be: (3744 + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) + \ SKB_DATA_ALIGN(BUF_HEADROOM + BUF_TAILROOM + 3)) which is larger than one page(4096), and two pages will be allocated. To avoid it, replace '3744' with a calculation: (PAGE_SIZE - SKB_DATA_ALIGN(BUF_OVERHEAD) - \ SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) What's more, alloc_skb_fclone() will call SKB_DATA_ALIGN for data size, and it's not necessary to make alignment for buf_size in tipc_buf_acquire(). So, just remove it. Fixes: `4c94cc2d3d` ("tipc: fall back to smaller MTU if allocation of local send skb fails") Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 13:31:57 -07:00
David S. Miller	1b077ce1c5	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git /klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2021-06-28 1) Remove an unneeded error assignment in esp4_gro_receive(). From Yang Li. 2) Add a new byseq state hashtable to find acquire states faster. From Sabrina Dubroca. 3) Remove some unnecessary variables in pfkey_create(). From zuoqilin. 4) Remove the unused description from xfrm_type struct. From Florian Westphal. 5) Fix a spelling mistake in the comment of xfrm_state_ok(). From gushengxian. 6) Replace hdr_off indirections by a small helper function. From Florian Westphal. 7) Remove xfrm4_output_finish and xfrm6_output_finish declarations, they are not used anymore.From Antony Antony. 8) Remove xfrm replay indirections. From Florian Westphal. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 13:17:16 -07:00
David S. Miller	007b312c6f	Lots of changes: * aggregation handling improvements for some drivers * hidden AP discovery on 6 GHz and other HE 6 GHz improvements * minstrel improvements for no-ack frames * deferred rate control for TXQs to improve reaction times * virtual time-based airtime scheduler * along with various little cleanups/fixups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmDWUHgACgkQB8qZga/f l8RxXA/+KjkrFUGDHlMTIzS/i03EcB+idLYy+XBWYuO3PN4p/uCCXLzdGfiRKTxp iCAjYp0ZP01FlN4sOIN+qqrUMoN0e8xqBEY1DFbJELBB1knV4VR0FeBPPcBMUEgE xAjdZnIsfKO1yU2mrdQVDnkvXr4jHDALePvGwCQe+4KUbwCmVjo6nb534v17Ie3C MDNnGq395fCvo9NW+Yvzw6s9ZLdhr28bGhSBgBrqmPBx/9vm3b7BA8qO+X6BzAny 87S2x+z40wabBujCGMvw9J5H6OJ0ZLfbT0TumNznqQbCbmwRdSUC+W/cjgRq/aQn CA3E9T7tofIr1mk9EgcjE8ax91TB/TTX5Nh9Huki25B7rNRBM57VnuEuzmXP5qBS xgsr60agQHp2ZAGRPBC9msPcPCbvRfdl9ckTAF8/4vI7uEXSyYViX92M0F1FEy60 Mw13B8WeRA4bH22xHpqdXsyl0x2OQHXHlLHfpzmhcUy0i2cQOQMsWbl3g6CJMTq+ /InNSaG+cqrCVse734KJLumRhOq4+MkedYJlqL7uHZJrXlLgQ9Z8ewANjJ/BjTsQ d8I16XkKPyv8UiJEwHZ6NaHvU48LR8T/r4ym9c/rz0DTWHZyp87QMYa55iwuk4Zo cW7G3f/N7dMk9fGRxgfIrnNwNUCMsGcBSJeJyr5qT7u8+RW0bRo= =nFvj -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-net-next-2021-06-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes berg says: ==================== Lots of changes: * aggregation handling improvements for some drivers * hidden AP discovery on 6 GHz and other HE 6 GHz improvements * minstrel improvements for no-ack frames * deferred rate control for TXQs to improve reaction times * virtual time-based airtime scheduler * along with various little cleanups/fixups ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 13:06:12 -07:00
Matthieu Baerts	c4512c63b1	mptcp: fix 'masking a bool' warning Dan Carpenter reported an issue introduced in commit `fde56eea01` ("mptcp: refine mptcp_cleanup_rbuf") where a new boolean (ack_pending) is masked with 0x9. This is not the intention to ignore values by using a boolean. This variable should not have a 'bool' type: we should keep the 'u8' to allow this comparison. Fixes: `fde56eea01` ("mptcp: refine mptcp_cleanup_rbuf") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 13:04:06 -07:00
David S. Miller	8eb517a2a4	Merge branch 'reset-mac' Guillaume Nault says: ==================== net: reset MAC header consistently across L3 virtual devices Some virtual L3 devices, like vxlan-gpe and gre (in collect_md mode), reset the MAC header pointer after they parsed the outer headers. This accurately reflects the fact that the decapsulated packet is pure L3 packet, as that makes the MAC header 0 bytes long (the MAC and network header pointers are equal). However, many L3 devices only adjust the network header after decapsulation and leave the MAC header pointer to its original value. This can confuse other parts of the networking stack, like TC, which then considers the outer headers as one big MAC header. This patch series makes the following L3 tunnels behave like VXLAN-GPE: bareudp, ipip, sit, gre, ip6gre, ip6tnl, gtp. The case of gre is a bit special. It already resets the MAC header pointer in collect_md mode, so only the classical mode needs to be adjusted. However, gre also has a special case that expects the MAC header pointer to keep pointing to the outer header even after decapsulation. Therefore, patch 4 keeps an exception for this case. Ideally, we'd centralise the call to skb_reset_mac_header() in ip_tunnel_rcv(), to avoid manual calls in ipip (patch 2), sit (patch 3) and gre (patch 4). That's unfortunately not feasible currently, because of the gre special case discussed above that precludes us from resetting the MAC header unconditionally. The original motivation is to redirect bareudp packets to Ethernet devices (as described in patch 1). The rest of this series aims at bringing consistency across all L3 devices (apart from gre's special case unfortunately). Note: the gtp patch results from pure code inspection and has been compiled tested only. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 12:44:18 -07:00
Guillaume Nault	b2d898c8a5	gtp: reset mac_header after decap For consistency with other L3 tunnel devices, reset the mac_header pointer after decapsulation. This makes the mac_header 0 bytes long, thus making it clear that this skb has no mac_header. Compile tested only. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 12:44:17 -07:00
Guillaume Nault	da5a2e49f0	ip6_tunnel: allow redirecting ip6gre and ipxip6 packets to eth devices Reset the mac_header pointer even when the tunnel transports only L3 data (in the ARPHRD_ETHER case, this is already done by eth_type_trans). This prevents other parts of the stack from mistakenly accessing the outer header after the packet has been decapsulated. In practice, this allows to push an Ethernet header to ipip6, ip6ip6, mplsip6 or ip6gre packets and redirect them to an Ethernet device: $ tc filter add dev ip6tnl0 ingress matchall \ action vlan push_eth dst_mac 00:00:5e:00:53:01 \ src_mac 00:00:5e:00:53:00 \ action mirred egress redirect dev eth0 Without this patch, push_eth refuses to add an ethernet header because the skb appears to already have a MAC header. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 12:44:17 -07:00
Guillaume Nault	aab1e898c2	gre: let mac_header point to outer header only when necessary Commit `e271c7b442` ("gre: do not keep the GRE header around in collect medata mode") did reset the mac_header for the collect_md case. Let's extend this behaviour to classical gre devices as well. ipgre_header_parse() seems to be the only case that requires mac_header to point to the outer header. We can detect this case accurately by checking ->header_ops. For all other cases, we can reset mac_header. This allows to push an Ethernet header to ipgre packets and redirect them to an Ethernet device: $ tc filter add dev gre0 ingress matchall \ action vlan push_eth dst_mac 00:00:5e:00:53:01 \ src_mac 00:00:5e:00:53:00 \ action mirred egress redirect dev eth0 Before this patch, this worked only for collect_md gre devices. Now this works for regular gre devices as well. Only the special case of gre devices that use ipgre_header_ops isn't supported. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-28 12:44:17 -07:00

1 2 3 4 5 ...

1017650 Commits