linux

Author	SHA1	Message	Date
Petr Machata	34639fa383	mlxsw: Enforce firmware version for Spectrum-3 In a fashion similar to the other Spectrum systems, enforce a specific firmware version for Spectrum-3 so that the driver and firmware are always in sync with regards to new features. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-23 15:14:13 -07:00
Petr Machata	69c8a8c543	mlxsw: Bump firmware version to XX.2007.1168 This version comes with fixes to the following problems, among others: - Wrong shaper configuration on Spectrum-1 - Bogus temperature reading on Spectrum-2 - Problems in setting egress buffer size after MTU change on Spectrum-2 Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-23 15:14:13 -07:00
Masanari Iida	13fdc4193c	mlxsw: spectrum_dcb: Fix a spelling typo in spectrum_dcb.c This patch fixes a spelling typo in spectrum_dcb.c Signed-off-by: Masanari Iida <standby24x7@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-23 15:03:54 -07:00
Maor Gottlieb	608ca553c9	net/mlx5: Add support in query QP, CQ and MKEY segments Introduce new resource dump segments - PRM_QUERY_QP, PRM_QUERY_CQ and PRM_QUERY_MKEY. These segments contains the resource dump in PRM query format. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2020-06-23 17:26:10 +03:00
Maor Gottlieb	d63cc24933	net/mlx5: Export resource dump interface Export some of the resource dump API. mlx5_ib driver will use it in downstream patches. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2020-06-23 17:26:10 +03:00
Petr Machata	ce10d7d4ad	mlxsw: spectrum_acl: Support FLOW_ACTION_MANGLE for TCP, UDP ports Spectrum-2 supports an ACL action L4_PORT, which allows TCP and UDP source and destination port number change. Offload suitable mangles to this action. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 16:32:11 -07:00
Petr Machata	faad0525c0	mlxsw: core_acl_flex_actions: Add L4_PORT_ACTION Add fields related to L4_PORT_ACTION, which is used for changing of TCP and UDP port numbers. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 16:32:11 -07:00
Petr Machata	3cc9a15a0b	mlxsw: spectrum: Split handling of pedit mangle by chip type Certain ACL actions are only available on some Spectrum revisions. In particular, L4_PORT_ACTION is not available on Spectrum-1. Introduce a new ops struct intended to hold these differences, mlxsw_sp_rulei_ops. Prime it with a sole member, act_mangle_field, meant for handling of pedit mangles. Create two ops structures, one for Spectrum-1, the other for Spectrum-2 and above. Add callbacks for act_mangle_field and dispatch to the common handler. Invoke mlxsw_sp_rulei_ops.act_mangle_field from the field mangler instead of calling the common handler directly. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 16:32:11 -07:00
Ido Schimmel	f3fe412b0a	mlxsw: spectrum: Do not rely on machine endianness The second commit cited below performed a cast of 'u32 buffsize' to '(u16 )' when calling mlxsw_sp_port_headroom_8x_adjust(): mlxsw_sp_port_headroom_8x_adjust(mlxsw_sp_port, (u16 ) &buffsize); Colin noted that this will behave differently on big endian architectures compared to little endian architectures. Fix this by following Colin's suggestion and have the function accept and return 'u32' instead of passing the current size by reference. Fixes: `da382875c6` ("mlxsw: spectrum: Extend to support Spectrum-3 ASIC") Fixes: `60833d54d5` ("mlxsw: spectrum: Adjust headroom buffers for 8x ports") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Colin Ian King <colin.king@canonical.com> Suggested-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 16:29:51 -07:00
Jarod Wilson	bf3a058de5	mlx5: become aware of when running as a bonding slave I've been unable to get my hands on suitable supported hardware to date, but I believe this ought to be all that is needed to enable the mlx5 driver to also work with bonding active-backup crypto offload passthru. CC: Boris Pismenny <borisp@mellanox.com> CC: Saeed Mahameed <saeedm@mellanox.com> CC: Leon Romanovsky <leon@kernel.org> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 15:38:57 -07:00
Parav Pandit	330077d14d	net/mlx5: E-switch, Supporting setting devlink port function mac address Enable user to set mac address of the PCI PF and VF port function. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 15:29:19 -07:00
Parav Pandit	1094795ce4	net/mlx5: Split mac address setting function for using state_lock Refactor mac address setting function to let caller hold the necessary state_lock mutex, so that subsequent patch and use this helper routine. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 15:29:19 -07:00
Parav Pandit	f099fde16d	net/mlx5: E-switch, Support querying port function mac address Support querying mac address of the eswitch devlink port function. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 15:29:19 -07:00
Parav Pandit	443bf36eb5	net/mlx5: Move helper to eswitch layer To use port number to port index conversion at eswitch level, move it to eswitch header. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 15:29:19 -07:00
Parav Pandit	bd93975353	net/mlx5: E-switch, Introduce and use eswitch support check helper Introduce an helper routine to get esw from a devlink device and use it at eswitch callbacks and in subsequent patch. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 15:29:19 -07:00
Parav Pandit	fa997825eb	net/mlx5: Constify mac address pointer Since none of the functions need to modify the input mac address, constify them. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 15:29:19 -07:00
wenxu	a1db217861	net: flow_offload: fix flow_indr_dev_unregister path If the representor is removed, then identify the indirect flow_blocks that need to be removed by the release callback and the port representor structure. To identify the port representor structure, a new indr.cb_priv field needs to be introduced. The flow_block also needs to be removed from the driver list from the cleanup path. Fixes: `1fac52da59` ("net: flow_offload: consolidate indirect flow_block infrastructure") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-19 20:12:58 -07:00
wenxu	66f1939a1b	flow_offload: use flow_indr_block_cb_alloc/remove function Prepare fix the bug in the next patch. use flow_indr_block_cb_alloc/remove function and remove the __flow_block_indr_binding. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-19 20:12:58 -07:00
Po Liu	4b61d3e8d3	net: qos offload add flow status with dropped count This patch adds a drop frames counter to tc flower offloading. Reporting h/w dropped frames is necessary for some actions. Some actions like police action and the coming introduced stream gate action would produce dropped frames which is necessary for user. Status update shows how many filtered packets increasing and how many dropped in those packets. v2: Changes - Update commit comments suggest by Jiri Pirko. Signed-off-by: Po Liu <Po.Liu@nxp.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-19 12:53:30 -07:00
Ido Schimmel	60833d54d5	mlxsw: spectrum: Adjust headroom buffers for 8x ports The port's headroom buffers are used to store packets while they traverse the device's pipeline and also to store packets that are egress mirrored. On Spectrum-3, ports with eight lanes use two headroom buffers between which the configured headroom size is split. In order to prevent packet loss, multiply the calculated headroom size by two for 8x ports. Fixes: `da382875c6` ("mlxsw: spectrum: Extend to support Spectrum-3 ASIC") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-16 13:46:27 -07:00
Linus Torvalds	96144c58ab	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: 1) Fix cfg80211 deadlock, from Johannes Berg. 2) RXRPC fails to send norigications, from David Howells. 3) MPTCP RM_ADDR parsing has an off by one pointer error, fix from Geliang Tang. 4) Fix crash when using MSG_PEEK with sockmap, from Anny Hu. 5) The ucc_geth driver needs __netdev_watchdog_up exported, from Valentin Longchamp. 6) Fix hashtable memory leak in dccp, from Wang Hai. 7) Fix how nexthops are marked as FDB nexthops, from David Ahern. 8) Fix mptcp races between shutdown and recvmsg, from Paolo Abeni. 9) Fix crashes in tipc_disc_rcv(), from Tuong Lien. 10) Fix link speed reporting in iavf driver, from Brett Creeley. 11) When a channel is used for XSK and then reused again later for XSK, we forget to clear out the relevant data structures in mlx5 which causes all kinds of problems. Fix from Maxim Mikityanskiy. 12) Fix memory leak in genetlink, from Cong Wang. 13) Disallow sockmap attachments to UDP sockets, it simply won't work. From Lorenz Bauer. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits) net: ethernet: ti: ale: fix allmulti for nu type ale net: ethernet: ti: am65-cpsw-nuss: fix ale parameters init net: atm: Remove the error message according to the atomic context bpf: Undo internal BPF_PROBE_MEM in BPF insns dump libbpf: Support pre-initializing .bss global variables tools/bpftool: Fix skeleton codegen bpf: Fix memlock accounting for sock_hash bpf: sockmap: Don't attach programs to UDP sockets bpf: tcp: Recv() should return 0 when the peer socket is closed ibmvnic: Flush existing work items before device removal genetlink: clean up family attributes allocations net: ipa: header pad field only valid for AP->modem endpoint net: ipa: program upper nibbles of sequencer type net: ipa: fix modem LAN RX endpoint id net: ipa: program metadata mask differently ionic: add pcie_print_link_status rxrpc: Fix race between incoming ACK parser and retransmitter net/mlx5: E-Switch, Fix some error pointer dereferences net/mlx5: Don't fail driver on failure to create debugfs net/mlx5e: CT: Fix ipv6 nat header rewrite actions ...	2020-06-13 16:27:13 -07:00
Masahiro Yamada	a7f7f6248d	treewide: replace '---help---' in Kconfig files with 'help' Since commit `84af7a6194` ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances. This commit finishes the conversion. While I touched the lines, I also fixed the indentation. There are a variety of indentation styles found. a) 4 spaces + '---help---' b) 7 spaces + '---help---' c) 8 spaces + '---help---' d) 1 space + 1 tab + '---help---' e) 1 tab + '---help---' (correct indentation) f) 1 tab + 1 space + '---help---' g) 1 tab + 2 spaces + '---help---' In order to convert all of them to 1 tab + 'help', I ran the following commend: $ find . -name 'Kconfig' \| xargs sed -i 's/^[[:space:]]---help---/\thelp/' Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2020-06-14 01:57:21 +09:00
Dan Carpenter	09a9297574	net/mlx5: E-Switch, Fix some error pointer dereferences We can't leave "counter" set to an error pointer. Otherwise either it will lead to an error pointer dereference later in the function or it leads to an error pointer dereference when we call mlx5_fc_destroy(). Fixes: `07bab95026` ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-11 15:38:08 -07:00
Leon Romanovsky	17e73d47cd	net/mlx5: Don't fail driver on failure to create debugfs Clang warns: drivers/net/ethernet/mellanox/mlx5/core/main.c:1278:6: warning: variable 'err' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] if (!priv->dbg_root) { ^~~~~~~~~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/main.c:1303:9: note: uninitialized use occurs here return err; ^~~ drivers/net/ethernet/mellanox/mlx5/core/main.c:1278:2: note: remove the 'if' if its condition is always false if (!priv->dbg_root) { ^~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/main.c:1259:9: note: initialize the variable 'err' to silence this warning int err; ^ = 0 1 warning generated. The check of returned value of debugfs_create_dir() is wrong because by the design debugfs failures should never fail the driver and the check itself was wrong too. The kernel compiled without CONFIG_DEBUG_FS will return ERR_PTR(-ENODEV) and not NULL as expected. Fixes: `11f3b84d70` ("net/mlx5: Split mdev init and pci init") Link: https://github.com/ClangBuiltLinux/linux/issues/1042 Reported-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-11 15:38:06 -07:00
Oz Shlomo	0d156f2ded	net/mlx5e: CT: Fix ipv6 nat header rewrite actions Set the ipv6 word fields according to the hardware definitions. Fixes: `ac991b48d4` ("net/mlx5e: CT: Offload established flows") Signed-off-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-11 15:38:04 -07:00
Parav Pandit	98f91c4576	net/mlx5: Fix devlink objects and devlink device unregister sequence Current below problems exists. 1. devlink device is registered by mlx5_load_one(). But it is not unregistered by mlx5_unload_one(). This is incorrect. 2. Above issue leads to, When mlx5 PCI device is removed, currently devlink device is unregistered before devlink ports are unregistered in below ladder diagram. remove_one() mlx5_devlink_unregister() [..] devlink_unregister() <- ports are still registered! mlx5_unload_one() mlx5_unregister_device() mlx5_remove_device() mlx5e_remove() mlx5e_devlink_port_unregister() devlink_port_unregister() 3. Condition checking for registering and unregister device are not symmetric either in these routines. Hence, fix the sequence by having load and unload routines symmetric and in right order. i.e. (a) register devlink device followed by registering devlink ports (b) unregister devlink ports followed by devlink device Do this based on boot and cleanup flags instead of different conditions. Fixes: `c6acd629ee` ("net/mlx5e: Add support for devlink-port in non-representors mode") Fixes: `f60f315d33` ("net/mlx5e: Register devlink ports for physical link, PCI PF, VFs") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-11 15:38:02 -07:00
Parav Pandit	60904cd349	net/mlx5: Disable reload while removing the device While unregistration is in progress, user might be reloading the interface. This can race with unregistration in below flow which uses the resources which are getting disabled by reload flow. Hence, disable the devlink reloading first when removing the device. CPU0 CPU1 ---- ---- local_pci_remove() devlink_mutex remove_one() devlink_nl_cmd_reload() mlx5_unregister_device() devlink_reload() ops->reload_down() mlx5_unload_one() Fixes: `4383cfcc65` ("net/mlx5: Add devlink reload") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-11 15:38:00 -07:00
Aya Levin	5f1572e617	net/mlx5e: Fix ethtool hfunc configuration change Changing RX hash function requires rearranging of RQT internal indexes, the user isn't exposed to such changes and these changes do not affect the user configured indirection table. Rebuild RQ table on hfunc change. Fixes: `bdfc028de1` ("net/mlx5e: Fix ethtool RX hash func configuration change") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-11 15:37:57 -07:00
Maxim Mikityanskiy	36d45fb9d2	net/mlx5e: Fix repeated XSK usage on one channel After an XSK is closed, the relevant structures in the channel are not zeroed. If an XSK is opened the second time on the same channel without recreating channels, the stray values in the structures will lead to incorrect operation of queues, which causes CQE errors, and the new socket doesn't work at all. This patch fixes the issue by explicitly zeroing XSK-related structs in the channel on XSK close. Note that those structs are zeroed on channel creation, and usually a configuration change (XDP program is set) happens on XSK open, which leads to recreating channels, so typical XSK usecases don't suffer from this issue. However, if XSKs are opened and closed on the same channel without removing the XDP program, this bug reproduces. Fixes: `db05815b36` ("net/mlx5e: Add XSK zero-copy support") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-11 15:37:55 -07:00
Denis Efremov	47a357de2b	net/mlx5: DR, Fix freeing in dr_create_rc_qp() Variable "in" in dr_create_rc_qp() is allocated with kvzalloc() and should be freed with kvfree(). Fixes: `297cccebdc` ("net/mlx5: DR, Expose an internal API to issue RDMA operations") Cc: stable@vger.kernel.org Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-11 15:37:53 -07:00
Shay Drory	b6e0b6bebe	net/mlx5: Fix fatal error handling during device load Currently, in case of fatal error during mlx5_load_one(), we cannot enter error state until mlx5_load_one() is finished, what can take several minutes until commands will get timeouts, because these commands can't be processed due to the fatal error. Fix it by setting dev->state as MLX5_DEVICE_STATE_INTERNAL_ERROR before requesting the lock. Fixes: `c1d4d2e92a` ("net/mlx5: Avoid calling sleeping function by the health poll thread") Signed-off-by: Shay Drory <shayd@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-11 15:37:51 -07:00
Shay Drory	42ea9f1b5c	net/mlx5: drain health workqueue in case of driver load error In case there is a work in the health WQ when we teardown the driver, in driver load error flow, the health work will try to read dev->iseg, which was already unmap in mlx5_pci_close(). Fix it by draining the health workqueue first thing in mlx5_pci_close(). Trace of the error: BUG: unable to handle page fault for address: ffffb5b141c18014 PF: supervisor read access in kernel mode PF: error_code(0x0000) - not-present page PGD 1fe95d067 P4D 1fe95d067 PUD 1fe95e067 PMD 1b7823067 PTE 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 6755 Comm: kworker/u128:2 Not tainted 5.2.0-net-next-mlx5-hv_stats-over-last-worked-hyperv #1 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 04/28/2016 Workqueue: mlx5_healtha050:00:02.0 mlx5_fw_fatal_reporter_err_work [mlx5_core] RIP: 0010:ioread32be+0x30/0x40 Code: 00 77 27 48 81 ff 00 00 01 00 76 07 0f b7 d7 ed 0f c8 c3 55 48 c7 c6 3b ee d5 9f 48 89 e5 e8 67 fc ff ff b8 ff ff ff ff 5d c3 <8b> 07 0f c8 c3 66 66 2e 0f 1f 84 00 00 00 00 00 48 81 fe ff ff 03 RSP: 0018:ffffb5b14c56fd78 EFLAGS: 00010292 RAX: ffffb5b141c18000 RBX: ffff8e9f78a801c0 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff8e9f7ecd7628 RDI: ffffb5b141c18014 RBP: ffffb5b14c56fd90 R08: 0000000000000001 R09: 0000000000000000 R10: ffff8e9f372a2c30 R11: ffff8e9f87f4bc40 R12: ffff8e9f372a1fc0 R13: ffff8e9f78a80000 R14: ffffffffc07136a0 R15: ffff8e9f78ae6f20 FS: 0000000000000000(0000) GS:ffff8e9f7ecc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffb5b141c18014 CR3: 00000001c8f82006 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? mlx5_health_try_recover+0x4d/0x270 [mlx5_core] mlx5_fw_fatal_reporter_recover+0x16/0x20 [mlx5_core] devlink_health_reporter_recover+0x1c/0x50 devlink_health_report+0xfb/0x240 mlx5_fw_fatal_reporter_err_work+0x65/0xd0 [mlx5_core] process_one_work+0x1fb/0x4e0 ? process_one_work+0x16b/0x4e0 worker_thread+0x4f/0x3d0 kthread+0x10d/0x140 ? process_one_work+0x4e0/0x4e0 ? kthread_cancel_delayed_work_sync+0x20/0x20 ret_from_fork+0x1f/0x30 Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache 8021q garp mrp stp llc ipmi_devintf ipmi_msghandler rpcrdma rdma_ucm ib_iser rdma_cm ib_umad iw_cm ib_ipoib libiscsi scsi_transport_iscsi ib_cm mlx5_ib ib_uverbs ib_core mlx5_core sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 mlxfw crypto_simd cryptd glue_helper input_leds hyperv_fb intel_rapl_perf joydev serio_raw pci_hyperv pci_hyperv_mini mac_hid hv_balloon nfsd auth_rpcgss nfs_acl lockd grace sunrpc sch_fq_codel ip_tables x_tables autofs4 hv_utils hid_generic hv_storvsc ptp hid_hyperv hid hv_netvsc hyperv_keyboard pps_core scsi_transport_fc psmouse hv_vmbus i2c_piix4 floppy pata_acpi CR2: ffffb5b141c18014 ---[ end trace b12c5503157cad24 ]--- RIP: 0010:ioread32be+0x30/0x40 Code: 00 77 27 48 81 ff 00 00 01 00 76 07 0f b7 d7 ed 0f c8 c3 55 48 c7 c6 3b ee d5 9f 48 89 e5 e8 67 fc ff ff b8 ff ff ff ff 5d c3 <8b> 07 0f c8 c3 66 66 2e 0f 1f 84 00 00 00 00 00 48 81 fe ff ff 03 RSP: 0018:ffffb5b14c56fd78 EFLAGS: 00010292 RAX: ffffb5b141c18000 RBX: ffff8e9f78a801c0 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff8e9f7ecd7628 RDI: ffffb5b141c18014 RBP: ffffb5b14c56fd90 R08: 0000000000000001 R09: 0000000000000000 R10: ffff8e9f372a2c30 R11: ffff8e9f87f4bc40 R12: ffff8e9f372a1fc0 R13: ffff8e9f78a80000 R14: ffffffffc07136a0 R15: ffff8e9f78ae6f20 FS: 0000000000000000(0000) GS:ffff8e9f7ecc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffb5b141c18014 CR3: 00000001c8f82006 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:38 in_atomic(): 0, irqs_disabled(): 1, pid: 6755, name: kworker/u128:2 INFO: lockdep is turned off. CPU: 3 PID: 6755 Comm: kworker/u128:2 Tainted: G D 5.2.0-net-next-mlx5-hv_stats-over-last-worked-hyperv #1 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 04/28/2016 Workqueue: mlx5_healtha050:00:02.0 mlx5_fw_fatal_reporter_err_work [mlx5_core] Call Trace: dump_stack+0x63/0x88 ___might_sleep+0x10a/0x130 __might_sleep+0x4a/0x80 exit_signals+0x33/0x230 ? blocking_notifier_call_chain+0x16/0x20 do_exit+0xb1/0xc30 ? kthread+0x10d/0x140 ? process_one_work+0x4e0/0x4e0 Fixes: `52c368dc3d` ("net/mlx5: Move health and page alloc init to mdev_init") Signed-off-by: Shay Drory <shayd@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-11 15:37:48 -07:00
Linus Torvalds	af7b480103	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: - Fix the build with certain Kconfig combinations for the Chelsio inline TLS device, from Rohit Maheshwar and Vinay Kumar Yadavi. - Fix leak in genetlink, from Cong Lang. - Fix out of bounds packet header accesses in seg6, from Ahmed Abdelsalam. - Two XDP fixes in the ENA driver, from Sameeh Jubran - Use rwsem in device rename instead of a seqcount because this code can sleep, from Ahmed S. Darwish. - Fix WoL regressions in r8169, from Heiner Kallweit. - Fix qed crashes in kdump mode, from Alok Prasad. - Fix the callbacks used for certain thermal zones in mlxsw, from Vadim Pasternak. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits) net: dsa: lantiq_gswip: fix and improve the unsupported interface error mlxsw: core: Use different get_trend() callbacks for different thermal zones net: dp83869: Reset return variable if PHY strap is read rhashtable: Drop raw RCU deref in nested_table_free cxgb4: Use kfree() instead kvfree() where appropriate net: qed: fixes crash while running driver in kdump kernel vsock/vmci: make vmci_vsock_transport_cb() static net: ethtool: Fix comment mentioning typo in IS_ENABLED() net: phy: mscc: fix Serdes configuration in vsc8584_config_init net: mscc: Fix OF_MDIO config check net: marvell: Fix OF_MDIO config check net: dp83867: Fix OF_MDIO config check net: dp83869: Fix OF_MDIO config check net: ethernet: mvneta: fix MVNETA_SKB_HEADROOM alignment ethtool: linkinfo: remove an unnecessary NULL check net/xdp: use shift instead of 64 bit division crypto/chtls:Fix compile error when CONFIG_IPV6 is disabled inet_connection_sock: clear inet_num out of destroy helper yam: fix possible memory leak in yam_init_driver lan743x: Use correct MAC_CR configuration for 1 GBit speed ...	2020-06-07 17:27:45 -07:00
Vadim Pasternak	2dc2f76005	mlxsw: core: Use different get_trend() callbacks for different thermal zones The driver registers three different types of thermal zones: For the ASIC itself, for port modules and for gearboxes. Currently, all three types use the same get_trend() callback which does not work correctly for the ASIC thermal zone. The callback assumes that the device data is of type 'struct mlxsw_thermal_module', whereas for the ASIC thermal zone 'struct mlxsw_thermal' is passed as device data. Fix this by using one get_trend() callback for the ASIC thermal zone and another for the other two types. Fixes: `6f73862fab` ("mlxsw: core: Add the hottest thermal zone detection") Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-07 16:59:43 -07:00
Linus Torvalds	242b233198	RDMA 5.8 merge window pull request A few large, long discussed works this time. The RNBD block driver has been posted for nearly two years now, and the removal of FMR has been a recurring discussion theme for a long time. The usual smattering of features and bug fixes. - Various small driver bugs fixes in rxe, mlx5, hfi1, and efa - Continuing driver cleanups in bnxt_re, hns - Big cleanup of mlx5 QP creation flows - More consistent use of src port and flow label when LAG is used and a mlx5 implementation - Additional set of cleanups for IB CM - 'RNBD' network block driver and target. This is a network block RDMA device specific to ionos's cloud environment. It brings strong multipath and resiliency capabilities. - Accelerated IPoIB for HFI1 - QP/WQ/SRQ ioctl migration for uverbs, and support for multiple async fds - Support for exchanging the new IBTA defiend ECE data during RDMA CM exchanges - Removal of the very old and insecure FMR interface from all ULPs and drivers. FRWR should be preferred for at least a decade now. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl7X/IwACgkQOG33FX4g mxp2uw/+MI2S/aXqEBvZfTT8yrkAwqYezS0VeTDnwH/T6UlTMDhHVN/2Ji3tbbX3 FEKT1i2mnAL5RqUAL1lr9g4sG/bVozrpN46Ws5Lu9dTbIPLKTNPWDuLFQDUShKY7 OyMI/bRx6anGnsOy20iiBqnrQbrrZj5TECgnmrkAl62QFdcl7aBWe/yYjy4CT11N ub+aBXBREN1F1pc0HIjd2tI+8gnZc+mNm1LVVDRH9Capun/pI26qDNh7e6QwGyIo n8ItraC8znLwv/nsUoTE7/JRcsTEe6vJI26PQmczZfNJs/4O65G7fZg0eSBseZYi qKf7Uwtb3qW0R7jRUMEgFY4DKXVAA0G2ph40HXBuzOSsqlT6HqYMO2wgG8pJkrTc qAjoSJGzfAHIsjxzxKI8wKuufCddjCm30VWWU7EKeriI6h1J0uPVqKkQMfYBTkik 696eZSBycAVgwayOng3XaehiTxOL7qGMTjUpDjUR6UscbiPG919vP+QsbIUuBXdb YoddBQJdyGJiaCXv32ciJjo9bjPRRi/bII7Q5qzCNI2mi4ZVbudF4ffzyQvdHtNJ nGnpRXoPi7kMvUrKTMPWkFjj0R5/UsPszsA51zbxPydfgBe0Dlc2PrrIG8dlzYAp wbV0Lec+iJucKlt7EZtrjz1xOiOOaQt/5/cW1bWqL+wk2t6gAuY= =9zTe -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "A more active cycle than most of the recent past, with a few large, long discussed works this time. The RNBD block driver has been posted for nearly two years now, and flowing through RDMA due to it also introducing a new ULP. The removal of FMR has been a recurring discussion theme for a long time. And the usual smattering of features and bug fixes. Summary: - Various small driver bugs fixes in rxe, mlx5, hfi1, and efa - Continuing driver cleanups in bnxt_re, hns - Big cleanup of mlx5 QP creation flows - More consistent use of src port and flow label when LAG is used and a mlx5 implementation - Additional set of cleanups for IB CM - 'RNBD' network block driver and target. This is a network block RDMA device specific to ionos's cloud environment. It brings strong multipath and resiliency capabilities. - Accelerated IPoIB for HFI1 - QP/WQ/SRQ ioctl migration for uverbs, and support for multiple async fds - Support for exchanging the new IBTA defiend ECE data during RDMA CM exchanges - Removal of the very old and insecure FMR interface from all ULPs and drivers. FRWR should be preferred for at least a decade now" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (247 commits) RDMA/cm: Spurious WARNING triggered in cm_destroy_id() RDMA/mlx5: Return ECE DC support RDMA/mlx5: Don't rely on FW to set zeros in ECE response RDMA/mlx5: Return an error if copy_to_user fails IB/hfi1: Use free_netdev() in hfi1_netdev_free() RDMA/hns: Uninitialized variable in modify_qp_init_to_rtr() RDMA/core: Move and rename trace_cm_id_create() IB/hfi1: Fix hfi1_netdev_rx_init() error handling RDMA: Remove 'max_map_per_fmr' RDMA: Remove 'max_fmr' RDMA/core: Remove FMR device ops RDMA/rdmavt: Remove FMR memory registration RDMA/mthca: Remove FMR support for memory registration RDMA/mlx4: Remove FMR support for memory registration RDMA/i40iw: Remove FMR leftovers RDMA/bnxt_re: Remove FMR leftovers RDMA/mlx5: Remove FMR leftovers RDMA/core: Remove FMR pool API RDMA/rds: Remove FMR support for memory registration RDMA/srp: Remove support for FMR memory registration ...	2020-06-05 14:05:57 -07:00
Max Gurtovoy	1f55b7ab90	RDMA/mlx4: Remove FMR support for memory registration HCA's that are driven by mlx4 driver support FRWR method to register memory. Remove the ancient and unsafe FMR method. Link: https://lore.kernel.org/r/8-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-06-02 20:32:54 -03:00
Lorenzo Bianconi	1b698fa5d8	xdp: Rename convert_to_xdp_frame in xdp_convert_buff_to_frame In order to use standard 'xdp' prefix, rename convert_to_xdp_frame utility routine in xdp_convert_buff_to_frame and replace all the occurrences Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/6344f739be0d1a08ab2b9607584c4d5478c8c083.1590698295.git.lorenzo@kernel.org	2020-06-01 15:02:53 -07:00
Ido Schimmel	88e2774961	mlxsw: spectrum_trap: Register ACL control traps In a similar fashion to other control traps, register ACL control traps with devlink. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-01 11:49:23 -07:00
Ido Schimmel	8110668ecd	mlxsw: spectrum_trap: Register layer 3 control traps In a similar fashion to layer 2 control traps, register layer 3 control traps with devlink. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-01 11:49:23 -07:00
Ido Schimmel	39c10350cf	mlxsw: spectrum_trap: Register layer 2 control traps In a similar fashion to other traps, register layer 2 control traps with devlink. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-01 11:49:23 -07:00
Ido Schimmel	45b1c87313	mlxsw: spectrum_trap: Factor out common Rx listener function We currently have an Rx listener function for exception traps that marks received skbs with 'offload_fwd_mark' and injects them to the kernel's Rx path. The marking is done because all these exceptions occur during L3 forwarding, after the packets were potentially flooded at L2. A subsequent patch will add support for control traps. Packets received via some of these control traps need different handling: 1. Packets might not need to be marked with 'offload_fwd_mark'. For example, if packet was trapped before L2 forwarding 2. Packets might not need to be injected to the kernel's Rx path. For example, sampled packets are reported to user space via the psample module Factor out a common Rx listener function that only reports trapped packets to devlink. Call it from mlxsw_sp_rx_no_mark_listener() and mlxsw_sp_rx_mark_listener() that will inject the packets to the kernel's Rx path, without and with the marking, respectively. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-01 11:49:23 -07:00
Ido Schimmel	1e292f5c11	mlxsw: spectrum_trap: Move layer 3 exceptions to exceptions trap group The layer 3 exceptions are still subject to the same trap policer, so nothing changes, but user space can choose to assign a different one. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-01 11:49:23 -07:00
Pablo Neira Ayuso	9eabd18871	mlx5: update indirect block support Register ndo callback via flow_indr_dev_register() and flow_indr_dev_unregister(). No need for mlx5e_rep_indr_clean_block_privs() since flow_block_cb_free() already releases the internal mapping via ->release callback, which in this case is mlx5e_rep_indr_tc_block_unbind(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-01 11:41:50 -07:00
David S. Miller	1806c13dc2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net xdp_umem.c had overlapping changes between the 64-bit math fix for the calculation of npgs and the removal of the zerocopy memory type which got rid of the chunk_size_nohdr member. The mlx5 Kconfig conflict is a case where we just take the net-next copy of the Kconfig entry dependency as it takes on the ESWITCH dependency by one level of indirection which is what the 'net' conflicting change is trying to ensure. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-31 17:48:46 -07:00
Saeed Mahameed	eb24387183	net/mlx5e: Make mlx5e_dcbnl_ops static Fix sparse warning: drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c:988:29: error: symbol 'mlx5e_dcbnl_ops' was not declared. Should it be static? Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 21:20:23 -07:00
Saeed Mahameed	58ff18e12c	net/mlx5e: en_tc: Fix cast to restricted __be32 warning Fixes sparse warnings: warning: cast to restricted __be32 warning: restricted __be32 degrades to integer Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com>	2020-05-29 21:20:22 -07:00
Saeed Mahameed	c51323ee7a	net/mlx5e: en_tc: Fix incorrect type in initializer warnings Fix some trivial warnings of the type: warning: incorrect type in initializer (different base types) Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com>	2020-05-29 21:20:22 -07:00
Saeed Mahameed	aee3e9c457	net/mlx5: Accel: fpga tls fix cast to __be64 and incorrect argument types tls handle and rcd_sn are actually big endian and not in host format. Fix that. Fix the following sparse warnings: drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:177:21: warning: cast to restricted __be64 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:178:52: warning: incorrect type in argument 2 (different base types) expected unsigned int [usertype] handle got restricted __be32 [usertype] handle Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 21:20:22 -07:00
Saeed Mahameed	2553f421f4	net/mlx5: cmd: Fix memset with byte count warning Fix sparse warning: drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1949:15: warning: memset with byte count of 271720 mlx5_cmd_stats array is too big to be held inline in mlx5_cmd. Allocate it separately. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 21:20:21 -07:00
Saeed Mahameed	9ff2e92c46	net/mlx5: DR: Fix incorrect type in return expression dr_ste_crc32_calc() calculates crc32 and should return it in HW format. It is being used to calculate a u32 index, hence we force the return value of u32 to avoid the sparse warning: drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c:115:16: warning: incorrect type in return expression (different base types) expected unsigned int got restricted __be32 [usertype] Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com>	2020-05-29 21:20:21 -07:00
Saeed Mahameed	c2ba2c2287	net/mlx5: DR: Fix cast to restricted __be32 raw_ip actual type is __be32 and not u32. Fix that and get rid of the warning. drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c:906:31: warning: cast to restricted __be32 Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com>	2020-05-29 21:20:21 -07:00
Saeed Mahameed	618f88c4c4	net/mlx5: DR: Fix incorrect type in argument HW spec objects should receive a void ptr to work on, the MLX5_SET/GET macro will know how to handle it. No need to provide explicit or wrong pointer type in this case. warning: incorrect type in argument 1 (different base types) expected unsigned long long const [usertype] sw_action got restricted __be64 [usertype] [assigned] sw_action Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com>	2020-05-29 21:20:21 -07:00
Eli Cohen	f7e3ac424a	net/mlx5e: Use generic API to build MPLS label Make use of generic API mpls_entry_encode() to build mpls label and get rid of local function. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 21:20:20 -07:00
Arnd Bergmann	e1167e1611	net/mlx5: reduce stack usage in qp_read_field Moving the mlx5_ifc_query_qp_out_bits structure on the stack was a bit excessive and now causes the compiler to complain on 32-bit architectures: drivers/net/ethernet/mellanox/mlx5/core/debugfs.c: In function 'qp_read_field': drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:274:1: error: the frame size of 1104 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] Revert the previous patch partially to use dynamically allocation as the code did before. Unfortunately there is no good error handling in case the allocation fails. Fixes: `57a6c5e992` ("net/mlx5: Replace hand written QP context struct with automatic getters") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 21:20:20 -07:00
Nathan Chancellor	2861904697	net/mlx5e: Don't use err uninitialized in mlx5e_attach_decap Clang warns: drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:3712:6: warning: variable 'err' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] if (IS_ERR(d->pkt_reformat)) { ^~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:3718:6: note: uninitialized use occurs here if (err) ^~~ drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:3712:2: note: remove the 'if' if its condition is always true if (IS_ERR(d->pkt_reformat)) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:3670:9: note: initialize the variable 'err' to silence this warning int err; ^ = 0 1 warning generated. It is not wrong, err is only ever initialized in if statements but this one is not in one. Initialize err to 0 to fix this. Fixes: `14e6b038af` ("net/mlx5e: Add support for hw decapsulation of MPLS over UDP") Link: https://github.com/ClangBuiltLinux/linux/issues/1037 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 21:20:20 -07:00
Saeed Mahameed	2950d1d64f	net/mlx5: Kconfig: Fix spelling typo "mdoe"->"mode" Fixes: `d956873f90` ("net/mlx5e: Introduce kconfig var for TC support") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>	2020-05-29 21:20:19 -07:00
Jesper Dangaard Brouer	56e2287b41	mlx5: fix xdp data_meta setup in mlx5e_fill_xdp_buff The helper function xdp_set_data_meta_invalid() must be called after setting xdp->data as it depends on it. The bug was introduced in the cited patch below, and cause the kernel to crash when using BPF helper bpf_xdp_adjust_head() on mlx5 driver. Fixes: `39d6443c8d` ("mlx5, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL") Reported-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 21:20:19 -07:00
Saeed Mahameed	971ae1ed03	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux net/mlx5: Add ability to read and write ECE options net/mlx5: Add support for RDMA TX FT headers modifying net/mlx5: Move iseg access helper routines close to mlx5_core driver net/mlx5: Cleanup mlx5_ifc_fte_match_set_misc2_bits net/mlx5: Add support in forward to namespace {IB/net}/mlx5: Simplify don't trap code net/mlx5: Replace zero-length array with flexible-array Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 14:38:57 -07:00
Pablo Neira Ayuso	a683012a8e	net/mlx5e: replace EINVAL in mlx5e_flower_parse_meta() The drivers reports EINVAL to userspace through netlink on invalid meta match. This is confusing since EINVAL is usually reserved for malformed netlink messages. Replace it by more meaningful codes. Fixes: `6d65bc64e2` ("net/mlx5e: Add mlx5e_flower_parse_meta support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 13:07:54 -07:00
Vlad Buslov	cb9a0641b5	net/mlx5e: Fix MLX5_TC_CT dependencies Change MLX5_TC_CT config dependencies to include MLX5_ESWITCH instead of MLX5_CORE_EN && NET_SWITCHDEV, which are already required by MLX5_ESWITCH. Without this change mlx5 fails to compile if user disables MLX5_ESWITCH without also manually disabling MLX5_TC_CT. Fixes: `4c3844d9e9` ("net/mlx5e: CT: Introduce connection tracking") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 13:07:54 -07:00
Tal Gilboa	ebeaf084ad	net/mlx5e: Properly set default values when disabling adaptive moderation Add a call to mlx5e_reset_rx/tx_moderation() when enabling/disabling adaptive moderation, in order to select the proper default values. In order to do so, we separate the logic of selecting the moderation values and setting moderion mode (CQE/EQE based). Fixes: `0088cbbc4b` ("net/mlx5e: Enable CQE based moderation on TX CQ") Fixes: `9908aa2929` ("net/mlx5e: CQE based moderation") Signed-off-by: Tal Gilboa <talgi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 13:07:53 -07:00
Aya Levin	b623603bbb	net/mlx5e: Fix arch depending casting issue in FEC Change type of active_fec to u32 to match the type expected by mlx5e_get_fec_mode. Copy active_fec and configured_fec values to unsigned long before preforming bitwise manipulations. Take the same approach when configuring FEC over 50G link modes: copy the policy into an unsigned long and only than preform bitwise operations. Fixes: `2132b71f78` ("net/mlx5e: Advertise globaly supported FEC modes") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 13:07:53 -07:00
Maor Dickman	20300aafa7	net/mlx5e: Remove warning "devices are not on same switch HW" On tunnel decap rule insertion, the indirect mechanism will attempt to offload the rule on all uplink representors which will trigger the "devices are not on same switch HW, can't offload forwarding" message for the uplink which isn't on the same switch HW as the VF representor. The above flow is valid and shouldn't cause warning message, fix by removing the warning and only report this flow using extack. Fixes: `321348475d` ("net/mlx5e: Fix allowed tc redirect merged eswitch offload cases") Signed-off-by: Maor Dickman <maord@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 13:07:53 -07:00
Roi Dayan	0a2a6f498f	net/mlx5e: Fix stats update for matchall classifier It's bytes, packets, lastused. Fixes: `fcb64c0f56` ("net/mlx5: E-Switch, add ingress rate support") Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 13:07:52 -07:00
Mark Bloch	8fc3e29be9	net/mlx5: Fix crash upon suspend/resume Currently a Linux system with the mlx5 NIC always crashes upon hibernation - suspend/resume. Add basic callbacks so the NIC could be suspended and resumed. Fixes: `9603b61de1` ("mlx5: Move pci device handling from mlx5_ib to mlx5_core") Tested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 13:07:52 -07:00
David S. Miller	1eba1110f0	mlx5-updates-2020-05-26 Updates highlights: 1) From Vu Pham (8): Support VM traffics failover with bonded VF representors and e-switch egress/ingress ACLs This series introduce the support for Virtual Machine running I/O traffic over direct/fast VF path and failing over to slower paravirtualized path using the following features: __________________________________ \| VM _________________ \| \| \|FAILOVER device \| \| \| \|________________\| \| \| \| \| \| ____\|_____ \| \| \| \| \| \| ______ \|___ ____\|_______ \| \| \| VF PT \| \|VIRTIO-NET \| \| \| \| device \| \| device \| \| \| \|_________\| \|___________\| \| \|___________\|______________\|________\| \| \| \| HYPERVISOR \| \| ____\|______ \| \| macvtap \| \| \|virtio BE \| \| \|___________\| \| \| \| ____\|_____ \| \|host VF \| \| \|_________\| \| \| _____\|______ _____\|_____ \| PT VF \| \| host VF \| \|representor\| \|representor\| \|___________\| \|___________\| \ / \ / \ / \ / _________________ \_______/ \| \| _______\|________ \| V-SWITCH \| \|VF representors \|________________\| (OVS) \| \| bond \| \|________________\| \|________________\| \| ________\|________ \| Uplink \| \| representor \| \|_________________\| Summary: -------- Problem statement: ------------------ Currently in above topology, when netfailover device is configured using VFs and eswitch VF representors, and when traffic fails over to stand-by VF which is exposed using macvtap device to guest VM, eswitch fails to switch the traffic to the stand-by VF representor. This occurs because there is no knowledge at eswitch level of the stand-by representor device. Solution: --------- Using standard bonding driver, a bond netdevice is created over VF representor device which is used for offloading tc rules. Two VF representors are bonded together, one for the passthrough VF device and another one for the stand-by VF device. With this solution, mlx5 driver listens to the failover events occuring at the bond device level to failover traffic to either of the active VF representor of the bond. a. VM with netfailover device of VF pass-thru (PT) device and virtio-net paravirtualized device with same MAC-address to handle failover traffics at VM level. b. Host bond is active-standby mode, with the lower devices being the VM VF PT representor, and the representor of the 2nd VF to handle failover traffics at Hypervisor/V-Switch OVS level. - During the steady state (fast datapath): set the bond active device to be the VM PT VF representor. - During failover: apply bond failover to the second VF representor device which connects to the VM non-accelerated path. c. E-Switch ingress/egress ACL tables to support failover traffics at E-Switch level I. E-Switch egress ACL with forward-to-vport rule: - By default, eswitch vport egress acl forward packets to its counterpart NIC vport. - During port failover, the egress acl forward-to-vport rule will be added to e-switch vport of passive/in-active slave VF representor to forward packets to other e-switch vport ie. the active slave representor's e-switch vport to handle egress "failover" traffics. - Using lower change netdev event to detect a representor is a lower dev (slave) of bond and becomes active, adding egress acl forward-to-vport rule of all other slave netdevs to forward to this representor's vport. - Using upper change netdev event to detect a representor unslaving from bond device to delete its vport's egress acl forward-to-vport rule. II. E-Switch ingress ACL metadata reg_c for match - Bonded representors' vorts sharing tc block have the same root ingress acl table and a unique metadata for match. - Traffics from both representors's vports will be tagged with same unique metadata reg_c. - Using upper change netdev event to detect a representor enslaving/unslaving from bond device to setup shared root ingress acl and unique metadata. 2) From Alex Vesker (2): Slpit RX and TX lock for parallel rule insertion in software steering 3) Eli Britstein (2): Optimize performance for IPv4/IPv6 ethertype use the HW ip_version register rather than parsing eth frames for ethertype. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl7PEFAACgkQSD+KveBX +j4Z5Af+NYwihYZpQYBBN00K7Wu10XZ65u5MbGSDmzpdN62w0kKfjsJ70bb9aiws h8LC7lspdMLRMMn9pWwFKshyF6RoSD9Ku3ZYhUbtj+hJLElAd9IwGt6pPKr8hPDd 9h+ZcBkacdhNwWKf7CKThic0c/0PLdVyzRysHxcQWKSMPCTdgiL5Z3PQHA0TM6J3 6Excs2z7kSuuyyxQ1cyWCaqSz4rqCrYyd8Ws4HOPhXgSbX14Q3mtMsBDayx2gHNW rdVbaNN6s2o0TxbrCwd0AaNP3UWcnjNqu1ohxgJiSe8y+MHMoB0OMoO+6vQJnwNI bzpZEioswV1zdgK3qNmXqbHOiHRSVQ== =xM1D -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2020-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2020-05-26 Updates highlights: 1) From Vu Pham (8): Support VM traffics failover with bonded VF representors and e-switch egress/ingress ACLs This series introduce the support for Virtual Machine running I/O traffic over direct/fast VF path and failing over to slower paravirtualized path using the following features: __________________________________ \| VM _________________ \| \| \|FAILOVER device \| \| \| \|________________\| \| \| \| \| \| ____\|_____ \| \| \| \| \| \| ______ \|___ ____\|_______ \| \| \| VF PT \| \|VIRTIO-NET \| \| \| \| device \| \| device \| \| \| \|_________\| \|___________\| \| \|___________\|______________\|________\| \| \| \| HYPERVISOR \| \| ____\|______ \| \| macvtap \| \| \|virtio BE \| \| \|___________\| \| \| \| ____\|_____ \| \|host VF \| \| \|_________\| \| \| _____\|______ _____\|_____ \| PT VF \| \| host VF \| \|representor\| \|representor\| \|___________\| \|___________\| \ / \ / \ / \ / _________________ \_______/ \| \| _______\|________ \| V-SWITCH \| \|VF representors \|________________\| (OVS) \| \| bond \| \|________________\| \|________________\| \| ________\|________ \| Uplink \| \| representor \| \|_________________\| Summary: -------- Problem statement: ------------------ Currently in above topology, when netfailover device is configured using VFs and eswitch VF representors, and when traffic fails over to stand-by VF which is exposed using macvtap device to guest VM, eswitch fails to switch the traffic to the stand-by VF representor. This occurs because there is no knowledge at eswitch level of the stand-by representor device. Solution: --------- Using standard bonding driver, a bond netdevice is created over VF representor device which is used for offloading tc rules. Two VF representors are bonded together, one for the passthrough VF device and another one for the stand-by VF device. With this solution, mlx5 driver listens to the failover events occuring at the bond device level to failover traffic to either of the active VF representor of the bond. a. VM with netfailover device of VF pass-thru (PT) device and virtio-net paravirtualized device with same MAC-address to handle failover traffics at VM level. b. Host bond is active-standby mode, with the lower devices being the VM VF PT representor, and the representor of the 2nd VF to handle failover traffics at Hypervisor/V-Switch OVS level. - During the steady state (fast datapath): set the bond active device to be the VM PT VF representor. - During failover: apply bond failover to the second VF representor device which connects to the VM non-accelerated path. c. E-Switch ingress/egress ACL tables to support failover traffics at E-Switch level I. E-Switch egress ACL with forward-to-vport rule: - By default, eswitch vport egress acl forward packets to its counterpart NIC vport. - During port failover, the egress acl forward-to-vport rule will be added to e-switch vport of passive/in-active slave VF representor to forward packets to other e-switch vport ie. the active slave representor's e-switch vport to handle egress "failover" traffics. - Using lower change netdev event to detect a representor is a lower dev (slave) of bond and becomes active, adding egress acl forward-to-vport rule of all other slave netdevs to forward to this representor's vport. - Using upper change netdev event to detect a representor unslaving from bond device to delete its vport's egress acl forward-to-vport rule. II. E-Switch ingress ACL metadata reg_c for match - Bonded representors' vorts sharing tc block have the same root ingress acl table and a unique metadata for match. - Traffics from both representors's vports will be tagged with same unique metadata reg_c. - Using upper change netdev event to detect a representor enslaving/unslaving from bond device to setup shared root ingress acl and unique metadata. 2) From Alex Vesker (2): Slpit RX and TX lock for parallel rule insertion in software steering 3) Eli Britstein (2): Optimize performance for IPv4/IPv6 ethertype use the HW ip_version register rather than parsing eth frames for ethertype. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:04:12 -07:00
Alex Vesker	ed03a418ab	net/mlx5: DR, Split RX and TX lock for parallel insertion Change the locking flow to support RX and TX locks, splitting the single lock to two will allow inserting rules in parallel for RX and TX parts of the FDB. Locking the dr_domain will be done by locking the RX domain and the TX domain locks, this is mostly used for control operations on the dr_domain. When inserting rules for RX or TX the single nic_doamin RX or TX lock will be used. Splitting the lock is safe since RX and TX domains are logically separated from each other, shared objects such the send-ring and memory pool are protected by locks. Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:52 -07:00
Alex Vesker	cedb28191f	net/mlx5: DR, Add a spinlock to protect the send ring Adding this lock will allow writing steering entries without locking the dr_domain and allow parallel insertion. Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:51 -07:00
Eli Britstein	fca533041a	net/mlx5e: Optimize performance for IPv4/IPv6 ethertype The HW is optimized for IPv4/IPv6. For such cases, pending capability, avoid matching on ethertype, and use ip_version field instead. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:51 -07:00
Eli Britstein	4a5d5d7392	net/mlx5e: Helper function to set ethertype Set ethertype match in a helper function as a pre-step towards optimizing it. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:50 -07:00
Parav Pandit	810cbb2554	net/mlx5: Add missing mutex destroy Add mutex destroy calls to balance with mutex_init() done in the init path. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:50 -07:00
Vu Pham	9728366f53	net/mlx5e: Use change upper event to setup representors' bond_metadata Use change upper event to detect slave representor from enslaving/unslaving to/from lag device. On enslaving event, call mlx5_enslave_rep() API to create, add this slave representor shadow entry to the slaves list of bond_metadata structure representing master lag device and use its metadata to setup ingress acl metadata header. On unslaving event, resetting the vport of unslaved representor to use its default ingress/egress acls and rx rules with its default_metadata. The last slave will free the shared bond_metadata and its unique metadata. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:50 -07:00
Vu Pham	88e96e533c	net/mlx5e: Slave representors sharing unique metadata for match Bonded slave representors' vports must share a unique metadata for match. On enslaving event of slave representor to lag device, allocate new unique "bond_metadata" for match if this is the first slave. The subsequent enslaved representors will share the same unique "bond_metadata". On unslaving event of slave representor, reset the slave representor's vport to use its own default metadata. Replace ingress acl and rx rules of the slave representors' vports using new vport->bond_metadata. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:49 -07:00
Vu Pham	133dcfc577	net/mlx5: E-Switch, Alloc and free unique metadata for match Introduce infrastructure to create unique metadata for match for vport without depending on vport_num. Vport uses its default metadata for match in standalone configuration but will share a different unique "bond_metadata" for match with other vports in bond configuration. Using ida to generate unique metadata for match for vports in default and bond configurations. Introduce APIs to generate, free metadata for match. Introduce APIs to set vport's bond_metadata and replace its ingress acl rules with bond_metatada. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:49 -07:00
Vu Pham	d97555e145	net/mlx5e: Add bond_metadata and its slave entries Adding bond_metadata and its slave entries to represent a lag device and its slaves VF representors. Bond_metadata structure includes a unique metadata shared by slaves VF respresentors, and a list of slaves representors slave entries. On enslaving event, create a bond_metadata structure representing the upper lag device of this slave representor if it has not been created yet. Create and add entry for the slave representor to the slaves list. On unslaving event, free the slave entry of the slave representor. On the last unslave event, free the bond_metadata structure and its resources. Introduce APIs to create and remove bond_metadata and its resources, enslave and unslave VF representor slave entries. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:49 -07:00
Or Gerlitz	d34eb2fcd0	net/mlx5e: Offload flow rules to active lower representor When a bond device is created over one or more non uplink representors, and when a flow rule is offloaded to such bond device, offload a rule to the active lower device. Assuming that this is active-backup lag, the rules should be offloaded to the active lower device which is the representor of the direct path (not the failover). Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:48 -07:00
Vu Pham	553f932838	net/mlx5e: Support tc block sharing for representors Currently offloading a rule over a tc block shared by multiple representors fails because an e-switch global hashtable to keep the mapping from tc cookies to mlx5e flow instances is used, and tc block sharing offloads the same rule/cookie multiple times, each time for different representor sharing the tc block. Changing the implementation and behavior by acknowledging and returning success if the same rule/cookie is offloaded again to other slave representor sharing the tc block by setting, checking and comparing the netdev that added the rule first. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:48 -07:00
Or Gerlitz	7e51891a23	net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule Register a notifier block to handle netdev events for bond device of non-uplink representors to support eswitch vports bonding. When a non-uplink representor is a lower dev (slave) of bond and becomes active, adding egress acl forward-to-vport rule of all slave netdevs (active + standby) to forward to this representor's vport. Use change lower netdev event to do this. Use change upper event to detect slave representor unslaved from lag device to delete its vport egress acl forward rule if any. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:47 -07:00
Vu Pham	bf773dc0e6	net/mlx5: E-Switch, Introduce APIs to enable egress acl forward-to-vport rule By default, e-switch vport's egress acl just forward packets to its counterpart NIC vport using existing egress acl table. During port failover in bonding scenario where two VFs representors are bonded, the egress acl forward-to-vport rule will be added to the existing egress acl table of e-switch vport of passive/inactive slave representor to forward packets to other NIC vport ie. the active slave representor's NIC vport to handle egress "failover" traffic. Enable egress acl and have APIs to create and destroy egress acl forward-to-vport rule and group. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:47 -07:00
Vu Pham	07bab95026	net/mlx5: E-Switch, Refactor eswitch ingress acl codes Restructure the eswitch ingress acl codes into eswitch directory and different files: . Acl ingress helper functions to acl_helper.c/h . Acl ingress functions used in offloads mode to acl_ingress_ofld.c . Acl ingress functions used in legacy mode to acl_ingress_lgy.c This patch does not change any functionality. Signed-off-by: Vu Pham <vuhuong@mellanox.com>	2020-05-27 18:13:47 -07:00
Vu Pham	ea651a86d4	net/mlx5: E-Switch, Refactor eswitch egress acl codes Refactor the egress acl codes so that offloads and legacy modes can configure specifically their own needs of egress acl table, groups and rules. While at it, restructure the eswitch egress acl codes into eswitch directory and different files: . Acl egress helper functions to acl_helper.c/h . Acl egress functions used in offloads mode to acl_egress_ofld.c . Acl egress functions used in legacy mode to acl_egress_lgy.c This patch does not change any functionality. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:46 -07:00
Jason Gunthorpe	e4fdf7625b	Merge branch 'mellanox/mlx5-next' into rdma.git for/next From the mlx5-next branch at git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Required for dependencies in following patches * branch 'mellanox/mlx5-next': net/mlx5: Add ability to read and write ECE options net/mlx5: Add support for RDMA TX FT headers modifying net/mlx5: Move iseg access helper routines close to mlx5_core driver net/mlx5: Cleanup mlx5_ifc_fte_match_set_misc2_bits Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-05-27 16:01:17 -03:00
Colin Ian King	7cf4eda481	mlxsw: spectrum_router: remove redundant initialization of pointer br_dev The pointer br_dev is being initialized with a value that is never read and is being updated with a new value later on. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-27 11:23:27 -07:00
Ido Schimmel	10d3757fcb	mlxsw: spectrum_router: Allow programming link-local prefix routes The device has a trap for IPv6 packets that need be routed and have a unicast link-local destination IP (i.e., fe80::/10). This allows mlxsw to ignore link-local routes, as the packets will be trapped to the CPU in any case. However, since link-local routes are not programmed, it is possible for routed packets to hit the default route which might also be programmed to trap packets. This means that packets with a link-local destination IP might be trapped for the wrong reason. To overcome this, allow programming link-local prefix routes (usually one fe80::/64 per-table), so that the packets will be forwarded until reaching the link-local trap. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:58 -07:00
Ido Schimmel	9785b92b44	mlxsw: spectrum: Add packet traps for BFD packets Bidirectional Forwarding Detection (BFD) provides "low-overhead, short-duration detection of failures in the path between adjacent forwarding engines" (RFC 5880). This is accomplished by exchanging BFD packets between the two forwarding engines. Up until now these packets were trapped via the general local delivery (i.e., IP2ME) trap which also traps a lot of other packets that are not as time-sensitive as BFD packets. Expose dedicated traps for BFD packets so that user space could configure a dedicated policer for them. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:58 -07:00
Ido Schimmel	dacc4e3acf	mlxsw: spectrum: Treat IPv6 link-local SIP as an exception IPv6 packets that need to be forwarded and have a link-local source IP are dropped by the kernel and an ICMPv6 "Destination unreachable" is sent to the sending host. As such, change the trap group of such packets so that they do not interfere with IPv6 management packets. In the future this trap will be exposed as an exception via devlink-trap. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:58 -07:00
Ido Schimmel	1260e083d4	mlxsw: spectrum: Share one group for all locally delivered packets Routed IP packets with the Router Alert option need to be trapped to the CPU as they might need to be locally delivered to raw sockets with the IP_ROUTER_ALERT / IPV6_ROUTER_ALERT socket option. Move them to the same group with other packets that might need to be trapped following route lookup. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:58 -07:00
Ido Schimmel	500769bebe	mlxsw: reg: Move all trap groups under the same enum After the previous patch the split is no longer necessary and all the trap groups can be moved under the same enum. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:58 -07:00
Ido Schimmel	b87bde80da	mlxsw: spectrum_trap: Do not hard code "thin" policer identifier As explained in commit `e612523041` ("mlxsw: spectrum_trap: Introduce dummy group with thin policer"), the purpose of the "thin" policer is to pass as less packets as possible to the CPU. The identifier of this policer is currently set according to the maximum number of used trap groups, but this is fragile: On Spectrum-1 the maximum number of policers is less than the maximum number of trap groups, which might result in an invalid policer identifier in case the number of used trap groups grows beyond the policer limit. Solve this by dynamically allocating the policer identifier. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:58 -07:00
Ido Schimmel	03cb0ce0dd	mlxsw: switchx2: Move SwitchX-2 trap groups out of main enum The number of Spectrum trap groups is not infinite, but two identifiers are occupied by SwitchX-2 specific trap groups. Free these identifiers by moving them out of the main enum. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:58 -07:00
Ido Schimmel	025b7de7f4	mlxsw: spectrum: Reduce priority of locally delivered packets To align with recent recommended values. Will be configurable by future patches. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:58 -07:00
Ido Schimmel	1e3cd58942	mlxsw: spectrum: Use same trap group for local routes and link-local destination Packets with an IPv6 link-local destination (i.e., fe80::/10) should not be forwarded and are therefore trapped to the CPU for local delivery. Since these packets are trapped for the same logical reason as packets hitting local routes, associate both traps with the same group. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:57 -07:00
Ido Schimmel	d322309d72	mlxsw: spectrum: Use separate trap group for FID miss When a packet enters the device it is classified to a filtering identifier (FID) based on the ingress port and VLAN. The FID miss trap is used to trap packets for which a FID could not be found. In mlxsw this trap should only be triggered when a port is enslaved to an OVS bridge and a matching ACL rule could not be found, so as to trigger learning. These packets are therefore completely unrelated to packets hitting local routes and should be in a different group. Move them. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:57 -07:00
Ido Schimmel	954eef2677	mlxsw: spectrum: Use same trap group for various IPv6 packets Group these various IPv6 packets (e.g., router solicitations, router advertisement) together and subject them to the same policer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:57 -07:00
Ido Schimmel	412df3d1bb	mlxsw: spectrum: Rename IPv6 ND trap group The IPv6 Neighbour Discovery (ND) group will be used for various IPv6 packets, not all of which fall under the definition of ND, so rename it to "IPV6" which is more appropriate. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:57 -07:00
Ido Schimmel	761bc42fbe	mlxsw: spectrum: Use same switch case for identical groups Trap groups that use the same policer settings can share the same switch case. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:57 -07:00
Ido Schimmel	3c2d8a046a	mlxsw: spectrum: Use dedicated trap group for ACL trap Packets that are trapped via tc's trap action are currently subject to the same policer as packets hitting local routes. The latter are critical to the correct functioning of the control plane, while the former are mainly used for traffic inspection. Split the ACL trap to a separate group with its own policer. Use a higher priority for these traps than for traps using mirror action (e.g., ARP, IGMP). Otherwise, packets matching both traps will not be forwarded in hardware (because of trap action) and also not forwarded in software because they will be marked with 'offload_fwd_mark'. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:57 -07:00
Guillaume Nault	58cff782cc	flow_dissector: Parse multiple MPLS Label Stack Entries The current MPLS dissector only parses the first MPLS Label Stack Entry (second LSE can be parsed too, but only to set a key_id). This patch adds the possibility to parse several LSEs by making __skb_flow_dissect_mpls() return FLOW_DISSECT_RET_PROTO_AGAIN as long as the Bottom Of Stack bit hasn't been seen, up to a maximum of FLOW_DIS_MPLS_MAX entries. FLOW_DIS_MPLS_MAX is arbitrarily set to 7. This should be enough for many practical purposes, without wasting too much space. To record the parsed values, flow_dissector_key_mpls is modified to store an array of stack entries, instead of just the values of the first one. A bit field, "used_lses", is also added to keep track of the LSEs that have been set. The objective is to avoid defining a new FLOW_DISSECTOR_KEY_MPLS_XX for each level of the MPLS stack. TC flower is adapted for the new struct flow_dissector_key_mpls layout. Matching on several MPLS Label Stack Entries will be added in the next patch. The NFP and MLX5 drivers are also adapted: nfp_flower_compile_mac() and mlx5's parse_tunnel() now verify that the rule only uses the first LSE and fail if it doesn't. Finally, the behaviour of the FLOW_DISSECTOR_KEY_MPLS_ENTROPY key is slightly modified. Instead of recording the first Entropy Label, it now records the last one. This shouldn't have any consequences since there doesn't seem to have any user of FLOW_DISSECTOR_KEY_MPLS_ENTROPY in the tree. We'd probably better do a hash of all parsed MPLS labels instead (excluding reserved labels) anyway. That'd give better entropy and would probably also simplify the code. But that's not the purpose of this patch, so I'm keeping that as a future possible improvement. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 15:22:58 -07:00
Ido Schimmel	154388e112	mlxsw: spectrum: Fix spelling mistake in trap's name Fix incorrect spelling of "advertisement". Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
Ido Schimmel	ce3c3bf0bf	mlxsw: spectrum: Use dedicated trap group for sampled packets The rate with which packets are sampled is determined by user space, so there is no need to associate such packets with a policer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
Ido Schimmel	b33f5d9fb7	mlxsw: spectrum: Use same trap group for IPv6 ND and ARP packets Both packet types are needed for the same reason (neighbour discovery), so associate them with the same trap group. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
Ido Schimmel	32446438cc	mlxsw: spectrum: Rename ARP trap group The ARP trap group will be used for IPv6 ND traps in the next patch, so rename it to "NEIGH_DISCOVERY" which is more appropriate. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
Ido Schimmel	d88f8cc158	mlxsw: spectrum_trap: Remove unnecessary field Now that traffic class (TC) and priority are set to the same value, there is no need to store both. Remove the first. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
Ido Schimmel	5047d819f5	mlxsw: spectrum: Align TC and trap priority The traffic class (TC) attribute of packet traps determines through which TC a packet trap will be scheduled through the CPU port. The priority attribute determines which trap will be triggered in case several packet traps match a packet. We try to configure these attributes to the same value for all packet traps as there is little reason not to. Some packet traps did not use the same value, so rectify that now. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
Ido Schimmel	e0d848477a	mlxsw: spectrum_buffers: Assign non-zero quotas to TC 0 of the CPU port As explained in commit `9ffcc3725f` ("mlxsw: spectrum: Allow packets to be trapped from any PG"), incoming packets can be admitted to the shared buffer and forwarded / trapped, if: (Ingress{Port}.Usage < Thres && Ingress{Port,PG}.Usage < Thres && Egress{Port}.Usage < Thres && Egress{Port,TC}.Usage < Thres) \|\| (Ingress{Port}.Usage < Min \|\| Ingress{Port,PG} < Min \|\| Egress{Port}.Usage < Min \|\| Egress{Port,TC}.Usage < Min) Trapped packets are scheduled to transmission through the CPU port. Currently, the minimum and maximum quotas of traffic class (TC) 0 of the CPU port are 0, which means it is not usable. Assign non-zero quotas to TC 0 of the CPU port, so that it could be utilized by subsequent patches. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
Ido Schimmel	938e6d0b76	mlxsw: spectrum: Change default rate and priority of DHCP packets Reduce the default acceptable rate of DHCP packets to 128 packets per second and reduce their priority. This is reasonable given the Spectrum ASICs are limited to 128 ports at the moment. These are only the default values. Users will be able to modify them via devlink-trap. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
Ido Schimmel	0ecb947412	mlxsw: spectrum: Trap IPv4 DHCP packets in router Currently, IPv4 DHCP packets are trapped during L2 forwarding, which means that packets might be trapped unnecessarily. Instead, only trap the DHCP packets that reach the router. Either because they were flooded to the router port or forwarded to it by the FDB. This is consistent with the corresponding IPv6 trap. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
Ido Schimmel	99129069b7	mlxsw: spectrum: Use same trap group for MLD and IGMP packets Both packet types are needed for the same reason (multicast snooping), so associate them with the same trap group. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
Ido Schimmel	debb7af686	mlxsw: spectrum: Rename IGMP trap group The IGMP trap group will be used for MLD traps in the next patch, so rename it to "MC_SNOOPING" which is more appropriate. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
David S. Miller	13209a8f73	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net The MSCC bug fix in 'net' had to be slightly adjusted because the register accesses are done slightly differently in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 13:47:27 -07:00
David S. Miller	e3181e9a72	mlx5-fixes-2020-05-22 -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl7IbksACgkQSD+KveBX +j5T8Af/XT6b23VlSn2Km4tg8WQNDRJLdq1s6fTS5SGcyc0awxfH07cvYvJ26kKW kmdDNijkVbd0ma2UxHiiD3vmE8Vs85gZ6BDNyl485x/cH3zFzAm54R5fZdnK5JgN YNgdFP0MOwPtAdDtxLH+r8aOyNKncIOmCZrMNnxVgI+IytG1L5QLnS6GeQy2zyIx 9F/9sihta2z567IstGu2wvmgviSHVk/zV9yqn/orD9tV6oFvvrBQMlEt8l27b1tA 4bajbHIyc1WmfQ+wg56eXATdbqCQ2YYfMjhchiCfFv5DhnMnPi5bV0PNR9Rq0CYw 05xpF16/85uvDbTizsgGNZ1Pb1nGsQ== =oFWF -----END PGP SIGNATURE----- Merge tag 'mlx5-fixes-2020-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2020-05-22 This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. For -stable v4.13 ('net/mlx5: Add command entry handling completion') For -stable v5.2 ('net/mlx5: Fix error flow in case of function_setup failure') ('net/mlx5: Fix memory leak in mlx5_events_init') For -stable v5.3 ('net/mlx5e: Update netdev txq on completions during closure') ('net/mlx5e: kTLS, Destroy key object after destroying the TIS') ('net/mlx5e: Fix inner tirs handling') For -stable v5.6 ('net/mlx5: Fix cleaning unmanaged flow tables') ('net/mlx5: Fix a race when moving command interface to events mode') ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-23 16:39:45 -07:00
David S. Miller	46c54f9500	mlx5-updates-2020-05-22 This series includes two updates and one cleanup patch 1) Tang Bim, clean-up with IS_ERR() usage 2) Vlad introduces a new mlx5 kconfig flag for TC support This is required due to the high volume of current and upcoming development in the eswitch and representors areas where some of the feature are TC based such as the downstream patches of MPLSoUDP and the following representor bonding support for VF live migration and uplink representor dynamic loading. For this Vlad kept TC specific code in tc.c and rep/tc.c and organized non TC code in representors specific files. 3) Eli Cohen adds support for MPLS over UPD encap and decap TC offloads. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl7IZFEACgkQSD+KveBX +j7IEQf/RFv633bWTlL63fEJjViRv1rjfkbyaXrGVL3gzr/Er01DeAPR22CNOlC3 bu1jHLKqVn0Mg0g5g2B4/H/7JoFbMBRTy4MXpM5VrQCIqwMuXG4zhWuoUj7ncQ5w kXHAU6DUuZRn8/x1JLQOHDRTzKhav7ldT+nvvoKEMrad/DEMGz+bq67xh4l8nfi+ ktSFAO0UFi9ysb25CMfdqIqAL0J5nAJ7DNhw5x7IvtwUxNxate7HtBaBhBgZ9NWv jYf8R3p+7JdgvVW18pZhmjbaBqaApXcZrC7rI07PR6rCOAHfToX6miR8gUtpIEno itQkzYt9UF2dgNwMmxoJLqnUNiy/Cg== =wkSR -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2020-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2020-05-22 This series includes two updates and one cleanup patch 1) Tang Bim, clean-up with IS_ERR() usage 2) Vlad introduces a new mlx5 kconfig flag for TC support This is required due to the high volume of current and upcoming development in the eswitch and representors areas where some of the feature are TC based such as the downstream patches of MPLSoUDP and the following representor bonding support for VF live migration and uplink representor dynamic loading. For this Vlad kept TC specific code in tc.c and rep/tc.c and organized non TC code in representors specific files. 3) Eli Cohen adds support for MPLS over UPD encap and decap TC offloads. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-23 16:37:00 -07:00
Qiushi Wu	febfd9d3c7	net/mlx4_core: fix a memory leak bug. In function mlx4_opreq_action(), pointer "mailbox" is not released, when mlx4_cmd_box() return and error, causing a memory leak bug. Fix this issue by going to "out" label, mlx4_free_cmd_mailbox() can free this pointer. Fixes: `fe6f700d6c` ("net/mlx4_core: Respond to operation request by firmware") Signed-off-by: Qiushi Wu <wu000273@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-23 16:34:37 -07:00
David S. Miller	a152b85984	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2020-05-23 The following pull-request contains BPF updates for your net-next tree. We've added 50 non-merge commits during the last 8 day(s) which contain a total of 109 files changed, 2776 insertions(+), 2887 deletions(-). The main changes are: 1) Add a new AF_XDP buffer allocation API to the core in order to help lowering the bar for drivers adopting AF_XDP support. i40e, ice, ixgbe as well as mlx5 have been moved over to the new API and also gained a small improvement in performance, from Björn Töpel and Magnus Karlsson. 2) Add getpeername()/getsockname() attach types for BPF sock_addr programs in order to allow for e.g. reverse translation of load-balancer backend to service address/port tuple from a connected peer, from Daniel Borkmann. 3) Improve the BPF verifier is_branch_taken() logic to evaluate pointers being non-NULL, e.g. if after an initial test another non-NULL test on that pointer follows in a given path, then it can be pruned right away, from John Fastabend. 4) Larger rework of BPF sockmap selftests to make output easier to understand and to reduce overall runtime as well as adding new BPF kTLS selftests that run in combination with sockmap, also from John Fastabend. 5) Batch of misc updates to BPF selftests including fixing up test_align to match verifier output again and moving it under test_progs, allowing bpf_iter selftest to compile on machines with older vmlinux.h, and updating config options for lirc and v6 segment routing helpers, from Stanislav Fomichev, Andrii Nakryiko and Alan Maguire. 6) Conversion of BPF tracing samples outdated internal BPF loader to use libbpf API instead, from Daniel T. Lee. 7) Follow-up to BPF kernel test infrastructure in order to fix a flake in the XDP selftests, from Jesper Dangaard Brouer. 8) Minor improvements to libbpf's internal hashmap implementation, from Ian Rogers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 18:30:34 -07:00
Shay Drory	4f7400d5cb	net/mlx5: Fix error flow in case of function_setup failure Currently, if an error occurred during mlx5_function_setup(), we keep dev->state as DEVICE_STATE_UP. Fixing it by adding a goto label. Fixes: `e161105e58` ("net/mlx5: Function setup/teardown procedures") Signed-off-by: Shay Drory <shayd@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 17:28:58 -07:00
Roi Dayan	d37bd5e81e	net/mlx5e: CT: Correctly get flow rule The correct way is to us the flow_cls_offload_flow_rule() wrapper instead of f->rule directly. Fixes: `4c3844d9e9` ("net/mlx5e: CT: Introduce connection tracking") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 17:28:56 -07:00
Moshe Shemesh	5e911e2c06	net/mlx5e: Update netdev txq on completions during closure On sq closure when we free its descriptors, we should also update netdev txq on completions which would not arrive. Otherwise if we reopen sqs and attach them back, for example on fw fatal recovery flow, we may get tx timeout. Fixes: `29429f3300` ("net/mlx5e: Timeout if SQ doesn't flush during close") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 17:28:54 -07:00
Roi Dayan	9ca415399d	net/mlx5: Annotate mutex destroy for root ns Invoke mutex_destroy() to catch any errors. Fixes: `2cc43b494a` ("net/mlx5_core: Managing root flow table") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 17:28:52 -07:00
Roi Dayan	6eb7a268a9	net/mlx5: Don't maintain a case of del_sw_func being null Add del_sw_func cb for root ns. Now there is no need to maintain a case of del_sw_func being null when freeing the node. Fixes: `2cc43b494a` ("net/mlx5_core: Managing root flow table") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 17:28:50 -07:00
Roi Dayan	aee37f3d94	net/mlx5: Fix cleaning unmanaged flow tables Unmanaged flow tables doesn't have a parent and tree_put_node() assume there is always a parent if cleaning is needed. fix that. Fixes: `5281a0c909` ("net/mlx5: fs_core: Introduce unmanaged flow tables") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 17:28:48 -07:00
Moshe Shemesh	df14ad1ecc	net/mlx5: Fix memory leak in mlx5_events_init Fix memory leak in mlx5_events_init(), in case create_single_thread_workqueue() fails, events struct should be freed. Fixes: `5d3c537f90` ("net/mlx5: Handle event of power detection in the PCIE slot") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 17:28:46 -07:00
Roi Dayan	a16b8e0dcf	net/mlx5e: Fix inner tirs handling In the cited commit inner_tirs argument was added to create and destroy inner tirs, and no indication was added to mlx5e_modify_tirs_hash() function. In order to have a consistent handling, use inner_indir_tir[0].tirn in tirs destroy/modify function as an indication to whether inner tirs are created. Inner tirs are not created for representors and before this commit, a call to mlx5e_modify_tirs_hash() was sending HW commands to modify non-existent inner tirs. Fixes: `46dc933cee` ("net/mlx5e: Provide explicit directive if to create inner indirect tirs") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 17:28:44 -07:00
Tariq Toukan	16736e11f4	net/mlx5e: kTLS, Destroy key object after destroying the TIS The TLS TIS object contains the dek/key ID. By destroying the key first, the TIS would contain an invalid non-existing key ID. Reverse the destroy order, this also acheives the desired assymetry between the destroy and the create flows. Fixes: `d2ead1f360` ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 17:28:42 -07:00
Maor Dickman	321348475d	net/mlx5e: Fix allowed tc redirect merged eswitch offload cases After changing the parent_id to be the same for both NICs of same The cited commit wrongly allow offload of tc redirect flows from VF to uplink and vice versa when devcies are on different eswitch, these cases aren't supported by HW. Disallow the above offloads when devcies are on different eswitch and VF LAG is not configured. Fixes: `f6dc1264f1` ("net/mlx5e: Disallow tc redirect offload cases we don't support") Signed-off-by: Maor Dickman <maord@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 17:28:40 -07:00
Eran Ben Elisha	f7936ddd35	net/mlx5: Avoid processing commands before cmdif is ready When driver is reloading during recovery flow, it can't get new commands till command interface is up again. Otherwise we may get to null pointer trying to access non initialized command structures. Add cmdif state to avoid processing commands while cmdif is not ready. Fixes: `e126ba97db` ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 17:28:38 -07:00
Eran Ben Elisha	d43b7007db	net/mlx5: Fix a race when moving command interface to events mode After driver creates (via FW command) an EQ for commands, the driver will be informed on new commands completion by EQE. However, due to a race in driver's internal command mode metadata update, some new commands will still be miss-handled by driver as if we are in polling mode. Such commands can get two non forced completion, leading to already freed command entry access. CREATE_EQ command, that maps EQ to the command queue must be posted to the command queue while it is empty and no other command should be posted. Add SW mechanism that once the CREATE_EQ command is about to be executed, all other commands will return error without being sent to the FW. Allow sending other commands only after successfully changing the driver's internal command mode metadata. We can safely return error to all other commands while creating the command EQ, as all other commands might be sent from the user/application during driver load. Application can rerun them later after driver's load was finished. Fixes: `e126ba97db` ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 17:28:36 -07:00
Moshe Shemesh	17d00e839d	net/mlx5: Add command entry handling completion When FW response to commands is very slow and all command entries in use are waiting for completion we can have a race where commands can get timeout before they get out of the queue and handled. Timeout completion on uninitialized command will cause releasing command's buffers before accessing it for initialization and then we will get NULL pointer exception while trying access it. It may also cause releasing buffers of another command since we may have timeout completion before even allocating entry index for this command. Add entry handling completion to avoid this race. Fixes: `e126ba97db` ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 17:28:34 -07:00
Eli Cohen	582234b465	net/mlx5e: Support pedit on mpls over UDP decap Allow to modify ethernet headers while decapsulating mpls over UDP packets. This is implemented using the same reformat object used for decapsulation. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 16:46:23 -07:00
Eli Cohen	14e6b038af	net/mlx5e: Add support for hw decapsulation of MPLS over UDP MPLS over UDP is supported in hardware by using a packet reformat object with reformat type equal L3_TUNNEL_TO_L2 which both decapsulates the outer L3, L4 and MPLS headers, and allows for setting the L2 headers of the resulting decapsulated packet. For the hardware to operate correctly, the configuration of the firmware must have FLEX_PARSER_PROFILE_ENABLE = 1. Example tc rule: tc filter add dev bareudp0 protocol all prio 1 root flower enc_dst_port \ 6635 enc_src_ip 8.8.8.23 action mpls pop protocol ip pipe \ action pedit ex munge eth dst set 00:11:22:33:44:21 pipe action \ mirred egress redirect dev enp59s0f0_0 We use pedit to set the correct destination MAC. For MPLS over UDP decapsulation to take place, the driver logic requires the following: 1. flower filter added on bareudp device. 2. action mpls pop 3. zero or more pedit munge actions 4. one redirect action Current implementation supports only IPv4 and no VLAN. tc filter show output looks like this: filter protocol all pref 1 flower chain 0 filter protocol all pref 1 flower chain 0 handle 0x1 enc_src_ip 8.8.8.24 enc_dst_port 6635 in_hw in_hw_count 1 action order 1: mpls pop protocol ip pipe index 2 ref 1 bind 1 action order 2: pedit action pipe keys 2 index 1 ref 1 bind 1 key #0 at eth+0: val 00112233 mask 00000000 key #1 at eth+4: val 44210000 mask 0000ffff action order 3: mirred (Egress Redirect to device enp59s0f0_0) stolen index 2 ref 1 bind 1 Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 16:46:21 -07:00
Eli Cohen	72046a91d1	net/mlx5e: Allow to match on mpls parameters Support matching on MPLS over UDP parameters using misc2 section of match parameters. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 16:46:19 -07:00
Eli Cohen	f828ca6a2f	net/mlx5e: Add support for hw encapsulation of MPLS over UDP MPLS over UDP is supported by adding a rule on a representor net device which does tunnel_key set, push mpls and forward to a baredup device. At the hardware level we use a packet_reformat_context object to do the encapsulation of the packet. The resulting packet looks as follows (left side transmitted first): outer L2 \| outer IP \| UDP \| MPLS \| inner L3 and data \| Example usage: tc filter add dev $rep0 protocol ip prio 1 root flower skip_sw \ action tunnel_key set src_ip 8.8.8.21 dst_ip 8.8.8.24 id 555 \ dst_port 6635 tos 4 ttl 6 csum action mpls push protocol 0x8847 \ label 555 tc 3 action mirred egress redirect dev bareudp0 This is how the filter is shown with tc filter show: tc filter show dev enp59s0f0_0 ingress filter protocol ip pref 1 flower chain 0 filter protocol ip pref 1 flower chain 0 handle 0x1 eth_type ipv4 skip_sw in_hw in_hw_count 1 action order 1: tunnel_key set src_ip 8.8.8.21 dst_ip 8.8.8.24 key_id 555 dst_port 6635 csum tos 0x4 ttl 6 pipe index 1 ref 1 bind 1 action order 2: mpls push protocol mpls_uc label 555 tc 3 ttl 255 pipe index 1 ref 1 bind 1 action order 3: mirred (Egress Redirect to device bareudp0) stolen index 1 ref 1 bind 1 Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 16:46:18 -07:00
Vlad Buslov	d956873f90	net/mlx5e: Introduce kconfig var for TC support In order to improve code maintainability and readability, introduce new CONFIG_MLX5_CLS_ACT kconfig variable to control compilation of TC hardware offloads implementation. This allows distinguishing between features that require TC support (MPLSoUDP, etc.) and features that just rely on representor functionality (rep_bond for live migration, etc.). Modify rep_tc.h, rep_neigh.h, en_tc.h and chains.h files to provide stubs for functions that are called from generic code. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 16:46:14 -07:00
Vlad Buslov	e2394a61d2	net/mlx5e: Move TC-specific code from en_main.c to en_tc.c As a preparation for introducing new kconfig option that controls compilation of all TC offloads code in mlx5, extract TC-specific code from en_main.c to en_tc.c. This allows easily compiling out the code by only including new source in make file when corresponding kconfig is enabled instead of adding multiple ifdef blocks to en_main. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 16:46:12 -07:00
Vlad Buslov	549c243e4e	net/mlx5e: Extract neigh-specific code from en_rep.c to rep/neigh.c As a preparation for introducing new kconfig option that controls compilation of all TC offloads code in mlx5, extract neigh-specific code from en_rep.c to standalone file. This allows easily compiling out the code by only including new source in make file when corresponding kconfig is enabled instead of adding multiple ifdef blocks to en_rep. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 16:46:10 -07:00
Vlad Buslov	768c3667e6	net/mlx5e: Extract TC-specific code from en_rep.c to rep/tc.c As a preparation for introducing new kconfig option that controls compilation of all TC offloads code in mlx5, extract TC-specific code from en_rep.c to standalone file. This allows easily compiling out the code by only including new source in make file when corresponding kconfig is enabled instead of adding multiple ifdef blocks to en_rep. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 16:46:08 -07:00
Tang Bin	2639324a8f	net/mlx5e: Use IS_ERR() to check and simplify code Use IS_ERR() and PTR_ERR() instead of PTR_ERR_OR_ZERO() to simplify code, avoid redundant judgements. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 16:46:07 -07:00
Jiri Pirko	4340f42f20	mlxsw: spectrum: Fix use-after-free of split/unsplit/type_set in case reload fails In case of reload fail, the mlxsw_sp->ports contains a pointer to a freed memory (either by reload_down() or reload_up() error path). Fix this by initializing the pointer to NULL and checking it before dereferencing in split/unsplit/type_set callpaths. Fixes: `24cc68ad6c` ("mlxsw: core: Add support for reload") Reported-by: Danielle Ratson <danieller@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 16:08:14 -07:00
Edward Cree	060b6381ef	net: flow_offload: simplify hw stats check handling Make FLOW_ACTION_HW_STATS_DONT_CARE be all bits, rather than none, so that drivers and __flow_action_hw_stats_check can use simple bitwise checks. Pre-fill all actions with DONT_CARE in flow_rule_alloc(), rather than relying on implicit semantics of zero from kzalloc, so that callers which don't configure action stats themselves (i.e. netfilter) get the correct behaviour by default. Only the kernel's internal API semantics change; the TC uAPI is unaffected. v4: move DONT_CARE setting to flow_rule_alloc() for robustness and simplicity. v3: set DONT_CARE in nft and ct offload. v2: rebased on net-next, removed RFC tags. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 15:52:08 -07:00
Björn Töpel	39d6443c8d	mlx5, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL Use the new MEM_TYPE_XSK_BUFF_POOL API in lieu of MEM_TYPE_ZERO_COPY in mlx5e. It allows to drop a lot of code from the driver (which is now common in AF_XDP core and was related to XSK RX frame allocation, DMA mapping, etc.) and slightly improve performance (RX +0.8 Mpps, TX +0.4 Mpps). rfc->v1: Put back the sanity check for XSK params, use XSK API to get the total headroom size. (Maxim) v1->v2: Fix DMA address handling, set XDP metadata to invalid. (Maxim) v2->v3: Handle frame_sz, use xsk_buff_xdp_get_frame_dma, use xsk_buff API for DMA sync on TX, add performance numbers. (Maxim) v3->v4: Remove unused variable num_xsk_frames. (Jakub) Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-12-bjorn.topel@gmail.com	2020-05-21 17:31:27 -07:00
Magnus Karlsson	a71506a4fd	xsk: Move driver interface to xdp_sock_drv.h Move the AF_XDP zero-copy driver interface to its own include file called xdp_sock_drv.h. This, hopefully, will make it more clear for NIC driver implementors to know what functions to use for zero-copy support. v4->v5: Fix -Wmissing-prototypes by include header file. (Jakub) Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-4-bjorn.topel@gmail.com	2020-05-21 17:31:26 -07:00
Jason Gunthorpe	eafd47fc20	Linux 5.7-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl7BzV8eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGg8EH/A2pXMTxtc96RI4S sttEsUQqbakFS0Z/2tQPpMGr/qW2e5eHgsTX/a3SiUeZiIXk6f4lMFkMuctzBf7p X77cNEDwGOEdbtCXTsMcmKSde7sP2zCXsPB8xTWLyE6rnaFRgikwwkeqgkIKhp1h bvOQV0t9HNGvxGAM0iZeOvQAvFl4vd7nS123/MYbir9cugfQUSJRueQ4BiCiJqVE 6cNA7/vFzDJuFGszzIrJ7HXn/IdQMMWHkvTDjgBw0GZw1mDbGFbfbZwOeTz1ojCt smUQ4tIFxBa/VA5zx7dOy2P2keHbSVf4VLkZRPcceT7OqVS65ETmFDp+qt5NdWM5 vZ8+7/0= =CyYH -----END PGP SIGNATURE----- Merge tag 'v5.7-rc6' into rdma.git for-next Linux 5.7-rc6 Conflict in drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c resolved by deleting dr_cq_event, matching how netdev resolved it. Required for dependencies in the following patches. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-05-21 17:08:27 -03:00
Michael Guralnik	ecf814e0e1	net/mlx5: Add support for RDMA TX FT headers modifying Support adding header modifying actions to the RDMA TX flow table. Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-18 09:21:46 -07:00
Parav Pandit	555af0c3fa	net/mlx5: Move iseg access helper routines close to mlx5_core driver Only mlx5_core driver handles fw initialization check and command interface revision check. Hence move them inside the mlx5_core driver where it is used. This avoid exposing these helpers to all mlx5 drivers. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-18 09:21:46 -07:00
Raed Salem	356d411c26	net/mlx5: Cleanup mlx5_ifc_fte_match_set_misc2_bits Remove the "metadata_reg_b" field and all uses of this field in code to match the device specification. As this field is not in use in SW steering it is safe to remove it. Signed-off-by: Raed Salem <raeds@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-18 09:21:46 -07:00
Ido Schimmel	200b7cca0b	mlxsw: spectrum_trap: Store all trap data in one array Each trap registered with devlink is mapped to one or more Rx listeners. These listeners allow the switch driver (e.g., mlxsw_spectrum) to register a function that is called when a packet is received (trapped) for a specific reason. Currently, three arrays are used to describe the mapping between the logical devlink traps and the Rx listeners. Instead, get rid of these arrays and store all the information in one array that is easier to validate and extend with more per-trap information. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-16 16:42:31 -07:00
Ido Schimmel	b14a40dbde	mlxsw: spectrum_trap: Store all trap group data in one array Use one array to store all the information about all the trap groups instead of hard coding it in code. This will be used in future patches to disable certain functionality (e.g., policer binding) on a trap group basis. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-16 16:42:31 -07:00
Ido Schimmel	cc678f4dbc	mlxsw: spectrum_trap: Store all trap policer data in one array Instead of maintaining an array of policers and a linked list, only maintain an array. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-16 16:42:31 -07:00
Ido Schimmel	85d4ec5925	mlxsw: spectrum_trap: Move struct definition out of header file 'struct mlxsw_sp_trap_policer_item' is only used in one file, so move it there. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-16 16:42:31 -07:00
Tariq Toukan	3f3ab178c7	net/mlx5e: Take DCBNL-related definitions into dedicated files Take DCBNL-related definitions out of the common en.h header, Use a dedicated header file for exposing them. Some need not to be exposed, use them locally in the .c file. Use stubs to eliminate use of CONFIG_MLX5_CORE_EN_DCB in the generic control flows. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-15 15:44:36 -07:00
Maxim Mikityanskiy	5ffb4d858b	net/mlx5e: Calculate SQ stop room in a robust way Currently, different formulas are used to estimate the space that may be taken by WQEs in the SQ during a single packet transmit. This space is called stop room, and it's checked in the end of packet transmit to find out if the next packet could overflow the SQ. If it could, the driver tells the kernel to stop sending next packets. Many factors affect the stop room: 1. Padding with NOPs to avoid WQEs spanning over page boundaries. 2. Enabled and disabled offloads (TLS, upcoming MPWQE). 3. The maximum size of a WQE. The padding is performed before every WQE if it doesn't fit the current page. The current formula assumes that only one padding will be required per packet, and it doesn't take into account that the WQEs posted during the transmission of a single packet might exceed the page size in very rare circumstances. For example, to hit this condition with 4096-byte pages, TLS offload will have to interrupt an almost-full MPWQE session, be in the resync flow and try to transmit a near to maximum amount of data. To avoid SQ overflows in such rare cases after MPWQE is added, this patch introduces a more robust formula to estimate the stop room. The new formula uses the fact that a WQE of size X will not require more than X-1 WQEBBs of padding. More exact estimations are possible, but they result in much more complex and error-prone code for little gain. Before this patch, the TLS stop room included space for both INNOVA and ConnectX TLS offloads that couldn't run at the same time anyway, so this patch accounts only for the active one. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-15 15:44:34 -07:00
Erez Shitrit	8b46d424a7	net/mlx5e: IPoIB, Drop multicast packets that this interface sent After enabled loopback packets for IPoIB, we need to drop these packets that this HCA has replicated and came back to the same interface that sent them. Fixes: `4c6c615e3f` ("net/mlx5e: IPoIB, Add PKEY child interface nic profile") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-15 15:44:32 -07:00
Erez Shitrit	80639b199c	net/mlx5e: IPoIB, Enable loopback packets for IPoIB interfaces Enable loopback of unicast and multicast traffic for IPoIB enhanced mode. This will allow interfaces with the same pkey to communicate between them e.g cloned interfaces that located in different namespaces. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-15 15:44:30 -07:00
Roi Dayan	9102d836d2	net/mlx5e: CT: Fix offload with CT action after CT NAT action It could be a chain of rules will do action CT again after CT NAT Before this fix matching will break as we get into the CT table after NAT changes and not CT NAT. Fix this by adding pre ct and pre ct nat tables to skip ct/ct_nat tables and go straight to post_ct table if ct/nat was already done. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-15 15:44:27 -07:00
Eran Ben Elisha	90bf1c8dbd	net/mlx5: Move internal timer read function to clock library Move mlx5_read_internal_timer() into lib/clock.c file as it is being used there. As such, make this function a static one. In addition, rearrange headers include to support function move. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-15 15:44:25 -07:00
Paul Blakey	49c0355d30	net/mlx5: Wait for inactive autogroups Currently, if one thread tries to add an entry to an autogrouped table with no free matching group, while another thread is in the process of creating a new matching autogroup, it doesn't wait for the new group creation, and creates an unnecessary new autogroup. Instead of skipping inactive, wait on the write lock of those groups. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-15 15:44:22 -07:00
Parav Pandit	41798df9bf	net/mlx5: Drain wq first during PCI device removal mlx5_unload_one() is done with cleanup = true only once. So instead of doing health wq drain inside the if(), directly do during PCI device removal. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-15 15:44:20 -07:00
Parav Pandit	4162f58b47	net/mlx5: Have single error unwinding path Having multiple error unwinding path are error prone. Lets have just one error unwinding path. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-15 15:44:17 -07:00
Eran Ben Elisha	e7f860e210	net/mlx5: Fix a bug of releasing wrong chunks on > 4K page size systems On systems with page size larger than 4K, a fwp object has few 4K chunks. Fix a bug in fwp free flow where the chunk address was dropped and fwp->addr was used instead (first chunk address). This caused a wrong update of fwp->bitmask which later can cause errors in re-alloc fwp chunk flow. In order to fix this it, re-factor the release flow: - Free 4k: Releases a specific 4k chunk inside the fwp, defined by starting address. - Free fwp: Unconditionally release the whole fwp and its resources. Free addr will call free fwp if all chunks were released, in order to do code sharing. In addition, fix npages to count for all released chunks correctly. Fixes: `c6168161f6` ("net/mlx5: Add support for release all pages event") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-15 15:44:15 -07:00
Eran Ben Elisha	2726cd4a29	net/mlx5: Dedicate fw page to the requesting function The cited patch assumes that all chuncks in a fw page belong to the same function, thus the driver must dedicate fw page to the requesting function, which is actually what was intedned in the original fw pages allocator design, hence the fwp->func_id ! Up until the cited patch everything worked ok, but now "relase all pages" is broken on systems with page_size > 4k. Fix this by dedicating fw page to the requesting function id via adding a func_id parameter to alloc_4k() function. Fixes: `c6168161f6` ("net/mlx5: Add support for release all pages event") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-15 15:44:12 -07:00
Jesper Dangaard Brouer	d628ee4fef	mlx5: Rx queue setup time determine frame_sz for XDP The mlx5 driver have multiple memory models, which are also changed according to whether a XDP bpf_prog is attached. The 'rx_striding_rq' setting is adjusted via ethtool priv-flags e.g.: # ethtool --set-priv-flags mlx5p2 rx_striding_rq off On the general case with 4K page_size and regular MTU packet, then the frame_sz is 2048 and 4096 when XDP is enabled, in both modes. The info on the given frame size is stored differently depending on the RQ-mode and encoded in a union in struct mlx5e_rq union wqe/mpwqe. In rx striding mode rq->mpwqe.log_stride_sz is either 11 or 12, which corresponds to 2048 or 4096 (MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ). In non-striding mode (MLX5_WQ_TYPE_CYCLIC) the frag_stride is stored in rq->wqe.info.arr[0].frag_stride, for the first fragment, which is what the XDP case cares about. To reduce effect on fast-path, this patch determine the frame_sz at setup time, to avoid determining the memory model runtime. Variable is named frame0_sz to make it clear that this is only the frame size of the first fragment. This mlx5 driver does a DMA-sync on XDP_TX action, but grow is safe as it have done a DMA-map on the entire PAGE_SIZE. The driver also already does a XDP length check against sq->hw_mtu on the possible XDP xmit paths mlx5e_xmit_xdp_frame() + mlx5e_xmit_xdp_frame_mpwqe(). V3+4: Change variable name first_frame_sz to frame0_sz V2: Fix that frag_size need to be recalc before creating SKB. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Link: https://lore.kernel.org/bpf/158945348021.97035.12295039384250022883.stgit@firesoul	2020-05-14 21:21:56 -07:00
Jesper Dangaard Brouer	d201ea9ebc	mlx4: Add XDP frame size and adjust max XDP MTU The mlx4 drivers size of memory backing the RX packet is stored in frag_stride. For XDP mode this will be PAGE_SIZE (normally 4096). For normal mode frag_stride is 2048. Also adjust MLX4_EN_MAX_XDP_MTU to take tailroom into account. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Link: https://lore.kernel.org/bpf/158945341893.97035.2688142527052329942.stgit@firesoul	2020-05-14 21:21:55 -07:00
Maor Gottlieb	9254f8ed15	net/mlx5: Add support in forward to namespace Currently, fs_core supports rule of forward the traffic to continue matching in the next priority, now we add support to forward the traffic matching in the next namespace. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2020-05-13 18:56:31 +03:00
Maor Gottlieb	14c129e301	{IB/net}/mlx5: Simplify don't trap code The fs_core already supports creation of rules with multiple actions/destinations. Refactor fs_core to handle the case when don't trap rule is created with destination. Adapt the calling code in the driver. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2020-05-13 18:56:18 +03:00
Jiri Pirko	67ed68fc0c	mlxsw: spectrum_flower: Forbid to insert flower rules in collision with matchall rules On ingress, the matchall rules doing mirroring and sampling are offloaded into hardware blocks that are processed before any flower rules. On egress, the matchall mirroring rules are offloaded into hardware block that is processed after all flower rules. Therefore check the priorities of inserted flower rules against existing matchall rules and ensure the correct ordering. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-05-09 16:02:43 -07:00
Jiri Pirko	18346b70ab	mlxsw: spectrum_matchall: Forbid to insert matchall rules in collision with flower rules On ingress, the matchall rules doing mirroring and sampling are offloaded into hardware blocks that are processed before any flower rules. On egress, the matchall mirroring rules are offloaded into hardware block that is processed after all flower rules. Therefore check the priorities of inserted matchall rules against existing flower rules and ensure the correct ordering. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-05-09 16:02:43 -07:00
Jiri Pirko	aed65285fb	mlxsw: spectrum_matchall: Expose a function to get min and max rule priority Introduce an infrastructure that allows to get minimum and maximum rule priority for specified chain. This is going to be used by a subsequent patch to enforce ordering between flower and matchall filters. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-05-09 16:02:43 -07:00
Jiri Pirko	5a2939b9d7	mlxsw: spectrum_matchall: Put matchall list into substruct of flow struct As there are going to be other matchall specific fields in flow structure, put the existing list field into matchall substruct. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-05-09 16:02:43 -07:00
Jiri Pirko	593bb84379	mlxsw: spectrum_flower: Expose a function to get min and max rule priority Introduce an infrastructure that allows to get minimum and maximum rule priority for specified chain. This is going to be used by a subsequent patch to enforce ordering between flower and matchall filters. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-05-09 16:02:43 -07:00
Jiri Pirko	18aa23b31f	mlxsw: spectrum_matchall: Restrict sample action to be allowed only on ingress HW supports packet sampling on ingress only. Check and fail if user is adding sample on egress. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-05-09 16:02:43 -07:00
Tariq Toukan	28bff09518	net/mlx5e: Enhance ICOSQ WQE info fields The same WQE opcode might be used in different ICOSQ flows and WQE types. To have a better distinguishability, replace it with an enum that better indicates the WQE type and flow it is used for. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-09 01:05:42 -07:00
Tariq Toukan	6b74f60ef5	net/mlx5: Accel, Remove unnecessary header include The include of Ethernet driver header in core is not needed and actually wrong. Remove it. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-09 01:05:42 -07:00
Tariq Toukan	41a8e4ebb4	net/mlx5e: Use struct assignment for WQE info updates Struct assignment looks more clean, and implies resetting the not assigned fields to zero, instead of holding values from older ring cycles. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-09 01:05:41 -07:00
Tariq Toukan	05dfd57082	net/mlx5e: Take TX WQE info structures out of general EN header Into the txrx header file. The mlx5e_sq_wqe_info structure describes WQE info for the ICOSQ, rename it to better reflect this. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-09 01:05:41 -07:00
Tariq Toukan	f713ce1de8	net/mlx5e: kTLS, Do not fill edge for the DUMP WQEs in TX flow Every single DUMP WQE resides in a single WQEBB. As the pi is calculated per each one separately, there is no real need for a contiguous room for them, allow them to populate different WQ fragments. This reduces WQ waste and improves its utilization. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-09 01:05:41 -07:00
Tariq Toukan	ab1e0ce99d	net/mlx5e: kTLS, Fill work queue edge separately in TX flow For the static and progress context params WQEs, do the edge filling separately. This improves the WQ utilization, code readability, and reduces the chance of future bugs. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-09 01:05:40 -07:00
Maxim Mikityanskiy	714c88a38b	net/mlx5e: Split TX acceleration offloads into two phases After previous modifications, the offloads are no longer called one by one, the pi is calculated and the wqe is cleared on between of TLS and IPSEC offloads, which doesn't quite fit mlx5e_accel_handle_tx's purpose. This patch splits mlx5e_accel_handle_tx into two functions that correspond to two logical phases of running offloads: 1. Before fetching a WQE. Here runs the code that can post WQEs on its own, before the main WQE is fetched. It's the main part of TLS offload. 2. After fetching a WQE. Here runs the code that updates the WQE's fields, but can't post other WQEs any more. It's a minor part of TLS offload that sets the tisn field in the cseg, and eseg-based offloads (currently IPSEC, and later patches will move GENEVE and checksum offloads there, too). It allows to make mlx5e_xmit take care of all actions needed to transmit a packet in the right order, improve the structure of the code and reduce unnecessary operations. The structure will be further improved in the following patches (all eseg-based offloads will be moved to a single place, and reserving space for the main WQE will happen between phase 1 and phase 2 of offloads to eliminate unneeded data movements). Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-09 01:05:40 -07:00
Maxim Mikityanskiy	5546100038	net/mlx5e: Update UDP fields of the SKB for GSO first mlx5e_udp_gso_handle_tx_skb updates the length field in the UDP header in case of GSO. It doesn't interfere with other offloads, so do it first to simplify further restructuring of the code. This way we'll make all independent modifications to the SKB before starting to work with WQEs. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-09 01:05:40 -07:00
Maxim Mikityanskiy	2eeb6e3841	net/mlx5e: Make TLS offload independent of wqe and pi TLS offload may write a 32-bit field (tisn) to the cseg of the WQE. To do that, it receives pi and wqe pointers. As TLS offload may also send additional WQEs, it has to update pi and wqe, and in many cases it even doesn't use pi calculated before and wqe zeroed before and does it itself. Also, mlx5e_sq_xmit has to copy the whole cseg if it goes to the mlx5e_fill_sq_frag_edge flow. This all is not efficient. It's more efficient to do the following: 1. Just return tisn from TLS offload and make the caller fill it in a more appropriate place. 2. Calculate pi and clear wqe after calling TLS offload. 3. If TLS offload has to send WQEs, calculate pi and clear wqe just before that. It's already done in all places anyway, so this commit allows to remove some redundant memsets and calls. Copying of cseg will be eliminated in one of the following commits, and all other stuff is done here. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-09 01:05:40 -07:00
Maxim Mikityanskiy	0bdb078c74	net/mlx5e: Pass only eseg to IPSEC offload IPSEC offload needs to modify the eseg of the WQE that is being filled, but it receives a pointer to the whole WQE. To make the contract stricter, pass only the pointer to the eseg of that WQE. This commit is preparation for the following refactoring of offloads in the TX path and for the MPWQE support. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-09 01:05:39 -07:00
Maxim Mikityanskiy	3df711db05	net/mlx5e: Return void from mlx5e_sq_xmit and mlx5i_sq_xmit mlx5e_sq_xmit and mlx5i_sq_xmit always return NETDEV_TX_OK. Drop the return value to simplify the code. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-09 01:05:39 -07:00
Maxim Mikityanskiy	7f8546f3f0	net/mlx5e: Unify checks of TLS offloads Both INNOVA and ConnectX TLS offloads perform the same checks in the beginning. Unify them to reduce repeating code. Do WARN_ON_ONCE on netdev mismatch and finish with an error in both offloads, not only in the ConnectX one. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-09 01:05:39 -07:00
Maxim Mikityanskiy	f02bac9ad6	net/mlx5e: Return bool from TLS and IPSEC offloads TLS and IPSEC offloads currently return struct sk_buff *, but the value is either NULL or the same skb that was passed as a parameter. Return bool instead to provide stronger guarantees to the calling code (it won't need to support handling a different SKB that could be potentially returned before this change) and to simplify restructuring this code in the following commits. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-09 01:05:39 -07:00
Saeed Mahameed	76cd622fe2	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux This merge includes updates to bonding driver needed for the rdma stack, to avoid conflicts with the RDMA branch. Maor Gottlieb Says: ==================== Bonding: Add support to get xmit slave The following series adds support to get the LAG master xmit slave by introducing new .ndo - ndo_get_xmit_slave. Every LAG module can implement it and it first implemented in the bond driver. This is follow-up to the RFC discussion [1]. The main motivation for doing this is for drivers that offload part of the LAG functionality. For example, Mellanox Connect-X hardware implements RoCE LAG which selects the TX affinity when the resources are created and port is remapped when it goes down. The first part of this patchset introduces the new .ndo and add the support to the bonding module. The second part adds support to get the RoCE LAG xmit slave by building skb of the RoCE packet based on the AH attributes and call to the new .ndo. The third part change the mlx5 driver driver to set the QP's affinity port according to the slave which found by the .ndo. ==================== Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-09 01:05:30 -07:00
Jacob Keller	c75a33c84b	net: remove newlines in NL_SET_ERR_MSG_MOD The NL_SET_ERR_MSG_MOD macro is used to report a string describing an error message to userspace via the netlink extended ACK structure. It should not have a trailing newline. Add a cocci script which catches cases where the newline marker is present. Using this script, fix the handful of cases which accidentally included a trailing new line. I couldn't figure out a way to get a patch mode working, so this script only implements context, report, and org. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Andy Whitcroft <apw@canonical.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-07 17:56:14 -07:00
Jason Yan	5a7c45097c	net: mlx4: remove unneeded variable "err" in mlx4_en_ethtool_add_mac_rule() Fix the following coccicheck warning: drivers/net/ethernet/mellanox/mlx4/en_ethtool.c:1396:5-8: Unneeded variable: "err". Return "0" on line 1411 Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-07 13:04:21 -07:00
David S. Miller	3793faad7b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts were all overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-06 22:10:13 -07:00
Pablo Neira Ayuso	16f8036086	net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CARE This patch adds FLOW_ACTION_HW_STATS_DONT_CARE which tells the driver that the frontend does not need counters, this hw stats type request never fails. The FLOW_ACTION_HW_STATS_DISABLED type explicitly requests the driver to disable the stats, however, if the driver cannot disable counters, it bails out. TCA_ACT_HW_STATS_* maintains the 1:1 mapping with FLOW_ACTION_HW_STATS_* except by disabled which is mapped to FLOW_ACTION_HW_STATS_DISABLED (this is 0 in tc). Add tc_act_hw_stats() to perform the mapping between TCA_ACT_HW_STATS_* and FLOW_ACTION_HW_STATS_*. Fixes: `319a1d1947` ("flow_offload: check for basic action hw stats type") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-06 20:13:10 -07:00
Jason Yan	f9cbf19c7f	net: mlx4: remove unneeded variable "err" in mlx4_en_get_rxfh() Fix the following coccicheck warning: drivers/net/ethernet/mellanox/mlx4/en_ethtool.c:1238:5-8: Unneeded variable: "err". Return "0" on line 1252 Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-06 13:57:30 -07:00
Tariq Toukan	40e473071d	net/mlx4_core: Fix use of ENOSPC around mlx4_counter_alloc() When ENOSPC is set the idx is still valid and gets set to the global MLX4_SINK_COUNTER_INDEX. However gcc's static analysis cannot tell that ENOSPC is impossible from mlx4_cmd_imm() and gives this warning: drivers/net/ethernet/mellanox/mlx4/main.c:2552:28: warning: 'idx' may be used uninitialized in this function [-Wmaybe-uninitialized] 2552 \| priv->def_counter[port] = idx; Also, when ENOSPC is returned mlx4_allocate_default_counters should not fail. Fixes: `6de5f7f6a1` ("net/mlx4_core: Allocate default counter per port") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-04 10:42:06 -07:00
Maor Gottlieb	c6bc6041b1	net/mlx5: Add support to get lag physical port Add function to get the device physical port of the lag slave. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-01 12:15:38 -07:00
Maor Gottlieb	64363e61c7	net/mlx5: Change lag mutex lock to spin lock The lag lock could be a spin lock, the critical section is short and there is no need that the thread will sleep. Change the lock that protects the LAG structure from mutex to spin lock. It is required for next patch that need to access this structure from context that we can't sleep. In addition there is no need to hold this lock when query the congestion counters. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-01 12:15:38 -07:00
Jiri Pirko	6ef4889fc0	mlxsw: spectrum_acl_tcam: Position vchunk in a vregion list properly Vregion helpers to get min and max priority depend on the correct ordering of vchunks in the vregion list. However, the current code always adds new chunk to the end of the list, no matter what the priority is. Fix this by finding the correct place in the list and put vchunk there. Fixes: `22a677661f` ("mlxsw: spectrum: Introduce ACL core with simple TCAM implementation") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-30 20:41:14 -07:00
David S. Miller	3857c77624	mlx5-updates-2020-04-30 1) Add release all pages support, From Eran. to release all FW pages at once on driver unload, when supported by FW. 2) From Maxim and Tariq, Trivial Data path cleanup and code improvements in preparation for their next features, TLS offload and TX performance improvements 3) Multiple cleanups. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl6rBpYACgkQSD+KveBX +j5L4Qf+MA17+ENeqPfMLRKHtSn9D50M3+8uDuYd4VK0uqQIQHbBpxHNM4FLa1sI WTBx/HnFKq5eZdOvYiZExVxpOcBWk+KVoIq8r4IPHsCU3Y2BqQOc6qqbi8haQ7J8 fgrdi+gbS02N8MD45uUbNP/8JhZxN+4s0uEaH9cQ68sSorZOF1VtExAttTpQoqso Zur9gQH3MfYmQBPbr7mj4OsKiho7cb17UPadASyiLjvD7QDJ+++73PHF5YCGuSy0 HTSvdBZOesBzaeD7qTwdq3bFILYiuiNmMlLotdvXmuAceXfa7lO8bqvwlc8an+No UMOBELwlw6tqVjSohMtqlbEMS9PNXQ== =XSFa -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2020-04-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2020-04-30 1) Add release all pages support, From Eran. to release all FW pages at once on driver unload, when supported by FW. 2) From Maxim and Tariq, Trivial Data path cleanup and code improvements in preparation for their next features, TLS offload and TX performance improvements 3) Multiple cleanups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-30 13:15:16 -07:00
Ido Schimmel	ca0892235a	mlxsw: spectrum_span: Remove old SPAN API Remove the old SPAN API now that matchall-based and flower-based mirroring were converted to use the new API. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-30 13:02:32 -07:00
Ido Schimmel	835d6b8c1a	mlxsw: spectrum_span: Use new analyzed ports list during speed / MTU change As previously explained, each port whose outgoing traffic is analyzed needs to have an egress mirror buffer. The size of the egress mirror buffer is calculated based on various parameters, two of which are the speed and the MTU of the port. Therefore, when the MTU or the speed of a port change, the SPAN code is called to see if the egress mirror buffer of the port needs to be adjusted. Currently, this is done by traversing all the SPAN agents and for each SPAN agent the list of bound ports is traversed. Instead of the above, traverse the recently added list of analyzed ports. This will later allow us to remove the old SPAN API. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-30 13:02:32 -07:00
Ido Schimmel	7240db69c3	mlxsw: spectrum_acl: Convert flower-based mirroring to new SPAN API In flower-based mirroring, mirroring is done with ACLs and the SPAN agent is not bound to a port. Instead its identifier is specified in an ACL action. Convert this type of mirroring to use the new API. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-30 13:02:32 -07:00
Ido Schimmel	c1d7845dfb	mlxsw: spectrum: Convert matchall-based mirroring to new SPAN API In matchall-based mirroring, mirroring is not done with ACLs, but a SPAN agent is bound to the ingress / egress of a port and all incoming / outgoing traffic is mirrored. Convert this type of mirroring to use the new API. First the SPAN agent is resolved, then the port is marked as analyzed and its egress mirror buffer is potentially allocated. Lastly, the binding is performed. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-30 13:02:32 -07:00
Ido Schimmel	c056618c53	mlxsw: spectrum_span: Add APIs to bind / unbind a SPAN agent Currently, a SPAN agent can only be bound to a per-port trigger where the trigger is either an incoming packet (INGRESS) or an outgoing packet (EGRESS) to / from the port. A follow-up patch set will introduce the concept of global triggers and per-{port, TC} enablement. With global triggers, the trigger entry is only keyed by a trigger and not by a port and a trigger. The trigger can be, for example, a packet that was early dropped. While the binding between the SPAN agent and the trigger is performed only once, the trigger entry needs to be reference counted, as the trigger can be enabled on multiple ports. Add APIs to bind / unbind a SPAN agent to a trigger and reference count the trigger entry in preparation for global triggers. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-30 13:02:32 -07:00
Ido Schimmel	14366da6b5	mlxsw: spectrum_span: Wrap buffer change in a function The code that adjusts the egress buffer size is not symmetric at the moment. The update is done via a call to mlxsw_sp_span_port_buffer_update(), but the disablement is done inline by invoking the write to SBIB register directly. Wrap the disablement code in mlxsw_sp_span_port_buffer_disable(). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Suggested-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-30 13:02:32 -07:00
Ido Schimmel	eb773c3a2d	mlxsw: spectrum_span: Rename function Next patch will introduce mlxsw_sp_span_port_buffer_disable() function that disables the egress buffer on an analyzed port. Rename the opposite function that updates the buffer on an analyzed port accordingly. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Suggested-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-30 13:02:32 -07:00
Ido Schimmel	ed04458d4a	mlxsw: spectrum_span: Add APIs to get / put an analyzed port An analyzed port is a port whose incoming / outgoing traffic is mirrored to a SPAN agent and analyzed on a remote server. A port can be analyzed by multiple tc filters and therefore the corresponding analyzed port entry needs to be reference counted. This is significant because ports whose outgoing traffic is analyzed need to have an egress mirror buffer. Add APIs to get / put an analyzed port. Allocate an egress mirror buffer on a port when it is first inspected at egress and free the buffer when it is no longer inspected at egress. Protect the list of analyzed ports with a mutex, as a later patch will traverse it from a context in which RTNL lock is not held. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-30 13:02:32 -07:00
Ido Schimmel	466010342e	mlxsw: spectrum_span: Add APIs to get / put a SPAN agent Given a netdev that packets should be mirrored to, create a SPAN agent and return its identifier to the caller. The SPAN agent is reference counted, as multiple tc-mirred actions can point to the same destination netdev. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-30 13:02:32 -07:00
Maxim Mikityanskiy	ec9cdca066	net/mlx5e: Unify reserving space for WQEs In our fast-path design, a WQE (Work Queue Element) must not cross the page boundary. To enforce that, for WQEs consisting of more than one BB (Basic Block), the driver checks the available contiguous space in the WQ in advance, and if it's not enough, it pads it with NOPs. This patch modifies the code that calculates the position of next WQE, considering the padding, and prepares the WQE. This code is common for all SQ types. In this patch it's reorganized in a way that makes the usage pattern unified for all SQ types, and makes the implementations self-contained and look almost the same, preparing the repeating code to further attempts to deduplicate it. One place is left as is: mlx5e_sq_xmit and mlx5e_fill_sq_frag_edge call inside, because it is special in a way that it may also copy WQE's cseg and eseg when reserving space. This will be eliminated in one of the following patches, and this place will be converted to the new approach, too. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 10:10:46 -07:00
Maxim Mikityanskiy	7d42c8e9ab	net/mlx5e: Rename ICOSQ WQE info struct and field Structs mlx5e_txqsq and mlx5e_xdpsq contain wqe_info arrays to store supplementary information corresponding to WQEs in the queue. Struct mlx5e_icosq also has such an array, but it's called differently - ico_wqe. This patch renames it to unify with the other SQs. In addition, rename the struct to emphasize its specific usage. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 10:10:45 -07:00
Maxim Mikityanskiy	fed0c6cfcd	net/mlx5e: Fetch WQE: reuse code and enforce typing There are multiple functions mlx5{e,i}__fetch_wqe that contain the same code, that is repeated, because they operate on different SQ struct types. mlx5e_sq_fetch_wqe also returns void , instead of the concrete WQE type. This commit generalizes the fetch WQE operation by putting this code into a single function. To simplify calls of the generic function in concrete use cases, macros are provided that substitute the right WQE size and cast the return type. Before this patch, fetch_wqe used to calculate pi itself, but the value was often known to the caller. This calculation is moved outside to eliminate this unnecessary step and prepare for the fill_frag_edge refactoring in the next patch. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 10:10:45 -07:00
Tariq Toukan	e2e11dbf36	net/mlx5e: XDP, Print the offending TX descriptor on error completion Upon an error completion on an XDP SQ, print the offending WQE to ease the debug process. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 10:10:45 -07:00
Tariq Toukan	f1b95753ee	net/mlx5e: TX, Generalise code and usage of error CQE dump Error CQE was dumped only for TXQ SQs. Generalise the function, and add usage for error completions on ICO SQs and XDP SQs. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 10:10:44 -07:00
Tariq Toukan	e658664c77	net/mlx5e: Use proper name field for the UMR key Even though some of the WQE control segment's field share the same memory bits (a union of fields), prefer having the right field name for every different usage. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 10:10:44 -07:00
Eran Ben Elisha	c6168161f6	net/mlx5: Add support for release all pages event If FW sets release_all_pages bit in MLX5_EVENT_TYPE_PAGE_REQUEST, driver shall release all pages of a given function id, with no further pages reclaim negotiation with FW nor MANAGE_PAGES commands from driver towards FW. Upon receiving this bit as part of pages reclaim event, driver will initiate release all flow, in which it will iterate and release all function's pages. As part of driver <-> FW capabilities handshake, FW will report release_all_pages max HCA cap bit, and driver will set the release_all_pages bit in HCA cap. NIC: ConnectX-4 Lx CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz Test case: Simulataniously FLR 4 VFs, and measure FW release pages by driver. Before: 3.18 Sec After: 0.31 Sec Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 10:10:44 -07:00
Eran Ben Elisha	c7636942d2	net/mlx5: Rate limit page not found error messages Thousands of pages are released with free_addr() function. In case of buggy sync between FW and driver on released address, the log will be flooded with error messages. Use mlx5_core_warn_rl() to limit it. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 10:10:44 -07:00
Eran Ben Elisha	c655c1f469	net/mlx5: Add helper function to release fw page Factor out the fwp address release page to an helper function, will be used in the downstream patch. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 10:10:44 -07:00
Tariq Toukan	51dde00b8f	net/mlx5: Remove unused field in EQ The size field in EQ is not in use. Remove it. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Tal Gilboa <talgi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 10:10:43 -07:00
Paul Blakey	d2658b4a1d	net/mlx5: CT: Remove unused variables Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 10:10:43 -07:00
Roi Dayan	70a5698a56	net/mlx5e: CT: Avoid false warning about rule may be used uninitialized Avoid gcc warning by preset rule to invalid ptr. Fixes: `4c3844d9e9` ("net/mlx5e: CT: Introduce connection tracking") Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 10:10:43 -07:00
Zheng Bin	e59b254cbe	net/mlx5e: Remove unneeded semicolon Fixes coccicheck warning: drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c:690:2-3: Unneeded semicolon Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 10:10:42 -07:00
Parav Pandit	9c8e7434e0	net/mlx5e: Use helper API to get devlink port index for all port flavours Use existing helper API to get unique devlink port index for all devlink port flavours. Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 10:10:42 -07:00
Raed Salem	72d3fef161	net/mlx5: IPsec, Fix coverity issue The cited commit introduced the following coverity issue at functions mlx5_fpga_is_ipsec_device() and mlx5_fpga_ipsec_release_sa_ctx(): - bit_and_with_zero: accel_xfrm->attrs.action & MLX5_ACCEL_ESP_ACTION_DECRYPT is always 0. As MLX5_ACCEL_ESP_ACTION_DECRYPT is not a bitwise flag and was wrongly used with bitwise operation, the above expression is always zero value as MLX5_ACCEL_ESP_ACTION_DECRYPT is zero. Fix by using "==" comparison operator instead. Fixes: `7dfee4b1d7` ("net/mlx5: IPsec, Refactor SA handle creation and destruction") Signed-off-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 10:10:42 -07:00
Saeed Mahameed	a6b1b93605	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux mlx5 updates for both net-next and rdma-next: 1) HW bits and definitions for TLS and IPsec offlaods 2) Release all pages capability bits 3) New command interface helpers and some code cleanup as a result 4) Move qp.c out of mlx5 core driver into mlx5_ib rdma driver Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 09:49:53 -07:00
Roi Dayan	67b38de646	net/mlx5e: Fix q counters on uplink representors Need to allocate the q counters before init_rx which needs them when creating the rq. Fixes: `8520fa57a4` ("net/mlx5e: Create q counters on uplink representors") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 09:20:33 -07:00
Moshe Shemesh	cece6f432c	net/mlx5: Fix command entry leak in Internal Error State Processing commands by cmd_work_handler() while already in Internal Error State will result in entry leak, since the handler process force completion without doorbell. Forced completion doesn't release the entry and event completion will never arrive, so entry should be released. Fixes: `73dd3a4839` ("net/mlx5: Avoid using pending command interface slots") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 09:20:33 -07:00
Moshe Shemesh	f3cb3cebe2	net/mlx5: Fix forced completion access non initialized command entry mlx5_cmd_flush() will trigger forced completions to all valid command entries. Triggered by an asynch event such as fast teardown it can happen at any stage of the command, including command initialization. It will trigger forced completion and that can lead to completion on an uninitialized command entry. Setting MLX5_CMD_ENT_STATE_PENDING_COMP only after command entry is initialized will ensure force completion is treated only if command entry is initialized. Fixes: `73dd3a4839` ("net/mlx5: Avoid using pending command interface slots") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 09:20:32 -07:00
Erez Shitrit	8075411d93	net/mlx5: DR, On creation set CQ's arm_db member to right value In polling mode, set arm_db member to a value that will avoid CQ event recovery by the HW. Otherwise we might get event without completion function. In addition,empty completion function to was added to protect from unexpected events. Fixes: `297cccebdc` ("net/mlx5: DR, Expose an internal API to issue RDMA operations") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 09:20:32 -07:00
Parav Pandit	f8d1eddaf9	net/mlx5: E-switch, Fix mutex init order In cited patch mutex is initialized after its used. Below call trace is observed. Fix the order to initialize the mutex early enough. Similarly follow mirror sequence during cleanup. kernel: DEBUG_LOCKS_WARN_ON(lock->magic != lock) kernel: WARNING: CPU: 5 PID: 45916 at kernel/locking/mutex.c:938 __mutex_lock+0x7d6/0x8a0 kernel: Call Trace: kernel: ? esw_vport_tbl_get+0x3b/0x250 [mlx5_core] kernel: ? mark_held_locks+0x55/0x70 kernel: ? __slab_free+0x274/0x400 kernel: ? lockdep_hardirqs_on+0x140/0x1d0 kernel: esw_vport_tbl_get+0x3b/0x250 [mlx5_core] kernel: ? mlx5_esw_chains_create_fdb_prio+0xa57/0xc20 [mlx5_core] kernel: mlx5_esw_vport_tbl_get+0x88/0xf0 [mlx5_core] kernel: mlx5_esw_chains_create+0x2f3/0x3e0 [mlx5_core] kernel: esw_create_offloads_fdb_tables+0x11d/0x580 [mlx5_core] kernel: esw_offloads_enable+0x26d/0x540 [mlx5_core] kernel: mlx5_eswitch_enable_locked+0x155/0x860 [mlx5_core] kernel: mlx5_devlink_eswitch_mode_set+0x1af/0x320 [mlx5_core] kernel: devlink_nl_cmd_eswitch_set_doit+0x41/0xb0 Fixes: `96e326878f` ("net/mlx5e: Eswitch, Use per vport tables for mirroring") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 09:20:32 -07:00
Parav Pandit	e986453905	net/mlx5: E-switch, Fix printing wrong error value When mlx5_modify_header_alloc() fails, instead of printing the error value returned, current error log prints 0. Fix by printing correct error value returned by mlx5_modify_header_alloc(). Fixes: `6724e66b90` ("net/mlx5: E-Switch, Get reg_c1 value on miss") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 09:20:32 -07:00
Parav Pandit	799499850a	net/mlx5: E-switch, Fix error unwinding flow for steering init failure Error unwinding is done incorrectly in the cited commit. When steering init fails, there is no need to perform steering cleanup. When vport error exists, error cleanup should be mirror of the setup routine, i.e. to perform steering cleanup before metadata cleanup. This avoids the call trace in accessing uninitialized objects which are skipped during steering_init() due to failure in steering_init(). Call trace: mlx5_cmd_modify_header_alloc:805:(pid 21128): too many modify header actions 1, max supported 0 E-Switch: Failed to create restore mod header BUG: kernel NULL pointer dereference, address: 00000000000000d0 [ 677.263079] mlx5_destroy_flow_group+0x13/0x80 [mlx5_core] [ 677.268921] esw_offloads_steering_cleanup+0x51/0xf0 [mlx5_core] [ 677.275281] esw_offloads_enable+0x1a5/0x800 [mlx5_core] [ 677.280949] mlx5_eswitch_enable_locked+0x155/0x860 [mlx5_core] [ 677.287227] mlx5_devlink_eswitch_mode_set+0x1af/0x320 [ 677.293741] devlink_nl_cmd_eswitch_set_doit+0x41/0xb0 [ 677.299217] genl_rcv_msg+0x1eb/0x430 Fixes: `7983a675ba` ("net/mlx5: E-Switch, Enable chains only if regs loopback is enabled") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-30 09:20:31 -07:00
Raed Salem	244faedfd4	net/mlx5: Refactor imm_inval_pkey field in cqe struct The imm_inval_pkey field can hold four different types of data, depends on the usage, the data could be one of the below: - Immediate field of the received message - Invalidate rkey - Pkey of the packet - Flow table metadata Current implementation doesn't reflect the intended usage of the field at usage time. Reflect the different types by replace this field with a union, modify code where this field is used to reflect its intended usage. Signed-off-by: Raed Salem <raeds@mellanox.com> Reviewed-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-28 12:45:15 -07:00
Erez Shitrit	dff8e2d152	net/mlx5: Use aligned variable while allocating ICM memory The alignment value is part of the input structure, so use it and spare extra memory allocation when is not needed. Now, using the new ability when allocating icm for Direct-Rule insertion. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-04-28 12:45:10 -07:00
Huy Nguyen	d65dbedfd2	net/mlx5: Add support for COPY steering action Add COPY type to modify_header action. IPsec feature is the first feature that needs COPY steering action. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Leon Romanovsky <leonro@mellanox.com>	2020-04-28 12:44:44 -07:00
Jiri Pirko	19f06771ca	mlxsw: spectrum: Move flow offload binding into spectrum_flow.c Move the code taking case of setup of flow offload into spectrum_flow.c Do small renaming of callbacks on the way. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-27 12:43:29 -07:00
Jiri Pirko	3c650136af	mlxsw: spectrum_matchall: Process matchall events from the same cb as flower Currently there are two callbacks registered: one for matchall, one for flower. This causes the user to see "in_hw_count 2" in TC filter dump. Because of this and also as a preparation for future matchall offload for rules equivalent to flower-all-match, move the processing of shared block into matchall.c. Leave only one cb for mlxsw driver per-block. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-27 12:43:29 -07:00
Jiri Pirko	481ff57aad	mlxsw: spectrum: Avoid copying sample values and use RCU pointer direcly instead Currently, only the psample_group is accessed using RCU on RX path. However, it is possible (unlikely) that other sample values get change during RX processing. Fix this by having the port->sample struct accessed as RCU pointer, containing all sample values including psample_group pointer. That avoids extra alloc per-port, copying the values and the race condition described above. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-27 12:43:29 -07:00
Jiri Pirko	dd0fbc89d2	mlxsw: spectrum_matchall: Push per-port rule add/del into separate functions As the replace/destroy is going to be used later on per-block, push the per-port rule addition/deletion into separate functions. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-27 12:43:29 -07:00
Jiri Pirko	47fa15eae4	mlxsw: spectrum_matchall: Move ingress indication into mall_entry Instead of having it in mirror_entry structure, move it to mall_entry and set it during rule insertion. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-27 12:43:29 -07:00
Jiri Pirko	c7ea0e162f	mlxsw: spectrum_matchall: Pass mall_entry as arg to mlxsw_sp_mall_port_sample_add() In the preparation for future changes, have the mlxsw_sp_mall_port_sample_add() function to accept mall_entry including all needed info originally obtained from cls and act pointers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-27 12:43:29 -07:00
Jiri Pirko	780ba878a1	mlxsw: spectrum_matchall: Pass mall_entry as arg to mlxsw_sp_mall_port_mirror_add() In the preparation for future changes, have the mlxsw_sp_mall_port_mirror_add() function to accept mall_entry including the "to_dev" originally obtained from act pointer. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-27 12:43:29 -07:00
Jiri Pirko	6c8cd435b5	mlxsw: spectrum_acl: Use block variable in mlxsw_sp_acl_rule_del() On couple of places in mlxsw_sp_acl_rule_del(), block variable is not used directly as it could be. So do it. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-27 12:43:29 -07:00
Jiri Pirko	d7fcc98622	mlxsw: spectrum: Push matchall bits into a separate file Similar to flower, have matchall related code in a separate file. Do some small renaming on the way (consistent "mall" prefixes, dropped "_tc_", dropped "_port_" where suitable). Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-27 12:43:29 -07:00
Jiri Pirko	d52238eb7b	mlxsw: spectrum: Push flow_block related functions into a separate file The code around flow_block is currently mixed in spectrum_acl.c. However, as it really does not directly relate to ACL part only, push the bits into a separate file. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-27 12:43:29 -07:00
Jiri Pirko	3bc3ffb6e9	mlxsw: spectrum: Rename acl_block to flow_block The acl_block structure is going to be used for non-acl case - matchall offload. So rename it accordingly. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-27 12:43:29 -07:00
Jiri Pirko	49c958ccd2	mlxsw: spectrum_acl: Move block helpers into inline header functions The struct is defined in the header, no need to have the helpers in the c file. Move the helpers to the header. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-27 12:43:29 -07:00
Zou Wei	c90af587a9	net/mlx4_core: Add missing iounmap() in error path This fixes the following coccicheck warning: drivers/net/ethernet/mellanox/mlx4/crdump.c:200:2-8: ERROR: missing iounmap; ioremap on line 190 and execution via conditional on line 198 Fixes: `7ef19d3b1d` ("devlink: report error once U32_MAX snapshot ids have been used") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zou Wei <zou_wei@huawei.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-25 20:43:56 -07:00
David S. Miller	d483389678	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Simple overlapping changes to linux/vermagic.h Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-25 20:18:53 -07:00
Zheng Bin	10395e99f4	net/mlxfw: Remove unneeded semicolon Fixes coccicheck warning: drivers/net/ethernet/mellanox/mlxfw/mlxfw_fsm.c:79:2-3: Unneeded semicolon drivers/net/ethernet/mellanox/mlxfw/mlxfw_fsm.c:162:2-3: Unneeded semicolon Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-24 16:56:22 -07:00
Ido Schimmel	4780dbdbd9	mlxsw: spectrum_span: Replace zero-length array with flexible-array member In a similar fashion to commit `e99f8e7f88` ("mlxsw: Replace zero-length array with flexible-array member"), use a flexible-array member to get a compiler warning in case the flexible array does not occur last in the structure. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-24 15:41:51 -07:00
Ido Schimmel	4c00dafc59	mlxsw: spectrum_span: Use 'refcount_t' for reference counting 'refcount_t' is very useful for catching over/under flows. Convert the SPAN agent objects to use it instead of 'int' for their reference count. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-24 15:41:51 -07:00
Ido Schimmel	c0c2899cf6	mlxsw: spectrum_span: Remove unnecessary debug prints To the best of my knowledge, these debug prints were never used. Remove them. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-24 15:41:51 -07:00
Amit Cohen	7f9b099bd9	mlxsw: spectrum_span: Rename parms() to parms_set() Use a more meaningful name for parms() function. Signed-off-by: Amit Cohen <amitc@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-24 15:41:51 -07:00
Amit Cohen	8146458fcd	mlxsw: spectrum_span: Reduce nesting in mlxsw_sp_span_entry_configure() Use early return to avoid unnecessary nesting. Signed-off-by: Amit Cohen <amitc@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-24 15:41:51 -07:00
Eric Dumazet	cf4058dbaa	net/mlx4_en: use napi_complete_done() in TX completion In order to benefit from the new napi_defer_hard_irqs feature, we need to use napi_complete_done() variant in this driver. RX path is already using it, this patch implements TX completion side. mlx4_en_process_tx_cq() now returns the amount of retired packets, instead of a boolean, so that mlx4_en_poll_tx_cq() can pass this value to napi_complete_done(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-23 12:43:20 -07:00
Dan Carpenter	c391eb8366	mlxsw: Fix some IS_ERR() vs NULL bugs The mlxsw_sp_acl_rulei_create() function is supposed to return an error pointer from mlxsw_afa_block_create(). The problem is that these functions both return NULL instead of error pointers. Half the callers expect NULL and half expect error pointers so it could lead to a NULL dereference on failure. This patch changes both of them to return error pointers and changes all the callers which checked for NULL to check for IS_ERR() instead. Fixes: `4cda7d8d70` ("mlxsw: core: Introduce flexible actions support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-23 12:34:43 -07:00

... 3 4 5 6 7 ...

6686 Commits