linux

Author	SHA1	Message	Date
Petr Machata	dbcdb61aaf	mlxsw: spectrum_ptp: Fix validation in mlxsw_sp1_ptp_packet_finish() Before mlxsw_sp1_ptp_packet_finish() sends the packet back, it validates whether the corresponding port is still valid. However the condition is incorrect: when mlxsw_sp_port == NULL, the code dereferences the port to compare it to skb->dev. The condition needs to check whether the port is present and skb->dev still refers to that port (or else is NULL). If that does not hold, bail out. Add a pair of parentheses to fix the condition. Fixes: `d92e4e6e33` ("mlxsw: spectrum: PTP: Support timestamping on Spectrum-1") Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-02 15:31:20 -07:00
Petr Machata	87ee07f8e2	mlxsw: spectrum: PTP: Support ethtool get_ts_info The get_ts_info callback is used for obtaining information about timestamping capabilities of a network device. On Spectrum-1, implement it to advertise the PHC and the capability to do HW timestamping, and the supported RX and TX filters. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-01 18:58:35 -07:00
Petr Machata	8748642751	mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls The SIOCSHWTSTAMP ioctl configures HW timestamping on a given port. Dispatch the ioctls to per-chip handler (which add to ptp_ops). Find which PTP messages need to be timestamped and configure MTPPPC accordingly. The SIOCGHWTSTAMP ioctl is getter for the current configuration. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-01 18:58:34 -07:00
Petr Machata	a773c76cb8	mlxsw: spectrum: PTP: Configure PTP traps and FIFO events Configure MTPTPT to set which message types should arrive under which PTP trap, and MOGCR to clear the timestamp queue after its contents are reported through PTP_ING_FIFO or PTP_EGR_FIFO. With this configuration, PTP packets start arriving through the PTP traps. However since timestamping is disabled by default and there is currently no way to enable it, they will not be timestamped. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-01 18:58:34 -07:00
Petr Machata	5d23e41597	mlxsw: spectrum: PTP: Garbage-collect unmatched entries On Spectrum-1, timestamped PTP packets and the corresponding timestamps need to be kept in caches until both are available, at which point they are matched up and packets forwarded as appropriate. However, not all packets will ever see their timestamp, and not all timestamps will ever see their packet. It is therefore necessary to dispose of such abandoned entries. To that end, introduce a garbage collector to collect entries that have not had their counterpart turn up within about a second. The GC maintains a monotonously-increasing value of GC cycle. Every entry that is put to the hash table is annotated with the GC cycle at which it should be collected. When the GC runs, it walks the hash table, and collects the objects according to their GC cycle annotation. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-01 18:58:34 -07:00
Petr Machata	d92e4e6e33	mlxsw: spectrum: PTP: Support timestamping on Spectrum-1 On Spectrum-1, timestamps arrive through a pair of dedicated events: MLXSW_TRAP_ID_PTP_ING_FIFO and _EGR_FIFO. The payload delivered with those traps is contents of the timestamp FIFO at a given port in a given direction. Add a Spectrum-1-specific handler for these two events which decodes the timestamps and forwards them to the PTP module. Add a function that parses a packet, dispatching to ptp_classify_raw(), and decodes PTP message type, domain number, and sequence ID. Add a new mlxsw dependency on the PTP classifier. Add helpers that can store and retrieve unmatched timestamps and SKBs to the hash table added in a preceding patch. Add the matching code itself: upon arrival of a timestamp or a packet, look up the corresponding unmatched entry, and match it up. If there is none, add a new unmatched entry. This logic is the same on ingress as on egress. Packets and timestamps that never matched need to be eventually disposed of. A garbage collector added in a follow-up patch will take care of that. Since currently all this code is turned off, no crud will accumulate in the hash table. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-01 18:58:34 -07:00
Petr Machata	89e602ee6e	mlxsw: spectrum: PTP: Disable BH when working with PHC Up until now, the PTP hardware clock code was only invoked in the process context (SYS_clock_adjtime -> do_clock_adjtime -> k_clock::clock_adj -> pc_clock_adjtime -> posix_clock_operations::clock_adjtime -> ptp_clock_info::adjtime -> mlxsw_spectrum). In order to enable HW timestamping, which is tied into trap handling, it will be necessary to take the clock lock from the PCI queue handler tasklets as well. Therefore use the _bh variants when handling the clock lock. Incidentally, Documentation/ptp/ptp.txt recommends _irqsave variants, but that's unnecessarily strong for our needs. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-01 18:58:34 -07:00
Petr Machata	810256cec1	mlxsw: spectrum: PTP: Add PTP initialization / finalization Add two ptp_ops: init and fini, to initialize and finalize the PTP subsystem. Call as appropriate from mlxsw_sp_init() and _fini(). Lay the groundwork for Spectrum-1 support. On Spectrum-1, the received timestamped packets and their corresponding timestamps arrive independently, and need to be matched up. Introduce the related data types and add to struct mlxsw_sp_ptp_state the hash table that will keep the unmatched entries. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-01 18:58:34 -07:00
Petr Machata	0714256c3d	mlxsw: pci: PTP: Hook into packet transmit path On Spectrum-1, timestamps are delivered separately from the packets, and need to paired up. Therefore, at some point after mlxsw_sp_port_xmit() is invoked, it is necessary to involve the chip-specific driver code to allow it to do the necessary bookkeeping and matching. On Spectrum-2, timestamps are delivered in CQE. For that reason, position the point of driver involvement into mlxsw_pci_cqe_sdq_handle() to make it hopefully easier to extend for Spectrum-2 in the future. To tell the driver what port the packet was sent on, keep tx_info in SKB control buffer. Introduce a new driver core interface mlxsw_core_ptp_transmitted(), a driver callback ptp_transmitted, and a PTP op transmitted. The callee is responsible for taking care of releasing the SKB passed to the new interfaces, and correspondingly have the new stub callbacks just call dev_kfree_skb_any(). Follow-up patches will introduce the actual content into mlxsw_sp1_ptp_transmitted() in particular. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-01 18:58:34 -07:00
Petr Machata	d7cd206dbf	mlxsw: core: Add support for using SKB control buffer The SKB control buffer is useful (and used) for bookkeeping of information related to that SKB. Add helpers so that the mlxsw driver(s) can safely use the buffer as well. The structure is currently empty, individual users will add members to it as necessary. Note that SKB allocation functions already clear the buffer, so the cleanup is only necessary when ndo_start_xmit is called. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-01 18:58:34 -07:00
Petr Machata	aed4b57211	mlxsw: spectrum: PTP: Hook into packet receive path When configured, the Spectrum hardware can recognize PTP packets and trap them to the CPU using dedicated traps, PTP0 and PTP1. One reason to get PTP packets under dedicated traps is to have a separate policer suitable for the amount of PTP traffic expected when switch is operated as a boundary clock. For this, add two new trap groups, MLXSW_REG_HTGT_TRAP_GROUP_SP_PTP0 and _PTP1, and associate the two PTP traps with these two groups. In the driver, specifically for Spectrum-1, event PTP packets will need to be paired up with their timestamps. Those arrive through a different set of traps, added later in the patch set. To support this future use, introduce a new PTP op, ptp_receive. It is possible to configure which PTP messages should be trapped under which PTP trap. On Spectrum systems, we will use PTP0 for event packets (which need timestamping), and PTP1 for control packets (which do not). Thus configure PTP0 trap with a custom callback that defers to the ptp_receive op. Additionally, L2 PTP packets are actually trapped through the LLDP trap, not through any of the PTP traps. So treat the LLDP trap the same way as the PTP0 trap. Unlike PTP traps, which are currently still disabled, LLDP trap is active. Correspondingly, have all the implementations of the ptp_receive op return true, which the handler treats as a signal to forward the packet immediately. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-01 18:58:34 -07:00
Petr Machata	dadbc6bc09	mlxsw: spectrum: Add support for traps specific to Spectrum-1 On Spectrum-1, timestamps for PTP packets are delivered through queues of ingress and egress timestamps. There are two event traps corresponding to activity on each of those queues. This mechanism is absent on Spectrum-2, and therefore the traps should only be registered on Spectrum-1. Carry a chip-specific listener array in mlxsw_sp->listeners and listeners_count. Register listeners from that array in mlxsw_sp_traps_init(). Add a new listener array for Spectrum-1 traps and configure the newly-added mlxsw_sp->listeners with this array. The listener array is empty for now, the events will be added in a later patch. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-01 18:58:34 -07:00
Petr Machata	4b6b91ed2d	mlxsw: spectrum: Extract a helper for trap registration On Spectrum-1, timestamps for PTP packets are delivered through queues of ingress and egress timestamps. There are two event traps corresponding to activity on each of those queues. This mechanism is absent on Spectrum-2, and therefore the traps should only be registered on Spectrum-1. Extract out of mlxsw_sp_traps_init() a generic helper, mlxsw_sp_traps_register(), and likewise with _unregister(). The new helpers will later be called with Spectrum-1-specific traps. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-01 18:58:34 -07:00
Petr Machata	41ce78b92e	mlxsw: reg: Add Monitoring Global Configuration Register This register serves to configure global parameters of certain monitoring operations. The following patches will use it to configure that when PTP timestamps are delivered through the PTP FIFO traps, the FIFO in question is cleared as well. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-01 18:58:34 -07:00
Petr Machata	98b9028ea5	mlxsw: reg: Add Time Precision Packet Timestamping Reading The MTPPTR is used for reading the per port PTP timestamp FIFO. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-01 18:58:34 -07:00
Petr Machata	4dfecb6570	mlxsw: reg: Add Monitoring Precision Time Protocol Trap Register This register is used for configuring under which trap to deliver PTP packets depending on type of the packet. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-01 18:58:33 -07:00
Petr Machata	da28e87847	mlxsw: reg: Add Monitoring Time Precision Packet Port Configuration Register This register serves for configuration of which PTP messages should be timestamped. This is a global configuration, despite the register name. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-01 18:58:33 -07:00
Paul Blakey	f6dc1264f1	net/mlx5e: Disallow tc redirect offload cases we don't support After changing the parent_id to be the same for both NICs of same the hardware device, netdev_port_same_parent_id now returns true for more cases (all the lower devices in the hierarchy are on the same hardware device). If merged eswitch isn't enabled, these cases aren't supported, so disallow them. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-28 16:04:00 -07:00
Paul Blakey	7ff40a46dd	net/mlx5e: Expose same physical switch_id for all representors Report system_image_guid as the E-Switch switch_id, this ensures that when a NIC contains multiple PCI functions and which has merged eswitch capability, all representors from multiple PFs publish same switch_id. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-28 16:04:00 -07:00
Gavi Teitz	a90f88fe55	net/mlx5e: Don't refresh TIRs when updating representor SQs Refreshing TIRs is done in order to update the TIRs with the current state of SQs in the transport domain, so that the TIRs can filter out undesired self-loopback packets based on the source SQ of the packet. Representor TIRs will only receive packets that originate from their associated vport, due to dedicated steering, and therefore will never receive self-loopback packets, whose source vport will be the vport of the E-Switch manager, and therefore not the vport associated with the representor. As such, it is not necessary to refresh the representors' TIRs, since self-loopback packets can't reach them. Since representors only exist in switchdev mode, and there is no scenario in which a representor will exist in the transport domain alongside a non-representor, it is not necessary to refresh the transport domain's TIRs upon changing the state of a representor's queues. Therefore, do not refresh TIRs upon such a change. Achieve this by adding an update_rx callback to the mlx5e_profile, which refreshes TIRs for non-representors and does nothing for representors, and replace instances of mlx5e_refresh_tirs() upon changing the state of the queues with update_rx(). Signed-off-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-28 16:04:00 -07:00
Arnd Bergmann	5233794b17	net/mlx5e: reduce stack usage in mlx5_eswitch_termtbl_create Putting an empty 'mlx5_flow_spec' structure on the stack is a bit wasteful and causes a warning on 32-bit architectures when building with clang -fsanitize-coverage: drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c: In function 'mlx5_eswitch_termtbl_create': drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c:90:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] Since the structure is never written to, we can statically allocate it to avoid the stack usage. To be on the safe side, mark all subsequent function arguments that we pass it into as 'const' as well. Fixes: `10caabdaad` ("net/mlx5e: Use termination table for VLAN push actions") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-28 16:03:59 -07:00
Parav Pandit	f72e6c3e17	net/mlx5e: Set drvinfo in generic manner Consider PCI and non PCI device types while setting device name in get_drvinfo() callback using existing generic device. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-28 16:03:59 -07:00
Parav Pandit	087067368a	net/mlx5e: Correct phys_port_name for PF port Currently PF phys_port_name is named as pfNvf-1 as vport number for PF vport is 65535. Correct PF's phys_port name as agreed upon name as pfN. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-28 16:03:59 -07:00
Ariel Levkovich	5dc9520bf0	net/mlx5e: Report netdevice MPLS features Set supported device features in the netdevice MPLS features mask. This will enable HW checksumming and TSO for MPLS tagged traffic. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-28 16:03:59 -07:00
Ariel Levkovich	e4683f35f8	net/mlx5e: Move to HW checksumming advertising This patch changes the way the driver advertises its checksum offload capabilities within the net device features bit mask. Instead of advertising protocol specific checksumming capabilities which are limited today to IPv4 and IPv6, we move to reporing generic HW checksumming capabilities. This will allow the network stack to let mlx5 device offload checksum for cases where the IP header is encapsulated within another protocol and the skb->protocol doesn't indicate one of the IP versions protocol, specifically in the case of MPLS label encapsulating the IP header and the skb->protocol indiciates MPLS ethertype rather than IP. Moving the HW_CSUM reporting is required in the basic net device hw features mask and also in the extensions (vlan and encpasulation features) since the extensions are always multiplied by the basic features set during the packet's traversal through the stack's tx flow. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-28 16:03:59 -07:00
Gavi Teitz	e7e0bee8c5	net/mlx5: MPFS, Allow adding the same MAC more than once Remove the limitation preventing adding a vport's MAC address to the Multi-Physical Function Switch (MPFS) more than once per E-switch, as there is no difference in the MPFS if an address is being used by an E-switch more than once. This allows the E-switch to have multiple vports with the same MAC address, allowing vports to be classified by VLAN id instead of by MAC if desired. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-28 16:03:58 -07:00
Gavi Teitz	6311f30884	net/mlx5: MPFS, Cleanup add MAC flow Unify and isolate the error handling flow in mlx5_mpfs_add_mac(), removing code duplication. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-28 16:03:58 -07:00
Saeed Mahameed	4f5d1beadc	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Misc updates from mlx5-next branch: 1) E-Switch vport metadata support for source vport matching 2) Convert mkey_table to XArray 3) Shared IRQs and to use single IRQ for all async EQs Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-28 16:03:54 -07:00
David S. Miller	d7ee287827	Generic DIM From: Tal Gilboa and Yamin Fridman Implement net DIM over a generic DIM library, add RDMA DIM dim.h lib exposes an implementation of the DIM algorithm for dynamically-tuned interrupt moderation for networking interfaces. We want a similar functionality for other protocols, which might need to optimize interrupts differently. Main motivation here is DIM for NVMf storage protocol. Current DIM implementation prioritizes reducing interrupt overhead over latency. Also, in order to reduce DIM's own overhead, the algorithm might take some time to identify it needs to change profiles. While this is acceptable for networking, it might not work well on other scenarios. Here we propose a new structure to DIM. The idea is to allow a slightly modified functionality without the risk of breaking Net DIM behavior for netdev. We verified there are no degradations in current DIM behavior with the modified solution. Suggested solution: - Common logic is implemented in lib/dim/dim.c - Net DIM (existing) logic is implemented in lib/dim/net_dim.c, which uses the common logic in dim.c - Any new DIM logic will be implemented in "lib/dim/new_dim.c". This new implementation will expose modified versions of profiles, dim_step() and dim_decision(). - DIM API is declared in include/linux/dim.h for all implementations. Pros for this solution are: - Zero impact on existing net_dim implementation and usage - Relatively more code reuse (compared to two separate solutions) - Increased extensibility -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl0SiDAACgkQSD+KveBX +j4mlQgAoabbtxwIapMg62tKhnlr/3hEEyuEjiQRhzMUb5y5zOIJNRL82Nerv4K3 oD7v76emw/Xb1ErPe4yJJRdkpbpabEAFKbqkM0R7xFPNK33ZbphlU1d9YGgUKp1W Uieydn5kfubzi6p7R1TC7c2qW1Wr6rE1wnNJyC/sy8zFEP/OEH0NmgPDyOs7ckWI xaoJ1lhNWMoGN+Qk+fxazd1VtsKdcbc9JvLD2e8ON5qWdnuNiLhawcJVoWzfsCAa V/Tdff+Rj9bVFsi468u20og4EdV3ceAOTzMnhCYpAtYZHVbVk6MFgh87uQYkTrfY sW8BDmWKkRdHtzHd3a3yQiVDnz/iuw== =bAZn -----END PGP SIGNATURE----- Merge tag 'blk-dim-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mamameed says: ==================== Generic DIM From: Tal Gilboa and Yamin Fridman Implement net DIM over a generic DIM library, add RDMA DIM dim.h lib exposes an implementation of the DIM algorithm for dynamically-tuned interrupt moderation for networking interfaces. We want a similar functionality for other protocols, which might need to optimize interrupts differently. Main motivation here is DIM for NVMf storage protocol. Current DIM implementation prioritizes reducing interrupt overhead over latency. Also, in order to reduce DIM's own overhead, the algorithm might take some time to identify it needs to change profiles. While this is acceptable for networking, it might not work well on other scenarios. Here we propose a new structure to DIM. The idea is to allow a slightly modified functionality without the risk of breaking Net DIM behavior for netdev. We verified there are no degradations in current DIM behavior with the modified solution. Suggested solution: - Common logic is implemented in lib/dim/dim.c - Net DIM (existing) logic is implemented in lib/dim/net_dim.c, which uses the common logic in dim.c - Any new DIM logic will be implemented in "lib/dim/new_dim.c". This new implementation will expose modified versions of profiles, dim_step() and dim_decision(). - DIM API is declared in include/linux/dim.h for all implementations. Pros for this solution are: - Zero impact on existing net_dim implementation and usage - Relatively more code reuse (compared to two separate solutions) - Increased extensibility ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-27 12:42:51 -07:00
Jianbo Liu	92ab1eb392	net/mlx5: E-Switch, Enable vport metadata matching if firmware supports it As the ingress ACL rules save vhca id and vport number to packet's metadata REG_C_0, and the metadata matching for the rules in both fast path and slow path are all added, enable this feature if supported. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-26 12:01:29 -07:00
Jianbo Liu	a5641cb524	net/mlx5: E-Switch, Add match on vport metadata for rule in slow path In slow path, packet that not matched by any offloaded rule is forwarded to eswitch vport manager for further processing. Add matching on metadata for peer miss rules in FDB, and rules which forward packet to correct representor in esw manager NIC_RX table. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-26 12:01:29 -07:00
Jianbo Liu	c1286050cf	net/mlx5: E-Switch, Pass metadata from FDB to eswitch manager In order to do matching on metadata in slow path when demuxing traffic to representors, explicitly enable the feature that allows HW to pass metadata REG_C_0 from FDB to eswitch manager NIC_RX table. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-26 12:01:29 -07:00
Jianbo Liu	5784386870	net/mlx5: E-Switch, Add query and modify esw vport context functions Add esw vport query and modify functions, and exposing them is needed for enabling or disabling registers passed as metatdata to vport NIC_RX table in slow path. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-26 12:01:28 -07:00
Jianbo Liu	c01cfd0f11	net/mlx5: E-Switch, Add match on vport metadata for rule in fast path If FW's capabilities and configurations meet the requirement of vport metadata matching, this feature will be used. As the information about vport number and vhca_id related to packet is already stored to its metadata register, which is used as an indicator for perticular vport, now we can change to match on this metadata for all the offloading rules in fast path. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-26 12:01:28 -07:00
Jianbo Liu	8d212ff057	net/mlx5e: Specifying known origin of packets matching the flow In vport metadata matching, source port number is replaced by metadata. While FW has no idea about what it is in the metadata, a syndrome will happen. Specify a known origin to avoid the syndrome. However, there is no functional change because ANY_VPORT (0) is filled in flow_source, the same default value as before, as a pre-step towards metadata matching for fast path. There are two other values can be filled in flow_source. When setting 0x1, packet matching this rule is from uplink, while 0x2 is for packet from other local vports. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-26 12:01:28 -07:00
Jianbo Liu	7445cfb116	net/mlx5: E-Switch, Tag packet with vport number in VF vports and uplink ingress ACLs When a dual-port VHCA sends a RoCE packet on its non-native port, and the packet arrives to its affiliated vport FDB, a mismatch might occur on the rules that match the packet source vport as it is not represented by single VHCA only in this case. So we change to match on metadata instead of source vport. To do that, a rule is created in all vports and uplink ingress ACLs, to save the source vport number and vhca id in the packet's metadata in order to match on it later. The metadata register used is the first of the 32-bit type C registers. It can be used for matching and header modify operations. The higher 16 bits of this register are for vhca id, and the lower 16 ones is for vport number. This change is not for dual-port RoCE only. If HW and FW allow, the vport metadata matching is enabled by default. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-26 12:01:28 -07:00
Jianbo Liu	bb0ee7dcc4	net/mlx5: Add flow context for flow tag Refactor the flow data structures, add new flow_context and move flow_tag into it, as flow_tag doesn't belong to the rule action. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-26 12:01:28 -07:00
Parav Pandit	91d6291c4e	net/mlx5: Introduce a helper API to check VF vport Introduce a helper API mlx5_eswitch_is_vf_vport() to check if a given vport_num belongs to VF or not. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-26 12:01:28 -07:00
Jianbo Liu	84b0d6a7a1	net/mlx5: Support allocating modify header context from ingress ACL That modify header action can be then attached to a steering rule in the ingress ACL. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-26 12:01:28 -07:00
Jianbo Liu	f53297d678	net/mlx5: Get vport ACL namespace by vport index The ingress and egress ACL root namespaces are created per vport and stored into arrays. However, the vport number is not the same as the index. Passing the array index, instead of vport number, to get the correct ingress and egress acl namespace. Fixes: `9b93ab981e` ("net/mlx5: Separate ingress/egress namespaces for each vport") Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-26 12:01:28 -07:00
Tal Gilboa	4f75da3666	linux/dim: Move implementation to .c files Moved all logic from dim.h and net_dim.h to dim.c and net_dim.c. This is both more structurally appealing and would allow to only expose externally used functions. Signed-off-by: Tal Gilboa <talgi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-25 13:46:39 -07:00
Tal Gilboa	8960b38932	linux/dim: Rename externally used net_dim members Removed 'net' prefix from functions and structs used by external drivers. Signed-off-by: Tal Gilboa <talgi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-25 13:46:39 -07:00
Tal Gilboa	e5b6ab02d7	linux/dim: Rename net_dim_sample() to net_dim_update_sample() In order to avoid confusion between the function and the similarly named struct. In preparation for removing the 'net' prefix from dim members. Signed-off-by: Tal Gilboa <talgi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-25 13:46:39 -07:00
Tal Gilboa	c002bd529d	linux/dim: Rename externally exposed macros Renamed macros in use by external drivers. Signed-off-by: Tal Gilboa <talgi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-25 13:46:39 -07:00
Matthew Wilcox	792c4e9d0b	net/mlx5: Convert mkey_table to XArray The lock protecting the data structure does not need to be an rwlock. The only read access to the lock is in an error path, and if that's limiting your scalability, you have bigger performance problems. Eliminate mlx5_mkey_table in favour of using the xarray directly. reg_mr_callback must use GFP_ATOMIC for allocating XArray nodes as it may be called in interrupt context. This also fixes a minor bug where SRCU locking was being used on the radix tree read side, when RCU was needed too. Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-24 16:44:40 -07:00
Vadim Pasternak	f485cc36b0	mlxsw: core: Add support for negative temperature readout Extend macros MLXSW_REG_MTMP_TEMP_TO_MC() to allow support of negative temperature readout, since chip and others thermal components are capable of operating within the negative temperature. With no such support negative temperature will be consider as very high temperature and it will cause wrong readout and thermal shutdown. For negative values 2`s complement is used. Tested in chamber. Example of chip ambient temperature readout with chamber temperature: -10 Celsius: temp1: -6.0C (highest = -5.0C) -5 Celsius: temp1: -1.0C (highest = -1.0C) v2 (Andrew Lunn): * Replace '%u' with '%d' in mlxsw_hwmon_module_temp_show() Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-24 08:15:42 -07:00
Vadim Pasternak	6f73862fab	mlxsw: core: Add the hottest thermal zone detection When multiple sensors are mapped to the same cooling device, the cooling device should be set according the worst sensor from the sensors associated with this cooling device. Provide the hottest thermal zone detection and enforce cooling device to follow the temperature trends of the hottest zone only. Prevent competition for the cooling device control from others zones, by "stable trend" indication. A cooling device will not perform any actions associated with a zone with a "stable trend". When other thermal zone is detected as a hottest, a cooling device is to be switched to following temperature trends of new hottest zone. Thermal zone score is represented by 32 bits unsigned integer and calculated according to the next formula: For T < TZ<t><i>, where t from {normal trip = 0, high trip = 1, hot trip = 2, critical = 3}: TZ<i> score = (T + (TZ<t><i> - T) / 2) / (TZ<t><i> - T) * 256 ** j; Highest thermal zone score s is set as MAX(TZ<i>score); Following this formula, if TZ<i> is in trip point higher than TZ<k>, the higher score is to be always assigned to TZ<i>. For two thermal zones located at the same kind of trip point, the higher score will be assigned to the zone which is closer to the next trip point. Thus, the highest score will always be assigned objectively to the hottest thermal zone. All the thermal zones initially are to be configured with mode "enabled" with the "step_wise" governor. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-24 08:15:42 -07:00
Vadim Pasternak	f14f4e621b	mlxsw: core: Extend thermal core with per inter-connect device thermal zones Add a dedicated thermal zone for each inter-connect device. The current temperature is obtained from inter-connect temperature sensor and the default trip points are set to the same values as default ASIC trip points. These settings could be changed from the user space. A cooling device (fan) is bound to all inter-connect devices. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-24 08:15:41 -07:00
Jesper Dangaard Brouer	99c07c43c4	xdp: tracking page_pool resources and safe removal This patch is needed before we can allow drivers to use page_pool for DMA-mappings. Today with page_pool and XDP return API, it is possible to remove the page_pool object (from rhashtable), while there are still in-flight packet-pages. This is safely handled via RCU and failed lookups in __xdp_return() fallback to call put_page(), when page_pool object is gone. In-case page is still DMA mapped, this will result in page note getting correctly DMA unmapped. To solve this, the page_pool is extended with tracking in-flight pages. And XDP disconnect system queries page_pool and waits, via workqueue, for all in-flight pages to be returned. To avoid killing performance when tracking in-flight pages, the implement use two (unsigned) counters, that in placed on different cache-lines, and can be used to deduct in-flight packets. This is done by mapping the unsigned "sequence" counters onto signed Two's complement arithmetic operations. This is e.g. used by kernel's time_after macros, described in kernel commit `1ba3aab303` and `5a581b367b`, and also explained in RFC1982. The trick is these two incrementing counters only need to be read and compared, when checking if it's safe to free the page_pool structure. Which will only happen when driver have disconnected RX/alloc side. Thus, on a non-fast-path. It is chosen that page_pool tracking is also enabled for the non-DMA use-case, as this can be used for statistics later. After this patch, using page_pool requires more strict resource "release", e.g. via page_pool_release_page() that was introduced in this patchset, and previous patches implement/fix this more strict requirement. Drivers no-longer call page_pool_destroy(). Drivers already call xdp_rxq_info_unreg() which call xdp_rxq_info_unreg_mem_model(), which will attempt to disconnect the mem id, and if attempt fails schedule the disconnect for later via delayed workqueue. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-19 11:23:13 -04:00
Jesper Dangaard Brouer	29b006a676	mlx5: more strict use of page_pool API The mlx5 driver is using page_pool, but not for DMA-mapping (currently), and is a little too relaxed about returning or releasing page resources, as it is not strictly necessary, when not using DMA-mappings. As this patchset is working towards tracking page_pool resources, to know about in-flight frames on shutdown. Then fix places where mlx5 leak page_pool resource. In case of dma_mapping_error, then recycle into page_pool. In mlx5e_free_rq() moved the page_pool_destroy() call to after the mlx5e_page_release() calls, as it is more correct. In mlx5e_page_release() when no recycle was requested, then release page from the page_pool, via page_pool_release_page(). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-19 11:23:13 -04:00

1 2 3 4 5 ...

5197 Commits