Commit Graph

2081 Commits

Author SHA1 Message Date
Huy Nguyen
820c2c5e77 net/mlx5e: Read ETS settings directly from firmware
Issue description:
Current implementation saves the ETS settings from user in
a temporal soft copy and returns this settings when user
queries the ETS settings.

With the new DCBX firmware, the ETS settings can be changed
by firmware when the DCBX is in firmware controlled mode. Therefore,
user will obtain wrong values from the temporal soft copy.

Solution:
1. Read the ETS settings directly from firmware.
2. For tc_tsa:
   a. Initialize tc_tsa to vendor IEEE_8021QAZ_TSA_VENDOR at netdev
      creation.
   b. When reading ETS setting from FW, if the traffic class bandwidth
      is less than 100, set tc_tsa to IEEE_8021QAZ_TSA_ETS. This
      implementation solves the scenarios when the DCBX is in FW control
      and willing bit is on which means the ETS setting is dictated
      by remote switch.

Also check ETS capability where needed.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 15:09:34 -05:00
Huy Nguyen
3a6a931dfb net/mlx5e: Support DCBX CEE API
Add DCBX CEE API interface for ConnectX-4. Configurations are stored in
a temporary structure and are applied to the card's firmware when
the CEE's setall callback function is called.

Note:
  priority group in CEE is equivalent to traffic class in ConnectX-4
  hardware spec.

  bw allocation per priority in CEE is not supported because ConnectX-4
  only supports bw allocation per traffic class.

  user priority in CEE does not have an equivalent term in ConnectX-4.
  Therefore, user priority to priority mapping in CEE is not supported.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 15:09:34 -05:00
Huy Nguyen
80653f73c5 net/mlx5e: Add qos capability check
Make sure firmware supports qos before exposing the DCB API.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 15:09:34 -05:00
Eric Dumazet
b9972d2205 mlx4: do not use priv->stats_lock in mlx4_en_auto_moderation()
Per RX ring packets/bytes counters are not protected by global
priv->stats_lock.

Better not confuse the reader, and use READ_ONCE() to show we read
these counters without surrounding synchronization.

Interrupt moderation is best effort, and we do not really care of
ultra precise counters.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-27 15:26:15 -05:00
David S. Miller
0b42f25d2f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
udplite conflict is resolved by taking what 'net-next' did
which removed the backlog receive method assignment, since
it is no longer necessary.

Two entries were added to the non-priv ethtool operations
switch statement, one in 'net' and one in 'net-next, so
simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-26 23:42:21 -05:00
Nogah Frankel
9148e7cf73 mlxsw: spectrum: Add policers for trap groups
Configure policers and connect them to trap groups.

While many trap groups share policer's configuration they don't share
the actual policer because each trap group represents a different
flow / protocol and we don't want one of them to be able to exceed its
rate on behalf of another.
For example, if STP and LLDP gets to send 128 packets/sec each, if we
put them in one 256 packets/sec policer, one can send 200 packets while
the other only 50.

Note that IP2ME covers lots of flows, so it's limit is set to match the
cpu ability to handle data.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 21:22:14 -05:00
Nogah Frankel
76a4c7d32a mlxsw: reg: Add QoS Policer Configuration Register
The QPCR register is used to create and control policers.
A policer can discard or change the color of packets that are
trapped by a specific trap.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 21:22:14 -05:00
Nogah Frankel
2b77958bf4 mlxsw: resources: Add max cpu policers resource
Add a new resource to resources query: max cpu policers which tells us how
many policers can be used to limit the data rate to the cpu port.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 21:22:14 -05:00
Nogah Frankel
117b0dad2d mlxsw: Create a different trap group list for each device
Trap groups can be used to control traps priority, both in terms of
which trap "wins" if a packet matches two traps (priority) and in terms
of packets from which trap group will be scheduled to the cpu first (tc).
They can also be used to set rate limiters (policers) on them (will be
added in the next patches).

Currently, we support two trap groups. In Spectrum we want a better
resolution, so every protocol / flow will have a different trap group,
so we can control its parameters separately. Once the policers will be
implemented, it will also allow us limit the rate of each protocol by
itself.

This patch change the trap group list to include:
* the emad trap group, which is shared for all the devices.
* Switchx2's trap groups, which are a copy of the current trap groups.
* Spectrum's new trap groups, in order to match the above guidelines.
(Switchib is using only the emad trap group, so it require no changes).

This patch also includes new configuration for Spectrum's trap groups,
with primary priority order within them.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 21:22:14 -05:00
Nogah Frankel
616d8040e5 mlxsw: spectrum: Add BGP trap
Add a trap for BGP protocol that was previously trapped by the generic trap
for IP2ME. This trap will allow us to have better control (over priority
and rate) of the traffic.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 21:22:14 -05:00
Nogah Frankel
579c82e4c5 mlxsw: Change trap groups setting
Trap groups have many options which we currently set to default values.
In the next patches we will use many of them with non-default values.

Some of these options have no default value, so this patch sets them as
params for the trap group set function. Others almost always use the same
values, so the set function will use this default values. In the rare cases
when they will need to be with other values, these values can be set
directly (using the macros for fields in registers).

Parameters without default value:
TC - the traffic class for packets that hit this trap group.
    (old default is the max tc)
priority - if one packet hits multiple trap groups, the group with the
	   higher priority will "catch" it. (old default is 0)
policer - limit rate policer (old default is disabled)

Default parameters:
swid - switch id, relevant for the emad trap only, ignored on Spectrum.
       (new default is 0)
rdq - CPU receive descriptor queue (new default is identical to trap
      group id)

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 21:22:14 -05:00
Nogah Frankel
23432cb86a mlxsw: resources: Add max trap groups resource
Add the max number of trap groups to resource query.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 21:22:14 -05:00
Nogah Frankel
9d87fceac6 mlxsw: core: Change emad trap group settings
Currently, the emad trap init was done in the core. In the future we will
want to add some changes to the traps groups, according to device type.
This commit create a driver function to create the trap group for the
emad, so later it can be changed by devices. It also changes the emad
registration to use the new generic functions.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 21:22:14 -05:00
Nogah Frankel
0fb78a4e9c mlxsw: Add option to choose trap group
Currently, we set the trap group to pre-determined option, based on whether
it is an rx or event trap.
This commit adds a possibility to chose the trap group, so it can be set
to different values in the following patches.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 21:22:14 -05:00
Nogah Frankel
d570b7ee4e mlxsw: Change trap set function
Change trap setting function so instead of determining the trap group by
trap id, it gets it as a parameter (so later we can have different trap
groups for Spectrum and Switchx2).
Add "is_ctrl" parameter to the trap setting function. It control whether
the trapped packets wait in a designated control buffer or in their
default one. This parameter is ignored by Switchx2 and Switchib.
Add these parameters to the traps array in Spectrum, Switchx2 and
Switchib.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 21:22:14 -05:00
Nogah Frankel
85d5c9cd90 mlxsw: switchib: Use generic listener struct for events
Change the event handling in Switchib to be comptible with Spectrum and
Switchx2.
Use the generic listener struct for the events. Init and fini them by loop
(and not by calling each event by its name).

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 21:22:14 -05:00
Nogah Frankel
6bf08b53ed mlxsw: switchx2: Use generic listener struct for events
Change the events to use the generic listener struct.
Merge the event list into the trap list, so the same functions will
handle both.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 21:22:14 -05:00
Nogah Frankel
4544913ed7 mlxsw: spectrum: Use generic listener struct for events
Change the events to use the generic listener struct.
Merge the event list into the trap list, so the same functions will
handle both.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 21:22:14 -05:00
Nogah Frankel
fb9012d93f mlxsw: core: Introduce generic macro for event
Create a macro for creating the generic listener struct for events,
similar to the one for rx traps.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 21:22:14 -05:00
Nogah Frankel
2332d8c7df mlxsw: switchx2: Use generic listener struct for rx traps
Reorganize the traps to use the new generic listener struct and
functions. Use macros to shorten the traps list.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 21:22:14 -05:00
Nogah Frankel
14eeda99c4 mlxsw: spectrum: Use generic listener struct for rx traps
Replace the old rx listener struct definitions by the generic ones.
Use the new generic registering / unregistering functions for them.
Add some macros to organize the trap list.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 21:22:14 -05:00
Nogah Frankel
b63da93de8 mlxsw: core: Expose generic macros for rx trap
In Spectrum, there is a macro to arrange the traps list.
This macro is useful for everyone who is using rx traps.
Create a similar macro in core.h for creating the generic listener struct
for rx traps.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 21:22:14 -05:00
Nogah Frankel
0791051c43 mlxsw: core: Create a generic function to register / unregister traps
We have 2 types of HW traps to handle, rx traps and events.
The registration workflow for both is very similar. So it only make
sense to create one function to handle both.

This patch creates a struct to hold the data for both cases. It also
creates a registration and an un-registration functions that get this
generic struct as input.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 21:22:14 -05:00
Nogah Frankel
ee4a60d898 mlxsw: spectrum: Remove unused traps
Since commit 99724c18fc ("mlxsw: spectrum: Introduce support for
router interfaces") we no longer rely on flooding traffic to the CPU in
order to trap packets intended for the host itself. Therefore, the FDB
MC trap can be removed.
Remove traps for protocols that are not supported yet.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 21:22:14 -05:00
Geliang Tang
5e7dfeb758 net/mlx5: drop duplicate header delay.h
Drop duplicate header delay.h from mlx5/core/main.c.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Acked-by: Matan Barak <matanb@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 20:33:10 -05:00
Dan Carpenter
eafa6abd99 net/mlx5: remove a duplicate condition
We verified that MLX5_FLOW_CONTEXT_ACTION_COUNT was set on the first
line of the function so we don't need to check again here.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-25 20:28:28 -05:00
Eric Dumazet
e3f42f8453 mlx4: reorganize struct mlx4_en_tx_ring
Goal is to reorganize this critical structure to increase performance.

ndo_start_xmit() should only dirty one cache line, and access as few
cache lines as possible.

Add sp_ (Slow Path) prefix to fields that are not used in fast path,
to make clear what is going on.

After this patch pahole reports something much better, as all
ndo_start_xmit() needed fields are packed into two cache lines instead
of seven or eight

struct mlx4_en_tx_ring {
	u32                        last_nr_txbb;         /*     0   0x4 */
	u32                        cons;                 /*   0x4   0x4 */
	long unsigned int          wake_queue;           /*   0x8   0x8 */
	struct netdev_queue *      tx_queue;             /*  0x10   0x8 */
	u32                        (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /*  0x18   0x8 */
	struct mlx4_en_rx_ring *   recycle_ring;         /*  0x20   0x8 */

	/* XXX 24 bytes hole, try to pack */

	/* --- cacheline 1 boundary (64 bytes) --- */
	u32                        prod;                 /*  0x40   0x4 */
	unsigned int               tx_dropped;           /*  0x44   0x4 */
	long unsigned int          bytes;                /*  0x48   0x8 */
	long unsigned int          packets;              /*  0x50   0x8 */
	long unsigned int          tx_csum;              /*  0x58   0x8 */
	long unsigned int          tso_packets;          /*  0x60   0x8 */
	long unsigned int          xmit_more;            /*  0x68   0x8 */
	struct mlx4_bf             bf;                   /*  0x70  0x18 */
	/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
	__be32                     doorbell_qpn;         /*  0x88   0x4 */
	__be32                     mr_key;               /*  0x8c   0x4 */
	u32                        size;                 /*  0x90   0x4 */
	u32                        size_mask;            /*  0x94   0x4 */
	u32                        full_size;            /*  0x98   0x4 */
	u32                        buf_size;             /*  0x9c   0x4 */
	void *                     buf;                  /*  0xa0   0x8 */
	struct mlx4_en_tx_info *   tx_info;              /*  0xa8   0x8 */
	int                        qpn;                  /*  0xb0   0x4 */
	u8                         queue_index;          /*  0xb4   0x1 */
	bool                       bf_enabled;           /*  0xb5   0x1 */
	bool                       bf_alloced;           /*  0xb6   0x1 */
	u8                         hwtstamp_tx_type;     /*  0xb7   0x1 */
	u8 *                       bounce_buf;           /*  0xb8   0x8 */
	/* --- cacheline 3 boundary (192 bytes) --- */
	long unsigned int          queue_stopped;        /*  0xc0   0x8 */
	struct mlx4_hwq_resources  sp_wqres;             /*  0xc8  0x58 */
	/* --- cacheline 4 boundary (256 bytes) was 32 bytes ago --- */
	struct mlx4_qp             sp_qp;                /* 0x120  0x30 */
	/* --- cacheline 5 boundary (320 bytes) was 16 bytes ago --- */
	struct mlx4_qp_context     sp_context;           /* 0x150  0xf8 */
	/* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */
	cpumask_t                  sp_affinity_mask;     /* 0x248  0x20 */
	enum mlx4_qp_state         sp_qp_state;          /* 0x268   0x4 */
	u16                        sp_stride;            /* 0x26c   0x2 */
	u16                        sp_cqn;               /* 0x26e   0x2 */

	/* size: 640, cachelines: 10, members: 36 */
	/* sum members: 600, holes: 1, sum holes: 24 */
	/* padding: 16 */
};

Instead of this silly placement :

struct mlx4_en_tx_ring {
	u32                        last_nr_txbb;         /*     0   0x4 */
	u32                        cons;                 /*   0x4   0x4 */
	long unsigned int          wake_queue;           /*   0x8   0x8 */

	/* XXX 48 bytes hole, try to pack */

	/* --- cacheline 1 boundary (64 bytes) --- */
	u32                        prod;                 /*  0x40   0x4 */

	/* XXX 4 bytes hole, try to pack */

	long unsigned int          bytes;                /*  0x48   0x8 */
	long unsigned int          packets;              /*  0x50   0x8 */
	long unsigned int          tx_csum;              /*  0x58   0x8 */
	long unsigned int          tso_packets;          /*  0x60   0x8 */
	long unsigned int          xmit_more;            /*  0x68   0x8 */
	unsigned int               tx_dropped;           /*  0x70   0x4 */

	/* XXX 4 bytes hole, try to pack */

	struct mlx4_bf             bf;                   /*  0x78  0x18 */
	/* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */
	long unsigned int          queue_stopped;        /*  0x90   0x8 */
	cpumask_t                  affinity_mask;        /*  0x98  0x10 */
	struct mlx4_qp             qp;                   /*  0xa8  0x30 */
	/* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */
	struct mlx4_hwq_resources  wqres;                /*  0xd8  0x58 */
	/* --- cacheline 4 boundary (256 bytes) was 48 bytes ago --- */
	u32                        size;                 /* 0x130   0x4 */
	u32                        size_mask;            /* 0x134   0x4 */
	u16                        stride;               /* 0x138   0x2 */

	/* XXX 2 bytes hole, try to pack */

	u32                        full_size;            /* 0x13c   0x4 */
	/* --- cacheline 5 boundary (320 bytes) --- */
	u16                        cqn;                  /* 0x140   0x2 */

	/* XXX 2 bytes hole, try to pack */

	u32                        buf_size;             /* 0x144   0x4 */
	__be32                     doorbell_qpn;         /* 0x148   0x4 */
	__be32                     mr_key;               /* 0x14c   0x4 */
	void *                     buf;                  /* 0x150   0x8 */
	struct mlx4_en_tx_info *   tx_info;              /* 0x158   0x8 */
	struct mlx4_en_rx_ring *   recycle_ring;         /* 0x160   0x8 */
	u32                        (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /* 0x168   0x8 */
	u8 *                       bounce_buf;           /* 0x170   0x8 */
	struct mlx4_qp_context     context;              /* 0x178  0xf8 */
	/* --- cacheline 9 boundary (576 bytes) was 48 bytes ago --- */
	int                        qpn;                  /* 0x270   0x4 */
	enum mlx4_qp_state         qp_state;             /* 0x274   0x4 */
	u8                         queue_index;          /* 0x278   0x1 */
	bool                       bf_enabled;           /* 0x279   0x1 */
	bool                       bf_alloced;           /* 0x27a   0x1 */

	/* XXX 5 bytes hole, try to pack */

	/* --- cacheline 10 boundary (640 bytes) --- */
	struct netdev_queue *      tx_queue;             /* 0x280   0x8 */
	int                        hwtstamp_tx_type;     /* 0x288   0x4 */

	/* size: 704, cachelines: 11, members: 36 */
	/* sum members: 587, holes: 6, sum holes: 65 */
	/* padding: 52 */
};

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24 16:03:37 -05:00
Roi Dayan
de0af0bf64 net/mlx5e: Enforce min inline mode when offloading flows
A flow should be offloaded only if the matches are
allowed according to min inline mode.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24 16:01:14 -05:00
Roi Dayan
bffaa91658 net/mlx5: E-Switch, Add control for inline mode
Implement devlink show and set of HW inline-mode.
The supported modes: none, link, network, transport.
We currently support one mode for all vports so set is done on all vports.
When eswitch is first initialized the inline-mode is queried from the FW.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24 16:01:14 -05:00
Roi Dayan
34e4e99078 net/mlx5: Enable to query min inline for a specific vport
Also move the inline capablities enum to a shared header vport.h

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24 16:01:14 -05:00
Or Gerlitz
20a1ea6747 net/mlx5e: Support VF vport link state control for SRIOV switchdev mode
Reflect the administative link changes done on the VF representor to the
VF e-switch vport. This means that doing ip link set down/up commands on
the VF rep will modify the e-switch vport state which in turn will make
proper VF drivers to set their carrier accordingly.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24 16:01:14 -05:00
Or Gerlitz
370bad0f9a net/mlx5e: Support HW (offloaded) and SW counters for SRIOV switchdev mode
Switchdev driver net-device port statistics should follow the model introduced
in commit a5ea31f573 'Merge branch net-offloaded-stats'.

For VF reps we return the SRIOV eswitch vport stats as the usual ones and SW stats
if asked. For the PF, if we're in the switchdev mode, we return the uplink stats
and SW stats if asked, otherwise as before. The uplink stats are implemented using
the PPCNT 802_3 counters which are already being read/cached by the driver.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24 16:01:14 -05:00
Or Gerlitz
3df5b3c675 net: Add net-device param to the get offloaded stats ndo
Some drivers would need to check few internal matters for
that. To be used in downstream mlx5 commit.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24 16:01:14 -05:00
Tariq Toukan
b6e01232e2 net/mlx4_en: Free netdev resources under state lock
Make sure mlx4_en_free_resources is called under the netdev state lock.
This is needed since RCU dereference of XDP prog should be protected.

Fixes: 326fe02d1e ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Sagi Grimberg <sagi@grimberg.me>
CC: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-23 20:18:36 -05:00
Ivan Vecera
a50c1e3565 mlxsw: core: Implement thermal zone
Implement thermal zone for mlxsw based HW. It uses temperature sensor
provided by ASIC (the same as mlxsw hwmon interface) to report current
temp to thermal core. The ASIC's PWM is then used to control speed
of system fans registered as cooling devices.

Signed-off-by: Ivan Vecera <cera@cera.cz>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22 10:04:19 -05:00
Jiri Pirko
55c63aaa69 mlxsw: reg: Add Management Fan Speed Limit register
The MFSL register is used to configure the fan speed event / interrupt
notification mechanism. Fan speed threshold are defined for both
under-speed and over-speed.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22 10:04:19 -05:00
Eric Dumazet
dad42c3038 mlx4: avoid unnecessary dirtying of critical fields
While stressing a 40Gbit mlx4 NIC with busy polling, I found false
sharing in mlx4 driver that can be easily avoided.

This patch brings an additional 7 % performance improvement in UDP_RR
workload.

1) If we received no frame during one mlx4_en_process_rx_cq()
   invocation, no need to call mlx4_cq_set_ci() and/or dirty ring->cons

2) Do not refill rx buffers if we have plenty of them.
   This avoids false sharing and allows some bulk/batch optimizations.
   Page allocator and its locks will thank us.

Finally, mlx4_en_poll_rx_cq() should not return 0 if it determined
cpu handling NIC IRQ should be changed. We should return budget-1
instead, to not fool net_rx_action() and its netdev_budget.

v2: keep AVG_PERF_COUNTER(... polled) even if polled is 0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-21 11:33:31 -05:00
Daniel Borkmann
a055c19be9 bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup
mlx5e_xdp_set() is currently the only place where we drop reference on the
prog sitting in priv->xdp_prog when it's exchanged by a new one. We also
need to make sure that we eventually release that reference, for example,
in case the netdev is dismantled, otherwise we leak the program.

Fixes: 86994156c7 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-21 11:25:58 -05:00
Daniel Borkmann
c54c062904 bpf, mlx5: fix various refcount issues in mlx5e_xdp_set
There are multiple issues in mlx5e_xdp_set():

1) The batched bpf_prog_add() is currently not checked for errors. When
   doing so, it should be done at an earlier point in time to makes sure
   that we cannot fail anymore at the time we want to set the program for
   each channel. The batched refs short-cut can only be performed when we
   don't need to perform a reset for changing the rq type and the device
   was in opened state. In case the device was not in opened state, then
   the next mlx5e_open_locked() will aquire the refs from the control prog
   via mlx5e_create_rq(), same when we need to perform a reset.

2) When swapping the priv->xdp_prog, then no extra reference count must be
   taken since we got that from call path via dev_change_xdp_fd() already.
   Otherwise, we'd never be able to release the program. Also, bpf_prog_add()
   without checking the return code could fail.

Fixes: 86994156c7 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-21 11:25:58 -05:00
Daniel Borkmann
97bc402db7 bpf, mlx5: fix mlx5e_create_rq taking reference on prog
In mlx5e_create_rq(), when creating a new queue, we call bpf_prog_add() but
without checking the return value. bpf_prog_add() can fail since 92117d8443
("bpf: fix refcnt overflow"), so we really must check it. Take the reference
right when we assign it to the rq from priv->xdp_prog, and just drop the
reference on error path. Destruction in mlx5e_destroy_rq() looks good, though.

Fixes: 86994156c7 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-21 11:25:57 -05:00
Gal Pressman
9c7262399b net/mlx5e: Expose PCIe statistics to ethtool
This patch exposes two groups of PCIe counters:
- Performance counters.
- Timers and states counters.
Queried with ethtool -S <devname>.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18 12:08:58 -05:00
Huy Nguyen
012e50e109 net/mlx5: Set driver version into firmware
If driver_version capability bit is enabled, set driver version
to firmware after the init HCA command, for display purposes.

Example of driver version: "Linux,mlx5_core,3.0-1"

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18 12:08:57 -05:00
Huy Nguyen
bedb7c909c net/mlx5e: Add port module event counters to ethtool stats
Add port module event counters to ethtool -S command

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18 12:08:57 -05:00
Huy Nguyen
d4eb4cd78b net/mlx5: Add handling for port module event
For each asynchronous port module event:
  1. print with ratelimit to the dmesg log
  2. increment the corresponding event counter

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18 12:08:57 -05:00
Mohamad Haj Yahia
0ac3ea7089 net/mlx5: Make the command interface cache more flexible
Add more cache command size sets and more entries for each set based on
the current commands set different sizes and commands frequency.

Fixes: e126ba97db ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18 12:08:56 -05:00
Eric Dumazet
d30d9ccbfa net/mlx5e: remove napi_hash_del() calls
Calling napi_hash_del() after netif_napi_del() is pointless.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-17 12:06:20 -05:00
Eric Dumazet
bb07fafaec net/mlx4_en: remove napi_hash_del() call
There is no need calling napi_hash_del()+synchronize_rcu() before
calling netif_napi_del()

netif_napi_del() does this already.

Using napi_hash_del() in a driver is useful only when dealing with
a batch of NAPI structures, so that a single synchronize_rcu() can
be used. mlx4_en_deactivate_cq() is deactivating a single NAPI.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-17 11:17:05 -05:00
Vadim Pasternak
d556e92916 mlxsw: minimal: Add I2C support for Mellanox ASICs
Add I2C access support for Mellanox ASICs:
- Virtual Protocol Interconnect switches SwitchX, SwitchX2,
  providing InfiniBand, Ethernet and Fibre Channel connectivity;
- Infiniband switches SwitchIB, SwitchIB2:
- Ethernet switch Spectrum.

Example of probing activation:
echo mlxsw_minimal 0x48 > /sys/bus/i2c/devices/i2c-2/new_device

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 23:29:04 -05:00
Vadim Pasternak
4ebd00bc5f mlxsw: Invoke driver's init/fini methods only if defined
We are going to add a minimal driver on top of the mlxsw core
infrastructure, which will be mainly used for hardware monitoring in
Baseboard management controller (BMC) installations.

Unlike the switch drivers (e.g., spectrum, switchx2), this driver does not
initialize the ASIC and therefore doesn't need to implement the init() and
fini() methods in its 'mlxsw_driver' struct.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 23:29:04 -05:00
Vadim Pasternak
6882b0aee1 mlxsw: Introduce support for I2C bus
Add I2C bus implementation for Mellanox Technologies Switch ASICs.
This includes command interface implementation using input / out mailboxes,
whose location is retrieved from the firmware during probe time.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 23:29:04 -05:00