linux

Author	SHA1	Message	Date
Matthew Finlay	69976fb104	net/mlx5: Kconfig: Fix MLX5_EN/VXLAN build issue When MLX5_EN=y MLX5_CORE=y and VXLAN=m there is a linker error for vxlan_get_rx_port() due to the fact that VXLAN is a module. Change Kconfig to select VXLAN when MLX5_CORE=y. When MLX5_CORE=m there is no dependency on the value of VXLAN. Fixes: `b3f63c3d5e` ('net/mlx5e: Add netdev support for VXLAN tunneling') Signed-off-by: Matthew Finlay <matt@mellanox.com> Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-03 13:37:26 -04:00
Gal Pressman	5f8a02a441	net/mlx5: Unmap only the relevant IO memory mapping When freeing UAR the driver tries to unmap uar->map and uar->bf_map which are mutually exclusive thus always unmapping a NULL pointer. Make sure we only call iounmap() once, for the actual mapping. Fixes: `0ba422410b` ('net/mlx5: Fix global UAR mapping') Signed-off-by: Gal Pressman <galp@mellanox.com> Reported-by: Doron Tsur <doront@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-03 13:37:25 -04:00
Maor Gottlieb	45bf454ae8	net/mlx5e: Enabling aRFS mechanism Accelerated RFS requires that ntuple filtering is enabled via ethtool and driver supports ndo_rx_flow_steer. When the ntuple filtering is enabled, we modify the l3_l4 ttc rules to point on the aRFS flow tables and when the filtering is disabled, we modify the l3_l4 ttc rules to point on the RSS TIRs. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:12 -04:00
Maor Gottlieb	18c908e477	net/mlx5e: Add accelerated RFS support Implement ndo_rx_flow_steer ndo. A new flow steering rule will be composed from the skb 4-tuple and added to the hardware aRFS flow table. Each rule is stored in an internal hash table, if such skb 4-tuple rule already exists we update the corresponding hardware steering rule with the new destination. For garbage collection rps_may_expire_flow will be invoked for a limited amount of old rules upon any ndo_rx_flow_steer invocation. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:11 -04:00
Maor Gottlieb	1cabe6b096	net/mlx5e: Create aRFS flow tables Create the following four flow tables for aRFS usage: 1. IPv4 TCP - filtering 4-tuple of IPv4 TCP packets. 2. IPv6 TCP - filtering 4-tuple of IPv6 TCP packets. 3. IPv4 UDP - filtering 4-tuple of IPv4 UDP packets. 4. IPv6 UDP - filtering 4-tuple of IPv6 UDP packets. Each flow table has two flow groups: one for the 4-tuple filtering (full match) and the other contains * rule for miss rule. Full match rule means a hit for aRFS and packet will be forwarded to the dedicated RQ/Core, miss rule packets will be forwarded to default RSS hashing. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:11 -04:00
Maor Gottlieb	5a7b27eb9c	net/mlx5: Initializing CPU reverse mapping Allocating CPU rmap and add entry for each IRQ. CPU rmap is used in aRFS to get the RX queue number of the RX completion interrupts. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:11 -04:00
Maor Gottlieb	33cfaaa8f3	net/mlx5e: Split the main flow steering table Currently, the main flow table is used for two purposes: One is to do mac filtering and the other is to classify the packet l3-l4 header in order to steer the packet to the right RSS TIR. This design is very complex, for each configured mac address we have to add eleven rules (rule for each traffic type), the same if the device is put to promiscuous/allmulti mode. This scheme isn't scalable for future features like aRFS. In order to simplify it, the main flow table is split to two flow tables: 1. l2 table - filter the packet dmac address, if there is a match we forward to the ttc flow table. 2. TTC (Traffic Type Classifier) table - classify the traffic type of the packet and steer the packet to the right TIR. In this new design, when new mac address is added, the driver adds only one flow rule instead of eleven. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:11 -04:00
Maor Gottlieb	acff797cd1	net/mlx5e: Refactor mlx5e flow steering structs Slightly refactor and re-order the flow steering structs, tables and data-bases for better self-containment and flexibility to add more future steering phases (tables/rules/data bases) e.g: aRFS. Changes: 1. Move the vlan DB and address DB into their table structs. 2. Rename steering table structs to unique format: mlx5e_*_table, e.g: mlx5e_vlan_table. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:10 -04:00
Maor Gottlieb	13de6c106c	net/mlx5: Support different attributes for priorities in namespace Currently, namespace could be initialized only with priorities with the same attributes. Add support to initialize namespace with priorities with different attributes(e.g. different number of levels). Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:10 -04:00
Maor Gottlieb	d63cd28608	net/mlx5: Add user chosen levels when allocating flow tables Currently, consumers of the flow steering infrastructure can't choose their own flow table levels and are limited to one flow table per level. This just waste levels. Instead, we introduce here the possibility to use multiple flow tables in a level. The user is free to connect these flow tables, while following the rule (FTEs in FT of level x could only point to FTs of level y where y > x). In addition this patch switch the order of the create/destroy flow tables of the NIC(vlan and main). Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:09 -04:00
Maor Gottlieb	a257b94a18	net/mlx5: Set number of allowed levels in priority Refactors the flow steering namespace creation, by changing the name num_fts to num_levels. When new flow table is created, the driver assign new level to this flow table therefore the meaning is equivalent. Since downstream patches will introduce the ability to create more than one flow table per level, the name num_fts is no longer accurate. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:09 -04:00
Maor Gottlieb	d745098ced	net/mlx5: Introduce modify flow rule destination This API is used for modifying the flow rule destination. This is needed for modifying the pointed flow table by the traffic type classifier rules to point on the aRFS tables. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:08 -04:00
Tariq Toukan	1da366964e	net/mlx5e: Direct TIR per RQ Introduce new TIRs for direct access per RQ. Now we have 2 available kinds of TIRs: - indirect TIR per traffic type, each points to one RQT (RSS RQT) same as before. - New direct TIR per RQ, each points to RQT with a size of one that forwards packets to that RQ only. Driver will open max channels (num cores) direct TIRs by default, they will be filled with the actual RQs once channels are allocated. Needed for downstream aRFS and ethtool direct steering functionalities. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:08 -04:00
Matthew Finlay	01a14098d3	net/mlx5e: Call vxlan_get_rx_port() with rtnl lock Hold the rtnl lock when calling vxlan_get_rx_port(). Fixes: `b7aade1548` ("vxlan: break dependency with netdev drivers") Signed-off-by: Matthew Finlay <matt@mellanox.com> Reported-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:08 -04:00
Arnd Bergmann	6b87663fbe	net/mlx5e: avoid stack overflow in mlx5e_open_channels struct mlx5e_channel_param is a large structure that is allocated on the stack of mlx5e_open_channels, and with a recent change it has grown beyond the warning size for the maximum stack that a single function should use: mellanox/mlx5/core/en_main.c: In function 'mlx5e_open_channels': mellanox/mlx5/core/en_main.c:1325:1: error: the frame size of 1072 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] The function is already using dynamic allocation and is not in a fast path, so the easiest workaround is to use another kzalloc for allocating the channel parameters. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `d3c9bc2743` ("net/mlx5e: Added ICO SQs") Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-28 16:46:59 -04:00
Masanari Iida	c01e01597c	treewide: Fix typos in printk This patch fix spelling typos in printk from various part of the codes. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2016-04-28 10:52:28 +02:00
David S. Miller	c0cc53162a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Minor overlapping changes in the conflicts. In the macsec case, the change of the default ID macro name overlapped with the 64-bit netlink attribute alignment fixes in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 15:43:10 -04:00
Saeed Mahameed	1b223dd391	net/mlx5e: Fix checksum handling for non-stripped vlan packets Now as rx-vlan offload can be disabled, packets can be received with vlan tag not stripped, which means is_first_ethertype_ip will return false, for that we need to check if the hardware reported csum OK so we will report CHECKSUM_UNNECESSARY for those packets. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:03 -04:00
Gal Pressman	363501145e	net/mlx5e: Add ethtool support for rxvlan-offload (vlan stripping) Use ethtool -K <interface> rxvlan <on/off> to enable/disable C-TAG vlan stripping by hardware. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:02 -04:00
Gal Pressman	bb64143eee	net/mlx5e: Add ethtool support for dump module EEPROM Add query MCIA, PMLP registers infrastructure and commands. Add ethtool support for get_module_info() and get_module_eeprom() callbacks. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:02 -04:00
Gal Pressman	da54d24ec3	net/mlx5e: Add ethtool support for interface identify (LED blinking) Add the needed hardware command and mlx5_ifc structs for managing LED control. Add set_phys_id ethtool callback to support ethtool -p flag. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:02 -04:00
Eran Ben Elisha	94cb1ebbaf	net/mlx5e: Add support for RXALL netdev feature Introduce new access register named Ports Check Mask Register (PCMR) to control all HW checks on port. With this register, the driver can enable/disable Hardware FCS validation. When RXALL is enabled/disabled using ndo_set_features, enable/disable fcs check at HW. User can change HW configuration using rx-all flag at ethtool. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:02 -04:00
Gal Pressman	0e405443e8	net/mlx5e: Improve set features ndo resiliency In current mlx5e ndo_set_features implementation, setting some features can success while others can fail. Today, we return one error code which doesn't reflect the current features status of the netdev at the end of the ndo callback. Set netdev->features with features which were successfully set in order to keep the current status in case of failure. For this purpose, define new Macro to set/unset specific feature in netdev->features. This patch introduces a mechanism that uses feature handlers for each feature. Set features will call a generic handler, which will then call a specific handler in his turn and update netdev->features according to it's return value. Each specific handler is responsible to perform driver specific actions, and updating params if needed. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:01 -04:00
Gal Pressman	121fcdc84d	net/mlx5e: Add link down events counter Expose link_down_events counter through ethtool -S. This counter is read from PPort statistics, then proccessed and stored as a special handling software counter. This counter is stored along software counters since it is the only PPort counter that it's size is not 64 bits. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:01 -04:00
Gal Pressman	cf678570d5	net/mlx5e: Add per priority group to PPort counters Expose counters providing information for each priority level (PCP) through ethtool -S option and DCBNL. This includes rx/tx bytes, frames, and pause counters. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:01 -04:00
Gal Pressman	8075cb7238	net/mlx5e: Rename VPort counters VPort and software counters names are confusing and may be unclear, all VPort counters now have a prefix of rx/tx_vport_*. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:01 -04:00
Gal Pressman	9218b44dcc	net/mlx5e: Statistics handling refactoring Redesign ethtool statistics handling and reporting in the driver: 1. Move counters to a separate file (en_stats.h). 2. Remove unnecessary dependencies between stats and strings. 3. Use counter descriptors which hold a name and offset for each counter, and will be used to decide which counters will be exposed. For example when adding a new software counter to ethtool, instead of: 1. Add to stats struct. 2. Add to strings struct in the same order. 3. Change macro defining number of software counters. The only thing needed is to link the new counter to a counter descriptor. VPort counters are a set of hardware traffic counters created automatically for each virtual port opened. PPort counters are a set of counters describing per physical port performance statistics. These counters are gathered from hardware register and divided to groups according to different protocols. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:01 -04:00
Gal Pressman	269e6b3af3	net/mlx5e: Report additional error statistics in get stats ndo Provide rtnl_link_stats64 with information regarding physical errors to be seen in ifconfig and ip tool. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:00 -04:00
Eric Dumazet	fc96256c90	net/mlx4_en: fix spurious timestamping callbacks When multiple skb are TX-completed in a row, we might incorrectly keep a timestamp of a prior skb and cause extra work. Fixes: `ec693d4701` ("net/mlx4_en: Add HW timestamping (TS) support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 01:13:18 -04:00
Majd Dibbiny	5fc7197d3a	net/mlx5: Add pci shutdown callback This patch introduces kexec support for mlx5. When switching kernels, kexec() calls shutdown, which unloads the driver and cleans its resources. In addition, remove unregister netdev from shutdown flow. This will allow a clean shutdown, even if some netdev clients did not release their reference from this netdev. Releasing The HW resources only is enough as the kernel is shutting down Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:51:39 -04:00
Eli Cohen	78228cbdeb	net/mlx5_core: Remove static from local variable The static is not required and breaks re-entrancy if it will be required. Fixes: `2530236303` ("net/mlx5_core: Flow steering tree initialization") Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:51:39 -04:00
Saeed Mahameed	cd255efff9	net/mlx5e: Use vport MTU rather than physical port MTU Set and report vport MTU rather than physical MTU, Driver will set both vport and physical port mtu and will rely on the query of vport mtu. SRIOV VFs have to report their MTU to their vport manager (PF), and this will allow them to work with any MTU they need without failing the request. Also for some cases where the PF is not a port owner, PF can work with MTU less than the physical port mtu if set physical port mtu didn't take effect. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:51:39 -04:00
Saeed Mahameed	d8edd2469a	net/mlx5e: Fix minimum MTU Minimum MTU that can be set in Connectx4 device is 68. This fixes the case where a user wants to set invalid MTU, the driver will fail to satisfy this request and the interface will stay down. It is better to report an error and continue working with old mtu. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:51:39 -04:00
Saeed Mahameed	046339eaab	net/mlx5e: Device's mtu field is u16 and not int For set/query MTU port firmware commands the MTU field is 16 bits, here I changed all the "int mtu" parameters of the functions wrapping those firmware commands to be u16. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:51:38 -04:00
Majd Dibbiny	64dbbdfef2	net/mlx5_core: Add ConnectX-5 to list of supported devices Add the upcoming ConnectX-5 devices (PF and VF) to the list of supported devices by the mlx5 driver. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:51:38 -04:00
Rana Shahout	6e4c218946	net/mlx5e: Fix MLX5E_100BASE_T define Bit 25 of eth_proto_capability in PTYS register is 1000Base-TT and not 100Base-T. Fixes: `f62b8bb8f2` ('net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality') Signed-off-by: Rana Shahout <ranas@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:51:38 -04:00
Maor Gottlieb	c3f9bf628b	net/mlx5_core: Fix soft lockup in steering error flow In the error flow of adding flow rule to auto-grouped flow table, we call to tree_remove_node. tree_remove_node locks the node's parent, however the node's parent is already locked by mlx5_add_flow_rule and this causes a deadlock. After this patch, if we failed to add the flow rule, we unlock the flow table before calling to tree_remove_node. fixes: `f0d22d1874` ('net/mlx5_core: Introduce flow steering autogrouped flow table') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reported-by: Amir Vadai <amir@vadai.me> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:51:38 -04:00
David S. Miller	1602f49b58	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts were two cases of simple overlapping changes, nothing serious. In the UDP case, we need to add a hlist_add_tail_rcu() to linux/rculist.h, because we've moved UDP socket handling away from using nulls lists. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-23 18:51:33 -04:00
Hannes Frederic Sowa	0c5c3252c4	mlx4: protect mlx4_en_start_port in mlx4_en_restart with rtnl_lock mlx4_en_start_port requires rtnl_lock to be held. Cc: Eugenia Emantayev <eugenia@mellanox.com> Cc: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:35:43 -04:00
Tariq Toukan	5498440756	net/mlx5e: Add ethtool counter for RX buffer allocation failures Counts the number of RX buffer allocation failures and shows it in ethtool statistics. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:06 -04:00
Saeed Mahameed	e20a0db304	net/mlx5e: Delay skb->data access Move mlx5e_handle_csum and eth_type_trans to the end of mlx5e_build_rx_skb to gain some more time before accessing skb->data, to reduce cache misses. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:06 -04:00
Tariq Toukan	1bfec31627	net/mlx5e: Remove redundant barrier The bit-op operation one line before is an explicit barrier by itself. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:05 -04:00
Tariq Toukan	c5adb96f6c	net/mlx5e: Use napi_alloc_skb for RX SKB allocations Instead of netdev_alloc_skb, we use the napi_alloc_skb function which is designated to allocate skbuff's for RX in a channel-specific NAPI instance, and implies the IP packet alignment. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:05 -04:00
Tariq Toukan	bc77b240b3	net/mlx5e: Add fragmented memory support for RX multi packet WQE If the allocation of a linear (physically continuous) MPWQE fails, we allocate a fragmented MPWQE. This is implemented via device's UMR (User Memory Registration) which allows to register multiple memory fragments into ConnectX hardware as a continuous buffer. UMR registration is an asynchronous operation and is done via ICO SQs. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:05 -04:00
Tariq Toukan	d3c9bc2743	net/mlx5e: Added ICO SQs Added ICO (Internal Control Operations) SQ per channel to be used for driver internal operations such as memory registration for fragmented memory and nop requests upon ifconfig up. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:05 -04:00
Tariq Toukan	461017cb00	net/mlx5e: Support RX multi-packet WQE (Striding RQ) Introduce the feature of multi-packet WQE (RX Work Queue Element) referred to as (MPWQE or Striding RQ), in which WQEs are larger and serve multiple packets each. Every WQE consists of many strides of the same size, every received packet is aligned to a beginning of a stride and is written to consecutive strides within a WQE. In the regular approach, each regular WQE is big enough to be capable of serving one received packet of any size up to MTU or 64K in case of device LRO is enabled, making it very wasteful when dealing with small packets or device LRO is enabled. For its flexibility, MPWQE allows a better memory utilization (implying improvements in CPU utilization and packet rate) as packets consume strides according to their size, preserving the rest of the WQE to be available for other packets. MPWQE default configuration: Num of WQEs = 16 Strides Per WQE = 2048 Stride Size = 64 byte The default WQEs memory footprint went from 1024mtu (~1.5MB) to 16 2048 * 64 = 2MB per ring. However, HW LRO can now be supported at no additional cost in memory footprint, and hence we turn it on by default and get an even better performance. Performance tested on ConnectX4-Lx 50G. To isolate the feature under test, the numbers below were measured with HW LRO turned off. We verified that the performance just improves when LRO is turned back on. * Netperf single TCP stream: - BW raised by 10-15% for representative packet sizes: default, 64B, 1024B, 1478B, 65536B. * Netperf multi TCP stream: - No degradation, line rate reached. * Pktgen: packet rate raised by 2-10% for traffic of different message sizes: 64B, 128B, 256B, 1024B, and 1500B. * Pktgen: packet loss in bursts of small messages (64byte), single stream: - \| num packets \| packets loss before \| packets loss after \| 2K \| ~ 1K \| 0 \| 8K \| ~ 6K \| 0 \| 16K \| ~13K \| 0 \| 32K \| ~28K \| 0 \| 64K \| ~57K \| ~24K As expected as the driver can receive as many small packets (<=64B) as the number of total strides in the ring (default = 2048 * 16) vs. 1024 (default ring size regardless of packets size) before this feature. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:05 -04:00
Tariq Toukan	2f48af128d	net/mlx5e: Use function pointers for RX data path handling In preparation for Striding RQ feature, which will need its own RX handlers. This patch does not change any functionality. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:04 -04:00
Tariq Toukan	d8c9660dac	net/mlx5e: Use only close NUMA node for default RSS Distribute default RSS table uniformly over the rings of the close NUMA node, instead of all available channels. This way we enforce the preference of close rings over far ones. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:04 -04:00
Rana Shahout	593cf33829	net/mlx5e: Allocate set of queue counters per netdev Connect all netdev RQs to this set of queue counters. Also, add an "rx_out_of_buffer" counter to ethtool, which indicates RX packet drops due to lack of receive buffers. Signed-off-by: Rana Shahout <ranas@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:04 -04:00
Tariq Toukan	237cd21809	net/mlx5: Introduce device queue counters A queue counter can collect several statistics for one or more hardware queues (QPs, RQs, etc ..) that the counter is attached to. For Ethernet it will provide an "out of buffer" counter which collects the number of all packets that are dropped due to lack of software buffers. Here we add device commands to alloc/query/dealloc queue counters. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Rana Shahout <ranas@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:04 -04:00
Eran Ben Elisha	d21ed3a311	net/mlx4_en: Split SW RX dropped counter per RX ring Count SW packet drops per RX ring instead of a global counter. This will allow monitoring the number of rx drops per ring. In addition, SW rx_dropped counter was overwritten by HW rx_dropped counter, sum both of them instead to show the accurate value. Fixes: `a3333b35da` ('net/mlx4_en: Moderate ethtool callback to [...] ') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reported-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:02:40 -04:00
Eugenia Emantayev	2a500090a4	net/mlx4_core: Don't allow to VF change global pause settings Currently changing global pause settings is done via SET_PORT command with input modifier GENERAL. This command is allowed for each VF since MTU setting is done via the same command. Change the above to the following scheme: before passing the request to the FW, the PF will check whether it was issued by a slave. If yes, don't change global pause and warn, otherwise change to the requested value and store for further reference. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:02:40 -04:00
Daniel Jurgens	4bfd2e6e53	net/mlx4_core: Avoid repeated calls to pci enable/disable Maintain the PCI status and provide wrappers for enabling and disabling the PCI device. Performing the actions more than once without doing its opposite results in warning logs. This occurred when EEH hotplugged the device causing a warning for disabling an already disabled device. Fixes: `2ba5fbd62b` ('net/mlx4_core: Handle AER flow properly') Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:02:40 -04:00
Daniel Jurgens	c12833acff	net/mlx4_core: Implement pci_resume callback Move resume related activities to a new pci_resume function instead of performing them in mlx4_pci_slot_reset. This change is needed to avoid a hotplug during EEH recovery due to commit `f2da4ccf8b` ("powerpc/eeh: More relaxed hotplug criterion"). Fixes: `2ba5fbd62b` ('net/mlx4_core: Handle AER flow properly') Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:02:39 -04:00
Konstantin Khlebnikov	851b10d608	net/mlx4_en: do batched put_page using atomic_sub This patch fixes couple error paths after allocation failures. Atomic set of page reference counter is safe only if it is zero, otherwise set can race with any speculative get_page_unless_zero. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-19 20:04:24 -04:00
Konstantin Khlebnikov	04aeb56a17	net/mlx4_en: allocate non 0-order pages for RX ring with __GFP_NOMEMALLOC High order pages are optional here since commit `51151a16a6` ("mlx4: allow order-0 memory allocations in RX path"), so here is no reason for depleting reserves. Generic __netdev_alloc_frag() implements the same logic. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-19 20:04:24 -04:00
Masanari Iida	c19ca6cb4c	treewide: Fix typos in printk This patch fix spelling typos found in printk within various part of the kernel sources. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2016-04-18 11:23:24 +02:00
Jiri Pirko	b94cdabbf1	mlxsw: spectrum_buffers: Use MLXSW_SP_PB_UNUSED define for unused pb Suggested-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 13:02:43 -04:00
Jiri Pirko	ce78f02042	mlxsw: spectrum_buffers: Use designated initializers for mlxsw_sp_pbs Suggested-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 13:02:42 -04:00
Jiri Pirko	2d0ed39fbd	mlxsw: spectrum_buffers: Implement occupancy monitoring Implement occupancy API introduced in devlink and mlxsw core. This is done by accessing SBPM register for Port-Pool and SBSR for Port-TC current and max occupancy values. Max clear is implemented using the same registers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:06 -04:00
Jiri Pirko	caf7297e7a	mlxsw: core: Introduce support for asynchronous EMAD register access So far it was possible to have one EMAD register access at a time, locked by mutex. This patch extends this interface to allow multiple EMAD register accesses to be in fly at once. That allows faster processing on firmware side avoiding unused time in between EMADs. Measured speedup is ~30% for shared occupancy snapshot operation. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:06 -04:00
Jiri Pirko	dd9bdb04d2	mlxsw: core: Add mlxsw specific workqueue and use it for FDB notif. processing Follow-up patch is going to need to use delayed work as well and frequently. The FDB notification processing is already using that and also quite frequently. It makes sense to create separate workqueue just for mlxsw driver in this case and do not pollute system_wq. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:06 -04:00
Jiri Pirko	42a7f1d774	mlxsw: reg: Extend SBPM register for occupancy control Since it is not possible to get and clear Port-Pool occupancy data using SBSR register, there's a need to implement that using SBPM. Extend pack helper and add unpack helper to get occupancy values. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:06 -04:00
Jiri Pirko	26176def3c	mlxsw: reg: Add Shared Buffer Status register definition This register allows to query HW for current and maximal buffer usage. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:05 -04:00
Jiri Pirko	1ceecc88d2	mlxsw: core: Add devlink shared buffer occupancy callbacks Add middle layer in mlxsw core code to forward shared buffer occupancy calls into specific ASIC drivers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:05 -04:00
Jiri Pirko	0f433fa0ec	mlxsw: spectrum_buffers: Implement shared buffer configuration Implement previously introduced mlxsw core shared buffer API. For Spectrum, that is done utilizing registers SBPR, SBCM and SBPM. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:05 -04:00
Jiri Pirko	325f2f197d	mlxsw: core: Add mlxsw_core_port_driver_priv helper Needed in following patch. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:05 -04:00
Jiri Pirko	c30a53c7de	mlxsw: spectrum_buffers: Get max_buff defaults into limits exposed to user Although the device supports max_buff magic values 0 and 0xff, these are not exposed to the user via devlink. Therefore, adjust the default values to be within configurable range. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:05 -04:00
Jiri Pirko	bc872506f5	mlxsw: spectrum_buffers: Change initialization of PG 9 As explained in commit `ff6551ec0c` ("mlxsw: spectrum: Correctly configure headroom size") control packets are directed to priority group buffer 9 (PG9) in the ports' headroom buffers. Since we don't want to drop control packets in case they can't be admitted to the switch's shared buffer we bind PG9 to a different ingress pool from the one used by all other PGs. Unlike other PGs, we currently don't expose the binding between PG9 to a pool and leave it fixed. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:04 -04:00
Jiri Pirko	5408f7cba3	mlxsw: spectrum_buffers: Remove eg pool 3 default init and CPU port TC binding to it Since there is no congestion control for CPU port traffic, we can change the CPU port TC binding to pool 0 with min_buff and max_buff zeroed. Remove initialization for pool egress pool 3 since it is no longer used by dafault. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:04 -04:00
Jiri Pirko	078f9c7132	mlxsw: spectrum_buffers: Cache shared buffer configuration In order to achieve faster dumping of current setting and also in order to provide possibility to get pool mode without a need to query hardware, do cache the configuration in driver. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:04 -04:00
Jiri Pirko	aa99bc70ba	mlxsw: spectrum_buffers: Rename "pool" to "pr" in initialization Be consintent with rest of the registers (pm, cm) and use "pr" here. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:04 -04:00
Jiri Pirko	b11c3b4018	mlxsw: spectrum_buffers: Push out indexes and direction out of SB structs Structs are in arrays so use array index as pool/tc/prio index. With that, there is need to maintain separate arrays for ingress and egress. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:04 -04:00
Jiri Pirko	94266e3278	mlxsw: spectrum_buffers: Push out shared buffer register writes Pushed them into helper functions. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:03 -04:00
Jiri Pirko	a6179bf0d1	mlxsw: core: Add devlink shared buffer callbacks Add middle layer in mlxsw core code to forward shared buffer calls into specific ASIC drivers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:03 -04:00
Jiri Pirko	9efc8f655c	mlxsw: reg: Fix SBPM register name Fix copy&paste error and state the name of SBPM register correctly. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:38:43 -04:00
Jiri Pirko	497e8592c6	mlxsw: reg: Share direction enum between SBPR, SBCM, SBPM Same field, same values, so share the same enum. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:38:43 -04:00
Jiri Pirko	b2f10571b9	mlxsw: Do not pass around driver_priv directly Instead of that, pass mlxsw_core and use a helper to get driver priv from driver code. Looks much cleaner that way. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:38:42 -04:00
Jiri Pirko	307c2431ab	mlxsw: Pass mlxsw_core as a param of mlxsw_core_skb_transmit* Instead of passing around driver priv, pass struct mlxsw_core * directly. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:38:42 -04:00
Jiri Pirko	932762b69a	mlxsw: Move devlink port registration into common core code Remove devlink port reg/unreg from spectrum and switchx2 code and rather do the common work in core. That also ensures code separation where devlink is only used in core.c. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:38:42 -04:00
Ido Schimmel	d81a6bdb87	mlxsw: spectrum: Add IEEE 802.1Qbb PFC support Implement the appropriate DCB ops and allow a user to configure certain traffic classes as lossless. The operation configures PFC for both the egress (respecting PFC frames) and ingress (sending PFC frames) parts of the port. At egress, when a PFC frame is received for a PFC enabled priority, then all the priorities mapped to the same TC are stopped. At ingress, the priority group (PG) buffers to which the enabled PFC priorities are mapped are configured to be lossless. PFC frames will be transmitted when the Xoff threshold is crossed. The user-supplied delay parameter is used to determine the PG's size according to the following formula: PG_SIZE = PG_SIZE_LOSSY + delay * CELL_FACTOR + MTU In the worst case scenario the delay will be made up of packets that are all of size CELL_SIZE + 1, which means each packet will require almost twice its true size when buffered in the switch. We therefore multiply this value by the "cell factor", which is close to 2. Another MTU is added in case the transmitting host already started transmitting a maximum length frame when the PFC packet was received. As with PAUSE enabled ports, when the port's MTU is changed both the PGs' size and threshold are adjusted accordingly. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:20 -04:00
Ido Schimmel	34dba0a59d	mlxsw: reg: Introduce per priority counters We are going to add support for PFC as part of DCB ops, which requires us to report the number of PFC frames sent and received per priority. Add per priority counters in order to report number of PFC frames sent and received per priority. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:20 -04:00
Ido Schimmel	9f7ec052b7	mlxsw: spectrum: Add support for PAUSE frames When a packet ingress the switch it's placed in its assigned priority group (PG) buffer in the port's headroom buffer while it goes through the switch's pipeline. After going through the pipeline - which determines its egress port(s) and traffic class - it's moved to the switch's shared buffer awaiting transmission. However, some packets are not eligible to enter the shared buffer due to exceeded quotas or insufficient space. Marking their associated PGs as lossless will cause the packets to accumulate in the PG buffer. Another reason for packets accumulation are complicated pipelines (e.g. involving a lot of ACLs). To prevent packets from being dropped a user can enable PAUSE frames on the port. This will mark all the active PGs as lossless and set their size according to the maximum delay, as it's not configured by user. +----------------+ + \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| Delay \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| Xon/Xoff threshold +----------------+ + \| \| \| \| \| \| 2 * MTU \| \| \| +----------------+ + The delay (612 [Cells]) was calculated according to worst-case scenario involving maximum MTU and 100m cables. After marking the PGs as lossless the device is configured to respect incoming PAUSE frames (Rx PAUSE) and generate PAUSE frames (Tx PAUSE) according to user's settings. Whenever the port's headroom configuration changes we take into account the PAUSE configuration, so that we correctly set the PG's type (lossy / lossless), size and threshold. This can happen when: a) The port's MTU changes, as it directly affects the PG's size. b) A PG is created following user configuration, by binding a priority to it. Note that the relevant SUPPORTED flags were already mistakenly set by the driver before this commit. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:19 -04:00
Ido Schimmel	155f9de2e0	mlxsw: reg: Add lossless settings for PBMC register When configuring PAUSE frames and PFC we'll need to configure the Xon/Xoff threshold for the priority group (PG) buffers. Add the Xon/Xoff threshold fields to the PBMC register so that we can configure these when needed. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:19 -04:00
Ido Schimmel	6f253d8381	mlxsw: reg: Add Port Flow Control Configuration register Add the Port Flow Control Configuration (PFCC) register, which configures both flow control and Priority-based Flow Control (PFC). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:19 -04:00
Ido Schimmel	cc7cf51758	mlxsw: spectrum: Allow setting maximum rate for a TC Allow a user to set maximum rate for a particular TC using DCB ops. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:19 -04:00
Ido Schimmel	8e8dfe9fdf	mlxsw: spectrum: Add IEEE 802.1Qaz ETS support Implement the appropriate DCB ops and allow a user to configure: * Priority to traffic class (TC) mapping with a total of 8 supported TCs * Transmission selection algorithm (TSA) for each TC and the corresponding weights in case of weighted round robin (WRR) As previously explained, we treat the priority group (PG) buffer in the port's headroom as the ingress counterpart of the egress TC. Therefore, when a certain priority to TC mapping is configured, we also configure the port's headroom buffer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:18 -04:00
Ido Schimmel	f00817df2b	mlxsw: spectrum: Introduce support for Data Center Bridging (DCB) Introduce basic infrastructure for DCB and add the missing ops in following patches. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:18 -04:00
Ido Schimmel	90183b980d	mlxsw: spectrum: Initialize egress scheduling Before introducing support for DCB ops we should first make sure we initialize the relevant parts in the device correctly. Specifically, the egress scheduling. The device supports a superset of the 802.1Qaz standard with 4 hierarchy levels that can be linked to each other in multiple ways and with different transmission selection algorithms (TSA) employed between them. However, since we only intend to support the 802.1Qaz standard we flatten the hierarchies and let the user configure via DCB ops the TSA and max rate shaper at the subgroup hierarchy (see figure below) and the mapping between switch priority to traffic class. By default, all switch priorities are mapped to traffic class 0, strict priority is employed and max shaper is disabled. Default configuration: switch priority 0 ... switch priority 7 + + \| \| +----------------------------------+ \| +--v--+ +-----+ Traffic Class \| \| \| \| Hierarchy \| TC0 \| ... \| TC7 \| \| \| \| \| +--+--+ +--+--+ \| \| +--v--+ +--v--+ Subgroup \| SG0 \| \| SG7 \| Hierarchy \| \| \| \| +-----+ +-----+ \| TSA \| \| TSA \| +-----+ ... +-----+ \| MAX \| \| MAX \| +--+--+ +--+--+ \| \| +---------------+----------------+ \| +--v--+ Group \| \| Hierarchy \| GR0 \| \| \| +--+--+ \| +--v--+ Port \| \| Hierarchy \| PR0 \| \| \| +-----+ Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:18 -04:00
Ido Schimmel	2c63a555e8	mlxsw: reg: Add QoS Switch Traffic Class Table register As part of DCB ops we'll have to configure the priority to traffic class mapping of a port. Add the QoS Switch Traffic Class Table (QTCT) register, which configures the mapping between the packet switch priority and traffic class on the transmit port. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:18 -04:00
Ido Schimmel	b9b7cee405	mlxsw: reg: Add QoS ETS Element Configuration register We are going to introduce support for DCB, so we need to be able to configure the traffic selection algorithm (TSA) used by each traffic class (TC), as well as the bandwidth percentage allocated to each TC in case of ETS. Add the QoS ETS Element Configuration register, which controls the above parameters. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:17 -04:00
Ido Schimmel	d6b7c13b01	mlxsw: spectrum: Set port's shared buffer size to 0 In addition to the priority group (PG) buffers in the headroom, the device enables the allocation of headroom shared buffer, which can be shared between different PGs. However, we are not going to use the headroom shared buffer and instead allow the user to use its size for PGs or the switch's shared buffer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:17 -04:00
Ido Schimmel	7ad7cd6113	mlxsw: reg: Use correct PBMC register length The last field of the PBMC register is at offset 0x64 and its size is 0x8, so the correct register's length is 0x6C bytes. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:17 -04:00
Ido Schimmel	ff6551ec0c	mlxsw: spectrum: Correctly configure headroom size When packets ingress the switch they are assigned a switch priority and directed to the corresponding priority group (PG) buffer in the port's headroom buffer. Since we now map all switch priorities to priority group 0 (PG0) by default, there is no need to allocate the other priority groups during initialization. The only exception is PG9, which is used for control traffic. At minimum, the PG should be able to store the currently classified packet (pipeline latency isn't 0) and also the packets arriving during the classification time. However, an incoming packet will not be buffered if there is no available MTU-sized buffer space for storing it. The buffer needed to accommodate for pipeline latency is variable and needs to take into account both the current link speed and current latency of the pipeline, which is time-dependent. Testing showed that setting the PG's size to twice the current MTU is optimal. Since PG9 is used strictly for control packets and not subject to flow control, we are not going to resize it according to user configuration, so we simply set it according to worst case scenario, which is twice the maximum MTU. In any case, later patches in the series will allow a user to direct lossless flows to other PGs than PG0 and set their size to accommodate for round-trip propagation delay. The above change also requires us to resize the PG buffer whenever the port's MTU is changed. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:17 -04:00
Ido Schimmel	1a1984490f	mlxsw: spectrum: Add bytes to cells helper Buffers in the switch store packets in units called buffer cells. Add a helper to convert from bytes to cells, so that the actual number of cells required (result is round up) is returned. Also, drop the SB (shared buffer) acronym from the BYTES_PER_CELL macro, as this unit is also used in the ports' buffers and not only the switch's shared buffer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:16 -04:00
Ido Schimmel	dd6cb0f9fd	mlxsw: spectrum: Map all switch priorities to priority group 0 During transmission, the skb's priority is used to map the skb to a traffic class, where the idea is to group priorities with similar characteristics (e.g. lossy, lossless) to the same traffic class. By default, all priorities are mapped to traffic class 0. In the device, we model the skb's priority as the switch priority, which is assigned to a packet according to its PCP value and ingress port (untagged packets are assigned the port's default switch priority - 0). At ingress, the packet is directed to a priority group (PG) buffer in the port's headroom buffer according to the packet's switch priority and switch priority to buffer mapping. While it's possible to configure the egress mapping between skb's priority (switch priority) and traffic class, there is no mechanism to configure the ingress mapping to a PG. In order to keep things simple and since grouping certain priorities into a traffic class at egress also implies they should be grouped the same at ingress, treat a PG as the ingress counterpart of an egress traffic class. Having established the above, during initialization map all the switch priorities to PG0 in accordance with the Linux defaults for traffic class mapping. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:16 -04:00
Ido Schimmel	b98ff151b6	mlxsw: reg: Add Port Prio To Buffer register When packets ingress the switch they are assigned a switch priority number that dictates the packet's priority group (PG) buffer in the port's headroom buffer. Add the Port Prio To Buffer (PPTB) register, which configures the switch priority to PG mapping. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:16 -04:00
Ido Schimmel	2bf9a58675	mlxsw: spectrum: Add support for physical port names Export to userspace the front panel name of the port, so that udev can rename the ports accordingly. The convention suggested by switchdev documentation is used: 1) Non-split: pX 2) Split: pXsY Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-05 15:07:54 -04:00
Ido Schimmel	b555cf4a50	mlxsw: spectrum: Reduce number of supported 802.1D bridges Resources allocated for these bridges at init time cannot be later used for other purposes. While current number is supported by the device, it's mostly theoretical with regards to any real use case, which leads to poor utilization of device's resources. Solve that by reducing the number. The long term plan is to make this value (along with others) user configurable via devlink and write it to NVRAM, so that it can be used during the next init. Until then we must hardcode such values. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-05 15:07:54 -04:00
Linus Torvalds	aca04ce5db	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking bugfixes from David Miller: "Several bug fixes rolling in, some for changes introduced in this merge window, and some for problems that have existed for some time: 1) Fix prepare_to_wait() handling in AF_VSOCK, from Claudio Imbrenda. 2) The new DST_CACHE should be a silent config option, from Dave Jones. 3) inet_current_timestamp() unintentionally truncates timestamps to 16-bit, from Deepa Dinamani. 4) Missing reference to netns in ppp, from Guillaume Nault. 5) Free memory reference in hv_netvsc driver, from Haiyang Zhang. 6) Missing kernel doc documentation for function arguments in various spots around the networking, from Luis de Bethencourt. 7) UDP stopped receiving broadcast packets properly, due to overzealous multicast checks, fix from Paolo Abeni" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits) net: ping: make ping_v6_sendmsg static hv_netvsc: Fix the order of num_sc_offered decrement net: Fix typos and whitespace. hv_netvsc: Fix the array sizes to be max supported channels hv_netvsc: Fix accessing freed memory in netvsc_change_mtu() ppp: take reference on channels netns net: Reset encap_level to avoid resetting features on inner IP headers net: mediatek: fix checking for NULL instead of IS_ERR() in .probe net: phy: at803x: Request 'reset' GPIO only for AT8030 PHY at803x: fix reset handling AF_VSOCK: Shrink the area influenced by prepare_to_wait Revert "vsock: Fix blocking ops call in prepare_to_wait" macb: fix PHY reset ipv4: initialize flowi4_flags before calling fib_lookup() fsl/fman: Workaround for Errata A-007273 ipv4: fix broadcast packets reception net: hns: bug fix about the overflow of mss net: hns: adds limitation for debug port mtu net: hns: fix the bug about mtu setting net: hns: fixes a bug of RSS ...	2016-03-23 23:25:14 -07:00
Linus Torvalds	b8ba452683	Round two of 4.6 merge window patches - A few minor core fixups needed for the next patch series - The IB SRIOV series. This has bounced around for several versions. Of note is the fact that the first patch in this series effects the net core. It was directed to netdev and DaveM for each iteration of the series (three versions total). Dave did not object, but did not respond either. I've taken this as permission to move forward with the series. - The new Intel X722 iWARP driver - A huge set of updates to the Intel hfi1 driver. Of particular interest here is that we have left the driver in staging since it still has an API that people object to. Intel is working on a fix, but getting these patches in now helps keep me sane as the upstream and Intel's trees were over 300 patches apart. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJW8HR9AAoJELgmozMOVy/dDYMP+wSBALhIdV/pqVzdLCGfIUbK H5agonm/3b/Oj74W30w2JYqXBFfZC2LGVJy6OwocJ3wK04v/KfZbA9G+QsOuh2hQ Db+tFn1eoltvzrcx3k/a7x6zHGC4YyxyH9OX2B3QfRsNHeE7PG9KGp5dfEs2OH1r WGp3jMLAsHf7o8uKpa0jyTEUEErATaTlG+YoaJ+BGHwurgCNy8ni+wAn+EAFiJ3w iEJhcXB6KY69vkLsrLYuT9xxJn4udFJ3QEk8xdPkpLKsu+6Ue5i/eNQ19VfbpZgR c6fTc8genfIv5S+fis+0P44u1oA7Kl2JT6IZYLi35gJ60ZmxTD+7GruWP3xX/wJ2 zuR3sTj5fjcFWenk087RSIU/EK87ONPD4g9QPdZpf3FtgleTVKk3YDlqwjqf8pgv cO6gQ1BcOBnixJvhjNFiX1c2hvNhb3CkgObly1JBwhcCzZhLkV7BNFPbZuDHAeAx VqzNEUse4hupkgiiuiGgudcJ4fsSxMW37kyfX9QC/qyk6YVuUDbrekcWI+MAKot7 5e5dHqFExpbn1Zgvc8yfvh88H2MUQAgaYwjanWF/qpppOPRd01nTisVQIOJn7s5C arcWzvocpQe0GL2UsvDoWwAABXznL3bnnAoCyTWOES2RhOOcw0Ibw46Jl8FQ8gnl 2IRxQ+ltNEscb2cwi5wE =t2Ko -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull more rdma updates from Doug Ledford: "Round two of 4.6 merge window patches. This is a monster pull request. I held off on the hfi1 driver updates (the hfi1 driver is intimately tied to the qib driver and the new rdmavt software library that was created to help both of them) in my first pull request. The hfi1/qib/rdmavt update is probably 90% of this pull request. The hfi1 driver is being left in staging so that it can be fixed up in regards to the API that Al and yourself didn't like. Intel has agreed to do the work, but in the meantime, this clears out 300+ patches in the backlog queue and brings my tree and their tree closer to sync. This also includes about 10 patches to the core and a few to mlx5 to create an infrastructure for configuring SRIOV ports on IB devices. That series includes one patch to the net core that we sent to netdev@ and Dave Miller with each of the three revisions to the series. We didn't get any response to the patch, so we took that as implicit approval. Finally, this series includes Intel's new iWARP driver for their x722 cards. It's not nearly the beast as the hfi1 driver. It also has a linux-next merge issue, but that has been resolved and it now passes just fine. Summary: - A few minor core fixups needed for the next patch series - The IB SRIOV series. This has bounced around for several versions. Of note is the fact that the first patch in this series effects the net core. It was directed to netdev and DaveM for each iteration of the series (three versions total). Dave did not object, but did not respond either. I've taken this as permission to move forward with the series. - The new Intel X722 iWARP driver - A huge set of updates to the Intel hfi1 driver. Of particular interest here is that we have left the driver in staging since it still has an API that people object to. Intel is working on a fix, but getting these patches in now helps keep me sane as the upstream and Intel's trees were over 300 patches apart" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (362 commits) IB/ipoib: Allow mcast packets from other VFs IB/mlx5: Implement callbacks for manipulating VFs net/mlx5_core: Implement modify HCA vport command net/mlx5_core: Add VF param when querying vport counter IB/ipoib: Add ndo operations for configuring VFs IB/core: Add interfaces to control VF attributes IB/core: Support accessing SA in virtualized environment IB/core: Add subnet prefix to port info IB/mlx5: Fix decision on using MAD_IFC net/core: Add support for configuring VF GUIDs IB/{core, ulp} Support above 32 possible device capability flags IB/core: Replace setting the zero values in ib_uverbs_ex_query_device net/mlx5_core: Introduce offload arithmetic hardware capabilities net/mlx5_core: Refactor device capability function net/mlx5_core: Fix caching ATOMIC endian mode capability ib_srpt: fix a WARN_ON() message i40iw: Replace the obsolete crypto hash interface with shash IB/hfi1: Add SDMA cache eviction algorithm IB/hfi1: Switch to using the pin query function IB/hfi1: Specify mm when releasing pages ...	2016-03-22 15:48:44 -07:00
Eli Cohen	1f324bff9b	net/mlx5_core: Implement modify HCA vport command Implement the modify HCA vport commands used to modify the parameters of virtual HCA's ports. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-21 17:13:14 -04:00
Eli Cohen	2a4826fe74	net/mlx5_core: Add VF param when querying vport counter Add a vf parameter to mlx5_core_query_vport_counter so we can call it to query counters of virtual functions. Also update current users of the API. PFs may call mlx5_core_query_vport_counter with other_vport set to indicate that they are querying a virtual function. The virtual function to be queried is given by the vf parameter. Virtual function numbering is zero based so the first VF is 0 and so on. When a PF queries its own function, the other_vport parameter is cleared. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-21 17:13:14 -04:00
Sagi Grimberg	3f0393a575	net/mlx5_core: Introduce offload arithmetic hardware capabilities Define the necessary hardware structures for the offload arithmetic capabilities and read/cache them on driver load. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-21 16:32:35 -04:00
Leon Romanovsky	b06e7de8a9	net/mlx5_core: Refactor device capability function Device capability function was called similar in all places. It was called twice for every queried parameter, while the difference between calls was in HCA capability mode only. The change proposed unify these calls into one function. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-21 16:29:07 -04:00
Leon Romanovsky	91d9ed8443	net/mlx5_core: Fix caching ATOMIC endian mode capability Add caching of maximum device capability of ATOMIC endian mode. Fixes: `f91e6d8941` ('net/mlx5_core: Add setting ATOMIC endian mode') Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-21 16:29:07 -04:00
Colin Ian King	5a779c4fed	net/mlx4: remove unused array zero_gid[] zero_gid is not used, so remove this redundant array. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-20 16:24:06 -04:00
Linus Torvalds	1200b6809d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: "Highlights: 1) Support more Realtek wireless chips, from Jes Sorenson. 2) New BPF types for per-cpu hash and arrap maps, from Alexei Starovoitov. 3) Make several TCP sysctls per-namespace, from Nikolay Borisov. 4) Allow the use of SO_REUSEPORT in order to do per-thread processing of incoming TCP/UDP connections. The muxing can be done using a BPF program which hashes the incoming packet. From Craig Gallek. 5) Add a multiplexer for TCP streams, to provide a messaged based interface. BPF programs can be used to determine the message boundaries. From Tom Herbert. 6) Add 802.1AE MACSEC support, from Sabrina Dubroca. 7) Avoid factorial complexity when taking down an inetdev interface with lots of configured addresses. We were doing things like traversing the entire address less for each address removed, and flushing the entire netfilter conntrack table for every address as well. 8) Add and use SKB bulk free infrastructure, from Jesper Brouer. 9) Allow offloading u32 classifiers to hardware, and implement for ixgbe, from John Fastabend. 10) Allow configuring IRQ coalescing parameters on a per-queue basis, from Kan Liang. 11) Extend ethtool so that larger link mode masks can be supported. From David Decotigny. 12) Introduce devlink, which can be used to configure port link types (ethernet vs Infiniband, etc.), port splitting, and switch device level attributes as a whole. From Jiri Pirko. 13) Hardware offload support for flower classifiers, from Amir Vadai. 14) Add "Local Checksum Offload". Basically, for a tunneled packet the checksum of the outer header is 'constant' (because with the checksum field filled into the inner protocol header, the payload of the outer frame checksums to 'zero'), and we can take advantage of that in various ways. From Edward Cree" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits) bonding: fix bond_get_stats() net: bcmgenet: fix dma api length mismatch net/mlx4_core: Fix backward compatibility on VFs phy: mdio-thunder: Fix some Kconfig typos lan78xx: add ndo_get_stats64 lan78xx: handle statistics counter rollover RDS: TCP: Remove unused constant RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket net: smc911x: convert pxa dma to dmaengine team: remove duplicate set of flag IFF_MULTICAST bonding: remove duplicate set of flag IFF_MULTICAST net: fix a comment typo ethernet: micrel: fix some error codes ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it bpf, dst: add and use dst_tclassid helper bpf: make skb->tc_classid also readable net: mvneta: bm: clarify dependencies cls_bpf: reset class and reuse major in da ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c ldmvsw: Add ldmvsw.c driver code ...	2016-03-19 10:05:34 -07:00
Eli Cohen	76e39ccf9c	net/mlx4_core: Fix backward compatibility on VFs Commit `85743f1eb3` ("net/mlx4_core: Set UAR page size to 4KB regardless of system page size") introduced dependency where old VF drivers without this fix fail to load if the PF driver runs with this commit. To resolve this add a module parameter which disables that functionality by default. If both the PF and VFs are running with a driver with that commit the administrator may set the module param to true. The module parameter is called enable_4k_uar. Fixes: `85743f1eb3` ('net/mlx4_core: Set UAR page size to 4KB ...') Signed-off-by: Eli Cohen <eli@mellanox.com> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-18 23:09:45 -04:00
Linus Torvalds	814a2bf957	Merge branch 'akpm' (patches from Andrew) Merge second patch-bomb from Andrew Morton: - a couple of hotfixes - the rest of MM - a new timer slack control in procfs - a couple of procfs fixes - a few misc things - some printk tweaks - lib/ updates, notably to radix-tree. - add my and Nick Piggin's old userspace radix-tree test harness to tools/testing/radix-tree/. Matthew said it was a godsend during the radix-tree work he did. - a few code-size improvements, switching to __always_inline where gcc screwed up. - partially implement character sets in sscanf * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) sscanf: implement basic character sets lib/bug.c: use common WARN helper param: convert some "on"/"off" users to strtobool lib: add "on"/"off" support to kstrtobool lib: update single-char callers of strtobool() lib: move strtobool() to kstrtobool() include/linux/unaligned: force inlining of byteswap operations include/uapi/linux/byteorder, swab: force inlining of some byteswap operations include/asm-generic/atomic-long.h: force inlining of some atomic_long operations usb: common: convert to use match_string() helper ide: hpt366: convert to use match_string() helper ata: hpt366: convert to use match_string() helper power: ab8500: convert to use match_string() helper power: charger_manager: convert to use match_string() helper drm/edid: convert to use match_string() helper pinctrl: convert to use match_string() helper device property: convert to use match_string() helper lib/string: introduce match_string() helper radix-tree tests: add test for radix_tree_iter_next radix-tree tests: add regression3 test ...	2016-03-18 19:26:54 -07:00
Linus Torvalds	9ea4463520	Initial roundup of 4.6 merge window patches - cxgb4 updates - nes updates - unification of iwarp portmapper code to core - add drain_cq API - various ib_core updates - minor ipoib updates - minor mlx4 updates - more significant mlx5 updates (including a minor merge conflict with net-next tree...merge is simple to resolve and Stephen's resolution was confirmed by Mellanox) - trivial net/9p rdma conversion - ocrdma RoCEv2 update - srpt updates -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJW6aTEAAoJELgmozMOVy/dlAEQAKgT0VwBi6Zd4PihP2UQgsfH LUmbGhCzBpcao1eJ7piOOEYQGSb3slN3Cnup4qBJak+y2mhtErxNkLOIhGRrvcHk XCym7N9uAhp4j++OnUBp6Cpr0hZNmBEBKm6nKqdEcdaxLaVa0ezdcxAOkVlHhZ77 NnhTHvPy8pu4kC8NZCvCIJK+fqW+5Xj+ojAcVKGPV+Y3zf9lfaDCXCSdD2m6+dFX /KV3V/CNUSdYTWrPZSIDhqoYix2AGl5Fg17mfsgBWQB/T405fiwZkd0FEXkqXDkR bOhS5PnuCN+ScwsxMDHCbzqtaOb06sKttg9IE3s0qdFpOwGtbyoU+lLUh1qbjKLP vtEiySZq2Mhlr41ajuUuDSgNbqCTL7+52/HUf8qcjFFiSBlZRaTO8rVJ5tABKRiW SkxkHbR6orx8okKtaWRskKRtYSNkA2uexdIQ/wzc4fJVqzqJUh6Elcxp3dPq/KSN lkrYXNJ5X4ux72QfHRobBX1pBjT0P2+avoFri3763k9ZrsWwY9tXgDUB/OdX11IF gAadgUNw2pHgY10jqCZBOw22F+foB2qx8ZkaNSGYE0h3uQrp+iiCnfeU9rWNCWVv MelRGpfGa7VF3RTDojc7Dq7JpWRUChMx9BY+XrQPmV08Z+JGoVuRT20Q7twgillz Yb3aGRKZNtqYehj9fM4n =kTkT -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "Initial roundup of 4.6 merge window patches. This is the first of two pull requests. It is the smaller request, but touches for more different things (this is everything but what is in or going into staging). The pull request for the code in staging/rdma is on hold until after we decide what to do on the write/writev API issue and may be partially deferred until 4.7 as a result. Summary: - cxgb4 updates - nes updates - unification of iwarp portmapper code to core - add drain_cq API - various ib_core updates - minor ipoib updates - minor mlx4 updates - more significant mlx5 updates (including a minor merge conflict with net-next tree...merge is simple to resolve and Stephen's resolution was confirmed by Mellanox) - trivial net/9p rdma conversion - ocrdma RoCEv2 update - srpt updates" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (85 commits) iwpm: crash fix for large connections test iw_cxgb3: support for iWARP port mapping iw_cxgb4: remove port mapper related code iw_nes: remove port mapper related code iwcm: common code for port mapper net/9p: convert to new CQ API IB/mlx5: Add support for don't trap rules net/mlx5_core: Introduce forward to next priority action net/mlx5_core: Create anchor of last flow table iser: Accept arbitrary sg lists mapping if the device supports it mlx5: Add arbitrary sg list support IB/core: Add arbitrary sg_list support IB/mlx5: Expose correct max_fast_reg_page_list_len IB/mlx5: Make coding style more consistent IB/mlx5: Convert UMR CQ to new CQ API IB/ocrdma: Skip using unneeded intermediate variable IB/ocrdma: Skip using unneeded intermediate variable IB/ocrdma: Delete unnecessary variable initialisations in 11 functions IB/core: Documentation fix in the MAD header file IB/core: trivial prink cleanup. ...	2016-03-18 09:39:22 -07:00
Joonsoo Kim	fe896d1878	mm: introduce page reference manipulation functions The success of CMA allocation largely depends on the success of migration and key factor of it is page reference count. Until now, page reference is manipulated by direct calling atomic functions so we cannot follow up who and where manipulate it. Then, it is hard to find actual reason of CMA allocation failure. CMA allocation should be guaranteed to succeed so finding offending place is really important. In this patch, call sites where page reference is manipulated are converted to introduced wrapper function. This is preparation step to add tracepoint to each page reference manipulation function. With this facility, we can easily find reason of CMA allocation failure. There is no functional change in this patch. In addition, this patch also converts reference read sites. It will help a second step that renames page._count to something else and prevents later attempt to direct access to it (Suggested by Andrew). Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-17 15:09:34 -07:00
Doug Ledford	d2ad9cc759	Merge branches 'mlx4', 'mlx5' and 'ocrdma' into k.o/for-4.6	2016-03-16 13:38:28 -04:00
Arnd Bergmann	baefd7015c	mlx4: add missing braces in verify_qp_parameters The implementation of QP paravirtualization back in linux-3.7 included some code that looks very dubious, and gcc-6 has grown smart enough to warn about it: drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'verify_qp_parameters': drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:3154:5: error: statement is indented as if it were guarded by... [-Werror=misleading-indentation] if (optpar & MLX4_QP_OPTPAR_ALT_ADDR_PATH) { ^~ drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:3144:4: note: ...this 'if' clause, but it is not if (slave != mlx4_master_func_num(dev)) >From looking at the context, I'm reasonably sure that the indentation is correct but that it should have contained curly braces from the start, as the update_gid() function in the same patch correctly does. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `54679e1482` ("mlx4: Implement QP paravirtualization and maintain phys_pkey_cache for smp_snoop") Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 13:09:49 -04:00
Jesper Dangaard Brouer	8ec736e556	mlx5: use napi_consume_skb API to get bulk free operations Bulk free of SKBs happen transparently by the API call napi_consume_skb(). The napi budget parameter is needed by napi_consume_skb() to detect if called from netpoll. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-13 22:35:36 -04:00
Jesper Dangaard Brouer	b4a53379a0	mlx4: use napi_consume_skb API to get bulk free operations Bulk free of SKBs happen transparently by the API call napi_consume_skb(). The napi budget parameter is usually needed by napi_consume_skb() to detect if called from netpoll. In this patch it has an extra meaning. For mlx4 driver, the mlx4_en_stop_port() call is done outside NAPI/softirq context, and cleanup the entire TX ring via mlx4_en_free_tx_buf(). The code mlx4_en_free_tx_desc() for freeing SKBs are shared with NAPI calls. To handle this shared use the zero budget indication is reused, and handled appropriately in napi_consume_skb(). To reflect this, variable is called napi_mode for the function call that needed this distinction. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-13 22:35:35 -04:00
Jiri Pirko	233fa44bd6	mlxsw: pci: Implement reset done check Firmware now tells us that the reset is done by passing a magic value via register. Use it to shorten the wait in case this is supported. With old firmware, we still wait until the timeout is reached. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-13 22:30:01 -04:00
Ido Schimmel	869f63a4d2	mlxsw: spectrum: Check requested ageing time is valid Commit `c62987bbd8` ("bridge: push bridge setting ageing_time down to switchdev") added a check for minimum and maximum ageing time, but this breaks existing behaviour where one can set ageing time to 0 for a non-learning bridge. Push this check down to the driver and allow the check in the bridge layer to be removed. Currently ageing time 0 is refused by the driver, but we can later add support for this functionality. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-11 14:47:58 -05:00
Amir Vadai	12185a9faf	net/mlx5e: Support offload cls_flower with skbedit mark action Introduce offloading of skbedit mark action. For example, to mark with 0x1234, all TCP (ip_proto 6) packets arriving to interface ens9: # tc qdisc add dev ens9 ingress # tc filter add dev ens9 protocol ip parent ffff: \ flower ip_proto 6 \ indev ens9 \ action skbedit mark 0x1234 Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-10 16:24:03 -05:00
Amir Vadai	e3a2b7ed01	net/mlx5e: Support offload cls_flower with drop action Parse tc_cls_flower_offload into device specific commands and program the hardware to classify and act accordingly. For example, to drop ICMP (ip_proto 1) packets from specific smac, dmac, src_ip, src_ip, arriving to interface ens9: # tc qdisc add dev ens9 ingress # tc filter add dev ens9 protocol ip parent ffff: \ flower ip_proto 1 \ dst_mac 7c:fe:90:69:81:62 src_mac 7c:fe:90:69:81:56 \ dst_ip 11.11.11.11 src_ip 11.11.11.12 indev ens9 \ action drop Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-10 16:24:02 -05:00
Amir Vadai	e8f887ac6a	net/mlx5e: Introduce tc offload support Extend ndo_setup_tc() to support ingress tc offloading. Will be used by later patches to offload tc flower filter. Feature is off by default and could be enabled by issuing: # ethtool -K eth0 hw-tc-offload on Offloads flow table is dynamically created when first filter is added. Rules are saved in a hash table that is maintained by the consumer (for example - the flower offload in the next patch). When last filter is removed and no filters exist in the hash table, the offload flow table is destroyed. Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-10 16:24:02 -05:00
Amir Vadai	b6172aac71	net/mlx5e: Add a new priority for kernel flow tables Move the vlan and main flow tables to use priority 1. This will allow the upcoming TC offload logic to use a higher priority (0) for the offload steering table. Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-10 16:24:02 -05:00
Amir Vadai	67ba422e95	net/mlx5e: Relax ndo_setup_tc handle restriction Restricting handle to TC_H_ROOT breaks the old instantiation of mqprio to setup a hardware qdisc. This patch relaxes the test, to only check the type. Fixes: `08fb1da` ("net/mlx5e: Support DCBNL IEEE ETS") Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-10 16:24:02 -05:00
Amir Vadai	60ab4584f5	net/mlx5_core: Set flow steering dest only for forward rules We need to handle flow table entry destinations only if the action associated with the rule is forwarding (MLX5_FLOW_CONTEXT_ACTION_FWD_DEST). Fixes: `26a8145390` ('net/mlx5_core: Introduce flow steering firmware commands') Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-10 16:24:02 -05:00
Maor Gottlieb	b3638e1a76	net/mlx5_core: Introduce forward to next priority action Add support to create flow rule that forward packets to the first flow table in the next priority (next priority could be the first priority in the next namespace or the next priority in the same namespace). This feature could be used for DONT_TRAP rules or rules that only want to mark the packet with flow tag. In order to do it optimally, each flow table has list of all rules that point to this flow table, when a flow table is destroyed/created, we update the list head correspondingly. This kind of rule is created when destination is NULL and action is MLX5_FLOW_CONTEXT_ACTION_FWD_NEXT_PRIO. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-10 09:22:06 -05:00
Maor Gottlieb	153fefbf34	net/mlx5_core: Create anchor of last flow table Create an empty flow table in the end of NIC rx namesapce. Adding this flow table simplify the implementation of "forward to next prio" rules. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-10 09:22:06 -05:00
David S. Miller	810813c47a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Several cases of overlapping changes, as well as one instance (vxlan) of a bug fix in 'net' overlapping with code movement in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-08 12:34:12 -05:00
Ido Schimmel	5091730d77	mlxsw: pci: Correctly determine if descriptor queue is full The descriptor queues for sending (SDQs) and receiving (RDQs) packets are managed by two counters - producer and consumer - which are both 16-bit in size. A queue is considered full when the difference between the two equals the queue's maximum number of descriptors. However, if the producer counter overflows, then it's possible for the full queue check to fail, as it doesn't take the overflow into account. In such a case, descriptors already passed to the device - but for which a completion has yet to be posted - will be overwritten, thereby causing undefined behavior. The above can be achieved under heavy load (~30 netperf instances). Fix that by casting the subtraction result to u16, preventing it from being treated as a signed integer. Fixes: `eda6500a98` ("mlxsw: Add PCI bus implementation") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-07 11:39:15 -05:00
Ido Schimmel	912b1c89c5	mlxsw: spectrum: Always decrement bridge's ref count Since we only support one VLAN filtering bridge we need to associate a reference count with it, so that when the last port netdev leaves it, we would know that a different bridge can be offloaded to hardware. When a LAG device is memeber in a bridge and port netdevs are leaving the LAG, we should always decrement the bridge's reference count, as it's incremented for any port in the LAG. Fixes: `4dc236c317` ("mlxsw: spectrum: Handle port leaving LAG while bridged") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-07 11:39:15 -05:00
Arnd Bergmann	3d1cbe839a	net: mellanox: add DEVLINK dependencies The new NET_DEVLINK infrastructure can be a loadable module, but the drivers using it might be built-in, which causes link errors like: drivers/net/built-in.o: In function `mlx4_load_one': :(.text+0x2fbfda): undefined reference to `devlink_port_register' :(.text+0x2fc084): undefined reference to `devlink_port_unregister' drivers/net/built-in.o: In function `mlxsw_sx_port_remove': :(.text+0x33a03a): undefined reference to `devlink_port_type_clear' :(.text+0x33a04e): undefined reference to `devlink_port_unregister' There are multiple ways to avoid this: a) add 'depends on NET_DEVLINK \|\| !NET_DEVLINK' dependencies for each user b) use 'select NET_DEVLINK' from each driver that uses it and hide the symbol in Kconfig. c) make NET_DEVLINK a 'bool' option so we don't have to list it as a dependency, and rely on the APIs to be stubbed out when it is disabled d) use IS_REACHABLE() rather than IS_ENABLED() to check for NET_DEVLINK in include/net/devlink.h This implements a variation of approach a) by adding an intermediate symbol that drivers can depend on, and changes the three drivers using it. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `09d4d087cd` ("mlx4: Implement devlink interface") Fixes: `c4745500e9` ("mlxsw: Implement devlink interface") Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-03 17:08:59 -05:00
John Fastabend	5eb4dce3b3	net: relax setup_tc ndo op handle restriction I added this check in setup_tc to multiple drivers, if (handle != TC_H_ROOT \|\| tc->type != TC_SETUP_MQPRIO) Unfortunately restricting to TC_H_ROOT like this breaks the old instantiation of mqprio to setup a hardware qdisc. This patch relaxes the test to only check the type to make it equivalent to the check before I broke it. With this the old instantiation continues to work. A good smoke test is to setup mqprio with, # tc qdisc add dev eth4 root mqprio num_tc 8 \ map 0 1 2 3 4 5 6 7 \ queues 0@0 1@1 2@2 3@3 4@4 5@5 6@6 7@7 Fixes: `e4c6734eaa` ("net: rework ndo tc op to consume additional qdisc handle paramete") Reported-by: Singh Krishneil <krishneil.k.singh@intel.com> Reported-by: Jake Keller <jacob.e.keller@intel.com> CC: Murali Karicheri <m-karicheri2@ti.com> CC: Shradha Shah <sshah@solarflare.com> CC: Or Gerlitz <ogerlitz@mellanox.com> CC: Ariel Elior <ariel.elior@qlogic.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Bruce Allan <bruce.w.allan@intel.com> CC: Jesse Brandeburg <jesse.brandeburg@intel.com> CC: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-03 16:25:15 -05:00
Jack Morgenstein	6e5224224f	net/mlx4_core: Allow resetting VF admin mac to zero The VF administrative mac addresses (stored in the PF driver) are initialized to zero when the PF driver starts up. These addresses may be modified in the PF driver through ndo calls initiated by iproute2 or libvirt. While we allow the PF/host to change the VF admin mac address from zero to a valid unicast mac, we do not allow restoring the VF admin mac to zero. We currently only allow changing this mac to a different unicast mac. This leads to problems when libvirt scripts are used to deal with VF mac addresses, and libvirt attempts to revoke the mac so this host will not use it anymore. Fix this by allowing resetting a VF administrative MAC back to zero. Fixes: `8f7ba3ca12` ('net/mlx4: Add set VF mac address support') Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reported-by: Moshe Levi <moshele@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:42:46 -05:00
Moni Shoua	00ada91039	net/mlx4_core: Check the correct limitation on VFs for HA mode The limit of 63 is only for virtual functions while the actual enforcement was for VFs plus physical functions, fix that. Fixes: `e57968a10b` ('net/mlx4_core: Support the HA mode for SRIOV VFs too') Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:42:45 -05:00
Jack Morgenstein	03a79f31ef	net/mlx4_core: Fix lockdep warning in handling of mac/vlan tables In the mac and vlan register/unregister/replace functions, the driver locks the mac table mutex (or vlan table mutex) on both ports. We move to use mutex_lock_nested() to prevent warnings, such as the one below. [ 101.828445] ============================================= [ 101.834820] [ INFO: possible recursive locking detected ] [ 101.841199] 4.5.0-rc2+ #49 Not tainted [ 101.850251] --------------------------------------------- [ 101.856621] modprobe/3054 is trying to acquire lock: [ 101.862514] (&table->mutex#2){+.+.+.}, at: [<ffffffffa079c10e>] __mlx4_register_mac+0x87e/0xa90 [mlx4_core] [ 101.874598] [ 101.874598] but task is already holding lock: [ 101.881703] (&table->mutex#2){+.+.+.}, at: [<ffffffffa079c0f0>] __mlx4_register_mac+0x860/0xa90 [mlx4_core] [ 101.893776] [ 101.893776] other info that might help us debug this: [ 101.901658] Possible unsafe locking scenario: [ 101.901658] [ 101.908859] CPU0 [ 101.911923] ---- [ 101.914985] lock(&table->mutex#2); [ 101.919595] lock(&table->mutex#2); [ 101.924199] [ 101.924199] * DEADLOCK * [ 101.924199] [ 101.931643] May be due to missing lock nesting notation Fixes: `5f61385d2e` ('net/mlx4_core: Keep VLAN/MAC tables mirrored in multifunc HA mode') Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Suggested-by: Doron Tsur <doront@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:42:45 -05:00
Gal Pressman	faf4478bf8	net/mlx5e: Provide correct packet/bytes statistics Using the HW VPort counters for traffic (rx/tx packets/bytes) statistics is wrong. This is because frames dropped due to steering or out of buffer will be counted as received. To fix that, we move to use the packet/bytes accounting done by the driver for what the netdev reports out. Fixes: `f62b8bb8f2` ('net/mlx5: Extend mlx5_core to support [...]') Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:37:26 -05:00
Gal Pressman	b081da5ee1	net/mlx5e: Add rx/tx bytes software counters Sum up rx/tx bytes in software as we do for rx/tx packets, to be reported in upcoming statistics fix. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:37:26 -05:00
Tariq Toukan	85082dba0a	net/mlx5e: Correctly handle RSS indirection table when changing number of channels Upon changing num_channels, reset the RSS indirection table to match the new value. Fixes: `2d75b2bc8a` ('net/mlx5e: Add ethtool RSS configuration options') Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:37:26 -05:00
Tariq Toukan	bdfc028de1	net/mlx5e: Fix ethtool RX hash func configuration change We should modify TIRs explicitly to apply the new RSS configuration. The light ndo close/open calls do not "refresh" them. Fixes: `2d75b2bc8a` ('net/mlx5e: Add ethtool RSS configuration options') Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:37:25 -05:00
Eran Ben Elisha	0ad9b20415	net/mlx5e: Fix soft lockup when HW Timestamping is enabled Readers/Writers lock for SW timecounter was acquired without disabling interrupts on local CPU. The problematic scenario: * HW timestamping is enabled * Timestamp overflow periodic service task is running on local CPU and holding write_lock for SW timecounter * Completion arrives, triggers interrupt for local CPU. Interrupt routine calls napi_schedule(), which triggers rx/tx skb process. An attempt to read SW timecounter using read_lock is done, which is already locked by a writer on the same CPU and cause soft lockup. Add irqsave/irqrestore for when using the readers/writers lock for writing. Fixes: `ef9814deaf` ('net/mlx5e: Add HW timestamping (TS) support') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:37:25 -05:00
Tariq Toukan	ab0394fe2c	net/mlx5e: Fix LRO modify Ethtool LRO enable/disable is broken, as of today we only modify TCP TIRs in order to apply the requested configuration. Hardware requires that all TIRs pointing to the same RQ should share the same LRO configuration. For that all other TIRs' LRO fields must be modified as well. Fixes: `5c50368f38` ('net/mlx5e: Light-weight netdev open/stop') Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:37:25 -05:00
Tariq Toukan	59a7c2fd33	net/mlx5e: Remove wrong poll CQ optimization With the MLX5E_CQ_HAS_CQES optimization flag, the following buggy flow might occur: - Suppose RX is always busy, TX has a single packet every second. - We poll a single TX cqe and clear its flag. - We never arm it again as RX is always busy. - TX CQ flag is never changed, and new TX cqes are not polled. We revert this optimization. Fixes: `e586b3b0ba` ('net/mlx5: Ethernet Datapath files') Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-02 14:37:24 -05:00
Moshe Lazer	0ba422410b	net/mlx5: Fix global UAR mapping Avoid double mapping of io mapped memory, Device page may be mapped to non-cached(NC) or to write-combining(WC). The code before this fix tries to map it both to WC and NC contrary to what stated in Intel's software developer manual. Here we remove the global WC mapping of all UARS "dev->priv.bf_mapping", since UAR mapping should be decided per UAR (e.g we want different mappings for EQs, CQs vs QPs). Caller will now have to choose whether to map via write-combining API or not. mlx5e SQs will choose write-combining in order to perform BlueFlame writes. Fixes: `88a85f99e5` ('TX latency optimization to save DMA reads') Signed-off-by: Moshe Lazer <moshel@mellanox.com> Reviewed-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:28:00 -05:00
Gal Pressman	2fcb92fbd0	net/mlx5e: Don't modify CQ before it was created Calling mlx5e_set_coalesce while the interface is down will result in modifying CQs that don't exist. Fixes: `f62b8bb8f2` ('net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality') Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:28:00 -05:00
Gal Pressman	7524a5d88b	net/mlx5e: Don't try to modify CQ moderation if it is not supported If CQ moderation is not supported by the device, print a warning on netdevice load, and return error when trying to modify/query cq moderation via ethtool. Fixes: `f62b8bb8f2` ('net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality') Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:28:00 -05:00
Tariq Toukan	556dd1b9c3	net/mlx5e: Set drop RQ's necessary parameters only By its role, there is no need to set all the other parameters for the drop RQ. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:27:59 -05:00
Tariq Toukan	c89fb18b65	net/mlx5e: Move common case counters within sq_stats struct For data cache locality considerations, we moved the nop and csum_offload_inner within sq_stats struct as they are more commonly accessed in xmit path. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:27:59 -05:00
Tariq Toukan	3b6195240c	net/mlx5e: Changed naming convention of tx queues in ethtool stats Instead of the pair (channel, tc), we now use a single number that goes over all tx queues of a TC, for all TCs. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:27:59 -05:00
Tariq Toukan	ce89ef36d2	net/mlx5e: Placement changed for carrier state updates More proper to declare carrier state UP only after the channels are ready for traffic. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:27:59 -05:00
Tariq Toukan	daa21560a2	net/mlx5e: Replace async events spinlock with synchronize_irq() We only need to flush the irq handler to make sure it does not queue a work into the global work queue after we start to flush it. So using synchronize_irq() is more appropriate than a spin lock. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:27:58 -05:00
Ido Schimmel	18f1e70c41	mlxsw: spectrum: Introduce port splitting Allow a user to split or unsplit a port using the newly introduced devlink ops. Once split, the original netdev is destroyed and 2 or 4 others are created, according to user configuration. The new ports are like any other port, with the sole difference of supporting a lower maximum speed. When unsplit, the reverse process takes place. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 16:07:31 -05:00
Ido Schimmel	a133318cde	mlxsw: spectrum: Mark unused ports using NULL When splitting and unsplitting we'll destroy usable ports on the fly, so mark them using a NULL pointer to indicate that their local port number is free and can be re-used. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 16:07:30 -05:00
Ido Schimmel	558c2d5e52	mlxsw: spectrum: Store local port to module mapping during init The port netdevs are each associated with a different local port number in the device. These local ports are grouped into groups of 4 (e.g. (1-4), (5-8)) called clusters. The cluster constitutes the one of two possible modules they can be mapped to. This mapping is board-specific and done by the device's firmware during init. When splitting a port by 4, the device requires us to first unmap all the ports in the cluster and then map each to a single lane in the module associated with the port netdev used as the handle for the operation. This means that two port netdevs will disappear, as only 100Gb/s (4 lanes) ports can be split and we are guaranteed to have two of these ((1, 3), (5, 7) etc.) in a cluster. When unsplit occurs we need to reinstantiate the two original 100Gb/s ports and map each to its origianl module. Therefore, during driver init store the initial local port to module mapping, so it can be used later during unsplitting. Note that a by 2 split doesn't require us to store the mapping, as we only need to reinstantiate one port whose module is known. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 16:07:30 -05:00
Ido Schimmel	3e9b27b8fc	mlxsw: spectrum: Unmap local port from module during teardown When splitting a port we replace it with 2 or 4 other ports. To be able to do that we need to remove the original port netdev and unmap it from its module. However, we first mark it as disabled, as active ports cannot be unmapped. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 16:07:30 -05:00
Jiri Pirko	284ef80357	mlxsw: core: Add devlink port splitter callbacks Add middle layer in mlxsw core code to forward port split/unsplit calls into specific ASIC drivers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 16:07:30 -05:00
Jiri Pirko	c4745500e9	mlxsw: Implement devlink interface Implement newly introduced devlink interface. Add devlink port instances for every port and set the port types accordingly. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 16:07:30 -05:00
Jiri Pirko	b2facd95ab	mlx4: Implement port type setting via devlink interface So far, there has been an mlx4-specific sysfs file allowing user to change port type to either Ethernet of InfiniBand. This is very inconvenient. Allow to expose the same ability to set port type in a generic way using devlink interface. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 16:07:29 -05:00
Jiri Pirko	09d4d087cd	mlx4: Implement devlink interface Implement newly introduced devlink interface. Add devlink port instances for every port and set the port types accordingly. Signed-off-by: Jiri Pirko <jiri@mellanox.com> v2->v3: -add dev param to devlink_register (api change) Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 16:07:29 -05:00
Matan Barak	a606b0f669	net/mlx5: Refactor mlx5_core_mr to mkey Mlx5's mkey mechanism is also used for memory windows. The current code base uses MR (memory region) naming, which is inaccurate. Changing MR to mkey in order to represent its different usages more accurately. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-01 11:18:37 -05:00
Meny Yossefi	1c64bf6f29	net/mlx5_core: Add helper function to read IB error counters Added helper function to read IB standard error counters via the PPCNT register. The PPCNT register read command provides the 32-bit error counters of both IB/RoCE link layer and transport layer. Signed-off-by: Meny Yossefi <menyy@mellanox.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-01 10:57:15 -05:00
Meny Yossefi	b54ba2772b	net/mlx5_core: Add helper function to read virtual port counters Added helper function to read 64bit virtual port Infiniband traffic counters. Signed-off-by: Meny Yossefi <menyy@mellanox.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-03-01 10:57:15 -05:00
Marina Varshaver	0e451e883b	IB/mlx4: Add support for the don't trap rule Add support for receiving multicast/unicast traffic with the don't trap rule. Sniffing these packets requires a flow steering rule of type NORMAL at priority 0 with flag IB_FLOW_ATTR_FLAGS_DONT_TRAP set. Choosing between multicast or unicast is done via ethernet L2 dest_mac mask and value: - If mask is all zeros - unicast and multicast are set. - If mask non zero - only mask with multicast bit 1 and rest 0 is supported, the mac value will choose if it is multicast or unicast rule. If the mask multicast bit is on and some other bits are on too, it means a request for specific multicast or unicast, this is not supported, either receive all multicast or all unicast. Only when limitations are met registered QP will receive requested type but other QPs can receive same traffic if registered for it. Otherwise, if limitations are not met, an error will be returned. Limitations: - Rule must be with priority 0. - A0 mode is not supported. - Sniffer QP cannot appear in any other flow steering rule. Signed-off-by: Marina Varshaver <marinav@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-02-29 17:11:40 -05:00
David Decotigny	3d8f7cc78d	net: mlx4: use new ETHTOOL_G/SSETTINGS API Signed-off-by: David Decotigny <decot@googlers.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-25 22:06:47 -05:00
Matthew Finlay	89db09eb59	net/mlx5e: Add TX inner packet counters Add TSO and TX checksum counters for tunneled, inner packets Signed-off-by: Matthew Finlay <matt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-24 13:50:22 -05:00
Matthew Finlay	9879515895	net/mlx5e: Add TX stateless offloads for tunneling Add support for TSO and TX checksum when using hw assisted, tunneled offloads. Signed-off-by: Matthew Finlay <matt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-24 13:50:22 -05:00
Matthew Finlay	b3f63c3d5e	net/mlx5e: Add netdev support for VXLAN tunneling If a VXLAN udp dport is added to device it will: - Configure the hardware to offload the port (up to the max supported). - Advertise NETIF_F_GSO_UDP_TUNNEL and supported hw_enc_features. Signed-off-by: Matthew Finlay <matt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-24 13:50:22 -05:00
Matthew Finlay	1afff42c06	net/mlx5e: Protect en header file from redefinitions add ifndef to en.h. needed for upcoming vxlan patchset. Signed-off-by: Matthew Finlay <matt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-24 13:50:22 -05:00
Matthew Finlay	5f6d12d10f	net/mlx5e: Move to checksum complete Use checksum complete for all IP packets, unless they are HW LRO, in which case, use checksum unnecessary. Signed-off-by: Matthew Finlay <matt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-24 13:50:22 -05:00
Tariq Toukan	928cfe8745	net/mlx5e: Wake On LAN support Implement set/get WOL by ethtool and added the needed device commands and structures to mlx5_ifc. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Rana Shahout <ranas@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-24 13:50:21 -05:00
Tariq Toukan	d8880795da	net/mlx5e: Implement DCBNL IEEE max rate Add support for DCBNL IEEE get/set max rate. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-24 13:50:21 -05:00
Achiad Shochat	ef9184335e	net/mlx5e: Support DCBNL IEEE PFC Implement the set/get DCBNL IEEE PFC callbacks. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-24 13:50:21 -05:00
Saeed Mahameed	08fb1dacdd	net/mlx5e: Support DCBNL IEEE ETS Support the ndo_setup_tc callback and the needed methods for multi TC/UP support, and removed the default_vlan_prio from mlx5e_priv which is always 0, it was replaced with hardcoded "0" in the new select queue method. For that we now create MAX_NUM_TC num of TISs (one per prio) on netdevice creation instead of priv->params.num_tc which was always 1. So far each channel had a single TXQ, Now each channel has a TXQ per TC (Traffic Class). Added en_dcbnl.c which implements the set/get DCBNL IEEE ETS, set/get dcbx and registers the mlx5e dcbnl ops. We still use the kernel's default TXQ selection method to select the channel to transmit through but now we use our own method to select the TXQ inside the channel based on VLAN priority. In mlx5, as opposed to mlx4, tc group N gets lower priority than tc group N+1. CC: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Rana Shahout <ranas@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-24 13:50:21 -05:00
Saeed Mahameed	4f3961eeaf	net/mlx5: Introduce physical port TC/prio access functions Add access functions to set and query a physical port TC groups and prio parameters. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-24 13:50:20 -05:00
Achiad Shochat	ad909eb064	net/mlx5: Introduce physical port PFC access functions Add access functions to set and query a physical port PFC (Priority Flow Control) parameters. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-24 13:50:20 -05:00
Achiad Shochat	ada68c31ba	net/mlx5: Introduce a new header file for physical port functions All the device physical port access functions are implemented in the port.c file. We just extract the exposure of these functions from driver.h into a dedicated header file called port.h. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-24 13:50:20 -05:00
David S. Miller	b633353115	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/phy/bcm7xxx.c drivers/net/phy/marvell.c drivers/net/vxlan.c All three conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-23 00:09:14 -05:00
Linus Torvalds	dea08e6044	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: "Looks like a lot, but mostly driver fixes scattered all over as usual. Of note: 1) Add conditional sched in nf conntrack in cleanup to avoid NMI watchdogs. From Florian Westphal. 2) Fix deadlock in nfnetlink cttimeout, also from Floarian. 3) Fix handling of slaves in bonding ARP monitor validation, from Jay Vosburgh. 4) Callers of ip_cmsg_send() are responsible for freeing IP options, some were not doing so. Fix from Eric Dumazet. 5) Fix per-cpu bugs in mvneta driver, from Gregory CLEMENT. 6) Fix vlan handling in mv88e6xxx DSA driver, from Vivien Didelot. 7) bcm7xxx PHY driver bug fixes from Florian Fainelli. 8) Avoid unaligned accesses to protocol headers wrt. GRE, from Alexander Duyck. 9) SKB leaks and other problems in arc_emac driver, from Alexander Kochetkov. 10) tcp_v4_inbound_md5_hash() releases listener socket instead of request socket on error path, oops. Fix from Eric Dumazet. 11) Missing socket release in pppoe_rcv_core() that seems to have existed basically forever. From Guillaume Nault. 12) Missing slave_dev unregister in dsa_slave_create() error path, from Florian Fainelli. 13) crypto_alloc_hash() never returns NULL, fix return value check in __tcp_alloc_md5sig_pool. From Insu Yun. 14) Properly expire exception route entries in ipv4, from Xin Long. 15) Fix races in tcp/dccp listener socket dismantle, from Eric Dumazet. 16) Don't set IFF_TX_SKB_SHARING in vxlan, geneve, or GRE, it's not legal. These drivers modify the SKB on transmit. From Jiri Benc. 17) Fix regression in the initialziation of netdev->tx_queue_len. From Phil Sutter. 18) Missing unlock in tipc_nl_add_bc_link() error path, from Insu Yun. 19) SCTP port hash sizing does not properly ensure that table is a power of two in size. From Neil Horman. 20) Fix initializing of software copy of MAC address in fmvj18x_cs driver, from Ken Kawasaki" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (129 commits) bnx2x: Fix 84833 phy command handler bnx2x: Fix led setting for 84858 phy. bnx2x: Correct 84858 PHY fw version bnx2x: Fix 84833 RX CRC bnx2x: Fix link-forcing for KR2 net: ethernet: davicom: fix devicetree irq resource fmvj18x_cs: fix incorrect indexing of dev->dev_addr[] when copying the MAC address Driver: Vmxnet3: Update Rx ring 2 max size net: netcp: rework the code for get/set sw_data in dma desc soc: ti: knav_dma: rename pad in struct knav_dma_desc to sw_data net: ti: netcp: restore get/set_pad_info() functionality MAINTAINERS: Drop myself as xen netback maintainer sctp: Fix port hash table size computation can: ems_usb: Fix possible tx overflow Bluetooth: hci_core: Avoid mixing up req_complete and req_complete_skb net: bcmgenet: Fix internal PHY link state af_unix: Don't use continue to re-execute unix_stream_read_generic loop unix_diag: fix incorrect sign extension in unix_lookup_by_ino bnxt_en: Failure to update PHY is not fatal condition. bnxt_en: Remove unnecessary call to update PHY settings. ...	2016-02-22 12:18:07 -08:00
Ido Schimmel	28a01d2d7d	mlxsw: spectrum: Allow for PVID deletion When PVID is toggled off on a port member in a VLAN filtering bridge or the PVID VLAN is deleted, make the port drop untagged packets. Reverse the operation when PVID is toggled back on. Set the PVID back to the default (1), when leaving the bridge so that untagged traffic will be directed to the CPU. Fixes: `56ade8fe3f` ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 10:44:26 -05:00
Ido Schimmel	148f472da5	mlxsw: reg: Add the Switch Port Acceptable Frame Types register When VLAN filtering is enabled on a bridge and PVID is deleted from a bridge port, then untagged frames are not allowed to ingress into the bridge from this port. Add the Switch Port Acceptable Frame Types (SPAFT) register, which configures the frame admittance of the port. Fixes: `56ade8fe3f` ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 10:44:26 -05:00
Ido Schimmel	6a9863a622	mlxsw: spectrum: Set STP state when leaving 802.1D bridge When a VLAN device leaves a bridge its STP state is set to DISABLED, which causes the hardware to discard any packets coming through the port with this VLAN. Fix that by setting STP state to FORWARDING when the device leaves its bridge and allow traffic to be directed to CPU. Fixes: `26f0e7fb15` ("mlxsw: spectrum: Add support for VLAN devices bridging") Reported-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 15:52:58 -05:00
Ido Schimmel	1e5ad30c64	mlxsw: Treat local port 64 as valid MLXSW_PORT_MAX_PORTS represents the maximum number of local ports, which is 65 for both ASICs (SwitchX-2 and Spectrum) supported by this driver. Fixes: `93c1edb27f` ("mlxsw: Introduce Mellanox switch driver core") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 15:52:58 -05:00
Eugenia Emantayev	925ab1aa93	net/mlx4_en: Avoid changing dev->features directly in run-time It's forbidden to manually change dev->features in run-time. Currently, this is done in the driver to make sure that GSO_UDP_TUNNEL is advertized only when VXLAN tunnel is set. However, since the stack actually does features intersection with hw_enc_features, we can safely revert to advertizing features early when registering the netdevice. Fixes: `f4a1edd561` ('net/mlx4_en: Advertize encapsulation offloads [...]') Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 10:29:27 -05:00
Huy Nguyen	85743f1eb3	net/mlx4_core: Set UAR page size to 4KB regardless of system page size problem description: The current code sets UAR page size equal to system page size. The ConnectX-3 and ConnectX-3 Pro HWs require minimum 128 UAR pages. The mlx4 kernel drivers are not loaded if there is less than 128 UAR pages. solution: Always set UAR page to 4KB. This allows more UAR pages if the OS has PAGE_SIZE larger than 4KB. For example, PowerPC kernel use 64KB system page size, with 4MB uar region, there are 4MB/2/64KB = 32 uars (half for uar, half for blueflame). This does not meet minimum 128 UAR pages requirement. With 4KB UAR page, there are 4MB/2/4KB = 512 uars which meet the minimum requirement. Note that only codes in mlx4_core that deal with firmware know that uar page size is 4KB. Codes that deal with usr page in cq and qp context (mlx4_ib, mlx4_en and part of mlx4_core) still have the same assumption that uar page size equals to system page size. Note that with this implementation, on 64KB system page size kernel, there are 16 uars per system page but only one uars is used. The other 15 uars are ignored because of the above assumption. Regarding SR-IOV, mlx4_core in hypervisor will set the uar page size to 4KB and mlx4_core code in virtual OS will obtain the uar page size from firmware. Regarding backward compatibility in SR-IOV, if hypervisor has this new code, the virtual OS must be updated. If hypervisor has old code, and the virtual OS has this new code, the new code will be backward compatible with the old code. If the uar size is big enough, this new code in VF continues to work with 64 KB uar page size (on PowerPc kernel). If the uar size does not meet 128 uars requirement, this new code not loaded in VF and print the same error message as the old code in Hypervisor. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 10:29:27 -05:00
Daniel Jurgens	22e3817e6c	net/mlx4_core: Do not BUG_ON during reset when PCI is offline The PCI channel could go offline during reset due to EEH. Don't bug on in this case, the error is recoverable. Fixes: `f6bc11e426` ('net/mlx4_core: Enhance the catas flow to support device reset') Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 10:29:26 -05:00
Eran Ben Elisha	6b94bab0ee	net/mlx4_core: Fix potential corruption in counters database The error flow in procedure handle_existing_counter() is wrong. The procedure should exit after encountering the error, not continue as if everything is OK. Fixes: `68230242cd` ('net/mlx4_core: Add port attribute when tracking counters') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 10:29:26 -05:00
Eugenia Emantayev	31c128b66e	net/mlx4_en: Choose time-stamping shift value according to HW frequency Previously, the shift value used for time-stamping was constant and didn't depend on the HW chip frequency. Change that to take the frequency into account and calculate the maximal value in cycles per wraparound of ten seconds. This time slot was chosen since it gives a good accuracy in time synchronization. Algorithm for shift value calculation: * Round up the maximal value in cycles to nearest power of two * Calculate maximal multiplier by division of all 64 bits set to above result * Then, invert the function clocksource_khz2mult() to get the shift from maximal mult value Fixes: `ec693d4701` ('net/mlx4_en: Add HW timestamping (TS) support') Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 10:29:25 -05:00
Amir Vadai	281e8b2fdf	net/mlx4_en: Count HW buffer overrun only once RdropOvflw counts overrun of HW buffer, therefore should be used for rx_fifo_errors only. Currently RdropOvflw counter is mistakenly also set into rx_missed_errors and rx_over_errors too, which makes the device total dropped packets accounting to show wrong results. Fix that. Use it for rx_fifo_errors only. Fixes: `c27a02cd94` ('mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC') Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 10:29:25 -05:00
John Fastabend	16e5cc6471	net: rework setup_tc ndo op to consume general tc operand This patch updates setup_tc so we can pass additional parameters into the ndo op in a generic way. To do this we provide structured union and type flag. This lets each classifier and qdisc provide its own set of attributes without having to add new ndo ops or grow the signature of the callback. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 09:47:35 -05:00
John Fastabend	e4c6734eaa	net: rework ndo tc op to consume additional qdisc handle parameter The ndo_setup_tc() op was added to support drivers offloading tx qdiscs however only support for mqprio was ever added. So we only ever added support for passing the number of traffic classes to the driver. This patch generalizes the ndo_setup_tc op so that a handle can be provided to indicate if the offload is for ingress or egress or potentially even child qdiscs. CC: Murali Karicheri <m-karicheri2@ti.com> CC: Shradha Shah <sshah@solarflare.com> CC: Or Gerlitz <ogerlitz@mellanox.com> CC: Ariel Elior <ariel.elior@qlogic.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Bruce Allan <bruce.w.allan@intel.com> CC: Jesse Brandeburg <jesse.brandeburg@intel.com> CC: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-17 09:47:35 -05:00
Saeed Mahameed	b0eed40ea1	net/mlx5e: Use static constant netdevice ndos Currently our netdevice ops is a one static global variable which is referenced by all mlx5e netdevice instances. This can be problematic when different driver instances do not share same HW capabilities (e.g SRIOV PF and VFs probed to the host). Now we have two constant global netdevice ops variables, one for basic netdevice ops and the other with extended SRIOV ops, on netdevice construction we choose the one suitable for current device capabilities. Fixes: `66e49dedad` ("net/mlx5e: Add support for SR-IOV ndos") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-16 15:21:47 -05:00
Saeed Mahameed	b236872739	net/mlx5e: Remove select queue ndo initialization Currently mlx5e_select_queue is redundant since num_tc is always 1. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-16 15:21:47 -05:00
Linus Torvalds	7686e3c16c	Additional 4.5-rc3 fixes - One fix to ipoib multicast joins - One fix to mlx4 error handling - One fix to mlx5 size computation - One fix to a thinko in core code -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWvjoAAAoJELgmozMOVy/dxTcP/3ELxMBGStqhdIJhxf1UfnYh IwdMWpBrvRZo92H+/YHZstnK1kHFZyQLejvdWU24loymzcs94vzswHPMpa0baKd1 PfBTtTKCts5CA4kTdxECiBjHlLYkrcsAv8il1x09RNF3ZzlMxrvupPa9fjulQalP ULbNsV+DyZr7hGXly8dX6imYQCF0433uxrYSWOY9o98IbTRYmRQhp1R31TDj/KMP jXdLrMYzs7glmtfIQdfA4Vup7PhPAZZ2tKNejNI3ECCnB/wCQsh86k7hl3ghlmDz rvrsFheIk114jgHALb+l/FMgNNjTHxSmy98s8Fcuow753oxf940YryUORUIrI/B/ 50CnjjtgCWwG6FzRPM41yYxJ2nvfGMTURKrV6LfR0hre5AxMAxb+Eu0sGl3NAe/6 rbsdQjAUsctIpSjbo2xjmazFk3Itda7xTzMYS72ovqRQ/VvdMtH63fdUXVXStxUv b121ZS3tGYWh6k/4FasRyvvExqgJy3TRHsIEbwC8P+GFAnulhN11pLAc56Eug5Bw +U057Uzj3lm2iKZUzC/OvCBARufNZXH2P/oAPXqIDJPBnMAia+U76m8sq1N6tO83 ct0kn3aYwZ6q1O69SzOpashnTqJCfkuwMtf+sbsDs0zHavjMzKMWhp9UVnQ73vf9 7mRZc/Uz3+Jrl8c/K1eX =lzqm -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull more rdma fixes from Doug Ledford: "I think we are getting pretty close to done now. There are four one-off fixes in this update: - fix ipoib multicast joins - fix mlx4 error handling - fix mlx5 size computation - fix a thinko in core code" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/mlx5: Fix RC transport send queue overhead computation IB/ipoib: fix for rare multicast join race condition IB/core: Fix reading capability mask of the port info class net/mlx4: fix some error handling in mlx4_multi_func_init()	2016-02-13 17:35:23 -08:00
Rasmus Villemoes	fa51b247d6	net/mlx4: fix some error handling in mlx4_multi_func_init() The while loop after err_slaves should use post-decrement; otherwise we'll fail to do the kfrees for i==0, and will run into out-of-bounds accesses if the setup above failed already at i==0. [I'm not sure why one even bothers populating the ->vlan_filter array: mlx4.h isn't #included by anything outside drivers/net/ethernet/mellanox/mlx4/, and "git grep -C2 -w vlan_filter drivers/net/ethernet/mellanox/mlx4/" seems to suggest that the vlan_filter elements aren't used at all.] Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-02-11 11:04:54 -05:00
Linus Torvalds	34229b2774	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: "This looks like a lot but it's a mixture of regression fixes as well as fixes for longer standing issues. 1) Fix on-channel cancellation in mac80211, from Johannes Berg. 2) Handle CHECKSUM_COMPLETE properly in xt_TCPMSS netfilter xtables module, from Eric Dumazet. 3) Avoid infinite loop in UDP SO_REUSEPORT logic, also from Eric Dumazet. 4) Avoid a NULL deref if we try to set SO_REUSEPORT after a socket is bound, from Craig Gallek. 5) GRO key comparisons don't take lightweight tunnels into account, from Jesse Gross. 6) Fix struct pid leak via SCM credentials in AF_UNIX, from Eric Dumazet. 7) We need to set the rtnl_link_ops of ipv6 SIT tunnels before we register them, otherwise the NEWLINK netlink message is missing the proper attributes. From Thadeu Lima de Souza Cascardo. 8) Several Spectrum chip bug fixes for mlxsw switch driver, from Ido Schimmel 9) Handle fragments properly in ipv4 easly socket demux, from Eric Dumazet. 10) Don't ignore the ifindex key specifier on ipv6 output route lookups, from Paolo Abeni" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (128 commits) tcp: avoid cwnd undo after receiving ECN irda: fix a potential use-after-free in ircomm_param_request net: tg3: avoid uninitialized variable warning net: nb8800: avoid uninitialized variable warning net: vxge: avoid unused function warnings net: bgmac: clarify CONFIG_BCMA dependency net: hp100: remove unnecessary #ifdefs net: davinci_cpdma: use dma_addr_t for DMA address ipv6/udp: use sticky pktinfo egress ifindex on connect() ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail() netlink: not trim skb for mmaped socket when dump vxlan: fix a out of bounds access in __vxlan_find_mac net: dsa: mv88e6xxx: fix port VLAN maps fib_trie: Fix shift by 32 in fib_table_lookup net: moxart: use correct accessors for DMA memory ipv4: ipconfig: avoid unused ic_proto_used symbol bnxt_en: Fix crash in bnxt_free_tx_skbs() during tx timeout. bnxt_en: Exclude rx_drop_pkts hw counter from the stack's rx_dropped counter. bnxt_en: Ring free response from close path should use completion ring net_sched: drr: check for NULL pointer in drr_dequeue ...	2016-02-01 15:56:08 -08:00
Ido Schimmel	4f2c6ae5c6	switchdev: Require RTNL mutex to be held when sending FDB notifications When switchdev drivers process FDB notifications from the underlying device they resolve the netdev to which the entry points to and notify the bridge using the switchdev notifier. However, since the RTNL mutex is not held there is nothing preventing the netdev from disappearing in the middle, which will cause br_switchdev_event() to dereference a non-existing netdev. Make switchdev drivers hold the lock at the beginning of the notification processing session and release it once it ends, after notifying the bridge. Also, remove switchdev_mutex and fdb_lock, as they are no longer needed when RTNL mutex is held. Fixes: `03bf0c2812` ("switchdev: introduce switchdev notifier") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-28 16:21:31 -08:00
Ido Schimmel	bbeeda27ab	mlxsw: reg: Use correct offset in field definiton The rx_lane, tx_lane and module fields in the PMLP register don't have an additional offset besides the base one (0x04), so set it to 0x00. Fixes: `4ec14b7634` ("mlxsw: Add interface to access registers and process events") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-28 15:55:32 -08:00
Ido Schimmel	3f47f86781	mlxsw: spectrum: Compare local ports instead of pointers When dumping the FDB we can't compare the actual pointers of the ports structs, as it's possible the struct represents a vPort instead of the underlying physical port. Solve this by comparing the local port number instead, as it's shared between the physical ports and all the vPorts on top of him. Fixes: `54a732018d` ("mlxsw: spectrum: Adjust switchdev ops for VLAN devices") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-28 15:55:32 -08:00
Ido Schimmel	304f51584f	mlxsw: spectrum: Dump LAG FDB records only once LAG FDB records can only point to LAG devices or VLAN devices configured on top of them. Therefore, when dumping the FDB we shouldn't associate these records with the underlying physical ports. Fixes: `8a1ab5d766` ("mlxsw: spectrum: Implement FDB add/remove/dump for LAG") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-28 15:55:31 -08:00
Ido Schimmel	e43aca22ed	mlxsw: spectrum: Use correct netdev when notifying bridge LAG FDB entries pointing to VLAN devices should be reported to the bridge with the matching VLAN device and not the underlying LAG device. Fixes: `aac78a4408` ("mlxsw: spectrum: Adjust FDB notifications for VLAN devices") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-28 15:55:31 -08:00
Ido Schimmel	004f85ea82	mlxsw: spectrum: Don't report VLAN for 802.1D FDB entries When dumping the hardware FDB we should report entries pointing to VLAN devices with VLAN 0, as packets coming into the bridge are untagged. Likewise, pass FDB_{ADD,DEL} notifications with VLAN 0 for these devices. Fixes: `54a732018d` ("mlxsw: spectrum: Adjust switchdev ops for VLAN devices") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-28 15:55:31 -08:00
Ido Schimmel	45827d780a	mlxsw: spectrum: Notify bridge's FDB only based on learning_sync When we disable learning on bridge port we should still update the software bridge's FDB when entry pointing to this bridge port is aged-out. We can otherwise have an inconsistency between software and hardware tables. Fixes: `8a1ab5d766` ("mlxsw: spectrum: Implement FDB add/remove/dump for LAG") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-28 15:55:31 -08:00
Ido Schimmel	454911333b	mlxsw: spectrum: Disable learning according to STP state When port is put into LISTENING state it shouldn't populate the FDB, so set the port's STP state in hardware to DISCARDING instead of LEARNING. It will therefore keep listening to BPDU packets, but discard other non-control packets and won't perform any learning. Fixes: `56ade8fe3f` ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-28 15:55:31 -08:00
Ido Schimmel	9cb026ebb8	mlxsw: spectrum: Don't forward packets when STP state is DISABLED When STP state is set to DISABLED the port is assumed to be inactive, but currently we forward packets ingressing through it. Instead, set the port's STP state in hardware to DISCARDING, which means it doesn't forward packets or perform any learning, but it does trap control packets. However, these packets will be dropped by bridge code, which results in the expected behavior. Fixes: `56ade8fe3f` ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-28 15:55:30 -08:00
Ido Schimmel	039c49a6e6	mlxsw: spectrum: Flush FDB when leaving bridge As explained in previous commit, we should always take care of flushing the FDB in the driver and not rely on bridge code. We need to distinguish between two cases with regards to LAG: 1) Port is leaving LAG while LAG is bridged (or VLAN devices on top of it). In this case don't flush the FDB entries pointing to the LAG ID, as this will affect other ports still member in the LAG. Only flush the FDB when the last port in the LAG is leaving the bridge. 2) LAG device is leaving the bridge. In this case the CHANGEUPPER event is simply propagated to each member port, so make each port flush the FDB in its turn. Note that emptying a bridged LAG from ports creates an inconsistency between hardware and software. A user who later (< ageing_time) re-populates the LAG won't have any FDB entries pointing to the LAG ID in hardware, but they will be present in the software bridge's FDB. Currently there is no good solution to this problem, but this will be addressed by us in the future. In order to optimize the flushing process, flush by port or LAG ID if there are no VLAN interfaces on top of the port. Otherwise, flush using (Port / LAG ID, FID=VID} for each of the lower 4K FIDs. In the case of VLAN device simply flush using {Port / LAG ID, vFID} with the vFID to which the VLAN device is mapped to. Fixes: `56ade8fe3f` ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-28 15:55:30 -08:00
Ido Schimmel	4193327135	mlxsw: reg: Add the Switch Filtering DB Flush register When removing a net device from a bridge we should flush the FDB entries associated with this net device. Up until now, we relied upon bridge code to do that for us, but it is possible for user to prevent hardware from syncing with the software bridge (learning_sync=0), so we need to flush overselves. Add the Switch Filtering DB Flush (SFDF) register that is used to flush FDB entries according to different parameters (per-port, per-FID etc). Fixes: `56ade8fe3f` ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-28 15:55:30 -08:00
Ido Schimmel	4dc236c317	mlxsw: spectrum: Handle port leaving LAG while bridged It is possible for a user to remove a port from a LAG device, while the LAG device or VLAN devices on top of it are bridged. In these cases, bridge's teardown sequence is never issued, so we need to take care of it ourselves. When LAG's unlinking event is received by port netdev: 1) Traverse its vPorts list and make those member in a bridge leave it. They will be deleted later by LAG code. 2) Make the port netdev itself leave its bridge if member in one. Fixes: `0d65fc1304` ("mlxsw: spectrum: Implement LAG port join/leave") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-28 15:55:30 -08:00
Linus Torvalds	048ccca8c1	Initial roundup of 4.5 merge window patches - Remove usage of ib_query_device and instead store attributes in ib_device struct - Move iopoll out of block and into lib, rename to irqpoll, and use in several places in the rdma stack as our new completion queue polling library mechanism. Update the other block drivers that already used iopoll to use the new mechanism too. - Replace the per-entry GID table locks with a single GID table lock - IPoIB multicast cleanup - Cleanups to the IB MR facility - Add support for 64bit extended IB counters - Fix for netlink oops while parsing RDMA nl messages - RoCEv2 support for the core IB code - mlx4 RoCEv2 support - mlx5 RoCEv2 support - Cross Channel support for mlx5 - Timestamp support for mlx5 - Atomic support for mlx5 - Raw QP support for mlx5 - MAINTAINERS update for mlx4/mlx5 - Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates - Add support for remote invalidate to the iSER driver (pushed through the RDMA tree due to dependencies, acknowledged by nab) - Update to NFSoRDMA (pushed through the RDMA tree due to dependencies, acknowledged by Bruce) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWoSygAAoJELgmozMOVy/dDjsP/2vbTda2MvQfkfkGEZBQdJSg 095RN0gQgCJdg78lAl8yuaK8r4VN/7uefpDtFdudH1I/Pei7X0wxN9R1UzFNG4KR AD53lz92IVPs15328SbPR2kvNWISR9aBFQo3rlElq3Grqlp0EMn2Ou1vtu87rekF aMllxr8Nl0uZhP+eWusOsYpJUUtwirLgRnrAyfqo2UxZh/TMIroT0TCx1KXjVcAg dhDARiZAdu3OgSc6OsWqmH+DELEq6dFVA5F+DDBGAb8bFZqlJc7cuMHWInwNsNXT so4bnEQ835alTbsdYtqs5DUNS8heJTAJP4Uz0ehkTh/uNCcvnKeUTw1c2P/lXI1k 7s33gMM+0FXj0swMBw0kKwAF2d9Hhus9UAN7NwjBuOyHcjGRd5q7SAnfWkvKx000 s9jVW19slb2I38gB58nhjOh8s+vXUArgxnV1+kTia1+bJSR5swvVoWRicRXdF0vh TvLX/BjbSIU73g1TnnLNYoBTV3ybFKQ6bVdQW7fzSTDs54dsI1vvdHXi3bYZCpnL HVwQTZRfEzkvb0AdKbcvf8p/TlaAHem3ODqtO1eHvO4if1QJBSn+SptTEeJVYYdK n4B3l/dMoBH4JXJUmEHB9jwAvYOpv/YLAFIvdL7NFwbqGNsC3nfXFcmkVORB1W3B KEMcM2we4bz+uyKMjEAD =5oO7 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "Initial roundup of 4.5 merge window patches - Remove usage of ib_query_device and instead store attributes in ib_device struct - Move iopoll out of block and into lib, rename to irqpoll, and use in several places in the rdma stack as our new completion queue polling library mechanism. Update the other block drivers that already used iopoll to use the new mechanism too. - Replace the per-entry GID table locks with a single GID table lock - IPoIB multicast cleanup - Cleanups to the IB MR facility - Add support for 64bit extended IB counters - Fix for netlink oops while parsing RDMA nl messages - RoCEv2 support for the core IB code - mlx4 RoCEv2 support - mlx5 RoCEv2 support - Cross Channel support for mlx5 - Timestamp support for mlx5 - Atomic support for mlx5 - Raw QP support for mlx5 - MAINTAINERS update for mlx4/mlx5 - Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates - Add support for remote invalidate to the iSER driver (pushed through the RDMA tree due to dependencies, acknowledged by nab) - Update to NFSoRDMA (pushed through the RDMA tree due to dependencies, acknowledged by Bruce)" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (169 commits) IB/mlx5: Unify CQ create flags check IB/mlx5: Expose Raw Packet QP to user space consumers {IB, net}/mlx5: Move the modify QP operation table to mlx5_ib IB/mlx5: Support setting Ethernet priority for Raw Packet QPs IB/mlx5: Add Raw Packet QP query functionality IB/mlx5: Add create and destroy functionality for Raw Packet QP IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types IB/mlx5: Allocate a Transport Domain for each ucontext net/mlx5_core: Warn on unsupported events of QP/RQ/SQ net/mlx5_core: Add RQ and SQ event handling net/mlx5_core: Export transport objects IB/mlx5: Expose CQE version to user-space IB/mlx5: Add CQE version 1 support to user QPs and SRQs IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext IB/sa: Fix netlink local service GFP crash IB/srpt: Remove redundant wc array IB/qib: Improve ipoib UD performance IB/mlx4: Advertise RoCE v2 support IB/mlx4: Create and use another QP1 for RoCEv2 IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers ...	2016-01-23 18:45:06 -08:00
majd@mellanox.com	427c1e7bcd	{IB, net}/mlx5: Move the modify QP operation table to mlx5_ib When modifying a QP, the desired operation was determined in the mlx5_core using a transition table that takes the current state, the final state, and returns the desired operation. Since this logic will be used for Raw Packet QP, move the operation table to the mlx5_ib. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-21 12:01:09 -05:00
majd@mellanox.com	75850d0bce	IB/mlx5: Support setting Ethernet priority for Raw Packet QPs When the user changes the Address Vector(AV) in the modify QP, he provides an SL. This SL should be translated to Ethernet Priority by taking the 3 LSB bits, and modify the QP's TIS according to this Ethernet priority. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-21 12:01:09 -05:00
majd@mellanox.com	6d2f89df04	IB/mlx5: Add Raw Packet QP query functionality Since Raw Packet QP is composed of RQ and SQ, the IB QP's state is derived from the sub-objects. Therefore we need to query each one of the sub-objects, and decide on the IB QP's state. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-21 12:01:09 -05:00
majd@mellanox.com	a14c2d4bee	net/mlx5_core: Warn on unsupported events of QP/RQ/SQ When an event arrives on QP/RQ/SQ, check whether it's supported, and print a warning message otherwise. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-21 12:01:09 -05:00
majd@mellanox.com	e2013b212f	net/mlx5_core: Add RQ and SQ event handling RQ/SQ will be used to implement IB verbs QPs, so the IB QP affiliated events are affiliated also with SQs and RQs. Since SQ, RQ and QP resource numbers do not share the same name space, a queue type field was added to the event data to specify the SW object that the event is affiliated with. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-21 12:01:09 -05:00
majd@mellanox.com	8d7f9ecb37	net/mlx5_core: Export transport objects To be used by mlx5_ib in the following patches for implementing RAW PACKET QP. Add mlx5_core_ prefix to alloc and delloc transport_domain since they are exposed now. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-21 12:01:08 -05:00
Haggai Abramovsky	cfb5e088e2	IB/mlx5: Add CQE version 1 support to user QPs and SRQs Enforce working with CQE version 1 when the user supports CQE version 1 and asked to work this way. If the user still works with CQE version 0, then use the default CQE version to tell the Firmware that the user still works in the older mode. After this patch, the kernel still reports CQE version 0. Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-21 12:01:08 -05:00
Moni Shoua	3f723f42d9	net/mlx4_core: Add support for RoCE v2 entropy In RoCE v2 we need to choose a source UDP port, we do so by using entropy over the source and dest QPNs. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:35:00 -05:00
Moni Shoua	fca8300629	net/mlx4_core: Add support for configuring RoCE v2 UDP port In order to support RoCE v2, the hardware needs to be configured to classify certain UDP packets as RoCE v2 packets and pass it through its RoCE pipeline. This patch enables configuring this UDP port. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:35:00 -05:00
Moni Shoua	1da494cbc0	net/mlx4_core: Configure mlx4 hardware for mixed RoCE v1/v2 modes If the hardware supports RoCE v2 (mixed with RoCE v1) mode, we enable it. This is necessary in order to support RoCE v2. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:35:00 -05:00
Moni Shoua	d8ae914196	net/mlx4: Query RoCE support Query the RoCE support from firmware using the appropriate firmware commands. Downstream patches will read these capabilities and act accordingly. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-01-19 15:35:00 -05:00
Doron Tsur	0b6e26ce89	net/mlx5_core: Fix trimming down IRQ number With several ConnectX-4 cards installed on a server, one may receive irqn > 255 from the kernel API, which we mistakenly trim to 8bit. This causes EQ creation failure with the following stack trace: [<ffffffff812a11f4>] dump_stack+0x48/0x64 [<ffffffff810ace21>] __setup_irq+0x3a1/0x4f0 [<ffffffff810ad7e0>] request_threaded_irq+0x120/0x180 [<ffffffffa0923660>] ? mlx5_eq_int+0x450/0x450 [mlx5_core] [<ffffffffa0922f64>] mlx5_create_map_eq+0x1e4/0x2b0 [mlx5_core] [<ffffffffa091de01>] alloc_comp_eqs+0xb1/0x180 [mlx5_core] [<ffffffffa091ea99>] mlx5_dev_init+0x5e9/0x6e0 [mlx5_core] [<ffffffffa091ec29>] init_one+0x99/0x1c0 [mlx5_core] [<ffffffff812e2afc>] local_pci_probe+0x4c/0xa0 Fixing it by changing of the irqn type from u8 to unsigned int to support values > 255 Fixes: `61d0e73e0a` ('net/mlx5_core: Use the the real irqn in eq->irqn') Reported-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Doron Tsur <doront@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-17 12:08:04 -05:00
Dan Carpenter	00ae40e71d	mlxsw: fix SWITCHDEV_OBJ_ID_PORT_MDB There is a missing break statement so we always return -EOPNOTSUPP. Fixes: `3a49b4fde2` ('mlxsw: Adding layer 2 multicast support') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-13 10:31:21 -05:00
David S. Miller	9d367eddf3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/bonding/bond_main.c drivers/net/ethernet/mellanox/mlxsw/spectrum.h drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c The bond_main.c and mellanox switch conflicts were cases of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 23:55:43 -05:00
Maor Gottlieb	b217ea25af	net/mlx5_core: Export flow steering API Add exports to flow steering API for mlx5_ib usage. The following functions are exported: 1. mlx5_create_auto_grouped_flow_table - used to create flow table with auto flow grouping management (create and destroy flow groups). In auto-grouped flow tables, we create groups automatically if needed (if we don't find an existing flow group with same match criteria when we add new rule). 2. mlx5_destroy_flow_table - used to destroy a flow table. 3. mlx5_add_flow_rule - used to add flow rule into a flow table. 4. mlx5_del_flow_rule - used to delete flow rule from its flow table. 5. mlx5_get_flow_namespace - used to get a handle to the required namespace sub-tree. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 17:48:53 -05:00
Maor Gottlieb	4cbdd30ed5	net/mlx5_core: Enable flow steering support for the IB driver When the driver is loaded, we create flow steering namespace for kernel bypass with nine priorities and another namespace for leftovers(in order to catch packets that weren't matched). Verbs applications will use these priorities. we found nine as a number that balances the requirements from the user and retains performance. The bypass namespace is used by verbs applications that want to bypass the kernel networking stack. The leftovers namespace is used by verbs applications and the sniffer in order to catch packets that weren't handled by any preceding rules. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 17:48:53 -05:00
Maor Gottlieb	8d40d162c0	net/mlx5_core: Initialize namespaces only when supported by device Before we create the sub tree of a steering namespaces(kernel, bypass, leftovers) we check that the device has the required capabilities in order to create this subtree. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 17:48:53 -05:00
Maor Gottlieb	655227edae	net/mlx5_core: Set priority attributes Each priority has two attributes: 1. max_ft - maximum allowed flow tables under this priority. 2. start_level - start level range of the flow tables in the priority. These attributes are set by traversing the tree nodes by DFS and set start level and max flow tables to each priority. Start level depends on the max flow tables of the prior priorities in the tree. The leaves of the trees have max_ft set in them. Each node accumulates the max_ft of its children and set it accordingly. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 17:48:53 -05:00
Maor Gottlieb	f90edfd279	net/mlx5_core: Connect flow tables Flow tables from different priorities should be chained together. When a packet arrives we search for a match in the by-pass flow tables (first we search for a match in priority 0 and if we don't find a match we move to the next priority). If we can't find a match in any of the bypass flow-tables, we continue searching in the flow-tables of the next priority, which are the kernel's flow tables. Setting the miss flow table in a new flow table to be the next one in the list is performed via create flow table API. If we want to change an existing flow table, for example in order to point from an existing flow table to the new next-in-list flow table, we use the modify flow table API. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 17:48:53 -05:00
Maor Gottlieb	34a40e6893	net/mlx5_core: Introduce modify flow table command Introduce the modify flow table command. This command is used when we want to change the next flow table of an existing flow table. The next flow table is defined as the table we search (in order to find a match), if we couldn't find a match in any of the flow table entries in the current flow table. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 17:48:52 -05:00
Maor Gottlieb	2cc43b494a	net/mlx5_core: Managing root flow table The root Flow Table for each Flow Table Type is defined, by default, as the Flow Table with level 0. In order not to use an empty flow tables and introduce new hops, but still preserve space for flow-tables that have a priority greater(lower number) than the current flow table, we introduce this new set root flow table command. This command tells the HW to start matching packets from the assigned root flow table. This command is used when we create new flow table with level lower than the current lowest flow table or it is the first flow table. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 17:48:52 -05:00
Maor Gottlieb	fdb6896fd9	net/mlx5_core: Add utilities to find next and prev flow-tables Add two utility functions for find next and prev flow table. Find next flow table function gets priority and return the first flow table of the next priority in the tree. Find prev flow table return the last flow table of the previous priority in the tree. These utility functions are used for chaining flow table from different priorities. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 17:48:52 -05:00
Maor Gottlieb	f0d22d1874	net/mlx5_core: Introduce flow steering autogrouped flow table When user add rule to autogrouped flow table, we search for flow group with the same match criteria, if we don't find such group then we create new flow group with the required match criteria and insert the rule to this group. We divide the flow table into required_groups + 1, in order to reserve a part of the flow table for rules which don't match any existing group. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 17:48:52 -05:00
Ido Schimmel	366ce60315	mlxsw: spectrum: Add FDB lock to prevent session interleaving Dumping the FDB (invoked with a process context) or handling FDB notifications (polled periodicly in delayed work) might each entail multiple EMAD transcations due to the number of entries. While we only allow one EMAD transaction at a time, there is nothing stopping the dump and notification processing sessions from interleaving. However, this is forbidden by the hardware, so we need to make sure only one of these sessions can run at a time. Solve this by adding a mutex ('fdb_lock'), as both kernel threads can sleep while waiting for the response EMAD. Fixes: `56ade8fe3f` ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-11 00:21:19 -05:00
Elad Raz	3a49b4fde2	mlxsw: Adding layer 2 multicast support Add SWITCHDEV_OBJ_ID_PORT_MDB switchdev ops support. On first MDB insertion creates a new multicast group (MID) and add members port to the MID. Also add new MDB entry for the flooding-domain (fid-vid) and link the MDB entry to the newly constructed MC group. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-10 16:50:21 -05:00
Elad Raz	e4b6f6931c	mlxsw: Adding VID to FID translatation Adding a generic function that translate VID to FID. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-10 16:50:21 -05:00
Elad Raz	53ae628316	mlxsw: Changing the maximum number of multicast group to a define Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-10 16:50:21 -05:00
Elad Raz	fabe548322	mlxsw: reg: Adding SMID register Adding back SMID register definition and packing. For each MC group a new SMID entry will be generated. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-10 16:50:21 -05:00
Elad Raz	5230b25f06	mlxsw: reg: Add definition of multicast record for SFD register Multicast-related records have specific format in SFD register. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-10 16:50:21 -05:00
Jiri Pirko	12f1501e75	mlxsw: spectrum: remove FDB entry in case we get unknown object notification It may happen that we get notification for FDB entry for object (port, lag, vport), which does not exist. Currently we ignore that, which only causes this being re-sent in next notification. The entry will never disappear. So get rid of it by simply removing it using SFD register. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-09 21:06:33 -05:00
Jiri Pirko	2fa9d45e16	mlxsw: spectrum: pass local_port to mlxsw_sp_port_fdb_uc_op Do not pass struct mlxsw_sp_port to mlxsw_sp_port_fdb_uc_op and rather just pass local_port. This is needed in case this is called from SFN process function and mlxsw_sp_port is not existent for particular local_port. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-09 21:06:33 -05:00
Dan Carpenter	719255d0e2	mlxsw: core: remove an unnecessary condition We checked "err" on the lines before so we know it's zero here. These cause a static checker warning because checking known things can indicate a bug. Maybe there is a missing assignment or we are checking the wrong variable. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-06 15:07:47 -05:00
Elad Raz	fc1273afb2	mlxsw: Remember untagged VLANs When a vlan is been configured, remeber the untagged mode of the vlan. When displaying the list of configured VLANs, show the untagged attribute. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-06 14:42:42 -05:00
Elad Raz	26a4ea0f45	mlxsw: Disable vlan_filtering for non .1D bridge When a port is bridged, the bridge must be vlan aware bridge (.1Q) or the bridging should be on top of VLAN interfaces (.1D bridge). Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-06 14:42:41 -05:00
Elad Raz	e4a1305507	mlxsw: Renaming local variable names for consistency Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-06 14:42:41 -05:00
Elad Raz	29edf44f85	mlxsw: Fixing vlans init range Initialize VLANs 0..4095 (Remove init for VID 4096). Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-06 14:42:41 -05:00
Ido Schimmel	f0138e2596	mlxsw: pci: Adjust value of CPU egress traffic class During initialization, when creating the send descriptor queues (SDQs), we specify the CPU egress traffic class of each SDQ. The maximum number of classes of this type is different in the two ASICs supported by this PCI driver. New firmware versions check this value is set correctly, which causes errors on the Spectrum ASIC, as its max exposed egress traffic class is lower than 7. Solve this by setting this field to 3, which is an acceptable value for both ASICs. Note that we currently do not expose the QoS capabilities of the ASICs, so setting this to an hardcoded value is OK for now. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-06 00:48:50 -05:00
Eran Ben Elisha	3d8c38af14	net/mlx5e: Add PTP Hardware Clock (PHC) support Add a PHC support to the mlx5_en driver. Use reader/writer spinlocks to protect the timecounter since every packet received needs to call timecounter_cycle2time() when timestamping is enabled. This can become a performance bottleneck with RSS and multiple receive queues if normal spinlocks are used. The driver has been tested with both Documentation/ptp/testptp and the linuxptp project (http://linuxptp.sourceforge.net/) on a Mellanox ConnectX-4 card. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-05 14:11:50 -05:00
Eran Ben Elisha	ef9814deaf	net/mlx5e: Add HW timestamping (TS) support Add support for enable/disable HW timestamping for incoming and/or outgoing packets. To enable/disable HW timestamping appropriate ioctl should be used. Currently HWTSTAMP_FILTER_ALL/NONE and HWTSAMP_TX_ON/OFF only are supported. Make all relevant changes in RX/TX flows to consider TS request and plant HW timestamps into relevant structures. Add internal clock for converting hardware timestamp to nanoseconds. In addition, add a service task to catch internal clock overflow, to make sure timestamping is accurate. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-05 14:11:50 -05:00
Eran Ben Elisha	b084444459	net/mlx5_core: Introduce access function to read internal timer A preparation step which adds support for reading the hardware internal timer and the hardware timestamping from the CQE. In addition, advertize device_frequency_khz HCA capability. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-05 14:11:50 -05:00
Achiad Shochat	34802a42b3	net/mlx5e: Do not modify the TX SKB If the SKB is cloned, or has an elevated users count, someone else can be looking at it at the same time. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-05 14:11:50 -05:00
Ido Schimmel	6c72a3d0d3	mlxsw: spectrum: Change bridge port attributes only when bridged Bridge port attributes are offloaded to hardware when invoked with SELF flag set, but it really makes no sense to reflect them when port is not bridged. Allow a user to change these attribute only when port is bridged and initialize them correctly when joining or leaving a bridge. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-04 22:07:58 -05:00
Ido Schimmel	5a8f45258e	mlxsw: spectrum: Set bridge status in appropriate functions Set the bridge status of physical ports in the appropriate functions, to be consistent with LAG join/leave and vPorts joining/leaving bridge. Also, remove the error messages in these two functions, as we already emit errors in both the single functions they call. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-04 22:07:58 -05:00
Ido Schimmel	78124078c4	mlxsw: spectrum: Return NOTIFY_BAD on bridge failure It is possible for us to fail when joining or leaving a bridge, so let the user know about that by returning NOTIFY_BAD, as already done for LAG join/leave and 802.1D bridges. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-04 22:07:58 -05:00

... 3 4 5 6 7 ...

1702 Commits