linux

Author	SHA1	Message	Date
Ido Schimmel	a2d2a20553	mlxsw: spectrum: Replace hard-coded default VID with a define Subsequent patches are going to replace the current default VID (1) with VLAN_N_VID - 1 (4095). Prepare for this conversion by replacing the hard-coded '1' with a define. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 15:48:54 -08:00
Ido Schimmel	f40be47a3e	mlxsw: spectrum_router: Do not force specific configuration order In symmetric routing, the only two members in the VLAN corresponding to the L3 VNI are the router port and the VXLAN tunnel. In case the VXLAN device is already enslaved to the bridge and only later the VLAN interface is configured, the tunnel will not be offloaded. The reason for this is that when the router interface (RIF) corresponding to the VLAN interface is configured, it calls the core fid_get() API which does not check if NVE should be enabled on the FID. Instead, call into the bridge code which will check if NVE should be enabled on the FID. This effectively means that the same code path is used to retrieve a FID when either a local port or a router port joins the FID. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 15:48:54 -08:00
David S. Miller	6eea2db210	Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2018-12-20 This series contains updates to e100, igb, ixgbe, i40e and ice drivers. I replaced spinlocks for mutex locks to reduce the latency on CPU0 for igb when updating the statistics. This work was based off a patch provided by Jan Jablonsky, which was against an older version of the igb driver. Jesus adjusts the receive packet buffer size from 32K to 30K when running in QAV mode, to stay within 60K for total packet buffer size for igb. Vinicius adds igb kernel documentation regarding the CBS algorithm and its implementation in the i210 family of NICs. YueHaibing from Huawei fixed the e100 driver that was potentially passing a NULL pointer, so use the kernel macro IS_ERR_OR_NULL() instead. Konstantin Khorenko fixes i40e where we were not setting up the neigh_priv_len in our net_device, which caused the driver to read beyond the neighbor entry allocated memory. Miroslav Lichvar extends the PTP gettime() to read the system clock by adding support for PTP_SYS_OFFSET_EXTENDED ioctl in i40e. Young Xiao fixed the ice driver to only enable NAPI on q_vectors that actually have transmit and receive rings. Kai-Heng Feng fixes an igb issue that when placed in suspend mode, the NIC does not wake up when a cable is plugged in. This was due to the driver not setting PME during runtime suspend. Stephen Douthit enables the ixgbe driver allow DSA devices to use the MII interface to talk to switches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 15:34:30 -08:00
Steve Douthit	643bae17fd	ixgbe: use mii_bus to handle MII related ioctls Use the mii_bus callbacks to address the entire clause 22/45 address space. Enables userspace to poke switch registers instead of a single PHY address. The ixgbe firmware may be polling PHYs in a way that is not protected by the mii_bus lock. This isn't new behavior, but as Andrew Lunn pointed out there are more addresses available for conflicts. Signed-off-by: Stephen Douthit <stephend@silicom-usa.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-12-20 12:22:39 -08:00
Steve Douthit	8fa10ef012	ixgbe: register a mdiobus Most dsa devices expect a 'struct mii_bus' pointer to talk to switches via the MII interface. While this works for dsa devices, it will not work safely with Linux PHYs in all configurations since the firmware of the ixgbe device may be polling some PHY addresses in the background. Signed-off-by: Stephen Douthit <stephend@silicom-usa.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-12-20 12:19:11 -08:00
Kai-Heng Feng	1fb3a7a75e	igb: Fix an issue that PME is not enabled during runtime suspend I210 ethernet card doesn't wakeup when a cable gets plugged. It's because its PME is not set. Since commit `42eca23021` ("PCI: Don't touch card regs after runtime suspend D3"), if the PCI state is saved, pci_pm_runtime_suspend() stops calling pci_finish_runtime_suspend(), which enables the PCI PME. To fix the issue, let's not to save PCI states when it's runtime suspend, to let the PCI subsystem enables PME. Fixes: `42eca23021` ("PCI: Don't touch card regs after runtime suspend D3") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-12-20 12:14:23 -08:00
Young Xiao	eec903769b	ice: Do not enable NAPI on q_vectors that have no rings If ice driver has q_vectors w/ active NAPI that has no rings, then this will result in a divide by zero error. To correct it I am updating the driver code so that we only support NAPI on q_vectors that have 1 or more rings allocated to them. See commit `13a8cd191a` ("i40e: Do not enable NAPI on q_vectors that have no rings") for detail. Signed-off-by: Young Xiao <YangX92@hotmail.com> Acked-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-12-20 12:10:24 -08:00
Miroslav Lichvar	9a2d57a7a0	i40e: extend PTP gettime function to read system clock This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl. Cc: Richard Cochran <richardcochran@gmail.com> Cc: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-12-20 12:06:35 -08:00
Konstantin Khorenko	31389b53b3	i40e: define proper net_device::neigh_priv_len Out of bound read reported by KASan. i40iw_net_event() reads unconditionally 16 bytes from neigh->primary_key while the memory allocated for "neighbour" struct is evaluated in neigh_alloc() as tbl->entry_size + dev->neigh_priv_len where "dev" is a net_device. But the driver does not setup dev->neigh_priv_len and we read beyond the neigh entry allocated memory, so the patch in the next mail fixes this. Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-12-20 12:02:26 -08:00
YueHaibing	cd0d465bb6	e100: Fix passing zero to 'PTR_ERR' warning in e100_load_ucode_wait Fix a static code checker warning: drivers/net/ethernet/intel/e100.c:1349 e100_load_ucode_wait() warn: passing zero to 'PTR_ERR' Signed-off-by: YueHaibing <yuehaibing@huawei.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-12-20 11:54:27 -08:00
David S. Miller	2be09de7d6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Lots of conflicts, by happily all cases of overlapping changes, parallel adds, things of that nature. Thanks to Stephen Rothwell, Saeed Mahameed, and others for their guidance in these resolutions. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 11:53:36 -08:00
Jesus Sanchez-Palencia	6f9ae17530	igb: Change RXPBSIZE size when setting Qav mode Section 4.5.9 of the datasheet says that the total size of all packet buffers combined (TxPB 0 + 1 + 2 + 3 + RxPB + BMC2OS + OS2BMC) must not exceed 60KB. Today we are configuring a total of 62KB, so reduce the RxPB from 32KB to 30KB in order to respect that. The choice of changing RxPBSIZE here is mainly because it seems more correct to give more priority to the transmit packet buffers over the receiver ones when running in Qav mode. Also, the BMC2OS and OS2BMC sizes are already too short. Signed-off-by: Jesus Sanchez-Palencia <jesus.s.palencia@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-12-20 11:45:10 -08:00
Jeff Kirsher	59361316af	igb: reduce CPU0 latency when updating statistics This change is based off of the work and suggestion of Jan Jablonsky <jan.jablonsky@thalesgroup.com>. The Watchdog workqueue in igb driver is scheduled every 2s for each network interface. That includes updating a statistics protected by spinlock. Function igb_update_stats in this case will be protected against preemption. According to number of a statistics registers (cca 60), processing this function might cause additional cpu load on CPU0. In case of statistics spinlock may be replaced with mutex, which reduce latency on CPU0. CC: Bernhard Kaindl <bernhard.kaindl@thalesgroup.com> CC: Jan Jablonsky <jan.jablonsky@thalesgroup.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-12-20 11:02:06 -08:00
Michael Chan	0c2ff8d796	bnxt_en: Adjust default RX coalescing ticks to 10 us. For a little better performance on faster machines and faster link speeds. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 08:26:16 -08:00
Venkat Duvvuru	abd43a1352	bnxt_en: Support for 64-bit flow handle. Older firmware only supports 16-bit flow handle, because of which the number of flows that can be offloaded can’t scale beyond a point. Newer firmware supports 64-bit flow handle enabling the host to scale upto millions of flows. With the new 64-bit flow handle support, driver has to query flow stats in a different way compared to the older approach. This patch adds support for 64-bit flow handle and new way to query flow stats. Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 08:26:16 -08:00
Michael Chan	cf6daed098	bnxt_en: Increase context memory allocations on 57500 chips for RDMA. If RDMA is supported on the 57500 chip, increase context memory allocations for the resources used by RDMA. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 08:26:16 -08:00
Michael Chan	08fe9d1816	bnxt_en: Add Level 2 context memory paging support. Add the new functions bnxt_alloc_ctx_pg_tbls()/bnxt_free_ctx_pg_tbls() to allocate and free pages for context memory. The new functions will handle the different levels of paging support and allocate/free the pages accordingly using the existing functions. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 08:26:16 -08:00
Michael Chan	4f49b2b8d4	bnxt_en: Enhance bnxt_alloc_ring()/bnxt_free_ring(). To support level 2 context page memory structures, enhance the bnxt_ring_mem_info structure with a "depth" field to specify the page level and add a flag to specify using full pages for L1 and L2 page tables. This is needed to support RDMA functionality on 57500 chips since RDMA requires more context memory. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 08:26:16 -08:00
Venkat Duvvuru	760b6d3341	bnxt_en: Add support for 2nd firmware message channel. Earlier, some of the firmware commands (ex: CFA_FLOW_*) which are processed by KONG processor were sent to the CHIMP processor from the host. This approach was taken as there was no direct message channel to KONG. CHIMP in turn used to send them to KONG. Newer firmware supports a new message channel which the host can send messages directly to the KONG processor. This patch adds support for required changes needed in the driver to support direct KONG message channel. This speeds up flow related messages sent to the firmware for CLS_FLOWER offload. Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 08:26:16 -08:00
Venkat Duvvuru	5c209fc821	bnxt_en: Introduce bnxt_get_hwrm_resp_addr & bnxt_get_hwrm_seq_id routines. These routines will be enhanced in the subsequent patch to return the 2nd firmware comm. channel's hwrm response address & sequence id respectively. Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 08:26:16 -08:00
Venkat Duvvuru	89455017fb	bnxt_en: Avoid arithmetic on void * pointer. Typecast hwrm_cmd_resp_addr to (u8 ) from (void ) before doing arithmetic. Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 08:26:15 -08:00
Venkat Duvvuru	2e9ee39877	bnxt_en: Use macros for firmware message doorbell offsets. In preparation for adding a 2nd communication channel to firmware. Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 08:26:15 -08:00
Venkat Duvvuru	fc718bb2d1	bnxt_en: Set hwrm_intr_seq_id value to its inverted value. Set hwrm_intr_seq_id value to its inverted value instead of HWRM_SEQ_INVALID, when an hwrm completion of type CMPL_BASE_TYPE_HWRM_DONE is received. This will enable us to use the complete 16-bit sequence ID space. Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 08:26:15 -08:00
Michael Chan	3322479e6d	bnxt_en: Update firmware interface spec. to 1.10.0.33. The major changes are in the flow offload firmware APIs. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 08:26:15 -08:00
Peng Li	1154bb26c8	net: hns3: remove redundant variable initialization This patch removes the redundant variable initialization, as driver will devm_kzalloc to set value to hdev soon. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 23:47:59 -08:00
Peng Li	31a16f99e0	net: hns3: fix the descriptor index when get rss type Driver gets rss information from the last descriptor of the packet. When driver handle the rss type, ring->next_to_clean indicates the first descriptor of next packet. This patch fix the descriptor index with "ring->next_to_clean - 1". Fixes: `232fc64b6e` ("net: hns3: Add HW RSS hash information to RX skb") Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 23:47:59 -08:00
Jian Shen	8edc2285b7	net: hns3: don't restore rules when flow director is disabled When user disables flow director, all the rules will be disabled. But when reset happens, it will restore all the rules again. It's not reasonable. This patch fixes it by add flow director status check before restore fules. Fixes: `6871af29b3` ("net: hns3: Add reset handle for flow director") Fixes: `c17852a893` ("net: hns3: Add support for enable/disable flow director") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 23:47:59 -08:00
Jian Shen	0285dbae5d	net: hns3: fix vf id check issue when add flow director rule When add flow director fule for vf, the vf id is used as array subscript before valid checking, which may cause memory overflow. Fixes: `dd74f815dd` ("net: hns3: Add support for rule add/delete for flow director") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 23:47:58 -08:00
Huazhong Tan	39cfbc9c4f	net: hns3: reset tqp while doing DOWN operation While doing DOWN operation, the driver will reclaim the memory which has already used for TX. If the hardware is processing this memory, it will cause a RCB error to the hardware. According the hardware's description, the driver should reset the tqp before reclaim the memory during DOWN. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 23:47:58 -08:00
Jian Shen	75edb61086	net: hns3: add max vector number check for pf Each pf supports max 64 vectors and 128 tqps. For 2p/4p core scenario, there may be more than 64 cpus online. So the result of min_t(u16, num_Online_cpus(), tqp_num) may be more than 64. This patch adds check for the vector number. Fixes: `dd38c72604` ("net: hns3: fix for coalesce configuration lost during reset") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 23:47:58 -08:00
Peng Li	1b7d7b0581	net: hns3: fix a bug caused by udelay udelay() in driver may always occupancy processor. If there is only one cpu in system, the VF driver may initialize fail when insmod PF and VF driver in the same system. This patch use msleep() to free cpu when VF wait PF message. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 23:47:58 -08:00
Jian Shen	a298797532	net: hns3: change default tc state to close In original codes, default tc value is set to the max tc. It's more reasonable to close tc by changing default tc value to 1. Users can enable it with lldp tool when they want to use tc. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 23:47:58 -08:00
Jian Shen	8cdb992f0d	net: hns3: refine the handle for hns3_nic_net_open/stop() When triggering nic down, there is a time window between bringing down the protocol stack and stopping the work task. If the net is up in the time window, it may bring up the protocol stack again. This patch fixes it by stop the work task at the beginning of hns3_nic_net_stop(). To keep symmetrical, start the work task at the end of hns3_nic_net_open(). Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 23:47:58 -08:00
Andrew Lunn	9c7f37e5ca	net: dsa: mv88e6xxx: Add missing watchdog ops for 6320 family The 6320 family of switches uses the same watchdog registers as the 6390. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 16:40:53 -08:00
Antoine Tenart	1b451fb205	net: mvpp2: fix the phylink mode validation The mvpp2_phylink_validate() sets all modes that are supported by a given PPv2 port. An mistake made the 10000baseT_Full mode being advertised in some cases when a port wasn't configured to perform at 10G. This patch fixes this. Fixes: `d97c9f4ab0` ("net: mvpp2: 1000baseX support") Reported-by: Russell King <linux@armlinux.org.uk> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 16:38:35 -08:00
Biao Huang	22a3a5403b	net-next: stmmac: dwmac-mediatek: remove fine-tune property 1. remove fine-tune property and related setting to simplify the timing adjustment flow. 2. set timing value according to the value from device tree, and will not care whether PHY insert internal delay. Signed-off-by: Biao Huang <biao.huang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 16:24:58 -08:00
Bryan Whitehead	e0e587878f	lan743x: Remove MAC Reset from initialization The MAC Reset was noticed to erase important EEPROM settings. It is also unnecessary since a chip wide reset was done earlier in initialization, and that reset preserves EEPROM settings. There for this patch removes the unnecessary MAC specific reset. Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 15:48:11 -08:00
Alaa Hleihel	4765420439	net/mlx5e: Remove the false indication of software timestamping support mlx5 driver falsely advertises support of software timestamping. Fix it by removing the false indication. Fixes: `ef9814deaf` ("net/mlx5e: Add HW timestamping (TS) support") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-19 13:31:16 -08:00
Yuval Avnery	f033788914	net/mlx5: Typo fix in del_sw_hw_rule Expression terminated with "," instead of ";", resulted in set_fte getting bad value for modify_enable_mask field. Fixes: `bd5251dbf1` ("net/mlx5_core: Introduce flow steering destination of type counter") Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-19 13:31:16 -08:00
Tariq Toukan	bfc698254b	net/mlx5e: RX, Fix wrong early return in receive queue poll When the completion queue of the RQ is empty, do not immediately return. If left-over decompressed CQEs (from the previous cycle) were processed, need to go to the finalization part of the poll function. Bug exists only when CQE compression is turned ON. This solves the following issue: mlx5_core 0000:82:00.1: mlx5_eq_int:544:(pid 0): CQ error on CQN 0xc08, syndrome 0x1 mlx5_core 0000:82:00.1 p4p2: mlx5e_cq_error_event: cqn=0x000c08 event=0x04 Fixes: `4b7dfc9925` ("net/mlx5e: Early-return on empty completion queues") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-12-19 13:31:16 -08:00
Ido Schimmel	b61cd7c6f9	mlxsw: spectrum_router: Hold a reference on RIF's netdev Previous patches tried to make RIF deletion more robust and avoid use-after-free situations. As another precaution, hold a reference on a RIF's netdev and release it when the RIF is deleted. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 12:28:07 -08:00
Ido Schimmel	965fa8e600	mlxsw: spectrum_router: Make RIF deletion more robust In the past we had multiple instances where RIFs were not properly deleted. One of the reasons for leaking a RIF was that at the time when IP addresses were flushed from the respective netdev (prompting the destruction of the RIF), the netdev was no longer a mlxsw upper. This caused the inet{,6}addr notification blocks to ignore the NETDEV_DOWN event and leak the RIF. Instead of checking whether the netdev is our upper when an IP address is removed, we can instead check if the netdev has a RIF configured. To look up a RIF we need to access mlxsw private data, so the patch stores the notification blocks inside a mlxsw struct. This then allows us to use container_of() and extract the required private data. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 12:28:07 -08:00
Ido Schimmel	21ffedb6db	mlxsw: spectrum_router: Propagate 'struct mlxsw_sp' further Next patch is going to make RIF deletion more robust by removing reliance on fragile mlxsw_sp_lower_get(). This is because a netdev is not necessarily our upper anymore when its IP addresses are flushed. The inet{,6}addr notification blocks are going to resolve 'struct mlxsw_sp' using container_of(), but the functions they call still use mlxsw_sp_lower_get(). As a preparation for the next patch, propagate 'struct mlxsw_sp' down to the functions called from the notification blocks and remove reliance on mlxsw_sp_lower_get(). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 12:28:07 -08:00
Ido Schimmel	be2d6f421f	mlxsw: spectrum: Properly cleanup LAG uppers when removing port from LAG When a LAG device or a VLAN device on top of it is enslaved to a bridge, the driver propagates the CHANGEUPPER event to the LAG's slaves. This causes each physical port to increase the reference count of the internal representation of the bridge port by calling mlxsw_sp_port_bridge_join(). However, when a port is removed from a LAG, the corresponding leave() function is not called and the reference count is not decremented. This leads to ugly hacks such as mlxsw_sp_bridge_port_should_destroy() that try to understand if the bridge port should be destroyed even when its reference count is not 0. Instead, make sure that when a port is unlinked from a LAG it would see the same events as if the LAG (or its uppers) were unlinked from a bridge. The above is achieved by walking the LAG's uppers when a port is unlinked and calling mlxsw_sp_port_bridge_leave() for each upper that is enslaved to a bridge. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 12:28:07 -08:00
Ido Schimmel	635c8c8bba	mlxsw: spectrum: Remove reference count from VLAN entries Commit `b3529af6bb` ("spectrum: Reference count VLAN entries") started reference counting port-VLAN entries in a similar fashion to the 8021q driver. However, this is not actually needed and only complicates things. Instead, the driver should forbid the creation of a VLAN on a port if this VLAN already exists. This would also solve the issue fixed by the mentioned commit. Therefore, remove the get()/put() API and use create()/destroy() instead. One place that needs special attention is VLAN addition in a VLAN-aware bridge via switchdev operations. In case the VLAN flags (e.g., 'pvid') are toggled, then the VLAN entry already exists. To prevent the driver from wrongly returning EEXIST, the driver is changed to check in the prepare phase whether the entry already exists and only returns an error in case it is not associated with the correct bridge port. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 12:28:07 -08:00
Ido Schimmel	e149113a74	mlxsw: spectrum: Handle VLAN device unlinking In commit `993107fea5` ("mlxsw: spectrum_switchdev: Fix VLAN device deletion via ioctl") I fixed a bug caused by the fact that the driver views differently the deletion of a VLAN device when it is deleted via an ioctl and netlink. Instead of relying on a specific order of events (device being unregistered vs. VLAN filter being updated), simply make sure that the driver performs the necessary cleanup when the VLAN device is unlinked, which always happens before the other two events. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 12:28:07 -08:00
Ido Schimmel	f1d7c33d6a	mlxsw: spectrum_fid: Remove unused function This function is no longer used. Remove it. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 12:28:07 -08:00
Ido Schimmel	32fd4b49a3	mlxsw: spectrum_router: Do not destroy RIFs based on FID's reference count Currently, when a RIF is constructed on top of a FID, the RIF increments the FID's reference count and the RIF is destroyed when the FID's reference count drops to 1. This effectively means that when no local ports are member in the FID, the FID is destroyed regardless if the router port is a member in the FID or not. The above can lead to the unexpected behavior in which routes using a VLAN interface as their nexthop device are no longer offloaded after the last local port leaves the corresponding VLAN (FID). Example: # ip -4 route show dev br0.10 192.0.2.0/24 proto kernel scope link src 192.0.2.1 offload # bridge vlan del vid 10 dev swp3 # ip -4 route show dev br0.10 192.0.2.0/24 proto kernel scope link src 192.0.2.1 After the patch, the route is offloaded before and after the VLAN is removed from local port 'swp3', as the RIF corresponding to 'br0.10' continues to exists. In order to remove RIFs' reliance on the underlying FID's reference count, we need to add a reference count to sub-port RIFs, which are RIFs that correspond to physical ports and their uppers (e.g., LAG devices). In this case, each {Port, VID} ('struct mlxsw_sp_port_vlan') needs to hold a reference on the RIF. For example: bond0.10 \| bond0 \| +-------+ \| \| swp1 swp2 Both {Port 1, VID 10} and {Port 2, VID 10} will hold a reference on the RIF corresponding to 'bond0.10'. When the last reference is dropped, the RIF will be destroyed. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 12:28:07 -08:00
Ido Schimmel	927d0ef10a	mlxsw: spectrum: Sanitize VLAN interface's uppers Currently, only VRF and macvlan uppers are supported on top of VLAN device configured over a bridge, so make sure the driver forbids other uppers. Note that enslavement to a VRF is handled earlier in the notification block, so there is no need to check for a VRF upper here. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 12:28:07 -08:00
Michael Chan	84404d5fd5	bnxt_en: Fix ethtool self-test loopback. The current code has 2 problems. It assumes that the RX ring for the loopback packet is combined with the TX ring. This is not true if the ethtool channels are set to non-combined mode. The second problem is that it won't work on 57500 chips without adjusting the logic to get the proper completion ring (cpr) pointer. Fix both issues by locating the proper cpr pointer through the RX ring. Fixes: `e44758b78a` ("bnxt_en: Use bnxt_cp_ring_info struct pointer as parameter for RX path.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 11:31:09 -08:00

1 2 3 4 5 ...

80689 Commits