linux

Author	SHA1	Message	Date
Mohamad Haj Yahia	105433659d	net/mlx5: Add support to s-tag in mlx5 firmware interface Add svlan_tag and rename vlan_tag to cvlan_tag in flow table entry match param. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2017-01-19 23:19:55 +02:00
Peter Zijlstra	2c935bc572	locking/atomic, kref: Add kref_read() Since we need to change the implementation, stop exposing internals. Provide kref_read() to read the current reference count; typically used for debug messages. Kills two anti-patterns: atomic_read(&kref->refcount) kref->refcount.counter Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-14 11:37:18 +01:00
Jack Wang	102c5ce082	RDMA/cma: use cached port state when bind loopback Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com> Reviewed-by: Michael Wang <yun.wang@profitbricks.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-12 23:00:04 -05:00
Jack Wang	93b1f29de7	RDMA/cma: resolve to first active ib port When we try to resolve a dest addr, if we don't give src addr, cma core will try to resolve to our source ib device automatically. The current logic only checks if a given port has the same subnet_prefix as our dest, which is not enough if we use default well known subnet_prefix on our active port, as it will be the same as the subnet_prefix on inactive ports and we might match against an inactive port by accident. To resolve this, we should also check if port is active before we resolve it as a suitable src address for a given dest. Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com> Reviewed-by: Michael Wang <yun.wang@profitbricks.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-12 23:00:04 -05:00
Jack Wang	9e2c3f1c7f	RDMA/core: export ib_get_cached_port_state Export function for rdma_cm, patch for rdma_cm to follow. Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com> Reviewed-by: Michael Wang <yun.wang@profitbricks.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-12 23:00:00 -05:00
Jack Wang	aaaca121c7	RDMA/core: add port state cache We need a port state cache in ib_core, later we will use in rdma_cm. Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com> Reviewed-by: Michael Wang <yun.wang@profitbricks.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-12 22:59:55 -05:00
Feras Daoud	27d41d29c7	IB/ipoib: Change list_del to list_del_init in the tx object Since ipoib_cm_tx_start function and ipoib_cm_tx_reap function belong to different work queues, they can run in parallel. In this case if ipoib_cm_tx_reap calls list_del and release the lock, ipoib_cm_tx_start may acquire it and call list_del_init on the already deleted object. Changing list_del to list_del_init in ipoib_cm_tx_reap fixes the problem. Fixes: `839fcaba35` ("IPoIB: Connected mode experimental support") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-12 14:01:06 -05:00
Feras Daoud	c586071d1d	IB/ipoib: Replace list_del of the neigh->list with list_del_init In order to resolve a situation where a few process delete the same list element in sequence and cause panic, list_del is replaced with list_del_init. In this case if the first process that calls list_del releases the lock before acquiring it again, other processes who can acquire the lock will call list_del_init. Fixes: `b63b70d877` ("IPoIB: Use a private hash table for path lookup") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-12 14:01:05 -05:00
Feras Daoud	13ee429a02	IB/ipoib: Use debug prints instead of warnings in RNR WC status If a receive request has not been posted to the work queue, the incoming message is rejected and the peer will receive a receiver-not-ready (RNR) error. In IPoIB, IB_WC_RNR_RETRY_EXC_ERR error is part of the life cycle therefore ipoib_cm_handle_tx_wc function will print to debug instead of warnings. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-12 14:01:05 -05:00
Feras Daoud	d32b9a81d7	IB/ipoib: Add detailed error message to dev_queue_xmit call Add a detailed return code to dev_queue_xmit function when calling to requeue packet via __skb_dequeue. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-12 14:01:04 -05:00
Feras Daoud	89a3987ab7	IB/ipoib: rtnl_unlock can not come after free_netdev The ipoib_vlan_add function calls rtnl_unlock after free_netdev, rtnl_unlock not only releases the lock, but also calls netdev_run_todo. The latter function browses the net_todo_list array and completes the unregistration of all its net_device instances. If we call free_netdev before rtnl_unlock, then netdev_run_todo call over the freed device causes panic. To fix, move rtnl_unlock call before free_netdev call. Fixes: `9baa0b0364` ("IB/ipoib: Add rtnl_link_ops support") Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-12 14:01:04 -05:00
Feras Daoud	0a0007f283	IB/ipoib: Fix deadlock between rmmod and set_mode When calling set_mode from sys/fs, the call flow locks the sys/fs lock first and then tries to lock rtnl_lock (when calling ipoib_set_mod). On the other hand, the rmmod call flow takes the rtnl_lock first (when calling unregister_netdev) and then tries to take the sys/fs lock. Deadlock a->b, b->a. The problem starts when ipoib_set_mod frees it's rtnl_lck and tries to get it after that. set_mod: [<ffffffff8104f2bd>] ? check_preempt_curr+0x6d/0x90 [<ffffffff814fee8e>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff81448655>] ? __rtnl_unlock+0x15/0x20 [<ffffffff814fed2b>] mutex_lock+0x2b/0x50 [<ffffffff81448675>] rtnl_lock+0x15/0x20 [<ffffffffa02ad807>] ipoib_set_mode+0x97/0x160 [ib_ipoib] [<ffffffffa02b5f5b>] set_mode+0x3b/0x80 [ib_ipoib] [<ffffffff8134b840>] dev_attr_store+0x20/0x30 [<ffffffff811f0fe5>] sysfs_write_file+0xe5/0x170 [<ffffffff8117b068>] vfs_write+0xb8/0x1a0 [<ffffffff8117ba81>] sys_write+0x51/0x90 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b rmmod: [<ffffffff81279ffc>] ? put_dec+0x10c/0x110 [<ffffffff8127a2ee>] ? number+0x2ee/0x320 [<ffffffff814fe6a5>] schedule_timeout+0x215/0x2e0 [<ffffffff8127cc04>] ? vsnprintf+0x484/0x5f0 [<ffffffff8127b550>] ? string+0x40/0x100 [<ffffffff814fe323>] wait_for_common+0x123/0x180 [<ffffffff81060250>] ? default_wake_function+0x0/0x20 [<ffffffff8119661e>] ? ifind_fast+0x5e/0xb0 [<ffffffff814fe43d>] wait_for_completion+0x1d/0x20 [<ffffffff811f2e68>] sysfs_addrm_finish+0x228/0x270 [<ffffffff811f2fb3>] sysfs_remove_dir+0xa3/0xf0 [<ffffffff81273f66>] kobject_del+0x16/0x40 [<ffffffff8134cd14>] device_del+0x184/0x1e0 [<ffffffff8144e59b>] netdev_unregister_kobject+0xab/0xc0 [<ffffffff8143c05e>] rollback_registered+0xae/0x130 [<ffffffff8143c102>] unregister_netdevice+0x22/0x70 [<ffffffff8143c16e>] unregister_netdev+0x1e/0x30 [<ffffffffa02a91b0>] ipoib_remove_one+0xe0/0x120 [ib_ipoib] [<ffffffffa01ed95f>] ib_unregister_device+0x4f/0x100 [ib_core] [<ffffffffa021f5e1>] mlx4_ib_remove+0x41/0x180 [mlx4_ib] [<ffffffffa01ab771>] mlx4_remove_device+0x71/0x90 [mlx4_core] Fixes: `862096a8bb` ("IB/ipoib: Add more rtnl_link_ops callbacks") Cc: <stable@vger.kernel.org> # v3.6+ Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-12 14:01:03 -05:00
Feras Daoud	1c3098cdb0	IB/ipoib: Fix deadlock over vlan_mutex This patch fixes Deadlock while executing ipoib_vlan_delete. The function takes the vlan_rwsem semaphore and calls unregister_netdevice. The later function calls ipoib_mcast_stop_thread that cause workqueue flush. When the queue has one of the ipoib_ib_dev_flush_xxx events, a deadlock occur because these events also tries to catch the same vlan_rwsem semaphore. To fix, unregister_netdevice should be called after releasing the semaphore. Fixes: `cbbe1efa49` ("IPoIB: Fix deadlock between ipoib_open() and child interface create") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-12 14:01:02 -05:00
Feras Daoud	80b5b35aba	IB/ipoib: Set device connection mode only when needed When changing the connection mode, the ipoib_set_mode function did not check if the previous connection mode equals to the new one. This commit adds the required check and return 0 if the new mode equals to the previous one. Fixes: `839fcaba35` ("IPoIB: Connected mode experimental support") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-12 14:01:02 -05:00
Feras Daoud	29da686dff	IB/ipoib: When given an invalid UD MTU, give debug msg In datagram mode, the IB UD (Unreliable Datagram) transport is used so the MTU of the interface is equal to the IB L2 MTU minus the IPoIB encapsulation header. Any request to change the MTU value above the maximum range will change the MTU to the max allowed, but will not show any warning message. An ipoib_warn is issued in such cases, letting the user know that even though the value is legal, it can't be currently applied. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-12 13:59:56 -05:00
ssh10	db287ec5cb	RDMA/ocrdma: Replace BUG() with BUG_ON() Replace BUG() with BUG_ON() using coccinelle Signed-off-by: Shyam Saini <mayhs11saini@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-12 12:21:52 -05:00
ssh10	b462b06eb6	RDMA/cxgb4: Use AF_INET for sin_family field Elsewhere the sin_family field holds a value with a name of the form AF_..., so it seems reasonable to do so here as well. Also the values of PF_INET and AF_INET are the same. The semantic patch that makes this change is as follows: //</smpl> @@ struct sockaddr_in sip; @@ ( sip.sin_family == - PF_INET + AF_INET \| sip.sin_family != - PF_INET + AF_INET \| sip.sin_family = - PF_INET + AF_INET ) //</smpl> Signed-off-by: Shyam Saini <mayhs11saini@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-12 12:21:52 -05:00
Amrani, Ram	df15856132	RDMA/qedr: restructure functions that create/destroy QPs Simplify function and sub-function flow of QP creation and destruction. This also serves as a preparation for SRQ and iWARP support. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-12 12:21:41 -05:00
Geliang Tang	bb75f33cf0	RDMA/qib: use rb_entry() To make the code clearer, use rb_entry() instead of container_of() to deal with rbtree. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-12 11:38:41 -05:00
Cao jin	e8f4eb3bfa	RDMA/hfi1: drop pci_link_reset() In AER recovery, pci_error_handlers.link_reset() is never called, drop it now. Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-12 11:38:41 -05:00
Cao jin	850d08721a	RDMA/qib: drop qib_pci_link_reset() In AER recovery, pci_error_handlers.link_reset() is never called, drop it now. Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-12 11:38:41 -05:00
Kees Cook	7f6856b789	RDMA/i40iw: use designated initializers Prepare to mark sensitive kernel structures for randomization by making sure they're using designated initializers. These were identified during allyesconfig builds of x86, arm, and arm64, with most initializer fixes extracted from grsecurity. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-12 11:38:41 -05:00
Kees Cook	6554c9f7f7	RDMA/nes: use designated initializers Prepare to mark sensitive kernel structures for randomization by making sure they're using designated initializers. These were identified during allyesconfig builds of x86, arm, and arm64, with most initializer fixes extracted from grsecurity. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-12 11:38:41 -05:00
Bart Van Assche	c5540a0195	IB/rxe: Fix an skb leak Additionally, make it easier to detect skb leaks by issuing a warning if a leak occurs. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Cc: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-10 16:52:47 -05:00
Bart Van Assche	839f5ac0d8	IB/rxe: Remove a pointless indirection layer Neither rxe->ifc_ops nor any of the function pointers in struct struct rxe_ifc_ops ever change. Hence remove the rxe->ifc_ops indirection mechanism. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-10 16:52:47 -05:00
Bart Van Assche	ab17654476	IB/rxe: Fix reference leaks in memory key invalidation code Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-10 16:52:47 -05:00
Bart Van Assche	b3a4599610	IB/rxe: Fix a MR reference leak in check_rkey() Avoid that calling check_rkey() for mem->state == RXE_MEM_STATE_FREE triggers an MR reference leak. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-10 16:52:47 -05:00
Bart Van Assche	18d3451c0d	IB/rxe: Generate a completion for all failed work requests Change do_complete() such that an error completion is not only generated if a QP is in the error state but also if a work request failed. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-10 16:52:47 -05:00
Bart Van Assche	723ec9ae2a	IB/rxe: Introduce functions for queue draining This change makes the code easier to read and avoids that code is duplicated. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-10 16:52:47 -05:00
Bart Van Assche	642c7cbcaf	IB/rxe: Add a runtime check in alloc_index() Since index values equal to or above 'range' can trigger memory corruption, complain if index >= range. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-10 16:52:47 -05:00
Bart Van Assche	43553b47c3	IB/rxe: Issue warnings once It is strongly recommended to report kernel warnings once instead of every time a condition is hit. Hence change WARN_ON() into WARN_ON_ONCE() / BUILD_BUG_ON() as appropriate. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-10 16:52:47 -05:00
Bart Van Assche	32404fb764	IB/rxe: Let the compiler check the type of the cleanup functions Change the argument type of these functions from void * into struct rxe_pool_entry *. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-10 16:52:47 -05:00
Bart Van Assche	046ef24d25	IB/rxe: Enable type checking on SKB_TO_PKT() and PKT_TO_SKB() arguments Let the compiler check the type of the arguments passed to SKB_TO_PKT() and PKT_TO_SKB(). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-10 16:52:47 -05:00
Bart Van Assche	967335ab90	IB/rxe: Remove superfluous casts Casting a pointer to 'void *' explicitly is not necessary in C code. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-10 16:52:47 -05:00
Bart Van Assche	175f1244c1	IB/rxe: Remove an unused variable and an unused argument The variable 'av' is not used so remove it. Since that change removes the last user of the 'wqe' argument, remove that argument too. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-10 16:52:47 -05:00
Bart Van Assche	c8b82182cb	IB/rxe: Remove an unused function Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-10 16:52:47 -05:00
Bart Van Assche	2bec3baded	IB/rxe: Constify the pool name Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-10 16:52:47 -05:00
Bart Van Assche	8d8f083720	IB/rxe: Suppress sparse warnings Avoid that sparse complains about using 0 as a pointer, about missing function declarations and also avoid that sparse complains about endianness. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-10 16:52:47 -05:00
Selvin Xavier	69ae543969	RDMA: Adding ethertype ETH_P_IBOE Update the if_ether.h with the ethertype for Infiniband over Ethernet packets. Also, removing the occurances of 0x8915 from infiniband vendor drivers. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-10 14:05:11 -05:00
Steve Wise	3bcf96e018	iw_cxgb4: do not send RX_DATA_ACK CPLs after close/abort Function rx_data(), which handles ingress CPL_RX_DATA messages, was always sending an RX_DATA_ACK with the goal of updating the credits. However, if the RDMA connection is moved out of FPDU mode abruptly, then it is possible for iw_cxgb4 to process queued RX_DATA CPLs after HW has aborted the connection. These CPLs should not trigger RX_DATA_ACKS. If they do, HW can see a READ after DELETE of the DB_LE hash entry for the tid and post a LE_DB HashTblMemCrcError. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-10 14:01:38 -05:00
Steve Wise	c12a67fec8	iw_cxgb4: free EQ queue memory on last deref Commit `ad61a4c7a9` ("iw_cxgb4: don't block in destroy_qp awaiting the last deref") introduced a bug where the RDMA QP EQ queue memory (and QIDs) are possibly freed before the underlying connection has been fully shutdown. The result being a possible DMA read issued by HW after the queue memory has been unmapped and freed. This results in possible WR corruption in the worst case, system bus errors if an IOMMU is in use, and SGE "bad WR" errors reported in the very least. The fix is to defer unmap/free of queue memory and QID resources until the QP struct has been fully dereferenced. To do this, the c4iw_ucontext must also be kept around until the last QP that references it is fully freed. In addition, since the last QP deref can happen in an IRQ disabled context, we need a new workqueue thread to do the final unmap/free of the EQ queue memory. Fixes: `ad61a4c7a9` ("iw_cxgb4: don't block in destroy_qp awaiting the last deref") Cc: stable@vger.kernel.org Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-10 14:01:38 -05:00
Steve Wise	4fe7c2962e	iw_cxgb4: refactor sq/rq drain logic With the addition of the IB/Core drain API, iw_cxgb4 supported drain by watching the CQs when the QP was out of RTS and signalling "drain complete" when the last CQE is polled. This, however, doesn't fully support the drain semantics. Namely, the drain logic is supposed to signal "drain complete" only when the application has _processed_ the last CQE, not just removed them from the CQ. Thus a small timing hole exists that can cause touch after free type bugs in applications using the drain API (nvmf, iSER, for example). So iw_cxgb4 needs a better solution. The iWARP Verbs spec mandates that "_at some point_ after the QP is moved to ERROR", the iWARP driver MUST synchronously fail post_send and post_recv calls. iw_cxgb4 was currently not allowing any posts once the QP is in ERROR. This was in part due to the fact that the HW queues for the QP in ERROR state are disabled at this point, so there wasn't much else to do but fail the post operation synchronously. This restriction is what drove the first drain implementation in iw_cxgb4 that has the above mentioned flaw. This patch changes iw_cxgb4 to allow post_send and post_recv WRs after the QP is moved to ERROR state for kernel mode users, thus still adhering to the Verbs spec for user mode users, but allowing flush WRs for kernel users. Since the HW queues are disabled, we just synthesize a CQE for this post, queue it to the SW CQ, and then call the CQ event handler. This enables proper drain operations for the various storage applications. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-01-10 14:01:38 -05:00
Parav Pandit	43579b5f2c	IB/core: added support to use rdma cgroup controller Added support APIs for IB core to register/unregister every IB/RDMA device with rdma cgroup for tracking rdma resources. IB core registers with rdma cgroup controller. Added support APIs for uverbs layer to make use of rdma controller. Added uverbs layer to perform resource charge/uncharge functionality. Added support during query_device uverb operation to ensure it returns resource limits by honoring rdma cgroup configured limits. Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2017-01-10 11:14:27 -05:00
David S. Miller	bda65b4255	mlx5 4K UAR The following series of patches optimizes the usage of the UAR area which is contained within the BAR 0-1. Previous versions of the firmware and the driver assumed each system page contains a single UAR. This patch set will query the firmware for a new capability that if published, means that the firmware can support UARs of fixed 4K regardless of system page size. In the case of powerpc, where page size equals 64KB, this means we can utilize 16 UARs per system page. Since user space processes by default consume eight UARs per context this means that with this change a process will need a single system page to fulfill that requirement and in fact make use of more UARs which is better in terms of performance. In addition to optimizing user-space processes, we introduce an allocator that can be used by kernel consumers to allocate blue flame registers (which are areas within a UAR that are used to write doorbells). This provides further optimization on using the UAR area since the Ethernet driver makes use of a single blue flame register per system page and now it will use two blue flame registers per 4K. The series also makes changes to naming conventions and now the terms used in the driver code match the terms used in the PRM (programmers reference manual). Thus, what used to be called UUAR (micro UAR) is now called BFREG (blue flame register). In order to support compatibility between different versions of library/driver/firmware, the library has now means to notify the kernel driver that it supports the new scheme and the kernel can notify the library if it supports this extension. So mixed versions of libraries can run concurrently without any issues. Thanks, Eli and Matan -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJYc9kSAAoJEEg/ir3gV/o+a0EH/jEGiopH7CHc4T4nXT1I4kQa TicrkMNV3Sr9MBWwn8TLOyx+Fi1dex4cumrJI/BNVjC6h/nS6JHbslYoZxTkX9lT L0vRsHJBVr/PODqimIGNnlJFBPhNJSGiHG4JHlJHlpvcGNahitN3gXmUjcRNju+V ExnvgwWzAXM0qg1qWf5A/3HmqbtYES1rJXQUsimtc2QAif/SIayBD4fEA8x5zNBA i0p8xcDrzUqmeblkpnsJA3w40s1rsuqvJnvLPDpbpKENtHfw1UFZ2987P7LvOrIv NF/mZBkStC0gOZX6dLEAdoZXL1gTsJX19hTkUMfYH4BHqHARa2/oCS3wcCf1Giw= =C+cp -----END PGP SIGNATURE----- Merge tag 'mlx5-4kuar-for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5 4K UAR The following series of patches optimizes the usage of the UAR area which is contained within the BAR 0-1. Previous versions of the firmware and the driver assumed each system page contains a single UAR. This patch set will query the firmware for a new capability that if published, means that the firmware can support UARs of fixed 4K regardless of system page size. In the case of powerpc, where page size equals 64KB, this means we can utilize 16 UARs per system page. Since user space processes by default consume eight UARs per context this means that with this change a process will need a single system page to fulfill that requirement and in fact make use of more UARs which is better in terms of performance. In addition to optimizing user-space processes, we introduce an allocator that can be used by kernel consumers to allocate blue flame registers (which are areas within a UAR that are used to write doorbells). This provides further optimization on using the UAR area since the Ethernet driver makes use of a single blue flame register per system page and now it will use two blue flame registers per 4K. The series also makes changes to naming conventions and now the terms used in the driver code match the terms used in the PRM (programmers reference manual). Thus, what used to be called UUAR (micro UAR) is now called BFREG (blue flame register). In order to support compatibility between different versions of library/driver/firmware, the library has now means to notify the kernel driver that it supports the new scheme and the kernel can notify the library if it supports this extension. So mixed versions of libraries can run concurrently without any issues. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-09 17:09:31 -05:00
Eli Cohen	30aa60b3bd	IB/mlx5: Support 4k UAR for libmlx5 Add fields to structs to convey to kernel an indication whether the library supports multi UARs per page and return to the library the size of a UAR based on the queried value. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-01-09 20:25:09 +02:00
Eli Cohen	b037c29a80	IB/mlx5: Allow future extension of libmlx5 input data Current check requests that new fields in struct mlx5_ib_alloc_ucontext_req_v2 that are not known to the driver be zero. This was introduced so new libraries passing additional information to the kernel through struct mlx5_ib_alloc_ucontext_req_v2 will be notified by old kernels that do not support their request by failing the operation. This schecme is problematic since it requires libmlx5 to issue the requests with descending input size for struct mlx5_ib_alloc_ucontext_req_v2. To avoid this, we require that new features that will obey the following rules: If the feature requires one or more fields in the response and the at least one of the fields can be encoded such that a zero value means the kernel ignored the request then this field will provide the indication to the library. If no response is required or if zero is a valid response, a new field should be added that indicates to the library whether its request was processed. Fixes: `b368d7cb8c` ('IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext') Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-01-09 20:25:09 +02:00
Eli Cohen	5fe9dec0d0	IB/mlx5: Use blue flame register allocator in mlx5_ib Make use of the blue flame registers allocator at mlx5_ib. Since blue flame was not really supported we remove all the code that is related to blue flame and we let all consumers to use the same blue flame register. Once blue flame is supported we will add the code. As part of this patch we also move the definition of struct mlx5_bf to mlx5_ib.h as it is only used by mlx5_ib. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-01-09 20:25:08 +02:00
Eli Cohen	0b80c14f00	IB/mlx5: Fix retrieval of index to first hi class bfreg First the function retrieving the index of the first hi latency class blue flame register. High latency class bfregs are located right above medium latency class bfregs. Fixes: `c1be5232d2` ('IB/mlx5: Fix micro UAR allocator') Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-01-08 11:21:26 +02:00
Eli Cohen	2f5ff26478	mlx5: Fix naming convention with respect to UARs This establishes a solid naming conventions for UARs. A UAR (User Access Region) can have size identical to a system page or can be fixed 4KB depending on a value queried by firmware. Each UAR always has 4 blue flame register which are used to post doorbell to send queue. In addition, a UAR has section used for posting doorbells to CQs or EQs. In this patch we change names to reflect this conventions. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-01-08 11:21:26 +02:00
Eli Cohen	f4044dac63	IB/mlx5: Fix error handling order in create_kernel_qp Make sure order of cleanup is exactly the opposite of initialization. Fixes: `9603b61de1` ('mlx5: Move pci device handling from mlx5_ib to mlx5_core') Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-01-08 11:21:26 +02:00
Eli Cohen	de8d6e02ef	IB/mlx5: Fix kernel to user leak prevention logic The logic was broken as it failed to update the response length for architectures with PAGE_SIZE larger than 4kB. As a result further extension of the ucontext response struct would fail. Fixes: `d69e3bcf79` ('IB/mlx5: Mmap the HCA's core clock register to user-space') Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-01-08 11:21:26 +02:00
David S. Miller	76eb75be79	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2017-01-05 11:03:07 -05:00
Artemy Kovalyov	aa8e08d2f5	IB/mlx5: Improve MR check Add "type" field to mlx5_core MKEY struct. Check whether page fault happens on MKEY corresponding to MR. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-02 15:51:20 -05:00
Artemy Kovalyov	17d2f88f92	IB/mlx5: Add ODP atomics support Handle ODP atomic operations. When initiator of RDMA atomic operation use ODP MR to provide source data handle pagefault properly. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-02 15:51:20 -05:00
Artemy Kovalyov	d9aaed8387	{net,IB}/mlx5: Refactor page fault handling * Update page fault event according to last specification. * Separate code path for page fault EQ, completion EQ and async EQ. * Move page fault handling work queue from mlx5_ib static variable into mlx5_core page fault EQ. * Allocate memory to store ODP event dynamically as the events arrive, since in atomic context - use mempool. * Make mlx5_ib page fault handler run in process context. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-02 15:51:20 -05:00
Artemy Kovalyov	7d0cc6edcc	IB/mlx5: Add MR cache for large UMR regions In this change we turn mlx5_ib_update_mtt() into generic mlx5_ib_update_xlt() to perfrom HCA translation table modifiactions supporting both atomic and process contexts and not limited by number of modified entries. Using this function we increase preallocated MRs up to 16GB. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-02 15:51:20 -05:00
Artemy Kovalyov	c438fde1c2	IB/mlx5: Add support for big MRs Make use of extended UMR translation offset. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-02 15:51:20 -05:00
Artemy Kovalyov	3161625589	IB/mlx5: Refactor UMR post send format * Update struct mlx5_wqe_umr_ctrl_seg. * Currenlty UMR send_flags aim only certain use cases: enabled/disable cached MR, modifying XLT for ODP. By making flags independent make UMR more flexible allowing arbitrary manipulations. * Since different UMR formats have different entry sizes UMR request should receive exact size of translation table update instead of number of entries. Rename field npages to xlt_size in struct mlx5_umr_wr and update relevant code accordingly. * Add support of length64 bit. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-02 15:51:20 -05:00
Binoy Jayan	d5ea2df9ce	IB/mlx5: Add helper mlx5_ib_post_send_wait Clean up the following common code (to post a list of work requests to the send queue of the specified QP) at various places and add a helper function 'mlx5_ib_post_send_wait' to implement the same. - Initialize 'mlx5_ib_umr_context' on stack - Assign "mlx5_umr_wr:wr:wr_cqe to umr_context.cqe - Acquire the semaphore - call ib_post_send with a single ib_send_wr - wait_for_completion() - Check for umr_context.status - Release the semaphore Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-02 15:51:20 -05:00
Leon Romanovsky	9f885201f2	IB/mlx5: Reorder code in query device command The order of features exposed by private mlx5-abi.h file is CQE zipping, packet pacing and multi-packet WQE. The internal order implemented in mlx5_ib_query_device() is multi-packet WQE, CQE zipping and packet pacing. Such difference hurts code readability, so let's sync, while mlx5-abi.h (exposed to userspace) is the primary order. This commit doesn't change any functionality. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-02 15:51:20 -05:00
Jack Morgenstein	10b1c04e92	net/mlx4_core: Fix raw qp flow steering rules under SRIOV Demoting simple flow steering rule priority (for DPDK) was achieved by wrapping FW commands MLX4_QP_FLOW_STEERING_ATTACH/DETACH for the PF as well, and forcing the priority to MLX4_DOMAIN_NIC in the wrapper function for the PF and all VFs. In function mlx4_ib_create_flow(), this change caused the main rule creation for the PF to be wrapped, while it left the associated tunnel steering rule creation unwrapped for the PF. This mismatch caused rule deletion failures in mlx4_ib_destroy_flow() for the PF when the detach wrapper function did not find the associated tunnel-steering rule (since creation of that rule for the PF did not go through the wrapper function). Fix this by setting MLX4_QP_FLOW_STEERING_ATTACH/DETACH to be "native" (so that the PF invocation does not go through the wrapper), and perform the required priority demotion for the PF in the mlx4_ib_create_flow() code path. Fixes: `48564135cb` ("net/mlx4_core: Demote simple multicast and broadcast flow steering rules") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-29 14:17:40 -05:00
Linus Torvalds	7c0f6ba682	Replace <asm/uaccess.h> with <linux/uaccess.h> globally This was entirely automated, using the script by Al: PATT='^[[:blank:]]#[[:blank:]]include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"\|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-24 11:46:01 -08:00
Linus Torvalds	296915912d	First round of -rc fixes for 4.10 kernel - Series of qedr fixes - Series of rxe fixes - One isolated i40iw fix - One isolated cma fix - One isolated cxgb4 fix -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJYXAGvAAoJELgmozMOVy/dDukQAMMNarWp0U8KfNYRU5tyCBwd aIQC1gFT6GUCFys40Z6L84m1D3NpGR+vzVv3grVBeuge73b79zAOHXvVDwJCA+Jl QQLG3vZ13C3158sLDiK8zL+4Ob5OfOQ5nQ2spvDfJWpye9SD+pWFcrpqvK02ANRN kFHILk1gROBTNi46yBR5hjWOkw7Bua6XLsPxh6xoaDZ43NL0r0xgm43FTnj/19x3 0zpZYYKP+3C6U7678rqaog9zfXHvadghW5/WBJ/VgfKqEmH89ESx4J2MvbB8DxFD 1tWAOpr5TNY5jnh8mtUsceDjCzQivc/RWqAu05BspEwcavjSLFyRYr1epR0/4oAd PqLSmfORmhpJ8+5Kmn+chtXo3TT4SYGHIzSUbgbEV/ClwX/7UW+w8mfQZ3buUBq/ cQp/oRnJcsrQIEDFO3AH7P+6Sxy6t3zbSl5oKBUOI1u4RFmC7YBPqo9fQu2Z2mGk 3+AWQaPr7qgEcFzXBgLzvd4LhTYKsvmiNwrcXi9KjjwQjNEVg15qqF2YtmxEUgi9 kh3IOcGan3iSblhV/WLrxcOjlPQrPpBOVnTPhUskFtlsrD+032OxeOBpVoU3nCUt MjTYWoNTYdw4wHz0w373o0uR4+4nl4a5OmO4Fh6Drmg5hm4Bl9BWy0Kziu93Z1Ay Z2utZVWLWhBzn8yJujUz =NW9g -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fixes from Doug Ledford: "First round of -rc fixes for 4.10 kernel: - a series of qedr fixes - a series of rxe fixes - one i40iw fix - one cma fix - one cxgb4 fix" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/rxe: Don't check for null ptr in send() IB/rxe: Drop future atomic/read packets rather than retrying IB/rxe: Use BTH_PSN_MASK when ACKing duplicate sends qedr: Always notify the verb consumer of flushed CQEs qedr: clear the vendor error field in the work completion qedr: post_send/recv according to QP state qedr: ignore inline flag in read verbs qedr: modify QP state to error when destroying it qedr: return correct value on modify qp qedr: return error if destroy CQ failed qedr: configure the number of CQEs on CQ creation i40iw: Set 128B as the only supported RQ WQE size IB/cma: Fix a race condition in iboe_addr_get_sgid() IB/rxe: Fix a memory leak in rxe_qp_cleanup() iw_cxgb4: set correct FetchBurstMax for QPs	2016-12-23 10:38:48 -08:00
Andrew Boyer	5cc8fabc5e	IB/rxe: Don't check for null ptr in send() pkt->qp was already dereferenced earlier in the function. Fixes Smatch complaint: drivers/infiniband/sw/rxe/rxe_net.c:458 send() warn: variable dereferenced before check 'pkt->qp' (see line 441) Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-22 11:36:12 -05:00
Andrew Boyer	cbf1f9a46c	IB/rxe: Drop future atomic/read packets rather than retrying If the completer is in the middle of a large read operation, one lost packet can cause havoc. Going to COMPST_ERROR_RETRY will cause the requester to resend the request. After that, any packet from the first attempt still in the receive queue will be interpreted as an error, restarting the error/retry sequence. The transfer will quickly exhaust its retries. This behavior is very noticeable when doing 512KB reads on a QEMU system configured with 1500B MTU. Also, a resent request here will prompt the responder on the other side to immediately start resending, but the resent packets will get stuck in the already-loaded receive queue and will never be processed. Rather than erroring out every time an unexpected future packet arrives, just drop it. Eventually the retry timer will send a duplicate request; the completer will be able to make progress since the queue will start relatively empty. Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-22 11:36:12 -05:00
Andrew Boyer	37b3619394	IB/rxe: Use BTH_PSN_MASK when ACKing duplicate sends Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-22 11:36:12 -05:00
Amrani, Ram	74c3875c3d	qedr: Always notify the verb consumer of flushed CQEs Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-22 11:36:12 -05:00
Amrani, Ram	27035a1b37	qedr: clear the vendor error field in the work completion We clear the vendor error field in the work completion so that if a work completion is erroneous the field won't confuse the caller. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-22 11:36:12 -05:00
Amrani, Ram	922d9a40d3	qedr: post_send/recv according to QP state Enable posting to SQ only in RTS, ERR and SQD QP state. Enable posting to RQ in ERR QP state. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-22 11:36:12 -05:00
Amrani, Ram	8b0cabc650	qedr: ignore inline flag in read verbs In the current implementation a read verb with IB_SEND_INLINE may be illegally configured. In this fix we ignore the inline bit in the case of a read verb. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-22 11:36:12 -05:00
Amrani, Ram	b4c2cc48aa	qedr: modify QP state to error when destroying it Current code didn't modify the QP state to error because it queried the QP state as a bitmap while it isn't. So the code never got executed. This patch fixes this and queries for each QP state respectively and not at once via a bitmask. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-22 11:36:12 -05:00
Amrani, Ram	d6ebbf29c3	qedr: return correct value on modify qp Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-22 11:36:12 -05:00
Amrani, Ram	a121135973	qedr: return error if destroy CQ failed Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-22 11:36:12 -05:00
Amrani, Ram	c7eb3bced7	qedr: configure the number of CQEs on CQ creation Configure ibcq->cqe when a CQ is created. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-22 11:36:12 -05:00
Chien Tin Tung	61f51b7b20	i40iw: Set 128B as the only supported RQ WQE size RQ WQE size other than 128B is not supported. Correct RQ size calculation to use 128B only. Since this breaks ABI, add additional code to provide compatibility with v4 user provider, libi40iw. Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-22 11:36:12 -05:00
Bart Van Assche	e259934d4d	IB/rxe: Fix a memory leak in rxe_qp_cleanup() A socket is associated with every QP by the rxe driver but sock_release() is never called. Add a call to sock_release() in rxe_qp_cleanup(). Fixes: commit 8700e3e7c48A5 ("Add Soft RoCE driver") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Moni Shoua <monis@mellanox.com> Cc: Kamal Heib <kamalh@mellanox.com> Cc: Amir Vadai <amirv@mellanox.com> Cc: Haggai Eran <haggaie@mellanox.com> Cc: <stable@vger.kernel.org> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-18 13:35:19 -05:00
Steve Wise	b414fa01c3	iw_cxgb4: set correct FetchBurstMax for QPs The current QP FetchBurstMax value is 256B, which is incorrect since a WR can exceed that value. The result being a partial WR fetched by hardware, and a fatal "bad WR" error posted by the SGE. So bump the FetchBurstMax to 512B. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-18 13:35:19 -05:00
Linus Torvalds	d3ea547853	rdma: fix buggy code that the compiler warns about Get rid of this warning: drivers/infiniband/sw/rdmavt/cq.c: In function ‘rvt_cq_exit’: drivers/infiniband/sw/rdmavt/cq.c:542:2: warning: ‘worker’ may be used uninitialized in this function [-Wmaybe-uninitialized] kthread_destroy_worker(worker); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ by fixing the function to actually work. Fixes: `6efaf10f16` ("IB/rdmavt: Avoid queuing work into a destroyed cq kthread worker") Cc: Petr Mladek <pmladek@suse.com> Cc: Doug Ledford <dledford@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-15 12:18:42 -08:00
Linus Torvalds	4d5b57e05a	Updates for 4.10 kernel merge window - Shared mlx5 updates with net stack (will drop out on merge if Dave's tree has already been merged) - Driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe - Debug cleanups - New connection rejection helpers - SRP updates - Various misc fixes - New paravirt driver from vmware -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJYUbAPAAoJELgmozMOVy/dMXcP/iuG5MNzfN8Ny1JftyBQGWg3 cqoQ2OLj9CsXjwVB+5EqbcZHRZY852lKONaLoDKkIOx4YAXO2YuIKOp944vN7EQx 96wfqzT1F5jzAcy5mYZXgLaStGFDAwejKMqeHd0LfJj3OEtemGnVPWYzyqSQmSKo dzJraS1Z9GIRppzU5WaRpB9PtRBkqIqGJ5vZ0EKLGhed5hYY5r0iMJB0GfriMRDO lJ4UUVfpsAoLPnqDBFH6IMn2V2UeAw9IR5zNa1mrM1RBfvt/uYTxrw1w3p9WoaNs GRodhk4DCeAfeyqzVPNBLyXZ4Zq4FzGe3UWM4qysJ1RR4oFNw9Cuw0Fqk8mrfznr 7hv5TpGIckRZiKf8l6e+qLirF0qGtXJg29j2vPVQI9i5nSj95g1agA81PnLQlLLb flWyxeMj81my7lfMHN1xcV6pqPEKMCOysZmfcvVfJd2XxpjuVD7ekl/YXWp8o8kU YPdQMqPD626XsD8VpPdMszb9FPmx0JD0HEv+Y1rIFX8JegEI+c3H2X0dqC27T/Ou FEPWOy025EgHm0Fh/7eIzkG6tjZ4JHoCugJAcxNZGj2XW4eB6r5vY8UwJ8iQRv+n PVYHiy0UoIRePh0mrdOSSphGZMi/GO/DsqKwCtAMEK43WqZQju6wR7QSIGkh66mp 4uSHJqpf3YEYylxGMhk3 =QeGy -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "This is the complete update for the rdma stack for this release cycle. Most of it is typical driver and core updates, but there is the entirely new VMWare pvrdma driver. You may have noticed that there were changes in DaveM's pull request to the bnxt Ethernet driver to support a RoCE RDMA driver. The bnxt_re driver was tentatively set to be pulled in this release cycle, but it simply wasn't ready in time and was dropped (a few review comments still to address, and some multi-arch build issues like prefetch() not working across all arches). Summary: - shared mlx5 updates with net stack (will drop out on merge if Dave's tree has already been merged) - driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe - debug cleanups - new connection rejection helpers - SRP updates - various misc fixes - new paravirt driver from vmware" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (210 commits) IB: Add vmw_pvrdma driver IB/mlx4: fix improper return value IB/ocrdma: fix bad initialization infiniband: nes: return value of skb_linearize should be handled MAINTAINERS: Update Intel RDMA RNIC driver maintainers MAINTAINERS: Remove Mitesh Ahuja from emulex maintainers IB/core: fix unmap_sg argument qede: fix general protection fault may occur on probe IB/mthca: Replace pci_pool_alloc by pci_pool_zalloc mlx5, calc_sq_size(): Make a debug message more informative mlx5: Remove a set-but-not-used variable mlx5: Use { } instead of { 0 } to init struct IB/srp: Make writing the add_target sysfs attr interruptible IB/srp: Make mapping failures easier to debug IB/srp: Make login failures easier to debug IB/srp: Introduce a local variable in srp_add_one() IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build IB/multicast: Check ib_find_pkey() return value IPoIB: Avoid reading an uninitialized member variable IB/mad: Fix an array index check ...	2016-12-15 12:03:32 -08:00
Lorenzo Stoakes	5b56d49fc3	mm: add locked parameter to get_user_pages_remote() Patch series "mm: unexport __get_user_pages_unlocked()". This patch series continues the cleanup of get_user_pages() functions taking advantage of the fact we can now pass gup_flags as we please. It firstly adds an additional 'locked' parameter to get_user_pages_remote() to allow for its callers to utilise VM_FAULT_RETRY functionality. This is necessary as the invocation of __get_user_pages_unlocked() in process_vm_rw_single_vec() makes use of this and no other existing higher level function would allow it to do so. Secondly existing callers of __get_user_pages_unlocked() are replaced with the appropriate higher-level replacement - get_user_pages_unlocked() if the current task and memory descriptor are referenced, or get_user_pages_remote() if other task/memory descriptors are referenced (having acquiring mmap_sem.) This patch (of 2): Add a int locked parameter to get_user_pages_remote() to allow VM_FAULT_RETRY faulting behaviour similar to get_user_pages_[un]locked(). Taking into account the previous adjustments to get_user_pages*() functions allowing for the passing of gup_flags, we are now in a position where __get_user_pages_unlocked() need only be exported for his ability to allow VM_FAULT_RETRY behaviour, this adjustment allows us to subsequently unexport __get_user_pages_unlocked() as well as allowing for future flexibility in the use of get_user_pages_remote(). [sfr@canb.auug.org.au: merge fix for get_user_pages_remote API change] Link: http://lkml.kernel.org/r/20161122210511.024ec341@canb.auug.org.au Link: http://lkml.kernel.org/r/20161027095141.2569-2-lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Jan Kara <jack@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-14 16:04:08 -08:00
Doug Ledford	6f94ba2079	Merge branch 'vmw_pvrdma' into merge-test	2016-12-14 14:56:21 -05:00
Adit Ranadive	29c8d9eba5	IB: Add vmw_pvrdma driver This patch series adds a driver for a paravirtual RDMA device. The device is developed for VMware's Virtual Machines and allows existing RDMA applications to continue to use existing Verbs API when deployed in VMs on ESXi. We recently did a presentation in the OFA Workshop [1] regarding this device. Description and RDMA Support ============================ The virtual device is exposed as a dual function PCIe device. One part is a virtual network device (VMXNet3) which provides networking properties like MAC, IP addresses to the RDMA part of the device. The networking properties are used to register GIDs required by RDMA applications to communicate. These patches add support and the all required infrastructure for letting applications use such a device. We support the mandatory Verbs API as well as the base memory management extensions (Local Inv, Send with Inv and Fast Register Work Requests). We currently support both Reliable Connected and Unreliable Datagram QPs but do not support Shared Receive Queues (SRQs). Also, we support the following types of Work Requests: o Send/Receive (with or without Immediate Data) o RDMA Write (with or without Immediate Data) o RDMA Read o Local Invalidate o Send with Invalidate o Fast Register Work Requests This version only adds support for version 1 of RoCE. We will add RoCEv2 support in a future patch. We do support registration of both MAC-based and IP-based GIDs. I have also created a git tree for our user-level driver [2]. Testing ======= We have tested this internally for various types of Guest OS - Red Hat, Centos, Ubuntu 12.04/14.04/16.04, Oracle Enterprise Linux, SLES 12 using backported versions of this driver. The tests included several runs of the performance tests (included with OFED), Intel MPI PingPong benchmark on OpenMPI, krping for FRWRs. Mellanox has been kind enough to test the backported version of the driver internally on their hardware using a VMware provided ESX build. I have also applied and tested this with Doug's k.o/for-4.9 branch (commit 5603910b). Note, that this patch series should be applied all together. I split out the commits so that it may be easier to review. PVRDMA Resources ================ [1] OFA Workshop Presentation - https://openfabrics.org/images/eventpresos/2016presentations/102parardma.pdf [2] Libpvrdma User-level library - http://git.openfabrics.org/?p=~aditr/libpvrdma.git;a=summary Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Reviewed-by: George Zhang <georgezhang@vmware.com> Reviewed-by: Aditya Sarwade <asarwade@vmware.com> Reviewed-by: Bryan Tan <bryantan@vmware.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 14:55:10 -05:00
Doug Ledford	9032ad78bb	Merge branches 'misc', 'qedr', 'reject-helpers', 'rxe' and 'srp' into merge-test	2016-12-14 14:44:47 -05:00
Doug Ledford	86ef0beaa0	Merge branch 'mlx' into merge-test	2016-12-14 14:44:25 -05:00
Doug Ledford	253f8b22e0	Merge branch 'hfi1' into merge-test	2016-12-14 14:44:08 -05:00
Doug Ledford	884fa4f304	Merge branches 'chelsio', 'debug-cleanup', 'hns' and 'i40iw' into merge-test	2016-12-14 14:43:14 -05:00
Pan Bian	46d0703fac	IB/mlx4: fix improper return value If uhw->inlen is non-zero, the value of variable err is 0 if the copy succeeds. Then, if kzalloc() or kmalloc() returns a NULL pointer, it will return 0 to the callers. As a result, the callers cannot detect the errors. This patch fixes the bug, assign "-ENOMEM" to err before the NULL pointer checks, and remove the initialization of err at the beginning. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189031 Signed-off-by: Pan Bian <bianpan2016@163.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 14:35:23 -05:00
Pan Bian	5b4c9cd7e4	IB/ocrdma: fix bad initialization In function ocrdma_mbx_create_ah_tbl(), returns the value of status on errors. However, because status is initialized with 0, 0 will be returned even if on error paths. This patch initialize status with "-ENOMEM". Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188831 Signed-off-by: Pan Bian <bianpan2016@163.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 14:33:48 -05:00
Zhouyi Zhou	6a3a1056d6	infiniband: nes: return value of skb_linearize should be handled Return value of skb_linearize should be handled in function nes_netdev_start_xmit. Compiled in x86_64 Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 14:26:49 -05:00
Sebastian Ott	17069d32a3	IB/core: fix unmap_sg argument __ib_umem_release calls dma_unmap_sg with a different number of sg_entries than ib_umem_get uses for dma_map_sg. This might cause trouble for implementations that merge sglist entries and results in the following dma debug complaint: DMA-API: device driver frees DMA sg list with different entry count [map count=2] [unmap count=1] Fix it by using the correct value. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 14:21:26 -05:00
Souptick Joarder	7ceb740c54	IB/mthca: Replace pci_pool_alloc by pci_pool_zalloc In mthca_create_ah(), pci_pool_alloc() followed by memset will be replaced by pci_pool_zalloc() Signed-off-by: Souptick joarder <jrdr.linux@gmail.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 13:58:39 -05:00
Bart Van Assche	1974ab9d9d	mlx5, calc_sq_size(): Make a debug message more informative Make it clear that qp->sq.wqe_cnt is not the number of WQEs. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Eli Cohen <eli@mellanox.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 13:45:38 -05:00
Bart Van Assche	3d6bdf1625	mlx5: Remove a set-but-not-used variable This has been detected by building the mlx5 driver with W=1. Fixes: `1a412fb1ca` ('net/mlx5: Fixes: `1a412fb1ca` (IB/mlx5: Modify QP commands via mlx5 ifc') Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Eli Cohen <eli@mellanox.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 13:45:10 -05:00
Bart Van Assche	626bc02d4d	mlx5: Use { } instead of { 0 } to init struct Detected by sparse. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Eli Cohen <eli@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 13:42:32 -05:00
Bart Van Assche	4fa354c9db	IB/srp: Make writing the add_target sysfs attr interruptible Avoid that shutdown of srp_daemon is delayed if add_target_mutex is held by another process. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 13:31:47 -05:00
Bart Van Assche	290081b453	IB/srp: Make mapping failures easier to debug Make it easier to figure out what is going on if memory mapping fails because more memory regions than mr_per_cmd are needed. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 13:31:37 -05:00
Bart Van Assche	3787d9908c	IB/srp: Make login failures easier to debug If login fails because memory region allocation failed it can be hard to figure out what happened. Make it easier to figure out why login failed by logging a message if ib_alloc_mr() fails. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 13:31:37 -05:00
Bart Van Assche	042dd765bd	IB/srp: Introduce a local variable in srp_add_one() This patch makes the srp_add_one() code more compact and does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 13:31:37 -05:00
Bart Van Assche	1a1faf7a8a	IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build Avoid that the kernel build fails as follows if dynamic debug support is disabled: drivers/infiniband/ulp/srp/ib_srp.c:2272:3: error: implicit declaration of function 'DEFINE_DYNAMIC_DEBUG_METADATA' drivers/infiniband/ulp/srp/ib_srp.c:2272:33: error: 'ddm' undeclared (first use in this function) drivers/infiniband/ulp/srp/ib_srp.c:2275:39: error: '_DPRINTK_FLAGS_PRINT' undeclared (first use in this function) Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 13:31:37 -05:00
Bart Van Assche	d3a2418ee3	IB/multicast: Check ib_find_pkey() return value This patch avoids that Coverity complains about not checking the ib_find_pkey() return value. Fixes: commit `547af76521` ("IB/multicast: Report errors on multicast groups if P_key changes") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 13:27:34 -05:00
Bart Van Assche	11b642b84e	IPoIB: Avoid reading an uninitialized member variable This patch avoids that Coverity reports the following: Using uninitialized value port_attr.state when calling printk Fixes: commit `94232d9ce8` ("IPoIB: Start multicast join process only on active ports") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Erez Shitrit <erezsh@mellanox.com> Cc: <stable@vger.kernel.org> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 13:27:34 -05:00
Bart Van Assche	2fe2f378dd	IB/mad: Fix an array index check The array ib_mad_mgmt_class_table.method_table has MAX_MGMT_CLASS (80) elements. Hence compare the array index with that value instead of with IB_MGMT_MAX_METHODS (128). This patch avoids that Coverity reports the following: Overrunning array class->method_table of 80 8-byte elements at element index 127 (byte offset 1016) using index convert_mgmt_class(mad_hdr->mgmt_class) (which evaluates to 127). Fixes: commit `b7ab0b19a8` ("IB/mad: Verify mgmt class in received MADs") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: <stable@vger.kernel.org> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 13:27:34 -05:00
Bart Van Assche	b42dde478b	IB/mlx4: Rework special QP creation error path The special QP creation error path relies on offset_of(struct mlx4_ib_sqp, qp) == 0. Remove this assumption because that makes the QP creation code easier to understand. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Laurence Oberman <loberman@redhat.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 13:01:11 -05:00
Bart Van Assche	0d38c240f9	IB/srpt: Report login failures only once Report the following message only once if no ACL has been configured yet for an initiator port: "Rejected login because no ACL has been configured yet for initiator %s.\n" Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagig@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 12:58:30 -05:00
Julia Lawall	5f4c7e4eb5	IB/usnic: simplify IS_ERR_OR_NULL to IS_ERR The function usnic_ib_qp_grp_get_chunk only returns an ERR_PTR value or a valid pointer, never NULL. The same is true of get_qp_res_chunk, which just returns the result of calling usnic_ib_qp_grp_get_chunk. Simplify IS_ERR_OR_NULL to IS_ERR in both cases. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression t,e; @@ t = $usnic_ib_qp_grp_get_chunk(...)\\|get_qp_res_chunk(...)$ ... when != t=e - IS_ERR_OR_NULL(t) + IS_ERR(t) @@ expression t,e,e1; @@ t = $usnic_ib_qp_grp_get_chunk(...)\\|get_qp_res_chunk(...)$ ... when != t=e ?- t ? PTR_ERR(t) : e1 + PTR_ERR(t) ... when any // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 12:57:54 -05:00
Hans Westgaard Ry	9315bc9a13	IB/core: Issue DREQ when receiving REQ/REP for stale QP from "InfiBand Architecture Specifications Volume 1": A QP is said to have a stale connection when only one side has connection information. A stale connection may result if the remote CM had dropped the connection and sent a DREQ but the DREQ was never received by the local CM. Alternatively the remote CM may have lost all record of past connections because its node crashed and rebooted, while the local CM did not become aware of the remote node's reboot and therefore did not clean up stale connections. and: A local CM may receive a REQ/REP for a stale connection. It shall abort the connection issuing REJ to the REQ/REP. It shall then issue DREQ with "DREQ:remote QPN” set to the remote QPN from the REQ/REP. This patch solves a problem with reuse of QPN. Current codebase, that is IPoIB, relies on a REAP-mechanism to do cleanup of the structures in CM. A problem with this is the timeconstants governing this mechanism; they are up to 768 seconds and the interface may look inresponsive in that period. Issuing a DREQ (and receiving a DREP) does the necessary cleanup and the interface comes up. Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 12:56:24 -05:00
Philippe Reynes	24dc08c3c9	IB/nes: use new api ethtool_{get\|set}_link_ksettings The ethtool api {get\|set}_settings is deprecated. We move this driver to new api {get\|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 12:52:25 -05:00
Alexey Khoroshilov	def4a6ffc9	IB/isert: do not ignore errors in dma_map_single() There are several places, where errors in dma_map_single() are ignored. The patch fixes them. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 12:51:31 -05:00
Jim Foraker	22dccc5454	IB/rdmavt: Only put mmap_info ref if it exists rvt_create_qp() creates qp->ip only when a qp creation request comes from userspace (udata is not NULL). If we exceed the number of available queue pairs however, the error path always attempts to put a kref to this structure. If the requestor is inside the kernel, this leads to a crash. We fix this by checking that qp->ip is not NULL before caling kref_put(). Signed-off-by: Jim Foraker <foraker1@llnl.gov> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Acked-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 12:16:11 -05:00
Petr Mladek	f5eabf5e51	IB/rdmavt: Handle the kthread worker using the new API Use the new API to create and destroy the cq kthread worker. The API hides some implementation details. In particular, kthread_create_worker() allocates and initializes struct kthread_worker. It runs the kthread the right way and stores task_struct into the worker structure. In addition, the *on_cpu() variant binds the kthread to the given cpu and the related memory node. kthread_destroy_worker() flushes all pending works, stops the kthread and frees the structure. This patch does not change the existing behavior. Note that we must use the on_cpu() variant because the function starts the kthread and it must bind it to the right CPU before waking. The numa node is associated for given CPU as well. Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 12:16:11 -05:00
Petr Mladek	6efaf10f16	IB/rdmavt: Avoid queuing work into a destroyed cq kthread worker The memory barrier is not enough to protect queuing works into a destroyed cq kthread. Just imagine the following situation: CPU1 CPU2 rvt_cq_enter() worker = cq->rdi->worker; rvt_cq_exit() rdi->worker = NULL; smp_wmb(); kthread_flush_worker(worker); kthread_stop(worker->task); kfree(worker); // nothing queued yet => // nothing flushed and // happily stopped and freed if (likely(worker)) { // true => read before CPU2 acted cq->notify = RVT_CQ_NONE; cq->triggered++; kthread_queue_work(worker, &cq->comptask); BANG: worker has been flushed/stopped/freed in the meantime. This patch solves this by protecting the critical sections by rdi->n_cqs_lock. It seems that this lock is not much contended and looks reasonable for this purpose. One catch is that rvt_cq_enter() might be called from IRQ context. Therefore we must always take the lock with IRQs disabled to avoid a possible deadlock. Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 12:16:11 -05:00
Arnd Bergmann	14ab8896f5	IB/mlx5: avoid bogus -Wmaybe-uninitialized warning We get a false-positive warning in linux-next for the mlx5 driver: infiniband/hw/mlx5/mr.c: In function ‘mlx5_ib_reg_user_mr’: infiniband/hw/mlx5/mr.c:1172:5: error: ‘order’ may be used uninitialized in this function [-Werror=maybe-uninitialized] infiniband/hw/mlx5/mr.c:1161:6: note: ‘order’ was declared here infiniband/hw/mlx5/mr.c:1173:6: error: ‘ncont’ may be used uninitialized in this function [-Werror=maybe-uninitialized] infiniband/hw/mlx5/mr.c:1160:6: note: ‘ncont’ was declared here infiniband/hw/mlx5/mr.c:1173:6: error: ‘page_shift’ may be used uninitialized in this function [-Werror=maybe-uninitialized] infiniband/hw/mlx5/mr.c:1158:6: note: ‘page_shift’ was declared here infiniband/hw/mlx5/mr.c:1143:13: error: ‘npages’ may be used uninitialized in this function [-Werror=maybe-uninitialized] infiniband/hw/mlx5/mr.c:1159:6: note: ‘npages’ was declared here I had a trivial workaround for gcc-5 or higher, but that didn't work on gcc-4.9 unfortunately. The only way I found to avoid the warnings for gcc-4.9, short of initializing each of the arguments first was to change the calling conventions to separate the error code from the umem pointer. This avoids casting the error codes from one pointer to another incompatible pointer, and lets gcc figure out when that the data is actually valid whenever we return successfully. Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 12:12:53 -05:00
Steve Wise	1e38a366ee	ib_isert: log the connection reject message Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 11:38:28 -05:00
Steve Wise	97540bb90a	ib_iser: log the connection reject message Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 11:38:28 -05:00
Steve Wise	5f24410408	rdma_cm: add rdma_consumer_reject_data helper function rdma_consumer_reject_data() will return the private data pointer and length if any is available. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 11:38:28 -05:00
Steve Wise	5042a73d3e	rdma_cm: add rdma_is_consumer_reject() helper function Return true if the peer consumer application rejected the connection attempt. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 11:38:28 -05:00
Steve Wise	77a5db1315	rdma_cm: add rdma_reject_msg() helper function rdma_reject_msg() returns a pointer to a string message associated with the transport reject reason codes. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 11:38:28 -05:00
Wei Yongjun	aecb66b2b0	qedr: remove pointless NULL check in qedr_post_send() Remove pointless NULL check for 'wr' in qedr_post_send(). Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 11:18:17 -05:00
Wei Yongjun	aafec388a1	qedr: Use list_move_tail instead of list_del/list_add_tail Using list_move_tail() instead of list_del() + list_add_tail(). Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 11:18:17 -05:00
Wei Yongjun	181d80151f	qedr: Fix possible memory leak in qedr_create_qp() 'qp' is malloced in qedr_create_qp() and should be freed before leaving from the error handling cases, otherwise it will cause memory leak. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 11:18:17 -05:00
Colin Ian King	ea7ef2accd	qedr: return -EINVAL if pd is null and avoid null ptr dereference Currently, if pd is null then we hit a null pointer derference on accessing pd->pd_id. Instead of just printing an error message we should also return -EINVAL immediately. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 11:18:17 -05:00
Hal Rosenstock	9fa240bbfc	IB/mad: Eliminate redundant SM class version defines for OPA and rename class version define to indicate SM rather than SMP or SMI Signed-off-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-14 11:01:58 -05:00
Bodong Wang	7d29f349a4	IB/mlx5: Properly adjust rate limit on QP state transitions - Add MODIFY_QP_EX CMD to extend modify_qp. - Rate limit will be updated in the following state transactions: RTR2RTS, RTS2RTS. The limit will be removed when SQ is in RST and ERR state. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:39:51 -05:00
Bodong Wang	189aba99e7	IB/uverbs: Extend modify_qp and support packet pacing An new uverbs command ib_uverbs_ex_modify_qp is added to support more QP attributes. User driver should choose to call the legacy/extended API based on input mask. IB_USER_LAST_QP_ATTR_MASK is added to indicated the maximum bit position which supports legacy ib_uverbs_modify_qp. IB_USER_LEGACY_LAST_QP_ATTR_MASK indicates the maximum bit position which supports ib_uverbs_ex_modify_qp, the value of this mask should be updated if new mask is added later. Along with this change, rate_limit is supported by the extended command, user driver could use it to control packet packing. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:39:51 -05:00
Bodong Wang	528e5a1bd3	IB/core: Support rate limit for packet pacing Add new member rate_limit to ib_qp_attr which holds the packet pacing rate in kbps, 0 means unlimited. IB_QP_RATE_LIMIT is added to ib_attr_mask and could be used by RAW QPs when changing QP state from RTR to RTS, RTS to RTS. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:39:50 -05:00
Bodong Wang	d949167d68	IB/mlx5: Report mlx5 packet pacing capabilities when querying device Enable mlx5 based hardware to report packet pacing capabilities from kernel to user space. Packet pacing allows to limit the rate to any number between the maximum and minimum, based on user settings. The capabilities are exposed to user space through query_device by uhw. The following capabilities are reported: 1. The maximum and minimum rate limit in kbps supported by packet pacing. 2. Bitmap showing which QP types are supported by packet pacing operation. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:39:50 -05:00
Or Gerlitz	ca5b91d631	IB/mlx5: Support RAW Ethernet when RoCE is disabled On some environments, such as certain SRIOV VF configurations, RoCE is not supported for mlx5 Ethernet ports. Currently, the driver will not open IB device on that port. This is problematic, since we do want user-space RAW Ethernet (RAW_PACKET QPs) functionality to remain in place. For that end, enhance the relevant driver flows such that we do create a device instance in that case. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:39:49 -05:00
Or Gerlitz	45f95acd63	IB/mlx5: Rename RoCE related helpers to reflect being Eth ones This is a pre-step towards having mlx5 IB device also over Eth ports where RoCE is not supported. We change the roce enable/disable and roce_lag init/fini function names to have _eth instead of _roce. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:39:48 -05:00
Or Gerlitz	d012f5d6f8	IB/mlx5: Refactor registration to netdev notifier Refactor the netdev notifier registration into a small helper function. This is a pre-step towards having mlx5 IB device over an Ethernet port which doesn't support RoCE. Also, renamed the de-registration helper and the new helper as netdev notifier and not roce, to make it clear this is not only used with roce. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:39:48 -05:00
Maor Gottlieb	b216af408c	IB/mlx5: Use u64 for UMR length The fast_registration length is used to convey length for memory registrations through UMR which can be of any size up to 2^64. Change the length type to be u64. Fixes: `968e78dd96` ('IB/mlx5: Enhance UMR support to allow partial page table update') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:39:47 -05:00
Eli Cohen	afd02cd3a9	IB/mlx5: Avoid system crash when enabling many VFs When enabling many VFs, the total amount of DMA mappings increase significantly. This causes DMA allocations to take a lot of time since they are serialized in the kernel. As a result the driver enters into fatal condition due to timeout and the system hangs. To recover from this we disable MR cache for VFs. PFs will still have a full cache and VFs cache can be manipulated as usual after driver load. Fixes: `e126ba97db` ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:39:47 -05:00
Maor Gottlieb	c73b7911de	IB/mlx5: Assign SRQ type earlier Move the SRQ type assignment to be before actually using it in create_srq_user() and in create_srq_kernel() functions. Fixes: `af1ba291c5` ('{net, IB}/mlx5: Refactor internal SRQ API') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:39:46 -05:00
Jack Morgenstein	c482af646d	IB/mlx4: Fix out-of-range array index in destroy qp flow For non-special QPs, the port value becomes non-zero only at the RESET-to-INIT transition. If the QP has not undergone that transition, its port number value is still zero. If such a QP is destroyed before being moved out of the RESET state, subtracting one from the qp port number results in a negative value. Using that negative value as an index into the qp1_proxy array results in an out-of-bounds array reference. Fix this by testing that the QP type is one that uses qp1_proxy before using the port number. For special QPs of all types, the port number is specified at QP creation time. Fixes: `9433c18891` ("IB/mlx4: Invoke UPDATE_QP for proxy QP1 on MAC changes") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:39:46 -05:00
Moni Shoua	41c450fd8d	IB/mlx5: Make create/destroy_ah available to userspace Advertise that create_ah and destroy_ah verbs are accessible from uverbs interface. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:39:19 -05:00
Moni Shoua	5097e71f3e	IB/mlx5: Use kernel driver to help userspace create ah Resolving a MAC address for a given IP address in userspace is inefficient. This patch lets mlx5 user driver using the kernel driver to resolve the mac and get the answer in the private section of the response. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:38:49 -05:00
Moni Shoua	477864c8fc	IB/core: Let create_ah return extended response to user Add struct ib_udata to the signature of create_ah callback that is implemented by IB device drivers. This allows HW drivers to return extra data to the userspace library. This patch prepares the ground for mlx5 driver to resolve destination mac address for a given GID and return it to userspace. This patch was previously submitted by Knut Omang as a part of the patch set to support Oracle's Infiniband HCA (SIF). Signed-off-by: Knut Omang <knut.omang@oracle.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:38:27 -05:00
Moni Shoua	6ad279c5a2	IB/mlx5: Report that device has udata response in create_ah To make mlx5 user driver aware of whether kernel driver returns dmac in user data response add a new flag that will be returned back to user-space through alloc_ucontext. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:37:19 -05:00
Moni Shoua	c90ea9d8e5	IB/core: Change ib_resolve_eth_dmac to use it in create AH The function ib_resolve_eth_dmac() requires struct qp_attr * and qp_attr_mask as parameters while the function might be useful to resolve dmac for address handles. This patch changes the signature of the function so it can be used in the flow of creating an address handle. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:34:25 -05:00
Moses Reuben	2d1e697e9b	IB/mlx5: Add support to match inner packet fields Add support to match packet fields which are tunneled, i.e. support matching the header of the inner packet which is the result of or bit operation of the original header and the IB_FLOW_SPEC_INNER type. The combination of IB_FLOW_SPEC_INNER \| IB_FLOW_SPEC_VXLAN_TUNNEL is not needed to be checked, because the IB core has this check already. Signed-off-by: Moses Reuben <mosesr@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:34:24 -05:00
Moses Reuben	fbf46860b1	IB/core: Introduce inner flow steering For a tunneled packet which contains external and internal headers, we refer to the external headers as "outer fields" and the internal headers as "inner fields". Example of a tunneled packet: { L2 \| L3 \| L4 \| tunnel header \| L2 \| L3 \| l4 \| data } \| \| \| \| \| \| \| { outer fields }{ inner fields } This patch introduces a new flag for flow steering rules - IB_FLOW_SPEC_INNER - which specifies that the rule applies to the inner fields, rather than to the outer fields of the packet. Signed-off-by: Moses Reuben <mosesr@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:34:23 -05:00
Moses Reuben	ffb30d8f10	IB/mlx5: Support Vxlan tunneling specification Add support to receive specific Vxlan packet in ConnectX-4. Signed-off-by: Moses Reuben <mosesr@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:34:23 -05:00
Moses Reuben	0dbf3332b7	IB/core: Add flow spec tunneling support In order to support tunneling, that can be used by the QP, both struct ib_flow_spec_tunnel and struct ib_flow_tunnel_filter can be used to more IP or UDP based tunneling protocols (e.g NVGRE, GRE, etc). IB_FLOW_SPEC_VXLAN_TUNNEL type flow specification is added to use this functionality and match specific Vxlan packets. In similar to IPv6, we check overflow of the vni value by comparing with the maximum size. Signed-off-by: Moses Reuben <mosesr@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:34:21 -05:00
Bodong Wang	1cbe6fc86c	IB/mlx5: Add support for CQE compressing CQE compressing reduces PCI overhead by coalescing and compressing multiple CQEs into a single merged CQE. Successful compressing improves message rate especially for small packet traffic. CQE compressing is supported for all 64B CQE formats (with certain limitations) generated by RQ/Responder or by SQ/Requestor. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:34:20 -05:00
Bodong Wang	7e43a2a5ba	IB/mlx5: Report mlx5 CQE compression caps during query The capabilities include: - Max number of compressed and aggregated CQEs in a single session, while zero means unsupported. - For Responder, there are two formats of mini CQE: mini CQE with Rx hash and mini CQE with checksum. They're mutual exclusive. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:34:03 -05:00
Bodong Wang	191ded4a4d	IB/mlx5: Report mlx5 multi packet WQE caps during query The capabilities whether hardware support multi packet WQE or not is exposed to user space through query_device by uhw. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:33:25 -05:00
Yonatan Cohen	d680ebed91	IB/rxe: Increase max number of completions to 32k Increase limit of max CQE from 8K to 32K to allow demanding applications to work over SoftRoCE with same configuration as most RoCEv2 HW vendors have. Fixes: `8700e3e7c4` ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:33:24 -05:00
Eran Ben Elisha	bf08e884bf	IB/mlx4: Check if GRH is available before using it Before reading GRH attributes, need to make sure AH contains GRH, and in addition, initialize GID type. Fixes: `dbf727de74` ('IB/core: Use GID table in AH creation and dmac resolution') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:32:51 -05:00
Eran Ben Elisha	1f22e454df	IB/mlx4: When no DMFS for IPoIB, don't allow NET_IF QPs According to the firmware spec, FLOW_STEERING_IB_UC_QP_RANGE command is supported only if dmfs_ipoib bit is set. If it isn't set we want to ensure allocating NET_IF QPs fail. We do so by filling out the allocation bitmap. By thus, the NET_IF QPs allocating function won't find any free QP and will fail. Fixes: `c1c9850112` ('IB/mlx4: Add support for steerable IB UD QPs') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-13 13:29:46 -05:00
Henry Orosco	d6f7bbcc2e	i40iw: Reorganize structures to align with HW capabilities Some resources are incorrectly organized and at odds with HW capabilities. Specifically, ILQ, IEQ, QPs, MSS, QOS and statistics belong in a VSI. Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 17:20:29 -05:00
Mustafa Ismail	0cc0d851cc	i40iw: Fix incorrect check for error In i40iw_ieq_handle_partial() the check for !status is incorrect. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 17:20:29 -05:00
Mustafa Ismail	6b0805c256	i40iw: Assign MSS only when it is a new MTU Currently we are changing the MSS regardless of whether there is a change or not in MTU. Fix to make the assignment of MSS dependent on an MTU change. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 17:20:28 -05:00
Shiraz Saleem	d627b50631	i40iw: Fix race condition in terminate timer's handler Add a QP reference when terminate timer is started to ensure the destroy QP doesn't race ahead to free the QP while it is being referenced in the terminate timer's handler. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 17:20:28 -05:00
Mustafa Ismail	fd90d4d4c2	i40iw: Fix memory leak in CQP destroy when in reset On a device close, the control QP (CQP) is destroyed by calling cqp_destroy which destroys the CQP and frees its SD buffer memory. However, if the reset flag is true, cqp_destroy is never called and leads to a memory leak on SD buffer memory. Fix this by always calling cqp_destroy, on device close, regardless of reset. The exception to this when CQP create fails. In this case, the SD buffer memory is already freed on an error check and there is no need to call cqp_destroy. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 17:20:27 -05:00
Shiraz Saleem	1cda28bb5b	i40iw: Fix QP flush to not hang on empty queues or failure When flush QP and there are no pending work requests, signal completion to unblock i40iw_drain_sq and i40iw_drain_rq which are waiting on completion for iwqp->sq_drained and iwqp->sq_drained respectively. Also, signal completion if flush QP fails to prevent the drain SQ or RQ from being blocked indefintely. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 17:20:27 -05:00
Mustafa Ismail	f4a87ca12a	i40iw: Fix double free of QP A QP can be double freed if i40iw_cm_disconn() is called while it is currently being freed by i40iw_rem_ref(). The fix in i40iw_cm_disconn() will first check if the QP is already freed before making another request for the QP to be freed. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 17:20:26 -05:00
Shiraz Saleem	91c42b72f8	i40iw: Use correct src address in memcpy to rdma stats counters hw_stats is a pointer to i40_iw_dev_stats struct in i40iw_get_hw_stats(). Use hw_stats and not &hw_stats in the memcpy to copy the i40iw device stats data into rdma_hw_stats counters. Fixes: `b40f4757da` ("IB/core: Make device counter infrastructure dynamic") Cc: stable@vger.kernel.org # 4.7+ Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 17:19:02 -05:00
Thomas Huth	5e58917122	i40iw: Remove macros I40IW_STAG_KEY_FROM_STAG and I40IW_STAG_INDEX_FROM_STAG The macros I40IW_STAG_KEY_FROM_STAG and I40IW_STAG_INDEX_FROM_STAG are apparently bad - they are using the logical "&&" operation which does not make sense here. It should have been a bitwise "&" instead. Since the macros seem to be completely unused, let's simply remove them so that nobody accidentially uses them in the future. And while we're at it, also remove the unused macro I40IW_CREATE_STAG. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 17:13:02 -05:00
Andrew Boyer	37f69f43fb	IB/rxe: Hold refs when running tasklets It might be possible for all of a QP's references to be dropped while one of that QP's tasklets is running. For example, the completer might run during QP destroy. If qp->valid is false, it will drop all of the packets on the resp_pkts list, potentially removing the last reference. Then it tries to advance the SQ consumer pointer. If the SQ's buffer has already been destroyed, the system will panic. To be safe, hold a reference on the QP for the duration of each tasklet. Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 16:34:22 -05:00
Andrew Boyer	07bf9627d5	IB/rxe: Wait for tasklets to finish before tearing down QP The system may crash when a malformed request is received and the error is detected by the responder. NodeA: $ ibv_rc_pingpong -g 0 -d rxe0 -i 1 -n 1 -s 50000 NodeB: $ ibv_rc_pingpong -g 0 -d rxe0 -i 1 -n 1 -s 1024 <NodeA_ip> The responder generates a receive error on node B since the incoming SEND is oversized. If the client tears down the QP before the responder or the completer finish running, a page fault may occur. The fix makes the destroy operation spin until the tasks complete, which appears to be original intent of the design. Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 16:31:45 -05:00
Andrew Boyer	5407f53012	IB/rxe: Fix ref leak in duplicate_request() A ref was added after the call to skb_clone(). Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 16:31:45 -05:00
Andrew Boyer	5b9ea16c54	IB/rxe: Fix ref leak in rxe_create_qp() The udata->inlen error path needs to clean up the ref added by rxe_alloc(). Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 16:31:45 -05:00
Andrew Boyer	accacb8f51	IB/rxe: Add support for IB_CQ_REPORT_MISSED_EVENTS Peek at the CQ after arming it so that we can return a hint. This avoids missed completions due to a race between posting CQEs and arming the CQ. For example, CM teardown waits on MAD requests to complete with ib_cq_poll_work(). Without this fix, the last completion might be left on the CQ, hanging the kthread doing the teardown. The console backtraces look like this: [ 4199.911284] Call Trace: [ 4199.911401] [<ffffffff9657fe95>] schedule+0x35/0x80 [ 4199.911556] [<ffffffff965830df>] schedule_timeout+0x22f/0x2c0 [ 4199.911727] [<ffffffff9657f7a8>] ? __schedule+0x368/0xa20 [ 4199.911891] [<ffffffff96580903>] wait_for_completion+0xb3/0x130 [ 4199.912067] [<ffffffff960a17e0>] ? wake_up_q+0x70/0x70 [ 4199.912243] [<ffffffffc074a06d>] cm_destroy_id+0x13d/0x450 [ib_cm] [ 4199.912422] [<ffffffff961615d5>] ? printk+0x57/0x73 [ 4199.912578] [<ffffffffc074a390>] ib_destroy_cm_id+0x10/0x20 [ib_cm] [ 4199.912759] [<ffffffffc076098c>] rdma_destroy_id+0xac/0x340 [rdma_cm] [ 4199.912941] [<ffffffffc076f2cc>] 0xffffffffc076f2cc Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 16:31:45 -05:00
Andrew Boyer	d4fb59256a	IB/rxe: Add support for zero-byte operations The last_psn algorithm fails in the zero-byte case: it calculates first_psn = N, last_psn = N-1. This makes the operation unretryable since the res structure will fail the (first_psn <= psn <= last_psn) test in find_resource(). While here, use BTH_PSN_MASK to mask the calculated last_psn. Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 16:31:45 -05:00
Andrew Boyer	d38eb801aa	IB/rxe: Unblock loopback by moving skb_out increment skb_out is decremented in rxe_skb_tx_dtor(), which is not called in the loopback() path. Move the increment to the send() path rather than rxe_xmit_packet(). Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Acked-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 16:31:45 -05:00
Andrew Boyer	2a7a85487e	IB/rxe: Don't update the response PSN unless it's going forwards A client might post a read followed by a send. The partner receives and acknowledges both transactions, posting an RCQ entry for the send, but something goes wrong with the read ACK. When the client retries the read, the partner's responder processes the duplicate read but incorrectly resets the PSN to the value preceding the original send. When the duplicate send arrives, the responder cannot tell that it is a duplicate, so the responder generates a duplicate RCQ entry, confusing the client. Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 16:31:45 -05:00
Andrew Boyer	dd753d8743	IB/rxe: Advance the consumer pointer before posting the CQE A simple userspace application might poll the CQ, find a completion, and then attempt to post a new WQE to the SQ. A spurious error can occur if the userspace application detects a full SQ in the instant before the kernel is able to advance the SQ consumer pointer. This is noticeable when using single-entry SQs with ibv_rc_pingpong if lots of kernel and userspace library debugging is enabled. Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 16:31:45 -05:00
Andrew Boyer	6e9bb530ff	IB/rxe: Remove buffer used for printing IP address Avoid smashing the stack when an ICRC error occurs on an IPv6 network. Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 16:31:45 -05:00
Dan Carpenter	95db9d05b7	IB/rxe: Remove unneeded cast in rxe_srq_from_attr() It makes me nervous when we cast pointer parameters. I would estimate that around 50% of the time, it indicates a bug. Here the cast is not needed becaue u32 and and unsigned int are the same thing. Removing the cast makes the code more robust and future proof in case any of the types change. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Acked-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 16:31:45 -05:00
Wei Yongjun	4ac4707102	IB/rxe: Use DEFINE_SPINLOCK() for spinlock spinlock can be initialized automatically with DEFINE_SPINLOCK() rather than explicitly calling spin_lock_init(). Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Leon Romanosky <leonro@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 16:31:45 -05:00
Arnd Bergmann	a0fa72683e	IB/rxe: avoid putting a large struct rxe_qp on stack A race condition fix added an rxe_qp structure to the stack in order to be able to perform rollback in rxe_requester(), but the structure is large enough to trigger the warning for possible stack overflow: drivers/infiniband/sw/rxe/rxe_req.c: In function 'rxe_requester': drivers/infiniband/sw/rxe/rxe_req.c:757:1: error: the frame size of 2064 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] This changes the rollback function to only save the psn inside the qp, which is the only field we access in the rollback_qp anyway. Fixes: `3050b99850` ("IB/rxe: Fix race condition between requester and completer") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-12 16:31:45 -05:00
Bart Van Assche	66431b0e86	IB/hfi1: Define platform_config_table_limits once Defining static data structures in a header file is wrong because this causes the data structure to be instantiated once in every .c file it is included in. Hence move the definition of a static array from a header file into the only .c file in which it is used. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:29:42 -05:00
Bhumika Goyal	0fc859a657	IB/hfi1: constify mmu_notifier_ops structure Declare the structure mmu_notifier_ops as const as it is only stored in the ops field of a mmu_notifier structure. The ops field is of type const struct mmu_notifier_ops *, so mmu_notifier_ops structures having this property can be declared as const. Done using coccinelle: @r1 disable optional_qualifier @ identifier i; position p; @@ static struct mmu_notifier_ops i@p = {...}; @ok1@ identifier r1.i; position p; struct mmu_rb_handler handler; @@ handler.mn.ops=&i@p @bad@ position p!={r1.p,ok1.p}; identifier r1.i; @@ i@p @depends on !bad disable optional_qualifier@ identifier r1.i; @@ static +const struct mmu_notifier_ops i={...}; @depends on !bad disable optional_qualifier@ identifier r1.i; @@ +const struct mmu_notifier_ops i; File size before: text data bss dec hex filename 3566 72 16 3654 e46 drivers/infiniband/hw/hfi1/mmu_rb.o File size after: text data bss dec hex filename 3658 0 16 3674 e5a drivers/infiniband/hw/hfi1/mmu_rb.o Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:29:42 -05:00
Mike Marciniszyn	5dc806052a	IB/rdmavt, IB/hfi1, IB/qib: Add inlines for mtu division Add rvt_div_round_up_mtu() and rvt_div_mtu() routines to do the computation based on the pmtu and the log_pmtu. Change divides in qib, hfi1 to use the new inlines. Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:29:42 -05:00
Mike Marciniszyn	c64607aa8a	IB/hfi1,IB/qib: use rvt swqe mr deref helper Convert to use new swqe put routine. Reviewed-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:29:42 -05:00
Harish Chegondi	9d8145a604	IB/hfi1: Avoid credit return allocation for cpu-less NUMA nodes Do not allocate credit return base and DMA memory for NUMA nodes without CPUs. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:29:42 -05:00
Mike Marciniszyn	0771da5a6e	IB/hfi1,IB/qib: Use new send completion helper Convert cq completion returns in both rdmavt drivers to use the new helper. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:29:42 -05:00
Mike Marciniszyn	f2dc9cdce8	IB/rdmavt: Add a send completion helper This is for use by client drivers to drive send completions into a CQ. A new exported table allows for the mapping of ib_wr_opcode into a ib_wc_opcode. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:29:42 -05:00
Sebastian Sanchez	238b1862b4	IB/qib: Use standard refcount wrapper for QPs Use the standard driver wrapper for QP reference counters. This makes the code more maintainable. Fixes: Commit `4d6f85c3fa` ("IB/rdmavt, IB/qib, IB/hfi1: Use new QP put get routines") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:29:42 -05:00
Sebastian Sanchez	f84dfa26e6	IB/hfi1: Use reference count wrapper for MRs Some parts of the code don't use the standard driver wrapper for memory region reference counters. Use the standard driver wrapper throughout the code. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:29:42 -05:00
Sebastian Sanchez	b44980f879	IB/hfi1: Replace qp->refcount release code with standard driver wrapper Some parts of the code don't use the standard release wrapper rvt_put_qp() for decrementing and testing the refcount to then try to use a resource. Replace this code with the standard driver wrapper. Fixes: Commit `4d6f85c3fa` ("IB/rdmavt, IB/qib, IB/hfi1: Use new QP put get routines") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:29:42 -05:00
Dean Luick	0080167467	IB/hfi1: Preserve external device completed bit The driver should not change the external device request completed bit when not actually doing an external device request. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:29:42 -05:00
Sebastian Sanchez	9b86071c5e	IB/hfi1: Remove critical section gap in sc_buffer_alloc() In sc_buffer_alloc(), the sc->alloc_lock is released before calling sc_release_update(), and it is reacquired after the function call. This causes CPU lock trading. Fix it by not dropping the lock before calling sc_release_update(). Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:29:42 -05:00
Mitko Haralanov	b777f154a0	IB/hfi1: Remove usage of qp->s_cur_sge The s_cur_sge field in the qp structure holds a pointer to the SGE of the currently processed WQE. It assumes the protection of the RVT_S_BUSY flag to prevent the changing of this field while the send engine is using it. This scheme works as long as there is only one instance of the send engine running at a time. Scaling of the send engine to multiple cores would break this assumption as there could be multiple instances of the send engine running on different CPUs. This opens a window where the QP's RVT_S_BUSY flag is not set but the send engine is still running. To prevent accidental changing of the s_cur_sge pointer, the QP's dependence on it is removed. The SGE pointer is now stored in the verbs_txreq, which is a per-packet data structure. This ensures that each individual packet has it's own pointer, which is setup while the RVT_S_BUSY flag is set. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:29:42 -05:00
Mike Marciniszyn	fcb29a6668	IB/rdmavt: Add trace of MR segs Add tracing of MR segment information. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:29:42 -05:00
Dean Luick	5213006ade	IB/hfi1: Add special setting for low power AOC Low power QSFP AOC cables require a different SerDes Tx PLL bandwidth setting than the default. The 8051 firmware does not know the details, so the driver needs to tell the firmware through a special setting. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:29:42 -05:00
Tadeusz Struk	6e40b59cfa	IB/hfi1: Remove definition of unused hfi1_affinity struct The struct hfi1_affinity is not used anymore. We use the struct hfi1_affinity_node and hfi1_affinity_node_list instead. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:29:42 -05:00
Don Hiatt	e922ae06e9	IB/hfi1: Remove dependence on qp->s_cur_size The qp->s_cur_size field assumes that the S_BUSY bit protects the field from modification after the slock is dropped. Scaling the send engine to multiple cores would break that assumption. Correct the issue by carrying the payload size in the txreq structure. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:25:13 -05:00
Jianxin Xiong	b7481944b0	IB/hfi1: Show statistics counters under IB stats interface Previously tools like hfi1stats had to access these counters through debugfs, which often caused permission issue for non-root users. It is not always acceptable to change the debugfs mounting permission due to security concerns. When exposed under the IB stats interface, the counters are universally readable by default. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:25:13 -05:00
Dennis Dalessandro	cf4c2f8c9d	IB/rdmavt: Fix trace hierarchy Split rdmavt traces into separate files to preserve the original hierarchy since only one trace sub system may now be defined per header file. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:25:13 -05:00
Jakub Pawlak	e730139b34	IB/hfi1: Disable header suppression for short packets For the received packets with payload less or equal 8DWS RxDmaDataFifoRdUncErr is not reported. There is set RHF.EccErr if the header is not suppressed. When such packet is detected on the send side the header suppression mechanism is disabled by clearing SH bit in the packet header. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:25:13 -05:00
Dean Luick	1b9e774933	IB/hfi1: Export 8051 memory and LCB registers via debugfs Both the 8051 memory and LCB register access require multiple steps and coordination with the driver. This cannot be safely done with resource0 alone. The 8051 memory is exported read-only. LCB is exported read/write. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:25:13 -05:00
Sebastian Sanchez	53e91d264b	IB/hfi1: Use non-atomic __test_and_clear_bit in hot path qp->r_aflags is already protected by qp->r_lock, therefore, test_and_clear_bit() doesn't need to be atomic. Profile shows this function call is costly. Change the test_and_clear_bit() call to use the non-atomic variant. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:25:13 -05:00
Dean Luick	d7cf4ccf6f	IB/hfi1: Fix dc8051 multiple qword memory reads When reading multiple dc8051 data memory locations at once, the read enabled field must be toggled at every address change. Do that by writing only the address first, then writing the enable. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:25:13 -05:00
Dean Luick	62aeddbf28	IB/hfi1: Read new EPROM format Add the ability to read the new EPROM format. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-11 15:25:13 -05:00
Wei Yongjun	15f7e3c21b	iw_cxgb4: Fix error return code in c4iw_rdev_open() Fix to return error code -ENOMEM from the __get_free_page() error handling case instead of 0, as done elsewhere in this function. Fixes: `05eb23893c` ("cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-08 10:03:36 -05:00
Henry Orosco	78300cf815	i40iw: Add request for reset on CQP timeout When CQP times out, send a request to LAN driver for reset. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:44 -05:00
Henry Orosco	1ef936b229	i40iw: Code cleanup, remove check of PBLE pages Remove check for zero 'pages' of unallocated pbles calculated in add_pble_pool(); as it can never be true. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:44 -05:00
Shiraz Saleem	bf69f494c3	i40iw: Correctly fail loopback connection if no listener Fail the connect and return the proper error code if a client is started with local IP address and there is no corresponding loopback listener. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:43 -05:00
Shiraz Saleem	fd4e906b2e	i40iw: Fill in IRD value when on connect request IRD is not populated on connect request and application is getting 0 for the value. Fill in the correct value on connect request. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:43 -05:00
Shiraz Saleem	7eb2bde7f3	i40iw: Set TOS field in IP header Set the TOS field in IP header with the value passed in from application. If there is mismatch between the remote client's TOS and listener, set the listener Tos to the higher of the two values. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:42 -05:00
Shiraz Saleem	e0b010da87	i40iw: Add NULL check for ibqp event handler Add NULL check for ibqp event handler before calling it to report QP events, as it might not initialized. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:42 -05:00
Mustafa Ismail	a05e15135b	i40iw: Replace list_for_each_entry macro with safe version Use list_for_each_entry_safe macro for the IPv6 addr list as IPv6 addresses can be deleted while going through the list. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:41 -05:00
Mustafa Ismail	e5e74b61b1	i40iw: Add IP addr handling on netdev events Disable listeners and disconnect all connected QPs on a netdev interface down event. On an interface up event, the listeners are re-enabled. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:41 -05:00
Mustafa Ismail	d59659340c	i40iw: Add missing cleanup on device close On i40iw device close, disconnect all connected QPs by moving them to error state; and block further QPs, PDs and CQs from being created. Additionally, make sure all resources have been freed before deallocating the ibdev as part of the device close. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:40 -05:00
Henry Orosco	f26c7c8339	i40iw: Add 2MB page support Add support to allow each independent memory region to be configured for 2MB page size in addition to 4KB page size. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:40 -05:00
Henry Orosco	b6a529da69	i40iw: Utilize physically mapped memory regions Add support to use physically mapped WQ's and MR's if determined that the OS registered user-memory for the region is physically contiguous. This feature will eliminate the need for unnecessarily setting up and using PBL's when not required. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:39 -05:00
Henry Orosco	d4165e3abd	i40iw: Fix incorrect assignment of SQ head The SQ head is incorrectly incremented when the number of WQEs required is greater than the number available. The fix is to use the I40IW_RING_MOV_HEAD_BY_COUNT macro. This checks for the SQ full condition first and only if SQ has room for the request, then we move the head appropriately. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:39 -05:00
Henry Orosco	78e945aace	i40iw: Remove variable flush_code and check to set qp->sq_flush The flush_code variable in i40iw_bld_terminate_hdr() is obsolete and the check to set qp->sq_flush is unreachable. Currently flush code is populated in setup_term_hdr() and both SQ and RQ are flushed always as part of the tear down flow. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:38 -05:00
Henry Orosco	dfd9c43b3c	i40iw: Remove check on return from device_init_pestat() Remove unnecessary check for return code from device_init_pestat() and change func to void. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:38 -05:00
Henry Orosco	5ebcb0ff54	i40iw: Use runtime check for IS_ENABLED(CONFIG_IPV6) To be consistent, use the runtime check instead of conditional compile. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:37 -05:00
Henry Orosco	e67791858e	i40iw: Use actual page size In i40iw_post_send, use the actual page size instead of encoded page size. This is to be consistent with the rest of the file. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:37 -05:00
Henry Orosco	1ad19f739f	i40iw: Remove NULL check for cm_node->iwdev It is not necessary to check cm_node->iwdev in i40iw_rem_ref_cm_node() as it can never be NULL after a successful call out of i40iw_make_cm_node(). Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:36 -05:00
Henry Orosco	799749979d	i40iw: Remove checks for more than 48 bytes inline data Remove dead code, which isn't executed because we return error if the data size is greater than 48 bytes. Inline data size greater than 48 bytes isn't supported and the maximum WQE size is 64 bytes. Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:36 -05:00
Henry Orosco	85a87c90ee	i40iw: Query device accounts for internal rsrc Some resources are consumed internally and not available to the user. After hw is initialized, figure out how many resources are consumed and subtract those numbers from the initial max device capability in i40iw_query_device(). Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:35 -05:00
Henry Orosco	e7f9774af5	i40iw: Optimize inline data copy Use memcpy for inline data copy in sends and writes instead of byte by byte copy. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:35 -05:00
Henry Orosco	c38d7e0d08	i40iw: Fix for LAN handler removal If i40iw_open() fails for any reason, the LAN handler is not being removed. Modify i40iw_deinit_device() to always remove the handler. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:35 -05:00
Henry Orosco	01d0b36798	i40iw: Correct values for max_recv_sge, max_send_sge When creating QPs, ensure init_attr->cap.max_recv_sge is clipped to MAX_FRAG_COUNT. Expose MAX_FRAG_COUNT for max_recv_sge and max_send_sge in i40iw_query_qp(). Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Reviewed-By: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:34 -05:00
Henry Orosco	e69c509361	i40iw: Use vector when creating CQs Assign each CEQ vector to a different CPU when possible, then when creating a CQ, use the vector for the CEQ id. This allows completion work to be distributed over multiple cores. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:34 -05:00
Henry Orosco	68583ca2a1	i40iw: Convert page_size to encoded value Passed in page_size was used as encoded value for writing the WQE and passed in value was usually 4096. This was working out since bit 0 was 0 and implies 4KB pages, but would not work for other page sizes. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-05 16:09:28 -05:00
Henry Orosco	7cba2cc13e	i40iw: Set MAX IRD, MAX ORD size to max supported value Set the MAX_IRD and MAX_ORD size negotiated to the maximum supported values. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 15:24:53 -05:00
Henry Orosco	7581e96ca4	i40iw: Remove workaround for pre-production errata Pre-production silicon incorrectly truncates 4 bytes of the MPA packet in UDP loopback case. Remove the workaround as it is no longer necessary. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 15:24:52 -05:00
Henry Orosco	d62d563424	i40iw: Enable message packing Remove the parameter to disable message packing and always enable it. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 15:24:52 -05:00
Henry Orosco	0fc2dc5889	i40iw: Add Quality of Service support Add support for QoS on QPs. Upon device initialization, a map is created from user priority to queue set handles. On QP creation, use ToS to look up the queue set handle for use with the QP. Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 15:24:51 -05:00
Leon Romanovsky	4d4099584c	IB/hns: Move HNS RoCE user vendor structures This patch moves HNS vendor's specific structures to common UAPI folder which will be visible to all consumers. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 14:23:14 -05:00
Lijun Ou	3b5184be89	IB/hns: Fix the IB device name This patch mainly fix the name for IB device in order to match with libhns. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 14:20:42 -05:00
Shaobo Xu	afb6b092d6	IB/hns: Fix the bug when free cq If the resources of cq are freed while executing the user case, hardware can not been notified in hip06 SoC. Then hardware will hold on when it writes the cq buffer which has been released. In order to slove this problem, RoCE driver checks the CQE counter, and ensure that the outstanding CQE have been written. Then the cq buffer can be released. Signed-off-by: Shaobo Xu <xushaobo2@huawei.com> Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 14:20:42 -05:00
Wei Hu (Xavier)	19a408efa0	IB/hns: Delete the redundant memset operation It deleted the redundant memset operation because the memory allocated by ib_alloc_device has been set zero. Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 14:20:42 -05:00
Wei Hu (Xavier)	9daed0affa	IB/hns: Fix the bug of setting port mtu In hns_roce driver, we need not call iboe_get_mtu to reduce IB headers from effective IBoE MTU because hr_dev->caps.max_mtu has already been reduced. Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 14:20:42 -05:00
Shaobo Xu	bfcc681bd0	IB/hns: Fix the bug when free mr If the resources of mr are freed while executing the user case, hardware can not been notified in hip06 SoC. Then hardware will hold on when it reads the payload by the PA which has been released. In order to slove this problem, RoCE driver creates 8 reserved loopback QPs to ensure zero wqe when free mr. When the mac address is reset, in order to avoid loopback failure, we need to release the reserved loopback QPs and recreate them. Signed-off-by: Shaobo Xu <xushaobo2@huawei.com> Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 14:20:42 -05:00
Wei Hu (Xavier)	d838c481e0	IB/hns: Fix the bug when destroy qp If send queue is still working when qp is in reset state by modify qp in destroy qp function, hardware will hold on and don't work in hip06 SoC. In current codes, RoCE driver check hardware pointer of sending and hardware pointer of processing to ensure that hardware has processed all the dbs of this qp. But while the environment of wire becomes not good, The checking time maybe too long. In order to solve this problem, RoCE driver created a workqueue at probe function. If there is a timeout when checking the status of qp, driver initialize work entry and push it into the workqueue, Work function will finish checking and release the related resources later. Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Dongdong Huang(Donald) <hdd.huang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 14:20:42 -05:00
Salil	e84e40be8e	IB/hns: Fix for Checkpatch.pl comment style errors This patch correct the comment style errors caught by checkpatch.pl script Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 14:20:42 -05:00
Shaobo Xu	8254746978	IB/hns: Implement the add_gid/del_gid and optimize the GIDs management IB core has implemented the calculation of GIDs and the management of GID tables, and it is now responsible to supply query function for GIDs. So the calculation of GIDs and the management of GID tables in the RoCE driver is redundant. The patch is to implement the add_gid/del_gid to set the GIDs in the RoCE driver, remove the redundant calculation and management of GIDs in the notifier call of the net device and the inet, and update the query_gid. Signed-off-by: Shaobo Xu <xushaobo2@huawei.com> Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 14:20:42 -05:00
Wei Hu (Xavier)	5e6ff78a22	IB/hns: Change qpn allocation to round-robin mode. When using CM to establish connections, qp number that was freed just now will be rejected by ib core. To fix these problem, We change qpn allocation to round-robin mode. We added the round-robin mode for allocating resources using bitmap. We use round-robin mode for qp number and non round-robing mode for other resources like cq number, pd number etc. Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 14:20:42 -05:00
Wei Hu (Xavier)	dd783a212c	IB/hns: Modify query info named port_num when querying RC QP This patch modified the output query info qp_attr->port_num to fix bug in hip06. Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 14:20:42 -05:00
Wei Hu (Xavier)	6b877c32bc	IB/hns: Modify the macro for the timeout when cmd process This patch modified the macro for the timeout when cmd is processing as follows: Before modification: enum { HNS_ROCE_CMD_TIME_CLASS_A = 10000, HNS_ROCE_CMD_TIME_CLASS_B = 10000, HNS_ROCE_CMD_TIME_CLASS_C = 10000, }; After modification: #define HNS_ROCE_CMD_TIMEOUT_MSECS 10000 Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 14:20:42 -05:00
Lijun Ou	1dec243ac0	IB/hns: Fix the bug for qp state in hns_roce_v1_m_qp() In old code, the value of qp state from qpc was assigned for attr->qp_state. The value may be an error while attr_mask & IB_QP_STATE is zero. Signed-off-by: Lijun Ou <oulijun@huawei.com> Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 14:20:42 -05:00
Lijun Ou	80596c6717	IB/hns: Modify the condition of notifying hardware loopback This patch modified the condition of notifying hardware loopback. In hip06, RoCE Engine has several ports, one QP is related to one port. hardware only support loopback in the same port, not in the different ports. So, If QP related to port N, the dmac in the QP context equals the smac of the local port N or the loop_idc is 1, we should set loopback bit in QP context to notify hardware. Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 14:20:42 -05:00
Lijun Ou	543bfe6c3c	IB/hns: add self loopback for CM This patch mainly adds self loopback support for CM. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Peter Chen <luck.chen@huawei.com> Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 14:20:42 -05:00
Wei Hu (Xavier)	8d497eb0f3	IB/hns: Optimize the logic of allocating memory using APIs This patch modified the logic of allocating memory using APIs in hns RoCE driver. We used kcalloc instead of kmalloc_array and bitmap_zero. And When kcalloc failed, call vzalloc to alloc memory. Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Ping Zhang <zhangping5@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 14:20:42 -05:00
Wei Hu (Xavier)	8f3e9f3ea0	IB/hns: Add code for refreshing CQ CI using TPTR This patch added the code for refreshing CQ CI using TPTR in hip06 SoC. We will send a doorbell to hardware for refreshing CQ CI when user succeed to poll a cqe. But it will be failed if the doorbell has been blocked. So hardware will read a special buffer called TPTR to get the lastest CI value when the cq is almost full. This patch support the special CI buffer as follows: a) Alloc the memory for TPTR in the hns_roce_tptr_init function and free it in hns_roce_tptr_free function, these two functions will be called in probe function and in the remove function. b) Add the code for computing offset(every cq need 2 bytes) and write the dma addr to every cq context to notice hardware in the function named hns_roce_v1_write_cqc. c) Add code for mapping TPTR buffer to user space in function named hns_roce_mmap. The mapping distinguish TPTR and UAR of user mode by vm_pgoff(0: UAR, 1: TPTR, others:invaild) in hip06. d) Alloc the code for refreshing CQ CI using TPTR in the function named hns_roce_v1_poll_cq. e) Add some variable definitions to the related structure. Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Dongdong Huang(Donald) <hdd.huang@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 14:20:42 -05:00
Lijun Ou	9eefa953f4	IB/hns: Add the interface for querying QP1 In old code, It only added the interface for querying non-specific QP. This patch mainly adds an interface for querying QP1. Signed-off-by: Lijun Ou <oulijun@huawei.com> Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 14:20:42 -05:00
Leon Romanovsky	f73a1dbc45	infiniband: remove WARN that is not kernel bug On Mon, Nov 21, 2016 at 09:52:53AM -0700, Jason Gunthorpe wrote: > On Mon, Nov 21, 2016 at 02:14:08PM +0200, Leon Romanovsky wrote: > > > > > > In ib_ucm_write function there is a wrong prefix: > > > > > > + pr_err_once("ucm_write: process %d (%s) tried to do something hinky\n", > > > > I did it intentionally to have the same errors for all flows. > > Lets actually use a good message too please? > > pr_err_once("ucm_write: process %d (%s) changed security contexts after opening FD, this is not allowed.\n", > > Jason >From 70f95b2d35aea42e5b97e7d27ab2f4e8effcbe67 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky <leonro@mellanox.com> Date: Mon, 21 Nov 2016 13:30:59 +0200 Subject: [PATCH rdma-next V2] IB/{core, qib}: Remove WARN that is not kernel bug WARNINGs mean kernel bugs, in this case, they are placed to mark programming errors and/or malicious attempts. BUG/WARNs that are not kernel bugs hinder automated testing efforts. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 13:17:07 -05:00
Leon Romanovsky	74226649f4	IB/ipoib: Remove and fix debug prints after allocation failure The prints after [k\|v][m\|z\|c]alloc() functions are not needed, because in case of failure, allocator will print their internal error prints anyway. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 13:12:52 -05:00
Leon Romanovsky	93b80f2904	IB/isert: Remove and fix debug prints after allocation failure The prints after [k\|v][m\|z\|c]alloc() functions are not needed, because in case of failure, allocator will print their internal error prints anyway. Signed-off-by: Leon Romanovsky <leon@kernel.org> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 13:12:52 -05:00
Leon Romanovsky	907610bfdf	IB/rxe: Remove and fix debug prints after allocation failure The prints after [k\|v][m\|z\|c]alloc() functions are not needed, because in case of failure, allocator will print their internal error prints anyway. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 13:12:52 -05:00
Leon Romanovsky	740c330ee6	IB/ocrdma: Remove and fix debug prints after allocation failure The prints after [k\|v][m\|z\|c]alloc() functions are not needed, because in case of failure, allocator will print their internal error prints anyway. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 13:12:52 -05:00
Leon Romanovsky	02d93f8e6b	IB/usninc: Remove and fix debug prints after allocation failure This patch removes unneeded prints after allocation failure and moves one debug print into the appropriate place. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 13:12:52 -05:00
Leon Romanovsky	870b285245	IB/mthca: Remove debug prints after allocation failure The prints after [k\|v][m\|z\|c]alloc() functions are not needed, because in case of failure, allocator will print their internal error prints anyway. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 13:12:52 -05:00
Leon Romanovsky	2e65835a1b	IB/nes: Remove debug prints after allocation failure The prints after [k\|v][m\|z\|c]alloc() functions are not needed, because in case of failure, allocator will print their internal error prints anyway. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 13:12:52 -05:00
Leon Romanovsky	c40a83b978	IB/qib: Remove debug prints after allocation failure The prints after [k\|v][m\|z\|c]alloc() functions are not needed, because in case of failure, allocator will print their internal error prints anyway. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-12-03 13:12:52 -05:00

... 3 4 5 6 7 ...

6513 Commits