linux

Author	SHA1	Message	Date
Erez Shitrit	0e5544d9bf	IB/ipoib: Remove IPOIB_MCAST_RUN bit After Doug Ledford's changes there is no need in that bit, it's semantic becomes subset of the IPOIB_FLAG_OPER_UP bit. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:19 -04:00
Erez Shitrit	1e85b806f9	IB/ipoib: Save only IPOIB_MAX_PATH_REC_QUEUE skb's Whenever there is no path->ah to the destination, keep only defined number of skb's. Otherwise there are cases that the driver can keep infinite list of skb's. For example, when one device want to send unicast arp to the destination, and from some reason the SM doesn't respond, the driver currently keeps all the skb's. If that unicast arp traffic stopped, all these skb's are kept by the path object till the interface is down. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:19 -04:00
Erez Shitrit	2c01073095	IB/ipoib: Handle QP in SQE state As the result of a completion error the QP can moved to SQE state by the hardware. Since it's not the Error state, there are no flushes and hence the driver doesn't know about that. The fix creates a task that after completion with error which is not a flush tracks the QP state and if it is in SQE state moves it back to RTS. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:19 -04:00
Erez Shitrit	3fd0605caa	IB/ipoib: Update broadcast record values after each successful join request Update the cached broadcast record in the priv object after every new join of this broadcast domain group. These values are needed for the port configuration (MTU size) and to all the new multicast (non-broadcast) join requests initial parameters. For example, SM starts with 2K MTU for all the fabric, and after that it restarts (or handover to new SM) with new port configuration of 4K MTU. Without using the new values, the driver will keep its old configuration of 2K and will not apply the new configuration of 4K. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:18 -04:00
Erez Shitrit	a44878d100	IB/ipoib: Use one linear skb in RX flow The current code in the RX flow uses two sg entries for each incoming packet, the first one was for the IB headers and the second for the rest of the data, that causes two dma map/unmap and two allocations, and few more actions that were done at the data path. Use only one linear skb on each incoming packet, for the data (IB headers and payload), that reduces the packet processing in the data-path (only one skb, no frags, the first frag was not used anyway, less memory allocations) and the dma handling (only one dma map/unmap over each incoming packet instead of two map/unmap per each incoming packet). After commit `73d3fe6d1c` ("gro: fix aggregation for skb using frag_list") from Eric Dumazet, we will get full aggregation for large packets. When running bandwidth tests before and after the (over the card's numa node), using "netperf -H 1.1.1.3 -T -t TCP_STREAM", the results before are ~12Gbs before and after ~16Gbs on my setup (Mellanox's ConnectX3). Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:18 -04:00
Doug Ledford	1c0453d64a	IB/ipoib: drop mcast_mutex usage We needed the mcast_mutex when we had to prevent the join completion callback from having the value it stored in mcast->mc overwritten by a delayed return from ib_sa_join_multicast. By storing the return of ib_sa_join_multicast in an intermediate variable, we prevent a delayed return from ib_sa_join_multicast overwriting the valid contents of mcast->mc, and we no longer need a mutex to force the join callback to run after the return of ib_sa_join_multicast. This allows us to do away with the mutex entirely and protect our critical sections with a just a spinlock instead. This is highly desirable as there were some places where we couldn't use a mutex because the code was not allowed to sleep, and so we were currently using a mix of mutex and spinlock to protect what we needed to protect. Now we only have a spin lock and the locking complexity is greatly reduced. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:18 -04:00
Doug Ledford	d2fe937ce6	IB/ipoib: deserialize multicast joins Allow the ipoib layer to attempt to join all outstanding multicast groups at once. The ib_sa layer will serialize multiple attempts to join the same group, but will process attempts to join different groups in parallel. Take advantage of that. In order to make this happen, change the mcast_join_thread to loop through all needed joins, sending a join request for each one that we still need to join. There are a few special cases we handle though: 1) Don't attempt to join anything but the broadcast group until the join of the broadcast group has succeeded. 2) No longer restart the join task at the end of completion handling. If we completed successfully, we are done. The join task now needs kicked either by mcast_send or mcast_restart_task or mcast_start_thread, but should not need started anytime else except when scheduling a backoff attempt to rejoin. 3) No longer use separate join/completion routines for regular and sendonly joins, pass them all through the same routine and just do the right thing based on the SENDONLY join flag. 4) Only try to join a SENDONLY join twice, then drop the packets and quit trying. We leave the mcast group in the list so that if we get a new packet, all that we have to do is queue up the packet and restart the join task and it will automatically try to join twice and then either send or flush the queue again. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:18 -04:00
Doug Ledford	69911416d8	IB/ipoib: fix MCAST_FLAG_BUSY usage Commit `a9c8ba5884` ("IPoIB: Fix usage of uninitialized multicast objects") added a new flag MCAST_JOIN_STARTED, but was not very strict in how it was used. We didn't always initialize the completion struct before we set the flag, and we didn't always call complete on the completion struct from all paths that complete it. And when we did complete it, sometimes we continued to touch the mcast entry after the completion, opening us up to possible use after free issues. This made it less than totally effective, and certainly made its use confusing. And in the flush function we would use the presence of this flag to signal that we should wait on the completion struct, but we never cleared this flag, ever. In order to make things clearer and aid in resolving the rtnl deadlock bug I've been chasing, I cleaned this up a bit. 1) Remove the MCAST_JOIN_STARTED flag entirely 2) Change MCAST_FLAG_BUSY so it now only means a join is in-flight 3) Test mcast->mc directly to see if we have completed ib_sa_join_multicast (using IS_ERR_OR_NULL) 4) Make sure that before setting MCAST_FLAG_BUSY we always initialize the mcast->done completion struct 5) Make sure that before calling complete(&mcast->done), we always clear the MCAST_FLAG_BUSY bit 6) Take the mcast_mutex before we call ib_sa_multicast_join and also take the mutex in our join callback. This forces ib_sa_multicast_join to return and set mcast->mc before we process the callback. This way, our callback can safely clear mcast->mc if there is an error on the join and we will do the right thing as a result in mcast_dev_flush. 7) Because we need the mutex to synchronize mcast->mc, we can no longer call mcast_sendonly_join directly from mcast_send and instead must add sendonly join processing to the mcast_join_task 8) Make MCAST_RUN mean that we have a working mcast subsystem, not that we have a running task. We know when we need to reschedule our join task thread and don't need a flag to tell us. 9) Add a helper for rescheduling the join task thread A number of different races are resolved with these changes. These races existed with the old MCAST_FLAG_BUSY usage, the MCAST_JOIN_STARTED flag was an attempt to address them, and while it helped, a determined effort could still trip things up. One race looks something like this: Thread 1 Thread 2 ib_sa_join_multicast (as part of running restart mcast task) alloc member call callback ifconfig ib0 down wait_for_completion callback call completes wait_for_completion in mcast_dev_flush completes mcast->mc is PTR_ERR_OR_NULL so we skip ib_sa_leave_multicast return from callback return from ib_sa_join_multicast set mcast->mc = return from ib_sa_multicast We now have a permanently unbalanced join/leave issue that trips up the refcounting in core/multicast.c Another like this: Thread 1 Thread 2 Thread 3 ib_sa_multicast_join ifconfig ib0 down priv->broadcast = NULL join_complete wait_for_completion mcast->mc is not yet set, so don't clear return from ib_sa_join_multicast and set mcast->mc complete return -EAGAIN (making mcast->mc invalid) call ib_sa_multicast_leave on invalid mcast->mc, hang forever By holding the mutex around ib_sa_multicast_join and taking the mutex early in the callback, we force mcast->mc to be valid at the time we run the callback. This allows us to clear mcast->mc if there is an error and the join is going to fail. We do this before we complete the mcast. In this way, mcast_dev_flush always sees consistent state in regards to mcast->mc membership at the time that the wait_for_completion() returns. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:18 -04:00
Doug Ledford	efc82eeeae	IB/ipoib: No longer use flush as a parameter Various places in the IPoIB code had a deadlock related to flushing the ipoib workqueue. Now that we have per device workqueues and a specific flush workqueue, there is no longer a deadlock issue with flushing the device specific workqueues and we can do so unilaterally. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:18 -04:00
Doug Ledford	0b39578bcd	IB/ipoib: Use dedicated workqueues per interface During my recent work on the rtnl lock deadlock in the IPoIB driver, I saw that even once I fixed the apparent races for a single device, as soon as that device had any children, new races popped up. It turns out that this is because no matter how well we protect against races on a single device, the fact that all devices use the same workqueue, and flush_workqueue() flushes everything from that workqueue means that we would also have to prevent all races between different devices (for instance, ipoib_mcast_restart_task on interface ib0 can race with ipoib_mcast_flush_dev on interface ib0.8002, resulting in a deadlock on the rtnl_lock). There are several possible solutions to this problem: Make carrier_on_task and mcast_restart_task try to take the rtnl for some set period of time and if they fail, then bail. This runs the real risk of dropping work on the floor, which can end up being its own separate kind of deadlock. Set some global flag in the driver that says some device is in the middle of going down, letting all tasks know to bail. Again, this can drop work on the floor. Or the method this patch attempts to use, which is when we bring an interface up, create a workqueue specifically for that interface, so that when we take it back down, we are flushing only those tasks associated with our interface. In addition, keep the global workqueue, but now limit it to only flush tasks. In this way, the flush tasks can always flush the device specific work queues without having deadlock issues. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:18 -04:00
Doug Ledford	894021a752	IB/ipoib: Make the carrier_on_task race aware We blindly assume that we can just take the rtnl lock and that will prevent races with downing this interface. Unfortunately, that's not the case. In ipoib_mcast_stop_thread() we will call flush_workqueue() in an attempt to clear out all remaining instances of ipoib_join_task. But, since this task is put on the same workqueue as the join task, the flush_workqueue waits on this thread too. But this thread is deadlocked on the rtnl lock. The better thing here is to use trylock and loop on that until we either get the lock or we see that FLAG_OPER_UP has been cleared, in which case we don't need to do anything anyway and we just return. While investigating which flag should be used, FLAG_ADMIN_UP or FLAG_OPER_UP, it was determined that FLAG_OPER_UP was the more appropriate flag to use. However, there was a mix of these two flags in use in the existing code. So while we check for that flag here as part of this race fix, also cleanup the two places that had used the less appropriate flag for their tests. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:17 -04:00
Doug Ledford	c84ca6d2b1	IB/ipoib: Consolidate rtnl_lock tasks in workqueue The ipoib_mcast_flush_dev routine is called with the rtnl_lock held and needs to keep it held. It also needs to call flush_workqueue() to flush out any outstanding work. In the past, we've had to try and make sure that we didn't flush out any outstanding join completions because they also wanted to grab rtnl_lock() and that would deadlock. It turns out that the only thing in the join completion handler that needs this lock can be safely moved to our carrier_on_task, thereby reducing the potential for the join completion code and the flush code to deadlock against each other. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:17 -04:00
Doug Ledford	be7aa663fc	IB/ipoib: change init sequence ordering In preparation for using per device work queues, we need to move the start of the neighbor thread task to after ipoib_ib_dev_init and move the destruction of the neighbor task to before ipoib_ib_dev_cleanup. Otherwise we will end up freeing our workqueue with work possibly still on it. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:17 -04:00
Doug Ledford	e135106fac	IB/ipoib: factor out ah flushing Create a an ipoib_flush_ah and ipoib_stop_ah routines to use at appropriate times to flush out all remaining ah entries before we shut the device down. Because neighbors and mcast entries can each have a reference on any given ah, we must make sure to free all of those first before our ah will actually have a 0 refcount and be able to be reaped. This factoring is needed in preparation for having per-device work queues. The original per-device workqueue code resulted in the following error message: <ibdev>: ib_dealloc_pd failed That error was tracked down to this issue. With the changes to which workqueues were flushed when, there were no flushes of the per device workqueue after the last ah's were freed, resulting in an attempt to dealloc the pd with outstanding resources still allocated. This code puts the explicit flushes in the needed places to avoid that problem. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:17 -04:00
Al Viro	237dae8890	Merge branch 'iocb' into for-davem trivial conflict in net/socket.c and non-trivial one in crypto - that one had evaded aio_complete() removal. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-09 00:01:38 -04:00
David S. Miller	c85d6975ef	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/mellanox/mlx4/cmd.c net/core/fib_rules.c net/ipv4/fib_frontend.c The fib_rules.c and fib_frontend.c conflicts were locking adjustments in 'net' overlapping addition and removal of code in 'net-next'. The mlx4 conflict was a bug fix in 'net' happening in the same place a constant was being replaced with a more suitable macro. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-06 22:34:15 -04:00
Saeed Mahameed	64613d9499	net/mlx5_core: Extend struct mlx5_interface to support multiple protocols Preparation for ethernet driver. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-02 16:33:43 -04:00
Saeed Mahameed	ce0f750932	net/mlx5_core: Modify arm CQ in preparation for upcoming Ethernet driver Pass consumer index as a parameter to arm CQ Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-02 16:33:43 -04:00
Saeed Mahameed	233d05d28a	net/mlx5_core: Move completion eqs from mlx5_ib to mlx5_core Preparation for ethernet driver. These functions will be used in drivers other than mlx5_ib. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-02 16:33:42 -04:00
Saeed Mahameed	6cf0a15f07	IB/mlx5: Fix Mellanox copyright note Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-02 16:33:42 -04:00
Saeed Mahameed	b812b5441e	net/mlx5_core: Clear doorbell record inside mlx5_db_alloc() Do it in one place instead of every where the function is invoked Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-02 16:33:41 -04:00
Eli Cohen	7bef7ad24b	net/mlx5_core: Coding style fix Put a line of space before return and next statement. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-02 16:33:40 -04:00
Haggai Abramonvsky	c3c6c9c810	net/mlx5_core: Fix call to mlx5_core_qp_modify Pass 0 in the sqd_event parameter. Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-02 16:33:40 -04:00
Ido Shamay	a130b59057	net/mlx4: Add SET_PORT opcode modifiers enumeration The calls to SET_PORT used hard-code numbers, when supplying command's opcode modifiers, fix that to use well defined constants. Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-02 16:25:03 -04:00
Nicolas Dichtel	5aa7add8f1	infiniband/ipoib: implement ndo_get_iflink Don't use dev->iflink anymore. CC: Roland Dreier <roland@kernel.org> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-02 14:05:01 -04:00
Shachar Raindel	8494057ab5	IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic Properly verify that the resulting page aligned end address is larger than both the start address and the length of the memory area requested. Both the start and length arguments for ib_umem_get are controlled by the user. A misbehaving user can provide values which will cause an integer overflow when calculating the page aligned end address. This overflow can cause also miscalculation of the number of pages mapped, and additional logic issues. Addresses: CVE-2014-8159 Cc: <stable@vger.kernel.org> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-04-02 09:53:59 -07:00
Christoph Hellwig	e2e40f2c1e	fs: move struct kiocb to fs.h struct kiocb now is a generic I/O container, so move it to fs.h. Also do a #include diet for aio.h while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-03-25 20:28:11 -04:00
Majd Dibbiny	61a3855bb7	IB/mlx4: Saturate RoCE port PMA counters in case of overflow For RoCE ports, we set the u32 PMA values based on u64 HCA counters. In case of overflow, according to the IB spec, we have to saturate a counter to its max value, do that. Fixes: `c37791349c` ('IB/mlx4: Support PMA counters for IBoE') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-18 15:17:11 -04:00
Moni Shoua	217e8b16a4	IB/mlx4: Verify net device validity on port change event Processing an event is done in a different context from the one when the event was dispatched. This requires a check that the slave net device is still valid when the event is being processed. The check is done under the iboe lock which ensure correctness. Fixes: `a575009030` ('IB/mlx4: Add port aggregation support') Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-18 15:17:11 -04:00
Linus Torvalds	be5e6616dd	Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted stuff from this cycle. The big ones here are multilayer overlayfs from Miklos and beginning of sorting ->d_inode accesses out from David" * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (51 commits) autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation procfs: fix race between symlink removals and traversals debugfs: leave freeing a symlink body until inode eviction Documentation/filesystems/Locking: ->get_sb() is long gone trylock_super(): replacement for grab_super_passive() fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) SELinux: Use d_is_positive() rather than testing dentry->d_inode Smack: Use d_is_positive() rather than testing dentry->d_inode TOMOYO: Use d_is_dir() rather than d_inode and S_ISDIR() Apparmor: Use d_is_positive/negative() rather than testing dentry->d_inode Apparmor: mediated_filesystem() should use dentry->d_sb not inode->i_sb VFS: Split DCACHE_FILE_TYPE into regular and special types VFS: Add a fallthrough flag for marking virtual dentries VFS: Add a whiteout dentry type VFS: Introduce inode-getting helpers for layered/unioned fs environments Infiniband: Fix potential NULL d_inode dereference posix_acl: fix reference leaks in posix_acl_create autofs4: Wrong format for printing dentry ...	2015-02-22 17:42:14 -08:00
Linus Torvalds	e20d3ef540	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "The highlights this round include: - Update vhost-scsi to support F_ANY_LAYOUT using mm/iov_iter.c logic, and signal VERSION_1 support (MST + Viro + nab) - Fix iscsi/iser-target to remove problematic active_ts_set usage (Gavin Guo) - Update iscsi/iser-target to support multi-sequence sendtargets (Sagi) - Fix original PR_APTPL_BUF_LEN 8k size limitation (Martin Svec) - Add missing WRITE_SAME end-of-device sanity check (Bart) - Check for LBA + sectors wrap-around in sbc_parse_cdb() (nab) - Other various minor SPC/SBC compliance fixes based upon Ronnie Sahlberg test suite (nab)" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (32 commits) target: Set LBPWS10 bit in Logical Block Provisioning EVPD target: Fail UNMAP when emulate_tpu=0 target: Fail WRITE_SAME w/ UNMAP=1 when emulate_tpws=0 target: Add sanity checks for DPO/FUA bit usage target: Perform PROTECT sanity checks for WRITE_SAME target: Fail I/O with PROTECT bit when protection is unsupported target: Check for LBA + sectors wrap-around in sbc_parse_cdb target: Add missing WRITE_SAME end-of-device sanity check iscsi-target: Avoid IN_LOGOUT failure case for iser-target target: Fix PR_APTPL_BUF_LEN buffer size limitation iscsi-target: Drop problematic active_ts_list usage iscsi/iser-target: Support multi-sequence sendtargets text response iser-target: Remove duplicate function names vhost/scsi: potential memory corruption vhost/scsi: Global tcm_vhost -> vhost_scsi rename vhost/scsi: Drop left-over scsi_tcq.h include vhost/scsi: Set VIRTIO_F_ANY_LAYOUT + VIRTIO_F_VERSION_1 feature bits vhost/scsi: Add ANY_LAYOUT support in vhost_scsi_handle_vq vhost/scsi: Add ANY_LAYOUT iov -> sgl mapping prerequisites vhost/scsi: Change vhost_scsi_map_to_sgl to accept iov ptr + len ...	2015-02-21 13:21:19 -08:00
Linus Torvalds	b5ccb078c8	InfiniBand/RDMA changes for 3.20 merge window: - Re-enable on-demand paging changes with stable ABI - Fairly large set of ocrdma HW driver fixes - Some qib HW driver fixes - Other miscellaneous changes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCAAGBQJU52nHAAoJEENa44ZhAt0hkycP/0vwYNl0JJadasSUrLm2AWje iAePU+K7uiyxSuIU+eLnyyDV/iQ2sCjXfhQNE6FnFhH3JjwBYbhS2Q8WcjwfRWYX iFhORQltb3spSeLdT7N1QfkMF4MtAw3ENPE3QKIgNaIQSso+J256BHrSiqr9Akwe hwIVr0TLDO59ggPs3c083uUhC+5AMViMVgRR+N9+/59WHz2vG2WsTunpg1sYOAgC KpUhfZW4za5xYJ0c8BOnPiSAfJasD1UdDg3oX0RPD4j/diiWAO6EOR+jkOLtODpd 8uhD8HM7ZvCKE1io5QnVdjTR/n51NaVGHk+yfWJRSvV1sw1AALtU4NVniI+E5mFs Fwe+zUzbQ3j08TtrU0VjaeteBh2oGC0pJlkJS9+HXeDmH30/LQCMCiA5mBSSEPDg 0cPEAJgGAZqOIyyZe8jSslW/iN0cE6FDDb8+/1AZ80IfbdMxXh9FVwi7cdXn8+OQ nXmtnSa7yzKWS9VVXDrvSzI0Y2oDv8DSDpCWGCm7bUcDXd/T5LpPR3RdSorGkYAr O2zmuJZlCdkuKgW91O7cztNj2hTlDnJ+e+5P05KwPOb86Muum6tphLJd5b1h970X Actx/6EX/ycSQ3ukiKW7Ksn2bYD3RLZvdbPYQM5xUjlGGWPF6nvmbZ8MqnyaLFfE WKvx39DMGq3ktofiMkbf =JLsF -----END PGP SIGNATURE----- Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull InfiniBand/RDMA updates from Roland Dreier: - Re-enable on-demand paging changes with stable ABI - Fairly large set of ocrdma HW driver fixes - Some qib HW driver fixes - Other miscellaneous changes * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (43 commits) IB/qib: Add blank line after declaration IB/qib: Fix checkpatch warnings IB/mlx5: Enable the ODP capability query verb IB/core: Add on demand paging caps to ib_uverbs_ex_query_device IB/core: Add support for extended query device caps RDMA/cxgb4: Don't hang threads forever waiting on WR replies RDMA/ocrdma: Fix off by one in ocrdma_query_gid() RDMA/ocrdma: Use unsigned for bit index RDMA/ocrdma: Help gcc generate better code for ocrdma_srq_toggle_bit RDMA/ocrdma: Update the ocrdma module version string RDMA/ocrdma: set vlan present bit for user AH RDMA/ocrdma: remove reference of ocrdma_dev out of ocrdma_qp structure RDMA/ocrdma: Add support for interrupt moderation RDMA/ocrdma: Honor return value of ocrdma_resolve_dmac RDMA/ocrdma: Allow expansion of the SQ CQEs via buddy CQ expansion of the QP RDMA/ocrdma: Discontinue support of RDMA-READ-WITH-INVALIDATE RDMA/ocrdma: Host crash on destroying device resources RDMA/ocrdma: Report correct state in ibv_query_qp RDMA/ocrdma: Debugfs enhancments for ocrdma driver RDMA/ocrdma: Report correct count of interrupt vectors while registering ocrdma device ...	2015-02-21 12:53:21 -08:00
Roland Dreier	147d1da951	Merge branches 'core', 'cxgb4', 'iser', 'mlx4', 'mlx5', 'ocrdma', 'odp', 'qib' and 'srp' into for-next	2015-02-20 09:04:40 -08:00
Mike Marciniszyn	da12c1f685	IB/qib: Add blank line after declaration Upstream checkpatch now requires this. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-20 09:04:12 -08:00
Mike Marciniszyn	a46a2802f7	IB/qib: Fix checkpatch warnings Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-20 09:04:09 -08:00
David Howells	a95104fd33	Infiniband: Fix potential NULL d_inode dereference Code that does this: if (!(d_unhashed(tmp) && tmp->d_inode)) { ... simple_unlink(parent->d_inode, tmp); } is broken because: !(d_unhashed(tmp) && tmp->d_inode) is equivalent to: !d_unhashed(tmp) \|\| !tmp->d_inode so it is possible to get into simple_unlink() with tmp->d_inode == NULL. simple_unlink(), however, assumes tmp->d_inode cannot be NULL. I think that what was meant is this: !d_unhashed(tmp) && tmp->d_inode and that the logical-not operator or the final close-bracket was misplaced. Signed-off-by: David Howells <dhowells@redhat.com> cc: Bryan O'Sullivan <bos@pathscale.com> cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-02-20 04:56:45 -05:00
Haggai Eran	1707cb4ab7	IB/mlx5: Enable the ODP capability query verb Re-enable the on-demand paging capability query through the extended query device verb. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-18 08:36:26 -08:00
Haggai Eran	f4056bfd8c	IB/core: Add on demand paging caps to ib_uverbs_ex_query_device Add on-demand paging capabilities reporting to the extended query device verb. Yann Droneaud writes: Note: as offsetof() is used to retrieve the size of the lower chunk of the response, beware that it only works if the upper chunk is right after, without any implicit padding. And, as the size of the latter chunk is added to the base size, implicit padding at the end of the structure is not taken in account. Both point must be taken in account when extending the uverbs functionalities. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-18 08:36:26 -08:00
Eli Cohen	02d1aa7af1	IB/core: Add support for extended query device caps Add extensible query device capabilities verb to allow adding new features. ib_uverbs_ex_query_device is added and copy_query_dev_fields is used to copy capability fields to be used by both ib_uverbs_query_device and ib_uverbs_ex_query_device. Following the discussion about this patch [1], the code now validates the command's comp_mask is zero, returning -EINVAL for unknown values, in order to allow extending the verb in the future. The verb also checks the user-space provided response buffer size and only fills in capabilities that will fit in the buffer. In attempt to follow the spirit of presentation [2] by Tzahi Oved that was presented during OpenFabrics Alliance International Developer Workshop 2013, the comp_mask bits will only describe which fields are valid. Furthermore, fields that can simply be cleared when they are not supported, do not require a comp_mask bit at all. The verb returns a response_length field containing the actual number of bytes written by the kernel, so that a newer version running on an older kernel can tell which fields were actually returned. [1] [PATCH v1 0/5] IB/core: extended query device caps cleanup for v3.19 http://thread.gmane.org/gmane.linux.kernel.api/7889/ [2] https://www.openfabrics.org/images/docs/2013_Dev_Workshop/Tues_0423/2013_Workshop_Tues_0830_Tzahi_Oved-verbs_extensions_ofa_2013-tzahio.pdf Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-18 08:36:26 -08:00
Hariprasad S	1fc8190dd6	RDMA/cxgb4: Don't hang threads forever waiting on WR replies In c4iw_wait_for_reply(), if a FW6_MSG WR reply is not received after C4IW_WR_TO seconds, fail the WR operation and mark the device as fatally dead. Further, if the device is marked fatally dead, then fail the WR wait immediately. Also change the timeout to 60 seconds. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-18 08:33:15 -08:00
Dan Carpenter	59a39ca3f7	RDMA/ocrdma: Fix off by one in ocrdma_query_gid() The ->sgid_tbl[] array has OCRDMA_MAX_SGID number of elements so this test is off by one. ->sgid_tbl is allocated in ocrdma_alloc_resources(). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-18 08:31:06 -08:00
Rasmus Villemoes	f3070e7efd	RDMA/ocrdma: Use unsigned for bit index In the expressions idx/32 and idx%32, both idx and 32 have signed type, and unfortunately the C standard prescribes rounding to 0, so unless gcc can prove that idx is non-negative, these cannot be implemented as simple shift respectively mask operations. Help gcc by changing the type of idx to unsigned - this cuts another few instructions from the generated code. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-18 08:31:06 -08:00
Rasmus Villemoes	ba64fdca63	RDMA/ocrdma: Help gcc generate better code for ocrdma_srq_toggle_bit gcc emits a surprising amount of code in order to flip a bit. One would think that a single instruction is enough. $ scripts/bloat-o-meter /tmp/ocrdma_verbs.o drivers/infiniband/hw/ocrdma/ocrdma_verbs.o add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-142 (-142) function old new delta ocrdma_post_srq_recv 498 460 -38 ocrdma_poll_cq 2010 1962 -48 ocrdma_discard_cqes 495 439 -56 All three calls of ocrdma_srq_toggle_bit happen within spinlocks, so saving a few useless instructions might be worthwhile. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-18 08:31:05 -08:00
Mitesh Ahuja	0ba5dc5cba	RDMA/ocrdma: Update the ocrdma module version string Signed-off-by: Mitesh Ahuja <mitesh.ahuja@emulex.com> Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-18 08:31:05 -08:00
Devesh Sharma	29565f2f09	RDMA/ocrdma: set vlan present bit for user AH For the AH that describs a VLAN interface details, vlan present bit needs to be set during posting a WQE. This patch adds the code to allow it happening. Signed-off-by: Mitesh Ahuja <mitesh.ahuja@emulex.com> Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-18 08:31:05 -08:00
Mitesh Ahuja	d2b8f7b1f8	RDMA/ocrdma: remove reference of ocrdma_dev out of ocrdma_qp structure Use get_ocrdma_dev(ocrdma_qp->ibqp.device) function to access ocrdma device pointer. Signed-off-by: Mitesh Ahuja <mitesh.ahuja@emulex.com> Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-18 08:31:04 -08:00
Mitesh Ahuja	b4dbe8d52d	RDMA/ocrdma: Add support for interrupt moderation Add support for interrupt moderation for ocrdma device. Thresholds for high interrupt rates are static values derived based on experimental results. Signed-off-by: Mitesh Ahuja <mitesh.ahuja@emulex.com> Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-18 08:31:04 -08:00
Devesh Sharma	a601dc77f8	RDMA/ocrdma: Honor return value of ocrdma_resolve_dmac Check for return value for ocrdma_resolve_dmac while setting AV params. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Mitesh Ahuja <mitesh.ahuja@emulex.com> Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-18 08:31:04 -08:00
Selvin Xavier	043e9deed9	RDMA/ocrdma: Allow expansion of the SQ CQEs via buddy CQ expansion of the QP If the SQ and RQ of the QP in error state uses separate CQs, traverse the list of QPs using each CQs and invoke the buddy CQ handler for both SQ and RQ. Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Mitesh Ahuja <mitesh.ahuja@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-18 08:31:03 -08:00
Devesh Sharma	380676ea2b	RDMA/ocrdma: Discontinue support of RDMA-READ-WITH-INVALIDATE Remove support for RDMA-READ-WITH-INVALIDATE from ocrdma driver. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Mitesh Ahuja <mitesh.ahuja@emulex.com> Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2015-02-18 08:31:03 -08:00

1 2 3 4 5 ...

4374 Commits