linux

Author	SHA1	Message	Date
Moni Shoua	2efdd6a038	RDMA/cma: Don't allow IPoIB port space for IBoE This patch fixes a kernel crash in cma_set_qkey(). When the link layer is Ethernet, it is wrong to use IPoIB port space since no IPoIB interface is available. Specifically, setting the Q_Key when port space is RDMA_PS_IPOIB requires MGID calculation and an SA query, which doesn't make sense over Ethernet. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 16:50:25 -07:00
Jack Morgenstein	0c9361fcde	RDMA/cma: Avoid assigning an IS_ERR value to cm_id pointer in CMA id object Avoid assigning an IS_ERR value to the cm_id pointer. This fixes a few anomalies in the error flow due to confusion about checking for NULL vs IS_ERR, and eliminates the need to test for the IS_ERR value every time we wish to determine if the cma_id object has a cm device associated with it. Also, eliminate the now-unnecessary procedure cma_has_cm_dev (we can check directly for the existence of the device pointer -- for a non-NULL check, makes no difference if it is the iwarp or the ib pointer). Finally, make a few code changes here to improve coding consistency. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-07-18 09:44:55 -07:00
Nir Muchtar	83e9502d8d	RDMA/cma: Save PID of ID's owner Save the PID associated with an RDMA CM ID for reporting via netlink. Signed-off-by: Nir Muchtar <nirm@voltaire.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-25 13:46:23 -07:00
Nir Muchtar	753f618ae0	RDMA/cma: Add support for netlink statistics export Add callbacks and data types for statistics export of all current devices/ids. The schema for RDMA CM is a series of netlink messages. Each one contains an rdma_cm_stat struct. Additionally, two netlink attributes are created for the addresses for each message (if applicable). Their types used are: RDMA_NL_RDMA_CM_ATTR_SRC_ADDR (The source address for this ID) RDMA_NL_RDMA_CM_ATTR_DST_ADDR (The destination address for this ID) sockaddr_* structs are encapsulated within these attributes. In other words, every transaction contains a series of messages like: -------message 1------- struct rdma_cm_id_stats { __u32 qp_num; __u32 bound_dev_if; __u32 port_space; __s32 pid; __u8 cm_state; __u8 node_type; __u8 port_num; __u8 reserved; } RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute - contains the source address RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute - contains the destination address -------end 1------- -------message 2------- struct rdma_cm_id_stats RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute -------end 2------- Signed-off-by: Nir Muchtar <nirm@voltaire.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-25 13:46:23 -07:00
Sean Hefty	b26f9b9949	RDMA/cma: Pass QP type into rdma_create_id() The RDMA CM currently infers the QP type from the port space selected by the user. In the future (eg with RDMA_PS_IB or XRC), there may not be a 1-1 correspondence between port space and QP type. For netlink export of RDMA CM state, we want to export the QP type to userspace, so it is cleaner to explicitly associate a QP type to an ID. Modify rdma_create_id() to allow the user to specify the QP type, and use it to make our selections of datagram versus connected mode. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-25 13:46:23 -07:00
Nir Muchtar	550e5ca77e	RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h> Move cma.c's internal definition of enum cma_state to enum rdma_cm_state in an exported header so that it can be exported via RDMA netlink. Signed-off-by: Nir Muchtar <nirm@voltaire.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-25 13:46:22 -07:00
Hefty, Sean	a9bb79128a	RDMA/cma: Add an ID_REUSEADDR option Lustre requires that clients bind to a privileged port number before connecting to a remote server. On larger clusters (typically more than about 1000 nodes), the number of privileged ports is exhausted, resulting in lustre being unusable. To handle this, we add support for reusable addresses to the rdma_cm. This mimics the behavior of the socket option SO_REUSEADDR. A user may set an rdma_cm_id to reuse an address before calling rdma_bind_addr() (explicitly or implicitly). If set, other rdma_cm_id's may be bound to the same address, provided that they all have reuse enabled, and there are no active listens. If rdma_listen() is called on an rdma_cm_id that has reuse enabled, it will only succeed if there are no other id's bound to that same address. The reuse option is exported to user space. The behavior of the kernel reuse implementation was verified against that given by sockets. This patch is derived from a path by Ira Weiny <weiny2@llnl.gov> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-09 22:06:10 -07:00
Hefty, Sean	43b752daae	RDMA/cma: Fix handling of IPv6 addressing in cma_use_port cma_use_port() assumes that the sockaddr is an IPv4 address. Since IPv6 addressing is supported (and also to support other address families) make the code more generic in its address handling. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-05-09 22:06:10 -07:00
Sean Hefty	a396d43a35	RDMA/cma: Replace global lock in rdma_destroy_id() with id-specific one rdma_destroy_id currently uses the global rdma cm 'lock' to test if an rdma_cm_id has been bound to a device. This prevents an active address resolution callback handler from assigning a device to the rdma_cm_id after rdma_destroy_id checks for one. Instead, we can replace the use of the global lock around the check to the rdma_cm_id device pointer by setting the id state to destroying, then flushing all active callbacks. The latter is accomplished by acquiring and releasing the handler_mutex. Any active handler will complete first, and any newly scheduled handlers will find the rdma_cm_id in an invalid state. In addition to optimizing the current locking scheme, the use of the rdma_cm_id mutex is a more intuitive synchronization mechanism than that of the global lock. These changes are based on feedback from Doug Ledford <dledford@redhat.com> while he was trying to debug a crash in the rdma cm destroy path. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-15 10:57:34 -07:00
Sean Hefty	25ae21a101	RDMA/cma: Fix crash in request handlers Doug Ledford and Red Hat reported a crash when running the rdma_cm on a real-time OS. The crash has the following call trace: cm_process_work cma_req_handler cma_disable_callback rdma_create_id kzalloc init_completion cma_get_net_info cma_save_net_info cma_any_addr cma_zero_addr rdma_translate_ip rdma_copy_addr cma_acquire_dev rdma_addr_get_sgid ib_find_cached_gid cma_attach_to_dev ucma_event_handler kzalloc ib_copy_ah_attr_to_user cma_comp [ preempted ] cma_write copy_from_user ucma_destroy_id copy_from_user _ucma_find_context ucma_put_ctx ucma_free_ctx rdma_destroy_id cma_exch cma_cancel_operation rdma_node_get_transport rt_mutex_slowunlock bad_area_nosemaphore oops_enter They were able to reproduce the crash multiple times with the following details: Crash seems to always happen on the: mutex_unlock(&conn_id->handler_mutex); as conn_id looks to have been freed during this code path. An examination of the code shows that a race exists in the request handlers. When a new connection request is received, the rdma_cm allocates a new connection identifier. This identifier has a single reference count on it. If a user calls rdma_destroy_id() from another thread after receiving a callback, rdma_destroy_id will proceed to destroy the id and free the associated memory. However, the request handlers may still be in the process of running. When control returns to the request handlers, they can attempt to access the newly created identifiers. Fix this by holding a reference on the newly created rdma_cm_id until the request handler is through accessing it. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Acked-by: Doug Ledford <dledford@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2011-03-15 10:00:28 -07:00
Eli Cohen	af7bd46376	IB/core: Add VLAN support for IBoE Add 802.1q VLAN support to IBoE. The VLAN tag is encoded within the GID derived from a link local address in the following way: GID[11] GID[12] contain the VLAN ID when the GID contains a VLAN. The 3 bits user priority field of the packets are identical to the 3 bits of the SL. In case of rdma_cm apps, the TOS field is used to generate the SL field by doing a shift right of 5 bits effectively taking to 3 MS bits of the TOS field. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-25 10:20:39 -07:00
Eli Cohen	3c86aa70bf	RDMA/cm: Add RDMA CM support for IBoE devices Add support for IBoE device binding and IP --> GID resolution. Path resolving and multicast joining are implemented within cma.c by filling in the responses and running callbacks in the CMA work queue. IP --> GID resolution always yields IPv6 link local addresses; remote GIDs are derived from the destination MAC address of the remote port. Multicast GIDs are always mapped to multicast MACs as is done in IPv6. (IPv4 multicast is enabled by translating IPv4 multicast addresses to IPv6 multicast as described in <http://www.mail-archive.com/ipng@sunroof.eng.sun.com/msg02134.html>.) Some helper functions are added to ib_addr.h. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-10-13 15:46:43 -07:00
Roland Dreier	ffebedb7ab	Merge branches 'amso1100', 'bkl', 'cma', 'cxgb3', 'cxgb4', 'ipoib', 'iser', 'masked-atomics', 'misc', 'mthca' and 'nes' into for-next	2010-05-15 20:06:01 -07:00
Julia Lawall	9893e742a0	IB/core: Use kmemdup() instead of kmalloc()+memcpy() Use kmemdup when some other buffer is immediately copied into the allocated region. A simplified version of the semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression from,to,size,flag; statement S; @@ - to = \(kmalloc\\|kzalloc\)(size,flag); + to = kmemdup(from,size,flag); if (to==NULL \|\| ...) S - memcpy(to, from, size); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-05-15 20:05:07 -07:00
Tetsuo Handa	5d7220e8dc	RDMA/cma: Randomize local port allocation Randomize local port allocation in the way sctp_get_port_local() does. Update rover at the end of loop since we're likely to pick a valid port on the first try. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-04-21 16:18:40 -07:00
Linus Torvalds	0eddb519b9	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: IB/mlx4: Check correct variable for allocation failure RDMA/nes: Correct cap.max_inline_data assignment in nes_query_qp() RDMA/cm: Set num_paths when manually assigning path records IB/cm: Fix device_create() return value check	2010-04-09 11:53:06 -07:00
Sean Hefty	ae2d9293d7	RDMA/cm: Set num_paths when manually assigning path records When manually assigning the path records to use for a connection, save the number of paths that were set. Otherwise, checks against num_path will show 0, even though path record data is available. This was discovered by manually setting the path records from user space, then querying the kernel to see if the correct path records were assigned, only to discover that the kernel returned 0 path records to the query. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-04-07 14:13:22 -07:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Sean Hefty	8523c04809	RDMA/cm: Revert association of an RDMA device when binding to loopback Revert the following change from commit `6f8372b6` ("RDMA/cm: fix loopback address support") The defined behavior of rdma_bind_addr is to associate an RDMA device with an rdma_cm_id, as long as the user specified a non- zero address. (ie they weren't just trying to reserve a port) Currently, if the loopback address is passed to rdma_bind_addr, no device is associated with the rdma_cm_id. Fix this. It turns out that important apps such as Open MPI depend on rdma_bind_addr() NOT associating any RDMA device when binding to a loopback address. Open MPI is being updated to deal with this, but at least until a new Open MPI release is available, maintain the previous behavior: allow rdma_bind_addr() to succeed, but do not bind to a device. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-02-10 12:00:48 -08:00
Robert P. J. Day	fd4582a399	IB/addr: Correct CONFIG_IPv6 to CONFIG_IPV6 Correct misspelled "CONFIG_IPv6" that was introduced in commit `d14714df` ("IB/addr: Fix IPv6 routing lookup"). The config variable should be all uppercase. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> [ This was my fault when I munged the original patch. - Roland ] Signed-off-by: Roland Dreier <rolandd@cisco.com>	2010-01-06 13:16:30 -08:00
Sean Hefty	d14714df61	IB/addr: Fix IPv6 routing lookup Include link scope as part of address resolution. Combine local and remote address resolution into a single, simpler code path. Fix error checking in the IPv6 routing lookups. Based on work from: David Wilder <dwilder@us.ibm.com> Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> [ Fix up cma_check_linklocal() for !IPV6 case. - Roland ] Signed-off-by: Roland Dreier <rolandd@cisco.com>	2009-11-19 16:46:25 -08:00
Sean Hefty	6f8372b69c	RDMA/cm: fix loopback address support The RDMA CM is intended to support the use of a loopback address when establishing a connection; however, the behavior of the CM when loopback addresses are used is confusing and does not always work, depending on whether loopback was specified by the server, the client, or both. The defined behavior of rdma_bind_addr is to associate an RDMA device with an rdma_cm_id, as long as the user specified a non- zero address. (ie they weren't just trying to reserve a port) Currently, if the loopback address is passed to rdam_bind_addr, no device is associated with the rdma_cm_id. Fix this. If a loopback address is specified by the client as the destination address for a connection, it will fail to establish a connection. This is true even if the server is listing across all addresses or on the loopback address itself. The issue is that the server tries to translate the IP address carried in the REQ message to a local net_device address, which fails. The translation is not needed in this case, since the REQ carries the actual HW address that should be used. Finally, cleanup loopback support to be more transport neutral. Replace separate calls to get/set the sgid and dgid from the device address to a single call that behaves correctly depending on the format of the device address. And support both IPv4 and IPv6 address formats. Signed-off-by: Sean Hefty <sean.hefty@intel.com> [ Fixed RDS build by s/ib_addr_get/rdma_addr_get/ - Roland ] Signed-off-by: Roland Dreier <rolandd@cisco.com>	2009-11-19 13:26:06 -08:00
Sean Hefty	c4315d85f9	IB/addr: Store net_device type instead of translating to RDMA transport The struct rdma_dev_addr stores net_device address information: the source device address, destination hardware address, and broadcast address. For consistency, store the net_device type rather than converting it to the rdma_node_type. The type indicates the format of the various hardware addresses, which is what we're concerned with, and not the RDMA node type that the address may map to. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2009-11-19 12:57:18 -08:00
Sean Hefty	6266ed6e41	RDMA/cma: Replace net_device pointer with index Provide the device interface when resolving route information to ensure that the correct outbound device is used. This will also simplify processing of sin6_scope_id for IPv6 support. Based on work from: David Wilder <dwilder@us.ibm.com> Jason Gunthorpe <jgunthrope@obsidianresearch.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2009-11-19 12:55:22 -08:00
Jason Gunthorpe	e2e626972e	RDMA/cma: Fix AF_INET6 support in multicast joining If joining to an AF_INET6 address, we need to map the address to a MGID in the same way as the IP stack. The old code would just fall through to the IPv4 case and generate garbage. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2009-11-19 12:55:22 -08:00
Jason Gunthorpe	1c9b281997	RDMA/cma: Correct detection of SA Created MGID RDMA CM treats AF_INET6 addresses that are either 0 or prefixed with FF1x:A01B::/32 as MGIDs, but the detection for the prefix was buggy; fix it up. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2009-11-19 12:55:21 -08:00
Peter Huewe	716abb1fdf	RDMA: Add __init/__exit macros to addr.c and cma.c Add __init and __exit annotations to the module_init/module_exit functions from drivers/infiniband/core/addr.c and cma.c. Signed-off-by: Peter Huewe <peterhuewe@gmx.de> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2009-06-23 10:38:42 -07:00
Yossi Etigin	d2ca39f262	RDMA/cma: Create cm id even when IB port is down When doing rdma_resolve_addr(), if the relevant IB port is down, the function fails and the cm_id is not bound to the correct device. Therefore, application does not have a device handle and cannot wait for the port to become active. The function fails because the underlying IPoIB interface is not joined to the broadcast group and therefore the SA does not have a multicast record to take a Q_Key from. The fix is to use lazy Q_Key resolution - cma_set_qkey() will set id_priv->qkey if it was not set, and will be called just before the Q_Key is really required. Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2009-04-08 13:42:33 -07:00
Yossi Etigin	84adeee9aa	RDMA/cma: Use rate from IPoIB broadcast when joining IPoIB multicast groups When joining an IPoIB multicast group, use the same rate as in the broadcast group. Otherwise, if the RDMA CM creates this group before IPoIB does, it might get a different rate. This will cause IPoIB to fail joining to the same group later on, because IPoIB uses strict rate selection. Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2009-04-01 13:55:32 -07:00
Aleksey Senin	1f5175adea	RDMA/cma: Add IPv6 support Handle AF_INET6 cases where required, and use struct sockaddr_storage wherever an IPv6 address might be stored. Signed-off-by: Aleksey Senin <aleksey@alst60.(none)> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-12-24 10:16:45 -08:00
Roland Dreier	3f44675439	RDMA/cma: Remove padding arrays by using struct sockaddr_storage There are a few places where the RDMA CM code handles IPv6 by doing struct sockaddr addr; u8 pad[sizeof(struct sockaddr_in6) - sizeof(struct sockaddr)]; This is fragile and ugly; handle this in a better way with just struct sockaddr_storage addr; [ Also roll in patch from Aleksey Senin <alekseys@voltaire.com> to switch to struct sockaddr_storage and get rid of padding arrays in struct rdma_addr. ] Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-08-04 11:02:14 -07:00
Amir Vadai	38ca83a588	RDMA/cma: Add RDMA_CM_EVENT_TIMEWAIT_EXIT event Consumers that want to re-use their QPs in new connections need to know when the QP has exited the timewait state. Report the timewait event through the rdma_cm. Signed-off-by: Amir Vadai <amirv@mellanox.co.il> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-22 14:14:23 -07:00
Or Gerlitz	dd5bdff83b	RDMA/cma: Add RDMA_CM_EVENT_ADDR_CHANGE event Add an RDMA_CM_EVENT_ADDR_CHANGE event can be used by rdma-cm consumers that wish to have their RDMA sessions always use the same links (eg <hca/port>) as the IP stack does. In the current code, this does not happen when bonding is used and fail-over happened but the IB link used by an already existing session is operating fine. Use the netevent notification for sensing that a change has happened in the IP stack, then scan the rdma-cm ID list to see if there is an ID that is "misaligned" with respect to the IP stack, and deliver RDMA_CM_EVENT_ADDR_CHANGE for this ID. The consumer can act on the event or just ignore it. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-22 14:14:22 -07:00
Or Gerlitz	de910bd921	RDMA/cma: Simplify locking needed for serialization of callbacks The RDMA CM has some logic in place to make sure that callbacks on a given CM ID are delivered to the consumer in a serialized manner. Specifically it has code to protect against a device removal racing with a running callback function. This patch simplifies this logic by using a mutex per ID instead of a wait queue and atomic variable. This means that cma_disable_remove() now is more properly named to cma_disable_callback(), and cma_enable_remove() can now be removed because it just would become a trivial wrapper around mutex_unlock(). Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:53 -07:00
Or Gerlitz	64c5e613b9	RDMA/addr: Keep pointer to netdevice in struct rdma_dev_addr Keep a pointer to the local (src) netdevice in struct rdma_dev_addr, and copy it in as part of rdma_copy_addr(). Use rdma_translate_ip() in cma_new_conn_id() to reduce some code duplication and also make sure the src_dev member gets set. In a high-availability configuration the netdevice pointer can be used by the RDMA CM to align RDMA sessions to use the same links as the IP stack does under fail-over and route change cases. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:53 -07:00
Roland Dreier	468f2239bc	RDMA/cma: Add missing newlines to printk()s Signed-off-by: Roland Dreier <rolandd@cisco.com> Acked-by: Sean Hefty <sean.hefty@intel.com>	2008-07-14 23:48:47 -07:00
Sean Hefty	a947491709	RDMA: Fix license text The license text for several files references a third software license that was inadvertently copied in. Update the license to what was intended. This update was based on a request from HP. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:43 -07:00
Julia Lawall	10f32065a2	RDMA/iwcm: Test rdma_create_id() for IS_ERR rather than 0 The function rdma_create_id() always returns either a valid pointer or a value made with ERR_PTR, so its result should be tested with IS_ERR, not with a test for 0. The problem was found using the following semantic match. (http://www.emn.fr/x-info/coccinelle/) //<smpl> @a@ expression E, E1; statement S,S1; position p; @@ E = rdma_create_id(...) ... when != E = E1 if@p (E) S else S1 @n@ position a.p; expression E,E1; statement S,S1; @@ E = NULL ... when != E = E1 if@p (E) S else S1 @depends on !n@ expression E; statement S,S1; position a.p; @@ * if@p (E) S else S1 //</smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-04-16 21:09:25 -07:00
Al Viro	1b90c137cc	trivial endianness annotations: infiniband core Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-30 14:20:24 -07:00
Sean Hefty	ead595aeb0	RDMA/cma: Do not issue MRA if user rejects connection request There's an undesirable interaction with issuing MRA requests to increase connection timeouts and the listen backlog. When the rdma_cm receives a connection request, it queues an MRA with the ib_cm. (The ib_cm will send an MRA if it receives a duplicate REQ.) The rdma_cm will then create a new rdma_cm_id and give that to the user, which in this case is the rdma_user_cm. If the listen backlog maintained in the rdma_user_cm is full, it destroys the rdma_cm_id, which in turns destroys the ib_cm_id. The ib_cm_id generates a REJ because the state of the ib_cm_id has changed to MRA sent, versus REQ received. When the backlog is full, we just want to drop the REQ so that it is retried later. Fix this by deferring queuing the MRA until after the user of the rdma_cm has examined the connection request. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-02-14 15:30:41 -08:00
Denis V. Lunev	1ab352768f	[NETNS]: Add namespace parameter to ip_dev_find. in_dev_find() need a namespace to pass it to fib_get_table(), so add an argument. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:04 -08:00
Joe Perches	6360a02af1	[IPV4] drivers/infiniband: Use ipv4_is_<type> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:17 -08:00
Sean Hefty	5851bb893e	RDMA/cma: Override default responder_resources with user value By default, the responder_resources parameter is set to that received in a connection request. The passive side may override this value when accepting the connection. Use the value provided by the passive side when transitioning the QP to RTR state, rather than the value given in the connect request. Without this change, the RTR transition may fail if the passive side supports fewer responder_resources than that in the request. For code consistency and to protect against QP destruction, restructure overriding initiator_depth to match how responder_resources is set. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:15:41 -08:00
Rolf Manderscheid	a9e527e3f9	IPoIB: improve IPv4/IPv6 to IB mcast mapping functions An IPoIB subnet on an IB fabric that spans multiple IB subnets can't use link-local scope in multicast GIDs. The existing routines that map IP/IPv6 multicast addresses into IB link-level addresses hard-code the scope to link-local, and they also leave the partition key field uninitialised. This patch adds a parameter (the link-level broadcast address) to the mapping routines, allowing them to initialise both the scope and the P_Key appropriately, and fixes up the call sites. The next step will be to add a way to configure the scope for an IPoIB interface. Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:15:37 -08:00
Vladimir Sokolovsky	45d9478da1	RDMA/cma: Reenable device removal on passive side Enable conn_id remove on the passive side after connection establishment. This corrects an issue where the IB driver can't be unloaded after running applications over RDS. The 'dev_remove' counter does not reach 0 for established connections on the passive side. This problem is limited to device removal, and only occurs on the passive side if there are established connections. Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:15:31 -08:00
Steve Wise	8d8293cfb3	RDMA/iwcm: Set initiator depth and responder resources to device max values Set the initiator depth and responder resources to the device max values for new connect request events in the iWARP connection manager. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:15:25 -08:00
Linus Torvalds	0b776eb542	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: mlx4_core: Increase command timeout for INIT_HCA to 10 seconds IPoIB/cm: Use common CQ for CM send completions IB/uverbs: Fix checking of userspace object ownership IB/mlx4: Sanity check userspace send queue sizes IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))" IB/ehca: Enable large page MRs by default IB/ehca: Change meaning of hca_cap_mr_pgsize IB/ehca: Fix ehca_encode_hwpage_size() and alloc_fmr() IB/ehca: Fix masking error in {,re}reg_phys_mr() IB/ehca: Supply QP token for SRQ base QPs IPoIB: Use round_jiffies() for ah_reap_task RDMA/cma: Fix deadlock destroying listen requests RDMA/cma: Add locking around QP accesses IB/mthca: Avoid alignment traps when writing doorbells mlx4_core: Kill mlx4_write64_raw()	2007-10-23 09:56:11 -07:00
Anton Arapov	a25de534f8	[INET]: Justification for local port range robustness. There is a justifying patch for Stephen's patches. Stephen's patches disallows using a port range of one single port and brakes the meaning of the 'remaining' variable, in some places it has different meaning. My patch gives back the sense of 'remaining' variable. It should mean how many ports are remaining and nothing else. Also my patch allows using a single port. I sure we must be able to use mentioned port range, this does not restricted by documentation and does not brake current behavior. usefull links: Patches posted by Stephen Hemminger http://marc.info/?l=linux-netdev&m=119206106218187&w=2 http://marc.info/?l=linux-netdev&m=119206109918235&w=2 Andrew Morton's comment http://marc.info/?l=linux-kernel&m=119248225007737&w=2 1. Allows using a port range of one single port. 2. Gives back sense of 'remaining' variable. Signed-off-by: Anton Arapov <aarapov@redhat.com> Acked-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-18 22:00:17 -07:00
Sean Hefty	d02d1f5359	RDMA/cma: Fix deadlock destroying listen requests Deadlock condition reported by Kanoj Sarcar <kanoj@netxen.com>. The deadlock occurs when a connection request arrives at the same time that a wildcard listen is being destroyed. A wildcard listen maintains per device listen requests for each RDMA device in the system. The per device listens are automatically added and removed when RDMA devices are inserted or removed from the system. When a wildcard listen is destroyed, rdma_destroy_id() acquires the rdma_cm's device mutex ('lock') to protect against hot-plug events adding or removing per device listens. It then tries to destroy the per device listens by calling ib_destroy_cm_id() or iw_destroy_cm_id(). It does this while holding the device mutex. However, if the underlying iw/ib CM reports a connection request while this is occurring, the rdma_cm callback function will try to acquire the same device mutex. Since we're in a callback, the ib_destroy_cm_id() or iw_destroy_cm_id() calls will block until their callback thread returns, but the callback is blocked waiting for the device mutex. Fix this by re-working how per device listens are destroyed. Use rdma_destroy_id(), which avoids the deadlock, in place of cma_destroy_listen(). Additional synchronization is added to handle device hot-plug events and ensure that the id is not destroyed twice. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-10-16 12:25:49 -07:00
Sean Hefty	c5483388bb	RDMA/cma: Add locking around QP accesses If a user allocates a QP on an rdma_cm_id, the rdma_cm will automatically transition the QP through its states (RTR, RTS, error, etc.) While the QP state transitions are occurring, the QP itself must remain valid. Provide locking around the QP pointer to prevent its destruction while accessing the pointer. This fixes an issue reported by Olaf Kirch from Oracle that resulted in a system crash: "An incoming connection arrives and we decide to tear down the nascent connection. The remote ends decides to do the same. We start to shut down the connection, and call rdma_destroy_qp on our cm_id. ... Now apparently a 'connect reject' message comes in from the other host, and cma_ib_handler() is called with an event of IB_CM_REJ_RECEIVED. It calls cma_modify_qp_err, which for some odd reason tries to modify the exact same QP we just destroyed." Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-10-16 12:25:00 -07:00

1 2

93 Commits