Commit Graph

52 Commits

Author SHA1 Message Date
Matan Barak
dd5f03beb4 IB/core: Ethernet L2 attributes in verbs/cm structures
This patch add the support for Ethernet L2 attributes in the
verbs/cm/cma structures.

When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
in a similar manner that the IB L2 (and the L4 PKEY) attributes are used.

Thus, those attributes were added to the following structures:

* ib_ah_attr - added dmac
* ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
* ib_wc - added smac, vlan_id
* ib_sa_path_rec - added smac, dmac, vlan_id
* cm_av - added smac and vlan_id

For the path record structure, extra care was taken to avoid the new
fields when packing it into wire format, so we don't break the IB CM
and SA wire protocol.

On the active side, the CM fills. its internal structures from the
path provided by the ULP.  We add there taking the ETH L2 attributes
and placing them into the CM Address Handle (struct cm_av).

On the passive side, the CM fills its internal structures from the WC
associated with the REQ message.  We add there taking the ETH L2
attributes from the WC.

When the HW driver provides the required ETH L2 attributes in the WC,
they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core
code checks for the presence of these flags, and in their absence does
address resolution from the ib_init_ah_from_wc() helper function.

ib_modify_qp_is_ok is also updated to consider the link layer. Some
parameters are mandatory for Ethernet link layer, while they are
irrelevant for IB.  Vendor drivers are modified to support the new
function signature.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 14:20:54 -08:00
Eli Cohen
1c636f8016 IB/core: Encorce MR access rights rules on kernel consumers
Enforce the rule that when requesting remote write or atomic permissions, local
write must be indicated as well. See IB spec 11.2.8.2.

Spotted by: Hagay Abramovsky <hagaya@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-15 10:25:32 -08:00
Upinder Malhi \(umalhi\)
180771a370 IB/core: Add Cisco usNIC rdma node and transport types
This patch adds new rdma node and new rdma transport, and supporting
code used by Cisco's low latency driver called usNIC.  usNIC uses its
own transport, distinct from IB and iWARP.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-09 02:36:25 -08:00
Roland Dreier
82af24ac6f Merge branches 'cxgb4', 'flowsteer', 'ipoib', 'iser', 'mlx4', 'ocrdma' and 'qib' into for-next 2013-09-03 09:01:08 -07:00
Hadar Hen Zion
319a441d13 IB/core: Add receive flow steering support
The RDMA stack allows for applications to create IB_QPT_RAW_PACKET
QPs, which receive plain Ethernet packets, specifically packets that
don't carry any QPN to be matched by the receiving side.  Applications
using these QPs must be provided with a method to program some
steering rule with the HW so packets arriving at the local port can be
routed to them.

This patch adds ib_create_flow(), which allow providing a flow
specification for a QP.  When there's a match between the
specification and a received packet, the packet is forwarded to that
QP, in a the same way one uses ib_attach_multicast() for IB UD
multicast handling.

Flow specifications are provided as instances of struct ib_flow_spec_yyy,
which describe L2, L3 and L4 headers.  Currently specs for Ethernet, IPv4,
TCP and UDP are defined.  Flow specs are made of values and masks.

The input to ib_create_flow() is a struct ib_flow_attr, which contains
a few mandatory control elements and optional flow specs.

    struct ib_flow_attr {
            enum ib_flow_attr_type type;
            u16      size;
            u16      priority;
            u32      flags;
            u8       num_of_specs;
            u8       port;
            /* Following are the optional layers according to user request
             * struct ib_flow_spec_yyy
             * struct ib_flow_spec_zzz
             */
    };

As these specs are eventually coming from user space, they are defined and
used in a way which allows adding new spec types without kernel/user ABI
change, just with a little API enhancement which defines the newly added spec.

The flow spec structures are defined with TLV (Type-Length-Value)
entries, which allows calling ib_create_flow() with a list of variable
length of optional specs.

For the actual processing of ib_flow_attr the driver uses the number
of specs and the size mandatory fields along with the TLV nature of
the specs.

Steering rules processing order is according to the domain over which
the rule is set and the rule priority.  All rules set by user space
applicatations fall into the IB_FLOW_DOMAIN_USER domain, other domains
could be used by future IPoIB RFS and Ethetool flow-steering interface
implementation.  Lower numerical value for the priority field means
higher priority.

The returned value from ib_create_flow() is a struct ib_flow, which
contains a database pointer (handle) provided by the HW driver to be
used when calling ib_destroy_flow().

Applications that offload TCP/IP traffic can also be written over IB
UD QPs.  The ib_create_flow() / ib_destroy_flow() API is designed to
support UD QPs too.  A HW driver can set IB_DEVICE_MANAGED_FLOW_STEERING
to denote support for flow steering.

The ib_flow_attr enum type supports usage of flow steering for promiscuous
and sniffer purposes:

    IB_FLOW_ATTR_NORMAL - "regular" rule, steering according to rule specification

    IB_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule, receive
        all Ethernet traffic which isn't steered to any QP

    IB_FLOW_ATTR_MC_DEFAULT - same as IB_FLOW_ATTR_ALL_DEFAULT but only for multicast

    IB_FLOW_ATTR_SNIFFER - sniffer rule, receive all port traffic

ALL_DEFAULT and MC_DEFAULT rules options are valid only for Ethernet link type.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-28 09:51:52 -07:00
Yishai Hadas
73c40c616a IB/core: Add locking around event dispatching on XRC target QPs
Fix a potential race when event occurrs on a target XRC QP and in the
middle of reporting that on its shared qps, one of them is destroyed
by user space application.  Also add note for kernel consumers in
ib_verbs.h that they must not destroy the QP from within the handler.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-13 11:21:32 -07:00
Shlomo Pongratz
eec9e29fc5 IB/core: Verify that QP handler is valid before dispatching events
For QPs of type IB_QPT_XRC_TGT the IB core assigns a common event
handler __ib_shared_qp_event_handler(), and the optionally supplied
event handler is stored. When the common handler is called it iterates
on all opened QPs and calles the original handlers without checking if
they are NULL.  Fix that.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-04-16 22:42:54 -07:00
Shani Michaeli
7083e42ee2 IB/core: Add "type 2" memory windows support
This patch enhances the IB core support for Memory Windows (MWs).

MWs allow an application to have better/flexible control over remote
access to memory.

Two types of MWs are supported, with the second type having two flavors:

    Type 1  - associated with PD only
    Type 2A - associated with QPN only
    Type 2B - associated with PD and QPN

Applications can allocate a MW once, and then repeatedly bind the MW
to different ranges in MRs that are associated to the same PD. Type 1
windows are bound through a verb, while type 2 windows are bound by
posting a work request.

The 32-bit memory key is composed of a 24-bit index and an 8-bit
key. The key is changed with each bind, thus allowing more control
over the peer's use of the memory key.

The changes introduced are the following:

* add memory window type enum and a corresponding parameter to ib_alloc_mw.
* type 2 memory window bind work request support.
* create a struct that contains the common part of the bind verb struct
  ibv_mw_bind and the bind work request into a single struct.
* add the ib_inc_rkey helper function to advance the tag part of an rkey.

Consumer interface details:

* new device capability flags IB_DEVICE_MEM_WINDOW_TYPE_2A and
  IB_DEVICE_MEM_WINDOW_TYPE_2B are added to indicate device support
  for these features.

  Devices can set either IB_DEVICE_MEM_WINDOW_TYPE_2A or
  IB_DEVICE_MEM_WINDOW_TYPE_2B if it supports type 2A or type 2B
  memory windows. It can set neither to indicate it doesn't support
  type 2 windows at all.

* modify existing provides and consumers code to the new param of
  ib_alloc_mw and the ib_mw_bind_info structure

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-21 11:51:45 -08:00
Roland Dreier
cc169165c8 Merge branches 'core', 'cxgb4', 'ipath', 'iser', 'lockdep', 'mlx4', 'nes', 'ocrdma', 'qib' and 'raw-qp' into for-linus 2012-05-21 09:00:47 -07:00
Or Gerlitz
c938a616aa IB/core: Add raw packet QP type
IB_QPT_RAW_PACKET allows applications to build a complete packet,
including L2 headers, when sending; on the receive side, the HW will
not strip any headers.

This QP type is designed for userspace direct access to Ethernet; for
example by applications that do TCP/IP themselves.  Only processes
with the NET_RAW capability are allowed to create raw packet QPs (the
name "raw packet QP" is supposed to suggest an analogy to AF_PACKET /
SOL_RAW sockets).

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-08 11:18:09 -07:00
Or Gerlitz
c3bccbfbb7 IB/core: Use qp->usecnt to track multicast attach/detach
Just as we don't allow PDs, CQs, etc. to be destroyed if there are QPs
that are attached to them, don't let a QP be destroyed if there are
multicast group(s) attached to it.  Use the existing usecnt field of
struct ib_qp which was added by commit 0e0ec7e ("RDMA/core: Export
ib_open_qp() to share XRC TGT QPs") to track this.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-08 11:16:54 -07:00
Bernd Schubert
e47e321a35 RDMA/core: Fix kernel panic by always initializing qp->usecnt
We have just been investigating kernel panics related to
cq->ibcq.event_handler() completion calls.  The problem is that
ib_destroy_qp() fails with -EBUSY.

Further investigation revealed qp->usecnt is not initialized.  This
counter was introduced in linux-3.2 by commit 0e0ec7e063
("RDMA/core: Export ib_open_qp() to share XRC TGT QPs") but it only
gets initialized for IB_QPT_XRC_TGT, but it is checked in
ib_destroy_qp() for any QP type.

Fix this by initializing qp->usecnt for every QP we create.

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: Sven Breuner <sven.breuner@itwm.fraunhofer.de>

[ Initialize qp->usecnt in uverbs too.  - Sean ]

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-01-27 09:20:10 -08:00
Linus Torvalds
32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Roland Dreier
504255f8d0 Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'fdr', 'ipath', 'ipoib', 'misc', 'mlx4', 'misc', 'nes', 'qib' and 'xrc' into for-next 2011-11-01 09:37:08 -07:00
Paul Gortmaker
b108d9764c infiniband: add in export.h for files using EXPORT_SYMBOL/THIS_MODULE
These were getting it implicitly via device.h --> module.h but
we are going to stop that when we clean up the headers.

Fix these in advance so the tree remains biscect-clean.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:35 -04:00
Sean Hefty
0e0ec7e063 RDMA/core: Export ib_open_qp() to share XRC TGT QPs
XRC TGT QPs are shared resources among multiple processes.  Since the
creating process may exit, allow other processes which share the same
XRC domain to open an existing QP.  This allows us to transfer
ownership of an XRC TGT QP to another process.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:49:51 -07:00
Sean Hefty
53d0bd1e7f RDMA/uverbs: Export XRC domains to user space
Allow user space to create XRC domains.  Because XRCDs are expected to
be shared among multiple processes, we use inodes to identify an XRCD.

Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:21:24 -07:00
Sean Hefty
d3d72d909e RDMA/verbs: Cleanup XRC TGT QPs when destroying XRCD
XRC TGT QPs are intended to be shared among multiple users and
processes.  Allow the destruction of an XRC TGT QP to be done explicitly
through ib_destroy_qp() or when the XRCD is destroyed.

To support destroying an XRC TGT QP, we need to track TGT QPs with the
XRCD.  When the XRCD is destroyed, all tracked XRC TGT QPs are also
cleaned up.

To avoid stale reference issues, if a user is holding a reference on a
TGT QP, we increment a reference count on the QP.  The user releases the
reference by calling ib_release_qp.  This releases any access to the QP
from a user above verbs, but allows the QP to continue to exist until
destroyed by the XRCD.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:20:27 -07:00
Sean Hefty
b42b63cf0d RDMA/core: Add XRC QPs
XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).

XRC communication is between an initiator (INI) QP and a target (TGT)
QP.  Target QPs are associated with SRQs through an XRCD.  An XRC TGT QP
behaves like a receive-only RD QP.  XRC INI QPs behave similarly to RC
QPs, except that work requests posted to an XRC INI QP must specify the
remote SRQ that is the target of the work request.

We define two new QP types for XRC, to distinguish between INI and TGT
QPs, and update the core layer to support XRC QPs.

This patch is derived from work by Jack Morgenstein
<jackm@dev.mellanox.co.il>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:16:19 -07:00
Sean Hefty
418d51307d RDMA/core: Add XRC SRQ type
XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).

XRC defines SRQs that are specifically used by XRC connections.  Expand
the SRQ code to support XRC SRQs.  An XRC SRQ is currently restricted to
only XRC use according to the IB XRC Annex.

Portions of this patch were derived from work by
Jack Morgenstein <jackm@dev.mellanox.co.il>.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:14:31 -07:00
Sean Hefty
96104eda01 RDMA/core: Add SRQ type field
Currently, there is only a single ("basic") type of SRQ, but with XRC
support we will add a second.  Prepare for this by defining an SRQ type
and setting all current users to IB_SRQT_BASIC.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:13:26 -07:00
Sean Hefty
59991f94eb RDMA/core: Add XRC domain support
XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).

A few new concepts are introduced to support this.  This patch adds:

 - A new device capability flag, IB_DEVICE_XRC, which low-level
   drivers set to indicate that a device supports XRC.
 - A new object type, XRC domains (struct ib_xrcd), and new verbs
   ib_alloc_xrcd()/ib_dealloc_xrcd().  XRCDs are used to limit which
   XRC SRQs an incoming message can target.

This patch is derived from work by Jack Morgenstein <jackm@dev.mellanox.co.il>.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-12 10:32:26 -07:00
Marcel Apfelbaum
71eeba161d IB: Add new InfiniBand link speeds
Introduce support for the following extended speeds:

FDR-10: a Mellanox proprietary link speed which is 10.3125 Gbps with
        64b/66b encoding rather than 8b/10b encoding.
FDR:    IBA extended speed 14.0625 Gbps.
EDR:    IBA extended speed 25.78125 Gbps.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-11 11:53:47 -07:00
Eli Cohen
a3f5adaf49 IB/core: Add link layer property to ports
This patch allows ports to have different link layers:
IB_LINK_LAYER_INFINIBAND or IB_LINK_LAYER_ETHERNET.  This is required
for adding IBoE (InfiniBand-over-Ethernet, aka RoCE) support.  For
devices that do not provide an implementation for querying the link
layer property of a port, we return a default value based on the
transport: RMA_TRANSPORT_IB nodes will return IB_LINK_LAYER_INFINIBAND
and RDMA_TRANSPORT_IWARP nodes will return IB_LINK_LAYER_ETHERNET.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-27 17:51:10 -07:00
Aleksey Senin
a2ebf07ae5 IB: Rename RAW_ETY to RAW_ETHERTYPE
Change abbreviated IB_QPT_RAW_ETY to IB_QPT_RAW_ETHERTYPE to make
the special QP type easier to understand.

cf http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg04530.html

Signed-off-by: Aleksey Senin <alekseys@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-08-04 10:44:19 -07:00
Ralph Campbell
e5a5e7d59a IB/core: Reset to error QP state transition is not allowed
I was reviewing the QP state transition diagram in the IB 1.2.1 spec
and the code for qp_state_table[], and noticed that the code allows a
QP to be modified from IB_QPS_RESET to IB_QPS_ERR whereas the notes
for figure 124 (pg 457) specifically says that this transition isn't
allowed.  This is a clarification from earlier versions of the IB
spec, which were ambiguous in this area and suggested that the RESET
to ERR transition was allowed.

Fix up the qp_state_table[] to make RESET->ERR not allowed.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:46 -07:00
Steve Wise
00f7ec36c9 RDMA/core: Add memory management extensions support
This patch adds support for the IB "base memory management extension"
(BMME) and the equivalent iWARP operations (which the iWARP verbs
mandates all devices must implement).  The new operations are:

 - Allocate an ib_mr for use in fast register work requests.

 - Allocate/free a physical buffer lists for use in fast register work
   requests.  This allows device drivers to allocate this memory as
   needed for use in posting send requests (eg via dma_alloc_coherent).

 - New send queue work requests:
   * send with remote invalidate
   * fast register memory region
   * local invalidate memory region
   * RDMA read with invalidate local memory region (iWARP only)

Consumer interface details:

 - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added
   to indicate device support for these features.

 - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV,
   IB_WR_RDMA_READ_WITH_INV are added.

 - A new consumer API function, ib_alloc_mr() is added to allocate
   fast register memory regions.

 - New consumer API functions, ib_alloc_fast_reg_page_list() and
   ib_free_fast_reg_page_list() are added to allocate and free
   device-specific memory for fast registration page lists.

 - A new consumer API function, ib_update_fast_reg_key(), is added to
   allow the key portion of the R_Key and L_Key of a fast registration
   MR to be updated.  Consumers call this if desired before posting
   a IB_WR_FAST_REG_MR work request.

Consumers can use this as follows:

 - MR is allocated with ib_alloc_mr().

 - Page list memory is allocated with ib_alloc_fast_reg_page_list().

 - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key().

 - MR made VALID and bound to a specific page list via
   ib_post_send(IB_WR_FAST_REG_MR)

 - MR made INVALID via ib_post_send(IB_WR_LOCAL_INV),
   ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with
   invalidate operation.

 - MR is deallocated with ib_dereg_mr()

 - page lists dealloced via ib_free_fast_reg_page_list().

Applications can allocate a fast register MR once, and then can
repeatedly bind the MR to different physical block lists (PBLs) via
posting work requests to a send queue (SQ).  For each outstanding
MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be
allocated (the fast_reg_page_list is owned by the low-level driver
from the consumer posting a work request until the request completes).
Thus pipelining can be achieved while still allowing device-specific
page_list processing.

The 32-bit fast register memory key/STag is composed of a 24-bit index
and an 8-bit key.  The application can change the key each time it
fast registers thus allowing more control over the peer's use of the
key/STag (ie it can effectively be changed each time the rkey is
rebound to a page list).

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:45 -07:00
Roland Dreier
f3781d2e89 RDMA: Remove subversion $Id tags
They don't get updated by git and so they're worse than useless.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:44 -07:00
Eli Cohen
2dd5716227 IB/core: Add support for modify CQ
Add support for modifying CQ parameters for controlling event
generation moderation.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:33 -07:00
Dotan Barak
7ce5eacb45 IB/core: Check optional verbs before using them
Make sure that a device implements the modify_srq and reg_phys_mr
optional methods before calling them.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:28 -07:00
Michael S. Tsirkin
f4fd0b224d IB: Add CQ comp_vector support
Add a num_comp_vectors member to struct ib_device and extend
ib_create_cq() to pass in a comp_vector parameter -- this parallels
the userspace libibverbs API.  Update all hardware drivers to set
num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector
value.  Pass the value of num_comp_vectors to userspace rather than
hard-coding a value of 1.

We want multiple CQ event vector support (via MSI-X or similar for
adapters that can generate multiple interrupts), but it's not clear
how many vectors we want, or how we want to deal with policy issues
such as how to decide which vector to use or how to set up interrupt
affinity.  This patch is useful for experimenting, since no core
changes will be necessary when updating a driver to support multiple
vectors, and we know that we want to make at least these changes
anyway.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Sean Hefty
47645d8d25 IB/core: Set hop limit in ib_init_ah_from_wc correctly
The hop_limit value in the ah_attr should be 0xFF, not the value read
from the received GRH (which should be 0).  See 13.5.4.4 in the 1.2 IB
spec.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-22 17:54:02 -08:00
Tom Tucker
07ebafbaaa RDMA: iWARP Core Changes.
Modifications to the existing rdma header files, core files, drivers,
and ulp files to support iWARP, including:
 - Hook iWARP CM into the build system and use it in rdma_cm.
 - Convert enum ib_node_type to enum rdma_node_type, which includes
   the possibility of RDMA_NODE_RNIC, and update everything for this.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:47 -07:00
Ralph Campbell
9bc57e2d19 IB/uverbs: Pass userspace data to modify_srq and modify_qp methods
Pass a struct ib_udata to the low-level driver's ->modify_srq() and
->modify_qp() methods, so that it can get to the device-specific data
passed in by the userspace driver.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:25 -07:00
Sean Hefty
4e00d69454 IB: Add ib_init_ah_from_wc()
Add a function to initialize address handle attributes from a work
completion.  This functionality is duplicated by both verbs and the CM.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:39 -07:00
Jack Morgenstein
bf6a9e31cf IB: simplify static rate encoding
Push translation of static rate to HCA format into low-level drivers,
where it belongs.  For static rate encoding, use encoding of rate
field from IB standard PathRecord, with addition of value 0, for
backwards compatibility with current usage.  The changes are:

 - Add enum ib_rate to midlayer includes.
 - Get rid of static rate translation in IPoIB; just use static rate
   directly from Path and MulticastGroup records.
 - Update mthca driver to translate absolute static rate into the
   format used by hardware.  This also fixes mthca's static rate
   handling for HCAs that are capable of 4X DDR.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 09:43:47 -07:00
Dotan Barak
4546d31d84 IB: Fix modify QP checking of "current QP state" attribute
According to the IB spec version 1.2, section 11.2.4.2, the current
table has a couple of mistakes where it allows the current QP state
(IB_QP_CUR_STATE) attribute.  For the transitions:

  RTS -> RTS: IB_QP_CUR_STATE should be allowed for all transports
  SQD -> SQD: IB_QP_CUR_STATE should never be allowed

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:20 -08:00
Roland Dreier
a74cd4af0b IB: Whitespace cleanups
Remove trailing whitespace and fix indentation that with spaces
instead of tabs.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:13 -08:00
Roland Dreier
8a51866f08 IB: Add ib_modify_qp_is_ok() library function
The in-kernel mthca driver contains a table of which attributes are
valid for each queue pair state transition.  It turns out that both
other IB drivers -- ipath and ehca -- which are being prepared for
merging have copied this table, errors and all.

To forestall this code duplication, move this table and the code to
check parameters against it into a midlayer library function,
ib_modify_qp_is_ok().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:12 -08:00
Roland Dreier
33b9b3ee97 IB: Add userspace support for resizing CQs
Add support to uverbs to handle resizing userspace CQs (completion
queues), including adding an ABI for marshalling requests and
responses.  The kernel midlayer already has ib_resize_cq().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:07 -08:00
Ralph Campbell
4f8448dfe8 IB: Set GIDs correctly in ib_create_ah_from_wc()
ib_create_ah_from_wc() doesn't create the correct return address (AH)
when there is a GRH present (source & dest GIDs need to be swapped).

Signed-off-by: Ralph Campbell <ralphc@pathscale.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-01-06 16:43:47 -08:00
Linus Torvalds
78b9c0f91c Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband 2005-11-10 13:27:06 -08:00
Roland Dreier
40de2e548c [IB] Have cq_resize() method take an int, not int*
Change the struct ib_device.resize_cq() method to take a plain integer
that holds the new CQ size, rather than a pointer to an integer that
it uses to return the new size.  This makes the interface match the
exported ib_resize_cq() signature, and allows the low-level driver to
update the CQ size with proper locking if necessary.

No in-tree drivers are exporting this method yet.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-10 10:22:50 -08:00
Tim Schmielau
8c65b4a604 [PATCH] fix remaining missing includes
Fix more include file problems that surfaced since I submitted the previous
fix-missing-includes.patch.  This should now allow not to include sched.h
from module.h, which is done by a followup patch.

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:41 -08:00
Jack Morgenstein
0c33aeedb2 [IB] Add checks to multicast attach and detach
Add checks so that we only allow multicast attach/detach with
a valid multicast GID and the correct QP type.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-10-17 15:20:24 -07:00
Roland Dreier
a4d61e8480 [PATCH] IB: move include files to include/rdma
Move the InfiniBand headers from drivers/infiniband/include to include/rdma.
This allows InfiniBand-using code to live elsewhere, and lets us remove the
ugly EXTRA_CFLAGS include path from the InfiniBand Makefiles.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-08-26 20:37:38 -07:00
Roland Dreier
d41fcc6705 [PATCH] IB: Add SRQ support to midlayer
Make the required core API additions and changes for
shared receive queues (SRQs).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-08-26 20:37:36 -07:00
Roland Dreier
2a1d9b7f09 [PATCH] IB: Add copyright notices
Make some lawyers happy and add copyright notices for people who
forgot to include them when they actually touched the code.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-08-26 20:37:35 -07:00
Hal Rosenstock
497677ab94 [PATCH] IB: A couple of IB core bug fixes
Replace be32_to_cpup with be32_to_cpu and fix bug referencing pointer rather
than value in ib_create_ah_from_wc().

Signed-off-by: Tom Duffy <tduffy@sun.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27 16:26:12 -07:00
Hal Rosenstock
513789ed99 [PATCH] IB: Add ib_create_ah_from_wc to IB verbs
Added new call: ib_create_ah_from_wc.  Call will allocate an address handle
given work completion information, including any received GRH.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27 16:26:12 -07:00