Commit Graph

550 Commits

Author SHA1 Message Date
Jens Axboe
45711f1af6 [SG] Update drivers to use sg helpers
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-22 21:19:53 +02:00
Roland Dreier
cbfb50e6e2 IB/uverbs: Fix checking of userspace object ownership
Commit 9ead190b ("IB/uverbs: Don't serialize with ib_uverbs_idr_mutex")
rewrote how userspace objects are looked up in the uverbs module's
idrs, and introduced a severe bug in the process: there is no checking
that an operation is being performed by the right process any more.
Fix this by adding the missing check of uobj->context in __idr_get_uobj().

Apparently everyone is being very careful to only touch their own
objects, because this bug was introduced in June 2006 in 2.6.18, and
has gone undetected until now.

Cc: stable <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-19 20:01:43 -07:00
Anton Arapov
a25de534f8 [INET]: Justification for local port range robustness.
There is a justifying patch for Stephen's patches. Stephen's patches
disallows using a port range of one single port and brakes the meaning
of the 'remaining' variable, in some places it has different meaning.
My patch gives back the sense of 'remaining' variable. It should mean
how many ports are remaining and nothing else. Also my patch allows
using a single port.

  I sure we must be able to use mentioned port range, this does not
restricted by documentation and does not brake current behavior.

usefull links:
Patches posted by Stephen Hemminger
  http://marc.info/?l=linux-netdev&m=119206106218187&w=2
  http://marc.info/?l=linux-netdev&m=119206109918235&w=2

Andrew Morton's comment
  http://marc.info/?l=linux-kernel&m=119248225007737&w=2

1. Allows using a port range of one single port.
2. Gives back sense of 'remaining' variable.

Signed-off-by: Anton Arapov <aarapov@redhat.com>
Acked-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-18 22:00:17 -07:00
Sean Hefty
d02d1f5359 RDMA/cma: Fix deadlock destroying listen requests
Deadlock condition reported by Kanoj Sarcar <kanoj@netxen.com>.
The deadlock occurs when a connection request arrives at the same
time that a wildcard listen is being destroyed.

A wildcard listen maintains per device listen requests for each
RDMA device in the system.  The per device listens are automatically
added and removed when RDMA devices are inserted or removed from
the system.

When a wildcard listen is destroyed, rdma_destroy_id() acquires
the rdma_cm's device mutex ('lock') to protect against hot-plug
events adding or removing per device listens.  It then tries to
destroy the per device listens by calling ib_destroy_cm_id() or
iw_destroy_cm_id().  It does this while holding the device mutex.

However, if the underlying iw/ib CM reports a connection request
while this is occurring, the rdma_cm callback function will try
to acquire the same device mutex.  Since we're in a callback,
the ib_destroy_cm_id() or iw_destroy_cm_id() calls will block until
their callback thread returns, but the callback is blocked waiting for
the device mutex.

Fix this by re-working how per device listens are destroyed.  Use
rdma_destroy_id(), which avoids the deadlock, in place of
cma_destroy_listen().  Additional synchronization is added to handle
device hot-plug events and ensure that the id is not destroyed twice.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-16 12:25:49 -07:00
Sean Hefty
c5483388bb RDMA/cma: Add locking around QP accesses
If a user allocates a QP on an rdma_cm_id, the rdma_cm will automatically
transition the QP through its states (RTR, RTS, error, etc.)  While the
QP state transitions are occurring, the QP itself must remain valid.
Provide locking around the QP pointer to prevent its destruction while
accessing the pointer.

This fixes an issue reported by Olaf Kirch from Oracle that resulted in
a system crash:

"An incoming connection arrives and we decide to tear down the nascent
 connection.  The remote ends decides to do the same.  We start to shut
 down the connection, and call rdma_destroy_qp on our cm_id. ... Now
 apparently a 'connect reject' message comes in from the other host,
 and cma_ib_handler() is called with an event of IB_CM_REJ_RECEIVED.
 It calls cma_modify_qp_err, which for some odd reason tries to modify
 the exact same QP we just destroyed."

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-16 12:25:00 -07:00
Kay Sievers
7eff2e7a8b Driver core: change add_uevent_var to use a struct
This changes the uevent buffer functions to use a struct instead of a
long list of parameters. It does no longer require the caller to do the
proper buffer termination and size accounting, which is currently wrong
in some places. It fixes a known bug where parts of the uevent
environment are overwritten because of wrong index calculations.

Many thanks to Mathieu Desnoyers for finding bugs and improving the
error handling.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:01 -07:00
Linus Torvalds
ce9d3c9a6a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (87 commits)
  mlx4_core: Fix section mismatches
  IPoIB: Allow setting policy to ignore multicast groups
  IB/mthca: Mark error paths as unlikely() in post_srq_recv functions
  IB/ipath: Minor fix to ordering of freeing and zeroing of tid pages.
  IB/ipath: Remove redundant link state checks
  IB/ipath: Fix IB_EVENT_PORT_ERR event
  IB/ipath: Better handling of unexpected GPIO interrupts
  IB/ipath: Maintain active time on all chips
  IB/ipath: Fix QHT7040 serial number check
  IB/ipath: Indicate a couple of chip bugs to userspace
  IB/ipath: iba6110 rev4 no longer needs recv header overrun workaround
  IB/ipath: Use counters in ipath_poll and cleanup interrupts in ipath_close
  IB/ipath: Remove duplicate copy of LMC
  IB/ipath: Add ability to set the LMC via the sysfs debugging interface
  IB/ipath: Optimize completion queue entry insertion and polling
  IB/ipath: Implement IB_EVENT_QP_LAST_WQE_REACHED
  IB/ipath: Generate flush CQE when QP is in error state
  IB/ipath: Remove redundant code
  IB/ipath: Future proof eeprom checksum code (contents reading)
  IB/ipath: UC RDMA WRITE with IMMEDIATE doesn't send the immediate
  ...
2007-10-11 19:43:13 -07:00
Stephen Hemminger
227b60f510 [INET]: local port range robustness
Expansion of original idea from Denis V. Lunev <den@openvz.org>

Add robustness and locking to the local_port_range sysctl.
1. Enforce that low < high when setting.
2. Use seqlock to ensure atomic update.

The locking might seem like overkill, but there are
cases where sysadmin might want to change value in the
middle of a DoS attack.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 17:30:46 -07:00
Sean Hefty
dcb3f974da RDMA/cma: Queue IB CM MRAs to avoid unnecessary remote retries
Automatically queue MRA message to decrease the number of retries sent
by the remote side during connection establishment.  This also has the
effect of increasing the overall connection timeout without using a
longer retry time in the case of dropped packets.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:17 -07:00
Sean Hefty
de98b693e9 IB/cm: Modify interface to send MRAs in response to duplicate messages
The IB CM provides a message received acknowledged (MRA) message that
can be sent to indicate that a REQ or REP message has been received, but
will require more time to process than the timeout specified by those
messages.  In many cases, the application may not know how long it will
take to respond to a CM message, but the majority of the time, it will
usually respond before a retry has been sent.  Rather than sending an
MRA in response to all messages just to handle the case where a longer
timeout is needed, it is more efficient to queue the MRA for sending in
case a duplicate message is received.

This avoids sending an MRA when it is not needed, but limits the number
of times that a REQ or REP will be resent.  It also provides for a
simpler implementation than generating the MRA based on a timer event.
(That is, trying to send the MRA after receiving the first REQ or REP if
a response has not been generated, so that it is received at the remote
side before a duplicate REQ or REP has been received)

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:17 -07:00
Roland Dreier
04d29b0ede IB/uverbs: Make ib_uverbs_release_event_file() static
ib_uverbs_release_event_file() is only used in uverbs_main.c, so make it
static to that file.  Also move the definition before the first use, so
a forward declaration is not needed.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:15 -07:00
Roland Dreier
a394f83bdf IB/umad: Fix bit ordering and 32-on-64 problems on big endian systems
The declaration of struct ib_user_mad_reg_req.method_mask[] exported
to userspace was an array of __u32, but the kernel internally treated
it as a bitmap made up of longs.  This makes a difference for 64-bit
big-endian kernels, where numbering the bits in an array of__u32 gives:

    |31.....0|63....31|95....64|127...96|

while numbering the bits in an array of longs gives:

    |63..............0|127............64|

64-bit userspace can handle this by just treating method_mask[] as an
array of longs, but 32-bit userspace is really stuck: the meaning of
the bits in method_mask[] depends on whether the kernel is 32-bit or
64-bit, and there's no sane way for userspace to know that.

Fix this by updating <rdma/ib_user_mad.h> to make it clear that
method_mask[] is an array of longs, and using a compat_ioctl method to
convert to an array of 64-bit longs to handle the 32-on-64 problem.
This fixes the interface description to match existing behavior (so
working binaries continue to work) in almost all situations, and gives
consistent semantics in the case of 32-bit userspace that can run on
either a 32-bit or 64-bit kernel, so that the same binary can work for
both 32-on-32 and 32-on-64 systems.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:15 -07:00
Roland Dreier
2be8e3ee8e IB/umad: Add P_Key index support
Add support for setting the P_Key index of sent MADs and getting the
P_Key index of received MADs.  This requires a change to the layout of
the ABI structure struct ib_user_mad_hdr, so to avoid breaking
compatibility, we default to the old (unchanged) ABI and add a new
ioctl IB_USER_MAD_ENABLE_PKEY that allows applications that are aware
of the new ABI to opt into using it.

We plan on switching to the new ABI by default in a year or so, and
this patch adds a warning that is printed when an application uses the
old ABI, to push people towards converting to the new ABI.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Hal Rosenstock <hal@xsigo.com>
2007-10-09 19:59:15 -07:00
Ralph Campbell
57cb61d587 IB/core: Fix handling of multicast response failures
I was looking at the code for multicast.c and noticed that
ib_sa_join_multicast() calls queue_join() which puts the
request at the front of the group->pending_list.  If this
is a second request, it seems like it would interfere with
process_join_error() since group->last_join won't point
to the member at the head of the pending_list. The sequence
would thus be:

1. ib_sa_join_multicast()
   puts member1 on head of pending_list and starts work thread
2. mcast_work_handler()
   calls send_join() which sets group->last_join to member1
3. ib_sa_join_multicast()
   puts member2 on head of pending_list
4. join operation for member1 receives failures response from SA.
5. join_handler() is called with error status
6. process_join_error() fails to process member1 since
   it doesn't match the first entry in the group->pending_list.

The impact is that the failed join request is tossed.  The second
request is processed, and after it completes, the original request ends
up being retried.

This change also results in join requests being processed in FIFO
order.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:14 -07:00
Steve Wise
935ef2d7a2 RDMA/cma: Use neigh_event_send() to start neighbour discovery
Calling arp_send() to initiate neighbour discovery (ND) doesn't do the
full ND protocol.  Namely, it doesn't handle retransmitting the arp
request if it is dropped. The function neigh_event_send() does all
this.  Without doing full ND, RDMA address resolution fails in the
presence of dropped ARP broadcast packets.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:13 -07:00
Joachim Fenkes
c8d8beea03 IB/umem: Add hugetlb flag to struct ib_umem
During ib_umem_get(), determine whether all pages from the memory
region are hugetlb pages and report this in the "hugetlb" member.
Low-level drivers can use this information if they need it.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:13 -07:00
Sean Hefty
7ce86409ad RDMA/ucma: Allow user space to set service type
Export the ability to set the type of service to user space.  Model
the interface after setsockopt.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:12 -07:00
Sean Hefty
a81c994d5e RDMA/cma: Add ability to specify type of service
Provide support to specify a type of service for a communication
identifier.  A new function call is used when dealing with IPv4
addresses.  For IPv6 addresses, the ToS is specified through the
traffic class field in the sockaddr_in6 structure.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ The comments Eitan Zahavi and myself have made over the v1 post at 
  <http://lists.openfabrics.org/pipermail/general/2007-August/039247.html>
  were fully addressed. ]
 
Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com> 
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:12 -07:00
Sean Hefty
733d65fe33 IB/sa: Add new QoS fields to path record
The QoS annex defines new fields for path records.  Add them to the
ib_sa for consumers that want to use them.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:12 -07:00
Ali Ayoub
3c10c7c929 IB/sa: Error handling thinko fix
ib_create_send_mad() returns an error code pointer on error, not NULL.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:07 -07:00
Anton Blanchard
8a68bbe31d IB/fmr_pool: Clean up some error messages in fmr_pool.c
A number of printks in fmr_pool.c dont have newlines, eg:

    fmr_create failed for FMR 0<5>FS-Cache: Loaded

Fix them up.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:05 -07:00
Roland Dreier
65d470b3ea IB: find_first_zero_bit() takes unsigned pointer
Fix sparse warning

    drivers/infiniband/core/device.c:142:6: warning: incorrect type in argument 1 (different signedness)
    drivers/infiniband/core/device.c:142:6:    expected unsigned long const *addr
    drivers/infiniband/core/device.c:142:6:    got long *[assigned] inuse

by making the local variable inuse unsigned.  Does not affect generated
code at all.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:04 -07:00
Dotan Barak
92ddc447ce IB: Move the macro IB_UMEM_MAX_PAGE_CHUNK() to umem.c
After moving the definition of struct ib_umem_chunk from ib_verbs.h to
ib_umem.h there isn't any reason for the macro IB_UMEM_MAX_PAGE_CHUNK
to stay in ib_verbs.h.  Move the macro to umem.c, the only place where
it is used.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:18 -07:00
Sean Hefty
38d5af9565 IB/mad: Fix address handle leak in mad_rmpp
The address handle associated with dual-sided RMPP direction switch
ACKs is never destroyed.  Free the AH for ACKs which fall into this
category.

Problem was reported by Dotan Barak <dotanb@dev.mellanox.co.il>.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:17 -07:00
Hal Rosenstock
8fc394b197 IB/mad: agent_send_response() should be void
Nothing looks at the return value of agent_send_response(), so there's
no point in returning anything.

Signed-off-by: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:17 -07:00
Hal Rosenstock
86dfbecdea IB/mad: Fix memory leak in switch handling in ib_mad_recv_done_handler()
If agent_send_response() returns an error, we shouldn't do anything
differently than if it succeeds; setting response to NULL just means
that the response buffer gets leaked.

Signed-off-by: Suresh Shelvapille <suri@baymicrosystems.com>
Signed-off-by: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:17 -07:00
Hal Rosenstock
445d68070c IB/mad: Fix error path if response alloc fails in ib_mad_recv_done_handler()
If ib_mad_recv_done_handler() fails to allocate response, then it just
printed a warning and continued, which leads to an oops if the MAD is
being handled for a switch device, because the switch code uses
response without checking for NULL.  Fix this by bailing out of the
function if the allocation fails.

Signed-off-by: Suresh Shelvapille <suri@baymicrosystems.com>
Signed-off-by: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:17 -07:00
Roland Dreier
5399891052 IB/sa: Don't need to check for default P_Key twice
Now that ib_find_pkey() ignores the membership bit of P_Keys, there's no
need for ib_sa to look for both 0x7fff and 0xffff in a port's P_Key table.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:17 -07:00
Moni Shoua
36026ecc20 IB/core: Ignore membership bit in ib_find_pkey()
ib_find_pkey() is used as a replacement for ib_find_cached_pkey(), and
the original function ignored the membership bit when searching for a
P_Key, so ib_find_pkey() should ignore the bit too.

In particular, IPoIB turns on the P_Key membership bit of limited
membership P_Keys when creating a child interface and looks for the
full membership P_key.  This broke if a port was a partial member of a
partition when IPoIB switched from ib_find_cached_pkey() to
ib_find_pkey(), and this change fixes things again.

Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:17 -07:00
Paul Mundt
20c2df83d2 mm: Remove slab destructors from kmem_cache_create().
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-07-20 10:11:58 +09:00
Yoann Padioleau
dd00cc486a some kmalloc/memset ->kzalloc (tree wide)
Transform some calls to kmalloc/memset to a single kzalloc (or kcalloc).

Here is a short excerpt of the semantic patch performing
this transformation:

@@
type T2;
expression x;
identifier f,fld;
expression E;
expression E1,E2;
expression e1,e2,e3,y;
statement S;
@@

 x =
- kmalloc
+ kzalloc
  (E1,E2)
  ...  when != \(x->fld=E;\|y=f(...,x,...);\|f(...,x,...);\|x=E;\|while(...) S\|for(e1;e2;e3) S\)
- memset((T2)x,0,E1);

@@
expression E1,E2,E3;
@@

- kzalloc(E1 * E2,E3)
+ kcalloc(E1,E2,E3)

[akpm@linux-foundation.org: get kcalloc args the right way around]
Signed-off-by: Yoann Padioleau <padator@wanadoo.fr>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Bryan Wu <bryan.wu@analog.com>
Acked-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Dave Airlie <airlied@linux.ie>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Acked-by: Pierre Ossman <drzeus-list@drzeus.cx>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Greg KH <greg@kroah.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:50 -07:00
Dotan Barak
8f076531cd RDMA/cma: Remove local write permission from QP access flags
Local write permission makes no sense as part of the QP access flags,
since the access flags only control what the remote end of the
connection is allowed to do.  Remove the code in the RDMA CM that
initializes qp_access_flags with IB_ACCESS_LOCAL_WRITE.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 20:30:22 -07:00
Roland Dreier
454a01e7f4 IB/cm: Make internal function cm_get_ack_delay() static
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:43 -07:00
Linus Torvalds
0cdf6990e9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (76 commits)
  IB: Update MAINTAINERS with Hal's new email address
  IB/mlx4: Implement query SRQ
  IB/mlx4: Implement query QP
  IB/cm: Send no match if a SIDR REQ does not match a listen
  IB/cm: Fix handling of duplicate SIDR REQs
  IB/cm: cm_msgs.h should include ib_cm.h
  IB/cm: Include HCA ACK delay in local ACK timeout
  IB/cm: Use spin_lock_irq() instead of spin_lock_irqsave() when possible
  IB/sa: Make sure SA queries use default P_Key
  IPoIB: Recycle loopback skbs instead of freeing and reallocating
  IB/mthca: Replace memset(<addr>, 0, PAGE_SIZE) with clear_page(<addr>)
  IPoIB/cm: Fix warning if IPV6 is not enabled
  IB/core: Take sizeof the correct pointer when calling kmalloc()
  IB/ehca: Improve latency by unlocking after triggering the hardware
  IB/ehca: Notify consumers of LID/PKEY/SM changes after nondisruptive events
  IB/ehca: Return QP pointer in poll_cq()
  IB/ehca: Change idr spinlocks into rwlocks
  IB/ehca: Refactor sync between completions and destroy_cq using atomic_t
  IB/ehca: Lock renaming, static initializers
  IB/ehca: Report RDMA atomic attributes in query_qp()
  ...
2007-07-12 16:45:40 -07:00
Tejun Heo
7b595756ec sysfs: kill unnecessary attribute->owner
sysfs is now completely out of driver/module lifetime game.  After
deletion, a sysfs node doesn't access anything outside sysfs proper,
so there's no reason to hold onto the attribute owners.  Note that
often the wrong modules were accounted for as owners leading to
accessing removed modules.

This patch kills now unnecessary attribute->owner.  Note that with
this change, userland holding a sysfs node does not prevent the
backing module from being unloaded.

For more info regarding lifetime rule cleanup, please read the
following message.

  http://article.gmane.org/gmane.linux.kernel/510293

(tweaked by Greg to not delete the field just yet, to make it easier to
merge things properly.)

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:06 -07:00
Sean Hefty
6164c8cd13 IB/cm: Send no match if a SIDR REQ does not match a listen
If a SIDR REQ does not match a listen, we should reply with status
value 1 (service ID not supported), rather than dropping through to
the default case of status 2 (rejected by service provider).

Doing this also fixes a bug where the cm_id_priv is removed from the
remote_sidr_table twice.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:52:28 -07:00
Sean Hefty
29c2731cbf IB/cm: Fix handling of duplicate SIDR REQs
Fix handling to duplicate SIDR REQs to avoid sending a reject if a
duplicate is detected.  Duplicates should just be silently discarded.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:51:43 -07:00
Sean Hefty
5d861be8c8 IB/cm: cm_msgs.h should include ib_cm.h
cm_msgs.h uses definitions from ib_cm.h.  Include it directly, rather
than depending on a specific include order.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:50:53 -07:00
Sean Hefty
1d84612649 IB/cm: Include HCA ACK delay in local ACK timeout
The IB CM should include the HCA ACK delay when calculating the local
ACK timeout value to use for RC QPs.  If the HCA ACK delay is large
enough relative to the packet life time, then if it is not taken into
account, the calculated timeout value ends up being too small, which
can result in "retry exceeded" errors.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:50:05 -07:00
Sean Hefty
24be6e81c7 IB/cm: Use spin_lock_irq() instead of spin_lock_irqsave() when possible
The ib_cm is a little over zealous about using spin_lock_irqsave,
when spin_lock_irq would do.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:47:29 -07:00
Sean Hefty
2aec5c602c IB/sa: Make sure SA queries use default P_Key
MADs sent to the SA should use the the default P_Key (0x7fff/0xffff).
There's no requirement that the default P_Key is stored at index 0 in
the local P_Key table, so add code to the sa_query module to look up
the index of the default P_Key when creating an address handle for the
SA (which is done any time the P_Key table might change), and use this
index for all SA queries.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:45:31 -07:00
Dotan Barak
856c52a741 IB/core: Take sizeof the correct pointer when calling kmalloc()
When allocating out_mad in show_pma_counter(), take sizeof *out_mad
instead of sizeof *in_mad.  It is true that today the type of in_mad
and out_mad are the same, but this patch will give us a cleaner code.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 11:04:40 -07:00
Andrew Morton
1d3f4b905a IB: Fix ib_umem_get() when npages == 0
gcc correctly warned:

drivers/infiniband/core/umem.c: In function 'ib_umem_get':
drivers/infiniband/core/umem.c:78: warning: 'ret' may be used uninitialized in this function

Set ret to 0 in case npages == 0 and the loop isn't entered at all.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 16:17:33 -07:00
Roland Dreier
43506d954e IB: Remove garbage non-ASCII characters from comments
A few files had 0xa0 characters in comments.  Remove them so that the 
files are clean ASCII text.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 16:17:32 -07:00
Hal Rosenstock
1bae4dbf95 IB/mad: Enhance SMI for switch support
Extend the SMI with switch (intermediate hop) support. Care has been
taken to ensure that the CA (and router) code paths are changed as
little as possible.

Signed-off-by: Suresh Shelvapille <suri@baymicrosystems.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 16:17:32 -07:00
Roland Dreier
24bce50803 IB/umem: Fix possible hang on process exit
If ib_umem_release() is called after ib_uverbs_close() sets context->closing,
then a process can get stuck in a D state, because the code boils down to

	if (down_write_trylock(&mm->mmap_sem))
		down_write(&mm->mmap_sem);

which is obviously a stupid instant deadlock.  Fix the code so that we
only try to take the lock once.

This bug was introduced in commit f7c6a7b5 ("IB/uverbs: Export
ib_umem_get()/ib_umem_release() to modules") which fortunately never
made it into a release, and was reported by Pete Wyckoff <pw@osc.edu>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-21 11:05:58 -07:00
Sean Hefty
bf2944bd56 RDMA/cma: Fix initialization of next_port
next_port should be between sysctl_local_port_range[0] and [1].
However, it is initially set to a random value with get_random_bytes().  
If the value is negative when treated as a signed integer, next_port
can end up outside the expected range because of the result of the % 
operator being negative.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-07 23:24:38 -07:00
Sean Hefty
d998ccce02 IB/cm: Fix stale connection detection
The ib_cm can incorrectly detect a stale connection (a new connection
request for a QPN that is already connected) as a duplicate connection
request.  Separate the handling of potential duplicate REQs from stale
connections.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-29 16:07:09 -07:00
Linus Torvalds
8aee74c8ee Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IB/cm: Improve local id allocation
  IPoIB/cm: Fix SRQ WR leak
  IB/ipoib: Fix typos in error messages
  IB/mlx4: Check if SRQ is full when posting receive
  IB/mlx4: Pass send queue sizes from userspace to kernel
  IB/mlx4: Fix check of opcode in mlx4_ib_post_send()
  mlx4_core: Fix array overrun in dump_dev_cap_flags()
  IB/mlx4: Fix RESET to RESET and RESET to ERROR transitions
  IB/mthca: Fix RESET to ERROR transition
  IB/mlx4: Set GRH:HopLimit when sending globally routed MADs
  IB/mthca: Set GRH:HopLimit when building MLX headers
  IB/mlx4: Fix check of max_qp_dest_rdma in modify QP
  IB/mthca: Fix use-after-free on device restart
  IB/ehca: Return proper error code if register_mr fails
  IPoIB: Handle P_Key table reordering
  IB/core: Use start_port() and end_port()
  IB/core: Add helpers for uncached GID and P_Key searches
  IB/ipath: Fix potential deadlock with multicast spinlocks
  IB/core: Free umem when mm is already gone
2007-05-21 16:19:32 -07:00
Michael S. Tsirkin
9f81036c54 IB/cm: Improve local id allocation
The IB CM uses an idr for local id allocations, with a running counter
as start_id.  This fails to generate distinct ids if

1. An id is constantly created and destroyed
2. A chunk of ids just beyond the current next_id value is occupied

This in turn leads to an increased chance of connection request being
mis-detected as a duplicate, sometimes for several retries, until
next_id gets past the block of allocated ids. This has been observed
in practice.

As a fix, remember the last id allocated and start immediately above it.
This also fixes a problem with the old code, where next_id might
overflow and become negative.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-21 13:41:29 -07:00
Alexey Dobriyan
e8edc6e03a Detach sched.h from mm.h
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.

This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
   getting them indirectly

Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
   they don't need sched.h
b) sched.h stops being dependency for significant number of files:
   on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
   after patch it's only 3744 (-8.3%).

Cross-compile tested on

	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
	alpha alpha-up
	arm
	i386 i386-up i386-defconfig i386-allnoconfig
	ia64 ia64-up
	m68k
	mips
	parisc parisc-up
	powerpc powerpc-up
	s390 s390-up
	sparc sparc-up
	sparc64 sparc64-up
	um-x86_64
	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

as well as my two usual configs.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:18:19 -07:00
Roland Dreier
1af4c435f3 IB/core: Use start_port() and end_port()
Clean up ib_query_port() and ib_modify_port() slightly by using the 
just-added start_port() and end_port() helpers.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:54 -07:00
Yosef Etigin
5eb620c81c IB/core: Add helpers for uncached GID and P_Key searches
Add ib_find_gid() and ib_find_pkey() functions that use uncached device
queries.  The calls might block but the returns are always up-to-date.
Cache P_Key and GID table lengths in core to avoid extra port info queries.

Signed-off-by: Yosef Etigin <yosefe@voltaire.com>
Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:53 -07:00
Eli Cohen
7b82cd8ee7 IB/core: Free umem when mm is already gone
Free umem when task's mm is already destroyed by the time
ib_umem_release gets called.

Found by Dotan Barak at Mellanox.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:53 -07:00
Sean Hefty
6c719f5c6c RDMA/cma: Add check to validate that cm_id is bound to a device
Several checks in the rdma_cm check against the state of the
cm_id, but only to validate that the cm_id is bound to an underlying
transport specific CM and an RDMA device.  Make the check explicit
in what we're trying to check for, since we're not synchronizing
against the cm_id state.

This will allow a user to disconnect a cm_id or reject a connection
after receiving a device removal event.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 14:10:32 -07:00
Sean Hefty
be65f086f2 RDMA/cma: Fix synchronization with device removal in cma_iw_handler
The cma_iw_handler needs to validate the state of the rdma_cm_id before
processing a new connection request to ensure that a device removal is
not already being processed for the same rdma_cm_id.  Without the state
check, the user can receive simultaneous callbacks for the same cm_id, or
a callback after they've destroyed the cm_id.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 13:56:32 -07:00
Sean Hefty
8aa08602bd RDMA/cma: Simplify device removal handling code
Add a new routine and rename another to encapsulate common code for
synchronizing with device removal.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 13:54:49 -07:00
Roland Dreier
1bf66a3042 IB: Put rlimit accounting struct in struct ib_umem
When memory pinned with ib_umem_get() is released, ib_umem_release()
needs to subtract the amount of memory being unpinned from
mm->locked_vm.  However, ib_umem_release() may be called with
mm->mmap_sem already held for writing if the memory is being released
as part of an munmap() call, so it is sometimes necessary to defer
this accounting into a workqueue.

However, the work struct used to defer this accounting is dynamically
allocated before it is queued, so there is the possibility of failing
that allocation.  If the allocation fails, then ib_umem_release has no
choice except to bail out and leave the process with a permanently
elevated locked_vm.

Fix this by allocating the structure to defer accounting as part of
the original struct ib_umem, so there's no possibility of failing a
later allocation if creating the struct ib_umem and pinning memory
succeeds.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-08 18:00:37 -07:00
Roland Dreier
f7c6a7b5d5 IB/uverbs: Export ib_umem_get()/ib_umem_release() to modules
Export ib_umem_get()/ib_umem_release() and put low-level drivers in
control of when to call ib_umem_get() to pin and DMA map userspace,
rather than always calling it in ib_uverbs_reg_mr() before calling the
low-level driver's reg_user_mr method.

Also move these functions to be in the ib_core module instead of
ib_uverbs, so that driver modules using them do not depend on
ib_uverbs.

This has a number of advantages:
 - It is better design from the standpoint of making generic code a
   library that can be used or overridden by device-specific code as
   the details of specific devices dictate.
 - Drivers that do not need to pin userspace memory regions do not
   need to take the performance hit of calling ib_mem_get().  For
   example, although I have not tried to implement it in this patch,
   the ipath driver should be able to avoid pinning memory and just
   use copy_{to,from}_user() to access userspace memory regions.
 - Buffers that need special mapping treatment can be identified by
   the low-level driver.  For example, it may be possible to solve
   some Altix-specific memory ordering issues with mthca CQs in
   userspace by mapping CQ buffers with extra flags.
 - Drivers that need to pin and DMA map userspace memory for things
   other than memory regions can use ib_umem_get() directly, instead
   of hacks using extra parameters to their reg_phys_mr method.  For
   example, the mlx4 driver that is pending being merged needs to pin
   and DMA map QP and CQ buffers, but it does not need to create a
   memory key for these buffers.  So the cleanest solution is for mlx4
   to call ib_umem_get() in the create_qp and create_cq methods.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-08 18:00:37 -07:00
Linus Torvalds
972d45fb43 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IPoIB: Convert to NAPI
  IB: Return "maybe missed event" hint from ib_req_notify_cq()
  IB: Add CQ comp_vector support
  IB/ipath: Fix a race condition when generating ACKs
  IB/ipath: Fix two more spin lock problems
  IB/fmr_pool: Add prefix to all printks
  IB/srp: Set proc_name
  IB/srp: Add orig_dgid sysfs attribute to scsi_host
  IPoIB/cm: Don't crash if remote side uses one QP for both directions
  RDMA/cxgb3: Support for new abort logic
  RDMA/cxgb3: Initialize cpu_idx field in cpl_close_listserv_req message
  RDMA/cxgb3: Fail qp creation if the requested max_inline is too large
  RDMA/cxgb3: Fix TERM codes
  IPoIB/cm: Fix error handling in ipoib_cm_dev_open()
  IB/ipath: Don't corrupt pending mmap list when unmapped objects are freed
  IB/mthca: Work around kernel QP starvation
  IB/ipath: Don't put QP in timeout queue if waiting to send
  IB/ipath: Don't call spin_lock_irq() from interrupt context
2007-05-07 12:18:21 -07:00
Michael S. Tsirkin
f4fd0b224d IB: Add CQ comp_vector support
Add a num_comp_vectors member to struct ib_device and extend
ib_create_cq() to pass in a comp_vector parameter -- this parallels
the userspace libibverbs API.  Update all hardware drivers to set
num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector
value.  Pass the value of num_comp_vectors to userspace rather than
hard-coding a value of 1.

We want multiple CQ event vector support (via MSI-X or similar for
adapters that can generate multiple interrupts), but it's not clear
how many vectors we want, or how we want to deal with policy issues
such as how to decide which vector to use or how to set up interrupt
affinity.  This patch is useful for experimenting, since no core
changes will be necessary when updating a driver to support multiple
vectors, and we know that we want to make at least these changes
anyway.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Roland Dreier
1a70a05d9d IB/fmr_pool: Add prefix to all printks
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Jean Delvare
6473d160b4 PCI: Cleanup the includes of <linux/pci.h>
I noticed that many source files include <linux/pci.h> while they do
not appear to need it. Here is an attempt to clean it all up.

In order to find all possibly affected files, I searched for all
files including <linux/pci.h> but without any other occurence of "pci"
or "PCI". I removed the include statement from all of these, then I
compiled an allmodconfig kernel on both i386 and x86_64 and fixed the
false positives manually.

My tests covered 66% of the affected files, so there could be false
positives remaining. Untested files are:

arch/alpha/kernel/err_common.c
arch/alpha/kernel/err_ev6.c
arch/alpha/kernel/err_ev7.c
arch/ia64/sn/kernel/huberror.c
arch/ia64/sn/kernel/xpnet.c
arch/m68knommu/kernel/dma.c
arch/mips/lib/iomap.c
arch/powerpc/platforms/pseries/ras.c
arch/ppc/8260_io/enet.c
arch/ppc/8260_io/fcc_enet.c
arch/ppc/8xx_io/enet.c
arch/ppc/syslib/ppc4xx_sgdma.c
arch/sh64/mach-cayman/iomap.c
arch/xtensa/kernel/xtensa_ksyms.c
arch/xtensa/platform-iss/setup.c
drivers/i2c/busses/i2c-at91.c
drivers/i2c/busses/i2c-mpc.c
drivers/media/video/saa711x.c
drivers/misc/hdpuftrs/hdpu_cpustate.c
drivers/misc/hdpuftrs/hdpu_nexus.c
drivers/net/au1000_eth.c
drivers/net/fec_8xx/fec_main.c
drivers/net/fec_8xx/fec_mii.c
drivers/net/fs_enet/fs_enet-main.c
drivers/net/fs_enet/mac-fcc.c
drivers/net/fs_enet/mac-fec.c
drivers/net/fs_enet/mac-scc.c
drivers/net/fs_enet/mii-bitbang.c
drivers/net/fs_enet/mii-fec.c
drivers/net/ibm_emac/ibm_emac_core.c
drivers/net/lasi_82596.c
drivers/parisc/hppb.c
drivers/sbus/sbus.c
drivers/video/g364fb.c
drivers/video/platinumfb.c
drivers/video/stifb.c
drivers/video/valkyriefb.c
include/asm-arm/arch-ixp4xx/dma.h
sound/oss/au1550_ac97.c

I would welcome test reports for these files. I am fine with removing
the untested files from the patch if the general opinion is that these
changes aren't safe. The tested part would still be nice to have.

Note that this patch depends on another header fixup patch I submitted
to LKML yesterday:
  [PATCH] scatterlist.h needs types.h
  http://lkml.org/lkml/2007/3/01/141

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:35 -07:00
Joachim Fenkes
1912ffbb88 IB: Set class_dev->dev in core for nice device symlink
All RDMA drivers except ehca set class_dev->dev to their dma_device
value (ehca leaves this unset).  dma_device is the only value that
makes any sense, so move this assignment to core/sysfs.c.  This reduce
the duplicated code in the rest of the drivers and gives ehca a nice
/sys/class/infiniband/ehcaX/device symlink.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 21:30:38 -07:00
Hal Rosenstock
de493d47d8 IB/mad: Change SMI to use enums rather than magic return codes
Clarify code by changing return values from SMI functions to named
enum values instead of magic 0/1 values.

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 16:31:12 -07:00
Sean Hefty
aeba84a925 IB/umad: Implement GRH handling for sent/received MADs
We need to set the SGID index for routed MADs and pass received
GRH information to userspace when a MAD is received.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2007-04-24 16:31:12 -07:00
Sean Hefty
d0e7bb1418 IB/sa: Set src_path_bits correctly in ib_init_ah_from_path()
src_path_bits needs to mask off the base LID value.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2007-04-24 16:31:12 -07:00
Sean Hefty
9d41b7fdea IB/ucm: Simplify ib_ucm_event()
Use wait_event_interruptible() instead of a more complicated
open-coded equivalent.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2007-04-24 16:31:11 -07:00
Sean Hefty
d92f76448c RDMA/ucma: Simplify ucma_get_event()
Use wait_event_interruptible() instead of a more complicated
open-coded equivalent.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2007-04-24 16:31:11 -07:00
Hal Rosenstock
9a4b65e357 IB/umad: Fix declaration of dev_map[]
The current ib_umad code never accesses bits past IB_UMAD_MAX_PORTS in
dev_map[].  We shouldn't declare it to be twice as big.

Pointed-out-by: Roland Dreier <rolandd@cisco.com>

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
2007-04-18 20:20:53 -07:00
Sean Hefty
3492856e33 RDMA/ucma: Avoid sending reject if backlog is full
Change the returned error code to ENOMEM if the connection event
backlog is full.  This prevents the ib_cm from issuing a reject
on the connection, which can allow retries to succeed.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 14:58:11 -08:00
Sean Hefty
cb164b8c6a RDMA/cma: Initialize rdma_bind_list in cma_alloc_any_port()
The struct rdma_bind_list fields for hlist are not being initialized,
resulting in a corrupted list.  Fix this by using kzalloc() to make
sure all pointers are NULL.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 12:41:44 -08:00
Sean Hefty
1836854f25 RDMA/cma: Remove unused node_guid from cma_device structure
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-22 17:54:35 -08:00
Sean Hefty
e971b8cd19 IB/cm: Remove ca_guid from cm_device structure
The cm_device references an ib_device, which already contains the node_guid.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-22 17:54:33 -08:00
Sean Hefty
962063e64b RDMA/cma: Request reversible paths only
The rdma_cm requires that path records be reversible.  Set the
reversible bit when issuing an path record query.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-22 17:54:07 -08:00
Sean Hefty
47645d8d25 IB/core: Set hop limit in ib_init_ah_from_wc correctly
The hop_limit value in the ah_attr should be 0xFF, not the value read
from the received GRH (which should be 0).  See 13.5.4.4 in the 1.2 IB
spec.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-22 17:54:02 -08:00
Roland Dreier
aaf1aef55f IB/uverbs: Return correct error for invalid PD in register MR
If no matching PD is found in ib_uverbs_reg_mr(), then the function
jumps to err_release without setting the return value ret.  This means
that ret will hold the return value of the call to ib_umem_get() a few
lines earlier; if the function reaches the point where it looks for
the PD, we know that ib_umem_get() must have returned 0, so
ib_uverbs_reg_mr() ends up return 0 for a bad PD ID.  Fix this by
setting ret to -EINVAL before jumping to the exit path when no PD is
found.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-22 13:16:51 -08:00
Roland Dreier
7084f8429c IB/core: Set static rate in ib_init_ah_from_path()
The static rate from the path record should be put into the address
vector -- a long time ago the rate in the address attributes needed to
be a relative rate, which required more munging, but now that the
conversion from absolute to relative is done in the low-level driver,
it's easy for ib_init_ah_from_path() to put the absolute rate in.

Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 15:31:24 -08:00
Roland Dreier
38abaa63bf IB/core: Fix sparse warnings about shadowed declarations
Change a couple of variable names to avoid sparse warnings about
symbols being shadowed.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 14:41:14 -08:00
Sean Hefty
c8f6a362bf RDMA/cma: Add multicast communication support
Extend rdma_cm to support multicast communication.  Multicast support
is added to the existing RDMA_PS_UDP port space, as well as a new
RDMA_PS_IPOIB port space.  The latter port space allows joining the
multicast groups used by IPoIB, which enables offloading IPoIB traffic
to a separate QP.  The port space determines the signature used in the
MGID when joining the group.  The newly added RDMA_PS_IPOIB also
allows for unicast operations, similar to RDMA_PS_UDP.

Supporting the RDMA_PS_IPOIB requires changing how UD QPs are initialized,
since we can no longer assume that the qkey is constant.  This requires
saving the Q_Key to use when attaching to a device, so that it is
available when creating the QP.  The Q_Key information is exported to
the user through the existing rdma_init_qp_attr() interface.

Multicast support is also exported to userspace through the rdma_ucm.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 14:29:07 -08:00
Sean Hefty
faec2f7b96 IB/sa: Track multicast join/leave requests
The IB SA tracks multicast join/leave requests on a per port basis and
does not do any reference counting: if two users of the same port join
the same group, and one leaves that group, then the SA will remove the
port from the group even though there is one user who wants to stay a
member left.  Therefore, in order to support multiple users of the
same multicast group from the same port, we need to perform reference
counting locally.

To do this, add an multicast submodule to ib_sa to perform reference
counting of multicast join/leave operations.  Modify ib_ipoib (the
only in-kernel user of multicast) to use the new interface.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 14:20:02 -08:00
Steve Wise
ebb90986e1 RDMA/iwcm: iw_cm_id destruction race fixes
iwcm iw_cm_id destruction race condition fixes:

- iwcm_deref_id() always wakes up if there's another reference.
- clean up race condition in cm_work_handler().
- create static void free_cm_id() which deallocs the work entries and then
  kfrees the cm_id memory.  This reduces code replication.
- rem_ref() if this is the last reference -and- the IWCM owns freeing the
  cm_id, then free it.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:35 -08:00
Tim Schmielau
cd354f1ae7 [PATCH] remove many unneeded #includes of sched.h
After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there.  Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.

To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.

Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm.  I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:54 -08:00
Linus Torvalds
93bbad8fe1 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mthca: Always fill MTTs from CPU
  IB/mthca: Merge MR and FMR space on 64-bit systems
  IB/mthca: Fix access to MTT and MPT tables on non-cache-coherent CPUs
  IB/mthca: Give reserved MTTs a separate cache line
  IB/mthca: Fix reserved MTTs calculation on mem-free HCAs
  RDMA/cxgb3: Add driver for Chelsio T3 RNIC
  IB: Remove redundant "_wq" from workqueue names
  RDMA/cma: Increment port number after close to avoid re-use
  IB/ehca: Fix memleak on module unloading
  IB/mthca: Work around gcc bug on sparc64
  IPoIB: Connected mode experimental support
  IB/core: Use ARRAY_SIZE macro for mandatory_table
  IB/mthca: Use correct structure size in call to memset()
2007-02-13 21:16:39 -08:00
Arjan van de Ven
2b8693c061 [PATCH] mark struct file_operations const 3
Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:45 -08:00
Sean Hefty
c7f743a669 IB: Remove redundant "_wq" from workqueue names
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:50 -08:00
Sean Hefty
aedec08050 RDMA/cma: Increment port number after close to avoid re-use
Randomize the starting port number and avoid re-using port values
immediately after they are closed.  Instead keep track of the last
port value used and increment it every time a new port number is
assigned, to better replicate other port spaces.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:50 -08:00
Ahmed S. Darwish
9a6b090c0d IB/core: Use ARRAY_SIZE macro for mandatory_table
Use ARRAY_SIZE() macro already defined in kernel.h instead of open
coding equivalent code.

Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:47 -08:00
Steve Wise
1f12667021 RDMA/addr: Handle ethernet neighbour updates during route resolution
The iWARP connection manager uses the ib_addr services to do route
resolution (neighbour discovery in the IP world).  The ib_addr
netevent callback routine, however, currently only acts on InfiniBand
neighbour updates.  It needs to act on ethernet neighbour updates as
well.

This patch just removes filtering on device type altogether and will
trigger on any neighour updates where the nud_type is valid.  This
simplifies the code some.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-04 14:11:57 -08:00
Michael S. Tsirkin
062dbb69f3 IB: Return qp pointer as part of ib_wc
struct ib_wc currently only includes the local QP number: this matches
the IB spec, but seems mostly useless. The following patch replaces
this with the pointer to qp itself, and updates all low level drivers
and all users.

This has the following advantages:
- Ability to get a per-qp context through wc->qp->qp_context
- Existing drivers already have the qp pointer ready in poll cq, so
  this change actually saves a tiny bit (extra memory read) on data path
  (for ehca it would actually be expensive to find the QP pointer when
  polling a CQ, but ehca does not support SRQ so we can leave wc->qp as
  NULL for ehca)
- Users that need the QP number can still get it through wc->qp->qp_num

Use case:

In IPoIB connected mode code, I have a common CQ shared by multiple
QPs.  To track connection usage, I need a way to get at some per-QP
context upon the completion, and I would like to avoid allocating
context object per work request just to stick a QP pointer into it.
With this code, I can just use wc->qp->qp_context.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-04 14:11:55 -08:00
Sean Hefty
0cefcf0bbc RDMA/ucma: Don't report events with invalid user context
There's a problem with how rdma cm events are reported to userspace
that can lead to application crashes.

When a new connection request arrives, a context for the connection is
allocated in the kernel.  The connection event is then reported to
userspace.  The userspace library retrieves the event and allocates
its own context for the connection.  The userspace context is
associated with the kernel's context when accepting.  This allows the
kernel to give userspace context with other events.

A problem occurs if a second event for the same connection occurs
before the user has had a chance to call accept.  The userspace
context has not yet been set, which causes the librdmacm to crash.
(This has been seen when the app takes too long to call accept,
resulting in the remote side timing out and rejecting the connection)

Fix this by ignoring events for new connections until userspace has
set their context.  This can only happen if an error occurs on a new
connection before the user accepts it.  This is okay, since the accept
will just fail later.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-07 20:20:08 -08:00
Sean Hefty
30a5ec982e RDMA/ucma: Fix struct ucma_event leak when backlog is full
We discard new connection requests while the listen backlog is full,
but leak a struct ucma_event in the process.  Free the structure in
this case.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-07 20:17:34 -08:00
Steve Wise
881a045fc5 RDMA/iwcm: iWARP connection timeouts shouldn't be reported as rejects
The iWARP CM should report timeouts as event RDMA_CM_EVENT_UNREACHABLE,
not event RDMA_CM_EVENT_REJECTED.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-07 20:15:58 -08:00
Ralph Campbell
1527106ff8 IB/core: Use the new verbs DMA mapping functions
Convert code in core/ to use the new DMA mapping functions for kernel
verbs consumers.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 14:28:30 -08:00
Sean Hefty
7521663857 RDMA/cma: Export rdma cm interface to userspace
Export the rdma cm interfaces to userspace via a misc device.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:22 -08:00
Sean Hefty
628e5f6d39 RDMA/cma: Add support for RDMA_PS_UDP
Allow the use of UD QPs through the rdma_cm, in order to provide
address translation services for resolving IB addresses for datagram
messages using SIDR.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:21 -08:00
Sean Hefty
0fe313b000 RDMA/cma: Allow early transition to RTS to handle lost CM messages
During connection establishment, the passive side of a connection can
receive messages from the active side before the connection event has
been delivered to the user.  Allow the passive side to send messages
in response to received data before the event is delivered.  To handle
the case where the connection messages are lost, a new rdma_notify()
function is added that users may invoke to force a connection into the
established state.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:21 -08:00
Sean Hefty
a1b1b61f80 RDMA/cma: Report connect info with connect events
Connection information was never given to the recipient of a
connection request or reply message.  Only the event was delivered.
Report the connection data with the event to allows user to
reject the connection based on the requested parameters, or adjust
their resources to match the request.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:21 -08:00
Sean Hefty
9b2e9c0c24 RDMA/cma: Remove unneeded qp_type parameter from rdma_cm
The qp_type parameter into the rdma_cm is unneeded, and can be
misleading.  The QP type should be determined from the port space.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:21 -08:00
Roland Dreier
f47e22c6e4 IB/fmr: ib_flush_fmr_pool() may wait too long
ib_flush_fmr_pool() stashes away the request generation number
properly, but then goes ahead and rereads it every time it tests
whether the flush generation number has caught up.  This means that
there is a theoretical possibility of livelock, if the request
generation number keeps getting bumped and the flush generation number
never catches up.  The fix is simple: use the request generation
number read at the beginning of the function.

Also, atomic_inc() followed by atomic_read() can be replaced with
atomic_int_return().  There's no real requirement for atomicity here
but we might as well shrink the code.

This bug was discovered using David Binderman's list of "set but never
used" warnings from icc.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:19 -08:00
Josef Sipek
1cfd6e648b [PATCH] struct path: convert infiniband
Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:46 -08:00
David Howells
4c1ac1b491 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/infiniband/core/iwcm.c
	drivers/net/chelsio/cxgb2.c
	drivers/net/wireless/bcm43xx/bcm43xx_main.c
	drivers/net/wireless/prism54/islpci_eth.c
	drivers/usb/core/hub.h
	drivers/usb/input/hid-core.c
	net/core/netpoll.c

Fix up merge failures with Linus's head and fix new compilation failures.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-12-05 14:37:56 +00:00
Michael S. Tsirkin
f469b2626f IB/ucm: Fix deadlock in cleanup
ib_ucm_cleanup_events() holds file_mutex while calling ib_destroy_cm_id().
This can deadlock since ib_destroy_cm_id() flushes event handlers, and
ib_ucm_event_handler() needs file_mutex, too.  Therefore, drop the
file_mutex during the call to ib_destroy_cm_id().

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:10 -08:00
Sean Hefty
e1444b5a16 IB/cm: Fix automatic path migration support
The ib_cm_establish() function is replaced with a more generic
ib_cm_notify().  This routine is used to notify the CM that failover
has occurred, so that future CM messages (LAP, DREQ) reach the remote
CM.  (Currently, we continue to use the original path)  This bumps the
userspace CM ABI.

New alternate path information is captured when a LAP message is sent
or received.  This allows QP attributes to be initialized for the user
when a new path is loaded after failover occurs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:10 -08:00
Roland Dreier
04699a1f86 RDMA/addr: list_move() cleanups
Replace a couple list_del()/list_add() combos with list_move().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:09 -08:00
Krishna Kumar
c78bb8442b RDMA/addr: Fix some cancellation problems in process_req()
Fix following problems in process_req() relating to cancellation:

- Function is wrongly doing another addr_remote() when cancelled,
  which is not required.
- Make failure reporting immediate by using time_after_eq().
- On cancellation, -ETIMEDOUT was returned to the callback routine
  instead of the more appropriate -ECANCELLED (users getting notified
  may want to print/return this status, eg ucma_event_handler).

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:09 -08:00
Krishna Kumar
9ab1ffa877 RDMA/iwcm: Fix comment for iwcm_deref_id() to match code
In iwcm_deref_id(), the comment says : "If the last reference is being
removed and iw_destroy_cm_id is waiting, wake up the waiting
thread". The second part of the comment, "and iw_destroy_cm_id is
waiting," is wrong, since this function either wakes the waiter
already waiting in iwcm_deref_id, or enables it (so that when
wait_for_completion() is performed later, it will immediately return).

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:08 -08:00
Krishna Kumar
715a588f42 RDMA/iwcm: Remove unnecessary function argument
Remove unnecessary cm_id_priv argument to copy_private_data(), and
change text to reflect the code.  Fix couple of typos in comments.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:08 -08:00
Krishna Kumar
13fccdb380 RDMA/iwcm: Remove unnecessary initializations
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:08 -08:00
Krishna Kumar
83b9658623 RDMA/iwcm: Fix memory leak
If we get IW_CM_EVENT_CONNECT_REQUEST message and encounter an error
(not in the LISTEN state, cannot create an id, cannot alloc
work_entry, etc), then the memory allocated by cm_event_handler() in
the event->private_data gets leaked. Since cm_work_handler has already
put the event on the work_free_list, this allocated memory is
leaked. High backlog value can allow DoS attacks.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:07 -08:00
Krishna Kumar
33ba0fa9f3 RDMA/iwcm: Fix memory corruption bug in cm_work_handler()
Possible memory corruption scenario: after putting the work entry back
on the work_free_list, we call process_event() which dereferences
work->event, which could have been modified to another value
meanwhile.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:07 -08:00
Roland Dreier
e54f81889c IB: Convert kmem_cache_t -> struct kmem_cache
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:07 -08:00
Dotan Barak
e31353eaec RDMA/cm: Remove setting local write as part of QP access flags
The qp_access_flags are for remote access permissions only, so
IB_ACCESS_LOCAL_WRITE is an invalid value.  Remove it from the values
set by cm_init_qp_init_attr() and cma_init_ib_qp().

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:06 -08:00
Eric Sesterhenn
bed8bdfddd IB: kmemdup() cleanup
Replace open coded kmemdup() to save some screen space, and allow
inlining/not inlining to be triggered by gcc.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:06 -08:00
Krishna Kumar
a1a733f65b RDMA/cma: Rewrite cma_req_handler() to encapsulate common code
Rewrite cma_req_handler error handling case to encapsulate
common code.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:05 -08:00
Krishna Kumar
f115db4803 RDMA/addr: Use time_after_eq() instead of time_after() in queue_req()
In queue_req(), use time_after_eq() instead of time_after()
for following reasons :

- Improves insert time if multiple entries with same time are
  present.
- set_timeout need not be called if entry with same time
  is added to the list (and that happens to be the entry
  with the smallest time), saving atomic/locking operations.
- Earlier entries with same time are deleted first (fifo).

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:05 -08:00
Krishna Kumar
e4022274cf RDMA/cma: Remove redundant check in cma_add_one
Remove redundant check of node_guid in cma_add_one().

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:04 -08:00
Krishna Kumar
e82153b54d RDMA/cma: Optimize cma_bind_loopback() to check for empty list
Optimize to test for an empty list first.  This ends up simplifying
the code too.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:04 -08:00
David Howells
c4028958b6 WorkStruct: make allyesconfig
Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:57:56 +00:00
Roland Dreier
39798695b4 IB/mad: Fix race between cancel and receive completion
When ib_cancel_mad() is called, it puts the canceled send on a list
and schedules a "flushed" callback from process context.  However,
this leaves a window where a receive completion could be processed
before the send is fully flushed.

This is fine, except that ib_find_send_mad() will find the MAD and
return it to the receive processing, which results in the sender
getting both a successful receive and a "flushed" send completion for
the same request.  Understandably, this confuses the sender, which is
expecting only one of these two callbacks, and leads to grief such as
a use-after-free in IPoIB.

Fix this by changing ib_find_send_mad() to return a send struct only
if the status is still successful (and not "flushed").  The search of
the send_list already had this check, so this patch just adds the same
check to the search of the wait_list.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-13 09:38:07 -08:00
Sean Hefty
7a118df3ea RDMA/addr: Use client registration to fix module unload race
Require registration with ib_addr module to prevent caller from
unloading while a callback is in progress.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-02 14:26:04 -08:00
Jack Morgenstein
0b26c88f29 IB/uverbs: Return sq_draining value in query_qp response
Return the sq_draining value back to user space for query_qp instead
of the en_sqd_async notify value, which is valid only for
modify_qp.  For query_qp, the draining status should returned.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-30 21:19:35 -08:00
Krishna Kumar
255d0c14b3 RDMA/cma: rdma_bind_addr() leaks a cma_dev reference count
rdma_bind_addr() leaks a cma_dev reference count in failure case.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2006-10-30 20:52:51 -08:00
Sean Hefty
82a9c16a10 IB/cm: Send DREP in response to unmatched DREQ
Currently a DREP is only sent in response to a DREQ if a connection
has been found matching the DREQ, and it is in the proper state.  Once
a DREP is sent, the local connection moves into timewait.  Duplicate
DREQs received while in this state result in re-sending the DREP.

However, it's likely that the local connection will enter and exit
timewait before the remote side times out a lost DREP and resends a DREQ.
To handle this, we send a DREP in response to a DREQ, even if a local
connection is not found.  This avoids maintaining disconnected
id's in timewait states for excessively long times, just to handle a
lost DREP.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-10 12:50:38 -07:00
Sean Hefty
8575329d4f IB/cm: Fix timewait crash after module unload
If the ib_cm module is unloaded while id's are still in timewait, the
CM will destroy the work queue used to process timewait.  Once the
id's exit timewait, their timers will fire, leading to a crash trying
to access the destroyed work queue.

We need to track id's that are in timewait, and cancel their deferred
work on module unload.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-10 12:50:38 -07:00
Krishna Kumar
3f168d2b66 RDMA/cma: Optimize error handling
Reorganize code relating to cma_get_net_info() and rdam_create_id() to
optimize error case handling (no need to alloc memory/etc. as part of
rdma_create_id() if input parameters are wrong).

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-02 14:52:16 -07:00
Krishna Kumar
94de178ac6 RDMA/cma: Eliminate unnecessary remove_list
Eliminate remove_list by using list_del_init() instead during device
removal handling.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-02 14:52:16 -07:00
Sean Hefty
8f0472d331 RDMA/cma: Set status correctly on route resolution error
On reporting a route error, also include the status for the error,
rather than indicating a status of 0 when an error has occurred.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-02 14:52:15 -07:00
Krishna Kumar
6e35aabee1 RDMA/cma: Fix device removal race
The race is as follows:

A process : cma_process_remove() calls cma_remove_id_dev(),
	    which sets id state to CMA_DEVICE_REMOVAL and
	    calls wait_event(dev_remove).

B process : cma_req_handler() had incremented dev_remove,
	    and calls cma_acquire_ib_dev() and on failure
	    calls cma_release_remove(), which does a
	    wake_up of cma_process_remove(). Then
	    cma_req_handler() calls rdma_destroy_id();

A Process : cma_remove_id_dev() gets woken and checks the
	    state of id, and since it is still (wrongly)
	    CMA_DEVICE_REMOVAL, it calls notify_user(id)
	    and if that fails, the caller - cma_process_remove()
	    calls rdma_destroy_id(id). Two processes can
	    call rdma_destroy_id(), resulting in one
	    de-referencing kfreed id_priv.

Fix is for process B to set CMA_DESTROYING in cma_req_handler()
so that process A will return instead of doing a rdma_destroy_id().

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-02 14:52:15 -07:00
Krishna Kumar
675a027c3d RDMA/cma: Fix leak of cm_ids in case of failures
cma_connect_ib() and cma_connect_iw() leak cm_id's in failure cases.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-02 14:52:15 -07:00
Al Viro
60cad5da57 [IPV4]: annotate inetdev.h helpers
inet_confirm_addr(), inet_ifa_byprefix(), ip_dev_find(), inet_make_mask() and
inet_ifa_match() annotated, along with inferred net-endian variables

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:01:05 -07:00
Alexey Dobriyan
1a1d92c10d [PATCH] Really ignore kmem_cache_destroy return value
* Rougly half of callers already do it by not checking return value
* Code in drivers/acpi/osl.c does the following to be sure:

	(void)kmem_cache_destroy(cache);

* Those who check it printk something, however, slab_error already printed
  the name of failed cache.
* XFS BUGs on failed kmem_cache_destroy which is not the decision
  low-level filesystem driver should make. Converted to ignore.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:10 -07:00
Al Viro
d7b2004528 [PATCH] missing includes from infiniband merge
indirect chains of includes are arch-specific and can't
be relied upon...  (hell, even attempt to build it for
itanic would trigger vmalloc.h ones; err.h triggers
on e.g. alpha).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-23 11:34:43 -07:00
Krishna Kumar
9cd330d36b IB: Fix typo in kerneldoc for ib_set_client_data()
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:58 -07:00
Michael S. Tsirkin
a70d059009 IB/cm: Do not track remote QPN in timewait state
Do not track remote QPN in TimeWait state, since QP is not connected.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:53 -07:00
Michael S. Tsirkin
c1a0b23bf4 IB/sa: Require SA registration
Require users to register with SA module, to prevent the sa_query
module text from going away while an SA query callback is still
running.  Update all in-tree users for the new interface.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:53 -07:00
Sean Hefty
61a73c708f RDMA/cma: Protect against adding device during destruction
Closes a window where address resolution can attach an rdma_cm_id to a
device during destruction of the rdma_cm_id.  This can result in the
rdma_cm_id remaining in the device list after its memory has been
freed.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:49 -07:00
Tom Tucker
07ebafbaaa RDMA: iWARP Core Changes.
Modifications to the existing rdma header files, core files, drivers,
and ulp files to support iWARP, including:
 - Hook iWARP CM into the build system and use it in rdma_cm.
 - Convert enum ib_node_type to enum rdma_node_type, which includes
   the possibility of RDMA_NODE_RNIC, and update everything for this.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:47 -07:00
Tom Tucker
922a8e9fb2 RDMA: iWARP Connection Manager.
Add an iWARP Connection Manager (CM), which abstracts connection
management for iWARP devices (RNICs).  It is a logical instance of the
xx_cm where xx is the transport type (ib or iw).  The symbols exported
are used by the transport independent rdma_cm module, and are
available also for transport dependent ULPs.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:46 -07:00
Roland Dreier
3cd965646b IB: Whitespace fixes
Remove some trailing whitespace that has snuck in despite the best
efforts of whitespace=error-all.  Also fix a few other whitespace
bogosities.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:46 -07:00
Sean Hefty
f06d265375 IB/cm: Randomize starting comm ID
Randomize the starting local comm ID to avoid getting a rejected
connection due to a stale connection after a system reboot or
reloading of the ib_cm.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:45 -07:00
James Lentini
2b3e258e5d IB/mad: Remove unused includes
The ib_mad module does not use a kthread function, but mad_priv.h
includes <linux/kthread.h>.  mad_rmpp.c does not do any DMA-related
stuff, but includes <linux/dma-mapping.h>.  Remove the unused includes.

Signed-off-by: James Lentini <jlentini@netapp.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:44 -07:00
Sean Hefty
75ab13443e IB/mad: Add support for dual-sided RMPP transfers.
The implementation assumes that any RMPP request that requires a
response uses DS RMPP.  Based on the RMPP start-up scenarios defined
by the spec, this should be a valid assumption.  That is, there is no
start-up scenario defined where an RMPP request is followed by a
non-RMPP response.  By having this assumption we avoid any API
changes.

In order for a node that supports DS RMPP to communicate with one that
does not, RMPP responses assume a new window size of 1 if a DS ACK has
not been received.  (By DS ACK, I'm referring to the turn-around ACK
after the final ACK of the request.)  This is a slight spec deviation,
but is necessary to allow communication with nodes that do not
generate the DS ACK.  It also handles the case when a response is sent
after the request state has been discarded.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:44 -07:00
Sean Hefty
76842405fc IB/cm: Use correct reject code for invalid GID
Set the reject code properly when rejecting a request that contains an
invalid GID.  A suitable GID is returned by the IB CM in the
additional reject information (ARI).  This is a spec compliancy issue.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:43 -07:00
Sean Hefty
c1f250c0b4 IB/cm: Enable atomics along with RDMA reads
Enable atomic operations along with RDMA reads if a local RDMA
read/atomic depth is provided by the user.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:42 -07:00
Ralph Campbell
9bc57e2d19 IB/uverbs: Pass userspace data to modify_srq and modify_qp methods
Pass a struct ib_udata to the low-level driver's ->modify_srq() and
->modify_qp() methods, so that it can get to the device-specific data
passed in by the userspace driver.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:25 -07:00
Ralph Campbell
64f817ba98 IB/uverbs: Allow resize CQ operation to return driver-specific data
Add a ib_uverbs_resize_cq_resp.driver_data field so that low-level
drivers can return data from a resize CQ operation to userspace.  Have
ib_uverbs_resize_cq() only copy the cqe field, to avoid having to bump
the userspace ABI.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:24 -07:00
Roland Dreier
1ccf6aa19a IB/uverbs: Fix lockdep warning when QP is created with 2 CQs
Lockdep warns when userspace creates a QP that uses different CQs for
send completions and receive completions, because both CQs are locked
and their mutexes belong to the same lock class.  However, we know
that the mutexes are distinct and the nesting is safe (there is no
possibility of AB-BA deadlock because the mutexes are locked with
down_read()), so annotate the situation with SINGLE_DEPTH_NESTING to
get rid of the lockdep warning.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:17:20 -07:00
Roland Dreier
ab10867621 IB/uverbs: Use idr_read_cq() where appropriate
There were two functions that open-coded idr_read_cq() in terms of
idr_read_uobj() rather than using the helper.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:17:19 -07:00
Michael S. Tsirkin
d5bb75999c RDMA/cma: Increase the IB CM retry count in CMA
3 seems like a low number of IB Communication Manager retries to set;
we see connections failing under stress, and in any case 3 just looks
like an arbitrary number.  15 is the max value allowed by the
InfiniBand spec.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-14 13:55:30 -07:00
Jack Morgenstein
acaea9ee46 IB/core: Fix SM LID/LID change with client reregister set
After commit 12bbb2b7be, when SM LID
change or LID change MAD also has a client reregistration bit set,
only CLIENT_REREGISTER event is generated.

As a result, the sa_query module and the cache module don't update the
port information, and ULPs (e.g. IPoIB) stop working.  This is the
regression we observe as compared to 2.6.17.

Rather than generate multiple events (which would have negative
performance impact), let us simply let cache and SA query respond to
reregister event in the same way as to LID and SM change events.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-08-16 09:54:47 -07:00
Jack Morgenstein
fd60ae404f IB/uverbs: Avoid a crash on device hot remove
Wait until all users have closed their device context before allowing
device unregistration to complete.  This prevents a crash caused by
referring to stale data structures.

A better solution would be to have a way to revoke contexts rather
than waiting for userspace to close the context, but that's a much
bigger change that will have to wait.  For now let's at least avoid
the crash.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-08-03 10:56:42 -07:00
Sean Hefty
75df23e229 IB/cm: Fix error handling in ib_send_cm_req
Report error code rather than success (0) on failure allocating
timewait_info in ib_send_cm_req().

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-08-03 09:44:22 -07:00
Tom Tucker
e795d09250 [NET] infiniband: Cleanup ib_addr module to use the netevents
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 13:38:22 -07:00
Sean Hefty
2527e681fd IB/mad: Validate MADs for spec compliance
Validate MADs sent by userspace clients for spec compliance with
C13-18.1.1 (prevent duplicate requests and responses sent on the
same port).  Without this, RMPP transactions get aborted because
of duplicate packets.

This patch is similar to that provided by Jack Morgenstein.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-07-24 09:18:07 -07:00
Roland Dreier
43db2bc044 IB/uverbs: Fix lockdep warnings
Lockdep warns because uverbs is trying to take uobj->mutex when it
already holds that lock.  This is because there are really multiple
types of uobjs even though all of their locks are initialized in
common code.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-07-23 15:16:04 -07:00
Michael S. Tsirkin
ec924b4726 IB/uverbs: Fix unlocking in error paths
ib_uverbs_create_ah() and ib_uverbs_create_srq() did not release the
PD's read lock in their error paths, which lead to deadlock when
destroying the PD.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-07-23 15:16:03 -07:00
Michael S. Tsirkin
e322fedf0c [PATCH] IB/core: use correct gfp_mask in sa_query
Avoid bogus out of memory errors: fix sa_query to actually pass gfp_mask
supplied by the user to idr_pre_get.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Acked-by: "Sean Hefty" <mshefty@ichips.intel.com>
Acked-by: "Roland Dreier" <rdreier@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:51 -07:00
Michael S. Tsirkin
adfaa888a2 [PATCH] fmr pool: remove unnecessary pointer dereference
ib_fmr_pool_map_phys gets the virtual address by pointer but never writes
there, and users (e.g.  srp) seem to assume this and ignore the value
returned.  This patch cleans up the API to get the VA by value, and updates
all users.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Acked-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:51 -07:00
Ira Weiny
74f76fbac7 [PATCH] IB/cm: set private data length for reject messages
Set private data length for reject messages to the correct size.  Fix from
openib svn r8483.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:50 -07:00
Michael S. Tsirkin
f0ee3404cc [PATCH] IB/addr: gid structure alignment fix
The device address contains unsigned character arrays, which contain raw GID
addresses.  The GIDs may not be naturally aligned, so do not cast them to
structures or unions.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:50 -07:00
Michael S. Tsirkin
04c335430f [PATCH] IB/cm: drop REQ when out of memory
If a user of the IB CM returns -ENOMEM from their connection callback, simply
drop the incoming REQ - do not attempt to send a reject.  This should allow
the sender to retry the request.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:50 -07:00
Sean Hefty
0d8fdfd71b IB/core: Set alternate port number when initializing QP attributes
Set alternate port number when initializing QP attributes.  This bug
is OpenFabrics bugzilla bug #160.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-30 14:10:14 -07:00
Roland Dreier
146d26b2bf IB/uverbs: Set correct user handle for user SRQs
Store away the user handle passed in from userspace when creating an
SRQ, so that the kernel can return the correct handle when an SRQ
asynchronous event occurs.  (A 0 was incorrectly stored as the user
handle as part of the changes in 9ead190b, "IB/uverbs: Don't serialize
with ib_uverbs_idr_mutex")

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-30 13:40:13 -07:00
Eric Sesterhenn
5fd571cbc1 [PATCH] Array overrun in drivers/infiniband/core/cma.c
This was spotted by coverity #id 1300.  Since the array has only four
elements, we should just use those four.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 11:57:28 -07:00
Akinobu Mita
179e09172a [PATCH] drivers: use list_move()
This patch converts the combination of list_del(A) and list_add(A, B) to
list_move(A, B) under drivers/.

Acked-by: Corey Minyard <minyard@mvista.com>
Cc: Ben Collins <bcollins@debian.org>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Alasdair Kergon <dm-devel@redhat.com>
Cc: Gerd Knorr <kraxel@bytesex.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frank Pavlic <fpavlic@de.ibm.com>
Acked-by: Matthew Wilcox <matthew@wil.cx>
Cc: Andrew Vasquez <linux-driver@qlogic.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:18 -07:00
Linus Torvalds
61b9175808 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IB/iser: iSER Kconfig and Makefile
  IB/iser: iSER handling of memory for RDMA
  IB/iser: iSER RDMA CM (CMA) and IB verbs interaction
  IB/iser: iSER initiator iSCSI PDU and TX/RX
  IB/iser: iSCSI iSER transport provider high level code
  IB/iser: iSCSI iSER transport provider header file
  IB/uverbs: Remove unnecessary list_del()s
  IB/uverbs: Don't free wr list when it's known to be empty
2006-06-25 16:07:58 -07:00
David Howells
454e2398be [PATCH] VFS: Permit filesystem to override root dentry on mount
Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.

The filesystem is then required to manually set the superblock and root dentry
pointers.  For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).

The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.

This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing.  In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.

The patch also makes the following changes:

 (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
     pointer argument and return an integer, so most filesystems have to change
     very little.

 (*) If one of the convenience function is not used, then get_sb() should
     normally call simple_set_mnt() to instantiate the vfsmount. This will
     always return 0, and so can be tail-called from get_sb().

 (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
     dcache upon superblock destruction rather than shrink_dcache_anon().

     This is required because the superblock may now have multiple trees that
     aren't actually bound to s_root, but that still need to be cleaned up. The
     currently called functions assume that the whole tree is rooted at s_root,
     and that anonymous dentries are not the roots of trees which results in
     dentries being left unculled.

     However, with the way NFS superblock sharing are currently set to be
     implemented, these assumptions are violated: the root of the filesystem is
     simply a dummy dentry and inode (the real inode for '/' may well be
     inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
     with child trees.

     [*] Anonymous until discovered from another tree.

 (*) The documentation has been adjusted, including the additional bit of
     changing ext2_* into foo_* in the documentation.

[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:45 -07:00
Roland Dreier
9b8efc0242 IB/uverbs: Remove unnecessary list_del()s
In ib_uverbs_cleanup_ucontext(), when iterating through the lists of
objects, there's no reason to do list_del() to remove the objects,
since both the objects and the lists that contain them are about to be
freed anyway.  Since list_del() is a moderately big inline function,
getting rid of this extra work saves quite a bit of .text:

add/remove: 0/0 grow/shrink: 1/2 up/down: 3/-217 (-214)
function                                     old     new   delta
ib_uverbs_comp_handler                       225     228      +3
ib_uverbs_async_handler                      256     255      -1
ib_uverbs_close                              905     689    -216

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-22 07:47:27 -07:00
Krishna Kumar
183208284e IB/uverbs: Don't free wr list when it's known to be empty
In ib_uverbs_post_send(), move the "out:" label after the loop that
frees the list of work requests, since the only place that jumps there
is before any work requests could possibly be added to the list.

This removes a compile warning: "is_ud might be used uninitialized in
this function".

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-22 07:47:27 -07:00
Roland Dreier
9ead190bfd IB/uverbs: Don't serialize with ib_uverbs_idr_mutex
Currently, all userspace verbs operations that call into the kernel
are serialized by ib_uverbs_idr_mutex.  This can be a scalability
issue for some workloads, especially for devices driven by the ipath
driver, which needs to call into the kernel even for datapath
operations.

Fix this by adding reference counts to the userspace objects, and then
converting ib_uverbs_idr_mutex into a spinlock that only protects the
idrs long enough to take a reference on the object being looked up.
Because remove operations may fail, we have to do a slightly funky
two-step deletion, which is described in the comments at the top of
uverbs_cmd.c.

This also still leaves ib_uverbs_idr_lock as a single lock that is
possibly subject to contention.  However, the lock hold time will only
be a single idr operation, so multiple threads should still be able to
make progress, even if ib_uverbs_idr_lock is being ping-ponged.

Surprisingly, these changes even shrink the object code:

add/remove: 23/5 grow/shrink: 4/21 up/down: 633/-693 (-60)

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:44:49 -07:00
Roland Dreier
3463175d6e IB/uverbs: Factor out common idr code
Factor out common code for adding a userspace object to an idr into a
function idr_add_uobj().  This shrinks both the source and object code:

add/remove: 1/0 grow/shrink: 0/6 up/down: 57/-220 (-163)
function                                     old     new   delta
idr_add_uobj                                   -      57     +57
ib_uverbs_create_ah                          543     512     -31
ib_uverbs_create_srq                         662     630     -32
ib_uverbs_reg_mr                             737     699     -38
ib_uverbs_create_cq                          639     600     -39
ib_uverbs_alloc_pd                           485     446     -39
ib_uverbs_create_qp                         1020     979     -41

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:40 -07:00
Roland Dreier
92b1582268 IB/uverbs: Don't decrement usecnt on error paths
In error paths when destroying an object, uverbs should not decrement
associated objects' usecnt, since ib_dereg_mr(), ib_destroy_qp(),
etc. already do that.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:40 -07:00
Ganapathi CH
77f76013e3 IB/uverbs: Release lock on error path
If ibdev->alloc_ucontext() fails then ib_uverbs_get_context() does not
unlock file->mutex before returning error.

Signed-off by: Ganapathi CH <cganapathi@novell.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:40 -07:00
Sean Hefty
ca222c6b2c IB/cm: Use address handle helpers
Use new ib_init_ah_from_wc() and ib_init_ah_from_path() helper
functions to clean up the IB CM.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:40 -07:00
Sean Hefty
6d969a471b IB/sa: Add ib_init_ah_from_path()
Add a call to initialize address handle attributes given a path record.
This is used by the CM, and would be useful for users of UD QPs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:39 -07:00
Sean Hefty
4e00d69454 IB: Add ib_init_ah_from_wc()
Add a function to initialize address handle attributes from a work
completion.  This functionality is duplicated by both verbs and the CM.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:39 -07:00
Sean Hefty
75af908851 IB/ucm: Get rid of duplicate P_Key parameter
The P_Key is provided into a SIDR REQ in two places, once as a
parameter, and again in the path record.  Remove the P_Key as a
parameter and always use the one given in the path record.

This change has no practical effect on ABI functionality.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:39 -07:00
Or Gerlitz
6c8c1aa25d IB/fmr: Use device's max_map_map_per_fmr attribute in FMR pool.
When creating a FMR pool, query the IB device and use the returned
max_map_map_per_fmr attribute as for the max number of FMR remaps. If
the device does not suport querying this attribute, use the original
IB_FMR_MAX_REMAPS (32) default.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:37 -07:00
Jack Morgenstein
9874e74655 IB/mad: Check GID/LID when matching requests
Check GID/LID for requester side when searching for request which
matches received response.  This is in order to guarantee uniqueness
if the same TID is used when requesting via multiple source LIDs (when
LMC is not zero).  Use ports' cached LMC to perform the check.

Further, do not perform LID check for direct-routed packets, since
the permissive LID makes a proper check impossible.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:34 -07:00
Jack Morgenstein
6fb9cdbf2c IB: Add caching of ports' LMC
Add an LMC cache to struct ib_device, and add a function
ib_get_cached_lmc() to query the cache.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:34 -07:00
Michael S. Tsirkin
856c256f88 IB/cm: remove unneeded flush_workqueue
destroy_workqueue() already does flush_workqueue().

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2006-06-17 20:37:33 -07:00
Sean Hefty
4be10c1e6d IB/ucm: convert semaphore to mutex
Convert semaphore in ib_ucm_file to a real mutex.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:33 -07:00
Roland Dreier
403a496fd4 IB: Make needlessly global ib_mad_cache static
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:31 -07:00
Sean Hefty
e51060f08a IB: IP address based RDMA connection manager
Kernel connection management agent over InfiniBand that connects based
on IP addresses.  The agent defines a generic RDMA connection
abstraction to support clients wanting to connect over different RDMA
devices.

The agent also handles RDMA device hotplug events on behalf of clients.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:29 -07:00
Sean Hefty
7025fcd36b IB: address translation to map IP toIB addresses (GIDs)
Add an address translation service that maps IP addresses to
InfiniBand GID addresses using IPoIB.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:28 -07:00
Sean Hefty
6e61d04f2d IB/cm: Match connection requests based on private data
Extend matching connection requests to listens in the InfiniBand CM to
include private data checks.

This allows applications to listen on the same service identifier,
with private data directing the request to the appropriate application.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:28 -07:00
Sean Hefty
6a9af2e18a IB: common handling for marshalling parameters to/from userspace
Provide common handling for marshalling data between userspace clients
and kernel InfiniBand drivers.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:27 -07:00
Roland Dreier
0cb4fe8d26 IB/uverbs: Don't leak ref to mm on error path
In ib_umem_release_on_close(), if the kmalloc() fails, then a
reference to current->mm will be leaked.  Fix this by adding a mmput()
instead of just returning on kmalloc() failure.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-17 22:20:50 -07:00
Sean Hefty
1b52fa98ed IB: refcount race fixes
Fix race condition during destruction calls to avoid possibility of
accessing object after it has been freed.  Instead of waking up a wait
queue directly, which is susceptible to a race where the object is
freed between the reference count going to 0 and the wake_up(), use a
completion to wait in the function doing the freeing.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-12 14:57:52 -07:00
Ralph Campbell
d8b9f23b23 IB: Fix display of 4-bit port counters in sysfs
The code to display local_link_integrity_errors and
excessive_buffer_overrun_errors in
/sys/class/infiniband/<hca>/ports/<n>/counters/
uses the wrong shift to extract the 4 bit values.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-09 10:50:28 -07:00
Hal Rosenstock
64cb9c6aff IB/mad: Fix RMPP version check during agent registration
Only check that RMPP version is not specified when MAD class does not
support RMPP.  Just because a class is allowed to use RMPP doesn't
mean that rmpp_version needs to be set for the MAD agent to
register. Checking this was a recent change which was too pedantic.

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-19 11:40:11 -07:00
Michael S. Tsirkin
ce684df05a IB/cache: Use correct pointer to calculate size
When allocating gid_cache, use kmalloc(sizeof *gid_cache, ...) rather
than kmalloc(sizeof *pkey_cache, ...).  It doesn't really matter which
one is used, since the size ends up the same either way, but it's much
better to say what we mean.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 13:17:43 -07:00
Jack Morgenstein
bf6a9e31cf IB: simplify static rate encoding
Push translation of static rate to HCA format into low-level drivers,
where it belongs.  For static rate encoding, use encoding of rate
field from IB standard PathRecord, with addition of value 0, for
backwards compatibility with current usage.  The changes are:

 - Add enum ib_rate to midlayer includes.
 - Get rid of static rate translation in IPoIB; just use static rate
   directly from Path and MulticastGroup records.
 - Update mthca driver to translate absolute static rate into the
   format used by hardware.  This also fixes mthca's static rate
   handling for HCAs that are capable of 4X DDR.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 09:43:47 -07:00
Michael S. Tsirkin
37289efe3e IB/mad: fix oops in cancel_mads
We have seen the following OOPs in cancel_mads, when restarting opensm
multiple times:

    Call Trace:
      [<c010549b>] show_stack+0x9b/0xb0
      [<c01055ec>] show_registers+0x11c/0x190
      [<c01057cd>] die+0xed/0x160
      [<c031b966>] do_page_fault+0x3f6/0x5d0
      [<c010511f>] error_code+0x4f/0x60
      [<f8ac4e38>] cancel_mads+0x128/0x150 [ib_mad]
      [<f8ac2811>] unregister_mad_agent+0x11/0x130 [ib_mad]
      [<f8ac2a12>] ib_unregister_mad_agent+0x12/0x20 [ib_mad]
      [<f8b10f23>] ib_umad_close+0xf3/0x130 [ib_umad]
      [<c0162937>] __fput+0x187/0x1c0
      [<c01627a9>] fput+0x19/0x20
      [<c0160f7a>] filp_close+0x3a/0x60
      [<c0121ca8>] put_files_struct+0x68/0xa0
      [<c0103cf7>] do_signal+0x47/0x100
      [<c0103ded>] do_notify_resume+0x3d/0x40
      [<c0103f9e>] work_notifysig+0x13/0x25

We traced this back to local_completions unlocking mad_agent_priv->lock
while still keeping a pointer into local_list. A later call to
list_del(&local->completion_list) would then corrupt the list.

To fix this, remove the entry from local_list after looking it up but
before releasing mad_agent_priv->lock, to prevent cancel_mads from
finding and freeing it.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-02 14:39:19 -07:00
Hal Rosenstock
618a3c03fc IB/mad: RMPP support for additional classes
Add RMPP support for additional management classes that support it.
Also, validate RMPP is consistent with management class specified.

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-30 07:19:51 -08:00
Jack Morgenstein
fa9656bbd9 IB/mad: include GID/class when matching receives
Received responses are currently matched against sent requests based
on TID only.  According to the spec, responses should match based on
the combination of TID, management class, and requester LID/GID.

Without the additional qualification, an agent that is responding to
two requests, both of which have the same TID, can match RMPP ACKs
with the incorrect transaction.  This problem can occur on the SM node
when responding to SA queries.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-30 07:19:48 -08:00
Michael S. Tsirkin
dc05980dd7 IB/mad: Fix oopsable race on device removal
Fix an oopsable race debugged by Eli Cohen <eli@mellanox.co.il>:
After removing the port from port_list, ib_mad_port_close flushes
port_priv->wq before destroying the special QPs. This means that a
completion event could arrive, and queue a new work in this work queue
after flush.

This patch also removes an unnecessary flush_workqueue():
destroy_workqueue() already includes a flush.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:25 -08:00
Roland Dreier
048975ac58 IB: Coverity fixes to sysfs.c
Fix two bugs found by coverity:
 - Memory leak in error path of alloc_group_attrs()
 - Fencepost error in state_show(): the test should be < ARRAY_SIZE(),
   not <= ARRAY_SIZE().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:25 -08:00
Ami Perlmutter
702b2aaccf IB/uverbs: Use correct alt_pkey_index in modify QP
The old code incorrectly used the primary P_Key index as the alternate
index too.

Signed-off-by: Ami Perlmutter <amip@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:24 -08:00
Jack Morgenstein
f36e1793e2 IB/umad: Add support for large RMPP transfers
Add support for sending and receiving large RMPP transfers.  The old
code supports transfers only as large as a single contiguous kernel
memory allocation.  This patch uses linked list of memory buffers when
sending and receiving data to avoid needing contiguous pages for
larger transfers.

  Receive side: copy the arriving MADs in chunks instead of coalescing
  to one large buffer in kernel space.

  Send side: split a multipacket MAD buffer to a list of segments,
  (multipacket_list) and send these using a gather list of size 2.
  Also, save pointer to last sent segment, and retrieve requested
  segments by walking list starting at last sent segment. Finally,
  save pointer to last-acked segment.  When retrying, retrieve
  segments for resending relative to this pointer.  When updating last
  ack, start at this pointer.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:23 -08:00
Sean Hefty
87fd1a11ae IB/cm: Check cm_id state before handling a REP
Move checking the state of a cm_id before modifying it when handling a
REP.  This fixes a bug seen under MPI scale-up testing, where a NULL
timewait_info pointer is dereferenced if a request times out before a
REP is received.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:23 -08:00
Dotan Barak
27d5630064 IB/uverbs: Fix query QP return of sq_sig_all
The old code didn't convert from the kernel's enum correctly.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:21 -08:00
Dotan Barak
4546d31d84 IB: Fix modify QP checking of "current QP state" attribute
According to the IB spec version 1.2, section 11.2.4.2, the current
table has a couple of mistakes where it allows the current QP state
(IB_QP_CUR_STATE) attribute.  For the transitions:

  RTS -> RTS: IB_QP_CUR_STATE should be allowed for all transports
  SQD -> SQD: IB_QP_CUR_STATE should never be allowed

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:20 -08:00
Dotan Barak
ea88fd16d6 IB/uverbs: Return actual capacity from create SRQ operation
Pass actual capacity of created SRQ back to userspace, so that
userspace can report accurate capacities.  This requires an ABI bump,
to change struct ib_uverbs_create_srq_resp.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:16 -08:00
Dotan Barak
8bdb0e8632 IB/uverbs: Support for query SRQ from userspace
Add support to uverbs to handle querying userspace SRQs (shared
receive queues), including adding an ABI for marshalling requests and
responses.  The kernel midlayer already has the underlying
ib_query_srq() function.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:14 -08:00
Dotan Barak
7ccc9a24e0 IB/uverbs: Support for query QP from userspace
Add support to uverbs to handle querying userspace QPs (queue pairs),
including adding an ABI for marshalling requests and responses.  The
kernel midlayer already has the underlying ib_query_qp() function.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:14 -08:00
Roland Dreier
a74cd4af0b IB: Whitespace cleanups
Remove trailing whitespace and fix indentation that with spaces
instead of tabs.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:13 -08:00
Roland Dreier
8a51866f08 IB: Add ib_modify_qp_is_ok() library function
The in-kernel mthca driver contains a table of which attributes are
valid for each queue pair state transition.  It turns out that both
other IB drivers -- ipath and ehca -- which are being prepared for
merging have copied this table, errors and all.

To forestall this code duplication, move this table and the code to
check parameters against it into a midlayer library function,
ib_modify_qp_is_ok().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:12 -08:00
Ralph Campbell
5e9f71a16c IB/mad: Simplify SMI by eliminating smi_check_local_dr_smp()
The call to ib_get_agent_port() shouldn't be possible to fail when
smi_check_local_dr_smp() is called from ib_mad_recv_done_handler().
When it is called from handle_outgoing_dr_smp(), the device and
port_num come from mad_agent_priv so I assume the call to
ib_get_agent_port() shouldn't fail either.  In either case,
smi_check_local_smp() only uses the mad_agent pointer to check that
mad_agent->device->process_mad is not NULL.  The device pointer would
have to be the same as the one passed to smi_check_local_dr_smp()
since that pointer is used later instead of the one checked in
smi_check_local_smp().

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:11 -08:00
Ralph Campbell
5f0b67e0d5 IB/mad: Remove redundant check from smi_check_local_dr_smp()
smi_check_local_dr_smp() is called only from two places in core/mad.c
It returns 0 or 1.  In smi_check_local_dr_smp(), it checks for
a directed route SMP but this function is only called when the SMP
is a directed route so this is a NOP.

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:11 -08:00
Or Gerlitz
d36f34aadf IB: Enable FMR pool user to set page size
This patch allows the consumer to set the page size of "pages" mapped
by the pool FMRs, which is a feature already existing in the base
verbs API.  On the cosmetic side it changes ib_fmr_attr.page_size field
to be named page_shift.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:10 -08:00
Roland Dreier
c5bcbbb9fe IB: Allow userspace to set node description
Expose a writable "node_desc" sysfs attribute for InfiniBand devices.
This allows userspace to update the node description with information
such as the node's hostname, so that IB network management software
can tie its view to the real world.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:09 -08:00
Roland Dreier
33b9b3ee97 IB: Add userspace support for resizing CQs
Add support to uverbs to handle resizing userspace CQs (completion
queues), including adding an ABI for marshalling requests and
responses.  The kernel midlayer already has ib_resize_cq().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:07 -08:00
Linus Torvalds
f78cf0dc7b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband 2006-02-14 13:49:37 -08:00
Greg Kroah-Hartman
68f5f99634 [PATCH] IB: fix up major/minor sysfs interface for IB core
Current IB code doesn't work with userspace programs that listen only to
the kernel event netlink socket as it is trying to create its own dev
interface.  This small patch fixes this problem, and removes some
unneeded code as the driver core handles this logic for you
automatically.

Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-02-06 12:17:17 -08:00
Ralph Campbell
8cf3f04f45 IB/mad: Handle DR SMPs with a LID routed part
Fix handling of directed route SMPs with a beginning or ending LID
routed part.

Signed-off-by: Ralph Campbell <ralphc@pathscale.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-02-03 14:28:48 -08:00
Michael S. Tsirkin
0f47ae0b3e IB/sa_query: Flush scheduled work before unloading module
sa_query schedules work on IB asynchronous events.  After
unregistering the async event handler, make sure that this work has
completed before releasing the IB device (and possibly allowing the
sa_query module text to go away).

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-01-17 09:53:51 -08:00
Michael S. Tsirkin
cc76e33ec9 IB/uverbs: Flush scheduled work before unloading module
uverbs might schedule work to clean up when a file is closed.  Make
sure that this work runs before allowing module text to go away.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-01-17 09:41:47 -08:00
Arjan van de Ven
858119e159 [PATCH] Unlinline a bunch of other functions
Remove the "inline" keyword from a bunch of big functions in the kernel with
the goal of shrinking it by 30kb to 40kb

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-14 18:27:06 -08:00
Ingo Molnar
95ed644fd1 IB: convert from semaphores to mutexes
semaphore to mutex conversion by Ingo and Arjan's script.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Sanity-checked on real IB hardware ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-01-13 14:51:39 -08:00
Sean Hefty
cf311cd49a IB: Add node_guid to struct ib_device
Add a node_guid field to struct ib_device.  It is the responsibility
of the low-level driver to initialize this field before registering a
device with the midlayer.  Convert everyone to looking at this field
instead of calling ib_query_device() when all they want is the node
GUID, and remove the node_guid field from struct ib_device_attr.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-01-10 07:39:34 -08:00
Linus Torvalds
5367f2d67c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband 2006-01-08 20:18:44 -08:00
Ralph Campbell
4f8448dfe8 IB: Set GIDs correctly in ib_create_ah_from_wc()
ib_create_ah_from_wc() doesn't create the correct return address (AH)
when there is a GRH present (source & dest GIDs need to be swapped).

Signed-off-by: Ralph Campbell <ralphc@pathscale.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-01-06 16:43:47 -08:00
Jack Morgenstein
ac4e7b3557 IB/uverbs: Release event file reference on ib_uverbs_create_cq() error
ib_uverbs_create_cq() should release the completion channel event file
if an error occurs after it looks it up.  Also, if userspace asks for
a completion channel and we don't find it, an error should be returned
instead of silently creating a CQ without a completion channel.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-01-06 16:43:14 -08:00
Ralph Campbell
ea5d4a6ad2 IB/uverbs: set ah_flags when creating address handle
AH attribute's ah_flags need to be set according to the is_global flag
passed in from userspace.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-01-06 16:24:45 -08:00
Jack Morgenstein
b4ca1a3f8c IB/uverbs: Fix reference counting on error paths
If an operation fails after incrementing an object's reference count,
then it should decrement the reference count on the error path.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-01-06 16:21:19 -08:00
Kay Sievers
312c004d36 [PATCH] driver core: replace "hotplug" by "uevent"
Leave the overloaded "hotplug" word to susbsystems which are handling
real devices. The driver core does not "plug" anything, it just exports
the state to userspace and generates events.

Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-04 16:18:08 -08:00
Jack Morgenstein
0efc4883a6 IB/umad: fix memory leaks
Don't leak packet if it had a timeout, and don't leak timeout struct
if queue_packet() fails.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-12-09 13:46:32 -08:00
Sean Hefty
de1bb1a64c IB/cm: avoid reusing local ID
Use an increasing local ID to avoid re-using identifiers while
messages may still be outstanding on the old ID.  Without this, a
quick connect-disconnect-connect sequence can fail by matching
messages for the new connection with the old connection.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-30 10:01:13 -08:00
Sean Hefty
227eca8369 IB/cm: correct reported reject code
Change reject code from TIMEOUT to CONSUMER_REJECT when destroying a
cm_id in the process of connecting.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-30 10:00:25 -08:00
Jack Morgenstein
f4e401562c IB/uverbs: track multicast group membership for userspace QPs
uverbs needs to track which multicast groups is each qp
attached to, in order to properly detach when cleanup
is performed on device file close.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-29 16:57:01 -08:00
Michael S. Tsirkin
bf6d9e23a3 IB/umad: fix RMPP handling
ib_umad_write in user_mad.c is looking at rmpp_hdr field in MAD before
checking that the MAD actually has the RMPP header.  So for a MAD
without RMPP header it looks like we are actually checking a bit
inside M_Key, or something.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-28 13:07:20 -08:00
Adrian Bunk
2012a116d9 [PATCH] drivers/infiniband/core/mad.c: fix use-after-release case
The Coverity checker spotted this obvious use-after-release bug caused
by a wrong order of the cleanups.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-27 20:23:13 -08:00
Roland Dreier
eabc77935d IB/umad: make sure write()s have sufficient data
Make sure that userspace passes in enough data when sending a MAD.  We
always copy at least sizeof (struct ib_user_mad) + IB_MGMT_RMPP_HDR
bytes from userspace, so anything less is definitely invalid.  Also,
if the length is less than this limit, it's possible for the second
copy_from_user() to get a negative length and trigger a BUG().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-18 14:18:26 -08:00
Linus Torvalds
78b9c0f91c Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband 2005-11-10 13:27:06 -08:00
Roland Dreier
94382f3562 [IB] umad: further ib_unregister_mad_agent() deadlock fixes
The previous umad deadlock fix left ib_umad_kill_port() still
vulnerable to deadlocking.  This patch fixes that by downgrading our
lock to a read lock when we might end up trying to reacquire the lock
for reading.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-10 10:22:51 -08:00
Jack Morgenstein
77369ed31d [IB] uverbs: have kernel return QP capabilities
Move the computation of QP capabilities (max scatter/gather entries,
max inline data, etc) into the kernel, and have the uverbs module
return the values as part of the create QP response.  This keeps
precise knowledge of device limits in the low-level kernel driver.

This requires an ABI bump, so while we're making changes, get rid of
the max_sge parameter for the modify SRQ command -- it's not used and
shouldn't be there.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-10 10:22:50 -08:00
Roland Dreier
ec914c52d6 [IB] umad: get rid of unused mr array
Now that ib_umad uses the new MAD sending interface, it no longer
needs its own L_Key.  So just delete the array of MRs that it keeps.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-10 10:22:50 -08:00
Roland Dreier
40de2e548c [IB] Have cq_resize() method take an int, not int*
Change the struct ib_device.resize_cq() method to take a plain integer
that holds the new CQ size, rather than a pointer to an integer that
it uses to return the new size.  This makes the interface match the
exported ib_resize_cq() signature, and allows the low-level driver to
update the CQ size with proper locking if necessary.

No in-tree drivers are exporting this method yet.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-10 10:22:50 -08:00
Roland Dreier
2f76e82947 [IB] umad: avoid potential deadlock when unregistering MAD agents
ib_unregister_mad_agent() completes all pending MAD sends and waits
for the agent's send_handler routine to return.  umad's send_handler()
calls queue_packet(), which does down_read() on the port mutex to look
up the agent ID.  This means that the port mutex cannot be held for
writing while calling ib_unregister_mad_agent(), or else it will
deadlock.  This patch fixes all the calls to ib_unregister_mad_agent()
in the umad module to avoid this deadlock.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-10 10:22:50 -08:00
Jesper Juhl
6044ec8882 [PATCH] kfree cleanup: misc remaining drivers
This is the remaining misc drivers/ part of the big kfree cleanup patch.

Remove pointless checks for NULL prior to calling kfree() in misc files in
drivers/.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: Aristeu Sergio Rozanski Filho <aris@cathedrallabs.org>
Acked-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Pierre Ossman <drzeus@drzeus.cx>
Acked-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Len Brown <len.brown@intel.com>
Acked-by: "Antonino A. Daplas" <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:54:05 -08:00
Tim Schmielau
8c65b4a604 [PATCH] fix remaining missing includes
Fix more include file problems that surfaced since I submitted the previous
fix-missing-includes.patch.  This should now allow not to include sched.h
from module.h, which is done by a followup patch.

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:41 -08:00
Michael S. Tsirkin
8b37b94721 [IB] umad: two small fixes
Two small fixes for the umad module:
 - set kobject name for issm device properly
 - in ib_umad_add_one(), s is subtracted from the index i when
   initializing ports, so s should be subtracted from the index when
   freeing ports in the error path as well.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-06 15:47:02 -08:00
Linus Torvalds
ba77df570c Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband 2005-11-04 16:31:54 -08:00
Roland Dreier
0c99cb6d5f [IB] umad: fix hot remove of IB devices
Fix hotplug of devices for ib_umad module: when a device goes away,
kill off all MAD agents for open files associated with that device,
and make sure that the device is not touched again after ib_umad
returns from its remove_one function.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-03 12:01:18 -08:00
Roland Dreier
de6eb66b56 [IB] kzalloc() conversions
Replace kmalloc()+memset(,0,) with kzalloc(), for a net savings of 35
source lines and about 500 bytes of text.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-02 07:23:14 -08:00
Roland Dreier
7162a3e0db [IB] uverbs: Avoid NULL pointer deref on CQ async event
Userspace CQs that have no completion event channel attached end up
with their cq_context set to NULL.  However, asynchronous events like
"CQ overrun" can still occur on such CQs, so add a uverbs_file member
to struct ib_ucq_object that we can follow to deliver these events.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-10-31 07:10:32 -08:00
Tim Schmielau
4e57b68178 [PATCH] fix missing includes
I recently picked up my older work to remove unnecessary #includes of
sched.h, starting from a patch by Dave Jones to not include sched.h
from module.h. This reduces the number of indirect includes of sched.h
by ~300. Another ~400 pointless direct includes can be removed after
this disentangling (patch to follow later).
However, quite a few indirect includes need to be fixed up for this.

In order to feed the patches through -mm with as little disturbance as
possible, I've split out the fixes I accumulated up to now (complete for
i386 and x86_64, more archs to follow later) and post them before the real
patch.  This way this large part of the patch is kept simple with only
adding #includes, and all hunks are independent of each other.  So if any
hunk rejects or gets in the way of other patches, just drop it.  My scripts
will pick it up again in the next round.

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:32 -08:00
Al Viro
2d3c0b7bed [PATCH] missing include in infiniband
use of IS_ERR/PTR_ERR in infiniband/core/agent.c, without a portable
chain of includes pulling err.h (breaks on a bunch of platforms).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 10:35:07 -07:00