Commit Graph

27268 Commits

Author SHA1 Message Date
Roland Dreier
9ead190bfd IB/uverbs: Don't serialize with ib_uverbs_idr_mutex
Currently, all userspace verbs operations that call into the kernel
are serialized by ib_uverbs_idr_mutex.  This can be a scalability
issue for some workloads, especially for devices driven by the ipath
driver, which needs to call into the kernel even for datapath
operations.

Fix this by adding reference counts to the userspace objects, and then
converting ib_uverbs_idr_mutex into a spinlock that only protects the
idrs long enough to take a reference on the object being looked up.
Because remove operations may fail, we have to do a slightly funky
two-step deletion, which is described in the comments at the top of
uverbs_cmd.c.

This also still leaves ib_uverbs_idr_lock as a single lock that is
possibly subject to contention.  However, the lock hold time will only
be a single idr operation, so multiple threads should still be able to
make progress, even if ib_uverbs_idr_lock is being ping-ponged.

Surprisingly, these changes even shrink the object code:

add/remove: 23/5 grow/shrink: 4/21 up/down: 633/-693 (-60)

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:44:49 -07:00
Roland Dreier
c93b6fbaa9 IB/mthca: Make all device methods truly reentrant
Documentation/infiniband/core_locking.txt says:

  All of the methods in struct ib_device exported by a low-level
  driver must be fully reentrant.  The low-level driver is required to
  perform all synchronization necessary to maintain consistency, even
  if multiple function calls using the same object are run
  simultaneously.

However, mthca's modify_qp, modify_srq and resize_cq methods are
currently not reentrant.  Add a mutex to the QP, SRQ and CQ structures
so that these calls can be properly serialized.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:41 -07:00
Roland Dreier
c9c5d9feef IB/mthca: Fix memory leak on modify_qp error paths
Some error paths after the mthca_alloc_mailbox() call in mthca_modify_qp()
just do a "return -EINVAL" without freeing the mailbox.  Convert these
returns to "goto out" to avoid leaking the mailbox storage.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:41 -07:00
Roland Dreier
3463175d6e IB/uverbs: Factor out common idr code
Factor out common code for adding a userspace object to an idr into a
function idr_add_uobj().  This shrinks both the source and object code:

add/remove: 1/0 grow/shrink: 0/6 up/down: 57/-220 (-163)
function                                     old     new   delta
idr_add_uobj                                   -      57     +57
ib_uverbs_create_ah                          543     512     -31
ib_uverbs_create_srq                         662     630     -32
ib_uverbs_reg_mr                             737     699     -38
ib_uverbs_create_cq                          639     600     -39
ib_uverbs_alloc_pd                           485     446     -39
ib_uverbs_create_qp                         1020     979     -41

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:40 -07:00
Roland Dreier
92b1582268 IB/uverbs: Don't decrement usecnt on error paths
In error paths when destroying an object, uverbs should not decrement
associated objects' usecnt, since ib_dereg_mr(), ib_destroy_qp(),
etc. already do that.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:40 -07:00
Ganapathi CH
77f76013e3 IB/uverbs: Release lock on error path
If ibdev->alloc_ucontext() fails then ib_uverbs_get_context() does not
unlock file->mutex before returning error.

Signed-off by: Ganapathi CH <cganapathi@novell.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:40 -07:00
Sean Hefty
ca222c6b2c IB/cm: Use address handle helpers
Use new ib_init_ah_from_wc() and ib_init_ah_from_path() helper
functions to clean up the IB CM.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:40 -07:00
Sean Hefty
6d969a471b IB/sa: Add ib_init_ah_from_path()
Add a call to initialize address handle attributes given a path record.
This is used by the CM, and would be useful for users of UD QPs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:39 -07:00
Sean Hefty
4e00d69454 IB: Add ib_init_ah_from_wc()
Add a function to initialize address handle attributes from a work
completion.  This functionality is duplicated by both verbs and the CM.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:39 -07:00
Sean Hefty
75af908851 IB/ucm: Get rid of duplicate P_Key parameter
The P_Key is provided into a SIDR REQ in two places, once as a
parameter, and again in the path record.  Remove the P_Key as a
parameter and always use the one given in the path record.

This change has no practical effect on ABI functionality.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:39 -07:00
Ishai Rabinovitz
526b4caa0a IB/srp: Factor out common request reset code
Misc cleanups in ib_srp:
1) I think that it is more efficient to move the req entries from req_list
   to free_list in srp_reconnect_target (rather than rebuild the free_list).
   (In any case this code is shorter).
2) This allows us to reuse code in srp_reset_device and srp_reconnect_target
   and call a new function srp_reset_req.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:38 -07:00
Ramachandra K
0c0450db31 IB/srp: Support SRP rev. 10 targets
There has been a change in the format of port identifiers between
revision 10 of the SRP specification and the current revision 16A.

Revision 10 specifies port identifier format as

  lower 8 bytes :  GUID   upper 8 bytes :  Extension

Whereas revision 16A specifies it as 

 lower 8 bytes :  Extension  upper 8 bytes :  GUID

There are older targets (e.g. SilverStorm Virtual Fibre Channel
Bridge) which conform to revision 10 of the SRP specification.

The I/O class of revision 10 is 0xFF00 and the I/O class of revision
16A is 0x0100.

For supporting older targets, this patch:

1) Adds a new optional target creation parameter "io_class". Default
   value of io_class is 0x0100 (i.e. revision 16A)
2) Uses the correct port identifier format for targets with IO class
   of 0xFF00 (i.e. conforming to revision 10)

Signed-off-by: Ramachandra K <rkuchimanchi@silverstorm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:38 -07:00
Ramachandra K
73c0996b1c [SCSI] srp.h: Add I/O Class values
Add enum values for I/O Class values from rev. 10 and rev. 16a SRP
drafts.  The values are used to detect targets that implement obsolete
revisions of SRP, so that the initiator can use the old format for
port identifier when connecting to them.

Signed-off-by: Ramachandra K <rkuchimanchi@silverstorm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:38 -07:00
Or Gerlitz
6c8c1aa25d IB/fmr: Use device's max_map_map_per_fmr attribute in FMR pool.
When creating a FMR pool, query the IB device and use the returned
max_map_map_per_fmr attribute as for the max number of FMR remaps. If
the device does not suport querying this attribute, use the original
IB_FMR_MAX_REMAPS (32) default.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:37 -07:00
Or Gerlitz
d4cb0784fd IB/mthca: Fill in max_map_per_fmr device attribute
Report the true max_map_per_fmr value from mthca_query_device(),
taking into account the change in FMR remapping introduced by the
Sinai performance optimization.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:37 -07:00
Roland Dreier
6eddb5cb90 IB/ipath: Add client reregister event generation
Generate a client reregister event instead of a LID change event when
client reregister bit is set.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:37 -07:00
Leonid Arsh
12bbb2b7be IB/mthca: Add client reregister event generation
Change the mthca snoop of MADs that set PortInfo to check if the SM
has set the client reregister bit, and if it has, generate a client
reregister event.  If the bit is not set, just generate a LID change
event as usual.

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:36 -07:00
Leonid Arsh
da2ab62ab5 IB: Move struct port_info from ipath to <rdma/ib_smi.h>
Move ipath's struct port_info into <rdma/ib_smi.h>, so that it can be
used by mthca to implement client reregister support.

Remove the __attribute__((packed)) because all the members of the struct
are naturally aligned anyway.

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:36 -07:00
Leonid Arsh
508e434123 IPoIB: Handle client reregister events
Handle client reregister events by treating them just like LID or
SM changes -- flush all cached paths and rejoin multicast groups.

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:36 -07:00
Leonid Arsh
63942c9a98 IB: Add client reregister event type
Add IB_EVENT_CLIENT_REREGISTER to enum so low-level drivers can
generate "client reregister" events.

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:35 -07:00
Jack Morgenstein
37c22a7721 IPoIB: Fix kernel unaligned access on ia64
Fix misaligned access faults on ia64: never cast a misaligned
neighbour->ha + 4 pointer to union ib_gid type; pass a void * pointer
instead.  The memcpy was being optimized to use full word accesses
because the compiler thought that union ib_gid is always aligned.

The cast in IPOIB_GID_ARG is safe, since it is fixed to access each
byte separately.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:35 -07:00
Roland Dreier
31c02e2157 IPoIB: Avoid using stale last_send counter when reaping AHs
The comparisons of priv->tx_tail to ah->last_send in ipoib_free_ah()
and ipoib_post_receive() are slightly unsafe, because priv->tx_lock is
not held and hence a stale value of ah->last_send might be used, which
would lead to freeing an AH before the driver was really done with it.
The simple way to fix this is to the optimization of early free from
ipoib_free_ah() and unconditionally queue AHs for reaping, and then
take priv->tx_lock in __ipoib_reap_ah().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:34 -07:00
Jack Morgenstein
9874e74655 IB/mad: Check GID/LID when matching requests
Check GID/LID for requester side when searching for request which
matches received response.  This is in order to guarantee uniqueness
if the same TID is used when requesting via multiple source LIDs (when
LMC is not zero).  Use ports' cached LMC to perform the check.

Further, do not perform LID check for direct-routed packets, since
the permissive LID makes a proper check impossible.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:34 -07:00
Jack Morgenstein
6fb9cdbf2c IB: Add caching of ports' LMC
Add an LMC cache to struct ib_device, and add a function
ib_get_cached_lmc() to query the cache.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:34 -07:00
Michael S. Tsirkin
856c256f88 IB/cm: remove unneeded flush_workqueue
destroy_workqueue() already does flush_workqueue().

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2006-06-17 20:37:33 -07:00
Sean Hefty
4be10c1e6d IB/ucm: convert semaphore to mutex
Convert semaphore in ib_ucm_file to a real mutex.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:33 -07:00
Roland Dreier
6bfa24fa3e IB/srp: Get rid of "Target has req_lim 0" messages
It's perfectly valid for a connection to an SRP target to have a
request limit of 0, so get rid of the message about it, which can spam
kernel logs even with printk_ratelimit().  Keep a count of such events
in a "zero_req_lim" SCSI host attribute instead, so someone who cares
can look at the statistics.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:33 -07:00
Ishai Rabinovitz
b7ac4ab497 IB/srp: Handle DREQ events from CM
Handle IB_CM_DREQ_ERROR and IB_CM_DREQ_RECEIVED events from the CM,
instead of just printing "Unhandled CM event".  In the case of
DREQ_ERROR, just ignore the event -- a TIMEWAIT_EXIT will be generated
also.  For DREQ_RECEIVED, send a DREP in response to shut the
connection down cleanly.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:32 -07:00
Roland Dreier
ac83cbaa9a IPoIB: Mention RFC numbers in documentation
Now that the IETF has released RFCs covering IPoIB, give the numbers in
the documentation for IPoIB.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:32 -07:00
Vu Pham
74b0a15b5e IB/srp: Allow sg_tablesize to be adjusted
Make the sg_tablesize used by SRP adjustable at module load time via a
module parameter.  Calculate the corresponding IU length required to
support this.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:32 -07:00
Vu Pham
52fb2b50c4 IB/srp: Allow cmd_per_lun to be set per target port
Allow userspace to throttle traffic on a given connection to a target
port by adding "max_cmd_per_lun=xyz" to lower the cmd_per_lun value
set for that scsi_host.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:31 -07:00
Ishai Rabinovitz
0c5b395239 IB/srp: Clean up loop in srp_remove_one()
Interrupts will always be enabled in srp_remove_one(), so
spin_lock_irq() can be used instead of spin_lock_irqsave().
Also, the loop takes target->scsi_host->host_lock, so target->state
can just be set to SRP_TARGET_REMOVED witout testing the old value.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:31 -07:00
Roland Dreier
403a496fd4 IB: Make needlessly global ib_mad_cache static
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:31 -07:00
Matthew Wilcox
b3589fd490 IB/srp: Change target_mutex to a spinlock
The SRP driver never sleeps while holding target_mutex, and it's just
used to protect some simple list operations, so hold times will be
short.  So just convert it to a spinlock, which is smaller and faster.

Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:30 -07:00
Matthew Wilcox
549c5fc2c8 IB/srp: Get rid of unneeded use of list_for_each_entry_safe()
list_for_each_entry_safe() is used in one place where the list isn't
modified.  So just change it to list_for_each_entry().

Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:30 -07:00
Matthew Wilcox
1962a4a1e4 IB/srp: Use SCAN_WILD_CARD from SCSI headers
SCAN_WILD_CARD is indeed available from <scsi/scsi.h>, which is
already included.  So get rid of private hack.

Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:30 -07:00
Roland Dreier
e9cd59418f IB/mthca: Convert FW commands to use wait_for_completion_timeout()
The kernel has had wait_for_completion_timeout() for a long time now.
mthca should use it to handle FW commands timing out, instead of
implementing the same thing in a much more complicated way by using
wait_for_completion() along with a timer that does complete().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:30 -07:00
Roland Dreier
f5358a172f IB/srp: Use FMRs to map gather/scatter lists
Create an SRP FMR pool on HCAs that support FMRs, and use FMRs to map
gather/scatter lists that have more than one entry into a single
memory region that appears virtually contiguous to the SRP target
(which is the RDMA initiator).

This patch bails out on FMR mapping for SCSI commands where the
gather/scatter list cannot be mapped into a single FMR because there
are sub-page-sized entries in middle of the list.  An unaligned
start or end of the list is OK.

Based on a patch by Vu Pham <vuhuong@mellanox.com>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:29 -07:00
Michael S. Tsirkin
a26026c122 IB/mthca: Remove dead code
Kill some dead code in mthca_eq.c

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:29 -07:00
Sean Hefty
e51060f08a IB: IP address based RDMA connection manager
Kernel connection management agent over InfiniBand that connects based
on IP addresses.  The agent defines a generic RDMA connection
abstraction to support clients wanting to connect over different RDMA
devices.

The agent also handles RDMA device hotplug events on behalf of clients.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:29 -07:00
Sean Hefty
7025fcd36b IB: address translation to map IP toIB addresses (GIDs)
Add an address translation service that maps IP addresses to
InfiniBand GID addresses using IPoIB.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:28 -07:00
Sean Hefty
a1e8733e55 [NET]: Export ip_dev_find()
Export ip_dev_find() to allow locating a net_device given an IP address.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:28 -07:00
Sean Hefty
6e61d04f2d IB/cm: Match connection requests based on private data
Extend matching connection requests to listens in the InfiniBand CM to
include private data checks.

This allows applications to listen on the same service identifier,
with private data directing the request to the appropriate application.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:28 -07:00
Sean Hefty
6a9af2e18a IB: common handling for marshalling parameters to/from userspace
Provide common handling for marshalling data between userspace clients
and kernel InfiniBand drivers.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:27 -07:00
Michael S. Tsirkin
4e56ea794e IB/mthca: memfree completion with error FW bug workaround
Memfree firmware is in rare cases reporting WQE index == base - 1 in
receive completion with error, instead of (rq size - 1); base is 0 in
mthca.  Here is a patch to avoid kernel crash and report a correct WR
id in this case.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:20 -07:00
Michael S. Tsirkin
13aa6ecb47 IB/mthca: restore missing PCI registers after reset
mthca does not restore the following PCI-X/PCI Express registers after reset:
  PCI-X device: PCI-X command register
  PCI-X bridge: upstream and downstream split transaction registers
  PCI Express : PCI Express device control and link control registers

This causes instability and/or bad performance on systems where one of
these registers is set to a non-default value by BIOS.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:00 -07:00
Linus Torvalds
427abfa28a Linux v2.6.17
Being named "Crazed Snow-Weasel" instills a lot of confidence in this
release, so I'm sure this will be one of the better ones.
2006-06-17 18:49:35 -07:00
Arnd Bergmann
ce221982e0 [PATCH] powerpc: enable CPU_FTR_CI_LARGE_PAGE for cell
Reflect the fact that the Cell Broadband Engine supports 64k
pages by adding the bit to the CPU features.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-17 10:56:24 -07:00
Arnd Bergmann
19242b2407 [PATCH] powerpc: Fix 64k pages on non-partitioned machines
The page size encoding passed to tlbie is incorrect for new-style
large pages.  This fixes it.  This doesn't affect anything on older
machines because mmu_psize_defs[psize].penc (the page size encoding)
is 0 for 4k and 16M pages (the two are distinguished by a separate "is
a large page" bit).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-17 10:56:24 -07:00
Oleg Nesterov
f53ae1dc34 [PATCH] arm_timer: remove a racy and obsolete PF_EXITING check
arm_timer() checks PF_EXITING to prevent BUG_ON(->exit_state)
in run_posix_cpu_timers().

However, for some reason it does so only for CPUCLOCK_PERTHREAD
case (which is imho wrong).

Also, this check is not reliable, PF_EXITING could be set on
another cpu without any locks/barriers just after the check,
so it can't prevent from attaching the timer to the exiting
task.

The previous patch makes this check unneeded.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-17 10:52:13 -07:00