Old query_port code reports static MTU and link state values.
Instead, map actual MTU to next largest IB_MTU_* constant and
correctly report link state.
Cc: Steve Wise <swise@opengridcomputing.com>
Reported-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The disconn routine has been reworked to acoomodate the terminate and
flushing changes. The routine has been reorganized to make all the
decisions at the start then it performs all the required operations.
This simplified the lock handling and is easier to follow.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use the flush status to fill in cqe status when a specific error has
been identified. Subsequent flushed completions still use the flushed
value.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When a flush request is given to the hw, it will place one cqe marked
as flushed (unless there is nothing to flush). An application that is
waiting for all wqe's to complete will be left hanging. This modifies
poll_cq to return the correct number of flushes for the pending
elements on the wq.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When an asynchronous event occurs that requires a terminate, it is
sometimes possible to identify the wqe in error. This change uses
flush to get this information to the poll routine. The flush
operation puts the status into the cqe. If this information is not
available, it continues to use the more generic flush code as before.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Implement the sending and receiving of Terminate packets.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
CQ errors are not being handled correctly. Put in the the upcall for
CQ errors.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When a QP is destroyed, unprocessed CQ entries could still reference
the QP. This change zeroes the context value at QP destroy time. By
skipping over cqe's with a zero context, poll_cq no longer processes a
cqe for a destroyed QP.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The routine to allocate a cqp request is not called from process
context code. Since it is not OK to sleep, it needs to use GFP_ATOMIC
not GFP_KERNEL.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The code currently has a work structure in the QP. This requires a
lock and a pending flag to ensure there is never more than one request
active. When two events happen quickly (such as FIN and LLP CLOSE),
it causes unnecessary timeouts since the second one is dropped.
This fix allocates memory for the work request so the second one can
be queued. A lock is removed since it is no longer needed.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
During termination, it is possible for the refcnt to go to zero while
the worker thread is posting events upward. This fix increments the
refcnt before the request is passed to the worker thread. The thread
decrements the refcnt when the request is completed.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Userspace apps are supposed to release all ib device resources if they
receive a fatal async event (IBV_EVENT_DEVICE_FATAL). However, the
app has no way of knowing when the device has come back up, except to
repeatedly attempt ibv_open_device() until it succeeds.
However, currently there is no protection against the open succeeding
while the device is in being removed following the fatal event. In
this case, the open will succeed, but as a result the device waits in
the middle of its removal until the new app releases its resources --
and the new app will not do so, since the open succeeded at a point
following the fatal event generation.
This patch adds an "active" flag to the device. The active flag is set
to false (in the fatal event flow) before the "fatal" event is
generated, so any subsequent ibv_dev_open() call to the device will
fail until the device comes back up, thus preventing the above
deadlock.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When the mthca driver uses the same name for interrupts for every
device in the system. This can make it very confusing trying to work
out exactly which device MSI-X interrupts are for. Change the driver
to add the PCI name of the device to the interrupt name.
Signed-off-by: Arputham Benjamin <abenjamin@sgi.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
mthca_ib_lock_cqs()/mthca_ib_unlock_cqs() are helper functions that
lock/unlock both CQs attached to a QP in the proper order to avoid
AB-BA deadlocks. Annotate this so sparse can understand what's going
on (and warn us if we misuse these functions).
Signed-off-by: Roland Dreier <rolandd@cisco.com>
mthca_config_reg.h was including <asm/page.h> for no reason -- the whole
file is just defines of constants, so it's entirely self-contained.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Userspace apps are supposed to release all ib device resources if they
receive a fatal async event (IBV_EVENT_DEVICE_FATAL). However, the
app has no way of knowing when the device has come back up, except to
repeatedly attempt ibv_open_device() until it succeeds.
However, currently there is no protection against the open succeeding
while the device is in being removed following the fatal event. In
this case, the open will succeed, but as a result the device waits in
the middle of its removal until the new app releases its resources --
and the new app will not do so, since the open succeeded at a point
following the fatal event generation.
This patch adds an "active" flag to the device. The active flag is set
to false (in the fatal event flow) before the "fatal" event is
generated, so any subsequent ibv_dev_open() call to the device will
fail until the device comes back up, thus preventing the above
deadlock.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When the mlx4 driver uses the same name for interrupts for every
device in the system. This can make it very confusing trying to work
out exactly which device MSI-X interrupts are for. Change the driver
to add the PCI name of the device to the interrupt name.
Signed-off-by: Arputham Benjamin <abenjamin@sgi.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
On the error path of mlx4_init_hca(), mlx4_close_hca() is called,
followed by mlx4_free_icms() and mlx4_UNMAP_FA(). But both those
functions are also called from mlx4_close_hca(), which leads to a
double free.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The current implementation allocates a single host page for EQ context
memory, which was OK when we only allocated a few EQs. However, since
we now allocate an EQ for each CPU core, this patch removes the
hard-coded limit (which we exceed with 4 KB pages and 128 byte EQ
context entries with 32 CPUs) and uses the same ICM table code as all
other context tables, which ends up simplifying the code quite a bit
while fixing the problem.
This problem was actually hit in practice on a dual-socket Nehalem box
with 16 real hardware threads and sufficiently odd ACPI tables that it
shows on boot
SMP: Allowing 32 CPUs, 16 hotplug CPUs
so num_possible_cpus() ends up 32, and mlx4 ends up creating 33 MSI-X
interrupts and 33 EQs. This mlx4 bug means that mlx4 can't even
initialize at all on this quite mainstream system.
Cc: <stable@kernel.org>
Reported-by: Eli Cohen <eli@mellanox.co.il>
Tested-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
mlx4_ib_lock_cqs()/mlx4_ib_unlock_cqs() are helper functions that
lock/unlock both CQs attached to a QP in the proper order to avoid
AB-BA deadlocks. Annotate this so sparse can understand what's going
on (and warn us if we misuse these functions).
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The old code used two calls to pci_request_region() to get the two BARs
for the mlx4 device, for no particularly good reason. Clean up the code
a little by converting this to a single call to pci_request_regions().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
dev->ibdev.iwcm allocation may fail, prevent a dereference.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Since the original commit 883a99c7 ("[IB] uverbs: Add a mask of device
methods allowed for userspace"), the uverbs core returns EINVAL for
commands not implemented by a specific low-level driver.
This creates a problem that there is no way to tell the difference
between an unimplemented command and an implemented one which is
incorrectly invoked (which also returns EINVAL).
The fix is to have unimplemented commands return ENOSYS.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Until now, retries were only sent when joining a multicast group. This
patch will adds retries when leaving a multicast group as well.
Signed-off-by: Ron Livne <ronli@voltaire.com>
Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Replace open-coded reimplementations with printk_once().
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use the %pM conversion specifier to print a MAC address.
Signed-off-by: Tobias Klauser <klto@zhaw.ch>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Rather than just defining static spinlock_t variables and then
initializing them later in init functions, simply define them with
DEFINE_SPINLOCK() and remove the calls to spin_lock_init(). This cleans
up the source a tad and also shrinks the compiled code; eg on x86-64:
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-40 (-40)
function old new delta
ib_uverbs_init 336 326 -10
ib_mad_init_module 147 137 -10
ib_sa_init 123 103 -20
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The hop count field in a directed route MAD is only allowed to be in the
range 0 to 63 (by spec). Check that this really is the case to avoid
accessing outside the bounds of the hop array.
Reported-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Check that the format of multicast link addresses is correct before
taking them from dev->mc_list to priv->multicast_list. This way we
never try to send a bogus address to the SA, which prevents badness
from erronous 'ip maddr addr add', broken bonding drivers, etc.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
IPoIB currently must use irqsave locking for priv->lock, since it is
taken from interrupt context in one path. However, ipoib_send() does
skb_orphan(), and the network stack locking is not IRQ-safe.
Therefore we need to make sure we don't hold priv->lock when calling
ipoib_send() to avoid lockdep warnings (the code was almost certainly
safe in practice, since the only code path that takes priv->lock from
interrupt context would never call into the network stack).
Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=13757
Reported-by: Bart Van Assche <bart.vanassche@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
strlcpy() will always null terminate the string. node_desc is not
guaranteed to be NUL-terminated so just use memcpy().
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The driver was reporting CQE flags in the wrong bit positions, causing
consumers to miss incoming immediate data.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The old code used a lot of hard-coded values, which might not be valid
in all environments (especially routed fabrics or partitioned
subnets). Copy as much information as possible from the incoming
request to correct that.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Make port autodetect mode the default for the ehca driver. The
autodetect code has been in the kernel for several releases now and
has proved to be stable.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
A close/abort while waiting for a wr_ack during connection migration
can cause a hung process in iwch_accept_cr/iwch_reject_cr.
The fix is to set rpl_error/rpl_done and wake up the waiters when we
get a close/abort while in MPA_REQ_RCVD state.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
- Keep ref on connection request endpoints until either accepted or
rejected so it doesn't get freed early.
- Endpoint flags now need to be set via atomic bitops because they can
be set on both the iw_cxgb3 workqueue thread and user disconnect
threads.
- Don't move out of CLOSING too early due to multiple calls to
iwch_ep_disconnect.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Massage the err_handler upcall into an event handler upcall, pass
netdev port events to the cxgb3 ULPs and generate RDMA port events
based on LLD port events.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch fixes a null pointer exception caused by removal of
'ack()' for level interrupts in the Xilinx interrupt driver. A recent
change to the xilinx interrupt controller removed the ack hook for
level irqs.
Signed-off-by: Roderick Colenbrander <thunderbird2k@gmail.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>