Because of a typo in iwch_accept_cr(), the cxgb3 connection handling
code programs the hardware IRD (incoming RDMA read queue depth) with
the value that is passed in for the ORD (outgoing RDMA read queue
depth). In particular this means that if an application passes in IRD
> 0 and ORD = 0 (which is a completely sane and valid thing to do for
an app that expects only incoming RDMA read requests), then the
hardware will end up programmed with IRD = 0 and the app will fail in
a mysterious way.
Fix this by using "ep->ird" instead of "ep->ord" in the intended place.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the calculation of the MSS for RDMA connections: we need to
allow space in frames for a VLAN tag too.
Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit 7143740d ("IPoIB: Add send gather support") made struct
ipoib_tx_buf significantly larger, since the mapping member changed
from a single u64 to an array with MAX_SKB_FRAGS + 1 entries. This
means that allocating tx_rings with kzalloc() may fail because there
is not enough contiguous memory for the new, much bigger size. Fix
this regression by allocating the rings with vmalloc() instead.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit 7143740d ("IPoIB: Add send gather support") made it possible
for tx_wr.num_sge to be != 1 -- this happens if send gather support is
enabled. However, the code in the connected mode post_send() function
assumes the old invariant, namely that tx_wr.num_sge is always 1. Fix
this by explicitly setting tx_wr.num_sge to 1 in the CM post_send().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When set_multicast_list() is called the multicast task is restarted
and the IPOIB_MCAST_STARTED bit is cleared. As a result for some
window of time, multicast packets are not transmitted nor queued but
rather dropped by ipoib_mcast_send(). These dropped packets are
painful in two cases:
- bonding fail-over which both calls set_multicast_list() on the new
active slave and sends Gratuitous ARP through that slave.
- IP_DROP_MEMBERSHIP code which both calls set_multicast_list() on the
device and issues IGMP leave.
In both these cases, depending on the scheduling of the IPoIB
multicast task, the packets would be dropped. As a result, in the
bonding case, the failover would not be detected by the peers until
their neighbour is renewed the neighbour (which takes a few tens of
seconds). In the IGMP case, the IP router doesn't get an IGMP leave
and would only learn on that from further probes on the group (also a
delay of at least a few tens of seconds).
Fix this by allowing transmission (or queuing) depending on the
IPOIB_FLAG_OPER_UP flag instead of the IPOIB_MCAST_STARTED flag.
Signed-off-by: Olga Shern <olgas@voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Reset the retry counter when we get a good RDMA_READ_RESPONSE_MIDDLE
packet. This fix will prevent the requester from reporting a retry
exceeded error too early.
Signed-off-by: Patrick Marchand Latifi <patrick.latifi@qlogic.com>
A work completion entry could be placed on the wrong completion
queue when an RC QP is placed in the error state.
Signed-off-by: Patrick Marchand Latifi <patrick.latifi@qlogic.com>
Acked-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch fixes the initialization of RC QPs, since we would rely on
the queue pair type (ibqp->qp_type) being set, but this field is only
initialized when we return from ipath_create_qp (it is initialized by
the user-level verbs library).
The fix is to not depend on this field to initialize the send and
the receive state of the RC QP.
Signed-off-by: Patrick Marchand Latifi <patrick.latifi@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There can be a case where the requester's rnr retry counter
(s_rnr_retry) is less than the number of rnr retries allowed per QP
(s_rnr_retry_cnt). This can happen if the s_rnr_retry counter is being
decremented and an ipath_query_qp call is issued during that time frame.
The fix is to always return the number of rnr retries allowed per QP
instead of the requester's rnr counter.
Found by code review.
Signed-off-by: Patrick Marchand Latifi <patrick.latifi@qlogic.com>
Acked-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Subnet manager SetPortinfo messages distingush between changing the link
state (DOWN, ARM, ACTIVE) and the link physical state (POLL, SLEEP,
DISABLED). These are somewhat independent commands and affect when link
width and speed changes take effect. Without this patch, a link DOWN
physical state NOP command was causing the link width and speed settings
to take effect which should only happen when the link physical state is
goes down (either by a SMP or some link physical error like link errors
exceeding the threshold).
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
cm_work_handler() can access cm_id_priv after it drops its reference
by calling iwch_deref_id(), which might cause it to be freed. The fix
is to look at whether IWCM_F_CALLBACK_DESTROY is set _before_ dropping
the reference. Then if it was set, free the cm_id on this thread.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
"iser_device" allocation failure is "handled" with a BUG_ON() right
before dereferencing the NULL-pointer - fix this!
Signed-off-by: Arne Redlich <arne.redlich@xiranet.com>
Signed-off-by: Erez Zilber <erezz@voltaire.com>
The iteration through the list of "iser_device"s during device
lookup/creation is broken -- it might result in an infinite loop if
more than one HCA is used with iSER. Fix this by using
list_for_each_entry() instead of the open-coded flawed list iteration
code.
Signed-off-by: Arne Redlich <arne.redlich@xiranet.com>
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The cxbg3 driver is unnecessarily decreasing the number of CQ entries by
one when creating a CQ. This will cause the CQ not to have as many
entries as requested by the user if the user requests a power of 2 size.
Signed-off-by: Jon Mason <jon@opengridcomputing.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Set cap.max_inline_data to the actual max inline data that the adapter
support, so that userspace apps see the right value returned.
Signed-off-by: Jon Mason <jon@opengridcomputing.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit a3cd7d90 ("IB/fmr_pool: ib_fmr_pool_flush() should flush all
dirty FMRs") caused a regression for iSER and was reverted in
e5507736.
This change attempts to redo the original patch so that all used FMR
entries are flushed when ib_flush_fmr_pool() is called without
affecting the normal FMR pool cleaning thread. Simply move used
entries from the clean list onto the dirty list in ib_flush_fmr_pool()
before letting the cleanup thread do its job.
Signed-off-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This reverts commit a3cd7d9070.
The original commit breaks iSER reliably, making it complain:
iser: iser_reg_page_vec:ib_fmr_pool_map_phys failed: -11
The FMR cleanup thread runs ib_fmr_batch_release() as dirty entries
build up. This commit causes clean but used FMR entries also to be
purged. During that process, another thread can see that there are no
free FMRs and fail, even though there should always have been enough
available.
Signed-off-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When a CM MAD is received, it is queued to a CM workqueue for
processing. The queued work item references the port and device on
which the MAD was received. If that device is removed from the system
before the work item can execute, the work item will reference freed
memory.
To fix this, flush the workqueue after unregistering to receive MAD,
and before the device is be freed.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Interrupt moderation low threshold value was incorrectly triggering,
indicating that the threshold should be lowered.
The impact was the timer was likely to become 40usecs and get stuck
there. The biggest side effect was too many interrupts and nonoptimal
performance.
Signed-off-by: John Lacombe <jlacombe@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
With commit ef19454b ("[LIB] crc32c: Keep intermediate crc state in
cpu order"), the behavior of crc32c changes on big-endian platforms.
Our algorithm expects the previous behavior; otherwise we have RDMA
connection establishment failure on big-endian platforms like powerpc.
Apply cpu_to_le32() to value returned by crc32c() to get the previous
behavior.
Signed-off-by: Faisal Latif <flatif@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Just delete the debugging statement so we don't use cqp_request after
freeing it. Adrian Bunk flagged this use-after-free issue spotted by
the Coverity checker.
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix a check-after-use spotted by the Coverity checker.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix a memory leak spotted by the Coverity checker.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix an off-by-one spotted by the Coverity checker.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Adrian Bunk pointed out that a Coverity scan found some apparently
dead code in nes_verbs.c that really shouldn't have been dead.
The function nes_create_cq() was missing the assignment
err = 1;
just prior to an iteration that conditionally set err = 0 if a PBL was
found for a given virtual CQ. I also noticed we should have been
returning -EFAULT on a couple related error paths.
Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
A single entry (addr 0x10001000, size 0x2000) will get converted to
page address 0x10000000 with a page size of 0x4000. The code as it
stands doesn't address the single buffer case, but in fact it allows
the subsequent single-buffer special case to be eliminated entirely.
Because the mask now includes the (page adjusted) starting and ending
addresses, the general case works for the single buffer case as well.
Signed-off-by: Bryan Rosenburg <rosnbrg@us.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When mthca_fmr_alloc() returns an error, it should free the MPT at the
index key, not mr->ibmr.lkey, since the lkey has been mangled by
hw_index_to_key() and no longer is the real index. This bug causes
corruption of the MPT table free bitmap when mthca_fmr_alloc() fails.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit efcd9971 ("IPoIB/cm: Factor out ipoib_cm_free_rx_reap_list()")
introduced a bug in ipoib_cm_dev_stop() when the receive drain times
out. In that case, the function moves all the pending rx stuff into a
private list but then calls ipoib_cm_free_rx_reap_list(), which
handles a different list.
Fix this by moving everything to the rx_reap_list that will actually
get freed up.
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=906>.
Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In nes_create_qp(), the test
if (nesqp->mmap_sq_db_index > NES_MAX_USER_WQ_REGIONS) {
is used to error out if the db_index is too large; however, if the
test doesn't trigger, then the index is used as
nes_ucontext->mmap_nesqp[nesqp->mmap_sq_db_index] = nesqp;
and mmap_nesqp is declared as
struct nes_qp *mmap_nesqp[NES_MAX_USER_WQ_REGIONS];
which leads to an array overrun if the index is exactly equal to
NES_MAX_USER_WQ_REGIONS. Fix this by bailing out if the index is
greater than or equal to NES_MAX_USER_WQ_REGIONS.
This was spotted by the Coverity checker (CID 2162).
Acked-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We need to account for the VLAN header size in nes_netdev_change_mtu()
and nes_netdev_init(). Also, add spin lock/unlock during VLAN RX
registration so only one process can assign VLAN group for a given
interface at a time.
Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Only mask out MAC interrupt if necessary and re-enable on ifup. There
could be multiple netdevs going through the same MAC. MAC interrupts
should not be masked off until the last netdev is downed.
Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If kobject_create_and_add() fails and returns NULL, the current code
in ib_device_register_sysfs() does not set ret and hence returns 0.
Set ret to -ENOMEM for this failure, so that the caller knows that
ib_device_register_sysfs() actually failed.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There's an undesirable interaction with issuing MRA requests to
increase connection timeouts and the listen backlog.
When the rdma_cm receives a connection request, it queues an MRA with
the ib_cm. (The ib_cm will send an MRA if it receives a duplicate
REQ.) The rdma_cm will then create a new rdma_cm_id and give that to
the user, which in this case is the rdma_user_cm.
If the listen backlog maintained in the rdma_user_cm is full, it
destroys the rdma_cm_id, which in turns destroys the ib_cm_id. The
ib_cm_id generates a REJ because the state of the ib_cm_id has changed
to MRA sent, versus REQ received. When the backlog is full, we just
want to drop the REQ so that it is retried later.
Fix this by deferring queuing the MRA until after the user of the
rdma_cm has examined the connection request.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Currently mlx4_ib_fmr_alloc() calls mlx4_mr_enable() instead of
mlx4_fmr_enable(). The two functions are equivalent at the moment, but
this is not really correct (and the change is needed to fix a bug).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
struct ipoib_cm_tx.ibwc is unused since commit 1b524963 ("IPoIB/cm:
Use common CQ for CM send completions"), so remove it.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
In P_Key event handling, if the old P_Key is no longer available, the
driver must call ipoib_ib_dev_stop() -- just as it does when the P_Key
is still available (see procedure __ipoib_ib_dev_flush()).
When a P_Key becomes available, the driver will perform ipoib_open(),
which assumes that the QP is in RESET, the cm_id has been
destroyed/deleted, etc. If ipoib_ib_dev_stop() is not called as
described above, then these assumptions will be false, and the attempt
to bring the interface up will fail.
Found by Mellanox QA.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
replace:
big_endian_variable = cpu_to_beX(beX_to_cpu(big_endian_variable) +
expression_in_cpu_byteorder);
with:
beX_add_cpu(&big_endian_variable, expression_in_cpu_byteorder);
Generated with a semantic patch.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The cxgb3 HW and driver don't support loopback RDMA connections. So
fail any connection attempt where the destination address is local.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit 9af57b7a ("IB/cm: Add basic performance counters") introduced a
bug in how the reference count for cm_class.subsys.kobj was handled:
the path that released a device did a kobject_put() on that kobject, but
there was no kobject_get() in the path the handles adding a device. So
the reference count ended up too low, which leads to bad things. Fix up
and simplify the reference counting to avoid this.
(Actually, I introduced the bug when fixing the patch up to match some
of Greg's kobject changes, but who's counting)
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Usually harmless, since the scatterlist is always hard-coded to a length
of 1, but it triggers a BUG() if CONFIG_DEBUG_SG=y, so we better fix it.
This fixes <http://bugzilla.kernel.org/show_bug.cgi?id=9934>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch acts as a preparation for using checksum offload for IB
devices capable of inserting/verifying checksum in IP packets. The
patch does not actaully turn on NETIF_F_SG - we defer that to the
patches adding checksum offload capabilities.
We only add support for send gathers for datagram mode, since existing
HW does not support checksum offload on connected QPs.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
All current InfiniBand devices can handle all DMA addresses, and it's
hard to imagine anyone would be silly enough to build a new device
that couldn't. Therefore, enable the NETIF_F_HIGHDMA feature for IPoIB.
This has no effect for no, but is needed when we enable gather/scatter
support and checksum stateless offloads.
Signed-off-by: Eli Cohen <eli@mellnaox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ConnectX HCA supports shrinking WQEs, so that a single work request
can be made of multiple units of wqe_shift. This way, WRs can differ
in size, and do not have to be a power of 2 in size, saving memory and
speeding up send WR posting. Unfortunately, if we do this then the
wqe_index field in CQEs can't be used to look up the WR ID anymore, so
our implementation does this only if selective signaling is off.
Further, on 32-bit platforms, we can't use vmap() to make the QP
buffer virtually contigious. Thus we have to use constant-sized WRs to
make sure a WR is always fully within a single page-sized chunk.
Finally, we use WRs with the NOP opcode to avoid wrapping around the
queue buffer in the middle of posting a WR, and we set the
NoErrorCompletion bit to avoid getting completions with error for NOP
WRs. However, NEC is only supported starting with firmware 2.2.232,
so we use constant-sized WRs for older firmware. And, since MLX QPs
only support SEND, we use constant-sized WRs in this case.
When stamping during NOP posting, do stamping following setting of the
NOP WQE valid bit.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We use struct mlx4_buf for kernel QP, CQ and SRQ buffers, and the code
to look up an entry is duplicated in get_cqe_from_buf() and the QP and
SRQ versions of get_wqe(). Factor this out into mlx4_buf_offset().
This will also make it easier to switch over to using vmap() for buffers.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a standard NIC and RDMA/iWARP driver for NetEffect 1/10Gb ethernet adapters.
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If the allocation of the MTT or the mailbox failed, mthca_fmr_alloc()
would return 0 (success) no matter what. This leads to crashes a
little down the road, when we try to dereference eg mr->mtt, which was
really ERR_PTR(-Ewhatever).
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>