Commit Graph

231 Commits

Author SHA1 Message Date
Varun Prakash
b65eef0a5b libcxgb,iw_cxgb4,cxgbit: add cxgb_is_neg_adv()
Add cxgb_is_neg_adv() in libcxgb_cm.h to remove
it's duplicate definitions from cxgb4/cm.c and
cxgbit/cxgbit_cm.c.

Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 20:49:19 -04:00
Varun Prakash
95554761d1 libcxgb,iw_cxgb4,cxgbit: add cxgb_find_route6()
Add cxgb_find_route6() in libcxgb_cm.c to remove
it's duplicate definitions from cxgb4/cm.c and
cxgbit/cxgbit_cm.c.

Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 20:49:19 -04:00
Varun Prakash
804c2f3e36 libcxgb,iw_cxgb4,cxgbit: add cxgb_find_route()
Add cxgb_find_route() in libcxgb_cm.c to remove
it's duplicate definitions from cxgb4/cm.c and
cxgbit/cxgbit_cm.c.

Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 20:49:19 -04:00
Varun Prakash
85e42b044e libcxgb,iw_cxgb4,cxgbit: add cxgb_get_4tuple()
Add cxgb_get_4tuple() in libcxgb_cm.c to remove
it's duplicate definitions from cxgb4/cm.c and
cxgbit/cxgbit_cm.c.

Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15 20:49:19 -04:00
Jens Axboe
3bc42f3f0e Merge branch 'nvmf-4.8-rc' of git://git.infradead.org/nvme-fabrics into for-linus
Sagi writes:

Here we have:
- Kconfig dependencies fix from Arnd
- nvme-rdma device removal fixes from Steve
- possible bad deref fix from Colin
2016-09-13 07:58:34 -06:00
Steve Wise
37eb816c08 iw_cxgb4: block module unload until all ep resources are released
Otherwise an endpoint can be still closing down causing a touch
after free crash.  Also WARN_ON if ulps have failed to destroy
various resources during device removal.

Fixes: ad61a4c7a9 ("iw_cxgb4: don't block in destroy_qp awaiting the last deref")
Reviewed-by: Sagi Grimberg <sagi@grimbrg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-09-04 10:00:53 +03:00
Steve Wise
609e941a6b iw_cxgb4: call dev_put() on l2t allocation failure
Reviewed-by: Sagi Grimberg <sagi@grimbrg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-09-04 10:00:53 +03:00
Steve Wise
30b03b1528 iw_cxgb4: use the MPA initiator's IRD if < our ORD
The i40iw initiator sends an MPA-request with ird=16 and ord=16. The cxgb4
responder sends an MPA-reply with ord = 32 causing i40iw to terminate
due to insufficient resources.

The logic to reduce the ORD to <= peer's IRD was wrong.

Reported-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-22 15:00:42 -04:00
Steve Wise
7f446abf12 iw_cxgb4: limit IRD/ORD advertised to ULP by device max.
The i40iw initiator sends an MPA-request with ird = 63, ord = 63. The
cxgb4 responder sends a RST.  Since the inbound ord=63 and it exceeds
the max_ird/c4iw_max_read_depth (=32 default), chelsio decides to abort.

Instead, cxgb4 should adjust the ord/ird down before presenting it to
the ULP.

Reported-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-22 15:00:42 -04:00
Steve Wise
12eb5137ed iw_cxgb4: stop MPA_REPLY timer when disconnecting
There exists a race where the application can setup a connection
and then disconnect it before iw_cxgb4 processes the fw4_ack
message.  For passive side connections, the fw4_ack message is
used to know when to stop the ep timer for MPA_REPLY messages.

If the application disconnects before the fw4_ack is handled then
c4iw_ep_disconnect() needs to clean up the timer state and stop the
timer before restarting it for the disconnect timer.  Failure to do this
results in a "timer already started" message and a premature stopping
of the disconnect timer.

Fixes: e4b76a2 ("RDMA/iw_cxgb4: stop_ep_timer() after MPA negotiation")

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02 13:15:17 -04:00
Hariprasad S
4a740838bf RDMA/iw_cxgb4: Low resource fixes for connection manager
Pre-allocate buffers for sending various control messages to close
connection, abort connection, etc so that we gracefully handle
connections when system is running out of memory.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:44:17 -04:00
Hariprasad S
4c72efefd9 RDMA/iw_cxgb4: Add missing error codes for act open cmd
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:44:17 -04:00
Hariprasad S
bce2841f5a RDMA/iw_cxgb4: clean up c4iw_reject_cr()
Get rid of unneeded code, and refactor things a bit.

For MPA version 0 we abort the connection.  For > 0, we attempt to send
an MPA_START/REJECT Reply, and then disconnect gracefully.  If the send
of the MPA message fails, then we abort the connection.  We can ignore
c4iw_ep_disconnect() errors here because it will clean up the endpoint
if there are failures.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:44:16 -04:00
Hariprasad S
3d4e79949c RDMA/iw_cxgb4: only read markers_enabled mod param once
markers_enabled should be read only once during MPA negotiation.
The present code does read markers_enabled twice during negotiation
which results in setting wrong recv/xmit markers if the markers_enabled is
changed in the middle of negotiation.
With this change the markers_enabled is read only once during MPA
negotiation. recv markers are set based on markers enabled module
parameter and xmit markers are set based on markers flag from the
MPA_START_REQ/MPA_START_REP.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23 10:43:19 -04:00
Bart Van Assche
ba987e51a6 iw_cxgb4: Convert a __force cast
__force casts should be avoided if there is a better alternative.
Hence modify the comparison of s_addr with INADDR_ANY such that the
__force cast is no longer necessary.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Vipul Pandya <vipul@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:38:11 -04:00
Hariprasad S
64bec74a9b RDMA/iw_cxgb4: Add arp failure handlers to send_mpa_reply/reject()
These handlers when called print error message to the kernel log,
but the actual handling is done by _c4iw_free_ep() and process_timeout().

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:38:11 -04:00
Hariprasad S
093108cb36 RDMA/iw_cxgb4: Always wake up waiter in c4iw_peer_abort_intr()
Currently c4iw_peer_abort_intr() does not wake up the waiter if the
endpoint state indicates we're using MPAv2 and we're currently trying to
connect. This was introduced with commit 7c0a33d611 ("RDMA/cxgb4:
Don't wakeup threads for MPAv2")

However, this original fix is flawed because it introduces a race that
can cause a deadlock of the iwarp stack.  Here is the race:

->local side sets up an active offload connection.

->local side sends MPA_START request.

->peer sends MPA_START response.

->local side ingress cpl thread begins processing the MPA_START response,
but before it changes the state from MPA_REQ_SENT to FPDU_MODE:

->peer sends a RST which results in a ABORT_REQ_RSS.  This triggers
peer_abort_intr() which sees the state in MPA_REQ_SENT and since mpa_rev
is 2, it will avoid waking up the endpoint with -ECONNRESET, assuming the
stack will re-attempt the connection using MPAv1.

->Meanwhile, the cpl thread moves the state to FPDU_MODE and calls
c4iw_modify_rc_qp() which calls rdma_init() which sends a RI_WR/INIT WR
to firmware.  But since HW sent an abort, FW correctly drops the RI_WR/INIT
WR.

->So the cpl thread is stuck waiting for a reply and cannot process the
ABORT_REQ_RSS cpl sitting in its input queue. Thus everything comes to a
halt because no more ingress cpls are processed by the stack...

The correct fix for the issue is to always do the wake up in
c4iw_abort_intr() but reinitialize the wait object in c4iw_reconnect().

Fixes: 7c0a33d611 ("RDMA/cxgb4: Don't wakeup threads for MPAv2")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:38:10 -04:00
Hariprasad S
4a4dd8db9d RDMA/iw_cxgb4: Handle ret value of process_mpa_reply() in rx_data
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:38:10 -04:00
Hariprasad S
f86fac79af RDMA/iw_cxgb4: atomic find and reference for listening endpoints
Add get_ep_from_stid() which will atomically find and reference the
endpoint struct if found. This avoids touch-after-free races between
threads destroying listening endpoints and the CPL processing thread
processing an incoming PASS_ACCEPT_REQ CPL.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:38:09 -04:00
Hariprasad S
e8667a9b35 RDMA/iw_cxgb4: Handle ULP accept/reject during ABORTING
c4iw_reject() and c4iw_accept() need to handle the case where the
endpoint has timed out and is in the middle of ABORTING the connection.

Here is the flow that causes the BUG_ON() to fire on the server side:

1) offload connection setup and endpoint timer started

2) MPA_START request received from peer, CONNECT_REQUEST passed to ULP

3) endpoint timer fires, and process_timeout() aborts the connection,
this moves the endpoint state to ABORTING until HW sends up the
ABORT_RPL_RSS.

4) application exits closing the CONNECT_REQUEST cm_id.  The IWCM calls
c4iw_reject_cr() to destroy this connection request.

5) WHAMO: BUG_ON() because the state is ABORTING.

The fix is to change c4iw_reject_cr() and c4iw_accept_cr() to fail the
operation if the state is not in MPA_REQ_RCVD vs in DEAD.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:38:09 -04:00
Hariprasad S
ceb110a8c1 RDMA/iw_cxgb4: Release ep for for FPDU_MODE and MPA_REQ_RCVD in process_timeout
ARP failure may also happen when ep in FPDU_MODE and these failures need
to be handled by process_timeout(). process_timeout() also has to handle
case MPA_REQ_RCVD, setting abort to 1, leading to ep resource release.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:38:09 -04:00
Hariprasad S
c878b706ff RDMA/iw_cxgb4: Free skb in case of arp failure in _c4iw_free_ep()
Arp failure for send_mpa_reply/reject() is handled by freeing the
mpa_skb in c4iw_free_ep() before releasing ep.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:38:08 -04:00
Hariprasad S
944661dd97 RDMA/iw_cxgb4: atomically lookup ep and get a reference
There is a race between ULP threads calling c4iw_ep_disconnect() via
c4iw_modify_rc_qp() and the ingress CPL thread where the ULP thread
can free the endpoint just after the ingress CPL thread finds the ep
pointer in the tid table.  To avoid this, we now use the hwtid_idr table
for lookups instead of the LLD tid table so we can lock around insert,
remove, and lookup+get_ep to avoid the race.  The CPL handlers now will
either find the ep ptr and have a ref on it, or not find it and they
can discard the CPL.  Callers of get_ep_from_tid() will have a ref
on the ep if found, and thus must deref when they are done.
Negative advice in peer_abort_intr() need to dereference the ep.
therefore peer_abort() is scheduled to dereference the ep later.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:38:08 -04:00
Hariprasad S
761e19a504 RDMA/iw_cxgb4: Handle return value of c4iw_ofld_send() in abort_arp_failure()
In abort_arp_failure(), the return value from c4iw_ofld_send() is
ignored and thus if the CPL isn't sent, the endpoint is stuck and never
gets aborted. Failure of c4iw_ofld_send() is treated as fatal error, and
the ep resources are released in a safer context through process_work().

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:38:07 -04:00
Hariprasad S
8a70f812b1 RDMA/iw_cxgb4: in process_timeout() don't move ep state to ABORTING
Moving the state to ABORTING causes the ep to get stuck because
c4iw_ep_timeout() thinks the ABORT has already been done. So leave the
state alone and let c4iw_ep_disconnect() do the right thing given the
ep state.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:38:07 -04:00
Hariprasad S
caa6c9f247 RDMA/iw_cxgb4: handle return value of c4iw_l2t_send() and send_mpa_req()
->In act_open_rpl(), CPL_ERR_TCAM_FULL error handling branch, there is
no handling of the return value of send_fw_act_open_req().
->In send_fw_act_open_req(), there is no handling of return value of
c4iw_l2t_send(), which may cause a ep leak and won't notify upper layers
on connection establish failure.
->send_mpa_req() should act on the return from c4iw_l2t_send() and
return the error to the caller.
->In case of c4iw_l2t_send() failure in send_mpa_req(), returns without
starting the timer and not changing the ep state, which is further
handled by act_establish()
-> In act_establish()?if send_mpa_request's get_skb returns an error,
may cause an ep leak. So handle return value of send_mpa_req()

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:38:07 -04:00
Hariprasad S
e4b76a2a26 RDMA/iw_cxgb4: stop_ep_timer() after MPA negotiation
->Stop the ep timer after MPA negotiation so that the arp failures
during send_mpa_reply/reject will be handled by process_timeout() after
the ep timer expires.
->Added case MPA_REP_SENT in process_timeout().
->For MPA reject, c4iw_ep_disconnect tries to start an already started
timer, which leads to warning message "timer already started".
-> In case of mpa reject stop the timer and call send_mpa_reject().
-> Added new ep flag STOP_MPA_TIMER to tell fw4_ack() to stop the timer
only for send_mpa_reply(), which is set in c4iw_accept_cr().

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:38:06 -04:00
Hariprasad S
da1cecdffc RDMA/iw_cxgb4: Do not stop timer in case of incomplete messages
In case of incomplete mpa messages we should not stop timer as it
results in return with timeout for the next mpa message

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:38:06 -04:00
Hariprasad S
8d1f1a6b3f RDMA/iw_cxgb4: parent_ep has to be dereferenced in case of passive accept failure
-> On passive side of connection parent_ep referenced during connection
request has to be dereferenced during the passive accept failure.
-> As passive accept failure error handlinglogic runs in atomic context,
the parent ep is dereferenced by scheduling work request.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:38:05 -04:00
Hariprasad S
ccd2c30b45 RDMA/iw_cxgb4: Correct RFC number of MPA
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:38:05 -04:00
Hariprasad S
9ca6f7cf9c RDMA/iw_cxgb4: Add few history bits for ep
- add EP_DISC_FAIL history bit
- add QP_REFED/DEREFED history bits
- Add functions to ref/deref the cm_id and add history bit for the same
- add CLOSE_CON_RPL history

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13 19:38:04 -04:00
Hariprasad S
6973627968 RDMA/iw_cxgb4: remove abort_connection() usage from ep_timeout()
Use c4iw_ep_disconnect() instead.  This is part of getting rid of
abort_connection() altogether so we properly clean up on send_abort()
failures.

This is the last user of abort_connection(), so remove it too.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-05 16:11:14 -04:00
Hariprasad S
c00dcbafac RDMA/iw_cxgb4: move QP -> ERROR on fatal disconnect errors
In c4iw_ep_disconnect(), if we fail to initiate a close operation, then
move the qp to ERROR to disassociate the ep from the qp.  Failure to do
this will leak the ep resources.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-05 16:11:14 -04:00
Hariprasad S
fd6aabe48c RDMA/iw_cxgb4: don't use abort_connection in process_mpa_request()
Instead return whether the caller needs to disconnect. This is part of
getting rid of abort_connection() altogether so we properly clean up on
send_abort() failures.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-05 16:11:14 -04:00
Hariprasad S
eaf4c6d46a RDMA/iw_cxgb4: remove abort_connection() usage from accept/reject
Use c4iw_ep_disconnect() instead. This is part of getting rid of
abort_connection() altogether so we properly clean up on send_abort()
failures.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-05 16:11:14 -04:00
Hariprasad S
fef4422d00 RDMA/iw_cxgb4: free resources when send_flowc() fails
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-05 16:11:14 -04:00
Hariprasad S
f8e1e1d137 RDMA/iw_cxgb4: remove connection abort from process_mpa_reply
Instead, have the caller, rx_data() handle the close/abort like
it does for process_mpa_request(). This is part of getting rid of
abort_connection() altogether so we properly clean up on send_abort()
failures.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-05 16:11:14 -04:00
Hariprasad S
6e410d8f71 RDMA/iw_cxgb4: ensure eps don't get freed while the mutex is held
In rx_data(), with the ep in FPDU_MODE, refcnt=2, if we get unexpected
streaming data, we call c4iw_modify_rc_qp() and move the qp from
RTS -> TERMINATE.  In c4iw_modify_rc_qp(), if rdma_fini() returns
an error, the ep will be dereferenced (refcnt=1).  Then rx_data()
calls c4iw_ep_disconnect() which starts the close operation.
But if send_halfclose() fails in c4iw_ep_disconnect(), we  will call
release_ep_resources() derefing the ep which reduces the refcnt to 0 and
and frees the ep. However we still has the ep mutex at that point, so we
have a touch-after-free bug.  There is a similar issue where
peer_close() calls c4iw_ep_disconnect().

The solution is to add a reference to the ep in c4iw_ep_disconnect()
after acquiring  the mutex, and release it after releasing the mutex.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-05 16:11:14 -04:00
Hariprasad S
88bc230dc6 RDMA/iw_cxgb4: stop ep timer on close failure
In c4iw_ep_disconnect(), if we start the ep timer to begin a close,
but send_halfclose() fails, we need to stop the timer and send a CLOSE
event up to the IWCM before releasing the resources. Otherwise, we can
crash when the ep timer fires if the ep is referencing a previous instance
of the device. This can happen as part of adapter reset/recovery, for
instance.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-05 16:11:14 -04:00
Hariprasad S
9dec900c20 RDMA/iw_cxgb4: release ep resources on accept arp failure
If ARP fails before the CPL_PASS_ACCEPT_RPL is seen by hardware, the tid
will be stuck in SYN_PEND and never released.  So create an arp failure
handler specifically for this message to release the endpoint resources.

In pass_accept_rpl_arp_failure(), put the parent endpoint so it will
be freed when destroyed.  Also we don't need to call release_tid() here
because _c4iw_free_ep() calls cxgb4_remove_tid() which releases the
hwtid.

If we get an ABORT_REQ_RSS instead of a PASS_ESTABLISH (because the
peer's ACK to our SYN is never received), then put the parent as well
in peer_abort().

Treat accept_cr() failures just like arp failures: put the parent ep
and release the ep resources destroying the tid

The ARP failure handlers are called in an atomic context, so we need to
schedule some of the processing which might block.  Namely _c4iw_free_ep()
which needs a mutex.  So create a "special" CPL opcode and handler and
schedule it via sched() to be run by process_work() in a blockable context.

Also rework the active open arp failure handler to make use of
release_ep_resources().  This allows both the active and passive arp
failure handlers to use the same deferred cleanup function.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-05 16:11:14 -04:00
Doug Ledford
082eaa5083 Merge branches 'nes', 'cxgb4' and 'iwpm' into k.o/for-4.6 2016-03-16 13:57:43 -04:00
Steve Wise
170003c894 iw_cxgb4: remove port mapper related code
Now that most of the port mapper code been moved to iwcm, we can remove
it from iw_cxgb4.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-16 13:48:21 -04:00
Hariprasad S
ac8e4c69a0 cxgb4/iw_cxgb4: TOS support
This series provides support for iWARP applications to specify a TOS
value and have that map to a VLAN Priority for iw_cxgb4 iWARP connections.

In iw_cxgb4, when allocating an L2T entry, pass the skb_priority based
on the tos value in the cm_id. Also pass the correct tos value during
connection setup so the passive side gets the client's desired tos.
When sending the FLOWC work request to FW, if the egress device is
in a vlan, then use the vlan priority bits as the scheduling class.
This allows associating RDMA connections with scheduling classes to
provide traffic shaping per flow.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-02-29 17:10:14 -05:00
Hariprasad S
6812faefb7 iw_cxgb4: remove false error log entry
Don't log errors if a listening endpoint is going away when procesing a
PASS_ACCEPT_REQ message.  This can happen.  Change the error printk to
a PDBG() debug log entry

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-02-29 17:10:14 -05:00
Linus Torvalds
048ccca8c1 Initial roundup of 4.5 merge window patches
- Remove usage of ib_query_device and instead store attributes in
   ib_device struct
 - Move iopoll out of block and into lib, rename to irqpoll, and use
   in several places in the rdma stack as our new completion queue
   polling library mechanism.  Update the other block drivers that
   already used iopoll to use the new mechanism too.
 - Replace the per-entry GID table locks with a single GID table lock
 - IPoIB multicast cleanup
 - Cleanups to the IB MR facility
 - Add support for 64bit extended IB counters
 - Fix for netlink oops while parsing RDMA nl messages
 - RoCEv2 support for the core IB code
 - mlx4 RoCEv2 support
 - mlx5 RoCEv2 support
 - Cross Channel support for mlx5
 - Timestamp support for mlx5
 - Atomic support for mlx5
 - Raw QP support for mlx5
 - MAINTAINERS update for mlx4/mlx5
 - Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates
 - Add support for remote invalidate to the iSER driver (pushed through the
   RDMA tree due to dependencies, acknowledged by nab)
 - Update to NFSoRDMA (pushed through the RDMA tree due to dependencies,
   acknowledged by Bruce)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWoSygAAoJELgmozMOVy/dDjsP/2vbTda2MvQfkfkGEZBQdJSg
 095RN0gQgCJdg78lAl8yuaK8r4VN/7uefpDtFdudH1I/Pei7X0wxN9R1UzFNG4KR
 AD53lz92IVPs15328SbPR2kvNWISR9aBFQo3rlElq3Grqlp0EMn2Ou1vtu87rekF
 aMllxr8Nl0uZhP+eWusOsYpJUUtwirLgRnrAyfqo2UxZh/TMIroT0TCx1KXjVcAg
 dhDARiZAdu3OgSc6OsWqmH+DELEq6dFVA5F+DDBGAb8bFZqlJc7cuMHWInwNsNXT
 so4bnEQ835alTbsdYtqs5DUNS8heJTAJP4Uz0ehkTh/uNCcvnKeUTw1c2P/lXI1k
 7s33gMM+0FXj0swMBw0kKwAF2d9Hhus9UAN7NwjBuOyHcjGRd5q7SAnfWkvKx000
 s9jVW19slb2I38gB58nhjOh8s+vXUArgxnV1+kTia1+bJSR5swvVoWRicRXdF0vh
 TvLX/BjbSIU73g1TnnLNYoBTV3ybFKQ6bVdQW7fzSTDs54dsI1vvdHXi3bYZCpnL
 HVwQTZRfEzkvb0AdKbcvf8p/TlaAHem3ODqtO1eHvO4if1QJBSn+SptTEeJVYYdK
 n4B3l/dMoBH4JXJUmEHB9jwAvYOpv/YLAFIvdL7NFwbqGNsC3nfXFcmkVORB1W3B
 KEMcM2we4bz+uyKMjEAD
 =5oO7
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "Initial roundup of 4.5 merge window patches

   - Remove usage of ib_query_device and instead store attributes in
     ib_device struct

   - Move iopoll out of block and into lib, rename to irqpoll, and use
     in several places in the rdma stack as our new completion queue
     polling library mechanism.  Update the other block drivers that
     already used iopoll to use the new mechanism too.

   - Replace the per-entry GID table locks with a single GID table lock

   - IPoIB multicast cleanup

   - Cleanups to the IB MR facility

   - Add support for 64bit extended IB counters

   - Fix for netlink oops while parsing RDMA nl messages

   - RoCEv2 support for the core IB code

   - mlx4 RoCEv2 support

   - mlx5 RoCEv2 support

   - Cross Channel support for mlx5

   - Timestamp support for mlx5

   - Atomic support for mlx5

   - Raw QP support for mlx5

   - MAINTAINERS update for mlx4/mlx5

   - Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates

   - Add support for remote invalidate to the iSER driver (pushed
     through the RDMA tree due to dependencies, acknowledged by nab)

   - Update to NFSoRDMA (pushed through the RDMA tree due to
     dependencies, acknowledged by Bruce)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (169 commits)
  IB/mlx5: Unify CQ create flags check
  IB/mlx5: Expose Raw Packet QP to user space consumers
  {IB, net}/mlx5: Move the modify QP operation table to mlx5_ib
  IB/mlx5: Support setting Ethernet priority for Raw Packet QPs
  IB/mlx5: Add Raw Packet QP query functionality
  IB/mlx5: Add create and destroy functionality for Raw Packet QP
  IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types
  IB/mlx5: Allocate a Transport Domain for each ucontext
  net/mlx5_core: Warn on unsupported events of QP/RQ/SQ
  net/mlx5_core: Add RQ and SQ event handling
  net/mlx5_core: Export transport objects
  IB/mlx5: Expose CQE version to user-space
  IB/mlx5: Add CQE version 1 support to user QPs and SRQs
  IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext
  IB/sa: Fix netlink local service GFP crash
  IB/srpt: Remove redundant wc array
  IB/qib: Improve ipoib UD performance
  IB/mlx4: Advertise RoCE v2 support
  IB/mlx4: Create and use another QP1 for RoCEv2
  IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
  ...
2016-01-23 18:45:06 -08:00
Hariprasad S
28de1f7437 iw_cxgb4: Take clip reference before starting IPv6 listen
The h/w is designed in such a way that, if you do anything IPv6
related, a valid clip entry must be there. So take clip reference
before creating IPv6 listening servers, and then if we fail to
create server, release the clip entry.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 15:23:40 -05:00
Linus Torvalds
7d1fc01afc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  floppy: make local variable non-static
  exynos: fixes an incorrect header guard
  dt-bindings: fixes some incorrect header guards
  cpufreq-dt: correct dead link in documentation
  cpufreq: ARM big LITTLE: correct dead link in documentation
  treewide: Fix typos in printk
  Documentation: filesystem: Fix typo in fs/eventfd.c
  fs/super.c: use && instead of & for warn_on condition
  Documentation: fix sysfs-ptp
  lib: scatterlist: fix Kconfig description
2016-01-14 17:04:19 -08:00
Masanari Iida
e3d132d123 treewide: Fix typos in printk
This patch fix multiple spelling typos found in
various part of kernel.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-12-08 14:59:19 +01:00
Hariprasad S
963cab5082 iw_cxgb4: Adds support for T6 adapter
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:16:38 -04:00
Hariprasad S
3dd9a5dc24 iw_cxgb4: reverse the ord/ird in the ESTABLISHED upcall
The ESTABLISHED event should have the peer's ord/ird so
swap the values in the event before the upcall.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 17:16:10 -04:00
Hariprasad S
f57b780c00 iw_cxgb4: fix misuse of ep->ord for minimum ird calculation
When calculating the minimum ird in c4iw_accept_cr(), we need to always
have a value of at least 1 if the RTR message is a 0B read.  The code
was
incorrectly using ep->ord for this logic which was incorrectly adjusting
the ird and causing incorrect ord/ird negotiation when using MPAv2 to
negotiate these values.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 17:16:10 -04:00
Hariprasad S
158c776dba iw_cxgb4: pass the ord/ird in connect reply events
This allows client ULPs to get the negotiated ord/ird which is useful
to avoid stalling the SQ due to exceeding the ORD.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 17:16:10 -04:00
Hariprasad S
99718e59fa iw_cxgb4: detect fatal errors while creating listening filters
In c4iw_create_listen(), if we're using listen filters, then bail out
of the busy loop if the device becomes fatally dead

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 17:16:10 -04:00
Nicholas Krause
54b9a96f10 IB/cxgb4: Fix if statement in pick_local_ip6adddrs
This fixes an if statement checking the return value of the function
get_lladdr for success in the function pick_local_ip6addrs to instead
of directly checking the return value of this call check the opposite
as get_lladdr returns zero for success which would incorrectly make
this if statement block not execute with the current if statement
check.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-03 14:03:59 -04:00
Hariprasad S
84cc6ac62d iw_cxgb4: Add support for clip
Add support for ipv6 address handling clip api provided by lld

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 22:54:48 -04:00
Hariprasad S
b8ac311246 iw_cxgb4: set the default MPA version to 2
This enables ORD/IRD negotiation and its about time to enable it by
default

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 22:54:47 -04:00
Steve Wise
940fd304d2 iw_cxgb4: use wildcard mapping for getting remote addr info
For listening endpoints bound to the wildcard address, we need to pass
the wildcard address mapping to iwpm_get_remote_info() instead of the
mapped address of the new child connection.

Without this fix, and with iwarp port mapping enabled, each iw_cxgb4
connection that is spawned from a listening endpoint bound to the wildcard
address, will generate an annoying dmesg entry about failing to find
the remote address mapping info, and the connection state displayed in
debugfs under /sys/kernel/debug/iw_cxgb4/<pci-slot-no>/eps  will not have
the peer's address/port mapping info.  The connection still works though.

Fixes: 5b6b8fe ("RDMA/cxgb4: Report the actual address of the remote connecting peer")

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-11 17:16:49 -04:00
Hariprasad S
179d03bbfd iw_cxgb4: Remove negative advice dmesg warnings
Remove these log messages in favor of per-endpoint counters as well as
device-global counters that can be inspected via debugfs.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-05 13:21:27 -04:00
Steve Wise
5b6b8fe640 RDMA/cxgb4: Report the actual address of the remote connecting peer
Get the actual (non-mapped) ip/tcp address of the connecting peer from
the port mapper

Also setup the passive side endpoint to correctly display the actual
and mapped addresses for the new connection.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-05 09:18:01 -04:00
Hariprasad S
6198dd8d7a iw_cxgb4: 32b platform fixes
- get_dma_mr() was using ~0UL which is should be ~0ULL.  This causes the
DMA MR to get setup incorrectly in hardware.

- wr_log_show() needed a 64b divide function div64_u64() instead of
  doing
division directly.

- fixed warnings about recasting a pointer to a u64

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-05 09:18:01 -04:00
Hariprasad S
0b7410471d iw_cxgb4: Cleanup register defines/MACROS
Cleanup macros and register defines for consistency

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-05 09:18:01 -04:00
Hariprasad Shenai
cf7fe64aee iw_cxgb4: Cleanup register defines/MACROS defined in t4fw_ri_api.h
Cleanup all the MACROS that are defined in t4fw_ri_api.h and affected files

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-16 01:07:02 -05:00
Hariprasad Shenai
bdc590b99f iw_cxgb4/cxgb4/cxgb4vf/cxgb4i/csiostor: Cleanup register defines/macros related to all other cpl messages
This patch cleanups all other macros/register define related to
CPL messages that are defined in t4_msg.h and the affected files

Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:19:34 -05:00
Hariprasad Shenai
6c53e938a8 iw_cxgb4/cxgb4/cxgb4i: Cleanup register defines/MACROS related to CM CPL messages
This patch cleanups all macros/register define related to connection management
CPL messages that are defined in t4_msg.h and the affected files

Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:19:34 -05:00
Hariprasad S
e6b11163d4 RDMA/cxgb4: Handle NET_XMIT return codes
cxgb4_create_server() and cxgb4_create_server6() return NET_XMIT_*
values or a negative errno. iw_cxgb4 need to handle this correctly.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:10:46 -08:00
Hariprasad Shenai
10be6b48fd RDMA/cxgb4: Fix locking issue in process_mpa_request
Fix the following lockdep report:

    =============================================
    [ INFO: possible recursive locking detected ]
    3.17.0+ #3 Tainted: G            E
    ---------------------------------------------
    kworker/u64:3/299 is trying to acquire lock:
     (&epc->mutex){+.+.+.}, at: [<ffffffffa074e07a>]
    process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]

    but task is already holding lock:
     (&epc->mutex){+.+.+.}, at: [<ffffffffa074e34e>] rx_data+0x9e/0x1f0 [iw_cxgb4]

    other info that might help us debug this:
     Possible unsafe locking scenario:

           CPU0
           ----
      lock(&epc->mutex);
      lock(&epc->mutex);

     *** DEADLOCK ***

     May be due to missing lock nesting notation

    3 locks held by kworker/u64:3/299:
     #0:  ("%s""iw_cxgb4"){.+.+.+}, at: [<ffffffff8106f14d>]
    process_one_work+0x13d/0x4d0
     #1:  (skb_work){+.+.+.}, at: [<ffffffff8106f14d>] process_one_work+0x13d/0x4d0
     #2:  (&epc->mutex){+.+.+.}, at: [<ffffffffa074e34e>] rx_data+0x9e/0x1f0
    [iw_cxgb4]

    stack backtrace:
    CPU: 2 PID: 299 Comm: kworker/u64:3 Tainted: G            E  3.17.0+ #3
    Hardware name: Dell Inc. PowerEdge T110/0X744K, BIOS 1.2.1 01/28/2010
    Workqueue: iw_cxgb4 process_work [iw_cxgb4]
     ffff8800b91593d0 ffff8800b8a2f9f8 ffffffff815df107 0000000000000001
     ffff8800b9158750 ffff8800b8a2fa28 ffffffff8109f0e2 ffff8800bb768a00
     ffff8800b91593d0 ffff8800b9158750 0000000000000000 ffff8800b8a2fa88
    Call Trace:
     [<ffffffff815df107>] dump_stack+0x49/0x62
     [<ffffffff8109f0e2>] print_deadlock_bug+0xf2/0x100
     [<ffffffff810a0f04>] validate_chain+0x454/0x700
     [<ffffffff810a1574>] __lock_acquire+0x3c4/0x580
     [<ffffffffa074e07a>] ? process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
     [<ffffffff810a17cc>] lock_acquire+0x9c/0x110
     [<ffffffffa074e07a>] ? process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
     [<ffffffff815e111b>] mutex_lock_nested+0x4b/0x360
     [<ffffffffa074e07a>] ? process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
     [<ffffffff810c181a>] ? del_timer_sync+0xaa/0xd0
     [<ffffffff810c1770>] ? try_to_del_timer_sync+0x70/0x70
     [<ffffffffa074e07a>] process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
     [<ffffffffa074a3ec>] ? update_rx_credits+0xec/0x140 [iw_cxgb4]
     [<ffffffffa074e381>] rx_data+0xd1/0x1f0 [iw_cxgb4]
     [<ffffffff8109ff23>] ? mark_held_locks+0x73/0xa0
     [<ffffffff815e4b90>] ? _raw_spin_unlock_irqrestore+0x40/0x70
     [<ffffffff810a020d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
     [<ffffffff810a02dd>] ? trace_hardirqs_on+0xd/0x10
     [<ffffffffa074c931>] process_work+0x51/0x80 [iw_cxgb4]
     [<ffffffff8106f1c8>] process_one_work+0x1b8/0x4d0
     [<ffffffff8106f14d>] ? process_one_work+0x13d/0x4d0
     [<ffffffff8106f600>] worker_thread+0x120/0x3c0
     [<ffffffff8106f4e0>] ? process_one_work+0x4d0/0x4d0
     [<ffffffff81074a0e>] kthread+0xde/0x100
     [<ffffffff815e4b40>] ? _raw_spin_unlock_irq+0x30/0x40
     [<ffffffff81074930>] ? __init_kthread_worker+0x70/0x70
     [<ffffffff815e512c>] ret_from_fork+0x7c/0xb0
     [<ffffffff81074930>] ? __init_kthread_worker+0x70/0x70

Based on original work by Steve Wise <swise@opengridcomputing.com>.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:10:46 -08:00
Hariprasad Shenai
5167865aaa RDMA/cxgb4/csiostor: Cleansup FW related macros/register defines for PF/VF and LDST
This patch cleanups PF/VF and LDST related macros/register defines that are
defined in t4fw_api.h and the affected files.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-22 16:57:47 -05:00
Hariprasad Shenai
77a80e23cc RDMA/cxgb4: Cleanup Filter related macros/register defines
This patch cleanups all filter related macros/register defines that are defined
in t4fw_api.h and the affected files.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-22 16:57:46 -05:00
Anish Bhatt
d7990b0c34 cxgb4i/cxgb4 : Refactor macros to conform to uniform standards
Refactored all macros used in cxgb4i as part of previously started cxgb4 macro
names cleanup. Makes them more uniform and avoids namespace collision.
Minor changes in other drivers where required as some of these macros are used
 by multiple drivers, affected drivers are iw_cxgb4, cxgb4(vf) & csiostor

Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 14:36:22 -05:00
Hariprasad Shenai
e2ac962895 cxgb4: Cleanup macros so they follow the same style and look consistent, part 2
Various patches have ended up changing the style of the symbolic macros/register
defines to different style.

As a result, the current kernel.org files are a mix of different macro styles.
Since this macro/register defines is used by different drivers a
few patch series have ended up adding duplicate macro/register define entries
with different styles. This makes these register define/macro files a complete
mess and we want to make them clean and consistent. This patch cleans up a part
of it.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 12:57:10 -05:00
Hariprasad S
da22b896b1 RDMA/cxgb4: Fix ntuple calculation for ipv6 and remove duplicate line
This fixes ntuple calculation for IPv6 active open request for T5
adapter.  And also removes an duplicate line which got added in commit
92e7ae7172 ("iw_cxgb4: Choose appropriate hw mtu index and ISS for
iWARP connections")

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-14 00:34:08 -07:00
Hariprasad S
d480201b22 RDMA/cxgb4: Add missing neigh_release in find_route
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-14 00:34:08 -07:00
Hariprasad S
04524a47c3 RDMA/cxgb4: Take IPv6 into account for best_mtu and set_emss
best_mtu and set_emss were not considering ipv6 header for ipv6 case.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-14 00:34:08 -07:00
David S. Miller
8fd90bb889 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/infiniband/hw/cxgb4/device.c

The cxgb4 conflict was simply overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22 00:44:59 -07:00
Hariprasad Shenai
dd92b12453 iw_cxgb4: log detailed warnings for negative advice
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-21 20:23:59 -07:00
Hariprasad Shenai
4c2c576322 cxgb4/iw_cxgb4: use firmware ord/ird resource limits
Advertise a larger max read queue depth for qps, and gather the resource limits
from fw and use them to avoid exhaustinq all the resources.

Design:

cxgb4:

Obtain the max_ordird_qp and max_ird_adapter device params from FW
at init time and pass them up to the ULDs when they attach.  If these
parameters are not available, due to older firmware, then hard-code
the values based on the known values for older firmware.
iw_cxgb4:

Fix the c4iw_query_device() to report these correct values based on
adapter parameters.  ibv_query_device() will always return:

max_qp_rd_atom = max_qp_init_rd_atom = min(module_max, max_ordird_qp)
max_res_rd_atom = max_ird_adapter

Bump up the per qp max module option to 32, allowing it to be increased
by the user up to the device max of max_ordird_qp.  32 seems to be
sufficient to maximize throughput for streaming read benchmarks.

Fail connection setup if the negotiated IRD exhausts the available
adapter ird resources.  So the driver will track the amount of ird
resource in use and not send an RI_WR/INIT to FW that would reduce the
available ird resources below zero.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-15 16:25:16 -07:00
Steve Wise
46c1376db1 RDMA/cxgb4: Call iwpm_init() only once
We need to only register with the iwpm core once.  Currently it is
being done for every adapter, which causes a failure for each adapter
but the first, making multiple adapters unusable.

Fixes: 9eccfe109b ("RDMA/cxgb4: Add support for iWARP Port Mapper user space service")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-07-13 10:00:54 -07:00
Hariprasad S
5dab6d3ab1 RDMA/cxgb4: Clean up connection on ARP error
Based on origninal work by Steve Wise <swise@opengridcomputing.com>

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-07-08 16:56:54 -07:00
Hariprasad S
233b430103 RDMA/cxgb4: Fix skb_leak in reject_cr()
Based on origninal work by Steve Wise <swise@opengridcomputing.com>

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-07-08 16:56:54 -07:00
Hariprasad Shenai
35b1de5579 rdma/cxgb4: Fixes cxgb4 probe failure in VM when PF is exposed through PCI Passthrough
Change logic which determines our Physical Function at PCI Probe time.
Now we read the PL_WHOAMI register and get the Physical Function.

Pass Physical Function to Upper Layer Drivers in lld_info structure in the
new field "pf" added to lld_info.  This is useful for the cases where the
PF, say PF4, is attached to a Virtual Machine via some form of "PCI
Pass Through" technology and the PCI Function shows up as PF0 in the VM.

Based on original work by Casey Leedom <leedom@chelsio.com>

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-01 18:56:10 -07:00
Linus Torvalds
f9da455b93 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.

 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
    Benniston.

 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
    Mork.

 4) BPF now has a "random" opcode, from Chema Gonzalez.

 5) Add more BPF documentation and improve test framework, from Daniel
    Borkmann.

 6) Support TCP fastopen over ipv6, from Daniel Lee.

 7) Add software TSO helper functions and use them to support software
    TSO in mvneta and mv643xx_eth drivers.  From Ezequiel Garcia.

 8) Support software TSO in fec driver too, from Nimrod Andy.

 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.

10) Handle broadcasts more gracefully over macvlan when there are large
    numbers of interfaces configured, from Herbert Xu.

11) Allow more control over fwmark used for non-socket based responses,
    from Lorenzo Colitti.

12) Do TCP congestion window limiting based upon measurements, from Neal
    Cardwell.

13) Support busy polling in SCTP, from Neal Horman.

14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.

15) Bridge promisc mode handling improvements from Vlad Yasevich.

16) Don't use inetpeer entries to implement ID generation any more, it
    performs poorly, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
  rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
  tcp: fixing TLP's FIN recovery
  net: fec: Add software TSO support
  net: fec: Add Scatter/gather support
  net: fec: Increase buffer descriptor entry number
  net: fec: Factorize feature setting
  net: fec: Enable IP header hardware checksum
  net: fec: Factorize the .xmit transmit function
  bridge: fix compile error when compiling without IPv6 support
  bridge: fix smatch warning / potential null pointer dereference
  via-rhine: fix full-duplex with autoneg disable
  bnx2x: Enlarge the dorq threshold for VFs
  bnx2x: Check for UNDI in uncommon branch
  bnx2x: Fix 1G-baseT link
  bnx2x: Fix link for KR with swapped polarity lane
  sctp: Fix sk_ack_backlog wrap-around problem
  net/core: Add VF link state control policy
  net/fsl: xgmac_mdio is dependent on OF_MDIO
  net/fsl: Make xgmac_mdio read error message useful
  net_sched: drr: warn when qdisc is not work conserving
  ...
2014-06-12 14:27:40 -07:00
Hariprasad Shenai
b408ff282d iw_cxgb4: don't truncate the recv window size
Fixed a bug that shows up with recv window sizes that exceed the size of
the RCV_BUFSIZ field in opt0 (>= 1024K).  If the recv window exceeds
this, then we specify the max possible in opt0, add add the rest in via
a RX_DATA_ACK credits.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-10 22:49:54 -07:00
Hariprasad Shenai
92e7ae7172 iw_cxgb4: Choose appropriate hw mtu index and ISS for iWARP connections
Select the appropriate hw mtu index and initial sequence number to optimize
hw memory performance.

Add new cxgb4_best_aligned_mtu() which allows callers to provide enough
information to be used to [possibly] select an MTU which will result in the
TCP Data Segment Size (AKA Maximum Segment Size) to be an aligned value.

If an RTR message exhange is required, then align the ISS to 8B - 1 + 4, so
that after the SYN the send seqno will align on a 4B boundary. The RTR
message exchange will leave the send seqno aligned on an 8B boundary.
If an RTR is not required, then align the ISS to 8B - 1.  The goal is
to have the send seqno be 8B aligned when we send the first FPDU.

Based on original work by Casey Leedom <leeedom@chelsio.com> and
Steve Wise <swise@opengridcomputing.com>

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-10 22:49:54 -07:00
Roland Dreier
eeaddf3670 Merge branches 'core', 'cxgb3', 'cxgb4', 'iser', 'iwpm', 'misc', 'mlx4', 'mlx5', 'noio', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next 2014-06-10 10:12:14 -07:00
Steve Wise
9eccfe109b RDMA/cxgb4: Add support for iWARP Port Mapper user space service
Based on original work by Vipul Pandya.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>

[ Fix htons -> ntohs to make sparse happy.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-10 10:12:06 -07:00
Steve Wise
11b8e22d4d RDMA/cxgb4: Fix vlan support
RDMA connections over a vlan interface don't work due to
import_ep() not using the correct egress device.

 - use the real device in import_ep()
 - use rdma_vlan_dev_real_dev() in get_real_dev().

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-19 18:00:32 -07:00
Steve Wise
92e5011ab0 RDMA/cxgb4: Force T5 connections to use TAHOE congestion control
This is required to work around a T5 HW issue.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-28 17:29:41 -07:00
Steve Wise
cc18b939e1 RDMA/cxgb4: Fix endpoint mutex deadlocks
In cases where the cm calls c4iw_modify_rc_qp() with the endpoint
mutex held, they must be called with internal == 1.  rx_data() and
process_mpa_reply() are not doing this.  This causes a deadlock
because c4iw_modify_rc_qp() might call c4iw_ep_disconnect() in some
!internal cases, and c4iw_ep_disconnect() acquires the endpoint mutex.
The design was intended to only do the disconnect for !internal calls.

Change rx_data(), FPDU_MODE case, to call c4iw_modify_rc_qp() with
internal == 1, and then disconnect only after releasing the mutex.

Change process_mpa_reply() to call c4iw_modify_rc_qp(TERMINATE) with
internal == 1 and set a new attr flag telling it to send a TERMINATE
message.  Previously this was implied by !internal.

Change process_mpa_reply() to return whether the caller should
disconnect after releasing the endpoint mutex.  Now rx_data() will do
the disconnect in the cases where process_mpa_reply() wants to
disconnect after the TERMINATE is sent.

Change c4iw_modify_rc_qp() RTS->TERM to only disconnect if !internal,
and to send a TERMINATE message if attrs->send_term is 1.

Change abort_connection() to not aquire the ep mutex for setting the
state, and make all calls to abort_connection() do so with the mutex
held.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-28 17:29:41 -07:00
Steve Wise
b33bd0cbfa RDMA/cxgb4: Endpoint timeout fixes
1) timedout endpoint processing can be starved. If there are continual
   CPL messages flowing into the driver, the endpoint timeout
   processing can be starved.  This condition exposed the other bugs
   below.

Solution: In process_work(), call process_timedout_eps() after each CPL
is processed.

2) Connection events can be processed even though the endpoint is on
   the timeout list.  If the endpoint is scheduled for timeout
   processing, then we must ignore MPA Start Requests and Replies.

Solution: Change stop_ep_timer() to return 1 if the ep has already been
queued for timeout processing.  All the callers of stop_ep_timer() need
to check this and act accordingly.  There are just a few cases where
the caller needs to do something different if stop_ep_timer() returns 1:

1) in process_mpa_reply(), ignore the reply and  process_timeout()
   will abort the connection.

2) in process_mpa_request, ignore the request and process_timeout()
   will abort the connection.

It is ok for callers of stop_ep_timer() to abort the connection since
that will leave the state in ABORTING or DEAD, and process_timeout()
now ignores timeouts when the ep is in these states.

3) Double insertion on the timeout list.  Since the endpoint timers
   are used for connection setup and teardown, we need to guard
   against the possibility that an endpoint is already on the timeout
   list.  This is a rare condition and only seen under heavy load and
   in the presense of the above 2 bugs.

Solution: In ep_timeout(), don't queue the endpoint if it is already on
the queue.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-11 11:36:07 -07:00
Linus Torvalds
877f075aac Main batch of InfiniBand/RDMA changes for 3.15:
- The biggest change is core API extensions and mlx5 low-level driver
    support for handling DIF/DIX-style protection information, and the
    addition of PI support to the iSER initiator.  Target support will be
    arriving shortly through the SCSI target tree.
 
  - A nice simplification to the "umem" memory pinning library now that
    we have chained sg lists.  Kudos to Yishai Hadas for realizing our
    code didn't have to be so crazy.
 
  - Another nice simplification to the sg wrappers used by qib, ipath and
    ehca to handle their mapping of memory to adapter.
 
  - The usual batch of fixes to bugs found by static checkers etc. from
    intrepid people like Dan Carpenter and Yann Droneaud.
 
  - A large batch of cxgb4, ocrdma, qib driver updates.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJTPYBnAAoJEENa44ZhAt0hGI4P/29eotGwpkANUQE6FQvxCUL2
 CXJtSg52lmYvGJrPK4IhihpbtQmHJz3iXEzlOOWidTw1dJgObR6vFaRymh7+vDLs
 CdzybMcXdasarqTuYeJbFzhkimpwtWWrMy/8Ik/Jj/5glGQ6cUSpdYZzVtFhYNqf
 hCGE8iLi+tuekJJj1htut5D6apXM7udcdc2yLJNOdsSj/VUXt1oqG1x9xAi9R8Tq
 7o8eFSStdlja0EBQ6Hli2zauCSnQkaUtr8h6EAFbcCtvBK8HqsHSc2gfq2ViFUiN
 ztt167oWoQnVkR0qCPL5nVt+CRQHHROprVXvbpcTI3aW61gNIl6OrUUOXefzHXac
 TNi+fdMpiEB/JQ4Z04Jzd1dGCSjYeTqPj4rO4meFjBmxRDdTgZHu7FWwejT1nYJ5
 d2abVdCOT+QWlIlM7m/pjdWJII5OYM+4/jtTayGepEaR4fTUzKtPZPBLNUBDBKE+
 4f92PC8LiuPkwJgb6XT96onPz1bDCOnPSEdwoKUFKPeGUcwgVOM/Wx5NU4Yf7rfg
 RxQwZ7mJXbjCYFlmGGo/0QDy6UEGkIFYlJSzooP+wlK1JvZ5h2M+9QKX2FtwzR+R
 I2kBxcTXWsM/h88R7MkNqbNIllmhssrJwmAE46OneZbfoBOB+JZjb4nLRTu0jEcS
 zn6f16GmJ37BKn2/qYY/
 =Ww6H
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband updates from Roland Dreier:
 "Main batch of InfiniBand/RDMA changes for 3.15:

   - The biggest change is core API extensions and mlx5 low-level driver
     support for handling DIF/DIX-style protection information, and the
     addition of PI support to the iSER initiator.  Target support will
     be arriving shortly through the SCSI target tree.

   - A nice simplification to the "umem" memory pinning library now that
     we have chained sg lists.  Kudos to Yishai Hadas for realizing our
     code didn't have to be so crazy.

   - Another nice simplification to the sg wrappers used by qib, ipath
     and ehca to handle their mapping of memory to adapter.

   - The usual batch of fixes to bugs found by static checkers etc.
     from intrepid people like Dan Carpenter and Yann Droneaud.

   - A large batch of cxgb4, ocrdma, qib driver updates"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (102 commits)
  RDMA/ocrdma: Unregister inet notifier when unloading ocrdma
  RDMA/ocrdma: Fix warnings about pointer <-> integer casts
  RDMA/ocrdma: Code clean-up
  RDMA/ocrdma: Display FW version
  RDMA/ocrdma: Query controller information
  RDMA/ocrdma: Support non-embedded mailbox commands
  RDMA/ocrdma: Handle CQ overrun error
  RDMA/ocrdma: Display proper value for max_mw
  RDMA/ocrdma: Use non-zero tag in SRQ posting
  RDMA/ocrdma: Memory leak fix in ocrdma_dereg_mr()
  RDMA/ocrdma: Increment abi version count
  RDMA/ocrdma: Update version string
  be2net: Add abi version between be2net and ocrdma
  RDMA/ocrdma: ABI versioning between ocrdma and be2net
  RDMA/ocrdma: Allow DPP QP creation
  RDMA/ocrdma: Read ASIC_ID register to select asic_gen
  RDMA/ocrdma: SQ and RQ doorbell offset clean up
  RDMA/ocrdma: EQ full catastrophe avoidance
  RDMA/cxgb4: Disable DSGL use by default
  RDMA/cxgb4: rx_data() needs to hold the ep mutex
  ...
2014-04-03 16:57:19 -07:00
Steve Wise
c529fb5046 RDMA/cxgb4: rx_data() needs to hold the ep mutex
To avoid racing with other threads doing close/flush/whatever, rx_data()
should hold the endpoint mutex.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-02 08:53:54 -07:00
Steve Wise
977116c698 RDMA/cxgb4: Drop RX_DATA packets if the endpoint is gone
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-02 08:53:53 -07:00
Steve Wise
a7db89eb89 RDMA/cxgb4: Lock around accept/reject downcalls
There is a race between ULP threads doing an accept/reject, and the
ingress processing thread handling close/abort for the same connection.
The accept/reject path needs to hold the lock to serialize these paths.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>

[ Fold in locking fix found by Dan Carpenter <dan.carpenter@oracle.com>.
  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-02 08:52:45 -07:00
Steve Wise
9c88aa003d RDMA/cxgb4: Update snd_seq when sending MPA messages
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:07:35 -07:00
Steve Wise
be13b2dff8 RDMA/cxgb4: Connect_request_upcall fixes
When processing an MPA Start Request, if the listening endpoint is
DEAD, then abort the connection.

If the IWCM returns an error, then we must abort the connection and
release resources.  Also abort_connection() should not post a CLOSE
event, so clean that up too.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:07:35 -07:00
Steve Wise
1ce1d471ac RDMA/cxgb4: Fix possible memory leak in RX_PKT processing
If cxgb4_ofld_send() returns < 0, then send_fw_pass_open_req() must
free the request skb and the saved skb with the tcp header.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:07:35 -07:00
Steve Wise
df2d5130ec RDMA/cxgb4: Default peer2peer mode to 1
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 10:01:30 -07:00
Steve Wise
ebf00060c3 RDMA/cxgb4: Always release neigh entry
Always release the neigh entry in rx_pkt().

Based on original work by Santosh Rastapur <santosh@chelsio.com>.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 09:59:04 -07:00
Steve Wise
f8e819081f RDMA/cxgb4: Allow loopback connections
find_route() must treat loopback as a valid egress interface.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-20 09:59:04 -07:00
Steve Wise
7a2cea2aaa cxgb4/iw_cxgb4: Treat CPL_ERR_KEEPALV_NEG_ADVICE as negative advice
Based on original work by Anand Priyadarshee <anandp@chelsio.com>.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:44:11 -04:00