4cf44be6f1
Both Dan and I have observed two processes invoking
rpcrdma_xprt_disconnect() concurrently. In my case:
1. The connect worker invokes rpcrdma_xprt_disconnect(), which
drains the QP and waits for the final completion.
2. This causes the newly posted Receive to flush and invoke
xprt_force_disconnect().
3. xprt_force_disconnect() sets CLOSE_WAIT and wakes up the RPC task
that is holding the transport lock.
4. The RPC task invokes xprt_connect(), which calls ->ops->close.
5. xprt_rdma_close() invokes rpcrdma_xprt_disconnect(), which tries
to destroy the QP.

Deadlock.
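The sequence above can be reproduced in a small userspace model. This is a sketch under stated assumptions, not kernel code: every name (model_xprt, model_disconnect(), model_force_disconnect(), model_rpc_task_connect()) is hypothetical, and the asynchronous wakeup in step 3 is modeled as a direct call, so two concurrent callers of rpcrdma_xprt_disconnect() show up here as nested calls. The max_depth counter records how many callers were inside the disconnect path at once.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of steps 1-5; none of these names are
 * kernel APIs. The wakeup in step 3 is modeled as a direct call, so
 * "two concurrent callers" appears as nesting.
 */
struct model_xprt {
	bool close_wait;	/* stands in for XPRT_CLOSE_WAIT */
	int  disconnect_depth;	/* callers currently inside model_disconnect() */
	int  max_depth;		/* deepest nesting observed */
};

static void model_disconnect(struct model_xprt *x);

/* Steps 4-5: the woken RPC task sees CLOSE_WAIT and closes the transport. */
static void model_rpc_task_connect(struct model_xprt *x)
{
	if (x->close_wait) {
		x->close_wait = false;
		model_disconnect(x);	/* ->ops->close ends up here */
	}
}

/* Step 3: set CLOSE_WAIT and wake the lock holder (run it immediately). */
static void model_force_disconnect(struct model_xprt *x)
{
	x->close_wait = true;
	model_rpc_task_connect(x);
}

/*
 * Steps 1-2: draining the QP flushes the posted Receive, whose completion
 * handler calls model_force_disconnect(). Only the outermost drain has a
 * Receive left to flush.
 */
static void model_disconnect(struct model_xprt *x)
{
	x->disconnect_depth++;
	if (x->disconnect_depth > x->max_depth)
		x->max_depth = x->disconnect_depth;
	if (x->disconnect_depth == 1)
		model_force_disconnect(x);
	x->disconnect_depth--;
}
```

Calling model_disconnect() on a zeroed struct model_xprt leaves max_depth at 2: a second caller enters the disconnect path while the first is still draining, which is the situation that deadlocks in the real transport.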

To keep xprt_force_disconnect() from waking anything, perform the
cleanup after a failed connection attempt in the xprt's sndtask
(the RPC task that holds the transport lock).
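One way to picture why moving the cleanup helps, again as a hypothetical userspace model rather than the patch itself: the connect worker only records that its attempt failed, and the task holding the transport lock runs the disconnect. When the drain flushes the Receive, the only task xprt_force_disconnect() could wake is that same task, which is already doing the cleanup, so the model just sets the flag. All names here (model_connect_worker(), model_sndtask(), and so on) are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the fixed ordering; not kernel code. */
struct model_xprt {
	bool close_wait;	/* stands in for XPRT_CLOSE_WAIT */
	bool connect_failed;	/* recorded by the connect worker */
	int  disconnect_depth;	/* callers currently inside model_disconnect() */
	int  max_depth;		/* deepest nesting observed */
	int  disconnects;	/* completed disconnects */
};

/*
 * The only task this could wake is the sndtask, which is the caller
 * right now, so the model sets the flag and wakes nobody.
 */
static void model_force_disconnect(struct model_xprt *x)
{
	x->close_wait = true;
}

/* Draining flushes the posted Receive (outermost drain only). */
static void model_disconnect(struct model_xprt *x)
{
	x->disconnect_depth++;
	if (x->disconnect_depth > x->max_depth)
		x->max_depth = x->disconnect_depth;
	if (x->disconnect_depth == 1)
		model_force_disconnect(x);
	x->disconnect_depth--;
	x->disconnects++;
}

/* Connect worker: record the failure and leave cleanup to the sndtask. */
static void model_connect_worker(struct model_xprt *x, bool failed)
{
	x->connect_failed = failed;
}

/* The task holding the transport lock performs the cleanup itself. */
static void model_sndtask(struct model_xprt *x)
{
	if (x->connect_failed) {
		x->connect_failed = false;
		model_disconnect(x);
		/* The close that CLOSE_WAIT asks for has just been done. */
		x->close_wait = false;
	}
}
```

A failed attempt followed by model_sndtask() performs exactly one disconnect, and max_depth never exceeds 1.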

The retry loop is removed from rpcrdma_xprt_connect() to ensure
that the newly allocated ep and id are properly released before
a REJECTED connection attempt can be retried.
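A sketch of the single-attempt shape, assuming hypothetical names throughout (model_xprt_connect(), model_release(), model_connect_until_up() and the MODEL_REJECTED outcome are not kernel symbols, and -ECONNREFUSED is an assumed error code): each attempt allocates a fresh ep and id, and a rejected attempt frees both before returning, so any retry happens in the caller from a clean slate. The live_allocs counter makes leaks visible.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the per-connection objects. */
struct model_ep { int unused; };
struct model_id { int unused; };

struct model_xprt {
	struct model_ep *ep;
	struct model_id *id;
	int live_allocs;	/* allocations not yet freed */
};

/* Outcome of one connection attempt, injected by the caller. */
enum model_result { MODEL_CONNECTED, MODEL_REJECTED };

/* Free whatever the current attempt allocated. */
static void model_release(struct model_xprt *x)
{
	if (x->ep) {
		free(x->ep);
		x->ep = NULL;
		x->live_allocs--;
	}
	if (x->id) {
		free(x->id);
		x->id = NULL;
		x->live_allocs--;
	}
}

/* One attempt, no internal retry loop. */
static int model_xprt_connect(struct model_xprt *x, enum model_result outcome)
{
	x->ep = malloc(sizeof(*x->ep));
	if (x->ep)
		x->live_allocs++;
	x->id = malloc(sizeof(*x->id));
	if (x->id)
		x->live_allocs++;
	if (!x->ep || !x->id) {
		model_release(x);
		return -ENOMEM;
	}
	if (outcome == MODEL_REJECTED) {
		/* Release the new ep and id before anyone retries. */
		model_release(x);
		return -ECONNREFUSED;
	}
	return 0;
}

/* Caller-side retry: returns attempts used, or -1 if none succeeded. */
static int model_connect_until_up(struct model_xprt *x,
				  const enum model_result *plan, int n)
{
	for (int i = 0; i < n; i++)
		if (model_xprt_connect(x, plan[i]) == 0)
			return i + 1;
	return -1;
}
```

With a plan of two rejections followed by a success, three attempts are made and only the final ep and id (live_allocs == 2) remain allocated.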

Reported-by: Dan Aloni <dan@kernelim.com>
Fixes: