Commit Graph

504 Commits

Author SHA1 Message Date
Olga Kornievskaia
82ee41b85c SUNRPC don't resend a task on an offlined transport
When a task is being retried, due to an NFS error, if the assigned
transport has been put offline and the task is relocatable pick a new
transport.

Fixes: 6f081693e7 ("sunrpc: remove an offlined xprt using sysfs")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-24 12:06:07 -04:00
NeilBrown
4dc73c6791 NFSv4: keep state manager thread active if swap is enabled
If we are swapping over NFSv4, we may not be able to allocate memory to
start the state-manager thread at the time when we need it.
So keep it always running when swap is enabled, and just signal it to
start.

This requires updating and testing the cl_swapper count on the root
rpc_clnt after following all ->cl_parent links.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13 12:59:35 -04:00
NeilBrown
8db55a032a SUNRPC: improve 'swap' handling: scheduling and PF_MEMALLOC
rpc tasks can be marked as RPC_TASK_SWAPPER.  This causes GFP_MEMALLOC
to be used for some allocations.  This is needed in some cases, but not
in all where it is currently provided, and in some where it isn't
provided.

Currently *all* tasks associated with a rpc_client on which swap is
enabled get the flag and hence some GFP_MEMALLOC support.

GFP_MEMALLOC is provided for ->buf_alloc() but only swap-writes need it.
However xdr_alloc_bvec does not get GFP_MEMALLOC - though it often does
need it.

xdr_alloc_bvec is called while the XPRT_LOCK is held.  If this blocks,
then it blocks all other queued tasks.  So this allocation needs
GFP_MEMALLOC for *all* requests, not just writes, when the xprt is used
for any swap writes.

Similarly, if the transport is not connected, that will block all
requests including swap writes, so memory allocations should get
GFP_MEMALLOC if swap writes are possible.

So with this patch:
 1/ we ONLY set RPC_TASK_SWAPPER for swap writes.
 2/ __rpc_execute() sets PF_MEMALLOC while handling any task
    with RPC_TASK_SWAPPER set, or when handling any task that
    holds the XPRT_LOCKED lock on an xprt used for swap.
    This removes the need for the RPC_IS_SWAPPER() test
    in ->buf_alloc handlers.
 3/ xprt_prepare_transmit() sets PF_MEMALLOC after locking
    any task to a swapper xprt.  __rpc_execute() will clear it.
 3/ PF_MEMALLOC is set for all the connect workers.

Reviewed-by: Chuck Lever <chuck.lever@oracle.com> (for xprtrdma parts)
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13 12:59:35 -04:00
NeilBrown
a41b05edfe SUNRPC/auth: async tasks mustn't block waiting for memory
When memory is short, new worker threads cannot be created and we depend
on the minimum one rpciod thread to be able to handle everything.  So it
must not block waiting for memory.

mempools are particularly a problem as memory can only be released back
to the mempool by an async rpc task running.  If all available workqueue
threads are waiting on the mempool, no thread is available to return
anything.

lookup_cred() can block on a mempool or kmalloc - and this can cause
deadlocks.  So add a new RPCAUTH_LOOKUP flag for async lookups and don't
block on memory.  If the -ENOMEM gets back to call_refreshresult(), wait
a short while and try again.  HZ>>4 is chosen as it is used elsewhere
for -ENOMEM retries.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-13 12:59:35 -04:00
Trond Myklebust
0adc879406 SUNRPC: Convert GFP_NOFS to GFP_KERNEL
The sections which should not re-enter the filesystem are already
protected with memalloc_nofs_save/restore calls, so it is better to use
GFP_KERNEL in these calls to allow better performance for synchronous
RPC calls.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25 18:50:12 -05:00
Olga Kornievskaia
b8a09619a5 SUNRPC allow for unspecified transport time in rpc_clnt_add_xprt
If the supplied argument doesn't specify the transport type, use the
type of the existing rpc clnt and its existing transport.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-13 09:36:58 -05:00
Thiago Rafael Becker
023859ce6f sunrpc: remove unnecessary test in rpc_task_set_client()
In rpc_task_set_client(), testing for a NULL clnt is not necessary, as
clnt should always be a valid pointer to a rpc_client.

Signed-off-by: Thiago Rafael Becker <trbecker@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20 18:09:55 -04:00
Olga Kornievskaia
dc48e0abee SUNRPC enforce creation of no more than max_connect xprts
If we are adding new transports via rpc_clnt_test_and_add_xprt()
then check if we've reached the limit. Currently only pnfs path
adds transports via that function but this is done in
preparation when the client would add new transports when
session trunking is detected. A warning is logged if the
limit is reached.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-27 16:37:29 -04:00
Olga Kornievskaia
3a3f976639 SUNRPC keep track of number of transports to unique addresses
Currently, xprt_switch keeps a number of all xprts (xps_nxprts)
that were added to the switch regardless of whethere it's an
nconnect transport or a transport to a trunkable address.
Introduce a new counter to keep track of transports to unique
destination addresses per xprt_switch.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-27 16:36:53 -04:00
Trond Myklebust
71d3d0ebc8 SUNRPC: Convert rpc_client refcount to use refcount_t
There are now tools in the refcount library that allow us to convert the
client shutdown code.

Reported-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-09 16:57:04 -04:00
Chuck Lever
823c73d0c5 SUNRPC: Unset RPC_TASK_NO_RETRANS_TIMEOUT for NULL RPCs
In some rare failure modes, the server is actually reading the
transport, but then just dropping the requests on the floor.
TCP_USER_TIMEOUT cannot detect that case.

Prevent such a stuck server from pinning client resources
indefinitely by ensuring that certain idempotent requests
(such as NULL) can time out even if the connection is still
operational.

Otherwise rpc_bind_new_program(), gss_destroy_cred(), or
rpc_clnt_test_and_add_xprt() can wait forever.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-09 16:32:27 -04:00
Chuck Lever
aede517207 SUNRPC: Refactor rpc_ping()
Make it use the rpc_null_call_helper() so that it can share the
new rpc_call_ops structure to be introduced in the next patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-09 16:32:27 -04:00
Olga Kornievskaia
6f081693e7 sunrpc: remove an offlined xprt using sysfs
Once a transport has been put offline, this transport can be also
removed from the list of transports. Any tasks that have been stuck
on this transport would find the next available active transport
and be re-tried. This transport would be removed from the xprt_switch
list and freed.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:24 -04:00
Olga Kornievskaia
e091853ebd SUNRPC mark the first transport
When an RPC client gets created it's first transport is special
and should be marked a main transport.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:24 -04:00
Olga Kornievskaia
2a338a5431 sunrpc: add a symlink from rpc-client directory to the xprt_switch
An rpc client uses a transport switch and one ore more transports
associated with that switch. Since transports are shared among
rpc clients, create a symlink into the xprt_switch directory
instead of duplicating entries under each rpc client.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:23 -04:00
Olga Kornievskaia
c5a382ebdb sunrpc: Create per-rpc_clnt sysfs kobjects
These will eventually have files placed under them for sysfs operations.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:23 -04:00
NeilBrown
e877a88d1f SUNRPC in case of backlog, hand free slots directly to waiting task
If sunrpc.tcp_max_slot_table_entries is small and there are tasks
on the backlog queue, then when a request completes it is freed and the
first task on the queue is woken.  The expectation is that it will wake
and claim that request.  However if it was a sync task and the waiting
process was killed at just that moment, it will wake and NOT claim the
request.

As long as TASK_CONGESTED remains set, requests can only be claimed by
tasks woken from the backlog, and they are woken only as requests are
freed, so when a task doesn't claim a request, no other task can ever
get that request until TASK_CONGESTED is cleared.  Each time this
happens the number of available requests is decreased by one.

With a sufficiently high workload and sufficiently low setting of
max_slot (16 in the case where this was seen), TASK_CONGESTED can remain
set for an extended period, and the above scenario (of a process being
killed just as its task was woken) can repeat until no requests can be
allocated.  Then traffic stops.

This patch addresses the problem by introducing a positive handover of a
request from a completing task to a backlog task - the request is never
freed when there is a backlog.

When a task is woken it might not already have a request attached in
which case it is *not* freed (as with current code) but is initialised
(if needed) and used.  If it isn't used it will eventually be freed by
rpc_exit_task().  xprt_release() is enhanced to be able to correctly
release an uninitialised request.

Fixes: ba60eb25ff ("SUNRPC: Fix a livelock problem in the xprt->backlog queue")
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-05-20 12:17:08 -04:00
Baptiste Lepers
f8f7e0fb22 sunrpc: Fix misplaced barrier in call_decode
Fix a misplaced barrier in call_decode. The struct rpc_rqst is modified
as follows by xprt_complete_rqst:

req->rq_private_buf.len = copied;
/* Ensure all writes are done before we update */
/* req->rq_reply_bytes_recvd */
smp_wmb();
req->rq_reply_bytes_recvd = copied;

And currently read as follows by call_decode:

smp_rmb(); // misplaced
if (!req->rq_reply_bytes_recvd)
   goto out;
req->rq_rcv_buf.len = req->rq_private_buf.len;

This patch places the smp_rmb after the if to ensure that
rq_reply_bytes_recvd and rq_private_buf.len are read in order.

Fixes: 9ba828861c ("SUNRPC: Don't try to parse incomplete RPC messages")
Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-05-01 19:42:14 -04:00
Chuck Lever
7638e0bfae SUNRPC: Move fault injection call sites
I've hit some crashes that occur in the xprt_rdma_inject_disconnect
path. It appears that, for some provides, rdma_disconnect() can
take so long that the transport can disconnect and release its
hardware resources while rdma_disconnect() is still running,
resulting in a UAF in the provider.

The transport's fault injection method may depend on the stability
of transport data structures. That means it needs to be invoked
only from contexts that hold the transport write lock.

Fixes: 4a06825839 ("SUNRPC: Transport fault injection")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-14 09:36:29 -04:00
Trond Myklebust
9ed5af268e SUNRPC: Clean up the handling of page padding in rpc_prepare_reply_pages()
rpc_prepare_reply_pages() currently expects the 'hdrsize' argument to
contain the length of the data that we expect to want placed in the head
kvec plus a count of 1 word of padding that is placed after the page data.
This is very confusing when trying to read the code, and sometimes leads
to callers adding an arbitrary value of '1' just in order to satisfy the
requirement (whether or not the page data actually needs such padding).

This patch aims to clarify the code by changing the 'hdrsize' argument
to remove that 1 word of padding. This means we need to subtract the
padding from all the existing callers.

Fixes: 02ef04e432 ("NFS: Account for XDR pad of buf->pages")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-12-02 14:05:53 -05:00
Chuck Lever
42ebfc2cbf SUNRPC: Clean up call_bind_status() observability
Time to remove dprintk call sites in here.

Regarding the rpc_bind_status tracepoint: It's friendlier to
administrators if they don't have to look up the error code to
figure out what went wrong. Replace trace_rpc_bind_status with a
set of tracepoints that report more specifically what the problem
was, and what RPC program/version was being queried.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:09 -04:00
Chuck Lever
fd66e2a79d SUNRPC: Remove dprintk call site in call_decode
Clean up.

When enabled, this dprintk adds a line in /var/log/messages after
every RPC that reports the task ID (no connection to on the wire
XID values) and the RPC's result (no connection to the program,
operation, or the arguments and results).

Thus it's value is pretty low. Let's remove it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:09 -04:00
Chuck Lever
7c8099f6ad SUNRPC: Trace call_refresh events
Clean up: Replace dprintk call sites.

Note that rpc_call_rpcerror() already has a trace point, so perhaps
adding trace_rpc_refresh_status() isn't necessary. However, it does
report a particular category of error.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:09 -04:00
Chuck Lever
914cdcc78a SUNRPC: Add trace_rpc_timeout_status()
For a long while we've wanted a tracepoint that fires when a major
timeout is reported in the system log. Such a tracepoint can be
attached to other actions that can take place when a timeout is
detected (eg, server or connection health assessment).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:09 -04:00
Chuck Lever
db0a86c426 SUNRPC: Replace connect dprintk call sites with a tracepoint
This trace event can be used to audit transport connections from the
client.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:09 -04:00
Chuck Lever
0ec36cc9cd SUNRPC: Remove dprintk call site in call_start()
Clean up: The rpc_rpc_request tracepoint serves the same purpose.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:09 -04:00
Chuck Lever
6387039d6d SUNRPC: Remove the dprint_status() macro
Clean up: The rpc_task_run_action tracepoint serves the same
purpose.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:09 -04:00
Chuck Lever
06e234c613 SUNRPC: Hoist trace_xprtrdma_op_allocate into generic code
Introduce a tracepoint in call_allocate that reports the exact
sizes in the RPC buffer allocation request and the status of the
result. This helps catch problems with XDR buffer provisioning,
and replaces transport-specific debugging instrumentation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:08 -04:00
Olga Kornievskaia
88428cc4ae SUNRPC dont update timeout value on connection reset
Current behaviour: every time a v3 operation is re-sent to the server
we update (double) the timeout. There is no distinction between whether
or not the previous timer had expired before the re-sent happened.

Here's the scenario:
1. Client sends a v3 operation
2. Server RST-s the connection (prior to the timeout) (eg., connection
is immediately reset)
3. Client re-sends a v3 operation but the timeout is now 120sec.

As a result, an application sees 2mins pause before a retry in case
server again does not reply. Where as if a connection reset didn't
change the timeout value, the client would have re-tried (the 3rd
time) after 60secs.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:08 -04:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Chuck Lever
841a2ed9a1 SUNRPC: Set SOFTCONN when destroying GSS contexts
Move the RPC_TASK_SOFTCONN flag into rpc_call_null_helper(). The
only minor behavior change is that it is now also set when
destroying GSS contexts.

This gives a better guarantee that gss_send_destroy_context() will
not hang for long if a connection cannot be established.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11 13:33:48 -04:00
Chuck Lever
6fc3737aac SUNRPC: rpc_call_null_helper() should set RPC_TASK_SOFT
Clean up.

All of rpc_call_null_helper() call sites assert RPC_TASK_SOFT, so
move that setting into rpc_call_null_helper() itself.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11 13:33:48 -04:00
Chuck Lever
eefc536dbd SUNRPC: rpc_call_null_helper() already sets RPC_TASK_NULLCREDS
Clean up.

Commit a52458b48a ("NFS/NFSD/SUNRPC: replace generic creds with
'struct cred'.") made rpc_call_null_helper() set RPC_TASK_NULLCREDS
unconditionally. Therefore there's no need for
rpc_call_null_helper()'s call sites to set RPC_TASK_NULLCREDS.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11 13:33:48 -04:00
Chuck Lever
42aad0d7f9 SUNRPC: trace RPC client lifetime events
The "create" tracepoint records parts of the rpc_create arguments,
and the shutdown tracepoint records when the rpc_clnt is about to
signal pending tasks and destroy auths.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11 13:33:48 -04:00
Chuck Lever
c509f15a58 SUNRPC: Split the xdr_buf event class
To help tie the recorded xdr_buf to a particular RPC transaction,
the client side version of this class should display task ID
information and the server side one should show the request's XID.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11 13:33:48 -04:00
Chuck Lever
0125ecbb52 SUNRPC: Add tracepoint to rpc_call_rpcerror()
Add a tracepoint in another common exit point for failing RPCs.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11 13:33:48 -04:00
J. Bruce Fields
933496e9cc SUNRPC: 'Directory with parent 'rpc_clnt' already present!'
Each rpc_client has a cl_clid which is allocated from a global ida, and
a debugfs directory which is named after cl_clid.

We're releasing the cl_clid before we free the debugfs directory named
after it.  As soon as the cl_clid is released, that value is available
for another newly created client.

That leaves a window where another client may attempt to create a new
debugfs directory with the same name as the not-yet-deleted debugfs
directory from the dying client.  Symptoms are log messages like

	Directory 4 with parent 'rpc_clnt' already present!

Fixes: 7c4310ff56 "SUNRPC: defer slow parts of rpc_free_client() to a workqueue."
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-05-14 16:31:09 -04:00
Chuck Lever
ce99aa62e1 SUNRPC: Signalled ASYNC tasks need to exit
Ensure that signalled ASYNC rpc_tasks exit immediately instead of
spinning until a timeout (or forever).

To avoid checking for the signal flag on every scheduler iteration,
the check is instead introduced in the client's finite state
machine.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: ae67bd3821 ("SUNRPC: Fix up task signalling")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-05-11 14:06:50 -04:00
NeilBrown
31e9a7f353 SUNRPC: fix use-after-free in rpc_free_client_work()
Parts of rpc_free_client() were recently moved to
a separate rpc_free_clent_work().  This introduced
a use-after-free as rpc_clnt_remove_pipedir() calls
rpc_net_ns(), and that uses clnt->cl_xprt which has already
been freed.
So move the call to xprt_put() after the call to
rpc_clnt_remove_pipedir().

Reported-by: syzbot+22b5ef302c7c40d94ea8@syzkaller.appspotmail.com
Fixes: 7c4310ff56 ("SUNRPC: defer slow parts of rpc_free_client() to a workqueue.")
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-05-10 19:44:56 -04:00
NeilBrown
7c4310ff56 SUNRPC: defer slow parts of rpc_free_client() to a workqueue.
The rpciod workqueue is on the write-out path for freeing dirty memory,
so it is important that it never block waiting for memory to be
allocated - this can lead to a deadlock.

rpc_execute() - which is often called by an rpciod work item - calls
rcp_task_release_client() which can lead to rpc_free_client().

rpc_free_client() makes two calls which could potentially block wating
for memory allocation.

rpc_clnt_debugfs_unregister() calls into debugfs and will block while
any of the debugfs files are being accessed.  In particular it can block
while any of the 'open' methods are being called and all of these use
malloc for one thing or another.  So this can deadlock if the memory
allocation waits for NFS to complete some writes via rpciod.

rpc_clnt_remove_pipedir() can take the inode_lock() and while it isn't
obvious that memory allocations can happen while the lock it held, it is
safer to assume they might and to not let rpciod call
rpc_clnt_remove_pipedir().

So this patch moves these two calls (together with the final kfree() and
rpciod_down()) into a work-item to be run from the system work-queue.
rpciod can continue its important work, and the final stages of the free
can happen whenever they happen.

I have seen this deadlock on a 4.12 based kernel where debugfs used
synchronize_srcu() when removing objects.  synchronize_srcu() requires a
workqueue and there were no free workther threads and none could be
allocated.  While debugsfs no longer uses SRCU, I believe the deadlock
is still possible.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-28 15:58:38 -04:00
Xiyu Yang
efe57fd58e SUNRPC: Remove unreachable error condition
rpc_clnt_test_and_add_xprt() invokes rpc_call_null_helper(), which
return the value of rpc_run_task() to "task". Since rpc_run_task() is
impossible to return an ERR pointer, there is no need to add the
IS_ERR() condition on "task" here. So we need to remove it.

Fixes: 7f55489058 ("SUNRPC: Allow addition of new transports to a struct rpc_clnt")
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-21 19:11:59 -04:00
Linus Torvalds
04de788e61 NFS client updates for Linux 5.7
Highlights include:
 
 Stable fixes:
 - Fix a page leak in nfs_destroy_unlinked_subrequests()
 - Fix use-after-free issues in nfs_pageio_add_request()
 - Fix new mount code constant_table array definitions
 - finish_automount() requires us to hold 2 refs to the mount record
 
 Features:
 - Improve the accuracy of telldir/seekdir by using 64-bit cookies when
   possible.
 - Allow one RDMA active connection and several zombie connections to
   prevent blocking if the remote server is unresponsive.
 - Limit the size of the NFS access cache by default
 - Reduce the number of references to credentials that are taken by NFS
 - pNFS files and flexfiles drivers now support per-layout segment
   COMMIT lists.
 - Enable partial-file layout segments in the pNFS/flexfiles driver.
 - Add support for CB_RECALL_ANY to the pNFS flexfiles layout type
 - pNFS/flexfiles Report NFS4ERR_DELAY and NFS4ERR_GRACE errors from
   the DS using the layouterror mechanism.
 
 Bugfixes and cleanups:
 - SUNRPC: Fix krb5p regressions
 - Don't specify NFS version in "UDP not supported" error
 - nfsroot: set tcp as the default transport protocol
 - pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid()
 - alloc_nfs_open_context() must use the file cred when available
 - Fix locking when dereferencing the delegation cred
 - Fix memory leaks in O_DIRECT when nfs_get_lock_context() fails
 - Various clean ups of the NFS O_DIRECT commit code
 - Clean up RDMA connect/disconnect
 - Replace zero-length arrays with C99-style flexible arrays
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl6LhhsACgkQZwvnipYK
 APIOJxAAiQOgmIg1CV4mrlcVhkwy09N5JAia6AENtoTmwm08nAYg5Y8REb9uX46a
 /MJsM2WG8hBCgI6eYmRY8LTr4Ft9rTQEJM9DRMuwQREXwMWwBhUv/QakCeqY1lHE
 lyB1z4hj5XKeUoN/OcfALC/GXFFf56A0UyN05nMzeCkBTdd3+qu+hW8Ge1wkAXcr
 f0pyLbzdFZlJuTmI4tr8F93g9p3ezuFBuEroT7XPIVJylAdZVumHqnOnz/Mvb99x
 rNTsX2dc44GhSAfRnTzPumU3MT6BOLvUzNH1xzdiqKzJrbOnG8WjFodrGr3JWpfp
 HkeyYQxJ+Hnfb2LiZBjvMQE8M7kVMZ1jVbrGJEbCxfSqgTly8lOHboqAeKsFaReK
 LStnusizdA1LHQVZxPdvn+oL49RDxnzm9dY+DkrXK1qT0GE+icN1CyTyLLfkSCp8
 tYvZSJ/qPk5BNZegqH1nBqXkMDkOJ4eEA7+luXDmajRkdRrZ3IWY2M1DpMEoueJ2
 j/zoj/NFr1oErU4o7PV9oolA1Euhn1L3wIDuzsbVtjySmbXJNQTtaVVRFpGw3SsZ
 7rbqi4BB0SzOooNhQ4q8mLNi4qT7bl/3D04eL8UVzEM73plexhQ8XiOEz/VrIRX7
 L9viXH49g4DHQ0rZIaWefxFueqpgbNvQwnlLZl2uQotG9hwhTts=
 =YUcP
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable fixes:
   - Fix a page leak in nfs_destroy_unlinked_subrequests()

   - Fix use-after-free issues in nfs_pageio_add_request()

   - Fix new mount code constant_table array definitions

   - finish_automount() requires us to hold 2 refs to the mount record

  Features:
   - Improve the accuracy of telldir/seekdir by using 64-bit cookies
     when possible.

   - Allow one RDMA active connection and several zombie connections to
     prevent blocking if the remote server is unresponsive.

   - Limit the size of the NFS access cache by default

   - Reduce the number of references to credentials that are taken by
     NFS

   - pNFS files and flexfiles drivers now support per-layout segment
     COMMIT lists.

   - Enable partial-file layout segments in the pNFS/flexfiles driver.

   - Add support for CB_RECALL_ANY to the pNFS flexfiles layout type

   - pNFS/flexfiles Report NFS4ERR_DELAY and NFS4ERR_GRACE errors from
     the DS using the layouterror mechanism.

  Bugfixes and cleanups:
   - SUNRPC: Fix krb5p regressions

   - Don't specify NFS version in "UDP not supported" error

   - nfsroot: set tcp as the default transport protocol

   - pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid()

   - alloc_nfs_open_context() must use the file cred when available

   - Fix locking when dereferencing the delegation cred

   - Fix memory leaks in O_DIRECT when nfs_get_lock_context() fails

   - Various clean ups of the NFS O_DIRECT commit code

   - Clean up RDMA connect/disconnect

   - Replace zero-length arrays with C99-style flexible arrays"

* tag 'nfs-for-5.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (86 commits)
  NFS: Clean up process of marking inode stale.
  SUNRPC: Don't start a timer on an already queued rpc task
  NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn()
  NFS/pnfs: Fix dereference of layout cred in pnfs_layoutcommit_inode()
  NFS: Beware when dereferencing the delegation cred
  NFS: Add a module parameter to set nfs_mountpoint_expiry_timeout
  NFS: finish_automount() requires us to hold 2 refs to the mount record
  NFS: Fix a few constant_table array definitions
  NFS: Try to join page groups before an O_DIRECT retransmission
  NFS: Refactor nfs_lock_and_join_requests()
  NFS: Reverse the submission order of requests in __nfs_pageio_add_request()
  NFS: Clean up nfs_lock_and_join_requests()
  NFS: Remove the redundant function nfs_pgio_has_mirroring()
  NFS: Fix memory leaks in nfs_pageio_stop_mirroring()
  NFS: Fix a request reference leak in nfs_direct_write_clear_reqs()
  NFS: Fix use-after-free issues in nfs_pageio_add_request()
  NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests()
  NFS: Fix a page leak in nfs_destroy_unlinked_subrequests()
  NFS: Remove unused FLUSH_SYNC support in nfs_initiate_pgio()
  pNFS/flexfiles: Specify the layout segment range in LAYOUTGET
  ...
2020-04-07 13:51:39 -07:00
Chuck Lever
b20dfc3fcd svcrdma: Create a generic tracing class for displaying xdr_buf layout
This class can be used to create trace points in either the RPC
client or RPC server paths. It simply displays the length of each
part of an xdr_buf, which is useful to determine that the transport
and XDR codecs are operating correctly.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:32 -04:00
Trond Myklebust
263fb9c21e SUNRPC: Don't take a reference to the cred on synchronous tasks
If the RPC call is synchronous, assume the cred is already pinned
by the caller.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:29 -04:00
Trond Myklebust
7eac52648a SUNRPC: Add a flag to avoid reference counts on credentials
Add a flag to signal to the RPC layer that the credential is already
pinned for the duration of the RPC call.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:28 -04:00
Chuck Lever
b8457606d9 SUNRPC: call_connect_status should handle -EPROTO
The xprtrdma connect logic can return -EPROTO if the underlying
device or network path does not support RDMA. This can happen
after a device removal/insertion.

- When SOFTCONN is set, EPROTO is a permanent error.

- When SOFTCONN is not set, EPROTO is treated as a temporary error.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:31 -05:00
Trond Myklebust
4e121fcae8 NFSoRDMA Client Updates for Linux 5.5
New Features:
 - New tracepoints for congestion control and Local Invalidate WRs
 
 Bugfixes and Cleanups:
 - Eliminate log noise in call_reserveresult
 - Fix unstable connections after a reconnect
 - Clean up some code duplication
 - Close race between waking a sender and posting a receive
 - Fix MR list corruption, and clean up MR usage
 - Remove unused rpcrdma_sendctx fields
 - Try to avoid DMA mapping pages if it is too costly
 - Wake pending tasks if connection fails
 - Replace some dprintk()s with tracepoints
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl3MGSwACgkQ18tUv7Cl
 QOt3LhAAz4T4DSGDb6QxUGxDRlusvHBpPFXA+GQOMRBVKiWHtrBcZT0UUybAm3Zp
 i2B1gVZJSo2JeA5vg3rK0yJmGiNPs4QedTUHiRsISrKOvUo+ITAEdNXgIIB3tAM9
 Pkwf0AMSqbzpUjKNHVGDGyhZ2WZM66zsDFI2CFh4Ul7VX/1NhM3xaUgTSEDJhl3z
 tE+aULrwGTQvUq4JKXQ3vu4f8rsbxxNKfvaZyQIKPo79nEFdtniVn18u5p010HDP
 ldJJAtY9qozhqWKwaSNEj6guW4U9wesLPrb7cBysHWjgivU17bwEbN/ZR3YrxoHI
 trpBdr5994FmOCz9mcKxH+BlS0bO7QSPS2r2TpgIMjKCm8scuZlhlnMQxHV8mEpz
 EpoC65qgcmqyeeOcIHnA/eN13ZAYgGKsRBIPEWRE/w+3Yz4bupsKZ/blSRzXdJpQ
 forMrAGTYa64NqdnRRDxf6PMwk8fqIDeTHTybMSLghUAQi89zYK0tZpgFikwBYRJ
 dqNGp4usCLtZ0c2nDnDg00arOZwqPQnxycNexHYNpHOACurCF9FhbaaZjsecZCoy
 QsSFN98K6KI5ztPY1p7DL5N36IC3VDwbgi0COKtF+xB3P0pIZ/Pzwo0KbfXQF0KH
 dmTrnpNY/Yq71i1ow6LTdYZ3hq7ZGztaXEEkl7udNvK97pP0UR0=
 =Jlak
 -----END PGP SIGNATURE-----

Merge tag 'nfs-rdma-for-5.5-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

NFSoRDMA Client Updates for Linux 5.5

New Features:
- New tracepoints for congestion control and Local Invalidate WRs

Bugfixes and Cleanups:
- Eliminate log noise in call_reserveresult
- Fix unstable connections after a reconnect
- Clean up some code duplication
- Close race between waking a sender and posting a receive
- Fix MR list corruption, and clean up MR usage
- Remove unused rpcrdma_sendctx fields
- Try to avoid DMA mapping pages if it is too costly
- Wake pending tasks if connection fails
- Replace some dprintk()s with tracepoints
2019-11-18 10:55:55 +01:00
Trond Myklebust
e6237b6feb NFSv4.1: Don't rebind to the same source port when reconnecting to the server
NFSv2, v3 and NFSv4 servers often have duplicate replay caches that look
at the source port when deciding whether or not an RPC call is a replay
of a previous call. This requires clients to perform strange TCP gymnastics
in order to ensure that when they reconnect to the server, they bind
to the same source port.

NFSv4.1 and NFSv4.2 have sessions that provide proper replay semantics,
that do not look at the source port of the connection. This patch therefore
ensures they can ignore the rebind requirement.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:45 -05:00
Chuck Lever
5cd8b0d4dd SUNRPC: Eliminate log noise in call_reserveresult
Sep 11 16:35:20 manet kernel:
		call_reserveresult: unrecognized error -512, exiting

Diagnostic error messages such as this likely have no value for NFS
client administrators.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-24 10:30:39 -04:00
Linus Torvalds
972a2bf7df NFS Client Updates for Linux 5.3
Stable bugfixes:
 - Dequeue the request from the receive queue while we're re-encoding # v4.20+
 - Fix buffer handling of GSS MIC without slack # 5.1
 
 Features:
 - Increase xprtrdma maximum transport header and slot table sizes
 - Add support for nfs4_call_sync() calls using a custom rpc_task_struct
 - Optimize the default readahead size
 - Enable pNFS filelayout LAYOUTGET on OPEN
 
 Other bugfixes and cleanups:
 - Fix possible null-pointer dereferences and memory leaks
 - Various NFS over RDMA cleanups
 - Various NFS over RDMA comment updates
 - Don't receive TCP data into a reset request buffer
 - Don't try to parse incomplete RPC messages
 - Fix congestion window race with disconnect
 - Clean up pNFS return-on-close error handling
 - Fixes for NFS4ERR_OLD_STATEID handling
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl2NC04ACgkQ18tUv7Cl
 QOs4Tg//bAlGs+dIKixAmeMKmTd6I34laUnuyV/12yPQDgo6bryLrTngfe2BYvmG
 2l+8H7yHfR4/gQE4vhR0c15xFgu6pvjBGR0/nNRaXienIPXO4xsQkcaxVA7SFRY2
 HjffZwyoBfjyRps0jL+2sTsKbRtSkf9Dn+BONRgesg51jK1jyWkXqXpmgi4uMO4i
 ojpTrW81dwo7Yhv08U2A/Q1ifMJ8F9dVYuL5sm+fEbVI/Nxoz766qyB8rs8+b4Xj
 3gkfyh/Y1zoMmu6c+r2Q67rhj9WYbDKpa6HH9yX1zM/RLTiU7czMX+kjuQuOHWxY
 YiEk73NjJ48WJEep3odess1q/6WiAXX7UiJM1SnDFgAa9NZMdfhqMm6XduNO1m60
 sy0i8AdxdQciWYexOXMsBuDUCzlcoj4WYs1QGpY3uqO1MznQS/QUfu65fx8CzaT5
 snm6ki5ivqXH/js/0Z4MX2n/sd1PGJ5ynMkekxJ8G3gw+GC/oeSeGNawfedifLKK
 OdzyDdeiel5Me1p4I28j1WYVLHvtFmEWEU9oytdG0D/rjC/pgYgW/NYvAao8lQ4Z
 06wdcyAM66ViAPrbYeE7Bx4jy8zYRkiw6Y3kIbLgrlMugu3BhIW5Mi3BsgL4f4am
 KsqkzUqPZMCOVwDuUILSuPp4uHaR+JTJttywiLniTL6reF5kTiA=
 =4Ey6
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "Stable bugfixes:
   - Dequeue the request from the receive queue while we're re-encoding
     # v4.20+
   - Fix buffer handling of GSS MIC without slack # 5.1

  Features:
   - Increase xprtrdma maximum transport header and slot table sizes
   - Add support for nfs4_call_sync() calls using a custom
     rpc_task_struct
   - Optimize the default readahead size
   - Enable pNFS filelayout LAYOUTGET on OPEN

  Other bugfixes and cleanups:
   - Fix possible null-pointer dereferences and memory leaks
   - Various NFS over RDMA cleanups
   - Various NFS over RDMA comment updates
   - Don't receive TCP data into a reset request buffer
   - Don't try to parse incomplete RPC messages
   - Fix congestion window race with disconnect
   - Clean up pNFS return-on-close error handling
   - Fixes for NFS4ERR_OLD_STATEID handling"

* tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (53 commits)
  pNFS/filelayout: enable LAYOUTGET on OPEN
  NFS: Optimise the default readahead size
  NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
  NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
  NFSv4: Fix OPEN_DOWNGRADE error handling
  pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid
  NFSv4: Add a helper to increment stateid seqids
  NFSv4: Handle RPC level errors in LAYOUTRETURN
  NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
  NFSv4: Clean up pNFS return-on-close error handling
  pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
  NFS: remove unused check for negative dentry
  NFSv3: use nfs_add_or_obtain() to create and reference inodes
  NFS: Refactor nfs_instantiate() for dentry referencing callers
  SUNRPC: Fix congestion window race with disconnect
  SUNRPC: Don't try to parse incomplete RPC messages
  SUNRPC: Rename xdr_buf_read_netobj to xdr_buf_read_mic
  SUNRPC: Fix buffer handling of GSS MIC without slack
  SUNRPC: RPC level errors should always set task->tk_rpc_status
  SUNRPC: Don't receive TCP data into a request buffer that has been reset
  ...
2019-09-26 12:20:14 -07:00