In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding a couple of break statements instead of
just letting the code fall through to the next case.
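For example, a case arm that previously fell through now ends with an
explicit break (the command and handler names below are placeholders,
not a specific hunk from this patch):

    switch (cmd) {
    case FIRST_OP:
            handle_first_op();
            break;          /* was an implicit fall through */
    case SECOND_OP:
            handle_second_op();
            break;
    }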
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
It's OK to grant a read delegation to a client that holds a write,
as long as it's the only client holding the write.
We originally tried to do this in commit 94415b06eb ("nfsd4: a
client's own opens needn't prevent delegations"), which had to be
reverted in commit 6ee65a7730 ("Revert "nfsd4: a client's own
opens needn't prevent delegations"").
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
No change in behavior, I'm just moving some code around to avoid forward
references in a following patch.
(To do someday: figure out how to split up nfs4state.c. It's big and
disorganized.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
It's unusual but possible for multiple filehandles to point to the same
file. In that case, we may end up with multiple nfs4_files referencing
the same inode.
For delegation purposes it will turn out to be useful to flag those
cases.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The nfs4_file structure is per-filehandle, not per-inode, because the
spec requires open and other state to be per filehandle.
But it will turn out to be convenient for nfs4_files associated with the
same inode to be hashed to the same bucket, so let's hash on the inode
instead of the filehandle.
Filehandle aliasing is rare, so that shouldn't have much performance
impact.
(If you have a ton of exported filesystems, though, and all of them have
a root with inode number 2, could that get you an overlong hash chain?
Perhaps this (and the v4 open file cache) should be hashed on the inode
pointer instead.)
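For illustration, the two alternatives look roughly like this, using the
generic helpers from <linux/hash.h> and the FILE_HASH_BITS sizing of the
existing table (a sketch only, not the actual patch):

    #include <linux/fs.h>
    #include <linux/hash.h>

    /* Hash on the inode number, as this patch does conceptually: */
    static unsigned int file_hashval(struct inode *inode)
    {
            return hash_long(inode->i_ino, FILE_HASH_BITS);
    }

    /* Hash on the inode pointer instead, which would keep the roots of
     * many exported filesystems (all inode number 2) in separate
     * buckets: */
    static unsigned int file_hashval_by_ptr(struct inode *inode)
    {
            return hash_ptr(inode, FILE_HASH_BITS);
    }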
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
If nfsd already has an open file that it plans to use for IO from
another client, it may not need to do another vfs open, but it still may
need to break any delegations in case the existing opens are for a
different client.
Symptoms are that we may incorrectly fail to break a delegation on a
write open from a different client, when the delegation-holding client
already has a write open.
Fixes: 28df3d1539 ("nfsd: clients don't need to break their own delegations")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Since commit 501cb1849f ("nfsd: rip out the raparms cache"), nrservs is
no longer used in nfsd_startup_generic().
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Capture error codes in @ret, which is passed to the send_err
tracepoint, so that they can be logged when something goes awry.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than explicitly calling spin_lock_init().
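For example (the lock names here are placeholders):

    #include <linux/spinlock.h>

    /* Before: declare the lock, then initialize it at runtime. */
    static spinlock_t old_style_lock;

    static void old_style_setup(void)
    {
            spin_lock_init(&old_style_lock);
    }

    /* After: one definition, fully initialized at compile time. */
    static DEFINE_SPINLOCK(new_style_lock);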
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Guobin Huang <huangguobin4@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
This patch fixes Dan Carpenter's report that the static checker found a
memcpy() copying into a buffer that is too small.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e0639dc580 ("NFSD introduce async copy feature")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Dai Ngo <dai.ngo@oracle.com>
This, to me, seems less cluttered and less redundant. I was hoping
it could help reduce lock contention on the dto_q lock by reducing
the size of the critical section, but alas, the only improvement is
readability.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
These fields are no longer used.
The size of struct svc_rdma_recv_ctxt is now less than 300 bytes on
x86_64, down from 2440 bytes.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Currently the generic RPC server layer calls svc_rdma_recvfrom()
twice to retrieve an RPC message that uses Read chunks. I'm not
exactly sure why this design was chosen originally.
Instead, let's wait for the Read chunk completion inline in the
first call to svc_rdma_recvfrom().
The goal is to eliminate some page allocator churn.
rdma_read_complete() replaces pages in the second svc_rqst by
calling put_page() repeatedly while the upper layer waits for the
request to be constructed, which adds unnecessary NFS WRITE round-
trip latency.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Currently, XPT_BUSY is not cleared until xpo_recvfrom returns.
That effectively blocks the receipt and handling of the next RPC
message until the current one has been taken off the transport.
This strict ordering is a requirement for socket transports.
For our kernel RPC/RDMA transport implementation, however, dequeuing
an ingress message is nothing more than a list_del(). The transport
can safely be marked un-busy as soon as that is done.
To keep the changes simpler, this patch just moves the
svc_xprt_received() call site from svc_handle_xprt() into the
transports, so that the actual optimization can be done in a
subsequent patch.
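For the RDMA receive path, the eventual shape of that optimization is
roughly the following (a sketch based on the existing svcrdma dequeue
code; rdma_xprt and xprt are the transport pointers already available in
svc_rdma_recvfrom(), and the exact placement is left to the later patch):

    struct svc_rdma_recv_ctxt *ctxt;

    spin_lock(&rdma_xprt->sc_rq_dto_lock);
    ctxt = svc_rdma_next_recv_ctxt(&rdma_xprt->sc_rq_dto_q);
    if (ctxt)
            list_del(&ctxt->rc_list);
    spin_unlock(&rdma_xprt->sc_rq_dto_lock);

    /* The message is off the transport's queue; the transport can be
     * marked un-busy so another nfsd thread may receive the next one. */
    svc_xprt_received(xprt);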
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Prepare svc_xprt_received() to be called from transport code instead
of from generic RPC server code.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
svc_rdma_sendto() now waits for the NIC hardware to finish with
the pages backing rq_res. We still have to release the page array
in some cases, but now it's always safe to immediately re-use the
page backing rq_res's head buffer.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Currently svc_rdma_sendto() migrates xdr_buf pages into a separate
page list and NULLs out a bunch of entries in rq_pages while the
pages are under I/O. The Send completion handler then frees those
pages later.
Instead, let's wait for the Send completion, then handle page
releasing in the nfsd thread. I'd like to avoid the cost of 250+
put_page() calls in the Send completion handler, which is single-
threaded.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Refactor a bit of commonly used logic so that every site that wants
a close deferred to an nfsd thread does all the right things
(set_bit(XPT_CLOSE) then enqueue).
Also, once XPT_CLOSE is set on a transport, it is never cleared. If
XPT_CLOSE is already set, then the close is already being handled
and the enqueue can be skipped.
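A helper capturing that pattern might look like this (the helper name is
illustrative; XPT_CLOSE, xpt_flags, and svc_xprt_enqueue() are the
existing sunrpc symbols):

    #include <linux/sunrpc/svc_xprt.h>

    static void example_deferred_close(struct svc_xprt *xprt)
    {
            /* XPT_CLOSE is never cleared once set; if it was already
             * set, the close is in progress and the enqueue can be
             * skipped. */
            if (!test_and_set_bit(XPT_CLOSE, &xprt->xpt_flags))
                    svc_xprt_enqueue(xprt);
    }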
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Post more Receives when the number of pending Receives drops below
a water mark. The batch mechanism is disabled if the underlying
device cannot support a reasonably-sized Receive Queue.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Replace svc_rdma_post_recv() with the new batch receive mechanism.
For the moment it is posting just a single Receive WR at a time,
so no change in behavior is expected.
Since svc_rdma_wc_receive() was the last call site for
svc_rdma_post_recv(), it is removed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Introduce a server-side mechanism similar to commit e340c2d6ef
("xprtrdma: Reduce the doorbell rate (Receive)") to post Receive
WRs in batch. Its first consumer is svc_rdma_post_recvs(), which
posts the initial set of Receive WRs.
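Conceptually, a batch is built by chaining Receive WRs through their
next pointers and posting the whole chain with a single ib_post_recv()
call, so the doorbell is rung only once. A rough sketch (the loop and
the rdma, wanted, and ret variables are illustrative context, not the
actual svcrdma code):

    const struct ib_recv_wr *bad_wr;
    struct ib_recv_wr *wr_chain = NULL;
    struct svc_rdma_recv_ctxt *ctxt;
    unsigned int i;

    /* Chain 'wanted' Receive WRs together, newest at the head. */
    for (i = 0; i < wanted; i++) {
            ctxt = svc_rdma_recv_ctxt_get(rdma);
            if (!ctxt)
                    break;
            ctxt->rc_recv_wr.next = wr_chain;
            wr_chain = &ctxt->rc_recv_wr;
    }

    /* One doorbell for the entire batch. */
    if (wr_chain)
            ret = ib_post_recv(rdma->sc_qp, wr_chain, &bad_wr);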
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
xprt pinning was removed in commit 365e9992b9 ("svcrdma: Remove
transport reference counting"), but this comment was not updated
to reflect that change.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up: explain why svc_xprt_enqueue() is invoked in the event
handler even though no xpt_flags bits are toggled here.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
mountd can now monitor clients appearing and disappearing in
/proc/fs/nfsd/clients, and will log these events, in lieu of the logging
of mount/unmount events for NFSv3.
Currently it cannot distinguish between unconfirmed clients (which might
be transient and totally uninteresting) and confirmed clients.
So add a "status: " line which reports either "confirmed" or
"unconfirmed", and use fsnotify to report that the info file
has been modified.
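For example, the info file for a confirmed client now carries a line
like the following alongside its existing fields (only the new line is
shown here):

    status: confirmed

while a client that has not yet confirmed its client ID reports
"status: unconfirmed".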
This requires a bit of infrastructure to keep the dentry for the "info"
file. There is no need to take a counted reference as the dentry must
remain around until the client is removed.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Note that size_t is 32 bits on a 32-bit architecture, but cp_count is
defined by the protocol to be 64 bits, so we could be turning a large
copy into a 0-length copy here.
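To make the failure mode concrete (on a 32-bit build; the value is only
an example):

    #include <linux/types.h>

    u64 cp_count = 0x100000000ULL;  /* client asks to copy 4GiB */
    size_t len = cp_count;          /* size_t is 32 bits here: len == 0 */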
Reported-by: <radchenkoy@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
From https://tools.ietf.org/html/rfc7862#page-65:
A count of 0 (zero) requests that all bytes from ca_src_offset
through EOF be copied to the destination.
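The fix therefore needs to translate a zero count into "source offset
through EOF", along these lines (a sketch of the semantics only; the
variable names are illustrative and not taken from the patch):

    /* ca_count == 0: copy everything from ca_src_offset to EOF. */
    if (count == 0)
            count = i_size_read(file_inode(src_file)) - src_offset;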
Reported-by: <radchenkoy@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
In order to ensure that knfsd threads don't linger once the nfsd
pseudofs is unmounted (e.g. when the container is killed) we let
nfsd_umount() shut down those threads and wait for them to exit.
This also should ensure that we don't need to do a kernel mount of
the pseudofs, since the thread lifetime is now limited by the
lifetime of the filesystem.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
`printk()`, by default, uses the warning log level, which leaves the
user reading
NFSD: Using UMH upcall client tracking operations.
wondering what to do about it (`dmesg --level=warn`).
Several client tracking methods are tried, and expected to fail. That’s
why a message is printed only on success. It might be interesting for
users to know the chosen method, so use info-level instead of
debug-level.
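For reference, emitting the message explicitly at the info level could
look like this (a sketch; not necessarily the exact call used in the
patch):

    pr_info("NFSD: Using UMH upcall client tracking operations.\n");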
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
We do this same logic repeatedly, and it's easy to get the sense of the
comparison wrong.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Enable watching the progress of directory encoding to capture the
timing of any issues with reading or encoding a directory. The
new tracepoint captures dirent encoding for all NFS versions.
For example, here's what a few NFSv4 directory entries might look
like:
nfsd-989 [002] 468.596265: nfsd_dirent: fh_hash=0x5d162594 ino=2 name=.
nfsd-989 [002] 468.596267: nfsd_dirent: fh_hash=0x5d162594 ino=1 name=..
nfsd-989 [002] 468.596299: nfsd_dirent: fh_hash=0x5d162594 ino=3827 name=zlib.c
nfsd-989 [002] 468.596325: nfsd_dirent: fh_hash=0x5d162594 ino=3811 name=xdiff
nfsd-989 [002] 468.596351: nfsd_dirent: fh_hash=0x5d162594 ino=3810 name=xdiff-interface.h
nfsd-989 [002] 468.596377: nfsd_dirent: fh_hash=0x5d162594 ino=3809 name=xdiff-interface.c
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>