Commit Graph

902187 Commits

Author SHA1 Message Date
Trond Myklebust
97a728f5e2 NFS/pnfs: Fix dereference of layout cred in pnfs_layoutcommit_inode()
Ensure that the dereference of the layout cred is atomic with the
stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-03 18:29:10 -04:00
Trond Myklebust
fc51b1cf39 NFS: Beware when dereferencing the delegation cred
When we look up the delegation cred, we are usually doing so in
conjunction with a read of the stateid, and we want to ensure
that the look up is atomic with that read.

Fixes: 57f188e047 ("NFSv4: nfs_update_inplace_delegation() should update delegation cred")
[sfr@canb.auug.org.au: Fixed up borken Fixes: line from Trond :-)]
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-03 18:26:02 -04:00
Trond Myklebust
f30a6ea0f3 NFS: Add a module parameter to set nfs_mountpoint_expiry_timeout
Setting nfs_mountpoint_expiry_timeout() to a negative value stops
mountpoint expiration, while setting it to a positive value restarts
the scheduler.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-02 18:53:59 -04:00
Trond Myklebust
75da98586a NFS: finish_automount() requires us to hold 2 refs to the mount record
We must not return from nfs_d_automount() without holding 2 references
to the mount record. Doing so, will trigger the BUG() in finish_automount().
Also ensure that we don't try to reschedule the automount timer with
a negative or zero timeout value.

Fixes: 22a1ae9a93 ("NFS: If nfs_mountpoint_expiry_timeout < 0, do not expire submounts")
Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-02 18:51:12 -04:00
Scott Mayhew
529af90576 NFS: Fix a few constant_table array definitions
nfs_vers_tokens, nfs_xprt_protocol_tokens, and nfs_secflavor_tokens were
all missing an empty item at the end of the array, allowing
lookup_constant() to potentially walk off the end and trigger and oops.

Reported-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Fixes: e38bb238ed ("NFS: Convert mount option parsing to use functionality from fs_parser.h")
Cc: stable@vger.kernel.org # v5.6
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-02 18:37:13 -04:00
Trond Myklebust
ed5d588fe4 NFS: Try to join page groups before an O_DIRECT retransmission
If we have to retransmit requests, try to join their page groups
first.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-01 13:37:57 -04:00
Trond Myklebust
e00ed89d7b NFS: Refactor nfs_lock_and_join_requests()
Refactor nfs_lock_and_join_requests() in order to separate out the
subrequest merging into its own function nfs_lock_and_join_group()
that can be used by O_DIRECT.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-01 13:37:56 -04:00
Trond Myklebust
44a65a0c27 NFS: Reverse the submission order of requests in __nfs_pageio_add_request()
If we have to split the request up into subrequests, we have to submit
the request pointed to by the function call parameter last, in case
there is an error or other issue that causes us to exit before the
last request is submitted. The reason is that the caller is expected
to perform cleanup in those cases.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-01 13:37:56 -04:00
Trond Myklebust
a62f8e3bd8 NFS: Clean up nfs_lock_and_join_requests()
Clean up nfs_lock_and_join_requests() to simplify the calculation
of the range covered by the page group, taking into account the
presence of mirrors.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-01 13:37:56 -04:00
Trond Myklebust
377840ee48 NFS: Remove the redundant function nfs_pgio_has_mirroring()
We need to trust that desc->pg_mirror_idx is set correctly, whether
or not mirroring is enabled.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-01 13:37:56 -04:00
Trond Myklebust
862f35c947 NFS: Fix memory leaks in nfs_pageio_stop_mirroring()
If we just set the mirror count to 1 without first clearing out
the mirrors, we can leak queued up requests.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-01 13:37:56 -04:00
Trond Myklebust
f02cec9d33 NFS: Fix a request reference leak in nfs_direct_write_clear_reqs()
nfs_direct_write_scan_commit_list() will lock the request and bump
the reference count, but we also need to account for the reference
that was taken when we initially added the request to the commit list.

Fixes: fb5f7f20cd ("NFS: commit errors should be fatal")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-01 13:37:56 -04:00
Trond Myklebust
dc9dc2febb NFS: Fix use-after-free issues in nfs_pageio_add_request()
We need to ensure that we create the mirror requests before calling
nfs_pageio_add_request_mirror() on the request we are adding.
Otherwise, we can end up with a use-after-free if the call to
nfs_pageio_add_request_mirror() triggers I/O.

Fixes: c917cfaf9b ("NFS: Fix up NFS I/O subrequest creation")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-01 13:37:56 -04:00
Trond Myklebust
08ca8b21f7 NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests()
When a subrequest is being detached from the subgroup, we want to
ensure that it is not holding the group lock, or in the process
of waiting for the group lock.

Fixes: 5b2b5187fa ("NFS: Fix nfs_page_group_destroy() and nfs_lock_and_join_requests() race cases")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-01 13:34:28 -04:00
Trond Myklebust
add42de317 NFS: Fix a page leak in nfs_destroy_unlinked_subrequests()
When we detach a subrequest from the list, we must also release the
reference it holds to the parent.

Fixes: 5b2b5187fa ("NFS: Fix nfs_page_group_destroy() and nfs_lock_and_join_requests() race cases")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-01 10:11:22 -04:00
Trond Myklebust
f764a1e1cb NFSoRDMA Client Updates for Linux 5.7
New Features:
 - Allow one active connection and several zombie connections to prevent
   blocking if the remote server is unresponsive.
 
 Bugfixes and Cleanups:
 - Enhance MR-related trace points
 - Refactor connection set-up and disconnect functions
 - Make Protection Domains per-connection instead of per-transport
 - Merge struct rpcrdma_ia into rpcrdma_ep
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl5+GKEACgkQ18tUv7Cl
 QOvJVBAA4MNSvzo0856X414uUMe1YK1HXblqYbNWSe0bRsfaakp8mscMFmd4/IuG
 YSOlQPYcTErdfeEelS1ayMZA3qy5Ke62hKFFfv0fk11lN3VoT4XPiIqY6Ry2OJaR
 7XoYyH3YKJxgqOYhQwSo4rvFRNtPKbvEtcQn8uLXnMclomlZ1mli4F9rYs23SvLL
 Jbq1q/SKx0w+GAOUj418dYLjMnQU7L+TotplTBGzE7mkhBR0JarBiKZfLxbg2/md
 Fva1EwY4Tgef+4ofVr5KP8raGVGl6zy8TIZezlijeUv4rmk9JudrYjeUNqyCWufw
 ffkrM+wPr8kQViQoei5ByEvVG2pQK9nfxccx6rxiIOoG8Mb1aK8Wh75e2kjfZZeY
 bsEx8wSfRvI54ukk/NfGpP21OP1DDKU4ET6hsW+b6onqmDfYPZ/LinUvrvIHietQ
 IsMOG+QE+G1mXFiQR1P9dZcIwTNvE/SyUoaykbeslcYCkQt5yEk25UK+38TornbV
 hBxrm9KBd25RUu8prFJadmLBjPx3tLLd451xxppCyrIlpw0b+Q4yHlF7Jr75DYBT
 xjW5Xpr6tJZuoybbSyT9B/tiV4gIfyJZCU5S95nFlluKZ0gQd2jIGj/JagNWppke
 XSEqRR/Ja0OnB7yDz60gQrrZnjw/Em2jx7xbHIXFK+rs45dkdFc=
 =piIM
 -----END PGP SIGNATURE-----

Merge tag 'nfs-rdma-for-5.7-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

NFSoRDMA Client Updates for Linux 5.7

New Features:
- Allow one active connection and several zombie connections to prevent
  blocking if the remote server is unresponsive.

Bugfixes and Cleanups:
- Enhance MR-related trace points
- Refactor connection set-up and disconnect functions
- Make Protection Domains per-connection instead of per-transport
- Merge struct rpcrdma_ia into rpcrdma_ep
2020-03-28 12:01:17 -04:00
Trond Myklebust
1de3af9883 NFS: Remove unused FLUSH_SYNC support in nfs_initiate_pgio()
If the FLUSH_SYNC flag is set, nfs_initiate_pgio() will currently
wait for completion, and then return the status of the I/O operation.
What we actually want to report in nfs_pageio_doio() is whether or
not the RPC call was launched successfully, whereas actual I/O
status is intended handled in the reply callbacks.

Since FLUSH_SYNC is never set by any of the callers anyway, let's
just remove that code altogether.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-28 11:54:20 -04:00
Trond Myklebust
cbd7be43c4 pNFS/flexfiles: Specify the layout segment range in LAYOUTGET
Move from requesting only full file layout segments, to requesting
layout segments that match our I/O size. This means the server is
still free to return a full file layout, but we will no longer
error out if it does not.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:35 -04:00
Trond Myklebust
e70430d939 pNFS/flexfiles: remove requirement for whole file layouts
Remove the requirement that the server always sends whole file
layouts.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:35 -04:00
Trond Myklebust
e1e54ab710 pNFS/flexfiles: Check the layout segment range before doing I/O
When starting to read or write with a layout segment, check that the
range matches our request.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:35 -04:00
Trond Myklebust
660d1eb223 pNFS/flexfile: Don't merge layout segments if the mirrors don't match
Check that the number of mirrors, and the mirror information matches
before deciding to merge layout segments in pNFS/flexfiles.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:35 -04:00
Trond Myklebust
e18c18ebd7 NFS/pNFS: Fix pnfs_layout_mark_request_commit() invalid layout segment handling
Fix up pnfs_layout_mark_request_commit() to alway reschedule the write
if the layout segment is invalid. Also minor cleanup.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:35 -04:00
Trond Myklebust
c84bea5944 NFS/pNFS: Simplify bucket layout segment reference counting
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:35 -04:00
Trond Myklebust
9c455a8c1e NFS/pNFS: Clean up pNFS commit operations
Move the pNFS commit related operations into a separate structure
that can be carried by the pnfs_ds_commit_info.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:35 -04:00
Trond Myklebust
0aa647b736 NFS: Remove bucket array from struct pnfs_ds_commit_info
Remove the unused bucket array in struct pnfs_ds_commit_info.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:35 -04:00
Trond Myklebust
fb6b53ba40 NFS/pNFS: Add a helper pnfs_generic_search_commit_reqs()
Lift filelayout_search_commit_reqs() into the generic pnfs/nfs code,
and add support for commit arrays.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:35 -04:00
Trond Myklebust
ba827c9abb pNFS: Enable per-layout segment commit structures
Enable adding and lookup of per-layout segment commits in filelayout
and flexfilelayout.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:34 -04:00
Trond Myklebust
a9901899b6 pNFS: Add infrastructure for cleaning up per-layout commit structures
Ensure that both the file and flexfiles layout types clean up when
freeing the layout segments.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:34 -04:00
Trond Myklebust
e3b9f7e60b NFS/pNFS: Support commit arrays in nfs_clear_pnfs_ds_commit_verifiers()
Add support for scanning the full list of per-layout segment commit
arrays to nfs_clear_pnfs_ds_commit_verifiers().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:34 -04:00
Trond Myklebust
1f28476dcb NFS: Fix O_DIRECT commit verifier handling
Instead of trying to save the commit verifiers and checking them against
previous writes, adopt the same strategy as for buffered writes, of
just checking the verifiers at commit time.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:34 -04:00
Trond Myklebust
fb5f7f20cd NFS: commit errors should be fatal
Fix the O_DIRECT code to avoid retries if the COMMIT fails with a fatal
error.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:34 -04:00
Trond Myklebust
18f4129696 NFS/pNFS: Allow O_DIRECT to release the DS commitinfo
Add a pNFS callback to allow the O_DIRECT code to release the DS
commitinfo when freeing the dreq.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:34 -04:00
Trond Myklebust
0cb1f6df8a pNFS: Support per-layout segment commits in pnfs_generic_commit_pagelist()
Add support for scanning the full list of per-layout segment commit
arrays to pnfs_generic_commit_pagelist().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:34 -04:00
Trond Myklebust
fce9ed0302 pNFS: Support per-layout segment commits in pnfs_generic_recover_commit_reqs()
Add support for scanning the full list of per-layout segment commit
arrays to pnfs_generic_recover_commit_reqs().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:34 -04:00
Trond Myklebust
a8e3765e51 NFSv4/pNFS: Scan the full list of commit arrays when committing
Add support for scanning the full list of per-layout segment commit
arrays to pnfs_generic_scan_commit_lists()

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:34 -04:00
Trond Myklebust
c21e716884 NFSv4/pnfs: Support a list of commit arrays in struct pnfs_ds_commit_info
When we have multiple layout segments with different lists of mirrored
data, we need to track the commits on a per layout segment basis.
This patch adds a list to support this tracking in struct
pnfs_ds_commit_info.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:34 -04:00
Chuck Lever
e28ce90083 xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt
Change the rpcrdma_xprt_disconnect() function so that it no longer
waits for the DISCONNECTED event.  This prevents blocking if the
remote is unresponsive.

In rpcrdma_xprt_disconnect(), the transport's rpcrdma_ep is
detached. Upon return from rpcrdma_xprt_disconnect(), the transport
(r_xprt) is ready immediately for a new connection.

The RDMA_CM_DEVICE_REMOVAL and RDMA_CM_DISCONNECTED events are now
handled almost identically.

However, because the lifetimes of rpcrdma_xprt structures and
rpcrdma_ep structures are now independent, creating an rpcrdma_ep
needs to take a module ref count. The ep now owns most of the
hardware resources for a transport.

Also, a kref is needed to ensure that rpcrdma_ep sticks around
long enough for the cm_event_handler to finish.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-03-27 10:47:25 -04:00
Chuck Lever
745b734c9b xprtrdma: Extract sockaddr from struct rdma_cm_id
rpcrdma_cm_event_handler() is always passed an @id pointer that is
valid. However, in a subsequent patch, we won't be able to extract
an r_xprt in every case. So instead of using the r_xprt's
presentation address strings, extract them from struct rdma_cm_id.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-03-27 10:47:25 -04:00
Chuck Lever
93aa8e0a9d xprtrdma: Merge struct rpcrdma_ia into struct rpcrdma_ep
I eventually want to allocate rpcrdma_ep separately from struct
rpcrdma_xprt so that on occasion there can be more than one ep per
xprt.

The new struct rpcrdma_ep will contain all the fields currently in
rpcrdma_ia and in rpcrdma_ep. This is all the device and CM settings
for the connection, in addition to per-connection settings
negotiated with the remote.

Take this opportunity to rename the existing ep fields from rep_* to
re_* to disambiguate these from struct rpcrdma_rep.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-03-27 10:47:25 -04:00
Chuck Lever
d6ccebf956 xprtrdma: Disconnect on flushed completion
Completion errors after a disconnect often occur much sooner than a
CM_DISCONNECT event. Use this to try to detect connection loss more
quickly.

Note that other kernel ULPs do take care to disconnect explicitly
when a WR is flushed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-03-27 10:47:25 -04:00
Chuck Lever
897b7be9bc xprtrdma: Remove rpcrdma_ia::ri_flags
Clean up:
The upper layer serializes calls to xprt_rdma_close, so there is no
need for an atomic bit operation, saving 8 bytes in rpcrdma_ia.

This enables merging rpcrdma_ia_remove directly into the disconnect
logic.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-03-27 10:47:25 -04:00
Chuck Lever
81fe0c57f4 xprtrdma: Invoke rpcrdma_ia_open in the connect worker
Move rdma_cm_id creation into rpcrdma_ep_create() so that it is now
responsible for allocating all per-connection hardware resources.

With this clean-up, all three arms of the switch statement in
rpcrdma_ep_connect are exactly the same now, thus the switch can be
removed.

Because device removal behaves a little differently than
disconnection, there is a little more work to be done before
rpcrdma_ep_destroy() can release the connection's rdma_cm_id. So
it is not quite symmetrical with rpcrdma_ep_create() yet.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-03-27 10:47:24 -04:00
Chuck Lever
9ba373ee24 xprtrdma: Allocate Protection Domain in rpcrdma_ep_create()
Make a Protection Domain (PD) a per-connection resource rather than
a per-transport resource. In other words, when the connection
terminates, the PD is destroyed.

Thus there is one less HW resource that remains allocated to a
transport after a connection is closed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-03-27 10:47:24 -04:00
Chuck Lever
9144a803df xprtrdma: Refactor rpcrdma_ep_connect() and rpcrdma_ep_disconnect()
Clean up: Simplify the synopses of functions in the connect and
disconnect paths in preparation for combining the rpcrdma_ia and
struct rpcrdma_ep structures.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-03-27 10:47:24 -04:00
Chuck Lever
97d0de8812 xprtrdma: Clean up the post_send path
Clean up: Simplify the synopses of functions in the post_send path
by combining the struct rpcrdma_ia and struct rpcrdma_ep arguments.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-03-27 10:47:24 -04:00
Chuck Lever
253a51622f xprtrdma: Refactor frwr_init_mr()
Clean up: prepare for combining the rpcrdma_ia and rpcrdma_ep
structures. Take the opportunity to rename the function to be
consistent with the "subsystem _ object _ verb" naming scheme.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-03-27 10:47:24 -04:00
Chuck Lever
85cd8e2b78 xprtrdma: Invoke rpcrdma_ep_create() in the connect worker
Refactor rpcrdma_ep_create(), rpcrdma_ep_disconnect(), and
rpcrdma_ep_destroy().

rpcrdma_ep_create will be invoked at connect time instead of at
transport set-up time. It will be responsible for allocating per-
connection resources. In this patch it allocates the CQs and
creates a QP. More to come.

rpcrdma_ep_destroy() is the inverse functionality that is
invoked at disconnect time. It will be responsible for releasing
the CQs and QP.

These changes should be safe to do because both connect and
disconnect is guaranteed to be serialized by the transport send
lock.

This takes us another step closer to resolving the address and route
only at connect time so that connection failover to another device
will work correctly.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-03-27 10:47:24 -04:00
Chuck Lever
62a89501a3 xprtrdma: Enhance MR-related trace points
Two changes:
- Show the number of SG entries that were mapped. This helps debug
  DMA-related problems.
- Record the MR's resource ID instead of its memory address. This
  groups each MR with its associated rdma-tool output, and reduces
  needless exposure of memory addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-03-27 10:47:24 -04:00
Trond Myklebust
d7242c4641 pNFS: Add a helper to allocate the array of buckets
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-26 10:52:04 -04:00
Trond Myklebust
19573c939a NFS/pNFS: Refactor pnfs_generic_commit_pagelist()
Refactor pnfs_generic_commit_pagelist() to simplify the conversion
to layout segment based commit lists.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-26 10:52:04 -04:00