Commit Graph

5657 Commits

Author SHA1 Message Date
Trond Myklebust
58ac3e5923 NFSv4/pnfs: Clean up nfs_layout_find_inode()
Now that we can rely on just the rcu_read_lock(), remove the
clp->cl_lock and clean up.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:29 -04:00
Trond Myklebust
cf6605d194 NFSv4: Ensure layout headers are RCU safe
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:29 -04:00
Trond Myklebust
d911c57a19 NFSv4/pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid()
Make sure to test the stateid for validity so that we catch instances
where the server may have been reusing stateids in
nfs_layout_find_inode_by_stateid().

Fixes: 7b410d9ce4 ("pNFS: Delay getting the layout header in CB_LAYOUTRECALL handlers")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:29 -04:00
Trond Myklebust
194a0dc8e2 pNFS/flexfiles: Report DELAY and GRACE errors from the DS to the server
Ensure that if the DS is returning too many DELAY and GRACE errors, we
also report that to the MDS through the layouterror mechanism.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:29 -04:00
Trond Myklebust
a8b373eefc NFS: Limit the size of the access cache by default
Currently, we have no real limit on the access cache size (we set it
to ULONG_MAX). That can lead to credentials getting pinned for a
very long time on lots of files if you have a system with a lot of
memory.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:29 -04:00
Trond Myklebust
49cd32543f NFS: Avoid referencing the cred twice in async rename/unlink
In both async rename and rename, we take a reference to the
cred in the call arguments.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:29 -04:00
Trond Myklebust
63ec2b69e9 NFSv4: Avoid unnecessary credential references in layoutget
Layoutget is just using the credential attached to the open context.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:29 -04:00
Trond Myklebust
6129650720 NFSv4: Avoid referencing the cred unnecessarily during NFSv4 I/O
Avoid unnecessary references to the cred when we have already referenced
it through the open context or the open owner.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:29 -04:00
Trond Myklebust
542b994bdb NFS: Assume cred is pinned by open context in I/O requests
In read/write/commit, we should be able to assume that the cred is
pinned by the open context.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:29 -04:00
Trond Myklebust
1d179d6bd6 NFS: alloc_nfs_open_context() must use the file cred when available
If we're creating a nfs_open_context() for a specific file pointer,
we must use the cred assigned to that file.

Fixes: a52458b48a ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:28 -04:00
Trond Myklebust
244fcd2f9a NFS: Ensure we time out if a delegreturn does not complete
We can't allow delegreturn to hold up nfs4_evict_inode() forever,
since that can cause the memory shrinkers to block. This patch
therefore ensures that we eventually time out, and complete the
reclaim of the inode.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:28 -04:00
Trond Myklebust
59b5639490 NFSv4/pnfs: pnfs_set_layout_stateid() should update the layout cred
If the cred assigned to the layout that we're updating differs from
the one used to retrieve the new layout segment, then we need to
update the layout plh_lc_cred field.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:28 -04:00
Trond Myklebust
57f188e047 NFSv4: nfs_update_inplace_delegation() should update delegation cred
If the cred assigned to the delegation that we're updating differs
from the one we're updating too, then we need to update that field
too.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:28 -04:00
Trond Myklebust
59e356a967 NFS: Use the 64-bit server readdir cookies when possible
When we're running as a 64-bit architecture and are not running in
32-bit compatibility mode, it is better to use the 64-bit readdir
cookies that supplied by the server. Doing so improves the accuracy
of telldir()/seekdir(), particularly when the directory is changing,
for instance, when doing 'rm -rf'.

We still fall back to using the 32-bit offsets on 32-bit architectures
and when in compatibility mode.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:28 -04:00
Scott Mayhew
55dee1bc0d nfs: add minor version to nfs_server_key for fscache
An NFS client that mounts multiple exports from the same NFS
server with higher NFSv4 versions disabled (i.e. 4.2) and without
forcing a specific NFS version results in fscache index cookie
collisions and the following messages:
[  570.004348] FS-Cache: Duplicate cookie detected

Each nfs_client structure should have its own fscache index cookie,
so add the minorversion to nfs_server_key.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200145
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-25 13:53:24 -05:00
Scott Mayhew
75a9b91761 NFS: Fix leak of ctx->nfs_server.hostname
If userspace passes an nfs_mount_data struct in the data argument of
mount(2), then nfs23_parse_monolithic() or nfs4_parse_monolithic()
will allocate memory for ctx->nfs_server.hostname.  This needs to be
freed in nfs_parse_source(), which also allocates memory for
ctx->nfs_server.hostname, otherwise a leak will occur.

Reported-by: syzbot+193c375dcddb4f345091@syzkaller.appspotmail.com
Fixes: f2aedb713c ("NFS: Add fs_context support.")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-25 13:48:21 -05:00
Scott Mayhew
1821b26a1f NFS: Don't hard-code the fs_type when submounting
Hard-coding the fstype causes "nfs4" mounts to appear as "nfs",
which breaks scripts that do "umount -at nfs4".

Reported-by: Patrick Steinhardt <ps@pks.im>
Fixes: f2aedb713c ("NFS: Add fs_context support.")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-25 13:31:19 -05:00
Scott Mayhew
1cef21842f NFS: Ensure the fs_context has the correct fs_type before mounting
This is necessary because unless userspace explicitly requests fstype
"nfs4" (either via "mount -t nfs4" or by calling the "mount.nfs4" helper
directly), the fstype will default to "nfs".

This was fine on older kernels because the super_block->s_type was set
via mount_info->nfs_mod->nfs_fs, which was set when parsing the mount
options and subsequently passed in the "type" argument of sget().

After commit f2aedb713c ("NFS: Add fs_context support."), sget_fc(),
which has no "type" argument, is called instead.  In sget_fc(), the
super_block->s_type is set via fs_context->fs_type, which was set when
the filesystem context was initially created.

Reported-by: Patrick Steinhardt <ps@pks.im>
Fixes: f2aedb713c ("NFS: Add fs_context support.")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-21 15:51:04 -05:00
Trond Myklebust
5d63944f82 NFSv4: Ensure the delegation cred is pinned when we call delegreturn
Ensure we don't release the delegation cred during the call to
nfs4_proc_delegreturn().

Fixes: ee05f45677 ("NFSv4: Fix races between open and delegreturn")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-13 16:23:02 -05:00
Trond Myklebust
8c75593c6e NFSv4: Ensure the delegation is pinned in nfs_do_return_delegation()
The call to nfs_do_return_delegation() needs to be taken without
any RCU locks. Add a refcount to make sure the delegation remains
pinned in memory until we're done.

Fixes: ee05f45677 ("NFSv4: Fix races between open and delegreturn")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-13 16:18:50 -05:00
Olga Kornievskaia
cd1b659d8c NFSv4.1 make cachethis=no for writes
Turning caching off for writes on the server should improve performance.

Fixes: fba83f3411 ("NFS: Pass "privileged" value to nfs4_init_sequence()")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-13 15:37:18 -05:00
Trond Myklebust
efeda80da3 NFSv4: Fix revalidation of dentries with delegations
If a dentry was not initially looked up while we were holding a
delegation, then we do still need to revalidate that it still holds
the same name. If there are multiple hard links to the same file,
then all the hard links need validation.

Reported-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
[Anna: Put nfs_unset_verifier_delegated() under CONFIG_NFS_V4]
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-12 13:55:25 -05:00
Trond Myklebust
cf5b4059ba NFSv4: Fix races between open and dentry revalidation
We want to make sure that we revalidate the dentry if and only if
we've done an OPEN by filename.
In order to avoid races with remote changes to the directory on the
server, we want to save the verifier before calling OPEN. The exception
is if the server returned a delegation with our OPEN, as we then
know that the filename can't have changed on the server.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@gmail.com>
Tested-by: Benjamin Coddington <bcodding@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-10 10:50:59 -05:00
Trond Myklebust
a1147b8281 NFS: Fix up directory verifier races
In order to avoid having our dentry revalidation race with an update
of the directory on the server, we need to store the verifier before
the RPC calls to LOOKUP and READDIR.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@gmail.com>
Tested-by: Benjamin Coddington <bcodding@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-10 10:38:48 -05:00
Linus Torvalds
c9d35ee049 Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs file system parameter updates from Al Viro:
 "Saner fs_parser.c guts and data structures. The system-wide registry
  of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is
  the horror switch() in fs_parse() that would have to grow another case
  every time something got added to that system-wide registry.

  New syntax types can be added by filesystems easily now, and their
  namespace is that of functions - not of system-wide enum members. IOW,
  they can be shared or kept private and if some turn out to be widely
  useful, we can make them common library helpers, etc., without having
  to do anything whatsoever to fs_parse() itself.

  And we already get that kind of requests - the thing that finally
  pushed me into doing that was "oh, and let's add one for timeouts -
  things like 15s or 2h". If some filesystem really wants that, let them
  do it. Without somebody having to play gatekeeper for the variants
  blessed by direct support in fs_parse(), TYVM.

  Quite a bit of boilerplate is gone. And IMO the data structures make a
  lot more sense now. -200LoC, while we are at it"

* 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits)
  tmpfs: switch to use of invalfc()
  cgroup1: switch to use of errorfc() et.al.
  procfs: switch to use of invalfc()
  hugetlbfs: switch to use of invalfc()
  cramfs: switch to use of errofc() et.al.
  gfs2: switch to use of errorfc() et.al.
  fuse: switch to use errorfc() et.al.
  ceph: use errorfc() and friends instead of spelling the prefix out
  prefix-handling analogues of errorf() and friends
  turn fs_param_is_... into functions
  fs_parse: handle optional arguments sanely
  fs_parse: fold fs_parameter_desc/fs_parameter_spec
  fs_parser: remove fs_parameter_description name field
  add prefix to fs_context->log
  ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log
  new primitive: __fs_parse()
  switch rbd and libceph to p_log-based primitives
  struct p_log, variants of warnf() et.al. taking that one instead
  teach logfc() to handle prefices, give it saner calling conventions
  get rid of cg_invalf()
  ...
2020-02-08 13:26:41 -08:00
Linus Torvalds
f43574d0ac NFS Client Updates for Linux 5.6
Stable bugfixes:
 - Fix memory leaks and corruption in readdir # v2.6.37+
 - Directory page cache needs to be locked when read # v2.6.37+
 
 New features:
 - Convert NFS to use the new mount API
 - Add "softreval" mount option to let clients use cache if server goes down
 - Add a config option to compile without UDP support
 - Limit the number of inactive delegations the client can cache at once
 - Improved readdir concurrency using iterate_shared()
 
 Other bugfixes and cleanups:
 - More 64-bit time conversions
 - Add additional diagnostic tracepoints
 - Check for holes in swapfiles, and add dependency on CONFIG_SWAP
 - Various xprtrdma cleanups to prepare for 5.7's changes
 - Several fixes for NFS writeback and commit handling
 - Fix acls over krb5i/krb5p mounts
 - Recover from premature loss of openstateids
 - Fix NFS v3 chacl and chmod bug
 - Compare creds using cred_fscmp()
 - Use kmemdup_nul() in more places
 - Optimize readdir cache page invalidation
 - Lease renewal and recovery fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl48kMUACgkQ18tUv7Cl
 QOs/bA/+KAHaee+1jWdgRS88CnNDfeokU2sGWuyXWrVTmiKZ+IjnIUIWqmeKhVyg
 RTbaG4PGTIwiLDFibgzdnc3cTOQEgLnVGWWZ50Xh3b7ubock7+/4JHxqZS+/f3vf
 yqwM0dZaXi5Kcx1kEJ+niBxuzkc9mFI+nHh+wLIlin/kaaUdLKu7mP3NXj2cmWxN
 NoRaKc2gEvkPHhPSH4Z1DVXTHxvH2REFvt9APPUgfLfqcUVHV9b7V/wI/roiGWMn
 53h6f38IdqoNQIpzMog/k/va67NLmEvUZOlpCYPyanPOjuxTrmi8iC2S6gLEOjtc
 GGnQnc5skVL31seFR1NbOJiiN3hTLTncnoXza0cKtYxmo7a/FjXApw4jCu3Rkrav
 UXpCI4O6+2AVVG+pEPbjQy3/GEImeoGvp+xr57jBSZBHoDZU9LDwag65qvZ1btIq
 KOBx2gweQz0aB2heXmfee7qzxFdftHmtMWhIMnJASKNuAWGL23Scqem+d97i2T6H
 7y9OJ3aOXiYxFMLYJCsLWjUJxYiaIANNBmHMjf27mZzcdDuxGFms277CMpNPr3SU
 WZk6/oKw9jaRSzHzaKgVDXiULLXQE1/xZ/mvgR/zk1QAusyeXPvVnMdxoRdxFdXb
 QGZHgUqvFvYi8Lufvs+ZLGS4sAp7oD/Q+lNPXn7cniSwfY4uJiw=
 =b6+F
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.6-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Puyll NFS client updates from Anna Schumaker:
 "Stable bugfixes:
   - Fix memory leaks and corruption in readdir # v2.6.37+
   - Directory page cache needs to be locked when read # v2.6.37+

  New features:
   - Convert NFS to use the new mount API
   - Add "softreval" mount option to let clients use cache if server goes down
   - Add a config option to compile without UDP support
   - Limit the number of inactive delegations the client can cache at once
   - Improved readdir concurrency using iterate_shared()

  Other bugfixes and cleanups:
   - More 64-bit time conversions
   - Add additional diagnostic tracepoints
   - Check for holes in swapfiles, and add dependency on CONFIG_SWAP
   - Various xprtrdma cleanups to prepare for 5.7's changes
   - Several fixes for NFS writeback and commit handling
   - Fix acls over krb5i/krb5p mounts
   - Recover from premature loss of openstateids
   - Fix NFS v3 chacl and chmod bug
   - Compare creds using cred_fscmp()
   - Use kmemdup_nul() in more places
   - Optimize readdir cache page invalidation
   - Lease renewal and recovery fixes"

* tag 'nfs-for-5.6-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (93 commits)
  NFSv4.0: nfs4_do_fsinfo() should not do implicit lease renewals
  NFSv4: try lease recovery on NFS4ERR_EXPIRED
  NFS: Fix memory leaks
  nfs: optimise readdir cache page invalidation
  NFS: Switch readdir to using iterate_shared()
  NFS: Use kmemdup_nul() in nfs_readdir_make_qstr()
  NFS: Directory page cache pages need to be locked when read
  NFS: Fix memory leaks and corruption in readdir
  SUNRPC: Use kmemdup_nul() in rpc_parse_scope_id()
  NFS: Replace various occurrences of kstrndup() with kmemdup_nul()
  NFSv4: Limit the total number of cached delegations
  NFSv4: Add accounting for the number of active delegations held
  NFSv4: Try to return the delegation immediately when marked for return on close
  NFS: Clear NFS_DELEGATION_RETURN_IF_CLOSED when the delegation is returned
  NFSv4: nfs_inode_evict_delegation() should set NFS_DELEGATION_RETURNING
  NFS: nfs_find_open_context() should use cred_fscmp()
  NFS: nfs_access_get_cached_rcu() should use cred_fscmp()
  NFSv4: pnfs_roc() must use cred_fscmp() to compare creds
  NFS: remove unused macros
  nfs: Return EINVAL rather than ERANGE for mount parse errors
  ...
2020-02-07 17:39:56 -08:00
Al Viro
328de5287b turn fs_param_is_... into functions
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:38 -05:00
Al Viro
48ce73b1be fs_parse: handle optional arguments sanely
Don't bother with "mixed" options that would allow both the
form with and without argument (i.e. both -o foo and -o foo=bar).
Rather than trying to shove both into a single fs_parameter_spec,
allow having with-argument and no-argument specs with the same
name and teach fs_parse to handle that.

There are very few options of that sort, and they are actually
easier to handle that way - callers end up with less postprocessing.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:37 -05:00
Al Viro
d7167b1499 fs_parse: fold fs_parameter_desc/fs_parameter_spec
The former contains nothing but a pointer to an array of the latter...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:37 -05:00
Eric Sandeen
96cafb9ccb fs_parser: remove fs_parameter_description name field
Unused now.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:36 -05:00
Al Viro
5eede62529 fold struct fs_parameter_enum into struct constant_table
no real difference now

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 00:12:50 -05:00
Al Viro
2710c957a8 fs_parse: get rid of ->enums
Don't do a single array; attach them to fsparam_enum() entry
instead.  And don't bother trying to embed the names into those -
it actually loses memory, with no real speedup worth mentioning.

Simplifies validation as well.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 00:12:50 -05:00
Robert Milkowski
7dc2993a9e NFSv4.0: nfs4_do_fsinfo() should not do implicit lease renewals
Currently, each time nfs4_do_fsinfo() is called it will do an implicit
NFS4 lease renewal, which is not compliant with the NFS4 specification.
This can result in a lease being expired by an NFS server.

Commit 83ca7f5ab3 ("NFS: Avoid PUTROOTFH when managing leases")
introduced implicit client lease renewal in nfs4_do_fsinfo(),
which can result in the NFSv4.0 lease to expire on a server side,
and servers returning NFS4ERR_EXPIRED or NFS4ERR_STALE_CLIENTID.

This can easily be reproduced by frequently unmounting a sub-mount,
then stat'ing it to get it mounted again, which will delay or even
completely prevent client from sending RENEW operations if no other
NFS operations are issued. Eventually nfs server will expire client's
lease and return an error on file access or next RENEW.

This can also happen when a sub-mount is automatically unmounted
due to inactivity (after nfs_mountpoint_expiry_timeout), then it is
mounted again via stat(). This can result in a short window during
which client's lease will expire on a server but not on a client.
This specific case was observed on production systems.

This patch removes the implicit lease renewal from nfs4_do_fsinfo().

Fixes: 83ca7f5ab3 ("NFS: Avoid PUTROOTFH when managing leases")
Signed-off-by: Robert Milkowski <rmilkowski@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-04 12:27:55 -05:00
Robert Milkowski
924491f2e4 NFSv4: try lease recovery on NFS4ERR_EXPIRED
Currently, if an nfs server returns NFS4ERR_EXPIRED to open(),
we return EIO to applications without even trying to recover.

Fixes: 272289a3df ("NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid")
Signed-off-by: Robert Milkowski <rmilkowski@gmail.com>
Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-04 12:08:24 -05:00
Wenwen Wang
123c23c6a7 NFS: Fix memory leaks
In _nfs42_proc_copy(), 'res->commit_res.verf' is allocated through
kzalloc() if 'args->sync' is true. In the following code, if
'res->synchronous' is false, handle_async_copy() will be invoked. If an
error occurs during the invocation, the following code will not be executed
and the error will be returned . However, the allocated
'res->commit_res.verf' is not deallocated, leading to a memory leak. This
is also true if the invocation of process_copy_commit() returns an error.

To fix the above leaks, redirect the execution to the 'out' label if an
error is encountered.

Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-04 11:01:54 -05:00
Dai Ngo
227823d207 nfs: optimise readdir cache page invalidation
When the directory is large and it's being modified by one client
while another client is doing the 'ls -l' on the same directory then
the cache page invalidation from nfs_force_use_readdirplus causes
the reading client to keep restarting READDIRPLUS from cookie 0
which causes the 'ls -l' to take a very long time to complete,
possibly never completing.

Currently when nfs_force_use_readdirplus is called to switch from
READDIR to READDIRPLUS, it invalidates all the cached pages of the
directory. This cache page invalidation causes the next nfs_readdir
to re-read the directory content from cookie 0.

This patch is to optimise the cache invalidation in
nfs_force_use_readdirplus by only truncating the cached pages from
last page index accessed to the end the file. It also marks the
inode to delay invalidating all the cached page of the directory
until the next initial nfs_readdir of the next 'ls' instance.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com>
[Anna - Fix conflicts with Trond's readdir patches]
[Anna - Remove redundant call to nfs_zap_mapping()]
[Anna - Replace d_inode(file_dentry(desc->file)) with file_inode(desc->file)]
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-04 10:50:44 -05:00
Trond Myklebust
93a6ab7b69 NFS: Switch readdir to using iterate_shared()
Now that the page cache locking is repaired, we should be able to
switch to using iterate_shared() for improved concurrency when
doing readdir().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:37:51 -05:00
Trond Myklebust
3803d6721b NFS: Use kmemdup_nul() in nfs_readdir_make_qstr()
The directory strings stored in the readdir cache may be used with
printk(), so it is better to ensure they are nul-terminated.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:37:45 -05:00
Trond Myklebust
114de38225 NFS: Directory page cache pages need to be locked when read
When a NFS directory page cache page is removed from the page cache,
its contents are freed through a call to nfs_readdir_clear_array().
To prevent the removal of the page cache entry until after we've
finished reading it, we must take the page lock.

Fixes: 11de3b11e0 ("NFS: Fix a memory leak in nfs_readdir")
Cc: stable@vger.kernel.org # v2.6.37+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:37:17 -05:00
Trond Myklebust
4b310319c6 NFS: Fix memory leaks and corruption in readdir
nfs_readdir_xdr_to_array() must not exit without having initialised
the array, so that the page cache deletion routines can safely
call nfs_readdir_clear_array().
Furthermore, we should ensure that if we exit nfs_readdir_filler()
with an error, we free up any page contents to prevent a leak
if we try to fill the page again.

Fixes: 11de3b11e0 ("NFS: Fix a memory leak in nfs_readdir")
Cc: stable@vger.kernel.org # v2.6.37+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:35:17 -05:00
Trond Myklebust
a8bd9ddf39 NFS: Replace various occurrences of kstrndup() with kmemdup_nul()
When we already know the string length, it is more efficient to
use kmemdup_nul().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
[Anna - Changes to super.c were already made during fscontext conversion]
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:35:07 -05:00
Trond Myklebust
10717f4563 NFSv4: Limit the total number of cached delegations
Delegations can be expensive to return, and can cause scalability issues
for the server. Let's therefore try to limit the number of inactive
delegations we hold.
Once the number of delegations is above a certain threshold, start
to return them on close.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:35:07 -05:00
Trond Myklebust
d2269ea14e NFSv4: Add accounting for the number of active delegations held
In order to better manage our delegation caching, add a counter
to track the number of active delegations.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:35:07 -05:00
Trond Myklebust
b7b7dac684 NFSv4: Try to return the delegation immediately when marked for return on close
Add a routine to return the delegation immediately upon close of the
file if it was marked for return-on-close.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:35:07 -05:00
Trond Myklebust
0d10416797 NFS: Clear NFS_DELEGATION_RETURN_IF_CLOSED when the delegation is returned
If a delegation is marked as needing to be returned when the file is
closed, then don't clear that marking until we're ready to return
it.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:35:07 -05:00
Trond Myklebust
f885ea640d NFSv4: nfs_inode_evict_delegation() should set NFS_DELEGATION_RETURNING
In particular, the pnfs return-on-close code will check for that flag,
so ensure we set it appropriately.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:35:07 -05:00
Trond Myklebust
65f5160376 NFS: nfs_find_open_context() should use cred_fscmp()
We want to find open contexts that match our filesystem access
properties. They don't have to exactly match the cred.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:35:07 -05:00
Trond Myklebust
9a206de2ea NFS: nfs_access_get_cached_rcu() should use cred_fscmp()
We do not need to have the rcu lookup method fail in the case where
the fsuid/fsgid and supplemental groups match.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:35:07 -05:00
Trond Myklebust
3871224787 NFSv4: pnfs_roc() must use cred_fscmp() to compare creds
When comparing two 'struct cred' for equality w.r.t. behaviour under
filesystem access, we need to use cred_fscmp().

Fixes: a52458b48a ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:35:07 -05:00
Alex Shi
c0399cf668 NFS: remove unused macros
MNT_fhs_status_sz/MNT_fhandle3_sz are never used after they were
introduced. So better to remove them.

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 10:43:06 -05:00