Commit Graph

381 Commits

Author SHA1 Message Date
ChenXiaoSong
b243874f6f NFSv4: fix open failure with O_ACCMODE flag
A second open() with O_ACCMODE|O_DIRECT flags will fail.

Reproducer:
  1. mount -t nfs -o vers=4.2 $server_ip:/ /mnt/
  2. fd = open("/mnt/file", O_ACCMODE|O_DIRECT|O_CREAT)
  3. close(fd)
  4. fd = open("/mnt/file", O_ACCMODE|O_DIRECT)

The server's nfsd4_decode_share_access() fails with error nfserr_bad_xdr
when the client uses an incorrect share access mode of 0.

Fix this by using the NFS4_SHARE_ACCESS_BOTH share access mode in the
client, just as for the first open.

Fixes: ce4ef7c0a8 ("NFS: Split out NFS v4 file operations")
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-29 22:14:00 -04:00
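
The idea behind the fix can be illustrated with a small user-space sketch. This is not the kernel patch: the helper name share_access_from_open_flags() and the SHARE_ACCESS_* macros are hypothetical, and only the numeric values (READ = 1, WRITE = 2, BOTH = 3) follow the NFSv4 protocol. The point it tries to show is that an access mode of O_ACCMODE should never translate into a share access of 0.

```c
#include <fcntl.h>

/* Share access values as defined by the NFSv4 protocol. */
#define SHARE_ACCESS_READ  1u
#define SHARE_ACCESS_WRITE 2u
#define SHARE_ACCESS_BOTH  3u

/*
 * Hypothetical helper: map open(2) flags to an NFSv4 share access mode.
 * When O_ACCMODE itself is passed as the access mode, it matches none of
 * O_RDONLY, O_WRONLY or O_RDWR, so it lands in the default case and is
 * treated as read/write instead of ending up as 0.
 */
static unsigned int share_access_from_open_flags(int flags)
{
	switch (flags & O_ACCMODE) {
	case O_RDONLY:
		return SHARE_ACCESS_READ;
	case O_WRONLY:
		return SHARE_ACCESS_WRITE;
	case O_RDWR:
	default:
		return SHARE_ACCESS_BOTH;
	}
}
```

With this mapping, both opens in the reproducer would request the same share access, since O_CREAT and O_DIRECT do not affect the access mode bits.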
Trond Myklebust
515dcdcd48 NFS: nfsiod should not block forever in mempool_alloc()
The concern is that since nfsiod is sometimes required to kick off a
commit, it can get locked up waiting forever in mempool_alloc() instead
of failing gracefully and leaving the commit until later.

Try to allocate from the slab first, with GFP_KERNEL | __GFP_NORETRY,
then fall back to a non-blocking attempt to allocate from the memory
pool.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22 15:52:56 -04:00
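
The allocation order described above can be sketched in user space roughly as follows. All names here (reserve_pool, alloc_no_wait(), failing_alloc()) are hypothetical stand-ins, not kernel APIs: the "fast" allocator plays the role of the slab attempt, and the fixed array plays the role of the memory pool.

```c
#include <stddef.h>

/* Hypothetical fixed-size reserve, standing in for a memory pool. */
struct reserve_pool {
	void *slots[4];
	size_t nfree;
};

/* Signature of the "fast path" allocator, e.g. malloc(). */
typedef void *(*try_alloc_fn)(size_t size);

/* Stand-in for an allocator under memory pressure: always fails. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * Try the fast allocator first; if that fails, take an element from the
 * reserve without waiting. Returning NULL lets the caller postpone the
 * work instead of sleeping until the reserve is refilled.
 */
static void *alloc_no_wait(try_alloc_fn fast, struct reserve_pool *pool,
			   size_t size)
{
	void *p = fast(size);

	if (p)
		return p;
	if (pool->nfree > 0)
		return pool->slots[--pool->nfree];
	return NULL;
}
```

The design choice this illustrates is that the caller has to handle a NULL return, which is the price of never blocking in the allocator.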
Trond Myklebust
230bc98f7a NFS: Improve heuristic for readdirplus
The heuristic for readdirplus is designed to try to detect 'ls -l' and
similar patterns. It does so by looking for cache hit/miss patterns in
both the attribute cache and in the dcache of the files in a given
directory, and then sets a flag for the readdirplus code to interpret.

The problem with this approach is that a single attribute or dcache miss
can cause the NFS code to force a refresh of the attributes for the
entire set of files contained in the directory.

To be able to make a more nuanced decision, let's sample the number of
hits and misses in the set of open directory descriptors. That allows us
to set thresholds at which we start preferring READDIRPLUS over regular
READDIR, or at which we start to force a re-read of the remaining
readdir cache using READDIRPLUS.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02 08:43:38 -05:00
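
A threshold-based decision of this kind might look like the sketch below. The threshold values, the struct layout and the function name are assumptions made for illustration; the criteria actually used by the kernel may differ. What the sketch tries to capture is that a single miss no longer forces a refresh, and that the decision is driven by sampled hit and miss counts.

```c
/* Hypothetical thresholds; the kernel's actual values may differ. */
#define USAGE_THRESHOLD 8u	/* lookups before READDIRPLUS is preferred */
#define MISS_THRESHOLD  16u	/* misses before a re-read is forced */

enum readdir_policy {
	POLICY_READDIR,		/* plain READDIR is good enough */
	POLICY_PREFER_PLUS,	/* use READDIRPLUS for further reads */
	POLICY_FORCE_REREAD	/* re-read cached entries with READDIRPLUS */
};

/* Sample of cache lookups seen through one open directory descriptor. */
struct dir_sample {
	unsigned int hits;
	unsigned int misses;
};

static enum readdir_policy choose_readdir_policy(const struct dir_sample *s)
{
	if (s->misses > MISS_THRESHOLD)
		return POLICY_FORCE_REREAD;
	if (s->hits + s->misses > USAGE_THRESHOLD)
		return POLICY_PREFER_PLUS;
	return POLICY_READDIR;
}
```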
Trond Myklebust
84631f84ac NFS: Clean up NFSv4.2 xattrs
Add a helper for the xattr mask so that we can get rid of the inlined
ifdefs.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-02-25 18:50:13 -05:00
Trond Myklebust
00bdadc7ac NFS: Add a helper to remove case-insensitive aliases
When dealing with case insensitive names, the client has no idea how the
server performs the mapping, so cannot collapse the dentries into a
single representative. So both rename and unlink need to deal with the
fact that there could be several dentries representing the file, and
have to somehow force them to be revalidated. Use d_prune_aliases() as a
big hammer approach.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-06 14:00:20 -05:00
Anna Schumaker
d755ad8dc7 NFS: Create a new nfs_alloc_fattr_with_label() function
For creating fattrs with the label field already allocated for us. I
also update nfs_free_fattr() to free the label in the end.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-11-05 14:54:38 -04:00
Anna Schumaker
5fe1210d25 NFS: Unexport nfs_probe_fsinfo()
All the callers are now in client.c so we can remove the
EXPORT_SYMBOL_GPL() and make it static.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20 18:09:55 -04:00
Anna Schumaker
e5731131fb NFS: Move nfs_probe_destination() into the generic client
And rename it to nfs_probe_server(). I also change it to take the nfs_fh
as an argument so callers can choose what filehandle to probe.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20 18:09:54 -04:00
Anna Schumaker
01dde76e47 NFS: Create an nfs4_server_set_init_caps() function
And call it before doing an FSINFO probe to reset to the baseline
capabilities before probing.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20 18:09:54 -04:00
Olga Kornievskaia
7e134205f6 NFSv4 introduce max_connect mount options
This option controls the maximum number of xprts the client can
establish to the server with a distinct address (that means
nconnect connections are not counted towards this new limit).
This patch sets up the nfs structures to keep track of the
max_connect limit (it does not enforce it).

The default value is kept at 1 so that current mounts that
don't want any additional connections are not affected. The
maximum value is set at 16.

Mounts to a DS are not limited to the default value of 1 but are
instead set to the maximum value of 16 (NFS_MAX_TRANSPORTS).

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-27 16:37:17 -04:00
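
As a rough illustration of the limits described above, the following user-space sketch validates a max_connect value. The function name, the macro names and the choice to reject (rather than clamp) out-of-range values are assumptions; only the default of 1 and the maximum of 16 come from the commit message.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_CONNECT_DEFAULT 1u
#define MAX_CONNECT_LIMIT   16u	/* NFS_MAX_TRANSPORTS in the commit message */

/*
 * Hypothetical parser: store the parsed value in *out and return 0, or
 * return a negative errno. A NULL value means the option was not given.
 */
static int parse_max_connect(const char *value, unsigned int *out)
{
	char *end;
	unsigned long n;

	if (value == NULL) {
		*out = MAX_CONNECT_DEFAULT;
		return 0;
	}
	errno = 0;
	n = strtoul(value, &end, 10);
	if (errno != 0 || end == value || *end != '\0')
		return -EINVAL;
	if (n < 1 || n > MAX_CONNECT_LIMIT)
		return -ERANGE;
	*out = (unsigned int)n;
	return 0;
}
```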
Linus Torvalds
a647034fe2 NFS client updates for Linux 5.13
Highlights include:
 
 Stable fixes:
 - Add validation of the UDP retrans parameter to prevent shift out-of-bounds
 - Don't discard pNFS layout segments that are marked for return
 
 Bugfixes:
 - Fix a NULL dereference crash in xprt_complete_bc_request() when the
   NFSv4.1 server misbehaves.
 - Fix the handling of NFS READDIR cookie verifiers
 - Sundry fixes to ensure attribute revalidation works correctly when the
   server does not return post-op attributes.
 - nfs4_bitmask_adjust() must not change the server global bitmasks
 - Fix major timeout handling in the RPC code.
 - NFSv4.2 fallocate() fixes.
 - Fix the NFSv4.2 SEEK_HOLE/SEEK_DATA end-of-file handling
 - Copy offload attribute revalidation fixes
 - Fix an incorrect filehandle size check in the pNFS flexfiles driver
 - Fix several RDMA transport setup/teardown races
 - Fix several RDMA queue wrapping issues
 - Fix a misplaced memory read barrier in sunrpc's call_decode()
 
 Features:
 - Micro optimisation of the TCP transmission queue using TCP_CORK
 - statx() performance improvements by further splitting up the tracking
   of invalid cached file metadata.
 - Support the NFSv4.2 "change_attr_type" attribute and use it to
   optimise handling of change attribute updates.

Merge tag 'nfs-for-5.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable fixes:

   - Add validation of the UDP retrans parameter to prevent shift
     out-of-bounds

   - Don't discard pNFS layout segments that are marked for return

  Bugfixes:

   - Fix a NULL dereference crash in xprt_complete_bc_request() when the
     NFSv4.1 server misbehaves.

   - Fix the handling of NFS READDIR cookie verifiers

   - Sundry fixes to ensure attribute revalidation works correctly when
     the server does not return post-op attributes.

   - nfs4_bitmask_adjust() must not change the server global bitmasks

   - Fix major timeout handling in the RPC code.

   - NFSv4.2 fallocate() fixes.

   - Fix the NFSv4.2 SEEK_HOLE/SEEK_DATA end-of-file handling

   - Copy offload attribute revalidation fixes

   - Fix an incorrect filehandle size check in the pNFS flexfiles driver

   - Fix several RDMA transport setup/teardown races

   - Fix several RDMA queue wrapping issues

   - Fix a misplaced memory read barrier in sunrpc's call_decode()

  Features:

   - Micro optimisation of the TCP transmission queue using TCP_CORK

   - statx() performance improvements by further splitting up the
     tracking of invalid cached file metadata.

   - Support the NFSv4.2 'change_attr_type' attribute and use it to
     optimise handling of change attribute updates"

* tag 'nfs-for-5.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (85 commits)
  xprtrdma: Fix a NULL dereference in frwr_unmap_sync()
  sunrpc: Fix misplaced barrier in call_decode
  NFSv4.2: Remove ifdef CONFIG_NFSD from NFSv4.2 client SSC code.
  xprtrdma: Move fr_mr field to struct rpcrdma_mr
  xprtrdma: Move the Work Request union to struct rpcrdma_mr
  xprtrdma: Move fr_linv_done field to struct rpcrdma_mr
  xprtrdma: Move cqe to struct rpcrdma_mr
  xprtrdma: Move fr_cid to struct rpcrdma_mr
  xprtrdma: Remove the RPC/RDMA QP event handler
  xprtrdma: Don't display r_xprt memory addresses in tracepoints
  xprtrdma: Add an rpcrdma_mr_completion_class
  xprtrdma: Add tracepoints showing FastReg WRs and remote invalidation
  xprtrdma: Avoid Send Queue wrapping
  xprtrdma: Do not wake RPC consumer on a failed LocalInv
  xprtrdma: Do not recycle MR after FastReg/LocalInv flushes
  xprtrdma: Clarify use of barrier in frwr_wc_localinv_done()
  xprtrdma: Rename frwr_release_mr()
  xprtrdma: rpcrdma_mr_pop() already does list_del_init()
  xprtrdma: Delete rpcrdma_recv_buffer_put()
  xprtrdma: Fix cwnd update ordering
  ...
2021-05-07 11:23:41 -07:00
Linus Torvalds
f1c921fb70 selinux/stable-5.13 PR 20210426

Merge tag 'selinux-pr-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:

 - Add support for measuring the SELinux state and policy capabilities
   using IMA.

 - A handful of SELinux/NFS patches to compare the SELinux state of one
   mount with a set of mount options. Olga goes into more detail in the
   patch descriptions, but this is important as it allows more
   flexibility when using NFS and SELinux context mounts.

 - Properly differentiate between the subjective and objective LSM
   credentials; including support for the SELinux and Smack. My clumsy
   attempt at a proper fix for AppArmor didn't quite pass muster so John
   is working on a proper AppArmor patch, in the meantime this set of
   patches shouldn't change the behavior of AppArmor in any way. This
   change explains the bulk of the diffstat beyond security/.

 - Fix a problem where we were not properly terminating the permission
   list for two SELinux object classes.

* tag 'selinux-pr-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: add proper NULL termination to the secclass_map permissions
  smack: differentiate between subjective and objective task credentials
  selinux: clarify task subjective and objective credentials
  lsm: separate security_task_getsecid() into subjective and objective variants
  nfs: account for selinux security context when deciding to share superblock
  nfs: remove unneeded null check in nfs_fill_super()
  lsm,selinux: add new hook to compare new mount to an existing mount
  selinux: fix misspellings using codespell tool
  selinux: fix misspellings using codespell tool
  selinux: measure state and policy capabilities
  selinux: Allow context mounts for unpriviliged overlayfs
2021-04-27 13:42:11 -07:00
Eryu Guan
c9301cb35b nfs: honor timeo and retrans option when mounting NFSv3
Mounting NFSv3 uses default timeout parameters specified by underlying
sunrpc transport, and mount options like 'timeo' and 'retrans', unlike
NFSv4, are not honored.

But sometimes we want to set non-default timeout value when mounting
NFSv3, so pass 'timeo' and 'retrans' to nfs_mount() and fill the
'timeout' field of struct rpc_create_args before creating RPC
connection. This is also consistent with NFSv4 behavior.

Note that this only sets the timeout value of rpc connection to mountd,
but the timeout of rpcbind connection should be set as well. A later
patch will fix the rpcbind part.

Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-05 09:04:21 -04:00
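
To make the units concrete: the timeo mount option is expressed in tenths of a second, and retrans is a retry count. The sketch below converts the two options into a timeout description. The struct and function names are hypothetical and do not mirror the layout of struct rpc_create_args or the kernel's RPC timeout structure.

```c
/* Hypothetical timeout description for an RPC connection. */
struct rpc_timeout_sketch {
	unsigned long initial_ms;	/* first timeout, in milliseconds */
	unsigned int retries;		/* retransmissions before giving up */
};

/* timeo is given in tenths of a second, as in the mount option. */
static struct rpc_timeout_sketch
timeout_from_mount_opts(unsigned int timeo, unsigned int retrans)
{
	struct rpc_timeout_sketch t;

	t.initial_ms = (unsigned long)timeo * 100UL;
	t.retries = retrans;
	return t;
}
```

For example, timeo=600,retrans=2 would give an initial timeout of 60 seconds with two retries.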
Olga Kornievskaia
ec1ade6a04 nfs: account for selinux security context when deciding to share superblock
Keep track of whether or not there were LSM security context
options passed during mount (ie creation of the superblock).
Then, while deciding if the superblock can be shared for the new
mount, check if the newly passed in LSM security context options
are compatible with the existing superblock's ones by calling
security_sb_mnt_opts_compat().

Previously, with SELinux enabled, NFS was not able to perform the
following two mounts:
mount -o vers=4.2,sec=sys,context=system_u:object_r:root_t:s0 <serverip>:/ /mnt
mount -o vers=4.2,sec=sys,context=system_u:object_r:swapfile_t:s0 <serverip>:/scratch /scratch

The second mount would fail with "mount.nfs: an incorrect mount option
was specified", and /var/log/messages would contain:
"SElinux: mount invalid. Same superblock, different security
settings for.."

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
[PM: tweak subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-03-22 15:01:45 -04:00
Trond Myklebust
fd6d3feed0 NFS: Clean up function nfs_mark_dir_for_revalidate()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-03-08 16:01:02 -05:00
Christian Brauner
549c729771 fs: make helpers idmap mount aware
Extend some inode methods with an additional user namespace argument. A
filesystem that is aware of idmapped mounts will receive the user
namespace the mount has been marked with. This can be used for
additional permission checking and also to enable filesystems to
translate between uids and gids if they need to. We have implemented all
relevant helpers in earlier patches.

As requested we simply extend the existing inode method instead of
introducing new ones. This is a little more code churn but it's mostly
mechanical and doesn't leave us with additional inode methods.

Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:20 +01:00
Trond Myklebust
896567ee7f NFS: nfs_igrab_and_active must first reference the superblock
Before referencing the inode, we must ensure that the superblock can be
referenced. Otherwise, we can end up with iput() calling superblock
operations that are no longer valid or accessible.

Fixes: ea7c38fef0 ("NFSv4: Ensure we reference the inode for return-on-close in delegreturn")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-01-10 16:29:28 -05:00
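
The ordering requirement can be sketched with plain counters. The types and helpers below are hypothetical user-space stand-ins for the superblock and inode references, not the kernel implementation. The sketch takes the superblock reference first and drops it again if the inode cannot be grabbed.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a superblock and an inode. */
struct sb_sketch {
	int active;	/* count of "active" references */
	bool dying;	/* superblock is being torn down */
};

struct inode_sketch {
	struct sb_sketch *sb;
	int refs;
	bool freeing;	/* inode is being evicted */
};

static bool sb_get_active(struct sb_sketch *sb)
{
	if (sb->dying)
		return false;
	sb->active++;
	return true;
}

static void sb_put_active(struct sb_sketch *sb)
{
	sb->active--;
}

static bool inode_grab(struct inode_sketch *inode)
{
	if (inode->freeing)
		return false;
	inode->refs++;
	return true;
}

/* Reference the superblock first, then the inode; undo on failure. */
static bool igrab_and_active(struct inode_sketch *inode)
{
	struct sb_sketch *sb = inode->sb;

	if (!sb_get_active(sb))
		return false;
	if (inode_grab(inode))
		return true;
	sb_put_active(sb);
	return false;
}
```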
Scott Mayhew
c98e9daa59 NFS: Adjust fs_context error logging
Several existing dprintk()/dfprintk() calls were converted to use the new
mount API logging macros by commit ce8866f091 ("NFS: Attach
supplementary error information to fs_context").  If the fs_context was
not created using fsopen() then it will not have had a log buffer
allocated for it, and the new mount API logging macros will wind up
calling printk().

This can result in syslog messages being logged where previously there
were none... most notably "NFS4: Couldn't follow remote path", which can
happen if the client is auto-negotiating a protocol version with an NFS
server that doesn't support the higher v4.x versions.

Convert the nfs_errorf(), nfs_invalf(), and nfs_warnf() macros to check
for the existence of the fs_context's log buffer and call dprintk() if
it doesn't exist.  Add nfs_ferrorf(), nfs_finvalf(), and nfs_fwarnf(),
which do the same thing but take an NFS debug flag as an argument and
call dfprintk().  Finally, modify the "NFS4: Couldn't follow remote
path" message to use nfs_ferrorf().

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207385
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: ce8866f091 ("NFS: Attach supplementary error information to fs_context.")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-01-10 13:32:39 -05:00
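
The logging decision described above can be sketched in user space as follows. Everything here is a hypothetical stand-in: ctx_sketch for the fs_context, log_buf for its log buffer, and the debug_enabled flag for the NFS debug facility. The sketch only shows the branch: use the log buffer when one exists, otherwise emit the message only when debugging is enabled.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical log buffer attached to a mount context. */
struct log_buf {
	char text[256];
};

/* Hypothetical mount context; log is NULL unless the caller set one up. */
struct ctx_sketch {
	struct log_buf *log;
};

static int debug_enabled;		/* stand-in for a debug flag */
static unsigned int debug_lines;	/* messages sent to the debug path */

static void ctx_errorf(struct ctx_sketch *ctx, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	if (ctx->log != NULL) {
		/* A log buffer exists: record the message there. */
		vsnprintf(ctx->log->text, sizeof(ctx->log->text), fmt, ap);
	} else if (debug_enabled) {
		/* No buffer: only emit the message when debugging is on. */
		vfprintf(stderr, fmt, ap);
		fputc('\n', stderr);
		debug_lines++;
	}
	va_end(ap);
}
```

With debugging off and no log buffer, the message is dropped silently, which is the behaviour the commit restores for contexts not created through fsopen().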
Trond Myklebust
1a34c8c9a4 NFS: Support larger readdir buffers
Support readdir buffers of up to 1MB in size so that we can read
large directories using few RPC calls.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
2020-12-02 14:05:52 -05:00
NeilBrown
8d92890bd6 mm/writeback: discard NR_UNSTABLE_NFS, use NR_WRITEBACK instead
After an NFS page has been written it is considered "unstable" until a
COMMIT request succeeds.  If the COMMIT fails, the page will be
re-written.

These "unstable" pages are currently accounted as "reclaimable", either
in WB_RECLAIMABLE, or in NR_UNSTABLE_NFS which is included in a
'reclaimable' count.  This might have made sense when sending the COMMIT
required a separate action by the VFS/MM (e.g.  releasepage() used to
send a COMMIT).  However now that all writes generated by ->writepages()
will automatically be followed by a COMMIT (since commit 919e3bd9a8
("NFS: Ensure we commit after writeback is complete")) it makes more
sense to treat them as writeback pages.

So this patch removes NR_UNSTABLE_NFS and accounts unstable pages in
NR_WRITEBACK and WB_WRITEBACK.

A particular effect of this change is that when
wb_check_background_flush() calls wb_over_bg_threshold(), the latter
will report 'true' a lot less often as the 'unstable' pages are no
longer considered 'dirty' (as there is nothing that writeback can do
about them anyway).

Currently wb_check_background_flush() will trigger writeback to NFS even
when there are relatively few dirty pages (if there are lots of unstable
pages).  This can result in small writes going to the server (tens of
kilobytes rather than a megabyte), which hurts throughput.  With this
patch, there are fewer writes, each of which is larger on average.

Where the NR_UNSTABLE_NFS count was included in statistics
virtual-files, the entry is retained, but the value is hard-coded as
zero.  Static trace points and warning printks which mentioned this
counter no longer report it.

[akpm@linux-foundation.org: re-layout comment]
[akpm@linux-foundation.org: fix printk warning]
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Acked-by: Michal Hocko <mhocko@suse.com>	[mm]
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Link: http://lkml.kernel.org/r/87d06j7gqa.fsf@notabene.neil.brown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:08 -07:00
Trond Myklebust
377840ee48 NFS: Remove the redundant function nfs_pgio_has_mirroring()
We need to trust that desc->pg_mirror_idx is set correctly, whether
or not mirroring is enabled.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-01 13:37:56 -04:00
Trond Myklebust
0aa647b736 NFS: Remove bucket array from struct pnfs_ds_commit_info
Remove the unused bucket array in struct pnfs_ds_commit_info.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:35 -04:00
Trond Myklebust
a9901899b6 pNFS: Add infrastructure for cleaning up per-layout commit structures
Ensure that both the file and flexfiles layout types clean up when
freeing the layout segments.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:34 -04:00
Trond Myklebust
e3b9f7e60b NFS/pNFS: Support commit arrays in nfs_clear_pnfs_ds_commit_verifiers()
Add support for scanning the full list of per-layout segment commit
arrays to nfs_clear_pnfs_ds_commit_verifiers().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:34 -04:00
Trond Myklebust
1f28476dcb NFS: Fix O_DIRECT commit verifier handling
Instead of trying to save the commit verifiers and checking them against
previous writes, adopt the same strategy as for buffered writes, of
just checking the verifiers at commit time.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:34 -04:00
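
Checking verifiers at commit time can be sketched as below. The struct and function names are hypothetical; the only protocol fact relied on is that an NFS write verifier is 8 opaque bytes. Each completed write keeps the verifier it was answered with, and the comparison against the verifier returned by COMMIT happens once, when the commit completes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define VERF_SIZE 8	/* NFS write verifiers are 8 opaque bytes */

/* Hypothetical record of one completed unstable write. */
struct write_req {
	unsigned char verf[VERF_SIZE];	/* verifier from the WRITE reply */
	bool needs_resend;
};

/*
 * Compare every write's verifier with the one returned by COMMIT and
 * mark mismatching writes for resending. Returns the number marked.
 */
static size_t check_verifiers_at_commit(struct write_req *reqs, size_t n,
					const unsigned char *commit_verf)
{
	size_t i, resend = 0;

	for (i = 0; i < n; i++) {
		reqs[i].needs_resend =
			memcmp(reqs[i].verf, commit_verf, VERF_SIZE) != 0;
		if (reqs[i].needs_resend)
			resend++;
	}
	return resend;
}
```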
Trond Myklebust
3c9e502b59 NFS: Add a helper nfs_client_for_each_server()
Add a helper nfs_client_for_each_server() to iterate through all the
filesystems that are attached to a struct nfs_client, and apply
a function to all the active ones.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:30 -04:00
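
An iteration helper of this shape could look like the following sketch. The list layout, the callback signature and the decision to stop at the first non-zero return value are assumptions made for illustration, not a description of the kernel function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-filesystem record attached to a client. */
struct server_sketch {
	bool active;
	struct server_sketch *next;
};

struct client_sketch {
	struct server_sketch *servers;	/* singly linked list */
};

typedef int (*server_fn)(struct server_sketch *server, void *data);

/*
 * Apply fn to every active server of the client. Stops and returns the
 * callback's value as soon as it is non-zero (an assumption).
 */
static int client_for_each_server(struct client_sketch *clp, server_fn fn,
				  void *data)
{
	struct server_sketch *s;
	int ret;

	for (s = clp->servers; s != NULL; s = s->next) {
		if (!s->active)
			continue;
		ret = fn(s, data);
		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Example callback: count the servers that were visited. */
static int count_server(struct server_sketch *server, void *data)
{
	(void)server;
	++*(int *)data;
	return 0;
}
```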
Trond Myklebust
f7b37b8b13 NFS: Add softreval behaviour to nfs_lookup_revalidate()
If the server is unavailable, we want to allow the revalidating
lookup to time out, and to default to validating the cached dentry
if the 'softreval' mount option is set.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-24 16:51:13 -05:00
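
The behaviour described above boils down to one extra branch in the revalidation result. The sketch below is a simplified user-space model: the function name and the return convention (1 = keep the cached dentry, 0 = drop it, negative errno = pass the error up) are assumptions, and the real revalidation path handles more cases.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of a revalidating lookup result.
 * lookup_err is 0 when the server confirmed the entry, or a negative
 * errno. Returns 1 to keep the cached dentry, 0 to invalidate it, or a
 * negative errno to report the failure.
 */
static int revalidate_result(int lookup_err, bool softreval)
{
	if (lookup_err == 0)
		return 1;
	if (lookup_err == -ETIMEDOUT && softreval)
		return 1;	/* server unreachable: trust the cache */
	if (lookup_err == -ENOENT || lookup_err == -ESTALE)
		return 0;	/* entry is gone on the server */
	return lookup_err;
}
```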
Arnd Bergmann
ae08483cdd nfs: use timespec64 in nfs_fattr
Push down the use of timespec64 into NFS nfs_fattr, to avoid needless
conversions, and get closer to having 64-bit time_t support on 32-bit
NFSv4 and removing some old interfaces from the kernel.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:30 -05:00
Scott Mayhew
ce8866f091 NFS: Attach supplementary error information to fs_context.
Split out from commit "NFS: Add fs_context support."

Add wrappers nfs_errorf(), nfs_invalf(), and nfs_warnf() which log error
information to the fs_context.  Convert some printk's to use these new
wrappers instead.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
Scott Mayhew
62a55d088c NFS: Additional refactoring for fs_context conversion
Split out from commit "NFS: Add fs_context support."

This patch adds additional refactoring for the conversion of NFS to use
fs_context, namely:

 (*) Merge nfs_mount_info and nfs_clone_mount into nfs_fs_context.
     nfs_clone_mount has had several fields removed, and nfs_mount_info
     has been removed altogether.
 (*) Various functions now take an fs_context as an argument instead
     of nfs_mount_info, nfs_fs_context, etc.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
David Howells
f2aedb713c NFS: Add fs_context support.
Add filesystem context support to NFS, parsing the options in advance and
attaching the information to struct nfs_fs_context.  The highlights are:

 (*) Merge nfs_mount_info and nfs_clone_mount into nfs_fs_context.  This
     structure represents NFS's superblock config.

 (*) Make use of the VFS's parsing support to split comma-separated lists

 (*) Pin the NFS protocol module in the nfs_fs_context.

 (*) Attach supplementary error information to fs_context.  This has the
     downside that these strings must be static and can't be formatted.

 (*) Remove the auxiliary file_system_type structs since the information
     necessary can be conveyed in the nfs_fs_context struct instead.

 (*) Root mounts are made by duplicating the config for the requested mount
     so as to have the same parameters.  Submounts pick up their parameters
     from the parent superblock.

[AV -- retrans is u32, not string]
[SM -- Renamed cfg to ctx in a few functions in an earlier patch]
[SM -- Moved fs_context mount option parsing to an earlier patch]
[SM -- Moved fs_context error logging to a later patch]
[SM -- Fixed printks in nfs4_try_get_tree() and nfs4_get_referral_tree()]
[SM -- Added is_remount_fc() helper]
[SM -- Deferred some refactoring to a later patch]
[SM -- Fixed referral mounts, which were broken in the original patch]
[SM -- Fixed leak of nfs_fattr when fs_context is freed]

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
David Howells
e558100fda NFS: Do some tidying of the parsing code
Do some tidying of the parsing code, including:

 (*) Returning 0/error rather than true/false.

 (*) Putting the nfs_fs_context pointer first in some arg lists.

 (*) Unwrap some lines that will now fit on one line.

 (*) Provide unioned sockaddr/sockaddr_storage fields to avoid casts.

 (*) nfs_parse_devname() can paste its return values directly into the
     nfs_fs_context struct as that's where the caller puts them.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
David Howells
48be8a66cf NFS: Add a small buffer in nfs_fs_context to avoid string dup
Add a small buffer in nfs_fs_context to avoid string duplication when
parsing numbers.  Also make the parsing function wrapper place the parsed
integer directly in the appropriate nfs_fs_context struct member.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
David Howells
f8ee01e3e2 NFS: Split nfs_parse_mount_options()
Split nfs_parse_mount_options() to move the prologue, list-splitting and
epilogue into one function and the per-option processing into another.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
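
The split described above, one function for prologue, list splitting and epilogue and another for a single option, might be sketched in user space like this. The names, the two supported options and the error codes are illustrative assumptions and do not reproduce the kernel parser.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed mount configuration. */
struct mount_ctx_sketch {
	unsigned int timeo;
	unsigned int retrans;
};

/* Per-option processing: handles exactly one "key=value" string. */
static int parse_one_option(struct mount_ctx_sketch *ctx, char *opt)
{
	char *val = strchr(opt, '=');

	if (val == NULL)
		return -EINVAL;
	*val++ = '\0';
	if (strcmp(opt, "timeo") == 0)
		ctx->timeo = (unsigned int)strtoul(val, NULL, 10);
	else if (strcmp(opt, "retrans") == 0)
		ctx->retrans = (unsigned int)strtoul(val, NULL, 10);
	else
		return -EINVAL;
	return 0;
}

/* Prologue, list splitting and epilogue live in the outer function. */
static int parse_options(struct mount_ctx_sketch *ctx, const char *raw)
{
	char *copy, *opt, *next;
	int ret = 0;

	/* Prologue: work on a private, writable copy of the string. */
	copy = malloc(strlen(raw) + 1);
	if (copy == NULL)
		return -ENOMEM;
	strcpy(copy, raw);

	/* List splitting: hand each comma-separated entry to the helper. */
	for (opt = copy; opt != NULL; opt = next) {
		next = strchr(opt, ',');
		if (next != NULL)
			*next++ = '\0';
		if (*opt == '\0')
			continue;	/* skip empty entries */
		ret = parse_one_option(ctx, opt);
		if (ret != 0)
			break;
	}

	/* Epilogue: release the copy and report the result. */
	free(copy);
	return ret;
}
```

Keeping the per-option function separate means it can later be called with one option at a time, without going through the list-splitting loop.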
David Howells
5eb005caf5 NFS: Rename struct nfs_parsed_mount_data to struct nfs_fs_context
Rename struct nfs_parsed_mount_data to struct nfs_fs_context and rename
pointers to it to "ctx".  At some point this will be pointed to by an
fs_context struct's fs_private pointer.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
David Howells
9954bf92c0 NFS: Move mount parameterisation bits into their own file
Split various bits relating to mount parameterisation out from
fs/nfs/super.c into their own file to form the basis of filesystem context
handling for NFS.

No other changes are made to the code beyond removing 'static' qualifiers.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
Al Viro
adf2314fe6 nfs: get rid of ->set_security()
it's always either nfs_set_sb_security() or nfs_clone_sb_security(),
the choice being controlled by mount_info->cloned != NULL.  No need
to add methods, especially when both instances live right next to
the caller and are never accessed anywhere else.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
ab88dca311 nfs: get rid of mount_info ->fill_super()
The only possible values are nfs_fill_super and nfs_clone_super.  The
latter is used only when crossing into a submount and it is almost
identical to the former; the only differences are
	* ->s_time_gran unconditionally set to 1 (even for v2 mounts).
Regression dating back to 2012, actually.
	* ->s_blocksize/->s_blocksize_bits set to that of parent.

Rather than messing with the method, stash ->s_blocksize_bits in
mount_info in submount case and after the (now unconditional)
call of nfs_fill_super() override ->s_blocksize/->s_blocksize_bits
if that has been set.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
0c38f2131d nfs: don't pass nfs_subversion to ->create_server()
pick it from mount_info

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
1bc3a2cbf2 nfs: unexport nfs_fs_mount_common()
Make it static, even.  And remove a stale extern of (long-gone)
nfs_xdev_mount_common() from internal.h, while we are at it.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
82eaed2bee nfs: merge xdev and remote file_system_type
they are identical now...

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
a55d3297be nfs: don't bother passing nfs_subversion to ->try_mount() and nfs_fs_mount_common()
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
6a3f7a399e nfs: stash nfs_subversion reference into nfs_mount_info
That will allow us to get rid of passing those references around in
quite a few places.  Moreover, that will allow us to merge xdev and
remote file_system_type.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
250d69f6a4 nfs: lift setting mount_info from nfs_xdev_mount()
Do it in nfs_do_submount() instead.  As a side benefit, nfs_clone_data
doesn't need ->fh and ->fattr anymore.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
d0b779d47c nfs: stash server into struct nfs_mount_info
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Trond Myklebust
e86d5a0287 NFS: Convert struct nfs_fattr to use struct timespec64
NFSv4 supports 64-bit times, so we should switch to using struct
timespec64 when decoding attributes.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:44 -05:00
Linus Torvalds
972a2bf7df NFS Client Updates for Linux 5.3
Stable bugfixes:
 - Dequeue the request from the receive queue while we're re-encoding # v4.20+
 - Fix buffer handling of GSS MIC without slack # 5.1
 
 Features:
 - Increase xprtrdma maximum transport header and slot table sizes
 - Add support for nfs4_call_sync() calls using a custom rpc_task_struct
 - Optimize the default readahead size
 - Enable pNFS filelayout LAYOUTGET on OPEN
 
 Other bugfixes and cleanups:
 - Fix possible null-pointer dereferences and memory leaks
 - Various NFS over RDMA cleanups
 - Various NFS over RDMA comment updates
 - Don't receive TCP data into a reset request buffer
 - Don't try to parse incomplete RPC messages
 - Fix congestion window race with disconnect
 - Clean up pNFS return-on-close error handling
 - Fixes for NFS4ERR_OLD_STATEID handling

Merge tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "Stable bugfixes:
   - Dequeue the request from the receive queue while we're re-encoding
     # v4.20+
   - Fix buffer handling of GSS MIC without slack # 5.1

  Features:
   - Increase xprtrdma maximum transport header and slot table sizes
   - Add support for nfs4_call_sync() calls using a custom
     rpc_task_struct
   - Optimize the default readahead size
   - Enable pNFS filelayout LAYOUTGET on OPEN

  Other bugfixes and cleanups:
   - Fix possible null-pointer dereferences and memory leaks
   - Various NFS over RDMA cleanups
   - Various NFS over RDMA comment updates
   - Don't receive TCP data into a reset request buffer
   - Don't try to parse incomplete RPC messages
   - Fix congestion window race with disconnect
   - Clean up pNFS return-on-close error handling
   - Fixes for NFS4ERR_OLD_STATEID handling"

* tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (53 commits)
  pNFS/filelayout: enable LAYOUTGET on OPEN
  NFS: Optimise the default readahead size
  NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
  NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
  NFSv4: Fix OPEN_DOWNGRADE error handling
  pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid
  NFSv4: Add a helper to increment stateid seqids
  NFSv4: Handle RPC level errors in LAYOUTRETURN
  NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
  NFSv4: Clean up pNFS return-on-close error handling
  pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
  NFS: remove unused check for negative dentry
  NFSv3: use nfs_add_or_obtain() to create and reference inodes
  NFS: Refactor nfs_instantiate() for dentry referencing callers
  SUNRPC: Fix congestion window race with disconnect
  SUNRPC: Don't try to parse incomplete RPC messages
  SUNRPC: Rename xdr_buf_read_netobj to xdr_buf_read_mic
  SUNRPC: Fix buffer handling of GSS MIC without slack
  SUNRPC: RPC level errors should always set task->tk_rpc_status
  SUNRPC: Don't receive TCP data into a request buffer that has been reset
  ...
2019-09-26 12:20:14 -07:00
Trond Myklebust
c128e57551 NFS: Optimise the default readahead size
In the years since the max readahead size was fixed in NFS, a number of
things have happened:
- Users can now set the value directly using /sys/class/bdi
- NFS max supported block sizes have increased by more than an order
  of magnitude, from 64K to 1MB.
- Disk access latencies are orders of magnitude faster due to SSD + NVME.

In particular, note that if the server advertises 1MB as the optimal
read size, that will set the readahead size to 15MB.
Let's therefore adjust down, and try to default to VM_READAHEAD_PAGES.
However let's inform the VM about our preferred block size so that it
can choose to round up in cases where that makes sense.

Reported-by: Alkis Georgopoulos <alkisg@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-09-24 15:58:20 -04:00
Trond Myklebust
8f54c7a4ba NFS: Fix spurious EIO read errors
If the client attempts to read a page, but the read fails due to some
spurious error (e.g. an ACCESS error or a timeout, ...) then we need
to allow other processes to retry.
Also try to report errors correctly when doing a synchronous readpage.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-26 15:31:29 -04:00
Max Kellermann
db531db951 Revert "NFS: readdirplus optimization by cache mechanism" (memleak)
This reverts commit be4c2d4723.

That commit caused a severe memory leak in nfs_readdir_make_qstr().

When listing a directory with more than 100 files (this is how many
struct nfs_cache_array_entry elements fit in one 4kB page), all
allocated file name strings past those 100 leak.

The root of the leakage is that those string pointers are managed in
pages which are never linked into the page cache.

fs/nfs/dir.c puts pages into the page cache by calling
read_cache_page(); the callback function nfs_readdir_filler() will
then fill the given page struct which was passed to it, which is
already linked in the page cache (by do_read_cache_page() calling
add_to_page_cache_lru()).

Commit be4c2d4723 added another (local) array of allocated pages, to
be filled with more data, instead of discarding excess items received
from the NFS server.  Those additional pages can be used by the next
nfs_readdir_filler() call (from within the same nfs_readdir() call).

The leak happens when some of those additional pages are never used
(copied to the page cache using copy_highpage()).  The pages will be
freed by nfs_readdir_free_pages(), but their contents will not.  The
commit did not invoke nfs_readdir_clear_array() (and doing so would
have been dangerous, because it did not track which of those pages
were already copied to the page cache, risking double free bugs).

How to reproduce the leak:

- Use a kernel with CONFIG_SLUB_DEBUG_ON.

- Create a directory on an NFS mount with more than 100 files with
  names long enough to use the "kmalloc-32" slab (so we can easily
  look up the allocation counts):

  for i in `seq 110`; do touch ${i}_0123456789abcdef; done

- Drop all caches:

  echo 3 >/proc/sys/vm/drop_caches

- Check the allocation counter:

  grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
  30564391 nfs_readdir_add_to_array+0x73/0xd0 age=534558/4791307/6540952 pid=370-1048386 cpus=0-47 nodes=0-1

- Request a directory listing and check the allocation counters again:

  ls
  [...]
  grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
  30564511 nfs_readdir_add_to_array+0x73/0xd0 age=207/4792999/6542663 pid=370-1048386 cpus=0-47 nodes=0-1

There are now 120 new allocations.

- Drop all caches and check the counters again:

  echo 3 >/proc/sys/vm/drop_caches
  grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
  30564401 nfs_readdir_add_to_array+0x73/0xd0 age=735/4793524/6543176 pid=370-1048386 cpus=0-47 nodes=0-1

110 allocations are gone, but 10 have leaked and will never be freed.

Unhelpfully, those allocations are explicitly excluded from KMEMLEAK,
which is why my initial attempts with KMEMLEAK were not successful:

	/*
	 * Avoid a kmemleak false positive. The pointer to the name is stored
	 * in a page cache page which kmemleak does not scan.
	 */
	kmemleak_not_leak(string->name);

It would be possible to solve this bug without reverting the whole
commit:

- keep track of which pages were not used, and call
  nfs_readdir_clear_array() on them, or
- manually link those pages into the page cache

But for now I have decided to just revert the commit, because the real
fix would require complex considerations, risking more dangerous
(crash) bugs, which may seem unsuitable for the stable branches.

Signed-off-by: Max Kellermann <mk@cm4all.com>
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-12 16:01:37 -04:00