Commit Graph

6123 Commits

Author SHA1 Message Date
Olga Kornievskaia
7e134205f6 NFSv4 introduce max_connect mount options
This option will control up to how many xprts can the client
establish to the server with a distinct address (that means
nconnect connections are not counted towards this new limit).
This patch is setting up nfs structures to keeep track of the
max_connect limit (does not enforce it).

The default value is kept at 1 so that no current mounts that
don't want any additional connections would be effected. The
maximum value is set at 16.

Mounts to DS are not limited to default value of 1 but instead
set to the maximum default value of 16 (NFS_MAX_TRANSPORTS).

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-27 16:37:17 -04:00
Ye Bin
79d534f8cb NFSv3: Delete duplicate judgement in nfs3_async_handle_jukebox
As eb96d5c97b ("SUNRPC handle EKEYEXPIRED in call_refreshresult")
commit handle EKEYEXPIRED in call_refreshresult, so there is only handle
when "task->tk_status" is equal "-EJUKEBOX" in nfs3_async_handle_jukebox.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-27 16:36:21 -04:00
J. Bruce Fields
bb0a55bb71 nfs: don't allow reexport reclaims
In the reexport case, nfsd is currently passing along locks with the
reclaim bit set.  The client sends a new lock request, which is granted
if there's currently no conflict--even if it's possible a conflicting
lock could have been briefly held in the interim.

We don't currently have any way to safely grant reclaim, so for now
let's just deny them all.

I'm doing this by passing the reclaim bit to nfs and letting it fail the
call, with the idea that eventually the client might be able to do
something more forgiving here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-26 15:32:28 -04:00
J. Bruce Fields
f657f8eef3 nfs: don't atempt blocking locks on nfs reexports
NFS implements blocking locks by blocking inside its lock method.  In
the reexport case, this blocks the nfs server thread, which could lead
to deadlocks since an nfs server thread might be required to unlock the
conflicting lock.  It also causes a crash, since the nfs server thread
assumes it can free the lock when its lm_notify lock callback is called.

Ideal would be to make the nfs lock method return without blocking in
this case, but for now it works just not to attempt blocking locks.  The
difference is just that the original client will have to poll (as it
does in the v4.0 case) instead of getting a callback when the lock's
available.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-26 15:32:10 -04:00
Jeff Layton
f7e33bdbd6 fs: remove mandatory file locking support
We added CONFIG_MANDATORY_FILE_LOCKING in 2015, and soon after turned it
off in Fedora and RHEL8. Several other distros have followed suit.

I've heard of one problem in all that time: Someone migrated from an
older distro that supported "-o mand" to one that didn't, and the host
had a fstab entry with "mand" in it which broke on reboot. They didn't
actually _use_ mandatory locking so they just removed the mount option
and moved on.

This patch rips out mandatory locking support wholesale from the kernel,
along with the Kconfig option and the Documentation file. It also
changes the mount code to ignore the "mand" mount option instead of
erroring out, and to throw a big, ugly warning.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
2021-08-23 06:15:36 -04:00
Miklos Szeredi
0cad624662 vfs: add rcu argument to ->get_acl() callback
Add a rcu argument to the ->get_acl() callback to allow
get_cached_acl_rcu() to call the ->get_acl() method in the next patch.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-18 22:08:24 +02:00
Dai Ngo
ca7d1d1a0b NFSv4.2: remove restriction of copy size for inter-server copy.
Currently inter-server copy is allowed only if the copy size is larger
than (rsize*14) which is the over-head of the mount operation of the
source export. This patch, relying on the delayed unmount feature,
removes this restriction since the mount and unmount overhead is now
not applicable for every inter-server copy.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-10 14:18:35 -04:00
Chuck Lever
9eff97abef NFS: Clean up the synopsis of callback process_op()
The xdr_stream and rq_arg and rq_res are already accessible via the
@rqstp parameter.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-10 14:18:35 -04:00
Chuck Lever
89ef17b663 NFS: Extract the xdr_init_encode/decode() calls from decode_compound
Clean up: Move the xdr_init_encode() and xdr_init_decode() calls
into the dispatcher, just like the NFSD and lockd dispatchers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-10 14:18:35 -04:00
Chuck Lever
c35a810ce5 NFS: Remove unused callback void decoder
Clean up: The callback RPC dispatcher no longer invokes these call
outs, although svc_process_common() relies on seeing a .pc_encode
function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-10 14:18:35 -04:00
Chuck Lever
7d34c96217 NFS: Add a private local dispatcher for NFSv4 callback operations
The client's NFSv4 callback service is the only remaining user of
svc_generic_dispatch().

Note that the NFSv4 callback service doesn't use the .pc_encode and
.pc_decode callouts in any substantial way, so they are removed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-10 14:18:35 -04:00
Chuck Lever
9082e1d914 SUNRPC: Eliminate the RQ_AUTHERR flag
Now that there is an alternate method for returning an auth_stat
value, replace the RQ_AUTHERR flag with use of that new method.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-10 14:18:35 -04:00
Chuck Lever
5c2465dfd4 SUNRPC: Set rq_auth_stat in the pg_authenticate() callout
In a few moments, rq_auth_stat will need to be explicitly set to
rpc_auth_ok before execution gets to the dispatcher.

svc_authenticate() already sets it, but it often gets reset to
rpc_autherr_badcred right after that call, even when authentication
is successful. Let's ensure that the pg_authenticate callout and
svc_set_client() set it properly in every case.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-10 14:18:35 -04:00
Trond Myklebust
d6236a98b3 NFSv4/pnfs: The layout barrier indicate a minimal value for the seqid
The intention of the layout barrier is to ensure that we do not update
the layout to match an older value than the current expectation. Fix the
test in pnfs_layout_stateid_blocked() to reflect that it is legal for
the seqid of the stateid to match that of the barrier.

Fixes: aa95edf309 ("NFSv4/pnfs: Fix the layout barrier update")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-09 16:57:04 -04:00
Trond Myklebust
45baadaad7 NFSv4/pNFS: Always allow update of a zero valued layout barrier
A zero value for the layout barrier indicates that it has been cleared
(since seqid '0' is an illegal value), so we should always allow it to
be updated.

Fixes: d29b468da4 ("pNFS/NFSv4: Improve rejection of out-of-order layouts")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-09 16:57:04 -04:00
Trond Myklebust
7c0bbf2d3d NFSv4/pNFS: Remove dead code
Since commit 2b28a7bee4 ("fs, nfs: convert
pnfs_layout_hdr.plh_refcount from atomic_t to refcount_t") it has not
been legal to bump a zero refcount, so the code that tries to allow it
if the NFS_LSEG_VALID flag is still set would cause trouble. Luckily,
NFS_LSEG_VALID has its own refcount so we can never hit this bad code
snippet in practice. Remove it to avoid confusion.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-09 16:57:04 -04:00
Trond Myklebust
e20772cbdf NFSv4/pNFS: Fix a layoutget livelock loop
If NFS_LAYOUT_RETURN_REQUESTED is set, but there is no value set for
the layout plh_return_seq, we can end up in a livelock loop in which
every layout segment retrieved by a new call to layoutget is immediately
invalidated by pnfs_layout_need_return().
To get around this, we should just set plh_return_seq to the current
value of the layout stateid's seqid.

Fixes: d474f96104 ("NFS: Don't return layout segments that are in use")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-09 16:57:04 -04:00
Linus Torvalds
96890bc2ea NFS client updates for Linux 5.14
Highlights include:
 
 Stable fixes:
 - Two sunrpc fixes for deadlocks involving privileged rpc_wait_queues
 
 Bugfixes
 - SUNRPC: Avoid a KASAN slab-out-of-bounds bug in xdr_set_page_base()
 - SUNRPC: prevent port reuse on transports which don't request it.
 - NFSv3: Fix memory leak in posix_acl_create()
 - NFS: Various fixes to attribute revalidation timeouts
 - NFSv4: Fix handling of non-atomic change attribute updates
 - NFSv4: If a server is down, don't cause mounts to other servers to
   hang as well
 - pNFS: Fix an Oops in pnfs_mark_request_commit() when doing O_DIRECT
 - NFS: Fix mount failures due to incorrect setting of the has_sec_mnt_opts
   filesystem flag
  - NFS: Ensure nfs_readpage returns promptly when an internal error occurs
  - NFS: Fix fscache read from NFS after cache error
  - pNFS: Various bugfixes around the LAYOUTGET operation
 
 Features
 - Multiple patches to add support for fcntl() leases over NFSv4.
 - A sysfs interface to display more information about the various
   transport connections used by the RPC client
 - A sysfs interface to allow a suitably privileged user to offline a
   transport that may no longer point to a valid server
 - A sysfs interface to allow a suitably privileged user to change the
   server IP address used by the RPC client
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmDnPrkACgkQZwvnipYK
 APKGAw/9EjdGoic6VpShQyb5uxaRoDd4uwWgFLBOfPhIWC7qNMtkj49wHIOEUm7e
 YfdF5RlCdmshaoMxjY84wjl8NTMwHbPahgooDd4+UsZUs2qxZ8dBsr0itfbFsJv8
 BpaCYKQt6XGQngGrWfC7SiCETnMej2YsmjDfHvhD58TxnRfPWexHUvx9xi9uGRCS
 sIWRA2QMNs7LwdShkkRotagodRLhu/zo4g0lon5lI8D/SRg6o8RoO4YP6oKH1FN4
 OyVzy1aWZGocgwCMUtNeuigJSRyDa+bJTfJ2c27uw5g18s0XWZ3j2DxD5I+HCEuE
 B4rhg+ujtPIifYLHf2Aj3nlxdBePZ5L67a2MOOUo+wSD+nPmNMZF1eIT/3Jsg/HA
 Z8gqcBiTIkBfVGJxWWbrbHfxPXQiK1IRGQx9acyhLCN9M6Kv5bbkn4R4dnronvJR
 g6O968fgC5uvl60CXdc8NCpWtSitXB/nH8pn7MbJ8JBGq7QIYNkS0d4E8ePhYwxk
 sRYJt21O+ryjodfQDHaUxodzCKGcpRoknpirMmgoAp4zdkva4ltViNsQvHa7jFh8
 HIuhU6Aia1xVYpUMDEXf2WMXCT9yLa2TyMDuS5KDfb69wBkQJWeKNkebf+1k03wQ
 saEmdoP4aEEujimkA7rqyOlI8XhsudKvBd3HXg+w9+xIt4yoie0=
 =NaOI
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Features:

   - Multiple patches to add support for fcntl() leases over NFSv4.

   - A sysfs interface to display more information about the various
     transport connections used by the RPC client

   - A sysfs interface to allow a suitably privileged user to offline a
     transport that may no longer point to a valid server

   - A sysfs interface to allow a suitably privileged user to change the
     server IP address used by the RPC client

  Stable fixes:

   - Two sunrpc fixes for deadlocks involving privileged rpc_wait_queues

  Bugfixes:

   - SUNRPC: Avoid a KASAN slab-out-of-bounds bug in xdr_set_page_base()

   - SUNRPC: prevent port reuse on transports which don't request it.

   - NFSv3: Fix memory leak in posix_acl_create()

   - NFS: Various fixes to attribute revalidation timeouts

   - NFSv4: Fix handling of non-atomic change attribute updates

   - NFSv4: If a server is down, don't cause mounts to other servers to
     hang as well

   - pNFS: Fix an Oops in pnfs_mark_request_commit() when doing O_DIRECT

   - NFS: Fix mount failures due to incorrect setting of the
     has_sec_mnt_opts filesystem flag

   - NFS: Ensure nfs_readpage returns promptly when an internal error
     occurs

   - NFS: Fix fscache read from NFS after cache error

   - pNFS: Various bugfixes around the LAYOUTGET operation"

* tag 'nfs-for-5.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (46 commits)
  NFSv4/pNFS: Return an error if _nfs4_pnfs_v3_ds_connect can't load NFSv3
  NFSv4/pNFS: Don't call _nfs4_pnfs_v3_ds_connect multiple times
  NFSv4/pnfs: Clean up layout get on open
  NFSv4/pnfs: Fix layoutget behaviour after invalidation
  NFSv4/pnfs: Fix the layout barrier update
  NFS: Fix fscache read from NFS after cache error
  NFS: Ensure nfs_readpage returns promptly when internal error occurs
  sunrpc: remove an offlined xprt using sysfs
  sunrpc: provide showing transport's state info in the sysfs directory
  sunrpc: display xprt's queuelen of assigned tasks via sysfs
  sunrpc: provide multipath info in the sysfs directory
  NFSv4.1 identify and mark RPC tasks that can move between transports
  sunrpc: provide transport info in the sysfs directory
  SUNRPC: take a xprt offline using sysfs
  sunrpc: add dst_attr attributes to the sysfs xprt directory
  SUNRPC for TCP display xprt's source port in sysfs xprt_info
  SUNRPC query transport's source port
  SUNRPC display xprt's main value in sysfs's xprt_info
  SUNRPC mark the first transport
  sunrpc: add add sysfs directory per xprt under each xprt_switch
  ...
2021-07-09 09:43:57 -07:00
Trond Myklebust
878b3dfc42 Merge part 2 of branch 'sysfs-devel'
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:26 -04:00
Trond Myklebust
dd5c153ed7 NFSv4/pNFS: Return an error if _nfs4_pnfs_v3_ds_connect can't load NFSv3
Currently we fail to return an error if the NFSv3 module failed to load
when we're trying to connect to a pNFS data server.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:26 -04:00
Trond Myklebust
f46f84931a NFSv4/pNFS: Don't call _nfs4_pnfs_v3_ds_connect multiple times
After we grab the lock in nfs4_pnfs_ds_connect(), there is no check for
whether or not ds->ds_clp has already been initialised, so we can end up
adding the same transports multiple times.

Fixes: fc821d5920 ("pnfs/NFSv4.1: Add multipath capabilities to pNFS flexfiles servers over NFSv3")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:26 -04:00
Trond Myklebust
b4e89bcba2 NFSv4/pnfs: Clean up layout get on open
Cache the layout in the arguments so we don't have to keep looking it up
from the inode.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:26 -04:00
Trond Myklebust
0b77f97a7e NFSv4/pnfs: Fix layoutget behaviour after invalidation
If the layout gets invalidated, we should wait for any outstanding
layoutget requests for that layout to complete, and we should resend
them only after re-establishing the layout stateid.

Fixes: d29b468da4 ("pNFS/NFSv4: Improve rejection of out-of-order layouts")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:26 -04:00
Trond Myklebust
aa95edf309 NFSv4/pnfs: Fix the layout barrier update
If we have multiple outstanding layoutget requests, the current code to
update the layout barrier assumes that the outstanding layout stateids
are updated in order. That's not necessarily the case.

Instead of using the value of lo->plh_outstanding as a guesstimate for
the window of values we need to accept, just wait to update the window
until we're processing the last one. The intention here is just to
ensure that we don't process 2^31 seqid updates without also updating
the barrier.

Fixes: 1bcf34fdac ("pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:26 -04:00
Dave Wysochanski
ba512c1bc3 NFS: Fix fscache read from NFS after cache error
Earlier commits refactored some NFS read code and removed
nfs_readpage_async(), but neglected to properly fixup
nfs_readpage_from_fscache_complete().  The code path is
only hit when something unusual occurs with the cachefiles
backing filesystem, such as an IO error or while a cookie
is being invalidated.

Mark page with PG_checked if fscache IO completes in error,
unlock the page, and let the VM decide to re-issue based on
PG_uptodate.  When the VM reissues the readpage, PG_checked
allows us to skip over fscache and read from the server.

Link: https://marc.info/?l=linux-nfs&m=162498209518739
Fixes: 1e83b173b2 ("NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async()")
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:26 -04:00
Dave Wysochanski
e0340f16a0 NFS: Ensure nfs_readpage returns promptly when internal error occurs
A previous refactoring of nfs_readpage() might end up calling
wait_on_page_locked_killable() even if readpage_async_filler() failed
with an internal error and pg_error was non-zero (for example, if
nfs_create_request() failed).  In the case of an internal error,
skip over wait_on_page_locked_killable() as this is only needed
when the read is sent and an error occurs during completion handling.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:26 -04:00
Olga Kornievskaia
85e39feead NFSv4.1 identify and mark RPC tasks that can move between transports
In preparation for when we can re-try a task on a different transport,
identify and mark such RPC tasks as moveable. Only 4.1+ operarations can
be re-tried on a different transport.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08 14:03:24 -04:00
Linus Torvalds
58ec9059b3 Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs name lookup updates from Al Viro:
 "Small namei.c patch series, mostly to simplify the rules for nameidata
  state. It's actually from the previous cycle - but I didn't post it
  for review in time...

  Changes visible outside of fs/namei.c: file_open_root() calling
  conventions change, some freed bits in LOOKUP_... space"

* 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  namei: make sure nd->depth is always valid
  teach set_nameidata() to handle setting the root as well
  take LOOKUP_{ROOT,ROOT_GRABBED,JUMPED} out of LOOKUP_... space
  switch file_open_root() to struct path
2021-07-03 11:41:14 -07:00
Linus Torvalds
757fa80f4e Tracing updates for 5.14:
- Added option for per CPU threads to the hwlat tracer
 
  - Have hwlat tracer handle hotplug CPUs
 
  - New tracer: osnoise, that detects latency caused by interrupts, softirqs
    and scheduling of other tasks.
 
  - Added timerlat tracer that creates a thread and measures in detail what
    sources of latency it has for wake ups.
 
  - Removed the "success" field of the sched_wakeup trace event.
    This has been hardcoded as "1" since 2015, no tooling should be looking
    at it now. If one exists, we can revert this commit, fix that tool and
    try to remove it again in the future.
 
  - tgid mapping fixed to handle more than PID_MAX_DEFAULT pids/tgids.
 
  - New boot command line option "tp_printk_stop", as tp_printk causes trace
    events to write to console. When user space starts, this can easily live
    lock the system. Having a boot option to stop just after boot up is
    useful to prevent that from happening.
 
  - Have ftrace_dump_on_oops boot command line option take numbers that match
    the numbers shown in /proc/sys/kernel/ftrace_dump_on_oops.
 
  - Bootconfig clean ups, fixes and enhancements.
 
  - New ktest script that tests bootconfig options.
 
  - Add tracepoint_probe_register_may_exist() to register a tracepoint
    without triggering a WARN*() if it already exists. BPF has a path from
    user space that can do this. All other paths are considered a bug.
 
  - Small clean ups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYN8YPhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qhxLAP9Mo5hHv7Hg6W7Ddv77rThm+qclsMR/
 yW0P+eJpMm4+xAD8Cq03oE1DimPK+9WZBKU5rSqAkqG6CjgDRw6NlIszzQQ=
 =WEPR
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - Added option for per CPU threads to the hwlat tracer

 - Have hwlat tracer handle hotplug CPUs

 - New tracer: osnoise, that detects latency caused by interrupts,
   softirqs and scheduling of other tasks.

 - Added timerlat tracer that creates a thread and measures in detail
   what sources of latency it has for wake ups.

 - Removed the "success" field of the sched_wakeup trace event. This has
   been hardcoded as "1" since 2015, no tooling should be looking at it
   now. If one exists, we can revert this commit, fix that tool and try
   to remove it again in the future.

 - tgid mapping fixed to handle more than PID_MAX_DEFAULT pids/tgids.

 - New boot command line option "tp_printk_stop", as tp_printk causes
   trace events to write to console. When user space starts, this can
   easily live lock the system. Having a boot option to stop just after
   boot up is useful to prevent that from happening.

 - Have ftrace_dump_on_oops boot command line option take numbers that
   match the numbers shown in /proc/sys/kernel/ftrace_dump_on_oops.

 - Bootconfig clean ups, fixes and enhancements.

 - New ktest script that tests bootconfig options.

 - Add tracepoint_probe_register_may_exist() to register a tracepoint
   without triggering a WARN*() if it already exists. BPF has a path
   from user space that can do this. All other paths are considered a
   bug.

 - Small clean ups and fixes

* tag 'trace-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (49 commits)
  tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT
  tracing: Simplify & fix saved_tgids logic
  treewide: Add missing semicolons to __assign_str uses
  tracing: Change variable type as bool for clean-up
  trace/timerlat: Fix indentation on timerlat_main()
  trace/osnoise: Make 'noise' variable s64 in run_osnoise()
  tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing
  tracing: Fix spelling in osnoise tracer "interferences" -> "interference"
  Documentation: Fix a typo on trace/osnoise-tracer
  trace/osnoise: Fix return value on osnoise_init_hotplug_support
  trace/osnoise: Make interval u64 on osnoise_main
  trace/osnoise: Fix 'no previous prototype' warnings
  tracing: Have osnoise_main() add a quiescent state for task rcu
  seq_buf: Make trace_seq_putmem_hex() support data longer than 8
  seq_buf: Fix overflow in seq_buf_putmem_hex()
  trace/osnoise: Support hotplug operations
  trace/hwlat: Support hotplug operations
  trace/hwlat: Protect kdata->kthread with get/put_online_cpus
  trace: Add timerlat tracer
  trace: Add osnoise tracer
  ...
2021-07-03 11:13:22 -07:00
Joe Perches
78c14b385c treewide: Add missing semicolons to __assign_str uses
The __assign_str macro has an unusual ending semicolon but the vast
majority of uses of the macro already have semicolon termination.

$ git grep -P '\b__assign_str\b' | wc -l
551
$ git grep -P '\b__assign_str\b.*;' | wc -l
480

Add semicolons to the __assign_str() uses without semicolon termination
and all the other uses without semicolon termination via additional defines
that are equivalent to __assign_str() with the eventual goal of removing
the semicolon from the __assign_str() macro definition.

Link: https://lore.kernel.org/lkml/1e068d21106bb6db05b735b4916bb420e6c9842a.camel@perches.com/
Link: https://lkml.kernel.org/r/48a056adabd8f70444475352f617914cef504a45.camel@perches.com

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-30 09:19:14 -04:00
Trond Myklebust
e9e8ee40b3 Merge branch 'leases-devel'
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-29 13:13:34 -04:00
Trond Myklebust
df2c7b951f NFSv4: setlease should return EAGAIN if locks are not available
Instead of returning ENOLCK when we can't hand out a lease, we should be
returning EAGAIN.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-29 13:13:31 -04:00
Trond Myklebust
e97bc66377 NFS: nfs_find_open_context() may only select open files
If a file has already been closed, then it should not be selected to
support further I/O.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
[Trond: Fix an invalid pointer deref reported by Colin Ian King]
2021-06-29 13:12:39 -04:00
Dave Wysochanski
b42ad64f5f NFS: Remove unnecessary inode parameter from nfs_pageio_complete_read()
Simplify nfs_pageio_complete_read() by using the inode pointer saved
inside nfs_pageio_descriptor.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-28 09:34:39 -04:00
Scott Mayhew
eae00c5d6e nfs: update has_sec_mnt_opts after cloning lsm options from parent
After calling security_sb_clone_mnt_opts() in nfs_get_root(), it's
necessary to copy the value of has_sec_mnt_opts from the cloned
super_block's nfs_server.  Otherwise, calls to nfs_compare_super()
using this super_block may not return the correct result, leading to
mount failures.

For example, mounting an nfs server with the following in /etc/exports:
/export *(rw,insecure,crossmnt,no_root_squash,security_label)
and having /export/scratch on a separate block device.

mount -o v4.2,context=system_u:object_r:root_t:s0 server:/export/test /mnt/test
mount -o v4.2,context=system_u:object_r:swapfile_t:s0 server:/export/scratch /mnt/scratch

The second mount would fail with "mount.nfs: /mnt/scratch is busy or
already mounted or sharecache fail" and "SELinux: mount invalid.  Same
superblock, different security settings for..." would appear in the
syslog.

Also while we're in there, replace several instances of "NFS_SB(s)"
with "server", which was already declared at the top of the
nfs_get_root().

Fixes: ec1ade6a04 ("nfs: account for selinux security context when deciding to share superblock")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-28 09:34:39 -04:00
Trond Myklebust
a9601ac5e9 NFS: Avoid duplicate resets of attribute cache timeouts
We know that the attributes changed on the server if and only if the
change attribute is different. Otherwise, we're just refreshing our
cache with values that were already known to be stale.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-26 12:13:40 -04:00
Trond Myklebust
20cf7d4ea4 NFSv4: Fix handling of non-atomic change attrbute updates
If the change attribute update is declared to be non-atomic by the
server, or our cached value does not match the server's value before the
operation was performed, then we should declare the inode cache invalid.

On the other hand, if the change to the directory raced with a lookup or
getattr which already updated the change attribute, then optimise away
the revalidation.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-26 12:13:39 -04:00
Trond Myklebust
213bb58475 NFS: Fix up inode attribute revalidation timeouts
The inode is considered revalidated when we've checked the value of the
change attribute against our cached value since that suffices to
establish whether or not the other cached values are valid.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-26 12:13:39 -04:00
Gao Xiang
1fcb6fcd74 nfs: fix acl memory leak of posix_acl_create()
When looking into another nfs xfstests report, I found acl and
default_acl in nfs3_proc_create() and nfs3_proc_mknod() error
paths are possibly leaked. Fix them in advance.

Fixes: 013cdf1088 ("nfs: use generic posix ACL infrastructure for v3 Posix ACLs")
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-21 12:06:16 -04:00
Trond Myklebust
3731d44bba NFSv4: Fix an Oops in pnfs_mark_request_commit() when doing O_DIRECT
Fix an Oopsable condition in pnfs_mark_request_commit() when we're
putting a set of writes on the commit list to reschedule them after a
failed pNFS attempt.

Fixes: 9c455a8c1e ("NFS/pNFS: Clean up pNFS commit operations")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-13 19:36:49 -04:00
Trond Myklebust
dd99e9f98f NFSv4: Initialise connection to the server in nfs4_alloc_client()
Set up the connection to the NFSv4 server in nfs4_alloc_client(), before
we've added the struct nfs_client to the net-namespace's nfs_client_list
so that a downed server won't cause other mounts to hang in the trunking
detection code.

Reported-by: Michael Wakabayashi <mwakabayashi@vmware.com>
Fixes: 5c6e5b60aa ("NFS: Fix an Oops in the pNFS files and flexfiles connection setup to the DS")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-13 19:36:49 -04:00
Trond Myklebust
e93a5e9306 NFSv4: Add support for application leases underpinned by a delegation
If the NFSv4 client already holds a delegation for a file, then we can
support application leases (i.e. fcntl(fd, F_SETLEASE,...)) because the
underlying delegation guarantees that the file is not being modified on
the server by another client in a way that might conflict with the lease
guarantees.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-13 19:36:28 -04:00
Trond Myklebust
6b4befc0a0 NFSv4: Add lease breakpoints in case of a delegation recall or return
When we add support for application level leases and knfsd delegations
to the NFS client, we we want to have them safely underpinned by a
"real" delegation to provide the caching guarantees. If that real
delegation is recalled, then we need to ensure that the application
leases/delegations are recalled too.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-13 19:36:28 -04:00
Trond Myklebust
be20037725 NFSv4: Fix delegation return in cases where we have to retry
If we're unable to immediately recover all locks because the server is
unable to immediately service our reclaim calls, then we want to retry
after we've finished servicing all the other asynchronous delegation
returns on our queue.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-13 19:36:27 -04:00
Trond Myklebust
c3aba897c6 NFSv4: Fix second deadlock in nfs4_evict_inode()
If the inode is being evicted but has to return a layout first, then
that too can cause a deadlock in the corner case where the server
reboots.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-03 10:14:42 -04:00
Trond Myklebust
dfe1fe75e0 NFSv4: Fix deadlock between nfs4_evict_inode() and nfs4_opendata_get_inode()
If the inode is being evicted, but has to return a delegation first,
then it can cause a deadlock in the corner case where the server reboots
before the delegreturn completes, but while the call to iget5_locked() in
nfs4_opendata_get_inode() is waiting for the inode free to complete.
Since the open call still holds a session slot, the reboot recovery
cannot proceed.

In order to break the logjam, we can turn the delegation return into a
privileged operation for the case where we're evicting the inode. We
know that in that case, there can be no other state recovery operation
that conflicts.

Reported-by: zhangxiaoxu (A) <zhangxiaoxu5@huawei.com>
Fixes: 5fcdfacc01 ("NFSv4: Return delegations synchronously in evict_inode")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-03 10:14:42 -04:00
Chuck Lever
d1b5c230e9 NFS: FMODE_READ and friends are C macros, not enum types
Address a sparse warning:

  CHECK   fs/nfs/nfstrace.c
fs/nfs/nfstrace.c: note: in included file (through /home/cel/src/linux/rpc-over-tls/include/trace/trace_events.h, /home/cel/src/linux/rpc-over-tls/include/trace/define_trace.h, ...):
fs/nfs/./nfstrace.h:424:1: warning: incorrect type in initializer (different base types)
fs/nfs/./nfstrace.h:424:1:    expected unsigned long eval_value
fs/nfs/./nfstrace.h:424:1:    got restricted fmode_t [usertype]
fs/nfs/./nfstrace.h:425:1: warning: incorrect type in initializer (different base types)
fs/nfs/./nfstrace.h:425:1:    expected unsigned long eval_value
fs/nfs/./nfstrace.h:425:1:    got restricted fmode_t [usertype]
fs/nfs/./nfstrace.h:426:1: warning: incorrect type in initializer (different base types)
fs/nfs/./nfstrace.h:426:1:    expected unsigned long eval_value
fs/nfs/./nfstrace.h:426:1:    got restricted fmode_t [usertype]

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-03 10:14:42 -04:00
Dan Carpenter
09226e8303 NFS: Fix a potential NULL dereference in nfs_get_client()
None of the callers are expecting NULL returns from nfs_get_client() so
this code will lead to an Oops.  It's better to return an error
pointer.  I expect that this is dead code so hopefully no one is
affected.

Fixes: 31434f496a ("nfs: check hostname in nfs_get_client")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-03 10:14:42 -04:00
Anna Schumaker
476bdb04c5 NFS: Fix use-after-free in nfs4_init_client()
KASAN reports a use-after-free when attempting to mount two different
exports through two different NICs that belong to the same server.

Olga was able to hit this with kernels starting somewhere between 5.7
and 5.10, but I traced the patch that introduced the clear_bit() call to
4.13. So something must have changed in the refcounting of the clp
pointer to make this call to nfs_put_client() the very last one.

Fixes: 8dcbec6d20 ("NFSv41: Handle EXCHID4_FLAG_CONFIRMED_R during NFSv4.1 migration")
Cc: stable@vger.kernel.org # 4.13+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-03 10:14:42 -04:00
Scott Mayhew
0b4f132b15 NFS: Ensure the NFS_CAP_SECURITY_LABEL capability is set when appropriate
Commit ce62b114bb ("NFS: Split attribute support out from the server
capabilities") removed the logic from _nfs4_server_capabilities() that
sets the NFS_CAP_SECURITY_LABEL capability based on the presence of
FATTR4_WORD2_SECURITY_LABEL in the attr_bitmask of the server's response.
Now NFS_CAP_SECURITY_LABEL is never set, which breaks labelled NFS.

This was replaced with logic that clears the NFS_ATTR_FATTR_V4_SECURITY_LABEL
bit in the newly added fattr_valid field based on the absence of
FATTR4_WORD2_SECURITY_LABEL in the attr_bitmask of the server's response.
This essentially has no effect since there's nothing looks for that bit
in fattr_supported.

So revert that part of the commit, but adding the logic that sets
NFS_CAP_SECURITY_LABEL near where the other capabilities are set in
_nfs4_server_capabilities().

Fixes: ce62b114bb ("NFS: Split attribute support out from the server capabilities")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-03 10:14:42 -04:00
Dai Ngo
f8849e206e NFSv4: nfs4_proc_set_acl needs to restore NFS_CAP_UIDGID_NOMAP on error.
Currently if __nfs4_proc_set_acl fails with NFS4ERR_BADOWNER it
re-enables the idmapper by clearing NFS_CAP_UIDGID_NOMAP before
retrying again. The NFS_CAP_UIDGID_NOMAP remains cleared even if
the retry fails. This causes problem for subsequent setattr
requests for v4 server that does not have idmapping configured.

This patch modifies nfs4_proc_set_acl to detect NFS4ERR_BADOWNER
and NFS4ERR_BADNAME and skips the retry, since the kernel isn't
involved in encoding the ACEs, and return -EINVAL.

Steps to reproduce the problem:

 # mount -o vers=4.1,sec=sys server:/export/test /tmp/mnt
 # touch /tmp/mnt/file1
 # chown 99 /tmp/mnt/file1
 # nfs4_setfacl -a A::unknown.user@xyz.com:wrtncy /tmp/mnt/file1
 Failed setxattr operation: Invalid argument
 # chown 99 /tmp/mnt/file1
 chown: changing ownership of ‘/tmp/mnt/file1’: Invalid argument
 # umount /tmp/mnt
 # mount -o vers=4.1,sec=sys server:/export/test /tmp/mnt
 # chown 99 /tmp/mnt/file1
 #

v2: detect NFS4ERR_BADOWNER and NFS4ERR_BADNAME and skip retry
       in nfs4_proc_set_acl.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-01 13:16:17 -04:00
Huilong Deng
a799b68a7c nfs: Remove trailing semicolon in macros
Macros should not use a trailing semicolon.

Signed-off-by: Huilong Deng <denghuilong@cdjrlc.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-05-27 09:19:33 -04:00
Zhang Xiaoxu
e67afa7ee4 NFSv4: Fix v4.0/v4.1 SEEK_DATA return -ENOTSUPP when set NFS_V4_2 config
Since commit bdcc2cd14e ("NFSv4.2: handle NFS-specific llseek errors"),
nfs42_proc_llseek would return -EOPNOTSUPP rather than -ENOTSUPP when
SEEK_DATA on NFSv4.0/v4.1.

This will lead xfstests generic/285 not run on NFSv4.0/v4.1 when set the
CONFIG_NFS_V4_2, rather than run failed.

Fixes: bdcc2cd14e ("NFSv4.2: handle NFS-specific llseek errors")
Cc: <stable.vger.kernel.org> # 4.2
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-05-27 08:46:19 -04:00
Trond Myklebust
70536bf4eb NFS: Clean up reset of the mirror accounting variables
Now that nfs_pageio_do_add_request() resets the pg_count, we don't need
these other inlined resets.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-05-26 06:36:13 -04:00
Trond Myklebust
0d0ea30935 NFS: Don't corrupt the value of pg_bytes_written in nfs_do_recoalesce()
The value of mirror->pg_bytes_written should only be updated after a
successful attempt to flush out the requests on the list.

Fixes: a7d42ddb30 ("nfs: add mirroring support to pgio layer")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-05-26 06:36:13 -04:00
Trond Myklebust
56517ab958 NFS: Fix an Oopsable condition in __nfs_pageio_add_request()
Ensure that nfs_pageio_error_cleanup() resets the mirror array contents,
so that the structure reflects the fact that it is now empty.
Also change the test in nfs_pageio_do_add_request() to be more robust by
checking whether or not the list is empty rather than relying on the
value of pg_count.

Fixes: a7d42ddb30 ("nfs: add mirroring support to pgio layer")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-05-26 06:36:13 -04:00
Anna Schumaker
a421d21860 NFSv4: Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return()
Commit de144ff423 changes _pnfs_return_layout() to call
pnfs_mark_matching_lsegs_return() passing NULL as the struct
pnfs_layout_range argument. Unfortunately,
pnfs_mark_matching_lsegs_return() doesn't check if we have a value here
before dereferencing it, causing an oops.

I'm able to hit this crash consistently when running connectathon basic
tests on NFS v4.1/v4.2 against Ontap.

Fixes: de144ff423 ("NFSv4: Don't discard segments marked for return in _pnfs_return_layout()")
Cc: stable@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-05-20 12:17:08 -04:00
Yang Li
d1d973950a pNFS/NFSv4: Remove redundant initialization of 'rd_size'
Variable 'rd_size' is being initialized however
this value is never read as 'rd_size' is assigned
a new value in for statement. Remove the redundant
assignment.

Clean up clang warning:

fs/nfs/pnfs.c:2681:6: warning: Value stored to 'rd_size' during its
initialization is never read [clang-analyzer-deadcode.DeadStores]

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-05-20 12:17:08 -04:00
Dan Carpenter
769b01ea68 NFS: fix an incorrect limit in filelayout_decode_layout()
The "sizeof(struct nfs_fh)" is two bytes too large and could lead to
memory corruption.  It should be NFS_MAXFHSIZE because that's the size
of the ->data[] buffer.

I reversed the size of the arguments to put the variable on the left.

Fixes: 16b374ca43 ("NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-05-20 12:17:08 -04:00
zhouchuangao
bb00238890 fs/nfs: Use fatal_signal_pending instead of signal_pending
We set the state of the current process to TASK_KILLABLE via
prepare_to_wait(). Should we use fatal_signal_pending() to detect
the signal here?

Fixes: b4868b44c5 ("NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE")
Signed-off-by: zhouchuangao <zhouchuangao@vivo.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-05-20 12:15:35 -04:00
Linus Torvalds
a647034fe2 NFS client updates for Linux 5.13
Highlights include:
 
 Stable fixes:
 - Add validation of the UDP retrans parameter to prevent shift out-of-bounds
 - Don't discard pNFS layout segments that are marked for return
 
 Bugfixes:
 - Fix a NULL dereference crash in xprt_complete_bc_request() when the
   NFSv4.1 server misbehaves.
 - Fix the handling of NFS READDIR cookie verifiers
 - Sundry fixes to ensure attribute revalidation works correctly when the
   server does not return post-op attributes.
 - nfs4_bitmask_adjust() must not change the server global bitmasks
 - Fix major timeout handling in the RPC code.
 - NFSv4.2 fallocate() fixes.
 - Fix the NFSv4.2 SEEK_HOLE/SEEK_DATA end-of-file handling
 - Copy offload attribute revalidation fixes
 - Fix an incorrect filehandle size check in the pNFS flexfiles driver
 - Fix several RDMA transport setup/teardown races
 - Fix several RDMA queue wrapping issues
 - Fix a misplaced memory read barrier in sunrpc's call_decode()
 
 Features:
 - Micro optimisation of the TCP transmission queue using TCP_CORK
 - statx() performance improvements by further splitting up the tracking
   of invalid cached file metadata.
 - Support the NFSv4.2 "change_attr_type" attribute and use it to
   optimise handling of change attribute updates.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmCVLooACgkQZwvnipYK
 APJB5BAAtIJyhx40ooMBzcucDmXd1qovlKsb8ZlvnSI6c7wvHhFPNk9z4zwThnjL
 FpVYzJzK6XzAQY/PtgbrPwnSUmW925ngPWYR/hiYe+OGPBnYV+tXP8izCyEkNgMg
 45goDOxojGWl7AGTuAJiKcDSdH9PyIrbvt28iwcNSGjslasGSbAoL/836l4OIGr1
 Ymxs/NDML11dPco8GIKLGtHd8leFGleDx089VeNsgud8MdaFErp16O5Iz8DdzRKd
 W1l2zDMb05j8eDZIfy3w3FyrLkDXA+KgLSADiC8TcpxoadPaQJMeCvoIq8oqVndn
 bZBoxduXdLgf54Aec0WnNKFAOyc7pGvZoSNmFouT7EGV73g+g1LQ+ZbEE1bb8fCQ
 XHqCVaBt2+47NiTUgdxjXlZRfcn8fYKx0tVxfG3mQVMXUAWfsjmMyQMNgijDRJI2
 8Wz3lZMRGMILbR9j4QpP1biVy/2zGNWG/TB5ZZyZMSY4uT+aOpzlqdknb4UsRaSp
 f7MfmB7xEWpS4DJr9RIBrJ/hIdnMu1mNInxDPFo5Kl5HNp4TaPm2dPir2ZD2wMZI
 daURTX7giUhpE15ZebQDBqWD+mTR0bVDqLLeo131JRmMfMEHugNrr49xe+NkBu/R
 QWnFzgkGdQsOeiKRRwEUuhsi74JspqfwzdZzHqcRM5WuXVvBLcA=
 =h01b
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable fixes:

   - Add validation of the UDP retrans parameter to prevent shift
     out-of-bounds

   - Don't discard pNFS layout segments that are marked for return

  Bugfixes:

   - Fix a NULL dereference crash in xprt_complete_bc_request() when the
     NFSv4.1 server misbehaves.

   - Fix the handling of NFS READDIR cookie verifiers

   - Sundry fixes to ensure attribute revalidation works correctly when
     the server does not return post-op attributes.

   - nfs4_bitmask_adjust() must not change the server global bitmasks

   - Fix major timeout handling in the RPC code.

   - NFSv4.2 fallocate() fixes.

   - Fix the NFSv4.2 SEEK_HOLE/SEEK_DATA end-of-file handling

   - Copy offload attribute revalidation fixes

   - Fix an incorrect filehandle size check in the pNFS flexfiles driver

   - Fix several RDMA transport setup/teardown races

   - Fix several RDMA queue wrapping issues

   - Fix a misplaced memory read barrier in sunrpc's call_decode()

  Features:

   - Micro optimisation of the TCP transmission queue using TCP_CORK

   - statx() performance improvements by further splitting up the
     tracking of invalid cached file metadata.

   - Support the NFSv4.2 'change_attr_type' attribute and use it to
     optimise handling of change attribute updates"

* tag 'nfs-for-5.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (85 commits)
  xprtrdma: Fix a NULL dereference in frwr_unmap_sync()
  sunrpc: Fix misplaced barrier in call_decode
  NFSv4.2: Remove ifdef CONFIG_NFSD from NFSv4.2 client SSC code.
  xprtrdma: Move fr_mr field to struct rpcrdma_mr
  xprtrdma: Move the Work Request union to struct rpcrdma_mr
  xprtrdma: Move fr_linv_done field to struct rpcrdma_mr
  xprtrdma: Move cqe to struct rpcrdma_mr
  xprtrdma: Move fr_cid to struct rpcrdma_mr
  xprtrdma: Remove the RPC/RDMA QP event handler
  xprtrdma: Don't display r_xprt memory addresses in tracepoints
  xprtrdma: Add an rpcrdma_mr_completion_class
  xprtrdma: Add tracepoints showing FastReg WRs and remote invalidation
  xprtrdma: Avoid Send Queue wrapping
  xprtrdma: Do not wake RPC consumer on a failed LocalInv
  xprtrdma: Do not recycle MR after FastReg/LocalInv flushes
  xprtrdma: Clarify use of barrier in frwr_wc_localinv_done()
  xprtrdma: Rename frwr_release_mr()
  xprtrdma: rpcrdma_mr_pop() already does list_del_init()
  xprtrdma: Delete rpcrdma_recv_buffer_put()
  xprtrdma: Fix cwnd update ordering
  ...
2021-05-07 11:23:41 -07:00
Masahiro Yamada
fa60ce2cb4 treewide: remove editor modelines and cruft
The section "19) Editor modelines and other cruft" in
Documentation/process/coding-style.rst clearly says, "Do not include any
of these in source files."

I recently receive a patch to explicitly add a new one.

Let's do treewide cleanups, otherwise some people follow the existing code
and attempt to upstream their favoriate editor setups.

It is even nicer if scripts/checkpatch.pl can check it.

If we like to impose coding style in an editor-independent manner, I think
editorconfig (patch [1]) is a saner solution.

[1] https://lore.kernel.org/lkml/20200703073143.423557-1-danny@kdrag0n.dev/

Link: https://lkml.kernel.org/r/20210324054457.1477489-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>	[auxdisplay]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07 00:26:34 -07:00
Linus Torvalds
f1c921fb70 selinux/stable-5.13 PR 20210426
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmCHM2sUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNfCg/9GmoCyCh+ZRj5RGQ6M+yJas1+yyJQ
 uEfTNde54yfATUTaaWYnZG59yqzM3I2uaV11U7tqg8ajiFPxJKqbs5R9jl3lnSjH
 0Dg22nXPSCOTKcU0x/DeLoKRr+M9jO1K/nQ8NEZvYX4nC/OgtCvJqb/oEQZIKAk5
 2a7OEmNNQyFGd274p9dELaDHxN9UIaJ2PzQFXtq7ROHgBXQO4ONb2ajOf6mDSFQb
 vP/CDHwaH+pcE28w44oRy0/YBkO1SrdqoFQchg5yFagM5tQRLGkXK4OFSs5KHi5Q
 YMtmaOzMPIv1e5eaC1HuuMJYA4pPb30T9hFHP7tmBVZfmZaFaDeUs+BhMm98WTiS
 o0iTP7tfs36/poOR1Q0/sB06uvF9hUAAX1ZuE95YySifbXU9hsUc9b0uQSwCdg9P
 /J9rcdHLTpWqjw9n02mezWmAvo5U8ZvbDs+0xPIwI+3RTUP5t6mp+Hd5Tc7bPTq1
 0rpWXx+FQoSytFap5qiUSiwBp+HF6HQnNIXB0Muf6wctChoTjvo7TwoxH//z4kEm
 +SddhOCNkB7VC/X7hOxhl0F/rdHuXvb1AFIWjpTLJH2CR1PvMtF+sGey+uPT6hKZ
 /gvhmQGjFdph99eGlfVbCNvx1pM61O25IscaYD1T2wGImw+z7dX4WkG3WoOdDSkR
 bRjrBkcHh0gLhWk=
 =HTEy
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:

 - Add support for measuring the SELinux state and policy capabilities
   using IMA.

 - A handful of SELinux/NFS patches to compare the SELinux state of one
   mount with a set of mount options. Olga goes into more detail in the
   patch descriptions, but this is important as it allows more
   flexibility when using NFS and SELinux context mounts.

 - Properly differentiate between the subjective and objective LSM
   credentials; including support for the SELinux and Smack. My clumsy
   attempt at a proper fix for AppArmor didn't quite pass muster so John
   is working on a proper AppArmor patch, in the meantime this set of
   patches shouldn't change the behavior of AppArmor in any way. This
   change explains the bulk of the diffstat beyond security/.

 - Fix a problem where we were not properly terminating the permission
   list for two SELinux object classes.

* tag 'selinux-pr-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: add proper NULL termination to the secclass_map permissions
  smack: differentiate between subjective and objective task credentials
  selinux: clarify task subjective and objective credentials
  lsm: separate security_task_getsecid() into subjective and objective variants
  nfs: account for selinux security context when deciding to share superblock
  nfs: remove unneeded null check in nfs_fill_super()
  lsm,selinux: add new hook to compare new mount to an existing mount
  selinux: fix misspellings using codespell tool
  selinux: fix misspellings using codespell tool
  selinux: measure state and policy capabilities
  selinux: Allow context mounts for unpriviliged overlayfs
2021-04-27 13:42:11 -07:00
Linus Torvalds
d1466bc583 Merge branch 'work.inode-type-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs inode type handling updates from Al Viro:
 "We should never change the type bits of ->i_mode or the method tables
  (->i_op and ->i_fop) of a live inode.

  Unfortunately, not all filesystems took care to prevent that"

* 'work.inode-type-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  spufs: fix bogosity in S_ISGID handling
  9p: missing chunk of "fs/9p: Don't update file type when updating file attributes"
  openpromfs: don't do unlock_new_inode() until the new inode is set up
  hostfs_mknod(): don't bother with init_special_inode()
  cifs: have cifs_fattr_to_inode() refuse to change type on live inode
  cifs: have ->mkdir() handle race with another client sanely
  do_cifs_create(): don't set ->i_mode of something we had not created
  gfs2: be careful with inode refresh
  ocfs2_inode_lock_update(): make sure we don't change the type bits of i_mode
  orangefs_inode_is_stale(): i_mode type bits do *not* form a bitmap...
  vboxsf: don't allow to change the inode type
  afs: Fix updating of i_mode due to 3rd party change
  ceph: don't allow type or device number to change on non-I_NEW inodes
  ceph: fix up error handling with snapdirs
  new helper: inode_wrong_type()
2021-04-27 10:57:42 -07:00
Dai Ngo
d9092b4bb2 NFSv4.2: Remove ifdef CONFIG_NFSD from NFSv4.2 client SSC code.
The client SSC code should not depend on any of the CONFIG_NFSD config.
This patch removes all CONFIG_NFSD from NFSv4.2 client SSC code and
simplifies the config of CONFIG_NFS_V4_2_SSC_HELPER, NFSD_V4_2_INTER_SSC.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-26 09:29:17 -04:00
Trond Myklebust
fb700ef026 NFSv4.1: Simplify layout return in pnfs_layout_process()
If the server hands us a layout that does not match the one we currently
hold, then have pnfs_mark_matching_lsegs_return() just ditch the old
layout if NFS_LSEG_LAYOUTRETURN is not set.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-18 15:10:53 -04:00
Trond Myklebust
de144ff423 NFSv4: Don't discard segments marked for return in _pnfs_return_layout()
If the pNFS layout segment is marked with the NFS_LSEG_LAYOUTRETURN
flag, then the assumption is that it has some reporting requirement
to perform through a layoutreturn (e.g. flexfiles layout stats or error
information).

Fixes: 6d597e1750 ("pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-18 15:10:53 -04:00
Trond Myklebust
39fd018636 NFS: Don't discard pNFS layout segments that are marked for return
If the pNFS layout segment is marked with the NFS_LSEG_LAYOUTRETURN
flag, then the assumption is that it has some reporting requirement
to perform through a layoutreturn (e.g. flexfiles layout stats or error
information).

Fixes: e0b7d420f7 ("pNFS: Don't discard layout segments that are marked for return")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-16 08:50:29 -04:00
Trond Myklebust
8926cc8302 NFSv4.x: Don't return NFS4ERR_NOMATCHING_LAYOUT if we're unmounting
If the NFS super block is being unmounted, then we currently may end up
telling the server that we've forgotten the layout while it is actually
still in use by the client.
In that case, just assume that the client will soon return the layout
anyway, and so return NFS4ERR_DELAY in response to the layout recall.

Fixes: 58ac3e5923 ("NFSv4/pnfs: Clean up nfs_layout_find_inode()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-16 08:50:21 -04:00
Trond Myklebust
febfeaaefe NFSv42: Don't force attribute revalidation of the copy offload source
When a copy offload is performed, we do not expect the source file to
change other than perhaps to see the atime be updated.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-14 10:42:24 -04:00
Trond Myklebust
94d202d5ca NFSv42: Copy offload should update the file size when appropriate
If the result of a copy offload or clone operation is to grow the
destination file size, then we should update it. The reason is that when
a client holds a delegation, it is authoritative for the file size.

Fixes: 16abd2a0c1 ("NFSv4.2: fix client's attribute cache management for copy_file_range")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-14 10:41:57 -04:00
Olga Kornievskaia
73f5c88f52 NFSv4.2 fix handling of sr_eof in SEEK's reply
Currently the client ignores the value of the sr_eof of the SEEK
operation. According to the spec, if the server didn't find the
requested extent and reached the end of the file, the server
would return sr_eof=true. In case the request for DATA and no
data was found (ie in the middle of the hole), then the lseek
expects that ENXIO would be returned.

Fixes: 1c6dcbe5ce ("NFS: Implement SEEK")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-14 09:36:29 -04:00
Nikola Livic
ed34695e15 pNFS/flexfiles: fix incorrect size check in decode_nfs_fh()
We (adam zabrocki, alexander matrosov, alexander tereshkin, maksym
bazalii) observed the check:

	if (fh->size > sizeof(struct nfs_fh))

should not use the size of the nfs_fh struct which includes an extra two
bytes from the size field.

struct nfs_fh {
	unsigned short         size;
	unsigned char          data[NFS_MAXFHSIZE];
}

but should determine the size from data[NFS_MAXFHSIZE] so the memcpy
will not write 2 bytes beyond destination.  The proposed fix is to
compare against the NFS_MAXFHSIZE directly, as is done elsewhere in fs
code base.

Fixes: d67ae825a5 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Nikola Livic <nlivic@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-14 09:36:29 -04:00
Trond Myklebust
eb3d58c68e NFSv4: Catch and trace server filehandle encoding errors
If the server returns a filehandle with an invalid length, then trace
that, and return an EREMOTEIO error.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-14 09:36:29 -04:00
Trond Myklebust
3d66bae156 NFSv4: Convert nfs_xdr_status tracepoint to an event class
We would like the ability to record other XDR errors, particularly
those that are due to server bugs.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-14 09:36:29 -04:00
Trond Myklebust
da934ae0a8 NFSv4: Add tracing for COMPOUND errors
When the server returns a different operation than we expected, then
trace that.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-14 09:36:29 -04:00
Trond Myklebust
ce62b114bb NFS: Split attribute support out from the server capabilities
There are lots of attributes, and they are crowding out the bit space.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-14 09:36:29 -04:00
Trond Myklebust
cc7f2dae63 NFS: Don't store NFS_INO_REVAL_FORCED
NFS_INO_REVAL_FORCED is intended to tell us that the cache needs
revalidation despite the fact that we hold a delegation. We shouldn't
need to store it anymore, though.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-14 09:36:29 -04:00
Trond Myklebust
1301e421b7 NFSv4: link must update the inode nlink.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-14 09:36:29 -04:00
Trond Myklebust
82eae5a432 NFSv4: nfs4_inc/dec_nlink_locked should also invalidate ctime
If the nlink changes, then so will the ctime.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-14 09:36:29 -04:00
Trond Myklebust
7b24dacf08 NFS: Another inode revalidation improvement
If we're trying to update the inode because a previous update left the
cache in a partially unrevalidated state, then allow the update if the
change attrs match.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-13 10:04:05 -04:00
Trond Myklebust
6f9be83d07 NFS: Use information about the change attribute to optimise updates
If the NFSv4.2 server supports the 'change_attr_type' attribute, then
allow the client to optimise its attribute cache update strategy.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-13 10:04:05 -04:00
Trond Myklebust
7f08a3359a NFSv4: Add support for the NFSv4.2 "change_attr_type" attribute
The change_attr_type allows the server to provide a description of how
the change attribute will behave. This again will allow the client to
optimise its behaviour w.r.t. attribute revalidation.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-13 10:04:05 -04:00
Trond Myklebust
993e2d4bd9 NFSv4: Don't modify the change attribute cached in the inode
When the client is caching data and a write delegation is held, then the
server may send a CB_GETATTR to query the attributes. When this happens,
the client is supposed to bump the change attribute value that it
returns if it holds cached data.
However that process uses a value that is stored in the delegation. We
do not want to bump the change attribute held in the inode.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-13 10:04:05 -04:00
Trond Myklebust
57a789a1de NFSv4: Fix value of decode_fsinfo_maxsz
At least two extra fields have been added to fsinfo since this was last
updated.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-13 10:04:05 -04:00
Trond Myklebust
04c63498b6 NFS: Simplify cache consistency in nfs_check_inode_attributes()
We should not be invalidating the access or acl caches in
nfs_check_inode_attributes(), since the point is we're unsure about
whether the contents of the struct nfs_fattr are fully up to date.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-13 10:04:02 -04:00
Trond Myklebust
c88c696c59 NFS: Remove a line of code that has no effect in nfs_update_inode()
Commit 0b467264d0 ("NFS: Fix attribute revalidation") changed the way
we populate the 'invalid' attribute, and made the line that strips away
the NFS_INO_INVALID_ATTR bits redundant.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-13 10:03:03 -04:00
Trond Myklebust
709fa57699 NFS: Fix up handling of outstanding layoutcommit in nfs_update_inode()
If there is an outstanding layoutcommit, then the list of attributes
whose values are expected to change is not the full set. So let's
be explicit about the full list.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-13 10:03:03 -04:00
Trond Myklebust
720869eb19 NFS: Separate tracking of file mode cache validity from the uid/gid
chown()/chgrp() and chmod() are separate operations, and in addition,
there are mode operations that are performed automatically by the
server. So let's track mode validity separately from the file ownership
validity.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-13 09:41:16 -04:00
Trond Myklebust
fabf2b3415 NFS: Separate tracking of file nlinks cache validity from the mode/uid/gid
Rename can cause us to revalidate the access cache, so lets track the
nlinks separately from the mode/uid/gid.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-13 08:46:30 -04:00
Trond Myklebust
a71029b867 NFSv4: Fix nfs4_bitmap_copy_adjust()
Don't remove flags from the set retrieved from the cache_validity.
We do want to retrieve all attributes that are listed as being
invalid, whether or not there is a delegation set.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-13 08:46:29 -04:00
Trond Myklebust
36a9346c22 NFS: Don't set NFS_INO_REVAL_PAGECACHE in the inode cache validity
It is no longer necessary to preserve the NFS_INO_REVAL_PAGECACHE flag.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-12 20:11:44 -04:00
Trond Myklebust
13c0b082b6 NFS: Replace use of NFS_INO_REVAL_PAGECACHE when checking cache validity
When checking cache validity, be more specific than just 'we want to
check the page cache validity'. In almost all cases, we want to check
that change attribute, and possibly also the size.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-12 20:11:44 -04:00
Trond Myklebust
1f3208b2d6 NFS: Add a cache validity flag argument to nfs_revalidate_inode()
Add an argument to nfs_revalidate_inode() to allow callers to specify
which attributes they need to check for validity.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-12 20:11:44 -04:00
Trond Myklebust
1f9f432815 NFS: nfs_setattr_update_inode() should clear the suid/sgid bits
When we do a 'chown' or 'chgrp', the server will clear the suid/sgid
bits. Ensure that we mirror that in nfs_setattr_update_inode().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-12 20:11:44 -04:00
Trond Myklebust
63cdd7edfd NFS: Fix up statx() results
If statx has valid attributes available that weren't asked for, then
return them and set the result mask appropriately.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-12 17:14:17 -04:00
Trond Myklebust
e8764a6f96 NFS: Don't revalidate attributes that are not being asked for
If the user doesn't set STATX_UID/GID/MODE, then don't care if they are
known to be stale. Ditto if we're not being asked for the file size.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-12 17:14:17 -04:00
Trond Myklebust
4cdfeb648a NFS: Fix up revalidation of space used
Ensure that when the change attribute or the size change, we also
remember to revalidate the space used.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-12 17:14:17 -04:00
Trond Myklebust
50c7a7994d NFS: NFS_INO_REVAL_PAGECACHE should mark the change attribute invalid
When we're looking to revalidate the page cache, we should just ensure
that we mark the change attribute invalid.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-12 17:14:17 -04:00
Trond Myklebust
4eb6a8230b NFS: Mask out unsupported attributes in nfs_getattr()
We don't currently support STATX_BTIME, so don't advertise it in the
return values for nfs_getattr().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-12 17:14:17 -04:00