Commit Graph

73953 Commits

Author SHA1 Message Date
Chuck Lever
fcb5e3fa01 NFSD: Move fill_pre_wcc() and fill_post_wcc()
These functions are related to file handle processing and have
nothing to do with XDR encoding or decoding. Also they are no longer
NFSv3-specific. As a clean-up, move their definitions to a more
appropriate location. WCC is also an NFSv3-specific term, so rename
them as general-purpose helpers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08 14:42:03 -05:00
Chuck Lever
58f258f652 Revert "nfsd: skip some unnecessary stats in the v4 case"
On the wire, I observed NFSv4 OPEN(CREATE) operations sometimes
returning a reasonable-looking value in the cinfo.before field and
zero in the cinfo.after field.

RFC 8881 Section 10.8.1 says:
> When a client is making changes to a given directory, it needs to
> determine whether there have been changes made to the directory by
> other clients.  It does this by using the change attribute as
> reported before and after the directory operation in the associated
> change_info4 value returned for the operation.

and

> ... The post-operation change
> value needs to be saved as the basis for future change_info4
> comparisons.

A good quality client implementation therefore saves the zero
cinfo.after value. During a subsequent OPEN operation, it will
receive a different non-zero value in the cinfo.before field for
that directory, and it will incorrectly believe the directory has
changed, triggering an undesirable directory cache invalidation.

There are filesystem types where fs_supports_change_attribute()
returns false, tmpfs being one. On NFSv4 mounts, this means the
fh_getattr() call site in fill_pre_wcc() and fill_post_wcc() is
never invoked. Subsequently, nfsd4_change_attribute() is invoked
with an uninitialized @stat argument.

In fill_pre_wcc(), @stat contains stale stack garbage, which is
then placed on the wire. In fill_post_wcc(), ->fh_post_wc is all
zeroes, so zero is placed on the wire. Both of these values are
meaningless.

This fix can be applied immediately to stable kernels. Once there
are more regression tests in this area, this optimization can be
attempted again.

Fixes: 428a23d2bf ("nfsd: skip some unnecessary stats in the v4 case")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08 14:42:03 -05:00
Chuck Lever
75acacb658 NFSD: Trace boot verifier resets
According to commit bbf2f09883 ("nfsd: Reset the boot verifier on
all write I/O errors"), the Linux NFS server forces all clients to
resend pending unstable writes if any server-side write or commit
operation encounters an error (say, ENOSPC). This is a rare and
quite exceptional event that could require administrative recovery
action, so it should be made trace-able. Example trace event:

nfsd-938   [002]  7174.945558: nfsd_writeverf_reset: boot_time=        61cc920d xid=0xdcd62036 error=-28 new verifier=0x08aecc6142515904

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08 14:42:03 -05:00
Chuck Lever
3988a57885 NFSD: Rename boot verifier functions
Clean up: These functions handle what the specs call a write
verifier, which in the Linux NFS server implementation is now
divorced from the server's boot instance

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08 14:42:02 -05:00
Chuck Lever
91d2e9b56c NFSD: Clean up the nfsd_net::nfssvc_boot field
There are two boot-time fields in struct nfsd_net: one called
boot_time and one called nfssvc_boot. The latter is used only to
form write verifiers, but its documenting comment declares:

        /* Time of server startup */

Since commit 27c438f53e ("nfsd: Support the server resetting the
boot verifier"), this field can be reset at any time; it's no
longer tied to server restart. So that comment is stale.

Also, according to pahole, struct timespec64 is 16 bytes long on
x86_64. The nfssvc_boot field is used only to form a write verifier,
which is 8 bytes long.

Let's clarify this situation by manufacturing an 8-byte verifier
in nfs_reset_boot_verifier() and storing only that in struct
nfsd_net.

We're grabbing 128 bits of time, so compress all of those into a
64-bit verifier instead of throwing out the high-order bits.
In the future, the siphash_key can be re-used for other hashed
objects per-nfsd_net.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08 14:42:02 -05:00
Chuck Lever
cdc556600c NFSD: Write verifier might go backwards
When vfs_iter_write() starts to fail because a file system is full,
a bunch of writes can fail at once with ENOSPC. These writes
repeatedly invoke nfsd_reset_boot_verifier() in quick succession.

Ensure that the time it grabs doesn't go backwards due to an ntp
adjustment going on at the same time.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08 14:42:02 -05:00
Trond Myklebust
a2f4c3fa4d nfsd: Add a tracepoint for errors in nfsd4_clone_file_range()
Since a clone error commit can cause the boot verifier to change,
we should trace those errors.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[ cel: Addressed a checkpatch.pl splat in fs/nfsd/vfs.h ]
2022-01-08 14:42:02 -05:00
Chuck Lever
2c445a0e72 NFSD: De-duplicate net_generic(nf->nf_net, nfsd_net_id)
Since this pointer is used repeatedly, move it to a stack variable.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08 14:42:02 -05:00
Chuck Lever
fb7622c2db NFSD: De-duplicate net_generic(SVC_NET(rqstp), nfsd_net_id)
Since this pointer is used repeatedly, move it to a stack variable.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08 14:42:02 -05:00
Chuck Lever
33388b3aef NFSD: Clean up nfsd_vfs_write()
The RWF_SYNC and !RWF_SYNC arms are now exactly alike except that
the RWF_SYNC arm resets the boot verifier twice in a row. Fix that
redundancy and de-duplicate the code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08 14:42:02 -05:00
Trond Myklebust
555dbf1a9a nfsd: Replace use of rwsem with errseq_t
The nfsd_file nf_rwsem is currently being used to separate file write
and commit instances to ensure that we catch errors and apply them to
the correct write/commit.
We can improve scalability at the expense of a little accuracy (some
extra false positives) by replacing the nf_rwsem with more careful
use of the errseq_t mechanism to track errors across the different
operations.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[ cel: rebased on zero-verifier fix ]
2022-01-08 14:42:02 -05:00
Chuck Lever
f11ad7aa65 NFSD: Fix verifier returned in stable WRITEs
RFC 8881 explains the purpose of the write verifier this way:

> The final portion of the result is the field writeverf. This field
> is the write verifier and is a cookie that the client can use to
> determine whether a server has changed instance state (e.g., server
> restart) between a call to WRITE and a subsequent call to either
> WRITE or COMMIT.

But then it says:

> This cookie MUST be unchanged during a single instance of the
> NFSv4.1 server and MUST be unique between instances of the NFSv4.1
> server. If the cookie changes, then the client MUST assume that
> any data written with an UNSTABLE4 value for committed and an old
> writeverf in the reply has been lost and will need to be
> recovered.

RFC 1813 has similar language for NFSv3. NFSv2 does not have a write
verifier since it doesn't implement the COMMIT procedure.

Since commit 19e0663ff9 ("nfsd: Ensure sampling of the write
verifier is atomic with the write"), the Linux NFS server has
returned a boot-time-based verifier for UNSTABLE WRITEs, but a zero
verifier for FILE_SYNC and DATA_SYNC WRITEs. FILE_SYNC and DATA_SYNC
WRITEs are not followed up with a COMMIT, so there's no need for
clients to compare verifiers for stable writes.

However, by returning a different verifier for stable and unstable
writes, the above commit puts the Linux NFS server a step farther
out of compliance with the first MUST above. At least one NFS client
(FreeBSD) noticed the difference, making this a potential
regression.

Reported-by: Rick Macklem <rmacklem@uoguelph.ca>
Link: https://lore.kernel.org/linux-nfs/YQXPR0101MB096857EEACF04A6DF1FC6D9BDD749@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM/T/
Fixes: 19e0663ff9 ("nfsd: Ensure sampling of the write verifier is atomic with the write")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08 14:42:02 -05:00
Jeff Layton
12bcbd40fd nfsd: Retry once in nfsd_open on an -EOPENSTALE return
If we get back -EOPENSTALE from an NFSv4 open, then we either got some
unhandled error or the inode we got back was not the same as the one
associated with the dentry.

We really have no recourse in that situation other than to retry the
open, and if it fails to just return nfserr_stale back to the client.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08 14:42:02 -05:00
Jeff Layton
a2694e51f6 nfsd: Add errno mapping for EREMOTEIO
The NFS client can occasionally return EREMOTEIO when signalling issues
with the server.  ...map to NFSERR_IO.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08 14:42:01 -05:00
Peng Tao
b3d0db706c nfsd: map EBADF
Now that we have open file cache, it is possible that another client
deletes the file and DP will not know about it. Then IO to MDS would
fail with BADSTATEID and knfsd would start state recovery, which
should fail as well and then nfs read/write will fail with EBADF.
And it triggers a WARN() in nfserrno().

-----------[ cut here ]------------
WARNING: CPU: 0 PID: 13529 at fs/nfsd/nfsproc.c:758 nfserrno+0x58/0x70 [nfsd]()
nfsd: non-standard errno: -9
modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_connt
pata_acpi floppy
CPU: 0 PID: 13529 Comm: nfsd Tainted: G        W       4.1.5-00307-g6e6579b #7
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
 0000000000000000 00000000464e6c9c ffff88079085fba8 ffffffff81789936
 0000000000000000 ffff88079085fc00 ffff88079085fbe8 ffffffff810a08ea
 ffff88079085fbe8 ffff88080f45c900 ffff88080f627d50 ffff880790c46a48
 all Trace:
 [<ffffffff81789936>] dump_stack+0x45/0x57
 [<ffffffff810a08ea>] warn_slowpath_common+0x8a/0xc0
 [<ffffffff810a0975>] warn_slowpath_fmt+0x55/0x70
 [<ffffffff81252908>] ? splice_direct_to_actor+0x148/0x230
 [<ffffffffa02fb8c0>] ? fsid_source+0x60/0x60 [nfsd]
 [<ffffffffa02f9918>] nfserrno+0x58/0x70 [nfsd]
 [<ffffffffa02fba57>] nfsd_finish_read+0x97/0xb0 [nfsd]
 [<ffffffffa02fc7a6>] nfsd_splice_read+0x76/0xa0 [nfsd]
 [<ffffffffa02fcca1>] nfsd_read+0xc1/0xd0 [nfsd]
 [<ffffffffa0233af2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
 [<ffffffffa03073da>] nfsd3_proc_read+0xba/0x150 [nfsd]
 [<ffffffffa02f7a03>] nfsd_dispatch+0xc3/0x210 [nfsd]
 [<ffffffffa0233af2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
 [<ffffffffa0232913>] svc_process_common+0x453/0x6f0 [sunrpc]
 [<ffffffffa0232cc3>] svc_process+0x113/0x1b0 [sunrpc]
 [<ffffffffa02f740f>] nfsd+0xff/0x170 [nfsd]
 [<ffffffffa02f7310>] ? nfsd_destroy+0x80/0x80 [nfsd]
 [<ffffffff810bf3a8>] kthread+0xd8/0xf0
 [<ffffffff810bf2d0>] ? kthread_create_on_node+0x1b0/0x1b0
 [<ffffffff817912a2>] ret_from_fork+0x42/0x70
 [<ffffffff810bf2d0>] ? kthread_create_on_node+0x1b0/0x1b0

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08 14:42:01 -05:00
Chuck Lever
6a2f774424 NFSD: Fix zero-length NFSv3 WRITEs
The Linux NFS server currently responds to a zero-length NFSv3 WRITE
request with NFS3ERR_IO. It responds to a zero-length NFSv4 WRITE
with NFS4_OK and count of zero.

RFC 1813 says of the WRITE procedure's @count argument:

count
         The number of bytes of data to be written. If count is
         0, the WRITE will succeed and return a count of 0,
         barring errors due to permissions checking.

RFC 8881 has similar language for NFSv4, though NFSv4 removed the
explicit @count argument because that value is already contained in
the opaque payload array.

The synthetic client pynfs's WRT4 and WRT15 tests do emit zero-
length WRITEs to exercise this spec requirement. Commit fdec6114ee
("nfsd4: zero-length WRITE should succeed") addressed the same
problem there with the same fix.

But interestingly the Linux NFS client does not appear to emit zero-
length WRITEs, instead squelching them. I'm not aware of a test that
can generate such WRITEs for NFSv3, so I wrote a naive C program to
generate a zero-length WRITE and test this fix.

Fixes: 8154ef2776 ("NFSD: Clean up legacy NFS WRITE argument XDR decoders")
Reported-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08 14:42:01 -05:00
Vasily Averin
47446d74f1 nfsd4: add refcount for nfsd4_blocked_lock
nbl allocated in nfsd4_lock can be released by a several ways:
directly in nfsd4_lock(), via nfs4_laundromat(), via another nfs
command RELEASE_LOCKOWNER or via nfsd4_callback.
This structure should be refcounted to be used and released correctly
in all these cases.

Refcount is initialized to 1 during allocation and is incremented
when nbl is added into nbl_list/nbl_lru lists.

Usually nbl is linked into both lists together, so only one refcount
is used for both lists.

However nfsd4_lock() should keep in mind that nbl can be present
in one of lists only. This can happen if nbl was handled already
by nfs4_laundromat/nfsd4_callback/etc.

Refcount is decremented if vfs_lock_file() returns FILE_LOCK_DEFERRED,
because nbl can be handled already by nfs4_laundromat/nfsd4_callback/etc.

Refcount is not changed in find_blocked_lock() because of it reuses counter
released after removing nbl from lists.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08 14:42:01 -05:00
J. Bruce Fields
40595cdc93 nfs: block notification on fs with its own ->lock
NFSv4.1 supports an optional lock notification feature which notifies
the client when a lock comes available.  (Normally NFSv4 clients just
poll for locks if necessary.)  To make that work, we need to request a
blocking lock from the filesystem.

We turned that off for NFS in commit f657f8eef3 ("nfs: don't atempt
blocking locks on nfs reexports") [sic] because it actually blocks the
nfsd thread while waiting for the lock.

Thanks to Vasily Averin for pointing out that NFS isn't the only
filesystem with that problem.

Any filesystem that leaves ->lock NULL will use posix_lock_file(), which
does the right thing.  Simplest is just to assume that any filesystem
that defines its own ->lock is not safe to request a blocking lock from.

So, this patch mostly reverts commit f657f8eef3 ("nfs: don't atempt
blocking locks on nfs reexports") [sic] and commit b840be2f00 ("lockd:
don't attempt blocking locks on nfs reexports"), and instead uses a
check of ->lock (Vasily's suggestion) to decide whether to support
blocking lock notifications on a given filesystem.  Also add a little
documentation.

Perhaps someday we could add back an export flag later to allow
filesystems with "good" ->lock methods to support blocking lock
notifications.

Reported-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[ cel: Description rewritten to address checkpatch nits ]
[ cel: Fixed warning when SUNRPC debugging is disabled ]
[ cel: Fixed NULL check ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Vasily Averin <vvs@virtuozzo.com>
2022-01-08 14:42:01 -05:00
Chuck Lever
cd2e999c7c NFSD: De-duplicate nfsd4_decode_bitmap4()
Clean up. Trond points out that xdr_stream_decode_uint32_array()
does the same thing as nfsd4_decode_bitmap4().

Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08 14:42:01 -05:00
J. Bruce Fields
3dcd1d8aab nfsd: improve stateid access bitmask documentation
The use of the bitmaps is confusing.  Add a cross-reference to make it
easier to find the existing comment.  Add an updated reference with URL
to make it quicker to look up.  And a bit more editorializing about the
value of this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08 14:42:01 -05:00
Chuck Lever
70e94d757b NFSD: Combine XDR error tracepoints
Clean up: The garbage_args and cant_encode tracepoints report the
same information as each other, so combine them into a single
tracepoint class to reduce code duplication and slightly reduce the
size of trace.o.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08 14:41:11 -05:00
NeilBrown
1463b38e7c NFSD: simplify per-net file cache management
We currently have a 'laundrette' for closing cached files - a different
work-item for each network-namespace.

These 'laundrettes' (aka struct nfsd_fcache_disposal) are currently on a
list, and are freed using rcu.

The list is not necessary as we have a per-namespace structure (struct
nfsd_net) which can hold a link to the nfsd_fcache_disposal.
The use of kfree_rcu is also unnecessary as the cache is cleaned of all
files associated with a given namespace, and no new files can be added,
before the nfsd_fcache_disposal is freed.

So add a '->fcache_disposal' link to nfsd_net, and discard the list
management and rcu usage.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:59 -05:00
Jiapeng Chong
1e37d0e5bd NFSD: Fix inconsistent indenting
Eliminate the follow smatch warning:

fs/nfsd/nfs4xdr.c:4766 nfsd4_encode_read_plus_hole() warn: inconsistent
indenting.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:59 -05:00
Chuck Lever
7578b2f628 NFSD: Remove be32_to_cpu() from DRC hash function
Commit 7142b98d9f ("nfsd: Clean up drc cache in preparation for
global spinlock elimination"), billed as a clean-up, added
be32_to_cpu() to the DRC hash function without explanation. That
commit removed two comments that state that byte-swapping in the
hash function is unnecessary without explaining whether there was
a need for that change.

On some Intel CPUs, the swab32 instruction is known to cause a CPU
pipeline stall. be32_to_cpu() does not add extra randomness, since
the hash multiplication is done /before/ shifting to the high-order
bits of the result.

As a micro-optimization, remove the unnecessary transform from the
DRC hash function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:59 -05:00
NeilBrown
23a1a573c6 NFS: switch the callback service back to non-pooled.
Now that thread management is consistent there is no need for
nfs-callback to use svc_create_pooled() as introduced in Commit
df807fffaa ("NFSv4.x/callback: Create the callback service through
svc_create_pooled").  So switch back to svc_create().

If service pools were configured, but the number of threads were left at
'1', nfs callback may not work reliably when svc_create_pooled() is used.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:58 -05:00
NeilBrown
6b044fbaab lockd: use svc_set_num_threads() for thread start and stop
svc_set_num_threads() does everything that lockd_start_svc() does, except
set sv_maxconn.  It also (when passed 0) finds the threads and
stops them with kthread_stop().

So move the setting for sv_maxconn, and use svc_set_num_thread()

We now don't need nlmsvc_task.

Now that we use svc_set_num_threads() it makes sense to set svo_module.
This request that the thread exists with module_put_and_exit().
Also fix the documentation for svo_module to make this explicit.

svc_prepare_thread is now only used where it is defined, so it can be
made static.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:58 -05:00
NeilBrown
ecd3ad68d2 lockd: rename lockd_create_svc() to lockd_get()
lockd_create_svc() already does an svc_get() if the service already
exists, so it is more like a "get" than a "create".

So:
 - Move the increment of nlmsvc_users into the function as well
 - rename to lockd_get().

It is now the inverse of lockd_put().

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:56 -05:00
NeilBrown
865b674069 lockd: introduce lockd_put()
There is some cleanup that is duplicated in lockd_down() and the failure
path of lockd_up().
Factor these out into a new lockd_put() and call it from both places.

lockd_put() does *not* take the mutex - that must be held by the caller.
It decrements nlmsvc_users and if that reaches zero, it cleans up.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:56 -05:00
NeilBrown
6a4e2527a6 lockd: move svc_exit_thread() into the thread
The normal place to call svc_exit_thread() is from the thread itself
just before it exists.
Do this for lockd.

This means that nlmsvc_rqst is not used out side of lockd_start_svc(),
so it can be made local to that function, and renamed to 'rqst'.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:55 -05:00
NeilBrown
b73a297204 lockd: move lockd_start_svc() call into lockd_create_svc()
lockd_start_svc() only needs to be called once, just after the svc is
created.  If the start fails, the svc is discarded too.

It thus makes sense to call lockd_start_svc() from lockd_create_svc().
This allows us to remove the test against nlmsvc_rqst at the start of
lockd_start_svc() - it must always be NULL.

lockd_up() only held an extra reference on the svc until a thread was
created - then it dropped it.  The thread - and thus the extra reference
- will remain until kthread_stop() is called.
Now that the thread is created in lockd_create_svc(), the extra
reference can be dropped there.  So the 'serv' variable is no longer
needed in lockd_up().

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:55 -05:00
NeilBrown
5a8a7ff574 lockd: simplify management of network status notifiers
Now that the network status notifiers use nlmsvc_serv rather then
nlmsvc_rqst the management can be simplified.

Notifier unregistration synchronises with any pending notifications so
providing we unregister before nlm_serv is freed no further interlock
is required.

So we move the unregister call to just before the thread is killed
(which destroys the service) and just before the service is destroyed in
the failure-path of lockd_up().

Then nlm_ntf_refcnt and nlm_ntf_wq can be removed.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:55 -05:00
NeilBrown
2840fe864c lockd: introduce nlmsvc_serv
lockd has two globals - nlmsvc_task and nlmsvc_rqst - but mostly it
wants the 'struct svc_serv', and when it doesn't want it exactly it can
get to what it wants from the serv.

This patch is a first step to removing nlmsvc_task and nlmsvc_rqst.  It
introduces nlmsvc_serv to store the 'struct svc_serv*'.  This is set as
soon as the serv is created, and cleared only when it is destroyed.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:54 -05:00
NeilBrown
d057cfec49 NFSD: simplify locking for network notifier.
nfsd currently maintains an open-coded read/write semaphore (refcount
and wait queue) for each network namespace to ensure the nfs service
isn't shut down while the notifier is running.

This is excessive.  As there is unlikely to be contention between
notifiers and they run without sleeping, a single spinlock is sufficient
to avoid problems.

Signed-off-by: NeilBrown <neilb@suse.de>
[ cel: ensure nfsd_notifier_lock is static ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:54 -05:00
NeilBrown
3ebdbe5203 SUNRPC: discard svo_setup and rename svc_set_num_threads_sync()
The ->svo_setup callback serves no purpose.  It is always called from
within the same module that chooses which callback is needed.  So
discard it and call the relevant function directly.

Now that svc_set_num_threads() is no longer used remove it and rename
svc_set_num_threads_sync() to remove the "_sync" suffix.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:53 -05:00
NeilBrown
3409e4f1e8 NFSD: Make it possible to use svc_set_num_threads_sync
nfsd cannot currently use svc_set_num_threads_sync.  It instead
uses svc_set_num_threads which does *not* wait for threads to all
exit, and has a separate mechanism (nfsd_shutdown_complete) to wait
for completion.

The reason that nfsd is unlike other services is that nfsd threads can
exit separately from svc_set_num_threads being called - they die on
receipt of SIGKILL.  Also, when the last thread exits, the service must
be shut down (sockets closed).

For this, the nfsd_mutex needs to be taken, and as that mutex needs to
be held while svc_set_num_threads is called, the one cannot wait for
the other.

This patch changes the nfsd thread so that it can drop the ref on the
service without blocking on nfsd_mutex, so that svc_set_num_threads_sync
can be used:
 - if it can drop a non-last reference, it does that.  This does not
   trigger shutdown and does not require a mutex.  This will likely
   happen for all but the last thread signalled, and for all threads
   being shut down by nfsd_shutdown_threads()
 - if it can get the mutex without blocking (trylock), it does that
   and then drops the reference.  This will likely happen for the
   last thread killed by SIGKILL
 - Otherwise there might be an unrelated task holding the mutex,
   possibly in another network namespace, or nfsd_shutdown_threads()
   might be just about to get a reference on the service, after which
   we can drop ours safely.
   We cannot conveniently get wakeup notifications on these events,
   and we are unlikely to need to, so we sleep briefly and check again.

With this we can discard nfsd_shutdown_complete and
nfsd_complete_shutdown(), and switch to svc_set_num_threads_sync.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:53 -05:00
NeilBrown
9d3792aefd NFSD: narrow nfsd_mutex protection in nfsd thread
There is nothing happening in the start of nfsd() that requires
protection by the mutex, so don't take it until shutting down the thread
- which does still require protection - but only for nfsd_put().

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:52 -05:00
NeilBrown
2a36395fac SUNRPC: use sv_lock to protect updates to sv_nrthreads.
Using sv_lock means we don't need to hold the service mutex over these
updates.

In particular,  svc_exit_thread() no longer requires synchronisation, so
threads can exit asynchronously.

Note that we could use an atomic_t, but as there are many more read
sites than writes, that would add unnecessary noise to the code.
Some reads are already racy, and there is no need for them to not be.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:52 -05:00
NeilBrown
9b6c8c9beb nfsd: make nfsd_stats.th_cnt atomic_t
This allows us to move the updates for th_cnt out of the mutex.
This is a step towards reducing mutex coverage in nfsd().

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:51 -05:00
NeilBrown
ec52361df9 SUNRPC: stop using ->sv_nrthreads as a refcount
The use of sv_nrthreads as a general refcount results in clumsy code, as
is seen by various comments needed to explain the situation.

This patch introduces a 'struct kref' and uses that for reference
counting, leaving sv_nrthreads to be a pure count of threads.  The kref
is managed particularly in svc_get() and svc_put(), and also nfsd_put();

svc_destroy() now takes a pointer to the embedded kref, rather than to
the serv.

nfsd allows the svc_serv to exist with ->sv_nrhtreads being zero.  This
happens when a transport is created before the first thread is started.
To support this, a 'keep_active' flag is introduced which holds a ref on
the svc_serv.  This is set when any listening socket is successfully
added (unless there are running threads), and cleared when the number of
threads is set.  So when the last thread exits, the nfs_serv will be
destroyed.
The use of 'keep_active' replaces previous code which checked if there
were any permanent sockets.

We no longer clear ->rq_server when nfsd() exits.  This was done
to prevent svc_exit_thread() from calling svc_destroy().
Instead we take an extra reference to the svc_serv to prevent
svc_destroy() from being called.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:51 -05:00
NeilBrown
8c62d12740 SUNRPC/NFSD: clean up get/put functions.
svc_destroy() is poorly named - it doesn't necessarily destroy the svc,
it might just reduce the ref count.
nfsd_destroy() is poorly named for the same reason.

This patch:
 - removes the refcount functionality from svc_destroy(), moving it to
   a new svc_put().  Almost all previous callers of svc_destroy() now
   call svc_put().
 - renames nfsd_destroy() to nfsd_put() and improves the code, using
   the new svc_destroy() rather than svc_put()
 - removes a few comments that explain the important for balanced
   get/put calls.  This should be obvious.

The only non-trivial part of this is that svc_destroy() would call
svc_sock_update() on a non-final decrement.  It can no longer do that,
and svc_put() isn't really a good place of it.  This call is now made
from svc_exit_thread() which seems like a good place.  This makes the
call *before* sv_nrthreads is decremented rather than after.  This
is not particularly important as the call just sets a flag which
causes sv_nrthreads set be checked later.  A subsequent patch will
improve the ordering.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:50 -05:00
NeilBrown
df5e49c880 SUNRPC: change svc_get() to return the svc.
It is common for 'get' functions to return the object that was 'got',
and there are a couple of places where users of svc_get() would be a
little simpler if svc_get() did that.

Make it so.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:50 -05:00
NeilBrown
89b24336f0 NFSD: handle errors better in write_ports_addfd()
If write_ports_add() fails, we shouldn't destroy the serv, unless we had
only just created it.  So if there are any permanent sockets already
attached, leave the serv in place.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:49 -05:00
Chuck Lever
c2f1c4bd20 NFSD: Fix sparse warning
/home/cel/src/linux/linux/fs/nfsd/nfs4proc.c:1539:24: warning: incorrect type in assignment (different base types)
/home/cel/src/linux/linux/fs/nfsd/nfs4proc.c:1539:24:    expected restricted __be32 [usertype] status
/home/cel/src/linux/linux/fs/nfsd/nfs4proc.c:1539:24:    got int

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:49 -05:00
Linus Torvalds
e034d9cbf9 Fixes for 5.16-rc4:
- Fix a data corruption vector that can result from the ro remount
    process failing to clear all speculative preallocations from files
    and the rw remount process not noticing the incomplete cleanup.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmGyQ6UACgkQ+H93GTRK
 tOskJA/+Jg+fn+Vs7dHwBN+SaHCBH81RT+YD9gbI7Lq5xcK16K7BWz3Hf360J9uJ
 BpVn0fv6/+3bNg0/QMWUE67CotB2S43CzWnltUN0ymiGgAEO1eMKL1JHRtZ/89d4
 pG2Lr9W1QsxMqP9L/TgxqHb5WC2xtromDLMWa7qYL8aH4qp4Dx6hv++u3qhbdTis
 sfvTOMsDKTwK5RC+7yB2qxr/Txn7viIlBLjTWR3UvvO0K1LxjuBTXpBsiVeQE92T
 ZqLgAhWDBu/wZFBibqtMsnLF0Wmn5oIZJgBoxp/+Ze8saUsuYdAI1X65ESdVmw7E
 ZJtSyZObDHOlRhvDn7Vlb8BNF7QY4l4n2eeGFB4zoPdy0jjz/7QKCqMSm7YCMXqc
 WycjeEIo1H+ZEk0AWu7KrTUAaO3zzqHB/DJt4zQA/w4QdSD14bAsyp8ZWyUEMnWy
 7M0Pc0q2ptzOOp1rjd2J+0s9YSc24/dceW1lCDpIDSkOuxO82HdCxlWQ/YPEt2Lx
 gGv2hll8Cg0j18VThB++XZ3ozImFu+sZhkQPPCM/kKY0tWZvd4Yby1bWLNzS+ppE
 RIKh+hyhFTXmr0C2iJ8fGNFu/KA3V0efztvCIATHDmGcC6f4WePUQyLzZGFgS0Yw
 uBPmvqWeUWySgfauIvqW7Oosf0pbKvYVEd+1fcLVdh0sgr2MUPE=
 =N9Wf
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.16-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fix from Darrick Wong:
 "This fixes a race between a readonly remount process and other
  processes that hold a file IOLOCK on files that previously experienced
  copy on write, that could result in severe filesystem corruption if
  the filesystem is then remounted rw.

  I think this is fairly rare (since the only reliable reproducer I have
  that fits the second criteria is the experimental xfs_scrub program),
  but the race is clear, so we still need to fix this.

  Summary:

   - Fix a data corruption vector that can result from the ro remount
     process failing to clear all speculative preallocations from files
     and the rw remount process not noticing the incomplete cleanup"

* tag 'xfs-5.16-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: remove all COW fork extents when remounting readonly
2021-12-11 16:21:06 -08:00
Linus Torvalds
f152165ada io_uring-5.16-2021-12-10
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmG0PwYQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgphbiEACv95PrGVuOzgXwC1Ydk4Km5Yyt8vANKcxA
 LFzeScrRhGG28vHCLZuPSxtFn9nGjBJPDKscDWQE0tpq8NwB6BAjr3Ps2AR4tI3M
 2bn1ssHIHADdkaV1ixwTtQynHgCvOtKSTmLJ+TTDIYdgwAyRVlRBtpke/nXn5VCR
 kUdQ/3hHSPbEY+WvrDz7yAvDIMxTrBHmt8TbDiI1FXzNf8AOd32ekpBNzwe2hTQi
 OLhpaQ60S74UG78c4OGg6SV+twW0nkp1DyTf8MhC3WoBBnTwQdc/BpLNJbnUwH57
 lJVJTz0sknbnQCtifJ8/Wjzn6r95cOwbYvaGgXyi7KtTGn0QXLcK+FohNVizFR0x
 qTMWlFJ5aPmzqed6oMZ2m8iaGxR9813I0GP54Hsqw4PmoPb1GDN5FvZIN+MKdaJd
 PSp9YH7RDHsusYZjjaUtHIcxhxztblf5wNVAkmkfrFZzNArnwsw0k9gX7DI0f+GX
 +TazZzX8gLRonAskBHgoSDT+8WOzDOHYYIVVe0g/8TR+jW8oGDd9D0sJbH74dV1Y
 3C4WLbJCquBzatotmcmgkt9CJzZz9y373cVVNu4HZgH49qjEZCJuFgBaTpanvwHk
 Qk5KTl9yIfjs+H2FDWa2Pd6LnsVvuIpwOd4R37RDLGM2ZiX100eX1VbD6RxtIehQ
 PPKP13kYAQ==
 =BUVd
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.16-2021-12-10' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "A few fixes that are all bound for stable:

   - Two syzbot reports for io-wq that turned out to be separate fixes,
     but ultimately very closely related

   - io_uring task_work running on cancelations"

* tag 'io_uring-5.16-2021-12-10' of git://git.kernel.dk/linux-block:
  io-wq: check for wq exit after adding new worker task_work
  io_uring: ensure task_work gets run as part of cancelations
  io-wq: remove spurious bit clear on task_work addition
2021-12-11 09:19:44 -08:00
Linus Torvalds
6f51352929 for-5.16-rc4-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmGwxl8ACgkQxWXV+ddt
 WDvSeQ//T+mv4o7ucldQZCVdN0TTVCkQUhia+ZdMwBcPty2/ZEdap+KEIVmfCV/v
 OLRmSNkIPDhHcIc/O3zJ1/AY0DFbb9brYGkMD/qidgPbRArhDZSrDIr+xnrKJ0iq
 HFxM01B54l8hJe6GWIGFuuOz+nXUP1o9SfiDOwMDTkqgzz1JSvPec70RKMxG8pTC
 4plVrGaUXkKTC8WyBXnSvkP2gvjfJxqnEKv2Ru1eP1t7Bq65aed0+Z32Mogzl2ip
 ZSHVlRergeB03xF9YErWSfgofVWLEIXgLCh/Wfq73sMnHUHUGM/dpEKxP911MI00
 A5TuTl3I25cVnDv3qMayBPECMCCGZc2vUyXTLCWUNYLb/vEUTI36QBu/KglRURmO
 Zx20B+3/7Yu9pFeZ23S+nlNdGDADmjkOfvZIZDoYDzqBAKWM6kZ/9oWiib21Uwi6
 ql5oZNl7G6UXLPvgJoq8dqZIj/HYeLEHeqwf/tepSVQLXYzQyWYDMp686z958XI1
 K/A/TKaKk19nn1Dhsz4KJeb3xhMFlFN30K7BNjdH3XRH1RrwJP6pLF/WUduJq2qn
 rAvJoODLHkby1D9+HqSKW/RToeR5NbBYYdfB0Q8zpy9zF/gZX7wgFc21/4B7qtib
 hkKfNSfAVjedOJJWTFCUDnZwwrqtcrKxYzUpdcls/O1AjifK7+U=
 =Mh19
 -----END PGP SIGNATURE-----

Merge tag 'for-5.16-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few more regression fixes and stable patches, mostly one-liners.

  Regression fixes:

   - fix pointer/ERR_PTR mismatch returned from memdup_user

   - reset dedicated zoned mode relocation block group to avoid using it
     and filling it without any recourse

  Fixes:

   - handle a case to FITRIM range (also to make fstests/generic/260
     work)

   - fix warning when extent buffer state and pages get out of sync
     after an IO error

   - fix transaction abort when syncing due to missing mapping error set
     on metadata inode after inlining a compressed file

   - fix transaction abort due to tree-log and zoned mode interacting in
     an unexpected way

   - fix memory leak of additional extent data when qgroup reservation
     fails

   - do proper handling of slot search call when deleting root refs"

* tag 'for-5.16-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: replace the BUG_ON in btrfs_del_root_ref with proper error handling
  btrfs: zoned: clear data relocation bg on zone finish
  btrfs: free exchange changeset on failures
  btrfs: fix re-dirty process of tree-log nodes
  btrfs: call mapping_set_error() on btree inode with a write error
  btrfs: clear extent buffer uptodate when we fail to write it
  btrfs: fail if fstrim_range->start == U64_MAX
  btrfs: fix error pointer dereference in btrfs_ioctl_rm_dev_v2()
2021-12-10 17:28:02 -08:00
Linus Torvalds
e1b96811e2 two cifs/smb3 fixes, one for stable, the other fixes a recently reported NTLMSSP auth problem
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmGzuJoACgkQiiy9cAdy
 T1H9MAv/cOAk4iUhgDUsa+HYsBNiAQ+IOOu/WOZI56jEV/qVP0I3cpkiIJenG5gu
 ZurivI4smsgNokHOFoT3vjtLXTfTl0OdHHY/mftd5IGIPG+KnXcg3+ZaE3T+fUV3
 uHX0cH3a3Azo3RGf2rRiTfW6u9FXJnb9aAUTif7UDVwsU37wUAQrKEmcazaUXdaT
 +j3KpqwGhCgbkneKuAd/FDTAg4wciJgRg3aE/4W2s2ovFiF6vsUcrmhQD1zP1EZh
 sPWdx4/U+WpeV02RfKBLQlXi6ofqRF5qRT4HkD07G5Zhz8OOZcm06wYFNl/+aYhF
 lTOTa9KAoTPfV2tFlTGZVEy07ggEFd3wDxeAPulDrqY8etwWSwRAHHRi5HstMlIX
 iYz06DLS+LSPyriy6GV6CIL01OPg8vkeLPbrLFbs8oLSVQgmx74UDS339oxJMdFe
 kkhLgtfRr5DfxS3uD0u17aBDL4ullH84dWdlsUKiUfGvVsBcoVSCPIQeWqkB8xjA
 OTKih3J1
 =+1L9
 -----END PGP SIGNATURE-----

Merge tag '5.16-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Two cifs/smb3 fixes - one for stable, the other fixes a recently
  reported NTLMSSP auth problem"

* tag '5.16-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix ntlmssp auth when there is no key exchange
  cifs: Fix crash on unload of cifs_arc4.ko
2021-12-10 17:24:57 -08:00
Linus Torvalds
e80bdc5ed0 Fix a race on startup and another in the delegation code. The latter
has been around for years, but I suspect recent changes may have
 widened the race window a little, so I'd like to go ahead and get it in.
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAmGzuB8VHGJmaWVsZHNA
 ZmllbGRzZXMub3JnAAoJECebzXlCjuG+rYQP/1N61E3rf2MOfF4sluubaGGi0fVR
 cx3w8pTDSALfxAiz0ZTA6hJbZN0SR7MZuXMhWDHT/RiG7jFfuMJACU81BWn0cCPJ
 zuJziE3Y6fElGi5Ifp71yMLABqTlFPOUHBZkpzaADqXGwmZk4fecXmqKbtqa9kCU
 1mCQJlZDGU1ljGVYuXU4SaBOzsrCpPfbxby2ZWFDs8bu5cGKDTq9OSVF1LaGzjMN
 2+v0y2Qni+dRfTBCGAmxF8AukRa2r5i2rfHRatgKQO8PdROv/pguy1hYjjeeofhX
 /Jlq2RIVl1/9/Fwar4Cew2+fpQhpaY62wUuo4KG4PK5WZ92LGDjSpYqvu/I4lI6z
 BXdge0HzDPRnpZtD+tu4IJ7nQwG+JnO2Ws91rPxwT6DDX4zX9gNZc+7EGevVeXJI
 XTO+61Tyz+7/6YLR8UvnvyER9KvKjQiUWZkO81LZpuAAwn1PE3rahD48jxkmwwYE
 m5EvfuA6YR8oyk3Ml+nZNuKwgrNqPcvohyJz4QHpNpqmvOlVrb9I1k90JzIm8D9Q
 09M8An1Z3AGw76+RMnvBnWewn1Nx1bKIS32SJk9x3zBoeyXLS17G6FmpEhW81xvN
 LoToRqD/GNx+4AtfvkHqwv2/Qd1rvONCsjbZv+2P0DAj8ZEOWn0YHk18WlvAIZUg
 4ipyZbwMUbXeytKP
 =GgUu
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.16-2' of git://linux-nfs.org/~bfields/linux

Pull nfsd fixes from Bruce Fields:
 "Fix a race on startup and another in the delegation code.

  The latter has been around for years, but I suspect recent changes may
  have widened the race window a little, so I'd like to go ahead and get
  it in"

* tag 'nfsd-5.16-2' of git://linux-nfs.org/~bfields/linux:
  nfsd: fix use-after-free due to delegation race
  nfsd: Fix nsfd startup race (again)
2021-12-10 17:17:53 -08:00
Linus Torvalds
257dcf2923 tracing, ftrace and tracefs fixes:
- Have tracefs honor the gid mount option
 
  - Have new files in tracefs inherit the parent ownership
 
  - Have direct_ops unregister when it has no more functions
 
  - Properly clean up the ops when unregistering multi direct ops
 
  - Add a sample module to test the multiple direct ops
 
  - Fix memory leak in error path of __create_synth_event()
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYbOgPBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qgOtAP0YD+cRLxnRKA376oQVB8zmuZ3mZ/4x
 6M1hqruSDlno3AEA19PyHpxl7flFwnBb6Gnfo9VeefcMS5ENDH9p1aHX4wU=
 =Tr6t
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Tracing, ftrace and tracefs fixes:

   - Have tracefs honor the gid mount option

   - Have new files in tracefs inherit the parent ownership

   - Have direct_ops unregister when it has no more functions

   - Properly clean up the ops when unregistering multi direct ops

   - Add a sample module to test the multiple direct ops

   - Fix memory leak in error path of __create_synth_event()"

* tag 'trace-v5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix possible memory leak in __create_synth_event() error path
  ftrace/samples: Add module to test multi direct modify interface
  ftrace: Add cleanup to unregister_ftrace_direct_multi
  ftrace: Use direct_ops hash in unregister_ftrace_direct
  tracefs: Set all files to the same group ownership as the mount option
  tracefs: Have new files inherit the ownership of their parent
2021-12-10 14:24:05 -08:00
Linus Torvalds
0d21e66847 aio poll fixes for 5.16-rc5
Fix three bugs in aio poll, and one issue with POLLFREE more broadly:
 
   - aio poll didn't handle POLLFREE, causing a use-after-free.
   - aio poll could block while the file is ready.
   - aio poll called eventfd_signal() when it isn't allowed.
   - POLLFREE didn't handle multiple exclusive waiters correctly.
 
 This has been tested with the libaio test suite, as well as with test
 programs I wrote that reproduce the first two bugs.  I am sending this
 pull request myself as no one seems to be maintaining this code.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCYbObthQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK+3mAQC9W8ApzBleEPI6FXzIIo5AiQT/2jGl
 7FbO1MtkdUBU4QEAzf+VWl4Z4BJTgxl44avRdVDpXGAMnbWkd7heY+e3HwA=
 =mp+r
 -----END PGP SIGNATURE-----

Merge tag 'aio-poll-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull aio poll fixes from Eric Biggers:
 "Fix three bugs in aio poll, and one issue with POLLFREE more broadly:

   - aio poll didn't handle POLLFREE, causing a use-after-free.

   - aio poll could block while the file is ready.

   - aio poll called eventfd_signal() when it isn't allowed.

   - POLLFREE didn't handle multiple exclusive waiters correctly.

  This has been tested with the libaio test suite, as well as with test
  programs I wrote that reproduce the first two bugs. I am sending this
  pull request myself as no one seems to be maintaining this code"

* tag 'aio-poll-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  aio: Fix incorrect usage of eventfd_signal_allowed()
  aio: fix use-after-free due to missing POLLFREE handling
  aio: keep poll requests on waitqueue until completed
  signalfd: use wake_up_pollfree()
  binder: use wake_up_pollfree()
  wait: add wake_up_pollfree()
2021-12-10 14:15:39 -08:00