Commit Graph

77968 Commits

Author SHA1 Message Date
Dai Ngo
7746b32f46 NFSD: add shrinker to reap courtesy clients on low memory condition
Add courtesy_client_reaper to react to low memory condition triggered
by the system memory shrinker.

The delayed_work for the courtesy_client_reaper is scheduled on
the shrinker's count callback using the laundry_wq.

The shrinker's scan callback is not used for expiring the courtesy
clients due to potential deadlocks.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:41 -04:00
Dai Ngo
3a4ea23d86 NFSD: keep track of the number of courtesy clients in the system
Add counter nfs4_courtesy_client_count to nfsd_net to keep track
of the number of courtesy clients in the system.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:41 -04:00
Anna Schumaker
06981d5606 NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data
This was discussed with Chuck as part of this patch set. Returning
nfserr_resource was decided to not be the best error message here, and
he suggested changing to nfserr_serverfault instead.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Link: https://lore.kernel.org/linux-nfs/20220907195259.926736-1-anna@kernel.org/T/#t
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:41 -04:00
Chuck Lever
5f5f8b6d65 NFSD: Make nfsd4_remove() wait before returning NFS4ERR_DELAY
nfsd_unlink() can kick off a CB_RECALL (via
vfs_unlink() -> leases_conflict()) if a delegation is present.
Before returning NFS4ERR_DELAY, give the client holding that
delegation a chance to return it and then retry the nfsd_unlink()
again, once.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354
Tested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-09-26 14:02:41 -04:00
Chuck Lever
68c522afd0 NFSD: Make nfsd4_rename() wait before returning NFS4ERR_DELAY
nfsd_rename() can kick off a CB_RECALL (via
vfs_rename() -> leases_conflict()) if a delegation is present.
Before returning NFS4ERR_DELAY, give the client holding that
delegation a chance to return it and then retry the nfsd_rename()
again, once.

This version of the patch handles renaming an existing file,
but does not deal with renaming onto an existing file. That
case will still always trigger an NFS4ERR_DELAY.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354
Tested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-09-26 14:02:40 -04:00
Chuck Lever
34b91dda71 NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY
nfsd_setattr() can kick off a CB_RECALL (via
notify_change() -> break_lease()) if a delegation is present. Before
returning NFS4ERR_DELAY, give the client holding that delegation a
chance to return it and then retry the nfsd_setattr() again, once.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354
Tested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-09-26 14:02:36 -04:00
Chuck Lever
c0aa1913db NFSD: Refactor nfsd_setattr()
Move code that will be retried (in a subsequent patch) into a helper
function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-09-26 14:02:32 -04:00
Chuck Lever
c035362eb9 NFSD: Add a mechanism to wait for a DELEGRETURN
Subsequent patches will use this mechanism to wake up an operation
that is waiting for a client to return a delegation.

The new tracepoint records whether the wait timed out or was
properly awoken by the expected DELEGRETURN:

            nfsd-1155  [002] 83799.493199: nfsd_delegret_wakeup: xid=0x14b7d6ef fh_hash=0xf6826792 (timed out)

Suggested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-09-26 14:02:32 -04:00
Chuck Lever
1035d65446 NFSD: Add tracepoints to report NFSv4 callback completions
Wireshark has always been lousy about dissecting NFSv4 callbacks,
especially NFSv4.0 backchannel requests. Add tracepoints so we
can surgically capture these events in the trace log.

Tracepoints are time-stamped and ordered so that we can now observe
the timing relationship between a CB_RECALL Reply and the client's
DELEGRETURN Call. Example:

            nfsd-1153  [002]   211.986391: nfsd_cb_recall:       addr=192.168.1.67:45767 client 62ea82e4:fee7492a stateid 00000003:00000001

            nfsd-1153  [002]   212.095634: nfsd_compound:        xid=0x0000002c opcnt=2
            nfsd-1153  [002]   212.095647: nfsd_compound_status: op=1/2 OP_PUTFH status=0
            nfsd-1153  [002]   212.095658: nfsd_file_put:        hash=0xf72 inode=0xffff9291148c7410 ref=3 flags=HASHED|REFERENCED may=READ file=0xffff929103b3ea00
            nfsd-1153  [002]   212.095661: nfsd_compound_status: op=2/2 OP_DELEGRETURN status=0
   kworker/u25:8-148   [002]   212.096713: nfsd_cb_recall_done:  client 62ea82e4:fee7492a stateid 00000003:00000001 status=0

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-09-26 14:02:32 -04:00
Chuck Lever
de29cf7e6c NFSD: Trace NFSv4 COMPOUND tags
The Linux NFSv4 client implementation does not use COMPOUND tags,
but the Solaris and MacOS implementations do, and so does pynfs.
Record these eye-catchers in the server's trace buffer to annotate
client requests while troubleshooting.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-09-26 14:02:31 -04:00
Chuck Lever
948755efc9 NFSD: Replace dprintk() call site in fh_verify()
Record permission errors in the trace log. Note that the new trace
event is conditional, so it will only record non-zero return values
from nfsd_permission().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-09-26 14:02:31 -04:00
Gaosheng Cui
18224dc58d nfsd: remove nfsd4_prepare_cb_recall() declaration
nfsd4_prepare_cb_recall() has been removed since
commit 0162ac2b97 ("nfsd: introduce nfsd4_callback_ops"),
so remove it.

Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:31 -04:00
Jeff Layton
6106d9119b nfsd: clean up mounted_on_fileid handling
We only need the inode number for this, not a full rack of attributes.
Rename this function make it take a pointer to a u64 instead of
struct kstat, and change it to just request STATX_INO.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
[ cel: renamed get_mounted_on_ino() ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:30 -04:00
Chuck Lever
7518a3dc5e NFSD: Fix handling of oversized NFSv4 COMPOUND requests
If an NFS server returns NFS4ERR_RESOURCE on the first operation in
an NFSv4 COMPOUND, there's no way for a client to know where the
problem is and then simplify the compound to make forward progress.

So instead, make NFSD process as many operations in an oversized
COMPOUND as it can and then return NFS4ERR_RESOURCE on the first
operation it did not process.

pynfs NFSv4.0 COMP6 exercises this case, but checks only for the
COMPOUND status code, not whether the server has processed any
of the operations.

pynfs NFSv4.1 SEQ6 and SEQ7 exercise the NFSv4.1 case, which detects
too many operations per COMPOUND by checking against the limits
negotiated when the session was created.

Suggested-by: Bruce Fields <bfields@fieldses.org>
Fixes: 0078117c6d ("nfsd: return RESOURCE not GARBAGE_ARGS on too many ops")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:30 -04:00
NeilBrown
9558f9304c NFSD: drop fname and flen args from nfsd_create_locked()
nfsd_create_locked() does not use the "fname" and "flen" arguments, so
drop them from declaration and all callers.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:30 -04:00
Chuck Lever
fa6be9cc6e NFSD: Protect against send buffer overflow in NFSv3 READ
Since before the git era, NFSD has conserved the number of pages
held by each nfsd thread by combining the RPC receive and send
buffers into a single array of pages. This works because there are
no cases where an operation needs a large RPC Call message and a
large RPC Reply at the same time.

Once an RPC Call has been received, svc_process() updates
svc_rqst::rq_res to describe the part of rq_pages that can be
used for constructing the Reply. This means that the send buffer
(rq_res) shrinks when the received RPC record containing the RPC
Call is large.

A client can force this shrinkage on TCP by sending a correctly-
formed RPC Call header contained in an RPC record that is
excessively large. The full maximum payload size cannot be
constructed in that case.

Cc: <stable@vger.kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:30 -04:00
Chuck Lever
401bc1f908 NFSD: Protect against send buffer overflow in NFSv2 READ
Since before the git era, NFSD has conserved the number of pages
held by each nfsd thread by combining the RPC receive and send
buffers into a single array of pages. This works because there are
no cases where an operation needs a large RPC Call message and a
large RPC Reply at the same time.

Once an RPC Call has been received, svc_process() updates
svc_rqst::rq_res to describe the part of rq_pages that can be
used for constructing the Reply. This means that the send buffer
(rq_res) shrinks when the received RPC record containing the RPC
Call is large.

A client can force this shrinkage on TCP by sending a correctly-
formed RPC Call header contained in an RPC record that is
excessively large. The full maximum payload size cannot be
constructed in that case.

Cc: <stable@vger.kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:29 -04:00
Chuck Lever
640f87c190 NFSD: Protect against send buffer overflow in NFSv3 READDIR
Since before the git era, NFSD has conserved the number of pages
held by each nfsd thread by combining the RPC receive and send
buffers into a single array of pages. This works because there are
no cases where an operation needs a large RPC Call message and a
large RPC Reply message at the same time.

Once an RPC Call has been received, svc_process() updates
svc_rqst::rq_res to describe the part of rq_pages that can be
used for constructing the Reply. This means that the send buffer
(rq_res) shrinks when the received RPC record containing the RPC
Call is large.

A client can force this shrinkage on TCP by sending a correctly-
formed RPC Call header contained in an RPC record that is
excessively large. The full maximum payload size cannot be
constructed in that case.

Thanks to Aleksi Illikainen and Kari Hulkko for uncovering this
issue.

Reported-by: Ben Ronallo <Benjamin.Ronallo@synopsys.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:26 -04:00
Chuck Lever
00b4492686 NFSD: Protect against send buffer overflow in NFSv2 READDIR
Restore the previous limit on the @count argument to prevent a
buffer overflow attack.

Fixes: 53b1119a6e ("NFSD: Fix READDIR buffer overflow")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:26 -04:00
Chuck Lever
80e591ce63 NFSD: Increase NFSD_MAX_OPS_PER_COMPOUND
When attempting an NFSv4 mount, a Solaris NFSv4 client builds a
single large COMPOUND that chains a series of LOOKUPs to get to the
pseudo filesystem root directory that is to be mounted. The Linux
NFS server's current maximum of 16 operations per NFSv4 COMPOUND is
not large enough to ensure that this works for paths that are more
than a few components deep.

Since NFSD_MAX_OPS_PER_COMPOUND is mostly a sanity check, and most
NFSv4 COMPOUNDS are between 3 and 6 operations (thus they do not
trigger any re-allocation of the operation array on the server),
increasing this maximum should result in little to no impact.

The ops array can get large now, so allocate it via vmalloc() to
help ensure memory fragmentation won't cause an allocation failure.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216383
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:25 -04:00
Christophe JAILLET
30a30fcc3f nfsd: Propagate some error code returned by memdup_user()
Propagate the error code returned by memdup_user() instead of a hard coded
-EFAULT.

Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:22 -04:00
Christophe JAILLET
d44899b8bb nfsd: Avoid some useless tests
memdup_user() can't return NULL, so there is no point for checking for it.

Simplify some tests accordingly.

Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:21 -04:00
Christophe JAILLET
fd1ef88049 nfsd: Fix a memory leak in an error handling path
If this memdup_user() call fails, the memory allocated in a previous call
a few lines above should be freed. Otherwise it leaks.

Fixes: 6ee95d1c89 ("nfsd: add support for upcall version 2")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:21 -04:00
Jinpeng Cui
4ab3442ca3 NFSD: remove redundant variable status
Return value directly from fh_verify() do_open_permission()
exp_pseudoroot() instead of getting value from
redundant variable status.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Jinpeng Cui <cui.jinpeng2@zte.com.cn>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:21 -04:00
Olga Kornievskaia
754035ff79 NFSD enforce filehandle check for source file in COPY
If the passed in filehandle for the source file in the COPY operation
is not a regular file, the server MUST return NFS4ERR_WRONG_TYPE.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:20 -04:00
Wolfram Sang
97f8e62572 lockd: move from strlcpy with unused retval to strscpy
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.

Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:20 -04:00
Wolfram Sang
72f78ae00a NFSD: move from strlcpy with unused retval to strscpy
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.

Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:20 -04:00
Linus Torvalds
714820c639 four smb3 fixes for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmMkBrUACgkQiiy9cAdy
 T1HweAv/XBCQJzpgObz6TDGBp38lu9DCRIRIZzkSMzuuwXyZhhRfdLrvtiuWHgbw
 A3kzRnhHuigGiWda6vY+IlncTJHomqAntsyVg+9Dj1MoNzGtbOLHYnBAV/4mz5GK
 zAJMp7LaZSSFJTcG9QlsbJvvxfFWBUHI3/feu7mhJBF9vCV2cfyuzJoEsF2A4x2k
 QbfyaVQyyJmKFu+c8Auzwz72scR0Qy98iYvd81DaU3IvTYgtHSbb79zNf02M+BOf
 Ocfl9c6DNawkcuXaLeCy5adScXBzzmmEfcZJvRHIfWZGTTaB1/6lMzABLAukY7RQ
 YWKtxQoVfpKchFUEmlzhEFzQWzZh/3C2lvmIDeINXbB+8+YNGqBTQTu8UtyfPBVI
 Bf+Z0zDpITocnwfjeUhD7fSD6YCpk+jymlBaRbxLGa7NlTWEltK6IITwT7Y4fuy+
 Dx6ev3rpeRSL25kmoJFYwA8/wnofBuO2mNE5FMvK33SO/ByGBY1oAnGghuRq0tJA
 JKUad27E
 =waB8
 -----END PGP SIGNATURE-----

Merge tag '6.0-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Four smb3 fixes for stable:

   - important fix to revalidate mapping when doing direct writes

   - missing spinlock

   - two fixes to socket handling

   - trivial change to update internal version number for cifs.ko"

* tag '6.0-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal module number
  cifs: add missing spinlock around tcon refcount
  cifs: always initialize struct msghdr smb_msg completely
  cifs: don't send down the destination address to sendmsg for a SOCK_STREAM
  cifs: revalidate mapping when doing direct writes
2022-09-16 06:41:44 -07:00
Steve French
8af8aed97b cifs: update internal module number
To 2.39

Signed-off-by: Steve French <stfrench@microsoft.com>
2022-09-14 04:00:06 -05:00
Paulo Alcantara
621a41ae08 cifs: add missing spinlock around tcon refcount
Add missing spinlock to protect updates on tcon refcount in
cifs_put_tcon().

Fixes: d7d7a66aac ("cifs: avoid use of global locks for high contention data")
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-09-14 03:59:51 -05:00
Stefan Metzmacher
bedc8f76b3 cifs: always initialize struct msghdr smb_msg completely
So far we were just lucky because the uninitialized members
of struct msghdr are not used by default on a SOCK_STREAM tcp
socket.

But as new things like msg_ubuf and sg_from_iter where added
recently, we should play on the safe side and avoid potention
problems in future.

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Cc: stable@vger.kernel.org
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-09-13 22:55:45 -05:00
Stefan Metzmacher
17d3df38dc cifs: don't send down the destination address to sendmsg for a SOCK_STREAM
This is ignored anyway by the tcp layer.

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Cc: stable@vger.kernel.org
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-09-13 22:55:15 -05:00
Linus Torvalds
d1221cea11 fix for nfsd regression caused by iov_iter stuff this window
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYx/tpQAKCRBZ7Krx/gZQ
 6wuPAQCrJ18XdncE/VnpRfcpQ+UQJXryURXwijaVBswv3LpM/gEAgQcmFff7pR0P
 ZXzS56F/D0faE2evltbAMQCWUoeapQA=
 =mTrE
 -----END PGP SIGNATURE-----

Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull iov_iter fix from Al Viro:
 "Fix for a nfsd regression caused by the iov_iter stuff this window"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  nfsd_splice_actor(): handle compound pages
2022-09-13 15:11:38 +02:00
Al Viro
bfbfb6182a nfsd_splice_actor(): handle compound pages
pipe_buffer might refer to a compound page (and contain more than a PAGE_SIZE
worth of data).  Theoretically it had been possible since way back, but
nfsd_splice_actor() hadn't run into that until copy_page_to_iter() change.
Fortunately, the only thing that changes for compound pages is that we
need to stuff each relevant subpage in and convert the offset into offset
in the first subpage.

Acked-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: f0f6b614f8 "copy_page_to_iter(): don't split high-order page in case of ITER_PIPE"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-09-12 22:38:36 -04:00
Linus Torvalds
6504d82f44 NFS client bugfixes for Linux 6.0
Highlights include:
 
 Bugfixes
 - Fix SUNRPC call completion races with call_decode() that trigger a
   WARN_ON()
 - NFSv4.0 cannot support open-by-filehandle and NFS re-export
 - Revert "SUNRPC: Remove unreachable error condition" to allow handling
   of error conditions
 - Update suid/sgid mode bits after ALLOCATE and DEALLOCATE
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmMbd3UACgkQZwvnipYK
 APKG2A//RGBrs41UNY8FwEFZ6Q22OU7LzmpC5a1g97d35xHaYhJ7rGNydv3h7r0R
 T4/BzK3jdgH7jSr8mw/8lpCbh0mkSUY6kwu27qk6UVFel9RkKkxgy5VwPl7C8SzF
 u/zimyX3ftVd9h73ZKnKncx6d/88jlKccRvj1a6iAc+bj9jkeA85Gkl8tOXxEvtf
 BSwBpUKcmLtGpof5EPkt+eX9NKph2t1NhV/33CIRErWZpsAMEjhXvNTdca9biX2L
 fpVUyHrBceBqklO+dGIR3l6FeeZWkn/ceMZ043jzKsiufb9WKuy4X2pYHfijVA0W
 ZlMYUUtdrbju762GAfWjK0tiJG6ZieZs8SWQFmg3PjdNQBBHwlxRIwL8L+wOHLa0
 WzFzZy44IeW/7yAFdRp5YeZU9ZisMied3s4w1dWERKtgc8CtQzQTQ9XDTv9/33GC
 OIMl/Wf1Po8CaNLDb7T3FYST0IJQSwuHBSg/1B8+MezhsAhNKhYEy9BdRp2yeVI7
 +rau13ZBLCIaN3IjM0WF+6aqxmJh+euK9Wqs5DF6lsWzbVZQmTYYYv8U5J3lmJY4
 AN/laj+sUQ3gSfmR5tA9QGk4rPhPqKxIGu+kcRyaIaA6FRhJbj+45IkvHtudQvU+
 64/CoesuKDTjvi1UJk1oS/3TQHhogJgOXuY1orxBEBDE1MoCTHk=
 =KudX
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.20-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:

 - Fix SUNRPC call completion races with call_decode() that trigger a
   WARN_ON()

 - NFSv4.0 cannot support open-by-filehandle and NFS re-export

 - Revert "SUNRPC: Remove unreachable error condition" to allow handling
   of error conditions

 - Update suid/sgid mode bits after ALLOCATE and DEALLOCATE

* tag 'nfs-for-5.20-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  Revert "SUNRPC: Remove unreachable error condition"
  NFSv4.2: Update mode bits after ALLOCATE and DEALLOCATE
  NFSv4: Turn off open-by-filehandle and NFS re-export for NFSv4.0
  SUNRPC: Fix call completion races with call_decode()
2022-09-12 17:53:46 -04:00
Linus Torvalds
62d1cea7d6 Address an NFSD regression introduced during the 6.0 merge window
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmMfnhkACgkQM2qzM29m
 f5cgWA/9HLrh55F9y4PEf1ZQFCVZ/PWUGNJlBOXwKfudxiBgOk/LsbC9b9DhQ4de
 KaGJZXs9pxNbOkKVJIPJM8GX4E8jcnWDMGBLI5CPlPk8x4QTL7TKrMg0Lb0Ihqo/
 yA7bVK9NBV0K5vMryx4tBgZEdjOJGCGESlBmAkx/wnsTlt7xuG6vy1KQ//wNY6Q4
 Y+iejoSKa4Dssqu+TC9Hm2VSPK3NJe2I+mmsNFqdFC0hV3/MmIvNzkE7P33qlK+H
 zofquzGziwAfOPGpDznqZi8rYUBiSceknHzoIa521jdwUNl1x1LKJAsYd0kaLH1i
 h3H5mUifouR3nwjexHYE2KYm3yGI+u3Pfupvdqa6kCTnRuw2Xqyh5OogAgHiuEYb
 SyHCatIA3QUOg8zuuh2IaKP8zeWkkVGvBAZ+O7jHytowMH9QWHq7p2lz/he8p6BF
 6Qk+lUKVsGLCdi2HK69lubT6odAJgMu9QAztfCVYgbPH3uxdOb93X9kODcoIDL6e
 T48uzDrTKyk26kRBiP8RNwpltJs12dfMo5W0I/BHI8WUB2EGweRKOPs0e2GAMH8k
 G4EsrJmUiN3CPWGxX/dZ3in6EFU0F+zc1S+MSDGgmp48RruRwq004V6tEylaRArY
 Wg6btK3xktiHCW6OqluKqzNVy4FDQrZfO6xdHMN5PI9jhFWbdP4=
 =fcET
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:
 "Address an NFSD regression introduced during the 6.0 merge window"

* tag 'nfsd-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: fix regression with setting ACLs.
2022-09-12 17:14:38 -04:00
Ronnie Sahlberg
7500a99281 cifs: revalidate mapping when doing direct writes
Kernel bugzilla: 216301

When doing direct writes we need to also invalidate the mapping in case
we have a cached copy of the affected page(s) in memory or else
subsequent reads of the data might return the old/stale content
before we wrote an update to the server.

Cc: stable@vger.kernel.org
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-09-12 13:24:08 -05:00
Linus Torvalds
e35be05d74 Driver core fixes for 6.0-rc5
Here are some small driver core and debugfs fixes for 6.0-rc5.
 
 Included in here are:
   - multiple attempts to get the arch_topology code to work properly on
     non-cluster SMT systems.  First attempt caused build breakages in
     linux-next and 0-day, second try worked.
   - debugfs fixes for a long-suffering memory leak.  The pattern of
     debugfs_remove(debugfs_lookup(...)) turns out to leak dentries, so
     add debugfs_lookup_and_remove() to fix this problem.  Also fix up
     the scheduler debug code that highlighted this problem.  Fixes for
     other subsystems will be trickling in over the next few months for
     this same issue once the debugfs function is merged.
 
 All of these have been in linux-next since Wednesday with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYxuERw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylPqwCgjU6xlN2y/80HH+66k+yyzlxocE8AoLPgnGrA
 dJZIGWFXExzO26tvMT52
 =zGHA
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are some small driver core and debugfs fixes for 6.0-rc5.

  Included in here are:

   - multiple attempts to get the arch_topology code to work properly on
     non-cluster SMT systems. First attempt caused build breakages in
     linux-next and 0-day, second try worked.

   - debugfs fixes for a long-suffering memory leak. The pattern of
     debugfs_remove(debugfs_lookup(...)) turns out to leak dentries, so
     add debugfs_lookup_and_remove() to fix this problem. Also fix up
     the scheduler debug code that highlighted this problem. Fixes for
     other subsystems will be trickling in over the next few months for
     this same issue once the debugfs function is merged.

  All of these have been in linux-next since Wednesday with no reported
  problems"

* tag 'driver-core-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  arch_topology: Make cluster topology span at least SMT CPUs
  sched/debug: fix dentry leak in update_sched_domain_debugfs
  debugfs: add debugfs_lookup_and_remove()
  driver core: fix driver_set_override() issue with empty strings
  Revert "arch_topology: Make cluster topology span at least SMT CPUs"
  arch_topology: Make cluster topology span at least SMT CPUs
2022-09-09 15:08:40 -04:00
Linus Torvalds
9b45094954 for-6.0-rc4-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmMaPukACgkQxWXV+ddt
 WDsWKw/+IcpMsb08sjudn4dtFQ3HSA1E+dOYDzXwUJTS7ZpZhLRniLe1XQwHxe4D
 7DUQA+e1RKGq4+TiznoLhaG/YCCcrLPZL/1aWhwO0M5Wj6BCIxSUa00BJNpxyBMw
 kWb9vQltc5w5zJXHeIr7m2ByzT+YIl0v1lf2GQrJVieHhGiKslfkJHLoJt49oJ0L
 9ka183VR/OCi/3uxUw6NMAjfv+0OGEsFZX/CF8Vo64IKg0I0Q248H4enZt43aDHA
 dQDapAyAr4f6RLDs6ULS2GSzKfZIKMLHlvSeg1BSPyUt/NZFVlC0VwVX0NmwP62a
 5NECYdimlQOGSlaahNEQpLIiyNYboi3Mq7m63BofWduDQanpnM1FByln9JVEizlm
 VuUs3+O0CMp81HecSk3VbSe3ukO2fqAdQjM5cdpRx30TYu7WRiYNE3aHchgLmXLP
 0zw9JV6ePg04Mstx+/3lo8D/X/7fMAT3NrqYmuImoekFWbdJfsiUtgdXNOglT9dt
 6lb1/0jBEbdiXnQ/jT1OreGwSdGZqkEKF4OE26kPRxURyTDESzglNVyhXmshIANC
 qnNuUFGea5d7LbyozYyfdcsQS7rEqLVKmUWrOb/3O/K1947/DegYodnhRwjCUSS7
 iUaetkYUWxHa7U9303KneCUAyLEf1S8NXRPIObL6YIw7D09wato=
 =WD7B
 -----END PGP SIGNATURE-----

Merge tag 'for-6.0-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few more fixes to zoned mode and one regression fix for chunk limit:

    - Zoned mode fixes:
        - fix how wait/wake up is done when finishing zone
        - fix zone append limit in emulated mode
        - fix mount on devices with conventional zones

   - fix regression, user settable data chunk limit got accidentally
     lowered and causes allocation problems on some profiles (raid0,
     raid1)"

* tag 'for-6.0-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix the max chunk size and stripe length calculation
  btrfs: zoned: fix mounting with conventional zones
  btrfs: zoned: set pseudo max append zone limit in zone emulation mode
  btrfs: zoned: fix API misuse of zone finish waiting
2022-09-09 07:54:19 -04:00
Linus Torvalds
460a75a6f7 Tracing fixes and updates for 6.0:
- Do not stop trace events in modules if TAINT_TEST is set
 
 - Do not clobber mount options when tracefs is mounted a second time
 
 - Prevent crash of kprobes in gate area
 
 - Add static annotation to some non global functions
 
 - Add some entries into the MAINTAINERS file
 
 - Fix check of event_mutex held when accessing trigger list
 
 - Add some __init/__exit annotations
 
 - Fix reporting of what called hardirq_{enable,disable}_ip function
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYxpcmRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qsxpAQCJn2syotCcVN15NQc/1+bt3wceVqRK
 nOZXm1o5YnfNfQEAzngl+YFJ6YhBT68Uwz0U9i2hsl4tbc/VXzFfsCxweAQ=
 =sHO9
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Do not stop trace events in modules if TAINT_TEST is set

 - Do not clobber mount options when tracefs is mounted a second time

 - Prevent crash of kprobes in gate area

 - Add static annotation to some non global functions

 - Add some entries into the MAINTAINERS file

 - Fix check of event_mutex held when accessing trigger list

 - Add some __init/__exit annotations

 - Fix reporting of what called hardirq_{enable,disable}_ip function

* tag 'trace-v6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracefs: Only clobber mode/uid/gid on remount if asked
  kprobes: Prohibit probes in gate area
  rv/reactor: add __init/__exit annotations to module init/exit funcs
  tracing: Fix to check event_mutex is held while accessing trigger list
  tracing: hold caller_addr to hardirq_{enable,disable}_ip
  tracepoint: Allow trace events in modules with TAINT_TEST
  MAINTAINERS: add scripts/tracing/ to TRACING
  MAINTAINERS: Add Runtime Verification (RV) entry
  rv/monitors: Make monitor's automata definition static
2022-09-09 07:27:44 -04:00
NeilBrown
00801cd92d NFSD: fix regression with setting ACLs.
A recent patch moved ACL setting into nfsd_setattr().
Unfortunately it didn't work as nfsd_setattr() aborts early if
iap->ia_valid is 0.

Remove this test, and instead avoid calling notify_change() when
ia_valid is 0.

This means that nfsd_setattr() will now *always* lock the inode.
Previously it didn't if only a ATTR_MODE change was requested on a
symlink (see Commit 15b7a1b86d ("[PATCH] knfsd: fix setattr-on-symlink
error return")). I don't think this change really matters.

Fixes: c0cbe70742 ("NFSD: add posix ACLs to struct nfsd_attrs")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-08 17:53:24 -04:00
Brian Norris
47311db8e8 tracefs: Only clobber mode/uid/gid on remount if asked
Users may have explicitly configured their tracefs permissions; we
shouldn't overwrite those just because a second mount appeared.

Only clobber if the options were provided at mount time.

Note: the previous behavior was especially surprising in the presence of
automounted /sys/kernel/debug/tracing/.

Existing behavior:

  ## Pre-existing status: tracefs is 0755.
  # stat -c '%A' /sys/kernel/tracing/
  drwxr-xr-x

  ## (Re)trigger the automount.
  # umount /sys/kernel/debug/tracing
  # stat -c '%A' /sys/kernel/debug/tracing/.
  drwx------

  ## Unexpected: the automount changed mode for other mount instances.
  # stat -c '%A' /sys/kernel/tracing/
  drwx------

New behavior (after this change):

  ## Pre-existing status: tracefs is 0755.
  # stat -c '%A' /sys/kernel/tracing/
  drwxr-xr-x

  ## (Re)trigger the automount.
  # umount /sys/kernel/debug/tracing
  # stat -c '%A' /sys/kernel/debug/tracing/.
  drwxr-xr-x

  ## Expected: the automount does not change other mount instances.
  # stat -c '%A' /sys/kernel/tracing/
  drwxr-xr-x

Link: https://lkml.kernel.org/r/20220826174353.2.Iab6e5ea57963d6deca5311b27fb7226790d44406@changeid

Cc: stable@vger.kernel.org
Fixes: 4282d60689 ("tracefs: Add new tracefs file system")
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-08 17:10:54 -04:00
Anna Schumaker
d7a5118635 NFSv4.2: Update mode bits after ALLOCATE and DEALLOCATE
The fallocate call invalidates suid and sgid bits as part of normal
operation. We need to mark the mode bits as invalid when using fallocate
with an suid so these will be updated the next time the user looks at them.

This fixes xfstests generic/683 and generic/684.

Reported-by: Yue Cui <cuiyue-fnst@fujitsu.com>
Fixes: 913eca1aea ("NFS: Fallocate should use the nfs4_fattr_bitmap")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-09-08 11:11:23 -04:00
Linus Torvalds
26b1224903 Networking fixes for 6.0-rc5, including fixes from rxrpc, netfilter,
wireless and bluetooth subtrees
 
 Current release - regressions:
   - skb: export skb drop reaons to user by TRACE_DEFINE_ENUM
 
   - bluetooth: fix regression preventing ACL packet transmission
 
 Current release - new code bugs:
   - dsa: microchip: fix kernel oops on ksz8 switches
 
   - dsa: qca8k: fix NULL pointer dereference for of_device_get_match_data
 
 Previous releases - regressions:
   - netfilter: clean up hook list when offload flags check fails
 
   - wifi: mt76: fix crash in chip reset fail
 
   - rxrpc: fix ICMP/ICMP6 error handling
 
   - ice: fix DMA mappings leak
 
   - i40e: fix kernel crash during module removal
 
 Previous releases - always broken:
   - ipv6: sr: fix out-of-bounds read when setting HMAC data.
 
   - tcp: TX zerocopy should not sense pfmemalloc status
 
   - sch_sfb: don't assume the skb is still around after enqueueing to child
 
   - netfilter: drop dst references before setting
 
   - wifi: wilc1000: fix DMA on stack objects
 
   - rxrpc: fix an insufficiently large sglist in rxkad_verify_packet_2()
 
   - fec: use a spinlock to guard `fep->ptp_clk_on`
 
 Misc:
   - usb: qmi_wwan: add Quectel RM520N
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmMZwOMSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkCVMP/jUOqbRTzPOqV7+HhDM65Ww9Prb1WYKS
 4o9Vi9DAL/DH8tXSDDtacE0GbtN6Mlr0kEYJE+lwn1hFsfGY1mzH2pcVjJqtZ7fh
 6o2joaQMJ5lJe4dsr2k0TRtYhCzdeCrvzoTs2qLSeb5KYFOsxMtBikSF3kOJ1TZq
 3bOK3OomiT/XO4FCnyPJvg8VBghkp69oNnefOavM8x7FzrMh6MY7VQem/IjnlIU2
 9sHvMPLai81B1NXu2eybThEYSqutZHzM1PLuqIMhSMwdiSrVRlLFq7/BNP0FxCTR
 /Mby+fnU4xIu+TK1bRoUM0fsipNceVEkXln4pQrRmxu1Yw62RFn5p+MGCM+Sc1UB
 8HZzjtNRtlD+ZfRyQKs2m1+WJtbuESyzAeJaKoQ2eO2StmLqlnZLEYGl6oVQj6Uj
 l3Bn1aBIOYrSM0T/qTNmAv7BTJYvpomlwqoQ5A1+5b0jKhn39Un3Yj3eU2aEv8eX
 mJ80KAWk26Cc7ZA8WbL70Ac8wV76+TcpWCezVvrcZqrFoYAvicLrwEc92Fq0IiG5
 bkA84zhd1TOFSuYVH7f79lSUPYHIs8FxFXIV8SZZGLbhj6nk607cwgbpA9IXizZT
 CUajNoJclS1tELPG7VOv6DJS+VnpdsCm0+q3dnYTYiLlrEX71Bgc/ofgGOBKTeS3
 iRyS2V0GLYoW
 =DeAB
 -----END PGP SIGNATURE-----

Merge tag 'net-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from rxrpc, netfilter, wireless and bluetooth
  subtrees.

  Current release - regressions:

   - skb: export skb drop reaons to user by TRACE_DEFINE_ENUM

   - bluetooth: fix regression preventing ACL packet transmission

  Current release - new code bugs:

   - dsa: microchip: fix kernel oops on ksz8 switches

   - dsa: qca8k: fix NULL pointer dereference for
     of_device_get_match_data

  Previous releases - regressions:

   - netfilter: clean up hook list when offload flags check fails

   - wifi: mt76: fix crash in chip reset fail

   - rxrpc: fix ICMP/ICMP6 error handling

   - ice: fix DMA mappings leak

   - i40e: fix kernel crash during module removal

  Previous releases - always broken:

   - ipv6: sr: fix out-of-bounds read when setting HMAC data.

   - tcp: TX zerocopy should not sense pfmemalloc status

   - sch_sfb: don't assume the skb is still around after
     enqueueing to child

   - netfilter: drop dst references before setting

   - wifi: wilc1000: fix DMA on stack objects

   - rxrpc: fix an insufficiently large sglist in
     rxkad_verify_packet_2()

   - fec: use a spinlock to guard `fep->ptp_clk_on`

  Misc:

   - usb: qmi_wwan: add Quectel RM520N"

* tag 'net-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
  sch_sfb: Also store skb len before calling child enqueue
  net: phy: lan87xx: change interrupt src of link_up to comm_ready
  net/smc: Fix possible access to freed memory in link clear
  net: ethernet: mtk_eth_soc: check max allowed hash in mtk_ppe_check_skb
  net: skb: export skb drop reaons to user by TRACE_DEFINE_ENUM
  net: ethernet: mtk_eth_soc: fix typo in __mtk_foe_entry_clear
  net: dsa: felix: access QSYS_TAG_CONFIG under tas_lock in vsc9959_sched_speed_set
  net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio
  net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet
  net: usb: qmi_wwan: add Quectel RM520N
  net: dsa: qca8k: fix NULL pointer dereference for of_device_get_match_data
  tcp: fix early ETIMEDOUT after spurious non-SACK RTO
  stmmac: intel: Simplify intel_eth_pci_remove()
  net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()
  ipv6: sr: fix out-of-bounds read when setting HMAC data.
  bonding: accept unsolicited NA message
  bonding: add all node mcast address when slave up
  bonding: use unspecified address if no available link local address
  wifi: use struct_group to copy addresses
  wifi: mac80211_hwsim: check length for virtio packets
  ...
2022-09-08 08:15:01 -04:00
David Howells
0066f1b0e2 afs: Return -EAGAIN, not -EREMOTEIO, when a file already locked
When trying to get a file lock on an AFS file, the server may return
UAEAGAIN to indicate that the lock is already held.  This is currently
translated by the default path to -EREMOTEIO.

Translate it instead to -EAGAIN so that we know we can retry it.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey E Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/166075761334.3533338.2591992675160918098.stgit@warthog.procyon.org.uk/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-09-06 21:33:01 -04:00
Linus Torvalds
d2ec799d1c Changes since last update:
- Fix return codes in erofs_fscache_{meta_,}read_folio error paths;
 
  - Fix potential wrong pcluster sizes for later non-4K lclusters;
 
  - Fix in-memory pcluster use-after-free on UP platforms.
 -----BEGIN PGP SIGNATURE-----
 
 iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCYxdnPREceGlhbmdAa2Vy
 bmVsLm9yZwAKCRA5NzHcH7XmBGzxAQCvIteTLkD2HWvvflFcrby6o1vWxoJSDFig
 J3QZUgH/TAEAvR6Jyl4oL4/kwtw+2ZuWeXnMI9FXBfz9pyDR/MiCQQw=
 =p6+J
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-6.0-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:

 - Fix return codes in erofs_fscache_{meta_,}read_folio error paths

 - Fix potential wrong pcluster sizes for later non-4K lclusters

 - Fix in-memory pcluster use-after-free on UP platforms

* tag 'erofs-for-6.0-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix pcluster use-after-free on UP platforms
  erofs: avoid the potentially wrong m_plen for big pcluster
  erofs: fix error return code in erofs_fscache_{meta_,}read_folio
2022-09-06 15:19:45 -04:00
Qu Wenruo
5da431b71d btrfs: fix the max chunk size and stripe length calculation
[BEHAVIOR CHANGE]
Since commit f6fca3917b ("btrfs: store chunk size in space-info
struct"), btrfs no longer can create larger data chunks than 1G:

  mkfs.btrfs -f -m raid1 -d raid0 $dev1 $dev2 $dev3 $dev4
  mount $dev1 $mnt

  btrfs balance start --full $mnt
  btrfs balance start --full $mnt
  umount $mnt

  btrfs ins dump-tree -t chunk $dev1 | grep "DATA|RAID0" -C 2

Before that offending commit, what we got is a 4G data chunk:

	item 6 key (FIRST_CHUNK_TREE CHUNK_ITEM 9492758528) itemoff 15491 itemsize 176
		length 4294967296 owner 2 stripe_len 65536 type DATA|RAID0
		io_align 65536 io_width 65536 sector_size 4096
		num_stripes 4 sub_stripes 1

Now what we got is only 1G data chunk:

	item 6 key (FIRST_CHUNK_TREE CHUNK_ITEM 6271533056) itemoff 15491 itemsize 176
		length 1073741824 owner 2 stripe_len 65536 type DATA|RAID0
		io_align 65536 io_width 65536 sector_size 4096
		num_stripes 4 sub_stripes 1

This will increase the number of data chunks by the number of devices,
not only increase system chunk usage, but also greatly increase mount
time.

Without a proper reason, we should not change the max chunk size.

[CAUSE]
Previously, we set max data chunk size to 10G, while max data stripe
length to 1G.

Commit f6fca3917b ("btrfs: store chunk size in space-info struct")
completely ignored the 10G limit, but use 1G max stripe limit instead,
causing above shrink in max data chunk size.

[FIX]
Fix the max data chunk size to 10G, and in decide_stripe_size_regular()
we limit stripe_size to 1G manually.

This should only affect data chunks, as for metadata chunks we always
set the max stripe size the same as max chunk size (256M or 1G
depending on fs size).

Now the same script result the same old result:

	item 6 key (FIRST_CHUNK_TREE CHUNK_ITEM 9492758528) itemoff 15491 itemsize 176
		length 4294967296 owner 2 stripe_len 65536 type DATA|RAID0
		io_align 65536 io_width 65536 sector_size 4096
		num_stripes 4 sub_stripes 1

Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Fixes: f6fca3917b ("btrfs: store chunk size in space-info struct")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-06 17:49:58 +02:00
Gao Xiang
2f44013e39 erofs: fix pcluster use-after-free on UP platforms
During stress testing with CONFIG_SMP disabled, KASAN reports as below:

==================================================================
BUG: KASAN: use-after-free in __mutex_lock+0xe5/0xc30
Read of size 8 at addr ffff8881094223f8 by task stress/7789

CPU: 0 PID: 7789 Comm: stress Not tainted 6.0.0-rc1-00002-g0d53d2e882f9 #3
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Call Trace:
 <TASK>
..
 __mutex_lock+0xe5/0xc30
..
 z_erofs_do_read_page+0x8ce/0x1560
..
 z_erofs_readahead+0x31c/0x580
..
Freed by task 7787
 kasan_save_stack+0x1e/0x40
 kasan_set_track+0x20/0x30
 kasan_set_free_info+0x20/0x40
 __kasan_slab_free+0x10c/0x190
 kmem_cache_free+0xed/0x380
 rcu_core+0x3d5/0xc90
 __do_softirq+0x12d/0x389

Last potentially related work creation:
 kasan_save_stack+0x1e/0x40
 __kasan_record_aux_stack+0x97/0xb0
 call_rcu+0x3d/0x3f0
 erofs_shrink_workstation+0x11f/0x210
 erofs_shrink_scan+0xdc/0x170
 shrink_slab.constprop.0+0x296/0x530
 drop_slab+0x1c/0x70
 drop_caches_sysctl_handler+0x70/0x80
 proc_sys_call_handler+0x20a/0x2f0
 vfs_write+0x555/0x6c0
 ksys_write+0xbe/0x160
 do_syscall_64+0x3b/0x90

The root cause is that erofs_workgroup_unfreeze() doesn't reset to
orig_val thus it causes a race that the pcluster reuses unexpectedly
before freeing.

Since UP platforms are quite rare now, such path becomes unnecessary.
Let's drop such specific-designed path directly instead.

Fixes: 73f5c66df3 ("staging: erofs: fix `erofs_workgroup_{try_to_freeze, unfreeze}'")
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20220902045710.109530-1-hsiangkao@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-09-05 23:23:30 +08:00
Yue Hu
ea0b7b0d59 erofs: avoid the potentially wrong m_plen for big pcluster
Actually, 'compressedlcs' stores compressed block count rather than
lcluster count. Therefore, the number of bits for shifting the count
should be 'LOG_BLOCK_SIZE' rather than 'lclusterbits' although current
lcluster size is 4K.

The value of 'm_plen' will be wrong once we enable the non 4K-sized
lcluster.

Signed-off-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20220812060150.8510-1-huyue2@coolpad.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-09-05 23:22:01 +08:00
Sun Ke
5bd9628b78 erofs: fix error return code in erofs_fscache_{meta_,}read_folio
If erofs_fscache_alloc_request fail and then goto out, it will return 0.
it should return a negative error code instead of 0.

Fixes: d435d53228 ("erofs: change to use asynchronous io for fscache readpage/readahead")
Signed-off-by: Sun Ke <sunke32@huawei.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20220815034829.3940803-1-sunke32@huawei.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-09-05 23:21:15 +08:00