Merge misc fixes from Andrew Morton:
"14 patches.
Subsystems affected by this patch series: MAINTAINERS, binfmt, and
mm (tmpfs, secretmem, kasan, kfence, pagealloc, zram, compaction,
hugetlb, vmalloc, and kmemleak)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: kmemleak: take a full lowmem check in kmemleak_*_phys()
mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"
revert "fs/binfmt_elf: fix PT_LOAD p_align values for loaders"
hugetlb: do not demote poisoned hugetlb pages
mm: compaction: fix compiler warning when CONFIG_COMPACTION=n
mm: fix unexpected zeroed page mapping with zram swap
mm, page_alloc: fix build_zonerefs_node()
mm, kfence: support kmem_dump_obj() for KFENCE objects
kasan: fix hw tags enablement when KUNIT tests are disabled
irq_work: use kasan_record_aux_stack_noalloc() record callstack
mm/secretmem: fix panic when growing a memfd_secret
tmpfs: fix regressions from wider use of ZERO_PAGE
MAINTAINERS: Broadcom internal lists aren't maintainers
Pull io_uring fixes from Jens Axboe:
- Ensure we check and -EINVAL any use of reserved or struct padding.
Although we generally always do that, it's missed in two spots for
resource updates, one for the ring fd registration from this merge
window, and one for the extended arg. Make sure we have all of them
handled. (Dylan)
- A few fixes for the deferred file assignment (me, Pavel)
- Add a feature flag for the deferred file assignment so apps can tell
we handle it correctly (me)
- Fix a small perf regression with the current file position fix in
this merge window (me)
* tag 'io_uring-5.18-2022-04-14' of git://git.kernel.dk/linux-block:
io_uring: abort file assignment prior to assigning creds
io_uring: fix poll error reporting
io_uring: fix poll file assign deadlock
io_uring: use right issue_flags for splice/tee
io_uring: verify pad field is 0 in io_get_ext_arg
io_uring: verify resv is 0 in ringfd register/unregister
io_uring: verify that resv2 is 0 in io_uring_rsrc_update2
io_uring: move io_uring_rsrc_update2 validation
io_uring: fix assign file locking issue
io_uring: stop using io_wq_work as an fd placeholder
io_uring: move apoll->events cache
io_uring: io_kiocb_update_pos() should not touch file for non -1 offset
io_uring: flag the fact that linked file assignment is sane
If we (re-)calculate the file system overhead amount and it's
different from the on-disk s_overhead_clusters value, update the
on-disk version since this can take potentially quite a while on
bigalloc file systems.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
If the file system does not use bigalloc, calculating the overhead is
cheap, so force the recalculation of the overhead so we don't have to
trust the precalculated overhead in the superblock.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Currently ksmbd is using ->f_bsize from vfs_statfs() as sector size.
If fat/exfat is a local share, ->f_bsize is a cluster size that is too
large to be used as a sector size. Sector sizes larger than 4K cause
problem occurs when mounting an iso file through windows client.
The error message can be obtained using Mount-DiskImage command,
the error is:
"Mount-DiskImage : The sector size of the physical disk on which the
virtual disk resides is not supported."
This patch reports fixed 4KB sector size if ->s_blocksize is bigger
than 4KB.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
If the filename is change by underlying rename the server, fp->filename
and real filename can be different. This patch remove the uses of
fp->filename in ksmbd and replace it with d_path().
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The kernel calculation was underestimating the overhead by not taking
into account the reserved gdt blocks. With this change, the overhead
calculated by the kernel matches the overhead calculation in mke2fs.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Pull cifs fixes from Steve French:
- two fixes related to unmount
- symlink overflow fix
- minor netfs fix
- improved tracing for crediting (flow control)
* tag '5.18-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: verify that tcon is valid before dereference in cifs_kill_sb
cifs: potential buffer overflow in handling symlinks
cifs: Split the smb3_add_credits tracepoint
cifs: release cached dentries only if mount is complete
cifs: Check the IOCB_DIRECT flag, not O_DIRECT
When asked to create a path ending '/', but which is not to be a
directory (LOOKUP_DIRECTORY not set), filename_create() will never try
to create the file. If it doesn't exist, -ENOENT is reported.
However, it still passes LOOKUP_CREATE|LOOKUP_EXCL to the filesystems
->lookup() function, even though there is no intent to create. This is
misleading and can cause incorrect behaviour.
If you try
ln -s foo /path/dir/
where 'dir' is a directory on an NFS filesystem which is not currently
known in the dcache, this will fail with ENOENT.
But as the name is not in the dcache, nfs_lookup gets called with
LOOKUP_CREATE|LOOKUP_EXCL and so it returns NULL without performing any
lookup, with the expectation that a subsequent call to create the target
will be made, and the lookup can be combined with the creation. In the
case with a trailing '/' and no LOOKUP_DIRECTORY, that call is never
made. Instead filename_create() sees that the dentry is not (yet)
positive and returns -ENOENT - even though the directory actually
exists.
So only set LOOKUP_CREATE|LOOKUP_EXCL if there really is an intent to
create, and use the absence of these flags to decide if -ENOENT should
be returned.
Note that filename_parentat() is only interested in LOOKUP_REVAL, so we
split that out and store it in 'reval_flag'. __lookup_hash() then gets
reval_flag combined with whatever create flags were determined to be
needed.
Reviewed-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull btrfs fixes from David Sterba:
"A few more code and warning fixes.
There's one feature ioctl removal patch slated for 5.18 that did not
make it to the main pull request. It's just a one-liner and the ioctl
has a v2 that's in use for a long time, no point to postpone it to
5.19.
Late update:
- remove balance v1 ioctl, superseded by v2 in 2012
Fixes:
- add back cgroup attribution for compressed writes
- add super block write start/end annotations to asynchronous balance
- fix root reference count on an error handling path
- in zoned mode, activate zone at the chunk allocation time to avoid
ENOSPC due to timing issues
- fix delayed allocation accounting for direct IO
Warning fixes:
- simplify assertion condition in zoned check
- remove an unused variable"
* tag 'for-5.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix btrfs_submit_compressed_write cgroup attribution
btrfs: fix root ref counts in error handling in btrfs_get_root_ref
btrfs: zoned: activate block group only for extent allocation
btrfs: return allocated block group from do_chunk_alloc()
btrfs: mark resumed async balance as writing
btrfs: remove support of balance v1 ioctl
btrfs: release correct delalloc amount in direct IO write path
btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups()
btrfs: zoned: remove redundant condition in btrfs_run_delalloc_range
Pull fscache fixes from David Howells:
"Here's a collection of fscache and cachefiles fixes and misc small
cleanups. The two main fixes are:
- Add a missing unmark of the inode in-use mark in an error path.
- Fix a KASAN slab-out-of-bounds error when setting the xattr on a
cachefiles volume due to the wrong length being given to memcpy().
In addition, there's the removal of an unused parameter, removal of an
unused Kconfig option, conditionalising a bit of procfs-related stuff
and some doc fixes"
* tag 'fscache-fixes-20220413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
fscache: remove FSCACHE_OLD_API Kconfig option
fscache: Use wrapper fscache_set_cache_state() directly when relinquishing
fscache: Move fscache_cookies_seq_ops specific code under CONFIG_PROC_FS
fscache: Remove the cookie parameter from fscache_clear_page_bits()
docs: filesystems: caching/backend-api.rst: fix an object withdrawn API
docs: filesystems: caching/backend-api.rst: correct two relinquish APIs use
cachefiles: Fix KASAN slab-out-of-bounds in cachefiles_set_volume_xattr
cachefiles: unmark inode in use in error path
Smatch printed a warning:
arch/x86/crypto/poly1305_glue.c:198 poly1305_update_arch() error:
__memcpy() 'dctx->buf' too small (16 vs u32max)
It's caused because Smatch marks 'link_len' as untrusted since it comes
from sscanf(). Add a check to ensure that 'link_len' is not larger than
the size of the 'link_str' buffer.
Fixes: c69c1b6eae ("cifs: implement CIFSParseMFSymlink()")
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
We got issue as follows:
[home]# fsck.ext4 -fn ram0yb
e2fsck 1.45.6 (20-Mar-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Symlink /p3/d14/d1a/l3d (inode #3494) is invalid.
Clear? no
Entry 'l3d' in /p3/d14/d1a (3383) has an incorrect filetype (was 7, should be 0).
Fix? no
As the symlink file size does not match the file content. If the writeback
of the symlink data block failed, ext4_finish_bio() handles the end of IO.
However this function fails to mark the buffer with BH_write_io_error and
so when unmount does journal checkpoint it cannot detect the writeback
error and will cleanup the journal. Thus we've lost the correct data in the
journal area. To solve this issue, mark the buffer as BH_write_io_error in
ext4_finish_bio().
Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220321144438.201685-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Since the initial introduction of (posix) fallocate back at the turn of
the century, it has been possible to use this syscall to change the
user-visible contents of files. This can happen by extending the file
size during a preallocation, or through any of the newer modes (punch,
zero, collapse, insert range). Because the call can be used to change
file contents, we should treat it like we do any other modification to a
file -- update the mtime, and drop set[ug]id privileges/capabilities.
The VFS function file_modified() does all this for us if pass it a
locked inode, so let's make fallocate drop permissions correctly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20220308185043.GA117678@magnolia
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Pull nfsd fixes from Chuck Lever:
- Fix a write performance regression
- Fix crashes during request deferral on RDMA transports
* tag 'nfsd-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
SUNRPC: Fix the svc_deferred_event trace class
SUNRPC: Fix NFSD's request deferral on RDMA transports
nfsd: Clean up nfsd_file_put()
nfsd: Fix a write performance regression
SUNRPC: Return true/false (not 1/0) from bool functions
struct stat (defined in arch/x86/include/uapi/asm/stat.h) has 32-bit
st_dev and st_rdev; struct compat_stat (defined in
arch/x86/include/asm/compat.h) has 16-bit st_dev and st_rdev followed by
a 16-bit padding.
This patch fixes struct compat_stat to match struct stat.
[ Historical note: the old x86 'struct stat' did have that 16-bit field
that the compat layer had kept around, but it was changes back in 2003
by "struct stat - support larger dev_t":
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=e95b2065677fe32512a597a79db94b77b90c968d
and back in those days, the x86_64 port was still new, and separate
from the i386 code, and had already picked up the old version with a
16-bit st_dev field ]
Note that we can't change compat_dev_t because it is used by
compat_loop_info.
Also, if the st_dev and st_rdev values are 32-bit, we don't have to use
old_valid_dev to test if the value fits into them. This fixes
-EOVERFLOW on filesystems that are on NVMe because NVMe uses the major
number 259.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are two reasons why this isn't the best idea:
- It's an odd area to grab a bit of storage space, hence it's an odd area
to grab storage from.
- It puts the 3rd io_kiocb cacheline into the hot path, where normal hot
path just needs the first two.
Use 'cflags' for joint fd/cflags storage. We only need fd until we
successfully issue, and we only need cflags once a request is done and is
completed.
Fixes: 6bf9c47a39 ("io_uring: defer file assignment")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In preparation for fixing a regression with pulling in an extra cacheline
for IO that doesn't usually touch the last cacheline of the io_kiocb,
move the cached location of apoll->events to space shared with some other
completion data. Like cflags, this isn't used until after the request
has been completed, so we can piggy back on top of comp_list.
Fixes: 81459350d5 ("io_uring: cache req->apoll->events in req->cflags")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-1 tells use to use the current position, but we check if the file is
a stream regardless of that. Fix up io_kiocb_update_pos() to only
dip into file if we need to. This is both more efficient and also drops
12 bytes of text on aarch64 and 64 bytes on x86-64.
Fixes: b4aec40015 ("io_uring: do not recalculate ppos unnecessarily")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Give applications a way to tell if the kernel supports sane linked files,
as in files being assigned at the right time to be able to reliably
do <open file direct into slot X><read file from slot X> while using
IOSQE_IO_LINK to order them.
Not really a bug fix, but flag it as such so that it gets pulled in with
backports of the deferred file assignment.
Fixes: 6bf9c47a39 ("io_uring: defer file assignment")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull driver core updates from Greg KH:
"Here are two small driver core changes for 5.18-rc2.
They are the final bits in the removal of the default_attrs field in
struct kobj_type. I had to wait until after 5.18-rc1 for all of the
changes to do this came in through different development trees, and
then one new user snuck in. So this series has two changes:
- removal of the default_attrs field in the powerpc/pseries/vas code.
The change has been acked by the PPC maintainers to come through
this tree
- removal of default_attrs from struct kobj_type now that all
in-kernel users are removed.
This cleans up the kobject code a little bit and removes some
duplicated functionality that confused people (now there is only
one way to do default groups)
Both of these have been in linux-next for all of this week with no
reported problems"
* tag 'driver-core-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
kobject: kobj_type: remove default_attrs
powerpc/pseries/vas: use default_groups in kobj_type
Pull io_uring fixes from Jens Axboe:
"A bit bigger than usual post merge window, largely due to a revert and
a fix of at what point files are assigned for requests.
The latter fixing a linked request use case where a dependent link can
rely on what file is assigned consistently.
Summary:
- 32-bit compat fix for IORING_REGISTER_IOWQ_AFF (Eugene)
- File assignment fixes (me)
- Revert of the NAPI poll addition from this merge window. The author
isn't available right now to engage on this, so let's revert it and
we can retry for the 5.19 release (me, Jakub)
- Fix a timeout removal race (me)
- File update and SCM fixes (Pavel)"
* tag 'io_uring-5.18-2022-04-08' of git://git.kernel.dk/linux-block:
io_uring: fix race between timeout flush and removal
io_uring: use nospec annotation for more indexes
io_uring: zero tag on rsrc removal
io_uring: don't touch scm_fp_list after queueing skb
io_uring: nospec index for tags on files update
io_uring: implement compat handling for IORING_REGISTER_IOWQ_AFF
Revert "io_uring: Add support for napi_busy_poll"
io_uring: drop the old style inflight file tracking
io_uring: defer file assignment
io_uring: propagate issue_flags state down to file assignment
io_uring: move read/write file prep state into actual opcode handler
io_uring: defer splice/tee file validity check until command issue
io_uring: don't check req->file in io_fsync_prep()
Split the smb3_add_credits tracepoint to make it more obvious when looking
at the logs which line corresponds to what credit change. Also add a
tracepoint for credit overflow when it's being added back.
Note that it might be better to add another field to the tracepoint for
the information rather than splitting it. It would also be useful to store
the MID potentially, though that isn't available when the credits are first
obtained.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: linux-cifs@vger.kernel.org
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
Use the actual length of volume coherency data when setting the
xattr to avoid the following KASAN report.
BUG: KASAN: slab-out-of-bounds in cachefiles_set_volume_xattr+0xa0/0x350 [cachefiles]
Write of size 4 at addr ffff888101e02af4 by task kworker/6:0/1347
CPU: 6 PID: 1347 Comm: kworker/6:0 Kdump: loaded Not tainted 5.18.0-rc1-nfs-fscache-netfs+ #13
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-4.fc34 04/01/2014
Workqueue: events fscache_create_volume_work [fscache]
Call Trace:
<TASK>
dump_stack_lvl+0x45/0x5a
print_report.cold+0x5e/0x5db
? __lock_text_start+0x8/0x8
? cachefiles_set_volume_xattr+0xa0/0x350 [cachefiles]
kasan_report+0xab/0x120
? cachefiles_set_volume_xattr+0xa0/0x350 [cachefiles]
kasan_check_range+0xf5/0x1d0
memcpy+0x39/0x60
cachefiles_set_volume_xattr+0xa0/0x350 [cachefiles]
cachefiles_acquire_volume+0x2be/0x500 [cachefiles]
? __cachefiles_free_volume+0x90/0x90 [cachefiles]
fscache_create_volume_work+0x68/0x160 [fscache]
process_one_work+0x3b7/0x6a0
worker_thread+0x2c4/0x650
? process_one_work+0x6a0/0x6a0
kthread+0x16c/0x1a0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x22/0x30
</TASK>
Allocated by task 1347:
kasan_save_stack+0x1e/0x40
__kasan_kmalloc+0x81/0xa0
cachefiles_set_volume_xattr+0x76/0x350 [cachefiles]
cachefiles_acquire_volume+0x2be/0x500 [cachefiles]
fscache_create_volume_work+0x68/0x160 [fscache]
process_one_work+0x3b7/0x6a0
worker_thread+0x2c4/0x650
kthread+0x16c/0x1a0
ret_from_fork+0x22/0x30
The buggy address belongs to the object at ffff888101e02af0
which belongs to the cache kmalloc-8 of size 8
The buggy address is located 4 bytes inside of
8-byte region [ffff888101e02af0, ffff888101e02af8)
The buggy address belongs to the physical page:
page:00000000a2292d70 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x101e02
flags: 0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff)
raw: 0017ffffc0000200 0000000000000000 dead000000000001 ffff888100042280
raw: 0000000000000000 0000000080660066 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888101e02980: fc 00 fc fc fc fc 00 fc fc fc fc 00 fc fc fc fc
ffff888101e02a00: 00 fc fc fc fc 00 fc fc fc fc 00 fc fc fc fc 00
>ffff888101e02a80: fc fc fc fc 00 fc fc fc fc 00 fc fc fc fc 04 fc
^
ffff888101e02b00: fc fc fc 00 fc fc fc fc 00 fc fc fc fc 00 fc fc
ffff888101e02b80: fc fc 00 fc fc fc fc 00 fc fc fc fc 00 fc fc fc
==================================================================
Fixes: 413a4a6b0b "cachefiles: Fix volume coherency attribute"
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/20220405134649.6579-1-dwysocha@redhat.com/ # v1
Link: https://lore.kernel.org/r/20220405142810.8208-1-dwysocha@redhat.com/ # Incorrect v2
io_flush_timeouts() assumes the timeout isn't in progress of triggering
or being removed/canceled, so it unconditionally removes it from the
timeout list and attempts to cancel it.
Leave it on the list and let the normal timeout cancelation take care
of it.
Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull NFS client fixes from Trond Myklebust:
"Stable fixes:
- SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()
Bugfixes:
- Fix an Oopsable condition due to SLAB_ACCOUNT setting in the
NFSv4.2 xattr code.
- Fix for open() using an file open mode of '3' in NFSv4
- Replace readdir's use of xxhash() with hash_64()
- Several patches to handle malloc() failure in SUNRPC"
* tag 'nfs-for-5.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Move the call to xprt_send_pagedata() out of xprt_sock_sendmsg()
SUNRPC: svc_tcp_sendmsg() should handle errors from xdr_alloc_bvec()
SUNRPC: Handle allocation failure in rpc_new_task()
NFS: Ensure rpc_run_task() cannot fail in nfs_async_rename()
NFSv4/pnfs: Handle RPC allocation errors in nfs4_proc_layoutget
SUNRPC: Handle low memory situations in call_status()
SUNRPC: Handle ENOMEM in call_transmit_status()
NFSv4.2: Fix missing removal of SLAB_ACCOUNT on kmem_cache allocation
SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()
NFS: Replace readdir's use of xxhash() with hash_64()
SUNRPC: handle malloc failure in ->request_prepare
NFSv4: fix open failure with O_ACCMODE flag
Revert "NFSv4: Handle the special Linux file open access mode"
During cifs_kill_sb, we first dput all the dentries that we have cached.
However this function can also get called for mount failures.
So dput the cached dentries only if the filesystem mount is complete.
i.e. cifs_sb->root is populated.
Fixes: 5e9c89d43f ("cifs: Grab a reference for the dentry of the cached directory during the lifetime of the cache")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>