Use the superblock's UUID to generate the fsid when it's non-null.
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240409113022.74720-1-hongzhen@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
This adds a special global buffer pool (in the end) for reserved pages.
Using a reserved pool for LZ4 decompression significantly reduces the
time spent on extra temporary page allocation for the extreme cases in
low memory scenarios.
The table below shows the reduction in time spent on page allocation for
LZ4 decompression when using a reserved pool. The results were obtained
from multi-app launch benchmarks on ARM64 Android devices running the
5.15 kernel with an 8-core CPU and 8GB of memory. In the benchmark, we
launched 16 frequently-used apps, and the camera app was the last one in
each round. The data in the table is the average time of camera app for
each round.
After using the reserved pool, there was an average improvement of 150ms
in the overall launch time of our camera app, which was obtained from
the systrace log.
+--------------+---------------+--------------+---------+
| | w/o page pool | w/ page pool | diff |
+--------------+---------------+--------------+---------+
| Average (ms) | 3434 | 21 | -99.38% |
+--------------+---------------+--------------+---------+
Based on the benchmark logs, 64 pages are sufficient for 95% of
scenarios. This value can be adjusted with a module parameter
`reserved_pages`. The default value is 0.
This pool is currently only used for the LZ4 decompressor, but it can be
applied to more decompressors if needed.
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240402131523.2703948-1-guochunhai@vivo.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Let's use alloc_pages_bulk_array() for simplicity and get rid of
unnecessary pagepool.
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240402092757.2635257-1-guochunhai@vivo.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
It will cost more time if compressed buffers are allocated on demand for
low-latency algorithms (like lz4) so EROFS uses per-CPU buffers to keep
compressed data if in-place decompression is unfulfilled. While it is kind
of wasteful of memory for a device with hundreds of CPUs, and only a small
number of CPUs concurrently decompress most of the time.
This patch renames it as 'global buffer pool' and makes it configurable.
This allows two or more CPUs to share a common buffer to reduce memory
occupation.
Suggested-by: Gao Xiang <xiang@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Link: https://lore.kernel.org/r/20240402100036.2673604-1-guochunhai@vivo.com
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Link: https://lore.kernel.org/r/20240408215231.3376659-1-dhavale@google.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Currently, utils.c is only useful if CONFIG_EROFS_FS_ZIP is on.
So let's rename it to zutil.c as well as avoid its inclusion if
CONFIG_EROFS_FS_ZIP is explicitly disabled.
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240401135550.2550043-1-guochunhai@vivo.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
epoll can call out to vfs_poll() with a file pointer that may race with
the last 'fput()'. That would make f_count go down to zero, and while
the ep->mtx locking means that the resulting file pointer tear-down will
be blocked until the poll returns, it means that f_count is already
dead, and any use of it won't actually get a reference to the file any
more: it's dead regardless.
Make sure we have a valid ref on the file pointer before we call down to
vfs_poll() from the epoll routines.
Link: https://lore.kernel.org/lkml/0000000000002d631f0615918f1e@google.com/
Reported-by: syzbot+045b454ab35fd82a35fb@syzkaller.appspotmail.com
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Fix RCU callback of freeing an eventfs_inode.
The freeing of the eventfs_inode from the kref going to zero
freed the contents of the eventfs_inode and then used kfree_rcu()
to free the inode itself. But the contents should also be protected
by RCU. Switch to a call_rcu() that calls a function to free all
of the eventfs_inode after the RCU synchronization.
- The tracing subsystem maps its own descriptor to a file represented by
eventfs. The freeing of this descriptor needs to know when the
last reference of an eventfs_inode is released, but currently
there is no interface for that. Add a "release" callback to
the eventfs_inode entry array that allows for freeing of data
that can be referenced by the eventfs_inode being opened.
Then increment the ref counter for this descriptor when the
eventfs_inode file is created, and decrement/free it when the
last reference to the eventfs_inode is released and the file
is removed. This prevents races between freeing the descriptor
and the opening of the eventfs file.
- Fix the permission processing of eventfs.
The change to make the permissions of eventfs default to the mount
point but keep track of when changes were made had a side effect
that could cause security concerns. When the tracefs is remounted
with a given gid or uid, all the files within it should inherit
that gid or uid. But if the admin had changed the permission of
some file within the tracefs file system, it would not get updated
by the remount. This caused the kselftest of file permissions
to fail the second time it is run. The first time, all changes
would look fine, but the second time, because the changes were
"saved", the remount did not reset them.
Create a link list of all existing tracefs inodes, and clear the
saved flags on them on a remount if the remount changes the
corresponding gid or uid fields.
This also simplifies the code by removing the distinction between the
toplevel eventfs and an instance eventfs. They should both act the
same. They were different because of a misconception due to the
remount not resetting the flags. Now that remount resets all the
files and directories to default to the root node if a uid/gid is
specified, it makes the logic simpler to implement.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZjXxzxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qqzGAQCX8g7gtngGgwSsWqPW5GmecCifwFja
k7cVEDhMYPnDeAEAkYi2ZBgJRkPsWPfMRClDK/DXP4woOo58asxtIxfTMgg=
=mCkt
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.9-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing and tracefs fixes from Steven Rostedt:
- Fix RCU callback of freeing an eventfs_inode.
The freeing of the eventfs_inode from the kref going to zero freed
the contents of the eventfs_inode and then used kfree_rcu() to free
the inode itself. But the contents should also be protected by RCU.
Switch to a call_rcu() that calls a function to free all of the
eventfs_inode after the RCU synchronization.
- The tracing subsystem maps its own descriptor to a file represented
by eventfs. The freeing of this descriptor needs to know when the
last reference of an eventfs_inode is released, but currently there
is no interface for that.
Add a "release" callback to the eventfs_inode entry array that allows
for freeing of data that can be referenced by the eventfs_inode being
opened. Then increment the ref counter for this descriptor when the
eventfs_inode file is created, and decrement/free it when the last
reference to the eventfs_inode is released and the file is removed.
This prevents races between freeing the descriptor and the opening of
the eventfs file.
- Fix the permission processing of eventfs.
The change to make the permissions of eventfs default to the mount
point but keep track of when changes were made had a side effect that
could cause security concerns. When the tracefs is remounted with a
given gid or uid, all the files within it should inherit that gid or
uid. But if the admin had changed the permission of some file within
the tracefs file system, it would not get updated by the remount.
This caused the kselftest of file permissions to fail the second time
it is run. The first time, all changes would look fine, but the
second time, because the changes were "saved", the remount did not
reset them.
Create a link list of all existing tracefs inodes, and clear the
saved flags on them on a remount if the remount changes the
corresponding gid or uid fields.
This also simplifies the code by removing the distinction between the
toplevel eventfs and an instance eventfs. They should both act the
same. They were different because of a misconception due to the
remount not resetting the flags. Now that remount resets all the
files and directories to default to the root node if a uid/gid is
specified, it makes the logic simpler to implement.
* tag 'trace-v6.9-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Have "events" directory get permissions from its parent
eventfs: Do not treat events directory different than other directories
eventfs: Do not differentiate the toplevel events directory
tracefs: Still use mount point as default permissions for instances
tracefs: Reset permissions on remount if permissions are options
eventfs: Free all of the eventfs_inode after RCU
eventfs/tracing: Add callback for release of an eventfs_inode
The events directory gets its permissions from the root inode. But this
can cause an inconsistency if the instances directory changes its
permissions, as the permissions of the created directories under it should
inherit the permissions of the instances directory when directories under
it are created.
Currently the behavior is:
# cd /sys/kernel/tracing
# chgrp 1002 instances
# mkdir instances/foo
# ls -l instances/foo
[..]
-r--r----- 1 root lkp 0 May 1 18:55 buffer_total_size_kb
-rw-r----- 1 root lkp 0 May 1 18:55 current_tracer
-rw-r----- 1 root lkp 0 May 1 18:55 error_log
drwxr-xr-x 1 root root 0 May 1 18:55 events
--w------- 1 root lkp 0 May 1 18:55 free_buffer
drwxr-x--- 2 root lkp 0 May 1 18:55 options
drwxr-x--- 10 root lkp 0 May 1 18:55 per_cpu
-rw-r----- 1 root lkp 0 May 1 18:55 set_event
All the files and directories under "foo" has the "lkp" group except the
"events" directory. That's because its getting its default value from the
mount point instead of its parent.
Have the "events" directory make its default value based on its parent's
permissions. That now gives:
# ls -l instances/foo
[..]
-rw-r----- 1 root lkp 0 May 1 21:16 buffer_subbuf_size_kb
-r--r----- 1 root lkp 0 May 1 21:16 buffer_total_size_kb
-rw-r----- 1 root lkp 0 May 1 21:16 current_tracer
-rw-r----- 1 root lkp 0 May 1 21:16 error_log
drwxr-xr-x 1 root lkp 0 May 1 21:16 events
--w------- 1 root lkp 0 May 1 21:16 free_buffer
drwxr-x--- 2 root lkp 0 May 1 21:16 options
drwxr-x--- 10 root lkp 0 May 1 21:16 per_cpu
-rw-r----- 1 root lkp 0 May 1 21:16 set_event
Link: https://lore.kernel.org/linux-trace-kernel/20240502200906.161887248@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 8186fff7ab ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Treat the events directory the same as other directories when it comes to
permissions. The events directory was considered different because it's
dentry is persistent, whereas the other directory dentries are created
when accessed. But the way tracefs now does its ownership by using the
root dentry's permissions as the default permissions, the events directory
can get out of sync when a remount is performed setting the group and user
permissions.
Remove the special case for the events directory on setting the
attributes. This allows the updates caused by remount to work properly as
well as simplifies the code.
Link: https://lore.kernel.org/linux-trace-kernel/20240502200906.002923579@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 8186fff7ab ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The toplevel events directory is really no different than the events
directory of instances. Having the two be different caused
inconsistencies and made it harder to fix the permissions bugs.
Make all events directories act the same.
Link: https://lore.kernel.org/linux-trace-kernel/20240502200905.846448710@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 8186fff7ab ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
If the instances directory's permissions were never change, then have it
and its children use the mount point permissions as the default.
Currently, the permissions of instance directories are determined by the
instance directory's permissions itself. But if the tracefs file system is
remounted and changes the permissions, the instance directory and its
children should use the new permission.
But because both the instance directory and its children use the instance
directory's inode for permissions, it misses the update.
To demonstrate this:
# cd /sys/kernel/tracing/
# mkdir instances/foo
# ls -ld instances/foo
drwxr-x--- 5 root root 0 May 1 19:07 instances/foo
# ls -ld instances
drwxr-x--- 3 root root 0 May 1 18:57 instances
# ls -ld current_tracer
-rw-r----- 1 root root 0 May 1 18:57 current_tracer
# mount -o remount,gid=1002 .
# ls -ld instances
drwxr-x--- 3 root root 0 May 1 18:57 instances
# ls -ld instances/foo/
drwxr-x--- 5 root root 0 May 1 19:07 instances/foo/
# ls -ld current_tracer
-rw-r----- 1 root lkp 0 May 1 18:57 current_tracer
Notice that changing the group id to that of "lkp" did not affect the
instances directory nor its children. It should have been:
# ls -ld current_tracer
-rw-r----- 1 root root 0 May 1 19:19 current_tracer
# ls -ld instances/foo/
drwxr-x--- 5 root root 0 May 1 19:25 instances/foo/
# ls -ld instances
drwxr-x--- 3 root root 0 May 1 19:19 instances
# mount -o remount,gid=1002 .
# ls -ld current_tracer
-rw-r----- 1 root lkp 0 May 1 19:19 current_tracer
# ls -ld instances
drwxr-x--- 3 root lkp 0 May 1 19:19 instances
# ls -ld instances/foo/
drwxr-x--- 5 root lkp 0 May 1 19:25 instances/foo/
Where all files were updated by the remount gid update.
Link: https://lore.kernel.org/linux-trace-kernel/20240502200905.686838327@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 8186fff7ab ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
There's an inconsistency with the way permissions are handled in tracefs.
Because the permissions are generated when accessed, they default to the
root inode's permission if they were never set by the user. If the user
sets the permissions, then a flag is set and the permissions are saved via
the inode (for tracefs files) or an internal attribute field (for
eventfs).
But if a remount happens that specify the permissions, all the files that
were not changed by the user gets updated, but the ones that were are not.
If the user were to remount the file system with a given permission, then
all files and directories within that file system should be updated.
This can cause security issues if a file's permission was updated but the
admin forgot about it. They could incorrectly think that remounting with
permissions set would update all files, but miss some.
For example:
# cd /sys/kernel/tracing
# chgrp 1002 current_tracer
# ls -l
[..]
-rw-r----- 1 root root 0 May 1 21:25 buffer_size_kb
-rw-r----- 1 root root 0 May 1 21:25 buffer_subbuf_size_kb
-r--r----- 1 root root 0 May 1 21:25 buffer_total_size_kb
-rw-r----- 1 root lkp 0 May 1 21:25 current_tracer
-rw-r----- 1 root root 0 May 1 21:25 dynamic_events
-r--r----- 1 root root 0 May 1 21:25 dyn_ftrace_total_info
-r--r----- 1 root root 0 May 1 21:25 enabled_functions
Where current_tracer now has group "lkp".
# mount -o remount,gid=1001 .
# ls -l
-rw-r----- 1 root tracing 0 May 1 21:25 buffer_size_kb
-rw-r----- 1 root tracing 0 May 1 21:25 buffer_subbuf_size_kb
-r--r----- 1 root tracing 0 May 1 21:25 buffer_total_size_kb
-rw-r----- 1 root lkp 0 May 1 21:25 current_tracer
-rw-r----- 1 root tracing 0 May 1 21:25 dynamic_events
-r--r----- 1 root tracing 0 May 1 21:25 dyn_ftrace_total_info
-r--r----- 1 root tracing 0 May 1 21:25 enabled_functions
Everything changed but the "current_tracer".
Add a new link list that keeps track of all the tracefs_inodes which has
the permission flags that tell if the file/dir should use the root inode's
permission or not. Then on remount, clear all the flags so that the
default behavior of using the root inode's permission is done for all
files and directories.
Link: https://lore.kernel.org/linux-trace-kernel/20240502200905.529542160@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 8186fff7ab ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The freeing of eventfs_inode via a kfree_rcu() callback. But the content
of the eventfs_inode was being freed after the last kref. This is
dangerous, as changes are being made that can access the content of an
eventfs_inode from an RCU loop.
Instead of using kfree_rcu() use call_rcu() that calls a function to do
all the freeing of the eventfs_inode after a RCU grace period has expired.
Link: https://lore.kernel.org/linux-trace-kernel/20240502200905.370261163@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 43aa6f97c2 ("eventfs: Get rid of dentry pointers without refcounts")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Synthetic events create and destroy tracefs files when they are created
and removed. The tracing subsystem has its own file descriptor
representing the state of the events attached to the tracefs files.
There's a race between the eventfs files and this file descriptor of the
tracing system where the following can cause an issue:
With two scripts 'A' and 'B' doing:
Script 'A':
echo "hello int aaa" > /sys/kernel/tracing/synthetic_events
while :
do
echo 0 > /sys/kernel/tracing/events/synthetic/hello/enable
done
Script 'B':
echo > /sys/kernel/tracing/synthetic_events
Script 'A' creates a synthetic event "hello" and then just writes zero
into its enable file.
Script 'B' removes all synthetic events (including the newly created
"hello" event).
What happens is that the opening of the "enable" file has:
{
struct trace_event_file *file = inode->i_private;
int ret;
ret = tracing_check_open_get_tr(file->tr);
[..]
But deleting the events frees the "file" descriptor, and a "use after
free" happens with the dereference at "file->tr".
The file descriptor does have a reference counter, but there needs to be a
way to decrement it from the eventfs when the eventfs_inode is removed
that represents this file descriptor.
Add an optional "release" callback to the eventfs_entry array structure,
that gets called when the eventfs file is about to be removed. This allows
for the creating on the eventfs file to increment the tracing file
descriptor ref counter. When the eventfs file is deleted, it can call the
release function that will call the put function for the tracing file
descriptor.
This will protect the tracing file from being freed while a eventfs file
that references it is being opened.
Link: https://lore.kernel.org/linux-trace-kernel/20240426073410.17154-1-Tze-nan.Wu@mediatek.com/
Link: https://lore.kernel.org/linux-trace-kernel/20240502090315.448cba46@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 5790b1fb3d ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reported-by: Tze-nan wu <Tze-nan.Wu@mediatek.com>
Tested-by: Tze-nan Wu (吳澤南) <Tze-nan.Wu@mediatek.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmYzivoACgkQxWXV+ddt
WDu4TxAAgK+W1RSvrc2xe6MfHFMi2x2pL2qM0IEcYbmjNZJDQlmGYNj3jILho62/
/mHyA5skMr9hN58FFUJveiBj3qOds/lZD0640sGGpysFJKzA4/Wdg5xJvpsQtyDM
jr6BcgZOQ+j7Pqe7zsm/sc0n5yG4P+cydnlCFMNvpRfZjg1kYIV9F92qEPAHtLCx
BoDJyHhCEqFWWyH2nALu3syTHyvGECUCBEHLFgyGcG/IXT6Oq/BpsDZPm1j72NCt
9f58OY7/2R9QJYfCjYidFGnr3EYdI5CnCOtR2sQLcRUOISOOQSni52r5tonPdpm2
7QRPyuXTiVxpM909phGJt5wwyssK/JQgxUjUo3s0U04+qXb3cRoJny3vAcGcnuyk
W7lYh08QRQa3dzZ/Q+GFxqPPovdZalTHXYMAYP7QGwLuv+fZkqh39oz6LQfw7F7c
JxEjuSCSd8lJpFyIDkirZF9lELurjgt0Zn3RNe25BLiBpeqFvTQdAYGo5wML3Ug0
kHSmZVFC2En8Ad2AahpkGToVKGgUumo4RAZDiRGIUaHEoS7XfBbnPOAtC7Z1RKTS
9N++XVtJ1/uYQiLM5afiZRtUTkA/jqjSNH/v3YYTS18SczKEOWlHnpJeQSWK0rD1
rzbKZ+2MhBL5CGQnwkhUi0u07QorvMkQhWCHpf9au9rtUggg+nU=
=zEs6
-----END PGP SIGNATURE-----
Merge tag 'for-6.9-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- set correct ram_bytes when splitting ordered extent. This can be
inconsistent on-disk but harmless as it's not used for calculations
and it's only advisory for compression
- fix lockdep splat when taking cleaner mutex in qgroups disable ioctl
- fix missing mutex unlock on error path when looking up sys chunk for
relocation
* tag 'for-6.9-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: set correct ram_bytes when splitting ordered extent
btrfs: take the cleaner_mutex earlier in qgroup disable
btrfs: add missing mutex_unlock in btrfs_relocate_sys_chunks()
[BUG]
When running generic/287, the following file extent items can be
generated:
item 16 key (258 EXTENT_DATA 2682880) itemoff 15305 itemsize 53
generation 9 type 1 (regular)
extent data disk byte 1378414592 nr 462848
extent data offset 0 nr 462848 ram 2097152
extent compression 0 (none)
Note that file extent item is not a compressed one, but its ram_bytes is
way larger than its disk_num_bytes.
According to btrfs on-disk scheme, ram_bytes should match disk_num_bytes
if it's not a compressed one.
[CAUSE]
Since commit b73a6fd1b1 ("btrfs: split partial dio bios before
submit"), for partial dio writes, we would split the ordered extent.
However the function btrfs_split_ordered_extent() doesn't update the
ram_bytes even it has already shrunk the disk_num_bytes.
Originally the function btrfs_split_ordered_extent() is only introduced
for zoned devices in commit d22002fd37 ("btrfs: zoned: split ordered
extent when bio is sent"), but later commit b73a6fd1b1 ("btrfs: split
partial dio bios before submit") makes non-zoned btrfs affected.
Thankfully for un-compressed file extent, we do not really utilize the
ram_bytes member, thus it won't cause any real problem.
[FIX]
Also update btrfs_ordered_extent::ram_bytes inside
btrfs_split_ordered_extent().
Fixes: d22002fd37 ("btrfs: zoned: split ordered extent when bio is sent")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Bugfixes:
- Fix an Oops in xs_tcp_tls_setup_socket
- Fix an Oops due to missing error handling in nfs_net_init()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmYv7PEACgkQZwvnipYK
APKcgA//XE7iII6qQU9IC6jiv44qO7NtB6Zy3PQFCWO2ssMSqXbc4lO2eCmR/1nA
3Mlf1RwPxp+M+iOqFgANZV7voQ/r6djEyM1ycr+J2G/mfoxmKMVmnvg3lcyAfYNj
3fm6n0t8ZCkb3URoO4K0ejw007QfN2zpCL2psKucBdahOX7OYHT45o02liKN8ge5
0h7XKUjDStKsId4y5UVNB+QUeaQaWzKKMCzTzX4CxfHXZpIjbDjkdJ9WxAue2+Th
5yvNtPKkTi9EYwHjFAgN7ZKAC7Gu+jZXs9ewdqdyaSfYlioGk7ALz2ZZSptKb5zr
+nwuR+SC4Yzg5uBdjOLXS7/6Z/CVyp2bmgoFAzrP8cC0zfB7wJMNwUTZbUx3d823
Q7xYwecj1F9PE5CjHJlYpFZiKkiMHB242EFmBrcUR+yoRPHRgEg32UpI1/YWYnVO
pG+Hto0O8JnlGzkqslKy/qN7OMFgNTli+nrFnJT7TxDk27GpOY3gv271164TgeUt
MOk7iY6QjDqk6Zpzbg5AMcq6UhB+QUe76XAAFtDvjXLLBbsRckmbNTBbkFv6/O3p
a3bOd7oeugNIaRJJaR/lQ/EVAoBSeNUUj5G1ivoXDWsNXE+ZKNNdYyrCociVgI6S
j31k3XtbGVMk8M/7Lu5slPOGL4xqDtJAddF0eE7buQhRZNX7pPA=
=OtQc
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client fixes from Trond Myklebust:
- Fix an Oops in xs_tcp_tls_setup_socket
- Fix an Oops due to missing error handling in nfs_net_init()
* tag 'nfs-for-6.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: Handle error of rpc_proc_register() in nfs_net_init().
SUNRPC: add a missing rpc_stat for TCP TLS
- Better error message when prepare_ondemand_read failed;
- Fix unmount of bdev-based mode if CONFIG_EROFS_FS_ONDEMAND is on.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmYvafURHHhpYW5nQGtl
cm5lbC5vcmcACgkQUXZn5Zlu5qrgAg/+PK6WKc2t5SiLpG+dSLgQLhWfebJXhf3+
IWr4hT04JcFJgmxBJAcgh7uoJFrxjTxGDX0vtiLhAAOqRCXb9rDlpqbu+PXfURa1
YyiBAEuqnwqKCm3BixVixV8JCSLHu+UzJ72mZGw+wtSdhInXS/ddBAc8WNVbfjyU
cUJb30+N5vjYVRYtDQG5qP0dbpstm1cTjNJIhjStcSpk9LQkQ9degYctjJj1mk8B
ilplRIyy0bEBVbupUHPtmrHzpDZYe4QXFTtQaEGst1uDw/Cxw8I8zpwO4wavdxVz
DZwDF6bcKGsaCFIV4UuJvowU7syEDkBHk5JvJoyNEhRuZQYOIvMdYRxl6AxAp1Jk
vV5sn/LbWaqF7uR2Lw4NJBapulcDjEMfHuP2FyzzRqMmPO/4BiGrLnK+IRu3cSTH
0/bh6oRdplx8Q7faylMnSTXxYigXfu6vjPXuIZ95Wisk9L+URzhmdwkhnnu8MRPn
DSQfD2VvRrRr52uFiR+LppspPrb8ru6EqORW9ShpugM0V9XPiGyYnNFlCMtoCpgf
fYx5p2F23fXNq461dB6UwZfIM9jvMvrNsb3h3+pvpN/y5UbsW6kMUD38MwZxlSPT
z18P87HKHXGNc6+pQK7zI89I7g+2W5TCZVg0/w5DGTXkOlDQPi9X9woaOYcJIPYJ
LcDG0uM18kI=
=v0WH
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-6.9-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"Three fixes related to EROFS fscache mode. The most important two
patches fix calling kill_block_super() in bdev-based mode instead of
kill_anon_super(). The remaining patch is an informative one.
Summary:
- Better error message when prepare_ondemand_read failed
- Fix unmount of bdev-based mode if CONFIG_EROFS_FS_ONDEMAND is on"
* tag 'erofs-for-6.9-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: reliably distinguish block based and fscache mode
erofs: get rid of erofs_fs_context
erofs: modify the error message when prepare_ondemand_read failed
When erofs_kill_sb() is called in block dev based mode, s_bdev may not
have been initialised yet, and if CONFIG_EROFS_FS_ONDEMAND is enabled,
it will be mistaken for fscache mode, and then attempt to free an anon_dev
that has never been allocated, triggering the following warning:
============================================
ida_free called for id=0 which is not allocated.
WARNING: CPU: 14 PID: 926 at lib/idr.c:525 ida_free+0x134/0x140
Modules linked in:
CPU: 14 PID: 926 Comm: mount Not tainted 6.9.0-rc3-dirty #630
RIP: 0010:ida_free+0x134/0x140
Call Trace:
<TASK>
erofs_kill_sb+0x81/0x90
deactivate_locked_super+0x35/0x80
get_tree_bdev+0x136/0x1e0
vfs_get_tree+0x2c/0xf0
do_new_mount+0x190/0x2f0
[...]
============================================
Now when erofs_kill_sb() is called, erofs_sb_info must have been
initialised, so use sbi->fsid to distinguish between the two modes.
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20240419123611.947084-3-libaokun1@huawei.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Instead of allocating the erofs_sb_info in fill_super() allocate it during
erofs_init_fs_context() and ensure that erofs can always have the info
available during erofs_kill_sb(). After this erofs_fs_context is no longer
needed, replace ctx with sbi, no functional changes.
Suggested-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20240419123611.947084-2-libaokun1@huawei.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
When prepare_ondemand_read failed, wrong error message is printed.
The prepare_read is also implemented in cachefiles, so we amend it.
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20240424084247.759432-1-lihongbo22@huawei.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmYtKK8ACgkQiiy9cAdy
T1EvKAwAtvALEtcuvh3rjMevy4lA12HMAKoAMrHbALKlC3Ile6lL8x6B7Os04S04
qs8aawmSONbL8W8be2gFOfKlaaAVU827HYllHaBi54xYUCj3ZaeDN0LuSHZEa37f
UR8SbJr+BVIigQd93Z+Sq/WglvFJig+AY4sIvvThWvlNXDQXRpCTsEi4nIMvQ68+
1jEAk5TlOS/CnNLGQI13m+Z3wbGYr9tj/fyvNh9C6pyP0nDP6COw9xDwLqYWHOEP
yBmEoDVThMbjj4thimrbF7n6K8TNf4dCXdhyo74ggotia5CvHeKtQ82DCNRG1sCZ
aMbPL53wM3BM+1Kuvsxx72bk4xOHQINKLM+AXPPgG/amjcs5OnLNEeiUfPWUZ4Rn
fMQkDblz2vwGzpRXAziQVyXutlJKqq2jXq4L52H5KCRChfU8KLy7rv27/CaOnDbr
Azjpk66RH4hjfaQnVWXFxqqY3h0d2M3ZUqQWcJRLDZvgnBwUhutytTYUf0G++lnP
7oSjF08O
=2FY2
-----END PGP SIGNATURE-----
Merge tag '6.9-rc5-cifs-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
"Three smb3 client fixes, all also for stable:
- two small locking fixes spotted by Coverity
- FILE_ALL_INFO and network_open_info packing fix"
* tag '6.9-rc5-cifs-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
smb3: fix lock ordering potential deadlock in cifs_sync_mid_result
smb3: missing lock when picking channel
smb: client: Fix struct_group() usage in __packed structs
post-6.8 issues or aren't considered suitable for backporting.
All except one of these are for MM. I see no particular theme - it's
singletons all over.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZiwPZwAKCRDdBJ7gKXxA
jmcQAPkB6UT/rBUMvFZb1dom9R6SDYl5ZBr20Vj1HvfakCLxmQEAqEd0N7QoWvKS
hKNCMDujiEKqDUWeUaJen4cqXFFE2Qg=
=1wP7
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2024-04-26-13-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"11 hotfixes. 8 are cc:stable and the remaining 3 (nice ratio!) address
post-6.8 issues or aren't considered suitable for backporting.
All except one of these are for MM. I see no particular theme - it's
singletons all over"
* tag 'mm-hotfixes-stable-2024-04-26-13-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio()
selftests: mm: protection_keys: save/restore nr_hugepages value from launch script
stackdepot: respect __GFP_NOLOCKDEP allocation flag
hugetlb: check for anon_vma prior to folio allocation
mm: zswap: fix shrinker NULL crash with cgroup_disable=memory
mm: turn folio_test_hugetlb into a PageType
mm: support page_mapcount() on page_has_type() pages
mm: create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros
mm/hugetlb: fix missing hugetlb_lock for resv uncharge
selftests: mm: fix unused and uninitialized variable warning
selftests/harness: remove use of LINE_MAX
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZiulnAAKCRCRxhvAZXjc
ogO+AP9z3+WAvgGmJkWOjT1aOrcQWVe+ZEdEUdK26ufkHhM5vAD/RXmdUBVHcYWk
3oE1hG8bONOASUc6dUIATPHBDjvqFg8=
=LtmL
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.9-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"This contains a few small fixes for this merge window and the attempt
to handle the ntfs removal regression that was reported a little while
ago:
- After the removal of the legacy ntfs driver we received reports
about regressions for some people that do mount "ntfs" explicitly
and expect the driver to be available. Since ntfs3 is a drop-in for
legacy ntfs we alias legacy ntfs to ntfs3 just like ext3 is aliased
to ext4.
We also enforce legacy ntfs is always mounted read-only and give it
custom file operations to ensure that ioctl()'s can't be abused to
perform write operations.
- Fix an unbalanced module_get() in bdev_open().
- Two smaller fixes for the netfs work done earlier in this cycle.
- Fix the errno returned from the new FS_IOC_GETUUID and
FS_IOC_GETFSSYSFSPATH ioctls. Both commands just pull information
out of the superblock so there's no need to call into the actual
ioctl handlers.
So instead of returning ENOIOCTLCMD to indicate to fallback we just
return ENOTTY directly avoiding that indirection"
* tag 'vfs-6.9-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
netfs: Fix the pre-flush when appending to a file in writethrough mode
netfs: Fix writethrough-mode error handling
ntfs3: add legacy ntfs file operations
ntfs3: enforce read-only when used as legacy ntfs driver
ntfs3: serve as alias for the legacy ntfs driver
block: fix module reference leakage from bdev_open_by_dev error path
fs: Return ENOTTY directly if FS_IOC_GETUUID or FS_IOC_GETFSSYSFSPATH fail
In netfs_perform_write(), when the file is marked NETFS_ICTX_WRITETHROUGH
or O_*SYNC or RWF_*SYNC was specified, write-through caching is performed
on a buffered file. When setting up for write-through, we flush any
conflicting writes in the region and wait for the write to complete,
failing if there's a write error to return.
The issue arises if we're writing at or above the EOF position because we
skip the flush and - more importantly - the wait. This becomes a problem
if there's a partial folio at the end of the file that is being written out
and we want to make a write to it too. Both the already-running write and
the write we start both want to clear the writeback mark, but whoever is
second causes a warning looking something like:
------------[ cut here ]------------
R=00000012: folio 11 is not under writeback
WARNING: CPU: 34 PID: 654 at fs/netfs/write_collect.c:105
...
CPU: 34 PID: 654 Comm: kworker/u386:27 Tainted: G S ...
...
Workqueue: events_unbound netfs_write_collection_worker
...
RIP: 0010:netfs_writeback_lookup_folio
Fix this by making the flush-and-wait unconditional. It will do nothing if
there are no folios in the pagecache and will return quickly if there are
no folios in the region specified.
Further, move the WBC attachment above the flush call as the flush is going
to attach a WBC and detach it again if it is not present - and since we
need one anyway we might as well share it.
Fixes: 41d8e7673a ("netfs: Implement a write-through caching option")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202404161031.468b84f-oliver.sang@intel.com
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/2150448.1714130115@warthog.procyon.org.uk
Reviewed-by: Jeffrey Layton <jlayton@kernel.org>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Contains a single mitigation to help deal with an apparent
race condition between client and server having to deal with
inode number collisions.
Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEElpbw0ZalkJikytFRiP/V+0pf/5gFAmYqu2QACgkQiP/V+0pf
/5h6FhAAjYxXb3zSeQouR7Nr+n4Hc1pIG2hwfwDg0ruZfXnKDCnDvCtMmKJZVQWc
PtR51+wlKLsrCcVArD4zUI9dezJCAzHrG2W+MMn0tka4rQYIGA8TAVXWC0dpVs+e
0UxG/CFjwcHcwRIyCcgmzSetO+rR2kVaK9Nmsd+DkCixRFsJdmG1xZqMEsUg339b
rAnA82fncR5cHvoaTNhFK3TIzIZ78v/xOTORjCSsXYgLBC1Sq7gwPxt11Ms9qZBK
2ttkU6PB/AIL2gXm7VfAQ82HZY5AWRlwH1EFHcgge1vylXoFJuqadhIaj+l6QC4N
fgQGA6q+288vrjr5z5WXlBUCqHO2MXxPhJxEyViif+TIyJ/eCW+G777J4wPKHMiZ
LHx3/4XpbzCMgAZs29Y5l7e53xE13OfnrIngC18iX3AP/fUPKi5fYkiDX59id++k
PPjKlZJI7zW29hXXgBoqQtGG/h2H3d5y4B6dummDv4teGjnY5jPhA+KaL8Z9MUkV
NzSozmsL+zFVK9El+FRI+4REltrg/UwdznknMHe7MJeqigdCMWUSpn7TWh9rxWft
0yLzf3QXZuYG06o3YP6RAwBTdeRay5THU8X09nusPnmoJf9oXYB/lxlqa1k/4msa
kBrZzAHGOmVTpJQYTrspgWp57NyFsAbQqkzYCg6zrBj6P0mdCfs=
=IrQ9
-----END PGP SIGNATURE-----
Merge tag '9p-for-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
Pull 9p fix from Eric Van Hensbergen:
"This contains a single mitigation to help deal with an apparent race
condition between client and server having to deal with inode number
collisions"
* tag '9p-for-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
fs/9p: mitigate inode collisions
Ensure that args.acl is initialized early. It is used in an
unconditional call to kfree() on the way out of
nfsd4_encode_fattr4().
Reported-by: Scott Mayhew <smayhew@redhat.com>
Fixes: 83ab8678ad ("NFSD: Add struct nfsd4_fattr_args")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Coverity spotted that the cifs_sync_mid_result function could deadlock
"Thread deadlock (ORDER_REVERSAL) lock_order: Calling spin_lock acquires
lock TCP_Server_Info.srv_lock while holding lock TCP_Server_Info.mid_lock"
Addresses-Coverity: 1590401 ("Thread deadlock (ORDER_REVERSAL)")
Cc: stable@vger.kernel.org
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Coverity spotted a place where we should have been holding the
channel lock when accessing the ses channel index.
Addresses-Coverity: 1582039 ("Data race condition (MISSING_LOCK)")
Cc: stable@vger.kernel.org
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
- Revert some backchannel fixes that went into v6.9-rc
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmYqZecACgkQM2qzM29m
f5fOPhAAlxsB8ThAAyg+vOMeR6lxutEjPzez+U4M3N1s8dj9eMmcx8s6DPxsvfTS
ExhcqxauCEwh9q+yZQ5ismWiE43Gvf39tvKGXzH30oM4v8xYo9nzxi6w9mEGyqR+
bkpbvbpuCfSeVY4WXD2rFI8DMh1G8PM9llue2QVLWPny2y5K2gAsbinpV/QQtdgp
LEwhnGcX2ZmSh9yd7aGiOOO18CHU9rMygcb4XDzmwWggO8eCrElWbp4x8yuZJFAD
OzuIx00WYny00rlMpZUyZ67hhg0+KcGQVldhr+A8f0K1baFCfW4QDq2T3RGTonoe
DWdDGW7zwBbBw/rg4NUhdIZ9CArml9bB8ohPsaGr99e/95mtpSnXoEDJvKYgsFVE
QembV7zPZqzEjtrR6g/tdf8/g1yRRwUMLze+RLjzz70TohLKvj7+qHCHirNMCMVE
+HXvEdPKCzdZ24YST2wDh6sQ571ka6TX8fL6gYU8O+w+fiitLkEVXGE7KDJvynkm
bsoJxTtbTuFleaW79d51VpIuiW1uyqKJrD1RnwDlMpp+HDtUnoVXU5SfGnmMp4KG
Tz9Hx7hgFthXpajDpyzgLMnYVJ8IvXnoQaOp97mqFe/jSktXfslDdsj/R5ZAjZHO
SfViZ8hjyoxrna7bPh250tSCb/S3qljhsIOc/3tAl8nPWGLoV28=
=O2Ox
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Revert some backchannel fixes that went into v6.9-rc
* tag 'nfsd-6.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
Revert "NFSD: Convert the callback workqueue to use delayed_work"
Revert "NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down"
One of my CI runs popped the following lockdep splat
======================================================
WARNING: possible circular locking dependency detected
6.9.0-rc4+ #1 Not tainted
------------------------------------------------------
btrfs/471533 is trying to acquire lock:
ffff92ba46980850 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_quota_disable+0x54/0x4c0
but task is already holding lock:
ffff92ba46980bd0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1c8f/0x2600
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&fs_info->subvol_sem){++++}-{3:3}:
down_read+0x42/0x170
btrfs_rename+0x607/0xb00
btrfs_rename2+0x2e/0x70
vfs_rename+0xaf8/0xfc0
do_renameat2+0x586/0x600
__x64_sys_rename+0x43/0x50
do_syscall_64+0x95/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #1 (&sb->s_type->i_mutex_key#16){++++}-{3:3}:
down_write+0x3f/0xc0
btrfs_inode_lock+0x40/0x70
prealloc_file_extent_cluster+0x1b0/0x370
relocate_file_extent_cluster+0xb2/0x720
relocate_data_extent+0x107/0x160
relocate_block_group+0x442/0x550
btrfs_relocate_block_group+0x2cb/0x4b0
btrfs_relocate_chunk+0x50/0x1b0
btrfs_balance+0x92f/0x13d0
btrfs_ioctl+0x1abf/0x2600
__x64_sys_ioctl+0x97/0xd0
do_syscall_64+0x95/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #0 (&fs_info->cleaner_mutex){+.+.}-{3:3}:
__lock_acquire+0x13e7/0x2180
lock_acquire+0xcb/0x2e0
__mutex_lock+0xbe/0xc00
btrfs_quota_disable+0x54/0x4c0
btrfs_ioctl+0x206b/0x2600
__x64_sys_ioctl+0x97/0xd0
do_syscall_64+0x95/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
other info that might help us debug this:
Chain exists of:
&fs_info->cleaner_mutex --> &sb->s_type->i_mutex_key#16 --> &fs_info->subvol_sem
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&fs_info->subvol_sem);
lock(&sb->s_type->i_mutex_key#16);
lock(&fs_info->subvol_sem);
lock(&fs_info->cleaner_mutex);
*** DEADLOCK ***
2 locks held by btrfs/471533:
#0: ffff92ba4319e420 (sb_writers#14){.+.+}-{0:0}, at: btrfs_ioctl+0x3b5/0x2600
#1: ffff92ba46980bd0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1c8f/0x2600
stack backtrace:
CPU: 1 PID: 471533 Comm: btrfs Kdump: loaded Not tainted 6.9.0-rc4+ #1
Call Trace:
<TASK>
dump_stack_lvl+0x77/0xb0
check_noncircular+0x148/0x160
? lock_acquire+0xcb/0x2e0
__lock_acquire+0x13e7/0x2180
lock_acquire+0xcb/0x2e0
? btrfs_quota_disable+0x54/0x4c0
? lock_is_held_type+0x9a/0x110
__mutex_lock+0xbe/0xc00
? btrfs_quota_disable+0x54/0x4c0
? srso_return_thunk+0x5/0x5f
? lock_acquire+0xcb/0x2e0
? btrfs_quota_disable+0x54/0x4c0
? btrfs_quota_disable+0x54/0x4c0
btrfs_quota_disable+0x54/0x4c0
btrfs_ioctl+0x206b/0x2600
? srso_return_thunk+0x5/0x5f
? __do_sys_statfs+0x61/0x70
__x64_sys_ioctl+0x97/0xd0
do_syscall_64+0x95/0x180
? srso_return_thunk+0x5/0x5f
? reacquire_held_locks+0xd1/0x1f0
? do_user_addr_fault+0x307/0x8a0
? srso_return_thunk+0x5/0x5f
? lock_acquire+0xcb/0x2e0
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? find_held_lock+0x2b/0x80
? srso_return_thunk+0x5/0x5f
? lock_release+0xca/0x2a0
? srso_return_thunk+0x5/0x5f
? do_user_addr_fault+0x35c/0x8a0
? srso_return_thunk+0x5/0x5f
? trace_hardirqs_off+0x4b/0xc0
? srso_return_thunk+0x5/0x5f
? lockdep_hardirqs_on_prepare+0xde/0x190
? srso_return_thunk+0x5/0x5f
This happens because when we call rename we already have the inode mutex
held, and then we acquire the subvol_sem if we are a subvolume. This
makes the dependency
inode lock -> subvol sem
When we're running data relocation we will preallocate space for the
data relocation inode, and we always run the relocation under the
->cleaner_mutex. This now creates the dependency of
cleaner_mutex -> inode lock (from the prealloc) -> subvol_sem
Qgroup delete is doing this in the opposite order, it is acquiring the
subvol_sem and then it is acquiring the cleaner_mutex, which results in
this lockdep splat. This deadlock can't happen in reality, because we
won't ever rename the data reloc inode, nor is the data reloc inode a
subvolume.
However this is fairly easy to fix, simply take the cleaner mutex in the
case where we are disabling qgroups before we take the subvol_sem. This
resolves the lockdep splat.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The previous patch that replaced BUG_ON by error handling forgot to
unlock the mutex in the error path.
Link: https://lore.kernel.org/all/Zh%2fHpAGFqa7YAFuM@duo.ucw.cz
Reported-by: Pavel Machek <pavel@denx.de>
Fixes: 7411055db5 ("btrfs: handle chunk tree lookup error in btrfs_relocate_sys_chunks()")
CC: stable@vger.kernel.org
Reviewed-by: Pavel Machek <pavel@denx.de>
Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Return 0 for pages which can't be mapped. This matches how page_mapped()
works. It is more convenient for users to not have to filter out these
pages.
Link: https://lkml.kernel.org/r/20240321142448.1645400-5-willy@infradead.org
Fixes: 9c5ccf2db0 ("mm: remove HUGETLB_PAGE_DTOR")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmYnyRwACgkQxWXV+ddt
WDtnlA/9GWPYrQFBBzPZeXZHldr7grn8t8oDVVvMhSxRslrk2XYGqRVEVfZT5+Wp
wsbqdfmDXShstWkU43jxXcJAg4QxQNBBSVaKVMO5rXkM5ZHLdn78EJs5htSuy+67
n0zFqqxVr0F9LvrHqs/JJp70fr3WQtGINAkxda0JHKaMEj2nSnGjzKf6GAPomAs+
7BWQlV4cc8tQAox2MxCFx1eXTISepa9pi0ojm0R+siZGgMkmzpjJTy9WZ3EtIWQN
4LU5FMCQMsKkqUETxsEs5Va0QkEvN3SuiNsoUIJZFSArws3cwYz+u10+8Rpc/U1o
8P3H8fGxvbYpRt6QjG1GGYy0LxhyYeyGp+fNfeBF4pl2MAn2e8qeiPpPlC+Q6/U2
/Nn8+x9/FgWQKNmu76DQ1BM4WoD18mEUQrB6OYLO/9FBttLAevaEO3vxECxosBIj
wGyfXJ/r4Y2Vva+pkjreBpc7m/VwwOPGdRHkKk8mzFGqoQzSwvs/pm3ldF9dV7ud
smZ0H8vvaEDigOd4oFR2vC2wpETaCL89oS9x/NzMvqlqQaGJoD6t591c3yRwmvro
hJYT5lG6KR+ZeDgv+2ZZA2s5/2l7193pkS3toj7v1xSOkpjADPaTBg+2P5o27YyJ
UvXbEpdcCV6xWxT3Ak/bkfS1dDGfxAtwV7c/sPGvY5KtmbvjOBE=
=riVJ
-----END PGP SIGNATURE-----
Merge tag 'for-6.9-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix information leak by the buffer returned from LOGICAL_INO ioctl
- fix flipped condition in scrub when tracking sectors in zoned mode
- fix calculation when dropping extent range
- reinstate fallback to write uncompressed data in case of fragmented
space that could not store the entire compressed chunk
- minor fix to message formatting style to make it conforming to the
commonly used style
* tag 'for-6.9-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix wrong block_start calculation for btrfs_drop_extent_map_range()
btrfs: fix information leak in btrfs_ioctl_logical_to_ino()
btrfs: fallback if compressed IO fails for ENOSPC
btrfs: scrub: run relocation repair when/only needed
btrfs: remove colon from messages with state
This commit was a pre-requisite for commit c1ccfcf1a9 ("NFSD:
Reschedule CB operations when backchannel rpc_clnt is shut down"),
which has already been reverted.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The reverted commit attempted to enable NFSD to retransmit pending
callback operations if an NFS client disconnects, but
unintentionally introduces a hazardous behavior regression if the
client becomes permanently unreachable while callback operations are
still pending.
A disconnect can occur due to network partition or if the NFS server
needs to force the NFS client to retransmit (for example, if a GSS
window under-run occurs).
Reverting the commit will make NFSD behave the same as it did in
v6.8 and before. Pending callback operations are permanently lost if
the client connection is terminated before the client receives them.
For some callback operations, this loss is not harmful.
However, for CB_RECALL, the loss means a delegation might be revoked
unnecessarily. For CB_OFFLOAD, pending COPY operations will never
complete unless the NFS client subsequently sends an OFFLOAD_STATUS
operation, which the Linux NFS client does not currently implement.
These issues still need to be addressed somehow.
Reported-by: Dai Ngo <dai.ngo@oracle.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218735
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmYm8oEACgkQiiy9cAdy
T1HAVAv/T9NXYhTYQK8EK7DvzcGDV4dOcFI6GVsrinp1KHGyuxoPABKGctQXfou0
4DJ9ik0CYFWmVuz8CODxmHkJq8fclnkJtVPH4DjcOm8CqhcaMJNVgEWP6eMGevGo
sQPEtFAYRPQrIm5X8u2uvARE490YiUD85Se+6LrYZdzt/BOUQQHtKodSTre1ZCAV
F6GLEVVuncP9iqno9lNu1EI8ltcOW6i1i15s4HOmULNwtUKdwsYTWVqW6JDOy5gM
9YqXJhPobVcZnY/m8QVWfE/lEPOwJO5lbRF4Ktykz4PQkxZqD6t+Noesj73GKEgC
7jnt3L79s1zA51gHdn96Z1qWlaIruX4ugYhfzQAW7PfYmSUr3I8G09ofzwFc77aH
osYoU6mZWWL+4/RIKK3DGYRe2ET68KmlNheG2OTOeCNfwjSoI5rWHSLMGqlegTc0
40gz62OHuwncMyKFLwGaf6ztzlOLHIdVU4uhRv1/9ptZuMl/2LoiN7G8ZddbtYIS
JxL/kIEE
=PaAG
-----END PGP SIGNATURE-----
Merge tag '6.9-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- fscache fix
- fix for case where we could use uninitialized lease
- add tracepoint for debugging refcounting of tcon
- fix mount option regression (e.g. forceuid vs. noforceuid when uid=
specified) caused by conversion to the new mount API
* tag '6.9-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: reinstate original behavior again for forceuid/forcegid
smb: client: fix rename(2) regression against samba
cifs: Add tracing for the cifs_tcon struct refcounting
cifs: Fix reacquisition of volume cookie on still-live connection
Fix the error return in netfs_perform_write() acting in writethrough-mode
to return any cached error in the case that netfs_end_writethrough()
returns 0.
This can affect the use of O_SYNC/O_DSYNC/RWF_SYNC/RWF_DSYNC in 9p and afs.
Fixes: 41d8e7673a ("netfs: Implement a write-through caching option")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/6736.1713343639@warthog.procyon.org.uk
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmYmfyAACgkQiiy9cAdy
T1H8CQv/RlvQNbfSc9V5/vUPqfEWhx+5EyFoMFPZmo2HmtxT4BI8sEICUQcC6h+Y
L2li6ihmsF1DnxDxsFwKAhQ9nTpYG4LoBbljk/j/N+sKhUFE3ZLWxZyoEAbWWfeh
UJt8ZuGTgbc8nyPZ1s2d5oZy37PNlGd7CilLtP5HQyEW3l8EHg0qQOp98cibh0mm
+kS2vilb4nIlapblTq2uFDcBTolXRSjTlY8eWC9DNvWdjTJLdTEjAurqeuN71FA0
+XXnA9EYE9MDEkCqnQdZEwGvR8ZKy9sR4cqCUGZO/tNz/KfOGfjJkrEf+OEj37Af
I5ahxUKrYwBEC/6gaFx8i1ZiivdDOyIaM7b+rpazpiV1sCCbxlM18TlDSyODFnUC
nqEABR3slx/oZ4xYcFuxjzN0L9ZS7+0NQ1VQovFLlNqW7f9qFdUq0JKqfNDvFOm4
tgi5/lCnlMco0ts6h0HIw5LrwDi7HJWbYf1Fmv/1jC2Ed5Ut0GCMNpSC9WQxDo+t
G4m8ul1S
=ZizJ
-----END PGP SIGNATURE-----
Merge tag '6.9-rc5-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
"Five ksmbd server fixes, most also for stable:
- rename fix
- two fixes for potential out of bounds
- fix for connections from MacOS (padding in close response)
- fix for when to enable persistent handles"
* tag '6.9-rc5-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: add continuous availability share parameter
ksmbd: common: use struct_group_attr instead of struct_group for network_open_info
ksmbd: clear RENAME_NOREPLACE before calling vfs_rename
ksmbd: validate request buffer size in smb2_allocate_rsp_buf()
ksmbd: fix slab-out-of-bounds in smb2_allocate_rsp_buf
- fix a few more deadlocks in recovery
- fix u32/u64 issues in mi_btree_bitmap
- btree key cache shrinker now actually frees, with more instrumentation
coming so we can verify that it's working correctly more easily in the
future
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmYmveYACgkQE6szbY3K
bnY+kA//dtdKfPliuQlfYhUvIZSvgEy+KxjDaDmeJFVMKFYHiip/JJt7V/YU7jWW
DmWGtPqo1hoGZnic7h9fstbBCRgUdIYGBInqAWPzL3wmFYe5FPE02KN5fuZ+A+Mp
sn0QFML4oA0uxD7TRXfCeNx3NRwXonztletVskXCuLa0T8iTOTdOuAH0MCGow3OC
mlfRZyK05f6Q0UOIzvntfBl8Tkr+yk5g0GFz54U7s+qs/zJsYiYfruXuq6AxLTy2
xVMDj3Hc6W0vggVpv68HInluubl/b7rVSy+w59GG0D3iQ9/fBdqitFLdw49ReXzi
J//ctZLb3n+IM4lA7t5ev0lY7bvI2FwFNkrL4qW41E4un5eQ3ghbyOmMoz87svyg
4JW/CPGP7uKNVmfRuHn1qhgJ9/vIXkObJVl9GKZF2BylaZwjMM5YrL5MZwrKFQYy
9BMgemgvHFK+wRi74q/OUu3PyH045AoJdIKI66ypFexhGi5YeNqtRHLUKdfSrJR+
eEkAUaHcgLLfxyk+fIRvcSK+Q9j3BibvsSkU3vLSnl2B+xdvfJqb+I/yvBt/SZQW
P09JceDQABRBMu9beVbVqMED+PniSZVfG2eU2jBcZ+jhbGQgmiWfMNJVKS4/0uwz
PRS33P2mViVZ6PJokWyecbgGtVxKrK2ruPdcu6/W05Qi0Vv58+k=
=lCNd
-----END PGP SIGNATURE-----
Merge tag 'bcachefs-2024-04-22' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Nothing too crazy in this one, and it looks like (fingers crossed) the
recovery and repair issues are settling down - although there's going
to be a long tail there, as we've still yet to really ramp up on error
injection or syzbot.
- fix a few more deadlocks in recovery
- fix u32/u64 issues in mi_btree_bitmap
- btree key cache shrinker now actually frees, with more
instrumentation coming so we can verify that it's working
correctly more easily in the future"
* tag 'bcachefs-2024-04-22' of https://evilpiepirate.org/git/bcachefs:
bcachefs: If we run merges at a lower watermark, they must be nonblocking
bcachefs: Fix inode early destruction path
bcachefs: Fix deadlock in journal write path
bcachefs: Tweak btree key cache shrinker so it actually frees
bcachefs: bkey_cached.btree_trans_barrier_seq needs to be a ulong
bcachefs: Fix missing call to bch2_fs_allocator_background_exit()
bcachefs: Check for journal entries overruning end of sb clean section
bcachefs: Fix bio alloc in check_extent_checksum()
bcachefs: fix leak in bch2_gc_write_reflink_key
bcachefs: KEY_TYPE_error is allowed for reflink
bcachefs: Fix bch2_dev_btree_bitmap_marked_sectors() shift
bcachefs: make sure to release last journal pin in replay
bcachefs: node scan: ignore multiple nodes with same seq if interior
bcachefs: Fix format specifier in validate_bset_keys()
bcachefs: Fix null ptr deref in twf from BCH_IOCTL_FSCK_OFFLINE
forceuid/forcegid should be enabled by default when uid=/gid= options are
specified, but commit 24e0a1eff9 ("cifs: switch to new mount api")
changed the behavior. Due to the change, a mounted share does not show
intentional uid/gid for files and directories even though uid=/gid=
options are specified since forceuid/forcegid are not enabled.
This patch reinstates original behavior that overrides uid/gid with
specified uid/gid by the options.
Fixes: 24e0a1eff9 ("cifs: switch to new mount api")
Signed-off-by: Takayuki Nagata <tnagata@redhat.com>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Detect and mitigate inode collsions that now occur since we
fixed 9p generating duplicate inode structures. Underlying
cause of these appears to be a race condition between reuse
of inode numbers in underlying file system and cleanup of
inode numbers in the client. Enabling caching
makes this much more likely to happen as it increases cleanup
latency due to writebacks.
Reported-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>