When compiling with gcc version 14.0.0 20231126 (experimental)
and CONFIG_FORTIFY_SOURCE=y, I've noticed the following:
In file included from ./include/linux/string.h:295,
from ./include/linux/bitmap.h:12,
from ./include/linux/cpumask.h:12,
from ./arch/x86/include/asm/paravirt.h:17,
from ./arch/x86/include/asm/cpuid.h:62,
from ./arch/x86/include/asm/processor.h:19,
from ./arch/x86/include/asm/cpufeature.h:5,
from ./arch/x86/include/asm/thread_info.h:53,
from ./include/linux/thread_info.h:60,
from ./arch/x86/include/asm/preempt.h:9,
from ./include/linux/preempt.h:79,
from ./include/linux/spinlock.h:56,
from ./include/linux/wait.h:9,
from ./include/linux/wait_bit.h:8,
from ./include/linux/fs.h:6,
from fs/smb/client/smb2pdu.c:18:
In function 'fortify_memcpy_chk',
inlined from '__SMB2_close' at fs/smb/client/smb2pdu.c:3480:4:
./include/linux/fortify-string.h:588:25: warning: call to '__read_overflow2_field'
declared with attribute warning: detected read beyond size of field (2nd parameter);
maybe use struct_group()? [-Wattribute-warning]
588 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
and:
In file included from ./include/linux/string.h:295,
from ./include/linux/bitmap.h:12,
from ./include/linux/cpumask.h:12,
from ./arch/x86/include/asm/paravirt.h:17,
from ./arch/x86/include/asm/cpuid.h:62,
from ./arch/x86/include/asm/processor.h:19,
from ./arch/x86/include/asm/cpufeature.h:5,
from ./arch/x86/include/asm/thread_info.h:53,
from ./include/linux/thread_info.h:60,
from ./arch/x86/include/asm/preempt.h:9,
from ./include/linux/preempt.h:79,
from ./include/linux/spinlock.h:56,
from ./include/linux/wait.h:9,
from ./include/linux/wait_bit.h:8,
from ./include/linux/fs.h:6,
from fs/smb/client/cifssmb.c:17:
In function 'fortify_memcpy_chk',
inlined from 'CIFS_open' at fs/smb/client/cifssmb.c:1248:3:
./include/linux/fortify-string.h:588:25: warning: call to '__read_overflow2_field'
declared with attribute warning: detected read beyond size of field (2nd parameter);
maybe use struct_group()? [-Wattribute-warning]
588 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In both cases, the fortification logic inteprets calls to 'memcpy()' as an
attempts to copy an amount of data which exceeds the size of the specified
field (i.e. more than 8 bytes from __le64 value) and thus issues an overread
warning. Both of these warnings may be silenced by using the convenient
'struct_group()' quirk.
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
- debugfs had a deadlock (removal vs. use of files),
fixes going through wireless ACKed by Greg
- support for HT STAs on 320 MHz channels, even if it's
not clear that should ever happen (that's 6 GHz), best
not to WARN()
- fix for the previous CQM fix that broke most cases
- various wiphy locking fixes
- various small driver fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmVnUrkACgkQ10qiO8sP
aACE4A/9EdhyexRlNKdF3yLh0Q4acuR8IlaLBNgGiQDFxx17djE04p2PTpwkxOyn
0EPy8dKhhWwoAkbvZ+6ToNFa+Jv9w+C5xVls2osGRuYGSVNlxCNy+8tWSNVT73jp
rrapEkYHzuz9wBZiJpKwC+mK9uH9gAQgWyYUaTTeOnO/+m3HVhZdU6y71j8gQlm6
9YFSI3r/VWlq1JpThn8WGULJXOMICWN0Sp4XRhEzHPLjK7MiCLNrrQSyV+uMHHUI
PmRQN8QW6Oomjbcih3YNn+Geps7xUJFEG4mZvM6GUBXugnIq9t9xmzEbEJyBRIpl
MGTwrIAyxtvBeHMpB7U/R3RSEyHfSW6fXHgN2S8ZEFTu6v70gP3TK8lL1gWc5mR4
GogNaGQ0G6LdiHjM7XeGUv/SfzD6HJWa9aR1JbRwapkvxFfM7BEc14gxip0nFDLG
z9sixiztGmzBsxmDc+h7M6YLlvWe4oqueUyjA7/p42NhYq41Zy/BYK5oUgsuL4ah
Lsb00O6Mz9m6iPtJlREe81Qsclkjg8xMY2YcCYELXP3v50KwU3THBAOYbh5hvgrL
T2gibkR0zFAGz1EDcPyQq/SBeUtASPVHyLGUMae6DOcVnOFdAhDTbX0B/TvcnRSq
e1gvmNywL7wFVZLD/RYZtWW9g6NUcwK+CU59PRBh2OqVb0ow8Ls=
=X6t2
-----END PGP SIGNATURE-----
Merge tag 'wireless-2023-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
wireless fixes:
- debugfs had a deadlock (removal vs. use of files),
fixes going through wireless ACKed by Greg
- support for HT STAs on 320 MHz channels, even if it's
not clear that should ever happen (that's 6 GHz), best
not to WARN()
- fix for the previous CQM fix that broke most cases
- various wiphy locking fixes
- various small driver fixes
* tag 'wireless-2023-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: use wiphy locked debugfs for sdata/link
wifi: mac80211: use wiphy locked debugfs helpers for agg_status
wifi: cfg80211: add locked debugfs wrappers
debugfs: add API to allow debugfs operations cancellation
debugfs: annotate debugfs handlers vs. removal with lockdep
debugfs: fix automount d_fsdata usage
wifi: mac80211: handle 320 MHz in ieee80211_ht_cap_ie_to_sta_ht_cap
wifi: avoid offset calculation on NULL pointer
wifi: cfg80211: hold wiphy mutex for send_interface
wifi: cfg80211: lock wiphy mutex for rfkill poll
wifi: cfg80211: fix CQM for non-range use
wifi: mac80211: do not pass AP_VLAN vif pointer to drivers during flush
wifi: iwlwifi: mvm: fix an error code in iwl_mvm_mld_add_sta()
wifi: mt76: mt7925: fix typo in mt7925_init_he_caps
wifi: mt76: mt7921: fix 6GHz disabled by the missing default CLC config
====================
Link: https://lore.kernel.org/r/20231129150809.31083-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix the cifs filesystem implementations of FALLOC_FL_INSERT_RANGE, in
smb3_insert_range(), to set i_size after extending the file on the server
and before we do the copy to open the gap (as we don't clean up the EOF
marker if the copy fails).
Fixes: 7fe6fe95b9 ("cifs: add FALLOC_FL_INSERT_RANGE support")
Cc: stable@vger.kernel.org
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Fix the cifs filesystem implementations of FALLOC_FL_ZERO_RANGE, in
smb3_zero_range(), to set i_size after extending the file on the server.
Fixes: 72c419d9b0 ("cifs: fix smb3_zero_range so it can expand the file-size when required")
Cc: stable@vger.kernel.org
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Steve French <stfrench@microsoft.com>
This fixes a bug where going read-only was taking longer than it should
have due to copygc forgetting to check kthread_should_stop()
Additionally: fix a missing is_kthread check in bch2_move_ratelimit().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This eliminates some SRCU warnings: for_each_btree_key2() runs every
loop iteration in a distinct transaction context.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
btree writes update the btree node key after every write, in order to
update sectors_written, and they also might need to drop pointers if one
of the writes failed in a replicated btree node.
But the btree node might also have had a pointer dropped while the write
was in flight, by bch2_dev_metadata_drop(), and thus there was a bug
where the btree node write would ovewrite the btree node's key with what
it had at the start of the write.
Fix this by dropping pointers not currently in the btree node key.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
journal_cur_seq() can legitimately be used outside of the journal lock,
where this assert can race
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The automated tests check if we've hit too many slowpath/error path
events and fail the test - if we're just shutting down, that naturally
shouldn't count.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We can't rely on FILE_STANDARD_INFORMATION::EndOfFile for reparse
points as they will be always zero. Set it to symlink target's length
as specified by POSIX.
This will make stat() family of syscalls return the correct st_size
for such files.
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When instantiating inodes for SMB symlinks, add the mode bits from
@cifs_sb->ctx->file_mode as we already do for the other special files.
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Renamed from trace_move_extent_alloc_mem_fail, because there are other
reasons we colud fail (disk space allocation failure).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_btree_update_start() calculates which nodes are going to have to be
split/rewritten, so that we know how many nodes to reserve and how deep
in the tree we have to take locks.
But btree node merges require inserting two keys into the parent node,
not just splits.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Validation was completely missing for replicas entries in the journal
(not the superblock replicas section) - we can't have replicas entries
pointing to invalid devices.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
zstd apparently lies about the size of the compression workspace it
requires; if we double it compression succeeds.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmVmHvEACgkQxWXV+ddt
WDtczA//UdxSPxQwJY1oOQj3k5Zgb/zOThfyD4x5wrDxFAYwVh1XhuyxV2XT8qyq
ipp+mi9doVykZKLSY+oRAeqjGlF0nIYm1z2PGSU7JXz4VsEj970rsDs3ePwH5TW6
V8/6VraOIIvmOnTft7aiuM8CXUjyndalNl7RvHu+v6grgAAgQAaly/3CmRIsm/Ui
2Wb6/J8ciAOBZ8TkFMr0PiTJd+CjUL+1Y9IaYEywujf0nVNJSgHp+R2CpwLDvM/5
1x6LdRrUKmnY7mhOvC7QwWfQmgsPnj3OuR+3+L+8jULTvcpwka2KEcpCH8/s6mUK
+4XhKQ4xXOJr8M+KmAUpy1yZZ30G6cDSwnfCWbWRCfR03396tTb08kb6G21fR+NL
o2qEUOe4DoMbYX/5zd9xEVqbwyGhAIXB0fJ7KJ0RqbaNBh/roRALBVCseP2CFwJE
P0DE9phjeIGQf3ybdfP7XqnMfk520bqoeV49Akbn2us2SrV1+O9Yjqmj2pbTnljE
M30Jh/btaiTFtsGB3MBDRRnGhf7F2l1dsmdmMVhdOK8HMY6obcJUdv6YXVLAjBDn
ATWtUVVizOpHvSZL0G/+1fXqHhLqOnHLY4A97uMjcElK5WJfuYZv8vZK7GVKC/jW
y5F4w/FPxU8dmhorMGksya2CLMvUsv5dikyAzGHirjEAdyrK1jg=
=85Pb
-----END PGP SIGNATURE-----
Merge tag 'for-6.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few fixes and message updates:
- for simple quotas, handle the case when a snapshot is created and
the target qgroup already exists
- fix a warning when file descriptor given to send ioctl is not
writable
- fix off-by-one condition when checking chunk maps
- free pages when page array allocation fails during compression
read, other cases were handled
- fix memory leak on error handling path in ref-verify debugging
feature
- copy missing struct member 'version' in 64/32bit compat send ioctl
- tree-checker verifies inline backref ordering
- print messages to syslog on first mount and last unmount
- update error messages when reading chunk maps"
* tag 'for-6.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: send: ensure send_fd is writable
btrfs: free the allocated memory if btrfs_alloc_page_array() fails
btrfs: fix 64bit compat send ioctl arguments not initializing version member
btrfs: make error messages more clear when getting a chunk map
btrfs: fix off-by-one when checking chunk map includes logical address
btrfs: ref-verify: fix memory leaks in btrfs_ref_tree_mod()
btrfs: add dmesg output for first mount and last unmount of a filesystem
btrfs: do not abort transaction if there is already an existing qgroup
btrfs: tree-checker: add type and sequence check for inline backrefs
In some cases there might be longer-running hardware accesses
in debugfs files, or attempts to acquire locks, and we want
to still be able to quickly remove the files.
Introduce a cancellations API to use inside the debugfs handler
functions to be able to cancel such operations on a per-file
basis.
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When you take a lock in a debugfs handler but also try
to remove the debugfs file under that lock, things can
deadlock since the removal has to wait for all users
to finish.
Add lockdep annotations in debugfs_file_get()/_put()
to catch such issues.
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
debugfs_create_automount() stores a function pointer in d_fsdata,
but since commit 7c8d469877 ("debugfs: add support for more
elaborate ->d_fsdata") debugfs_release_dentry() will free it, now
conditionally on DEBUGFS_FSDATA_IS_REAL_FOPS_BIT, but that's not
set for the function pointer in automount. As a result, removing
an automount dentry would attempt to free the function pointer.
Luckily, the only user of this (tracing) never removes it.
Nevertheless, it's safer if we just handle the fsdata in one way,
namely either DEBUGFS_FSDATA_IS_REAL_FOPS_BIT or allocated. Thus,
change the automount to allocate it, and use the real_fops in the
data to indicate whether or not automount is filled, rather than
adding a type tag. At least for now this isn't actually needed,
but the next changes will require it.
Also check in debugfs_file_get() that it gets only called
on regular files, just to make things clearer.
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
- With the usage of simple_recursive_remove() recommended by Al Viro,
the code should not be calling "d_invalidate()" itself. Doing so
is causing crashes. The code was calling d_invalidate() on the race
of trying to look up a file while the parent was being deleted.
This was detected, and the added dentry was having d_invalidate() called
on it, but the deletion of the directory was also calling d_invalidate()
on that same dentry.
- A fix to not free the eventfs_inode (ei) until the last dput() was called
on its ei->dentry made the ei->dentry exist even after it was marked
for free by setting the ei->is_freed. But code elsewhere still was
checking if ei->dentry was NULL if ei->is_freed is set and would
trigger WARN_ON if that was the case. That's no longer true and there
should not be any warnings when it is true.
- Use GFP_NOFS for allocations done under eventfs_mutex.
The eventfs_mutex can be taken on file system reclaim, make sure
that allocations done under that mutex do not trigger file system
reclaim.
- Clean up code by moving the taking of inode_lock out of the helper
functions and into where they are needed, and not use the
parameter to know to take it or not. It must always be held but
some callers of the helper function have it taken when they were
called.
- Warn if the inode_lock is not held in the helper functions.
- Warn if eventfs_start_creating() is called without a parent.
As eventfs is underneath tracefs, all files created will have
a parent (the top one will have a tracefs parent).
Tracing update;
- Add Mathieu Desnoyers as an official reviewer of the tracing sub system.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZWNdfBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qsw1AQC0x3cdNhIhzi1mWh9a8KSH/GJPdl/a
7t0sv9XT8HV+iQEAyvvr0ov/s7XSAffG2HcYU0WxRGXTI5MyQlzmaIQ9TQo=
=ZQYD
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt::
"Eventfs fixes:
- With the usage of simple_recursive_remove() recommended by Al Viro,
the code should not be calling "d_invalidate()" itself. Doing so is
causing crashes. The code was calling d_invalidate() on the race of
trying to look up a file while the parent was being deleted. This
was detected, and the added dentry was having d_invalidate() called
on it, but the deletion of the directory was also calling
d_invalidate() on that same dentry.
- A fix to not free the eventfs_inode (ei) until the last dput() was
called on its ei->dentry made the ei->dentry exist even after it
was marked for free by setting the ei->is_freed. But code elsewhere
still was checking if ei->dentry was NULL if ei->is_freed is set
and would trigger WARN_ON if that was the case. That's no longer
true and there should not be any warnings when it is true.
- Use GFP_NOFS for allocations done under eventfs_mutex. The
eventfs_mutex can be taken on file system reclaim, make sure that
allocations done under that mutex do not trigger file system
reclaim.
- Clean up code by moving the taking of inode_lock out of the helper
functions and into where they are needed, and not use the parameter
to know to take it or not. It must always be held but some callers
of the helper function have it taken when they were called.
- Warn if the inode_lock is not held in the helper functions.
- Warn if eventfs_start_creating() is called without a parent. As
eventfs is underneath tracefs, all files created will have a parent
(the top one will have a tracefs parent).
Tracing update:
- Add Mathieu Desnoyers as an official reviewer of the tracing subsystem"
* tag 'trace-v6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
MAINTAINERS: TRACING: Add Mathieu Desnoyers as Reviewer
eventfs: Make sure that parent->d_inode is locked in creating files/dirs
eventfs: Do not allow NULL parent to eventfs_start_creating()
eventfs: Move taking of inode_lock into dcache_dir_open_wrapper()
eventfs: Use GFP_NOFS for allocation when eventfs_mutex is held
eventfs: Do not invalidate dentry in create_file/dir_dentry()
eventfs: Remove expectation that ei->is_freed means ei->dentry == NULL
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmVioPcACgkQiiy9cAdy
T1Gf7wv/bbwvRAhPi2m51Iv/LZn+XzY639rtvl2RvfuwxzdEsJzmzXHEU4SFYj7w
9d4vcVndwTzcD8sE9jf6tSgkFd96U9ZEoias+nUdU2CY02DW7C/PVlTckiC4brHD
2ZOOYqn5w3k8mJGN1dP7+1yw1OtXJeOr65/8JbP12z0eru3lzNn91blbW3faeQSt
P+dqFTYFygBLiB3AFqlrtXh6TeUpT88UkfYpDrhT+TIiY+BwYKyQBcgfW9Ho12Cx
FcD5Bc8QhXT2IM0+igjNJgLwqhko6KGz585m/ojABJwIvstl6gtPhT+r07VLAl8K
87rNuKvIXfdmbaWEcUgOBX9KTa8fduTgVi6s4wj/1jP8NxihOXjDtpRO5xGyTudh
w3HooLC8SajEzH76MbLRPnmrG07IG+ZRPiYkMIB65sZvXh/rlAM/59k8K+2WMtEN
cDZ53EO1Y2qd5C/wto/dg3e3eWEC2wxu94x5uWYaXHxAmdDCfoZ4Qj1kSkgSXHd5
FoMd4xAJ
=6aLE
-----END PGP SIGNATURE-----
Merge tag '6.7-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- use after free fix in releasing multichannel interfaces
- fixes for special file types (report char, block, FIFOs properly when
created e.g. by NFS to Windows)
- fixes for reporting various special file types and symlinks properly
when using SMB1
* tag '6.7-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: introduce cifs_sfu_make_node()
smb: client: set correct file type from NFS reparse points
smb: client: introduce ->parse_reparse_point()
smb: client: implement ->query_reparse_point() for SMB1
cifs: fix use after free for iface while disabling secondary channels
bkey embeds a bpos that is misaligned on big endian; this is so that
bch2_bkey_swab() works correctly without having to differentiate between
packed and non-packed keys (a debatable design decision).
This means it can't have the __aligned() tag on big endian.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Durability of an erasure coded pointer doesn't add the device
durability; durability is the same for any extent in that stripe so the
calculation only comes from the stripe.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, there was a bug where if an extent had greater durability
than required (because we needed to move a durability=1 pointer and
ended up putting it on a durability 2 device), we would submit a write
for replicas=2 - the durability of the pointer being rewritten - instead
of the number of replicas required to bring it back up to the
data_replicas option.
This, plus the allocation path sometimes allocating on a greater
durability device than requested, meant that extents could continue
having more and more replicas added as they were being rewritten.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
* Validate quota records recovered from the log before writing them to the
disk.
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZV5ElQAKCRAH7y4RirJu
9DCnAP0bth5eVyCxq9teNsql8sDnWzYtgdp3Sgo6LGjKcbUigAEAldS0EW86fva6
X60DComoQfxT4zMKR6K6h7VvhcF3dwc=
=PQ3p
-----END PGP SIGNATURE-----
Merge tag 'xfs-6.7-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fix from Chandan Babu:
- Validate quota records recovered from the log before writing them to
the disk.
* tag 'xfs-6.7-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: dquot recovery does not validate the recovered dquot
xfs: clean up dqblk extraction
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmVguZwACgkQ+7dXa6fL
C2uS0g/6A3HN0NG2XmLjvtR9z76aszitAX0EM2qhzj+rFcOVDCGdcataJoi3r9bx
+bYoXp6PmiqHrX/w6GIV68dlrBNHVXnySigekE/XsfMYcvaDSEa90zDriASI+85m
p01oOHZDZ9XbRpx9DHb9xHSu4W+KtKjTCnuhS4EnlOPfQuBFY+o94iPThaJa9DT3
tKyHgOvxtn2NqyP4xW13h/oyCjx93ked8nMWOPSM9scesmUXCbInjpBiuQj/rSqS
OP6h+xNr7jufQr7L+pBxliRJ3SyhJQhAY9JD450t2E8FffaVZpE1GbK7ud+l3Tdq
F93BbqisL17kJSpedu44sduqziXK9oTrgNSp7zCwO92BlzS7/hoPBMj0Ki4TqHsu
5hH3qS994uVHGwQZzZaqwPcI8gJTsDhyMAIClsp9ZbDeJW3LNxtvZELXW1+XbFwk
SFJRL7DHZJ+aitqrwVd+Ub1m4yfJTHNEp52YwiQKQrYMnUnS9CfOKukvD+0lC6eg
M2ohVtXO4gk+cjODLYGuQcLdSlpKY+yAJ0ujgR3WoLpSyRbhUXIAI0fOETPkJv84
vYUcN3znMfFdhQrse/Hw57xzHrzQNq3yqOG1UypplflwK94f1OpmFMB4Ufb5NSih
9uKjwnmLu4gau8pqkwuNppV13evaAZPmUv//7CHY0dfDYyB1Xcc=
=rN72
-----END PGP SIGNATURE-----
Merge tag 'afs-fixes-20231124' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS fixes from David Howells:
- Fix the afs_server_list struct to be cleaned up with RCU
- Fix afs to translate a no-data result from a DNS lookup into ENOENT,
not EDESTADDRREQ for consistency with OpenAFS
- Fix afs to translate a negative DNS lookup result into ENOENT rather
than EDESTADDRREQ
- Fix file locking on R/O volumes to operate in local mode as the
server doesn't handle exclusive locks on such files
- Set SB_RDONLY on superblocks for RO and Backup volumes so that the
VFS can see that they're read only
* tag 'afs-fixes-20231124' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Mark a superblock for an R/O or Backup volume as SB_RDONLY
afs: Fix file locking on R/O volumes to operate in local mode
afs: Return ENOENT if no cell DNS record can be found
afs: Make error on cell lookup failure consistent with OpenAFS
afs: Fix afs_server_list to be cleaned up with RCU
kernel_write() requires the caller to ensure that the file is writable.
Let's do that directly after looking up the ->send_fd.
We don't need a separate bailout path because the "out" path already
does fput() if ->send_filp is non-NULL.
This has no security impact for two reasons:
- the ioctl requires CAP_SYS_ADMIN
- __kernel_write() bails out on read-only files - but only since 5.8,
see commit a01ac27be4 ("fs: check FMODE_WRITE in __kernel_write")
Reported-and-tested-by: syzbot+12e098239d20385264d3@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=12e098239d20385264d3
Fixes: 31db9f7c23 ("Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive")
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
If btrfs_alloc_page_array() fail to allocate all pages but part of the
slots, then the partially allocated pages would be leaked in function
btrfs_submit_compressed_read().
[CAUSE]
As explicitly stated, if btrfs_alloc_page_array() returned -ENOMEM,
caller is responsible to free the partially allocated pages.
For the existing call sites, most of them are fine:
- btrfs_raid_bio::stripe_pages
Handled by free_raid_bio().
- extent_buffer::pages[]
Handled btrfs_release_extent_buffer_pages().
- scrub_stripe::pages[]
Handled by release_scrub_stripe().
But there is one exception in btrfs_submit_compressed_read(), if
btrfs_alloc_page_array() failed, we didn't cleanup the array and freed
the array pointer directly.
Initially there is still the error handling in commit dd137dd1f2
("btrfs: factor out allocating an array of pages"), but later in commit
544fe4a903 ("btrfs: embed a btrfs_bio into struct compressed_bio"),
the error handling is removed, leading to the possible memory leak.
[FIX]
This patch would add back the error handling first, then to prevent such
situation from happening again, also
Make btrfs_alloc_page_array() to free the allocated pages as a extra
safety net, then we don't need to add the error handling to
btrfs_submit_compressed_read().
Fixes: 544fe4a903 ("btrfs: embed a btrfs_bio into struct compressed_bio")
CC: stable@vger.kernel.org # 6.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When the send protocol versioning was added in 5.16 e77fbf9903
("btrfs: send: prepare for v2 protocol"), the 32/64bit compat code was
not updated (added by 2351f431f7 ("btrfs: fix send ioctl on 32bit with
64bit kernel")), missing the version struct member. The compat code is
probably rarely used, nobody reported any bugs.
Found by tool https://github.com/jirislaby/clang-struct .
Fixes: e77fbf9903 ("btrfs: send: prepare for v2 protocol")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZWBq0gAKCRCRxhvAZXjc
ot4EAP48O5ExMtQ3/AIkNDo+/9/Iz4g7bE1HYmdyiMPO3Ou/uwEAySwBXRJrFAsS
9omvkEdqrfyguW0xgoYwcxBdATVHnAE=
=ScR3
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.7-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Avoid calling back into LSMs from vfs_getattr_nosec() calls.
IMA used to query inode properties accessing raw inode fields without
dedicated helpers. That was finally fixed a few releases ago by
forcing IMA to use vfs_getattr_nosec() helpers.
The goal of the vfs_getattr_nosec() helper is to query for attributes
without calling into the LSM layer which would be quite problematic
because incredibly IMA is called from __fput()...
__fput()
-> ima_file_free()
What it does is to call back into the filesystem to update the file's
IMA xattr. Querying the inode without using vfs_getattr_nosec() meant
that IMA didn't handle stacking filesystems such as overlayfs
correctly. So the switch to vfs_getattr_nosec() is quite correct. But
the switch to vfs_getattr_nosec() revealed another bug when used on
stacking filesystems:
__fput()
-> ima_file_free()
-> vfs_getattr_nosec()
-> i_op->getattr::ovl_getattr()
-> vfs_getattr()
-> i_op->getattr::$WHATEVER_UNDERLYING_FS_getattr()
-> security_inode_getattr() # calls back into LSMs
Now, if that __fput() happens from task_work_run() of an exiting task
current->fs and various other pointer could already be NULL. So
anything in the LSM layer relying on that not being NULL would be
quite surprised.
Fix that by passing the information that this is a security request
through to the stacking filesystem by adding a new internal
ATT_GETATTR_NOSEC flag. Now the callchain becomes:
__fput()
-> ima_file_free()
-> vfs_getattr_nosec()
-> i_op->getattr::ovl_getattr()
-> if (AT_GETATTR_NOSEC)
vfs_getattr_nosec()
else
vfs_getattr()
-> i_op->getattr::$WHATEVER_UNDERLYING_FS_getattr()
- Fix a bug introduced with the iov_iter rework from last cycle.
This broke /proc/kcore by copying too much and without the correct
offset.
- Add a missing NULL check when allocating the root inode in
autofs_fill_super().
- Fix stable writes for multi-device filesystems (xfs, btrfs etc) and
the block device pseudo filesystem.
Stable writes used to be a superblock flag only, making it a per
filesystem property. Add an additional AS_STABLE_WRITES mapping flag
to allow for fine-grained control.
- Ensure that offset_iterate_dir() returns 0 after reaching the end of
a directory so it adheres to getdents() convention.
* tag 'vfs-6.7-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
libfs: getdents() should return 0 after reaching EOD
xfs: respect the stable writes flag on the RT device
xfs: clean up FS_XFLAG_REALTIME handling in xfs_ioctl_setattr_xflags
block: update the stable_writes flag in bdev_add
filemap: add a per-mapping stable writes flag
autofs: add: new_inode check in autofs_fill_super()
iov_iter: fix copy_page_to_iter_nofault()
fs: Pass AT_GETATTR_NOSEC flag to getattr interface function
Mark a superblock that is for for an R/O or Backup volume as SB_RDONLY when
mounting it.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
AFS doesn't really do locking on R/O volumes as fileservers don't maintain
state with each other and thus a lock on a R/O volume file on one
fileserver will not be be visible to someone looking at the same file on
another fileserver.
Further, the server may return an error if you try it.
Fix this by doing what other AFS clients do and handle filelocking on R/O
volume files entirely within the client and don't touch the server.
Fixes: 6c6c1d63c2 ("afs: Provide mount-time configurable byte-range file locking emulation")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Make AFS return error ENOENT if no cell SRV or AFSDB DNS record (or
cellservdb config file record) can be found rather than returning
EDESTADDRREQ.
Also add cell name lookup info to the cursor dump.
Fixes: d5c32c89b2 ("afs: Fix cell DNS lookup")
Reported-by: Markus Suvanto <markus.suvanto@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216637
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
When allocating from devices with different durability, we might end up
with more replicas than required; this changes
bch2_alloc_sectors_start() to check for this, and drop replicas that
aren't needed to hit the number of replicas requested.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The btree iterator code overlays keys from the journal until journal
replay is finished; since we're now starting copygc/rebalance etc.
before replay is finished, this is multithreaded access and thus needs
refcounting.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Various userspace scripts/tools may expect mount entries in
/proc/mounts to reflect the device path names used to mount the
associated filesystem. bcachefs seems to normalize the device path
to the underlying device name based on the block device. This
confuses tools like fstests when the test devices might be lvm or
device-mapper based.
The default behavior for show_vfsmnt() appers to be to use the
string passed to alloc_vfsmnt(), so tweak bcachefs to copy the path
at device superblock read time and to display it via
->show_devname().
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes a bug where copygc would occasionally race with going
read-write and die, thinking we were read only, because it couldn't take
a ref on c->writes.
It's not necessary for copygc (or rebalance, or copygc) to take write
refs; they could run with BCH_TRANS_COMMIT_nocheck_rw, but this is an
easier fix that making sure that flag is passed correctly everywhere.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
copygc no longer has to scan the buckets, so it's no longer a problem if
the number of buckets is changing while it's running.
This also fixes a bug where we forgot to restart copygc.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds move_ctxt_wait_event_timeout(), which can sleep for a timeout
while also issueing pending moves as reads complete.
Co-developed-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Introduce a new helper to flush all move IOs, and use it in a few places
where we should have been.
The new helper also drops btree locks before waiting on outstanding move
writes, avoiding potential deadlocks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We still have disk space accounting changes coming for erasure coding,
and the changes won't be as strictly backwards compatible as they'd
ought to be - specifically, we need to start accounting striped data
under a separate counter in bch_alloc (which describes buckets).
A fsck will suffice for upgrading/downgrading, but since erasure coding
is the most incomplete major feature of bcachefs it still makes sense to
put behind a separate kconfig option, so that users are fully aware.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Control flow integrity is now checking that type signatures match on
indirect function calls. That breaks closures, which embed a work_struct
in a closure in such a way that a closure_fn may also be used as a
workqueue fn by the underlying closure code.
So we have to change closure fns to take a work_struct as their
argument - but that results in a loss of clarity, as closure fns have
different semantics from normal workqueue functions (they run owning a
ref on the closure, which must be released with continue_at() or
closure_return()).
Thus, this patc introduces CLOSURE_CALLBACK() and closure_type() macros
as suggested by Kees, to smooth things over a bit.
Suggested-by: Kees Cook <keescook@chromium.org>
Cc: Coly Li <colyli@suse.de>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
ksmbd set ->op_state as OPLOCK_STATE_NONE on lease break ack error.
op_state of lease should not be updated because client can send lease
break ack again. This patch fix smb2.lease.breaking2 test failure.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Directly set SMB2_FLAGS_ASYNC_COMMAND flags and AsyncId in smb2 header of
interim response instead of current response header.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Add missing release async id and delete interim response entry after
sending status pending response. This only cause when smb2 lease is enable.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
ksmbd should process secound parallel smb2 create request during waiting
oplock break ack. parent lock range that is too large in smb2_open() causes
smb2_open() to be serialized. Move the oplock handling to the bottom of
smb2_open() and make it called after parent unlock. This fixes the failure
of smb2.lease.breaking1 testcase.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
xfstests generic/002 test fail when enabling smb2 leases feature.
This test create hard link file, but removeal failed.
ci has a file open count to count file open through the smb client,
but in the case of hard link files, The allocation of ci per inode
cause incorrectly open count for file deletion. This patch allocate
ci per dentry to counts open counts for hard link.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
When allocated memory for 'new' failed,just return
will cause memory leak of 'ar'.
Fixes: 1819a90429 ("ksmbd: reorganize ksmbd_iov_pin_rsp()")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Closes: https://lore.kernel.org/r/202311031837.H3yo7JVl-lkp@intel.com/
Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
When getting a chunk map, at btrfs_get_chunk_map(), we do some sanity
checks to verify we found a chunk map and that map found covers the
logical address the caller passed in. However the messages aren't very
clear in the sense that don't mention the issue is with a chunk map and
one of them prints the 'length' argument as if it were the end offset of
the requested range (while the in the string format we use %llu-%llu
which suggests a range, and the second %llu-%llu is actually a range for
the chunk map). So improve these two details in the error messages.
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
At btrfs_get_chunk_map() we get the extent map for the chunk that contains
the given logical address stored in the 'logical' argument. Then we do
sanity checks to verify the extent map contains the logical address. One
of these checks verifies if the extent map covers a range with an end
offset behind the target logical address - however this check has an
off-by-one error since it will consider an extent map whose start offset
plus its length matches the target logical address as inclusive, while
the fact is that the last byte it covers is behind the target logical
address (by 1).
So fix this condition by using '<=' rather than '<' when comparing the
extent map's "start + length" against the target logical address.
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In btrfs_ref_tree_mod(), when !parent 're' was allocated through
kmalloc(). In the following code, if an error occurs, the execution will
be redirected to 'out' or 'out_unlock' and the function will be exited.
However, on some of the paths, 're' are not deallocated and may lead to
memory leaks.
For example: lookup_block_entry() for 'be' returns NULL, the out label
will be invoked. During that flow ref and 'ra' are freed but not 're',
which can potentially lead to a memory leak.
CC: stable@vger.kernel.org # 5.10+
Reported-and-tested-by: syzbot+d66de4cbf532749df35f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d66de4cbf532749df35f
Signed-off-by: Bragatheswaran Manickavel <bragathemanick0908@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is a feature request to add dmesg output when unmounting a btrfs.
There are several alternative methods to do the same thing, but with
their own problems:
- Use eBPF to watch btrfs_put_super()/open_ctree()
Not end user friendly, they have to dip their head into the source
code.
- Watch for directory /sys/fs/<uuid>/
This is way more simple, but still requires some simple device -> uuid
lookups. And a script needs to use inotify to watch /sys/fs/.
Compared to all these, directly outputting the information into dmesg
would be the most simple one, with both device and UUID included.
And since we're here, also add the output when mounting a filesystem for
the first time for parity. A more fine grained monitoring of subvolume
mounts should be done by another layer, like audit.
Now mounting a btrfs with all default mkfs options would look like this:
[81.906566] BTRFS info (device dm-8): first mount of filesystem 633b5c16-afe3-4b79-b195-138fe145e4f2
[81.907494] BTRFS info (device dm-8): using crc32c (crc32c-intel) checksum algorithm
[81.908258] BTRFS info (device dm-8): using free space tree
[81.912644] BTRFS info (device dm-8): auto enabling async discard
[81.913277] BTRFS info (device dm-8): checking UUID tree
[91.668256] BTRFS info (device dm-8): last unmount of filesystem 633b5c16-afe3-4b79-b195-138fe145e4f2
CC: stable@vger.kernel.org # 5.4+
Link: https://github.com/kdave/btrfs-progs/issues/689
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Remove duplicate code and add new helper for creating special files in
SFU (Services for UNIX) format that can be shared by SMB1+ code.
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Handle all file types in NFS reparse points as specified in MS-FSCC
2.1.2.6 Network File System (NFS) Reparse Data Buffer.
The client is now able to set all file types based on the parsed NFS
reparse point, which used to support only symlinks. This works for
SMB1+.
Before patch:
$ mount.cifs //srv/share /mnt -o ...
$ ls -l /mnt
ls: cannot access 'block': Operation not supported
ls: cannot access 'char': Operation not supported
ls: cannot access 'fifo': Operation not supported
ls: cannot access 'sock': Operation not supported
total 1
l????????? ? ? ? ? ? block
l????????? ? ? ? ? ? char
-rwxr-xr-x 1 root root 5 Nov 18 23:22 f0
l????????? ? ? ? ? ? fifo
l--------- 1 root root 0 Nov 18 23:23 link -> f0
l????????? ? ? ? ? ? sock
After patch:
$ mount.cifs //srv/share /mnt -o ...
$ ls -l /mnt
total 1
brwxr-xr-x 1 root root 123, 123 Nov 18 00:34 block
crwxr-xr-x 1 root root 1234, 1234 Nov 18 00:33 char
-rwxr-xr-x 1 root root 5 Nov 18 23:22 f0
prwxr-xr-x 1 root root 0 Nov 18 23:23 fifo
lrwxr-xr-x 1 root root 0 Nov 18 23:23 link -> f0
srwxr-xr-x 1 root root 0 Nov 19 2023 sock
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Parse reparse point into cifs_open_info_data structure and feed it
through cifs_open_info_to_fattr().
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reparse points are not limited to symlinks, so implement
->query_reparse_point() in order to handle different file types.
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
We were deferencing iface after it has been released. Fix is to
release after all dereference instances have been encountered.
Signed-off-by: Ritvik Budhiraja <rbudhiraja@microsoft.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Closes: https://lore.kernel.org/r/202311110815.UJaeU3Tt-lkp@intel.com/
Signed-off-by: Steve French <stfrench@microsoft.com>
Since the locking of the parent->d_inode has been moved outside the
creation of the files and directories (as it use to be locked via a
conditional), add a WARN_ON_ONCE() to the case that it's not locked.
Link: https://lkml.kernel.org/r/20231121231112.853962542@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The eventfs directory is dynamically created via the meta data supplied by
the existing trace events. All files and directories in eventfs has a
parent. Do not allow NULL to be passed into eventfs_start_creating() as
the parent because that should never happen. Warn if it does.
Link: https://lkml.kernel.org/r/20231121231112.693841807@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The both create_file_dentry() and create_dir_dentry() takes a boolean
parameter "lookup", as on lookup the inode_lock should already be taken,
but for dcache_dir_open_wrapper() it is not taken.
There's no reason that the dcache_dir_open_wrapper() can't take the
inode_lock before calling these functions. In fact, it's better if it
does, as the lock can be held throughout both directory and file
creations.
This also simplifies the code, and possibly prevents unexpected race
conditions when the lock is released.
Link: https://lkml.kernel.org/r/20231121231112.528544825@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 5790b1fb3d ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
If memory reclaim happens, it can reclaim file system pages. The file
system pages from eventfs may take the eventfs_mutex on reclaim. This
means that allocation while holding the eventfs_mutex must not call into
filesystem reclaim. A lockdep splat uncovered this.
Link: https://lkml.kernel.org/r/20231121231112.373501894@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 28e12c09f5 ("eventfs: Save ownership and mode")
Fixes: 5790b1fb3d ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
When we're recovering ondisk quota records from the log, we need to
validate the recovered buffer contents before writing them to disk.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Since the introduction of xfs_dqblk in V5, xfs really ought to find the
dqblk pointer from the dquot buffer, then compute the xfs_disk_dquot
pointer from the dqblk pointer. Fix the open-coded xfs_buf_offset calls
and do the type checking in the correct order.
Note that this has made no practical difference since the start of the
xfs_disk_dquot is coincident with the start of the xfs_dqblk.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Commit "filemap: update ki_pos in generic_perform_write", made updating
of ki_pos into common code in generic_perform_write() function.
This also causes generic/091 to fail.
This happened due to an in-flight collision with:
fb5de4358e ("ext2: Move direct-io to use iomap"). I have chosen fixes tag
based on which commit got landed later to upstream kernel.
Fixes: 182c25e9c1 ("filemap: update ki_pos in generic_perform_write")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <d595bee9f2475ed0e8a2e7fb94f7afc2c6ffc36a.1700643443.git.ritesh.list@gmail.com>
- Tidy up erofs_read_inode() for simplicity;
- Fix broken fscache mode due to NULL dereference of dif->bdev_handle;
- Add the EROFS webpage to MAINTAINERS, documentation, and Kconfig.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmVaugMRHHhpYW5nQGtl
cm5lbC5vcmcACgkQUXZn5Zlu5qojvg//ajFjjAVQwVtyjfni1PwmbMiKtlQ/Brta
mhtfbcgOkR5sInCeuat2C3u0G7bbWISWSCEUEqv3qjjEIMVpZSJq++tctMDFiM9u
kSPgq/TMnbt1tEwRWXiost1o/ijCBBtQRPW2vK3kytZ/PKKLswhf4BrSAYANX/ne
2MGh8RQFwz8mDjBTtQ2mQMOIEb4aHon+RYbgw/pMaV53OiY8DuHIs0GXKYdYPhXA
O5je5xk6dmSBkmxGyfCg8iImq6H+aU2bSi0D62VaTN9aZ11VTpjHU9Ce+Y9mCTVp
OX47mhvrT/b7kR1gpM8hj4gg5moUebRvStoG43LCWAtGWvTEqgT9PlL1WFPdTZAA
QxjdJ8svAsweCliNDuu7U3ZNWgHiMOu2WqtrHMoxR+tfbqbqcvCRkPAHOlFI0gmS
ws2EsM/3uw1I13z0ndQPQTb6x2JHDM60a3/8qhXzambuU87GR8FtN09OPHToNLhQ
odwirLF8FVg+UL+gVnkXVqXkECVSBNaq0eO2lSSWvo2/hq1MLXlsSZvIsiGYICBx
JoCvlezeEkq1VUAn2j7oq18Jr7U5ZnX+jQI6APG4k9XdxL+0ZOPYTnxSnKP+DXom
CA/rWWYWZZVXZIHmYF32JVs3ymBAXBORbZID9Jv/Nucs9MiLrnpVhPxPl+OQLly0
JpvDhDeSyms=
=Ez2+
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-6.7-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
- Tidy up erofs_read_inode() for simplicity
- Fix broken fscache mode due to NULL dereference of dif->bdev_handle
- Add the EROFS webpage to MAINTAINERS, documentation, and Kconfig
* tag 'erofs-for-6.7-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
MAINTAINERS: erofs: add EROFS webpage
erofs: fix NULL dereference of dif->bdev_handle in fscache mode
erofs: simplify erofs_read_inode()
With the call to simple_recursive_removal() on the entire eventfs sub
system when the directory is removed, it performs the d_invalidate on all
the dentries when it is removed. There's no need to do clean ups when a
dentry is being created while the directory is being deleted.
As dentries are cleaned up by the simpler_recursive_removal(), trying to
do d_invalidate() in these functions will cause the dentry to be
invalidated twice, and crash the kernel.
Link: https://lore.kernel.org/all/20231116123016.140576-1-naresh.kamboju@linaro.org/
Link: https://lkml.kernel.org/r/20231120235154.422970988@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 407c6726ca ("eventfs: Use simple_recursive_removal() to clean up dentries")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The logic to free the eventfs_inode (ei) use to set is_freed and clear the
"dentry" field under the eventfs_mutex. But that changed when a race was
found where the ei->dentry needed to be cleared when the last dput() was
called on it. But there was still logic that checked if ei->dentry was not
NULL and is_freed is set, and would warn if it was.
But since that situation was changed and the ei->dentry isn't cleared
until the last dput() is called on it while the ei->is_freed is set, do
not test for that condition anymore, and change the comments to reflect
that.
Link: https://lkml.kernel.org/r/20231120235154.265826243@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 020010fbfa ("eventfs: Delete eventfs_inode when the last dentry is freed")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Update the per-folio stable writes flag dependening on which device an
inode resides on.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20231025141020.192413-5-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Introduce a local boolean variable if FS_XFLAG_REALTIME to make the
checks for it more obvious, and de-densify a few of the conditionals
using it to make them more readable while at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20231025141020.192413-4-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
folio_wait_stable waits for writeback to finish before modifying the
contents of a folio again, e.g. to support check summing of the data
in the block integrity code.
Currently this behavior is controlled by the SB_I_STABLE_WRITES flag
on the super_block, which means it is uniform for the entire file system.
This is wrong for the block device pseudofs which is shared by all
block devices, or file systems that can use multiple devices like XFS
witht the RT subvolume or btrfs (although btrfs currently reimplements
folio_wait_stable anyway).
Add a per-address_space AS_STABLE_WRITES flag to control the behavior
in a more fine grained way. The existing SB_I_STABLE_WRITES is kept
to initialize AS_STABLE_WRITES to the existing default which covers
most cases.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20231025141020.192413-2-hch@lst.de
Tested-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Add missing NULL check of root_inode in autofs_fill_super().
While we are at it simplify the logic by taking advantage of the VFS
cleanup procedures and get rid of the goto error handling, as suggested
by Al Viro.
Signed-off-by: Ian Kent <raven@themaw.net>
Link: https://lore.kernel.org/r/20231119225319.331156-1-raven@themaw.net
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Bill O'Donnell <billodo@redhat.com>
Reported-by: <syzbot+662f87a8ef490f45fa64@syzkaller.appspotmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
* Fix deadlock arising due to intent items in AIL not being cleared when log
recovery fails.
* Fix stale data exposure bug when remapping COW fork extents to data fork.
* Fix deadlock when data device flush fails.
* Fix AGFL minimum size calculation.
* Select DEBUG_FS instead of XFS_DEBUG when XFS_ONLINE_SCRUB_STATS is
selected.
* Fix corruption of log inode's extent count field when NREXT64 feature is
enabled.
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZVNouAAKCRAH7y4RirJu
9O0mAQDePPSRT8ZrR63dxFZ1AW55q4y9iqgBxWcnKEelmVULPwD/byzoAJ46jvcL
qpBHUJ1rUIcd/fGqAEkwfG6hKzD99w8=
=G+60
-----END PGP SIGNATURE-----
Merge tag 'xfs-6.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Chandan Babu:
- Fix deadlock arising due to intent items in AIL not being cleared
when log recovery fails
- Fix stale data exposure bug when remapping COW fork extents to data
fork
- Fix deadlock when data device flush fails
- Fix AGFL minimum size calculation
- Select DEBUG_FS instead of XFS_DEBUG when XFS_ONLINE_SCRUB_STATS is
selected
- Fix corruption of log inode's extent count field when NREXT64 feature
is enabled
* tag 'xfs-6.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: recovery should not clear di_flushiter unconditionally
xfs: inode recovery does not validate the recovered inode
xfs: fix again select in kconfig XFS_ONLINE_SCRUB_STATS
xfs: fix internal error from AGFL exhaustion
xfs: up(ic_sema) if flushing data device fails
xfs: only remap the written blocks in xfs_reflink_end_cow_extent
XFS: Update MAINTAINERS to catch all XFS documentation
xfs: abort intent items when recovery intents fail
xfs: factor out xfs_defer_pending_abort
- Fix several long-standing bugs in the duplicate reply cache
- Fix a memory leak
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmVY6iwACgkQM2qzM29m
f5fDqhAAnsHqZNG2I6asqh/5pPoqcp7kXkEe1l+6jr4jz/00R0lg+oLbsC6/S6eY
tzGVkQtIkl2OpL8lt4JTgUL/xiO2JaqbdiIuHelnT62l97r7kbKWJ0ALHahLafiX
hCQdJWOdud86kZ5x6/cYVfqyO08bhqDLUFvWd7zSLmzW/9U3bG4v6yXvHT3qqnnE
dCJtuM9+DUfnDJKHe6+BFIobkyta8+Tpsg4QSgSAu4hg+dTcqtPCOxMeT+YwgNQd
uZY1xiIjPLkufsBF86xzC3tyoFNaZc5QhIwv7ZBtmzNUw3906SbuST9hJiWeHeWq
m7p0YeWDJrygiyIrvYxv6NDUCqnkoOxbuKTUAniTEj1SsE2gcQCfij0bU1OSRk6r
CpT8TdJ6j2zP78+xxMsSIiA/gR7uJtCKs7LABru7DX25+sDkKK2Te9+PXINarJ1k
fraeDeuXMQ1cu71WRXUJ3QKGn1/bC8DYGHFVQqcB+gVXqcm3BuKNYGt01nFyCp5s
+1jL+lRxUjPydI2J1VKw5g+5jW9rBgLOT0T9xlr+TZDnYADBAakwpbujrBH5Ey+W
BswGoNzqsW9B0U4N6o8OP0FA6IxcddX6Uan2czienmqj217w4AaPPT/vuA5EyC9W
zNzBAi87rnaqQs4LnWiS09K+RvYIedQl1lJIzwIdQZfnMKmCSEI=
=GOU4
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix several long-standing bugs in the duplicate reply cache
- Fix a memory leak
* tag 'nfsd-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: Fix checksum mismatches in the duplicate reply cache
NFSD: Fix "start of NFS reply" pointer passed to nfsd_cache_update()
NFSD: Update nfsd_cache_append() to use xdr_stream
nfsd: fix file memleak on client_opens_release
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmVXyvsACgkQiiy9cAdy
T1FuyQv/aPFI4XdIYwneZT0VRIxKtZgmek2SRfA+U3fiMNnBG90SqzYzswgkJqHZ
vLdjGcwDXR0M2S9zf74lDtzqhfyGvf7d+YCwQ+vXTmhWAcneYM7w+AtFjD88rLAr
GjS4oUM/BeZQ9nyPNTibueJld2cXXXSkGjRP/vu4RmsVWDzMJjlSOe+ZG0FBr32a
x8JvCOtvUmIJ1uY4uwsDtA1uUpgq0QEO1pi+mlcn3tMxPpIypVzdWwnbex0XR4BO
hzRcGJDAi6g4uQ43A5a9ypRN02zaX/PXbPg6IgLXlYm4Oce9um1MmrqAssVnGCXZ
FaKMSxxnoQXjNW8Oxt0/RvWo2cHbUNPn6pq/Pvhj8FWq6AT+PZWW5JQy673ZhxWK
WVy7L5Y1R4BDDceIrlJRb+8WOaP+sprgsWZI0WsOCBvrI9uoTSqqXjy+fhTC/6zi
HZfC7kFHDh2jpbUFdBUt3ChIW2RCuowj2XEN3GrZr495vSLahokitl2grYbb16U9
squwsK1A
=idlR
-----END PGP SIGNATURE-----
Merge tag '6.7-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- multichannel fixes (including a lock ordering fix and an important
refcounting fix)
- spnego fix
* tag '6.7-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix lock ordering while disabling multichannel
cifs: fix leak of iface for primary channel
cifs: fix check of rc in function generate_smb3signingkey
cifs: spnego: add ';' in HOST_KEY_LEN
When vfs_getattr_nosec() calls a filesystem's getattr interface function
then the 'nosec' should propagate into this function so that
vfs_getattr_nosec() can again be called from the filesystem's gettattr
rather than vfs_getattr(). The latter would add unnecessary security
checks that the initial vfs_getattr_nosec() call wanted to avoid.
Therefore, introduce the getattr flag GETATTR_NOSEC and allow to pass
with the new getattr_flags parameter to the getattr interface function.
In overlayfs and ecryptfs use this flag to determine which one of the
two functions to call.
In a recent code change introduced to IMA vfs_getattr_nosec() ended up
calling vfs_getattr() in overlayfs, which in turn called
security_inode_getattr() on an exiting process that did not have
current->fs set anymore, which then caused a kernel NULL pointer
dereference. With this change the call to security_inode_getattr() can
be avoided, thus avoiding the NULL pointer dereference.
Reported-by: <syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com>
Fixes: db1d1e8b98 ("IMA: use vfs_getattr_nosec to get the i_version")
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Tyler Hicks <code@tyhicks.com>
Cc: Mimi Zohar <zohar@linux.ibm.com>
Suggested-by: Christian Brauner <brauner@kernel.org>
Co-developed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Link: https://lore.kernel.org/r/20231002125733.1251467-1-stefanb@linux.vnet.ibm.com
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Lots of small fixes for minor nits and compiler warnings. Bigger items:
- The six locks lost wakeup is finally fixed: six_read_trylock() was
checking for the waiting bit before decrementing the number of
readers - validated the fix with a torture test.
- Fix for a memory reclaim issue: when needing to reallocate a key
cache key, we now do our usual GFP_NOWAIT; unlock(); GFP_KERNEL
dance.
- Multiple deleted inodes btree fixes
- Fix an issue in fsck, where i_nlink would be recalculated incorrectly
for hardlinked files if a snapshot had ever been taken.
- Kill journal pre-reservations: This is a bigger patch than I would
normally send at this point, but it deletes code and it fixes some of
our tests that would sporadically die with the journal getting stuck,
and it's a performance improvement, too.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmVX6YUACgkQE6szbY3K
bnbFTBAAmtGYPXXLUZrr6uZc2Kp/a/v86FUatkrmXS4wVydQSBVhbo6CdxTAAYbx
C1K/+WG5MIfkFaVNLQvIXjGECFPffRmA/xtgfjaHzXvUnG74sLGrsOW7eBvkEd3j
8sEPjfpp39LSLIWs9GbqP3kNF2ax6kZ4Y+kTod4PZB14S4VrXRmd2ZGjJ2VARsig
ygFi+qOBJ0ojZHo7VSOzGRxcGNvVmbr76vOBuqEwR9PYnT3JKH2I+ANFdc1YLnAH
mMr/nwTMzuLcN6FUSnPBoX+x1WyPaXDbBMNLBSoVBcpP6X/DqcNIr9yvkUGqT6uR
cxW6oOten9M+JSbGALnTrQjc9Khug5SqjvTJ1fYl4NELocvapWvBdHtICwEpYl9F
REeTGqTHMb8j4VJDl+JcP9cPbDEVa6TGa4SXB4NfF70MZf0y7HR2Y/ms0bi47+Zb
IxLlbhFZUiqbnsBx+jPhuKD86mijjEbmjPWkqcsWG/olg8Sdor4LdU4CbaGEB1fn
oSfMnwb5fKI4fFPVMSREJ2ktpeb1DUmvkdbt5klYTL4DBK2GIgGdYlnVXg9Z5AAY
kz6fvJE0PhWw8cDb4ClGo6ZYffmlX6m4LoX0q3C1O0Wt+Q4av2/vtLuAMKXz38Y4
zzw+JO/h0X9ECJdFPPsTq2sg7cWo7oFVrO3+ZQ0dTfKKtAPUwWY=
=7zgm
-----END PGP SIGNATURE-----
Merge tag 'bcachefs-2023-11-17' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Lots of small fixes for minor nits and compiler warnings.
Bigger items:
- The six locks lost wakeup is finally fixed: six_read_trylock() was
checking for the waiting bit before decrementing the number of
readers - validated the fix with a torture test.
- Fix for a memory reclaim issue: when needing to reallocate a key
cache key, we now do our usual GFP_NOWAIT; unlock(); GFP_KERNEL
dance.
- Multiple deleted inodes btree fixes
- Fix an issue in fsck, where i_nlink would be recalculated
incorrectly for hardlinked files if a snapshot had ever been taken.
- Kill journal pre-reservations: This is a bigger patch than I would
normally send at this point, but it deletes code and it fixes some
of our tests that would sporadically die with the journal getting
stuck, and it's a performance improvement, too"
* tag 'bcachefs-2023-11-17' of https://evilpiepirate.org/git/bcachefs: (22 commits)
bcachefs: Fix missing locking for dentry->d_parent access
bcachefs: six locks: Fix lost wakeup
bcachefs: Fix no_data_io mode checksum check
bcachefs: Fix bch2_check_nlinks() for snapshots
bcachefs: Don't decrease BTREE_ITER_MAX when LOCKDEP=y
bcachefs: Disable debug log statements
bcachefs: Fix missing transaction commit
bcachefs: Fix error path in bch2_mount()
bcachefs: Fix potential sleeping during mount
bcachefs: Fix iterator leak in may_delete_deleted_inode()
bcachefs: Kill journal pre-reservations
bcachefs: Check for nonce offset inconsistency in data_update path
bcachefs: Make sure to drop/retake btree locks before reclaim
bcachefs: btree_trans->write_locked
bcachefs: Run btree key cache shrinker less aggressively
bcachefs: Split out btree_key_cache_types.h
bcachefs: Guard against insufficient devices to create stripes
bcachefs: Fix null ptr deref in bch2_backpointer_get_node()
bcachefs: Fix multiple -Warray-bounds warnings
bcachefs: Use DECLARE_FLEX_ARRAY() helper and fix multiple -Warray-bounds warnings
...
nfsd_cache_csum() currently assumes that the server's RPC layer has
been advancing rq_arg.head[0].iov_base as it decodes an incoming
request, because that's the way it used to work. On entry, it
expects that buf->head[0].iov_base points to the start of the NFS
header, and excludes the already-decoded RPC header.
These days however, head[0].iov_base now points to the start of the
RPC header during all processing. It no longer points at the NFS
Call header when execution arrives at nfsd_cache_csum().
In a retransmitted RPC the XID and the NFS header are supposed to
be the same as the original message, but the contents of the
retransmitted RPC header can be different. For example, for krb5,
the GSS sequence number will be different between the two. Thus if
the RPC header is always included in the DRC checksum computation,
the checksum of the retransmitted message might not match the
checksum of the original message, even though the NFS part of these
messages is identical.
The result is that, even if a matching XID is found in the DRC,
the checksum mismatch causes the server to execute the
retransmitted RPC transaction again.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The "statp + 1" pointer that is passed to nfsd_cache_update() is
supposed to point to the start of the egress NFS Reply header. In
fact, it does point there for AUTH_SYS and RPCSEC_GSS_KRB5 requests.
But both krb5i and krb5p add fields between the RPC header's
accept_stat field and the start of the NFS Reply header. In those
cases, "statp + 1" points at the extra fields instead of the Reply.
The result is that nfsd_cache_update() caches what looks to the
client like garbage.
A connection break can occur for a number of reasons, but the most
common reason when using krb5i/p is a GSS sequence number window
underrun. When an underrun is detected, the server is obliged to
drop the RPC and the connection to force a retransmit with a fresh
GSS sequence number. The client presents the same XID, it hits in
the server's DRC, and the server returns the garbage cache entry.
The "statp + 1" argument has been used since the oldest changeset
in the kernel history repo, so it has been in nfsd_dispatch()
literally since before history began. The problem arose only when
the server-side GSS implementation was added twenty years ago.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
When inserting a DRC-cached response into the reply buffer, ensure
that the reply buffer's xdr_stream is updated properly. Otherwise
the server will send a garbage response.
Cc: stable@vger.kernel.org # v6.3+
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
seq_release should be called to free the allocated seq_file
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Mahmoud Adam <mngyadam@amazon.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Fixes: 78599c42ae ("nfsd4: add file to display list of client's opens")
Reviewed-by: NeilBrown <neilb@suse.de>
Tested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmVXZFQACgkQEVvVyTe/
1Wr/hQ//YofnLzFuE172QmfiYLQYAnBJONql47Hs32g+9zGjw7ev6tVbwEwuduY9
23lktlJthJO15+L8mfG3ECqpV7KfdBfuipjI6nO9V/Br7YEdHtDgk7jUqFWoADUA
tEtXqjk8cqkWc4+6XFKHYeN04Nd0tvRIFmtW90gIxANE/AZxiPcCGKIKfqKgDLZ2
0IbN7yeJASc7XCtcjl9uldvhgmltpu1xX3IETsKOtLh1H8J3+DSI/5K7kQ4if5q/
6Hi3+6Qf3aTqyaqG6z8RVhbwvrRWFNvaUWpjW5F1sBpNddtq8ioHmqX4L3Caybsw
ukitshGj59MfmNnirxryO8MXv4RwqOAZFQc7ZfQhL6RzEO6WiqNybQ112SOh25E+
NsKSy4vhCiH3ifGQC8LZtdeWmcPS/5vPUMv81w7P6Y/VWZImQQ04kf1akSr9/iBX
KCLFhYb8lKu+pBHFEZkYrdTDbIby+7QKraIi9hC2RsfFiIfvHn4Y1AtUt9M145va
vBTF/7y8t5VhftMhP77ZUvREwIMrzcBJtqIH8J5XoT6EkxlGCV5ft9el20VyXYia
tkWSzW9dQzGG+eGtdSX490MQMlZs7yN0SzyP0rUrZ2LMycwmMX976ssXtnp4NWBM
sAnHbZMS1eXwK57WP9gOXiKLZ7sMWj03NzYK+cITL6Gttq/bAdw=
=OtFU
-----END PGP SIGNATURE-----
Merge tag 'ovl-fixes-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs fixes from Amir Goldstein:
"A fix to an overlayfs param parsing bug and a misformatted comment"
* tag 'ovl-fixes-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: fix memory leak in ovl_parse_param()
ovl: fix misformatted comment
After commit 1c7f49a767 ("erofs: tidy up EROFS on-disk naming"),
there is a unique `union erofs_inode_i_u` so that we could parse
the union directly.
Besides, it also replaces `inode->i_sb` with `sb` for simplicity.
Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20231109111822.17944-1-mengferry@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
When kafs tries to look up a cell in the DNS or the local config, it will
translate a lookup failure into EDESTADDRREQ whereas OpenAFS translates it
into ENOENT. Applications such as West expect the latter behaviour and
fail if they see the former.
This can be seen by trying to mount an unknown cell:
# mount -t afs %example.com:cell.root /mnt
mount: /mnt: mount(2) system call failed: Destination address required.
Fixes: 4d673da145 ("afs: Support the AFS dynamic root")
Reported-by: Markus Suvanto <markus.suvanto@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216637
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
afs_server_list is accessed with the rcu_read_lock() held from
volume->servers, so it needs to be cleaned up correctly.
Fix this by using kfree_rcu() instead of kfree().
Fixes: 8a070a9648 ("afs: Detect cell aliases 1 - Cells with root volumes")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
[BUG]
Syzbot reported a regression that after commit 6ed05643dd ("btrfs:
create qgroup earlier in snapshot creation") we can trigger transaction
abort during snapshot creation:
BTRFS: Transaction aborted (error -17)
WARNING: CPU: 0 PID: 5057 at fs/btrfs/transaction.c:1778 create_pending_snapshot+0x25f4/0x2b70 fs/btrfs/transaction.c:1778
Modules linked in:
CPU: 0 PID: 5057 Comm: syz-executor225 Not tainted 6.6.0-syzkaller-15365-g305230142ae0 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
RIP: 0010:create_pending_snapshot+0x25f4/0x2b70 fs/btrfs/transaction.c:1778
Call Trace:
<TASK>
create_pending_snapshots+0x195/0x1d0 fs/btrfs/transaction.c:1967
btrfs_commit_transaction+0xf1c/0x3730 fs/btrfs/transaction.c:2440
create_snapshot+0x4a5/0x7e0 fs/btrfs/ioctl.c:845
btrfs_mksubvol+0x5d0/0x750 fs/btrfs/ioctl.c:995
btrfs_mksnapshot+0xb5/0xf0 fs/btrfs/ioctl.c:1041
__btrfs_ioctl_snap_create+0x344/0x460 fs/btrfs/ioctl.c:1294
btrfs_ioctl_snap_create+0x13c/0x190 fs/btrfs/ioctl.c:1321
btrfs_ioctl+0xbbf/0xd40
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:871 [inline]
__se_sys_ioctl+0xf8/0x170 fs/ioctl.c:857
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7f2f791127b9
</TASK>
[CAUSE]
The error number is -EEXIST, which can happen for qgroup if there is
already an existing qgroup and then we're trying to create a snapshot
for it.
[FIX]
In that case, we can continue creating the snapshot, although it may
lead to qgroup inconsistency, it's not so critical to abort the current
transaction.
So in this case, we can just ignore the non-critical errors, mostly -EEXIST
(there is already a qgroup).
Reported-by: syzbot+4d81015bc10889fd12ea@syzkaller.appspotmail.com
Fixes: 6ed05643dd ("btrfs: create qgroup earlier in snapshot creation")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a bug report that ntfs2btrfs had a bug that it can lead to
transaction abort and the filesystem flips to read-only.
[CAUSE]
For inline backref items, kernel has a strict requirement for their
ordered, they must follow the following rules:
- All btrfs_extent_inline_ref::type should be in an ascending order
- Within the same type, the items should follow a descending order by
their sequence number
For EXTENT_DATA_REF type, the sequence number is result from
hash_extent_data_ref().
For other types, their sequence numbers are
btrfs_extent_inline_ref::offset.
Thus if there is any code not following above rules, the resulted
inline backrefs can prevent the kernel to locate the needed inline
backref and lead to transaction abort.
[FIX]
Ntrfs2btrfs has already fixed the problem, and btrfs-progs has added the
ability to detect such problems.
For kernel, let's be more noisy and be more specific about the order, so
that the next time kernel hits such problem we would reject it in the
first place, without leading to transaction abort.
Link: https://github.com/kdave/btrfs-progs/pull/622
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In percpu reader mode, trylock() for read had a lost wakeup: on failure
to get the lock, we may have caused a writer to fail to get the lock,
because we temporarily elevated the reader count.
We need to check for waiters after decrementing the read count - not
before.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In no_data_io mode, we expect data checksums to be wrong - don't want to
spew the log with them.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When searching the link table for the matching inode, we were searching
for a specific - incorrect - snapshot ID as well, causing us to fail to
find the inode.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Running with fewer max btree paths doesn't work anymore when replication
is enabled - as we've added e.g. the freespace and bucket gens btrees,
we naturally end up needing more btree paths.
This is an issue with lockdep, we end up taking more locks than lockdep
will track (the MAX_LOCKD_DEPTH constant). But bcachefs as merged does
not yet support lockdep anyways, so we can leave that for later.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The journal read path had some informational log statements preperatory
for ZNS support - they're not of interest to users, so we can turn them
off.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In may_delete_deleted_inode(), there's a corner case when a snapshot was
taken while we had an unlinked inode: we don't want to delete the inode
in the internal (shared) snapshot node, since it might have been
reattached in a descendent snapshot.
Instead we propagate the key to any snapshot leaves it doesn't exist in,
so that it can be deleted there if necessary, and then clear the
unlinked flag in the internal node.
But we forgot to commit after clearing the unlinked flag, causing us to
go into an infinite loop.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>