linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-22 20:22:09 +00:00

Author	SHA1	Message	Date
Jann Horn	0ac1d13a55	btrfs: send: ensure send_fd is writable kernel_write() requires the caller to ensure that the file is writable. Let's do that directly after looking up the ->send_fd. We don't need a separate bailout path because the "out" path already does fput() if ->send_filp is non-NULL. This has no security impact for two reasons: - the ioctl requires CAP_SYS_ADMIN - __kernel_write() bails out on read-only files - but only since 5.8, see commit `a01ac27be4` ("fs: check FMODE_WRITE in __kernel_write") Reported-and-tested-by: syzbot+12e098239d20385264d3@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=12e098239d20385264d3 Fixes: `31db9f7c23` ("Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive") CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-11-24 18:50:53 +01:00
Qu Wenruo	94dbf7c087	btrfs: free the allocated memory if btrfs_alloc_page_array() fails [BUG] If btrfs_alloc_page_array() fail to allocate all pages but part of the slots, then the partially allocated pages would be leaked in function btrfs_submit_compressed_read(). [CAUSE] As explicitly stated, if btrfs_alloc_page_array() returned -ENOMEM, caller is responsible to free the partially allocated pages. For the existing call sites, most of them are fine: - btrfs_raid_bio::stripe_pages Handled by free_raid_bio(). - extent_buffer::pages[] Handled btrfs_release_extent_buffer_pages(). - scrub_stripe::pages[] Handled by release_scrub_stripe(). But there is one exception in btrfs_submit_compressed_read(), if btrfs_alloc_page_array() failed, we didn't cleanup the array and freed the array pointer directly. Initially there is still the error handling in commit `dd137dd1f2` ("btrfs: factor out allocating an array of pages"), but later in commit `544fe4a903` ("btrfs: embed a btrfs_bio into struct compressed_bio"), the error handling is removed, leading to the possible memory leak. [FIX] This patch would add back the error handling first, then to prevent such situation from happening again, also Make btrfs_alloc_page_array() to free the allocated pages as a extra safety net, then we don't need to add the error handling to btrfs_submit_compressed_read(). Fixes: `544fe4a903` ("btrfs: embed a btrfs_bio into struct compressed_bio") CC: stable@vger.kernel.org # 6.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-11-24 18:50:49 +01:00
David Sterba	5de0434bc0	btrfs: fix 64bit compat send ioctl arguments not initializing version member When the send protocol versioning was added in 5.16 `e77fbf9903` ("btrfs: send: prepare for v2 protocol"), the 32/64bit compat code was not updated (added by `2351f431f7` ("btrfs: fix send ioctl on 32bit with 64bit kernel")), missing the version struct member. The compat code is probably rarely used, nobody reported any bugs. Found by tool https://github.com/jirislaby/clang-struct . Fixes: `e77fbf9903` ("btrfs: send: prepare for v2 protocol") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-11-24 18:50:42 +01:00
Linus Torvalds	fa2b906f51	vfs-6.7-rc3.fixes -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZWBq0gAKCRCRxhvAZXjc ot4EAP48O5ExMtQ3/AIkNDo+/9/Iz4g7bE1HYmdyiMPO3Ou/uwEAySwBXRJrFAsS 9omvkEdqrfyguW0xgoYwcxBdATVHnAE= =ScR3 -----END PGP SIGNATURE----- Merge tag 'vfs-6.7-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Avoid calling back into LSMs from vfs_getattr_nosec() calls. IMA used to query inode properties accessing raw inode fields without dedicated helpers. That was finally fixed a few releases ago by forcing IMA to use vfs_getattr_nosec() helpers. The goal of the vfs_getattr_nosec() helper is to query for attributes without calling into the LSM layer which would be quite problematic because incredibly IMA is called from __fput()... __fput() -> ima_file_free() What it does is to call back into the filesystem to update the file's IMA xattr. Querying the inode without using vfs_getattr_nosec() meant that IMA didn't handle stacking filesystems such as overlayfs correctly. So the switch to vfs_getattr_nosec() is quite correct. But the switch to vfs_getattr_nosec() revealed another bug when used on stacking filesystems: __fput() -> ima_file_free() -> vfs_getattr_nosec() -> i_op->getattr::ovl_getattr() -> vfs_getattr() -> i_op->getattr::$WHATEVER_UNDERLYING_FS_getattr() -> security_inode_getattr() # calls back into LSMs Now, if that __fput() happens from task_work_run() of an exiting task current->fs and various other pointer could already be NULL. So anything in the LSM layer relying on that not being NULL would be quite surprised. Fix that by passing the information that this is a security request through to the stacking filesystem by adding a new internal ATT_GETATTR_NOSEC flag. Now the callchain becomes: __fput() -> ima_file_free() -> vfs_getattr_nosec() -> i_op->getattr::ovl_getattr() -> if (AT_GETATTR_NOSEC) vfs_getattr_nosec() else vfs_getattr() -> i_op->getattr::$WHATEVER_UNDERLYING_FS_getattr() - Fix a bug introduced with the iov_iter rework from last cycle. This broke /proc/kcore by copying too much and without the correct offset. - Add a missing NULL check when allocating the root inode in autofs_fill_super(). - Fix stable writes for multi-device filesystems (xfs, btrfs etc) and the block device pseudo filesystem. Stable writes used to be a superblock flag only, making it a per filesystem property. Add an additional AS_STABLE_WRITES mapping flag to allow for fine-grained control. - Ensure that offset_iterate_dir() returns 0 after reaching the end of a directory so it adheres to getdents() convention. * tag 'vfs-6.7-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: libfs: getdents() should return 0 after reaching EOD xfs: respect the stable writes flag on the RT device xfs: clean up FS_XFLAG_REALTIME handling in xfs_ioctl_setattr_xflags block: update the stable_writes flag in bdev_add filemap: add a per-mapping stable writes flag autofs: add: new_inode check in autofs_fill_super() iov_iter: fix copy_page_to_iter_nofault() fs: Pass AT_GETATTR_NOSEC flag to getattr interface function	2023-11-24 09:45:40 -08:00
David Howells	68516f60c1	afs: Mark a superblock for an R/O or Backup volume as SB_RDONLY Mark a superblock that is for for an R/O or Backup volume as SB_RDONLY when mounting it. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org	2023-11-24 14:52:24 +00:00
David Howells	b590eb41be	afs: Fix file locking on R/O volumes to operate in local mode AFS doesn't really do locking on R/O volumes as fileservers don't maintain state with each other and thus a lock on a R/O volume file on one fileserver will not be be visible to someone looking at the same file on another fileserver. Further, the server may return an error if you try it. Fix this by doing what other AFS clients do and handle filelocking on R/O volume files entirely within the client and don't touch the server. Fixes: `6c6c1d63c2` ("afs: Provide mount-time configurable byte-range file locking emulation") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org	2023-11-24 14:52:01 +00:00
David Howells	0167236e7d	afs: Return ENOENT if no cell DNS record can be found Make AFS return error ENOENT if no cell SRV or AFSDB DNS record (or cellservdb config file record) can be found rather than returning EDESTADDRREQ. Also add cell name lookup info to the cursor dump. Fixes: `d5c32c89b2` ("afs: Fix cell DNS lookup") Reported-by: Markus Suvanto <markus.suvanto@gmail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216637 Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org	2023-11-24 14:51:18 +00:00
Kent Overstreet	0af8a06a4c	bcachefs: deallocate_extra_replicas() When allocating from devices with different durability, we might end up with more replicas than required; this changes bch2_alloc_sectors_start() to check for this, and drop replicas that aren't needed to hit the number of replicas requested. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-24 03:03:47 -05:00
Kent Overstreet	8a443d3ea1	bcachefs: Proper refcounting for journal_keys The btree iterator code overlays keys from the journal until journal replay is finished; since we're now starting copygc/rebalance etc. before replay is finished, this is multithreaded access and thus needs refcounting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-24 02:43:12 -05:00
Brian Foster	63807d9518	bcachefs: preserve device path as device name Various userspace scripts/tools may expect mount entries in /proc/mounts to reflect the device path names used to mount the associated filesystem. bcachefs seems to normalize the device path to the underlying device name based on the block device. This confuses tools like fstests when the test devices might be lvm or device-mapper based. The default behavior for show_vfsmnt() appers to be to use the string passed to alloc_vfsmnt(), so tweak bcachefs to copy the path at device superblock read time and to display it via ->show_devname(). Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-24 02:42:07 -05:00
Kent Overstreet	0a11adfb7a	bcachefs: Fix an endianness conversion cpu_to_le32(), not le32_to_cpu() - fixes a sparse complaint. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-24 02:42:07 -05:00
Kent Overstreet	468035ca4b	bcachefs: Start gc, copygc, rebalance threads after initing writes ref This fixes a bug where copygc would occasionally race with going read-write and die, thinking we were read only, because it couldn't take a ref on c->writes. It's not necessary for copygc (or rebalance, or copygc) to take write refs; they could run with BCH_TRANS_COMMIT_nocheck_rw, but this is an easier fix that making sure that flag is passed correctly everywhere. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-24 02:30:32 -05:00
Kent Overstreet	202a7c292e	bcachefs: Don't stop copygc thread on device resize copygc no longer has to scan the buckets, so it's no longer a problem if the number of buckets is changing while it's running. This also fixes a bug where we forgot to restart copygc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-24 02:30:32 -05:00
Kent Overstreet	261af2f1c6	bcachefs: Make sure bch2_move_ratelimit() also waits for move_ops This adds move_ctxt_wait_event_timeout(), which can sleep for a timeout while also issueing pending moves as reads complete. Co-developed-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-24 02:10:28 -05:00
Kent Overstreet	50e029c639	bcachefs: bch2_moving_ctxt_flush_all() Introduce a new helper to flush all move IOs, and use it in a few places where we should have been. The new helper also drops btree locks before waiting on outstanding move writes, avoiding potential deadlocks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-24 02:08:25 -05:00
Kent Overstreet	6201d91ee3	bcachefs: Put erasure coding behind an EXPERIMENTAL kconfig option We still have disk space accounting changes coming for erasure coding, and the changes won't be as strictly backwards compatible as they'd ought to be - specifically, we need to start accounting striped data under a separate counter in bch_alloc (which describes buckets). A fsck will suffice for upgrading/downgrading, but since erasure coding is the most incomplete major feature of bcachefs it still makes sense to put behind a separate kconfig option, so that users are fully aware. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-24 00:29:58 -05:00
Kent Overstreet	d4e3b928ab	closures: CLOSURE_CALLBACK() to fix type punning Control flow integrity is now checking that type signatures match on indirect function calls. That breaks closures, which embed a work_struct in a closure in such a way that a closure_fn may also be used as a workqueue fn by the underlying closure code. So we have to change closure fns to take a work_struct as their argument - but that results in a loss of clarity, as closure fns have different semantics from normal workqueue functions (they run owning a ref on the closure, which must be released with continue_at() or closure_return()). Thus, this patc introduces CLOSURE_CALLBACK() and closure_type() macros as suggested by Kees, to smooth things over a bit. Suggested-by: Kees Cook <keescook@chromium.org> Cc: Coly Li <colyli@suse.de> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-24 00:29:58 -05:00
Namjae Jeon	cd80ce7e68	ksmbd: don't update ->op_state as OPLOCK_STATE_NONE on error ksmbd set ->op_state as OPLOCK_STATE_NONE on lease break ack error. op_state of lease should not be updated because client can send lease break ack again. This patch fix smb2.lease.breaking2 test failure. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-11-23 20:50:45 -06:00
Namjae Jeon	9ac45ac7cf	ksmbd: move setting SMB2_FLAGS_ASYNC_COMMAND and AsyncId Directly set SMB2_FLAGS_ASYNC_COMMAND flags and AsyncId in smb2 header of interim response instead of current response header. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-11-23 20:50:45 -06:00
Namjae Jeon	2a3f7857ec	ksmbd: release interim response after sending status pending response Add missing release async id and delete interim response entry after sending status pending response. This only cause when smb2 lease is enable. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-11-23 20:50:45 -06:00
Namjae Jeon	2e450920d5	ksmbd: move oplock handling after unlock parent dir ksmbd should process secound parallel smb2 create request during waiting oplock break ack. parent lock range that is too large in smb2_open() causes smb2_open() to be serialized. Move the oplock handling to the bottom of smb2_open() and make it called after parent unlock. This fixes the failure of smb2.lease.breaking1 testcase. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-11-23 20:50:45 -06:00
Namjae Jeon	4274a9dc6a	ksmbd: separately allocate ci per dentry xfstests generic/002 test fail when enabling smb2 leases feature. This test create hard link file, but removeal failed. ci has a file open count to count file open through the smb client, but in the case of hard link files, The allocation of ci per inode cause incorrectly open count for file deletion. This patch allocate ci per dentry to counts open counts for hard link. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-11-23 20:50:45 -06:00
Namjae Jeon	864fb5d371	ksmbd: fix possible deadlock in smb2_open [ 8743.393379] ====================================================== [ 8743.393385] WARNING: possible circular locking dependency detected [ 8743.393391] 6.4.0-rc1+ #11 Tainted: G OE [ 8743.393397] ------------------------------------------------------ [ 8743.393402] kworker/0:2/12921 is trying to acquire lock: [ 8743.393408] ffff888127a14460 (sb_writers#8){.+.+}-{0:0}, at: ksmbd_vfs_setxattr+0x3d/0xd0 [ksmbd] [ 8743.393510] but task is already holding lock: [ 8743.393515] ffff8880360d97f0 (&type->i_mutex_dir_key#6/1){+.+.}-{3:3}, at: ksmbd_vfs_kern_path_locked+0x181/0x670 [ksmbd] [ 8743.393618] which lock already depends on the new lock. [ 8743.393623] the existing dependency chain (in reverse order) is: [ 8743.393628] -> #1 (&type->i_mutex_dir_key#6/1){+.+.}-{3:3}: [ 8743.393648] down_write_nested+0x9a/0x1b0 [ 8743.393660] filename_create+0x128/0x270 [ 8743.393670] do_mkdirat+0xab/0x1f0 [ 8743.393680] __x64_sys_mkdir+0x47/0x60 [ 8743.393690] do_syscall_64+0x5d/0x90 [ 8743.393701] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 8743.393711] -> #0 (sb_writers#8){.+.+}-{0:0}: [ 8743.393728] __lock_acquire+0x2201/0x3b80 [ 8743.393737] lock_acquire+0x18f/0x440 [ 8743.393746] mnt_want_write+0x5f/0x240 [ 8743.393755] ksmbd_vfs_setxattr+0x3d/0xd0 [ksmbd] [ 8743.393839] ksmbd_vfs_set_dos_attrib_xattr+0xcc/0x110 [ksmbd] [ 8743.393924] compat_ksmbd_vfs_set_dos_attrib_xattr+0x39/0x50 [ksmbd] [ 8743.394010] smb2_open+0x3432/0x3cc0 [ksmbd] [ 8743.394099] handle_ksmbd_work+0x2c9/0x7b0 [ksmbd] [ 8743.394187] process_one_work+0x65a/0xb30 [ 8743.394198] worker_thread+0x2cf/0x700 [ 8743.394209] kthread+0x1ad/0x1f0 [ 8743.394218] ret_from_fork+0x29/0x50 This patch add mnt_want_write() above parent inode lock and remove nested mnt_want_write calls in smb2_open(). Fixes: `40b268d384` ("ksmbd: add mnt_want_write to ksmbd vfs functions") Cc: stable@vger.kernel.org Reported-by: Marios Makassikis <mmakassikis@freebox.fr> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-11-23 20:50:45 -06:00
Zongmin Zhou	90044481e7	ksmbd: prevent memory leak on error return When allocated memory for 'new' failed,just return will cause memory leak of 'ar'. Fixes: `1819a90429` ("ksmbd: reorganize ksmbd_iov_pin_rsp()") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202311031837.H3yo7JVl-lkp@intel.com/ Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-11-23 20:50:44 -06:00
Filipe Manana	7d410d5efe	btrfs: make error messages more clear when getting a chunk map When getting a chunk map, at btrfs_get_chunk_map(), we do some sanity checks to verify we found a chunk map and that map found covers the logical address the caller passed in. However the messages aren't very clear in the sense that don't mention the issue is with a chunk map and one of them prints the 'length' argument as if it were the end offset of the requested range (while the in the string format we use %llu-%llu which suggests a range, and the second %llu-%llu is actually a range for the chunk map). So improve these two details in the error messages. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-11-23 22:27:46 +01:00
Filipe Manana	5fba5a5718	btrfs: fix off-by-one when checking chunk map includes logical address At btrfs_get_chunk_map() we get the extent map for the chunk that contains the given logical address stored in the 'logical' argument. Then we do sanity checks to verify the extent map contains the logical address. One of these checks verifies if the extent map covers a range with an end offset behind the target logical address - however this check has an off-by-one error since it will consider an extent map whose start offset plus its length matches the target logical address as inclusive, while the fact is that the last byte it covers is behind the target logical address (by 1). So fix this condition by using '<=' rather than '<' when comparing the extent map's "start + length" against the target logical address. CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-11-23 22:27:42 +01:00
Bragatheswaran Manickavel	f91192cd68	btrfs: ref-verify: fix memory leaks in btrfs_ref_tree_mod() In btrfs_ref_tree_mod(), when !parent 're' was allocated through kmalloc(). In the following code, if an error occurs, the execution will be redirected to 'out' or 'out_unlock' and the function will be exited. However, on some of the paths, 're' are not deallocated and may lead to memory leaks. For example: lookup_block_entry() for 'be' returns NULL, the out label will be invoked. During that flow ref and 'ra' are freed but not 're', which can potentially lead to a memory leak. CC: stable@vger.kernel.org # 5.10+ Reported-and-tested-by: syzbot+d66de4cbf532749df35f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d66de4cbf532749df35f Signed-off-by: Bragatheswaran Manickavel <bragathemanick0908@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-11-23 22:27:34 +01:00
Qu Wenruo	2db313205f	btrfs: add dmesg output for first mount and last unmount of a filesystem There is a feature request to add dmesg output when unmounting a btrfs. There are several alternative methods to do the same thing, but with their own problems: - Use eBPF to watch btrfs_put_super()/open_ctree() Not end user friendly, they have to dip their head into the source code. - Watch for directory /sys/fs/<uuid>/ This is way more simple, but still requires some simple device -> uuid lookups. And a script needs to use inotify to watch /sys/fs/. Compared to all these, directly outputting the information into dmesg would be the most simple one, with both device and UUID included. And since we're here, also add the output when mounting a filesystem for the first time for parity. A more fine grained monitoring of subvolume mounts should be done by another layer, like audit. Now mounting a btrfs with all default mkfs options would look like this: [81.906566] BTRFS info (device dm-8): first mount of filesystem 633b5c16-afe3-4b79-b195-138fe145e4f2 [81.907494] BTRFS info (device dm-8): using crc32c (crc32c-intel) checksum algorithm [81.908258] BTRFS info (device dm-8): using free space tree [81.912644] BTRFS info (device dm-8): auto enabling async discard [81.913277] BTRFS info (device dm-8): checking UUID tree [91.668256] BTRFS info (device dm-8): last unmount of filesystem 633b5c16-afe3-4b79-b195-138fe145e4f2 CC: stable@vger.kernel.org # 5.4+ Link: https://github.com/kdave/btrfs-progs/issues/689 Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>	2023-11-23 22:27:26 +01:00
Paulo Alcantara	b0348e459c	smb: client: introduce cifs_sfu_make_node() Remove duplicate code and add new helper for creating special files in SFU (Services for UNIX) format that can be shared by SMB1+ code. Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-11-23 11:46:05 -06:00
Paulo Alcantara	45e724022e	smb: client: set correct file type from NFS reparse points Handle all file types in NFS reparse points as specified in MS-FSCC 2.1.2.6 Network File System (NFS) Reparse Data Buffer. The client is now able to set all file types based on the parsed NFS reparse point, which used to support only symlinks. This works for SMB1+. Before patch: $ mount.cifs //srv/share /mnt -o ... $ ls -l /mnt ls: cannot access 'block': Operation not supported ls: cannot access 'char': Operation not supported ls: cannot access 'fifo': Operation not supported ls: cannot access 'sock': Operation not supported total 1 l????????? ? ? ? ? ? block l????????? ? ? ? ? ? char -rwxr-xr-x 1 root root 5 Nov 18 23:22 f0 l????????? ? ? ? ? ? fifo l--------- 1 root root 0 Nov 18 23:23 link -> f0 l????????? ? ? ? ? ? sock After patch: $ mount.cifs //srv/share /mnt -o ... $ ls -l /mnt total 1 brwxr-xr-x 1 root root 123, 123 Nov 18 00:34 block crwxr-xr-x 1 root root 1234, 1234 Nov 18 00:33 char -rwxr-xr-x 1 root root 5 Nov 18 23:22 f0 prwxr-xr-x 1 root root 0 Nov 18 23:23 fifo lrwxr-xr-x 1 root root 0 Nov 18 23:23 link -> f0 srwxr-xr-x 1 root root 0 Nov 19 2023 sock Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-11-23 11:44:55 -06:00
Paulo Alcantara	539aad7f14	smb: client: introduce ->parse_reparse_point() Parse reparse point into cifs_open_info_data structure and feed it through cifs_open_info_to_fattr(). Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-11-23 11:44:42 -06:00
Paulo Alcantara	ed3e0a149b	smb: client: implement ->query_reparse_point() for SMB1 Reparse points are not limited to symlinks, so implement ->query_reparse_point() in order to handle different file types. Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-11-23 11:44:31 -06:00
Ritvik Budhiraja	a15ccef82d	cifs: fix use after free for iface while disabling secondary channels We were deferencing iface after it has been released. Fix is to release after all dereference instances have been encountered. Signed-off-by: Ritvik Budhiraja <rbudhiraja@microsoft.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202311110815.UJaeU3Tt-lkp@intel.com/ Signed-off-by: Steve French <stfrench@microsoft.com>	2023-11-23 11:42:55 -06:00
Steven Rostedt (Google)	f49f950c21	eventfs: Make sure that parent->d_inode is locked in creating files/dirs Since the locking of the parent->d_inode has been moved outside the creation of the files and directories (as it use to be locked via a conditional), add a WARN_ON_ONCE() to the case that it's not locked. Link: https://lkml.kernel.org/r/20231121231112.853962542@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2023-11-22 18:37:33 -05:00
Steven Rostedt (Google)	fc4561226f	eventfs: Do not allow NULL parent to eventfs_start_creating() The eventfs directory is dynamically created via the meta data supplied by the existing trace events. All files and directories in eventfs has a parent. Do not allow NULL to be passed into eventfs_start_creating() as the parent because that should never happen. Warn if it does. Link: https://lkml.kernel.org/r/20231121231112.693841807@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2023-11-22 18:37:26 -05:00
Steven Rostedt (Google)	bcae32c563	eventfs: Move taking of inode_lock into dcache_dir_open_wrapper() The both create_file_dentry() and create_dir_dentry() takes a boolean parameter "lookup", as on lookup the inode_lock should already be taken, but for dcache_dir_open_wrapper() it is not taken. There's no reason that the dcache_dir_open_wrapper() can't take the inode_lock before calling these functions. In fact, it's better if it does, as the lock can be held throughout both directory and file creations. This also simplifies the code, and possibly prevents unexpected race conditions when the lock is released. Link: https://lkml.kernel.org/r/20231121231112.528544825@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: `5790b1fb3d` ("eventfs: Remove eventfs_file and just use eventfs_inode") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2023-11-22 18:37:05 -05:00
Steven Rostedt (Google)	4763d635c9	eventfs: Use GFP_NOFS for allocation when eventfs_mutex is held If memory reclaim happens, it can reclaim file system pages. The file system pages from eventfs may take the eventfs_mutex on reclaim. This means that allocation while holding the eventfs_mutex must not call into filesystem reclaim. A lockdep splat uncovered this. Link: https://lkml.kernel.org/r/20231121231112.373501894@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: `28e12c09f5` ("eventfs: Save ownership and mode") Fixes: `5790b1fb3d` ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2023-11-22 18:36:35 -05:00
Darrick J. Wong	9c235dfc3d	xfs: dquot recovery does not validate the recovered dquot When we're recovering ondisk quota records from the log, we need to validate the recovered buffer contents before writing them to disk. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-11-22 23:39:36 +05:30
Darrick J. Wong	ed17f7da5f	xfs: clean up dqblk extraction Since the introduction of xfs_dqblk in V5, xfs really ought to find the dqblk pointer from the dquot buffer, then compute the xfs_disk_dquot pointer from the dqblk pointer. Fix the open-coded xfs_buf_offset calls and do the type checking in the correct order. Note that this has made no practical difference since the start of the xfs_disk_dquot is coincident with the start of the xfs_dqblk. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-11-22 23:39:27 +05:30
Ritesh Harjani (IBM)	8abc712ea4	ext2: Fix ki_pos update for DIO buffered-io fallback case Commit "filemap: update ki_pos in generic_perform_write", made updating of ki_pos into common code in generic_perform_write() function. This also causes generic/091 to fail. This happened due to an in-flight collision with: `fb5de4358e` ("ext2: Move direct-io to use iomap"). I have chosen fixes tag based on which commit got landed later to upstream kernel. Fixes: `182c25e9c1` ("filemap: update ki_pos in generic_perform_write") Cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <d595bee9f2475ed0e8a2e7fb94f7afc2c6ffc36a.1700643443.git.ritesh.list@gmail.com>	2023-11-22 10:17:10 +01:00
Linus Torvalds	6b65522316	Changes since last update: - Tidy up erofs_read_inode() for simplicity; - Fix broken fscache mode due to NULL dereference of dif->bdev_handle; - Add the EROFS webpage to MAINTAINERS, documentation, and Kconfig. -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmVaugMRHHhpYW5nQGtl cm5lbC5vcmcACgkQUXZn5Zlu5qojvg//ajFjjAVQwVtyjfni1PwmbMiKtlQ/Brta mhtfbcgOkR5sInCeuat2C3u0G7bbWISWSCEUEqv3qjjEIMVpZSJq++tctMDFiM9u kSPgq/TMnbt1tEwRWXiost1o/ijCBBtQRPW2vK3kytZ/PKKLswhf4BrSAYANX/ne 2MGh8RQFwz8mDjBTtQ2mQMOIEb4aHon+RYbgw/pMaV53OiY8DuHIs0GXKYdYPhXA O5je5xk6dmSBkmxGyfCg8iImq6H+aU2bSi0D62VaTN9aZ11VTpjHU9Ce+Y9mCTVp OX47mhvrT/b7kR1gpM8hj4gg5moUebRvStoG43LCWAtGWvTEqgT9PlL1WFPdTZAA QxjdJ8svAsweCliNDuu7U3ZNWgHiMOu2WqtrHMoxR+tfbqbqcvCRkPAHOlFI0gmS ws2EsM/3uw1I13z0ndQPQTb6x2JHDM60a3/8qhXzambuU87GR8FtN09OPHToNLhQ odwirLF8FVg+UL+gVnkXVqXkECVSBNaq0eO2lSSWvo2/hq1MLXlsSZvIsiGYICBx JoCvlezeEkq1VUAn2j7oq18Jr7U5ZnX+jQI6APG4k9XdxL+0ZOPYTnxSnKP+DXom CA/rWWYWZZVXZIHmYF32JVs3ymBAXBORbZID9Jv/Nucs9MiLrnpVhPxPl+OQLly0 JpvDhDeSyms= =Ez2+ -----END PGP SIGNATURE----- Merge tag 'erofs-for-6.7-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - Tidy up erofs_read_inode() for simplicity - Fix broken fscache mode due to NULL dereference of dif->bdev_handle - Add the EROFS webpage to MAINTAINERS, documentation, and Kconfig * tag 'erofs-for-6.7-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: MAINTAINERS: erofs: add EROFS webpage erofs: fix NULL dereference of dif->bdev_handle in fscache mode erofs: simplify erofs_read_inode()	2023-11-21 11:45:48 -08:00
Steven Rostedt (Google)	71cade82f2	eventfs: Do not invalidate dentry in create_file/dir_dentry() With the call to simple_recursive_removal() on the entire eventfs sub system when the directory is removed, it performs the d_invalidate on all the dentries when it is removed. There's no need to do clean ups when a dentry is being created while the directory is being deleted. As dentries are cleaned up by the simpler_recursive_removal(), trying to do d_invalidate() in these functions will cause the dentry to be invalidated twice, and crash the kernel. Link: https://lore.kernel.org/all/20231116123016.140576-1-naresh.kamboju@linaro.org/ Link: https://lkml.kernel.org/r/20231120235154.422970988@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: `407c6726ca` ("eventfs: Use simple_recursive_removal() to clean up dentries") Reported-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2023-11-20 19:02:11 -05:00
Steven Rostedt (Google)	88903daeca	eventfs: Remove expectation that ei->is_freed means ei->dentry == NULL The logic to free the eventfs_inode (ei) use to set is_freed and clear the "dentry" field under the eventfs_mutex. But that changed when a race was found where the ei->dentry needed to be cleared when the last dput() was called on it. But there was still logic that checked if ei->dentry was not NULL and is_freed is set, and would warn if it was. But since that situation was changed and the ei->dentry isn't cleared until the last dput() is called on it while the ei->is_freed is set, do not test for that condition anymore, and change the comments to reflect that. Link: https://lkml.kernel.org/r/20231120235154.265826243@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: `020010fbfa` ("eventfs: Delete eventfs_inode when the last dentry is freed") Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2023-11-20 19:01:54 -05:00
Chuck Lever	796432efab	libfs: getdents() should return 0 after reaching EOD The new directory offset helpers don't conform with the convention of getdents() returning no more entries once a directory file descriptor has reached the current end-of-directory. To address this, copy the logic from dcache_readdir() to mark the open directory file descriptor once EOD has been reached. Seeking resets the mark. Reported-by: Tavian Barnes <tavianator@tavianator.com> Closes: https://lore.kernel.org/linux-fsdevel/20231113180616.2831430-1-tavianator@tavianator.com/ Fixes: `6faddda69f` ("libfs: Add directory operations for stable offsets") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/170043792492.4628.15646203084646716134.stgit@bazille.1015granger.net Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-11-20 15:34:22 +01:00
Christoph Hellwig	9c04138414	xfs: respect the stable writes flag on the RT device Update the per-folio stable writes flag dependening on which device an inode resides on. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231025141020.192413-5-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-11-20 15:05:19 +01:00
Christoph Hellwig	c421df0b19	xfs: clean up FS_XFLAG_REALTIME handling in xfs_ioctl_setattr_xflags Introduce a local boolean variable if FS_XFLAG_REALTIME to make the checks for it more obvious, and de-densify a few of the conditionals using it to make them more readable while at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231025141020.192413-4-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-11-20 15:05:18 +01:00
Christoph Hellwig	762321dab9	filemap: add a per-mapping stable writes flag folio_wait_stable waits for writeback to finish before modifying the contents of a folio again, e.g. to support check summing of the data in the block integrity code. Currently this behavior is controlled by the SB_I_STABLE_WRITES flag on the super_block, which means it is uniform for the entire file system. This is wrong for the block device pseudofs which is shared by all block devices, or file systems that can use multiple devices like XFS witht the RT subvolume or btrfs (although btrfs currently reimplements folio_wait_stable anyway). Add a per-address_space AS_STABLE_WRITES flag to control the behavior in a more fine grained way. The existing SB_I_STABLE_WRITES is kept to initialize AS_STABLE_WRITES to the existing default which covers most cases. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231025141020.192413-2-hch@lst.de Tested-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-11-20 15:05:18 +01:00
Ian Kent	66917f85db	autofs: add: new_inode check in autofs_fill_super() Add missing NULL check of root_inode in autofs_fill_super(). While we are at it simplify the logic by taking advantage of the VFS cleanup procedures and get rid of the goto error handling, as suggested by Al Viro. Signed-off-by: Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/20231119225319.331156-1-raven@themaw.net Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Bill O'Donnell <billodo@redhat.com> Reported-by: <syzbot+662f87a8ef490f45fa64@syzkaller.appspotmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-11-20 14:56:36 +01:00
Linus Torvalds	b8f1fa2419	Bug fixes for 6.7-rc2: * Fix deadlock arising due to intent items in AIL not being cleared when log recovery fails. * Fix stale data exposure bug when remapping COW fork extents to data fork. * Fix deadlock when data device flush fails. * Fix AGFL minimum size calculation. * Select DEBUG_FS instead of XFS_DEBUG when XFS_ONLINE_SCRUB_STATS is selected. * Fix corruption of log inode's extent count field when NREXT64 feature is enabled. Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZVNouAAKCRAH7y4RirJu 9O0mAQDePPSRT8ZrR63dxFZ1AW55q4y9iqgBxWcnKEelmVULPwD/byzoAJ46jvcL qpBHUJ1rUIcd/fGqAEkwfG6hKzD99w8= =G+60 -----END PGP SIGNATURE----- Merge tag 'xfs-6.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Chandan Babu: - Fix deadlock arising due to intent items in AIL not being cleared when log recovery fails - Fix stale data exposure bug when remapping COW fork extents to data fork - Fix deadlock when data device flush fails - Fix AGFL minimum size calculation - Select DEBUG_FS instead of XFS_DEBUG when XFS_ONLINE_SCRUB_STATS is selected - Fix corruption of log inode's extent count field when NREXT64 feature is enabled * tag 'xfs-6.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: recovery should not clear di_flushiter unconditionally xfs: inode recovery does not validate the recovered inode xfs: fix again select in kconfig XFS_ONLINE_SCRUB_STATS xfs: fix internal error from AGFL exhaustion xfs: up(ic_sema) if flushing data device fails xfs: only remap the written blocks in xfs_reflink_end_cow_extent XFS: Update MAINTAINERS to catch all XFS documentation xfs: abort intent items when recovery intents fail xfs: factor out xfs_defer_pending_abort	2023-11-18 11:28:28 -08:00
Linus Torvalds	bb28378af3	nfsd-6.7 fixes: - Fix several long-standing bugs in the duplicate reply cache - Fix a memory leak -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmVY6iwACgkQM2qzM29m f5fDqhAAnsHqZNG2I6asqh/5pPoqcp7kXkEe1l+6jr4jz/00R0lg+oLbsC6/S6eY tzGVkQtIkl2OpL8lt4JTgUL/xiO2JaqbdiIuHelnT62l97r7kbKWJ0ALHahLafiX hCQdJWOdud86kZ5x6/cYVfqyO08bhqDLUFvWd7zSLmzW/9U3bG4v6yXvHT3qqnnE dCJtuM9+DUfnDJKHe6+BFIobkyta8+Tpsg4QSgSAu4hg+dTcqtPCOxMeT+YwgNQd uZY1xiIjPLkufsBF86xzC3tyoFNaZc5QhIwv7ZBtmzNUw3906SbuST9hJiWeHeWq m7p0YeWDJrygiyIrvYxv6NDUCqnkoOxbuKTUAniTEj1SsE2gcQCfij0bU1OSRk6r CpT8TdJ6j2zP78+xxMsSIiA/gR7uJtCKs7LABru7DX25+sDkKK2Te9+PXINarJ1k fraeDeuXMQ1cu71WRXUJ3QKGn1/bC8DYGHFVQqcB+gVXqcm3BuKNYGt01nFyCp5s +1jL+lRxUjPydI2J1VKw5g+5jW9rBgLOT0T9xlr+TZDnYADBAakwpbujrBH5Ey+W BswGoNzqsW9B0U4N6o8OP0FA6IxcddX6Uan2czienmqj217w4AaPPT/vuA5EyC9W zNzBAi87rnaqQs4LnWiS09K+RvYIedQl1lJIzwIdQZfnMKmCSEI= =GOU4 -----END PGP SIGNATURE----- Merge tag 'nfsd-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix several long-standing bugs in the duplicate reply cache - Fix a memory leak * tag 'nfsd-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: Fix checksum mismatches in the duplicate reply cache NFSD: Fix "start of NFS reply" pointer passed to nfsd_cache_update() NFSD: Update nfsd_cache_append() to use xdr_stream nfsd: fix file memleak on client_opens_release	2023-11-18 11:23:32 -08:00
Linus Torvalds	33b63f159a	four cifs/smb3 fixes -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmVXyvsACgkQiiy9cAdy T1FuyQv/aPFI4XdIYwneZT0VRIxKtZgmek2SRfA+U3fiMNnBG90SqzYzswgkJqHZ vLdjGcwDXR0M2S9zf74lDtzqhfyGvf7d+YCwQ+vXTmhWAcneYM7w+AtFjD88rLAr GjS4oUM/BeZQ9nyPNTibueJld2cXXXSkGjRP/vu4RmsVWDzMJjlSOe+ZG0FBr32a x8JvCOtvUmIJ1uY4uwsDtA1uUpgq0QEO1pi+mlcn3tMxPpIypVzdWwnbex0XR4BO hzRcGJDAi6g4uQ43A5a9ypRN02zaX/PXbPg6IgLXlYm4Oce9um1MmrqAssVnGCXZ FaKMSxxnoQXjNW8Oxt0/RvWo2cHbUNPn6pq/Pvhj8FWq6AT+PZWW5JQy673ZhxWK WVy7L5Y1R4BDDceIrlJRb+8WOaP+sprgsWZI0WsOCBvrI9uoTSqqXjy+fhTC/6zi HZfC7kFHDh2jpbUFdBUt3ChIW2RCuowj2XEN3GrZr495vSLahokitl2grYbb16U9 squwsK1A =idlR -----END PGP SIGNATURE----- Merge tag '6.7-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: - multichannel fixes (including a lock ordering fix and an important refcounting fix) - spnego fix * tag '6.7-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix lock ordering while disabling multichannel cifs: fix leak of iface for primary channel cifs: fix check of rc in function generate_smb3signingkey cifs: spnego: add ';' in HOST_KEY_LEN	2023-11-18 11:18:46 -08:00
Stefan Berger	8a924db2d7	fs: Pass AT_GETATTR_NOSEC flag to getattr interface function When vfs_getattr_nosec() calls a filesystem's getattr interface function then the 'nosec' should propagate into this function so that vfs_getattr_nosec() can again be called from the filesystem's gettattr rather than vfs_getattr(). The latter would add unnecessary security checks that the initial vfs_getattr_nosec() call wanted to avoid. Therefore, introduce the getattr flag GETATTR_NOSEC and allow to pass with the new getattr_flags parameter to the getattr interface function. In overlayfs and ecryptfs use this flag to determine which one of the two functions to call. In a recent code change introduced to IMA vfs_getattr_nosec() ended up calling vfs_getattr() in overlayfs, which in turn called security_inode_getattr() on an exiting process that did not have current->fs set anymore, which then caused a kernel NULL pointer dereference. With this change the call to security_inode_getattr() can be avoided, thus avoiding the NULL pointer dereference. Reported-by: <syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com> Fixes: `db1d1e8b98` ("IMA: use vfs_getattr_nosec to get the i_version") Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <linux-fsdevel@vger.kernel.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Tyler Hicks <code@tyhicks.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Suggested-by: Christian Brauner <brauner@kernel.org> Co-developed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Link: https://lore.kernel.org/r/20231002125733.1251467-1-stefanb@linux.vnet.ibm.com Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2023-11-18 14:54:07 +01:00
Linus Torvalds	791c8ab095	bcachefs bugfixes for 6.7 Lots of small fixes for minor nits and compiler warnings. Bigger items: - The six locks lost wakeup is finally fixed: six_read_trylock() was checking for the waiting bit before decrementing the number of readers - validated the fix with a torture test. - Fix for a memory reclaim issue: when needing to reallocate a key cache key, we now do our usual GFP_NOWAIT; unlock(); GFP_KERNEL dance. - Multiple deleted inodes btree fixes - Fix an issue in fsck, where i_nlink would be recalculated incorrectly for hardlinked files if a snapshot had ever been taken. - Kill journal pre-reservations: This is a bigger patch than I would normally send at this point, but it deletes code and it fixes some of our tests that would sporadically die with the journal getting stuck, and it's a performance improvement, too. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmVX6YUACgkQE6szbY3K bnbFTBAAmtGYPXXLUZrr6uZc2Kp/a/v86FUatkrmXS4wVydQSBVhbo6CdxTAAYbx C1K/+WG5MIfkFaVNLQvIXjGECFPffRmA/xtgfjaHzXvUnG74sLGrsOW7eBvkEd3j 8sEPjfpp39LSLIWs9GbqP3kNF2ax6kZ4Y+kTod4PZB14S4VrXRmd2ZGjJ2VARsig ygFi+qOBJ0ojZHo7VSOzGRxcGNvVmbr76vOBuqEwR9PYnT3JKH2I+ANFdc1YLnAH mMr/nwTMzuLcN6FUSnPBoX+x1WyPaXDbBMNLBSoVBcpP6X/DqcNIr9yvkUGqT6uR cxW6oOten9M+JSbGALnTrQjc9Khug5SqjvTJ1fYl4NELocvapWvBdHtICwEpYl9F REeTGqTHMb8j4VJDl+JcP9cPbDEVa6TGa4SXB4NfF70MZf0y7HR2Y/ms0bi47+Zb IxLlbhFZUiqbnsBx+jPhuKD86mijjEbmjPWkqcsWG/olg8Sdor4LdU4CbaGEB1fn oSfMnwb5fKI4fFPVMSREJ2ktpeb1DUmvkdbt5klYTL4DBK2GIgGdYlnVXg9Z5AAY kz6fvJE0PhWw8cDb4ClGo6ZYffmlX6m4LoX0q3C1O0Wt+Q4av2/vtLuAMKXz38Y4 zzw+JO/h0X9ECJdFPPsTq2sg7cWo7oFVrO3+ZQ0dTfKKtAPUwWY= =7zgm -----END PGP SIGNATURE----- Merge tag 'bcachefs-2023-11-17' of https://evilpiepirate.org/git/bcachefs Pull bcachefs fixes from Kent Overstreet: "Lots of small fixes for minor nits and compiler warnings. Bigger items: - The six locks lost wakeup is finally fixed: six_read_trylock() was checking for the waiting bit before decrementing the number of readers - validated the fix with a torture test. - Fix for a memory reclaim issue: when needing to reallocate a key cache key, we now do our usual GFP_NOWAIT; unlock(); GFP_KERNEL dance. - Multiple deleted inodes btree fixes - Fix an issue in fsck, where i_nlink would be recalculated incorrectly for hardlinked files if a snapshot had ever been taken. - Kill journal pre-reservations: This is a bigger patch than I would normally send at this point, but it deletes code and it fixes some of our tests that would sporadically die with the journal getting stuck, and it's a performance improvement, too" * tag 'bcachefs-2023-11-17' of https://evilpiepirate.org/git/bcachefs: (22 commits) bcachefs: Fix missing locking for dentry->d_parent access bcachefs: six locks: Fix lost wakeup bcachefs: Fix no_data_io mode checksum check bcachefs: Fix bch2_check_nlinks() for snapshots bcachefs: Don't decrease BTREE_ITER_MAX when LOCKDEP=y bcachefs: Disable debug log statements bcachefs: Fix missing transaction commit bcachefs: Fix error path in bch2_mount() bcachefs: Fix potential sleeping during mount bcachefs: Fix iterator leak in may_delete_deleted_inode() bcachefs: Kill journal pre-reservations bcachefs: Check for nonce offset inconsistency in data_update path bcachefs: Make sure to drop/retake btree locks before reclaim bcachefs: btree_trans->write_locked bcachefs: Run btree key cache shrinker less aggressively bcachefs: Split out btree_key_cache_types.h bcachefs: Guard against insufficient devices to create stripes bcachefs: Fix null ptr deref in bch2_backpointer_get_node() bcachefs: Fix multiple -Warray-bounds warnings bcachefs: Use DECLARE_FLEX_ARRAY() helper and fix multiple -Warray-bounds warnings ...	2023-11-17 14:36:58 -08:00
Chuck Lever	bf51c52a1f	NFSD: Fix checksum mismatches in the duplicate reply cache nfsd_cache_csum() currently assumes that the server's RPC layer has been advancing rq_arg.head[0].iov_base as it decodes an incoming request, because that's the way it used to work. On entry, it expects that buf->head[0].iov_base points to the start of the NFS header, and excludes the already-decoded RPC header. These days however, head[0].iov_base now points to the start of the RPC header during all processing. It no longer points at the NFS Call header when execution arrives at nfsd_cache_csum(). In a retransmitted RPC the XID and the NFS header are supposed to be the same as the original message, but the contents of the retransmitted RPC header can be different. For example, for krb5, the GSS sequence number will be different between the two. Thus if the RPC header is always included in the DRC checksum computation, the checksum of the retransmitted message might not match the checksum of the original message, even though the NFS part of these messages is identical. The result is that, even if a matching XID is found in the DRC, the checksum mismatch causes the server to execute the retransmitted RPC transaction again. Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-11-17 15:13:01 -05:00
Chuck Lever	1caf5f61dd	NFSD: Fix "start of NFS reply" pointer passed to nfsd_cache_update() The "statp + 1" pointer that is passed to nfsd_cache_update() is supposed to point to the start of the egress NFS Reply header. In fact, it does point there for AUTH_SYS and RPCSEC_GSS_KRB5 requests. But both krb5i and krb5p add fields between the RPC header's accept_stat field and the start of the NFS Reply header. In those cases, "statp + 1" points at the extra fields instead of the Reply. The result is that nfsd_cache_update() caches what looks to the client like garbage. A connection break can occur for a number of reasons, but the most common reason when using krb5i/p is a GSS sequence number window underrun. When an underrun is detected, the server is obliged to drop the RPC and the connection to force a retransmit with a fresh GSS sequence number. The client presents the same XID, it hits in the server's DRC, and the server returns the garbage cache entry. The "statp + 1" argument has been used since the oldest changeset in the kernel history repo, so it has been in nfsd_dispatch() literally since before history began. The problem arose only when the server-side GSS implementation was added twenty years ago. Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-11-17 15:12:55 -05:00
Chuck Lever	49cecd8628	NFSD: Update nfsd_cache_append() to use xdr_stream When inserting a DRC-cached response into the reply buffer, ensure that the reply buffer's xdr_stream is updated properly. Otherwise the server will send a garbage response. Cc: stable@vger.kernel.org # v6.3+ Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-11-17 15:12:46 -05:00
Mahmoud Adam	bc1b5acb40	nfsd: fix file memleak on client_opens_release seq_release should be called to free the allocated seq_file Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Mahmoud Adam <mngyadam@amazon.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Fixes: `78599c42ae` ("nfsd4: add file to display list of client's opens") Reviewed-by: NeilBrown <neilb@suse.de> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2023-11-17 15:12:39 -05:00
Linus Torvalds	6bc40e44f1	overlayfs fixes for 6.7-rc2 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmVXZFQACgkQEVvVyTe/ 1Wr/hQ//YofnLzFuE172QmfiYLQYAnBJONql47Hs32g+9zGjw7ev6tVbwEwuduY9 23lktlJthJO15+L8mfG3ECqpV7KfdBfuipjI6nO9V/Br7YEdHtDgk7jUqFWoADUA tEtXqjk8cqkWc4+6XFKHYeN04Nd0tvRIFmtW90gIxANE/AZxiPcCGKIKfqKgDLZ2 0IbN7yeJASc7XCtcjl9uldvhgmltpu1xX3IETsKOtLh1H8J3+DSI/5K7kQ4if5q/ 6Hi3+6Qf3aTqyaqG6z8RVhbwvrRWFNvaUWpjW5F1sBpNddtq8ioHmqX4L3Caybsw ukitshGj59MfmNnirxryO8MXv4RwqOAZFQc7ZfQhL6RzEO6WiqNybQ112SOh25E+ NsKSy4vhCiH3ifGQC8LZtdeWmcPS/5vPUMv81w7P6Y/VWZImQQ04kf1akSr9/iBX KCLFhYb8lKu+pBHFEZkYrdTDbIby+7QKraIi9hC2RsfFiIfvHn4Y1AtUt9M145va vBTF/7y8t5VhftMhP77ZUvREwIMrzcBJtqIH8J5XoT6EkxlGCV5ft9el20VyXYia tkWSzW9dQzGG+eGtdSX490MQMlZs7yN0SzyP0rUrZ2LMycwmMX976ssXtnp4NWBM sAnHbZMS1eXwK57WP9gOXiKLZ7sMWj03NzYK+cITL6Gttq/bAdw= =OtFU -----END PGP SIGNATURE----- Merge tag 'ovl-fixes-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs fixes from Amir Goldstein: "A fix to an overlayfs param parsing bug and a misformatted comment" * tag 'ovl-fixes-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: fix memory leak in ovl_parse_param() ovl: fix misformatted comment	2023-11-17 09:18:48 -05:00
Gao Xiang	62b241efff	MAINTAINERS: erofs: add EROFS webpage Add a new `W:` field of the EROFS entry points to the documentation site at <https://erofs.docs.kernel.org>. In addition, update the in-tree documentation and Kconfig too. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20231117085329.1624223-1-hsiangkao@linux.alibaba.com	2023-11-17 19:55:46 +08:00
Jingbo Xu	8bd90b6ae7	erofs: fix NULL dereference of dif->bdev_handle in fscache mode Avoid NULL dereference of dif->bdev_handle, as dif->bdev_handle is NULL in fscache mode. BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:erofs_map_dev+0xbd/0x1c0 Call Trace: <TASK> erofs_fscache_data_read_slice+0xa7/0x340 erofs_fscache_data_read+0x11/0x30 erofs_fscache_readahead+0xd9/0x100 read_pages+0x47/0x1f0 page_cache_ra_order+0x1e5/0x270 filemap_get_pages+0xf2/0x5f0 filemap_read+0xb8/0x2e0 vfs_read+0x18d/0x2b0 ksys_read+0x53/0xd0 do_syscall_64+0x42/0xf0 entry_SYSCALL_64_after_hwframe+0x6e/0x76 Reported-by: Yiqun Leng <yqleng@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7245 Fixes: `4984572008` ("erofs: Convert to use bdev_open_by_path()") Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20231114070704.23398-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2023-11-17 19:55:46 +08:00
Ferry Meng	914fa861e3	erofs: simplify erofs_read_inode() After commit `1c7f49a767` ("erofs: tidy up EROFS on-disk naming"), there is a unique `union erofs_inode_i_u` so that we could parse the union directly. Besides, it also replaces `inode->i_sb` with `sb` for simplicity. Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20231109111822.17944-1-mengferry@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2023-11-17 19:55:34 +08:00
David Howells	2a4ca1b4b7	afs: Make error on cell lookup failure consistent with OpenAFS When kafs tries to look up a cell in the DNS or the local config, it will translate a lookup failure into EDESTADDRREQ whereas OpenAFS translates it into ENOENT. Applications such as West expect the latter behaviour and fail if they see the former. This can be seen by trying to mount an unknown cell: # mount -t afs %example.com:cell.root /mnt mount: /mnt: mount(2) system call failed: Destination address required. Fixes: `4d673da145` ("afs: Support the AFS dynamic root") Reported-by: Markus Suvanto <markus.suvanto@gmail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216637 Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org	2023-11-17 07:55:28 +00:00
David Howells	e6bace7313	afs: Fix afs_server_list to be cleaned up with RCU afs_server_list is accessed with the rcu_read_lock() held from volume->servers, so it needs to be cleaned up correctly. Fix this by using kfree_rcu() instead of kfree(). Fixes: `8a070a9648` ("afs: Detect cell aliases 1 - Cells with root volumes") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org	2023-11-17 07:55:27 +00:00
Kent Overstreet	ba276ce586	bcachefs: Fix missing locking for dentry->d_parent access Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-16 16:57:19 -05:00
Qu Wenruo	8049ba5d0a	btrfs: do not abort transaction if there is already an existing qgroup [BUG] Syzbot reported a regression that after commit `6ed05643dd` ("btrfs: create qgroup earlier in snapshot creation") we can trigger transaction abort during snapshot creation: BTRFS: Transaction aborted (error -17) WARNING: CPU: 0 PID: 5057 at fs/btrfs/transaction.c:1778 create_pending_snapshot+0x25f4/0x2b70 fs/btrfs/transaction.c:1778 Modules linked in: CPU: 0 PID: 5057 Comm: syz-executor225 Not tainted 6.6.0-syzkaller-15365-g305230142ae0 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023 RIP: 0010:create_pending_snapshot+0x25f4/0x2b70 fs/btrfs/transaction.c:1778 Call Trace: <TASK> create_pending_snapshots+0x195/0x1d0 fs/btrfs/transaction.c:1967 btrfs_commit_transaction+0xf1c/0x3730 fs/btrfs/transaction.c:2440 create_snapshot+0x4a5/0x7e0 fs/btrfs/ioctl.c:845 btrfs_mksubvol+0x5d0/0x750 fs/btrfs/ioctl.c:995 btrfs_mksnapshot+0xb5/0xf0 fs/btrfs/ioctl.c:1041 __btrfs_ioctl_snap_create+0x344/0x460 fs/btrfs/ioctl.c:1294 btrfs_ioctl_snap_create+0x13c/0x190 fs/btrfs/ioctl.c:1321 btrfs_ioctl+0xbbf/0xd40 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl+0xf8/0x170 fs/ioctl.c:857 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b RIP: 0033:0x7f2f791127b9 </TASK> [CAUSE] The error number is -EEXIST, which can happen for qgroup if there is already an existing qgroup and then we're trying to create a snapshot for it. [FIX] In that case, we can continue creating the snapshot, although it may lead to qgroup inconsistency, it's not so critical to abort the current transaction. So in this case, we can just ignore the non-critical errors, mostly -EEXIST (there is already a qgroup). Reported-by: syzbot+4d81015bc10889fd12ea@syzkaller.appspotmail.com Fixes: `6ed05643dd` ("btrfs: create qgroup earlier in snapshot creation") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-11-15 17:08:14 +01:00
Qu Wenruo	1645c283a8	btrfs: tree-checker: add type and sequence check for inline backrefs [BUG] There is a bug report that ntfs2btrfs had a bug that it can lead to transaction abort and the filesystem flips to read-only. [CAUSE] For inline backref items, kernel has a strict requirement for their ordered, they must follow the following rules: - All btrfs_extent_inline_ref::type should be in an ascending order - Within the same type, the items should follow a descending order by their sequence number For EXTENT_DATA_REF type, the sequence number is result from hash_extent_data_ref(). For other types, their sequence numbers are btrfs_extent_inline_ref::offset. Thus if there is any code not following above rules, the resulted inline backrefs can prevent the kernel to locate the needed inline backref and lead to transaction abort. [FIX] Ntrfs2btrfs has already fixed the problem, and btrfs-progs has added the ability to detect such problems. For kernel, let's be more noisy and be more specific about the order, so that the next time kernel hits such problem we would reject it in the first place, without leading to transaction abort. Link: https://github.com/kdave/btrfs-progs/pull/622 Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-11-15 17:08:09 +01:00
Kent Overstreet	61b85cb0d7	bcachefs: six locks: Fix lost wakeup In percpu reader mode, trylock() for read had a lost wakeup: on failure to get the lock, we may have caused a writer to fail to get the lock, because we temporarily elevated the reader count. We need to check for waiters after decrementing the read count - not before. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-14 23:44:44 -05:00
Kent Overstreet	62d73dfc44	bcachefs: Fix no_data_io mode checksum check In no_data_io mode, we expect data checksums to be wrong - don't want to spew the log with them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-14 23:44:44 -05:00
Kent Overstreet	db18ef1a02	bcachefs: Fix bch2_check_nlinks() for snapshots When searching the link table for the matching inode, we were searching for a specific - incorrect - snapshot ID as well, causing us to fail to find the inode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-14 23:44:44 -05:00
Kent Overstreet	7125063fc6	bcachefs: Don't decrease BTREE_ITER_MAX when LOCKDEP=y Running with fewer max btree paths doesn't work anymore when replication is enabled - as we've added e.g. the freespace and bucket gens btrees, we naturally end up needing more btree paths. This is an issue with lockdep, we end up taking more locks than lockdep will track (the MAX_LOCKD_DEPTH constant). But bcachefs as merged does not yet support lockdep anyways, so we can leave that for later. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-14 23:44:44 -05:00
Kent Overstreet	497c57a303	bcachefs: Disable debug log statements The journal read path had some informational log statements preperatory for ZNS support - they're not of interest to users, so we can turn them off. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-14 23:44:44 -05:00
Kent Overstreet	f42fa17883	bcachefs: Fix missing transaction commit In may_delete_deleted_inode(), there's a corner case when a snapshot was taken while we had an unlinked inode: we don't want to delete the inode in the internal (shared) snapshot node, since it might have been reattached in a descendent snapshot. Instead we propagate the key to any snapshot leaves it doesn't exist in, so that it can be deleted there if necessary, and then clear the unlinked flag in the internal node. But we forgot to commit after clearing the unlinked flag, causing us to go into an infinite loop. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-14 23:44:43 -05:00
Kent Overstreet	178c4873fd	bcachefs: Fix error path in bch2_mount() This fixes a bug discovered by generic/388 where sb->s_fs_info was NULL while the superblock was still active - the error path was entirely fubar, and was trying to do something unclear and unecessary. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-14 23:44:43 -05:00
Daniel J Blueman	b783fc4d13	bcachefs: Fix potential sleeping during mount During mount, bcachefs mount option processing may sleep while allocating a string buffer. Fix this by reference counting in order to take the atomic path. Signed-off-by: Daniel J Blueman <daniel@quora.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-14 23:44:43 -05:00
Kent Overstreet	069749688e	bcachefs: Fix iterator leak in may_delete_deleted_inode() may_delete_deleted_inode() was returning without exiting a btree iterator, eventually causing propagate_key_to_snaphot_leaves() to go into an infinite loop hitting btree_trans_too_many_iters(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-14 23:44:43 -05:00
Kent Overstreet	006ccc3090	bcachefs: Kill journal pre-reservations This deletes the complicated and somewhat expensive journal pre-reservation machinery in favor of just using journal watermarks: when the journal is more than half full, we run journal reclaim more aggressively, and when the journal is more than 3/4s full we only allow journal reclaim to get new journal reservations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-14 23:44:43 -05:00
Shyam Prasad N	5eef12c4e3	cifs: fix lock ordering while disabling multichannel The code to handle the case of server disabling multichannel was picking iface_lock with chan_lock held. This goes against the lock ordering rules, as iface_lock is a higher order lock (even if it isn't so obvious). This change fixes the lock ordering by doing the following in that order for each secondary channel: 1. store iface and server pointers in local variable 2. remove references to iface and server in channels 3. unlock chan_lock 4. lock iface_lock 5. dec ref count for iface 6. unlock iface_lock 7. dec ref count for server 8. lock chan_lock again Since this function can only be called in smb2_reconnect, and that cannot be called by two parallel processes, we should not have races due to dropping chan_lock between steps 3 and 8. Fixes: `ee1d21794e` ("cifs: handle when server stops supporting multichannel") Reported-by: Paulo Alcantara <pc@manguebit.com> Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-11-14 11:39:35 -06:00
Shyam Prasad N	29954d5b1e	cifs: fix leak of iface for primary channel My last change in this area introduced a change which accounted for primary channel in the interface ref count. However, it did not reduce this ref count on deallocation of the primary channel. i.e. during umount. Fixing this leak here, by dropping this ref count for primary channel while freeing up the session. Fixes: `fa1d0508bd` ("cifs: account for primary channel in the interface list") Cc: stable@vger.kernel.org Reported-by: Paulo Alcantara <pc@manguebit.com> Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-11-14 11:38:00 -06:00
Amir Goldstein	37f32f5264	ovl: fix memory leak in ovl_parse_param() On failure to parse parameters in ovl_parse_param_lowerdir(), it is necessary to update ctx->nr with the correct nr before using ovl_reset_lowerdirs() to release l->name. Reported-and-tested-by: syzbot+26eedf3631650972f17c@syzkaller.appspotmail.com Fixes: `c835110b58` ("ovl: remove unused code in lowerdir param parsing") Co-authored-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-11-14 08:09:36 +02:00
Amir Goldstein	b28060db71	ovl: fix misformatted comment Remove misleading /** prefix from a regular comment. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202311121628.byHp8tkv-lkp@intel.com/ Signed-off-by: Amir Goldstein <amir73il@gmail.com>	2023-11-14 08:09:36 +02:00
Kent Overstreet	701ff57eb3	bcachefs: Check for nonce offset inconsistency in data_update path We've rarely been seeing a nonce offset inconsistency that doesn't show up in tests: this adds some extra verification code to the data update path that prints out more relevant info when it occurs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-13 21:45:03 -05:00
Kent Overstreet	09b0283ee2	bcachefs: Make sure to drop/retake btree locks before reclaim We really don't want to be invoking memory reclaim with btree locks held: even aside from (solvable, but tricky) recursion issues, it can cause painful to diagnose performance edge cases. This fixes a recently reported issue in btree_key_can_insert_cached(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-by: Mateusz Guzik <mjguzik@gmail.com> Fixes: https://lore.kernel.org/linux-bcachefs/CAGudoHEsb_hGRMeWeXh+UF6po0qQuuq_NKSEo+s1sEb6bDLjpA@mail.gmail.com/T/	2023-11-13 21:45:03 -05:00
Kent Overstreet	3b8c450777	bcachefs: btree_trans->write_locked As prep work for the next patch to fix a key cache reclaim issue, we need to start tracking whether we're currently holding write locks - so that we can release and retake the before calling into memory reclaim. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-13 21:45:03 -05:00
Kent Overstreet	c65c13f0ea	bcachefs: Run btree key cache shrinker less aggressively The btree key cache maintains lists of items that have been freed, but can't yet be reclaimed because a bch2_trans_relock() call might find them - we're waiting for SRCU readers to release. Previously, we wouldn't count these items against the number we're attempting to scan for, which would mean we'd evict more live key cache entries - doing quite a bit of potentially unecessary work. With recent work to make sure we don't hold SRCU locks for too long, it should be safe to count all the items on the freelists against number to scan - even if we can't reclaim them yet, we will be able to soon. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-13 21:45:01 -05:00
Kent Overstreet	1bd5bcc9f5	bcachefs: Split out btree_key_cache_types.h More consistent organization. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-13 21:44:14 -05:00
Kent Overstreet	4d6128dca6	bcachefs: Guard against insufficient devices to create stripes We can't create stripes if we don't have enough devices - this manifested as an integer underflow bug later. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-13 21:42:22 -05:00
Kent Overstreet	03cc1e67a2	bcachefs: Fix null ptr deref in bch2_backpointer_get_node() bch2_btree_iter_peek_node() can return a NULL ptr (when the tree is shorter than the search depth); handle this with an early return. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: https://lore.kernel.org/linux-bcachefs/5fc3c28b-c232-4ec7-b0ac-4ef220ddf976@moroto.mountain/T/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-13 21:42:22 -05:00
Gustavo A. R. Silva	274c2f8fd2	bcachefs: Fix multiple -Warray-bounds warnings Transform zero-length array `entries` into a proper flexible-array member in `struct journal_seq_blacklist_table`; and fix the following -Warray-bounds warnings: fs/bcachefs/journal_seq_blacklist.c:148:26: warning: array subscript idx is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:150:30: warning: array subscript idx is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:154:27: warning: array subscript idx is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:176:27: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:177:27: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:297:34: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:298:34: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:300:31: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] This results in no differences in binary output. This helps with the ongoing efforts to globally enable -Warray-bounds. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-13 21:42:22 -05:00
Gustavo A. R. Silva	1b8bc55628	bcachefs: Use DECLARE_FLEX_ARRAY() helper and fix multiple -Warray-bounds warnings Transform zero-length array `s` into a proper flexible-array member in `struct snapshot_table` via the DECLARE_FLEX_ARRAY() helper; and fix tons of the following -Warray-bounds warnings: fs/bcachefs/snapshot.h:36:21: warning: array subscript <unknown> is outside array bounds of 'struct snapshot_t[0]' [-Warray-bounds=] fs/bcachefs/snapshot.h:36:21: warning: array subscript <unknown> is outside array bounds of 'struct snapshot_t[0]' [-Warray-bounds=] fs/bcachefs/snapshot.c:135:70: warning: array subscript <unknown> is outside array bounds of 'struct snapshot_t[0]' [-Warray-bounds=] fs/bcachefs/snapshot.h:36:21: warning: array subscript <unknown> is outside array bounds of 'struct snapshot_t[0]' [-Warray-bounds=] fs/bcachefs/snapshot.h:36:21: warning: array subscript <unknown> is outside array bounds of 'struct snapshot_t[0]' [-Warray-bounds=] fs/bcachefs/snapshot.h:36:21: warning: array subscript <unknown> is outside array bounds of 'struct snapshot_t[0]' [-Warray-bounds=] This helps with the ongoing efforts to globally enable -Warray-bounds. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-13 21:42:21 -05:00
Kent Overstreet	c4f1f80a0e	bcachefs: Use correct fgf_t type as function argument This quiets a sparse complaint. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-13 21:42:21 -05:00
Jiapeng Chong	48d584b7f9	bcachefs: make bch2_target_to_text_sb static The bch2_target_to_text_sb are not used outside the file disk_groups.c, so the modification is defined as static. fs/bcachefs/disk_groups.c:583:6: warning: no previous prototype for ‘bch2_target_to_text_sb’. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7144 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-11-13 21:42:21 -05:00
Ekaterina Esina	181724fc72	cifs: fix check of rc in function generate_smb3signingkey Remove extra check after condition, add check after generating key for encryption. The check is needed to return non zero rc before rewriting it with generating key for decryption. Found by Linux Verification Center (linuxtesting.org) with SVACE. Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Fixes: `d70e9fa558` ("cifs: try opening channels after mounting") Signed-off-by: Ekaterina Esina <eesina@astralinux.ru> Co-developed-by: Anastasia Belova <abelova@astralinux.ru> Signed-off-by: Anastasia Belova <abelova@astralinux.ru> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-11-13 16:22:30 -06:00
Anastasia Belova	ff31ba19d7	cifs: spnego: add ';' in HOST_KEY_LEN "host=" should start with ';' (as in cifs_get_spnego_key) So its length should be 6. Found by Linux Verification Center (linuxtesting.org) with SVACE. Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Fixes: `7c9c3760b3` ("[CIFS] add constants for string lengths of keynames in SPNEGO upcall string") Signed-off-by: Anastasia Belova <abelova@astralinux.ru> Co-developed-by: Ekaterina Esina <eesina@astralinux.ru> Signed-off-by: Ekaterina Esina <eesina@astralinux.ru> Signed-off-by: Steve French <stfrench@microsoft.com>	2023-11-13 16:21:34 -06:00
Linus Torvalds	9bacdd8996	for-6.7-rc1-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmVSO50ACgkQxWXV+ddt WDuiyg/7BZviFAyiQMAzpA319qRJZ+EemfTdF/k69q4axGYuvqVdXKnpOV44AR4I dKcHLOPpDZIxsh8lFytkm1UEAHptw1v7A64c+gcdjGK0tAA7aKbw/1nmNysowT23 L0v2+34hkBUfG8A3uVgOwL1rjItEX5Fl54slpVsazSqlEbKrqC4MGNjmqdp3IOeC qfXTgkvkXmm8s8NyoJybKewM9Aw0tmK0jkAFHA+2sgcZPYKXjqWGv9KUOsXnCx5o 3kPWIRT1sj4q2qzrgP14Q12O6qPLZ2/0oTBhi6nhj8+N1yiH+USS5zBITegF+w2n leQeVHtyBYHlPYQSQlCIZy7+10gkePvs+JmoAuL8YFISnGYnvOZqCeArlV7cnNI3 CQt7ZBER5Dqw78Y756usUhpYrLWa9kOpcPVRmjJ/R62+TY1FkkyY7irETbn5EGjI NlhEa4PMYeYpAOccoxWEm9tIiiVD1abURhVBdn3Znfcb1Sv/lrGBlo9DYGFCxbBh xU1JP7sly8w0aPLqCbn1X3VY8dXp+CeYz4FQabHjQA/zr9lF08/pRYj3haAbYAyH 0KphXurwz/YqY+LmRg7SbQ/KMgBAiBV8Qk9JyNvdvaQbnYnq7CWdpoHcpZu3mvpb HLGoXew58kZaSfxLHlcT5wwYlbq0rooXRstuFg2+BBcOFOMCQfw= =GM+1 -----END PGP SIGNATURE----- Merge tag 'for-6.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix potential overflow in returned value from SEARCH_TREE_V2 ioctl on 32bit architecture - zoned mode fixes: - drop unnecessary write pointer check for RAID0/RAID1/RAID10 profiles, now it works because of raid-stripe-tree - wait for finishing the zone when direct IO needs a new allocation - simple quota fixes: - pass correct owning root pointer when cleaning up an aborted transaction - fix leaking some structures when processing delayed refs - change key type number of BTRFS_EXTENT_OWNER_REF_KEY, reorder it before inline refs that are supposed to be sorted, keeping the original number would complicate a lot of things; this change needs an updated version of btrfs-progs to work and filesystems need to be recreated - fix error pointer dereference after failure to allocate fs devices - fix race between accounting qgroup extents and removing a qgroup * tag 'for-6.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: make OWNER_REF_KEY type value smallest among inline refs btrfs: fix qgroup record leaks when using simple quotas btrfs: fix race between accounting qgroup extents and removing a qgroup btrfs: fix error pointer dereference after failure to allocate fs devices btrfs: make found_logical_ret parameter mandatory for function queue_scrub_stripe() btrfs: get correct owning_root when dropping snapshot btrfs: zoned: wait for data BG to be finished on direct IO allocation btrfs: zoned: drop no longer valid write pointer check btrfs: directly return 0 on no error code in btrfs_insert_raid_extent() btrfs: use u64 for buffer sizes in the tree search ioctls	2023-11-13 09:09:12 -08:00
Dave Chinner	7930d9e103	xfs: recovery should not clear di_flushiter unconditionally Because on v3 inodes, di_flushiter doesn't exist. It overlaps with zero padding in the inode, except when NREXT64=1 configurations are in use and the zero padding is no longer padding but holds the 64 bit extent counter. This manifests obviously on big endian platforms (e.g. s390) because the log dinode is in host order and the overlap is the LSBs of the extent count field. It is not noticed on little endian machines because the overlap is at the MSB end of the extent count field and we need to get more than 2^^48 extents in the inode before it manifests. i.e. the heat death of the universe will occur before we see the problem in little endian machines. This is a zero-day issue for NREXT64=1 configuraitons on big endian machines. Fix it by only clearing di_flushiter on v2 inodes during recovery. Fixes: `9b7d16e34b` ("xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers") cc: stable@kernel.org # 5.19+ Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-11-13 09:11:41 +05:30
Dave Chinner	038ca189c0	xfs: inode recovery does not validate the recovered inode Discovered when trying to track down a weird recovery corruption issue that wasn't detected at recovery time. The specific corruption was a zero extent count field when big extent counts are in use, and it turns out the dinode verifier doesn't detect that specific corruption case, either. So fix it too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-11-13 09:11:41 +05:30
Anthony Iliopoulos	a2e4388adf	xfs: fix again select in kconfig XFS_ONLINE_SCRUB_STATS Commit `57c0f4a8ea` attempted to fix the select in the kconfig entry XFS_ONLINE_SCRUB_STATS by selecting XFS_DEBUG, but the original intention was to select DEBUG_FS, since the feature relies on debugfs to export the related scrub statistics. Fixes: `57c0f4a8ea` ("xfs: fix select in config XFS_ONLINE_SCRUB_STATS") Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-11-13 09:11:41 +05:30
Omar Sandoval	f63a5b3769	xfs: fix internal error from AGFL exhaustion We've been seeing XFS errors like the following: XFS: Internal error i != 1 at line 3526 of file fs/xfs/libxfs/xfs_btree.c. Caller xfs_btree_insert+0x1ec/0x280 ... Call Trace: xfs_corruption_error+0x94/0xa0 xfs_btree_insert+0x221/0x280 xfs_alloc_fixup_trees+0x104/0x3e0 xfs_alloc_ag_vextent_size+0x667/0x820 xfs_alloc_fix_freelist+0x5d9/0x750 xfs_free_extent_fix_freelist+0x65/0xa0 __xfs_free_extent+0x57/0x180 ... This is the XFS_IS_CORRUPT() check in xfs_btree_insert() when xfs_btree_insrec() fails. After converting this into a panic and dissecting the core dump, I found that xfs_btree_insrec() is failing because it's trying to split a leaf node in the cntbt when the AG free list is empty. In particular, it's failing to get a block from the AGFL _while trying to refill the AGFL_. If a single operation splits every level of the bnobt and the cntbt (and the rmapbt if it is enabled) at once, the free list will be empty. Then, when the next operation tries to refill the free list, it allocates space. If the allocation does not use a full extent, it will need to insert records for the remaining space in the bnobt and cntbt. And if those new records go in full leaves, the leaves (and potentially more nodes up to the old root) need to be split. Fix it by accounting for the additional splits that may be required to refill the free list in the calculation for the minimum free list size. P.S. As far as I can tell, this bug has existed for a long time -- maybe back to xfs-history commit afdf80ae7405 ("Add XFS_AG_MAXLEVELS macros ...") in April 1994! It requires a very unlucky sequence of events, and in fact we didn't hit it until a particular sparse mmap workload updated from 5.12 to 5.19. But this bug existed in 5.12, so it must've been exposed by some other change in allocation or writeback patterns. It's also much less likely to be hit with the rmapbt enabled, since that increases the minimum free list size and is unlikely to split at the same time as the bnobt and cntbt. Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-11-13 09:11:40 +05:30
Leah Rumancik	471de20303	xfs: up(ic_sema) if flushing data device fails We flush the data device cache before we issue external log IO. If the flush fails, we shut down the log immediately and return. However, the iclog->ic_sema is left in a decremented state so let's add an up(). Prior to this patch, xfs/438 would fail consistently when running with an external log device: sync -> xfs_log_force -> xlog_write_iclog -> down(&iclog->ic_sema) -> blkdev_issue_flush (fail causes us to intiate shutdown) -> xlog_force_shutdown -> return unmount -> xfs_log_umount -> xlog_wait_iclog_completion -> down(&iclog->ic_sema) --------> HANG There is a second early return / shutdown. Make sure the up() happens for it as well. Also make sure we cleanup the iclog state, xlog_state_done_syncing, before dropping the iclog lock. Fixes: `b5d721eaae` ("xfs: external logs need to flush data device") Fixes: `842a42d126` ("xfs: shutdown on failure to add page to log bio") Fixes: `7d839e325a` ("xfs: check return codes when flushing block devices") Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-11-13 09:11:40 +05:30
Christoph Hellwig	55f669f341	xfs: only remap the written blocks in xfs_reflink_end_cow_extent xfs_reflink_end_cow_extent looks up the COW extent and the data fork extent at offset_fsb, and then proceeds to remap the common subset between the two. It does however not limit the remapped extent to the passed in [*offset_fsbm end_fsb] range and thus potentially remaps more blocks than the one handled by the current I/O completion. This means that with sufficiently large data and COW extents we could be remapping COW fork mappings that have not been written to, leading to a stale data exposure on a powerfail event. We use to have a xfs_trim_range to make the remap fit the I/O completion range, but that got (apparently accidentally) removed in commit `df2fd88f8a` ("xfs: rewrite xfs_reflink_end_cow to use intents"). Note that I've only found this by code inspection, and a test case would probably require very specific delay and error injection. Fixes: `df2fd88f8a` ("xfs: rewrite xfs_reflink_end_cow to use intents") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-11-13 09:09:19 +05:30

1 2 3 4 5 ...

87794 Commits