linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-24 21:21:41 +00:00

Author	SHA1	Message	Date
Christoph Hellwig	62d597a197	xfs: pass the fsbno to xfs_perag_intent_get All callers of xfs_perag_intent_get have a fsbno and need boilerplate code to turn that into an agno. Just pass the fsbno to xfs_perag_intent_get and look up the agno there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-07-02 11:37:01 -07:00
Darrick J. Wong	980faece91	xfs: convert "skip_discard" to a proper flags bitset Convert the boolean to skip discard on free into a proper flags field so that we can add more flags in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:37:01 -07:00
Darrick J. Wong	4e0e2c0fe3	xfs: clean up extent free log intent item tracepoint callsites Pass the incore EFI structure to the tracepoints instead of open-coding the argument passing. This cleans up the call sites a bit. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:37:01 -07:00
Darrick J. Wong	ac3a027516	xfs: don't use the incore struct xfs_sb for offsets into struct xfs_dsb Currently, the XFS_SB_CRC_OFF macro uses the incore superblock struct (xfs_sb) to compute the address of sb_crc within the ondisk superblock struct (xfs_dsb). This is a landmine if we ever change the layout of the incore superblock (as we're about to do), so redefine the macro to use xfs_dsb to compute the layout of xfs_dsb. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:37:00 -07:00
Darrick J. Wong	47d4d5961f	xfs: get rid of trivial rename helpers Get rid of the largely pointless xfs_cross_rename and xfs_finish_rename now that we've refactored its parent. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:37:00 -07:00
Darrick J. Wong	62bbf50bea	xfs: move dirent update hooks to xfs_dir2.c Move the directory entry update hook code to xfs_dir2 so that it is mostly consolidated with the higher level directory functions. Retain the exports so that online fsck can still send notifications through the hooks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:37:00 -07:00
Darrick J. Wong	28d0d81344	xfs: create libxfs helper to rename two directory entries Create a new libxfs function to rename two directory entries. The upcoming metadata directory feature will need this to replace a metadata inode directory entry. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:59 -07:00
Darrick J. Wong	a55712b35c	xfs: create libxfs helper to exchange two directory entries Create a new libxfs function to exchange two directory entries. The upcoming metadata directory feature will need this to replace a metadata inode directory entry. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:59 -07:00
Darrick J. Wong	90636e4531	xfs: create libxfs helper to remove an existing inode/name from a directory Create a new libxfs function to remove a (name, inode) entry from a directory. The upcoming metadata directory feature will need this to create a metadata directory tree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:59 -07:00
Darrick J. Wong	1964435d19	xfs: hoist inode free function to libxfs Create a libxfs helper function that marks an inode free on disk. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:59 -07:00
Darrick J. Wong	c1f0bad423	xfs: create libxfs helper to link an existing inode into a directory Create a new libxfs function to link an existing inode into a directory. The upcoming metadata directory feature will need this to create a metadata directory tree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:59 -07:00
Darrick J. Wong	1fa2e81957	xfs: create libxfs helper to link a new inode into a directory Create a new libxfs function to link a newly created inode into a directory. The upcoming metadata directory feature will need this to create a metadata directory tree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:58 -07:00
Darrick J. Wong	b11b11e3b7	xfs: separate the icreate logic around INIT_XATTRS INIT_XATTRS is overloaded here -- it's set during the creat process when we think that we're immediately going to set some ACL xattrs to save time. However, it's also used by the parent pointers code to enable the attr fork in preparation to receive ppptr xattrs. This results in xfs_has_parent() branches scattered around the codebase to turn on INIT_XATTRS. Linkable files are created far more commonly than unlinkable temporary files or directory tree roots, so we should centralize this logic in xfs_inode_init. For the three callers that don't want parent pointers (online repiar tempfiles, unlinkable tempfiles, rootdir creation) we provide an UNLINKABLE flag to skip attr fork initialization. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:58 -07:00
Darrick J. Wong	a9e583d34f	xfs: hoist xfs_{bump,drop}link to libxfs Move xfs_bumplink and xfs_droplink to libxfs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:58 -07:00
Darrick J. Wong	b8a6107921	xfs: hoist xfs_iunlink to libxfs Move xfs_iunlink and xfs_iunlink_remove to libxfs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:58 -07:00
Darrick J. Wong	c0223b8d66	xfs: wrap inode creation dqalloc calls Create a helper that calls dqalloc to allocate and grab a reference to dquots for the user, group, and project ids listed in an icreate structure. This simplifies the creat-related dqalloc callsites scattered around the code base. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:57 -07:00
Darrick J. Wong	dfaf884233	xfs: push xfs_icreate_args creation out of xfs_create* Move the initialization of the xfs_icreate_args structure out of xfs_create and xfs_create_tempfile into their callers so that we can set the new inode's attributes in one place and pass that through instead of open coding the collection of attributes all over the code. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:57 -07:00
Darrick J. Wong	e9d2b35bb9	xfs: hoist new inode initialization functions to libxfs Move all the code that initializes a new inode's attributes from the icreate_args structure and the parent directory into libxfs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:57 -07:00
Darrick J. Wong	38fd3d6a95	xfs: split new inode creation into two pieces There are two parts to initializing a newly allocated inode: setting up the incore structures, and initializing the new inode core based on the parent inode and the current user's environment. The initialization code is not specific to the kernel, so we would like to share that with userspace by hoisting it to libxfs. Therefore, split xfs_icreate into separate functions to prepare for the next few patches. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:57 -07:00
Darrick J. Wong	a7b12718cb	xfs: use xfs_trans_ichgtime to set times when allocating inode Use xfs_trans_ichgtime to set the inode times when allocating an inode, instead of open-coding them here. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:56 -07:00
Darrick J. Wong	3d1dfb6df9	xfs: implement atime updates in xfs_trans_ichgtime Enable xfs_trans_ichgtime to change the inode access time so that we can use this function to set inode times when allocating inodes instead of open-coding it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:56 -07:00
Darrick J. Wong	ba4b39fe4c	xfs: pack icreate initialization parameters into a separate structure Callers that want to create an inode currently pass all possible file attribute values for the new inode into xfs_init_new_inode as ten separate parameters. This causes two code maintenance issues: first, we have large multi-line call sites which programmers must read carefully to make sure they did not accidentally invert a value. Second, all three file id parameters must be passed separately to the quota functions; any discrepancy results in quota count errors. Clean this up by creating a new icreate_args structure to hold all this information, some helpers to initialize them properly, and make the callers pass this structure through to the creation function, whose name we shorten to xfs_icreate. This eliminates the issues, enables us to keep the inode init code in sync with userspace via libxfs, and is needed for future metadata directory tree management. (A subsequent cleanup will also fix the quota alloc calls.) Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:56 -07:00
Darrick J. Wong	fcea5b35f3	xfs: hoist project id get/set functions to libxfs Move the project id get and set functions into libxfs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:56 -07:00
Darrick J. Wong	b7c477be39	xfs: hoist inode flag conversion functions to libxfs Hoist the inode flag conversion functions into libxfs so that we can keep them in sync. Do this by creating a new xfs_inode_util.c file in libxfs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:55 -07:00
Darrick J. Wong	acdddbe168	xfs: hoist extent size helpers to libxfs Move the extent size helpers to xfs_bmap.c in libxfs since they're used there already. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:55 -07:00
Darrick J. Wong	d76e137057	xfs: move inode copy-on-write predicates to xfs_inode.[ch] Move these inode predicate functions to xfs_inode.[ch] since they're not reflink functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:55 -07:00
Darrick J. Wong	24a4e1cb32	xfs: use consistent uid/gid when grabbing dquots for inodes I noticed that callers of xfs_qm_vop_dqalloc use the following code to compute the anticipated uid of the new file: mapped_fsuid(idmap, &init_user_ns); whereas the VFS uses a slightly different computation for actually assigning i_uid: mapped_fsuid(idmap, i_user_ns(inode)); Technically, these are not the same things. According to Christian Brauner, the only time that inode->i_sb->s_user_ns != &init_user_ns is when the filesystem was mounted in a new mount namespace by an unpriviledged user. XFS does not allow this, which is why we've never seen bug reports about quotas being incorrect or the uid checks in xfs_qm_vop_create_dqattach tripping debug assertions. However, this /is/ a logic bomb, so let's make the code consistent. Link: https://lore.kernel.org/linux-fsdevel/20240617-weitblick-gefertigt-4a41f37119fa@brauner/ Fixes: `c14329d39f` ("fs: port fs{g,u}id helpers to mnt_idmap") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:55 -07:00
Darrick J. Wong	150bb10a28	xfs: verify buffer, inode, and dquot items every tx commit generic/388 has an annoying tendency to fail like this during log recovery: XFS (sda4): Unmounting Filesystem 435fe39b-82b6-46ef-be56-819499585130 XFS (sda4): Mounting V5 Filesystem 435fe39b-82b6-46ef-be56-819499585130 XFS (sda4): Starting recovery (logdev: internal) 00000000: 49 4e 81 b6 03 02 00 00 00 00 00 07 00 00 00 07 IN.............. 00000010: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 10 ................ 00000020: 35 9a 8b c1 3e 6e 81 00 35 9a 8b c1 3f dc b7 00 5...>n..5...?... 00000030: 35 9a 8b c1 3f dc b7 00 00 00 00 00 00 3c 86 4f 5...?........<.O 00000040: 00 00 00 00 00 00 02 f3 00 00 00 00 00 00 00 00 ................ 00000050: 00 00 1f 01 00 00 00 00 00 00 00 02 b2 74 c9 0b .............t.. 00000060: ff ff ff ff d7 45 73 10 00 00 00 00 00 00 00 2d .....Es........- 00000070: 00 00 07 92 00 01 fe 30 00 00 00 00 00 00 00 1a .......0........ 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000090: 35 9a 8b c1 3b 55 0c 00 00 00 00 00 04 27 b2 d1 5...;U.......'.. 000000a0: 43 5f e3 9b 82 b6 46 ef be 56 81 94 99 58 51 30 C_....F..V...XQ0 XFS (sda4): Internal error Bad dinode after recovery at line 539 of file fs/xfs/xfs_inode_item_recover.c. Caller xlog_recover_items_pass2+0x4e/0xc0 [xfs] CPU: 0 PID: 2189311 Comm: mount Not tainted 6.9.0-rc4-djwx #rc4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-builder-01.us.oracle.com-4.el7.1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x4f/0x60 xfs_corruption_error+0x90/0xa0 xlog_recover_inode_commit_pass2+0x5f1/0xb00 xlog_recover_items_pass2+0x4e/0xc0 xlog_recover_commit_trans+0x2db/0x350 xlog_recovery_process_trans+0xab/0xe0 xlog_recover_process_data+0xa7/0x130 xlog_do_recovery_pass+0x398/0x840 xlog_do_log_recovery+0x62/0xc0 xlog_do_recover+0x34/0x1d0 xlog_recover+0xe9/0x1a0 xfs_log_mount+0xff/0x260 xfs_mountfs+0x5d9/0xb60 xfs_fs_fill_super+0x76b/0xa30 get_tree_bdev+0x124/0x1d0 vfs_get_tree+0x17/0xa0 path_mount+0x72b/0xa90 __x64_sys_mount+0x112/0x150 do_syscall_64+0x49/0x100 entry_SYSCALL_64_after_hwframe+0x4b/0x53 </TASK> XFS (sda4): Corruption detected. Unmount and run xfs_repair XFS (sda4): Metadata corruption detected at xfs_dinode_verify.part.0+0x739/0x920 [xfs], inode 0x427b2d1 XFS (sda4): Filesystem has been shut down due to log error (0x2). XFS (sda4): Please unmount the filesystem and rectify the problem(s). XFS (sda4): log mount/recovery failed: error -117 XFS (sda4): log mount failed This inode log item recovery failing the dinode verifier after replaying the contents of the inode log item into the ondisk inode. Looking back into what the kernel was doing at the time of the fs shutdown, a thread was in the middle of running a series of transactions, each of which committed changes to the inode. At some point in the middle of that chain, an invalid (at least according to the verifier) change was committed. Had the filesystem not shut down in the middle of the chain, a subsequent transaction would have corrected the invalid state and nobody would have noticed. But that's not what happened here. Instead, the invalid inode state was committed to the ondisk log, so log recovery tripped over it. The actual defect here was an overzealous inode verifier, which was fixed in a separate patch. This patch adds some transaction precommit functions for CONFIG_XFS_DEBUG=y mode so that we can detect these kinds of transient errors at transaction commit time, where it's much easier to find the root cause. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-07-02 11:36:54 -07:00
Darrick J. Wong	3ba3ab1f67	xfs: enable FITRIM on the realtime device Implement FITRIM for the realtime device by pretending that it's "space" immediately after the data device. We have to hold the rtbitmap ILOCK while the discard operations are ongoing because there's no busy extent tracking for the rt volume to prevent reallocations. Cc: Konst Mayer <cdlscpmv@gmail.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-07-01 09:32:29 +05:30
Wenchao Hao	a330cae8a7	xfs: Remove header files which are included more than once Following warning is reported, so remove these duplicated header including: ./fs/xfs/libxfs/xfs_trans_resv.c: xfs_da_format.h is included more than once. ./fs/xfs/scrub/quota_repair.c: xfs_format.h is included more than once. ./fs/xfs/xfs_handle.c: xfs_da_btree.h is included more than once. ./fs/xfs/xfs_qm_bhv.c: xfs_mount.h is included more than once. ./fs/xfs/xfs_trace.c: xfs_bmap.h is included more than once. This is just a clean code, no logic changed. Signed-off-by: Wenchao Hao <haowenchao22@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-07-01 09:32:29 +05:30
Christoph Hellwig	4818fd60db	xfs: fold xfs_ilock_for_write_fault into xfs_write_fault Now that the page fault handler has been refactored, the only caller of xfs_ilock_for_write_fault is simple enough and calls it unconditionally. Fold the logic and expand the comments explaining it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-07-01 09:32:29 +05:30
Christoph Hellwig	4e82fa11fb	xfs: always take XFS_MMAPLOCK shared in xfs_dax_read_fault After the previous refactoring, xfs_dax_fault is now never used for write faults, so don't bother with the xfs_ilock_for_write_fault logic to protect against writes when remapping is in progress. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-07-01 09:32:29 +05:30
Christoph Hellwig	6a39ec1d39	xfs: refactor __xfs_filemap_fault Split the write fault and DAX fault handling into separate helpers so that the main fault handler is easier to follow. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-07-01 09:32:29 +05:30
Christoph Hellwig	9092b1de35	xfs: simplify xfs_dax_fault Replace the separate stub with an IS_ENABLED check, and take the call to dax_finish_sync_fault into xfs_dax_fault instead of leaving it in the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-07-01 09:32:29 +05:30
Christoph Hellwig	29bc0dd0a2	xfs: cleanup xfs_ilock_iocb_for_write Move the relock path out of the straight line and add a comment explaining why it exists. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-07-01 09:32:29 +05:30
Christoph Hellwig	8626b67acf	xfs: move the dio write relocking out of xfs_ilock_for_iomap About half of xfs_ilock_for_iomap deals with a special case for direct I/O writes to COW files that need to take the ilock exclusively. Move this code into the one callers that cares and simplify xfs_ilock_for_iomap. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-07-01 09:32:29 +05:30
lei lu	0c7fcdb6d0	xfs: don't walk off the end of a directory data block This adds sanity checks for xfs_dir2_data_unused and xfs_dir2_data_entry to make sure don't stray beyond valid memory region. Before patching, the loop simply checks that the start offset of the dup and dep is within the range. So in a crafted image, if last entry is xfs_dir2_data_unused, we can change dup->length to dup->length-1 and leave 1 byte of space. In the next traversal, this space will be considered as dup or dep. We may encounter an out of bound read when accessing the fixed members. In the patch, we make sure that the remaining bytes large enough to hold an unused entry before accessing xfs_dir2_data_unused and xfs_dir2_data_unused is XFS_DIR2_DATA_ALIGN byte aligned. We also make sure that the remaining bytes large enough to hold a dirent with a single-byte name before accessing xfs_dir2_data_entry. Signed-off-by: lei lu <llfamsec@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-07-01 09:32:29 +05:30
lei lu	fb63435b7c	xfs: add bounds checking to xlog_recover_process_data There is a lack of verification of the space occupied by fixed members of xlog_op_header in the xlog_recover_process_data. We can create a crafted image to trigger an out of bounds read by following these steps: 1) Mount an image of xfs, and do some file operations to leave records 2) Before umounting, copy the image for subsequent steps to simulate abnormal exit. Because umount will ensure that tail_blk and head_blk are the same, which will result in the inability to enter xlog_recover_process_data 3) Write a tool to parse and modify the copied image in step 2 4) Make the end of the xlog_op_header entries only 1 byte away from xlog_rec_header->h_size 5) xlog_rec_header->h_num_logops++ 6) Modify xlog_rec_header->h_crc Fix: Add a check to make sure there is sufficient space to access fixed members of xlog_op_header. Signed-off-by: lei lu <llfamsec@gmail.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-07-01 09:32:29 +05:30
John Garry	f23660f059	xfs: Fix xfs_prepare_shift() range for RT The RT extent range must be considered in the xfs_flush_unmap_range() call to stabilize the boundary. This code change is originally from Dave Chinner. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-07-01 09:32:29 +05:30
John Garry	d3b689d7c7	xfs: Fix xfs_flush_unmap_range() range for RT Currently xfs_flush_unmap_range() does unmap for a full RT extent range, which we also want to ensure is clean and idle. This code change is originally from Dave Chinner. Reviewed-by: Christoph Hellwig <hch@lst.de>4 Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-07-01 09:32:29 +05:30
Gao Xiang	d40c2865bd	xfs: avoid redundant AGFL buffer invalidation Currently AGFL blocks can be filled from the following three sources: - allocbt free blocks, as in xfs_allocbt_free_block(); - rmapbt free blocks, as in xfs_rmapbt_free_block(); - refilled from freespace btrees, as in xfs_alloc_fix_freelist(). Originally, allocbt free blocks would be marked as stale only when they put back in the general free space pool as Dave mentioned on IRC, "we don't stale AGF metadata btree blocks when they are returned to the AGFL .. but once they get put back in the general free space pool, we have to make sure the buffers are marked stale as the next user of those blocks might be user data...." However, after commit `ca250b1b3d` ("xfs: invalidate allocbt blocks moved to the free list") and commit `edfd9dd549` ("xfs: move buffer invalidation to xfs_btree_free_block"), even allocbt / bmapbt free blocks will be invalidated immediately since they may fail to pass V5 format validation on writeback even writeback to free space would be safe. IOWs, IMHO currently there is actually no difference of free blocks between AGFL freespace pool and the general free space pool. So let's avoid extra redundant AGFL buffer invalidation, since otherwise we're currently facing unnecessary xfs_log_force() due to xfs_trans_binval() again on buffers already marked as stale before as below: [ 333.507469] Call Trace: [ 333.507862] xfs_buf_find+0x371/0x6a0 <- xfs_buf_lock [ 333.508451] xfs_buf_get_map+0x3f/0x230 [ 333.509062] xfs_trans_get_buf_map+0x11a/0x280 [ 333.509751] xfs_free_agfl_block+0xa1/0xd0 [ 333.510403] xfs_agfl_free_finish_item+0x16e/0x1d0 [ 333.511157] xfs_defer_finish_noroll+0x1ef/0x5c0 [ 333.511871] xfs_defer_finish+0xc/0xa0 [ 333.512471] xfs_itruncate_extents_flags+0x18a/0x5e0 [ 333.513253] xfs_inactive_truncate+0xb8/0x130 [ 333.513930] xfs_inactive+0x223/0x270 xfs_log_force() will take tens of milliseconds with AGF buffer locked. It becomes an unnecessary long latency especially on our PMEM devices with FSDAX enabled and fsops like xfs_reflink_find_shared() at the same time are stuck due to the same AGF lock. Removing the double invalidation on the AGFL blocks does not make this issue go away, but this patch fixes for our workloads in reality and it should also work by the code analysis. Note that I'm not sure I need to remove another redundant one in xfs_alloc_ag_vextent_small() since it's unrelated to our workloads. Also fstests are passed with this patch. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-07-01 09:32:29 +05:30
Linus Torvalds	22a40d14b5	Linux 6.10-rc6	2024-06-30 14:40:44 -07:00
Linus Torvalds	aca7c377d8	ata fixes for 6.10-rc6 - Add NOLPM quirk for for all Crucial BX SSD1 models. Considering that we now have had bug reports for 3 different BX SSD1 variants from Crucial with the same product name, make the quirk more inclusive, to catch more device models from the same generation. - Fix a trivial null pointer dereference in the error path for ata_host_release(). - Create a ata_port_free(), so that we don't miss freeing ata_port struct members when freeing a struct ata_port. - Fix a trivial double free in the error path for ata_host_alloc(). - Ensure that we remove the libata "remapped NVMe device count" sysfs entry on .probe() error. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRN+ES/c4tHlMch3DzJZDGjmcZNcgUCZoHMzwAKCRDJZDGjmcZN cj4wAP4nHi3Jr/ezwrIahwMGXV/VSlUhxOakj3OADdHe5spIogD/SrR8Uz7/m5QC sMF0Rd2L7tfrQSDdJ/MztBi7cVprtQo= =lnku -----END PGP SIGNATURE----- Merge tag 'ata-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux Pull ata fixes from Niklas Cassel: - Add NOLPM quirk for for all Crucial BX SSD1 models. Considering that we now have had bug reports for 3 different BX SSD1 variants from Crucial with the same product name, make the quirk more inclusive, to catch more device models from the same generation. - Fix a trivial NULL pointer dereference in the error path for ata_host_release(). - Create a ata_port_free(), so that we don't miss freeing ata_port struct members when freeing a struct ata_port. - Fix a trivial double free in the error path for ata_host_alloc(). - Ensure that we remove the libata "remapped NVMe device count" sysfs entry on .probe() error. * tag 'ata-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: ata: ahci: Clean up sysfs file on error ata: libata-core: Fix double free on error ata,scsi: libata-core: Do not leak memory for ata_port struct members ata: libata-core: Fix null pointer dereference on error ata: libata-core: Add ATA_HORKAGE_NOLPM for all Crucial BX SSD1 models	2024-06-30 14:32:24 -07:00
Niklas Cassel	eeb25a09c5	ata: ahci: Clean up sysfs file on error .probe() (ahci_init_one()) calls sysfs_add_file_to_group(), however, if probe() fails after this call, we currently never call sysfs_remove_file_from_group(). (The sysfs_remove_file_from_group() call in .remove() (ahci_remove_one()) does not help, as .remove() is not called on .probe() error.) Thus, if probe() fails after the sysfs_add_file_to_group() call, the next time we insmod the module we will get: sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:04.0/remapped_nvme' CPU: 11 PID: 954 Comm: modprobe Not tainted 6.10.0-rc5 #43 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x5d/0x80 sysfs_warn_dup.cold+0x17/0x23 sysfs_add_file_mode_ns+0x11a/0x130 sysfs_add_file_to_group+0x7e/0xc0 ahci_init_one+0x31f/0xd40 [ahci] Fixes: `894fba7f43` ("ata: ahci: Add sysfs attribute to show remapped NVMe device count") Cc: stable@vger.kernel.org Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240629124210.181537-10-cassel@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org>	2024-06-30 22:23:39 +02:00
Niklas Cassel	ab9e0c529e	ata: libata-core: Fix double free on error If e.g. the ata_port_alloc() call in ata_host_alloc() fails, we will jump to the err_out label, which will call devres_release_group(). devres_release_group() will trigger a call to ata_host_release(). ata_host_release() calls kfree(host), so executing the kfree(host) in ata_host_alloc() will lead to a double free: kernel BUG at mm/slub.c:553! Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 11 PID: 599 Comm: (udev-worker) Not tainted 6.10.0-rc5 #47 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:kfree+0x2cf/0x2f0 Code: 5d 41 5e 41 5f 5d e9 80 d6 ff ff 4d 89 f1 41 b8 01 00 00 00 48 89 d9 48 89 da RSP: 0018:ffffc90000f377f0 EFLAGS: 00010246 RAX: ffff888112b1f2c0 RBX: ffff888112b1f2c0 RCX: ffff888112b1f320 RDX: 000000000000400b RSI: ffffffffc02c9de5 RDI: ffff888112b1f2c0 RBP: ffffc90000f37830 R08: 0000000000000000 R09: 0000000000000000 R10: ffffc90000f37610 R11: 617461203a736b6e R12: ffffea00044ac780 R13: ffff888100046400 R14: ffffffffc02c9de5 R15: 0000000000000006 FS: 00007f2f1cabe980(0000) GS:ffff88813b380000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f2f1c3acf75 CR3: 0000000111724000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? __die_body.cold+0x19/0x27 ? die+0x2e/0x50 ? do_trap+0xca/0x110 ? do_error_trap+0x6a/0x90 ? kfree+0x2cf/0x2f0 ? exc_invalid_op+0x50/0x70 ? kfree+0x2cf/0x2f0 ? asm_exc_invalid_op+0x1a/0x20 ? ata_host_alloc+0xf5/0x120 [libata] ? ata_host_alloc+0xf5/0x120 [libata] ? kfree+0x2cf/0x2f0 ata_host_alloc+0xf5/0x120 [libata] ata_host_alloc_pinfo+0x14/0xa0 [libata] ahci_init_one+0x6c9/0xd20 [ahci] Ensure that we will not call kfree(host) twice, by performing the kfree() only if the devres_open_group() call failed. Fixes: `dafd6c4963` ("libata: ensure host is free'd on error exit paths") Cc: stable@vger.kernel.org Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240629124210.181537-9-cassel@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org>	2024-06-30 22:23:39 +02:00
Niklas Cassel	f6549f538f	ata,scsi: libata-core: Do not leak memory for ata_port struct members libsas is currently not freeing all the struct ata_port struct members, e.g. ncq_sense_buf for a driver supporting Command Duration Limits (CDL). Add a function, ata_port_free(), that is used to free a ata_port, including its struct members. It makes sense to keep the code related to freeing a ata_port in its own function, which will also free all the struct members of struct ata_port. Fixes: `18bd7718b5` ("scsi: ata: libata: Handle completion of CDL commands using policy 0xD") Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240629124210.181537-8-cassel@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org>	2024-06-30 22:23:21 +02:00
Niklas Cassel	5d92c7c566	ata: libata-core: Fix null pointer dereference on error If the ata_port_alloc() call in ata_host_alloc() fails, ata_host_release() will get called. However, the code in ata_host_release() tries to free ata_port struct members unconditionally, which can lead to the following: BUG: unable to handle page fault for address: 0000000000003990 PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 10 PID: 594 Comm: (udev-worker) Not tainted 6.10.0-rc5 #44 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:ata_host_release.cold+0x2f/0x6e [libata] Code: e4 4d 63 f4 44 89 e2 48 c7 c6 90 ad 32 c0 48 c7 c7 d0 70 33 c0 49 83 c6 0e 41 RSP: 0018:ffffc90000ebb968 EFLAGS: 00010246 RAX: 0000000000000041 RBX: ffff88810fb52e78 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff88813b3218c0 RDI: ffff88813b3218c0 RBP: ffff88810fb52e40 R08: 0000000000000000 R09: 6c65725f74736f68 R10: ffffc90000ebb738 R11: 73692033203a746e R12: 0000000000000004 R13: 0000000000000000 R14: 0000000000000011 R15: 0000000000000006 FS: 00007f6cc55b9980(0000) GS:ffff88813b300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000003990 CR3: 00000001122a2000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? __die_body.cold+0x19/0x27 ? page_fault_oops+0x15a/0x2f0 ? exc_page_fault+0x7e/0x180 ? asm_exc_page_fault+0x26/0x30 ? ata_host_release.cold+0x2f/0x6e [libata] ? ata_host_release.cold+0x2f/0x6e [libata] release_nodes+0x35/0xb0 devres_release_group+0x113/0x140 ata_host_alloc+0xed/0x120 [libata] ata_host_alloc_pinfo+0x14/0xa0 [libata] ahci_init_one+0x6c9/0xd20 [ahci] Do not access ata_port struct members unconditionally. Fixes: `633273a3ed` ("libata-pmp: hook PMP support and enable it") Cc: stable@vger.kernel.org Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240629124210.181537-7-cassel@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org>	2024-06-30 22:16:15 +02:00
Linus Torvalds	e0b668b070	Kbuild fixes for v6.10 (third) - Remove the executable bit from installed DTB files - Escape $ in subshell execution in the debian-orig target - Fix RPM builds with CONFIG_MODULES=n - Fix xconfig with the O= option - Fix scripts_gdb with the O= option -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmaBjccVHG1hc2FoaXJv eUBrZXJuZWwub3JnAAoJED2LAQed4NsGkugP+wdEiVWMpps+/CpTJuvSroEv0Kos Qqy0iQO9JP77tHfLnnMG2+tSvJh6nk7weitoiSkUuCfXde/8Y7HfvdldxzJI4kJD HkJF6b0MCbZDPkvAQhj8CCtsGKNy7X020047/qVUK9OrNH1HZiLOvFNirYuN3nrF T4/HK098S1ij8oAcF1F7psB231pXijQmIeMpVJAhjh9T3Kimu3tGENcVi1yu42tg /emZvhgst1eCOz/e02GHt7n8v5SncYnT5/eBPmZ+nd19uW1QMLY0xES6/dpEd/hN /jjMppqPPlJaYewIHXH8qaejcjyom/CBgxc9B6phGRd8IVR4YjK6DzPiwoAHXcvs KWoJUfLX4FV3PvoaW4XiKifW4lnaADy11bZjIXiV2PHSM/PBNNNtS6BZQv/gB+Sa a5qPRGjhc55pDIv63vTdEZ1HekWp8/XO5HkzWAFnInB+Fnuxqtwx8a6taM7Vgadq O1K4b17UedmFMHSB4UUDKra+7pS0ZP6hLXzozGf3AS/TgUHMTvU8mN0+2k4kxJol dA84pJUHBdnn/61SplNw2PBfP7QYhNJWfPU91Mn/noBwF/sMRFatCuFEgscgP2h+ 0itls2RT1Z4dPN/XN7UiJ8s4vdeYkwCJGhMDIfFe/g3JWFYTuGPeKATVPJZ0DWzz 6otCioAfqFFotky7 =JlIv -----END PGP SIGNATURE----- Merge tag 'kbuild-fixes-v6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Remove the executable bit from installed DTB files - Escape $ in subshell execution in the debian-orig target - Fix RPM builds with CONFIG_MODULES=n - Fix xconfig with the O= option - Fix scripts_gdb with the O= option * tag 'kbuild-fixes-v6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: scripts/gdb: bring the "abspath" back kbuild: Use $(obj)/%.cc to fix host C++ module builds kbuild: rpm-pkg: fix build error with CONFIG_MODULES=n kbuild: Fix build target deb-pkg: ln: failed to create hard link kbuild: doc: Update default INSTALL_MOD_DIR from extra to updates kbuild: Install dtb files as 0644 in Makefile.dtbinst	2024-06-30 10:00:01 -07:00
Linus Torvalds	769327258a	x86-32: fix cmpxchg8b_emu build error with clang The kernel test robot reported that clang no longer compiles the 32-bit x86 kernel in some configurations due to commit `95ece48165` ("locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions"). The build fails with arch/x86/include/asm/cmpxchg_32.h:149:9: error: inline assembly requires more registers than available and the reason seems to be that not only does the cmpxchg8b instruction need four fixed registers (EDX:EAX and ECX:EBX), with the emulation fallback the inline asm also wants a fifth fixed register for the address (it uses %esi for that, but that's just a software convention with cmpxchg8b_emu). Avoiding using another pointer input to the asm (and just forcing it to use the "0(%esi)" addressing that we end up requiring for the sw fallback) seems to fix the issue. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202406230912.F6XFIyA6-lkp@intel.com/ Fixes: `95ece48165` ("locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions") Link: https://lore.kernel.org/all/202406230912.F6XFIyA6-lkp@intel.com/ Suggested-by: Uros Bizjak <ubizjak@gmail.com> Reviewed-and-Tested-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2024-06-30 09:21:29 -07:00
Linus Torvalds	84dd4373d5	Char/Misc driver fixes for 6.10-rc6 Here are some small driver fixes for 6.10-rc6. Included in here are: - IIO driver fixes for reported issues - Counter driver fix for a reported problem. All of these have been in linux-next this week with no reported issues Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZoFlrA8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ymiMQCgkUmsCT9KQDt+/IqClfL+A6nBvUkAn08jRwGA dXjTvroHgHsNCU/VXMwV =FcWr -----END PGP SIGNATURE----- Merge tag 'char-misc-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small driver fixes for 6.10-rc6. Included in here are: - IIO driver fixes for reported issues - Counter driver fix for a reported problem. All of these have been in linux-next this week with no reported issues" * tag 'char-misc-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: counter: ti-eqep: enable clock at probe iio: chemical: bme680: Fix sensor data read operation iio: chemical: bme680: Fix overflows in compensate() functions iio: chemical: bme680: Fix calibration data variable iio: chemical: bme680: Fix pressure value output iio: humidity: hdc3020: fix hysteresis representation iio: dac: fix ad9739a random config compile error iio: accel: fxls8962af: select IIO_BUFFER & IIO_KFIFO_BUF iio: adc: ad7266: Fix variable checking bug iio: xilinx-ams: Don't include ams_ctrl_channels in scan_mask	2024-06-30 09:16:08 -07:00

1 2 3 4 5 ...

1281058 Commits