linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-22 12:11:40 +00:00

Author	SHA1	Message	Date
Darrick J. Wong	64dfa18d6e	xfs: fix C++ compilation errors in xfs_fs.h Several people reported C++ compilation errors due to things that C compilers allow but C++ compilers do not. Fix both of these problems, and hope there aren't more of these brown paper bags in 2 months when we finally get these fixes through the process into a released xfsprogs. NOTE: I am submitting this bugfix over the objections of a former maintainer, who insists that we should remove this function from the published userspace ABI instead of fixing the C++ compilation errors. No deprecation period, no discussion, just a hard drop of an already provided and correct C function, which would be in contravention of Linus' rules. IOWs, removing ABI that have already shipped in a released kernel requires a careful deprecation period, so I will let that maintainer run that process. Reported-by: kernel@mattwhitlock.name Reported-by: sam@gentoo.org Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219203 Fixes: `233f4e12bb` ("xfs: add parent pointer ioctls") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-09-01 08:58:20 -07:00
Darrick J. Wong	2c4162be6c	xfs: refactor loading quota inodes in the regular case Create a helper function to load quota inodes in the case where the dqtype and the sb quota inode fields correspond. This is true for nearly all the iget callsites in the quota code, except for when we're switching the group and project quota inodes. We'll need this in subsequent patches to make the metadir handling less convoluted. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-09-01 08:58:20 -07:00
Darrick J. Wong	2ca7b9d7b8	xfs: move xfs_ioc_getfsmap out of xfs_ioctl.c Move this function out of xfs_ioctl.c to reduce the clutter in there, and make the entire getfsmap implementation self-contained in a single file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-09-01 08:58:19 -07:00
Darrick J. Wong	516f91035c	xfs: rearrange xfs_fsmap.c a little bit The order of the functions in this file has gotten a little confusing over the years. Specifically, the two data device implementations (bnobt and rmapbt) could be adjacent in the source code instead of split in two by the logdev and rtdev fsmap implementations. We're about to add more functionality to this file, so rearrange things now. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	33912286cb	xfs: replace m_rsumsize with m_rsumblocks Track the RT summary file size in blocks, just like the RT bitmap file. While we have users of both units, blocks are used slightly more often and this matches the bitmap file for consistency. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	1fc51cf11d	xfs: remove xfs_{rtbitmap,rtsummary}_wordcount xfs_rtbitmap_wordcount and xfs_rtsummary_wordcount are currently unused, so remove them to simplify refactoring other rtbitmap helpers. They can be added back or simply open coded when actually needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Darrick J. Wong	0902819fe6	xfs: add xchk_setup_nothing and xchk_nothing helpers Add common helpers for no-op scrubbing methods. Signed-off-by: Darrick J. Wong <djwong@kernel.org> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	ec12f97f1b	xfs: make the rtalloc start hint a xfs_rtblock_t 0 is a valid start RT extent, and with pending changes it will become both more common and non-unique. Switch to pass a xfs_rtblock_t instead so that we can use NULLRTBLOCK to determine if a hint was set or not. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	b2dd85f414	xfs: factor out a xfs_rtallocate_align helper Split the code to calculate the aligned allocation request from xfs_bmap_rtalloc into a separate self-contained helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	fd048a1bb3	xfs: rework the rtalloc fallback handling xfs_rtallocate currently has two fallbacks, when an allocation fails: 1) drop the requested extent size alignment, if any, and retry 2) ignore the locality hint Oddly enough it does those in order, as trying a different location is more in line with what the user asked for, and does it in a very unstructured way. Lift the fallback to try to allocate without the locality hint into xfs_rtallocate to both perform them in a more sensible order and to clean up the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	a9f646af43	xfs: factor out a xfs_rtallocate helper Split out a helper from xfs_rtallocate that performs the actual allocation. This keeps the scope of the xfs_rtalloc_args structure contained, and prepares for rtgroups support. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	1e21d1897f	xfs: clean up the ISVALID macro in xfs_bmap_adjacent Turn the ISVALID macro defined and used inside in xfs_bmap_adjacent that relies on implict context into a proper inline function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	df8b181f15	xfs: simplify xfs_rtalloc_query_range There isn't much of a good reason to pass the xfs_rtalloc_rec structures that describe extents to xfs_rtalloc_query_range as we really just want a lower and upper bound xfs_rtxnum_t. Pass the rtxnum directly and simply the interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	fa0fc38b25	xfs: remove xfs_rtb_to_rtxrem Simplify the number of block number conversion helpers by removing xfs_rtb_to_rtxrem. Any recent compiler is smart enough to eliminate the double divisions if using separate xfs_rtb_to_rtx and xfs_rtb_to_rtxoff calls. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Darrick J. Wong	9e9be9840f	xfs: fix broken variable-sized allocation detection in xfs_rtallocate_extent_block This function tries to find a suitable free space extent starting from a particular rtbitmap block. Some time ago, I added a clamping function to prevent the free space scans from running off the end of the bitmap, but I didn't quite get the logic right. Let's say there's an allocation request with a minlen of 5 and a maxlen of 32 and we're scanning the last rtbitmap block. If we come within 4 rtx of the end of the rt volume, maxlen will get clamped to 4. If the next 3 rtx are free, we could have satisfied the allocation, but the code setting partial besti/bestlen for "minlen < maxlen" will think that we're doing a non-variable allocation and ignore it. The root of this problem is overwriting maxlen; I should have stuffed the results in a different variable, which would not have introduced this bug. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-09-01 08:58:19 -07:00
Darrick J. Wong	74c234bbe5	xfs: reduce excessive clamping of maxlen in xfs_rtallocate_extent_near The near rt allocator employs two allocation strategies -- first it tries to allocate at exactly @start. If that fails, it will pivot back and forth around that starting point looking for an appropriately sized free space. However, I clamped maxlen ages ago to prevent the exact allocation scan from running off the end of the rt volume. This, I realize, was excessive. If the allocation request is (say) for 32 rtx but the start position is 5 rtx from the end of the volume, we clamp maxlen to 5. If the exact allocation fails, we then pivot back and forth looking for 5 rtx, even though the original intent was to try to get 32 rtx. If we then find 5 rtx when we could have gotten 32 rtx, we've not done as well as we could have. This may be moot if the caller immediately comes back for more space, but it might not be. Either way, we can do better here. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-09-01 08:58:19 -07:00
Darrick J. Wong	62c3d24968	xfs: clean up xfs_rtallocate_extent_exact a bit Before we start doing more surgery on the rt allocator, let's clean up the exact allocator so that it doesn't change its arguments and uses the helper introduced in the previous patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-09-01 08:58:19 -07:00
Darrick J. Wong	e6a74dcf9b	xfs: refactor aligning bestlen to prod There are two places in xfs_rtalloc.c where we want to make sure that a count of rt extents is aligned with a particular prod(uct) factor. In one spot, we actually use rounddown(), albeit unnecessarily if prod < 2. In the other case, we open-code this rounding inefficiently by promoting the 32-bit length value to a 64-bit value and then performing a 64-bit division to figure out the subtraction. Refactor this into a single helper that uses the correct types and division method for the type, and skips the division entirely unless prod is large enough to make a difference. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-09-01 08:58:19 -07:00
Darrick J. Wong	e99aa0401e	xfs: don't scan off the end of the rt volume in xfs_rtallocate_extent_block The loop conditional here is not quite correct because an rtbitmap block can represent rtextents beyond the end of the rt volume. There's no way that it makes sense to scan for free space beyond EOFS, so don't do it. This overrun has been present since v2.6.0. Also fix the type of bestlen, which was incorrectly converted. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-09-01 08:58:19 -07:00
Darrick J. Wong	cb59233e82	xfs: don't return too-short extents from xfs_rtallocate_extent_block If xfs_rtallocate_extent_block is asked for a variable-sized allocation, it will try to return the best-sized free extent, which is apparently the largest one that it finds starting in this rtbitmap block. It will then trim the size of the extent as needed to align it with prod. However, it misses one thing -- rounding down the best-fit candidate to the required alignment could make the extent shorter than minlen. In the case where minlen > 1, we'd rather the caller relaxed its alignment requirements and tried again, as the allocator already supports that. Returning a too-short extent that causes xfs_bmapi_write to return ENOSR if there aren't enough nmaps to handle multiple new allocations, which can then cause filesystem shutdowns. I haven't seen this happen on any production systems, but then I don't think it's very common to set a per-file extent size hint on realtime files. I tripped it while working on the rtgroups feature and pounding on the realtime allocator enthusiastically. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	86a0264ef2	xfs: ensure rtx mask/shift are correct after growfs When growfs sets an extent size, it doesn't updated the m_rtxblklog and m_rtxblkmask values, which could lead to incorrect usage of them if they were set before and can't be used for the new extent size. Add a xfs_mount_sb_set_rextsize helper that updates the two fields, and also use it when calculating the new RT geometry instead of disabling the optimization there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	a18a69bbec	xfs: use the recalculated transaction reservation in xfs_growfs_rt_bmblock After going great length to calculate the transaction reservation for the new geometry, we should also use it to allocate the transaction it was calculated for. Fixes: `578bd4ce71` ("xfs: recompute growfsrtfree transaction reservation while growing rt volume") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	0a59e4f3e1	xfs: push transaction join out of xfs_rtbitmap_lock and xfs_rtgroup_lock To prepare for being able to join an already locked rtbitmap inode to a transaction split out separate helpers for joining the transaction from the locking helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	2a95ffc44b	xfs: factor out rtbitmap/summary initialization helpers Add helpers to libxfs that can be shared by growfs and mkfs for initializing the rtbitmap and summary, and by passing the optional data pointer also by repair for rebuilding them. This will become even more useful when the rtgroups feature adds a metadata header to each block, which means even more shared code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [djwong: minor documentation and data advance tweaks] Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	266e78aec4	xfs: factor out a xfs_last_rt_bmblock helper Add helper to calculate the last currently used rt bitmap block to better structure the growfs code and prepare for future changes to it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	7996f10ce6	xfs: factor out a xfs_growfs_rt_bmblock helper Add a helper to contain the per-rtbitmap block logic in xfs_growfs_rt. Note that this helper now allocates a new fake mount structure for each rtbitmap block iteration instead of reusing the memory for an entire growfs call. Compared to all the other work done when freeing the blocks the overhead for this is in the noise and it keeps the code nicely modular. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	c8e5a0bfe0	xfs: push the calls to xfs_rtallocate_range out to xfs_bmap_rtalloc Currently the various low-level RT allocator functions call into xfs_rtallocate_range directly, which ties them into the locking protocol for the RT bitmap. As these helpers already return the allocated range, lift the call to xfs_rtallocate_range into xfs_bmap_rtalloc so that it happens as high as possible in the stack, which will simplify future changes to the locking protocol. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	237130564e	xfs: cleanup the calling convention for xfs_rtpick_extent xfs_rtpick_extent never returns an error. Do away with the error return and directly return the picked extent instead of doing that through a call by reference argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	b4781eea68	xfs: add bounds checking to xfs_rt{bitmap,summary}_read_buf Add a corruption check for passing an invalid block number, which is a lot easier to understand than the xfs_bmapi_read failure later on. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	6d2db12d56	xfs: assert a valid limit in xfs_rtfind_forw Protect against developers passing stupid limits when refactoring the RT code once again. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	119c65e56b	xfs: remove the limit argument to xfs_rtfind_back All callers pass a 0 limit to xfs_rtfind_back, so remove the argument and hard code it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	3cb30d5162	xfs: make the RT rsum_cache mandatory Currently the RT mount code simply ignores an allocation failure for the rsum_cache. The code mostly works fine with it, but not having it leads to nasty corner cases in the growfs code that we don't really handle well. Switch to failing the mount if we can't allocate the memory, the file system would not exactly be useful in such a constrained environment to start with. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	6529eef810	xfs: factor out a xfs_validate_rt_geometry helper Split the RT geometry validation in the early mount code into a helper than can be reused by repair (from which this code was apparently originally stolen anyway). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [djwong: u64 return value for calc_rbmblocks] Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	021d9c107e	xfs: remove xfs_validate_rtextents Replace xfs_validate_rtextents with an open coded check for 0 rtextents. The name for the function implies it does a lot more than a zero check, which is more obvious when open coded. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Darrick J. Wong	390b4775d6	xfs: pass the icreate args object to xfs_dialloc Pass the xfs_icreate_args object to xfs_dialloc since we can extract the relevant mode (really just the file type) and parent inumber from there. This simplifies the calling convention in preparation for the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-09-01 08:58:19 -07:00
Christoph Hellwig	feb09b727b	xfs: match on the global RT inode numbers in xfs_is_metadata_inode Match the inode number instead of the inode pointers, as the inode pointers in the superblock will go away soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [djwong: port to my tree, make the parameter a const pointer] Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-09-01 08:58:19 -07:00
Darrick J. Wong	05aba1953f	xfs: validate inumber in xfs_iget Actually use the inumber validator to check the argument passed in here. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2024-09-01 08:58:19 -07:00
Darrick J. Wong	398597c3ef	xfs: introduce new file range commit ioctls This patch introduces two more new ioctls to manage atomic updates to file contents -- XFS_IOC_START_COMMIT and XFS_IOC_COMMIT_RANGE. The commit mechanism here is exactly the same as what XFS_IOC_EXCHANGE_RANGE does, but with the additional requirement that file2 cannot have changed since some sampling point. The start-commit ioctl performs the sampling of file attributes. Note: This patch currently samples i_ctime during START_COMMIT and checks that it hasn't changed during COMMIT_RANGE. This isn't entirely safe in kernels prior to 6.12 because ctime only had coarse grained granularity and very fast updates could collide with a COMMIT_RANGE. With the multi-granularity ctime introduced by Jeff Layton, it's now possible to update ctime such that this does not happen. It is critical, then, that this patch must not be backported to any kernel that does not support fine-grained file change timestamps. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-09-01 08:58:19 -07:00
Darrick J. Wong	a24cae8fc1	xfs: reset rootdir extent size hint after growfsrt If growfsrt is run on a filesystem that doesn't have a rt volume, it's possible to change the rt extent size. If the root directory was previously set up with an inherited extent size hint and rtinherit, it's possible that the hint is no longer a multiple of the rt extent size. Although the verifiers don't complain about this, xfs_repair will, so if we detect this situation, log the root directory to clean it up. This is still racy, but it's better than nothing. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-08-27 18:32:14 +05:30
Darrick J. Wong	16e1fbdce9	xfs: take m_growlock when running growfsrt Take the grow lock when we're expanding the realtime volume, like we do for the other growfs calls. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-08-27 18:32:14 +05:30
Zizhi Wo	ca6448aed4	xfs: Fix missing interval for missing_owner in xfs fsmap In the fsmap query of xfs, there is an interval missing problem: [root@fedora ~]# xfs_io -c 'fsmap -vvvv' /mnt EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL 0: 253:16 [0..7]: static fs metadata 0 (0..7) 8 1: 253:16 [8..23]: per-AG metadata 0 (8..23) 16 2: 253:16 [24..39]: inode btree 0 (24..39) 16 3: 253:16 [40..47]: per-AG metadata 0 (40..47) 8 4: 253:16 [48..55]: refcount btree 0 (48..55) 8 5: 253:16 [56..103]: per-AG metadata 0 (56..103) 48 6: 253:16 [104..127]: free space 0 (104..127) 24 ...... BUG: [root@fedora ~]# xfs_io -c 'fsmap -vvvv -d 104 107' /mnt [root@fedora ~]# Normally, we should be able to get [104, 107), but we got nothing. The problem is caused by shifting. The query for the problem-triggered scenario is for the missing_owner interval (e.g. freespace in rmapbt/ unknown space in bnobt), which is obtained by subtraction (gap). For this scenario, the interval is obtained by info->last. However, rec_daddr is calculated based on the start_block recorded in key[1], which is converted by calling XFS_BB_TO_FSBT. Then if rec_daddr does not exceed info->next_daddr, which means keys[1].fmr_physical >> (mp)->m_blkbb_log <= info->next_daddr, no records will be displayed. In the above example, 104 >> (mp)->m_blkbb_log = 12 and 107 >> (mp)->m_blkbb_log = 12, so the two are reduced to 0 and the gap is ignored: before calculate ----------------> after shifting 104(st) 107(ed) 12(st/ed) \|---------\| \| sector size block size Resolve this issue by introducing the "end_daddr" field in xfs_getfsmap_info. This records \|key[1].fmr_physical + key[1].length\| at the granularity of sector. If the current query is the last, the rec_daddr is end_daddr to prevent missing interval problems caused by shifting. We only need to focus on the last query, because xfs disks are internally aligned with disk blocksize that are powers of two and minimum 512, so there is no problem with shifting in previous queries. After applying this patch, the above problem have been solved: [root@fedora ~]# xfs_io -c 'fsmap -vvvv -d 104 107' /mnt EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL 0: 253:16 [104..106]: free space 0 (104..106) 3 Fixes: `e89c041338` ("xfs: implement the GETFSMAP ioctl") Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [djwong: limit the range of end_addr correctly] Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-08-27 18:32:14 +05:30
Darrick J. Wong	6b35cc8d92	xfs: use XFS_BUF_DADDR_NULL for daddrs in getfsmap code Use XFS_BUF_DADDR_NULL (instead of a magic sentinel value) to mean "this field is null" like the rest of xfs. Cc: wozizhi@huawei.com Fixes: `e89c041338` ("xfs: implement the GETFSMAP ioctl") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-08-27 18:32:08 +05:30
Zizhi Wo	68415b349f	xfs: Fix the owner setting issue for rmap query in xfs fsmap I notice a rmap query bug in xfs_io fsmap: [root@fedora ~]# xfs_io -c 'fsmap -vvvv' /mnt EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL 0: 253:16 [0..7]: static fs metadata 0 (0..7) 8 1: 253:16 [8..23]: per-AG metadata 0 (8..23) 16 2: 253:16 [24..39]: inode btree 0 (24..39) 16 3: 253:16 [40..47]: per-AG metadata 0 (40..47) 8 4: 253:16 [48..55]: refcount btree 0 (48..55) 8 5: 253:16 [56..103]: per-AG metadata 0 (56..103) 48 6: 253:16 [104..127]: free space 0 (104..127) 24 ...... Bug: [root@fedora ~]# xfs_io -c 'fsmap -vvvv -d 0 3' /mnt [root@fedora ~]# Normally, we should be able to get one record, but we got nothing. The root cause of this problem lies in the incorrect setting of rm_owner in the rmap query. In the case of the initial query where the owner is not set, __xfs_getfsmap_datadev() first sets info->high.rm_owner to ULLONG_MAX. This is done to prevent any omissions when comparing rmap items. However, if the current ag is detected to be the last one, the function sets info's high_irec based on the provided key. If high->rm_owner is not specified, it should continue to be set to ULLONG_MAX; otherwise, there will be issues with interval omissions. For example, consider "start" and "end" within the same block. If high->rm_owner == 0, it will be smaller than the founded record in rmapbt, resulting in a query with no records. The main call stack is as follows: xfs_ioc_getfsmap xfs_getfsmap xfs_getfsmap_datadev_rmapbt __xfs_getfsmap_datadev info->high.rm_owner = ULLONG_MAX if (pag->pag_agno == end_ag) xfs_fsmap_owner_to_rmap // set info->high.rm_owner = 0 because fmr_owner == -1ULL dest->rm_owner = 0 // get nothing xfs_getfsmap_datadev_rmapbt_query The problem can be resolved by simply modify the xfs_fsmap_owner_to_rmap function internal logic to achieve. After applying this patch, the above problem have been solved: [root@fedora ~]# xfs_io -c 'fsmap -vvvv -d 0 3' /mnt EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL 0: 253:16 [0..7]: static fs metadata 0 (0..7) 8 Fixes: `e89c041338` ("xfs: implement the GETFSMAP ioctl") Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-08-26 09:52:27 +05:30
Darrick J. Wong	410e8a18f8	xfs: don't bother reporting blocks trimmed via FITRIM Don't bother reporting the number of bytes that we "trimmed" because the underlying storage isn't required to do anything(!) and failed discard IOs aren't reported to the caller anyway. It's not like userspace can use the reported value for anything useful like adjusting the offset parameter of the next call, and it's not like anyone ever wrote a manpage about FITRIM's out parameters. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-08-26 09:52:13 +05:30
Dave Chinner	95179935be	xfs: xfs_finobt_count_blocks() walks the wrong btree As a result of the factoring in commit `14dd46cf31` ("xfs: split xfs_inobt_init_cursor"), mount started taking a long time on a user's filesystem. For Anders, this made mount times regress from under a second to over 15 minutes for a filesystem with only 30 million inodes in it. Anders bisected it down to the above commit, but even then the bug was not obvious. In this commit, over 20 calls to xfs_inobt_init_cursor() were modified, and some we modified to call a new function named xfs_finobt_init_cursor(). If that takes you a moment to reread those function names to see what the rename was, then you have realised why this bug wasn't spotted during review. And it wasn't spotted on inspection even after the bisect pointed at this commit - a single missing "f" isn't the easiest thing for a human eye to notice.... The result is that xfs_finobt_count_blocks() now incorrectly calls xfs_inobt_init_cursor() so it is now walking the inobt instead of the finobt. Hence when there are lots of allocated inodes in a filesystem, mount takes a -long- time run because it now walks a massive allocated inode btrees instead of the small, nearly empty free inode btrees. It also means all the finobt space reservations are wrong, so mount could potentially given ENOSPC on kernel upgrade. In hindsight, commit `14dd46cf31` should have been two commits - the first to convert the finobt callers to the new API, the second to modify the xfs_inobt_init_cursor() API for the inobt callers. That would have made the bug very obvious during review. Fixes: `14dd46cf31` ("xfs: split xfs_inobt_init_cursor") Reported-by: Anders Blomdell <anders.blomdell@gmail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-08-26 09:52:00 +05:30
Darrick J. Wong	5335affcff	xfs: fix folio dirtying for XFILE_ALLOC callers willy pointed out that folio_mark_dirty is the correct function to use to mark an xfile folio dirty because it calls out to the mapping's aops to mark it dirty. For tmpfs this likely doesn't matter much since it currently uses nop_dirty_folio, but let's use the abstractions properly. Reported-by: willy@infradead.org Fixes: `6907e3c00a` ("xfs: add file_{get,put}_folio") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-08-26 09:51:27 +05:30
Darrick J. Wong	e21fea4ac3	xfs: fix di_onlink checking for V1/V2 inodes "KjellR" complained on IRC that an old V4 filesystem suddenly stopped mounting after upgrading from 6.9.11 to 6.10.3, with the following splat when trying to read the rt bitmap inode: 00000000: 49 4e 80 00 01 02 00 01 00 00 00 00 00 00 00 00 IN.............. 00000010: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000020: 00 00 00 00 00 00 00 00 43 d2 a9 da 21 0f d6 30 ........C...!..0 00000030: 43 d2 a9 da 21 0f d6 30 00 00 00 00 00 00 00 00 C...!..0........ 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000050: 00 00 00 02 00 00 00 00 00 00 00 04 00 00 00 00 ................ 00000060: ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ As Dave Chinner points out, this is a V1 inode with both di_onlink and di_nlink set to 1 and di_flushiter == 0. In other words, this inode was formatted this way by mkfs and hasn't been touched since then. Back in the old days of xfsprogs 3.2.3, I observed that libxfs_ialloc would set di_nlink, but if the filesystem didn't have NLINK, it would then set di_version = 1. libxfs_iflush_int later sees the V1 inode and copies the value of di_nlink to di_onlink without zeroing di_onlink. Eventually this filesystem must have been upgraded to support NLINK because 6.10 doesn't support !NLINK filesystems, which is how we tripped over this old behavior. The filesystem doesn't have a realtime section, so that's why the rtbitmap inode has never been touched. Fix this by removing the di_onlink/di_nlink checking for all V1/V2 inodes because this is a muddy mess. The V3 inode handling code has always supported NLINK and written di_onlink==0 so keep that check. The removal of the V1 inode handling code when we dropped support for !NLINK obscured this old behavior. Reported-by: kjell.m.randa@gmail.com Fixes: `40cb8613d6` ("xfs: check unused nlink fields in the ondisk inode") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-08-26 09:50:41 +05:30
Darrick J. Wong	8d16762047	xfs: conditionally allow FS_XFLAG_REALTIME changes if S_DAX is set If a file has the S_DAX flag (aka fsdax access mode) set, we cannot allow users to change the realtime flag unless the datadev and rtdev both support fsdax access modes. Even if there are no extents allocated to the file, the setattr thread could be racing with another thread that has already started down the write code paths. Fixes: `ba23cba9b3` ("fs: allow per-device dax status checking for filesystems") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-08-14 21:20:24 +05:30
Darrick J. Wong	04d6dbb553	xfs: revert AIL TASK_KILLABLE threshold In commit `9adf40249e`, we changed the behavior of the AIL thread to set its own task state to KILLABLE whenever the timeout value is nonzero. Unfortunately, this missed the fact that xfsaild_push will return 50ms (aka a longish sleep) when we reach the push target or the AIL becomes empty, so xfsaild goes to sleep for a long period of time in uninterruptible D state. This results in artificially high load averages because KILLABLE processes are UNINTERRUPTIBLE, which contributes to load average even though the AIL is asleep waiting for someone to interrupt it. It's not blocked on IOs or anything, but people scrap ps for processes that look like they're stuck in D state, so restore the previous threshold. Fixes: `9adf40249e` ("xfs: AIL doesn't need manual pushing") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-08-14 21:19:34 +05:30
Darrick J. Wong	73c34b0b85	xfs: attr forks require attr, not attr2 It turns out that I misunderstood the difference between the attr and attr2 feature bits. "attr" means that at some point an attr fork was created somewhere in the filesystem. "attr2" means that inodes have variable-sized forks, but says nothing about whether or not there actually /are/ attr forks in the system. If we have an attr fork, we only need to check that attr is set. Fixes: `99d9d8d05d` ("xfs: scrub inode block mappings") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-08-14 21:19:14 +05:30

1 2 3 4 5 ...

8832 Commits