xfs_attr_sf_findname has the simple job of finding a xfs_attr_sf_entry in
the attr fork, but the convoluted calling convention obfuscates that.
Return the found entry as the return value instead of an pointer
argument, as the -ENOATTR/-EEXIST can be trivally derived from that, and
remove the basep argument, as it is equivalent of the offset of sfe in
the data for if an sfe was found, or an offset of totsize if not was
found. To simplify the totsize computation add a xfs_attr_sf_endptr
helper that returns the imaginative xfs_attr_sf_entry at the end of
the current attrs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
trace_xfs_attr_sf_lookup is currently only called by
xfs_attr_shortform_lookup, which despit it's name is a simple helper for
xfs_attr_shortform_addname, which has it's own tracing. Move the
callsite to xfs_attr_shortform_getvalue, which is the closest thing to
a high level lookup we have for the Linux xattr API.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Many of the xfs_idata_realloc callers need to set a local pointer to the
just reallocated if_data memory. Return the pointer to simplify them a
bit and use the opportunity to re-use krealloc for freeing if_data if the
size hits 0.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
The xfs_ifork structure currently has a union of the if_root void pointer
and the if_data char pointer. In either case it is an opaque pointer
that depends on the fork format. Replace the union with a single if_data
void pointer as that is what almost all callers want. Only the symlink
NULL termination code in xfs_init_local_fork actually needs a new local
variable now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
There isn't really much left in xfs_rtallocate_extent now, fold it into
the only caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
There are currently multiple levels of fall back if an RT allocation
can not be satisfied:
1) xfs_rtallocate_extent extends the minlen and reduces the maxlen due
to the extent size hint. If that can't be done, it return -ENOSPC
and let's xfs_bmap_rtalloc retry, which then not only drops the
extent size hint based alignment, but also the minlen adjustment
2) if xfs_rtallocate_extent gets -ENOSPC from the underlying functions,
it only drops the extent size hint based alignment and retries
3) if that still does not succeed, xfs_rtallocate_extent drops the
extent size hint (which is a complex no-op at this point) and the
minlen using the same code as (1) above
4) if that still doesn't success and the caller wanted an allocation
near a blkno, drop that blkno hint.
The handling in 1 is rather inefficient as we could just drop the
alignment and continue, and 2/3 interact in really weird ways due to
the duplicate policy.
Move aligning the min and maxlen out of xfs_rtallocate_extent and into
a helper called directly by xfs_bmap_rtalloc. This allows just
continuing with the allocation if we have to drop the alignment instead
of going through the retry loop and also dropping the perfectly usable
minlen adjustment that didn't cause the problem, and then just use
a single retry that drops both the minlen and alignment requirement
when we really are out of space, thus consolidating cases (2) and (3)
above.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
xfs_bmap_rtalloc is a bit of a mess in terms of calculating the locally
need variables. Reorder them a bit so that related code is located
next to each other - the raminlen calculation moves up next to where
the maximum len is calculated, and all the prod calculation is move
into a single place and rearranged so that the real prod calculation
only happens when it actually is needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Use the kernel min/max helpers instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
xfs_format.h has a bunch odd wrappers for helper functions and mount
structure access using RT* prefixes. Replace them with their open coded
versions (for those that weren't entirely unused) and remove the wrappers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
xfs_rtallocate_extent_size has two loops with nearly identical logic
in them. Split that logic into a separate xfs_rtalloc_sumlevel helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Use common code for both xfs_rtallocate_range calls by moving
the !isfree logic into the non-default branch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Use a goto to use a common tail for the case of being able to allocate
an extent.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Change polarity of a check so that the successful case of being able to
allocate an extent is in the main path of the function and error handling
is on a branch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Doing a break in the else side of a conditional is rather silly. Invert
the check, break ASAP and unindent the other leg.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Inline the logic of xfs_rtmodify_summary_int into xfs_rtmodify_summary
and xfs_rtget_summary instead of having a somewhat awkward helper to
share a little bit of code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
xfs_rtmodify_summary_int is only used inside xfs_rtbitmap.c and to
implement xfs_rtget_summary. Move xfs_rtget_summary to xfs_rtbitmap.c
as the exported API and mark xfs_rtmodify_summary_int static.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Clean up the logical in xfs_bmap_rtalloc that tries to find a rtextent
to start the search from by using a separate variable for the hint, not
calling xfs_bmap_adjacent when we want to ignore the locality and avoid
an extra roundtrip converting between block numbers and RT extent
numbers.
As a side-effect this doesn't pointlessly call xfs_rtpick_extent and
increment the start rtextent hint if we are going to ignore the result
anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Add a return value to xfs_bmap_adjacent to indicate if it did change
ap->blkno or not.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Reorder the tail end of xfs_bmap_rtalloc so that the successfully
allocation is in the main path, and the error handling is on a branch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Just return -ENOSPC instead of returning 0 and setting the return rt
extent number to NULLRTEXTNO. This is turn removes all users of
NULLRTEXTNO, so remove that as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
xfs_bmap_rtalloc is currently in xfs_bmap_util.c, which is a somewhat
odd spot for it, given that is only called from xfs_bmap.c and calls
into xfs_rtalloc.c to do the actual work. Move xfs_bmap_rtalloc to
xfs_rtalloc.c and mark xfs_rtpick_extent xfs_rtallocate_extent and
xfs_rtallocate_extent static now that they aren't called from outside
of xfs_rtalloc.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Make xfs_bmap_btalloc_accounting more generic by handling the RT quota
reservations and then also use it from xfs_bmap_rtalloc instead of
open coding the accounting logic there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
xfs_bmap_btalloc_accounting only uses the len field from args, but that
has just been propagated to ap->length field by the caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Without this upcoming change can cause an unused variable warning,
when adding a local variable for the fields field passed to it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
minlen is the lower bound on the extent length that the caller can
accept, and maxlen is at this point the maximal available length.
This means a minlen extent is perfectly fine to use, so do it. This
matches the equivalent logic in xfs_rtallocate_extent_exact that also
accepts a minlen sized extent.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
remove the second ones:
\#include "xfs_trans_resv.h"
\#include "xfs_mount.h"
Signed-off-by: Wang Jinchao <wangjinchao@xfusion.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
During growfs, if new ag in memory has been initialized, however
sb_agcount has not been updated, if an error occurs at this time it
will cause perag leaks as follows, these new AGs will not been freed
during umount , because of these new AGs are not visible(that is
included in mp->m_sb.sb_agcount).
unreferenced object 0xffff88810be40200 (size 512):
comm "xfs_growfs", pid 857, jiffies 4294909093
hex dump (first 32 bytes):
00 c0 c1 05 81 88 ff ff 04 00 00 00 00 00 00 00 ................
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc 381741e2):
[<ffffffff8191aef6>] __kmalloc+0x386/0x4f0
[<ffffffff82553e65>] kmem_alloc+0xb5/0x2f0
[<ffffffff8238dac5>] xfs_initialize_perag+0xc5/0x810
[<ffffffff824f679c>] xfs_growfs_data+0x9bc/0xbc0
[<ffffffff8250b90e>] xfs_file_ioctl+0x5fe/0x14d0
[<ffffffff81aa5194>] __x64_sys_ioctl+0x144/0x1c0
[<ffffffff83c3d81f>] do_syscall_64+0x3f/0xe0
[<ffffffff83e00087>] entry_SYSCALL_64_after_hwframe+0x62/0x6a
unreferenced object 0xffff88810be40800 (size 512):
comm "xfs_growfs", pid 857, jiffies 4294909093
hex dump (first 32 bytes):
20 00 00 00 00 00 00 00 57 ef be dc 00 00 00 00 .......W.......
10 08 e4 0b 81 88 ff ff 10 08 e4 0b 81 88 ff ff ................
backtrace (crc bde50e2d):
[<ffffffff8191b43a>] __kmalloc_node+0x3da/0x540
[<ffffffff81814489>] kvmalloc_node+0x99/0x160
[<ffffffff8286acff>] bucket_table_alloc.isra.0+0x5f/0x400
[<ffffffff8286bdc5>] rhashtable_init+0x405/0x760
[<ffffffff8238dda3>] xfs_initialize_perag+0x3a3/0x810
[<ffffffff824f679c>] xfs_growfs_data+0x9bc/0xbc0
[<ffffffff8250b90e>] xfs_file_ioctl+0x5fe/0x14d0
[<ffffffff81aa5194>] __x64_sys_ioctl+0x144/0x1c0
[<ffffffff83c3d81f>] do_syscall_64+0x3f/0xe0
[<ffffffff83e00087>] entry_SYSCALL_64_after_hwframe+0x62/0x6a
Factor out xfs_free_unused_perag_range() from xfs_initialize_perag(),
used for freeing unused perag within a specified range in error handling,
included in the error path of the growfs failure.
Fixes: 1c1c6ebcf5 ("xfs: Replace per-ag array with a radix tree")
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Take mp->m_perag_lock for deletions from the perag radix tree in
xfs_initialize_perag to prevent racing with tagging operations.
Lookups are fine - they are RCU protected so already deal with the
tree changing shape underneath the lookup - but tagging operations
require the tree to be stable while the tags are propagated back up
to the root.
Right now there's nothing stopping radix tree tagging from operating
while a growfs operation is progress and adding/removing new entries
into the radix tree.
Hence we can have traversals that require a stable tree occurring at
the same time we are removing unused entries from the radix tree which
causes the shape of the tree to change.
Likely this hasn't caused a problem in the past because we are only
doing append addition and removal so the active AG part of the tree
is not changing shape, but that doesn't mean it is safe. Just making
the radix tree modifications serialise against each other is obviously
correct.
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
XFS stores quota records and free space bitmap information in files.
Add the necessary infrastructure to enable repairing metadata inodes and
their forks, and then make it so that we can repair the file metadata
for the rtbitmap. Repairing the bitmap contents (and the summary file)
is left for subsequent patchsets.
We also add the ability to repair file metadata the quota files. As
part of these repairs, we also reinitialize the ondisk dquot records as
necessary to get the incore dquots working. We can also correct
obviously bad dquot record attributes, but we leave checking the
resource usage counts for the next patchsets.
This has been running on the djcloud for months with no problems. Enjoy!
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZXzKBAAKCRBKO3ySh0YR
pq7JAPwM5CRyw09Ys4iy1QO3iX6y4awKVONO/qIWKzGi56RRggEAoDYNXtx8Kvd8
gQS/PdrDBdQrwdu5W2stUyPt6xPxqwM=
=DNHN
-----END PGP SIGNATURE-----
Merge tag 'repair-quota-6.8_2023-12-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.8-mergeB
xfs: online repair of quota and rt metadata files
XFS stores quota records and free space bitmap information in files.
Add the necessary infrastructure to enable repairing metadata inodes and
their forks, and then make it so that we can repair the file metadata
for the rtbitmap. Repairing the bitmap contents (and the summary file)
is left for subsequent patchsets.
We also add the ability to repair file metadata the quota files. As
part of these repairs, we also reinitialize the ondisk dquot records as
necessary to get the incore dquots working. We can also correct
obviously bad dquot record attributes, but we leave checking the
resource usage counts for the next patchsets.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'repair-quota-6.8_2023-12-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: repair quotas
xfs: improve dquot iteration for scrub
xfs: check dquot resource timers
xfs: check the ondisk space mapping behind a dquot
Add in the necessary infrastructure to check the inode and data forks of
metadata files, then apply that to the realtime bitmap file. We won't
be able to reconstruct the contents of the rtbitmap file until rmapbt is
added for realtime volumes, but we can at least get the basics started.
This has been running on the djcloud for months with no problems. Enjoy!
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZXzKAwAKCRBKO3ySh0YR
pilcAP9Zuwcc01Wsll9bWHEOLVPmZbGkatLQfq3kx1f0AjIJ2gEArPaKRfNNwLdY
lxCgSdJbCFBKW1oQR+k+W07Om7d3Zww=
=B5sk
-----END PGP SIGNATURE-----
Merge tag 'repair-rtbitmap-6.8_2023-12-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.8-mergeB
xfs: online repair of rt bitmap file
Add in the necessary infrastructure to check the inode and data forks of
metadata files, then apply that to the realtime bitmap file. We won't
be able to reconstruct the contents of the rtbitmap file until rmapbt is
added for realtime volumes, but we can at least get the basics started.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'repair-rtbitmap-6.8_2023-12-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: online repair of realtime bitmaps
xfs: create a new inode fork block unmap helper
xfs: repair the inode core and forks of a metadata inode
xfs: always check the rtbitmap and rtsummary files
xfs: check rt summary file geometry more thoroughly
xfs: check rt bitmap file geometry more thoroughly
In this series, online repair gains the ability to rebuild data and attr
fork mappings from the reverse mapping information. It is at this point
where we reintroduce the ability to reap file extents.
Repair of CoW forks is a little different -- on disk, CoW staging
extents are owned by the refcount btree and cannot be mapped back to
individual files. Hence we can only detect staging extents that don't
quite look right (missing reverse mappings, shared staging extents) and
replace them with fresh allocations.
This has been running on the djcloud for months with no problems. Enjoy!
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZXzKAwAKCRBKO3ySh0YR
prEyAP95FTXnn0pe08gcyzICmcnMM762VNleDRawhAj+LU2J4AEAj4eOC5AelmYo
MHwdRYJzug539fliCF8m6a+id+aeCQo=
=cONL
-----END PGP SIGNATURE-----
Merge tag 'repair-file-mappings-6.8_2023-12-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.8-mergeB
xfs: online repair of file fork mappings
In this series, online repair gains the ability to rebuild data and attr
fork mappings from the reverse mapping information. It is at this point
where we reintroduce the ability to reap file extents.
Repair of CoW forks is a little different -- on disk, CoW staging
extents are owned by the refcount btree and cannot be mapped back to
individual files. Hence we can only detect staging extents that don't
quite look right (missing reverse mappings, shared staging extents) and
replace them with fresh allocations.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'repair-file-mappings-6.8_2023-12-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: repair problems in CoW forks
xfs: create a ranged query function for refcount btrees
xfs: refactor repair forcing tests into a repair.c helper
xfs: repair inode fork block mapping data structures
xfs: reintroduce reaping of file metadata blocks to xrep_reap_extents
In this series, online repair gains the ability to repair inode records.
To do this, we must repair the ondisk inode and fork information enough
to pass the iget verifiers and hence make the inode igettable again.
Once that's done, we can perform higher level repairs on the incore
inode. The fstests counterpart of this patchset implements stress
testing of repair.
This has been running on the djcloud for months with no problems. Enjoy!
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZXzKAwAKCRBKO3ySh0YR
ppnbAP9zh9hc4Lr+3EMMZMsFXNxScP8iV7kCmHoorgMhHI9z8AEApBxkILxLjbwC
TJtoSmbnBhQWXR/my0MBPpLgD7v3ugk=
=lwKw
-----END PGP SIGNATURE-----
Merge tag 'repair-inodes-6.8_2023-12-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.8-mergeB
xfs: online repair of inodes and forks
In this series, online repair gains the ability to repair inode records.
To do this, we must repair the ondisk inode and fork information enough
to pass the iget verifiers and hence make the inode igettable again.
Once that's done, we can perform higher level repairs on the incore
inode. The fstests counterpart of this patchset implements stress
testing of repair.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'repair-inodes-6.8_2023-12-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: skip the rmapbt search on an empty attr fork unless we know it was zapped
xfs: abort directory parent scrub scans if we encounter a zapped directory
xfs: zap broken inode forks
xfs: repair inode records
xfs: set inode sick state flags when we zap either ondisk fork
xfs: dont cast to char * for XFS_DFORK_*PTR macros
xfs: add missing nrext64 inode flag check to scrub
xfs: try to attach dquots to files before repairing them
xfs: disable online repair quota helpers when quota not enabled
Now that we've spent a lot of time reworking common code in online fsck,
we're ready to start rebuilding the AG space btrees. This series
implements repair functions for the free space, inode, and refcount
btrees. Rebuilding the reverse mapping btree is much more intense and
is left for a subsequent patchset. The fstests counterpart of this
patchset implements stress testing of repair.
This has been running on the djcloud for months with no problems. Enjoy!
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZXzKAwAKCRBKO3ySh0YR
pkAeAQCM8fCc4T777ZOSE+RchXESmXeIKX3fAtgqTFa5dumauAEA1CSgTHnCASmL
9K5zdFwRyH2jFeFrxCMAqwNn+X2Z0gg=
=Cv+u
-----END PGP SIGNATURE-----
Merge tag 'repair-ag-btrees-6.8_2023-12-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.8-mergeB
xfs: online repair of AG btrees
Now that we've spent a lot of time reworking common code in online fsck,
we're ready to start rebuilding the AG space btrees. This series
implements repair functions for the free space, inode, and refcount
btrees. Rebuilding the reverse mapping btree is much more intense and
is left for a subsequent patchset. The fstests counterpart of this
patchset implements stress testing of repair.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'repair-ag-btrees-6.8_2023-12-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: repair refcount btrees
xfs: repair inode btrees
xfs: repair free space btrees
xfs: remove trivial bnobt/inobt scrub helpers
xfs: roll the scrub transaction after completing a repair
xfs: move the per-AG datatype bitmaps to separate files
xfs: create separate structures and code for u32 bitmaps
Before we start merging the online repair functions, let's improve the
bulk loading code a bit. First, we need to fix a misinteraction between
the AIL and the btree bulkloader wherein the delwri at the end of the
bulk load fails to queue a buffer for writeback if it happens to be on
the AIL list.
Second, we introduce a defer ops barrier object so that the process of
reaping blocks after a repair cannot queue more than two extents per EFI
log item. This increases our exposure to leaking blocks if the system
goes down during a reap, but also should prevent transaction overflows,
which result in the system going down.
Third, we change the bulkloader itself to copy multiple records into a
block if possible, and add some debugging knobs so that developers can
control the slack factors, just like they can do for xfs_repair.
This has been running on the djcloud for months with no problems. Enjoy!
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZXzKAwAKCRBKO3ySh0YR
pkcvAP0SEt4VLGrWQJqlWZ5e4sWnqDqVPyT/CQMvG86Qm9VcYwEAzbE0/DaA7uN0
DnceMdho49kTo6FC7+z/lQyGKbl89As=
=8FAT
-----END PGP SIGNATURE-----
Merge tag 'repair-prep-for-bulk-loading-6.8_2023-12-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.8-mergeB
xfs: prepare repair for bulk loading
Before we start merging the online repair functions, let's improve the
bulk loading code a bit. First, we need to fix a misinteraction between
the AIL and the btree bulkloader wherein the delwri at the end of the
bulk load fails to queue a buffer for writeback if it happens to be on
the AIL list.
Second, we introduce a defer ops barrier object so that the process of
reaping blocks after a repair cannot queue more than two extents per EFI
log item. This increases our exposure to leaking blocks if the system
goes down during a reap, but also should prevent transaction overflows,
which result in the system going down.
Third, we change the bulkloader itself to copy multiple records into a
block if possible, and add some debugging knobs so that developers can
control the slack factors, just like they can do for xfs_repair.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'repair-prep-for-bulk-loading-6.8_2023-12-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: constrain dirty buffers while formatting a staged btree
xfs: move btree bulkload record initialization to ->get_record implementations
xfs: add debug knobs to control btree bulk load slack factors
xfs: read leaf blocks when computing keys for bulkloading into node blocks
xfs: set XBF_DONE on newly formatted btree block that are ready for writing
xfs: force all buffers to be written during btree bulk load
Upon a closer inspection of the quota record scrubber, I noticed that
dqiterate wasn't actually walking all possible dquots for the mapped
blocks in the quota file. This is due to xfs_qm_dqget_next skipping all
XFS_IS_DQUOT_UNINITIALIZED dquots.
For a fsck program, we really want to look at all the dquots, even if
all counters and limits in the dquot record are zero. Rewrite the
implementation to do this, as well as switching to an iterator paradigm
to reduce the number of indirect calls.
This enables removal of the old broken dqiterate code from xfs_dquot.c.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
For each dquot resource, ensure either (a) the resource usage is over
the soft limit and there is a nonzero timer; or (b) usage is at or under
the soft limit and the timer is unset. (a) is redundant with the dquot
buffer verifier, but (b) isn't checked anywhere.
Found by fuzzing xfs/426 and noticing that diskdq.btimer = add didn't
trip any kind of warning for having a timer set even with no limits.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Each xfs_dquot object caches the file offset and daddr of the ondisk
block that backs the dquot. Make sure these cached values are the same
as the bmapi data, and that the block state is written.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Fix all the file metadata surrounding the realtime bitmap file, which
includes the rt geometry, file size, forks, and space mappings. The
bitmap contents themselves cannot be fixed without rt rmap, so that will
come later.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Create a new helper to unmap blocks from an inode's fork.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Add a helper function to repair the core and forks of a metadata inode,
so that we can get move onto the task of repairing higher level metadata
that lives in an inode.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
XFS filesystems always have a realtime bitmap and summary file, even if
there has never been a realtime volume attached. Always check them.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
I forgot that the xfs_mount tracks the size and number of levels in the
realtime summary file, and that the rt summary file can have more blocks
mapped to the data fork than m_rsumsize implies if growfsrt fails.
So. Add to the rtsummary scrubber an explicit check that all the
summary geometry values are correct, then adjust the rtsummary i_size
checks to allow for the growfsrt failure case. Finally, flag post-eof
blocks in the summary file.
While we're at it, split the extent map checking so that we only call
xfs_bmapi_read once per extent instead of once per rtsummary block.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
I forgot that the superblock tracks the number of blocks that are in the
realtime bitmap, and that the rt bitmap file can have more blocks mapped
to the data fork than sb_rbmblocks if growfsrt fails.
So. Add to the rtbitmap scrubber an explicit check that sb_rextents and
sb_rbmblocks are correct, then adjust the rtbitmap i_size checks to
allow for the growfsrt failure case. Finally, flag post-eof blocks in
the rtbitmap.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Try to repair errors that we see in file CoW forks so that we don't do
stupid things like remap garbage into a file. There's not a lot we can
do with the COW fork -- the ondisk metadata record only that the COW
staging extents are owned by the refcount btree, which effectively means
that we can't reconstruct this incore structure from scratch.
Actually, this is even worse -- we can't touch written extents, because
those map space that are actively under writeback, and there's not much
to do with delalloc reservations. Hence we can only detect crosslinked
unwritten extents and fix them by punching out the problematic parts and
replacing them with delalloc extents.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Implement ranged queries for refcount records. The next patch will use
this to scan refcount data.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
There are a couple of conditions that userspace can set to force repairs
of metadata. These really belong in the repair code and not open-coded
into the check code, so refactor them into a helper.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Use the reverse-mapping btree information to rebuild an inode block map.
Update the btree bulk loading code as necessary to support inode rooted
btrees and fix some bitrot problems.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The attribute fork scrubber can optionally scan the reverse mapping
records of the filesystem to determine if the fork is missing mappings
that it should have. However, this is a very expensive operation, so we
only want to do this if we suspect that the fork is missing records.
For attribute forks the criteria for suspicion is that the attr fork is
in EXTENTS format and has zero extents.
However, there are several ways that a file can end up in this state
through regular filesystem usage. For example, an LSM can set a
s_security hook but then decide not to set an ACL; or an attr set can
create the attr fork but then the actual set operation fails with
ENOSPC; or we can delete all the attrs on a file whose data fork is in
btree format, in which case we do not delete the attr fork. We don't
want to run the expensive check for any case that can be arrived at
through regular operations.
However.
When online inode repair decides to zap an attribute fork, it cannot
determine if it is zapping ACL information. As a precaution it removes
all the discretionary access control permissions and sets the user and
group ids to zero. Check these three additional conditions to decide if
we want to scan the rmap records.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Back in commit a55e073088 ("xfs: only allow reaping of per-AG
blocks in xrep_reap_extents"), we removed from the reaping code the
ability to handle bmbt blocks. At the time, the reaping code only
walked single blocks, didn't correctly detect crosslinked blocks, and
the special casing made the function hard to understand. It was easier
to remove unneeded functionality prior to fixing all the bugs.
Now that we've fixed the problems, we want again the ability to reap
file metadata blocks. Reintroduce the per-file reaping functionality
atop the current implementation. We require that sc->sa is
uninitialized, so that we can use it to hold all the per-AG context for
a given extent.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>