linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-12 22:23:55 +00:00

Author	SHA1	Message	Date
Darrick J. Wong	215b2bf72a	xfs: fix dev_t usage in xmbuf tracepoints Fix some inconsistencies in the xmbuf tracepoints -- they should be reporting the major/minor of the filesystem that they're associated with, so that we have some clue on whose behalf the xmbuf was created. Fix the xmbuf_free tracepoint to report the same. Don't call the trace function until the xmbuf is fully initialized. Fixes: `5076a6040c` ("xfs: support in-memory buffer cache target") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-03-15 10:30:23 +05:30
Dave Chinner	75bcffbb9e	xfs: shrink failure needs to hold AGI buffer Chandan reported an AGI/AGF lock order hang on xfs/168 during recent testing. The cause of the problem was the task running xfs_growfs to shrink the filesystem. A failure occurred trying to remove the free space from the btrees that the shrink would make disappear, and that meant it ran the error handling for a partial failure. This error path involves restoring the per-ag block reservations, and that requires calculating the amount of space needed to be reserved for the free inode btree. The growfs operation hung here: [18679.536829] down+0x71/0xa0 [18679.537657] xfs_buf_lock+0xa4/0x290 [xfs] [18679.538731] xfs_buf_find_lock+0xf7/0x4d0 [xfs] [18679.539920] xfs_buf_lookup.constprop.0+0x289/0x500 [xfs] [18679.542628] xfs_buf_get_map+0x2b3/0xe40 [xfs] [18679.547076] xfs_buf_read_map+0xbb/0x900 [xfs] [18679.562616] xfs_trans_read_buf_map+0x449/0xb10 [xfs] [18679.569778] xfs_read_agi+0x1cd/0x500 [xfs] [18679.573126] xfs_ialloc_read_agi+0xc2/0x5b0 [xfs] [18679.578708] xfs_finobt_calc_reserves+0xe7/0x4d0 [xfs] [18679.582480] xfs_ag_resv_init+0x2c5/0x490 [xfs] [18679.586023] xfs_ag_shrink_space+0x736/0xd30 [xfs] [18679.590730] xfs_growfs_data_private.isra.0+0x55e/0x990 [xfs] [18679.599764] xfs_growfs_data+0x2f1/0x410 [xfs] [18679.602212] xfs_file_ioctl+0xd1e/0x1370 [xfs] trying to get the AGI lock. The AGI lock was held by an fsstress task trying to do an inode allocation, and it was waiting on the AGF lock to allocate a new inode chunk on disk. Hence deadlock. The fix for this is for the growfs code to hold the AGI over the transaction roll it does in the error path. It already holds the AGF locked across this, and that is what causes the lock order inversion in the xfs_ag_resv_init() call.
Reported-by: Chandan Babu R <chandanbabu@kernel.org> Fixes: `46141dc891` ("xfs: introduce xfs_ag_shrink_space()") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-03-07 14:59:05 +05:30
Akira Yokosawa	8d4dd9d741	mm/shmem.c: Use new form of *@param in kernel-doc Use the form of *@param which kernel-doc recognizes now. This resolves the warnings from "make htmldocs" as reported in [1]. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: [1] https://lore.kernel.org/r/20240223153636.41358be5@canb.auug.org.au/ Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Akira Yokosawa <akiyks@gmail.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-02-29 09:49:02 +05:30
Akira Yokosawa	69fc23efc7	kernel-doc: Add unary operator * to $type_param_ref In kernel-doc comments, unary operator * collides with Sphinx/docutils' markup for emphasis. This resulted in additional warnings from "make htmldocs": WARNING: Inline emphasis start-string without end-string, as reported recently [1]. Those have been worked around either by escaping * (like \*param) or by using inline-literal form of ``*param``, both of which are specific to Sphinx/docutils. Such workarounds are against the kernel-doc's ideal and are better avoided. Instead, add "*" to the list of unary operators kernel-doc recognizes and make the form of *@param available in kernel-doc comments. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: [1] https://lore.kernel.org/r/20240223153636.41358be5@canb.auug.org.au/ Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Akira Yokosawa <akiyks@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-02-29 09:49:02 +05:30
Dave Chinner	b8c0d6fa41	xfs: use kvfree() in xlog_cil_free_logvec() The xfs_log_vec items are allocated by xlog_kvmalloc(), and so need to be freed with kvfree(). This was missed when converting from the kmem_free() API. Fixes: `4929257613` ("xfs: convert kmem_free() for kvmalloc users to kvfree()") Reported-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-02-28 14:04:45 +05:30
Dave Chinner	3aca0676a1	xfs: xfs_btree_bload_prep_block() should use __GFP_NOFAIL This was missed in the conversion from KM* flags. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: `10634530f7` ("xfs: convert kmem_zalloc() to kzalloc()") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-02-28 14:04:30 +05:30
Darrick J. Wong	e610e856b9	xfs: fix scrub stats file permissions When the kernel is in lockdown mode, debugfs will only show files that are world-readable and cannot be written, mmaped, or used with ioctl. That more or less describes the scrub stats file, except that the permissions are wrong -- they should be 0444, not 0644. You can't write the stats file, so the 0200 makes no sense. Meanwhile, the clear_stats file is only writable, but it got mode 0400 instead of 0200, which would make more sense. Fix both files so that they make sense. Fixes: `d7a74cad8f` ("xfs: track usage statistics of online fsck") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-02-26 17:58:37 +05:30
Darrick J. Wong	1e5efd72a2	xfs: fix log recovery erroring out on refcount recovery failure Per the comment in the error case of xfs_reflink_recover_cow, zero out any error (after shutting down the log) so that we actually kill any new intent items that might have gotten logged by later recovery steps. Discovered by xfs/434, which few people actually seem to run. Fixes: `2c1e31ed5c` ("xfs: place intent recovery under NOFS allocation context") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-02-24 10:43:26 +05:30
Chandan Babu R	e6469b22bd	xfs: clean up symbolic link code [v29.3 18/18] This series cleans up a few bits of the symbolic link code as needed for future projects. Online repair requires the ability to commit fixed fork-based filesystem metadata such as directories, xattrs, and symbolic links atomically, so we need to rearrange the symlink code before we land the atomic extent swapping. Accomplish this by moving the remote symlink target block code and declarations to xfs_symlink_remote.[ch]. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZdlBYwAKCRBKO3ySh0YR ppvvAP0S+kTZ96zROb68pfy4xo5X0mcFvtuHQo4mc4Mu6UZf0AD+Lr/Xdnj/J9k1 8FEV933MFzWHINeeGUpaN8zgZBCvKA0= =Lgry -----END PGP SIGNATURE----- Merge tag 'symlink-cleanups-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC xfs: clean up symbolic link code This series cleans up a few bits of the symbolic link code as needed for future projects. Online repair requires the ability to commit fixed fork-based filesystem metadata such as directories, xattrs, and symbolic links atomically, so we need to rearrange the symlink code before we land the atomic extent swapping. Accomplish this by moving the remote symlink target block code and declarations to xfs_symlink_remote.[ch]. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'symlink-cleanups-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: move symlink target write function to libxfs xfs: move remote symlink target read function to libxfs xfs: move xfs_symlink_remote.c declarations to xfs_symlink_remote.h	2024-02-24 10:39:07 +05:30
Chandan Babu R	6723ca9997	xfs: support attrfork and unwritten BUIs [v29.3 17/18] In preparation for atomic extent swapping and the online repair functionality that wants atomic extent swaps, enhance the BUI code so that we can support deferred work on the extended attribute fork and on unwritten extents. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZdlBYwAKCRBKO3ySh0YR pnrTAP9dlLFGNBbmsOBR2KxzC56gShXMXHzytjxdMhh4JjnyJgD/XVTJYSxZuX9z VbWq8ZyCZwvS/onTnQw7WQcdhIkDTAo= =WQpT -----END PGP SIGNATURE----- Merge tag 'expand-bmap-intent-usage_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC xfs: support attrfork and unwritten BUIs In preparation for atomic extent swapping and the online repair functionality that wants atomic extent swaps, enhance the BUI code so that we can support deferred work on the extended attribute fork and on unwritten extents. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'expand-bmap-intent-usage_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: xfs_bmap_finish_one should map unwritten extents properly xfs: support deferred bmap updates on the attr fork	2024-02-24 10:36:15 +05:30
Chandan Babu R	4e3f7e7ab8	xfs: widen BUI formats to support realtime [v29.3 16/18] Atomic extent swapping (and later, reverse mapping and reflink) on the realtime device needs to be able to defer file mapping and extent freeing work in much the same manner as is required on the data volume. Make the BUI log items operate on rt extents in preparation for atomic swapping and realtime rmap. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZdlBYgAKCRBKO3ySh0YR poOrAPwPw7LSL7/19Q2pIr1UsI798Opw079T2n8FHAVK1weEzQD/TzGgglrTu0Fx CHKYwjtIDw8drMli/6XWSAcE1JcuIAo= =KBwk -----END PGP SIGNATURE----- Merge tag 'realtime-bmap-intents-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC xfs: widen BUI formats to support realtime Atomic extent swapping (and later, reverse mapping and reflink) on the realtime device needs to be able to defer file mapping and extent freeing work in much the same manner as is required on the data volume. Make the BUI log items operate on rt extents in preparation for atomic swapping and realtime rmap. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'realtime-bmap-intents-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: support recovering bmap intent items targetting realtime extents xfs: add a realtime flag to the bmap update log redo items xfs: fix xfs_bunmapi to allow unmapping of partial rt extents	2024-02-24 10:32:34 +05:30
Chandan Babu R	10ea6158b4	xfs: bmap log intent cleanups [v29.3 15/18] The next major target of online repair are metadata that are persisted in blocks mapped by a file fork. In other words, we want to repair directories, extended attributes, symbolic links, and the realtime free space information. For file-based metadata, we assume that the space metadata is correct, which enables repair to construct new versions of the metadata in a temporary file. We then need to swap the file fork mappings of the two files atomically. With this patchset, we begin constructing such a facility based on the existing bmap log items and a new extent swap log item. This series cleans up a few parts of the file block mapping log intent code before we start adding support for realtime bmap intents. Most of it involves cleaning up tracepoints so that more of the data extraction logic ends up in the tracepoint code and not the tracepoint call site, which should reduce overhead further when tracepoints are disabled. There is also a change to pass bmap intents all the way back to the bmap code instead of unboxing the intent values and re-boxing them after the _finish_one function completes. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZdlBYgAKCRBKO3ySh0YR phAMAP9huan/JlVi4QwzTh363EQLk72fQxMleGvVLetY5ZMMOQEA/yUdBGbJyRxn D1gqrr68KFf61NKxrJir41djeeqKjQw= =ex04 -----END PGP SIGNATURE----- Merge tag 'bmap-intent-cleanups-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC xfs: bmap log intent cleanups The next major target of online repair are metadata that are persisted in blocks mapped by a file fork. In other words, we want to repair directories, extended attributes, symbolic links, and the realtime free space information. 
For file-based metadata, we assume that the space metadata is correct, which enables repair to construct new versions of the metadata in a temporary file. We then need to swap the file fork mappings of the two files atomically. With this patchset, we begin constructing such a facility based on the existing bmap log items and a new extent swap log item. This series cleans up a few parts of the file block mapping log intent code before we start adding support for realtime bmap intents. Most of it involves cleaning up tracepoints so that more of the data extraction logic ends up in the tracepoint code and not the tracepoint call site, which should reduce overhead further when tracepoints are disabled. There is also a change to pass bmap intents all the way back to the bmap code instead of unboxing the intent values and re-boxing them after the _finish_one function completes. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'bmap-intent-cleanups-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: add a xattr_entry helper xfs: move xfs_bmap_defer_add to xfs_bmap_item.c xfs: reuse xfs_bmap_update_cancel_item xfs: add a bi_entry helper xfs: remove xfs_trans_set_bmap_flags xfs: clean up bmap log intent item tracepoint callsites xfs: split tracepoint classes for deferred items	2024-02-24 10:29:06 +05:30
Chandan Babu R	74acb70535	xfs: reduce refcount repair memory usage [v29.3 14/18] The refcountbt repair code has serious memory usage problems when the block sharing factor of the filesystem is very high. This can happen if a deduplication tool has been run against the filesystem, or if the fs stores reflinked VM images that have been aging for a long time. Recall that the original reference counting algorithm walks the reverse mapping records of the filesystem to generate reference counts. For any given block in the AG, the rmap bag structure contains all the rmap records that cover that block; the refcount is the size of that bag. For online repair, the bag doesn't need the owner, offset, or state flag information, so it discards those. This halves the record size, but the bag structure still stores one excerpted record for each reverse mapping. If the sharing count is high, this will use a LOT of memory storing redundant records. In the extreme case, 100k mappings to the same piece of space will consume 100k * 16 bytes = 1.6M of memory. For offline repair, the bag stores the owner values so that we know which inodes need to be marked as being reflink inodes. If a deduplication tool has been run and there are many blocks within a file pointing to the same physical space, this will still use a lot of memory to store redundant records. The solution to this problem is to deduplicate the bag records when possible by adding a reference count to the bag record, and changing the bag add function to detect an existing record to bump the refcount. In the above example, the 100k mappings will now use 24 bytes of memory. These lookups can be done efficiently with a btree, so we create a new refcount bag btree type (inside of online repair). This is why we refactored the btree code in the previous patchset.
The btree conversion also dramatically reduces the runtime of the refcount generation algorithm, because the code to delete all bag records that end at a given agblock now only has to delete one record instead of (using the example above) 100k records. As an added benefit, record deletion now gives back the unused xfile space, which it did not do previously. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZdlBYgAKCRBKO3ySh0YR piYKAP49HMCG0qAUM78QVBjRYDUzTnzh4X052PbYXiGKvFf+xgD/ajUbn+pL12TH Et2H9pN6V7T+S74dwWdw+km02Mgo0wI= =HJC0 -----END PGP SIGNATURE----- Merge tag 'repair-refcount-scalability-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC xfs: reduce refcount repair memory usage The refcountbt repair code has serious memory usage problems when the block sharing factor of the filesystem is very high. This can happen if a deduplication tool has been run against the filesystem, or if the fs stores reflinked VM images that have been aging for a long time. Recall that the original reference counting algorithm walks the reverse mapping records of the filesystem to generate reference counts. For any given block in the AG, the rmap bag structure contains all the rmap records that cover that block; the refcount is the size of that bag. For online repair, the bag doesn't need the owner, offset, or state flag information, so it discards those. This halves the record size, but the bag structure still stores one excerpted record for each reverse mapping. If the sharing count is high, this will use a LOT of memory storing redundant records. In the extreme case, 100k mappings to the same piece of space will consume 100k * 16 bytes = 1.6M of memory. For offline repair, the bag stores the owner values so that we know which inodes need to be marked as being reflink inodes.
If a deduplication tool has been run and there are many blocks within a file pointing to the same physical space, this will still use a lot of memory to store redundant records. The solution to this problem is to deduplicate the bag records when possible by adding a reference count to the bag record, and changing the bag add function to detect an existing record to bump the refcount. In the above example, the 100k mappings will now use 24 bytes of memory. These lookups can be done efficiently with a btree, so we create a new refcount bag btree type (inside of online repair). This is why we refactored the btree code in the previous patchset. The btree conversion also dramatically reduces the runtime of the refcount generation algorithm, because the code to delete all bag records that end at a given agblock now only has to delete one record instead of (using the example above) 100k records. As an added benefit, record deletion now gives back the unused xfile space, which it did not do previously. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'repair-refcount-scalability-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: port refcount repair to the new refcount bag structure xfs: create refcount bag structure for btree repairs xfs: define an in-memory btree for storing refcount bag info during repairs	2024-02-24 10:25:31 +05:30
Chandan Babu R	fd43925cad	xfs: online repair of rmap btrees [v29.3 13/18] We have now constructed the four tools that we need to scan the filesystem looking for reverse mappings: an inode scanner, hooks to receive live updates from other writer threads, the ability to construct btrees in memory, and a btree bulk loader. This series glues those three together, enabling us to scan the filesystem for mappings and keep it up to date while other writers run, and then commit the new btree to disk atomically. To reduce the size of each patch, the functionality is left disabled until the end of the series and broken up into three patches: one to create the mechanics of scanning the filesystem, a second to transition to in-memory btrees, and a third to set up the live hooks. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZdlBYgAKCRBKO3ySh0YR plVEAP9K4IYNtRvVfC88M3x1fHLKU7EEhBubZtVx7IxWYvttRQEA+km0YW61p46G B459ut4jBZ78M//oZC0OmttRRWC0rgk= =WGZR -----END PGP SIGNATURE----- Merge tag 'repair-rmap-btree-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC xfs: online repair of rmap btrees We have now constructed the four tools that we need to scan the filesystem looking for reverse mappings: an inode scanner, hooks to receive live updates from other writer threads, the ability to construct btrees in memory, and a btree bulk loader. This series glues those three together, enabling us to scan the filesystem for mappings and keep it up to date while other writers run, and then commit the new btree to disk atomically. To reduce the size of each patch, the functionality is left disabled until the end of the series and broken up into three patches: one to create the mechanics of scanning the filesystem, a second to transition to in-memory btrees, and a third to set up the live hooks. 
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'repair-rmap-btree-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: hook live rmap operations during a repair operation xfs: create a shadow rmap btree during rmap repair xfs: repair the rmapbt xfs: create agblock bitmap helper to count the number of set regions xfs: create a helper to decide if a file mapping targets the rt volume	2024-02-24 10:22:15 +05:30
Chandan Babu R	8394a97c4b	xfs: support in-memory btrees [v29.3 12/18] Online repair of the reverse-mapping btrees presents some unique challenges. To construct a new reverse mapping btree, we must scan the entire filesystem, but we cannot afford to quiesce the entire filesystem for the potentially lengthy scan. For rmap btrees, therefore, we relax our requirements of totally atomic repairs. Instead, repairs will scan all inodes, construct a new reverse mapping dataset, format a new btree, and commit it before anyone trips over the corruption. This is exactly the same strategy as was used in the quotacheck and nlink scanners. Unfortunately, the xfarray cannot perform key-based lookups and is therefore unsuitable for supporting live updates. Luckily, we already have a data structure that maintains an indexed rmap recordset -- the existing rmap btree code! Hence we port the existing btree and buffer target code to be able to create a btree using the xfile we developed earlier. Live hooks keep the in-memory btree up to date for any resources that have already been scanned. This approach is not maximally memory efficient, but we can use the same rmap code that we do everywhere else, which provides improved stability without growing the code base even more. Note that in-memory btree blocks are always page sized. This patchset modifies the kernel xfs buffer cache to be capable of using a xfile (aka a shmem file) as a backing device. It then augments the btree code to support creating btree cursors with buffers that come from a buftarg other than the data device (namely an xfile-backed buftarg). For the userspace xfs buffer cache, we instead use a memfd or an O_TMPFILE file as a backing device. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J.
Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZdlBYgAKCRBKO3ySh0YR pqHwAQCjqXCvbqrWgMC/oYWuuDkYul79mqZw7PF58tdWpsOnzwD/dkyt7FBpAaAA GCaCIpWwOPErZ0LMT2fH6CqZyndSxw8= =vdwY -----END PGP SIGNATURE----- Merge tag 'in-memory-btrees-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC xfs: support in-memory btrees Online repair of the reverse-mapping btrees presents some unique challenges. To construct a new reverse mapping btree, we must scan the entire filesystem, but we cannot afford to quiesce the entire filesystem for the potentially lengthy scan. For rmap btrees, therefore, we relax our requirements of totally atomic repairs. Instead, repairs will scan all inodes, construct a new reverse mapping dataset, format a new btree, and commit it before anyone trips over the corruption. This is exactly the same strategy as was used in the quotacheck and nlink scanners. Unfortunately, the xfarray cannot perform key-based lookups and is therefore unsuitable for supporting live updates. Luckily, we already have a data structure that maintains an indexed rmap recordset -- the existing rmap btree code! Hence we port the existing btree and buffer target code to be able to create a btree using the xfile we developed earlier. Live hooks keep the in-memory btree up to date for any resources that have already been scanned. This approach is not maximally memory efficient, but we can use the same rmap code that we do everywhere else, which provides improved stability without growing the code base even more. Note that in-memory btree blocks are always page sized. This patchset modifies the kernel xfs buffer cache to be capable of using a xfile (aka a shmem file) as a backing device. It then augments the btree code to support creating btree cursors with buffers that come from a buftarg other than the data device (namely an xfile-backed buftarg).
For the userspace xfs buffer cache, we instead use a memfd or an O_TMPFILE file as a backing device. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'in-memory-btrees-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: launder in-memory btree buffers before transaction commit xfs: support in-memory btrees xfs: add a xfs_btree_ptrs_equal helper xfs: support in-memory buffer cache targets xfs: teach buftargs to maintain their own buffer hashtable	2024-02-24 10:18:39 +05:30
Chandan Babu R	aa8fb4bb7d	xfs: buftarg cleanups [v29.3 11/18] Clean up the buffer target code in preparation for adding the ability to target tmpfs files. That will enable the creation of in memory btrees. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZdlBYgAKCRBKO3ySh0YR pi3lAQCo1U4Qp5ftsgR2FdjpxKofWJr7qqpXFMvDZhQSCF1yTAEAiRz018fUVTtt Hb9xUmu0XrHErtJN++23/O0Q4TmM4A0= =T7wJ -----END PGP SIGNATURE----- Merge tag 'buftarg-cleanups-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC xfs: buftarg cleanups Clean up the buffer target code in preparation for adding the ability to target tmpfs files. That will enable the creation of in memory btrees. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'buftarg-cleanups-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: move setting bt_logical_sectorsize out of xfs_setsize_buftarg xfs: remove xfs_setsize_buftarg_early xfs: remove the xfs_buftarg_t typedef	2024-02-24 10:14:43 +05:30
Chandan Babu R	a7ade7e13d	xfs: btree readahead cleanups [v29.3 10/18] Minor cleanups for the btree block readahead code. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZdlBYQAKCRBKO3ySh0YR pmF2AQCvxmU219Wjaa25Bkor40UGpcgy4T5xbFdSFT8umdPxXAD+IdVsIUm6pPmk gr4+OPqIwJbGUaz3GJHg5qO6Wh+orQY= =rxd3 -----END PGP SIGNATURE----- Merge tag 'btree-readahead-cleanups-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC xfs: btree readahead cleanups Minor cleanups for the btree block readahead code. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'btree-readahead-cleanups-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: split xfs_buf_rele for cached vs uncached buffers xfs: move and rename xfs_btree_read_bufl xfs: remove xfs_btree_reada_bufs xfs: remove xfs_btree_reada_bufl	2024-02-24 10:11:25 +05:30
Chandan Babu R	169c030a95	xfs: btree check cleanups [v29.3 09/18] Minor cleanups for the btree block pointer checking code. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZdlBYQAKCRBKO3ySh0YR psBVAP4ik1Cj5KPP75mJws4duDycYjy9Wm0ZucNjUD19iQbo9wEAqXwFZuUm/wuJ TbCPVZ9EWtqg7pInrOSK+GmoOjUciQM= =FpUY -----END PGP SIGNATURE----- Merge tag 'btree-check-cleanups-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC xfs: btree check cleanups Minor cleanups for the btree block pointer checking code. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'btree-check-cleanups-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: factor out a __xfs_btree_check_lblock_hdr helper xfs: rename btree helpers that depends on the block number representation xfs: consolidate btree block verification xfs: tighten up validation of root block in inode forks xfs: remove the crc variable in __xfs_btree_check_lblock xfs: misc cleanups for __xfs_btree_check_sblock xfs: consolidate btree ptr checking xfs: open code xfs_btree_check_lptr in xfs_bmap_btree_to_extents xfs: simplify xfs_btree_check_lblock_siblings xfs: simplify xfs_btree_check_sblock_siblings	2024-02-24 10:08:27 +05:30
Chandan Babu R	ee138217c3	xfs: remove bc_btnum from btree cursors [v29.3 08/18] From Christoph Hellwig, This series continues the migration of btree geometry information out of the cursor structure and into the ops structure. This time around, we replace the btree type enumeration (btnum) with an explicit name string in the btree ops structure. This enables easy creation of /any/ new btree type without having to mess with libxfs. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZdlBYQAKCRBKO3ySh0YR ppPOAP9QUT/PSvFdaSSKr64DAIy/fNE5qusmDfmBeQY/uMqOAQEAr75VaDV77JA3 KYIXiQVu6siZrRzVOC7T6oANii7+Ugk= =1Tbm -----END PGP SIGNATURE----- Merge tag 'btree-remove-btnum-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC xfs: remove bc_btnum from btree cursors From Christoph Hellwig, This series continues the migration of btree geometry information out of the cursor structure and into the ops structure. This time around, we replace the btree type enumeration (btnum) with an explicit name string in the btree ops structure. This enables easy creation of /any/ new btree type without having to mess with libxfs. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'btree-remove-btnum-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: remove xfs_btnum_t xfs: pass a 'bool is_finobt' to xfs_inobt_insert xfs: split xfs_inobt_init_cursor xfs: split xfs_inobt_insert_sprec xfs: remove the which variable in xchk_iallocbt xfs: remove the btnum argument to xfs_inobt_count_blocks xfs: remove xfs_inobt_cur xfs: split xfs_allocbt_init_cursor xfs: refactor the btree cursor allocation logic in xchk_ag_btcur_init xfs: add a sick_mask to struct xfs_btree_ops xfs: add a name field to struct xfs_btree_ops xfs: split the agf_roots and agf_levels arrays xfs: remove xfs_bmbt_stage_cursor xfs: fold xfs_bmbt_init_common into xfs_bmbt_init_cursor xfs: make staging file forks explicit xfs: make full use of xfs_btree_stage_ifakeroot in xfs_bmbt_stage_cursor xfs: remove xfs_rmapbt_stage_cursor xfs: fold xfs_rmapbt_init_common into xfs_rmapbt_init_cursor xfs: remove xfs_refcountbt_stage_cursor xfs: fold xfs_refcountbt_init_common into xfs_refcountbt_init_cursor xfs: remove xfs_inobt_stage_cursor xfs: fold xfs_inobt_init_common into xfs_inobt_init_cursor xfs: remove xfs_allocbt_stage_cursor xfs: fold xfs_allocbt_init_common into xfs_allocbt_init_cursor xfs: don't override bc_ops for staging btrees xfs: add a xfs_btree_init_ptr_from_cur xfs: move comment about two 2 keys per pointer in the rmap btree	2024-02-24 10:04:39 +05:30
Chandan Babu R	681cb87b6a	xfs: move btree geometry to ops struct [v29.3 07/18] This patchset prepares the generic btree code to allow for the creation of new btree types outside of libxfs. The end goal here is for online fsck to be able to create its own in-memory btrees that will be used to improve the performance (and reduce the memory requirements of) the refcount btree. To enable this, I decided that the btree ops structure is the ideal place to encode all of the geometry information about a btree. The btree ops struture already contains the buffer ops (and hence the btree block magic numbers) as well as the key and record sizes, so it doesn't seem all that farfetched to encode the XFS_BTREE_ flags that determine the geometry (ROOT_IN_INODE, LONG_PTRS, etc). The rest of the patchset cleans up the btree functions that initialize btree blocks and btree buffers. The bulk of this work is to replace btree geometry related function call arguments with a single pointer to the ops structure, and then clean up everything else around that. As a side effect, we rename the functions. Later, Christoph Hellwig and I merged together a bunch more cleanups that he wanted to do for a while. All the btree geometry information is now in the btree ops structure, we've created an explicit btree type (ag, inode, mem) and moved the per-btree type information to a separate union. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZdlBYQAKCRBKO3ySh0YR puvdAQDMAehLo8djvsi4dipw1+v7kgD11/H1sD4qsHQTc2UuGAEA3me2bHB36o1k bDL7Vmsin6BazStkoqCGhc18x8MW3w4= =0Fis -----END PGP SIGNATURE----- Merge tag 'btree-geometry-in-ops-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC xfs: move btree geometry to ops struct This patchset prepares the generic btree code to allow for the creation of new btree types outside of libxfs. The end goal here is for online fsck to be able to create its own in-memory btrees that will be used to improve the performance (and reduce the memory requirements of) the refcount btree. To enable this, I decided that the btree ops structure is the ideal place to encode all of the geometry information about a btree. The btree ops struture already contains the buffer ops (and hence the btree block magic numbers) as well as the key and record sizes, so it doesn't seem all that farfetched to encode the XFS_BTREE_ flags that determine the geometry (ROOT_IN_INODE, LONG_PTRS, etc). The rest of the patchset cleans up the btree functions that initialize btree blocks and btree buffers. The bulk of this work is to replace btree geometry related function call arguments with a single pointer to the ops structure, and then clean up everything else around that. As a side effect, we rename the functions. Later, Christoph Hellwig and I merged together a bunch more cleanups that he wanted to do for a while. All the btree geometry information is now in the btree ops structure, we've created an explicit btree type (ag, inode, mem) and moved the per-btree type information to a separate union. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'btree-geometry-in-ops-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: create predicate to determine if cursor is at inode root level xfs: split the per-btree union in struct xfs_btree_cur xfs: split out a btree type from the btree ops geometry flags xfs: store the btree pointer length in struct xfs_btree_ops xfs: factor out a btree block owner check xfs: factor out a xfs_btree_owner helper xfs: move the btree stats offset into struct btree_ops xfs: move lru refs to the btree ops structure xfs: set btree block buffer ops in _init_buf xfs: remove the unnecessary daddr paramter to _init_block xfs: btree convert xfs_btree_init_block to xfs_btree_init_buf calls xfs: rename btree block/buffer init functions xfs: initialize btree blocks using btree_ops structure xfs: extern some btree ops structures xfs: turn the allocbt cursor active field into a btree flag xfs: consolidate the xfs_alloc_lookup_* helpers xfs: remove bc_ino.flags xfs: encode the btree geometry flags in the btree ops structure xfs: fix imprecise logic in xchk_btree_check_block_owner xfs: drop XFS_BTREE_CRC_BLOCKS xfs: set the btree cursor bc_ops in xfs_btree_alloc_cursor xfs: consolidate btree block allocation tracepoints xfs: consolidate btree block freeing tracepoints	2024-02-24 10:01:16 +05:30
Chandan Babu R	5d1bd19d83	xfs: online repair for fs summary counters [v29.3 06/18] A longstanding deficiency in the online fs summary counter scrubbing code is that it hasn't any means to quiesce the incore percpu counters while it's running. There is no way to coordinate with other threads that are reserving or freeing free space simultaneously, which leads to false error reports. Right now, if the discrepancy is large, we just sort of shrug and bail out with an incomplete flag, but this is lame. For repair activity, we actually /do/ need to stabilize the counters to get an accurate reading and install it in the percpu counter. To improve the former and enable the latter, allow the fscounters online fsck code to perform an exclusive mini-freeze on the filesystem. The exclusivity prevents userspace from thawing while we're running, and the mini-freeze means that we don't wait for the log to quiesce, which will make both speedier. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZdlBYQAKCRBKO3ySh0YR pjYgAQCHdKKwkD0bAZyKT6By30Vxiahey5HUrlLvg1QeGIzXlAD/e4oLwdfxYrcr 1wL7GTDix9eVJSMe7ZSgmxX4W9KVMAE= =CqKZ -----END PGP SIGNATURE----- Merge tag 'repair-fscounters-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC xfs: online repair for fs summary counters A longstanding deficiency in the online fs summary counter scrubbing code is that it hasn't any means to quiesce the incore percpu counters while it's running. There is no way to coordinate with other threads that are reserving or freeing free space simultaneously, which leads to false error reports. Right now, if the discrepancy is large, we just sort of shrug and bail out with an incomplete flag, but this is lame. 
For repair activity, we actually /do/ need to stabilize the counters to get an accurate reading and install it in the percpu counter. To improve the former and enable the latter, allow the fscounters online fsck code to perform an exclusive mini-freeze on the filesystem. The exclusivity prevents userspace from thawing while we're running, and the mini-freeze means that we don't wait for the log to quiesce, which will make both speedier. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'repair-fscounters-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: repair summary counters	2024-02-24 09:58:28 +05:30
Chandan Babu R	f107757953	xfs: indirect health reporting [v29.3 05/18] This series enables the XFS health reporting infrastructure to remember indirect health concerns when resources are scarce. For example, if a scrub notices that there's something wrong with an inode's metadata but memory reclaim needs to free the incore inode, we want to record in the perag data the fact that there was some inode somewhere with an error. The perag structures never go away. The first two patches in this series set that up, and the third one provides a means for xfs_scrub to tell the kernel that it can forget the indirect problem report. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZdlBYAAKCRBKO3ySh0YR pvzvAQD1oxlTpXYLjT5xnhQHpg/nYoXUQdOqFLrMTH5i8mm9VQD+JmiLvIUCD5Au K4qtmq9MSO4XzTKDbx9B4l/rn0H4uQ0= =H6az -----END PGP SIGNATURE----- Merge tag 'indirect-health-reporting-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC xfs: indirect health reporting This series enables the XFS health reporting infrastructure to remember indirect health concerns when resources are scarce. For example, if a scrub notices that there's something wrong with an inode's metadata but memory reclaim needs to free the incore inode, we want to record in the perag data the fact that there was some inode somewhere with an error. The perag structures never go away. The first two patches in this series set that up, and the third one provides a means for xfs_scrub to tell the kernel that it can forget the indirect problem report. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'indirect-health-reporting-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: update health status if we get a clean bill of health xfs: remember sick inodes that get inactivated xfs: add secondary and indirect classes to the health tracking system	2024-02-24 09:55:02 +05:30
Chandan Babu R	6fe1910e85	xfs: report corruption to the health trackers [v29.3 04/18] Any time that the runtime code thinks it has found corrupt metadata, it should tell the health tracking subsystem that the corresponding part of the filesystem is sick. These reports come primarily from two places -- code that is reading a buffer that fails validation, and higher level pieces that observe a conflict involving multiple buffers. This patchset uses automated scanning to update all such callsites with a mark_sick call. Doing this enables the health system to record problems observed at runtime, which (for now) can prompt the sysadmin to run xfs_scrub, and (later) may enable more targeted fixing of the filesystem. Note: Earlier reviewers of this patchset suggested that the verifier functions themselves should be responsible for calling _mark_sick. In a higher level language this would be easily accomplished with lambda functions and closures. For the kernel, however, we'd have to create the necessary closures by hand, pass them to the buf_read calls, and then implement necessary state tracking to detach the xfs_buf from the closure at the necessary time. This is far too much work and complexity and will not be pursued further. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZdlBYAAKCRBKO3ySh0YR pi9LAQCgKJHvlmaM5JUNPoa06YCqxvsrDHRWQ0j7cZ/xm9AGSAEAmhDay+1oCDef PIPxJr//SvxXByCzttjjOA+TAouGFA0= =2Xjd -----END PGP SIGNATURE----- Merge tag 'corruption-health-reports-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC xfs: report corruption to the health trackers Any time that the runtime code thinks it has found corrupt metadata, it should tell the health tracking subsystem that the corresponding part of the filesystem is sick. 
These reports come primarily from two places -- code that is reading a buffer that fails validation, and higher level pieces that observe a conflict involving multiple buffers. This patchset uses automated scanning to update all such callsites with a mark_sick call. Doing this enables the health system to record problems observed at runtime, which (for now) can prompt the sysadmin to run xfs_scrub, and (later) may enable more targeted fixing of the filesystem. Note: Earlier reviewers of this patchset suggested that the verifier functions themselves should be responsible for calling _mark_sick. In a higher level language this would be easily accomplished with lambda functions and closures. For the kernel, however, we'd have to create the necessary closures by hand, pass them to the buf_read calls, and then implement necessary state tracking to detach the xfs_buf from the closure at the necessary time. This is far too much work and complexity and will not be pursued further. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'corruption-health-reports-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: report XFS_IS_CORRUPT errors to the health system xfs: report realtime metadata corruption errors to the health system xfs: report quota block corruption errors to the health system xfs: report inode corruption errors to the health system xfs: report symlink block corruption errors to the health system xfs: report dir/attr block corruption errors to the health system xfs: report btree block corruption errors to the health system xfs: report block map corruption errors to the health tracking system xfs: report ag header corruption errors to the health tracking system xfs: report fs corruption errors to the health tracking system xfs: separate the marking of sick and checked metadata	2024-02-24 09:51:32 +05:30
Chandan Babu R	128d0fd1ab	xfs: online repair of file link counts [v29.3 03/18] Now that we've created the infrastructure to perform live scans of every file in the filesystem and the necessary hook infrastructure to observe live updates, use it to scan directories to compute the correct link counts for files in the filesystem, and reset those link counts. This patchset creates a tailored readdir implementation for scrub because the regular version has to cycle ILOCKs to copy information to userspace. We can't cycle the ILOCK during the nlink scan and we don't need all the other VFS support code (maintaining a readdir cursor and translating XFS structures to VFS structures and back) so it was easier to duplicate the code. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZdlBYAAKCRBKO3ySh0YR pjJwAP49J6UgZUDQMX4rQ5q2TbyHLnyvUzeiQ/Cc8IqXau5VxwD/U3QNPii5uUMw jNid2AMba4AkXkskU6fr4do8UwmPSQ4= =Dva/ -----END PGP SIGNATURE----- Merge tag 'scrub-nlinks-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC xfs: online repair of file link counts Now that we've created the infrastructure to perform live scans of every file in the filesystem and the necessary hook infrastructure to observe live updates, use it to scan directories to compute the correct link counts for files in the filesystem, and reset those link counts. This patchset creates a tailored readdir implementation for scrub because the regular version has to cycle ILOCKs to copy information to userspace. We can't cycle the ILOCK during the nlink scan and we don't need all the other VFS support code (maintaining a readdir cursor and translating XFS structures to VFS structures and back) so it was easier to duplicate the code. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'scrub-nlinks-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: teach repair to fix file nlinks xfs: track directory entry updates during live nlinks fsck xfs: teach scrub to check file nlinks xfs: report health of inode link counts	2024-02-24 09:47:39 +05:30
Chandan Babu R	aa03f524a2	xfs: online repair of quota counters [v29.3 02/18] This series uses the inode scanner and live update hook functionality introduced in the last patchset to implement quotacheck on a live filesystem. The quotacheck scrubber builds an incore copy of the dquot resource usage counters and compares it to the live dquots to report discrepancies. If the user chooses to repair the quota counters, the repair function visits each incore dquot to update the counts from the live information. The live update hooks are key to keeping the incore copy up to date. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZdlBYAAKCRBKO3ySh0YR pl+VAQDUflOVAEIKqwm+EaFFkbW7esxF4UYTn5N9Vj0hiLhogAD/SBOf/3fF58AI kwRwHDBtbDesuwZbTnaCo7Vj7Hq33wM= =2VuF -----END PGP SIGNATURE----- Merge tag 'repair-quotacheck-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC xfs: online repair of quota counters This series uses the inode scanner and live update hook functionality introduced in the last patchset to implement quotacheck on a live filesystem. The quotacheck scrubber builds an incore copy of the dquot resource usage counters and compares it to the live dquots to report discrepancies. If the user chooses to repair the quota counters, the repair function visits each incore dquot to update the counts from the live information. The live update hooks are key to keeping the incore copy up to date. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'repair-quotacheck-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: repair dquots based on live quotacheck results xfs: repair cannot update the summary counters when logging quota flags xfs: track quota updates during live quotacheck xfs: implement live quotacheck inode scan xfs: create a sparse load xfarray function xfs: create a helper to count per-device inode block usage xfs: create a xchk_trans_alloc_empty helper for scrub xfs: report the health of quota counts	2024-02-24 09:44:28 +05:30
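The compare-then-repair flow described for live quotacheck can be illustrated with a small Python sketch. This is not the kernel implementation: the function names, the `(owner_id, nblocks)` inode tuples, and the plain dictionaries standing in for dquots are all assumptions made for illustration, and the live update hooks are left out.

```python
from collections import Counter


def scan_usage(inodes):
    """Build shadow per-owner block counts by visiting every inode once.

    `inodes` is a hypothetical iterable of (owner_id, nblocks) pairs.
    """
    shadow = Counter()
    for owner_id, nblocks in inodes:
        shadow[owner_id] += nblocks
    return shadow


def find_discrepancies(shadow, live):
    """Return {owner_id: (live_count, shadow_count)} for every mismatch."""
    bad = {}
    for owner_id in set(shadow) | set(live):
        live_count = live.get(owner_id, 0)
        shadow_count = shadow.get(owner_id, 0)
        if live_count != shadow_count:
            bad[owner_id] = (live_count, shadow_count)
    return bad


def repair(shadow, live):
    """Overwrite the live counters with the values computed by the scan."""
    for owner_id in set(shadow) | set(live):
        live[owner_id] = shadow.get(owner_id, 0)
```

In this sketch the scrub step is `find_discrepancies` and the repair step is `repair`; on a real live filesystem the shadow counters would also have to absorb concurrent updates, which is the job the commit message assigns to the live update hooks.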
Chandan Babu R	8e3ef44f9b	xfs: repair inode mode by scanning dirs [v29.3 01/18] One missing piece of functionality in the inode record repair code is figuring out what to do with a file whose mode is so corrupt that we cannot tell the type of the file. Originally this was done by guessing the mode from the ondisk inode contents, but Christoph didn't like that because it read from data fork block 0, which could be user controlled data. Therefore, I've replaced all that with a directory scanner that looks for any dirents that point to the file with the garbage mode. If so, the ftype in the dirent will tell us exactly what mode to set on the file. Since users cannot directly write to the ftype field of a dirent, this should be safe. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZdlBYAAKCRBKO3ySh0YR pvYaAQCdHMP4V94sw+jI46FHgqBAEEuqZUjq8cHwZPrtDzZc5QEA4fsbzR8yXJsw imsHjftkRhSEav0LVXhPCbaFfphbUQ0= =O1cc -----END PGP SIGNATURE----- Merge tag 'repair-inode-mode-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC xfs: repair inode mode by scanning dirs One missing piece of functionality in the inode record repair code is figuring out what to do with a file whose mode is so corrupt that we cannot tell the type of the file. Originally this was done by guessing the mode from the ondisk inode contents, but Christoph didn't like that because it read from data fork block 0, which could be user controlled data. Therefore, I've replaced all that with a directory scanner that looks for any dirents that point to the file with the garbage mode. If so, the ftype in the dirent will tell us exactly what mode to set on the file. Since users cannot directly write to the ftype field of a dirent, this should be safe. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'repair-inode-mode-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: repair file modes by scanning for a dirent pointing to us xfs: create a macro for decoding ftypes in tracepoints xfs: create a predicate to determine if two xfs_names are the same xfs: create a static name for the dot entry too xfs: iscan batching should handle unallocated inodes too xfs: cache a bunch of inodes for repair scans xfs: stagger the starting AG of scrub iscans to reduce contention xfs: allow scrub to hook metadata updates in other writers xfs: implement live inode scan for scrub xfs: speed up xfs_iwalk_adjust_start a little bit	2024-02-24 09:40:39 +05:30
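The idea of recovering a file type from directory entries can be sketched in Python as follows. This is only an illustration under stated assumptions: the `(ino, ftype)` dirent tuples and the `guess_file_type` helper are hypothetical, the numeric ftype values mirror the on-disk `XFS_DIR3_FT_*` constants, and returning `None` when dirents disagree is a choice made for this sketch rather than a description of what the kernel does.

```python
import stat

# Dirent ftype values, mirroring the XFS_DIR3_FT_* on-disk constants.
FT_REG_FILE, FT_DIR, FT_CHRDEV, FT_BLKDEV, FT_FIFO, FT_SOCK, FT_SYMLINK = range(1, 8)

# Map each dirent ftype to the file-type bits of an inode mode.
FTYPE_TO_MODE = {
    FT_REG_FILE: stat.S_IFREG,
    FT_DIR: stat.S_IFDIR,
    FT_CHRDEV: stat.S_IFCHR,
    FT_BLKDEV: stat.S_IFBLK,
    FT_FIFO: stat.S_IFIFO,
    FT_SOCK: stat.S_IFSOCK,
    FT_SYMLINK: stat.S_IFLNK,
}


def guess_file_type(dirents, target_ino):
    """Scan (ino, ftype) pairs for entries that point at target_ino.

    Returns the S_IF* bits implied by those entries, or None if no entry
    points at the inode or if the entries disagree (sketch-only policy).
    """
    found = None
    for ino, ftype in dirents:
        if ino != target_ino:
            continue
        mode = FTYPE_TO_MODE.get(ftype)
        if mode is None:
            continue  # unknown ftype carries no usable information
        if found is not None and found != mode:
            return None  # conflicting evidence, refuse to guess
        found = mode
    return found
```

The sketch trusts the dirent ftype for the same reason the commit message gives: users cannot write that field directly, whereas the contents of data fork block 0 may be user controlled.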
Darrick J. Wong	b8102b61f7	xfs: move symlink target write function to libxfs Move xfs_symlink_write_target to xfs_symlink_remote.c so that kernel and mkfs can share the same function. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:52:37 -08:00
Darrick J. Wong	376b4f0522	xfs: move remote symlink target read function to libxfs Move xfs_readlink_bmap_ilocked to xfs_symlink_remote.c so that the swapext code can use it to convert a remote format symlink back to shortform format after a metadata repair. While we're at it, fix a broken printf prefix. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:45:17 -08:00
Darrick J. Wong	622d88e2ad	xfs: move xfs_symlink_remote.c declarations to xfs_symlink_remote.h Move declarations for libxfs symlink functions into a separate header file like we do for most everything else. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:45:01 -08:00
Darrick J. Wong	6c8127e93e	xfs: xfs_bmap_finish_one should map unwritten extents properly The deferred bmap work state and the log item can transmit unwritten state, so the XFS_BMAP_MAP handler must map in extents with that unwritten state. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:45:00 -08:00
Darrick J. Wong	52f807067b	xfs: support deferred bmap updates on the attr fork The deferred bmap update log item has always supported the attr fork, so plumb this in so that higher layers can access this. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:44:32 -08:00
Darrick J. Wong	1b5453baed	xfs: support recovering bmap intent items targeting realtime extents Now that we have reflink on the realtime device, bmap intent items have to support remapping extents on the realtime volume. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:44:24 -08:00
Darrick J. Wong	7302cda7f8	xfs: add a realtime flag to the bmap update log redo items Extend the bmap update (BUI) log items with a new realtime flag that indicates that the updates apply against a realtime file's data fork. We'll wire up the actual code later. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:44:23 -08:00
Darrick J. Wong	c75f1a2c15	xfs: add a xattr_entry helper Add a helper to translate from the item list head to the attr_intent item structure and use it to shorten assignments and avoid the need for extra local variables. Inspired-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:44:22 -08:00
Darrick J. Wong	2b6a5ec268	xfs: fix xfs_bunmapi to allow unmapping of partial rt extents When XFS_BMAPI_REMAP is passed to bunmapi, that means that we want to remove part of a block mapping without touching the allocator. For realtime files with rtextsize > 1, that also means that we should skip all the code that changes a partial remove request into an unwritten extent conversion. IOWs, bunmapi in this mode should handle removing the mapping from the rt file and nothing else. Note that XFS_BMAPI_REMAP callers are required to decrement the reference count and/or free the space manually. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:44:22 -08:00
Darrick J. Wong	8028411585	xfs: move xfs_bmap_defer_add to xfs_bmap_item.c Move the code that adds the incore xfs_bmap_item deferred work data to a transaction so that it lives with the BUI log item code. This means that the file mapping code no longer has to know about the inner workings of the BUI log items. As a consequence, we can hide the _get_group helper. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:44:21 -08:00
Darrick J. Wong	5d3d0a6ad2	xfs: reuse xfs_bmap_update_cancel_item Reuse xfs_bmap_update_cancel_item to put the AG/RTG and free the item in a few places that currently open code the logic. Inspired-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:44:20 -08:00
Darrick J. Wong	de47e4c9ad	xfs: add a bi_entry helper Add a helper to translate from the item list head to the bmap_intent structure and use it to shorten assignments and avoid the need for extra local variables. Inspired-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:44:19 -08:00
Darrick J. Wong	372fe0b8ce	xfs: remove xfs_trans_set_bmap_flags Remove this single-use helper. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:44:19 -08:00
Darrick J. Wong	2a15e76860	xfs: clean up bmap log intent item tracepoint callsites Pass the incore bmap structure to the tracepoints instead of open-coding the argument passing. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:43:53 -08:00
Darrick J. Wong	ef2d4a00df	xfs: split tracepoint classes for deferred items We're about to start adding support for deferred log intent items for realtime extents, so split these four types into separate classes so that we can customize them as the transition happens. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:43:43 -08:00
Darrick J. Wong	7fbaab57a8	xfs: port refcount repair to the new refcount bag structure Port the refcount record generating code to use the new refcount bag data structure. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:43:42 -08:00
Darrick J. Wong	7a2192ac10	xfs: create refcount bag structure for btree repairs Create a bag structure for refcount information that uses the refcount bag btree defined in the previous patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:43:41 -08:00
Darrick J. Wong	7e1b84b24d	xfs: hook live rmap operations during a repair operation Hook the regular rmap code when an rmapbt repair operation is running so that we can unlock the AGF buffer to scan the filesystem and keep the in-memory btree up to date during the scan. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:43:40 -08:00
Darrick J. Wong	18a1e644b0	xfs: define an in-memory btree for storing refcount bag info during repairs Create a new in-memory btree type so that we can store refcount bag info in a much more memory-efficient and performant format. Recall that the refcount recordset regenerator computes the new recordset from browsing the rmap records. Let's say that the rmap records are: {agbno: 10, length: 40, ...} {agbno: 11, length: 3, ...} {agbno: 12, length: 20, ...} {agbno: 15, length: 1, ...} It is convenient to have a data structure that could quickly tell us the refcount for an arbitrary agbno without wasting memory. An array or a list could do that pretty easily. Lists suck because of the pointer overhead. xfarrays are a lot more compact, but we want to minimize sparse holes in the xfarray to constrain memory usage. Maintaining any kind of record order isn't needed for correctness, so I created the "rcbag", which is shorthand for an unordered list of (excerpted) reverse mappings. So we add the first rmap to the rcbag, and it looks like: 0: {agbno: 10, length: 40} The refcount for agbno 10 is 1. Then we move on to block 11, so we add the second rmap: 0: {agbno: 10, length: 40} 1: {agbno: 11, length: 3} The refcount for agbno 11 is 2. We move on to block 12, so we add the third: 0: {agbno: 10, length: 40} 1: {agbno: 11, length: 3} 2: {agbno: 12, length: 20} The refcount for agbno 12 and 13 is 3. We move on to block 14, and remove the second rmap: 0: {agbno: 10, length: 40} 1: NULL 2: {agbno: 12, length: 20} The refcount for agbno 14 is 2. We move on to block 15, and add the last rmap. But we don't care where it is and we don't want to expand the array so we put it in slot 1: 0: {agbno: 10, length: 40} 1: {agbno: 15, length: 1} 2: {agbno: 12, length: 20} The refcount for block 15 is 3. Notice how order doesn't matter in this list? That's why repair uses an unordered list, or "bag". The data structure is not a set because it does not guarantee uniqueness. 
That said, adding and removing specific items is now an O(n) operation because we have no idea where that item might be in the list. Overall, the runtime is O(n^2) which is bad. I realized that I could easily refactor the btree code and reimplement the refcount bag with an xfbtree. Adding and removing is now O(log2 n), so the runtime is at least O(n log2 n), which is much faster. In the end, the rcbag becomes a sorted list, but that's merely a detail of the implementation. The repair code doesn't care. (Note: That horrible xfs_db bmap_inflate command can be used to exercise this sort of rcbag insanity by cranking up refcounts quickly.) Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:43:40 -08:00
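The slot-array walkthrough in the commit message above can be reproduced with a short Python sketch. It models only the original array-based bag, not the xfbtree-backed rcbag that the commit introduces, and the names `Rmap`, `SlotBag`, and `refcounts` are hypothetical. Walking one block at a time is a simplification for readability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rmap:
    agbno: int
    length: int

    @property
    def end(self):
        # First block past the mapping.
        return self.agbno + self.length


class SlotBag:
    """Unordered bag of rmaps kept in an array with reusable empty slots."""

    def __init__(self):
        self.slots = []

    def add(self, rmap):
        # Reuse the first empty slot so the array does not grow needlessly.
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = rmap
                return i
        self.slots.append(rmap)
        return len(self.slots) - 1

    def remove_ended(self, agbno):
        # O(n): we have no idea where an expired mapping sits in the array.
        for i, slot in enumerate(self.slots):
            if slot is not None and slot.end <= agbno:
                self.slots[i] = None

    def refcount(self):
        return sum(1 for slot in self.slots if slot is not None)


def refcounts(rmaps, first, last):
    """Return ({agbno: refcount}, final slot array) for blocks first..last."""
    bag = SlotBag()
    pending = sorted(rmaps, key=lambda r: r.agbno)
    nxt = 0
    counts = {}
    for agbno in range(first, last + 1):
        bag.remove_ended(agbno)
        while nxt < len(pending) and pending[nxt].agbno <= agbno:
            if pending[nxt].end > agbno:
                bag.add(pending[nxt])
            nxt += 1
        counts[agbno] = bag.refcount()
    return counts, bag.slots
```

Every add and remove scans the slot array linearly, which is the source of the quadratic overall cost the message complains about. Replacing the array with a sorted, logarithmic-time structure is the change this commit makes.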
Darrick J. Wong	4787fc8027	xfs: create a shadow rmap btree during rmap repair Create an in-memory btree of rmap records instead of an array. This enables us to do live record collection instead of freezing the fs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:43:39 -08:00
Darrick J. Wong	32080a9b9b	xfs: repair the rmapbt Rebuild the reverse mapping btree from all primary metadata. This first patch establishes the bare mechanics of finding records and putting together a new ondisk tree; more complex pieces are needed to make it work properly. Link: Documentation/filesystems/xfs-online-fsck-design.rst Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:43:38 -08:00
Darrick J. Wong	e4fd1def30	xfs: create agblock bitmap helper to count the number of set regions In the next patch, the rmap btree repair code will need to estimate the size of the new ondisk rmapbt. The size is a function of the number of records that will be written to disk, and the size of the recordset is the number of observations made while scanning the filesystem plus the number of OWN_AG records that will be injected into the rmap btree. OWN_AG rmap records track the free space btrees, the AGFL, and the new rmap btree itself. The repair tool uses a bitmap to record the space used for all four structures, which is why we need a function to count the number of set regions. A reviewer requested that this be pulled into a separate patch with its own justification, so here it is. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:43:37 -08:00
Darrick J. Wong	0dc63c8a1c	xfs: launder in-memory btree buffers before transaction commit As we've noted in various places, all current users of in-memory btrees are online fsck. Online fsck only stages a btree long enough to rebuild an ondisk data structure, which means that the in-memory btree is ephemeral. Furthermore, if we encounter /any/ errors while updating an in-memory btree, all we do is tear down all the staged data and return an errno to userspace. In-memory btrees need not be transactional, so their buffers should not be committed to the ondisk log, nor should they be checkpointed by the AIL. That's just as well since the ephemeral nature of the btree means that the buftarg and the buffers may disappear quickly anyway. Therefore, we need a way to launder the btree buffers that get attached to the transaction by the generic btree code. Because the buffers are directly mapped to backing file pages, there's no need to bwrite them back to the tmpfs file. All we need to do is clean enough of the buffer log item state so that the bli can be detached from the buffer, remove the bli from the transaction's log item list, and reset the transaction dirty state as if the laundered items had never been there. For simplicity, create xfbtree transaction commit and cancel helpers that launder the in-memory btree buffers for callers. Once laundered, call the write verifier on non-stale buffers to avoid integrity issues, or punch a hole in the backing file for stale buffers. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:43:36 -08:00
Darrick J. Wong	5049ff4d14	xfs: create a helper to decide if a file mapping targets the rt volume Create a helper so that we can stop open-coding this decision everywhere. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:43:36 -08:00
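
The rcbag walk-through in commit 18a1e644b0 above can be sketched in a few lines. This is an illustrative model only, not the kernel code: the names `bag_add`, `bag_remove_ending_at`, and `refcount_extents` are hypothetical, and the bag is modelled as a plain Python list with `None` marking a free slot, matching the array-style bag the message describes before the xfbtree rewrite. Removal is the O(n) scan that the message identifies as the bottleneck.

```python
def bag_add(bag, rec):
    """Store rec in the first free slot; grow the list only if none is free."""
    for i, slot in enumerate(bag):
        if slot is None:
            bag[i] = rec
            return
    bag.append(rec)


def bag_remove_ending_at(bag, agbno):
    """Free every slot whose mapping ends at agbno (linear scan, O(n))."""
    for i, slot in enumerate(bag):
        if slot is not None and slot[0] + slot[1] == agbno:
            bag[i] = None


def refcount_extents(rmaps):
    """Sweep (agbno, length) mappings sorted by agbno.

    Returns a list of (agbno, length, refcount) extents, where refcount is
    the number of mappings in the bag that cover the extent.
    """
    # Every block number at which the refcount can change.
    edges = sorted({s for s, _ in rmaps} | {s + n for s, n in rmaps})
    bag, extents, nxt = [], [], 0
    for start, end in zip(edges, edges[1:]):
        bag_remove_ending_at(bag, start)
        while nxt < len(rmaps) and rmaps[nxt][0] == start:
            bag_add(bag, rmaps[nxt])
            nxt += 1
        count = sum(slot is not None for slot in bag)
        if count:
            extents.append((start, end - start, count))
    return extents


# The four rmap records from the commit message.
rmaps = [(10, 40), (11, 3), (12, 20), (15, 1)]
for agbno, length, refcount in refcount_extents(rmaps):
    print(agbno, length, refcount)
```

For the example records, the sweep reports refcount 1 at agbno 10, 2 at agbno 11, 3 for agbno 12 and 13, 2 at agbno 14, and 3 at agbno 15, which agrees with the message. Each removal scans the whole list, so the sweep is quadratic in the worst case; that cost is what the message cites as the motivation for moving to an in-memory btree.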

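The record estimate described in commit e4fd1def30 above (observations from the scan plus one OWN_AG record per set region of the bitmap) can be sketched as follows. This is a hedged illustration, not the kernel's bitmap implementation: `count_set_regions` and `estimate_rmap_records` are hypothetical names, the bitmap is modelled as a list of (start, length) extents, and the sketch assumes that overlapping or adjacent extents count as a single region.

```python
def count_set_regions(extents):
    """Count maximal runs of set blocks among (start, length) extents.

    Assumption: overlapping or directly adjacent extents merge into one region.
    """
    regions = 0
    run_end = None  # first block past the current run
    for start, length in sorted(extents):
        if length <= 0:
            continue
        if run_end is None or start > run_end:
            regions += 1  # gap before this extent: a new region begins
            run_end = start + length
        else:
            run_end = max(run_end, start + length)
    return regions


def estimate_rmap_records(observed, own_ag_extents):
    """Records to write = scan observations + one OWN_AG record per region."""
    return observed + count_set_regions(own_ag_extents)


# Hypothetical OWN_AG space: free space btrees, AGFL, and the new rmapbt.
own_ag = [(0, 4), (4, 2), (10, 1), (10, 3), (20, 5)]
print(count_set_regions(own_ag))
print(estimate_rmap_records(100, own_ag))
```
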

1250128 Commits