linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-23 12:42:02 +00:00

Author	SHA1	Message	Date
Darrick J. Wong	6ece924b95	xfs: create separate structures and code for u32 bitmaps Create a version of the xbitmap that handles 32-bit integer intervals and adapt the xfs_agblock_t bitmap to use it. This reduces the size of the interval tree nodes from 48 to 36 bytes and enables us to use a more efficient slab (:0000040 instead of :0000048) which allows us to pack more nodes into a single slab page (102 vs 85). As a side effect, the users of these bitmaps no longer have to convert between u32 and u64 quantities just to use the bitmap; and the hairy overflow checking code in xagb_bitmap_test goes away. Later in this patchset we're going to add bitmaps for xfs_agino_t, xfs_rgblock_t, and xfs_dablk_t, so the increase in code size (5622 vs. 9959 bytes) seems worth it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-15 10:03:30 -08:00
Darrick J. Wong	e069d54970	xfs: constrain dirty buffers while formatting a staged btree Constrain the number of dirty buffers that are locked by the btree staging code at any given time by establishing a threshold at which we put them all on the delwri queue and push them to disk. This limits memory consumption while writing out new btrees. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-15 10:03:29 -08:00
Darrick J. Wong	6dfeb0c2ec	xfs: move btree bulkload record initialization to ->get_record implementations When we're performing a bulk load of a btree, move the code that actually stores the btree record in the new btree block out of the generic code and into the individual ->get_record implementations. This is preparation for being able to store multiple records with a single indirect call. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-15 10:03:29 -08:00
Darrick J. Wong	a20ffa7d9f	xfs: add debug knobs to control btree bulk load slack factors Add some debug knobs so that we can control the leaf and node block slack when rebuilding btrees. For developers, it might be useful to construct btrees of various heights by crafting a filesystem with a certain number of records and then using repair+knobs to rebuild the index with a certain shape. Practically speaking, you'd only ever do that for extreme stress testing of the runtime code or the btree generator. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-15 10:03:28 -08:00
Darrick J. Wong	26de64629d	xfs: read leaf blocks when computing keys for bulkloading into node blocks When constructing a new btree, xfs_btree_bload_node needs to read the btree blocks for level N to compute the keyptrs for the blocks that will be loaded into level N+1. The level N blocks must be formatted at that point. A subsequent patch will change the btree bulkloader to write new btree blocks in 256K chunks to moderate memory consumption if the new btree is very large. As a consequence of that, it's possible that the buffers for lower level blocks might have been reclaimed by the time the node builder comes back to the block. Therefore, change xfs_btree_bload_node to read the lower level blocks to handle the reclaimed buffer case. As a side effect, the read will increase the LRU refs, which will bias towards keeping new btree buffers in memory after the new btree commits. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-15 10:03:28 -08:00
Darrick J. Wong	c1e0f8e6fb	xfs: set XBF_DONE on newly formatted btree block that are ready for writing The btree bulkloading code calls xfs_buf_delwri_queue_here when it has finished formatting a new btree block and wants to queue it to be written to disk. Once the new btree root has been committed, the blocks (and hence the buffers) will be accessible to the rest of the filesystem. Mark each new buffer as DONE when adding it to the delwri list so that the next btree traversal can skip reloading the contents from disk. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-15 10:03:27 -08:00
Darrick J. Wong	13ae04d8d4	xfs: force all buffers to be written during btree bulk load While stress-testing online repair of btrees, I noticed periodic assertion failures from the buffer cache about buffers with incorrect DELWRI_Q state. Looking further, I observed this race between the AIL trying to write out a btree block and repair zapping a btree block after the fact: AIL: Repair0: pin buffer X delwri_queue: set DELWRI_Q add to delwri list stale buf X: clear DELWRI_Q does not clear b_list free space X commit delwri_submit # oops Worse yet, I discovered that running the same repair over and over in a tight loop can result in a second race that cause data integrity problems with the repair: AIL: Repair0: Repair1: pin buffer X delwri_queue: set DELWRI_Q add to delwri list stale buf X: clear DELWRI_Q does not clear b_list free space X commit find free space X get buffer rewrite buffer delwri_queue: set DELWRI_Q already on a list, do not add commit BAD: committed tree root before all blocks written delwri_submit # too late now I traced this to my own misunderstanding of how the delwri lists work, particularly with regards to the AIL's buffer list. If a buffer is logged and committed, the buffer can end up on that AIL buffer list. If btree repairs are run twice in rapid succession, it's possible that the first repair will invalidate the buffer and free it before the next time the AIL wakes up. Marking the buffer stale clears DELWRI_Q from the buffer state without removing the buffer from its delwri list. The buffer doesn't know which list it's on, so it cannot know which lock to take to protect the list for a removal. If the second repair allocates the same block, it will then recycle the buffer to start writing the new btree block. Meanwhile, if the AIL wakes up and walks the buffer list, it will ignore the buffer because it can't lock it, and go back to sleep. When the second repair calls delwri_queue to put the buffer on the list of buffers to write before committing the new btree, it will set DELWRI_Q again, but since the buffer hasn't been removed from the AIL's buffer list, it won't add it to the bulkload buffer's list. This is incorrect, because the bulkload caller relies on delwri_submit to ensure that all the buffers have been sent to disk /before/ committing the new btree root pointer. This ordering requirement is required for data consistency. Worse, the AIL won't clear DELWRI_Q from the buffer when it does finally drop it, so the next thread to walk through the btree will trip over a debug assertion on that flag. To fix this, create a new function that waits for the buffer to be removed from any other delwri lists before adding the buffer to the caller's delwri list. By waiting for the buffer to clear both the delwri list and any potential delwri wait list, we can be sure that repair will initiate writes of all buffers and report all write errors back to userspace instead of committing the new structure. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-15 10:03:27 -08:00
Dave Chinner	0573676fdd	xfs: initialise di_crc in xfs_log_dinode Alexander Potapenko report that KMSAN was issuing these warnings: kmalloc-ed xlog buffer of size 512 : ffff88802fc26200 kmalloc-ed xlog buffer of size 368 : ffff88802fc24a00 kmalloc-ed xlog buffer of size 648 : ffff88802b631000 kmalloc-ed xlog buffer of size 648 : ffff88802b632800 kmalloc-ed xlog buffer of size 648 : ffff88802b631c00 xlog_write_iovec: copying 12 bytes from ffff888017ddbbd8 to ffff88802c300400 xlog_write_iovec: copying 28 bytes from ffff888017ddbbe4 to ffff88802c30040c xlog_write_iovec: copying 68 bytes from ffff88802fc26274 to ffff88802c300428 xlog_write_iovec: copying 188 bytes from ffff88802fc262bc to ffff88802c30046c ===================================================== BUG: KMSAN: uninit-value in xlog_write_iovec fs/xfs/xfs_log.c:2227 BUG: KMSAN: uninit-value in xlog_write_full fs/xfs/xfs_log.c:2263 BUG: KMSAN: uninit-value in xlog_write+0x1fac/0x2600 fs/xfs/xfs_log.c:2532 xlog_write_iovec fs/xfs/xfs_log.c:2227 xlog_write_full fs/xfs/xfs_log.c:2263 xlog_write+0x1fac/0x2600 fs/xfs/xfs_log.c:2532 xlog_cil_write_chain fs/xfs/xfs_log_cil.c:918 xlog_cil_push_work+0x30f2/0x44e0 fs/xfs/xfs_log_cil.c:1263 process_one_work kernel/workqueue.c:2630 process_scheduled_works+0x1188/0x1e30 kernel/workqueue.c:2703 worker_thread+0xee5/0x14f0 kernel/workqueue.c:2784 kthread+0x391/0x500 kernel/kthread.c:388 ret_from_fork+0x66/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242 Uninit was created at: slab_post_alloc_hook+0x101/0xac0 mm/slab.h:768 slab_alloc_node mm/slub.c:3482 __kmem_cache_alloc_node+0x612/0xae0 mm/slub.c:3521 __do_kmalloc_node mm/slab_common.c:1006 __kmalloc+0x11a/0x410 mm/slab_common.c:1020 kmalloc ./include/linux/slab.h:604 xlog_kvmalloc fs/xfs/xfs_log_priv.h:704 xlog_cil_alloc_shadow_bufs fs/xfs/xfs_log_cil.c:343 xlog_cil_commit+0x487/0x4dc0 fs/xfs/xfs_log_cil.c:1574 __xfs_trans_commit+0x8df/0x1930 fs/xfs/xfs_trans.c:1017 xfs_trans_commit+0x30/0x40 fs/xfs/xfs_trans.c:1061 xfs_create+0x15af/0x2150 fs/xfs/xfs_inode.c:1076 xfs_generic_create+0x4cd/0x1550 fs/xfs/xfs_iops.c:199 xfs_vn_create+0x4a/0x60 fs/xfs/xfs_iops.c:275 lookup_open fs/namei.c:3477 open_last_lookups fs/namei.c:3546 path_openat+0x29ac/0x6180 fs/namei.c:3776 do_filp_open+0x24d/0x680 fs/namei.c:3809 do_sys_openat2+0x1bc/0x330 fs/open.c:1440 do_sys_open fs/open.c:1455 __do_sys_openat fs/open.c:1471 __se_sys_openat fs/open.c:1466 __x64_sys_openat+0x253/0x330 fs/open.c:1466 do_syscall_x64 arch/x86/entry/common.c:51 do_syscall_64+0x4f/0x140 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b arch/x86/entry/entry_64.S:120 Bytes 112-115 of 188 are uninitialized Memory access of size 188 starts at ffff88802fc262bc This is caused by the struct xfs_log_dinode not having the di_crc field initialised. Log recovery never uses this field (it is only present these days for on-disk format compatibility reasons) and so it's value is never checked so nothing in XFS has caught this. Further, none of the uninitialised memory access warning tools have caught this (despite catching other uninit memory accesses in the struct xfs_log_dinode back in 2017!) until recently. Alexander annotated the XFS code to get the dump of the actual bytes that were detected as uninitialised, and from that report it took me about 30s to realise what the issue was. The issue was introduced back in 2016 and every inode that is logged fails to initialise this field. This is no actual bad behaviour caused by this issue - I find it hard to even classify it as a bug... Reported-and-tested-by: Alexander Potapenko <glider@google.com> Fixes: `f8d55aa052` ("xfs: introduce inode log format object") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-12-15 09:33:29 +05:30
Darrick J. Wong	c0e37f07d2	xfs: fix an off-by-one error in xreap_agextent_binval Overall, this function tries to find and invalidate all buffers for a given extent of space on the data device. The inner for loop in this function tries to find all xfs_bufs for a given daddr. The lengths of all possible cached buffers range from 1 fsblock to the largest needed to contain a 64k xattr value (~17fsb). The scan is capped to avoid looking at anything buffer going past the given extent. Unfortunately, the loop continuation test is wrong -- max_fsbs is the largest size we want to scan, not one past that. Put another way, this loop is actually 1-indexed, not 0-indexed. Therefore, the continuation test should use <=, not <. As a result, online repairs of btree blocks fails to stale any buffers for btrees that are being torn down, which causes later assertions in the buffer cache when another thread creates a different-sized buffer. This happens in xfs/709 when allocating an inode cluster buffer: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 3346128 at fs/xfs/xfs_message.c:104 assfail+0x3a/0x40 [xfs] CPU: 0 PID: 3346128 Comm: fsstress Not tainted 6.7.0-rc4-djwx #rc4 RIP: 0010:assfail+0x3a/0x40 [xfs] Call Trace: <TASK> _xfs_buf_obj_cmp+0x4a/0x50 xfs_buf_get_map+0x191/0xba0 xfs_trans_get_buf_map+0x136/0x280 xfs_ialloc_inode_init+0x186/0x340 xfs_ialloc_ag_alloc+0x254/0x720 xfs_dialloc+0x21f/0x870 xfs_create_tmpfile+0x1a9/0x2f0 xfs_rename+0x369/0xfd0 xfs_vn_rename+0xfa/0x170 vfs_rename+0x5fb/0xc30 do_renameat2+0x52d/0x6e0 __x64_sys_renameat2+0x4b/0x60 do_syscall_64+0x3b/0xe0 entry_SYSCALL_64_after_hwframe+0x46/0x4e A later refactoring patch in the online repair series fixed this by accident, which is why I didn't notice this until I started testing only the patches that are likely to end up in 6.8. Fixes: `1c7ce115e5` ("xfs: reap large AG metadata extents when possible") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-12-15 09:31:14 +05:30
Eric Sandeen	84712492e6	xfs: short circuit xfs_growfs_data_private() if delta is zero Although xfs_growfs_data() doesn't call xfs_growfs_data_private() if in->newblocks == mp->m_sb.sb_dblocks, xfs_growfs_data_private() further massages the new block count so that we don't i.e. try to create a too-small new AG. This may lead to a delta of "0" in xfs_growfs_data_private(), so we end up in the shrink case and emit the EXPERIMENTAL warning even if we're not changing anything at all. Fix this by returning straightaway if the block delta is zero. (nb: in older kernels, the result of entering the shrink case with delta == 0 may actually let an -ENOSPC escape to userspace, which is confusing for users.) Fixes: `fb2fc17201` ("xfs: support shrinking unused space in the last AG") Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-12-15 09:26:57 +05:30
Christoph Hellwig	603ce8ab12	xfs: pass the defer ops directly to xfs_defer_add Pass a pointer to the xfs_defer_op_type structure to xfs_defer_add and remove the indirection through the xfs_defer_ops_type enum and a global table of all possible operations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-12-14 11:13:52 +05:30
Christoph Hellwig	dc22af6436	xfs: pass the defer ops instead of type to xfs_defer_start_recovery xfs_defer_start_recovery is only called from xlog_recover_intent_item, and the callers of that all have the actual xfs_defer_ops_type operation vector at hand. Pass that directly instead of looking it up from the defer_op_types table. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-12-14 11:13:38 +05:30
Christoph Hellwig	7f2f7531e0	xfs: store an ops pointer in struct xfs_defer_pending The dfp_type field in struct xfs_defer_pending is only used to either look up the operations associated with the pending word or in trace points. Replace it with a direct pointer to the operations vector, and store a pretty name in the vector for tracing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-12-14 11:10:34 +05:30
Christoph Hellwig	2e8f7b6f4a	xfs: move xfs_attr_defer_type up in xfs_attr_item.c We'll reference it directly in xlog_recover_attri_commit_pass2, so move it up a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-12-14 11:10:33 +05:30
Christoph Hellwig	c00eebd09e	xfs: consolidate the xfs_attr_defer_* helpers Consolidate the xfs_attr_defer_* helpers into a single xfs_attr_defer_add one that picks the right dela_state based on the passed in operation. Also move to a single trace point as the actual operation is visible through the flags in the delta_state passed to the trace point. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-12-14 11:10:33 +05:30
Chandan Babu R	19b366dae1	xfs: fix growfsrt failure during rt volume attach One more series to fix a transaction reservation overrun while trying to attach a very large rt volume to a filesystem. This has been running on the djcloud for months with no problems. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZXozjwAKCRBKO3ySh0YR pm0UAQD1NR8DZMki/y1e3m6ZsgwXIKwPtGlZ7/UxYsSoBqPQVwD8Cx2zJVXfeLH/ Be9sDChrnC3Wz9Z6WhjSIHANMy/mFAw= =bR3a -----END PGP SIGNATURE----- Merge tag 'fix-growfsrt-failures-6.8_2023-12-13' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.8-mergeB xfs: fix growfsrt failure during rt volume attach One more series to fix a transaction reservation overrun while trying to attach a very large rt volume to a filesystem. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'fix-growfsrt-failures-6.8_2023-12-13' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: recompute growfsrtfree transaction reservation while growing rt volume	2023-12-14 10:50:35 +05:30
Darrick J. Wong	578bd4ce71	xfs: recompute growfsrtfree transaction reservation while growing rt volume While playing with growfs to create a 20TB realtime section on a filesystem that didn't previously have an rt section, I noticed that growfs would occasionally shut down the log due to a transaction reservation overflow. xfs_calc_growrtfree_reservation uses the current size of the realtime summary file (m_rsumsize) to compute the transaction reservation for a growrtfree transaction. The reservations are computed at mount time, which means that m_rsumsize is zero when growfs starts "freeing" the new realtime extents into the rt volume. As a result, the transaction is undersized and fails. Fix this by recomputing the transaction reservations every time we change m_rsumsize. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-13 14:16:27 -08:00
Christoph Hellwig	18793e0505	xfs: move xfs_ondisk.h to libxfs/ Move xfs_ondisk.h to libxfs so that we can do the struct sanity checks in userspace libxfs as well. This should allow us to retire the somewhat fragile xfs/122 test on xfstests. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-12-07 15:15:29 +05:30
Christoph Hellwig	c12c50393c	xfs: use static_assert to check struct sizes and offsets Use the compiler-provided static_assert built-in from C11 instead of the kernel-specific BUILD_BUG_ON_MSG for the structure size and offset checks in xfs_ondisk. This not only gives slightly nicer error messages in case things go south, but can also be trivially used as-is in userspace. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-12-07 15:15:29 +05:30
Zhang Tianci	fd45ddb9dd	xfs: extract xfs_da_buf_copy() helper function This patch does not modify logic. xfs_da_buf_copy() will copy one block from src xfs_buf to dst xfs_buf, and update the block metadata in dst directly. Signed-off-by: Zhang Tianci <zhangtianci.1997@bytedance.com> Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-12-07 14:57:14 +05:30
Zhang Tianci	5759aa4f95	xfs: update dir3 leaf block metadata after swap xfs_da3_swap_lastblock() copy the last block content to the dead block, but do not update the metadata in it. We need update some metadata for some kinds of type block, such as dir3 leafn block records its blkno, we shall update it to the dead block blkno. Otherwise, before write the xfs_buf to disk, the verify_write() will fail in blk_hdr->blkno != xfs_buf->b_bn, then xfs will be shutdown. We will get this warning: XFS (dm-0): Metadata corruption detected at xfs_dir3_leaf_verify+0xa8/0xe0 [xfs], xfs_dir3_leafn block 0x178 XFS (dm-0): Unmount and run xfs_repair XFS (dm-0): First 128 bytes of corrupted metadata buffer: 00000000e80f1917: 00 80 00 0b 00 80 00 07 3d ff 00 00 00 00 00 00 ........=....... 000000009604c005: 00 00 00 00 00 00 01 a0 00 00 00 00 00 00 00 00 ................ 000000006b6fb2bf: e4 44 e3 97 b5 64 44 41 8b 84 60 0e 50 43 d9 bf .D...dDA..`.PC.. 00000000678978a2: 00 00 00 00 00 00 00 83 01 73 00 93 00 00 00 00 .........s...... 00000000b28b247c: 99 29 1d 38 00 00 00 00 99 29 1d 40 00 00 00 00 .).8.....).@.... 000000002b2a662c: 99 29 1d 48 00 00 00 00 99 49 11 00 00 00 00 00 .).H.....I...... 00000000ea2ffbb8: 99 49 11 08 00 00 45 25 99 49 11 10 00 00 48 fe .I....E%.I....H. 0000000069e86440: 99 49 11 18 00 00 4c 6b 99 49 11 20 00 00 4d 97 .I....Lk.I. ..M. XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 1423 of file fs/xfs/xfs_buf.c. Return address = 00000000c0ff63c1 XFS (dm-0): Corruption of in-memory data detected. Shutting down filesystem XFS (dm-0): Please umount the filesystem and rectify the problem(s) >From the log above, we know xfs_buf->b_no is 0x178, but the block's hdr record its blkno is 0x1a0. Fixes: `24df33b45e` ("xfs: add CRC checking to dir2 leaf blocks") Signed-off-by: Zhang Tianci <zhangtianci.1997@bytedance.com> Suggested-by: Dave Chinner <david@fromorbit.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-12-07 14:57:14 +05:30
Jiachen Zhang	e6af9c98cb	xfs: ensure logflagsp is initialized in xfs_bmap_del_extent_real In the case of returning -ENOSPC, ensure logflagsp is initialized by 0. Otherwise the caller __xfs_bunmapi will set uninitialized illegal tmp_logflags value into xfs log, which might cause unpredictable error in the log recovery procedure. Also, remove the flags variable and set the *logflagsp directly, so that the code should be more robust in the long run. Fixes: `1b24b633aa` ("xfs: move some more code into xfs_bmap_del_extent_real") Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-12-07 14:57:14 +05:30
Christoph Hellwig	08e54ca42d	xfs: clean up xfs_fsops.h Use struct types instead of typedefs so that the header can be included with pulling in the headers that define the typedefs, and remove the pointless externs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-12-07 14:51:07 +05:30
Christoph Hellwig	646ddf0c4d	xfs: clean up the xfs_reserve_blocks interface xfs_reserve_blocks has a very odd interface that can only be explained by it directly deriving from the IRIX fcntl handler back in the day. Split reporting out the reserved blocks out of xfs_reserve_blocks into the only caller that cares. This means that the value reported from XFS_IOC_SET_RESBLKS isn't atomically sampled in the same critical section as when it was set anymore, but as the values could change right after setting them anyway that does not matter. It does provide atomic sampling of both values for XFS_IOC_GET_RESBLKS now, though. Also pass a normal scalar integer value for the requested value instead of the pointless pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-12-07 14:51:07 +05:30
Christoph Hellwig	c2c2620de7	xfs: clean up the XFS_IOC_FSCOUNTS handler Split XFS_IOC_FSCOUNTS out of the main xfs_file_ioctl function, and merge the xfs_fs_counts helper into the ioctl handler. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-12-07 14:51:07 +05:30
Christoph Hellwig	64f08b152a	xfs: clean up the XFS_IOC_{GS}ET_RESBLKS handler The XFS_IOC_GET_RESBLKS and XFS_IOC_SET_RESBLKS already share a fair amount of code, and will share even more soon. Move the logic for both of them out of the main xfs_file_ioctl function into a xfs_ioctl_getset_resblocks helper to share the code and prepare for additional changes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-12-07 14:51:07 +05:30
Bagas Sanjaya	011f129fee	Documentation: xfs: consolidate XFS docs into its own subdirectory XFS docs are currently in upper-level Documentation/filesystems. Although these are currently 4 docs, they are already outstanding as a group and can be moved to its own subdirectory. Consolidate them into Documentation/filesystems/xfs/. Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-12-07 14:49:13 +05:30
Shiyang Ruan	fa422b353d	mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE for unbind Now, if we suddenly remove a PMEM device(by calling unbind) which contains FSDAX while programs are still accessing data in this device, e.g.: ``` $FSSTRESS_PROG -d $SCRATCH_MNT -n 99999 -p 4 & # $FSX_PROG -N 1000000 -o 8192 -l 500000 $SCRATCH_MNT/t001 & echo "pfn1.1" > /sys/bus/nd/drivers/nd_pmem/unbind ``` it could come into an unacceptable state: 1. device has gone but mount point still exists, and umount will fail with "target is busy" 2. programs will hang and cannot be killed 3. may crash with NULL pointer dereference To fix this, we introduce a MF_MEM_PRE_REMOVE flag to let it know that we are going to remove the whole device, and make sure all related processes could be notified so that they could end up gracefully. This patch is inspired by Dan's "mm, dax, pmem: Introduce dev_pagemap_failure()"[1]. With the help of dax_holder and ->notify_failure() mechanism, the pmem driver is able to ask filesystem on it to unmap all files in use, and notify processes who are using those files. Call trace: trigger unbind -> unbind_store() -> ... (skip) -> devres_release_all() -> kill_dax() -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE) -> xfs_dax_notify_failure() `-> freeze_super() // freeze (kernel call) `-> do xfs rmap ` -> mf_dax_kill_procs() ` -> collect_procs_fsdax() // all associated processes ` -> unmap_and_kill() ` -> invalidate_inode_pages2_range() // drop file's cache `-> thaw_super() // thaw (both kernel & user call) Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove event. Use the exclusive freeze/thaw[2] to lock the filesystem to prevent new dax mapping from being created. Do not shutdown filesystem directly if configuration is not supported, or if failure range includes metadata area. Make sure all files and processes(not only the current progress) are handled correctly. Also drop the cache of associated files before pmem is removed. [1]: https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/ [2]: https://lore.kernel.org/linux-xfs/169116275623.3187159.16862410128731457358.stg-ugh@frogsfrogsfrogs/ Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-12-07 14:34:26 +05:30
Chandan Babu R	49391d1349	xfs: reserve disk space for online repairs [v28.1] Online repair fixes metadata structures by writing a new copy out to disk and atomically committing the new structure into the filesystem. For this to work, we need to reserve all the space we're going to need ahead of time so that the atomic commit transaction is as small as possible. We also require the reserved space to be freed if the system goes down, or if we decide not to commit the repair, or if we reserve too much space. To keep the atomic commit transaction as small as possible, we would like to allocate some space and simultaneously schedule automatic reaping of the reserved space, even on log recovery. EFIs are the mechanism to get us there, but we need to use them in a novel manner. Once we allocate the space, we want to hold on to the EFI (relogging as necessary) until we can commit or cancel the repair. EFIs for written committed blocks need to go away, but unwritten or uncommitted blocks can be freed like normal. Earlier versions of this patchset directly manipulated the log items, but Dave thought that to be a layering violation. For v27, I've modified the defer ops handling code to be capable of pausing a deferred work item. Log intent items are created as they always have been, but paused items are pushed onto a side list when finishing deferred work items, and pushed back onto the transaction after that. Log intent done item are not created for paused work. The second part adds a "stale" flag to the EFI so that the repair reservation code can dispose of an EFI the normal way, but without the space actually being freed. This has been lightly tested with fstests. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZXEx4wAKCRBKO3ySh0YR phIZAQCeUpGo77FqSuvgbXOcePgdsrKqSrhCYNxXQqbmTnX6BQEA09ir+SHoWKDy cvYZ2AEgllh8zJKJsXYi0YO6Y7qj6gQ= =FuaR -----END PGP SIGNATURE----- Merge tag 'repair-auto-reap-space-reservations-6.8_2023-12-06' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.8-mergeA xfs: reserve disk space for online repairs Online repair fixes metadata structures by writing a new copy out to disk and atomically committing the new structure into the filesystem. For this to work, we need to reserve all the space we're going to need ahead of time so that the atomic commit transaction is as small as possible. We also require the reserved space to be freed if the system goes down, or if we decide not to commit the repair, or if we reserve too much space. To keep the atomic commit transaction as small as possible, we would like to allocate some space and simultaneously schedule automatic reaping of the reserved space, even on log recovery. EFIs are the mechanism to get us there, but we need to use them in a novel manner. Once we allocate the space, we want to hold on to the EFI (relogging as necessary) until we can commit or cancel the repair. EFIs for written committed blocks need to go away, but unwritten or uncommitted blocks can be freed like normal. Earlier versions of this patchset directly manipulated the log items, but Dave thought that to be a layering violation. For v27, I've modified the defer ops handling code to be capable of pausing a deferred work item. Log intent items are created as they always have been, but paused items are pushed onto a side list when finishing deferred work items, and pushed back onto the transaction after that. Log intent done item are not created for paused work. The second part adds a "stale" flag to the EFI so that the repair reservation code can dispose of an EFI the normal way, but without the space actually being freed. This has been lightly tested with fstests. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'repair-auto-reap-space-reservations-6.8_2023-12-06' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: force small EFIs for reaping btree extents xfs: log EFIs for all btree blocks being used to stage a btree xfs: implement block reservation accounting for btrees we're staging xfs: remove unused fields from struct xbtree_ifakeroot xfs: automatic freeing of freshly allocated unwritten space xfs: remove __xfs_free_extent_later xfs: allow pausing of pending deferred work items xfs: don't append work items to logged xfs_defer_pending objects	2023-12-07 14:18:33 +05:30
Chandan Babu R	dec0224bae	xfs: prevent livelocks in xchk_iget [v28.1] Prevent scrub from live locking in xchk_iget if there's a cycle in the inobt by allocating an empty transaction. This has been lightly tested with fstests. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZXEx4wAKCRBKO3ySh0YR phrFAP4mjHlCNpZZlz4xwNMBKypY7Rbm1BY9CJgR8036UYYNPgEAkUgrzbN/LIVc hgvrcfksJ5ogoemkvd2E+GRvr8EMBAg= =9yso -----END PGP SIGNATURE----- Merge tag 'scrub-livelock-prevention-6.8_2023-12-06' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.8-mergeA xfs: prevent livelocks in xchk_iget Prevent scrub from live locking in xchk_iget if there's a cycle in the inobt by allocating an empty transaction. This has been lightly tested with fstests. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'scrub-livelock-prevention-6.8_2023-12-06' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: make xchk_iget safer in the presence of corrupt inode btrees	2023-12-07 14:15:26 +05:30
Chandan Babu R	9f334526ee	xfs: elide defer work ->create_done if no intent [v2] Christoph pointed out that the defer ops machinery doesn't need to call ->create_done if the deferred work item didn't generate a log intent item in the first place. Let's clean that up and save an indirect call in the non-logged xattr update call path. v2: pick up rvb tags This has been lightly tested with fstests. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZXEx4wAKCRBKO3ySh0YR pjCsAP4z+FUZUEtxd2T32hITEsoEdNksZ5MgPJN0P1lfwoUuzgD9H4HQPQIgbsoY 3aJl/vWE9Nat8jMf+BoujAq2Ns3t7AQ= =Nlvg -----END PGP SIGNATURE----- Merge tag 'defer-elide-create-done-6.8_2023-12-06' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.8-mergeA xfs: elide defer work ->create_done if no intent Christoph pointed out that the defer ops machinery doesn't need to call ->create_done if the deferred work item didn't generate a log intent item in the first place. Let's clean that up and save an indirect call in the non-logged xattr update call path. This has been lightly tested with fstests. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'defer-elide-create-done-6.8_2023-12-06' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: elide ->create_done calls for unlogged deferred work xfs: document what LARP means	2023-12-07 14:10:55 +05:30
Chandan Babu R	47c460efc4	xfs: fix realtime geometry integer overflows [v2] While reading through the realtime geometry support code in xfsprogs, I noticed a discrepancy between the sb_rextslog computation used when writing out the superblock during mkfs and the validation code used in xfs_repair. This discrepancy would lead to system failure for a runt rt volume having more than 1 rt block but zero rt extents in length. Most people aren't going to configure a 1M extent size for their 360k rt floppy disk volume, but I did! In the process of studying that code, it occurred to me that there is a second bug in the computation -- the use of highbit32 for a 64-bit value means that the upper 32 bits are not considered in the search for a high bit. This causes the creation of a realtime summary file that is the wrong length. If rextents is a multiple of U32_MAX then this will appear to work fine because highbit32 returns -1 for an input of 0; but for all other cases the rt summary is undersized, leading to failures. Fix the first problem by standardizing the computation with a helper in libxfs; and the second problem by correcting the computation. This will cause any existing rt volumes larger than 2^32 blocks to fail validation but they probably were already crashing anyway. v2: pick up review tags This has been lightly tested with fstests. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZXEx4wAKCRBKO3ySh0YR prfnAQDVz4i7wygrSGFefrlRwum7OcfjnEO1DMbGmtRK70o9LAEA3qX57rLGwevB iltpNB7QGdi5LuCvn2eR608gMBDY6wg= =DszP -----END PGP SIGNATURE----- Merge tag 'fix-rtmount-overflows-6.8_2023-12-06' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.8-mergeA xfs: fix realtime geometry integer overflows While reading through the realtime geometry support code in xfsprogs, I noticed a discrepancy between the sb_rextslog computation used when writing out the superblock during mkfs and the validation code used in xfs_repair. This discrepancy would lead to system failure for a runt rt volume having more than 1 rt block but zero rt extents in length. Most people aren't going to configure a 1M extent size for their 360k rt floppy disk volume, but I did! In the process of studying that code, it occurred to me that there is a second bug in the computation -- the use of highbit32 for a 64-bit value means that the upper 32 bits are not considered in the search for a high bit. This causes the creation of a realtime summary file that is the wrong length. If rextents is a multiple of U32_MAX then this will appear to work fine because highbit32 returns -1 for an input of 0; but for all other cases the rt summary is undersized, leading to failures. Fix the first problem by standardizing the computation with a helper in libxfs; and the second problem by correcting the computation. This will cause any existing rt volumes larger than 2^32 blocks to fail validation but they probably were already crashing anyway. This has been lightly tested with fstests. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'fix-rtmount-overflows-6.8_2023-12-06' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: don't allow overly small or large realtime volumes xfs: fix 32-bit truncation in xfs_compute_rextslog xfs: make rextslog computation consistent with mkfs	2023-12-07 14:03:11 +05:30
Chandan Babu R	34d3866668	xfs: continue removing defer item boilerplate [v2] Now that we've restructured log intent item recovery to reconstruct the incore deferred work state, apply further cleanups to that code to remove boilerplate that is duplicated across all the _item.c files. Having done that, collapse a bunch of trivial helpers to reduce the overall call chain. That enables us to refactor the relog code so that the ->relog_item implementations only have to know how to format the implementation-specific data encoded in an intent item and don't themselves have to handle the log item juggling. v2: pick up rvb tags This has been lightly tested with fstests. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZXEx4wAKCRBKO3ySh0YR pn7tAQD1AmShSQYrSkbrxBJGy7pvT7T/KkaMvV/CDQiGU0N6+wEA2DfX33nmGRC1 g8THbLLBsFzdYPVXyKSXdAEC6zzKYgA= =V5xP -----END PGP SIGNATURE----- Merge tag 'reconstruct-defer-cleanups-6.8_2023-12-06' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.8-mergeA xfs: continue removing defer item boilerplate Now that we've restructured log intent item recovery to reconstruct the incore deferred work state, apply further cleanups to that code to remove boilerplate that is duplicated across all the _item.c files. Having done that, collapse a bunch of trivial helpers to reduce the overall call chain. That enables us to refactor the relog code so that the ->relog_item implementations only have to know how to format the implementation-specific data encoded in an intent item and don't themselves have to handle the log item juggling. This has been lightly tested with fstests. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'reconstruct-defer-cleanups-6.8_2023-12-06' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: move ->iop_relog to struct xfs_defer_op_type xfs: collapse the ->create_done functions xfs: hoist xfs_trans_add_item calls to defer ops functions xfs: clean out XFS_LI_DIRTY setting boilerplate from ->iop_relog xfs: use xfs_defer_create_done for the relogging operation xfs: hoist ->create_intent boilerplate to its callsite xfs: collapse the ->finish_item helpers xfs: hoist intent done flag setting to ->finish_item callsite xfs: don't set XFS_TRANS_HAS_INTENT_DONE when there's no ATTRD log item	2023-12-07 13:58:08 +05:30
Chandan Babu R	6b4ffe97e9	xfs: log intent item recovery should reconstruct defer work state [v3] Long Li reported a KASAN report from a UAF when intent recovery fails: ================================================================== BUG: KASAN: slab-use-after-free in xfs_cui_release+0xb7/0xc0 Read of size 4 at addr ffff888012575e60 by task kworker/u8:3/103 CPU: 3 PID: 103 Comm: kworker/u8:3 Not tainted 6.4.0-rc7-next-20230619-00003-g94543a53f9a4-dirty #166 Workqueue: xfs-cil/sda xlog_cil_push_work Call Trace: <TASK> dump_stack_lvl+0x50/0x70 print_report+0xc2/0x600 kasan_report+0xb6/0xe0 xfs_cui_release+0xb7/0xc0 xfs_cud_item_release+0x3c/0x90 xfs_trans_committed_bulk+0x2d5/0x7f0 xlog_cil_committed+0xaba/0xf20 xlog_cil_push_work+0x1a60/0x2360 process_one_work+0x78e/0x1140 worker_thread+0x58b/0xf60 kthread+0x2cd/0x3c0 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 531: kasan_save_stack+0x22/0x40 kasan_set_track+0x25/0x30 __kasan_slab_alloc+0x55/0x60 kmem_cache_alloc+0x195/0x5f0 xfs_cui_init+0x198/0x1d0 xlog_recover_cui_commit_pass2+0x133/0x5f0 xlog_recover_items_pass2+0x107/0x230 xlog_recover_commit_trans+0x3e7/0x9c0 xlog_recovery_process_trans+0x140/0x1d0 xlog_recover_process_ophdr+0x1a0/0x3d0 xlog_recover_process_data+0x108/0x2d0 xlog_recover_process+0x1f6/0x280 xlog_do_recovery_pass+0x609/0xdb0 xlog_do_log_recovery+0x84/0xe0 xlog_do_recover+0x7d/0x470 xlog_recover+0x25f/0x490 xfs_log_mount+0x2dd/0x6f0 xfs_mountfs+0x11ce/0x1e70 xfs_fs_fill_super+0x10ec/0x1b20 get_tree_bdev+0x3c8/0x730 vfs_get_tree+0x89/0x2c0 path_mount+0xecf/0x1800 do_mount+0xf3/0x110 __x64_sys_mount+0x154/0x1f0 do_syscall_64+0x39/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 531: kasan_save_stack+0x22/0x40 kasan_set_track+0x25/0x30 kasan_save_free_info+0x2b/0x40 __kasan_slab_free+0x114/0x1b0 kmem_cache_free+0xf8/0x510 xfs_cui_item_free+0x95/0xb0 xfs_cui_release+0x86/0xc0 xlog_recover_cancel_intents.isra.0+0xf8/0x210 xlog_recover_finish+0x7e7/0x980 xfs_log_mount_finish+0x2bb/0x4a0 xfs_mountfs+0x14bf/0x1e70 xfs_fs_fill_super+0x10ec/0x1b20 get_tree_bdev+0x3c8/0x730 vfs_get_tree+0x89/0x2c0 path_mount+0xecf/0x1800 do_mount+0xf3/0x110 __x64_sys_mount+0x154/0x1f0 do_syscall_64+0x39/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The buggy address belongs to the object at ffff888012575dc8 which belongs to the cache xfs_cui_item of size 432 The buggy address is located 152 bytes inside of freed 432-byte region [ffff888012575dc8, ffff888012575f78) The buggy address belongs to the physical page: page:ffffea0000495d00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888012576208 pfn:0x12574 head:ffffea0000495d00 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x1fffff80010200(slab\|head\|node=0\|zone=1\|lastcpupid=0x1fffff) page_type: 0xffffffff() raw: 001fffff80010200 ffff888012092f40 ffff888014570150 ffff888014570150 raw: ffff888012576208 00000000001e0010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888012575d00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc ffff888012575d80: fc fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb >ffff888012575e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888012575e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888012575f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc ================================================================== "If process intents fails, intent items left in AIL will be delete from AIL and freed in error handling, even intent items that have been recovered and created done items. After this, uaf will be triggered when done item committed, because at this point the released intent item will be accessed. xlog_recover_finish xlog_cil_push_work ---------------------------- --------------------------- xlog_recover_process_intents xfs_cui_item_recover//cui_refcount == 1 xfs_trans_get_cud xfs_trans_commit <add cud item to cil> xfs_cui_item_recover <error occurred and return> xlog_recover_cancel_intents xfs_cui_release //cui_refcount == 0 xfs_cui_item_free //free cui <release other intent items> xlog_force_shutdown //shutdown <...> <push items in cil> xlog_cil_committed xfs_cud_item_release xfs_cui_release // UAF "Intent log items are created with a reference count of 2, one for the creator, and one for the intent done object. Log recovery explicitly drops the creator reference after it is inserted into the AIL, but it then processes the log item as if it also owns the intent-done reference. "The code in ->iop_recovery should assume that it passes the reference to the done intent, we can remove the intent item from the AIL after creating the done-intent, but if that code fails before creating the done-intent then it needs to release the intent reference by log recovery itself. "That way when we go to cancel the intent, the only intents we find in the AIL are the ones we know have not been processed yet and hence we can safely drop both the creator and the intent done reference from xlog_recover_cancel_intents(). "Hence if we remove the intent from the list of intents that need to be recovered after we have done the initial recovery, we acheive two things: "1. the tail of the log can be moved forward with the commit of the done intent or new intent to continue the operation, and "2. We avoid the problem of trying to determine how many reference counts we need to drop from intent recovery cancelling because we never come across intents we've actually attempted recovery on." Restated: The cause of the UAF is that xlog_recover_cancel_intents thinks that it owns the refcount on any intent item in the AIL, and that it's always safe to release these intent items. This is not true after the recovery function creates an log intent done item and points it at the log intent item because releasing the done item always releases the intent item. The runtime defer ops code avoids all this by tracking both the log intent and the intent done items, and releasing only the intent done item if both have been created. Long Li proposed fixing this by adding state flags, but I have a more comprehensive fix. First, observe that the latter half of the intent _recover functions are nearly open-coded versions of the corresponding _finish_one function that uses an onstack deferred work item to single-step through the item. Second, notice that the recover function is not an exact match because of the odd behavior that unfinished recovered work items are relogged with separate log intent items instead of a single new log intent item, which is what the defer ops machinery does. Dave and I have long suspected that recovery should be reconstructing the defer work state from what's in the recovered intent item. Now we finally have an excuse to refactor the code to do that. This series starts by fixing a resource leak in LARP recovery. We fix the bug that Long Li reported by switching the intent recovery code to construct chains of xfs_defer_pending objects and then using the defer pending objects to track the intent/done item ownership. Finally, we clean up the code to reconstruct the exact incore state, which means we can remove all the opencoded _recover code, which makes maintaining log items much easier. v2: minor changes per review comments v3: pick up more rvb tags, fix build errors This has been lightly tested with fstests. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZXExxgAKCRBKO3ySh0YR pteoAP9mQhZ9tnB7Nj37dfx2BY6vcZXBJYDhUIzfzCh5B0wOSAD+MkTw8TTinlsq HAXuAxf4cjyk5TNl9sXnJ+9L4+bUVQU= =Y57I -----END PGP SIGNATURE----- Merge tag 'reconstruct-defer-work-6.8_2023-12-06' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.8-mergeA xfs: log intent item recovery should reconstruct defer work state Long Li reported a KASAN report from a UAF when intent recovery fails: ================================================================== BUG: KASAN: slab-use-after-free in xfs_cui_release+0xb7/0xc0 Read of size 4 at addr ffff888012575e60 by task kworker/u8:3/103 CPU: 3 PID: 103 Comm: kworker/u8:3 Not tainted 6.4.0-rc7-next-20230619-00003-g94543a53f9a4-dirty #166 Workqueue: xfs-cil/sda xlog_cil_push_work Call Trace: <TASK> dump_stack_lvl+0x50/0x70 print_report+0xc2/0x600 kasan_report+0xb6/0xe0 xfs_cui_release+0xb7/0xc0 xfs_cud_item_release+0x3c/0x90 xfs_trans_committed_bulk+0x2d5/0x7f0 xlog_cil_committed+0xaba/0xf20 xlog_cil_push_work+0x1a60/0x2360 process_one_work+0x78e/0x1140 worker_thread+0x58b/0xf60 kthread+0x2cd/0x3c0 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 531: kasan_save_stack+0x22/0x40 kasan_set_track+0x25/0x30 __kasan_slab_alloc+0x55/0x60 kmem_cache_alloc+0x195/0x5f0 xfs_cui_init+0x198/0x1d0 xlog_recover_cui_commit_pass2+0x133/0x5f0 xlog_recover_items_pass2+0x107/0x230 xlog_recover_commit_trans+0x3e7/0x9c0 xlog_recovery_process_trans+0x140/0x1d0 xlog_recover_process_ophdr+0x1a0/0x3d0 xlog_recover_process_data+0x108/0x2d0 xlog_recover_process+0x1f6/0x280 xlog_do_recovery_pass+0x609/0xdb0 xlog_do_log_recovery+0x84/0xe0 xlog_do_recover+0x7d/0x470 xlog_recover+0x25f/0x490 xfs_log_mount+0x2dd/0x6f0 xfs_mountfs+0x11ce/0x1e70 xfs_fs_fill_super+0x10ec/0x1b20 get_tree_bdev+0x3c8/0x730 vfs_get_tree+0x89/0x2c0 path_mount+0xecf/0x1800 do_mount+0xf3/0x110 __x64_sys_mount+0x154/0x1f0 do_syscall_64+0x39/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 531: kasan_save_stack+0x22/0x40 kasan_set_track+0x25/0x30 kasan_save_free_info+0x2b/0x40 __kasan_slab_free+0x114/0x1b0 kmem_cache_free+0xf8/0x510 xfs_cui_item_free+0x95/0xb0 xfs_cui_release+0x86/0xc0 xlog_recover_cancel_intents.isra.0+0xf8/0x210 xlog_recover_finish+0x7e7/0x980 xfs_log_mount_finish+0x2bb/0x4a0 xfs_mountfs+0x14bf/0x1e70 xfs_fs_fill_super+0x10ec/0x1b20 get_tree_bdev+0x3c8/0x730 vfs_get_tree+0x89/0x2c0 path_mount+0xecf/0x1800 do_mount+0xf3/0x110 __x64_sys_mount+0x154/0x1f0 do_syscall_64+0x39/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The buggy address belongs to the object at ffff888012575dc8 which belongs to the cache xfs_cui_item of size 432 The buggy address is located 152 bytes inside of freed 432-byte region [ffff888012575dc8, ffff888012575f78) The buggy address belongs to the physical page: page:ffffea0000495d00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888012576208 pfn:0x12574 head:ffffea0000495d00 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x1fffff80010200(slab\|head\|node=0\|zone=1\|lastcpupid=0x1fffff) page_type: 0xffffffff() raw: 001fffff80010200 ffff888012092f40 ffff888014570150 ffff888014570150 raw: ffff888012576208 00000000001e0010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888012575d00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc ffff888012575d80: fc fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb >ffff888012575e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888012575e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888012575f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc ================================================================== "If process intents fails, intent items left in AIL will be delete from AIL and freed in error handling, even intent items that have been recovered and created done items. After this, uaf will be triggered when done item committed, because at this point the released intent item will be accessed. xlog_recover_finish xlog_cil_push_work ---------------------------- --------------------------- xlog_recover_process_intents xfs_cui_item_recover//cui_refcount == 1 xfs_trans_get_cud xfs_trans_commit <add cud item to cil> xfs_cui_item_recover <error occurred and return> xlog_recover_cancel_intents xfs_cui_release //cui_refcount == 0 xfs_cui_item_free //free cui <release other intent items> xlog_force_shutdown //shutdown <...> <push items in cil> xlog_cil_committed xfs_cud_item_release xfs_cui_release // UAF "Intent log items are created with a reference count of 2, one for the creator, and one for the intent done object. Log recovery explicitly drops the creator reference after it is inserted into the AIL, but it then processes the log item as if it also owns the intent-done reference. "The code in ->iop_recovery should assume that it passes the reference to the done intent, we can remove the intent item from the AIL after creating the done-intent, but if that code fails before creating the done-intent then it needs to release the intent reference by log recovery itself. "That way when we go to cancel the intent, the only intents we find in the AIL are the ones we know have not been processed yet and hence we can safely drop both the creator and the intent done reference from xlog_recover_cancel_intents(). "Hence if we remove the intent from the list of intents that need to be recovered after we have done the initial recovery, we acheive two things: "1. the tail of the log can be moved forward with the commit of the done intent or new intent to continue the operation, and "2. We avoid the problem of trying to determine how many reference counts we need to drop from intent recovery cancelling because we never come across intents we've actually attempted recovery on." Restated: The cause of the UAF is that xlog_recover_cancel_intents thinks that it owns the refcount on any intent item in the AIL, and that it's always safe to release these intent items. This is not true after the recovery function creates an log intent done item and points it at the log intent item because releasing the done item always releases the intent item. The runtime defer ops code avoids all this by tracking both the log intent and the intent done items, and releasing only the intent done item if both have been created. Long Li proposed fixing this by adding state flags, but I have a more comprehensive fix. First, observe that the latter half of the intent _recover functions are nearly open-coded versions of the corresponding _finish_one function that uses an onstack deferred work item to single-step through the item. Second, notice that the recover function is not an exact match because of the odd behavior that unfinished recovered work items are relogged with separate log intent items instead of a single new log intent item, which is what the defer ops machinery does. Dave and I have long suspected that recovery should be reconstructing the defer work state from what's in the recovered intent item. Now we finally have an excuse to refactor the code to do that. This series starts by fixing a resource leak in LARP recovery. We fix the bug that Long Li reported by switching the intent recovery code to construct chains of xfs_defer_pending objects and then using the defer pending objects to track the intent/done item ownership. Finally, we clean up the code to reconstruct the exact incore state, which means we can remove all the opencoded _recover code, which makes maintaining log items much easier. v2: minor changes per review comments v3: pick up more rvb tags, fix build errors This has been lightly tested with fstests. Enjoy! Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'reconstruct-defer-work-6.8_2023-12-06' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: move ->iop_recover to xfs_defer_op_type xfs: use xfs_defer_finish_one to finish recovered work items xfs: dump the recovered xattri log item if corruption happens xfs: recreate work items when recovering intent items xfs: transfer recovered intent item ownership in ->iop_recover xfs: pass the xfs_defer_pending object to iop_recover xfs: use xfs_defer_pending objects to recover intent items xfs: don't leak recovered attri intent items	2023-12-07 13:50:54 +05:30
Darrick J. Wong	3f3cec0310	xfs: force small EFIs for reaping btree extents Introduce the concept of a defer ops barrier to separate consecutively queued pending work items of the same type. With a barrier in place, the two work items will be tracked separately, and receive separate log intent items. The goal here is to prevent reaping of old metadata blocks from creating unnecessarily huge EFIs that could then run the risk of overflowing the scrub transaction. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:19 -08:00
Darrick J. Wong	6bb9ea8ecd	xfs: log EFIs for all btree blocks being used to stage a btree We need to log EFIs for every extent that we allocate for the purpose of staging a new btree so that if we fail then the blocks will be freed during log recovery. Use the autoreaping mechanism provided by the previous patch to attach paused freeing work to the scrub transaction. We can then mark the EFIs stale if we decide to commit the new btree, or we can unpause the EFIs if we decide to abort the repair. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:19 -08:00
Darrick J. Wong	be40841763	xfs: implement block reservation accounting for btrees we're staging Create a new xrep_newbt structure to encapsulate a fake root for creating a staged btree cursor as well as to track all the blocks that we need to reserve in order to build that btree. As for the particular choice of lowspace thresholds and btree block slack factors -- at this point one could say that the thresholds in online repair come from bulkload_estimate_ag_slack in xfs_repair[1]. But that's not the entire story, since the offline btree rebuilding code in xfs_repair was merged as a retroport of the online btree code in this patchset! Before xfs_btree_staging.[ch] came along, xfs_repair determined the slack factor (aka the number of slots to leave unfilled in each new btree block) via open-coded logic in repair/phase5.c[2]. At that point the slack factors were arbitrary quantities per btree. The rmapbt automatically left 10 slots free; everything else left zero. That had a noticeable effect on performance straight after mounting because adding records to /any/ btree would result in splits. A few years ago when this patch was first written, Dave and I decided that repair should generate btree blocks that were 75% full unless space was tight, in which case it should try to fill the blocks to nearly full. We defined tight as ~10% free to avoid repair failures but settled on 3/32 (~9%) to avoid div64. IOWs, we mostly pulled the thresholds out of thin air. We've been QAing with those geometry numbers ever since. ;) Link: https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/tree/repair/bulkload.c?h=v6.5.0#n114 Link: https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/tree/repair/phase5.c?h=v4.19.0#n1349 Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2023-12-06 18:45:18 -08:00
Darrick J. Wong	4c8ecd1cfd	xfs: remove unused fields from struct xbtree_ifakeroot Remove these unused fields since nobody uses them. They should have been removed years ago in a different cleanup series from Christoph Hellwig. Fixes: `daf83964a3` ("xfs: move the per-fork nextents fields into struct xfs_ifork") Fixes: `f7e67b20ec` ("xfs: move the fork format fields into struct xfs_ifork") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2023-12-06 18:45:18 -08:00
Darrick J. Wong	e3042be36c	xfs: automatic freeing of freshly allocated unwritten space As mentioned in the previous commit, online repair wants to allocate space to write out a new metadata structure, and it also wants to hedge against system crashes during repairs by logging (and later cancelling) EFIs to free the space if we crash before committing the new data structure. Therefore, create a trio of functions to schedule automatic reaping of freshly allocated unwritten space. xfs_alloc_schedule_autoreap creates a paused EFI representing the space we just allocated. Once the allocations are made and the autoreaps scheduled, we can start writing to disk. If the writes succeed, xfs_alloc_cancel_autoreap marks the EFI work items as stale and unpauses the pending deferred work item. Assuming that's done in the same transaction that commits the new structure into the filesystem, we guarantee that either the new object is fully visible, or that all the space gets reclaimed. If the writes succeed but only part of an extent was used, repair must call the same _cancel_autoreap function to kill the first EFI and then log a new EFI to free the unused space. The first EFI is already committed, so it cannot be changed. For full extents that aren't used, xfs_alloc_commit_autoreap will unpause the EFI, which results in the space being freed during the next _defer_finish cycle. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:18 -08:00
Darrick J. Wong	4c88fef3af	xfs: remove __xfs_free_extent_later xfs_free_extent_later is a trivial helper, so remove it to reduce the amount of thinking required to understand the deferred freeing interface. This will make it easier to introduce automatic reaping of speculative allocations in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:18 -08:00
Darrick J. Wong	4dffb2cbb4	xfs: allow pausing of pending deferred work items Traditionally, all pending deferred work attached to a transaction is finished when one of the xfs_defer_finish* functions is called. However, online repair wants to be able to allocate space for a new data structure, format a new metadata structure into the allocated space, and commit that into the filesystem. As a hedge against system crashes during repairs, we also want to log some EFI items for the allocated space speculatively, and cancel them if we elect to commit the new data structure. Therefore, introduce the idea of pausing a pending deferred work item. Log intent items are still created for paused items and relogged as necessary. However, paused items are pushed onto a side list before we start calling ->finish_item, and the whole list is reattach to the transaction afterwards. New work items are never attached to paused pending items. Modify xfs_defer_cancel to clean up pending deferred work items holding a log intent item but not a log intent done item, since that is now possible. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:18 -08:00
Darrick J. Wong	6b12613940	xfs: don't append work items to logged xfs_defer_pending objects When someone tries to add a deferred work item to xfs_defer_add, it will try to attach the work item to the most recently added xfs_defer_pending object attached to the transaction. However, it doesn't check if the pending object has a log intent item attached to it. This is incorrect behavior because we cannot add more work to an object that has already been committed to the ondisk log. Therefore, change the behavior not to append to pending items with a non null dfp_intent. In practice this has not been an issue because the only way xfs_defer_add gets called after log intent items have been committed is from the defer ops ->finish_item functions themselves, and the @dop_pending isolation in xfs_defer_finish_noroll protects the pending items that have already been logged. However, the next patch will add the ability to pause a deferred extent free object during online btree rebuilding, and any new extfree work items need to have their own pending event. While we're at it, hoist the predicate to its own static inline function for readability. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:18 -08:00
Darrick J. Wong	3f113c2739	xfs: make xchk_iget safer in the presence of corrupt inode btrees When scrub is trying to iget an inode, ensure that it won't end up deadlocked on a cycle in the inode btree by using an empty transaction to store all the buffers. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:17 -08:00
Darrick J. Wong	9c07bca793	xfs: elide ->create_done calls for unlogged deferred work Extended attribute updates use the deferred work machinery to manage state across a chain of smaller transactions. All previous deferred work users have employed log intent items and log done items to manage restarting of interrupted operations, which means that ->create_intent sets dfp_intent to a log intent item and ->create_done uses that item to create a log intent done item. However, xattrs have used the INCOMPLETE flag to deal with the lack of recovery support for an interrupted transaction chain. Log items are optional if the xattr update caller didn't set XFS_DA_OP_LOGGED to require a restartable sequence. In other words, ->create_intent can return NULL to say that there's no log intent item. If that's the case, no log intent done item should be created. Clean up xfs_defer_create_done not to do this, so that the ->create_done functions don't have to check for non-null dfp_intent themselves. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:17 -08:00
Darrick J. Wong	e14293803f	xfs: don't allow overly small or large realtime volumes Don't allow realtime volumes that are less than one rt extent long. This has been broken across 4 LTS kernels with nobody noticing, so let's just disable it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:17 -08:00
Darrick J. Wong	a49c708f9a	xfs: move ->iop_relog to struct xfs_defer_op_type The only log items that need relogging are the ones created for deferred work operations, and the only part of the code base that relogs log items is the deferred work machinery. Move the function pointers. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:17 -08:00
Darrick J. Wong	94da54d582	xfs: document what LARP means Christoph requested a blurb somewhere explaining exactly what LARP means. I don't know of a good place other than the source code (debug knobs aren't covered in Documentation/), so here it is. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:17 -08:00
Darrick J. Wong	cf8f0e6c14	xfs: fix 32-bit truncation in xfs_compute_rextslog It's quite reasonable that some customer somewhere will want to configure a realtime volume with more than 2^32 extents. If they try to do this, the highbit32() call will truncate the upper bits of the xfs_rtbxlen_t and produce the wrong value for rextslog. This in turn causes the rsumlevels to be wrong, which results in a realtime summary file that is the wrong length. Fix that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:17 -08:00
Darrick J. Wong	a6a38f309a	xfs: make rextslog computation consistent with mkfs There's a weird discrepancy in xfsprogs dating back to the creation of the Linux port -- if there are zero rt extents, mkfs will set sb_rextents and sb_rextslog both to zero: sbp->sb_rextslog = (uint8_t)(rtextents ? libxfs_highbit32((unsigned int)rtextents) : 0); However, that's not the check that xfs_repair uses for nonzero rtblocks: if (sb->sb_rextslog != libxfs_highbit32((unsigned int)sb->sb_rextents)) The difference here is that xfs_highbit32 returns -1 if its argument is zero. Unfortunately, this means that in the weird corner case of a realtime volume shorter than 1 rt extent, xfs_repair will immediately flag a freshly formatted filesystem as corrupt. Because mkfs has been writing ondisk artifacts like this for decades, we have to accept that as "correct". TBH, zero rextslog for zero rtextents makes more sense to me anyway. Regrettably, the superblock verifier checks created in commit copied xfs_repair even though mkfs has been writing out such filesystems for ages. Fix the superblock verifier to accept what mkfs spits out; the userspace version of this patch will have to fix xfs_repair as well. Note that the new helper leaves the zeroday bug where the upper 32 bits of sb_rextents is ripped off and fed to highbit32. This leads to a seriously undersized rt summary file, which immediately breaks mkfs: $ hugedisk.sh foo /dev/sdc $(( 0x100000080 * 4096))B $ /sbin/mkfs.xfs -f /dev/sda -m rmapbt=0,reflink=0 -r rtdev=/dev/mapper/foo meta-data=/dev/sda isize=512 agcount=4, agsize=1298176 blks = sectsz=512 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=0 = reflink=0 bigtime=1 inobtcount=1 nrext64=1 data = bsize=4096 blocks=5192704, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=16384, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =/dev/mapper/foo extsz=4096 blocks=4294967424, rtextents=4294967424 Discarding blocks...Done. mkfs.xfs: Error initializing the realtime space [117 - Structure needs cleaning] The next patch will drop support for rt volumes with fewer than 1 or more than 2^32-1 rt extents, since they've clearly been broken forever. Fixes: `f8e566c0f5` ("xfs: validate the realtime geometry in xfs_validate_sb_common") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:17 -08:00
Darrick J. Wong	8a9aa763e1	xfs: collapse the ->create_done functions Move the meat of the ->create_done function helpers into ->create_done to reduce the amount of boilerplate. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:16 -08:00

1 2 3 4 5 ...

1234461 Commits