linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-26 14:12:06 +00:00

Author	SHA1	Message	Date
Darrick J. Wong	1b5453baed	xfs: support recovering bmap intent items targetting realtime extents Now that we have reflink on the realtime device, bmap intent items have to support remapping extents on the realtime volume. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:44:24 -08:00
Darrick J. Wong	7302cda7f8	xfs: add a realtime flag to the bmap update log redo items Extend the bmap update (BUI) log items with a new realtime flag that indicates that the updates apply against a realtime file's data fork. We'll wire up the actual code later. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:44:23 -08:00
Darrick J. Wong	8028411585	xfs: move xfs_bmap_defer_add to xfs_bmap_item.c Move the code that adds the incore xfs_bmap_item deferred work data to a transaction live with the BUI log item code. This means that the file mapping code no longer has to know about the inner workings of the BUI log items. As a consequence, we can hide the _get_group helper. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:44:21 -08:00
Darrick J. Wong	5d3d0a6ad2	xfs: reuse xfs_bmap_update_cancel_item Reuse xfs_bmap_update_cancel_item to put the AG/RTG and free the item in a few places that currently open code the logic. Inspired-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:44:20 -08:00
Darrick J. Wong	de47e4c9ad	xfs: add a bi_entry helper Add a helper to translate from the item list head to the bmap_intent structure and use it so shorten assignments and avoid the need for extra local variables. Inspired-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:44:19 -08:00
Darrick J. Wong	372fe0b8ce	xfs: remove xfs_trans_set_bmap_flags Remove this single-use helper. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2024-02-22 12:44:19 -08:00
Dave Chinner	2c1e31ed5c	xfs: place intent recovery under NOFS allocation context When recovery starts processing intents, all of the initial intent allocations are done outside of transaction contexts. That means they need to specifically use GFP_NOFS as we do not want memory reclaim to attempt to run direct reclaim of filesystem objects while we have lots of objects added into deferred operations. Rather than use GFP_NOFS for these specific allocations, just place the entire intent recovery process under NOFS context and we can then just use GFP_KERNEL for these allocations. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-02-13 18:07:35 +05:30
Dave Chinner	4929257613	xfs: convert kmem_free() for kvmalloc users to kvfree() Start getting rid of kmem_free() by converting all the cases where memory can come from vmalloc interfaces to calling kvfree() directly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2024-02-13 18:07:34 +05:30
Christoph Hellwig	dc22af6436	xfs: pass the defer ops instead of type to xfs_defer_start_recovery xfs_defer_start_recovery is only called from xlog_recover_intent_item, and the callers of that all have the actual xfs_defer_ops_type operation vector at hand. Pass that directly instead of looking it up from the defer_op_types table. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-12-14 11:13:38 +05:30
Christoph Hellwig	7f2f7531e0	xfs: store an ops pointer in struct xfs_defer_pending The dfp_type field in struct xfs_defer_pending is only used to either look up the operations associated with the pending word or in trace points. Replace it with a direct pointer to the operations vector, and store a pretty name in the vector for tracing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>	2023-12-14 11:10:34 +05:30
Darrick J. Wong	a49c708f9a	xfs: move ->iop_relog to struct xfs_defer_op_type The only log items that need relogging are the ones created for deferred work operations, and the only part of the code base that relogs log items is the deferred work machinery. Move the function pointers. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:17 -08:00
Darrick J. Wong	8a9aa763e1	xfs: collapse the ->create_done functions Move the meat of the ->create_done function helpers into ->create_done to reduce the amount of boilerplate. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:16 -08:00
Darrick J. Wong	b28852a5bd	xfs: hoist xfs_trans_add_item calls to defer ops functions Remove even more repeated boilerplate. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:16 -08:00
Darrick J. Wong	3e0958be21	xfs: clean out XFS_LI_DIRTY setting boilerplate from ->iop_relog Hoist this dirty flag setting to the ->iop_relog callsite to reduce boilerplate. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:16 -08:00
Darrick J. Wong	bd3a88f6b7	xfs: use xfs_defer_create_done for the relogging operation Now that we have a helper to handle creating a log intent done item and updating all the necessary state flags, use it to reduce boilerplate in the ->iop_relog implementations. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:16 -08:00
Darrick J. Wong	f3fd7f6fce	xfs: hoist ->create_intent boilerplate to its callsite Hoist the dirty flag setting code out of each ->create_intent implementation up to the callsite to reduce boilerplate further. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:16 -08:00
Darrick J. Wong	e6e5299fcb	xfs: collapse the ->finish_item helpers Each log item's ->finish_item function sets up a small amount of state and calls another function to do the work. Collapse that other function into ->finish_item to reduce the call stack height. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:16 -08:00
Darrick J. Wong	3dd75c8db1	xfs: hoist intent done flag setting to ->finish_item callsite Each log intent item's ->finish_item call chain inevitably includes some code to set the dirty flag of the transaction. If there's an associated log intent done item, it also sets the item's dirty flag and the transaction's INTENT_DONE flag. This is repeated throughout the codebase. Reduce the LOC by moving all that to xfs_defer_finish_one. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:15 -08:00
Darrick J. Wong	db7ccc0bac	xfs: move ->iop_recover to xfs_defer_op_type Finish off the series by moving the intent item recovery function pointer to the xfs_defer_op_type struct, since this is really a deferred work function now. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:15 -08:00
Darrick J. Wong	e5f1a5146e	xfs: use xfs_defer_finish_one to finish recovered work items Get rid of the open-coded calls to xfs_defer_finish_one. This also means that the recovery transaction takes care of cleaning up the dfp, and we have solved (I hope) all the ownership issues in recovery. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:15 -08:00
Darrick J. Wong	e70fb328d5	xfs: recreate work items when recovering intent items Recreate work items for each xfs_defer_pending object when we are recovering intent items. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:15 -08:00
Darrick J. Wong	deb4cd8ba8	xfs: transfer recovered intent item ownership in ->iop_recover Now that we pass the xfs_defer_pending object into the intent item recovery functions, we know exactly when ownership of the sole refcount passes from the recovery context to the intent done item. At that point, we need to null out dfp_intent so that the recovery mechanism won't release it. This should fix the UAF problem reported by Long Li. Note that we still want to recreate the full deferred work state. That will be addressed in the next patches. Fixes: `2e76f188fd` ("xfs: cancel intents immediately if process_intents fails") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:14 -08:00
Darrick J. Wong	a050acdfa8	xfs: pass the xfs_defer_pending object to iop_recover Now that log intent item recovery recreates the xfs_defer_pending state, we should pass that into the ->iop_recover routines so that the intent item can finish the recreation work. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:14 -08:00
Darrick J. Wong	03f7767c9f	xfs: use xfs_defer_pending objects to recover intent items One thing I never quite got around to doing is porting the log intent item recovery code to reconstruct the deferred pending work state. As a result, each intent item open codes xfs_defer_finish_one in its recovery method, because that's what the EFI code did before xfs_defer.c even existed. This is a gross thing to have left unfixed -- if an EFI cannot proceed due to busy extents, we end up creating separate new EFIs for each unfinished work item, which is a change in behavior from what runtime would have done. Worse yet, Long Li pointed out that there's a UAF in the recovery code. The ->commit_pass2 function adds the intent item to the AIL and drops the refcount. The one remaining refcount is now owned by the recovery mechanism (aka the log intent items in the AIL) with the intent of giving the refcount to the intent done item in the ->iop_recover function. However, if something fails later in recovery, xlog_recover_finish will walk the recovered intent items in the AIL and release them. If the CIL hasn't been pushed before that point (which is possible since we don't force the log until later) then the intent done release will try to free its associated intent, which has already been freed. This patch starts to address this mess by having the ->commit_pass2 functions recreate the xfs_defer_pending state. The next few patches will fix the recovery functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2023-12-06 18:45:14 -08:00
Darrick J. Wong	3c919b0910	xfs: reserve less log space when recovering log intent items Wengang Wang reports that a customer's system was running a number of truncate operations on a filesystem with a very small log. Contention on the reserve heads lead to other threads stalling on smaller updates (e.g. mtime updates) long enough to result in the node being rebooted on account of the lack of responsivenes. The node failed to recover because log recovery of an EFI became stuck waiting for a grant of reserve space. From Wengang's report: "For the file deletion, log bytes are reserved basing on xfs_mount->tr_itruncate which is: tr_logres = 175488, tr_logcount = 2, tr_logflags = XFS_TRANS_PERM_LOG_RES, "You see it's a permanent log reservation with two log operations (two transactions in rolling mode). After calculation (xlog_calc_unit_res() adds space for various log headers), the final log space needed per transaction changes from 175488 to 180208 bytes. So the total log space needed is 360416 bytes (180208 * 2). [That quantity] of log space (360416 bytes) needs to be reserved for both run time inode removing (xfs_inactive_truncate()) and EFI recover (xfs_efi_item_recover())." In other words, runtime pre-reserves 360K of space in anticipation of running a chain of two transactions in which each transaction gets a 180K reservation. Now that we've allocated the transaction, we delete the bmap mapping, log an EFI to free the space, and roll the transaction as part of finishing the deferops chain. Rolling creates a new xfs_trans which shares its ticket with the old transaction. Next, xfs_trans_roll calls __xfs_trans_commit with regrant == true, which calls xlog_cil_commit with the same regrant parameter. xlog_cil_commit calls xfs_log_ticket_regrant, which decrements t_cnt and subtracts t_curr_res from the reservation and write heads. If the filesystem is fresh and the first transaction only used (say) 20K, then t_curr_res will be 160K, and we give that much reservation back to the reservation head. Or if the file is really fragmented and the first transaction actually uses 170K, then t_curr_res will be 10K, and that's what we give back to the reservation. Having done that, we're now headed into the second transaction with an EFI and 180K of reservation. Other threads apparently consumed all the reservation for smaller transactions, such as timestamp updates. Now let's say the first transaction gets written to disk and we crash without ever completing the second transaction. Now we remount the fs, log recovery finds the unfinished EFI, and calls xfs_efi_recover to finish the EFI. However, xfs_efi_recover starts a new tr_itruncate tranasction, which asks for 360K log reservation. This is a lot more than the 180K that we had reserved at the time of the crash. If the first EFI to be recovered is also pinning the tail of the log, we will be unable to free any space in the log, and recovery livelocks. Wengang confirmed this: "Now we have the second transaction which has 180208 log bytes reserved too. The second transaction is supposed to process intents including extent freeing. With my hacking patch, I blocked the extent freeing 5 hours. So in that 5 hours, 180208 (NOT 360416) log bytes are reserved. "With my test case, other transactions (update timestamps) then happen. As my hacking patch pins the journal tail, those timestamp-updating transactions finally use up (almost) all the left available log space (in memory in on disk). And finally the on disk (and in memory) available log space goes down near to 180208 bytes. Those 180208 bytes are reserved by [the] second (extent-free) transaction [in the chain]." Wengang and I noticed that EFI recovery starts a transaction, completes one step of the chain, and commits the transaction without completing any other steps of the chain. Those subsequent steps are completed by xlog_finish_defer_ops, which allocates yet another transaction to finish the rest of the chain. That transaction gets the same tr_logres as the head transaction, but with tr_logcount = 1 to force regranting with every roll to avoid livelocks. In other words, we already figured this out in commit `929b92f640` ("xfs: xfs_defer_capture should absorb remaining transaction reservation"), but should have applied that logic to each intent item's recovery function. For Wengang's case, the xfs_trans_alloc call in the EFI recovery function should only be asking for a single transaction's worth of log reservation -- 180K, not 360K. Quoting Wengang again: "With log recovery, during EFI recovery, we use tr_itruncate again to reserve two transactions that needs 360416 log bytes. Reserving 360416 bytes fails [stalls] because we now only have about 180208 available. "Actually during the EFI recover, we only need one transaction to free the extents just like the 2nd transaction at RUNTIME. So it only needs to reserve 180208 rather than 360416 bytes. We have (a bit) more than 180208 available log bytes on disk, so [if we decrease the reservation to 180K] the reservation goes and the recovery [finishes]. That is to say: we can fix the log recover part to fix the issue. We can introduce a new xfs_trans_res xfs_mount->tr_ext_free { tr_logres = 175488, tr_logcount = 0, tr_logflags = 0, } "and use tr_ext_free instead of tr_itruncate in EFI recover." However, I don't think it quite makes sense to create an entirely new transaction reservation type to handle single-stepping during log recovery. Instead, we should copy the transaction reservation information in the xfs_mount, change tr_logcount to 1, and pass that into xfs_trans_alloc. We know this won't risk changing the min log size computation since we always ask for a fraction of the reservation for all known transaction types. This looks like it's been lurking in the codebase since commit `3d3c8b5222`, which changed the xfs_trans_reserve call in xlog_recover_process_efi to use the tr_logcount in tr_itruncate. That changed the EFI recovery transaction from making a non-XFS_TRANS_PERM_LOG_RES request for one transaction's worth of log space to a XFS_TRANS_PERM_LOG_RES request for two transactions worth. Fixes: `3d3c8b5222` ("xfs: refactor xfs_trans_reserve() interface") Complements: `929b92f640` ("xfs: xfs_defer_capture should absorb remaining transaction reservation") Suggested-by: Wengang Wang <wen.gang.wang@oracle.com> Cc: Srikanth C S <srikanth.c.s@oracle.com> [djwong: apply the same transformation to all log intent recovery] Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2023-09-12 10:31:07 -07:00
Darrick J. Wong	d5c88131db	xfs: allow queued AG intents to drain before scrubbing When a writer thread executes a chain of log intent items, the AG header buffer locks will cycle during a transaction roll to get from one intent item to the next in a chain. Although scrub takes all AG header buffer locks, this isn't sufficient to guard against scrub checking an AG while that writer thread is in the middle of finishing a chain because there's no higher level locking primitive guarding allocation groups. When there's a collision, cross-referencing between data structures (e.g. rmapbt and refcountbt) yields false corruption events; if repair is running, this results in incorrect repairs, which is catastrophic. Fix this by adding to the perag structure the count of active intents and make scrub wait until it has both AG header buffer locks and the intent counter reaches zero. One quirk of the drain code is that deferred bmap updates also bump and drop the intent counter. A fundamental decision made during the design phase of the reverse mapping feature is that updates to the rmapbt records are always made by the same code that updates the primary metadata. In other words, callers of bmapi functions expect that the bmapi functions will queue deferred rmap updates. Some parts of the reflink code queue deferred refcount (CUI) and bmap (BUI) updates in the same head transaction, but the deferred work manager completely finishes the CUI before the BUI work is started. As a result, the CUI drops the intent count long before the deferred rmap (RUI) update even has a chance to bump the intent count. The only way to keep the intent count elevated between the CUI and RUI is for the BUI to bump the counter until the RUI has been created. A second quirk of the intent drain code is that deferred work items must increment the intent counter as soon as the work item is added to the transaction. When a BUI completes and queues an RUI, the RUI must increment the counter before the BUI decrements it. The only way to accomplish this is to require that the counter be bumped as soon as the deferred work item is created in memory. In the next patches we'll improve on this facility, but this patch provides the basic functionality. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2023-04-11 18:59:58 -07:00
Darrick J. Wong	774a99b47b	xfs: give xfs_bmap_intent its own perag reference Give the xfs_bmap_intent an active reference to the perag structure data. This reference will be used to enable scrub intent draining functionality in subsequent patches. Later, shrink will use these passive references to know if an AG is quiesced or not. The reason why we take a passive ref for a file mapping operation is simple: we're committing to some sort of action involving space in an AG, so we want to indicate our interest in that AG. The space is already allocated, so we need to be able to operate on AGs that are offline or being shrunk. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2023-04-11 18:59:53 -07:00
Darrick J. Wong	f3ebac4c94	xfs: fix confusing variable names in xfs_bmap_item.c Variable names in this code module are inconsistent and confusing. xfs_map_extent describe file mappings, so rename them "map". xfs_bmap_intents describe block mapping intents, so rename them "bi". Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2023-02-05 08:48:11 -08:00
Darrick J. Wong	ddccb81b26	xfs: pass the xfs_bmbt_irec directly through the log intent code Instead of repeatedly boxing and unboxing the incore extent mapping structure as it passes through the BUI code, pass the pointer directly through. Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2023-02-05 08:48:11 -08:00
Darrick J. Wong	950f0d50ee	xfs: dump corrupt recovered log intent items to dmesg consistently If log recovery decides that an intent item is corrupt and wants to abort the mount, capture a hexdump of the corrupt log item in the kernel log for further analysis. Some of the log item code already did this, so we're fixing the rest to do it consistently. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-10-31 08:58:20 -07:00
Darrick J. Wong	a38ebce1da	xfs: fix memcpy fortify errors in BUI log format copying Starting in 6.1, CONFIG_FORTIFY_SOURCE checks the length parameter of memcpy. Unfortunately, it doesn't handle flex arrays correctly: ------------[ cut here ]------------ memcpy: detected field-spanning write (size 48) of single field "dst_bui_fmt" at fs/xfs/xfs_bmap_item.c:628 (size 16) Fix this by refactoring the xfs_bui_copy_format function to handle the copying of the head and the flex array members separately. While we're at it, fix a minor validation deficiency in the recovery function. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2022-10-31 08:58:19 -07:00
Dave Chinner	3512fc1e84	xfs: whiteouts release intents that are not in the AIL When we release an intent that a whiteout applies to, it will not have been committed to the journal and so won't be in the AIL. Hence when we drop the last reference to the intent, we do not want to try to remove it from the AIL as that will trigger a filesystem shutdown. Hence make the removal of intents from the AIL conditional on them actually being in the AIL so we do the correct thing. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>	2022-05-04 11:46:47 +10:00
Dave Chinner	c23ab603e3	xfs: add log item method to return related intents To apply a whiteout to an intent item when an intent done item is committed, we need to be able to retrieve the intent item from the the intent done item. Add a log item op method for doing this, and wire all the intent done items up to it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2022-05-04 11:46:39 +10:00
Dave Chinner	bb7b1c9c5d	xfs: tag transactions that contain intent done items Intent whiteouts will require extra work to be done during transaction commit if the transaction contains an intent done item. To determine if a transaction contains an intent done item, we want to avoid having to walk all the items in the transaction to check if they are intent done items. Hence when we add an intent done item to a transaction, tag the transaction to indicate that it contains such an item. We don't tag the transaction when the defer ops is relogging an intent to move it forward in the log. Whiteouts will never apply to these cases, so we don't need to bother looking for them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2022-05-04 11:46:21 +10:00
Dave Chinner	f5b81200b6	xfs: add log item flags to indicate intents We currently have a couple of helper functions that try to infer whether the log item is an intent or intent done item from the combinations of operations it supports. This is incredibly fragile and not very efficient as it requires checking specific combinations of ops. We need to be able to identify intent and intent done items quickly and easily in upcoming patches, so simply add intent and intent done type flags to the log item ops flags. These are static flags to begin with, so intent items should have been typed like this from the start. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>	2022-05-04 11:46:09 +10:00
Dave Chinner	c230a4a85b	xfs: fix potential log item leak Ever since we added shadown format buffers to the log items, log items need to handle the item being released with shadow buffers attached. Due to the fact this requirement was added at the same time we added new rmap/reflink intents, we missed the cleanup of those items. In theory, this means shadow buffers can be leaked in a very small window when a shutdown is initiated. Testing with KASAN shows this leak does not happen in practice - we haven't identified a single leak in several years of shutdown testing since ~v4.8 kernels. However, the intent whiteout cleanup mechanism results in every cancelled intent in exactly the same state as this tiny race window creates and so if intents down clean up shadow buffers on final release we will leak the shadow buffer for just about every intent we create. Hence we start with this patch to close this condition off and ensure that when whiteouts start to be used we don't leak lots of memory. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2022-05-04 11:45:11 +10:00
Chandan Babu R	4f86bb4b66	xfs: Conditionally upgrade existing inodes to use large extent counters This commit enables upgrading existing inodes to use large extent counters provided that underlying filesystem's superblock has large extent counter feature enabled. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>	2022-04-13 07:02:44 +00:00
Dave Chinner	d86142dd7c	xfs: log items should have a xlog pointer, not a mount Log items belong to the log, not the xfs_mount. Convert the mount pointer in the log item to a xlog pointer in preparation for upcoming log centric changes to the log items. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2022-03-20 08:59:49 -07:00
Darrick J. Wong	f3c799c22c	xfs: create slab caches for frequently-used deferred items Create slab caches for the high-level structures that coordinate deferred intent items, since they're used fairly heavily. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>	2021-10-22 16:04:36 -07:00
Darrick J. Wong	182696fb02	xfs: rename _zone variables to _cache Now that we've gotten rid of the kmem_zone_t typedef, rename the variables to _cache since that's what they are. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>	2021-10-22 16:04:20 -07:00
Darrick J. Wong	e7720afad0	xfs: remove kmem_zone typedef Remove these typedefs by referencing kmem_cache directly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>	2021-10-22 16:00:31 -07:00
Darrick J. Wong	512edfac85	xfs: port the defer ops capture and continue to resource capture When log recovery tries to recover a transaction that had log intent items attached to it, it has to save certain parts of the transaction state (reservation, dfops chain, inodes with no automatic unlock) so that it can finish single-stepping the recovered transactions before finishing the chains. This is done with the xfs_defer_ops_capture and xfs_defer_ops_continue functions. Right now they open-code this functionality, so let's port this to the formalized resource capture structure that we introduced in the previous patch. This enables us to hold up to two inodes and two buffers during log recovery, the same way we do for regular runtime. With this patch applied, we'll be ready to support atomic extent swap which holds two inodes; and logged xattrs which holds one inode and one xattr leaf buffer. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>	2021-10-14 09:19:31 -07:00
Darrick J. Wong	4bc619833f	xfs: refactor xfs_iget calls from log intent recovery Hoist the code from xfs_bui_item_recover that igets an inode and marks it as being part of log intent recovery. The next patch will want a common function. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>	2021-08-09 15:57:59 -07:00
Darrick J. Wong	43059d5416	xfs: dump log intent items that cannot be recovered due to corruption If we try to recover a log intent item and the operation fails due to filesystem corruption, dump the contents of the item to the log for further analysis. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2021-08-09 11:13:17 -07:00
Sami Tolvanen	4f0f586bf0	treewide: Change list_sort to use const pointers list_sort() internally casts the comparison function passed to it to a different type with constant struct list_head pointers, and uses this pointer to call the functions, which trips indirect call Control-Flow Integrity (CFI) checking. Instead of removing the consts, this change defines the list_cmp_func_t type and changes the comparison function types of all list_sort() callers to use const pointers, thus avoiding type mismatches. Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com	2021-04-08 16:04:22 -07:00
Chandan Babu R	85ef08b5a6	xfs: Check for extent overflow when punching a hole The extent mapping the file offset at which a hole has to be inserted will be split into two extents causing extent count to increase by 1. Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2021-01-22 16:54:47 -08:00
Chandan Babu R	727e1acd29	xfs: Check for extent overflow when trivally adding a new extent When adding a new data extent (without modifying an inode's existing extents) the extent count increases only by 1. This commit checks for extent count overflow in such cases. Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2021-01-22 16:54:47 -08:00
Darrick J. Wong	33005fd0a5	xfs: refactor file range validation Refactor all the open-coded validation of file block ranges into a single helper, and teach the bmap scrubber to check the ranges. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2020-12-09 09:49:38 -08:00
Darrick J. Wong	67457eb0d2	xfs: refactor data device extent validation Refactor all the open-coded validation of non-static data device extents into a single helper. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2020-12-09 09:49:38 -08:00
Darrick J. Wong	67d8679bd3	xfs: improve the code that checks recovered bmap intent items The code that validates recovered bmap intent items is kind of a mess -- it doesn't use the standard xfs type validators, and it doesn't check for things that it should. Fix the validator function to use the standard validation helpers and look for more types of obvious errors. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>	2020-12-09 09:49:38 -08:00

1 2 3

106 Commits