btrfs: add a comment explaining the data flush steps
The data flushing steps are not obvious to people other than myself and
Chris. Write a giant comment explaining the reasoning behind each flush
step for data as well as why it is in that particular order.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
parent 5705674081
commit 1a7a92c8dd
@@ -998,6 +998,53 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
 	} while (flush_state <= COMMIT_TRANS);
 }
 
+/*
+ * FLUSH_DELALLOC_WAIT:
+ * Space is freed from flushing delalloc in one of two ways.
+ *
+ * 1) compression is on and we allocate less space than we reserved
+ * 2) we are overwriting existing space
+ *
+ * For #1 that extra space is reclaimed as soon as the delalloc pages are
+ * COWed, by way of btrfs_add_reserved_bytes() which adds the actual extent
+ * length to ->bytes_reserved, and subtracts the reserved space from
+ * ->bytes_may_use.
+ *
+ * For #2 this is trickier. Once the ordered extent runs we will drop the
+ * extent in the range we are overwriting, which creates a delayed ref for
+ * that freed extent. This however is not reclaimed until the transaction
+ * commits, thus the next stages.
+ *
+ * RUN_DELAYED_IPUTS
+ * If we are freeing inodes, we want to make sure all delayed iputs have
+ * completed, because they could have been on an inode with i_nlink == 0, and
+ * thus have been truncated and freed up space. But again this space is not
+ * immediately re-usable, it comes in the form of a delayed ref, which must be
+ * run and then the transaction must be committed.
+ *
+ * FLUSH_DELAYED_REFS
+ * The above two cases generate delayed refs that will affect
+ * ->total_bytes_pinned. However this counter can be inconsistent with
+ * reality if there are outstanding delayed refs. This is because we adjust
+ * the counter based solely on the current set of delayed refs and disregard
+ * any on-disk state which might include more refs. So for example, if we
+ * have an extent with 2 references, but we only drop 1, we'll see that there
+ * is a negative delayed ref count for the extent and assume that the space
+ * will be freed, and thus increase ->total_bytes_pinned.
+ *
+ * Running the delayed refs gives us the actual real view of what will be
+ * freed at the transaction commit time. This stage will not actually free
+ * space for us, it just makes sure that may_commit_transaction() has all of
+ * the information it needs to make the right decision.
+ *
+ * COMMIT_TRANS
+ * This is where we reclaim all of the pinned space generated by the previous
+ * two stages. We will not commit the transaction if we don't think we're
+ * likely to satisfy our request, which means if our current free space +
+ * total_bytes_pinned < reservation we will not commit. This is why the
+ * previous states are actually important, to make sure we know for sure
+ * whether committing the transaction will allow us to make progress.
+ */
 static const enum btrfs_flush_state data_flush_states[] = {
 	FLUSH_DELALLOC_WAIT,
 	RUN_DELAYED_IPUTS,
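
The accounting described in the comment can be illustrated with a small user-space sketch. It is not kernel code: struct space_sketch and every sketch_* function are hypothetical names made up for this illustration, and "free space" is assumed here to be the total minus ->bytes_may_use and ->bytes_reserved, which is much simpler than what btrfs tracks. It shows only two points from the comment: case #1 returns space as soon as the real extent turns out smaller than the reservation, and COMMIT_TRANS is skipped when free space plus total_bytes_pinned cannot cover the reservation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified counters named after the fields the comment
 * mentions. This is NOT the kernel's struct btrfs_space_info.
 */
struct space_sketch {
	uint64_t total_bytes;        /* assumed: size of the tracked space */
	uint64_t bytes_may_use;      /* reserved up front for delalloc */
	uint64_t bytes_reserved;     /* backed by an actual extent */
	uint64_t total_bytes_pinned; /* expected to be freed by a commit */
};

/* Assumed definition of free space for this sketch only. */
static uint64_t sketch_free_bytes(const struct space_sketch *s)
{
	uint64_t claimed = s->bytes_may_use + s->bytes_reserved;

	return claimed >= s->total_bytes ? 0 : s->total_bytes - claimed;
}

/*
 * Mimics the effect the comment attributes to btrfs_add_reserved_bytes():
 * add the actual extent length to bytes_reserved and subtract the original
 * reservation from bytes_may_use. With compression, extent_len is smaller
 * than reserved, so the difference becomes free again immediately.
 * The caller must ensure reserved <= s->bytes_may_use.
 */
static void sketch_add_reserved_bytes(struct space_sketch *s,
				      uint64_t reserved, uint64_t extent_len)
{
	s->bytes_reserved += extent_len;
	s->bytes_may_use -= reserved;
}

/*
 * The COMMIT_TRANS condition from the comment: do not commit when
 * free space + total_bytes_pinned < reservation.
 */
static bool sketch_commit_would_help(const struct space_sketch *s,
				     uint64_t reservation)
{
	return sketch_free_bytes(s) + s->total_bytes_pinned >= reservation;
}
```

The commit check is only as good as total_bytes_pinned, which is why the comment insists on running the delayed refs first: an overestimated counter would make sketch_commit_would_help() approve a commit that frees less than expected.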