This probably can't happen even with a corrupt file system, because we
would have failed much earlier on than here. However there's no reason
we can't just check and bail out as appropriate, so do that and convert
the correctness BUG_ON() to an ASSERT().
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
If we have a duplicate entry for a reloc root then we could have fs
corruption that resulted in a double allocation. Since this shouldn't
happen unless there is corruption, add an ASSERT(ret != -EEXIST) to all
of the callers of __add_reloc_root() to catch any logic mistakes for
developers, otherwise normal error handling will happen for normal
users.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We can already handle errors appropriately from this function, deal with
an error coming from __add_reloc_root appropriately.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
We already handle some errors in this function, and the callers do the
correct error handling, so clean up the rest of the function to do the
appropriate error handling.
There's a little extra work that needs to be done here, as we create the
inode item before we create the orphan item. We could potentially add
the orphan item, but if we failed to create the inode item we would have
to abort the transaction.
Instead add a helper to delete the inode item we created in the case
that we're unable to look up the inode (this would likely be caused by
an ENOMEM), which if it succeeds means we can avoid a transaction abort
in this particular error case.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
These checks are all taken care of for us by the tree checker code:
- the flags don't change or are updated consistently
- the v0 extent item format is invalid and caught in many other places
too
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
We need to validate that a data extent item does not have the
FULL_BACKREF flag set on its flags.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We can already deal with errors appropriately from do_relocation, simply
handle any errors that come from changing the refs at this point
cleanly. We have to abort the transaction if we fail here as we've
modified metadata at this point.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If any of the reference count manipulation stuff fails in replace_path
we need to abort the transaction, as we've modified the blocks already.
We can simply break at this point and everything will be cleaned up.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The search can fail for various reasons, in case of errors there's no
cleanup to be done so we can pass the error to the caller, adjusting for
the case where the key is not found and search slot returns 1.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
If we error out COWing the root node when doing a replace_path then we
simply unlock and free the buffer and return the error.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
A few BUG_ON()'s in replace_path are purely to keep us from making
logical mistakes, so replace them with ASSERT()'s.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We call btrfs_update_root in btrfs_update_reloc_root, which can fail for
all sorts of reasons, including IO errors. Instead of panicing the box
lets return the error, now that all callers properly handle those
errors.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_update_reloc_root will will return errors in the future, so handle
an error properly in prepare_to_merge.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_update_reloc_root will will return errors in the future, so handle
the error properly in insert_dirty_subvol.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This will be able to return errors in the future, so change it to return
an error and handle the errors appropriately.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_update_reloc_root will will return errors in the future, so handle
the error properly in commit_fs_roots.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we fail to setup a root->reloc_root in a different thread that path
will error out, however it still leaves root->reloc_root NULL but would
still appear set up in the transaction. Subsequent calls to
btrfs_record_root_in_transaction would succeed without attempting to
create the reloc root, as the transid has already been updated.
Handle this case by making sure we have a root->reloc_root set after a
btrfs_record_root_in_transaction call so we don't end up dereferencing a
NULL pointer.
Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We do memory allocations here, read blocks from disk, all sorts of
operations that could easily fail at any given point. Instead of
panicing the box, simply return the error back up the chain, all callers
at this point have proper error handling.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
create_reloc_root will return errors in the future, and __add_reloc_root
can return ENOMEM or EEXIST, so handle these errors properly.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
We can create a reloc root when we record the root in the trans, which
can fail for all sorts of different reasons. Propagate this error up
the chain of callers. Future patches will fix the callers of
btrfs_record_root_in_trans() to handle the error.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
record_root_in_trans can currently fail, so handle this failure
properly.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
record_root_in_trans can fail currently, handle this failure properly.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
record_root_in_trans can fail currently, so handle this failure
properly.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_record_root_in_trans will return errors in the future, so handle
the error properly in start_transaction.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_record_root_in_trans will return errors in the future, so handle
the error properly in relocate_tree_block.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_record_root_in_trans will return errors in the future, so handle
the error properly in create_subvol.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_record_root_in_trans will return errors in the future, so handle
the error properly in btrfs_recover_log_trees.
This appears tricky, however we have a reference count on the
destination root, so if this fails we need to continue on in the loop to
make sure the proper cleanup is done.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_record_root_in_trans will return errors in the future, so handle
the error properly in btrfs_delete_subvolume.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_record_root_in_trans will return errors in the future, so handle
the error properly in btrfs_rename.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_record_root_in_trans will return errors in the future, so handle
the error properly in btrfs_rename_exchange.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Generally speaking this shouldn't ever fail, the corresponding fs root
for the reloc root will already be in memory, so we won't get ENOMEM
here.
However if there is no corresponding root for the reloc root then we
could get ENOMEM when we try to allocate it or we could get ENOENT
when we look it up and see that it doesn't exist.
Convert these BUG_ON()'s into ASSERT()'s and add proper error handling
for the case of corruption.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We will record the fs root or the reloc root in the trans in
select_reloc_root. These will actually return errors in the following
patches, so check their return value here and return it up the stack.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have several BUG_ON()'s in select_reloc_root() that can be tripped if
there is an extent tree corruption. Convert these to ASSERT()'s, because
if we hit it during testing it really is bad, or could indicate a
problem with the backref walking code.
However if users hit these problems it generally indicates corruption,
I've hit a few machines in the fleet that trip over these with clearly
corrupted extent trees, so be nice and print out an error message and
return an error instead of bringing the whole box down.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently select_reloc_root() doesn't return an error, but followup
patches will make it possible for it to return an error. We do have
proper error recovery in do_relocation however, so handle the
possibility of select_reloc_root() having an error properly instead of
BUG_ON(!root).
I've also adjusted select_reloc_root() to return ERR_PTR(-ENOENT) if we
don't find a root, instead of NULL, to make the error case easier to
deal with. I've replaced the BUG_ON(!root) with an ASSERT(0) for this
case as it indicates we messed up the backref walking code, but it could
also indicate corruption.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have a couple of BUG_ON()'s in relocate_tree_block() that can be
tripped if we have file system corruption. Convert these to ASSERT()'s
so developers still get yelled at when they break the backref code, but
error out nicely for users so the whole box doesn't go down.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
A few of these are checking for correctness, and won't be triggered by
corrupted file systems, so convert them to ASSERT() instead of BUG_ON()
and add a comment explaining their existence.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Implement readahead_batch_length() to determine the number of bytes in
the current batch of readahead pages and use it in btrfs. Also use the
readahead_pos to get the offset.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are two forward declarations deep in extent_io.h, move them
to the beginning and remove the duplicate one.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch adds an overview how btrfs subpage support works:
- limitations
- behavior
- basic implementation points
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Current set_btree_ioerr() only accepts @page parameter and grabs extent
buffer from page::private. This works fine for sector size == PAGE_SIZE
case, but not for subpage case.
Add an extra parameter, @eb, for callers to pass extent buffer to this
function, so that subpage code can reuse this function.
And also add subpage special handling to update
btrfs_subpage::error_bitmap.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For set_extent_buffer_dirty() to support subpage sized metadata, just
call btrfs_page_set_dirty() to handle both cases.
For clear_extent_buffer_dirty(), it needs to clear the page dirty if and
only if all extent buffers in the page range are no longer dirty.
Also do the same for page error.
This is pretty different from the existing clear_extent_buffer_dirty()
routine, so add a new helper function,
clear_subpage_extent_buffer_dirty() to do this for subpage metadata.
Also since the main part of clearing page dirty code is still the same,
extract that into btree_clear_page_dirty() so that it can be utilized
for both cases.
But there is a special race between set_extent_buffer_dirty() and
clear_extent_buffer_dirty(), where we can clear the page dirty.
[POSSIBLE RACE WINDOW]
For the race window between clear_subpage_extent_buffer_dirty() and
set_extent_buffer_dirty(), due to the fact that we can't call
clear_page_dirty_for_io() under subpage spin lock, we can race like
below:
T1 (eb1 in the same page) | T2 (eb2 in the same page)
-------------------------------+------------------------------
set_extent_buffer_dirty() | clear_extent_buffer_dirty()
|- was_dirty = false; | |- clear_subpagE_extent_buffer_dirty()
| | |- btrfs_clear_and_test_dirty()
| | | Since eb2 is the last dirty page
| | | we got:
| | | last == true;
| | |
|- btrfs_page_set_dirty() | |
| We set the page dirty and | |
| subpage dirty bitmap | |
| | |- if (last)
| | | Since we don't have subpage lock
| | | held, now @last is no longer
| | | correct
| | |- btree_clear_page_dirty()
| | Now PageDirty == false, even if
| | we have dirty_bitmap not zero.
|- ASSERT(PageDirty()); |
^^^^ CRASH
The solution here is to also lock the eb->pages[0] for subpage case of
set_extent_buffer_dirty(), to prevent racing with
clear_extent_buffer_dirty().
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are quite some assert checks on page uptodate in extent buffer
write accessors. They ensure the destination page is already uptodate.
This is fine for regular sector size case, but not for subpage case, as
for subpage we only mark the page uptodate if the page contains no hole
and all its extent buffers are uptodate.
So instead of checking PageUptodate(), for subpage case we check the
uptodate bitmap of btrfs_subpage structure.
To make the check more elegant, introduce a helper,
assert_eb_page_uptodate() to do the check for both subpage and regular
sector size cases.
The following functions are involved:
- write_extent_buffer_chunk_tree_uuid()
- write_extent_buffer_fsid()
- write_extent_buffer()
- memzero_extent_buffer()
- copy_extent_buffer()
- extent_buffer_test_bit()
- extent_buffer_bitmap_set()
- extent_buffer_bitmap_clear()
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In alloc_extent_buffer(), we make sure that the newly allocated page is
never dirty.
This is fine for sector size == PAGE_SIZE case, but for subpage it's
possible that one extent buffer in the page is dirty, thus the whole
page is marked dirty, and could cause false alert.
To support subpage, call btrfs_page_test_dirty() to handle both cases.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a new helper, csum_dirty_subpage_buffers(), to iterate through all
dirty extent buffers in one bvec.
Also extract the code of calculating csum for one extent buffer into
csum_one_extent_buffer(), so that both the existing csum_dirty_buffer()
and the new csum_dirty_subpage_buffers() can reuse the same routine.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For btree_set_page_dirty(), we should also check the extent buffer
sanity for subpage support.
Unlike the regular sector size case, since one page can contain multiple
extent buffers, we need to make sure there is at least one dirty extent
buffer in the page.
So this patch will iterate through the btrfs_subpage::dirty_bitmap
to get the extent buffers, and check if any dirty extent buffer in the page
range has EXTENT_BUFFER_DIRTY and proper refs.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduces the following functions to handle subpage writeback status:
- btrfs_subpage_set_writeback()
- btrfs_subpage_clear_writeback()
- btrfs_subpage_test_writeback()
These helpers can only be called when the range is ensured to be
inside the page.
- btrfs_page_set_writeback()
- btrfs_page_clear_writeback()
- btrfs_page_test_writeback()
These helpers can handle both regular sector size and subpage without
problem.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduce the following functions to handle subpage dirty status:
- btrfs_subpage_set_dirty()
- btrfs_subpage_clear_dirty()
- btrfs_subpage_test_dirty()
These helpers can only be called when the range is ensured to be
inside the page.
- btrfs_page_set_dirty()
- btrfs_page_clear_dirty()
- btrfs_page_test_dirty()
These helpers can handle both regular sector size and subpage without
problem.
Thus they would be used to replace PageDirty() related calls in
later patches.
There is one special point to note here, just like set_page_dirty() and
clear_page_dirty_for_io(), btrfs_*page_set_dirty() and
btrfs_*page_clear_dirty() must be called with page locked.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In btrfs_invalidatepage() we re-declare @tree variable as
btrfs_ordered_inode_tree.
Since it's only used to do the spinlock, we can grab it from inode
directly, and remove the unnecessary declaration completely.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In btrfs_invalidatepage() we introduce a temporary variable, new_len, to
update ordered->truncated_len. But we can use min() to replace it
completely and no need for the variable.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Export supported sector sizes in /sys/fs/btrfs/features/supported_sectorsizes.
Currently all architectures have PAGE_SIZE, There's some disparity
between read-only and read-write support but that will be unified in the
future so there's only one file exporting the size.
The read-only support for systems with 64K pages also works for 4K
sector size.
This new sysfs interface would help eg. mkfs.btrfs to print more
accurate warnings about potentially incompatible option combinations.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>