Now that we have dynamically resizable btree paths,
check_directory_structure() can check one path - inode up to the root -
in a single transaction.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
reattach_inode() was broken w.r.t. snapshots - we'd lookup the subvolume
to look up lost+found, but if we're in an interior node snapshot that
didn't make any sense.
Instead, this adds a dirent path for creating in a specific snapshot,
skipping the subvolume; and we also make sure to create lost+found in
the root snapshot, to avoid conflicts with lost+found being created in
overlapping snapshots.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
check_root() is simple enough to run as one single transaction, so is
trivial to run online.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
for_each_btree_key() handles transaction restarts, like
for_each_btree_key2(), but only calls bch2_trans_begin() after a
transaction restart - for_each_btree_key2() wraps every loop iteration
in a transaction.
The for_each_btree_key() behaviour is problematic when it leads to
holding the SRCU lock that prevents key cache reclaim for an unbounded
amount of time - there's no real need to keep it around.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Now we can print out filesystem flags in sysfs, useful for debugging
various "what's my filesystem doing" issues.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When searching the link table for the matching inode, we were searching
for a specific - incorrect - snapshot ID as well, causing us to fail to
find the inode.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This patch adds a superblock error counter for every distinct fsck
error; this means that when analyzing filesystems out in the wild we'll
be able to see what sorts of inconsistencies are being found and repair,
and hence what bugs to look for.
Errors validating bkeys are not yet considered distinct fsck errors, but
this patch adds a new helper, bkey_fsck_err(), in order to add distinct
error types for them as well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Since we can run with unknown btree IDs, we can't directly index btree
IDs into fixed size arrays.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Be a bit more careful about when bch2_delete_dead_snapshots needs to
run: it only needs to run synchronously if we're running fsck, and it
only needs to run at all if we have snapshot nodes to delete or if fsck
has noticed that it needs to run.
Also:
Rename BCH_FS_HAVE_DELETED_SNAPSHOTS -> BCH_FS_NEED_DELETE_DEAD_SNAPSHOTS
Kill bch2_delete_dead_snapshots_hook(), move functionality to
bch2_mark_snapshot()
Factor out bch2_check_snapshot_needs_deletion(), to explicitly check
if we need to be running snapshot deletion.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
These errors aren't actual errors, and should never be printed - do this
in the common helpers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.
But we have to do a heap allocatation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When we handle a transaction restart in a nested context, we need to
return -BCH_ERR_transaction_restart_nested because we invalidated the
outer context's iterators and locks.
bch2_propagate_key_to_snapshot_leaves() wasn't doing this, this patch
fixes it to use trans_was_restarted().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If fsck finds a key that needs work done, the primary example being an
unlinked inode that needs to be deleted, and the key is in an internal
snapshot node, we have a bit of a conundrum.
The conundrum is that internal snapshot nodes are shared, and we in
general do updates in internal snapshot nodes because there may be
overwrites in some snapshots and not others, and this may affect other
keys referenced by this key (i.e. extents).
For example, we might be seeing an unlinked inode in an internal
snapshot node, but then in one child snapshot the inode might have been
reattached and might not be unlinked. Deleting the inode in the internal
snapshot node would be wrong, because then we'll delete all the extents
that the child snapshot references.
But if an unlinked inode does not have any overwrites in child
snapshots, we're fine: the inode is overwrritten in all child snapshots,
so we can do the deletion at the point of comonality in the snapshot
tree, i.e. the node where we found it.
This patch adds a new helper, bch2_propagate_key_to_snapshot_leaves(),
to handle the case where we need a to update a key that does have
overwrites in child snapshots: we copy the key to leaf snapshot nodes,
and then rewind fsck and process the needed updates there.
With this, fsck can now always correctly handle unlinked inodes found in
internal snapshot nodes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
subvolume.c has gotten a bit large, this splits out a separate file just
for managing snapshot trees - BTREE_ID_snapshots.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
A number of smallish fixes for overlapping extent repair, and (part of)
a new unit test. This fixes all the issues turned up by bhzhu203, in his
filesystem image from running mongodb + snapshots.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This introduces bch2_run_explicit_recovery_pass() and uses it for when
fsck detects that we need to re-run dead snaphots cleanup, and makes
dead snapshot cleanup more like a normal recovery pass.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We currently don't track whether snapshot cleanup still needs to finish
(aside from running a full fsck), so it shouldn't be a fsck error yet -
fsck -n after fsck has succesfully completed shouldn't error.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Delete a redundant bch2_snapshot_is_ancestor() check, and convert some
assertions to debug assertions.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Make the overlapping extent check/repair code more self contained.
This is prep work for hopefully reducing key_visible_in_snapshot() usage
here as well, and also includes a nice performance optimization to not
check ref_visible2() unless the extents potentially overlap.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This changes the main part of check_extents(), that checks the extent
against the corresponding inode, to not use key_visible_in_snapshot().
key_visible_in_snapshot() has to iterate over the list of ancestor
overwrites repeatedly calling bch2_snapshot_is_ancestor(), so this is a
significant performance improvement.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
More prep work for reducing key_visible_in_snapshot() usage - this
rearranges how KEY_TYPE_whitout keys are handled, so that they can be
marked off in inode_warker->inode->seen_this_pos.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We only want to synthesize an inode for the current snapshot ID for non
whiteouts - this refactoring lets us call walk_inode() earlier and clean
up some control flow.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Minor refactoring/dead code deletion, prep work for reworking
check_extent() to avoid key_visible_in_snapshot().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>