The ! was obviously intended to be ~. As it is, this function does
the equivalent to: "addr[bit / 64] = 0;".
Fixes: 27fcec6c27 ("bcachefs: Clear recovery_passes_required as they complete without errors")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
sysfs is limited to PAGE_SIZE, and when we're debugging strange
deadlocks/priority inversions we need to see the full list.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Snapshot table accesses generally need to be checking for invalid
snapshot ID now, fix one that was missed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If an inode is missing, but corresponding extents and dirent still
exist, it's well worth recreating it - this does so.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In backpointer repair, if we get a missing backpointer - but there's
already a backpointer that points to an existing extent - we've got
multiple extents that point to the same space and need to decide which
to keep.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When the snapshots btree is going, we'll have to delete huge amounts of
data - unless we can reconstruct it by looking at the keys that refer to
it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With the new btree node scan code, we can now recover from corrupt btree
roots - simply create a new fake root at depth 1, and then insert all
the leaves we found.
If the root wasn't corrupt but there's corruption elsewhere in the
btree, we can fill in holes as needed with the newest version of a given
node(s) from the scan; we also check if a given btree node is older than
what we found from the scan.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If a btree root or interior btree node goes bad, we're going to lose a
lot of data, unless we can recover the nodes that it pointed to by
scanning.
Fortunately btree node headers are fully self describing, and
additionally the magic number is xored with the filesytem UUID, so we
can do so safely.
This implements the scanning - next patch will rework topology repair to
make use of the found nodes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Pull out eytzinger.c and kill eytzinger_cmp_fn. We now provide
eytzinger0_sort and eytzinger0_sort_r, which use the standard cmp_func_t
and cmp_r_func_t callbacks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In the discard worker, we were failing to validate the bucket state -
meaning a corrupt needs_discard btree could cause us to discard a bucket
that we shouldn't.
If check_alloc_info hasn't run yet we just want to bail out, otherwise
it's a filesystem inconsistent error.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
mean_and_variance_test_2 and mean_and_variance_test_4 always fail.
The input parameters to those tests are identical to the input parameters
to tests 1 and 3, yet the expected result for tests 2 and 4 is different
for the mean and stddev tests. That will always fail.
Expected mean_and_variance_get_mean(mv) == mean[i], but
mean_and_variance_get_mean(mv) == 22 (0x16)
mean[i] == 10 (0xa)
Drop the bad tests.
Fixes: 65bc410907 ("mean and variance: More tests")
Closes: https://lore.kernel.org/lkml/065b94eb-6a24-4248-b7d7-d3212efb4787@roeck-us.net/
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds a new watermark, higher priority than BCH_WATERMARK_reclaim,
for interior btree updates. We've seen a deadlock where journal replay
triggers a ton of btree node merges, and these use up all available open
buckets and then interior updates get stuck.
One cause of this is that we're currently lacking btree node merging on
write buffer btrees - that needs to be fixed as well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
overlapping extent repair was colliding with extent past end of inode
checks - don't update "extent ends at" until we know we have an extent.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If something is wrong with a logged op, we just want to delete it -
there's nothing to repair.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds opts.recovery_pass_limit, and redoes -o norecovery to make use
of it; this fixes some issues with -o norecovery so it can be safely
used for data recovery.
Norecovery means "don't do journal replay"; it's an important data
recovery tool when we're getting stuck in journal replay.
When using it this way we need to make sure we don't free journal keys
after startup, so we continue to overlay them: thus it needs to imply
retain_recovery_info, as well as nochanges.
recovery_pass_limit is an explicit option for telling recovery to exit
after a specific recovery pass; this is a much cleaner way of
implementing -o norecovery, as well as being a useful debug feature in
its own right.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Flag that we need to run a recovery pass and run it - persistenly, so if
we crash it'll still get run.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This makes bch_sb_field_ext more consistent with the rest of -o
nochanges - we don't want to be varying other codepaths based on -o
nochanges, since it's used for testing in dry run mode; also fixes some
potential null ptr derefs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Finishing logged ops requires the filesystem to be in a reasonably
consistent state - and other fsck passes don't require it to have
completed, so just run it last.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We've grown a fair amount of code for managing recovery passes; tracking
which ones we're running, which ones need to be run, and flagging in the
superblock which ones need to be run on the next recovery.
So it's worth splitting out into its own file, this code is pretty
different from the code in recovery.c.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When we haven't yet allocated any btree nodes for a given btree, we
first need to call the regular split path to allocate one.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Remove some duplication, and inconsistency between check_fix_ptrs and
the main ptr marking paths
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When dropping keys now outside a now because we're changing the node
min/max, we need to redo the node's accounting as well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
when writing file with direct_IO on bcachefs, then performance is
much lower than other fs due to write back throttle in block layer:
wbt_wait+1
__rq_qos_throttle+32
blk_mq_submit_bio+394
submit_bio_noacct_nocheck+649
bch2_submit_wbio_replicas+538
__bch2_write+2539
bch2_direct_write+1663
bch2_write_iter+318
aio_write+355
io_submit_one+1224
__x64_sys_io_submit+169
do_syscall_64+134
entry_SYSCALL_64_after_hwframe+110
add set REQ_SYNC and REQ_IDLE in bio->bi_opf as standard dirct-io
Signed-off-by: zhuxiaohui <zhuxiaohui.400@bytedance.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Consolidate bch2_gc_check_topology() and btree_node_interior_verify(),
and replace them with an improved version,
bch2_btree_node_check_topology().
This checks that children of an interior node correctly span the full
range of the parent node with no overlaps.
Also, ensure that topology repairs at runtime are always a fatal error;
in particular, this adds a check in btree_iter_down() - if we don't find
a key while walking down the btree that's indicative of a topology error
and should be flagged as such, not a null ptr deref.
Some checks in btree_update_interior.c remaining BUG_ONS(), because we
already checked the node for topology errors when starting the update,
and the assertions indicate that we _just_ corrupted the btree node -
i.e. the problem can't be that existing on disk corruption, they
indicate an actual algorithmic bug.
In the future, we'll be annotating the fsck errors list with which
recovery pass corrects them; the open coded "run explicit recovery pass
or fatal error" in bch2_btree_node_check_topology() will in the future
be done for every fsck_err() call.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>