Commit Graph

83 Commits

Author SHA1 Message Date
Kent Overstreet
d5c5b337f8 bcachefs: Don't drop devices with stripe pointers
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21 11:39:49 -04:00
Kent Overstreet
66927b8928 bcachefs: Fix failure to return error in data_update_index_update()
This fixes an assertion pop in io_write.c - if we don't return an error
we're supposed to have completed all the btree updates.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-26 20:33:12 -04:00
Kent Overstreet
8cc0e50614 bcachefs: Fix "trying to move an extent, but nr_replicas=0"
data_update_init() does a bunch of complicated stuff to decide how many
replicas to add, since we only want to increase an extent's durability
on an explicit rereplicate, but extent pointers may be on devices with
different durability settings.

There was a corner case when evacuating a device that had been set to
durability=0 after data had been written to it, and extents on that
device had already been rereplicated - then evacuate only needs to drop
pointers on that device, not move them.

So the assert for !m->op.nr_replicas was spurious; this was a perfectly
legitimate case that needed to be handled.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22 02:07:22 -04:00
Kent Overstreet
3f53d05041 bcachefs: bch2_data_update_init() cleanup
Factor out some helpers - this function has gotten much too big.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22 02:07:22 -04:00
Kent Overstreet
2102bdac67 bcachefs: Extra debug for data move path
We don't have sufficient information to debug:

https://github.com/koverstreet/bcachefs/issues/726

- print out durability of extent ptrs, when non default
- print the number of replicas we need in data_update_to_text()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-19 21:42:08 -04:00
Kent Overstreet
d97de0d017 bcachefs: Make bkey_fsck_err() a wrapper around fsck_err()
bkey_fsck_err() was added as an interface that looks like fsck_err(),
but previously all it did was ensure that the appropriate error counter
was incremented in the superblock.

This is a cleanup and bugfix patch that converts it to a wrapper around
fsck_err(). This is needed to fix an issue with the upgrade path to
disk_accounting_v3, where the "silent fix" error list now includes
bkey_fsck errors; fsck_err() handles this in a unified way, and since we
need to change printing of bkey fsck errors from the caller to the inner
bkey_fsck_err() calls, this ends up being a pretty big change.

Als,, rename .invalid() methods to .validate(), for clarity, while we're
changing the function signature anyways (to drop the printbuf argument).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13 23:00:50 -04:00
Kent Overstreet
f49d2c9835 bcachefs: Warn on attempting a move with no replicas
Instead of popping an assert in bch2_write(), WARN and print out some
debugging info.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-10 09:53:39 -04:00
Kent Overstreet
ad8b68cd39 bcachefs: bch2_data_update_to_text()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-10 09:53:39 -04:00
Kent Overstreet
f2736b9c79 bcachefs: Fix rcu_read_lock() leak in drop_extra_replicas
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-11 18:59:08 -04:00
Kent Overstreet
d8585a79be bcachefs: bch2_dev_have_ref()
bch2_dev_bkey_exists() is going away; bch2_dev_have_ref() documents that
we're looking up a device without checking if it's present because we
have a reference to it already.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:24 -04:00
Kent Overstreet
222eacabc1 bcachefs: kill bch2_dev_bkey_exists() in data_update_init()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:23 -04:00
Kent Overstreet
302c980a81 bcachefs: extent_ptr_durability() -> bch2_dev_rcu()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:23 -04:00
Kent Overstreet
1f2f92ec3f bcachefs: PTR_BUCKET_POS() now takes bch_dev
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:23 -04:00
Kent Overstreet
f295298b8c bcachefs: New helpers for device refcounts
This will be used in the next patch for adding some new debug mode
asserts.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:22 -04:00
Kent Overstreet
9a768ab75b bcachefs: bch2_bkey_drop_ptrs() declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:22 -04:00
Kent Overstreet
ca563dccb2 bcachefs: bch2_trans_unlock() must always be followed by relock() or begin()
We're about to add new asserts for btree_trans locking consistency, and
part of that requires that aren't using the btree_trans while it's
unlocked.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:19 -04:00
Kent Overstreet
2f724563fc bcachefs: member helper cleanups
Some renaming for better consistency

bch2_member_exists	-> bch2_member_alive
bch2_dev_exists		-> bch2_member_exists
bch2_dev_exsits2	-> bch2_dev_exists
bch_dev_locked		-> bch2_dev_locked
bch_dev_bkey_exists	-> bch2_dev_bkey_exists

new helper - bch2_dev_safe

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:19 -04:00
Kent Overstreet
5dd8c60e1e bcachefs: iter/update/trigger/str_hash flag cleanup
Combine iter/update/trigger/str_hash flags into a single enum, and
x-macroize them for a to_text() function later.

These flags are all for a specific iter/key/update context, so it makes
sense to group them together - iter/update/trigger flags were already
given distinct bits, this cleans up and unifies that handling.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:18 -04:00
Kent Overstreet
5957e0a28b bcachefs: Fix rebalance from durability=0 device
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-05 03:05:30 -04:00
Kent Overstreet
c42cd606e4 bcachefs: fix nocow lock deadlock
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-02 01:04:10 -04:00
Kent Overstreet
ec9cc18fc2 bcachefs: Add checks for invalid snapshot IDs
Previously, we assumed that keys were consistent with the snapshots
btree - but that's not correct as fsck may not have been run or may not
be complete.

This adds checks and error handling when using the in-memory snapshots
table (that mirrors the snapshots btree).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:11 -04:00
Kent Overstreet
d7e77f53e9 bcachefs: opts->compression can now also be applied in the background
The "apply this compression method in the background" paths now use the
compression option if background_compression is not set; this means that
setting or changing the compression option will cause existing data to
be compressed accordingly in the background.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet
0beebd9245 bcachefs: bkey_for_each_ptr() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
a7dc10ce68 bcachefs: Make sure allocation failure errors are logged
The previous patch fixed a bug in allocation path error handling, and it
would've been noticed sooner had it been logged properly.

Generally speaking, errors that shouldn't happen in normal operation and
are being returned up the stack should be logged: the write path was
already logging IO errors, but non IO errors were missed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00
Daniel Hill
21e07cc966 bcachefs: remove redundant condition from data_update_index_update
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:39 -05:00
Kent Overstreet
7464403009 bcachefs: count_event()
Small helper for event counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:39 -05:00
Kent Overstreet
25d1e39df0 bcachefs: Add a rebalance, data_update tracepoints
Add a tracepoint for rebalance, printing out
 - the target option
 - the compression option
 - the key being rebalanced

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:38 -05:00
Kent Overstreet
cb52d23e77 bcachefs: Rename BTREE_INSERT flags
BTREE_INSERT flags are actually transaction commit flags - rename them
for clarity.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:37 -05:00
Kent Overstreet
7b474c77da bcachefs: Fix promotes
The recent work to fix data moves w.r.t. durability broke promotes,
because the caused us to bail out when the extent minus pointers being
dropped still has enough pointers to satisfy the current number of
replicas.

Disable this check when we're adding cached replicas.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-26 19:31:11 -05:00
Kent Overstreet
bedd6fe4d3 bcachefs: Fix nocow locks deadlock
On trylock failure we were waiting for outstanding reads to complete -
but nocow locks need to be held until the whole move is finished.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-11 20:43:11 -05:00
Kent Overstreet
131898b0cb bcachefs: Fix bch2_extent_drop_ptrs() call
Also, make bch2_extent_drop_ptrs() safer, so it works with extents and
non-extents iterators.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-04 16:04:55 -05:00
Kent Overstreet
7d9f8468ff bcachefs: Data update path won't accidentaly grow replicas
Previously, there was a bug where if an extent had greater durability
than required (because we needed to move a durability=1 pointer and
ended up putting it on a durability 2 device), we would submit a write
for replicas=2 - the durability of the pointer being rewritten - instead
of the number of replicas required to bring it back up to the
data_replicas option.

This, plus the allocation path sometimes allocating on a greater
durability device than requested, meant that extents could continue
having more and more replicas added as they were being rewritten.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-25 21:48:42 -05:00
Kent Overstreet
701ff57eb3 bcachefs: Check for nonce offset inconsistency in data_update path
We've rarely been seeing a nonce offset inconsistency that doesn't show
up in tests: this adds some extra verification code to the data update
path that prints out more relevant info when it occurs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-13 21:45:03 -05:00
Kent Overstreet
be9e782df3 bcachefs: Don't downgrade locks on transaction restart
We should only be downgrading locks on success - otherwise, our
transaction restarts won't be getting the correct locks and we'll
livelock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01 21:11:08 -04:00
Kent Overstreet
fb3f57bb11 bcachefs: rebalance_work
This adds a new btree, rebalance_work, to eliminate scanning required
for finding extents that need work done on them in the background - i.e.
for the background_target and background_compression options.

rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an
extent in the extents or reflink btree at the same pos.

A new extent field is added, bch_extent_rebalance, which indicates that
this extent has work that needs to be done in the background - and which
options to use. This allows per-inode options to be propagated to
indirect extents - at least in some circumstances. In this patch,
changing IO options on a file will not propagate the new options to
indirect extents pointed to by that file.

Updating (setting/clearing) the rebalance_work btree is done by the
extent trigger, which looks at the bch_extent_rebalance field.

Scanning is still requrired after changing IO path options - either just
for a given inode, or for the whole filesystem. We indicate that
scanning is required by adding a KEY_TYPE_cookie key to the
rebalance_work btree: the cookie counter is so that we can detect that
scanning is still required when an option has been flipped mid-way
through an existing scan.

Future possible work:
 - Propagate options to indirect extents when being changed
 - Add other IO path options - nr_replicas, ec, to rebalance_work so
   they can be applied in the background when they change
 - Add a counter, for bcachefs fs usage output, showing the pending
   amount of rebalance work: we'll probably want to do this after the
   disk space accounting rewrite (moving it to a new btree)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01 21:11:05 -04:00
Kent Overstreet
96a363a7e6 bcachefs: move: move_stats refactoring
data_progress_list is gone - it was redundant with moving_context_list

The upcoming rebalance rewrite is going to have it using two different
move_stats objects with the same moving_context, depending on whether
it's scanning or using the rebalance_work btree - this patch plumbs
stats around a bit differently so that will work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31 12:18:38 -04:00
Kent Overstreet
d5eade9345 bcachefs: move: convert to bbpos
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31 12:18:37 -04:00
Kent Overstreet
633169035a bcachefs: moving_context now owns a btree_trans
btree_trans and moving_context are used together, and having the
moving_context owns the transaction object reduces some plumbing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31 12:18:37 -04:00
Kent Overstreet
6bd68ec266 bcachefs: Heap allocate btree_trans
We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.

But we have to do a heap allocatation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:13 -04:00
Kent Overstreet
96dea3d599 bcachefs: Fix W=12 build errors
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:13 -04:00
Kent Overstreet
1809b8cba7 bcachefs: Break up io.c
More reorganization, this splits up io.c into
 - io_read.c
 - io_misc.c - fallocate, fpunch, truncate
 - io_write.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:12 -04:00
Kent Overstreet
93ee2c4b21 bcachefs: Don't open code closure_nr_remaining()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:10 -04:00
Kent Overstreet
986e9842fb bcachefs: Compression levels
This allows including a compression level when specifying a compression
type, e.g.
  compression=zstd:15

Values from 1 through 15 indicate compression levels, 0 or unspecified
indicates the default.

For LZ4, values 3-15 specify that the HC algorithm should be used.

Note that for compatibility, extents themselves only include the
compression type, not the compression level. This means that specifying
the same compression algorithm but different compression levels for the
compression and background_compression options will have no effect.

XXX: perhaps we could add a warning for this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:07 -04:00
Kent Overstreet
f33c58fc46 bcachefs: Kill BTREE_INSERT_USE_RESERVE
Now that we have journal watermarks and alloc watermarks unified,
BTREE_INSERT_USE_RESERVE is redundant and can be deleted.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:05 -04:00
Kent Overstreet
e53a961c6b bcachefs: Rename enum alloc_reserve -> bch_watermark
This is prep work for consolidating with JOURNAL_WATERMARK.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:04 -04:00
Kent Overstreet
91ecd41b7f bcachefs: bch2_extent_ptr_desired_durability()
This adds a new helper for getting a pointer's durability irrespective
of the device state, and uses it in the the data update path.

This fixes a bug where we do a data update but request 0 replicas to be
allocated, because the replica being rewritten is on a device marked as
failed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:04 -04:00
Kent Overstreet
ad520141b1 bcachefs: Fix corruption with writeable snapshots
When partially overwriting an extent in an older snapshot, the existing
extent has to be split.

If the existing extent was overwritten in a different (sibling)
snapshot, we have to ensure that the split won't be visible in the
sibling snapshot.

data_update.c already has code for this,
bch2_insert_snapshot_writeouts() - we just need to move it into
btree_update_leaf.c and change bch2_trans_update_extent() to use it as
well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:03 -04:00
Kent Overstreet
c26463ce99 bcachefs: Fix move_extent_fail counter
fail counters need to be events, not numbers of sectors - or the
calculations the tests use for determining if we've had too many
slowpath events don't work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:02 -04:00
Kent Overstreet
bcb79a51cb bcachefs: bch2_bkey_get_iter() helpers
Introduce new helpers for a common pattern:

  bch2_trans_iter_init();
  bch2_btree_iter_peek_slot();

 - bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of
   the correct type
 - bch2_bkey_get_val_typed() copies the val out of the btree to a
   (typically stack allocated) variable; it handles the case where the
   value in the btree is smaller than the current version of the type,
   zeroing out the remainder.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:00 -04:00
Kent Overstreet
5a21764db1 bcachefs: Improve move path tracepoints
Move path tracepoints now include the key being moved. Also, add new
tracepoints for the start of move_extent, and evacuate_bucket.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:00 -04:00