Commit Graph

98 Commits

Author SHA1 Message Date
Kent Overstreet
cb6055e66f bcachefs: Handle race between stripe reuse, invalidate_stripe_to_dev
When creating a new stripe, we may reuse an existing stripe that has
some empty and some nonempty blocks.

Generally, the existing stripe won't change underneath us - except for
block sector counts, which we copy to the new key in
ec_stripe_key_update.

But the device removal path can now invalidate stripe pointers to a
device, and that can race with stripe reuse.

Change ec_stripe_key_update() to check for and resolve this
inconsistency.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-13 22:03:03 -04:00
Kent Overstreet
d5c5b337f8 bcachefs: Don't drop devices with stripe pointers
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21 11:39:49 -04:00
Julian Sun
26c0900d85 bcachefs: remove the unused parameter in macro bkey_crc_next
In the macro definition of bkey_crc_next, five parameters
were accepted, but only four of them were used. Let's remove
the unused one.

The patch has only passed compilation tests, but it should be fine.

Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 09:41:48 -04:00
Kent Overstreet
df88febc20 bcachefs: Simplify bch2_bkey_drop_ptrs()
bch2_bkey_drop_ptrs() had a some complicated machinery for avoiding
O(n^2) when dropping multiple pointers - but when n is only going to be
~4, it's not worth it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 09:41:46 -04:00
Kent Overstreet
49aa783039 bcachefs: Fix rebalance_work accounting
rebalance_work was keying off of the presence of rebelance_opts in the
extent - but that was incorrect, we keep those around after rebalance
for indirect extents since the inode's options are not directly
available

Fixes: 20ac515a9c ("bcachefs: bch_acct_rebalance_work")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-24 10:16:21 -04:00
Kent Overstreet
d97de0d017 bcachefs: Make bkey_fsck_err() a wrapper around fsck_err()
bkey_fsck_err() was added as an interface that looks like fsck_err(),
but previously all it did was ensure that the appropriate error counter
was incremented in the superblock.

This is a cleanup and bugfix patch that converts it to a wrapper around
fsck_err(). This is needed to fix an issue with the upgrade path to
disk_accounting_v3, where the "silent fix" error list now includes
bkey_fsck errors; fsck_err() handles this in a unified way, and since we
need to change printing of bkey fsck errors from the caller to the inner
bkey_fsck_err() calls, this ends up being a pretty big change.

Als,, rename .invalid() methods to .validate(), for clarity, while we're
changing the function signature anyways (to drop the printbuf argument).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13 23:00:50 -04:00
Kent Overstreet
a2cb8a6236 bcachefs: Self healing on read IO error
This repurposes the promote path, which already knows how to call
data_update() after a read: we now automatically rewrite bad data when
we get a read error and then successfully retry from a different
replica.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14 19:00:16 -04:00
Kent Overstreet
9d9d212e26 bcachefs: bch2_extent_crc_unpacked_to_text()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14 19:00:16 -04:00
Kent Overstreet
65eaf4e24a bcachefs: s/bkey_invalid_flags/bch_validate_flags
We're about to start using bch_validate_flags for superblock section
validation - it's no longer bkey specific.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09 16:23:36 -04:00
Kent Overstreet
9a768ab75b bcachefs: bch2_bkey_drop_ptrs() declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:22 -04:00
Kent Overstreet
4409b8081d bcachefs: Repair pass for scanning for btree nodes
If a btree root or interior btree node goes bad, we're going to lose a
lot of data, unless we can recover the nodes that it pointed to by
scanning.

Fortunately btree node headers are fully self describing, and
additionally the magic number is xored with the filesytem UUID, so we
can do so safely.

This implements the scanning - next patch will rework topology repair to
make use of the found nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03 14:44:18 -04:00
Kent Overstreet
47d2080e30 bcachefs: Kill bch2_bkey_ptr_data_type()
Remove some duplication, and inconsistency between check_fix_ptrs and
the main ptr marking paths

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:11 -04:00
Kent Overstreet
264b501f8f bcachefs: Avoid extent entry type assertions in .invalid()
After keys have passed bkey_ops.key_invalid we should never see invalid
extent entry types - but .key_invalid itself needs to cope with them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17 20:53:11 -04:00
Kent Overstreet
88005d5dfb bcachefs: extent_entry_next_safe()
We need to be able to iterate over extent ptrs that may be corrupted in
order to print them - this fixes a bug where we'd pop an assert in
bch2_bkey_durability_safe().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10 15:12:13 -04:00
Kent Overstreet
d7e77f53e9 bcachefs: opts->compression can now also be applied in the background
The "apply this compression method in the background" paths now use the
compression option if background_compression is not set; this means that
setting or changing the compression option will cause existing data to
be compressed accordingly in the background.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
Kent Overstreet
f0431c5f47 bcachefs: Combine .trans_trigger, .atomic_trigger
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:20 -05:00
Kent Overstreet
4f9ec59f8f bcachefs: unify extent trigger
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:20 -05:00
Kent Overstreet
6cacd0c414 bcachefs: unify reservation trigger
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:20 -05:00
Kent Overstreet
0beebd9245 bcachefs: bkey_for_each_ptr() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
037a2d9f48 bcachefs: simplify bch_devs_list
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
b65db750e2 bcachefs: Enumerate fsck errors
This patch adds a superblock error counter for every distinct fsck
error; this means that when analyzing filesystems out in the wild we'll
be able to see what sorts of inconsistencies are being found and repair,
and hence what bugs to look for.

Errors validating bkeys are not yet considered distinct fsck errors, but
this patch adds a new helper, bkey_fsck_err(), in order to add distinct
error types for them as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01 21:11:08 -04:00
Kent Overstreet
fb3f57bb11 bcachefs: rebalance_work
This adds a new btree, rebalance_work, to eliminate scanning required
for finding extents that need work done on them in the background - i.e.
for the background_target and background_compression options.

rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an
extent in the extents or reflink btree at the same pos.

A new extent field is added, bch_extent_rebalance, which indicates that
this extent has work that needs to be done in the background - and which
options to use. This allows per-inode options to be propagated to
indirect extents - at least in some circumstances. In this patch,
changing IO options on a file will not propagate the new options to
indirect extents pointed to by that file.

Updating (setting/clearing) the rebalance_work btree is done by the
extent trigger, which looks at the bch_extent_rebalance field.

Scanning is still requrired after changing IO path options - either just
for a given inode, or for the whole filesystem. We indicate that
scanning is required by adding a KEY_TYPE_cookie key to the
rebalance_work btree: the cookie counter is so that we can detect that
scanning is still required when an option has been flipped mid-way
through an existing scan.

Future possible work:
 - Propagate options to indirect extents when being changed
 - Add other IO path options - nr_replicas, ec, to rebalance_work so
   they can be applied in the background when they change
 - Add a counter, for bcachefs fs usage output, showing the pending
   amount of rebalance work: we'll probably want to do this after the
   disk space accounting rewrite (moving it to a new btree)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01 21:11:05 -04:00
Kent Overstreet
9db2f86060 bcachefs: Check for too-large encoded extents
We don't yet repair (split) them, just check.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31 12:18:37 -04:00
Kent Overstreet
e38356d65e bcachefs: Kill dead code extent_save()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31 12:18:37 -04:00
Kees Cook
7413ab70cb bcachefs: Refactor memcpy into direct assignment
The memcpy() in bch2_bkey_append_ptr() is operating on an embedded fake
flexible array which looks to the compiler like it has 0 size. This
causes W=1 builds to emit warnings due to -Wstringop-overflow:

   In file included from include/linux/string.h:254,
                    from include/linux/bitmap.h:11,
                    from include/linux/cpumask.h:12,
                    from include/linux/smp.h:13,
                    from include/linux/lockdep.h:14,
                    from include/linux/radix-tree.h:14,
                    from include/linux/backing-dev-defs.h:6,
                    from fs/bcachefs/bcachefs.h:182:
   fs/bcachefs/extents.c: In function 'bch2_bkey_append_ptr':
   include/linux/fortify-string.h:57:33: warning: writing 8 bytes into a region of size 0 [-Wstringop-overflow=]
      57 | #define __underlying_memcpy     __builtin_memcpy
         |                                 ^
   include/linux/fortify-string.h:648:9: note: in expansion of macro '__underlying_memcpy'
     648 |         __underlying_##op(p, q, __fortify_size);                        \
         |         ^~~~~~~~~~~~~
   include/linux/fortify-string.h:693:26: note: in expansion of macro '__fortify_memcpy_chk'
     693 | #define memcpy(p, q, s)  __fortify_memcpy_chk(p, q, s,                  \
         |                          ^~~~~~~~~~~~~~~~~~~~
   fs/bcachefs/extents.c:235:17: note: in expansion of macro 'memcpy'
     235 |                 memcpy((void *) &k->v + bkey_val_bytes(&k->k),
         |                 ^~~~~~
   fs/bcachefs/bcachefs_format.h:287:33: note: destination object 'v' of size 0
     287 |                 struct bch_val  v;
         |                                 ^

Avoid making any structure changes and just replace the u64 copy into a
direct assignment, side-stepping the entire problem.

Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Brian Foster <bfoster@redhat.com>
Cc: linux-bcachefs@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202309192314.VBsjiIm5-lkp@intel.com/
Link: https://lore.kernel.org/r/20231010235609.work.594-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:16 -04:00
Kent Overstreet
be47e0ba4f bcachefs: KEY_TYPE_error now counts towards i_sectors
KEY_TYPE_error is used when all replicas in an extent are marked as
failed; it indicates that data was present, but has been lost.

So that i_sectors doesn't change when replacing extents with
KEY_TYPE_error, we now have to count error keys as allocations - this
fixes fsck errors later.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:16 -04:00
Kent Overstreet
88d39fd544 bcachefs: Switch to unsafe_memcpy() in a few places
The new fortify checking doesn't work for us in all places; this
switches to unsafe_memcpy() where appropriate to silence a few
warnings/errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:16 -04:00
Kent Overstreet
8726dc936f bcachefs: Change check for invalid key types
As part of the forward compatibility patch series, we need to allow for
new key types without complaining loudly when running an old version.

This patch changes the flags parameter of bkey_invalid to an enum, and
adds a new flag to indicate we're being called from the transaction
commit path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00
Kent Overstreet
73bd774d28 bcachefs: Assorted sparse fixes
- endianness fixes
 - mark some things static
 - fix a few __percpu annotations
 - fix silent enum conversions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:06 -04:00
Kent Overstreet
2766876d5d bcachefs: struct bch_extent_rebalance
This adds the extent entry for extents that rebalance needs to do
something with.

We're adding this ahead of the main rebalance_work patchset, because
adding new extent entries can't be done in a forwards-compatible way.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:05 -04:00
Kent Overstreet
91ecd41b7f bcachefs: bch2_extent_ptr_desired_durability()
This adds a new helper for getting a pointer's durability irrespective
of the device state, and uses it in the the data update path.

This fixes a bug where we do a data update but request 0 replicas to be
allocated, because the replica being rewritten is on a device marked as
failed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:04 -04:00
Kent Overstreet
174f930b8e bcachefs: bkey_ops.min_val_size
This adds a new field to bkey_ops for the minimum size of the value,
which standardizes that check and also enforces the new rule (previously
done somewhat ad-hoc) that we can extend value types by adding new
fields on to the end.

To make that work we do _not_ initialize min_val_size with sizeof,
instead we initialize it to the size of the first version of those
values.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:10:00 -04:00
Kent Overstreet
702ffea204 bcachefs: Extent helper improvements
- __bch2_bkey_drop_ptr() -> bch2_bkey_drop_ptr_noerror(), now available
   outside extents.

 - Split bch2_bkey_has_device() and bch2_bkey_has_device_c(), const and
   non const versions

 - bch2_extent_has_ptr() now returns the pointer it found

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:56 -04:00
Kent Overstreet
ac2ccddc26 bcachefs: Drop some anonymous structs, unions
Rust bindgen doesn't cope well with anonymous structs and unions. This
patch drops the fancy anonymous structs & unions in bkey_i that let us
use the same helpers for bkey_i and bkey_packed; since bkey_packed is an
internal type that's never exposed to outside code, it's only a minor
inconvenienc.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:55 -04:00
Kent Overstreet
64784ade4f bcachefs: Fix buffer overrun in ec_stripe_update_extent()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:54 -04:00
Kent Overstreet
c9163bb03b bcachefs: Cached pointers should not be erasure coded
There's no reason to erasure code cached pointers: we'll always have
another copy, and it'll be cheaper to read the other copy than do a
reconstruct read. And erasure coded cached pointers would add
complications that we'd rather not have to deal with, so let's make sure
to disallow them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:54 -04:00
Kent Overstreet
facafdcbc1 bcachefs: Change bkey_invalid() rw param to flags
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:52 -04:00
Kent Overstreet
a8b3a677e7 bcachefs: Nocow support
This adds support for nocow mode, where we do writes in-place when
possible. Patch components:

 - New boolean filesystem and inode option, nocow: note that when nocow
   is enabled, data checksumming and compression are implicitly disabled

 - To prevent in-place writes from racing with data moves
   (data_update.c) or bucket reuse (i.e. a bucket being reused and
   re-allocated while a nocow write is in flight, we have a new locking
   mechanism.

   Buckets can be locked for either data update or data move, using a
   fixed size hash table of two_state_shared locks. We don't have any
   chaining, meaning updates and moves to different buckets that hash to
   the same lock will wait unnecessarily - we'll want to watch for this
   becoming an issue.

 - The allocator path also needs to check for in-place writes in flight
   to a given bucket before giving it out: thus we add another counter
   to bucket_alloc_state so we can track this.

 - Fsync now may need to issue cache flushes to block devices instead of
   flushing the journal. We add a device bitmask to bch_inode_info,
   ei_devs_need_flush, which tracks devices that need to have flushes
   issued - note that this will lead to unnecessary flushes when other
   codepaths have already issued flushes, we may want to replace this with
   a sequence number.

 - New nocow write path: look up extents, and if they're writable write
   to them - otherwise fall back to the normal COW write path.

XXX: switch to sequence numbers instead of bitmask for devs needing
journal flush

XXX: ei_quota_lock being a mutex means bch2_nocow_write_done() needs to
run in process context - see if we can improve this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:51 -04:00
Kent Overstreet
792031116b bcachefs: Unwritten extents support
- bch2_extent_merge checks unwritten bit
 - read path returns 0s for unwritten extents without actually reading
 - reflink path skips over unwritten extents
 - bch2_bkey_ptrs_invalid() checks for extents with both written and
   unwritten extents, and non-normal extents (stripes, btree ptrs) with
   unwritten ptrs
 - fiemap checks for unwritten extents and returns
   FIEMAP_EXTENT_UNWRITTEN

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:51 -04:00
Kent Overstreet
393a1f6863 bcachefs: Better inlining in core write path
Provide inline versions of some allocation functions
 - bch2_alloc_sectors_done_inlined()
 - bch2_alloc_sectors_append_ptrs_inlined()

and use them in the core IO path.

Also, inline bch2_extent_update_i_size_sectors() and
bch2_bkey_append_ptr().

In the core write path, function call overhead matters - every function
call is a jump to a new location and a potential cache miss.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:49 -04:00
Kent Overstreet
e88a75ebe8 bcachefs: New bpos_cmp(), bkey_cmp() replacements
This patch introduces
 - bpos_eq()
 - bpos_lt()
 - bpos_le()
 - bpos_gt()
 - bpos_ge()

and equivalent replacements for bkey_cmp().

Looking at the generated assembly these could probably be improved
further, but we already see a significant code size improvement with
this patch.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:47 -04:00
Kent Overstreet
a101957649 bcachefs: More style fixes
Fixes for various checkpatch errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:45 -04:00
Kent Overstreet
7f5c5d20f0 bcachefs: Redo data_update interface
This patch significantly cleans up and simplifies the data_update
interface. Instead of only being able to specify a single pointer by
device to rewrite, we're now able to specify any or all of the pointers
in the original extent to be rewrited, as a bitmask.

data_cmd is no more: the various pred functions now just return true if
the extent should be moved/updated. All the data_update path does is
rewrite existing replicas, or add new ones.

This fixes a bug where with background compression on replicated
filesystems, where rebalance -> data_update would incorrectly drop the
wrong old replica, and keep trying to recompress an extent pointer and
each time failing to drop the right replica. Oops.

Now, the data update path doesn't look at the io options to decide which
pointers to keep and which to drop - it only goes off of the
data_update_options passed to it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
275c8426fb bcachefs: Add rw to .key_invalid()
This adds a new parameter to .key_invalid() methods for whether the key
is being read or written; the idea being that methods can do more
aggressive checks when a key is newly created and being written, when we
wouldn't want to delete the key because of those checks.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:30 -04:00
Kent Overstreet
f0ac7df23d bcachefs: Convert .key_invalid methods to printbufs
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:30 -04:00
Kent Overstreet
880e2275f9 bcachefs: Move trigger fns to bkey_ops
This replaces the switch statements in bch2_mark_key(),
bch2_trans_mark_key() with new bkey methods - prep work for the next
patch, which fixes BTREE_TRIGGER_WANTS_OLD_AND_NEW.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:28 -04:00
Kent Overstreet
b9a7d8ac5f bcachefs: Fix implementation of KEY_TYPE_error
When force-removing a device, we were silently dropping extents that we
no longer had pointers for - we should have been switching them to
KEY_TYPE_error, so that reads for data that was lost return errors.

This patch adds the logic for switching a key to KEY_TYPE_error to
bch2_bkey_drop_ptr(), and improves the logic somewhat.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:13 -04:00
Kent Overstreet
6fed42bb77 bcachefs: Plumb through subvolume id
To implement snapshots, we need every filesystem btree operation (every
btree operation without a subvolume) to start by looking up the
subvolume and getting the current snapshot ID, with
bch2_subvolume_get_snapshot() - then, that snapshot ID is used for doing
btree lookups in BTREE_ITER_FILTER_SNAPSHOTS mode.

This patch adds those bch2_subvolume_get_snapshot() calls, and also
switches to passing around a subvol_inum instead of just an inode
number.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:12 -04:00
Kent Overstreet
297d89343d bcachefs: Extensive triggers cleanups
- We no longer mark subsets of extents, they're marked like regular
   keys now - which means we can drop the offset & sectors arguments
   to trigger functions
 - Drop other arguments that are no longer needed anymore in various
   places - fs_usage
 - Drop the logic for handling extents in bch2_mark_update() that isn't
   needed anymore, to match bch2_trans_mark_update()
 - Better logic for hanlding the BTREE_ITER_CACHED_NOFILL case, where we
   don't have an old key to mark

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:07 -04:00
Kent Overstreet
59ba21d99f bcachefs: Clean up key merging
This patch simplifies the key merging code by getting rid of partial
merges - it's simpler and saner if we just don't merge extents when
they'd overflow k->size.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:06 -04:00