Commit Graph

223 Commits

Author SHA1 Message Date
Thorsten Blum
fa1ab1b466 bcachefs: Annotate bch_replicas_entry_{v0,v1} with __counted_by()
Add the __counted_by compiler attribute to the flexible array members
devs to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.

Increment nr_devs before adding a new device to the devs array and
adjust the array indexes accordingly. Add a helper macro for adding a
new device.

In bch2_journal_read(), explicitly set nr_devs to 0.

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 09:41:49 -04:00
Chen Yufan
848c3ff882 bcachefs: Convert to use jiffies macros
Use jiffies macros instead of using jiffies directly to handle wraparound.

Signed-off-by: Chen Yufan <chenyufan@vivo.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 09:41:49 -04:00
Kent Overstreet
d97de0d017 bcachefs: Make bkey_fsck_err() a wrapper around fsck_err()
bkey_fsck_err() was added as an interface that looks like fsck_err(),
but previously all it did was ensure that the appropriate error counter
was incremented in the superblock.

This is a cleanup and bugfix patch that converts it to a wrapper around
fsck_err(). This is needed to fix an issue with the upgrade path to
disk_accounting_v3, where the "silent fix" error list now includes
bkey_fsck errors; fsck_err() handles this in a unified way, and since we
need to change printing of bkey fsck errors from the caller to the inner
bkey_fsck_err() calls, this ends up being a pretty big change.

Als,, rename .invalid() methods to .validate(), for clarity, while we're
changing the function signature anyways (to drop the printbuf argument).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13 23:00:50 -04:00
Kent Overstreet
39d5d8290c bcachefs: Improve "unable to allocate journal write" message
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14 19:00:16 -04:00
Kent Overstreet
b9efa9673e bcachefs: Eytzinger accumulation for accounting keys
The btree write buffer takes as input keys from the journal, sorts them,
deduplicates them, and flushes them back to the btree in sorted order.

The disk space accounting rewrite is moving accounting to normal btree
keys, with update (in this case deltas) accumulated in the write buffer
and then flushed to the btree; but this is going to increase the number
of keys handled by the write buffer by perhaps as much as a factor of
3x-5x.

The overhead from copying around and sorting this many keys would cause
a significant performance regression, but: there is huge locality in
updates to accounting keys that we can take advantage of.

Instead of appending accounting keys to the list of keys to be sorted,
this patch adds an eytzinger search tree of recently seen accounting
keys. We look up the accounting key in the eytzinger search tree and
apply the delta directly, adding it if it doesn't exist, and
periodically prune the eytzinger tree of unused entries.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14 19:00:14 -04:00
Uros Bizjak
68573b936d bcachefs: Use try_cmpxchg() family of functions instead of cmpxchg()
Use try_cmpxchg() family of functions instead of
cmpxchg (*ptr, old, new) == old. x86 CMPXCHG instruction returns
success in ZF flag, so this change saves a compare after cmpxchg
(and related move instruction in front of cmpxchg).

Also, try_cmpxchg() implicitly assigns old *ptr value to "old" when
cmpxchg fails. There is no need to re-read the value in the loop.

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14 19:00:12 -04:00
Youling Tang
12e7ff1a1e bcachefs: Fix missing spaces in journal_entry_dev_usage_to_text
Fixed missing spaces displayed in journal_entry_dev_usage_to_text
while adjusting the display format to improve readability.

before:
```
 # bcachefs list_journal -a -t alloc:1:0 /dev/sdb
 ...
     dev_usage: dev=0free: buckets=233180 sectors=0 fragmented=0sb: buckets=13 sectors=6152 fragmented=504journal: buckets=1847 sectors=945664 fragmented=0btree: buckets=20 sectors=10240 fragmented=0user: buckets=1419 sectors=726513 fragmented=15cached: buckets=0 sectors=0 fragmented=0parity: buckets=0 sectors=0 fragmented=0stripe: buckets=0 sectors=0 fragmented=0need_gc_gens: buckets=0 sectors=0 fragmented=0need_discard: buckets=1 sectors=0 fragmented=0
```

after:
```
 # bcachefs list_journal -a -t alloc:1:0 /dev/sdb
 ...
     dev_usage: dev=0
       free: buckets=233180 sectors=0 fragmented=0
       sb: buckets=13 sectors=6152 fragmented=504
       journal: buckets=1847 sectors=945664 fragmented=0
       btree: buckets=20 sectors=10240 fragmented=0
       user: buckets=1419 sectors=726513 fragmented=15
       cached: buckets=0 sectors=0 fragmented=0
       parity: buckets=0 sectors=0 fragmented=0
       stripe: buckets=0 sectors=0 fragmented=0
       need_gc_gens: buckets=0 sectors=0 fragmented=0
       need_discard: buckets=1 sectors=0 fragmented=0
```
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14 19:00:12 -04:00
Kent Overstreet
0f6f8f7693 bcachefs: Fix missing error check in journal_entry_btree_keys_validate()
Closes: https://syzkaller.appspot.com/bug?extid=8996d8f176cf946ef641
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-10 09:53:39 -04:00
Kent Overstreet
0435773239 bcachefs: Fix journal getting stuck on a flush commit
silly race

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-10 09:53:39 -04:00
Kent Overstreet
89d21b69b4 bcachefs: Add missing bch2_journal_do_writes() call
This fixes a rare deadlock when we're doing an emergency shutdown due to
failure to do a journal write.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-23 12:55:32 -04:00
Kent Overstreet
e6b3a655ac bcachefs: Use bch2_print_string_as_lines for long err
printk strings get truncated to 1024 bytes; if we have a long error
message (journal debug info) we need to use a helper.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-21 10:17:07 -04:00
Kent Overstreet
65eaf4e24a bcachefs: s/bkey_invalid_flags/bch_validate_flags
We're about to start using bch_validate_flags for superblock section
validation - it's no longer bkey specific.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09 16:23:36 -04:00
Kent Overstreet
99179fb898 bcachefs: Invalid devices are now checked for by fsck, not .invalid methods
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09 16:23:36 -04:00
Kent Overstreet
2c91ab7262 bcachefs: bch2_dev_get_ioref() checks for device not present
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09 16:23:36 -04:00
Kent Overstreet
6212ea2497 bcachefs: bch2_dev_get_ioref2(); journal_io.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09 16:23:35 -04:00
Kent Overstreet
b6d29b5869 bcachefs: kill bch2_dev_bkey_exists() in journal_ptrs_to_text()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:24 -04:00
Kent Overstreet
d8585a79be bcachefs: bch2_dev_have_ref()
bch2_dev_bkey_exists() is going away; bch2_dev_have_ref() documents that
we're looking up a device without checking if it's present because we
have a reference to it already.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:24 -04:00
Kent Overstreet
b895c70326 bcachefs: x-macroize journal flags enums
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:22 -04:00
Kent Overstreet
c8bda9f20a bcachefs: Simplify resuming of journal position
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:21 -04:00
Kent Overstreet
45150765d3 bcachefs: bch_member.last_journal_bucket
On recovery from clean shutdown we don't typically read the journal, but
we still want to avoid overwriting existing entries in the journal for
list_journal debugging.

Thus, add some fields to the member info section so we can remember
where we left off.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:21 -04:00
Kent Overstreet
f04158290d bcachefs: journal seq blacklist gc no longer has to walk btree
Since btree_ptr_v2, we no longer require the journal seq blacklist table
for skipping blacklisted bsets (btree node entries); the pointer to a
given node indicates how much data is present.

Therefore there's no longer any need for journal seq blacklist gc to
walk the btree - we can prune entries older than journal last_seq.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:20 -04:00
Kent Overstreet
2f724563fc bcachefs: member helper cleanups
Some renaming for better consistency

bch2_member_exists	-> bch2_member_alive
bch2_dev_exists		-> bch2_member_exists
bch2_dev_exsits2	-> bch2_dev_exists
bch_dev_locked		-> bch2_dev_locked
bch_dev_bkey_exists	-> bch2_dev_bkey_exists

new helper - bch2_dev_safe

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:19 -04:00
Kent Overstreet
19391b9294 bcachefs: allow for custom action in fsck error messages
Be more explicit to the user about what we're doing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08 17:29:18 -04:00
Kent Overstreet
85ab365f7c bcachefs: Fix deadlock in journal write path
bch2_journal_write() was incorrectly waiting on earlier journal writes
synchronously; this usually worked because most of the time we'd be
running in the context of a thread that did a journal_buf_put(), but
sometimes we'd be running out of the same workqueue that completes those
prior journal writes.

Additionally, this makes sure to punt to a workqueue before submitting
preflushes - we really don't want to be calling submit_bio() in the main
transaction commit path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-20 23:00:59 -04:00
Kent Overstreet
9abb6dd7ce bcachefs: Standardize helpers for printing enum strs with bounds checks
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13 22:48:17 -04:00
Kent Overstreet
3ed94062e3 bcachefs: Improve bch2_fatal_error()
error messages should always include __func__

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18 00:24:24 -04:00
Kent Overstreet
62f35024b2 bcachefs: Change "accounting overran journal reservation" to a warning
This doesn't need to be a BUG_ON(); the actual serious "things break"
condition is if the whole journal write overruns the available space,
and that has a fatal error, not a BUG_ON(). This check indicates we
screwed something up, but it should be a warning.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17 20:53:11 -04:00
Kent Overstreet
f1ca1abfb0 bcachefs: pull out time_stats.[ch]
prep work for lifting out of fs/bcachefs/

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:30:35 -04:00
Brian Foster
ada02c207c bcachefs: fix lost journal buf wakeup due to improved pipelining
The journal_write_done() handler was reworked into a loop in commit
746a33c96b7a ("bcachefs: better journal pipelining"). As part of this,
the journal buffer wake was factored into a post-loop branch that
executes if at least one journal buffer has completed.

The journal buffer processing loop iterates on the journal buffer
pointer, however. This means that w refers to the last buffer processed
by the loop, which may or may not be done. This also means that if
multiple buffers are processed by the loop, only the last is awoken.
This lost wakeup behavior has lead to stalling problems in various CI
and fstests, such as generic/703.

Lift the wake into the loop so each done buffer sees a wake call as
it is processed.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:26 -04:00
Kent Overstreet
7efa287526 bcachefs: Fix bch2_journal_noflush_seq()
Improved journal pipelining broke journal_noflush_seq(); it implicitly
assumed only the oldest outstanding journal buf could be in flight, but
that's no longer true.

Make this more straightforward by just setting buf->must_flush whenever
we know a journal buf is going to be flush.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:25 -04:00
Kent Overstreet
2cce3752ce bcachefs: split out ignore_blacklisted, ignore_not_dirty
prep work for replaying the journal backwards

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:25 -04:00
Kent Overstreet
0b5961b0d8 bcachefs: jset_entry for loops declare loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:25 -04:00
Kent Overstreet
d9290c9931 bcachefs: Fix journal_buf bitfield accesses
All jounal_buf bitfield updates must happen under the journal lock -
perhaps we should just switch these to atomic bit flags.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:25 -04:00
Kent Overstreet
cb6fc943b6 bcachefs: kill kvpmalloc()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 18:39:12 -04:00
Kent Overstreet
916abefd43 bcachefs: better journal pipelining
Recently a severe performance regression was discovered, which bisected
to

  a6548c8b5e bcachefs: Avoid flushing the journal in the discard path

It turns out the old behaviour, which issued excessive journal flushes,
worked around a performance issue where queueing delays would cause the
journal to not be able to write quickly enough and stall.

The journal flushes masked the issue because they periodically flushed
the device write cache, reducing write latency for non flushes.

This patch reworks the journalling code to allow more than one
(non-flush) write to be in flight at a time. With this patch, doing 4k
random writes and an iodepth of 128, we are now able to hit 560k iops to
a Samsung 970 EVO Plus - previously, we were stuck in the ~200k range.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10 15:34:08 -04:00
Kent Overstreet
38789c2508 bcachefs: closure per journal buf
Prep work for having multiple journal writes in flight.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10 15:34:08 -04:00
Kent Overstreet
5165400275 bcachefs: bio per journal buf
Prep work for having multiple journal writes in flight.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10 15:34:08 -04:00
Kent Overstreet
52f7d75e7d bcachefs: jset_entry_datetime
This gives us a way to record the date and time every journal entry was
written - useful for debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10 15:34:08 -04:00
Kent Overstreet
3d3d23b341 bcachefs: improve journal entry read fsck error messages
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10 15:34:08 -04:00
Kent Overstreet
a555bcf4fa bcachefs: convert journal replay ptrs to darray
Eliminates some error paths - no longer have a hardcoded
BCH_REPLICAS_MAX limit.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10 15:34:08 -04:00
Kent Overstreet
bdec47f57f bcachefs: Journal writes should be REQ_SYNC|REQ_META
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10 15:34:08 -04:00
Kent Overstreet
656f05d8bd bcachefs: Split out journal workqueue
We don't want journal write completions to be blocked behind btree
transactions - io_complete_wq is used for btree updates after data and
metadata writes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10 15:34:08 -04:00
Kent Overstreet
4e07447503 bcachefs: Clamp replicas_required to replicas
This prevents going emergency read only when the user has specified
replicas_required > replicas.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-13 20:33:38 -05:00
Christoph Hellwig
3e44f325f6 bcachefs: fix incorrect usage of REQ_OP_FLUSH
REQ_OP_FLUSH is only for internal use in the blk-mq and request based
drivers. File systems and other block layer consumers must use
REQ_OP_WRITE | REQ_PREFLUSH as documented in
Documentation/block/writeback_cache_control.rst.

While REQ_OP_FLUSH appears to work for blk-mq drivers it does not
get the proper flush state machine handling, and completely fails
for any bio based drivers, including all the stacking drivers.  The
block layer will also get a check in 6.8 to reject this use case
entirely.

[Note: completely untested, but as this never got fixed since the
original bug report in November:

   https://bugzilla.kernel.org/show_bug.cgi?id=218184

and the the discussion in December:

    https://lore.kernel.org/all/20231221053016.72cqcfg46vxwohcj@moria.home.lan/T/

this seems to be best way to force it]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-22 12:37:51 -05:00
Kent Overstreet
e58f963cec bcachefs: helpers for printing data types
We need bounds checking since new versions may introduce new data types.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 06:01:45 -05:00
Kent Overstreet
4819b66e29 bcachefs: improve checksum error messages
new helpers:
 - bch2_csum_to_text()
 - bch2_csum_err_msg()

standardize our checksum error messages a bit, and print out the
checksums a bit more nicely.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05 23:24:21 -05:00
Kent Overstreet
0beebd9245 bcachefs: bkey_for_each_ptr() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:43 -05:00
Kent Overstreet
cea07a7b6a bcachefs: vstruct_for_each() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
9fea2274f7 bcachefs: for_each_member_device() now declares loop iter
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:42 -05:00
Kent Overstreet
73ffa53056 bcachefs: Drop journal entry compaction
Previously, we dropped empty journal entries and coalesced entries that
could be - but it's not worth the overhead; we very rarely leave unused
journal entries after getting a journal reservation.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01 11:47:41 -05:00