linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-22 12:11:40 +00:00

Author	SHA1	Message	Date
Kent Overstreet	9ab55df599	bcachefs: Walk leaf to root in btree_gc Next change will move gc_alloc_start initialization into the alloc trigger, so we have to mark those first. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:14 -04:00
Kent Overstreet	86d46471d5	bcachefs: Don't block journal when finishing check_allocations() Blocking the journal was needed to finish checking old style accounting, but that code is gone and it's not needed in the alloc rewrite, mark_lock is sufficient for synchronization. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:14 -04:00
Kent Overstreet	5645c32ccf	bcachefs: bch2_fs_get_tree() cleanup - improve error paths - call bch2_fs_start() separately, after applying late-parsed options Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:14 -04:00
Kent Overstreet	25ee25e637	bcachefs: Kill bch2_mount() Fold into bch2_fs_get_tree() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:14 -04:00
Kent Overstreet	b9efa9673e	bcachefs: Eytzinger accumulation for accounting keys The btree write buffer takes as input keys from the journal, sorts them, deduplicates them, and flushes them back to the btree in sorted order. The disk space accounting rewrite is moving accounting to normal btree keys, with update (in this case deltas) accumulated in the write buffer and then flushed to the btree; but this is going to increase the number of keys handled by the write buffer by perhaps as much as a factor of 3x-5x. The overhead from copying around and sorting this many keys would cause a significant performance regression, but: there is huge locality in updates to accounting keys that we can take advantage of. Instead of appending accounting keys to the list of keys to be sorted, this patch adds an eytzinger search tree of recently seen accounting keys. We look up the accounting key in the eytzinger search tree and apply the delta directly, adding it if it doesn't exist, and periodically prune the eytzinger tree of unused entries. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:14 -04:00
Kent Overstreet	20ac515a9c	bcachefs: bch_acct_rebalance_work Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:14 -04:00
Kent Overstreet	6af91147b6	bcachefs: bch_acct_btree Add counters for how much disk space we're using per btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:14 -04:00
Kent Overstreet	6675c37662	bcachefs: bch_acct_snapshot Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:14 -04:00
Kent Overstreet	72c2778780	bcachefs: bch2_fs_usage_base_to_text() Helper to show raw accounting in sysfs, mainly for debugging. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	f93bb76ba2	bcachefs: bch2_fs_accounting_to_text() Helper to show raw accounting in sysfs, mainly for debugging. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	91f44781d5	bcachefs: Convert bch2_compression_stats_to_text() to new accounting We no longer have to walk the whole btree to calculate compression stats. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	bfcaa9079d	bcachefs: bch_acct_compression This adds per-compression-type accounting of compressed and uncompressed size as well as number of extents - meaning we can now see compression ratio (without walking the whole filesystem). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	5668e5deec	bcachefs: bch2_verify_accounting_clean() Verify that the in-memory accounting verifies the on-disk accounting after a clean shutdown. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	00839addfc	bcachefs: Convert bch2_replicas_gc2() to new accounting bch2_replicas_gc2() is used for garbage collection superblock replicas entries that are empty - this converts it to the new accounting scheme. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	fb23d57a6d	bcachefs: Convert gc to new accounting Rewrite fsck/gc for the new accounting scheme. This adds a second set of in-memory accounting counters for gc to use; like with other parts of gc we run all trigger in TRIGGER_GC mode, then compare what we calculated to existing in-memory accounting at the end. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	4c4a7d48bd	bcachefs: Kill replicas_journal_res More dead code deletion Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	66a57684c6	bcachefs: Kill fs_usage_online More dead code deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	fe5eddc0d0	bcachefs: Kill bch2_fs_usage_to_text() Dead code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	8bb8d683a4	bcachefs: Delete journal-buf-sharded old style accounting More deletion of dead code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	5b9bc272e6	bcachefs: Kill writing old accounting to journal More ripping out of the old disk space accounting. Note that the new disk space accounting is incompatible with the old, and writing out old style disk space accounting with the new code is infeasible. This means upgrading and downgrading past this version requires regenerating accounting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	3afb8dbf03	bcachefs: kill bch2_fs_usage_read() With bch2_ioctl_fs_usage(), this is now dead code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	6b39638b84	bcachefs: Convert bch2_ioctl_fs_usage() to new accounting This converts bch2_ioctl_fs_usage() to read from the new disk accounting, via bch2_fs_replicas_usage_read(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	72a6bb098c	bcachefs: Kill bch2_fs_usage_initialize() Deleting code for the old disk accounting scheme. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	f5095b9f85	bcachefs: dev_usage updated by new accounting Reading disk accounting now requires an eytzinger lookup (see: bch2_accounting_mem_read()), but the per-device counters are used frequently enough that we'd like to still be able to read them with just a percpu sum, as in the old code. This patch special cases the device counters; when we update in-memory accounting we also update the old style percpu counters if it's a deice counter update. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	2e8d686a4a	bcachefs: Coalesce accounting keys before journal replay This fixes a performance regression in journal replay; without colaescing accounting keys we have multiple keys at the same position, which means journal_keys_peek_upto() has to skip past many overwritten keys - turning journal replay into an O(n^2) algorithm. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	1d16c605cc	bcachefs: Disk space accounting rewrite Main part of the disk accounting rewrite. This is a wholesale rewrite of the existing disk space accounting, which relies on percepu counters that are sharded by journal buffer, and rolled up and added to each journal write. With the new scheme, every set of counters is a distinct key in the accounting btree; this fixes scaling limitations of the old scheme, where counters took up space in each journal entry and required multiple percpu counters. Now, in memory accounting requires a single set of percpu counters - not multiple for each in flight journal buffer - and in the future we'll probably also have counters that don't use in memory percpu counters, they're not strictly required. An accounting update is now a normal btree update, using the btree write buffer path. At transaction commit time, we apply accounting updates to the in memory counters, which are percpu counters indexed in an eytzinger tree by the accounting key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	5d9667d1d6	bcachefs: btree write buffer knows how to accumulate bch_accounting keys Teach the btree write buffer how to accumulate accounting keys - instead of having the newer key overwrite the older key as we do with other updates, we need to add them together. Also, add a flag so that write buffer flush knows when journal replay is finished flushing accounting, and teach it to hold accounting keys until that flag is set. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	9dec2a473b	bcachefs: Accumulate accounting keys in journal replay Until accounting keys hit the btree, they are deltas, not new versions of the existing key; this means we have to teach journal replay to accumulate them. Additionally, the journal doesn't track precisely which entries have been flushed to the btree; it only tracks a range of entries that may possibly still need to be flushed. That means we need to compare accounting keys against the version in the btree and only flush updates that are newer. There's another wrinkle with the write buffer: if the write buffer starts flushing accounting keys before journal replay has finished flushing accounting keys, journal replay will see the version number from the new updates and updates from the journal will be lost. To avoid this, journal replay has to flush accounting keys first, and we'll be adding a flag so that write buffer flush knows to hold accounting keys until then. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	2744e5c9eb	bcachefs: KEY_TYPE_accounting New key type for the disk space accounting rewrite. - Holds a variable sized array of u64s (may be more than one for accounting e.g. compressed and uncompressed size, or buckets and sectors for a given data type) - Updates are deltas, not new versions of the key: this means updates to accounting can happen via the btree write buffer, which we'll be teaching to accumulate deltas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Thomas Bertschinger	929d954330	bcachefs: use new mount API This updates bcachefs to use the new mount API: - Update the file_system_type to use the new init_fs_context() function. - Define the new fs_context_operations functions. - No longer register bch2_mount() and bch2_remount(); these are now called via the new fs_context functions. - Define a new helper type, bch2_opts_parse that includes a struct bch_opts and additionally a printbuf used to save options that can't be parsed until after the FS is opened. This enables us to parse as many options as possible prior to opening the filesystem while saving those options that need the open FS for later parsing. Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Thomas Bertschinger	1c12d1caf8	bcachefs: Add error code to defer option parsing This introduces a new error code, option_needs_open_fs, which is used to indicate that an attempt was made to parse a mount option prior to opening a filesystem, when that mount option requires an open filesystem in order to be validated. Returning this error results in bch2_parse_one_mount_opt() saving that option for later parsing, after the filesystem is opened. Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Thomas Bertschinger	9b7f0b5d3d	bcachefs: add printbuf arg to bch2_parse_mount_opts() Mount options that take the name of a device that may be part of a filesystem, for example "metadata_target", cannot be validated until after the filesystem has been opened. However, an attempt to parse those options may be made prior to the filesystem being opened. This change adds a printbuf parameter to bch2_parse_mount_opts() which will be used to save those mount options, when they are supplied prior to the FS being opened, so that they can be parsed later. This functionality is not currently needed, but will be used after bcachefs starts using the new mount API to parse mount options. This is because using the new mount API, we will process mount options prior to opening the FS, but the new API doesn't provide a convenient way to "replay" mount option parsing. So we save these options ourselves to accomplish this. This change also splits out the code to parse a single option into bch2_parse_one_mount_opt(), which will be useful when using the new mount API which deals with a single mount option at a time. Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	7773df19c3	bcachefs: metadata version bucket_stripe_sectors New on disk format version for bch_alloc->stripe_sectors and BCH_DATA_unstriped - accounting for unstriped data in stripe buckets. Upgrade/downgrade requires regenerating alloc info - but only if erasure coding is in use. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	2612e29142	bcachefs: BCH_DATA_unstriped Add a new pseudo data type, to track buckets that are members of a stripe, but have unstriped data in them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	55f7962da3	bcachefs: bch_alloc->stripe_sectors Add a separate counter to bch_alloc_v4 for amount of striped data; this lets us separately track striped and unstriped data in a bucket, which lets us see when erasure coding has failed to update extents with stripe pointers, and also find buckets to continue updating if we crash mid way through creating a new stripe. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	c13d526d9d	bcachefs: check_key_has_inode() Consolidate duplicated checks for extents/dirents/xattrs - these keys should all have a corresponding inode of the correct type. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Thomas Bertschinger	51fc436c80	bcachefs: allow passing full device path for target options The output of mount options such as "metadata_target" in `/proc/mounts` uses the full path to the device. mount(8) from util-linux uses the output from `/proc/mounts` to pass existing mount options when performing a remount, so bcachefs should accept as input the same form that it prints as output. Without this change: $ mount -t bcachefs -o metadata_target=vdb /dev/vdb /mnt $ strace mount -o remount /mnt ... fsconfig(4, FSCONFIG_SET_STRING, "metadata_target", "/dev/vdb", 0) = -1 EINVAL (Invalid argument) ... Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	3811f48aa3	bcachefs: bch2_printbuf_strip_trailing_newline() Add a new helper to fix inode_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Thomas Bertschinger	babe30fe8d	bcachefs: don't expose "read_only" as a mount option When "read_only" is exposed as a mount option, it is redundant with the standard option "ro" and gives users multiple ways to specify that a bcachefs filesystem should be mounted read-only. This presents the risk of having inconsistent options specified. This can be seen when remounting a read-only filesystem in read-write mode, using mount(8) from util-linux. Because mount(8) parses the existing mount options from `/proc/mounts` and applies them when remounting, it can end up applying both "read_only" and "rw": $ mount img -o ro /mnt $ strace mount -o remount,rw /mnt ... fsconfig(4, FSCONFIG_SET_FLAG, "read_only", NULL, 0) = 0 fsconfig(4, FSCONFIG_SET_FLAG, "rw", NULL, 0) = 0 ... Making "read_only" no longer a mount option means this edge case cannot occur. Fixes: `62719cf33c` ("bcachefs: Fix nochanges/read_only interaction") Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Thomas Bertschinger	03ec0927fa	bcachefs: make offline fsck set read_only fs flag A subsequent change will remove "read_only" as a mount option in favor of the standard option "ro", meaning the userspace fsck command cannot pass it to the fsck ioctl. Instead, in offline fsck, set "read_only" kernel-side without trying to parse it as a mount option. For compatibility with versions of the "bcachefs fsck" command that try to pass the "read_only" mount opt, remove it from the mount options string prior to parsing when it is present. Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	652bc7fabc	bcachefs: btree_ptr_sectors_written() now takes bkey_s_c this is for the userspace metadata dump tool Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	9cc8eb3098	bcachefs: Check for bsets past bch_btree_ptr_v2.sectors_written Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Uros Bizjak	68573b936d	bcachefs: Use try_cmpxchg() family of functions instead of cmpxchg() Use try_cmpxchg() family of functions instead of cmpxchg (ptr, old, new) == old. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, try_cmpxchg() implicitly assigns old ptr value to "old" when cmpxchg fails. There is no need to re-read the value in the loop. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	e76a2b65b0	bcachefs: add might_sleep() annotations for fsck_err() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	546b65378d	bcachefs: fix missing include fs-common.h needs dirent.h for enum bch_rename_mode Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Youling Tang	630d565dda	bcachefs: Use filemap_read() to simplify the execution flow Using filemap_read() can reduce unnecessary code execution for non IOCB_DIRECT paths. Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Youling Tang	da6fa380d3	bcachefs: Align the display format of `btrees/inodes/keys` Before patch: ``` #cat btrees/inodes/keys u64s 17 type inode_v3 0:4096:U32_MAX len 0 ver 0: mode=40755 flags= (16300000) bi_size=0 ``` After patch: ``` #cat btrees/inodes/keys u64s 17 type inode_v3 0:4096:U32_MAX len 0 ver 0: mode=40755 flags=(16300000) bi_size=0 ``` Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Youling Tang	12e7ff1a1e	bcachefs: Fix missing spaces in journal_entry_dev_usage_to_text Fixed missing spaces displayed in journal_entry_dev_usage_to_text while adjusting the display format to improve readability. before: ``` # bcachefs list_journal -a -t alloc:1:0 /dev/sdb ... dev_usage: dev=0free: buckets=233180 sectors=0 fragmented=0sb: buckets=13 sectors=6152 fragmented=504journal: buckets=1847 sectors=945664 fragmented=0btree: buckets=20 sectors=10240 fragmented=0user: buckets=1419 sectors=726513 fragmented=15cached: buckets=0 sectors=0 fragmented=0parity: buckets=0 sectors=0 fragmented=0stripe: buckets=0 sectors=0 fragmented=0need_gc_gens: buckets=0 sectors=0 fragmented=0need_discard: buckets=1 sectors=0 fragmented=0 ``` after: ``` # bcachefs list_journal -a -t alloc:1:0 /dev/sdb ... dev_usage: dev=0 free: buckets=233180 sectors=0 fragmented=0 sb: buckets=13 sectors=6152 fragmented=504 journal: buckets=1847 sectors=945664 fragmented=0 btree: buckets=20 sectors=10240 fragmented=0 user: buckets=1419 sectors=726513 fragmented=15 cached: buckets=0 sectors=0 fragmented=0 parity: buckets=0 sectors=0 fragmented=0 stripe: buckets=0 sectors=0 fragmented=0 need_gc_gens: buckets=0 sectors=0 fragmented=0 need_discard: buckets=1 sectors=0 fragmented=0 ``` Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	f369de8267	bcachefs: fix ei_update_lock lock ordering ei_update_lock is largely vestigal and will probably be removed, but we're not ready for that just yet. this fixes some lockdep splats with the new lockdep support for btree node locks; they're harmless, since we were taking ei_update_lock before actually locking any btree nodes, but "any btree nodes locked" are now tracked at the btree_trans level. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:11 -04:00
Kent Overstreet	cdda2126ab	bcachefs: bch2_btree_reserve_cache_to_text() Add a pretty printer so the btree reserve cache can be seen in sysfs; as it pins open_buckets we need it for tracking down open_buckets issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:11 -04:00
Kent Overstreet	d06a26d24d	bcachefs: sysfs trigger_freelist_wakeup another debugging knob Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:11 -04:00
Kent Overstreet	a1e7a97f22	bcachefs: sysfs internal/trigger_journal_writes another debugging knob - trigger the journal to do ready journal writes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:11 -04:00
Kent Overstreet	26a170aa61	bcachefs: add capacity, reserved to fs_alloc_debug_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:11 -04:00
Kent Overstreet	8a3c8303e2	bcachefs: uninline fallocate functions better stack traces Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:11 -04:00
Kent Overstreet	52fd0f9620	bcachefs: btree ids are 64 bit bitmasks Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:11 -04:00
Kent Overstreet	3de8fd4a33	bcachefs: Print allocator stuck on timeout in fallocate path same as in io_write.c, if we're waiting on the allocator for an excessive amount of time, print what's going on Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:11 -04:00
Kent Overstreet	1841027c7d	bcachefs: bch2_gc_btree() should not use btree_root_lock btree_root_lock is for the root keys in btree_root, not the pointers to the nodes themselves; this fixes a lock ordering issue between btree_root_lock and btree node locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-11 20:10:55 -04:00
Kent Overstreet	f236ea4bca	bcachefs: Set PF_MEMALLOC_NOFS when trans->locked proper lock ordering is: fs_reclaim -> btree node locks Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-11 20:10:55 -04:00
Kent Overstreet	f0f3e51148	bcachefs; Use trans_unlock_long() when waiting on allocator not using unlock_long() blocks key cache reclaim, and the allocator may take awhile Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-11 20:10:55 -04:00
Kent Overstreet	aacd897d4d	Revert "bcachefs: Mark bch_inode_info as SLAB_ACCOUNT" This reverts commit `86d81ec5f5`. This wasn't tested with memcg enabled, it immediately hits a null ptr deref in list_lru_add(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-11 20:01:38 -04:00
Kent Overstreet	fd80d14005	bcachefs: fix scheduling while atomic in break_cycle() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 12:59:28 -04:00
Kent Overstreet	6f692b1672	bcachefs: Fix RCU splat Reported-by: syzbot+e74fea078710bbca6f4b@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 12:46:22 -04:00
Kent Overstreet	7d7f71cd87	bcachefs: Add missing bch2_trans_begin() this fixes a 'transaction should be locked' error in backpointers fsck Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 09:53:39 -04:00
Kent Overstreet	0f6f8f7693	bcachefs: Fix missing error check in journal_entry_btree_keys_validate() Closes: https://syzkaller.appspot.com/bug?extid=8996d8f176cf946ef641 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 09:53:39 -04:00
Kent Overstreet	f49d2c9835	bcachefs: Warn on attempting a move with no replicas Instead of popping an assert in bch2_write(), WARN and print out some debugging info. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 09:53:39 -04:00
Kent Overstreet	ad8b68cd39	bcachefs: bch2_data_update_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 09:53:39 -04:00
Kent Overstreet	0f1f7324da	bcachefs: Log mount failure error code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 09:53:39 -04:00
Kent Overstreet	8ed58789fc	bcachefs: Fix undefined behaviour in eytzinger1_first() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 09:53:39 -04:00
Youling Tang	86d81ec5f5	bcachefs: Mark bch_inode_info as SLAB_ACCOUNT After commit `230e9fc286` ("slab: add SLAB_ACCOUNT flag"), we need to mark the inode cache as SLAB_ACCOUNT, similar to commit `5d097056c9` ("kmemcg: account for certain kmem allocations to memcg") Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 09:53:39 -04:00
Kent Overstreet	b02f973e67	bcachefs: Fix bch2_inode_insert() race path for tmpfiles Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 09:53:39 -04:00
Kent Overstreet	0435773239	bcachefs: Fix journal getting stuck on a flush commit silly race Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 09:53:39 -04:00
Kent Overstreet	a2d23f3d91	bcachefs: io clock: run timer fns under clock lock We don't have a way to flush a timer that's executing the callback, and this is simple and limited enough in scope that we can just use the lock instead. Needed for the next patch that adds direct wakeups from the allocator to copygc, where we're now more frequently calling io_timer_del() on an expiring timer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-01 22:56:28 -04:00
Kent Overstreet	b5cbb42dc5	bcachefs: Repair fragmentation_lru in alloc_write_key() fragmentation_lru derives from dirty_sectors, and wasn't being checked. Co-developed-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-29 18:37:13 -04:00
Kent Overstreet	d39881d2da	bcachefs: add check for missing fragmentation in check_alloc_to_lru_ref() We need to make sure we're not missing any fragmenation entries in the LRU BTREE after repairing ALLOC BTREE Also, use the new bch2_btree_write_buffer_maybe_flush() helper; this was only working without it before since bucket invalidation (usually) wasn't happening while fsck was running. Co-developed-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-29 18:35:18 -04:00
Kent Overstreet	92e1c29ae8	bcachefs: bch2_btree_write_buffer_maybe_flush() Add a new helper for checking references to write buffer btrees, where we need a flush before we definitively know we have an inconsistency. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-29 18:34:52 -04:00
Kent Overstreet	ef05bdf5d6	bcachefs: Add missing printbuf_tabstops_reset() calls Fixes warnings from bch2_print_allocator_stuck() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-29 18:14:18 -04:00
Kent Overstreet	67c564111f	bcachefs: Fix loop restart in bch2_btree_transactions_read() Accidental infinite loop; also fix btree_deadlock_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-28 21:08:48 -04:00
Kent Overstreet	1539bdf516	bcachefs: Fix bch2_read_retry_nodecode() BCH_READ_NODECODE mode - used by the move paths - really wants to use only the original rbio, but the retry path really wants to clone - oof. Make sure to copy the crc of the pointer we read from back to the original rbio, or we'll see spurious checksum errors later. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-28 20:47:04 -04:00
Kent Overstreet	44ec599035	bcachefs: Don't use the new_fs() bucket alloc path on an initialized fs On a new filesystem or device we have to allocate the journal with a bump allocator, because allocation info isn't ready yet - but when hot-adding a device that doesn't have a journal, we don't want to use that path. Reported-by: syzbot+24a867cb90d8315cccff@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-28 19:47:31 -04:00
Kent Overstreet	a0bd30e4ea	bcachefs: Fix shift greater than integer size Reported-by: syzbot+e5292b50f1957164a4b6@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-28 19:42:22 -04:00
Kent Overstreet	600b8be5e7	bcachefs: Change bch2_fs_journal_stop() BUG_ON() to warning Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-28 19:16:41 -04:00
Kent Overstreet	84db600016	bcachefs: Delete old faulty bch2_trans_unlock() call the unlock is now in read_extent, this fixes an assertion pop in read_from_stale_dirty_pointer() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-28 13:30:13 -04:00
Kent Overstreet	759b2e800f	bcachefs: Switch online_reserved shutdown assert to WARN() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-28 11:06:31 -04:00
Pei Li	64cd7de998	bcachefs: Fix kmalloc bug in __snapshot_t_mut When allocating too huge a snapshot table, we should fail gracefully in __snapshot_t_mut() instead of fail in kmalloc(). Reported-by: syzbot+770e99b65e26fa023ab1@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=770e99b65e26fa023ab1 Tested-by: syzbot+770e99b65e26fa023ab1@syzkaller.appspotmail.com Signed-off-by: Pei Li <peili.dev@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-25 20:51:14 -04:00
Kent Overstreet	64ee1431cc	bcachefs: Discard, invalidate workers are now per device There's no reason for discards to be single threaded across all devices; this will improve performance on multi device setups. Additionally, making them per-device simplifies the refcounting on bch_dev->io_ref; we now hold it for the duration that the discard path is running, which fixes a race between the discard path and device removal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-25 18:47:55 -04:00
Pei Li	472237b69d	bcachefs: Fix shift-out-of-bounds in bch2_blacklist_entries_gc This series fix the shift-out-of-bounds issue in bch2_blacklist_entries_gc(). Instead of passing 0 to eytzinger0_first() when iterating the entries, we explicitly check 0 and initialize i to be 0. syzbot has tested the proposed patch and the reproducer did not trigger any issue: Reported-and-tested-by: syzbot+835d255ad6bc7f29ee12@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=835d255ad6bc7f29ee12 Signed-off-by: Pei Li <peili.dev@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-25 17:53:31 -04:00
Pei Li	211c581de2	bcachefs: slab-use-after-free Read in bch2_sb_errors_from_cpu Acquire fsck_error_counts_lock before accessing the critical section protected by this lock. syzbot has tested the proposed patch and the reproducer did not trigger any issue. Reported-by: syzbot+a2bc0e838efd7663f4d9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a2bc0e838efd7663f4d9 Signed-off-by: Pei Li <peili.dev@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-25 17:51:26 -04:00
Kuan-Wei Chiu	1fcce6b8a7	bcachefs: remove heap-related macros and switch to generic min_heap Drop the heap-related macros from bcachefs and replacing them with the generic min_heap implementation from include/linux. By doing so, code readability is improved by using functions instead of macros. Moreover, the min_heap implementation in include/linux adopts a bottom-up variation compared to the textbook version currently used in bcachefs. This bottom-up variation allows for approximately 50% reduction in the number of comparison operations during heap siftdown, without changing the number of swaps, thus making it more efficient. [visitorckw@gmail.com: fix missing assignment of minimum element] Link: https://lkml.kernel.org/r/20240602174828.1955320-1-visitorckw@gmail.com Link: https://lkml.kernel.org/ioyfizrzq7w7mjrqcadtzsfgpuntowtjdw5pgn4qhvsdp4mqqg@nrlek5vmisbu Link: https://lkml.kernel.org/r/20240524152958.919343-17-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Kent Overstreet <kent.overstreet@linux.dev> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Brian Foster <bfoster@redhat.com> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: Coly Li <colyli@suse.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Sakai <msakai@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-06-24 22:25:00 -07:00
Kuan-Wei Chiu	fd60f7fe69	bcachefs: fix typo Replace 'utiility' with 'utility'. Link: https://lkml.kernel.org/r/20240524152958.919343-4-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Brian Foster <bfoster@redhat.com> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: Coly Li <colyli@suse.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Sakai <msakai@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-06-24 22:24:57 -07:00
Kent Overstreet	89d21b69b4	bcachefs: Add missing bch2_journal_do_writes() call This fixes a rare deadlock when we're doing an emergency shutdown due to failure to do a journal write. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-23 12:55:32 -04:00
Kent Overstreet	d6b52f6828	bcachefs: Fix null ptr deref in journal_pins_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-23 12:07:07 -04:00
Kent Overstreet	36da8e387b	bcachefs: Add missing recalc_capacity() call This fixes filesystem size not changing on device removal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-23 10:12:51 -04:00
Kent Overstreet	1aaf5cb41b	bcachefs: Fix btree_trans list ordering The debug code relies on btree_trans_list being ordered so that it can resume on subsequent calls or lock restarts. However, it was using trans->locknig_wait.task.pid, which is incorrect since btree_trans objects are cached and reused - typically by different tasks. Fix this by switching to pointer order, and also sort them lazily when required - speeding up the btree_trans_get() fastpath. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-23 00:57:21 -04:00
Kent Overstreet	de611ab6fc	bcachefs: Fix race between trans_put() and btree_transactions_read() debug.c was using closure_get() on a different thread's closure where the we don't know if the object being refcounted is alive. We keep btree_trans objects on a list so they can be printed by debug code, and because it is cost prohibitive to touch the btree_trans list every time we allocate and free btree_trans objects, cached objects are also on this list. However, we do not want the debug code to see cached but not in use btree_trans objects - critically because the btree_paths array will have been freed (if it was reallocated). closure_get() is also incorrect to use when that get may race with it hitting zero, i.e. we must already have a ref on the object or know the ref can't currently hit 0 for other reasons (as used in the cycle detector). to fix this, use the previously introduced closure_get_not_zero(), closure_return_sync(), and closure_init_stack_release(); the debug code now can only take a ref on a trans object if it's alive and in use. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-23 00:57:21 -04:00
Kent Overstreet	18e92841e8	bcachefs: Make btree_deadlock_to_text() clearer btree_deadlock_to_text() searches the list of btree transactions to find a deadlock - when it finds one it's done; it's not like other *_read() functions that's printing each object. Factor out btree_deadlock_to_text() to make this clearer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-23 00:57:21 -04:00
Kent Overstreet	f44cc269a1	bcachefs: fix seqmutex_relock() We were grabbing the sequence number before unlock incremented it - fix this by moving the increment to seqmutex_lock() (so the seqmutex_relock() failure path skips the mutex_trylock()), and returning the sequence number from unlock(), to make the API simpler and safer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-23 00:57:21 -04:00
Kent Overstreet	9bd01500e4	bcachefs: Fix freeing of error pointers This fixes incorrect/missign checking of strndup_user() returns. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-22 17:22:24 -04:00
Youling Tang	bd4da0462e	bcachefs: Move the ei_flags setting to after initialization `inode->ei_flags` setting and cleaning should be done after initialization, otherwise the operation is invalid. Fixes: `9ca4853b98` ("bcachefs: Fix quota support for snapshots") Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-21 10:17:07 -04:00
Kent Overstreet	2fe79ce7d1	bcachefs: Fix a UAF after write_super() write_super() may reallocate the superblock buffer - but bch_sb_field_ext was referencing it; don't use it after the write_super call. Reported-by: syzbot+8992fc10a192067b8d8a@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-21 10:17:07 -04:00
Kent Overstreet	e6b3a655ac	bcachefs: Use bch2_print_string_as_lines for long err printk strings get truncated to 1024 bytes; if we have a long error message (journal debug info) we need to use a helper. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-21 10:17:07 -04:00
Kent Overstreet	dd9086487c	bcachefs: Fix I_NEW warning in race path in bch2_inode_insert() discard_new_inode() is the correct interface for tearing down an indoe that was fully created but not made visible to other threads, but it expects I_NEW to be set, which we don't use. Reported-by: https://github.com/koverstreet/bcachefs/issues/690 Fixes: bcachefs: Fix race path in bch2_inode_insert() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-21 10:17:07 -04:00
Kent Overstreet	504794067f	bcachefs: Replace bare EEXIST with private error codes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-21 10:17:07 -04:00
Kent Overstreet	f648b6c12b	bcachefs: Fix missing alloc_data_type_set() Incorrect bucket state transition in the discard path; when incrementing a bucket's generation number that had already been discarded, we were forgetting to check if it should be need_gc_gens, not free. This was caught by the .invalid checks in the transaction commit path, causing us to go emergency read only. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-21 10:17:07 -04:00
Youling Tang	c6cab97cdf	bcachefs: fix alignment of VMA for memory mapped files on THP With CONFIG_READ_ONLY_THP_FOR_FS, the Linux kernel supports using THPs for read-only mmapped files, such as shared libraries. However, the kernel makes no attempt to actually align those mappings on 2MB boundaries, which makes it impossible to use those THPs most of the time. This issue applies to general file mapping THP as well as existing setups using CONFIG_READ_ONLY_THP_FOR_FS. This is easily fixed by using thp_get_unmapped_area for the unmapped_area function in bcachefs, which is what ext2, ext4, fuse, xfs and btrfs all use. Similar to commit `b0c582233a` ("btrfs: fix alignment of VMA for memory mapped files on THP"). Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-20 09:14:58 -04:00
Kent Overstreet	33dfafa902	bcachefs: Fix safe errors by default i.e. the start of automatic self healing: If errors=continue or fix_safe, we now automatically fix simple errors without user intervention. New error action option: fix_safe This replaces the existing errors=ro option, which gets a new slot, i.e. existing errors=ro users now get errors=fix_safe. This is currently only enabled for a limited set of errors - initially just disk accounting; errors we would never not want to fix, and we don't want to require user intervention (i.e. to make sure a bug report gets filed). Errors will still be counted in the superblock, so we (developers) will still know they've been occuring if a bug report gets filed (as bug reports typically include the errors superblock section). Eventually we'll be enabling this for a much wider set of errors, after we've done thorough error injection testing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-20 09:13:09 -04:00
Kent Overstreet	a56da69799	bcachefs: Fix bch2_trans_put() reference: https://github.com/koverstreet/bcachefs/issues/692 trans->ref is the reference used by the cycle detector, which walks btree_trans objects of other threads to walk the graph of held locks and issue wakeups when an abort is required. We have to wait for the ref to go to 1 before freeing trans->paths or clearing trans->locking_wait.task. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:34:18 -04:00
Kent Overstreet	0a2a507d40	bcachefs: set_worker_desc() for delete_dead_snapshots this is long running - help users see what's going on Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:24 -04:00
Kent Overstreet	ddd118ab45	bcachefs: Fix bch2_sb_downgrade_update() Missing enum conversion Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:24 -04:00
Kent Overstreet	2e9940d4a1	bcachefs: Handle cached data LRU wraparound We only have 48 bits for the LRU time field, which is insufficient to prevent wraparound. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:24 -04:00
Kent Overstreet	cff07e2739	bcachefs: Guard against overflowing LRU_TIME_BITS LRUs only have 48 bits for the time field (i.e. LRU order); thus we need overflow checks and guards. Reported-by: syzbot+df3bf3f088dcaa728857@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:24 -04:00
Kent Overstreet	1ba44217f8	bcachefs: delete_dead_snapshots() doesn't need to go RW We've been moving away from going RW lazily; if we want to go RW we do that in set_may_go_rw(), and if we didn't go RW we don't need to delete dead snapshots. Reported-by: syzbot+4366624c0b5aac4906cf@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:24 -04:00
Kent Overstreet	dbf4d79b7f	bcachefs: Fix early init error path in journal code We shouln't be running the journal shutdown sequence if we never fully initialized the journal. Reported-by: syzbot+ffd2270f0bca3322ee00@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:24 -04:00
Kent Overstreet	9e7cfb35e2	bcachefs: Check for invalid btree IDs We can only handle btree IDs up to 62, since the btree id (plus the type for interior btree nodes) has to fit ito a 64 bit bitmask - check for invalid ones to avoid invalid shifts later. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:23 -04:00
Kent Overstreet	e3fd3faa45	bcachefs: Fix btree ID bitmasks these should be 64 bit bitmasks, not 32 bit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:23 -04:00
Kent Overstreet	d406545613	bcachefs: Fix shift overflow in read_one_super() Reported-by: syzbot+9f74cb4006b83e2a3df1@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:23 -04:00
Kent Overstreet	3727ca5604	bcachefs: Fix a locking bug in the do_discard_fast() path We can't discard a bucket while it's still open; this needs the bucket_is_open_safe() version, which takes the open_buckets lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:23 -04:00
Kent Overstreet	d47df4f616	bcachefs: Fix array-index-out-of-bounds We use 0 size arrays as markers, but ubsan doesn't know that - cast them to a pointer to fix the splat. Also, make sure this code gets tested a bit more. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:23 -04:00
Kent Overstreet	f770a6e9a3	bcachefs: Fix initialization order for srcu barrier btree_iter_init() needs to happen before key_cache_init(), to initialize btree_trans_barrier Reported-by: syzbot+3cca837c2183f8f6fcaf@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:23 -04:00
Mateusz Guzik	267574dee6	bcachefs: remove now spurious i_state initialization inode_init_always started setting the field to 0. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240611120626.513952-5-mjguzik@gmail.com Acked-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-06-13 13:40:45 +02:00
Kent Overstreet	f2736b9c79	bcachefs: Fix rcu_read_lock() leak in drop_extra_replicas Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-11 18:59:08 -04:00
Kent Overstreet	7124a8982b	bcachefs: Add missing bch_inode_info.ei_flags init Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 20:50:14 -04:00
Kent Overstreet	b799220092	bcachefs: Add missing synchronize_srcu_expedited() call when shutting down We use the polling interface to srcu for tracking pending frees; when shutting down we don't need to wait for an srcu barrier to free them, but SRCU still gets confused if we shutdown with an outstanding grace period. Reported-by: syzbot+6a038377f0a594d7d44e@syzkaller.appspotmail.com Reported-by: syzbot+0ece6edfd05ed20e32d9@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	9432e90df1	bcachefs: Check for invalid bucket from bucket_gen(), gc_bucket() Turn more asserts into proper recoverable error paths. Reported-by: syzbot+246b47da27f8e7e7d6fb@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	9c4acd19bb	bcachefs: Replace bucket_valid() asserts in bucket lookup with proper checks The bucket_gens array and gc_buckets array known their own size; we should be using those members, and returning an error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	e0cb5722e1	bcachefs: Fix snapshot_create_lock lock ordering ====================================================== WARNING: possible circular locking dependency detected 6.10.0-rc2-ktest-00018-gebd1d148b278 #144 Not tainted ------------------------------------------------------ fio/1345 is trying to acquire lock: ffff88813e200ab8 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_truncate+0x76/0xf0 but task is already holding lock: ffff888105a1fa38 (&sb->s_type->i_mutex_key#13){+.+.}-{3:3}, at: do_truncate+0x7b/0xc0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&sb->s_type->i_mutex_key#13){+.+.}-{3:3}: down_write+0x3d/0xd0 bch2_write_iter+0x1c0/0x10f0 vfs_write+0x24a/0x560 __x64_sys_pwrite64+0x77/0xb0 x64_sys_call+0x17e5/0x1ab0 do_syscall_64+0x68/0x130 entry_SYSCALL_64_after_hwframe+0x4b/0x53 -> #1 (sb_writers#10){.+.+}-{0:0}: mnt_want_write+0x4a/0x1d0 filename_create+0x69/0x1a0 user_path_create+0x38/0x50 bch2_fs_file_ioctl+0x315/0xbf0 __x64_sys_ioctl+0x297/0xaf0 x64_sys_call+0x10cb/0x1ab0 do_syscall_64+0x68/0x130 entry_SYSCALL_64_after_hwframe+0x4b/0x53 -> #0 (&c->snapshot_create_lock){++++}-{3:3}: __lock_acquire+0x1445/0x25b0 lock_acquire+0xbd/0x2b0 down_read+0x40/0x180 bch2_truncate+0x76/0xf0 bchfs_truncate+0x240/0x3f0 bch2_setattr+0x7b/0xb0 notify_change+0x322/0x4b0 do_truncate+0x8b/0xc0 do_ftruncate+0x110/0x270 __x64_sys_ftruncate+0x43/0x80 x64_sys_call+0x1373/0x1ab0 do_syscall_64+0x68/0x130 entry_SYSCALL_64_after_hwframe+0x4b/0x53 other info that might help us debug this: Chain exists of: &c->snapshot_create_lock --> sb_writers#10 --> &sb->s_type->i_mutex_key#13 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sb->s_type->i_mutex_key#13); lock(sb_writers#10); lock(&sb->s_type->i_mutex_key#13); rlock(&c->snapshot_create_lock); * DEADLOCK * Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	f9035b0ce6	bcachefs: Fix refcount leak in check_fix_ptrs() fsck_err() does a goto fsck_err on error; factor out check_fix_ptr() so that our error label can drop our device ref. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	bf2b356afd	bcachefs: Leave a buffer in the btree key cache to avoid lock thrashing Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	2760bfe388	bcachefs: Fix reporting of freed objects from key cache shrinker We count objects as freed when we move them to the srcu-pending lists because we're doing the equivalent of a kfree_srcu(); the only difference is managing the pending list ourself means we can allocate from the pending list. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	9ac3e660ca	bcachefs: set sb->s_shrinker->seeks = 0 inodes and dentries are still present in the btree node cache, in much more compact form Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	bc65e98e68	bcachefs: increase key cache shrinker batch size Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	5ae67abcdf	bcachefs: Enable automatic shrinking for rhashtables Since the key cache shrinker walks the rhashtable, a mostly empty rhashtable leads to really nasty reclaim performance issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Hongbo Li	26447d224a	bcachefs: fix the display format for show-super There are three keys displayed in non-uniform format. Let's fix them. [Before] ``` Label: testbcachefs Version: 1.9: (unknown version) Version upgrade complete: 0.0: (unknown version) ``` [After] ``` Label: testbcachefs Version: 1.9: (unknown version) Version upgrade complete: 0.0: (unknown version) ``` Fixes: `7423330e30` ("bcachefs: prt_printf() now respects \r\n\t") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	dab1870439	bcachefs: fix stack frame size in fsck.c fsck.c always runs top of the stack so we're not too concerned here; noinline_for_stack is sufficient Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	04f635ede8	bcachefs: Delete incorrect BTREE_ID_NR assertion for forwards compat we now explicitly allow mounting and using filesystems with unknown btrees, and we have to walk them for fsck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	1c8cc24eef	bcachefs: Fix incorrect error handling found_btree_node_is_readable() error handling here is slightly odd, which is why we were accidently calling evict() on an error pointer Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:15 -04:00
Kent Overstreet	161f73c2c7	bcachefs: Split out btree_write_submit_wq Split the workqueues for btree read completions and btree write submissions; we don't want concurrency control on btree read completions, but we do want concurrency control on write submissions, else blocking in submit_bio() will cause a ton of kworkers to be allocated. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:15 -04:00
Kent Overstreet	319fef29e9	bcachefs: Fix trans->locked assert in bch2_move_data_btree, we might start with the trans unlocked from a previous loop iteration - we need a trans_begin() before iter_init(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-05 10:44:08 -04:00
Kent Overstreet	fdccb24352	bcachefs: Rereplicate now moves data off of durability=0 devices This fixes an issue where setting a device to durability=0 after it's been used makes it impossible to remove. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-05 10:44:08 -04:00
Kent Overstreet	9a64e1bfd8	bcachefs: Fix GFP_KERNEL allocation in break_cycle() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-05 10:44:08 -04:00
Linus Torvalds	ff9bce3d06	bcachefs fixes for 6.10-rc2 - two downgrade fixes - a couple snapshot deletion and repair fixes, thanks to noradtux for finding these and providing the image to debug them - a couple assert fixes - convert to folio helper, from Matthew - some improved error messages - bit of code reorganization (just moving things around); doing this while things are quiet so I'm not rebasing fixes past reorgs - don't return -EROFS on inconsistency error in recovery, this confuses util-linux and has it retry the mount - fix failure to return error on misaligned dio write; reported as an issue with coreutils shred -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmZYntUACgkQE6szbY3K bnbp7hAAvMgBanBT7qq3ac+W3vtgLuIk6gXNB7eRl+QNff7bJ+BzJH4UhCGhbo5g WzzQAQ2Zta6NwxbdAcZdL91qe4QDI3ITdIeKBZYtN/C8FySOeEk14K+CNhfQjYgd fJP2bx4LuUnyMri1pw8ZF3L/YXMOKhzTF8jLH04etty8Sbxss+zh9Dz6LFXqvloq 3v0EmbzrgB3KH+zflJ+yxTFUO3/tNYJhZHGXD452AlJYs29bECAAzJ/5gUq43CqQ /q+omBqqqf7oJZ84dHIu2piZrUhUJqotLdcIkzlkxDg+hN/BPeY4hv+dw5GNffz7 hgD6ieWm+0PQrf2WSBGRy7l3DglrwknUgrFSb8PlUAbOsg0TNsN7qjW6LVZSWMZ/ tBWiUQ95VYtlP8KzwLrIZ+BcP/Jm0X5hIAxui0Diz+exh7onDiY7Gxsp8/r0krYI x0s7uLhl73Jb/TO3pX9BS6U+Y0bUu0GJb+TThOLNX961Vg900BmpZvLave6y3U0i E09JRetWGK50wgPPvNt7M+s8lhs0Jg+Q+AuHAUd3x8eb1NSMibAvYGzV4oVpElrT YAP7vrJSgVdCCpI6qqCt+SgxatNUCSa/sHraJz2XeVGFyE6iLlXylBHabxKPn5P2 d8jyJ9cEHzumx6tHjLgm09UvoCBg00+ameiNOpjNKbPw6iJXfuw= =HDxx -----END PGP SIGNATURE----- Merge tag 'bcachefs-2024-05-30' of https://evilpiepirate.org/git/bcachefs Pull bcachefs fixes from Kent Overstreet: "Assorted odds and ends... - two downgrade fixes - a couple snapshot deletion and repair fixes, thanks to noradtux for finding these and providing the image to debug them - a couple assert fixes - convert to folio helper, from Matthew - some improved error messages - bit of code reorganization (just moving things around); doing this while things are quiet so I'm not rebasing fixes past reorgs - don't return -EROFS on inconsistency error in recovery, this confuses util-linux and has it retry the mount - fix failure to return error on misaligned dio write; reported as an issue with coreutils shred" * tag 'bcachefs-2024-05-30' of https://evilpiepirate.org/git/bcachefs: (21 commits) bcachefs: Fix failure to return error on misaligned dio write bcachefs: Don't return -EROFS from mount on inconsistency error bcachefs: Fix uninitialized var warning bcachefs: Split out sb-errors_format.h bcachefs: Split out journal_seq_blacklist_format.h bcachefs: Split out replicas_format.h bcachefs: Split out disk_groups_format.h bcachefs: split out sb-downgrade_format.h bcachefs: split out sb-members_format.h bcachefs: Better fsck error message for key version bcachefs: btree_gc can now handle unknown btrees bcachefs: add missing MODULE_DESCRIPTION() bcachefs: Fix setting of downgrade recovery passes/errors bcachefs: Run check_key_has_snapshot in snapshot_delete_keys() bcachefs: Refactor delete_dead_snapshots() bcachefs: Fix locking assert bcachefs: Fix lookup_first_inode() when inode_generations are present bcachefs: Plumb bkey into __btree_err() bcachefs: Use copy_folio_from_iter_atomic() bcachefs: Fix sb-downgrade validation ...	2024-05-31 11:45:41 -07:00
Kent Overstreet	7b038b564b	bcachefs: Fix failure to return error on misaligned dio write This was reported as an error when running coreutils shred. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-29 16:40:30 -04:00
Kent Overstreet	83208cbf2f	bcachefs: Don't return -EROFS from mount on inconsistency error We were accidentally returning -EROFS during recovery on filesystem inconsistency - since this is what the journal returns on emergency shutdown. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 19:23:03 -04:00
Kent Overstreet	8528bde1b6	bcachefs: Fix uninitialized var warning Can't actually be used uninitialized, but gcc was being silly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 18:21:51 -04:00
Kent Overstreet	759bb4eabc	bcachefs: Split out sb-errors_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 17:33:45 -04:00
Kent Overstreet	5c16c57488	bcachefs: Split out journal_seq_blacklist_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 17:32:03 -04:00
Kent Overstreet	24998050b6	bcachefs: Split out replicas_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 17:32:03 -04:00
Kent Overstreet	1cdcc6e3c2	bcachefs: Split out disk_groups_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 17:32:03 -04:00
Kent Overstreet	4c5eef0c50	bcachefs: split out sb-downgrade_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 17:32:03 -04:00
Kent Overstreet	016c22e410	bcachefs: split out sb-members_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 17:32:03 -04:00
Kent Overstreet	f1d4fed13f	bcachefs: Better fsck error message for key version Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 11:29:26 -04:00
Kent Overstreet	088d0de812	bcachefs: btree_gc can now handle unknown btrees Compatibility fix - we no longer have a separate table for which order gc walks btrees in, and special case the stripes btree directly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 11:29:26 -04:00
Jeff Johnson	b4131076c1	bcachefs: add missing MODULE_DESCRIPTION() Fix the 'make W=1' warning: WARNING: modpost: missing MODULE_DESCRIPTION() in fs/bcachefs/mean_and_variance_test.o Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 11:29:26 -04:00
Kent Overstreet	247c056bde	bcachefs: Fix setting of downgrade recovery passes/errors bch2_check_version_downgrade() was setting c->sb.version, which bch2_sb_set_downgrade() expects to be at the previous version; and it shouldn't even have been set directly because c->sb.version is updated by write_super(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 11:29:26 -04:00
Kent Overstreet	08f50005e0	bcachefs: Run check_key_has_snapshot in snapshot_delete_keys() delete_dead_snapshots now runs before the main fsck.c passes which check for keys for invalid snapshots; thus, it needs those checks as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 11:29:26 -04:00
Kent Overstreet	82af5ceb5d	bcachefs: Refactor delete_dead_snapshots() Consolidate per-key work into delete_dead_snapshots_process_key(), so we now walk all keys once, not twice. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 11:29:26 -04:00
Kent Overstreet	218e5e0c2a	bcachefs: Fix locking assert We now track whether a transaction is locked, and verify that we don't have nodes locked when the transaction isn't locked; reorder relocks to not pop the new assert. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 11:29:26 -04:00
Kent Overstreet	9e1a66e668	bcachefs: Fix lookup_first_inode() when inode_generations are present This function is used for finding the hash seed (which is the same in all versions of an inode in different snapshots): ff an inode has been deleted in a child snapshot we need to iterate until we find a live version. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 11:29:26 -04:00
Kent Overstreet	1292bc2ebf	bcachefs: Plumb bkey into __btree_err() It can be useful to know the exact byte offset within a btree node where an error occured. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 11:29:23 -04:00
Matthew Wilcox (Oracle)	b82b6eeefd	bcachefs: Use copy_folio_from_iter_atomic() copy_page_from_iter_atomic() will be removed at some point. Also fixup a comment for folios. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-26 22:30:09 -04:00
Kent Overstreet	9242a34b76	bcachefs: Fix sb-downgrade validation Superblock downgrade entries are only two byte aligned, but section sizes are 8 byte aligned, which means we have to be careful about overrun checks; an entry that crosses the end of the section is allowed (and ignored) as long as it has zero errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-26 12:44:12 -04:00
Kent Overstreet	d509cadc3a	bcachefs: Fix debug assert Reported-by: syzbot+a8074a75b8d73328751e@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-26 12:40:30 -04:00
Linus Torvalds	c40b1994b9	bcachefs fixes for 6.10-rc1 Just a few syzbot fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmZQk0cACgkQE6szbY3K bna7gA/+MSY3I95CwaJ4bBq5SCxOaRcrX099LFh8Zrj+OF+DWE2PtVo1LhhgnYrQ KpZrS2Q9Qgb2yVqYzOY6LBfH4il1O/WwvloMG0MbuYiQFu9/JL/6CEK9uFyiGmaC fdiFEN3u+8AK6phTFaqUU2ncG0XFQ1Ple5zmFXo4Y3ZJeNaubJeEDac+kbRvOwYh rQ6Iy0FNoQymv0BzmuM7g2NsbhdAgHTv7rhGbfpNBZv3lu0yDXsfZZgWTr2oXMSP FMhm4bcTGAFp5hbwq9k56ND8oSFpamsH7SwS4bDlEe1CNOfMI1JjnrvSEuDrocAE 1Jn2J2Gv9NXnEHKamVzzpUILG67buEtYzJyDQk51N4kulgThdpRzjm+11ylD5U0U wzIK1HXsKHtRdUiIhQGLCLW61FXM+0QBIk2eXhPq88jsM2zTL7iMbXR3P/nvgUDy 8ia8g5Q+nKxcb223M8WmK0rBwlaNasE/hXiFT54ntt8bK5nmVJjPMxVXUmYth3hw 7STkuT0k5jVsMG1NqLkg+wSupj1AuWbD2hIcas7GkxarEYAULbQcClHYGpMll3Tw +pJfLjAtBOkcE4TwWDLflVBhwWtdmPNhk51Q3iLVRp0Gm7t0rhE2vE6TjpsIFnrg rUAgaqQqQ2WXfsRaGa2wx0tRKoW+8Iigq13ndn1AZIrfEtQkYUs= =vuNC -----END PGP SIGNATURE----- Merge tag 'bcachefs-2024-05-24' of https://evilpiepirate.org/git/bcachefs Pull bcachefs fixes from Kent Overstreet: "Nothing exciting, just syzbot fixes (except for the one FMODE_CAN_ODIRECT patch). Looks like syzbot reports have slowed down; this is all catch up from two weeks of conferences. Next hardening project is using Thomas's error injection tooling to torture test repair" * tag 'bcachefs-2024-05-24' of https://evilpiepirate.org/git/bcachefs: bcachefs: Fix race path in bch2_inode_insert() bcachefs: Ensure we're RW before journalling bcachefs: Fix shutdown ordering bcachefs: Fix unsafety in bch2_dirent_name_bytes() bcachefs: Fix stack oob in __bch2_encrypt_bio() bcachefs: Fix btree_trans leak in bch2_readahead() bcachefs: Fix bogus verify_replicas_entry() assert bcachefs: Check for subvolues with bogus snapshot/inode fields bcachefs: bch2_checksum() returns 0 for unknown checksum type bcachefs: Fix bch2_alloc_ciphers() bcachefs: Add missing guard in bch2_snapshot_has_children() bcachefs: Fix missing parens in drop_locks_do() bcachefs: Improve bch2_assert_pos_locked() bcachefs: Fix shift overflows in replicas.c bcachefs: Fix shift overflow in btree_lost_data() bcachefs: Fix ref in trans_mark_dev_sbs() error path bcachefs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method bcachefs: Fix rcu splat in check_fix_ptrs()	2024-05-24 09:07:22 -07:00
Kent Overstreet	d93ff5fa40	bcachefs: Fix race path in bch2_inode_insert() __destroy_new_inode() is appropriate when we have _just_allocated the inode, but not when it's been fully initialized and on i_sb_list. Reported-by: syzbot+a0ddc9873c280a4cb18f@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-22 20:37:47 -04:00
Kent Overstreet	cd3b31f9d4	bcachefs: Ensure we're RW before journalling Reported-by: syzbot+c60cd352aedb109528bf@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-22 20:17:33 -04:00
Steven Rostedt (Google)	2c92ca849f	tracing/treewide: Remove second parameter of __assign_str() With the rework of how the __string() handles dynamic strings where it saves off the source string in field in the helper structure[1], the assignment of that value to the trace event field is stored in the helper value and does not need to be passed in again. This means that with: __string(field, mystring) Which use to be assigned with __assign_str(field, mystring), no longer needs the second parameter and it is unused. With this, __assign_str() will now only get a single parameter. There's over 700 users of __assign_str() and because coccinelle does not handle the TRACE_EVENT() macro I ended up using the following sed script: git grep -l __assign_str \| while read a ; do sed -e 's/$__assign_str([^,][^ ,]$ ,[^;]*/\1)/' $a > /tmp/test-file; mv /tmp/test-file $a; done I then searched for __assign_str() that did not end with ';' as those were multi line assignments that the sed script above would fail to catch. Note, the same updates will need to be done for: __assign_str_len() __assign_rel_str() __assign_rel_str_len() I tested this with both an allmodconfig and an allyesconfig (build only for both). [1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/ Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts. Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs Tested-by: Guenter Roeck <linux@roeck-us.net>	2024-05-22 20:14:47 -04:00
Kent Overstreet	d293ece108	bcachefs: Fix shutdown ordering the btree key cache uses the srcu struct created/destroyed by btree_iter.c; btree_iter needs to be exited last. Reported-by: syzbot+3af9daea347788b15213@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-22 19:54:03 -04:00
Kent Overstreet	2195b755eb	bcachefs: Fix unsafety in bch2_dirent_name_bytes() Reported-by: syzbot+84fa6fb8c7f98b93cdea@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-22 19:14:36 -04:00
Kent Overstreet	2ba24864d2	bcachefs: Fix stack oob in __bch2_encrypt_bio() Reported-by: syzbot+fff6b0fb00259873576a@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-22 19:01:17 -04:00
Kent Overstreet	70dd062e27	bcachefs: Fix btree_trans leak in bch2_readahead() Reported-by: syzbot+d797fe78808e968d6c84@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-22 19:01:17 -04:00
Kent Overstreet	5fa421448d	bcachefs: Fix bogus verify_replicas_entry() assert verify_replicas_entry() is only for newly created replicas entries - existing entries on disk may have unknown data types, and we have real verifiers for them. Reported-by: syzbot+73414091bd382684ee2b@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-22 19:01:17 -04:00
Linus Torvalds	38da32ee70	bd_inode series Replacement of bdev->bd_inode with sane(r) set of primitives. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZkwjlgAKCRBZ7Krx/gZQ 66OmAP9nhZLASn/iM2+979I6O0GW+vid+uLh48uW3d+LbsmVIgD9GYpR+cuLQ/xj mJESWfYKOVSpFFSrqlzKg9PQlU/GFgs= =6LRp -----END PGP SIGNATURE----- Merge tag 'pull-bd_inode-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull bdev bd_inode updates from Al Viro: "Replacement of bdev->bd_inode with sane(r) set of primitives by me and Yu Kuai" * tag 'pull-bd_inode-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: RIP ->bd_inode dasd_format(): killing the last remaining user of ->bd_inode nilfs_attach_log_writer(): use ->bd_mapping->host instead of ->bd_inode block/bdev.c: use the knowledge of inode/bdev coallocation gfs2: more obvious initializations of mapping->host fs/buffer.c: massage the remaining users of ->bd_inode to ->bd_mapping blk_ioctl_{discard,zeroout}(): we only want ->bd_inode->i_mapping here... grow_dev_folio(): we only want ->bd_inode->i_mapping there use ->bd_mapping instead of ->bd_inode->i_mapping block_device: add a pointer to struct address_space (page cache of bdev) missing helpers: bdev_unhash(), bdev_drop() block: move two helpers into bdev.c block2mtd: prevent direct access of bd_inode dm-vdo: use bdev_nr_bytes(bdev) instead of i_size_read(bdev->bd_inode) blkdev_write_iter(): saner way to get inode and bdev bcachefs: remove dead function bdev_sectors() ext4: remove block_device_ejected() erofs_buf: store address_space instead of inode erofs: switch erofs_bread() to passing offset instead of block number	2024-05-21 09:51:42 -07:00
Kent Overstreet	765b8cb8ac	bcachefs: Check for subvolues with bogus snapshot/inode fields This fixes an assertion pop in btree_iter.c that checks for forgetting to pass a snapshot ID when iterating over snapshots btrees. Reported-by: syzbot+0dfe05235e38653e2aee@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-20 05:37:26 -04:00
Kent Overstreet	6b74fdcc8e	bcachefs: bch2_checksum() returns 0 for unknown checksum type This fixes missing guards on trying to calculate a checksum with an invalid/unknown checksum type; moving the guards up to e.g. btree_io.c might be "more correct", but doesn't buy us anything - an unknown checksum type will always be flagged as at least a checksum error so we aren't losing any safety doing it this way and it makes it less likely to accidentally pop an assert we don't want. Reported-by: syzbot+e951ad5349f3a34a715a@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-20 05:37:26 -04:00
Kent Overstreet	c06a8b7567	bcachefs: Fix bch2_alloc_ciphers() Don't put error pointers in bch_fs, that's gross. This fixes (?) the check in bch2_checksum_type_valid() - depending on our error paths, or depending on what our error paths are doing it at least makes the code saner. Reported-by: syzbot+2e3cb81b5d1fe18a374b@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-20 05:37:26 -04:00
Kent Overstreet	6d48e61364	bcachefs: Add missing guard in bch2_snapshot_has_children() We additionally need to be going inconsistent if passed an invalid snapshot ID; that patch will need more thorough testing. Reported-by: syzbot+1c9fca23fe478633b305@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-20 05:37:26 -04:00
Kent Overstreet	6ce26ad376	bcachefs: Fix missing parens in drop_locks_do() Reported-by: syzbot+95db43b0a06f157ee865@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-20 05:37:26 -04:00
Kent Overstreet	25989f4a9b	bcachefs: Improve bch2_assert_pos_locked() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-20 05:37:26 -04:00
Kent Overstreet	bcfbaea8e5	bcachefs: Fix shift overflows in replicas.c We can't disallow unknown data_types in verify() - we have to preserve them unchanged for backwards compat; that means we have to add a few more guards. Reported-by: syzbot+249018ea545364f78d04@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-20 05:37:26 -04:00
Kent Overstreet	f108ddd467	bcachefs: Fix shift overflow in btree_lost_data() Reported-by: syzbot+29f65db1a5fe427b5c56@syzkaller.appspotmail.com Fixes: `55936afe11` ("bcachefs: Flag btrees with missing data") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-20 05:37:26 -04:00
Kent Overstreet	9667214b30	bcachefs: Fix ref in trans_mark_dev_sbs() error path Reported-by: syzbot+5c7f715a7107a608a544@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-20 05:37:26 -04:00
Youling Tang	54429c902a	bcachefs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method Since commit `a2ad63daa8` ("VFS: add FMODE_CAN_ODIRECT file flag") file systems can just set the FMODE_CAN_ODIRECT flag at open time instead of wiring up a dummy direct_IO method to indicate support for direct I/O. Do that for bcachefs so that noop_direct_IO can eventually be removed. Similar to commit `b294349993` ("xfs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method"). Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-20 05:37:26 -04:00
Kent Overstreet	427ba55503	bcachefs: Fix rcu splat in check_fix_ptrs() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-20 05:37:26 -04:00
Linus Torvalds	16dbfae867	bcachefs changes for 6.10-rc1 - More safety fixes, primarily found by syzbot - Run the upgrade/downgrade paths in nochnages mode. Nochanges mode is primarily for testing fsck/recovery in dry run mode, so it shouldn't change anything besides disabling writes and holding dirty metadata in memory. The idea here was to reduce the amount of activity if we can't write anything out, so that bringing up a filesystem in "super ro" mode would be more lilkely to work for data recovery - but norecovery is the correct option for this. - btree_trans->locked; we now track whether a btree_trans has any btree nodes locked, and this is used for improved assertions related to trans_unlock() and trans_relock(). We'll also be using it for improving how we work with lockdep in the future: we don't want lockdep to be tracking individual btree node locks because we take too many for lockdep to track, and it's not necessary since we have a cycle detector. - Trigger improvements that are prep work for online fsck - BTREE_TRIGGER_check_repair; this regularizes how we do some repair work for extents that goes with running triggers in fsck, and fixes some subtle issues with transaction restarts there. - bch2_snapshot_equiv() has now been ripped out of fsck.c; snapshot equivalence classes are for when snapshot deletion leaves behind redundant snapshot nodes, but snapshot deletion now cleans this up right away, so the abstraction doesn't need to leak. - Improvements to how we resume writing to the journal in recovery. The code for picking the new place to write when reading the journal is greatly simplified and we also store the position in the superblock for when we don't read the journal; this means that we preserve more of the journal for list_journal debugging. - Improvements to sysfs btree_cache and btree_node_cache, for debugging memory reclaim. - We now detect when we've blocked for 10 seconds on the allocator in the write path and dump some useful info. - Safety fixes for devices references: this is a big series that changes almost all device lookups to properly check if the device exists and take a reference to it. Previously we assumed that if a bkey exists that references a device then the device must exist, and this was enforced in .invalid methods, but this was incorrect because it meant device removal relied on accounting being correct to not leave keys pointing to invalid devices, and that's not something we can assume. Getting the "pointer to invalid device" checks out of our .invalid() methods fixes some long standing device removal bugs; the only outstanding bug with device removal now is a race between the discard path and deleting alloc info, which should be easily fixed. - The allocator now prefers not to expand the new member_info.btree_allocated bitmap, meaning if repair ever requires scanning for btree nodes (because of a corrupt interior nodes) we won't have to scan the whole device(s). - New coding style document, which among other things talks about the correct usage of assertions -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmZKJQgACgkQE6szbY3K bnZETg//SU9H0OHnBSMB/cteF6PKo9QR+dhT+3n+gWTxl0o/egbGTqwbzVqGtd2f J6II1BsDk8VoTOb/gFfLRShlmJfnj2jpRThU265faR/7LQYeSaqndDPkjOpTayAD Nj/DJyiSUTL753rZh3yUhOpOIHf7iapH6wuaZCPfhdfk+yvZNW8iz07JHjHLKRp8 I2cFH0r6kN916NdRkt9oDCz68WouT8eWTqwcKra04XsLEZjNJHxLpKMq4M8UdPc7 YynJPVt+aP8+VduGIq6pV8Co3afCP2oUywo11JpRmvLsw4tex/59wxOYtpMfgn6k 4H+9WqiBwkbmnLDrfFHWRameS6F/7+GRAOVuz9nkmfk61UPU15gLjSRffqZ6u2YC 7vbrXgebId/sZXtBpQd83RMMX52BnEJah0upNJ54IsSqfDYkU9lwl6CEyYpcX1hf YNBGBTbspZztc3AB13b3ow421FMhaySUg0FDmntMR9O8Z6/BXk7Ykc7b8DPEfrFs W6JY7q+ARBxr+EgFcV74fvMCf7NJTAhyv80AKryo7NFU2JZOyyaTxcTGSnolX4Mi lyHiOgicmOX+vy3vbC1dZoDcmIDJ4Uc0vixYcpKiZqxlR8XJ+wpevC50TEhxrcW+ ZO4SloQvgyjI34xu/gZgjRYb3BhXK3x+ougVFpRG8V8zQ/+ccWg= =MKrF -----END PGP SIGNATURE----- Merge tag 'bcachefs-2024-05-19' of https://evilpiepirate.org/git/bcachefs Pull bcachefs updates from Kent Overstreet: - More safety fixes, primarily found by syzbot - Run the upgrade/downgrade paths in nochnages mode. Nochanges mode is primarily for testing fsck/recovery in dry run mode, so it shouldn't change anything besides disabling writes and holding dirty metadata in memory. The idea here was to reduce the amount of activity if we can't write anything out, so that bringing up a filesystem in "super ro" mode would be more lilkely to work for data recovery - but norecovery is the correct option for this. - btree_trans->locked; we now track whether a btree_trans has any btree nodes locked, and this is used for improved assertions related to trans_unlock() and trans_relock(). We'll also be using it for improving how we work with lockdep in the future: we don't want lockdep to be tracking individual btree node locks because we take too many for lockdep to track, and it's not necessary since we have a cycle detector. - Trigger improvements that are prep work for online fsck - BTREE_TRIGGER_check_repair; this regularizes how we do some repair work for extents that goes with running triggers in fsck, and fixes some subtle issues with transaction restarts there. - bch2_snapshot_equiv() has now been ripped out of fsck.c; snapshot equivalence classes are for when snapshot deletion leaves behind redundant snapshot nodes, but snapshot deletion now cleans this up right away, so the abstraction doesn't need to leak. - Improvements to how we resume writing to the journal in recovery. The code for picking the new place to write when reading the journal is greatly simplified and we also store the position in the superblock for when we don't read the journal; this means that we preserve more of the journal for list_journal debugging. - Improvements to sysfs btree_cache and btree_node_cache, for debugging memory reclaim. - We now detect when we've blocked for 10 seconds on the allocator in the write path and dump some useful info. - Safety fixes for devices references: this is a big series that changes almost all device lookups to properly check if the device exists and take a reference to it. Previously we assumed that if a bkey exists that references a device then the device must exist, and this was enforced in .invalid methods, but this was incorrect because it meant device removal relied on accounting being correct to not leave keys pointing to invalid devices, and that's not something we can assume. Getting the "pointer to invalid device" checks out of our .invalid() methods fixes some long standing device removal bugs; the only outstanding bug with device removal now is a race between the discard path and deleting alloc info, which should be easily fixed. - The allocator now prefers not to expand the new member_info.btree_allocated bitmap, meaning if repair ever requires scanning for btree nodes (because of a corrupt interior nodes) we won't have to scan the whole device(s). - New coding style document, which among other things talks about the correct usage of assertions * tag 'bcachefs-2024-05-19' of https://evilpiepirate.org/git/bcachefs: (155 commits) bcachefs: add no_invalid_checks flag bcachefs: add counters for failed shrinker reclaim bcachefs: Fix sb_field_downgrade validation bcachefs: Plumb bch_validate_flags to sb_field_ops.validate() bcachefs: s/bkey_invalid_flags/bch_validate_flags bcachefs: fsync() should not return -EROFS bcachefs: Invalid devices are now checked for by fsck, not .invalid methods bcachefs: kill bch2_dev_bkey_exists() in bch2_check_fix_ptrs() bcachefs: kill bch2_dev_bkey_exists() in bch2_read_endio() bcachefs: bch2_dev_get_ioref() checks for device not present bcachefs: bch2_dev_get_ioref2(); io_read.c bcachefs: bch2_dev_get_ioref2(); debug.c bcachefs: bch2_dev_get_ioref2(); journal_io.c bcachefs: bch2_dev_get_ioref2(); io_write.c bcachefs: bch2_dev_get_ioref2(); btree_io.c bcachefs: bch2_dev_get_ioref2(); backpointers.c bcachefs: bch2_dev_get_ioref2(); alloc_background.c bcachefs: for_each_bset() declares loop iter bcachefs: Move BCACHEFS_STATFS_MAGIC value to UAPI magic.h bcachefs: Improve sysfs internal/btree_cache ...	2024-05-19 13:45:48 -07:00
Linus Torvalds	1b0aabcc9a	vfs-6.10.misc -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZj3HuwAKCRCRxhvAZXjc orYvAQCZOr68uJaEaXAArYTdnMdQ6HIzG+FVlwrqtrhz0BV07wEAqgmtSR9XKh+L 0+DNepg4R8PZOHH371eSSsLNRCUCkAs= =SVsU -----END PGP SIGNATURE----- Merge tag 'vfs-6.10.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual fses. Features: - Free up FMODE_* bits. I've freed up bits 6, 7, 8, and 24. That means we now have six free FMODE_* bits in total (but bit #6 already got used for FMODE_WRITE_RESTRICTED) - Add FOP_HUGE_PAGES flag (follow-up to FMODE_* cleanup) - Add fd_raw cleanup class so we can make use of automatic cleanup provided by CLASS(fd_raw, f)(fd) for O_PATH fds as well - Optimize seq_puts() - Simplify __seq_puts() - Add new anon_inode_getfile_fmode() api to allow specifying f_mode instead of open-coding it in multiple places - Annotate struct file_handle with __counted_by() and use struct_size() - Warn in get_file() whether f_count resurrection from zero is attempted (epoll/drm discussion) - Folio-sophize aio - Export the subvolume id in statx() for both btrfs and bcachefs - Relax linkat(AT_EMPTY_PATH) requirements - Add F_DUPFD_QUERY fcntl() allowing to compare two file descriptors for dup() equality replacing kcmp() Cleanups: - Compile out swapfile inode checks when swap isn't enabled - Use (1 << n) notation for FMODE_ bitshifts for clarity - Remove redundant variable assignment in fs/direct-io - Cleanup uses of strncpy in orangefs - Speed up and cleanup writeback - Move fsparam_string_empty() helper into header since it's currently open-coded in multiple places - Add kernel-doc comments to proc_create_net_data_write() - Don't needlessly read dentry->d_flags twice Fixes: - Fix out-of-range warning in nilfs2 - Fix ecryptfs overflow due to wrong encryption packet size calculation - Fix overly long line in xfs file_operations (follow-up to FMODE_* cleanup) - Don't raise FOP_BUFFER_{R,W}ASYNC for directories in xfs (follow-up to FMODE_* cleanup) - Don't call xfs_file_open from xfs_dir_open (follow-up to FMODE_* cleanup) - Fix stable offset api to prevent endless loops - Fix afs file server rotations - Prevent xattr node from overflowing the eraseblock in jffs2 - Move fdinfo PTRACE_MODE_READ procfs check into the .permission() operation instead of .open() operation since this caused userspace regressions" * tag 'vfs-6.10.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits) afs: Fix fileserver rotation getting stuck selftests: add F_DUPDFD_QUERY selftests fcntl: add F_DUPFD_QUERY fcntl() file: add fd_raw cleanup class fs: WARN when f_count resurrection is attempted seq_file: Simplify __seq_puts() seq_file: Optimize seq_puts() proc: Move fdinfo PTRACE_MODE_READ check into the inode .permission operation fs: Create anon_inode_getfile_fmode() xfs: don't call xfs_file_open from xfs_dir_open xfs: drop fop_flags for directories xfs: fix overly long line in the file_operations shmem: Fix shmem_rename2() libfs: Add simple_offset_rename() API libfs: Fix simple_offset_rename_exchange() jffs2: prevent xattr node from overflowing the eraseblock vfs, swap: compile out IS_SWAPFILE() on swapless configs vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements fs/direct-io: remove redundant assignment to variable retval fs/dcache: Re-use value stored to dentry->d_flags instead of re-reading ...	2024-05-13 11:40:06 -07:00
Thomas Bertschinger	07f9a27f19	bcachefs: add no_invalid_checks flag Setting this flag on a filesystem results in validity checks being skipped when writing bkeys. This flag will be used by tooling that deliberately injects corruption into a filesystem in order to exercise fsck. It shouldn't be set outside of testing/debugging code. Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:24:30 -04:00
Daniel Hill	bceacfa97e	bcachefs: add counters for failed shrinker reclaim This adds distinct counters for every reason the btree node shrinker can fail to free an object - if our shrinker isn't making progress, this will tell us why. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:24:29 -04:00
Kent Overstreet	692aa7a54b	bcachefs: Fix sb_field_downgrade validation - bch2_sb_downgrade_validate() wasn't checking for a downgrade entry extending past the end of the superblock section - for_each_downgrade_entry() is used in to_text() and needs to work on malformed input; it also was missing a check for a field extending past the end of the section Reported-by: syzbot+e49ccab73449180bc9be@syzkaller.appspotmail.com Fixes: `84f1638795` ("bcachefs: bch_sb_field_downgrade") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:36 -04:00
Kent Overstreet	a5c3e265d3	bcachefs: Plumb bch_validate_flags to sb_field_ops.validate() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:36 -04:00
Kent Overstreet	65eaf4e24a	bcachefs: s/bkey_invalid_flags/bch_validate_flags We're about to start using bch_validate_flags for superblock section validation - it's no longer bkey specific. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:36 -04:00
Kent Overstreet	d09a8468d9	bcachefs: fsync() should not return -EROFS fsync has a slightly odd usage of -EROFS, where it means "does not support fsync". I didn't choose it... Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:36 -04:00
Kent Overstreet	99179fb898	bcachefs: Invalid devices are now checked for by fsck, not .invalid methods Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:36 -04:00
Kent Overstreet	2f4b4a3b44	bcachefs: kill bch2_dev_bkey_exists() in bch2_check_fix_ptrs() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:36 -04:00
Kent Overstreet	02b7fa4fe5	bcachefs: kill bch2_dev_bkey_exists() in bch2_read_endio() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:36 -04:00
Kent Overstreet	2c91ab7262	bcachefs: bch2_dev_get_ioref() checks for device not present Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:36 -04:00
Kent Overstreet	465bf6f42a	bcachefs: bch2_dev_get_ioref2(); io_read.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:36 -04:00
Kent Overstreet	91ffdecfc7	bcachefs: bch2_dev_get_ioref2(); debug.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:35 -04:00
Kent Overstreet	6212ea2497	bcachefs: bch2_dev_get_ioref2(); journal_io.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:35 -04:00
Kent Overstreet	48af853925	bcachefs: bch2_dev_get_ioref2(); io_write.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:35 -04:00
Kent Overstreet	690f7cdf73	bcachefs: bch2_dev_get_ioref2(); btree_io.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:35 -04:00
Kent Overstreet	466298e2f6	bcachefs: bch2_dev_get_ioref2(); backpointers.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:35 -04:00
Kent Overstreet	0e57996c69	bcachefs: bch2_dev_get_ioref2(); alloc_background.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:35 -04:00
Kent Overstreet	b6fb4269e7	bcachefs: for_each_bset() declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:34 -04:00
Petr Vorel	e2f48c4809	bcachefs: Move BCACHEFS_STATFS_MAGIC value to UAPI magic.h Move BCACHEFS_STATFS_MAGIC value to UAPI <linux/magic.h> under BCACHEFS_SUPER_MAGIC definition (use common approach for name) and reuse the definition in bcachefs_format.h BCACHEFS_STATFS_MAGIC. There are other bcachefs magic definitions: BCACHE_MAGIC, BCHFS_MAGIC, which use UUID_INIT() and are used only in libbcachefs. Therefore move only BCACHEFS_STATFS_MAGIC value, which can be used outside of libbcachefs for f_type field in struct statfs in statfs() or fstatfs(). Suggested-by: Su Yue <glass.su@suse.com> Signed-off-by: Petr Vorel <pvorel@suse.cz> Acked-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:24 -04:00
Kent Overstreet	e11ecc6133	bcachefs: Improve sysfs internal/btree_cache Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:24 -04:00
Kent Overstreet	c670509134	bcachefs: Allocator prefers not to expand mi.btree_allocated bitmap We now have a small bitmap in the member info section of the superblock for "regions that have btree nodes", so that if we ever have to scan for btree nodes in repair we don't have to scan the whole device(s). This tweaks the allocator to prefer allocating from regions that are already marked in this bitmap. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:24 -04:00
Kent Overstreet	40574946b8	bcachefs: Better bucket alloc tracepoints Tracepoints are garbage, and perf trace even cuts off some of our fields. Much nicer to just trace a string, and then we can build nicely formatted output with printbufs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:24 -04:00
Kent Overstreet	6202569777	bcachefs: Move nocow unlock to bch2_write_endio() This fixes a lifetime issue; bch2_nocow_write_unlock() uses PTR_BUCKET_POS(), which needs the device - but we drop our ref to the device in bch2_write_endio(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:24 -04:00
Kent Overstreet	b6d29b5869	bcachefs: kill bch2_dev_bkey_exists() in journal_ptrs_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:24 -04:00
Kent Overstreet	f4301b635a	bcachefs: kill bch2_dev_bkey_exists() in discard_one_bucket_fast() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:24 -04:00
Kent Overstreet	bc3204c80a	bcachefs: kill bch2_dev_bkey_exists() in check_alloc_info() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:24 -04:00
Kent Overstreet	d8585a79be	bcachefs: bch2_dev_have_ref() bch2_dev_bkey_exists() is going away; bch2_dev_have_ref() documents that we're looking up a device without checking if it's present because we have a reference to it already. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:24 -04:00
Kent Overstreet	222eacabc1	bcachefs: kill bch2_dev_bkey_exists() in data_update_init() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	a9422fd404	bcachefs: kill bch2_dev_bkey_exists() in bkey_pick_read_device() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	db39a35dde	bcachefs: pass bch_dev to read_from_stale_dirty_pointer() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	78e9b548f3	bcachefs: bch2_dev_bucket_exists() uses bch2_dev_rcu() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	ad897d241b	bcachefs: kill bch2_dev_bkey_exists() in btree_gc.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	9cadb4ea56	bcachefs: bch2_extent_normalize() -> bch2_dev_rcu() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	8e3cc2003f	bcachefs: bch2_bkey_has_target() -> bch2_dev_rcu() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	8feecbed24	bcachefs: extent_ptr_invalid() -> bch2_dev_rcu() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	3858aa4268	bcachefs: ptr_stale() -> dev_ptr_stale() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	302c980a81	bcachefs: extent_ptr_durability() -> bch2_dev_rcu() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	3793b3f91f	bcachefs: bch2_extent_merge() -> bch2_dev_rcu() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	c387d84413	bcachefs: ec_validate_checksums() -> bch2_dev_tryget() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	8783856ab1	bcachefs: ob_dev() Wrapper around bch2_dev_have_ref() for open_buckets; we do guarantee that the device an open_bucket points to exists. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	dbd0408087	bcachefs: move replica_set from bch_dev to bch_fs This is needed for the next patch - the write submit path has to be able to allocate a replica bio even when we weren't able to get a ref on the device. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	633cf06944	bcachefs: Kill bch2_dev_bkey_exists() in backpointer code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	1f2f92ec3f	bcachefs: PTR_BUCKET_POS() now takes bch_dev Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	fa6cce09f0	bcachefs: bch2_dev_iterate() New helper for getting refs to devices as we iterate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	cb4d340a10	bcachefs: bch2_evacuate_bucket() -> bch2_dev_tryget() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	07d7c4da7b	bcachefs: bch2_bucket_ref_update() now takes bch_dev Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	a7f1c26f59	bcachefs: bch2_trigger_alloc() -> bch2_dev_tryget() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	9b3059a1b3	bcachefs: bch2_check_alloc_key() -> bch2_dev_tryget_noerror() More elimination of bch2_dev_bkey_exists() usage. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	4cd91e2f87	bcachefs: Convert to bch2_dev_tryget_noerror() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	b07eb8252f	bcachefs: bch2_dev_tryget() Most uses of bch2_dev_bkey_exists() are going away, where we assume that because a key references a device the device most exist - instead, we'll be explicitly checking if the device exists and getting a reference to it. This adds the new helpers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	6349b07c25	bcachefs: bch2_have_enough_devs() checks for nonexistent device Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	adf81796ee	bcachefs: journal_replay_entry_early() checks for nonexistent device Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	13a16dabde	bcachefs: bch2_dev_btree_bitmap_marked() -> bch2_dev_rcu() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	267039d0fc	bcachefs: Pass device to bch2_bucket_do_index() Eliminating bch2_dev_bkey_exists() uses and replacing them with proper checks; this one was unnecessary since the caller already has it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	f5faf43f85	bcachefs: Pass device to bch2_alloc_write_key() More elimating bch2_dev_bkey_exists() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	23f308ae19	bcachefs: bch2_dev_safe() -> bch2_dev_rcu() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	552aa54865	bcachefs: Debug asserts for ca->ref Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	f295298b8c	bcachefs: New helpers for device refcounts This will be used in the next patch for adding some new debug mode asserts. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	e98786ea85	bcachefs: bch2_print_allocator_stuck() If we block on the allocator for more than 10 seconds, print out some useful debugging info. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	9a768ab75b	bcachefs: bch2_bkey_drop_ptrs() declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	b895c70326	bcachefs: x-macroize journal flags enums Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	3a718c0647	bcachefs: On device add, prefer unused slots We can't strictly guarantee that no pointers refer to nonexistent devices - we attempt to, but we need to be safe when the filesystem is corrupt. Therefore, change device_add to try to pick a slot that's never been used, or the slot that's been unused the longest. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	ffcbec6076	bcachefs: Kill opts.buckets_nouse Now explicitly allocate and free the buckets_nouse bitmap - this is going to be used for online fsck. To go RW when we haven't check allocations, we'll do a much slimmed down version that just initializes the buckets_nouse bitmaps. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	abe2f470bc	bcachefs: simplify bch2_trans_start_alloc_update() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	0acf2169a5	bcachefs: __mark_stripe_bucket() now takes bch_alloc_v4 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00
Kent Overstreet	be11ae16c4	bcachefs: __mark_pointer now takes bch_alloc_v4 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00

... 3 4 5 6 7 ...

4041 Commits