linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-22 12:11:40 +00:00

Author	SHA1	Message	Date
Kent Overstreet	4c4a7d48bd	bcachefs: Kill replicas_journal_res More dead code deletion Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	66a57684c6	bcachefs: Kill fs_usage_online More dead code deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	fe5eddc0d0	bcachefs: Kill bch2_fs_usage_to_text() Dead code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	8bb8d683a4	bcachefs: Delete journal-buf-sharded old style accounting More deletion of dead code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	5b9bc272e6	bcachefs: Kill writing old accounting to journal More ripping out of the old disk space accounting. Note that the new disk space accounting is incompatible with the old, and writing out old style disk space accounting with the new code is infeasible. This means upgrading and downgrading past this version requires regenerating accounting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	3afb8dbf03	bcachefs: kill bch2_fs_usage_read() With bch2_ioctl_fs_usage(), this is now dead code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	6b39638b84	bcachefs: Convert bch2_ioctl_fs_usage() to new accounting This converts bch2_ioctl_fs_usage() to read from the new disk accounting, via bch2_fs_replicas_usage_read(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	72a6bb098c	bcachefs: Kill bch2_fs_usage_initialize() Deleting code for the old disk accounting scheme. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	f5095b9f85	bcachefs: dev_usage updated by new accounting Reading disk accounting now requires an eytzinger lookup (see: bch2_accounting_mem_read()), but the per-device counters are used frequently enough that we'd like to still be able to read them with just a percpu sum, as in the old code. This patch special cases the device counters; when we update in-memory accounting we also update the old style percpu counters if it's a device counter update. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	2e8d686a4a	bcachefs: Coalesce accounting keys before journal replay This fixes a performance regression in journal replay; without coalescing accounting keys we have multiple keys at the same position, which means journal_keys_peek_upto() has to skip past many overwritten keys - turning journal replay into an O(n^2) algorithm. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	1d16c605cc	bcachefs: Disk space accounting rewrite Main part of the disk accounting rewrite. This is a wholesale rewrite of the existing disk space accounting, which relies on percpu counters that are sharded by journal buffer, and rolled up and added to each journal write. With the new scheme, every set of counters is a distinct key in the accounting btree; this fixes scaling limitations of the old scheme, where counters took up space in each journal entry and required multiple percpu counters. Now, in memory accounting requires a single set of percpu counters - not multiple for each in flight journal buffer - and in the future we'll probably also have counters that don't use in memory percpu counters, they're not strictly required. An accounting update is now a normal btree update, using the btree write buffer path. At transaction commit time, we apply accounting updates to the in memory counters, which are percpu counters indexed in an eytzinger tree by the accounting key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	5d9667d1d6	bcachefs: btree write buffer knows how to accumulate bch_accounting keys Teach the btree write buffer how to accumulate accounting keys - instead of having the newer key overwrite the older key as we do with other updates, we need to add them together. Also, add a flag so that write buffer flush knows when journal replay is finished flushing accounting, and teach it to hold accounting keys until that flag is set. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	9dec2a473b	bcachefs: Accumulate accounting keys in journal replay Until accounting keys hit the btree, they are deltas, not new versions of the existing key; this means we have to teach journal replay to accumulate them. Additionally, the journal doesn't track precisely which entries have been flushed to the btree; it only tracks a range of entries that may possibly still need to be flushed. That means we need to compare accounting keys against the version in the btree and only flush updates that are newer. There's another wrinkle with the write buffer: if the write buffer starts flushing accounting keys before journal replay has finished flushing accounting keys, journal replay will see the version number from the new updates and updates from the journal will be lost. To avoid this, journal replay has to flush accounting keys first, and we'll be adding a flag so that write buffer flush knows to hold accounting keys until then. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Kent Overstreet	2744e5c9eb	bcachefs: KEY_TYPE_accounting New key type for the disk space accounting rewrite. - Holds a variable sized array of u64s (may be more than one for accounting e.g. compressed and uncompressed size, or buckets and sectors for a given data type) - Updates are deltas, not new versions of the key: this means updates to accounting can happen via the btree write buffer, which we'll be teaching to accumulate deltas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:13 -04:00
Thomas Bertschinger	929d954330	bcachefs: use new mount API This updates bcachefs to use the new mount API: - Update the file_system_type to use the new init_fs_context() function. - Define the new fs_context_operations functions. - No longer register bch2_mount() and bch2_remount(); these are now called via the new fs_context functions. - Define a new helper type, bch2_opts_parse that includes a struct bch_opts and additionally a printbuf used to save options that can't be parsed until after the FS is opened. This enables us to parse as many options as possible prior to opening the filesystem while saving those options that need the open FS for later parsing. Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Thomas Bertschinger	1c12d1caf8	bcachefs: Add error code to defer option parsing This introduces a new error code, option_needs_open_fs, which is used to indicate that an attempt was made to parse a mount option prior to opening a filesystem, when that mount option requires an open filesystem in order to be validated. Returning this error results in bch2_parse_one_mount_opt() saving that option for later parsing, after the filesystem is opened. Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Thomas Bertschinger	9b7f0b5d3d	bcachefs: add printbuf arg to bch2_parse_mount_opts() Mount options that take the name of a device that may be part of a filesystem, for example "metadata_target", cannot be validated until after the filesystem has been opened. However, an attempt to parse those options may be made prior to the filesystem being opened. This change adds a printbuf parameter to bch2_parse_mount_opts() which will be used to save those mount options, when they are supplied prior to the FS being opened, so that they can be parsed later. This functionality is not currently needed, but will be used after bcachefs starts using the new mount API to parse mount options. This is because using the new mount API, we will process mount options prior to opening the FS, but the new API doesn't provide a convenient way to "replay" mount option parsing. So we save these options ourselves to accomplish this. This change also splits out the code to parse a single option into bch2_parse_one_mount_opt(), which will be useful when using the new mount API which deals with a single mount option at a time. Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	7773df19c3	bcachefs: metadata version bucket_stripe_sectors New on disk format version for bch_alloc->stripe_sectors and BCH_DATA_unstriped - accounting for unstriped data in stripe buckets. Upgrade/downgrade requires regenerating alloc info - but only if erasure coding is in use. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	2612e29142	bcachefs: BCH_DATA_unstriped Add a new pseudo data type, to track buckets that are members of a stripe, but have unstriped data in them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	55f7962da3	bcachefs: bch_alloc->stripe_sectors Add a separate counter to bch_alloc_v4 for amount of striped data; this lets us separately track striped and unstriped data in a bucket, which lets us see when erasure coding has failed to update extents with stripe pointers, and also find buckets to continue updating if we crash mid way through creating a new stripe. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	c13d526d9d	bcachefs: check_key_has_inode() Consolidate duplicated checks for extents/dirents/xattrs - these keys should all have a corresponding inode of the correct type. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Thomas Bertschinger	51fc436c80	bcachefs: allow passing full device path for target options The output of mount options such as "metadata_target" in `/proc/mounts` uses the full path to the device. mount(8) from util-linux uses the output from `/proc/mounts` to pass existing mount options when performing a remount, so bcachefs should accept as input the same form that it prints as output. Without this change: $ mount -t bcachefs -o metadata_target=vdb /dev/vdb /mnt $ strace mount -o remount /mnt ... fsconfig(4, FSCONFIG_SET_STRING, "metadata_target", "/dev/vdb", 0) = -1 EINVAL (Invalid argument) ... Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	3811f48aa3	bcachefs: bch2_printbuf_strip_trailing_newline() Add a new helper to fix inode_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Thomas Bertschinger	babe30fe8d	bcachefs: don't expose "read_only" as a mount option When "read_only" is exposed as a mount option, it is redundant with the standard option "ro" and gives users multiple ways to specify that a bcachefs filesystem should be mounted read-only. This presents the risk of having inconsistent options specified. This can be seen when remounting a read-only filesystem in read-write mode, using mount(8) from util-linux. Because mount(8) parses the existing mount options from `/proc/mounts` and applies them when remounting, it can end up applying both "read_only" and "rw": $ mount img -o ro /mnt $ strace mount -o remount,rw /mnt ... fsconfig(4, FSCONFIG_SET_FLAG, "read_only", NULL, 0) = 0 fsconfig(4, FSCONFIG_SET_FLAG, "rw", NULL, 0) = 0 ... Making "read_only" no longer a mount option means this edge case cannot occur. Fixes: `62719cf33c` ("bcachefs: Fix nochanges/read_only interaction") Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Thomas Bertschinger	03ec0927fa	bcachefs: make offline fsck set read_only fs flag A subsequent change will remove "read_only" as a mount option in favor of the standard option "ro", meaning the userspace fsck command cannot pass it to the fsck ioctl. Instead, in offline fsck, set "read_only" kernel-side without trying to parse it as a mount option. For compatibility with versions of the "bcachefs fsck" command that try to pass the "read_only" mount opt, remove it from the mount options string prior to parsing when it is present. Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	652bc7fabc	bcachefs: btree_ptr_sectors_written() now takes bkey_s_c this is for the userspace metadata dump tool Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	9cc8eb3098	bcachefs: Check for bsets past bch_btree_ptr_v2.sectors_written Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Uros Bizjak	68573b936d	bcachefs: Use try_cmpxchg() family of functions instead of cmpxchg() Use try_cmpxchg() family of functions instead of cmpxchg (ptr, old, new) == old. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, try_cmpxchg() implicitly assigns old ptr value to "old" when cmpxchg fails. There is no need to re-read the value in the loop. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	e76a2b65b0	bcachefs: add might_sleep() annotations for fsck_err() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	546b65378d	bcachefs: fix missing include fs-common.h needs dirent.h for enum bch_rename_mode Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Youling Tang	630d565dda	bcachefs: Use filemap_read() to simplify the execution flow Using filemap_read() can reduce unnecessary code execution for non IOCB_DIRECT paths. Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Youling Tang	da6fa380d3	bcachefs: Align the display format of `btrees/inodes/keys` Before patch: ``` #cat btrees/inodes/keys u64s 17 type inode_v3 0:4096:U32_MAX len 0 ver 0: mode=40755 flags= (16300000) bi_size=0 ``` After patch: ``` #cat btrees/inodes/keys u64s 17 type inode_v3 0:4096:U32_MAX len 0 ver 0: mode=40755 flags=(16300000) bi_size=0 ``` Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Youling Tang	12e7ff1a1e	bcachefs: Fix missing spaces in journal_entry_dev_usage_to_text Fixed missing spaces displayed in journal_entry_dev_usage_to_text while adjusting the display format to improve readability. before: ``` # bcachefs list_journal -a -t alloc:1:0 /dev/sdb ... dev_usage: dev=0free: buckets=233180 sectors=0 fragmented=0sb: buckets=13 sectors=6152 fragmented=504journal: buckets=1847 sectors=945664 fragmented=0btree: buckets=20 sectors=10240 fragmented=0user: buckets=1419 sectors=726513 fragmented=15cached: buckets=0 sectors=0 fragmented=0parity: buckets=0 sectors=0 fragmented=0stripe: buckets=0 sectors=0 fragmented=0need_gc_gens: buckets=0 sectors=0 fragmented=0need_discard: buckets=1 sectors=0 fragmented=0 ``` after: ``` # bcachefs list_journal -a -t alloc:1:0 /dev/sdb ... dev_usage: dev=0 free: buckets=233180 sectors=0 fragmented=0 sb: buckets=13 sectors=6152 fragmented=504 journal: buckets=1847 sectors=945664 fragmented=0 btree: buckets=20 sectors=10240 fragmented=0 user: buckets=1419 sectors=726513 fragmented=15 cached: buckets=0 sectors=0 fragmented=0 parity: buckets=0 sectors=0 fragmented=0 stripe: buckets=0 sectors=0 fragmented=0 need_gc_gens: buckets=0 sectors=0 fragmented=0 need_discard: buckets=1 sectors=0 fragmented=0 ``` Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:12 -04:00
Kent Overstreet	f369de8267	bcachefs: fix ei_update_lock lock ordering ei_update_lock is largely vestigial and will probably be removed, but we're not ready for that just yet. This fixes some lockdep splats with the new lockdep support for btree node locks; they're harmless, since we were taking ei_update_lock before actually locking any btree nodes, but "any btree nodes locked" are now tracked at the btree_trans level. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:11 -04:00
Kent Overstreet	cdda2126ab	bcachefs: bch2_btree_reserve_cache_to_text() Add a pretty printer so the btree reserve cache can be seen in sysfs; as it pins open_buckets we need it for tracking down open_buckets issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:11 -04:00
Kent Overstreet	d06a26d24d	bcachefs: sysfs trigger_freelist_wakeup another debugging knob Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:11 -04:00
Kent Overstreet	a1e7a97f22	bcachefs: sysfs internal/trigger_journal_writes another debugging knob - trigger the journal to do ready journal writes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:11 -04:00
Kent Overstreet	26a170aa61	bcachefs: add capacity, reserved to fs_alloc_debug_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:11 -04:00
Kent Overstreet	8a3c8303e2	bcachefs: uninline fallocate functions better stack traces Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:11 -04:00
Kent Overstreet	52fd0f9620	bcachefs: btree ids are 64 bit bitmasks Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:11 -04:00
Kent Overstreet	3de8fd4a33	bcachefs: Print allocator stuck on timeout in fallocate path same as in io_write.c, if we're waiting on the allocator for an excessive amount of time, print what's going on Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-14 19:00:11 -04:00
Kent Overstreet	1841027c7d	bcachefs: bch2_gc_btree() should not use btree_root_lock btree_root_lock is for the root keys in btree_root, not the pointers to the nodes themselves; this fixes a lock ordering issue between btree_root_lock and btree node locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-11 20:10:55 -04:00
Kent Overstreet	f236ea4bca	bcachefs: Set PF_MEMALLOC_NOFS when trans->locked proper lock ordering is: fs_reclaim -> btree node locks Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-11 20:10:55 -04:00
Kent Overstreet	f0f3e51148	bcachefs: Use trans_unlock_long() when waiting on allocator Not using unlock_long() blocks key cache reclaim, and the allocator may take a while. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-11 20:10:55 -04:00
Kent Overstreet	aacd897d4d	Revert "bcachefs: Mark bch_inode_info as SLAB_ACCOUNT" This reverts commit `86d81ec5f5`. This wasn't tested with memcg enabled, it immediately hits a null ptr deref in list_lru_add(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-11 20:01:38 -04:00
Kent Overstreet	fd80d14005	bcachefs: fix scheduling while atomic in break_cycle() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 12:59:28 -04:00
Kent Overstreet	6f692b1672	bcachefs: Fix RCU splat Reported-by: syzbot+e74fea078710bbca6f4b@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 12:46:22 -04:00
Kent Overstreet	7d7f71cd87	bcachefs: Add missing bch2_trans_begin() this fixes a 'transaction should be locked' error in backpointers fsck Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 09:53:39 -04:00
Kent Overstreet	0f6f8f7693	bcachefs: Fix missing error check in journal_entry_btree_keys_validate() Closes: https://syzkaller.appspot.com/bug?extid=8996d8f176cf946ef641 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 09:53:39 -04:00
Kent Overstreet	f49d2c9835	bcachefs: Warn on attempting a move with no replicas Instead of popping an assert in bch2_write(), WARN and print out some debugging info. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 09:53:39 -04:00
Kent Overstreet	ad8b68cd39	bcachefs: bch2_data_update_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 09:53:39 -04:00
Kent Overstreet	0f1f7324da	bcachefs: Log mount failure error code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 09:53:39 -04:00
Kent Overstreet	8ed58789fc	bcachefs: Fix undefined behaviour in eytzinger1_first() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 09:53:39 -04:00
Youling Tang	86d81ec5f5	bcachefs: Mark bch_inode_info as SLAB_ACCOUNT After commit `230e9fc286` ("slab: add SLAB_ACCOUNT flag"), we need to mark the inode cache as SLAB_ACCOUNT, similar to commit `5d097056c9` ("kmemcg: account for certain kmem allocations to memcg") Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 09:53:39 -04:00
Kent Overstreet	b02f973e67	bcachefs: Fix bch2_inode_insert() race path for tmpfiles Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 09:53:39 -04:00
Kent Overstreet	0435773239	bcachefs: Fix journal getting stuck on a flush commit silly race Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-10 09:53:39 -04:00
Kent Overstreet	a2d23f3d91	bcachefs: io clock: run timer fns under clock lock We don't have a way to flush a timer that's executing the callback, and this is simple and limited enough in scope that we can just use the lock instead. Needed for the next patch that adds direct wakeups from the allocator to copygc, where we're now more frequently calling io_timer_del() on an expiring timer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-01 22:56:28 -04:00
Kent Overstreet	b5cbb42dc5	bcachefs: Repair fragmentation_lru in alloc_write_key() fragmentation_lru derives from dirty_sectors, and wasn't being checked. Co-developed-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-29 18:37:13 -04:00
Kent Overstreet	d39881d2da	bcachefs: add check for missing fragmentation in check_alloc_to_lru_ref() We need to make sure we're not missing any fragmentation entries in the LRU btree after repairing the alloc btree. Also, use the new bch2_btree_write_buffer_maybe_flush() helper; this was only working without it before since bucket invalidation (usually) wasn't happening while fsck was running. Co-developed-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-29 18:35:18 -04:00
Kent Overstreet	92e1c29ae8	bcachefs: bch2_btree_write_buffer_maybe_flush() Add a new helper for checking references to write buffer btrees, where we need a flush before we definitively know we have an inconsistency. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-29 18:34:52 -04:00
Kent Overstreet	ef05bdf5d6	bcachefs: Add missing printbuf_tabstops_reset() calls Fixes warnings from bch2_print_allocator_stuck() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-29 18:14:18 -04:00
Kent Overstreet	67c564111f	bcachefs: Fix loop restart in bch2_btree_transactions_read() Accidental infinite loop; also fix btree_deadlock_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-28 21:08:48 -04:00
Kent Overstreet	1539bdf516	bcachefs: Fix bch2_read_retry_nodecode() BCH_READ_NODECODE mode - used by the move paths - really wants to use only the original rbio, but the retry path really wants to clone - oof. Make sure to copy the crc of the pointer we read from back to the original rbio, or we'll see spurious checksum errors later. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-28 20:47:04 -04:00
Kent Overstreet	44ec599035	bcachefs: Don't use the new_fs() bucket alloc path on an initialized fs On a new filesystem or device we have to allocate the journal with a bump allocator, because allocation info isn't ready yet - but when hot-adding a device that doesn't have a journal, we don't want to use that path. Reported-by: syzbot+24a867cb90d8315cccff@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-28 19:47:31 -04:00
Kent Overstreet	a0bd30e4ea	bcachefs: Fix shift greater than integer size Reported-by: syzbot+e5292b50f1957164a4b6@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-28 19:42:22 -04:00
Kent Overstreet	600b8be5e7	bcachefs: Change bch2_fs_journal_stop() BUG_ON() to warning Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-28 19:16:41 -04:00
Kent Overstreet	84db600016	bcachefs: Delete old faulty bch2_trans_unlock() call the unlock is now in read_extent, this fixes an assertion pop in read_from_stale_dirty_pointer() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-28 13:30:13 -04:00
Kent Overstreet	759b2e800f	bcachefs: Switch online_reserved shutdown assert to WARN() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-28 11:06:31 -04:00
Pei Li	64cd7de998	bcachefs: Fix kmalloc bug in __snapshot_t_mut When allocating too huge a snapshot table, we should fail gracefully in __snapshot_t_mut() instead of failing in kmalloc(). Reported-by: syzbot+770e99b65e26fa023ab1@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=770e99b65e26fa023ab1 Tested-by: syzbot+770e99b65e26fa023ab1@syzkaller.appspotmail.com Signed-off-by: Pei Li <peili.dev@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-25 20:51:14 -04:00
Kent Overstreet	64ee1431cc	bcachefs: Discard, invalidate workers are now per device There's no reason for discards to be single threaded across all devices; this will improve performance on multi device setups. Additionally, making them per-device simplifies the refcounting on bch_dev->io_ref; we now hold it for the duration that the discard path is running, which fixes a race between the discard path and device removal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-25 18:47:55 -04:00
Pei Li	472237b69d	bcachefs: Fix shift-out-of-bounds in bch2_blacklist_entries_gc This series fixes the shift-out-of-bounds issue in bch2_blacklist_entries_gc(). Instead of passing 0 to eytzinger0_first() when iterating the entries, we explicitly check 0 and initialize i to be 0. syzbot has tested the proposed patch and the reproducer did not trigger any issue: Reported-and-tested-by: syzbot+835d255ad6bc7f29ee12@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=835d255ad6bc7f29ee12 Signed-off-by: Pei Li <peili.dev@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-25 17:53:31 -04:00
Pei Li	211c581de2	bcachefs: slab-use-after-free Read in bch2_sb_errors_from_cpu Acquire fsck_error_counts_lock before accessing the critical section protected by this lock. syzbot has tested the proposed patch and the reproducer did not trigger any issue. Reported-by: syzbot+a2bc0e838efd7663f4d9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a2bc0e838efd7663f4d9 Signed-off-by: Pei Li <peili.dev@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-25 17:51:26 -04:00
Kuan-Wei Chiu	1fcce6b8a7	bcachefs: remove heap-related macros and switch to generic min_heap Drop the heap-related macros from bcachefs and replace them with the generic min_heap implementation from include/linux. By doing so, code readability is improved by using functions instead of macros. Moreover, the min_heap implementation in include/linux adopts a bottom-up variation compared to the textbook version currently used in bcachefs. This bottom-up variation allows for approximately 50% reduction in the number of comparison operations during heap siftdown, without changing the number of swaps, thus making it more efficient. [visitorckw@gmail.com: fix missing assignment of minimum element] Link: https://lkml.kernel.org/r/20240602174828.1955320-1-visitorckw@gmail.com Link: https://lkml.kernel.org/ioyfizrzq7w7mjrqcadtzsfgpuntowtjdw5pgn4qhvsdp4mqqg@nrlek5vmisbu Link: https://lkml.kernel.org/r/20240524152958.919343-17-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Kent Overstreet <kent.overstreet@linux.dev> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Brian Foster <bfoster@redhat.com> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: Coly Li <colyli@suse.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Sakai <msakai@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-06-24 22:25:00 -07:00
Kuan-Wei Chiu	fd60f7fe69	bcachefs: fix typo Replace 'utiility' with 'utility'. Link: https://lkml.kernel.org/r/20240524152958.919343-4-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Brian Foster <bfoster@redhat.com> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: Coly Li <colyli@suse.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Sakai <msakai@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-06-24 22:24:57 -07:00
Kent Overstreet	89d21b69b4	bcachefs: Add missing bch2_journal_do_writes() call This fixes a rare deadlock when we're doing an emergency shutdown due to failure to do a journal write. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-23 12:55:32 -04:00
Kent Overstreet	d6b52f6828	bcachefs: Fix null ptr deref in journal_pins_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-23 12:07:07 -04:00
Kent Overstreet	36da8e387b	bcachefs: Add missing recalc_capacity() call This fixes filesystem size not changing on device removal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-23 10:12:51 -04:00
Kent Overstreet	1aaf5cb41b	bcachefs: Fix btree_trans list ordering The debug code relies on btree_trans_list being ordered so that it can resume on subsequent calls or lock restarts. However, it was using trans->locking_wait.task.pid, which is incorrect since btree_trans objects are cached and reused - typically by different tasks. Fix this by switching to pointer order, and also sort them lazily when required - speeding up the btree_trans_get() fastpath. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-23 00:57:21 -04:00
Kent Overstreet	de611ab6fc	bcachefs: Fix race between trans_put() and btree_transactions_read() debug.c was using closure_get() on a different thread's closure where we don't know if the object being refcounted is alive. We keep btree_trans objects on a list so they can be printed by debug code, and because it is cost prohibitive to touch the btree_trans list every time we allocate and free btree_trans objects, cached objects are also on this list. However, we do not want the debug code to see cached but not in use btree_trans objects - critically because the btree_paths array will have been freed (if it was reallocated). closure_get() is also incorrect to use when that get may race with it hitting zero, i.e. we must already have a ref on the object or know the ref can't currently hit 0 for other reasons (as used in the cycle detector). To fix this, use the previously introduced closure_get_not_zero(), closure_return_sync(), and closure_init_stack_release(); the debug code now can only take a ref on a trans object if it's alive and in use. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-23 00:57:21 -04:00
Kent Overstreet	18e92841e8	bcachefs: Make btree_deadlock_to_text() clearer btree_deadlock_to_text() searches the list of btree transactions to find a deadlock - when it finds one it's done; it's not like other *_read() functions that print each object. Factor out btree_deadlock_to_text() to make this clearer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-23 00:57:21 -04:00
Kent Overstreet	f44cc269a1	bcachefs: fix seqmutex_relock() We were grabbing the sequence number before unlock incremented it - fix this by moving the increment to seqmutex_lock() (so the seqmutex_relock() failure path skips the mutex_trylock()), and returning the sequence number from unlock(), to make the API simpler and safer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-23 00:57:21 -04:00
Kent Overstreet	9bd01500e4	bcachefs: Fix freeing of error pointers This fixes incorrect/missing checking of strndup_user() returns. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-22 17:22:24 -04:00
Youling Tang	bd4da0462e	bcachefs: Move the ei_flags setting to after initialization `inode->ei_flags` setting and cleaning should be done after initialization, otherwise the operation is invalid. Fixes: `9ca4853b98` ("bcachefs: Fix quota support for snapshots") Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-21 10:17:07 -04:00
Kent Overstreet	2fe79ce7d1	bcachefs: Fix a UAF after write_super() write_super() may reallocate the superblock buffer - but bch_sb_field_ext was referencing it; don't use it after the write_super call. Reported-by: syzbot+8992fc10a192067b8d8a@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-21 10:17:07 -04:00
Kent Overstreet	e6b3a655ac	bcachefs: Use bch2_print_string_as_lines for long err printk strings get truncated to 1024 bytes; if we have a long error message (journal debug info) we need to use a helper. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-21 10:17:07 -04:00
Kent Overstreet	dd9086487c	bcachefs: Fix I_NEW warning in race path in bch2_inode_insert() discard_new_inode() is the correct interface for tearing down an inode that was fully created but not made visible to other threads, but it expects I_NEW to be set, which we don't use. Reported-by: https://github.com/koverstreet/bcachefs/issues/690 Fixes: bcachefs: Fix race path in bch2_inode_insert() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-21 10:17:07 -04:00
Kent Overstreet	504794067f	bcachefs: Replace bare EEXIST with private error codes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-21 10:17:07 -04:00
Kent Overstreet	f648b6c12b	bcachefs: Fix missing alloc_data_type_set() Incorrect bucket state transition in the discard path; when incrementing a bucket's generation number that had already been discarded, we were forgetting to check if it should be need_gc_gens, not free. This was caught by the .invalid checks in the transaction commit path, causing us to go emergency read only. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-21 10:17:07 -04:00
Youling Tang	c6cab97cdf	bcachefs: fix alignment of VMA for memory mapped files on THP With CONFIG_READ_ONLY_THP_FOR_FS, the Linux kernel supports using THPs for read-only mmapped files, such as shared libraries. However, the kernel makes no attempt to actually align those mappings on 2MB boundaries, which makes it impossible to use those THPs most of the time. This issue applies to general file mapping THP as well as existing setups using CONFIG_READ_ONLY_THP_FOR_FS. This is easily fixed by using thp_get_unmapped_area for the unmapped_area function in bcachefs, which is what ext2, ext4, fuse, xfs and btrfs all use. Similar to commit `b0c582233a` ("btrfs: fix alignment of VMA for memory mapped files on THP"). Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-20 09:14:58 -04:00
Kent Overstreet	33dfafa902	bcachefs: Fix safe errors by default i.e. the start of automatic self healing: If errors=continue or fix_safe, we now automatically fix simple errors without user intervention. New error action option: fix_safe This replaces the existing errors=ro option, which gets a new slot, i.e. existing errors=ro users now get errors=fix_safe. This is currently only enabled for a limited set of errors - initially just disk accounting; errors we would never not want to fix, and we don't want to require user intervention (i.e. to make sure a bug report gets filed). Errors will still be counted in the superblock, so we (developers) will still know they've been occurring if a bug report gets filed (as bug reports typically include the errors superblock section). Eventually we'll be enabling this for a much wider set of errors, after we've done thorough error injection testing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-20 09:13:09 -04:00
Kent Overstreet	a56da69799	bcachefs: Fix bch2_trans_put() reference: https://github.com/koverstreet/bcachefs/issues/692 trans->ref is the reference used by the cycle detector, which walks btree_trans objects of other threads to walk the graph of held locks and issue wakeups when an abort is required. We have to wait for the ref to go to 1 before freeing trans->paths or clearing trans->locking_wait.task. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:34:18 -04:00
Kent Overstreet	0a2a507d40	bcachefs: set_worker_desc() for delete_dead_snapshots this is long running - help users see what's going on Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:24 -04:00
Kent Overstreet	ddd118ab45	bcachefs: Fix bch2_sb_downgrade_update() Missing enum conversion Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:24 -04:00
Kent Overstreet	2e9940d4a1	bcachefs: Handle cached data LRU wraparound We only have 48 bits for the LRU time field, which is insufficient to prevent wraparound. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:24 -04:00
Kent Overstreet	cff07e2739	bcachefs: Guard against overflowing LRU_TIME_BITS LRUs only have 48 bits for the time field (i.e. LRU order); thus we need overflow checks and guards. Reported-by: syzbot+df3bf3f088dcaa728857@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:24 -04:00
Kent Overstreet	1ba44217f8	bcachefs: delete_dead_snapshots() doesn't need to go RW We've been moving away from going RW lazily; if we want to go RW we do that in set_may_go_rw(), and if we didn't go RW we don't need to delete dead snapshots. Reported-by: syzbot+4366624c0b5aac4906cf@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:24 -04:00
Kent Overstreet	dbf4d79b7f	bcachefs: Fix early init error path in journal code We shouldn't be running the journal shutdown sequence if we never fully initialized the journal. Reported-by: syzbot+ffd2270f0bca3322ee00@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:24 -04:00
Kent Overstreet	9e7cfb35e2	bcachefs: Check for invalid btree IDs We can only handle btree IDs up to 62, since the btree id (plus the type for interior btree nodes) has to fit into a 64 bit bitmask - check for invalid ones to avoid invalid shifts later. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:23 -04:00
Kent Overstreet	e3fd3faa45	bcachefs: Fix btree ID bitmasks these should be 64 bit bitmasks, not 32 bit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:23 -04:00
Kent Overstreet	d406545613	bcachefs: Fix shift overflow in read_one_super() Reported-by: syzbot+9f74cb4006b83e2a3df1@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:23 -04:00
Kent Overstreet	3727ca5604	bcachefs: Fix a locking bug in the do_discard_fast() path We can't discard a bucket while it's still open; this needs the bucket_is_open_safe() version, which takes the open_buckets lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:23 -04:00
Kent Overstreet	d47df4f616	bcachefs: Fix array-index-out-of-bounds We use 0 size arrays as markers, but ubsan doesn't know that - cast them to a pointer to fix the splat. Also, make sure this code gets tested a bit more. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:23 -04:00
Kent Overstreet	f770a6e9a3	bcachefs: Fix initialization order for srcu barrier btree_iter_init() needs to happen before key_cache_init(), to initialize btree_trans_barrier Reported-by: syzbot+3cca837c2183f8f6fcaf@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-19 18:27:23 -04:00
Mateusz Guzik	267574dee6	bcachefs: remove now spurious i_state initialization inode_init_always started setting the field to 0. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240611120626.513952-5-mjguzik@gmail.com Acked-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-06-13 13:40:45 +02:00
Kent Overstreet	f2736b9c79	bcachefs: Fix rcu_read_lock() leak in drop_extra_replicas Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-11 18:59:08 -04:00
Kent Overstreet	7124a8982b	bcachefs: Add missing bch_inode_info.ei_flags init Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 20:50:14 -04:00
Kent Overstreet	b799220092	bcachefs: Add missing synchronize_srcu_expedited() call when shutting down We use the polling interface to srcu for tracking pending frees; when shutting down we don't need to wait for an srcu barrier to free them, but SRCU still gets confused if we shutdown with an outstanding grace period. Reported-by: syzbot+6a038377f0a594d7d44e@syzkaller.appspotmail.com Reported-by: syzbot+0ece6edfd05ed20e32d9@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	9432e90df1	bcachefs: Check for invalid bucket from bucket_gen(), gc_bucket() Turn more asserts into proper recoverable error paths. Reported-by: syzbot+246b47da27f8e7e7d6fb@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	9c4acd19bb	bcachefs: Replace bucket_valid() asserts in bucket lookup with proper checks The bucket_gens array and gc_buckets array know their own size; we should be using those members, and returning an error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	e0cb5722e1	bcachefs: Fix snapshot_create_lock lock ordering ====================================================== WARNING: possible circular locking dependency detected 6.10.0-rc2-ktest-00018-gebd1d148b278 #144 Not tainted ------------------------------------------------------ fio/1345 is trying to acquire lock: ffff88813e200ab8 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_truncate+0x76/0xf0 but task is already holding lock: ffff888105a1fa38 (&sb->s_type->i_mutex_key#13){+.+.}-{3:3}, at: do_truncate+0x7b/0xc0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&sb->s_type->i_mutex_key#13){+.+.}-{3:3}: down_write+0x3d/0xd0 bch2_write_iter+0x1c0/0x10f0 vfs_write+0x24a/0x560 __x64_sys_pwrite64+0x77/0xb0 x64_sys_call+0x17e5/0x1ab0 do_syscall_64+0x68/0x130 entry_SYSCALL_64_after_hwframe+0x4b/0x53 -> #1 (sb_writers#10){.+.+}-{0:0}: mnt_want_write+0x4a/0x1d0 filename_create+0x69/0x1a0 user_path_create+0x38/0x50 bch2_fs_file_ioctl+0x315/0xbf0 __x64_sys_ioctl+0x297/0xaf0 x64_sys_call+0x10cb/0x1ab0 do_syscall_64+0x68/0x130 entry_SYSCALL_64_after_hwframe+0x4b/0x53 -> #0 (&c->snapshot_create_lock){++++}-{3:3}: __lock_acquire+0x1445/0x25b0 lock_acquire+0xbd/0x2b0 down_read+0x40/0x180 bch2_truncate+0x76/0xf0 bchfs_truncate+0x240/0x3f0 bch2_setattr+0x7b/0xb0 notify_change+0x322/0x4b0 do_truncate+0x8b/0xc0 do_ftruncate+0x110/0x270 __x64_sys_ftruncate+0x43/0x80 x64_sys_call+0x1373/0x1ab0 do_syscall_64+0x68/0x130 entry_SYSCALL_64_after_hwframe+0x4b/0x53 other info that might help us debug this: Chain exists of: &c->snapshot_create_lock --> sb_writers#10 --> &sb->s_type->i_mutex_key#13 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sb->s_type->i_mutex_key#13); lock(sb_writers#10); lock(&sb->s_type->i_mutex_key#13); rlock(&c->snapshot_create_lock); * DEADLOCK * Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	f9035b0ce6	bcachefs: Fix refcount leak in check_fix_ptrs() fsck_err() does a goto fsck_err on error; factor out check_fix_ptr() so that our error label can drop our device ref. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	bf2b356afd	bcachefs: Leave a buffer in the btree key cache to avoid lock thrashing Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	2760bfe388	bcachefs: Fix reporting of freed objects from key cache shrinker We count objects as freed when we move them to the srcu-pending lists because we're doing the equivalent of a kfree_srcu(); the only difference is managing the pending list ourself means we can allocate from the pending list. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	9ac3e660ca	bcachefs: set sb->s_shrinker->seeks = 0 inodes and dentries are still present in the btree node cache, in much more compact form Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	bc65e98e68	bcachefs: increase key cache shrinker batch size Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	5ae67abcdf	bcachefs: Enable automatic shrinking for rhashtables Since the key cache shrinker walks the rhashtable, a mostly empty rhashtable leads to really nasty reclaim performance issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Hongbo Li	26447d224a	bcachefs: fix the display format for show-super There are three keys displayed in non-uniform format. Let's fix them. [Before] ``` Label: testbcachefs Version: 1.9: (unknown version) Version upgrade complete: 0.0: (unknown version) ``` [After] ``` Label: testbcachefs Version: 1.9: (unknown version) Version upgrade complete: 0.0: (unknown version) ``` Fixes: `7423330e30` ("bcachefs: prt_printf() now respects \r\n\t") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	dab1870439	bcachefs: fix stack frame size in fsck.c fsck.c always runs top of the stack so we're not too concerned here; noinline_for_stack is sufficient Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	04f635ede8	bcachefs: Delete incorrect BTREE_ID_NR assertion for forwards compat we now explicitly allow mounting and using filesystems with unknown btrees, and we have to walk them for fsck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:16 -04:00
Kent Overstreet	1c8cc24eef	bcachefs: Fix incorrect error handling found_btree_node_is_readable() error handling here is slightly odd, which is why we were accidentally calling evict() on an error pointer Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:15 -04:00
Kent Overstreet	161f73c2c7	bcachefs: Split out btree_write_submit_wq Split the workqueues for btree read completions and btree write submissions; we don't want concurrency control on btree read completions, but we do want concurrency control on write submissions, else blocking in submit_bio() will cause a ton of kworkers to be allocated. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-10 13:17:15 -04:00
Kent Overstreet	319fef29e9	bcachefs: Fix trans->locked assert in bch2_move_data_btree, we might start with the trans unlocked from a previous loop iteration - we need a trans_begin() before iter_init(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-05 10:44:08 -04:00
Kent Overstreet	fdccb24352	bcachefs: Rereplicate now moves data off of durability=0 devices This fixes an issue where setting a device to durability=0 after it's been used makes it impossible to remove. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-05 10:44:08 -04:00
Kent Overstreet	9a64e1bfd8	bcachefs: Fix GFP_KERNEL allocation in break_cycle() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-06-05 10:44:08 -04:00
Linus Torvalds	ff9bce3d06	bcachefs fixes for 6.10-rc2 Merge tag 'bcachefs-2024-05-30' of https://evilpiepirate.org/git/bcachefs Pull bcachefs fixes from Kent Overstreet: "Assorted odds and ends... - two downgrade fixes - a couple snapshot deletion and repair fixes, thanks to noradtux for finding these and providing the image to debug them - a couple assert fixes - convert to folio helper, from Matthew - some improved error messages - bit of code reorganization (just moving things around); doing this while things are quiet so I'm not rebasing fixes past reorgs - don't return -EROFS on inconsistency error in recovery, this confuses util-linux and has it retry the mount - fix failure to return error on misaligned dio write; reported as an issue with coreutils shred" * tag 'bcachefs-2024-05-30' of https://evilpiepirate.org/git/bcachefs: (21 commits) bcachefs: Fix failure to return error on misaligned dio write bcachefs: Don't return -EROFS from mount on inconsistency error bcachefs: Fix uninitialized var warning bcachefs: Split out sb-errors_format.h bcachefs: Split out journal_seq_blacklist_format.h bcachefs: Split out replicas_format.h bcachefs: Split out disk_groups_format.h bcachefs: split out sb-downgrade_format.h bcachefs: split out sb-members_format.h bcachefs: Better fsck error message for key version bcachefs: btree_gc can now handle unknown btrees bcachefs: add missing MODULE_DESCRIPTION() bcachefs: Fix setting of downgrade recovery passes/errors bcachefs: Run check_key_has_snapshot in snapshot_delete_keys() bcachefs: Refactor delete_dead_snapshots() bcachefs: Fix locking assert bcachefs: Fix lookup_first_inode() when inode_generations are present bcachefs: Plumb bkey into __btree_err() bcachefs: Use copy_folio_from_iter_atomic() bcachefs: Fix sb-downgrade validation ...	2024-05-31 11:45:41 -07:00
Kent Overstreet	7b038b564b	bcachefs: Fix failure to return error on misaligned dio write This was reported as an error when running coreutils shred. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-29 16:40:30 -04:00
Kent Overstreet	83208cbf2f	bcachefs: Don't return -EROFS from mount on inconsistency error We were accidentally returning -EROFS during recovery on filesystem inconsistency - since this is what the journal returns on emergency shutdown. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 19:23:03 -04:00
Kent Overstreet	8528bde1b6	bcachefs: Fix uninitialized var warning Can't actually be used uninitialized, but gcc was being silly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 18:21:51 -04:00
Kent Overstreet	759bb4eabc	bcachefs: Split out sb-errors_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 17:33:45 -04:00
Kent Overstreet	5c16c57488	bcachefs: Split out journal_seq_blacklist_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 17:32:03 -04:00
Kent Overstreet	24998050b6	bcachefs: Split out replicas_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 17:32:03 -04:00
Kent Overstreet	1cdcc6e3c2	bcachefs: Split out disk_groups_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 17:32:03 -04:00
Kent Overstreet	4c5eef0c50	bcachefs: split out sb-downgrade_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 17:32:03 -04:00
Kent Overstreet	016c22e410	bcachefs: split out sb-members_format.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 17:32:03 -04:00
Kent Overstreet	f1d4fed13f	bcachefs: Better fsck error message for key version Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 11:29:26 -04:00
Kent Overstreet	088d0de812	bcachefs: btree_gc can now handle unknown btrees Compatibility fix - we no longer have a separate table for which order gc walks btrees in, and special case the stripes btree directly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 11:29:26 -04:00
Jeff Johnson	b4131076c1	bcachefs: add missing MODULE_DESCRIPTION() Fix the 'make W=1' warning: WARNING: modpost: missing MODULE_DESCRIPTION() in fs/bcachefs/mean_and_variance_test.o Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 11:29:26 -04:00
Kent Overstreet	247c056bde	bcachefs: Fix setting of downgrade recovery passes/errors bch2_check_version_downgrade() was setting c->sb.version, which bch2_sb_set_downgrade() expects to be at the previous version; and it shouldn't even have been set directly because c->sb.version is updated by write_super(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 11:29:26 -04:00
Kent Overstreet	08f50005e0	bcachefs: Run check_key_has_snapshot in snapshot_delete_keys() delete_dead_snapshots now runs before the main fsck.c passes which check for keys for invalid snapshots; thus, it needs those checks as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 11:29:26 -04:00
Kent Overstreet	82af5ceb5d	bcachefs: Refactor delete_dead_snapshots() Consolidate per-key work into delete_dead_snapshots_process_key(), so we now walk all keys once, not twice. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 11:29:26 -04:00
Kent Overstreet	218e5e0c2a	bcachefs: Fix locking assert We now track whether a transaction is locked, and verify that we don't have nodes locked when the transaction isn't locked; reorder relocks to not pop the new assert. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 11:29:26 -04:00
Kent Overstreet	9e1a66e668	bcachefs: Fix lookup_first_inode() when inode_generations are present This function is used for finding the hash seed (which is the same in all versions of an inode in different snapshots): if an inode has been deleted in a child snapshot we need to iterate until we find a live version. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 11:29:26 -04:00
Kent Overstreet	1292bc2ebf	bcachefs: Plumb bkey into __btree_err() It can be useful to know the exact byte offset within a btree node where an error occurred. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-28 11:29:23 -04:00
Matthew Wilcox (Oracle)	b82b6eeefd	bcachefs: Use copy_folio_from_iter_atomic() copy_page_from_iter_atomic() will be removed at some point. Also fixup a comment for folios. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-26 22:30:09 -04:00
Kent Overstreet	9242a34b76	bcachefs: Fix sb-downgrade validation Superblock downgrade entries are only two byte aligned, but section sizes are 8 byte aligned, which means we have to be careful about overrun checks; an entry that crosses the end of the section is allowed (and ignored) as long as it has zero errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-26 12:44:12 -04:00
Kent Overstreet	d509cadc3a	bcachefs: Fix debug assert Reported-by: syzbot+a8074a75b8d73328751e@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-26 12:40:30 -04:00
Linus Torvalds	c40b1994b9	bcachefs fixes for 6.10-rc1 Just a few syzbot fixes Merge tag 'bcachefs-2024-05-24' of https://evilpiepirate.org/git/bcachefs Pull bcachefs fixes from Kent Overstreet: "Nothing exciting, just syzbot fixes (except for the one FMODE_CAN_ODIRECT patch). Looks like syzbot reports have slowed down; this is all catch up from two weeks of conferences. Next hardening project is using Thomas's error injection tooling to torture test repair" * tag 'bcachefs-2024-05-24' of https://evilpiepirate.org/git/bcachefs: bcachefs: Fix race path in bch2_inode_insert() bcachefs: Ensure we're RW before journalling bcachefs: Fix shutdown ordering bcachefs: Fix unsafety in bch2_dirent_name_bytes() bcachefs: Fix stack oob in __bch2_encrypt_bio() bcachefs: Fix btree_trans leak in bch2_readahead() bcachefs: Fix bogus verify_replicas_entry() assert bcachefs: Check for subvolues with bogus snapshot/inode fields bcachefs: bch2_checksum() returns 0 for unknown checksum type bcachefs: Fix bch2_alloc_ciphers() bcachefs: Add missing guard in bch2_snapshot_has_children() bcachefs: Fix missing parens in drop_locks_do() bcachefs: Improve bch2_assert_pos_locked() bcachefs: Fix shift overflows in replicas.c bcachefs: Fix shift overflow in btree_lost_data() bcachefs: Fix ref in trans_mark_dev_sbs() error path bcachefs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method bcachefs: Fix rcu splat in check_fix_ptrs()	2024-05-24 09:07:22 -07:00
Kent Overstreet	d93ff5fa40	bcachefs: Fix race path in bch2_inode_insert() __destroy_new_inode() is appropriate when we have _just_allocated the inode, but not when it's been fully initialized and on i_sb_list. Reported-by: syzbot+a0ddc9873c280a4cb18f@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-22 20:37:47 -04:00
Kent Overstreet	cd3b31f9d4	bcachefs: Ensure we're RW before journalling Reported-by: syzbot+c60cd352aedb109528bf@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-22 20:17:33 -04:00
Steven Rostedt (Google)	2c92ca849f	tracing/treewide: Remove second parameter of __assign_str() With the rework of how the __string() handles dynamic strings where it saves off the source string in a field in the helper structure[1], the assignment of that value to the trace event field is stored in the helper value and does not need to be passed in again. This means that with: __string(field, mystring) Which used to be assigned with __assign_str(field, mystring), no longer needs the second parameter and it is unused. With this, __assign_str() will now only get a single parameter. There's over 700 users of __assign_str() and because coccinelle does not handle the TRACE_EVENT() macro I ended up using the following sed script: git grep -l __assign_str | while read a ; do sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file; mv /tmp/test-file $a; done I then searched for __assign_str() that did not end with ';' as those were multi line assignments that the sed script above would fail to catch. Note, the same updates will need to be done for: __assign_str_len() __assign_rel_str() __assign_rel_str_len() I tested this with both an allmodconfig and an allyesconfig (build only for both). [1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/ Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts. Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs Tested-by: Guenter Roeck <linux@roeck-us.net>	2024-05-22 20:14:47 -04:00

3926 Commits