linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-22 20:22:09 +00:00

Author	SHA1	Message	Date
Alan Huang	e057a290ef	bcachefs: Fix lost wake up If the reader acquires the read lock and then the writer enters the slow path, while the reader proceeds to the unlock path, the following scenario can occur without the change: writer: pcpu_read_count(lock) return 1 (so __do_six_trylock will return 0) reader: this_cpu_dec(*lock->readers) reader: smp_mb() reader: state = atomic_read(&lock->state) (there is no waiting flag set) writer: six_set_bitmask() then the writer will sleep forever. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 22:32:23 -04:00
Kent Overstreet	d50d7a5fa4	bcachefs: Check for logged ops when clean If we shut down successfully, there shouldn't be any logged ops to resume. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 22:32:22 -04:00
Kent Overstreet	1c0ee43b2c	bcachefs: BCH_FS_clean_recovery Add a filesystem flag to indicate whether we did a clean recovery - using c->sb.clean after we've got rw is incorrect, since c->sb is updated whenever we write the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 22:32:22 -04:00
Kent Overstreet	9773547b16	bcachefs: Convert disk accounting BUG_ON() to WARN_ON() We had a bug where disk accounting keys didn't always have their version field set in journal replay; change the BUG_ON() to a WARN(), and exclude this case since it's now checked for elsewhere (in the bkey validate function). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 22:32:22 -04:00
Kent Overstreet	a3581ca35d	bcachefs: Fix BCH_TRANS_COMMIT_skip_accounting_apply This was added to avoid double-counting accounting keys in journal replay. But applied incorrectly (easily done since it applies to the transaction commit, not a particular update), it leads to skipping in-mem accounting for real accounting updates, and failure to give them a version number - which leads to journal replay becoming very confused the next time around. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 22:32:20 -04:00
Kent Overstreet	f8911ad88d	bcachefs: Check for accounting keys with bversion=0 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:35 -04:00
Kent Overstreet	cf49f8a8c2	bcachefs: rename version -> bversion give bversions a more distinct name, to aid in grepping Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:35 -04:00
Kent Overstreet	fd65378db9	bcachefs: Don't delete unlinked inodes before logged op resume Previously, check_inode() would delete unlinked inodes if they weren't on the deleted list - this code dating from before there was a deleted list. But, if we crash during a logged op (truncate or finsert/fcollapse) of an unlinked file, logged op resume will get confused if the inode has already been deleted - instead, just add it to the deleted list if it needs to be there; delete_dead_inodes runs after logged op resume. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:35 -04:00
Kent Overstreet	8d65b15f8d	bcachefs: Fix BCH_SB_ERRS() so we can reorder BCH_SB_ERRS() has a field for the actual enum val so that we can reorder to reorganize, but the way BCH_SB_ERR_MAX was defined didn't allow for this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:35 -04:00
Kent Overstreet	5612daafb7	bcachefs: Fix fsck warnings from bkey validation __bch2_fsck_err() warns if the current task has a btree_trans object and it wasn't passed in, because if it has to prompt for user input it has to be able to unlock it. But plumbing the btree_trans through bkey_validate(), as well as transaction restarts, is problematic - so instead make bkey fsck errors FSCK_AUTOFIX, which doesn't need to warn. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:35 -04:00
Kent Overstreet	7c980a43e9	bcachefs: Move transaction commit path validation to as late as possible In order to check for accounting keys with version=0, we need to run validation after they've been assigned version numbers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:35 -04:00
Kent Overstreet	431312b59c	bcachefs: Fix disk accounting attempting to mark invalid replicas entry This fixes the following bug, where a disk accounting key has an invalid replicas entry, and we attempt to add it to the superblock: bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): starting version 1.12: rebalance_work_acct_fix opts=metadata_replicas=2,data_replicas=2,foreground_target=ssd,background_target=hdd,nopromote_whole_extents,verbose,fsck,fix_errors=yes bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): recovering from clean shutdown, journal seq 15211644 bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): accounting_read... accounting not marked in superblock replicas replicas cached: 1/1 [0], fixing bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): sb invalid before write: Invalid superblock section replicas_v0: invalid device 0 in entry cached: 1/1 [0] replicas_v0 (size 88): user: 2 [3 5] user: 2 [1 4] cached: 1 [2] btree: 2 [1 2] user: 2 [2 5] cached: 1 [0] cached: 1 [4] journal: 2 [1 5] user: 2 [1 2] user: 2 [2 3] user: 2 [3 4] user: 2 [4 5] cached: 1 [1] cached: 1 [3] cached: 1 [5] journal: 2 [1 2] journal: 2 [2 5] btree: 2 [2 5] user: 2 [1 3] user: 2 [1 5] user: 2 [2 4] bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): inconsistency detected - emergency read only at journal seq 15211644 accounting not marked in superblock replicas replicas user: 1/1 [3], fixing bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): sb invalid before write: Invalid superblock section replicas_v0: invalid device 0 in entry cached: 1/1 [0] replicas_v0 (size 96): user: 2 [3 5] user: 2 [1 3] cached: 1 [2] btree: 2 [1 2] user: 2 [2 4] cached: 1 [0] cached: 1 [4] journal: 2 [1 5] user: 1 [3] user: 2 [1 5] user: 2 [3 4] user: 2 [4 5] cached: 1 [1] cached: 1 [3] cached: 1 [5] journal: 2 [1 2] journal: 2 [2 5] btree: 2 [2 5] user: 2 [1 2] user: 2 [1 4] user: 2 [2 3] user: 2 [2 5] accounting not marked in superblock replicas replicas user: 1/2 [3 7], fixing bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): sb invalid before write: Invalid superblock section replicas_v0: invalid device 7 in entry user: 1/2 [3 7] replicas_v0 (size 96): user: 2 [3 7] user: 2 [1 3] cached: 1 [2] btree: 2 [1 2] user: 2 [2 4] cached: 1 [0] cached: 1 [4] journal: 2 [1 5] user: 1 [3] user: 2 [1 5] user: 2 [3 4] user: 2 [4 5] cached: 1 [1] cached: 1 [3] cached: 1 [5] journal: 2 [1 2] journal: 2 [2 5] btree: 2 [2 5] user: 2 [1 2] user: 2 [1 4] user: 2 [2 3] user: 2 [2 5] user: 2 [3 5] done bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): alloc_read... done bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): stripes_read... done bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): snapshots_read... done Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:35 -04:00
Kent Overstreet	49fd90b2cc	bcachefs: Fix unlocked access to c->disk_sb.sb in bch2_replicas_entry_validate() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:35 -04:00
Kent Overstreet	9104fc1928	bcachefs: Fix accounting read + device removal accounting read was checking if accounting replicas entries were marked in the superblock prior to applying accounting from the journal, which meant that a recently removed device could spuriously trigger a "not marked in superblocked" error (when journal entries zero out the offending counter). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:35 -04:00
Kent Overstreet	1e0272ef47	bcachefs: bch_accounting_mode Minor refactoring - replace multiple bool arguments with an enum; prep work for fixing a bug in accounting read. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:35 -04:00
Kent Overstreet	3672bda8f5	bcachefs: fix transaction restart handling in check_extents(), check_dirents() Dealing with outside state within a btree transaction is always tricky. check_extents() and check_dirents() have to accumulate counters for i_sectors and i_nlink (for subdirectories). There were two bugs: - transaction commit may return a restart; therefore we have to commit before accumulating to those counters - get_inode_all_snapshots() may return a transaction restart, before updating w->last_pos; then, on the restart, check_i_sectors()/check_subdir_count() would see inodes that were not for w->last_pos Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:35 -04:00
Kent Overstreet	22a507d68e	bcachefs: kill inode_walker_entry.seen_this_pos dead code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:34 -04:00
Kent Overstreet	b29c30ab48	bcachefs: Fix incorrect IS_ERR_OR_NULL usage Returning a positive integer instead of an error code causes error paths to become very confused. Closes: syzbot+c0360e8367d6d8d04a66@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:34 -04:00
Hongbo Li	dc5bfdf8ea	bcachefs: fix the memory leak in exception case The pointer clean points the memory allocated by kmemdup, when the return value of bch2_sb_clean_validate_late is not zero. The memory pointed by clean is leaked. So we should free it in this case. Fixes: `a37ad1a3ab` ("bcachefs: sb-clean.c") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:34 -04:00
Hongbo Li	3125c95ea6	bcachefs: fast exit when darray_make_room failed In downgrade_table_extra, the return value is needed. When it return failed, we should exit immediately. Fixes: `7773df19c3` ("bcachefs: metadata version bucket_stripe_sectors") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:34 -04:00
Kent Overstreet	951dd86e7c	bcachefs: Fix iterator leak in check_subvol() A couple small error handling fixes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:34 -04:00
Kent Overstreet	2a1df87346	bcachefs: Add snapshot to bch_inode_unpacked this allows for various cleanups in fsck Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:34 -04:00
Diogo Jahchan Koike	40d40c6bea	bcachefs: assign return error when iterating through layout syzbot reported a null ptr deref in __copy_user [0] In __bch2_read_super, when a corrupt backup superblock matches the default opts offset, no error is assigned to ret and the freed superblock gets through, possibly being assigned as the best sb in bch2_fs_open and being later dereferenced, causing a fault. Assign EINVALID to ret when iterating through layout. [0]: https://syzkaller.appspot.com/bug?extid=18a5c5e8a9c856944876 Reported-by: syzbot+18a5c5e8a9c856944876@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=18a5c5e8a9c856944876 Signed-off-by: Diogo Jahchan Koike <djahchankoike@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:34 -04:00
Kent Overstreet	c6040447c5	bcachefs: Fix srcu warning in check_topology check_topology doesn't need the srcu lock and doesn't use normal btree transactions - we can just drop the srcu lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:34 -04:00
Kent Overstreet	18c520f408	bcachefs: Fix error path in check_dirent_inode_dirent() fsck_err() jumps to the fsck_err label when bailing out; need to make sure bp_iter was initialized... Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:34 -04:00
Piotr Zalewski	0696a18a8c	bcachefs: memset bounce buffer portion to 0 after key_sort_fix_overlapping Zero-initialize part of allocated bounce buffer which wasn't touched by subsequent bch2_key_sort_fix_overlapping to mitigate later uinit-value use KMSAN bug[1]. After applying the patch reproducer still triggers stack overflow[2] but it seems unrelated to the uninit-value use warning. After further investigation it was found that stack overflow occurs because KMSAN adds too many function calls[3]. Backtrace of where the stack magic number gets smashed was added as a reply to syzkaller thread[3]. It was confirmed that task's stack magic number gets smashed after the code path where KSMAN detects uninit-value use is executed, so it can be assumed that it doesn't contribute in any way to uninit-value use detection. [1] https://syzkaller.appspot.com/bug?extid=6f655a60d3244d0c6718 [2] https://lore.kernel.org/lkml/66e57e46.050a0220.115905.0002.GAE@google.com [3] https://lore.kernel.org/all/rVaWgPULej8K7HqMPNIu8kVNyXNjjCiTB-QBtItLFBmk0alH6fV2tk4joVPk97Evnuv4ZRDd8HB5uDCkiFG6u81xKdzDj-KrtIMJSlF6Kt8=@proton.me Reported-by: syzbot+6f655a60d3244d0c6718@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6f655a60d3244d0c6718 Fixes: `ec4edd7b9d` ("bcachefs: Prep work for variable size btree node buffers") Suggested-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Piotr Zalewski <pZ010001011111@proton.me> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:34 -04:00
Kent Overstreet	51b7cc7c0f	bcachefs: Improve bch2_is_inode_open() warning message Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:34 -04:00
Kent Overstreet	4a8f8fafbd	bcachefs: Add extra padding in bkey_make_mut_noupdate() This fixes a kasan splat in propagate_key_to_snapshot_leaves() - varint_decode_fast() does reads (that it never uses) up to 7 bytes past the end of the integer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:34 -04:00
Kent Overstreet	f890c8513f	bcachefs: Mark inode errors as autofix Most or all errors will be autofix in the future, we're currently just doing the ones that we know are well tested. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-27 21:46:34 -04:00
Kent Overstreet	7eb4a319db	bcachefs: Fix infinite loop in propagate_key_to_snapshot_leaves() As we iterate we need to mark that we no longer need iterators - otherwise we'll infinite loop via the "too many iters" check when there's many snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-23 18:46:58 -04:00
Kent Overstreet	6d12d7ace9	bcachefs: Ensure BCH_FS_accounting_replay_done is always set if it doesn't get set we'll never be able to flush the btree write buffer; this only happens in fake rw mode, but prevents us from shutting down. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-23 18:46:58 -04:00
Ahmed Ehab	39c3aad43f	bcachefs: Hold read lock in bch2_snapshot_tree_oldest_subvol() Syzbot reports a problem that a warning is triggered due to suspicious use of rcu_dereference_check(). That is triggered by a call of bch2_snapshot_tree_oldest_subvol(). The cause of the warning is that inside bch2_snapshot_tree_oldest_subvol(), snapshot_t() is called which calls rcu_dereference() that requires a read lock to be held. Also, the call of bch2_snapshot_tree_next() eventually calls snapshot_t(). To fix this, call rcu_read_lock() before calling snapshot_t(). Then, release the lock after the termination of the while loop. Reported-by: <syzbot+f7c41a878676b72c16a6@syzkaller.appspotmail.com> Signed-off-by: Ahmed Ehab <bottaawesome633@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 14:54:18 -04:00
Diogo Jahchan Koike	025c55a4c7	bcachefs: return err ptr instead of null in read sb clean syzbot reported a null-ptr-deref in bch2_fs_start. [0] When a sb is marked clear but doesn't have a clean section bch2_read_superblock_clean returns NULL which PTR_ERR_OR_ZERO lets through, eventually leading to a null ptr dereference down the line. Adjust read sb clean to return an ERR_PTR indicating the invalid clean section. [0] https://syzkaller.appspot.com/bug?extid=1cecc37d87c4286e5543 Reported-by: syzbot+1cecc37d87c4286e5543@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=1cecc37d87c4286e5543 Signed-off-by: Diogo Jahchan Koike <djahchankoike@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:49 -04:00
Yang Li	abb43dd677	bcachefs: Remove duplicated include in backpointers.c The header files bbpos.h is included twice in backpointers.c, so one inclusion of each can be removed. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=10783 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:49 -04:00
Kent Overstreet	d5c5b337f8	bcachefs: Don't drop devices with stripe pointers Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:49 -04:00
Kent Overstreet	035d72f72c	bcachefs: bch2_ec_stripe_head_get() now checks for change in rw devices This factors out ec_strie_head_devs_update(), which initializes the bitmap of devices we're allocating from, and runs it every time c->rw_devs_change_count changes. We also cancel pending, not allocated stripes, since they may refer to devices that are no longer available. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:49 -04:00
Kent Overstreet	83ccd9b31d	bcachefs: bch_fs.rw_devs_change_count Add a counter that's incremented whenever rw devices change; this will be used for erasure coding so that it can keep ec_stripe_head in sync and not deadlock on a new stripe when a device it wants goes away. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:49 -04:00
Kent Overstreet	ad8d1f77fc	bcachefs: bch2_dev_remove_stripes() We can now correctly force-remove a device that has stripes on it; this uses the new BCH_SB_MEMBER_INVALID sentinal value. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:49 -04:00
Kent Overstreet	934137b0c0	bcachefs: bch2_trigger_ptr() calculates sectors even when no device This is necessary for erasure coded pointers to devices that have been removed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:49 -04:00
Kent Overstreet	2aee59eb21	bcachefs: improve error messages in bch2_ec_read_extent() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:49 -04:00
Kent Overstreet	cb771fe891	bcachefs: improve error message on too few devices for ec Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:49 -04:00
Kent Overstreet	c9cabfb215	bcachefs: improve bch2_new_stripe_to_text() also print out the new stripe key Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:48 -04:00
Kent Overstreet	a4b7a0c037	bcachefs: ec_stripe_head.nr_created additional debug stat Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:48 -04:00
Kent Overstreet	fa85c47397	bcachefs: bch_stripe.disk_label When reshaping existing stripes, we should keep them on the same target that they were allocated on; to do this, we need to add a field to the btree stripe type. This is a tad awkward, because we only have 8 bits left, and targets are 16 bits - but we only need to store a label, not a full target. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:48 -04:00
Kent Overstreet	1b11c4d365	bcachefs: stripe_to_mem() factor out a common helper Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:48 -04:00
Kent Overstreet	54a12984a9	bcachefs: EIO errcode cleanup We want to be using private errcodes whenever possible, for better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:48 -04:00
Kent Overstreet	7a51608d01	bcachefs: Rework btree node pinning In backpointers fsck, we do a seqential scan of one btree, and check references to another: extents <-> backpointers Checking references generates random lookups, so we want to pin that btree in memory (or only a range, if it doesn't fit in ram). Previously, this was done with a simple check in the shrinker - "if btree node is in range being pinned, don't free it" - but this generated OOMs, as our shrinker wasn't well behaved if there was less memory available than expected. Instead, we now have two different shrinkers and lru lists; the second shrinker being for pinned nodes, with seeks set much higher than normal - so they can still be freed if necessary, but we'll prefer not to. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:48 -04:00
Kent Overstreet	91ddd71510	bcachefs: split up btree cache counters for live, freeable this is prep for introducing a second live list and shrinker for pinned nodes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:48 -04:00
Kent Overstreet	691f2cba22	bcachefs: btree cache counters should be size_t 32 bits won't overflow any time soon, but size_t is the correct type for counting objects in memory. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:48 -04:00
Kent Overstreet	ad5dbe3ce5	bcachefs: Don't count "skipped access bit" as touched in btree cache scan Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-09-21 11:39:48 -04:00

1 2 3 4 5 ...

1294685 Commits