linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-26 22:21:42 +00:00

Author	SHA1	Message	Date
Kent Overstreet	5bf9db0179	bcachefs: evacuate_bucket() no longer calls verify_bucket_evacuated() The copygc code itself now calls this when all moves from a given bucket are complete. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	51fe0332b1	bcachefs: Suppress transaction restart err message This isn't a real error, and doesn't need to be printed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	7635e1a6d6	bcachefs: Rework open bucket partial list allocation Now, any open_bucket can go on the partial list: allocating from the partial list has been moved to its own dedicated function, open_bucket_add_bucets() -> bucket_alloc_set_partial(). In particular, this means that erasure coded buckets can safely go on the partial list; the new location works with the "allocate an ec bucket first, then the rest" logic. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Brian Foster	e53d03fe39	bcachefs: don't bump key cache journal seq on nojournal commits fstest generic/388 occasionally reproduces corruptions where an inode has extents beyond i_size. This is a deliberate crash and recovery test, and the post crash+recovery characteristics are usually the same: the inode exists on disk in an early (i.e. just allocated) state based on the journal sequence number associated with the inode. Subsequent inode updates exist in the journal at higher sequence numbers, but the inode hadn't been written back before the associated crash and the post-crash recovery processes a set of journal sequence numbers that doesn't include updates to the inode. In fact, the sequence with the most recent inode key update always happens to be the sequence just before the front of the journal processed by recovery. This last bit is a significant hint that the problem relates to an on-disk journal update of the front of the journal. The root cause of this problem is basically that the inode is updated (multiple times) in-core and in the key cache, each time bumping the key cache sequence number used to control the cache flush. The cache flush skips one or more times, bumping the associated key cache journal pin to the key cache seq value. This has a side effect of holding the inode in memory a bit longer than normal, which helps exacerbate this problem, but is also unsafe in certain cases where the key cache seq may have been updated by a transaction commit that didn't journal the associated key. For example, consider an inode that has been allocated, updated several times in the key cache, journaled, but not yet written back. At this stage, everything should be consistent if the fs happens to crash because the latest update has been journal. Now consider a key update via bch2_extent_update_i_size_sectors() that uses the BTREE_UPDATE_NOJOURNAL flag. While this update may not change inode state, it can have the side effect of bumping ck->seq in bch2_btree_insert_key_cached(). In turn, if a subsequent key cache flush skips due to seq not matching the former, the ck->journal pin is updated to ck->seq even though the most recent key update was not journaled. If this pin happens to reside at the front (tail) of the journal, this means a subsequent journal write can update last_seq to a value beyond that which includes the most recent update to the inode. If this occurs and the fs happens to crash before the inode happens to flush, recovery will see the latest last_seq, fail to recover the inode and leave the inode in the inconsistent state described above. To avoid this problem, skip the key cache seq update on NOJOURNAL commits, except on initial pin add. Pass the insert entry directly to bch2_btree_insert_key_cached() to make the associated flag available and be consistent with btree_insert_key_leaf(). Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	83ec519aea	bcachefs: When shutting down, flush btree node writes last Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	adac06fad3	bcachefs: Verbose on by default when CONFIG_BCACHEFS_DEBUG=y Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	db64a8e8a1	fixup bcachefs: Use for_each_btree_key_upto() more consistently Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	4b5b13da52	six locks: be more careful about lost wakeups This is a workaround for a lost wakeup bug we've been seeing - we still need to discover the actual bug. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	2640faeb17	bcachefs: Journal resize fixes - Fix a sleeping-in-atomic bug due to calling bch2_journal_buckets_to_sb() under the journal lock. - Additionally, now we mark buckets as journal buckets before adding them to the journal in memory and the superblock. This ensures that if we crash part way through we'll never be writing to journal buckets that aren't marked correctly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	511b629aca	bcachefs: bch2_btree_iter_peek_node_and_restart() Minor refactoring for the Rust interface. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	b65499b7b1	bcachefs: bch2_btree_node_ondisk_to_text() Pulling out a helper from cmd_list.c, as the rest is being rewritten in Rust but we're not ready to rewrite lower-level btree code in Rust. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	a345b0f393	bcachefs: bch2_btree_node_to_text() const correctness This is for the Rust interface - Rust cares more about const than C does. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	26bab33b69	bcachefs: Fix "btree node in stripe" error Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	2a912a9a39	bcachefs: Kill bch2_ec_bucket_written() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	81c771b266	bcachefs: Improve bch2_new_stripes_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	8fcdf81418	bcachefs: Improved copygc pipelining This improves copygc pipelining across multiple buckets: we now track each in flight bucket we're evacuating, with separate moving_contexts. This means that whereas previously we had to wait for outstanding moves to complete to ensure we didn't try to evacuate the same bucket twice, we can now just check buckets we want to evacuate against the pending list. This also mean we can run the verify_bucket_evacuated() check without killing pipelining - meaning it can now always be enabled, not just on debug builds. This is going to be important for the upcoming erasure coding work, where moving IOs that are being erasure coded will now skip the initial replication step; instead the IOs will wait on the stripe to complete. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	0b943b973c	bcachefs: Free move buffers as early as possible Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	5be6a274ff	bcachefs: Fix stripe reuse path It's possible that we reuse a stripe that doesn't have quite the same configuration as the stripe_head we're allocating from. In that case, we have to make sure that the new stripe uses the settings from the stripe we resue, not the stripe head, and make sure the buffer is allocated correctly. This fixes the ec_mixed_tiers test. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	ac2ccddc26	bcachefs: Drop some anonymous structs, unions Rust bindgen doesn't cope well with anonymous structs and unions. This patch drops the fancy anonymous structs & unions in bkey_i that let us use the same helpers for bkey_i and bkey_packed; since bkey_packed is an internal type that's never exposed to outside code, it's only a minor inconvenienc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	45dd05b3ec	bcachefs: BKEY_PADDED_ONSTACK() Rust bindgen doesn't do anonymous structs very nicely: BKEY_PADDED() only needs the anonymous struct when it's used on the stack, to guarantee layout, not when it's embedded in another struct. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	2f528663c5	bcachefs: moving_context->stats is allowed to be NULL Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	e84face6f0	bcachefs: RESERVE_stripe Rework stripe creation path - new algorithm for deciding when to create new stripes or reuse existing stripes. We add a new allocation watermark, RESERVE_stripe, above RESERVE_none. Then we always try to create a new stripe by doing RESERVE_stripe allocations; if this fails, we reuse an existing stripe and allocate buckets for it with the reserve watermark for the given write (RESERVE_none or RESERVE_movinggc). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	d57c9add59	bcachefs: Improve error message for stripe block sector counts wrong Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	9d32097f3b	bcachefs: More stripe create cleanup/fixes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	a1fb08f5df	bcachefs: Plumb alloc_reserve through stripe create path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	910659763e	bcachefs: Mark stripe buckets with correct data type Currently, we don't use bucket data type for tracking whether buckets are part of a stripe; parity buckets are BCH_DATA_parity, but data buckets in a stripe are BCH_DATA_user. There's a separate counter, buckets_ec, outside the BCH_DATA_TYPES system for tracking number of buckets on a device that are part of a stripe. The trouble with this approach is that it's too coarse grained, and we need better information on fragmentation for debugging copygc. With this patch, data buckets in a stripe are now tracked as BCH_DATA_stripe buckets. This doesn't yet differentiate between erasure coded and non-erasure coded data in a stripe bucket, nor do we yet track empty data buckets in stripes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	3329cf1bb9	bcachefs: Centralize btree node lock initialization This fixes some confusion in the lockdep code due to initializing btree node/key cache locks with the same lockdep key, but different names. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	1306f87de3	bcachefs: Plumb btree_trans through btree cache code Soon, __bch2_btree_node_write() is going to require a btree_trans: zoned device support is going to require a new allocation for every btree node write. This is a bit of prep work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	b1cfe5ed2b	bcachefs: Improve dev_alloc_debug_to_text() Now we also print the number of buckets reserved for each watermark. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	c85d779609	bcachefs: bch2_copygc_wait_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	2611a041ae	bcachefs: bch2_mark_key() now takes btree_id & level btree & level are passed to trans_mark - for backpointers - bch2_mark_key() should take them as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	e902095868	bcachefs: bch2_write_queue() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	8f2bbcdd9b	bcachefs: ec: Improve error message for btree node in stripe Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	2f4e9472fa	bcachefs: bch2_open_bucket_to_text() Factor out a common helper Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	11bb67a4a3	bcachefs: bch2_data_update_init() considers ptr durability Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	a64adedb86	bcachefs: ec: Ensure new stripe is closed in error path This fixes a use-after-free bug. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	f3a65bb98b	bcachefs: Convert constants to consts Rust bindgen doesn't handle macros, but it does handle integer constants: this conversion aids in implementing safe Rust wrapper interfaces. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	0f2ea6550f	bcachefs: bch2_btree_iter_peek_and_restart_outlined() Needed for interfacing with Rust - bindgen can't handle inline functions, alas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	94bc95c468	bcachefs: ec: zero_out_rest_of_ec_bucket() Occasionally, we won't write to an entire bucket. This fixes the EC code to handle this case, zeroing out the rest of the bucket as needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	039c45feef	bcachefs: bch2_data_update_index_update() -> bch2_trans_run() Convert to use the standard helper Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:54 -04:00
Kent Overstreet	e07cb97460	bcachefs: Flush write buffer as needed in backpointers repair Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:54 -04:00
Kent Overstreet	747ded6ddf	bcachefs: Fix for shared paths in write buffer flush It's possible for bch2_write_buffer_flush_one() to end up with a shared path, if called from a context that already has a btree iterator pointing to a key being flushed. We have to be careful when that happens, since we can't clone a path that holds write locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:54 -04:00
Kent Overstreet	39a1ea129a	bcachefs: Single open_bucket_partial list Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:54 -04:00
Kent Overstreet	0d763863af	bcachefs: Improve bch2_stripe_to_text() We now print pointers as bucket:offset, the same as how we print extent pointers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:54 -04:00
Kent Overstreet	33669e0cc9	bcachefs: Add option for completely disabling nocow This adds an option for completely disabling nocow mode, including the locking in the data move path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:54 -04:00
Kent Overstreet	1a14e25510	bcachefs: Make bucket_alloc tracepoint more readable Print bucket in dev:bucket notation, to be consistent with how we refer to buckets elsewhere. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:54 -04:00
Kent Overstreet	e9b7014654	bcachefs: Don't call bch2_trans_update() unlocked Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:54 -04:00
Kent Overstreet	70ded998c5	bcachefs: get_stripe_key_trans() Another nested btree_trans fix Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:54 -04:00
Kent Overstreet	e3877382fb	bcachefs: Fix erasure coding shutdown path It's possible when shutting down to for a stripe head to have a new stripe that doesn't yet have any blocks allocated - we just need to free it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:54 -04:00
Kent Overstreet	64784ade4f	bcachefs: Fix buffer overrun in ec_stripe_update_extent() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:54 -04:00

1 2 3 4 5 ...

1217516 Commits