linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-29 23:51:37 +00:00

Author	SHA1	Message	Date
Kent Overstreet	ca1e02f7e9	bcachefs: Etyzinger cleanups Pull out eytzinger.c and kill eytzinger_cmp_fn. We now provide eytzinger0_sort and eytzinger0_sort_r, which use the standard cmp_func_t and cmp_r_func_t callbacks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-03 14:44:18 -04:00
Kent Overstreet	bdbf953b3c	bcachefs: bch2_shoot_down_journal_keys() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-03 14:44:18 -04:00
Kent Overstreet	27fcec6c27	bcachefs: Clear recovery_passes_required as they complete without errors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-03 14:44:18 -04:00
Kent Overstreet	fa14b50460	bcachefs: ratelimit informational fsck errors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-02 20:24:00 -04:00
Kent Overstreet	7ee88737ab	bcachefs: Check for bad needs_discard before doing discard In the discard worker, we were failing to validate the bucket state - meaning a corrupt needs_discard btree could cause us to discard a bucket that we shouldn't. If check_alloc_info hasn't run yet we just want to bail out, otherwise it's a filesystem inconsistent error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-02 20:24:00 -04:00
Kent Overstreet	e0319af2b6	bcachefs: Improve bch2_btree_update_to_text() Print out the mode as a string, and also print out the btree and watermark. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-02 17:13:46 -04:00
Guenter Roeck	97ca7c1f93	mean_and_variance: Drop always failing tests mean_and_variance_test_2 and mean_and_variance_test_4 always fail. The input parameters to those tests are identical to the input parameters to tests 1 and 3, yet the expected result for tests 2 and 4 is different for the mean and stddev tests. That will always fail. Expected mean_and_variance_get_mean(mv) == mean[i], but mean_and_variance_get_mean(mv) == 22 (0x16) mean[i] == 10 (0xa) Drop the bad tests. Fixes: `65bc410907` ("mean and variance: More tests") Closes: https://lore.kernel.org/lkml/065b94eb-6a24-4248-b7d7-d3212efb4787@roeck-us.net/ Cc: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-02 14:45:08 -04:00
Kent Overstreet	c42cd606e4	bcachefs: fix nocow lock deadlock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-02 01:04:10 -04:00
Kent Overstreet	e2a316b3cc	bcachefs: BCH_WATERMARK_interior_updates This adds a new watermark, higher priority than BCH_WATERMARK_reclaim, for interior btree updates. We've seen a deadlock where journal replay triggers a ton of btree node merges, and these use up all available open buckets and then interior updates get stuck. One cause of this is that we're currently lacking btree node merging on write buffer btrees - that needs to be fixed as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-01 21:14:02 -04:00
Kent Overstreet	ba947ecd39	bcachefs: Fix btree node reserve Sign error when checking the watermark - oops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-01 21:14:02 -04:00
Kent Overstreet	b3c7fd35c0	bcachefs: On emergency shutdown, print out current journal sequence number Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-01 01:07:24 -04:00
Kent Overstreet	eab3a3ce2d	bcachefs: Fix overlapping extent repair overlapping extent repair was colliding with extent past end of inode checks - don't update "extent ends at" until we know we have an extent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-01 01:05:50 -04:00
Kent Overstreet	8ce1db8091	bcachefs: Fix remove_dirent() We were missing an iter_traverse(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-01 00:52:32 -04:00
Kent Overstreet	cecfed9b44	bcachefs: Logged op errors should be ignored If something is wrong with a logged op, we just want to delete it - there's nothing to repair. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-04-01 00:04:10 -04:00
Kent Overstreet	13c1e583f9	bcachefs: Improve -o norecovery; opts.recovery_pass_limit This adds opts.recovery_pass_limit, and redoes -o norecovery to make use of it; this fixes some issues with -o norecovery so it can be safely used for data recovery. Norecovery means "don't do journal replay"; it's an important data recovery tool when we're getting stuck in journal replay. When using it this way we need to make sure we don't free journal keys after startup, so we continue to overlay them: thus it needs to imply retain_recovery_info, as well as nochanges. recovery_pass_limit is an explicit option for telling recovery to exit after a specific recovery pass; this is a much cleaner way of implementing -o norecovery, as well as being a useful debug feature in its own right. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:12 -04:00
Kent Overstreet	060ff30a85	bcachefs: bch2_run_explicit_recovery_pass_persistent() Flag that we need to run a recovery pass and run it - persistenly, so if we crash it'll still get run. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:12 -04:00
Kent Overstreet	0a34c058fc	bcachefs: Ensure bch_sb_field_ext always exists This makes bch_sb_field_ext more consistent with the rest of -o nochanges - we don't want to be varying other codepaths based on -o nochanges, since it's used for testing in dry run mode; also fixes some potential null ptr derefs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:12 -04:00
Kent Overstreet	4fe0eeeae4	bcachefs: Flush journal immediately after replay if we did early repair Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:12 -04:00
Kent Overstreet	af855a5f5e	bcachefs: Resume logged ops after fsck Finishing logged ops requires the filesystem to be in a reasonably consistent state - and other fsck passes don't require it to have completed, so just run it last. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:12 -04:00
Kent Overstreet	e5aa804641	bcachefs: Add error messages to logged ops fns Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:11 -04:00
Kent Overstreet	d2554263ad	bcachefs: Split out recovery_passes.c We've grown a fair amount of code for managing recovery passes; tracking which ones we're running, which ones need to be run, and flagging in the superblock which ones need to be run on the next recovery. So it's worth splitting out into its own file, this code is pretty different from the code in recovery.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:11 -04:00
Kent Overstreet	11d5568d3e	bcachefs: fix backpointer for missing alloc key msg Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:11 -04:00
Kent Overstreet	7f9e508036	bcachefs: Fix bch2_btree_increase_depth() When we haven't yet allocated any btree nodes for a given btree, we first need to call the regular split path to allocate one. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:11 -04:00
Kent Overstreet	47d2080e30	bcachefs: Kill bch2_bkey_ptr_data_type() Remove some duplication, and inconsistency between check_fix_ptrs and the main ptr marking paths Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:11 -04:00
Kent Overstreet	dcc1c04587	bcachefs: Fix use after free in check_root_trans() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:11 -04:00
Kent Overstreet	83bb585390	bcachefs: Fix repair path for missing indirect extents Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:11 -04:00
Kent Overstreet	6f5869ffd9	bcachefs: Fix use after free in bch2_check_fix_ptrs() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:11 -04:00
Kent Overstreet	812a929793	bcachefs: Fix btree node keys accounting in topology repair path When dropping keys now outside a now because we're changing the node min/max, we need to redo the node's accounting as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:11 -04:00
Kent Overstreet	805b535a8a	bcachefs: Check btree ptr min_key in .invalid Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:11 -04:00
zhuxiaohui	bb66009958	bcachefs: add REQ_SYNC and REQ_IDLE in write dio when writing file with direct_IO on bcachefs, then performance is much lower than other fs due to write back throttle in block layer: wbt_wait+1 __rq_qos_throttle+32 blk_mq_submit_bio+394 submit_bio_noacct_nocheck+649 bch2_submit_wbio_replicas+538 __bch2_write+2539 bch2_direct_write+1663 bch2_write_iter+318 aio_write+355 io_submit_one+1224 __x64_sys_io_submit+169 do_syscall_64+134 entry_SYSCALL_64_after_hwframe+110 add set REQ_SYNC and REQ_IDLE in bio->bi_opf as standard dirct-io Signed-off-by: zhuxiaohui <zhuxiaohui.400@bytedance.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:11 -04:00
Kent Overstreet	79032b0781	bcachefs: Improved topology repair checks Consolidate bch2_gc_check_topology() and btree_node_interior_verify(), and replace them with an improved version, bch2_btree_node_check_topology(). This checks that children of an interior node correctly span the full range of the parent node with no overlaps. Also, ensure that topology repairs at runtime are always a fatal error; in particular, this adds a check in btree_iter_down() - if we don't find a key while walking down the btree that's indicative of a topology error and should be flagged as such, not a null ptr deref. Some checks in btree_update_interior.c remaining BUG_ONS(), because we already checked the node for topology errors when starting the update, and the assertions indicate that we _just_ corrupted the btree node - i.e. the problem can't be that existing on disk corruption, they indicate an actual algorithmic bug. In the future, we'll be annotating the fsck errors list with which recovery pass corrects them; the open coded "run explicit recovery pass or fatal error" in bch2_btree_node_check_topology() will in the future be done for every fsck_err() call. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:11 -04:00
Kent Overstreet	40cb26233a	bcachefs: Be careful about btree node splits during journal replay Don't pick a pivot that's going to be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:11 -04:00
Kent Overstreet	048f47e83f	bcachefs: btree_and_journal_iter now respects trans->journal_replay_not_finished btree_and_journal_iter is now safe to use at runtime, not just during recovery before journal keys have been freed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:11 -04:00
Hongbo Li	36f9ef109b	bcachefs: fix trans->mem realloc in __bch2_trans_kmalloc The old code doesn't consider the mem alloced from mempool when call krealloc on trans->mem. Also in bch2_trans_put, using mempool_free to free trans->mem by condition "trans->mem_bytes == BTREE_TRANS_MEM_MAX" is inaccurate when trans->mem was allocated by krealloc function. Instead, we use used_mempool stuff to record the situation, and realloc or free the trans->mem in elegant way. Also, after krealloc failed in __bch2_trans_kmalloc, the old data should be copied to the new buffer when alloc from mempool_alloc. Fixes: `31403dca5b` ("bcachefs: optimize __bch2_trans_get(), kill DEBUG_TRANSACTIONS") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:11 -04:00
Kent Overstreet	57339b24a0	bcachefs: Don't do extent merging before journal replay is finished We don't normally do extent updates this early in recovery, but some of the repair paths have to and when we do, we don't want to do anything that requires the snapshots table. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:11 -04:00
Kent Overstreet	ec9cc18fc2	bcachefs: Add checks for invalid snapshot IDs Previously, we assumed that keys were consistent with the snapshots btree - but that's not correct as fsck may not have been run or may not be complete. This adds checks and error handling when using the in-memory snapshots table (that mirrors the snapshots btree). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:11 -04:00
Kent Overstreet	63332394c7	bcachefs: Move snapshot table size to struct snapshot_table We need to add bounds checking for snapshot table accesses - it turns out there are cases where we do need to use the snapshots table before fsck checks have completed (and indeed, fsck may not have been run). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:11 -04:00
Kent Overstreet	aa6e130e3c	bcachefs: Add an assertion for trying to evict btree root Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:11 -04:00
Kent Overstreet	4bd02d3fb3	bcachefs: fix mount error path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:10 -04:00
Thomas Bertschinger	688d750d10	bcachefs: fix misplaced newline in __bch2_inode_unpacked_to_text() before: u64s 18 type inode_v3 0:1879048192:U32_MAX len 0 ver 0: mode=40700 flags= (15300000) journal_seq=4 bi_size=0 bi_sectors=0 bi_version=0bi_atime=227064388944 ... after: u64s 18 type inode_v3 0:1879048192:U32_MAX len 0 ver 0: mode=40700 flags= (15300000) journal_seq=4 bi_size=0 bi_sectors=0 bi_version=0 bi_atime=227064388944 ... Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:10 -04:00
Kent Overstreet	8aad8e1f65	bcachefs: Fix journal pins in btree write buffer btree write buffer flush has two phases - in natural key order, which is more efficient but may fail - then in journal order The journal order flush was assuming that keys were still correctly ordered by journal sequence number - but due to coalescing by the previous phase, we need an additional sort. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:10 -04:00
Kent Overstreet	a5e3dce493	bcachefs: Fix assert in bch2_backpointer_invalid() Backpointers that point to invalid devices are caught by fsck, not .key_invalid; so .key_invalid needs to check for them instead of hitting asserts. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-31 20:36:10 -04:00
Linus Torvalds	a4145ce1e7	bcachefs fixes for 6.9-rc1 Assorted bugfixes. Most are fixes for simple assertion pops; the most significant fix is for a deadlock in recovery when we have to rewrite large numbers of btree nodes to fix errors. This was incorrectly running out of the same workqueue as the core interior btree update path - we now give it its own single threaded workqueue. This was visible to users as "bch2_btree_update_start(): error: BCH_ERR_journal_reclaim_would_deadlock" - and then recovery hanging. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmX6CoYACgkQE6szbY3K bnZp2hAAwAw8haQKeR0+0aAaqTavvBcjcloeKlQhRl+OxV1rAgxcjKGai5txZ9rI d4FVOOo7MqHq1oN9Ydsy1+0R70eCFzhDxhT1Ph5MhIzc7nd8lC0GQjO0atx23cni 4UZgSxi6quEP401MTVhvVbCPLmvfPJLpIBzptJUDS/eysxSZpS4A10gEzipoNjPv DOdrsvoo8nQX53tERJ/IxtroFL44p4y8OyZK65NILFF9xZosKz1P9ktrWufmRVoY /Hl8SUfhSNJDFW5pIMPOmoG/+RG+hJK4BaiNWPXLaSvO+3PmQskJ2tvHQVNjHQYt dMYWcy4hN47XtYvrHG9xmaQP+lZCDijdBrhmik4brqfZbloH43MVdDFysjfIPhUm qk+zzb0uE0ZhwRvQOjnYEQpHjXmj7Bm80+dhfNuuiKlhz4bOeDz8UZykJOzgD0zH n4cd+nbCxuogkukzLLQMbFv1+MCsCZpStkXP3GQXCK0k+H2briPGALuA74sxfAhH ajHLNr6qMU+uB6Ce0oM7e+9dPLfV/NalEwWW7aR/4TamxPBt575Hpjp0BV//BRfD IxdEKrMNdbKBJDUj1s5aTwcSF6ae6zHtyQXuKr93mWQqNvVXvX5/FPQYr70uA1VP iieBkde7aSTGCbTdTEcY9NcXdT2X/91aobsPvwGeq1z5Y1JJ0nU= =J/hu -----END PGP SIGNATURE----- Merge tag 'bcachefs-2024-03-19' of https://evilpiepirate.org/git/bcachefs Pull bcachefs fixes from Kent Overstreet: "Assorted bugfixes. Most are fixes for simple assertion pops; the most significant fix is for a deadlock in recovery when we have to rewrite large numbers of btree nodes to fix errors. This was incorrectly running out of the same workqueue as the core interior btree update path - we now give it its own single threaded workqueue. This was visible to users as "bch2_btree_update_start(): error: BCH_ERR_journal_reclaim_would_deadlock" - and then recovery hanging" * tag 'bcachefs-2024-03-19' of https://evilpiepirate.org/git/bcachefs: bcachefs: Fix lost wakeup on journal shutdown bcachefs; Fix deadlock in bch2_btree_update_start() bcachefs: ratelimit errors from async_btree_node_rewrite bcachefs: Run check_topology() first bcachefs: Improve bch2_fatal_error() bcachefs: Fix lost transaction restart error bcachefs: Don't corrupt journal keys gap buffer when dropping alloc info bcachefs: fix for building in userspace bcachefs: bch2_snapshot_is_ancestor() now safe to call in early recovery bcachefs: Fix nested transaction restart handling in bch2_bucket_gens_init() bcachefs: Improve sysfs internal/btree_updates bcachefs: Split out btree_node_rewrite_worker bcachefs: Fix locking in bch2_alloc_write_key() bcachefs: Avoid extent entry type assertions in .invalid() bcachefs: Fix spurious -BCH_ERR_transaction_restart_nested bcachefs: Fix check_key_has_snapshot() call bcachefs: Change "accounting overran journal reservation" to a warning	2024-03-19 17:27:25 -07:00
Kent Overstreet	2e92d26b25	bcachefs: Fix lost wakeup on journal shutdown We need to check for journal shutdown first in __journal_res_get() - after the journal is shutdown, j->watermark won't be changing anymore. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-18 23:35:42 -04:00
Kent Overstreet	c502b5b878	bcachefs; Fix deadlock in bch2_btree_update_start() BCH_TRANS_COMMIT_journal_reclaim with watermark != BCH_WATERMARK_reclaim means nonblocking, and we need the journal_res_get() in btree_update_start() to respect that. In a future refactoring we'll be deleting BCH_TRANS_COMMIT_journal_reclaim and replacing it with an explicit BCH_TRANS_COMMIT_nonblocking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-18 23:35:42 -04:00
Kent Overstreet	b38114dde0	bcachefs: ratelimit errors from async_btree_node_rewrite Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-18 00:24:24 -04:00
Kent Overstreet	8d347a5545	bcachefs: Run check_topology() first check_topology() doesn't actually require alloc info - and running it first means other passes don't have to catch btree read errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-18 00:24:24 -04:00
Kent Overstreet	3ed94062e3	bcachefs: Improve bch2_fatal_error() error messages should always include __func__ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-18 00:24:24 -04:00
Kent Overstreet	ec35b30481	bcachefs: Fix lost transaction restart error Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-18 00:24:23 -04:00
Kent Overstreet	a586036841	bcachefs: Don't corrupt journal keys gap buffer when dropping alloc info Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-17 21:17:38 -04:00
Kent Overstreet	f3589bfa7e	bcachefs: fix for building in userspace Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-17 20:53:12 -04:00
Kent Overstreet	1c31b83a4e	bcachefs: bch2_snapshot_is_ancestor() now safe to call in early recovery this fixes an assertion pop in bch2_check_snapshot_trees() -> check_snapshot_tree() -> bch2_snapshot_tree_master_subvol() -> bch2_snapshot_is_ancestor() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-17 20:53:12 -04:00
Kent Overstreet	1ba6f48f09	bcachefs: Fix nested transaction restart handling in bch2_bucket_gens_init() Nested transaction restart handling is typically best avoided; when the inner context handles a transaction restart it invalidates the outer transaction context, so we need to make sure to return a transaction_restart_nested error. This code wasn't doing that, and hit the assertion in for_each_btree_key() that checks for that via trans->restart_count. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-17 20:53:12 -04:00
Kent Overstreet	3becdd4850	bcachefs: Improve sysfs internal/btree_updates Print out the function that launched the btree update. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-17 20:53:12 -04:00
Kent Overstreet	a0a466ea98	bcachefs: Split out btree_node_rewrite_worker This fixes a deadlock due to using btree_interior_update_worker for non interior updates - async btree node rewrites were blocking, and then blocking other interior updates. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-17 20:53:12 -04:00
Kent Overstreet	37bb9c9572	bcachefs: Fix locking in bch2_alloc_write_key() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-17 20:53:11 -04:00
Kent Overstreet	264b501f8f	bcachefs: Avoid extent entry type assertions in .invalid() After keys have passed bkey_ops.key_invalid we should never see invalid extent entry types - but .key_invalid itself needs to cope with them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-17 20:53:11 -04:00
Kent Overstreet	109ea419cf	bcachefs: Fix spurious -BCH_ERR_transaction_restart_nested We only need to return transaction_restart_nested when we're inside a context that's handling transaction restarts. Also, add a missing check_subdir_count() call. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-17 20:53:11 -04:00
Kent Overstreet	3ff3475611	bcachefs: Fix check_key_has_snapshot() call Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-17 20:53:11 -04:00
Kent Overstreet	62f35024b2	bcachefs: Change "accounting overran journal reservation" to a warning This doesn't need to be a BUG_ON(); the actual serious "things break" condition is if the whole journal write overruns the available space, and that has a fatal error, not a BUG_ON(). This check indicates we screwed something up, but it should be a warning. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-17 20:53:11 -04:00
Linus Torvalds	32a50540c3	bcachefs updates for 6.9 - Subvolume children btree; this is needed for providing a userspace interface for walking subvolumes, which will come later - Lots of improvements to directory structure checking - Improved journal pipelining, significantly improving performance on high iodepth write workloads - Discard path improvements: the discard path is more efficient, and no longer flushes the journal unnecessarily - Buffered write path can now avoid taking the inode lock - new mm helper: memalloc_flags_{save\|restore} - mempool now does kvmalloc mempools -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmXycEcACgkQE6szbY3K bnYUTg/+K4Nv2EdAqOCyHRTKaF2OgJDUb25ZDmbGpfT1XyPrNB7/+CxHqSdEP7/e FVuhtP61vnQAImDv82u9iZiab/TnuCZPUrjSobFEvrWYoGRtP9Bm9MyYB28NzmMa AXGmS4yJGVwtxxrFNxZP98IbiHYiHSoYbkqxX2E5VgLag8Ru8peb7oD0Ro3zw0rb z+6UM/seJ7on5i/9IJEMKKXFVEoZC2J5DAVoe1TghG2kgOw3cKu5OUdltLPOY5jL jkm5J5wa6Ep46nufHat92yiMxXIQrf4U9LkXxzTi5ThoSmt+Af2qXcBjqTTVqd2D 1dGxj+UG8iu4DCCbQC6EA7J5EMvxfJM0+9lk1ULUgxUs3X69co6nlI6XH1fwEMqk KpIqd35+Y/IYgogt9ioXI0dtXyL7dbaTVt6NZhc9SaPGPX+C2V0+l4bqToFdNaPH 0KATjjyQaJRE4ZFIjr6GliYOtKWDLi/HPEyoBivniUn7cF5vjSvti+cSQwNDSPpa 6jOd5Y923Iq9ZqDAPM3+mvTH8nNaaf2T2fmbPNrc5pdWbha9bGwOU71zvKHNFGm/ 66ZsnwhKSk+uwglTMZHPKSkJJXUYAHESw3slQtEWHZVlliArc55+pBHwE00bvRt7 KHUUqkqXBUPzbp/kdZGylMAdH9+8j9TE5QJ2RaoryFm/eCfexmI= =6xnj -----END PGP SIGNATURE----- Merge tag 'bcachefs-2024-03-13' of https://evilpiepirate.org/git/bcachefs Pull bcachefs updates from Kent Overstreet: - Subvolume children btree; this is needed for providing a userspace interface for walking subvolumes, which will come later - Lots of improvements to directory structure checking - Improved journal pipelining, significantly improving performance on high iodepth write workloads - Discard path improvements: the discard path is more efficient, and no longer flushes the journal unnecessarily - Buffered write path can now avoid taking the inode lock - new mm helper: memalloc_flags_{save\|restore} - mempool now does kvmalloc mempools * tag 'bcachefs-2024-03-13' of https://evilpiepirate.org/git/bcachefs: (128 commits) bcachefs: time_stats: shrink time_stat_buffer for better alignment bcachefs: time_stats: split stats-with-quantiles into a separate structure bcachefs: mean_and_variance: put struct mean_and_variance_weighted on a diet bcachefs: time_stats: add larger units bcachefs: pull out time_stats.[ch] bcachefs: reconstruct_alloc cleanup bcachefs: fix bch_folio_sector padding bcachefs: Fix btree key cache coherency during replay bcachefs: Always flush write buffer in delete_dead_inodes() bcachefs: Fix order of gc_done passes bcachefs: fix deletion of indirect extents in btree_gc bcachefs: Prefer struct_size over open coded arithmetic bcachefs: Kill unused flags argument to btree_split() bcachefs: Check for writing superblocks with nonsense member seq fields bcachefs: fix bch2_journal_buf_to_text() lib/generic-radix-tree.c: Make nodes more reasonably sized bcachefs: copy_(to\|from)_user_errcode() bcachefs: Split out bkey_types.h bcachefs: fix lost journal buf wakeup due to improved pipelining bcachefs: intercept mountoption value for bool type ...	2024-03-15 09:00:09 -07:00
Darrick J. Wong	be28368b2c	bcachefs: time_stats: shrink time_stat_buffer for better alignment Shrink this percpu object by one array element so that the object size becomes exactly 512 bytes. This will lead to more efficient memory use, hopefully. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:38:03 -04:00
Darrick J. Wong	273960b8f3	bcachefs: time_stats: split stats-with-quantiles into a separate structure Currently, struct time_stats has the optional ability to quantize the information that it collects. This is /probably/ useful for callers who want to see quantized information, but it more than doubles the size of the structure from 224 bytes to 464. For users who don't care about that (e.g. upcoming xfs patches) and want to avoid wasting 240 bytes per counter, split the two into separate pieces. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:38:01 -04:00
Darrick J. Wong	4b4f0876ab	bcachefs: mean_and_variance: put struct mean_and_variance_weighted on a diet The only caller of this code (time_stats) always knows the weights and whether or not any information has been collected. Pass this information into the mean and variance code so that it doesn't have to store that information. This reduces the structure size from 24 to 16 bytes, which shrinks each time_stats counter to 192 bytes from 208. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:37:58 -04:00
Darrick J. Wong	cdbfa228a5	bcachefs: time_stats: add larger units Filesystems can stay mounted for a very long time, so add some larger units. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:37:54 -04:00
Kent Overstreet	f1ca1abfb0	bcachefs: pull out time_stats.[ch] prep work for lifting out of fs/bcachefs/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:30:35 -04:00
Kent Overstreet	cdce109431	bcachefs: reconstruct_alloc cleanup Now that we've got the errors_silent mechanism, we don't have to check if the reconstruct_alloc option is set all over the place. Also - users no longer have to explicitly select fsck and fix_errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:26 -04:00
Kent Overstreet	3bbed37214	bcachefs: fix bch_folio_sector padding Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:26 -04:00
Kent Overstreet	b3f8e71117	bcachefs: Fix btree key cache coherency during replay Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:26 -04:00
Kent Overstreet	5d04409a62	bcachefs: Always flush write buffer in delete_dead_inodes() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:26 -04:00
Kent Overstreet	b6fc661f09	bcachefs: Fix order of gc_done passes gc_stripes_done() and gc_reflink_done() may do alloc btree updates (i.e. when deleting an indirect extent) - we need bucket gens to be fixed by then. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:26 -04:00
Kent Overstreet	06ebc48306	bcachefs: fix deletion of indirect extents in btree_gc we need to run the normal extent update path on deletion - bch2_bkey_make_mut() is incorrect when key type is changing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:26 -04:00
Erick Archer	3e48999816	bcachefs: Prefer struct_size over open coded arithmetic This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1][2]. As the "op" variable is a pointer to "struct promote_op" and this structure ends in a flexible array: struct promote_op { [...] struct bio_vec bi_inline_vecs[]; }; and the "t" variable is a pointer to "struct journal_seq_blacklist_table" and this structure also ends in a flexible array: struct journal_seq_blacklist_table { [...] struct journal_seq_blacklist_table_entry { u64 start; u64 end; bool dirty; } entries[]; }; the preferred way in the kernel is to use the struct_size() helper to do the arithmetic instead of the argument "size + size * count" in the kzalloc() functions. This way, the code is more readable and safer. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1] Link: https://github.com/KSPP/linux/issues/160 [2] Signed-off-by: Erick Archer <erick.archer@gmx.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:26 -04:00
Kent Overstreet	1fdb9685ed	bcachefs: Kill unused flags argument to btree_split() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:26 -04:00
Kent Overstreet	c42006458b	bcachefs: Check for writing superblocks with nonsense member seq fields We're seeing some unmountable filesystems due to split brain detection going awry; it seems we somehow wrote out superblocks where we updated the superblock seq without updating any member seq fields. A given device's superblock should always have the main seq equal to it's member seq field, so this is easy to check for. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:26 -04:00
Kent Overstreet	5e105fb806	bcachefs: fix bch2_journal_buf_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:26 -04:00
Kent Overstreet	d64547999c	bcachefs: copy_(to\|from)_user_errcode() we've got some helpers that return errors sanely, move them to a more common location for use in fs-ioctl.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:26 -04:00
Kent Overstreet	ba81523eaa	bcachefs: Split out bkey_types.h We're going to need bkey_types.h in bcachefs_ioctl.h in a future patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:26 -04:00
Brian Foster	ada02c207c	bcachefs: fix lost journal buf wakeup due to improved pipelining The journal_write_done() handler was reworked into a loop in commit 746a33c96b7a ("bcachefs: better journal pipelining"). As part of this, the journal buffer wake was factored into a post-loop branch that executes if at least one journal buffer has completed. The journal buffer processing loop iterates on the journal buffer pointer, however. This means that w refers to the last buffer processed by the loop, which may or may not be done. This also means that if multiple buffers are processed by the loop, only the last is awoken. This lost wakeup behavior has lead to stalling problems in various CI and fstests, such as generic/703. Lift the wake into the loop so each done buffer sees a wake call as it is processed. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:26 -04:00
Hongbo Li	2a68d611a1	bcachefs: intercept mountoption value for bool type For mount option with bool type, the value must be 0 or 1 (See bch2_opt_parse). But this seems does not well intercepted cause for other value(like 2...), it returns the unexpect return code with error message printed. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:26 -04:00
Hongbo Li	7e23c1746b	bcachefs: avoid returning private error code in bch2_xattr_bcachefs_set Avoid the private error code return to caller. The error code should be transformed into genernal error code. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:26 -04:00
Kent Overstreet	7e64c86cdc	bcachefs: Buffered write path now can avoid the inode lock Non append, non extending buffered writes can now avoid taking the inode lock. To ensure atomicity of writes w.r.t. other writes, we lock every folio that we'll be writing to, and if this fails we fall back to taking the inode lock. Extensive comments are provided as to corner cases. Link: https://lore.kernel.org/linux-fsdevel/Zdkxfspq3urnrM6I@bombadil.infradead.org/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:26 -04:00
Kent Overstreet	7efa287526	bcachefs: Fix bch2_journal_noflush_seq() Improved journal pipelining broke journal_noflush_seq(); it implicitly assumed only the oldest outstanding journal buf could be in flight, but that's no longer true. Make this more straightforward by just setting buf->must_flush whenever we know a journal buf is going to be flush. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Hongbo Li	79162e829b	bcachefs: fix the error code when mounting with incorrect options. When mount with incorrect options such as: "mount -t bcachefs -o errors=back /dev/loop1 /mnt/bcachefs/". It rebacks the error "mount: /mnt/bcachefs: permission denied." cause bch2_parse_mount_opts returns -1 and bch2_mount throws it up. This is unreasonable. The real error message should be like this: "mount: /mnt/bcachefs: wrong fs type, bad option, bad superblock on /dev/loop1, missing codepage or helper program, or other error." Adding three private error codes for mounting error. Here are: - BCH_ERR_mount_option as the parent class for option error. - BCH_ERR_option_name represents the invalid option name. - BCH_ERR_option_value represents the invalid option value. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	2cce3752ce	bcachefs: split out ignore_blacklisted, ignore_not_dirty prep work for replaying the journal backwards Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	69426613cd	bcachefs: improve move_gap() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	95ffc7fb8c	bcachefs: journal_keys now uses darray helpers nice bit of code cleanup Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	894d062254	bcachefs: Rename journal_keys.d -> journal_keys.data This will let us use some darray helpers in the next patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	0b5961b0d8	bcachefs: jset_entry for loops declare loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	eb386617be	bcachefs: Errcode tracepoint, documentation Add a tracepoint for downcasting private errors to standard errors, so they can be recovered even when not logged; also, add some documentation. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Colin Ian King	150194cdcb	bcachefs: remove redundant assignment to variable ret Variable ret is being assigned a value that is never read, it is being re-assigned a couple of statements later on. The assignment is redundant and can be removed. Cleans up clang scan build warning: fs/bcachefs/super-io.c:806:2: warning: Value stored to 'ret' is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Calvin Owens	c7cad231e8	bcachefs: Silence gcc warnings about arm arch ABI drift 32-bit arm builds emit a lot of spam like this: fs/bcachefs/backpointers.c: In function ‘extent_matches_bp’: fs/bcachefs/backpointers.c:15:13: note: parameter passing for argument of type ‘struct bch_backpointer’ changed in GCC 9.1 Apply the change from commit `ebcc5928c5` ("arm64: Silence gcc warnings about arch ABI drift") to fs/bcachefs/ to silence them. Signed-off-by: Calvin Owens <jcalvinowens@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	90aa35c4c9	bcachefs: Add journal.blocked to journal_debug_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	d9290c9931	bcachefs: Fix journal_buf bitfield accesses All jounal_buf bitfield updates must happen under the journal lock - perhaps we should just switch these to atomic bit flags. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	a393f33123	bcachefs: Split out discard fastpath Buckets usually can't be discarded until the transaction that made them empty has been committed in the journal. Tracing has indicated that we're queuing the discard worker excessively, only for it to skip over many buckets that are still waiting on a journal commit, discarding only one or two buckets per iteration. We want to switch to only queuing the discard worker after a journal flush write, but there's an important optimization we need to preserve: if a bucket becomes empty and it was never committed in the journal while it was in use, we want to discard it and reuse it right away - since overwriting it before the previous writes are flushed from the device cache eans those writes only cost bus bandwidth. So, this patch implements a fast path for buckets that can be discarded right away. We need new locking between the two discard workers; the new list of buckets being discarded provides that locking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	06d493fee4	bcachefs: improve bch2_journal_buf_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	29e11f9699	bcachefs: Drop redundant btree_path_downgrade()s If a path doesn't have any active references, we shouldn't downgrade it; it'll either be reused, possibly with intent refs again, or dropped at bch2_trans_begin() time. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Daniel Hill	ba78af9e56	bcachefs: rebalance_status now shows correct units Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	3235e04afe	bcachefs: more informative write path error message Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	74406f66ad	bcachefs: check_path() now only needs to walk up to subvolume root Now that checking subvolume structure is a separate pass, the main check_directory_connectivity() pass only needs to walk up to a given inode's subvolume root. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00

1 2 3 4 5 ...

3453 Commits