linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-13 22:53:20 +00:00

Filipe Manana 79bd37120b btrfs: rework chunk allocation to avoid exhaustion of the system chunk array	2021-07-07 17:42:41 +02:00

Commit `eafa4fd0ad` ("btrfs: fix exhaustion of the system chunk array due to concurrent allocations") fixed a problem that resulted in exhausting the system chunk array in the superblock when many tasks allocate chunks in parallel. Basically, too many tasks enter the first phase of chunk allocation before previous tasks have finished their second phase, resulting in too many system chunks being allocated. That was originally observed when running the fallocate tests of stress-ng on a PowerPC machine, using a node size of 64K.

However, that commit also introduced a deadlock. A task in phase 1 of chunk allocation waited for another task that had allocated a system chunk to finish its phase 2, but that other task was waiting on an extent buffer lock held by the first task, so neither task could make progress.

That change was later reverted by a patch with the subject "btrfs: fix deadlock with concurrent chunk allocations involving system chunks", since there is no simple and short solution for the deadlock and it is relatively easy to trigger on zoned filesystems, while the system chunk array exhaustion is not so common.

This change reworks chunk allocation to avoid exhausting the system chunk array. It does so by making the first phase of chunk allocation perform the updates of the device items in the chunk btree and the insertion of the new chunk item into the chunk btree. This is done under the protection of the chunk mutex (fs_info->chunk_mutex), in the same critical section that checks for available system space, allocates a new system chunk if needed and reserves system chunk space. This way, system chunk space no longer stays reserved until the second phase completes.

The same logic is applied to chunk removal as well, since it keeps system space reserved long after it is done updating the chunk btree.

For direct allocation of system chunks, the previous behaviour remains, because otherwise we would deadlock on extent buffers of the chunk btree. Changes to the chunk btree are by and large done by chunk allocation and chunk removal, which first reserve system chunk space and only later change the chunk btree. The other remaining cases are uncommon: adding a device, removing a device and resizing a device. None of these pre-reserve system space; they modify the chunk btree right away, so they do not hold reserved space for a long period like chunk allocation and chunk removal do.

The diff of this change is huge, but more than half of it is just comments describing how chunk allocation and removal work, covering both the new behavior and the parts of the old behavior that did not change.

CC: stable@vger.kernel.org # 5.12+
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Tested-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
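
The effect of the reordering can be illustrated with a toy model in C. This is not btrfs code: `struct sys_space`, the functions and the unit values (`SYS_ARRAY_SLOTS`, `SYS_CHUNK_SPACE`, `RESERVE_PER_ALLOC`, `USED_PER_ALLOC`) are hypothetical, and concurrency is simulated by call order instead of real threads and locks. The sketch counts how many system chunks are needed when all tasks reserve system space before any of them updates the chunk btree (the old ordering), versus when each task reserves space and updates the chunk btree within a single critical section (the reworked ordering).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy parameters, all hypothetical: not the real btrfs sizes. */
#define SYS_ARRAY_SLOTS   4  /* system chunks the superblock array can hold */
#define SYS_CHUNK_SPACE   8  /* units of chunk btree space per system chunk */
#define RESERVE_PER_ALLOC 3  /* worst-case units reserved per chunk allocation */
#define USED_PER_ALLOC    1  /* units actually consumed by one chunk btree update */

struct sys_space {
	int chunks;   /* system chunks allocated (array slots in use) */
	int used;     /* units consumed by finished chunk btree updates */
	int reserved; /* units reserved but not yet released */
};

/* Check free system space, allocate system chunks until the reservation
 * fits, then reserve. In the kernel this runs under fs_info->chunk_mutex;
 * the toy model is single-threaded, so no lock is taken here. */
void reserve_sys_space(struct sys_space *s)
{
	while (s->chunks * SYS_CHUNK_SPACE - s->used - s->reserved < RESERVE_PER_ALLOC)
		s->chunks++;
	s->reserved += RESERVE_PER_ALLOC;
}

/* Update the chunk btree: record the units actually used and drop
 * the whole reservation that covered this update. */
void update_chunk_btree(struct sys_space *s)
{
	assert(s->reserved >= RESERVE_PER_ALLOC);
	s->used += USED_PER_ALLOC;
	s->reserved -= RESERVE_PER_ALLOC;
}

/* Old ordering: every task finishes phase 1 (reserve) before any task
 * runs phase 2 (chunk btree update), so reservations pile up. */
int old_scheme(int tasks)
{
	struct sys_space s = {0, 0, 0};

	for (int i = 0; i < tasks; i++)
		reserve_sys_space(&s);
	for (int i = 0; i < tasks; i++)
		update_chunk_btree(&s);
	return s.chunks;
}

/* Reworked ordering: each task reserves and updates the chunk btree in
 * one critical section, so at most one reservation is outstanding. */
int new_scheme(int tasks)
{
	struct sys_space s = {0, 0, 0};

	for (int i = 0; i < tasks; i++) {
		reserve_sys_space(&s);
		update_chunk_btree(&s);
	}
	return s.chunks;
}

/* True if the number of system chunks no longer fits in the toy array. */
bool exhausts_array(int chunks)
{
	return chunks > SYS_ARRAY_SLOTS;
}
```

With these toy values, `old_scheme(16)` returns 6 and `new_scheme(16)` returns 3, so only the old ordering overflows the hypothetical 4-slot array. The system chunk count is driven by how much space is reserved at the same time rather than by how much is actually consumed, which is why moving the chunk btree updates into the first phase helps.
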
tests	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
acl.c	fs: make helpers idmap mount aware	2021-01-24 14:27:20 +01:00
async-thread.c	Btrfs: fix crash during unmount due to race with delayed inode workers	2020-03-23 17:01:51 +01:00
async-thread.h	Btrfs: fix crash during unmount due to race with delayed inode workers	2020-03-23 17:01:51 +01:00
backref.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
backref.h	btrfs: add asserts for deleting backref cache nodes	2021-02-08 22:58:56 +01:00
block-group.c	btrfs: rework chunk allocation to avoid exhaustion of the system chunk array	2021-07-07 17:42:41 +02:00
block-group.h	btrfs: rework chunk allocation to avoid exhaustion of the system chunk array	2021-07-07 17:42:41 +02:00
block-rsv.c	btrfs: introduce mount option rescue=ignorebadroots	2020-12-08 15:53:41 +01:00
block-rsv.h	btrfs: Remove __ prefix from btrfs_block_rsv_release	2020-03-23 17:01:55 +01:00
btrfs_inode.h	btrfs: remove stale comment and logic from btrfs_inode_in_log()	2021-04-19 17:25:16 +02:00
check-integrity.c	btrfs: integrity-checker: convert block context kmap's to kmap_local_page	2021-04-19 17:25:16 +02:00
check-integrity.h	btrfs: remove btrfsic_submit_bh()	2020-03-23 17:01:39 +01:00
compression.c	btrfs: remove a stale comment for btrfs_decompress_bio()	2021-06-22 14:11:57 +02:00
compression.h	btrfs: optimize variables size in btrfs_submit_compressed_write	2021-06-21 15:19:07 +02:00
ctree.c	btrfs: rework chunk allocation to avoid exhaustion of the system chunk array	2021-07-07 17:42:41 +02:00
ctree.h	btrfs: remove unused btrfs_fs_info::total_pinned	2021-06-22 19:58:26 +02:00
delalloc-space.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
delalloc-space.h	btrfs: make btrfs_delalloc_reserve_space take btrfs_inode	2020-07-27 12:55:36 +02:00
delayed-inode.c	btrfs: remove total_data_size variable in btrfs_batch_insert_items()	2021-06-21 15:19:11 +02:00
delayed-inode.h	btrfs: make btrfs_delayed_update_inode take btrfs_inode	2020-12-08 15:54:10 +01:00
delayed-ref.c	btrfs: rip out btrfs_space_info::total_bytes_pinned	2021-06-22 14:55:25 +02:00
delayed-ref.h	btrfs: only let one thread pre-flush delayed refs in commit	2021-02-08 22:58:56 +01:00
dev-replace.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
dev-replace.h	btrfs: zoned: mark block groups to copy for device-replace	2021-02-09 02:46:07 +01:00
dir-item.c	btrfs: locking: rip out path->leave_spinning	2020-12-08 15:54:02 +01:00
discard.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
discard.h	btrfs: cleanup btrfs_discard_update_discardable usage	2020-12-08 15:54:02 +01:00
disk-io.c	btrfs: rip out btrfs_space_info::total_bytes_pinned	2021-06-22 14:55:25 +02:00
disk-io.h	btrfs: split alloc_log_tree()	2021-02-09 02:46:07 +01:00
export.c	btrfs: locking: rip out path->leave_spinning	2020-12-08 15:54:02 +01:00
export.h	btrfs: export helpers for subvolume name/id resolution	2020-03-23 17:01:42 +01:00
extent_io.c	btrfs: subpage: fix a rare race between metadata endio and eb freeing	2021-06-21 15:19:10 +02:00
extent_io.h	btrfs: rename PagePrivate2 to PageOrdered inside btrfs	2021-06-21 15:19:09 +02:00
extent_map.c	btrfs: fix parameter description of btrfs_add_extent_mapping	2021-02-08 22:58:53 +01:00
extent_map.h
extent-io-tree.h	btrfs: use fixed width int type for extent_state::state	2020-12-08 15:54:13 +01:00
extent-tree.c	btrfs: rip out btrfs_space_info::total_bytes_pinned	2021-06-22 14:55:25 +02:00
file-item.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
file.c	btrfs: eliminate insert label in add_falloc_range	2021-06-21 15:19:10 +02:00
free-space-cache.c	btrfs: don't set the full sync flag when truncation does not touch extents	2021-06-21 15:19:05 +02:00
free-space-cache.h	btrfs: zoned: track unusable bytes for zones	2021-02-09 02:46:03 +01:00
free-space-tree.c	btrfs: fix possible free space tree corruption with online conversion	2021-01-25 18:44:37 +01:00
free-space-tree.h
inode-item.c	btrfs: locking: rip out path->leave_spinning	2020-12-08 15:54:02 +01:00
inode.c	btrfs: compression: don't try to compress if we don't have enough pages	2021-06-22 14:11:57 +02:00
ioctl.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
Kconfig	btrfs: disable build on platforms having page size 256K	2021-06-22 14:11:57 +02:00
locking.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
locking.h	btrfs: remove the recurse parameter from __btrfs_tree_read_lock	2020-12-08 15:54:09 +01:00
lzo.c	btrfs: convert kmap to kmap_local_page, simple cases	2021-04-19 17:25:16 +02:00
Makefile	btrfs: move the tree mod log code into its own file	2021-04-19 17:25:16 +02:00
misc.h	btrfs: rename tree_entry to rb_simple_node and export it	2020-05-25 11:25:19 +02:00
ordered-data.c	btrfs: make page Ordered bit to be subpage compatible	2021-06-21 15:19:10 +02:00
ordered-data.h	btrfs: introduce btrfs_lookup_first_ordered_range()	2021-06-21 15:19:08 +02:00
orphan.c
print-tree.c	btrfs: print the actual offset in btrfs_root_name	2021-01-07 17:25:05 +01:00
print-tree.h	btrfs: print the actual offset in btrfs_root_name	2021-01-07 17:25:05 +01:00
props.c	btrfs: props: change how empty value is interpreted	2021-06-22 14:11:58 +02:00
props.h
qgroup.c	btrfs: send: fix crash when memory allocations trigger reclaim	2021-06-22 14:11:58 +02:00
qgroup.h	btrfs: export and rename qgroup_reserve_meta	2021-03-02 16:58:30 +01:00
raid56.c	CFI on arm64 series for v5.13-rc1	2021-04-27 10:16:46 -07:00
raid56.h
rcu-string.h	btrfs: rcu-string: Replace zero-length array with flexible-array member	2020-03-23 17:01:53 +01:00
reada.c	btrfs: subpage: make readahead work properly	2021-03-16 11:06:21 +01:00
ref-verify.c	btrfs: ref-verify: use 'inline void' keyword ordering	2021-03-02 16:55:40 +01:00
ref-verify.h
reflink.c	btrfs: reflink: make copy_inline_to_page() to be subpage compatible	2021-06-21 15:19:10 +02:00
reflink.h	Btrfs: move all reflink implementation code into its own file	2020-03-23 17:01:54 +01:00
relocation.c	btrfs: ensure relocation never runs while we have send operations running	2021-06-22 14:11:58 +02:00
root-tree.c	btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations	2020-10-07 12:12:13 +02:00
scrub.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
send.c	btrfs: send: fix crash when memory allocations trigger reclaim	2021-06-22 14:11:58 +02:00
send.h	btrfs: send: avoid copying file data	2020-10-07 12:13:17 +02:00
space-info.c	btrfs: rip out btrfs_space_info::total_bytes_pinned	2021-06-22 14:55:25 +02:00
space-info.h	btrfs: rip out btrfs_space_info::total_bytes_pinned	2021-06-22 14:55:25 +02:00
struct-funcs.c	btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors	2020-12-09 19:16:10 +01:00
subpage.c	btrfs: subpage: fix a rare race between metadata endio and eb freeing	2021-06-21 15:19:10 +02:00
subpage.h	btrfs: subpage: fix a rare race between metadata endio and eb freeing	2021-06-21 15:19:10 +02:00
super.c	btrfs: shorten integrity checker extent data mount option	2021-06-22 14:11:58 +02:00
sysfs.c	btrfs: rip out btrfs_space_info::total_bytes_pinned	2021-06-22 14:55:25 +02:00
sysfs.h	btrfs: split and refactor btrfs_sysfs_remove_devices_dir	2020-10-07 12:12:21 +02:00
transaction.c	btrfs: rework chunk allocation to avoid exhaustion of the system chunk array	2021-07-07 17:42:41 +02:00
transaction.h	btrfs: rework chunk allocation to avoid exhaustion of the system chunk array	2021-07-07 17:42:41 +02:00
tree-checker.c	btrfs: tree-checker: check for BTRFS_BLOCK_FLAG_FULL_BACKREF being set improperly	2021-04-19 17:25:21 +02:00
tree-checker.h
tree-defrag.c	btrfs: locking: remove all the blocking helpers	2020-12-08 15:54:01 +01:00
tree-log.c	btrfs: avoid unnecessary logging of xattrs during fast fsyncs	2021-06-21 15:19:07 +02:00
tree-log.h	btrfs: make fast fsyncs wait only for writeback	2020-10-07 12:06:56 +02:00
tree-mod-log.c	btrfs: fix race when picking most recent mod log operation for an old root	2021-04-20 19:27:17 +02:00
tree-mod-log.h	btrfs: add and use helper to get lowest sequence number for the tree mod log	2021-04-19 17:25:17 +02:00
ulist.c
ulist.h
uuid-tree.c	btrfs: remove unnecessary casts in printk	2020-12-08 15:53:52 +01:00
volumes.c	btrfs: rework chunk allocation to avoid exhaustion of the system chunk array	2021-07-07 17:42:41 +02:00
volumes.h	btrfs: rework chunk allocation to avoid exhaustion of the system chunk array	2021-07-07 17:42:41 +02:00
xattr.c	for-5.12-rc1-tag	2021-03-05 12:21:14 -08:00
xattr.h
zlib.c	btrfs: use memzero_page() instead of open coded kmap pattern	2021-05-05 11:27:27 -07:00
zoned.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
zoned.h	btrfs: zoned: factor out zoned device lookup	2021-06-21 15:19:05 +02:00
zstd.c	btrfs: use memzero_page() instead of open coded kmap pattern	2021-05-05 11:27:27 -07:00