linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-24 11:51:27 +00:00

Filipe Manana 35b22c19af btrfs: send: fix crash when memory allocations trigger reclaim

When doing a send we don't expect the task to ever start a transaction after the initial check that verifies whether commit roots match the regular roots. This is because after that check we set current->journal_info to a stub (special value) that signals we are in send context. With the stub set, we take a read lock on an extent buffer when reading it from disk and verifying it is valid (its generation matches the generation stored in the parent). This stub was introduced in 2014 by commit `a26e8c9f75` ("Btrfs: don't clear uptodate if the eb is under IO") in order to fix a concurrency issue between send and balance.

However, there is one particular exception where we end up needing to start a transaction. When this happens it results in a crash with a stack trace like the following:

  [60015.902283] kernel: WARNING: CPU: 3 PID: 58159 at arch/x86/include/asm/kfence.h:44 kfence_protect_page+0x21/0x80
  [60015.902292] kernel: Modules linked in: uinput rfcomm snd_seq_dummy (...)
  [60015.902384] kernel: CPU: 3 PID: 58159 Comm: btrfs Not tainted 5.12.9-300.fc34.x86_64 #1
  [60015.902387] kernel: Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./F2A88XN-WIFI, BIOS F6 12/24/2015
  [60015.902389] kernel: RIP: 0010:kfence_protect_page+0x21/0x80
  [60015.902393] kernel: Code: ff 0f 1f 84 00 00 00 00 00 55 48 89 fd (...)
  [60015.902396] kernel: RSP: 0018:ffff9fb583453220 EFLAGS: 00010246
  [60015.902399] kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9fb583453224
  [60015.902401] kernel: RDX: ffff9fb583453224 RSI: 0000000000000000 RDI: 0000000000000000
  [60015.902402] kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
  [60015.902404] kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
  [60015.902406] kernel: R13: ffff9fb583453348 R14: 0000000000000000 R15: 0000000000000001
  [60015.902408] kernel: FS: 00007f158e62d8c0(0000) GS:ffff93bd37580000(0000) knlGS:0000000000000000
  [60015.902410] kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [60015.902412] kernel: CR2: 0000000000000039 CR3: 00000001256d2000 CR4: 00000000000506e0
  [60015.902414] kernel: Call Trace:
  [60015.902419] kernel: kfence_unprotect+0x13/0x30
  [60015.902423] kernel: page_fault_oops+0x89/0x270
  [60015.902427] kernel: ? search_module_extables+0xf/0x40
  [60015.902431] kernel: ? search_bpf_extables+0x57/0x70
  [60015.902435] kernel: kernelmode_fixup_or_oops+0xd6/0xf0
  [60015.902437] kernel: __bad_area_nosemaphore+0x142/0x180
  [60015.902440] kernel: exc_page_fault+0x67/0x150
  [60015.902445] kernel: asm_exc_page_fault+0x1e/0x30
  [60015.902450] kernel: RIP: 0010:start_transaction+0x71/0x580
  [60015.902454] kernel: Code: d3 0f 84 92 00 00 00 80 e7 06 0f 85 63 (...)
  [60015.902456] kernel: RSP: 0018:ffff9fb5834533f8 EFLAGS: 00010246
  [60015.902458] kernel: RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000
  [60015.902460] kernel: RDX: 0000000000000801 RSI: 0000000000000000 RDI: 0000000000000039
  [60015.902462] kernel: RBP: ffff93bc0a7eb800 R08: 0000000000000001 R09: 0000000000000000
  [60015.902463] kernel: R10: 0000000000098a00 R11: 0000000000000001 R12: 0000000000000001
  [60015.902464] kernel: R13: 0000000000000000 R14: ffff93bc0c92b000 R15: ffff93bc0c92b000
  [60015.902468] kernel: btrfs_commit_inode_delayed_inode+0x5d/0x120
  [60015.902473] kernel: btrfs_evict_inode+0x2c5/0x3f0
  [60015.902476] kernel: evict+0xd1/0x180
  [60015.902480] kernel: inode_lru_isolate+0xe7/0x180
  [60015.902483] kernel: __list_lru_walk_one+0x77/0x150
  [60015.902487] kernel: ? iput+0x1a0/0x1a0
  [60015.902489] kernel: ? iput+0x1a0/0x1a0
  [60015.902491] kernel: list_lru_walk_one+0x47/0x70
  [60015.902495] kernel: prune_icache_sb+0x39/0x50
  [60015.902497] kernel: super_cache_scan+0x161/0x1f0
  [60015.902501] kernel: do_shrink_slab+0x142/0x240
  [60015.902505] kernel: shrink_slab+0x164/0x280
  [60015.902509] kernel: shrink_node+0x2c8/0x6e0
  [60015.902512] kernel: do_try_to_free_pages+0xcb/0x4b0
  [60015.902514] kernel: try_to_free_pages+0xda/0x190
  [60015.902516] kernel: __alloc_pages_slowpath.constprop.0+0x373/0xcc0
  [60015.902521] kernel: ? __memcg_kmem_charge_page+0xc2/0x1e0
  [60015.902525] kernel: __alloc_pages_nodemask+0x30a/0x340
  [60015.902528] kernel: pipe_write+0x30b/0x5c0
  [60015.902531] kernel: ? set_next_entity+0xad/0x1e0
  [60015.902534] kernel: ? switch_mm_irqs_off+0x58/0x440
  [60015.902538] kernel: __kernel_write+0x13a/0x2b0
  [60015.902541] kernel: kernel_write+0x73/0x150
  [60015.902543] kernel: send_cmd+0x7b/0xd0
  [60015.902545] kernel: send_extent_data+0x5a3/0x6b0
  [60015.902549] kernel: process_extent+0x19b/0xed0
  [60015.902551] kernel: btrfs_ioctl_send+0x1434/0x17e0
  [60015.902554] kernel: ? _btrfs_ioctl_send+0xe1/0x100
  [60015.902557] kernel: _btrfs_ioctl_send+0xbf/0x100
  [60015.902559] kernel: ? enqueue_entity+0x18c/0x7b0
  [60015.902562] kernel: btrfs_ioctl+0x185f/0x2f80
  [60015.902564] kernel: ? psi_task_change+0x84/0xc0
  [60015.902569] kernel: ? _flat_send_IPI_mask+0x21/0x40
  [60015.902572] kernel: ? check_preempt_curr+0x2f/0x70
  [60015.902576] kernel: ? selinux_file_ioctl+0x137/0x1e0
  [60015.902579] kernel: ? expand_files+0x1cb/0x1d0
  [60015.902582] kernel: ? __x64_sys_ioctl+0x82/0xb0
  [60015.902585] kernel: __x64_sys_ioctl+0x82/0xb0
  [60015.902588] kernel: do_syscall_64+0x33/0x40
  [60015.902591] kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
  [60015.902595] kernel: RIP: 0033:0x7f158e38f0ab
  [60015.902599] kernel: Code: ff ff ff 85 c0 79 9b (...)
  [60015.902602] kernel: RSP: 002b:00007ffcb2519bf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  [60015.902605] kernel: RAX: ffffffffffffffda RBX: 00007ffcb251ae00 RCX: 00007f158e38f0ab
  [60015.902607] kernel: RDX: 00007ffcb2519cf0 RSI: 0000000040489426 RDI: 0000000000000004
  [60015.902608] kernel: RBP: 0000000000000004 R08: 00007f158e297640 R09: 00007f158e297640
  [60015.902610] kernel: R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000000
  [60015.902612] kernel: R13: 0000000000000002 R14: 00007ffcb251aee0 R15: 0000558c1a83e2a0
  [60015.902615] kernel: ---[ end trace 7bbc33e23bb887ae ]---

This happens because writing to the pipe with kernel_write() does page allocations using GFP_HIGHUSER | __GFP_ACCOUNT as the gfp flags. These flags allow reclaim to happen if there is memory pressure. The allocation happens at fs/pipe.c:pipe_write(). If reclaim is triggered, it can evict inodes, and evicting an inode with a link count of 0 starts a transaction. The transaction start happens early during eviction, when btrfs_evict_inode() calls btrfs_commit_inode_delayed_inode().
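The path above is a re-entrancy problem. The following user-space C model makes it concrete. An allocation that may reclaim runs a shrinker. The shrinker evicts an unlinked inode, and eviction calls a start_transaction()-like function that reads the caller's task-local journal_info. Everything in it is hypothetical:

- the model_* names;
- the single global standing in for current->journal_info;
- the allocator that pretends memory is always under pressure;
- the assumption that the stub is a small non-NULL sentinel such as (void *)1.

It sketches the control flow only and is not btrfs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Assumption: the send stub is a small non-NULL sentinel such as (void *)1. */
#define MODEL_SEND_TRANS_STUB ((void *)1)

/* Hypothetical flag meaning "this allocation may enter reclaim". */
#define MODEL_GFP_RECLAIM 0x1u

struct model_trans_handle {
	int use_count;
};

/* Stands in for current->journal_info; the model has a single task. */
static void *model_journal_info;
static struct model_trans_handle model_fresh_handle;
static int model_evictions;
static bool model_hit_send_stub;

/* Stands in for start_transaction(): reuse a handle already in journal_info. */
static struct model_trans_handle *model_start_transaction(void)
{
	if (model_journal_info == MODEL_SEND_TRANS_STUB) {
		/*
		 * This is where the kernel has ASSERT(current->journal_info !=
		 * BTRFS_SEND_TRANS_STUB). Without this check, the stub would be
		 * cast to a handle and dereferenced in the branch below.
		 */
		model_hit_send_stub = true;
		return NULL;
	}
	if (model_journal_info) {
		struct model_trans_handle *h = model_journal_info;

		assert(h->use_count > 0);
		h->use_count++;
		return h;
	}
	model_fresh_handle.use_count = 1;
	return &model_fresh_handle;
}

/* Stands in for evicting an inode with a link count of 0: needs a transaction. */
static void model_evict_unlinked_inode(void)
{
	if (model_start_transaction())
		model_evictions++;
}

/*
 * Stands in for the page allocation in pipe_write(). The model pretends
 * memory is always tight, so a reclaim-allowed allocation always runs the
 * "shrinker", which here does the final iput of an unlinked inode.
 */
static void *model_alloc(size_t size, unsigned int flags)
{
	if (flags & MODEL_GFP_RECLAIM)
		model_evict_unlinked_inode();
	return malloc(size);
}

/* Stands in for send writing to the pipe while the stub is installed. */
static void model_send_write(void)
{
	model_journal_info = MODEL_SEND_TRANS_STUB;
	free(model_alloc(4096, MODEL_GFP_RECLAIM));
	model_journal_info = NULL;
}
```

Calling model_send_write() sets model_hit_send_stub and leaves model_evictions at 0. The same eviction outside send context gets a fresh handle. The kernel has no such graceful return: the stub is either caught by the ASSERT or dereferenced as a transaction handle.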
This happens if there is currently an open file descriptor for an inode with a link count of 0 and the reclaim task gets a reference on the inode before that descriptor is closed. In that case the reclaim task ends up doing the final iput that triggers the inode eviction.

When we have assertions enabled (CONFIG_BTRFS_ASSERT=y), this triggers the following assertion at transaction.c:start_transaction():

  /* Send isn't supposed to start transactions. */
  ASSERT(current->journal_info != BTRFS_SEND_TRANS_STUB);

When assertions are not enabled, it triggers a crash instead. After that assertion we cast current->journal_info into a transaction handle pointer and then dereference it:

  if (current->journal_info) {
          WARN_ON(type & TRANS_EXTWRITERS);
          h = current->journal_info;
          refcount_inc(&h->use_count);
          (...)

This obviously results in a crash due to an invalid memory access.

The same type of issue can happen during other memory allocations we do directly in the send code with kmalloc (and friends), as they use GFP_KERNEL and therefore may trigger reclaim too. This has been possible since 2016, after commit `e780b0d1c1` ("btrfs: send: use GFP_KERNEL everywhere").

The issue could be solved by setting up a NOFS context for the entire send operation, so that memory reclaim could not recurse into the filesystem when allocating memory or pages through kernel_write(). However, that is not very friendly, and we can in fact get rid of the send stub because:

1) The stub was introduced way back in 2014 by commit `a26e8c9f75` ("Btrfs: don't clear uptodate if the eb is under IO") to solve an issue exclusive to when send and balance are running in parallel. However, there were other problems between balance and send, and we no longer allow balance and send to run concurrently since commit `9e967495e0` ("Btrfs: prevent send failures and crashes due to concurrent relocation").
   More generically, the issues are between send and relocation, and that last commit only eliminated the possibility of having send and balance run concurrently. Shrinking a device can also trigger relocation, and on zoned filesystems relocation of partially used block groups is triggered automatically as well. The previous patch, with the subject:

     "btrfs: ensure relocation never runs while we have send operations running"

   addresses all the remaining cases that can trigger relocation.

2) We can actually allow starting and even committing transactions while in a send context if needed. Send does not hold any locks that would block the start or the commit of a transaction.

So get rid of all the logic added by commit `a26e8c9f75` ("Btrfs: don't clear uptodate if the eb is under IO"). We can now always call clear_extent_buffer_uptodate() at verify_parent_transid(). Send is the only case that uses commit roots without having a transaction open or holding the commit_root_sem.

Reported-by: Chris Murphy <lists@colorremedies.com>
Link: https://lore.kernel.org/linux-btrfs/CAJCQCtRQ57=qXo3kygwpwEBOU_CA_eKvdmjP52sU=eFvuVOEGw@mail.gmail.com/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

2021-06-22 14:11:58 +02:00
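Removing the stub depends on send and relocation never overlapping, whatever starts the relocation: balance, a device shrink, or zoned block group reclaim. The C sketch below models that kind of exclusion. It assumes a per-filesystem mutex, a count of running sends and a relocation-running flag. The struct, the function names and the -EAGAIN policy are hypothetical and are not taken from the patch named in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-filesystem state; all names are illustrative. */
struct model_fs_info {
	pthread_mutex_t lock;  /* protects the two fields below */
	int sends_in_progress; /* number of running send operations */
	bool reloc_running;    /* a relocation is currently running */
};

static void model_fs_info_init(struct model_fs_info *fs)
{
	pthread_mutex_init(&fs->lock, NULL);
	fs->sends_in_progress = 0;
	fs->reloc_running = false;
}

/* A send may start only if no relocation is running. */
static int model_send_begin(struct model_fs_info *fs)
{
	int ret = 0;

	pthread_mutex_lock(&fs->lock);
	if (fs->reloc_running)
		ret = -EAGAIN;
	else
		fs->sends_in_progress++;
	pthread_mutex_unlock(&fs->lock);
	return ret;
}

static void model_send_end(struct model_fs_info *fs)
{
	pthread_mutex_lock(&fs->lock);
	assert(fs->sends_in_progress > 0);
	fs->sends_in_progress--;
	pthread_mutex_unlock(&fs->lock);
}

/*
 * Every path that relocates (balance, device shrink, zoned reclaim) must go
 * through here, not only balance. The model allows one relocation at a time.
 */
static int model_reloc_begin(struct model_fs_info *fs)
{
	int ret = 0;

	pthread_mutex_lock(&fs->lock);
	if (fs->sends_in_progress > 0 || fs->reloc_running)
		ret = -EAGAIN;
	else
		fs->reloc_running = true;
	pthread_mutex_unlock(&fs->lock);
	return ret;
}

static void model_reloc_end(struct model_fs_info *fs)
{
	pthread_mutex_lock(&fs->lock);
	fs->reloc_running = false;
	pthread_mutex_unlock(&fs->lock);
}
```

In this model, whichever operation arrives second backs off with an error instead of waiting. Both checks happen under the same lock, so a send and a relocation can never both pass their check. A real implementation could instead choose to wait for the other side to finish.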
tests	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
acl.c	fs: make helpers idmap mount aware	2021-01-24 14:27:20 +01:00
async-thread.c
async-thread.h
backref.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
backref.h	btrfs: add asserts for deleting backref cache nodes	2021-02-08 22:58:56 +01:00
block-group.c	btrfs: ensure relocation never runs while we have send operations running	2021-06-22 14:11:58 +02:00
block-group.h	btrfs: zoned: automatically reclaim zones	2021-04-20 20:46:31 +02:00
block-rsv.c	btrfs: introduce mount option rescue=ignorebadroots	2020-12-08 15:53:41 +01:00
block-rsv.h
btrfs_inode.h	btrfs: remove stale comment and logic from btrfs_inode_in_log()	2021-04-19 17:25:16 +02:00
check-integrity.c	btrfs: integrity-checker: convert block context kmap's to kmap_local_page	2021-04-19 17:25:16 +02:00
check-integrity.h
compression.c	btrfs: remove a stale comment for btrfs_decompress_bio()	2021-06-22 14:11:57 +02:00
compression.h	btrfs: optimize variables size in btrfs_submit_compressed_write	2021-06-21 15:19:07 +02:00
ctree.c	btrfs: always abort the transaction if we abort a trans handle	2021-06-21 15:19:06 +02:00
ctree.h	btrfs: ensure relocation never runs while we have send operations running	2021-06-22 14:11:58 +02:00
delalloc-space.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
delalloc-space.h
delayed-inode.c	btrfs: remove total_data_size variable in btrfs_batch_insert_items()	2021-06-21 15:19:11 +02:00
delayed-inode.h	btrfs: make btrfs_delayed_update_inode take btrfs_inode	2020-12-08 15:54:10 +01:00
delayed-ref.c	btrfs: update debug message when checking seq number of a delayed ref	2021-04-19 17:25:17 +02:00
delayed-ref.h	btrfs: only let one thread pre-flush delayed refs in commit	2021-02-08 22:58:56 +01:00
dev-replace.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
dev-replace.h	btrfs: zoned: mark block groups to copy for device-replace	2021-02-09 02:46:07 +01:00
dir-item.c	btrfs: locking: rip out path->leave_spinning	2020-12-08 15:54:02 +01:00
discard.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
discard.h	btrfs: cleanup btrfs_discard_update_discardable usage	2020-12-08 15:54:02 +01:00
disk-io.c	btrfs: send: fix crash when memory allocations trigger reclaim	2021-06-22 14:11:58 +02:00
disk-io.h	btrfs: split alloc_log_tree()	2021-02-09 02:46:07 +01:00
export.c	btrfs: locking: rip out path->leave_spinning	2020-12-08 15:54:02 +01:00
export.h
extent_io.c	btrfs: subpage: fix a rare race between metadata endio and eb freeing	2021-06-21 15:19:10 +02:00
extent_io.h	btrfs: rename PagePrivate2 to PageOrdered inside btrfs	2021-06-21 15:19:09 +02:00
extent_map.c	btrfs: fix parameter description of btrfs_add_extent_mapping	2021-02-08 22:58:53 +01:00
extent_map.h
extent-io-tree.h	btrfs: use fixed width int type for extent_state::state	2020-12-08 15:54:13 +01:00
extent-tree.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
file-item.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
file.c	btrfs: eliminate insert label in add_falloc_range	2021-06-21 15:19:10 +02:00
free-space-cache.c	btrfs: don't set the full sync flag when truncation does not touch extents	2021-06-21 15:19:05 +02:00
free-space-cache.h	btrfs: zoned: track unusable bytes for zones	2021-02-09 02:46:03 +01:00
free-space-tree.c	btrfs: fix possible free space tree corruption with online conversion	2021-01-25 18:44:37 +01:00
free-space-tree.h
inode-item.c	btrfs: locking: rip out path->leave_spinning	2020-12-08 15:54:02 +01:00
inode.c	btrfs: compression: don't try to compress if we don't have enough pages	2021-06-22 14:11:57 +02:00
ioctl.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
Kconfig	btrfs: disable build on platforms having page size 256K	2021-06-22 14:11:57 +02:00
locking.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
locking.h	btrfs: remove the recurse parameter from __btrfs_tree_read_lock	2020-12-08 15:54:09 +01:00
lzo.c	btrfs: convert kmap to kmap_local_page, simple cases	2021-04-19 17:25:16 +02:00
Makefile	btrfs: move the tree mod log code into its own file	2021-04-19 17:25:16 +02:00
misc.h
ordered-data.c	btrfs: make page Ordered bit to be subpage compatible	2021-06-21 15:19:10 +02:00
ordered-data.h	btrfs: introduce btrfs_lookup_first_ordered_range()	2021-06-21 15:19:08 +02:00
orphan.c
print-tree.c	btrfs: print the actual offset in btrfs_root_name	2021-01-07 17:25:05 +01:00
print-tree.h	btrfs: print the actual offset in btrfs_root_name	2021-01-07 17:25:05 +01:00
props.c	btrfs: props: change how empty value is interpreted	2021-06-22 14:11:58 +02:00
props.h
qgroup.c	btrfs: send: fix crash when memory allocations trigger reclaim	2021-06-22 14:11:58 +02:00
qgroup.h	btrfs: export and rename qgroup_reserve_meta	2021-03-02 16:58:30 +01:00
raid56.c	CFI on arm64 series for v5.13-rc1	2021-04-27 10:16:46 -07:00
raid56.h
rcu-string.h
reada.c	btrfs: subpage: make readahead work properly	2021-03-16 11:06:21 +01:00
ref-verify.c	btrfs: ref-verify: use 'inline void' keyword ordering	2021-03-02 16:55:40 +01:00
ref-verify.h
reflink.c	btrfs: reflink: make copy_inline_to_page() to be subpage compatible	2021-06-21 15:19:10 +02:00
reflink.h
relocation.c	btrfs: ensure relocation never runs while we have send operations running	2021-06-22 14:11:58 +02:00
root-tree.c	btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations	2020-10-07 12:12:13 +02:00
scrub.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
send.c	btrfs: send: fix crash when memory allocations trigger reclaim	2021-06-22 14:11:58 +02:00
send.h	btrfs: send: avoid copying file data	2020-10-07 12:13:17 +02:00
space-info.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
space-info.h	btrfs: zoned: track unusable bytes for zones	2021-02-09 02:46:03 +01:00
struct-funcs.c	btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors	2020-12-09 19:16:10 +01:00
subpage.c	btrfs: subpage: fix a rare race between metadata endio and eb freeing	2021-06-21 15:19:10 +02:00
subpage.h	btrfs: subpage: fix a rare race between metadata endio and eb freeing	2021-06-21 15:19:10 +02:00
super.c	btrfs: shorten integrity checker extent data mount option	2021-06-22 14:11:58 +02:00
sysfs.c	btrfs: sysfs: export dev stats in devinfo directory	2021-06-22 14:11:57 +02:00
sysfs.h	btrfs: split and refactor btrfs_sysfs_remove_devices_dir	2020-10-07 12:12:21 +02:00
transaction.c	btrfs: send: fix crash when memory allocations trigger reclaim	2021-06-22 14:11:58 +02:00
transaction.h	btrfs: send: fix crash when memory allocations trigger reclaim	2021-06-22 14:11:58 +02:00
tree-checker.c	btrfs: tree-checker: check for BTRFS_BLOCK_FLAG_FULL_BACKREF being set improperly	2021-04-19 17:25:21 +02:00
tree-checker.h
tree-defrag.c	btrfs: locking: remove all the blocking helpers	2020-12-08 15:54:01 +01:00
tree-log.c	btrfs: avoid unnecessary logging of xattrs during fast fsyncs	2021-06-21 15:19:07 +02:00
tree-log.h	btrfs: make fast fsyncs wait only for writeback	2020-10-07 12:06:56 +02:00
tree-mod-log.c	btrfs: fix race when picking most recent mod log operation for an old root	2021-04-20 19:27:17 +02:00
tree-mod-log.h	btrfs: add and use helper to get lowest sequence number for the tree mod log	2021-04-19 17:25:17 +02:00
ulist.c
ulist.h
uuid-tree.c	btrfs: remove unnecessary casts in printk	2020-12-08 15:53:52 +01:00
volumes.c	btrfs: ensure relocation never runs while we have send operations running	2021-06-22 14:11:58 +02:00
volumes.h	btrfs: remove the unused parameter @len for btrfs_bio_fits_in_stripe()	2021-06-21 15:19:08 +02:00
xattr.c	for-5.12-rc1-tag	2021-03-05 12:21:14 -08:00
xattr.h
zlib.c	btrfs: use memzero_page() instead of open coded kmap pattern	2021-05-05 11:27:27 -07:00
zoned.c	btrfs: fix typos in comments	2021-06-22 14:11:57 +02:00
zoned.h	btrfs: zoned: factor out zoned device lookup	2021-06-21 15:19:05 +02:00
zstd.c	btrfs: use memzero_page() instead of open coded kmap pattern	2021-05-05 11:27:27 -07:00