linux/fs/btrfs
Filipe Manana 8949b9a114 btrfs: fix lock inversion problem when doing qgroup extent tracing
At btrfs_qgroup_trace_extent_post() we call btrfs_find_all_roots() with a
NULL value as the transaction handle argument, which makes that function
take the commit_root_sem semaphore, which is necessary when we don't hold
a transaction handle or any other mechanism to prevent a transaction
commit from wiping out commit roots.

However btrfs_qgroup_trace_extent_post() can be called in a context where
we are holding a write lock on an extent buffer from a subvolume tree,
namely from btrfs_truncate_inode_items(), called either during truncate
or unlink operations. In this case we end up with a lock inversion problem
because the commit_root_sem is a higher level lock, always supposed to be
acquired before locking any extent buffer.

Lockdep detects this lock inversion problem since we switched the extent
buffer locks from custom locks to semaphores, and when running btrfs/158
from fstests, it reported the following trace:

[ 9057.626435] ======================================================
[ 9057.627541] WARNING: possible circular locking dependency detected
[ 9057.628334] 5.14.0-rc2-btrfs-next-93 #1 Not tainted
[ 9057.628961] ------------------------------------------------------
[ 9057.629867] kworker/u16:4/30781 is trying to acquire lock:
[ 9057.630824] ffff8e2590f58760 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x110 [btrfs]
[ 9057.632542]
               but task is already holding lock:
[ 9057.633551] ffff8e25582d4b70 (&fs_info->commit_root_sem){++++}-{3:3}, at: iterate_extent_inodes+0x10b/0x280 [btrfs]
[ 9057.635255]
               which lock already depends on the new lock.

[ 9057.636292]
               the existing dependency chain (in reverse order) is:
[ 9057.637240]
               -> #1 (&fs_info->commit_root_sem){++++}-{3:3}:
[ 9057.638138]        down_read+0x46/0x140
[ 9057.638648]        btrfs_find_all_roots+0x41/0x80 [btrfs]
[ 9057.639398]        btrfs_qgroup_trace_extent_post+0x37/0x70 [btrfs]
[ 9057.640283]        btrfs_add_delayed_data_ref+0x418/0x490 [btrfs]
[ 9057.641114]        btrfs_free_extent+0x35/0xb0 [btrfs]
[ 9057.641819]        btrfs_truncate_inode_items+0x424/0xf70 [btrfs]
[ 9057.642643]        btrfs_evict_inode+0x454/0x4f0 [btrfs]
[ 9057.643418]        evict+0xcf/0x1d0
[ 9057.643895]        do_unlinkat+0x1e9/0x300
[ 9057.644525]        do_syscall_64+0x3b/0xc0
[ 9057.645110]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 9057.645835]
               -> #0 (btrfs-tree-00){++++}-{3:3}:
[ 9057.646600]        __lock_acquire+0x130e/0x2210
[ 9057.647248]        lock_acquire+0xd7/0x310
[ 9057.647773]        down_read_nested+0x4b/0x140
[ 9057.648350]        __btrfs_tree_read_lock+0x24/0x110 [btrfs]
[ 9057.649175]        btrfs_read_lock_root_node+0x31/0x40 [btrfs]
[ 9057.650010]        btrfs_search_slot+0x537/0xc00 [btrfs]
[ 9057.650849]        scrub_print_warning_inode+0x89/0x370 [btrfs]
[ 9057.651733]        iterate_extent_inodes+0x1e3/0x280 [btrfs]
[ 9057.652501]        scrub_print_warning+0x15d/0x2f0 [btrfs]
[ 9057.653264]        scrub_handle_errored_block.isra.0+0x135f/0x1640 [btrfs]
[ 9057.654295]        scrub_bio_end_io_worker+0x101/0x2e0 [btrfs]
[ 9057.655111]        btrfs_work_helper+0xf8/0x400 [btrfs]
[ 9057.655831]        process_one_work+0x247/0x5a0
[ 9057.656425]        worker_thread+0x55/0x3c0
[ 9057.656993]        kthread+0x155/0x180
[ 9057.657494]        ret_from_fork+0x22/0x30
[ 9057.658030]
               other info that might help us debug this:

[ 9057.659064]  Possible unsafe locking scenario:

[ 9057.659824]        CPU0                    CPU1
[ 9057.660402]        ----                    ----
[ 9057.660988]   lock(&fs_info->commit_root_sem);
[ 9057.661581]                                lock(btrfs-tree-00);
[ 9057.662348]                                lock(&fs_info->commit_root_sem);
[ 9057.663254]   lock(btrfs-tree-00);
[ 9057.663690]
                *** DEADLOCK ***

[ 9057.664437] 4 locks held by kworker/u16:4/30781:
[ 9057.665023]  #0: ffff8e25922a1148 ((wq_completion)btrfs-scrub){+.+.}-{0:0}, at: process_one_work+0x1c7/0x5a0
[ 9057.666260]  #1: ffffabb3451ffe70 ((work_completion)(&work->normal_work)){+.+.}-{0:0}, at: process_one_work+0x1c7/0x5a0
[ 9057.667639]  #2: ffff8e25922da198 (&ret->mutex){+.+.}-{3:3}, at: scrub_handle_errored_block.isra.0+0x5d2/0x1640 [btrfs]
[ 9057.669017]  #3: ffff8e25582d4b70 (&fs_info->commit_root_sem){++++}-{3:3}, at: iterate_extent_inodes+0x10b/0x280 [btrfs]
[ 9057.670408]
               stack backtrace:
[ 9057.670976] CPU: 7 PID: 30781 Comm: kworker/u16:4 Not tainted 5.14.0-rc2-btrfs-next-93 #1
[ 9057.672030] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 9057.673492] Workqueue: btrfs-scrub btrfs_work_helper [btrfs]
[ 9057.674258] Call Trace:
[ 9057.674588]  dump_stack_lvl+0x57/0x72
[ 9057.675083]  check_noncircular+0xf3/0x110
[ 9057.675611]  __lock_acquire+0x130e/0x2210
[ 9057.676132]  lock_acquire+0xd7/0x310
[ 9057.676605]  ? __btrfs_tree_read_lock+0x24/0x110 [btrfs]
[ 9057.677313]  ? lock_is_held_type+0xe8/0x140
[ 9057.677849]  down_read_nested+0x4b/0x140
[ 9057.678349]  ? __btrfs_tree_read_lock+0x24/0x110 [btrfs]
[ 9057.679068]  __btrfs_tree_read_lock+0x24/0x110 [btrfs]
[ 9057.679760]  btrfs_read_lock_root_node+0x31/0x40 [btrfs]
[ 9057.680458]  btrfs_search_slot+0x537/0xc00 [btrfs]
[ 9057.681083]  ? _raw_spin_unlock+0x29/0x40
[ 9057.681594]  ? btrfs_find_all_roots_safe+0x11f/0x140 [btrfs]
[ 9057.682336]  scrub_print_warning_inode+0x89/0x370 [btrfs]
[ 9057.683058]  ? btrfs_find_all_roots_safe+0x11f/0x140 [btrfs]
[ 9057.683834]  ? scrub_write_block_to_dev_replace+0xb0/0xb0 [btrfs]
[ 9057.684632]  iterate_extent_inodes+0x1e3/0x280 [btrfs]
[ 9057.685316]  scrub_print_warning+0x15d/0x2f0 [btrfs]
[ 9057.685977]  ? ___ratelimit+0xa4/0x110
[ 9057.686460]  scrub_handle_errored_block.isra.0+0x135f/0x1640 [btrfs]
[ 9057.687316]  scrub_bio_end_io_worker+0x101/0x2e0 [btrfs]
[ 9057.688021]  btrfs_work_helper+0xf8/0x400 [btrfs]
[ 9057.688649]  ? lock_is_held_type+0xe8/0x140
[ 9057.689180]  process_one_work+0x247/0x5a0
[ 9057.689696]  worker_thread+0x55/0x3c0
[ 9057.690175]  ? process_one_work+0x5a0/0x5a0
[ 9057.690731]  kthread+0x155/0x180
[ 9057.691158]  ? set_kthread_struct+0x40/0x40
[ 9057.691697]  ret_from_fork+0x22/0x30

Fix this by making btrfs_find_all_roots() never attempt to lock the
commit_root_sem when it is called from btrfs_qgroup_trace_extent_post().

We can't just pass a non-NULL transaction handle to btrfs_find_all_roots()
from btrfs_qgroup_trace_extent_post(), because that would make backref
lookup not use commit roots and acquire read locks on extent buffers, and
therefore could deadlock when btrfs_qgroup_trace_extent_post() is called
from the btrfs_truncate_inode_items() code path which has acquired a write
lock on an extent buffer of the subvolume btree.

CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-07-22 15:50:07 +02:00
..
tests btrfs: fix lock inversion problem when doing qgroup extent tracing 2021-07-22 15:50:07 +02:00
acl.c fs: make helpers idmap mount aware 2021-01-24 14:27:20 +01:00
async-thread.c Btrfs: fix crash during unmount due to race with delayed inode workers 2020-03-23 17:01:51 +01:00
async-thread.h Btrfs: fix crash during unmount due to race with delayed inode workers 2020-03-23 17:01:51 +01:00
backref.c btrfs: fix lock inversion problem when doing qgroup extent tracing 2021-07-22 15:50:07 +02:00
backref.h btrfs: fix lock inversion problem when doing qgroup extent tracing 2021-07-22 15:50:07 +02:00
block-group.c btrfs: don't block if we can't acquire the reclaim lock 2021-07-07 17:42:57 +02:00
block-group.h btrfs: rework chunk allocation to avoid exhaustion of the system chunk array 2021-07-07 17:42:41 +02:00
block-rsv.c btrfs: introduce mount option rescue=ignorebadroots 2020-12-08 15:53:41 +01:00
block-rsv.h btrfs: Remove __ prefix from btrfs_block_rsv_release 2020-03-23 17:01:55 +01:00
btrfs_inode.h btrfs: remove stale comment and logic from btrfs_inode_in_log() 2021-04-19 17:25:16 +02:00
check-integrity.c btrfs: integrity-checker: convert block context kmap's to kmap_local_page 2021-04-19 17:25:16 +02:00
check-integrity.h btrfs: remove btrfsic_submit_bh() 2020-03-23 17:01:39 +01:00
compression.c btrfs: remove a stale comment for btrfs_decompress_bio() 2021-06-22 14:11:57 +02:00
compression.h btrfs: optimize variables size in btrfs_submit_compressed_write 2021-06-21 15:19:07 +02:00
ctree.c btrfs: rework chunk allocation to avoid exhaustion of the system chunk array 2021-07-07 17:42:41 +02:00
ctree.h btrfs: remove unused btrfs_fs_info::total_pinned 2021-06-22 19:58:26 +02:00
delalloc-space.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
delalloc-space.h btrfs: make btrfs_delalloc_reserve_space take btrfs_inode 2020-07-27 12:55:36 +02:00
delayed-inode.c btrfs: remove total_data_size variable in btrfs_batch_insert_items() 2021-06-21 15:19:11 +02:00
delayed-inode.h btrfs: make btrfs_delayed_update_inode take btrfs_inode 2020-12-08 15:54:10 +01:00
delayed-ref.c btrfs: fix lock inversion problem when doing qgroup extent tracing 2021-07-22 15:50:07 +02:00
delayed-ref.h btrfs: only let one thread pre-flush delayed refs in commit 2021-02-08 22:58:56 +01:00
dev-replace.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
dev-replace.h btrfs: zoned: mark block groups to copy for device-replace 2021-02-09 02:46:07 +01:00
dir-item.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
discard.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
discard.h btrfs: cleanup btrfs_discard_update_discardable usage 2020-12-08 15:54:02 +01:00
disk-io.c btrfs: rip out btrfs_space_info::total_bytes_pinned 2021-06-22 14:55:25 +02:00
disk-io.h btrfs: split alloc_log_tree() 2021-02-09 02:46:07 +01:00
export.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
export.h btrfs: export helpers for subvolume name/id resolution 2020-03-23 17:01:42 +01:00
extent_io.c btrfs: subpage: fix a rare race between metadata endio and eb freeing 2021-06-21 15:19:10 +02:00
extent_io.h btrfs: rename PagePrivate2 to PageOrdered inside btrfs 2021-06-21 15:19:09 +02:00
extent_map.c btrfs: fix parameter description of btrfs_add_extent_mapping 2021-02-08 22:58:53 +01:00
extent_map.h
extent-io-tree.h btrfs: use fixed width int type for extent_state::state 2020-12-08 15:54:13 +01:00
extent-tree.c btrfs: check for missing device in btrfs_trim_fs 2021-07-22 15:49:49 +02:00
file-item.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
file.c btrfs: eliminate insert label in add_falloc_range 2021-06-21 15:19:10 +02:00
free-space-cache.c btrfs: don't set the full sync flag when truncation does not touch extents 2021-06-21 15:19:05 +02:00
free-space-cache.h btrfs: zoned: track unusable bytes for zones 2021-02-09 02:46:03 +01:00
free-space-tree.c btrfs: fix possible free space tree corruption with online conversion 2021-01-25 18:44:37 +01:00
free-space-tree.h
inode-item.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
inode.c btrfs: properly split extent_map for REQ_OP_ZONE_APPEND 2021-07-07 17:42:45 +02:00
ioctl.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
Kconfig btrfs: disable build on platforms having page size 256K 2021-06-22 14:11:57 +02:00
locking.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
locking.h btrfs: remove the recurse parameter from __btrfs_tree_read_lock 2020-12-08 15:54:09 +01:00
lzo.c btrfs: convert kmap to kmap_local_page, simple cases 2021-04-19 17:25:16 +02:00
Makefile btrfs: move the tree mod log code into its own file 2021-04-19 17:25:16 +02:00
misc.h btrfs: rename tree_entry to rb_simple_node and export it 2020-05-25 11:25:19 +02:00
ordered-data.c btrfs: make page Ordered bit to be subpage compatible 2021-06-21 15:19:10 +02:00
ordered-data.h btrfs: introduce btrfs_lookup_first_ordered_range() 2021-06-21 15:19:08 +02:00
orphan.c
print-tree.c btrfs: print the actual offset in btrfs_root_name 2021-01-07 17:25:05 +01:00
print-tree.h btrfs: print the actual offset in btrfs_root_name 2021-01-07 17:25:05 +01:00
props.c btrfs: props: change how empty value is interpreted 2021-06-22 14:11:58 +02:00
props.h
qgroup.c btrfs: fix lock inversion problem when doing qgroup extent tracing 2021-07-22 15:50:07 +02:00
qgroup.h btrfs: fix lock inversion problem when doing qgroup extent tracing 2021-07-22 15:50:07 +02:00
raid56.c CFI on arm64 series for v5.13-rc1 2021-04-27 10:16:46 -07:00
raid56.h
rcu-string.h btrfs: rcu-string: Replace zero-length array with flexible-array member 2020-03-23 17:01:53 +01:00
reada.c btrfs: subpage: make readahead work properly 2021-03-16 11:06:21 +01:00
ref-verify.c btrfs: ref-verify: use 'inline void' keyword ordering 2021-03-02 16:55:40 +01:00
ref-verify.h
reflink.c btrfs: reflink: make copy_inline_to_page() to be subpage compatible 2021-06-21 15:19:10 +02:00
reflink.h Btrfs: move all reflink implementation code into its own file 2020-03-23 17:01:54 +01:00
relocation.c btrfs: ensure relocation never runs while we have send operations running 2021-06-22 14:11:58 +02:00
root-tree.c btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations 2020-10-07 12:12:13 +02:00
scrub.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
send.c btrfs: send: fix crash when memory allocations trigger reclaim 2021-06-22 14:11:58 +02:00
send.h btrfs: send: avoid copying file data 2020-10-07 12:13:17 +02:00
space-info.c btrfs: rip out btrfs_space_info::total_bytes_pinned 2021-06-22 14:55:25 +02:00
space-info.h btrfs: rip out btrfs_space_info::total_bytes_pinned 2021-06-22 14:55:25 +02:00
struct-funcs.c btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors 2020-12-09 19:16:10 +01:00
subpage.c btrfs: subpage: fix a rare race between metadata endio and eb freeing 2021-06-21 15:19:10 +02:00
subpage.h btrfs: subpage: fix a rare race between metadata endio and eb freeing 2021-06-21 15:19:10 +02:00
super.c btrfs: shorten integrity checker extent data mount option 2021-06-22 14:11:58 +02:00
sysfs.c btrfs: rip out btrfs_space_info::total_bytes_pinned 2021-06-22 14:55:25 +02:00
sysfs.h btrfs: split and refactor btrfs_sysfs_remove_devices_dir 2020-10-07 12:12:21 +02:00
transaction.c btrfs: rework chunk allocation to avoid exhaustion of the system chunk array 2021-07-07 17:42:41 +02:00
transaction.h btrfs: rework chunk allocation to avoid exhaustion of the system chunk array 2021-07-07 17:42:41 +02:00
tree-checker.c btrfs: tree-checker: check for BTRFS_BLOCK_FLAG_FULL_BACKREF being set improperly 2021-04-19 17:25:21 +02:00
tree-checker.h
tree-defrag.c btrfs: locking: remove all the blocking helpers 2020-12-08 15:54:01 +01:00
tree-log.c btrfs: fix unpersisted i_size on fsync after expanding truncate 2021-07-22 15:49:42 +02:00
tree-log.h btrfs: make fast fsyncs wait only for writeback 2020-10-07 12:06:56 +02:00
tree-mod-log.c btrfs: fix race when picking most recent mod log operation for an old root 2021-04-20 19:27:17 +02:00
tree-mod-log.h btrfs: add and use helper to get lowest sequence number for the tree mod log 2021-04-19 17:25:17 +02:00
ulist.c
ulist.h
uuid-tree.c btrfs: remove unnecessary casts in printk 2020-12-08 15:53:52 +01:00
volumes.c btrfs: rework chunk allocation to avoid exhaustion of the system chunk array 2021-07-07 17:42:41 +02:00
volumes.h btrfs: rework chunk allocation to avoid exhaustion of the system chunk array 2021-07-07 17:42:41 +02:00
xattr.c for-5.12-rc1-tag 2021-03-05 12:21:14 -08:00
xattr.h
zlib.c btrfs: use memzero_page() instead of open coded kmap pattern 2021-05-05 11:27:27 -07:00
zoned.c btrfs: fix typos in comments 2021-06-22 14:11:57 +02:00
zoned.h btrfs: zoned: factor out zoned device lookup 2021-06-21 15:19:05 +02:00
zstd.c btrfs: use memzero_page() instead of open coded kmap pattern 2021-05-05 11:27:27 -07:00