linux/fs/btrfs
Josef Bacik 0e46318df8 btrfs: unlock to current level in btrfs_next_old_leaf
Filipe reported the following lockdep splat

  ======================================================
  WARNING: possible circular locking dependency detected
  5.10.0-rc2-btrfs-next-71 #1 Not tainted
  ------------------------------------------------------
  find/324157 is trying to acquire lock:
  ffff8ebc48d293a0 (btrfs-tree-01#2/3){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x1a0 [btrfs]

  but task is already holding lock:
  ffff8eb9932c5088 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x1a0 [btrfs]

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #1 (btrfs-tree-00){++++}-{3:3}:
	 lock_acquire+0xd8/0x490
	 down_write_nested+0x44/0x120
	 __btrfs_tree_lock+0x27/0x120 [btrfs]
	 btrfs_search_slot+0x2a3/0xc50 [btrfs]
	 btrfs_insert_empty_items+0x58/0xa0 [btrfs]
	 insert_with_overflow+0x44/0x110 [btrfs]
	 btrfs_insert_xattr_item+0xb8/0x1d0 [btrfs]
	 btrfs_setxattr+0xd6/0x4c0 [btrfs]
	 btrfs_setxattr_trans+0x68/0x100 [btrfs]
	 __vfs_setxattr+0x66/0x80
	 __vfs_setxattr_noperm+0x70/0x200
	 vfs_setxattr+0x6b/0x120
	 setxattr+0x125/0x240
	 path_setxattr+0xba/0xd0
	 __x64_sys_setxattr+0x27/0x30
	 do_syscall_64+0x33/0x80
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #0 (btrfs-tree-01#2/3){++++}-{3:3}:
	 check_prev_add+0x91/0xc60
	 __lock_acquire+0x1689/0x3130
	 lock_acquire+0xd8/0x490
	 down_read_nested+0x45/0x220
	 __btrfs_tree_read_lock+0x32/0x1a0 [btrfs]
	 btrfs_next_old_leaf+0x27d/0x580 [btrfs]
	 btrfs_real_readdir+0x1e3/0x4b0 [btrfs]
	 iterate_dir+0x170/0x1c0
	 __x64_sys_getdents64+0x83/0x140
	 do_syscall_64+0x33/0x80
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  other info that might help us debug this:

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(btrfs-tree-00);
				 lock(btrfs-tree-01#2/3);
				 lock(btrfs-tree-00);
    lock(btrfs-tree-01#2/3);

   *** DEADLOCK ***

  5 locks held by find/324157:
   #0: ffff8ebc502c6e00 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x4d/0x60
   #1: ffff8eb97f689980 (&type->i_mutex_dir_key#10){++++}-{3:3}, at: iterate_dir+0x52/0x1c0
   #2: ffff8ebaec00ca58 (btrfs-tree-02#2){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x1a0 [btrfs]
   #3: ffff8eb98f986f78 (btrfs-tree-01#2){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x1a0 [btrfs]
   #4: ffff8eb9932c5088 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x1a0 [btrfs]

  stack backtrace:
  CPU: 2 PID: 324157 Comm: find Not tainted 5.10.0-rc2-btrfs-next-71 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  Call Trace:
   dump_stack+0x8d/0xb5
   check_noncircular+0xff/0x110
   ? mark_lock.part.0+0x468/0xe90
   check_prev_add+0x91/0xc60
   __lock_acquire+0x1689/0x3130
   ? kvm_clock_read+0x14/0x30
   ? kvm_sched_clock_read+0x5/0x10
   lock_acquire+0xd8/0x490
   ? __btrfs_tree_read_lock+0x32/0x1a0 [btrfs]
   down_read_nested+0x45/0x220
   ? __btrfs_tree_read_lock+0x32/0x1a0 [btrfs]
   __btrfs_tree_read_lock+0x32/0x1a0 [btrfs]
   btrfs_next_old_leaf+0x27d/0x580 [btrfs]
   btrfs_real_readdir+0x1e3/0x4b0 [btrfs]
   iterate_dir+0x170/0x1c0
   __x64_sys_getdents64+0x83/0x140
   ? filldir+0x1d0/0x1d0
   do_syscall_64+0x33/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

This happens because btrfs_next_old_leaf searches down to our current
key, and then walks up the path until we can move to the next slot, and
then reads back down the path so we get the next leaf.

However it doesn't unlock any lower levels until it replaces them with
the new extent buffer.  This is technically fine, but of course causes
lockdep to complain, because we could be holding locks on lower levels
while locking upper levels.

Fix this by dropping all nodes below the level that we use as our new
starting point before we start reading back down the path.  This also
allows us to drop the nested/recursive locking magic, because we're no
longer locking two nodes at the same level anymore.

Reported-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08 15:54:09 +01:00
..
tests btrfs: load free space cache into a temporary ctl 2020-12-08 15:54:03 +01:00
acl.c
async-thread.c Btrfs: fix crash during unmount due to race with delayed inode workers 2020-03-23 17:01:51 +01:00
async-thread.h Btrfs: fix crash during unmount due to race with delayed inode workers 2020-03-23 17:01:51 +01:00
backref.c btrfs: pass root owner to read_tree_block 2020-12-08 15:54:07 +01:00
backref.h btrfs: rename BTRFS_ROOT_REF_COWS to BTRFS_ROOT_SHAREABLE 2020-05-25 11:25:35 +02:00
block-group.c btrfs: protect fs_info->caching_block_groups by block_group_cache_lock 2020-12-08 15:54:03 +01:00
block-group.h btrfs: load free space cache asynchronously 2020-12-08 15:54:03 +01:00
block-rsv.c btrfs: introduce mount option rescue=ignorebadroots 2020-12-08 15:53:41 +01:00
block-rsv.h btrfs: Remove __ prefix from btrfs_block_rsv_release 2020-03-23 17:01:55 +01:00
btrfs_inode.h btrfs: update the number of bytes used by an inode atomically 2020-12-08 15:54:08 +01:00
check-integrity.c btrfs: check integrity: remove local copy of csum_size 2020-12-08 15:54:01 +01:00
check-integrity.h btrfs: remove btrfsic_submit_bh() 2020-03-23 17:01:39 +01:00
compression.c btrfs: remove unnecessary local variables for checksum size 2020-12-08 15:54:00 +01:00
compression.h btrfs: compression: move declarations to header 2020-10-07 12:06:55 +02:00
ctree.c btrfs: unlock to current level in btrfs_next_old_leaf 2020-12-08 15:54:09 +01:00
ctree.h btrfs: update the number of bytes used by an inode atomically 2020-12-08 15:54:08 +01:00
delalloc-space.c btrfs: add btrfs_reserve_data_bytes and use it 2020-10-07 12:06:52 +02:00
delalloc-space.h btrfs: make btrfs_delalloc_reserve_space take btrfs_inode 2020-07-27 12:55:36 +02:00
delayed-inode.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
delayed-inode.h btrfs: delayed-inode: Replace zero-length array with flexible-array member 2020-03-23 17:01:53 +01:00
delayed-ref.c btrfs: Remove __ prefix from btrfs_block_rsv_release 2020-03-23 17:01:55 +01:00
delayed-ref.h btrfs: migrate the delayed refs rsv code 2019-07-04 17:26:17 +02:00
dev-replace.c btrfs: remove unused argument seed from btrfs_find_device 2020-12-08 15:54:08 +01:00
dev-replace.h btrfs: add __pure attribute to functions 2019-11-18 12:46:52 +01:00
dir-item.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
discard.c btrfs: don't miss async discards after scheduled work override 2020-12-08 15:54:05 +01:00
discard.h btrfs: cleanup btrfs_discard_update_discardable usage 2020-12-08 15:54:02 +01:00
disk-io.c btrfs: drop unused argument step from btrfs_free_extra_devids 2020-12-08 15:54:08 +01:00
disk-io.h btrfs: pass the owner_root and level to alloc_extent_buffer 2020-12-08 15:54:07 +01:00
export.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
export.h btrfs: export helpers for subvolume name/id resolution 2020-03-23 17:01:42 +01:00
extent_io.c btrfs: update the number of bytes used by an inode atomically 2020-12-08 15:54:08 +01:00
extent_io.h btrfs: pass the owner_root and level to alloc_extent_buffer 2020-12-08 15:54:07 +01:00
extent_map.c Btrfs: fix race between using extent maps and merging them 2020-02-12 17:16:46 +01:00
extent_map.h btrfs: remove extent_map::bdev 2019-11-18 23:43:44 +01:00
extent-io-tree.h btrfs: update the number of bytes used by an inode atomically 2020-12-08 15:54:08 +01:00
extent-tree.c btrfs: set the lockdep class for extent buffers on creation 2020-12-08 15:54:07 +01:00
file-item.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
file.c btrfs: update the number of bytes used by an inode atomically 2020-12-08 15:54:08 +01:00
free-space-cache.c btrfs: load free space cache into a temporary ctl 2020-12-08 15:54:03 +01:00
free-space-cache.h btrfs: load free space cache into a temporary ctl 2020-12-08 15:54:03 +01:00
free-space-tree.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
free-space-tree.h btrfs: rename btrfs_block_group_cache 2019-11-18 17:51:51 +01:00
inode-item.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
inode-map.c btrfs: make btrfs_delalloc_reserve_space take btrfs_inode 2020-07-27 12:55:36 +02:00
inode-map.h
inode.c btrfs: update the number of bytes used by an inode atomically 2020-12-08 15:54:08 +01:00
ioctl.c btrfs: remove unused argument seed from btrfs_find_device 2020-12-08 15:54:08 +01:00
Kconfig btrfs: switch to iomap for direct IO 2020-10-07 12:06:57 +02:00
locking.c btrfs: locking: remove all the blocking helpers 2020-12-08 15:54:01 +01:00
locking.h btrfs: locking: remove all the blocking helpers 2020-12-08 15:54:01 +01:00
lzo.c btrfs: compression: inline free_workspace 2019-11-18 12:46:59 +01:00
Makefile Btrfs: move all reflink implementation code into its own file 2020-03-23 17:01:54 +01:00
misc.h btrfs: rename tree_entry to rb_simple_node and export it 2020-05-25 11:25:19 +02:00
ordered-data.c btrfs: switch cached fs_info::csum_size from u16 to u32 2020-12-08 15:53:59 +01:00
ordered-data.h btrfs: remove unnecessary local variables for checksum size 2020-12-08 15:54:00 +01:00
orphan.c
print-tree.c btrfs: pass root owner to read_tree_block 2020-12-08 15:54:07 +01:00
print-tree.h btrfs: pretty print leaked root name 2020-10-07 12:12:20 +02:00
props.c btrfs: simplify iget helpers 2020-05-25 11:25:37 +02:00
props.h
qgroup.c btrfs: pass root owner to read_tree_block 2020-12-08 15:54:07 +01:00
qgroup.h btrfs: qgroup: export qgroups in sysfs 2020-07-27 12:55:37 +02:00
raid56.c btrfs: raid56: remove out label in __raid56_parity_recover 2020-07-27 12:55:44 +02:00
raid56.h
rcu-string.h btrfs: rcu-string: Replace zero-length array with flexible-array member 2020-03-23 17:01:53 +01:00
reada.c btrfs: pass the owner_root and level to alloc_extent_buffer 2020-12-08 15:54:07 +01:00
ref-verify.c btrfs: use btrfs_read_node_slot in walk_down_tree 2020-12-08 15:54:06 +01:00
ref-verify.h
reflink.c btrfs: update the number of bytes used by an inode atomically 2020-12-08 15:54:08 +01:00
reflink.h Btrfs: move all reflink implementation code into its own file 2020-03-23 17:01:54 +01:00
relocation.c btrfs: pass the owner_root and level to alloc_extent_buffer 2020-12-08 15:54:07 +01:00
root-tree.c btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations 2020-10-07 12:12:13 +02:00
scrub.c btrfs: remove unused argument seed from btrfs_find_device 2020-12-08 15:54:08 +01:00
send.c btrfs: send: use helpers to access root_item::ctransid 2020-12-08 15:53:51 +01:00
send.h btrfs: send: avoid copying file data 2020-10-07 12:13:17 +02:00
space-info.c btrfs: kill the RCU protection for fs_info->space_info 2020-10-07 12:13:19 +02:00
space-info.h btrfs: add btrfs_reserve_data_bytes and use it 2020-10-07 12:06:52 +02:00
struct-funcs.c btrfs: use unaligned helpers for stack and header set/get helpers 2020-10-07 12:13:23 +02:00
super.c btrfs: locking: rip out path->leave_spinning 2020-12-08 15:54:02 +01:00
sysfs.c btrfs: discard: reschedule work after sysfs param update 2020-12-08 15:54:05 +01:00
sysfs.h btrfs: split and refactor btrfs_sysfs_remove_devices_dir 2020-10-07 12:12:21 +02:00
transaction.c btrfs: protect fs_info->caching_block_groups by block_group_cache_lock 2020-12-08 15:54:03 +01:00
transaction.h btrfs: remove dio iomap DSYNC workaround 2020-12-08 15:53:49 +01:00
tree-checker.c btrfs: switch cached fs_info::csum_size from u16 to u32 2020-12-08 15:53:59 +01:00
tree-checker.h
tree-defrag.c btrfs: locking: remove all the blocking helpers 2020-12-08 15:54:01 +01:00
tree-log.c btrfs: update the number of bytes used by an inode atomically 2020-12-08 15:54:08 +01:00
tree-log.h btrfs: make fast fsyncs wait only for writeback 2020-10-07 12:06:56 +02:00
ulist.c
ulist.h
uuid-tree.c btrfs: remove unnecessary casts in printk 2020-12-08 15:53:52 +01:00
volumes.c btrfs: remove unused argument seed from btrfs_find_device 2020-12-08 15:54:08 +01:00
volumes.h btrfs: remove unused argument seed from btrfs_find_device 2020-12-08 15:54:08 +01:00
xattr.c
xattr.h
zlib.c btrfs: use larger zlib buffer for s390 hardware compression 2020-01-31 10:30:40 -08:00
zstd.c btrfs: compression: inline free_workspace 2019-11-18 12:46:59 +01:00