linux/fs
Josef Bacik 0f2b8098d7 btrfs: take the cleaner_mutex earlier in qgroup disable
One of my CI runs popped the following lockdep splat

======================================================
WARNING: possible circular locking dependency detected
6.9.0-rc4+ #1 Not tainted
------------------------------------------------------
btrfs/471533 is trying to acquire lock:
ffff92ba46980850 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_quota_disable+0x54/0x4c0

but task is already holding lock:
ffff92ba46980bd0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1c8f/0x2600

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&fs_info->subvol_sem){++++}-{3:3}:
       down_read+0x42/0x170
       btrfs_rename+0x607/0xb00
       btrfs_rename2+0x2e/0x70
       vfs_rename+0xaf8/0xfc0
       do_renameat2+0x586/0x600
       __x64_sys_rename+0x43/0x50
       do_syscall_64+0x95/0x180
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #1 (&sb->s_type->i_mutex_key#16){++++}-{3:3}:
       down_write+0x3f/0xc0
       btrfs_inode_lock+0x40/0x70
       prealloc_file_extent_cluster+0x1b0/0x370
       relocate_file_extent_cluster+0xb2/0x720
       relocate_data_extent+0x107/0x160
       relocate_block_group+0x442/0x550
       btrfs_relocate_block_group+0x2cb/0x4b0
       btrfs_relocate_chunk+0x50/0x1b0
       btrfs_balance+0x92f/0x13d0
       btrfs_ioctl+0x1abf/0x2600
       __x64_sys_ioctl+0x97/0xd0
       do_syscall_64+0x95/0x180
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #0 (&fs_info->cleaner_mutex){+.+.}-{3:3}:
       __lock_acquire+0x13e7/0x2180
       lock_acquire+0xcb/0x2e0
       __mutex_lock+0xbe/0xc00
       btrfs_quota_disable+0x54/0x4c0
       btrfs_ioctl+0x206b/0x2600
       __x64_sys_ioctl+0x97/0xd0
       do_syscall_64+0x95/0x180
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

other info that might help us debug this:

Chain exists of:
  &fs_info->cleaner_mutex --> &sb->s_type->i_mutex_key#16 --> &fs_info->subvol_sem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&fs_info->subvol_sem);
                               lock(&sb->s_type->i_mutex_key#16);
                               lock(&fs_info->subvol_sem);
  lock(&fs_info->cleaner_mutex);

 *** DEADLOCK ***

2 locks held by btrfs/471533:
 #0: ffff92ba4319e420 (sb_writers#14){.+.+}-{0:0}, at: btrfs_ioctl+0x3b5/0x2600
 #1: ffff92ba46980bd0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl+0x1c8f/0x2600

stack backtrace:
CPU: 1 PID: 471533 Comm: btrfs Kdump: loaded Not tainted 6.9.0-rc4+ #1
Call Trace:
 <TASK>
 dump_stack_lvl+0x77/0xb0
 check_noncircular+0x148/0x160
 ? lock_acquire+0xcb/0x2e0
 __lock_acquire+0x13e7/0x2180
 lock_acquire+0xcb/0x2e0
 ? btrfs_quota_disable+0x54/0x4c0
 ? lock_is_held_type+0x9a/0x110
 __mutex_lock+0xbe/0xc00
 ? btrfs_quota_disable+0x54/0x4c0
 ? srso_return_thunk+0x5/0x5f
 ? lock_acquire+0xcb/0x2e0
 ? btrfs_quota_disable+0x54/0x4c0
 ? btrfs_quota_disable+0x54/0x4c0
 btrfs_quota_disable+0x54/0x4c0
 btrfs_ioctl+0x206b/0x2600
 ? srso_return_thunk+0x5/0x5f
 ? __do_sys_statfs+0x61/0x70
 __x64_sys_ioctl+0x97/0xd0
 do_syscall_64+0x95/0x180
 ? srso_return_thunk+0x5/0x5f
 ? reacquire_held_locks+0xd1/0x1f0
 ? do_user_addr_fault+0x307/0x8a0
 ? srso_return_thunk+0x5/0x5f
 ? lock_acquire+0xcb/0x2e0
 ? srso_return_thunk+0x5/0x5f
 ? srso_return_thunk+0x5/0x5f
 ? find_held_lock+0x2b/0x80
 ? srso_return_thunk+0x5/0x5f
 ? lock_release+0xca/0x2a0
 ? srso_return_thunk+0x5/0x5f
 ? do_user_addr_fault+0x35c/0x8a0
 ? srso_return_thunk+0x5/0x5f
 ? trace_hardirqs_off+0x4b/0xc0
 ? srso_return_thunk+0x5/0x5f
 ? lockdep_hardirqs_on_prepare+0xde/0x190
 ? srso_return_thunk+0x5/0x5f

This happens because when we call rename we already have the inode mutex
held, and then we acquire the subvol_sem if we are a subvolume.  This
makes the dependency

inode lock -> subvol sem

When we're running data relocation we will preallocate space for the
data relocation inode, and we always run the relocation under the
->cleaner_mutex.  This now creates the dependency of

cleaner_mutex -> inode lock (from the prealloc) -> subvol_sem

Qgroup delete is doing this in the opposite order, it is acquiring the
subvol_sem and then it is acquiring the cleaner_mutex, which results in
this lockdep splat.  This deadlock can't happen in reality, because we
won't ever rename the data reloc inode, nor is the data reloc inode a
subvolume.

However this is fairly easy to fix, simply take the cleaner mutex in the
case where we are disabling qgroups before we take the subvol_sem.  This
resolves the lockdep splat.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-04-25 16:23:09 +02:00
..
9p 9p: Use length of data written to the server in preference to error 2024-01-04 13:15:31 +00:00
adfs adfs: remove writepage implementation 2023-12-29 11:58:33 -08:00
affs affs: free affs_sb_info with kfree_rcu() 2024-02-25 02:10:31 -05:00
afs afs: Fix endless loop in directory parsing 2024-02-27 11:20:43 +01:00
autofs dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
bcachefs bcachefs: fix bch2_save_backtrace() 2024-02-25 15:45:36 -05:00
befs befs: d_obtain_alias(ERR_PTR(...)) will do the right thing 2023-12-21 12:51:02 -05:00
bfs misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
btrfs btrfs: take the cleaner_mutex earlier in qgroup disable 2024-04-25 16:23:09 +02:00
cachefiles cachefiles: fix memory leak in cachefiles_add_cache() 2024-02-20 09:46:07 +01:00
ceph ceph: switch to corrected encoding of max_xattr_size in mdsmap 2024-02-26 19:20:30 +01:00
coda dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
configfs
cramfs vfs-6.7.ctime 2023-10-30 09:47:13 -10:00
crypto fscrypt: document that CephFS supports fscrypt now 2023-12-26 22:55:42 -06:00
debugfs Merge branches 'acpi-pm', 'acpi-video', 'acpi-apei' and 'acpi-extlog' 2024-01-04 13:19:40 +01:00
devpts fs: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
dlm dlm: update format header reflect current format 2023-12-20 15:36:48 -06:00
ecryptfs fix directory locking scheme on rename 2024-01-11 20:00:22 -08:00
efivarfs efivarfs: Drop 'duplicates' bool parameter on efivar_init() 2024-02-25 09:43:39 +01:00
efs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
erofs Change since last update: 2024-02-25 09:53:13 -08:00
exfat Description for this pull request: 2024-03-01 12:22:30 -08:00
exportfs fs: fix build error with CONFIG_EXPORTFS=m or not defined 2023-10-28 16:16:19 +02:00
ext2 fix directory locking scheme on rename 2024-01-11 20:00:22 -08:00
ext4 We still have some races in filesystem methods when exposed to RCU 2024-02-25 09:29:05 -08:00
f2fs f2fs: fix double free of f2fs_sb_info 2024-01-12 18:55:09 -08:00
fat vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
freevxfs freevxfs: lookup: fix function params kernel-doc 2023-12-20 15:02:58 -08:00
fuse fuse: fix UAF in rcu pathwalks 2024-02-25 02:10:32 -05:00
gfs2 Revert "gfs2: Use GL_NOBLOCK flag for non-blocking lookups" 2024-02-02 17:21:44 +01:00
hfs hfs: really remove hfs_writepage 2023-12-29 11:58:34 -08:00
hfsplus hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info 2024-02-25 02:10:31 -05:00
hostfs hostfs: use d_splice_alias() calling conventions to simplify failure exits 2023-12-21 12:51:00 -05:00
hpfs
hugetlbfs fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super 2024-02-07 21:20:36 -08:00
iomap mm: add folio_fill_tail() and use it in iomap 2023-12-10 16:51:36 -08:00
isofs
jbd2 jbd2: abort journal when detecting metadata writeback error of fs dev 2024-01-04 23:42:21 -05:00
jffs2 jffs2: mark __jffs2_dbg_superblock_counts() static 2023-12-10 17:21:43 -08:00
jfs Revert "jfs: fix shift-out-of-bounds in dbJoin" 2024-01-29 08:45:10 -06:00
kernfs Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock" 2024-01-11 11:51:27 +01:00
lockd sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
minix minixfs kmap_local_page() switchover and related fixes - very similar to sysv series. 2024-01-11 19:54:18 -08:00
netfs netfs: Fix missing zero-length check in unbuffered write 2024-01-29 14:53:21 +01:00
nfs nfs: fix UAF on pathwalk running into umount 2024-02-25 02:10:32 -05:00
nfs_common
nfsd nfsd-6.8 fixes: 2024-02-07 17:48:15 +00:00
nilfs2 nilfs2: fix potential bug in end_buffer_async_write 2024-02-07 21:20:37 -08:00
nls
notify dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
ntfs sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
ntfs3 fs/ntfs3: fix build without CONFIG_NTFS3_LZX_XPRESS 2024-02-26 09:32:23 -08:00
ocfs2 misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
omfs
openpromfs
orangefs orangefs: saner arguments passing in readdir guts 2023-12-21 12:53:36 -05:00
overlayfs vfs-6.8-rc5.fixes 2024-02-12 07:15:45 -08:00
proc We still have some races in filesystem methods when exposed to RCU 2024-02-25 09:29:05 -08:00
pstore pstore: inode: Use cleanup.h for struct pstore_private 2023-12-08 14:15:44 -08:00
qnx4 qnx4: Use get_directory_fname() in qnx4_match() 2023-12-13 11:19:18 -08:00
qnx6
quota sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
ramfs mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
reiserfs misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
romfs vfs-6.7.ctime 2023-10-30 09:47:13 -10:00
smb We still have some races in filesystem methods when exposed to RCU 2024-02-25 09:29:05 -08:00
squashfs Squashfs: fix variable overflow triggered by sysbot 2023-12-10 17:21:26 -08:00
sysfs fs/sysfs/dir.c : Fix typo in comment 2023-12-07 11:35:23 +09:00
sysv sysv: remove writepage implementation 2023-12-29 11:58:35 -08:00
tracefs eventfs: Keep all directory links at 1 2024-02-01 11:53:53 -05:00
ubifs ubifs: fix kernel-doc warnings 2024-01-06 23:49:50 +01:00
udf misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
ufs Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
unicode
vboxsf fs: vboxsf: fix a kernel-doc warning 2023-12-08 15:32:31 -07:00
verity Networking changes for 6.8. 2024-01-11 10:07:29 -08:00
xfs xfs: drop experimental warning for FSDAX 2024-02-27 09:53:30 +05:30
zonefs zonefs: Improve error handling 2024-02-16 10:20:35 +09:00
aio.c fs/aio: Make io_cancel() generate completions again 2024-02-27 11:20:44 +01:00
anon_inodes.c Merge branch 'kvm-guestmemfd' into HEAD 2023-11-14 08:31:31 -05:00
attr.c fs: fix doc comment typo fs tree wide 2023-12-21 13:17:54 +01:00
backing-file.c fs: factor out backing_file_mmap() helper 2023-12-23 16:35:09 +02:00
bad_inode.c
binfmt_elf_fdpic.c execve updates for v6.7-rc1 2023-10-30 19:28:19 -10:00
binfmt_elf_test.c
binfmt_elf.c
binfmt_flat.c
binfmt_misc.c execve updates for v6.7-rc1 2023-10-30 19:28:19 -10:00
binfmt_script.c
buffer.c Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
char_dev.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
compat_binfmt_elf.c
coredump.c fs: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
d_path.c
dax.c fs : Fix warning using plain integer as NULL 2023-11-18 15:00:01 +01:00
dcache.c Revert "get rid of DCACHE_GENOCIDE" 2024-02-09 23:31:16 -05:00
direct-io.c fs : Fix warning using plain integer as NULL 2023-11-18 15:00:01 +01:00
drop_caches.c
eventfd.c eventfd: Remove usage of the deprecated ida_simple_xx() API 2023-12-12 14:24:55 +01:00
eventpoll.c fs: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
exec.c execve fixes for v6.8-rc2 2024-01-24 13:32:29 -08:00
fcntl.c
fhandle.c exportfs: add helpers to check if filesystem can encode/decode file handles 2023-10-24 17:57:45 +02:00
file_table.c dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
file.c file: remove __receive_fd() 2023-12-12 14:24:14 +01:00
filesystems.c
fs_context.c
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c netfs: Move pinning-for-writeback from fscache to netfs 2023-12-24 15:08:49 +00:00
fsopen.c
init.c
inode.c fix directory locking scheme on rename 2024-01-11 20:00:22 -08:00
internal.h dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
ioctl.c lsm: new security_file_ioctl_compat() hook 2023-12-24 15:48:03 -05:00
Kconfig vfs-6.8.netfs 2024-01-19 09:10:23 -08:00
Kconfig.binfmt
kernel_read_file.c
libfs.c dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
locks.c fs: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
Makefile vfs-6.8.netfs 2024-01-19 09:10:23 -08:00
mbcache.c
mnt_idmapping.c mnt_idmapping: decouple from namespaces 2023-11-28 14:08:47 +01:00
mount.h mounts: keep list of mounts in an rbtree 2023-11-18 14:56:16 +01:00
mpage.c fs: convert block_write_full_page to block_write_full_folio 2023-12-29 11:58:35 -08:00
namei.c rcu pathwalk: prevent bogus hard errors from may_lookup() 2024-02-25 02:10:31 -05:00
namespace.c fs: relax mount_setattr() permission checks 2024-02-07 21:16:29 +01:00
nsfs.c nsfs: use d_make_root() 2023-11-25 02:49:43 -05:00
open.c vfs-6.8.rw 2024-01-08 11:11:51 -08:00
pipe.c sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
pnode.c mounts: keep list of mounts in an rbtree 2023-11-18 14:56:16 +01:00
pnode.h
posix_acl.c fs: fix doc comment typo fs tree wide 2023-12-21 13:17:54 +01:00
proc_namespace.c namespace: extract show_path() helper 2023-11-18 14:56:16 +01:00
read_write.c fsnotify: optionally pass access range in file permission hooks 2023-12-12 16:20:02 +01:00
readdir.c fsnotify: optionally pass access range in file permission hooks 2023-12-12 16:20:02 +01:00
remap_range.c remap_range: merge do_clone_file_range() into vfs_clone_file_range() 2024-02-06 17:07:21 +01:00
select.c
seq_file.c
signalfd.c
splice.c fs: use splice_copy_file_range() inline helper 2023-12-12 16:20:02 +01:00
stack.c
stat.c vfs-6.8.mount 2024-01-08 10:57:34 -08:00
statfs.c
super.c fs/super.c: don't drop ->s_user_ns until we free struct super_block itself 2024-02-25 02:10:31 -05:00
sync.c
sysctls.c fs: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
timerfd.c
userfaultfd.c Generic: 2024-01-17 13:03:37 -08:00
utimes.c
xattr.c