linux/fs
Su Yue 2acc59dd88 bcachefs: grab s_umount only if snapshotting
While testing mongodb on bcachefs with compression enabled,
I hit a lockdep warning when snapshotting the mongodb data volume.

$ cat test.sh
prog=bcachefs

$prog subvolume create /mnt/data
$prog subvolume create /mnt/data/snapshots

while true;do
    $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s)
    sleep 1s
done

$ cat /etc/mongodb.conf
systemLog:
  destination: file
  logAppend: true
  path: /mnt/data/mongod.log

storage:
  dbPath: /mnt/data/

lockdep reports:
[ 3437.452330] ======================================================
[ 3437.452750] WARNING: possible circular locking dependency detected
[ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G            E
[ 3437.453562] ------------------------------------------------------
[ 3437.453981] bcachefs/35533 is trying to acquire lock:
[ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190
[ 3437.454875]
               but task is already holding lock:
[ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.456009]
               which lock already depends on the new lock.

[ 3437.456553]
               the existing dependency chain (in reverse order) is:
[ 3437.457054]
               -> #3 (&type->s_umount_key#48){.+.+}-{3:3}:
[ 3437.457507]        down_read+0x3e/0x170
[ 3437.457772]        bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.458206]        __x64_sys_ioctl+0x93/0xd0
[ 3437.458498]        do_syscall_64+0x42/0xf0
[ 3437.458779]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.459155]
               -> #2 (&c->snapshot_create_lock){++++}-{3:3}:
[ 3437.459615]        down_read+0x3e/0x170
[ 3437.459878]        bch2_truncate+0x82/0x110 [bcachefs]
[ 3437.460276]        bchfs_truncate+0x254/0x3c0 [bcachefs]
[ 3437.460686]        notify_change+0x1f1/0x4a0
[ 3437.461283]        do_truncate+0x7f/0xd0
[ 3437.461555]        path_openat+0xa57/0xce0
[ 3437.461836]        do_filp_open+0xb4/0x160
[ 3437.462116]        do_sys_openat2+0x91/0xc0
[ 3437.462402]        __x64_sys_openat+0x53/0xa0
[ 3437.462701]        do_syscall_64+0x42/0xf0
[ 3437.462982]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.463359]
               -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}:
[ 3437.463843]        down_write+0x3b/0xc0
[ 3437.464223]        bch2_write_iter+0x5b/0xcc0 [bcachefs]
[ 3437.464493]        vfs_write+0x21b/0x4c0
[ 3437.464653]        ksys_write+0x69/0xf0
[ 3437.464839]        do_syscall_64+0x42/0xf0
[ 3437.465009]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.465231]
               -> #0 (sb_writers#10){.+.+}-{0:0}:
[ 3437.465471]        __lock_acquire+0x1455/0x21b0
[ 3437.465656]        lock_acquire+0xc6/0x2b0
[ 3437.465822]        mnt_want_write+0x46/0x1a0
[ 3437.465996]        filename_create+0x62/0x190
[ 3437.466175]        user_path_create+0x2d/0x50
[ 3437.466352]        bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.466617]        __x64_sys_ioctl+0x93/0xd0
[ 3437.466791]        do_syscall_64+0x42/0xf0
[ 3437.466957]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.467180]
               other info that might help us debug this:

[ 3437.467507] Chain exists of:
                 sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48

[ 3437.467979]  Possible unsafe locking scenario:

[ 3437.468223]        CPU0                    CPU1
[ 3437.468405]        ----                    ----
[ 3437.468585]   rlock(&type->s_umount_key#48);
[ 3437.468758]                                lock(&c->snapshot_create_lock);
[ 3437.469030]                                lock(&type->s_umount_key#48);
[ 3437.469291]   rlock(sb_writers#10);
[ 3437.469434]
                *** DEADLOCK ***

[ 3437.469670] 2 locks held by bcachefs/35533:
[ 3437.469838]  #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs]
[ 3437.470294]  #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.470744]
               stack backtrace:
[ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G            E      6.7.0-rc7-custom+ #85
[ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 3437.471694] Call Trace:
[ 3437.471795]  <TASK>
[ 3437.471884]  dump_stack_lvl+0x57/0x90
[ 3437.472035]  check_noncircular+0x132/0x150
[ 3437.472202]  __lock_acquire+0x1455/0x21b0
[ 3437.472369]  lock_acquire+0xc6/0x2b0
[ 3437.472518]  ? filename_create+0x62/0x190
[ 3437.472683]  ? lock_is_held_type+0x97/0x110
[ 3437.472856]  mnt_want_write+0x46/0x1a0
[ 3437.473025]  ? filename_create+0x62/0x190
[ 3437.473204]  filename_create+0x62/0x190
[ 3437.473380]  user_path_create+0x2d/0x50
[ 3437.473555]  bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.473819]  ? lock_acquire+0xc6/0x2b0
[ 3437.474002]  ? __fget_files+0x2a/0x190
[ 3437.474195]  ? __fget_files+0xbc/0x190
[ 3437.474380]  ? lock_release+0xc5/0x270
[ 3437.474567]  ? __x64_sys_ioctl+0x93/0xd0
[ 3437.474764]  ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs]
[ 3437.475090]  __x64_sys_ioctl+0x93/0xd0
[ 3437.475277]  do_syscall_64+0x42/0xf0
[ 3437.475454]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.475691] RIP: 0033:0x7f2743c313af
======================================================
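
The report above can be read as a lock-order graph in which the new
acquisition closes a cycle. The following is a minimal userspace sketch of
that reading, not lockdep's actual implementation: the enum names, the
held_before matrix and add_dependency() are hypothetical, and the edges are
my interpretation of entries #1 to #3 plus the attempted acquisition #0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the four lock classes named in the report. */
enum lock_class {
	SB_WRITERS,
	I_MUTEX,
	SNAPSHOT_CREATE_LOCK,
	S_UMOUNT,
	NR_CLASSES
};

/* held_before[a][b]: some task acquired b while already holding a. */
static bool held_before[NR_CLASSES][NR_CLASSES];

/* Depth-first search: can we get from "from" to "to" along recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	if (seen[from])
		return false;
	seen[from] = true;

	for (int next = 0; next < NR_CLASSES; next++)
		if (held_before[from][next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "wanted is acquired while held is held".
 * Returns false, without recording, if the edge would close a cycle.
 */
static bool add_dependency(enum lock_class held, enum lock_class wanted)
{
	bool seen[NR_CLASSES] = { false };

	assert(held < NR_CLASSES && wanted < NR_CLASSES);

	/* A path wanted -> ... -> held already exists: circular dependency. */
	if (reachable(wanted, held, seen))
		return false;

	held_before[held][wanted] = true;
	return true;
}
```

Replaying the chain in the order write (#1), truncate (#2), ioctl (#3)
records three edges without complaint; the fourth edge, s_umount held while
sb_writers is wanted (#0), is the one add_dependency() rejects.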

In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally
and unlock it at the end of the function. There is a comment,
"why do we need this lock?", on the lock, introduced by
commit 42d237320e ("bcachefs: Snapshot creation, deletion").
The reason is that __bch2_ioctl_subvolume_create() calls
sync_inodes_sb(), which requires s_umount to be held, to write back all
dirty inodes before doing the snapshot work.

Fix it by read locking s_umount for snapshotting only and unlocking
s_umount after sync_inodes_sb().
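
The narrowed lock scope can be sketched in userspace as follows. This is a
model under stated assumptions, not the patch itself: s_umount is reduced to
a reader counter, and every model_* function and variable is a hypothetical
stand-in for the kernel function named in its comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of sb->s_umount: counts read holds instead of really blocking. */
static int s_umount_readers;
/* How many times the writeback stand-in has run. */
static int nr_syncs;
/* Whether s_umount was still held when the path was created. */
static bool umount_held_at_create;

/* Stand-in for sync_inodes_sb(), which expects s_umount to be held. */
static void model_sync_inodes_sb(void)
{
	assert(s_umount_readers > 0);
	nr_syncs++;
}

/* Stand-in for user_path_create(), which takes sb_writers internally. */
static void model_user_path_create(void)
{
	umount_held_at_create = s_umount_readers > 0;
}

/* Stand-in for __bch2_ioctl_subvolume_create() with the narrowed scope. */
static void model_subvolume_create(bool snapshot)
{
	if (snapshot) {
		s_umount_readers++;	/* down_read(&sb->s_umount) */
		model_sync_inodes_sb();
		s_umount_readers--;	/* up_read(&sb->s_umount) */
	}

	/* s_umount is no longer held here, so sb_writers cannot nest in it. */
	model_user_path_create();
}
```

Calling model_subvolume_create(true) runs the sync under the lock and
reaches the path creation with the lock dropped; model_subvolume_create(false)
never touches s_umount at all.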

Signed-off-by: Su Yue <glass.su@suse.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21 13:27:10 -05:00
..
9p Bunch of small fixes: 2023-11-04 09:20:04 -10:00
adfs adfs: convert to new timestamp accessors 2023-10-18 13:26:18 +02:00
affs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
afs afs: Fix use-after-free due to get/remove race in volume tree 2023-12-21 10:16:07 -08:00
autofs autofs: add: new_inode check in autofs_fill_super() 2023-11-20 14:56:36 +01:00
bcachefs bcachefs: grab s_umount only if snapshotting 2024-01-21 13:27:10 -05:00
befs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
bfs bfs: convert to new timestamp accessors 2023-10-18 13:26:19 +02:00
btrfs for-6.7-rc5-tag 2023-12-17 09:27:36 -08:00
cachefiles - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
ceph Two items: 2023-11-10 09:52:56 -08:00
coda coda: convert to new timestamp accessors 2023-10-18 13:26:19 +02:00
configfs configfs: convert to new timestamp accessors 2023-10-18 13:26:19 +02:00
cramfs vfs-6.7.ctime 2023-10-30 09:47:13 -10:00
crypto This update includes the following changes: 2023-11-02 16:15:30 -10:00
debugfs debugfs: initialize cancellations earlier 2023-12-22 07:33:02 +01:00
devpts devpts: convert to new timestamp accessors 2023-10-18 13:26:20 +02:00
dlm dlm: slow down filling up processing queue 2023-10-12 15:21:00 -05:00
ecryptfs fs: Pass AT_GETATTR_NOSEC flag to getattr interface function 2023-11-18 14:54:07 +01:00
efivarfs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
efs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
erofs MAINTAINERS: erofs: add EROFS webpage 2023-11-17 19:55:46 +08:00
exfat exfat: fix ctime is not updated 2023-11-03 22:24:11 +09:00
exportfs fs: fix build error with CONFIG_EXPORTFS=m or not defined 2023-10-28 16:16:19 +02:00
ext2 ext2: Fix ki_pos update for DIO buffered-io fallback case 2023-11-22 10:17:10 +01:00
ext4 ext4: fix warning in ext4_dio_write_end_io() 2023-11-30 23:29:34 -05:00
f2fs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
fat vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
freevxfs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
fscache
fuse fuse: disable FOPEN_PARALLEL_DIRECT_WRITES with FUSE_DIRECT_IO_ALLOW_MMAP 2023-12-04 10:19:32 +01:00
gfs2 gfs2 fixes 2023-11-07 11:54:17 -08:00
hfs vfs-6.7.ctime 2023-10-30 09:47:13 -10:00
hfsplus vfs-6.7.ctime 2023-10-30 09:47:13 -10:00
hostfs hostfs: convert to new timestamp accessors 2023-10-18 14:08:22 +02:00
hpfs hpfs: convert to new timestamp accessors 2023-10-18 14:08:22 +02:00
hugetlbfs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
iomap Many singleton patches against the MM code. The patch series which are 2023-11-02 19:38:47 -10:00
isofs isofs: convert to new timestamp accessors 2023-10-18 14:08:22 +02:00
jbd2 jbd2: fix soft lockup in journal_finish_inode_data_buffers() 2023-12-12 10:25:46 -05:00
jffs2 vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
jfs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
kernfs Driver core changes for 6.7-rc1 2023-11-03 15:15:47 -10:00
lockd SUNRPC: change how svc threads are asked to exit. 2023-10-16 12:44:04 -04:00
minix minix: convert to new timestamp accessors 2023-10-18 14:08:23 +02:00
netfs netfs: Only call folio_start_fscache() one time for each folio 2023-09-18 12:03:46 -07:00
nfs NFS client updates for Linux 6.7 2023-11-08 13:39:16 -08:00
nfs_common
nfsd nfsd-6.7 fixes: 2023-12-20 11:16:50 -08:00
nilfs2 nilfs2: prevent WARNING in nilfs_sufile_set_segment_usage() 2023-12-06 16:12:50 -08:00
nls nls: Hide new NLS_UCS2_UTILS 2023-08-31 12:07:34 -05:00
notify vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
ntfs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
ntfs3 vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
ocfs2 As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
omfs omfs: convert to new timestamp accessors 2023-10-18 14:08:25 +02:00
openpromfs openpromfs: convert to new timestamp accessors 2023-10-18 14:08:25 +02:00
orangefs vfs-6.7.ctime 2023-10-30 09:47:13 -10:00
overlayfs overlayfs fixes for 6.7-rc7 2023-12-20 12:04:03 -08:00
proc mm/pagemap: fix wr-protect even if PM_SCAN_WP_MATCHING not set 2023-12-06 16:12:45 -08:00
pstore pstore updates for v6.7-rc1 2023-10-30 19:26:39 -10:00
qnx4 qnx4: convert to new timestamp accessors 2023-10-18 14:08:26 +02:00
qnx6 qnx6: convert to new timestamp accessors 2023-10-18 14:08:26 +02:00
quota Many singleton patches against the MM code. The patch series which are 2023-11-02 19:38:47 -10:00
ramfs ramfs: convert to new timestamp accessors 2023-10-18 14:08:26 +02:00
reiserfs Many singleton patches against the MM code. The patch series which are 2023-11-02 19:38:47 -10:00
romfs vfs-6.7.ctime 2023-10-30 09:47:13 -10:00
smb ksmbd server fix, also for stable 2023-12-28 16:12:23 -08:00
squashfs squashfs: squashfs_read_data need to check if the length is 0 2023-12-06 16:12:45 -08:00
sysfs kernfs: sysfs: support custom llseek method for sysfs entries 2023-10-05 13:42:11 +02:00
sysv sysv: convert to new timestamp accessors 2023-10-18 14:08:28 +02:00
tracefs eventfs: Fix file and directory uid and gid ownership 2023-12-22 08:13:55 -05:00
ubifs This pull request contains updates for UBI and UBIFS 2023-11-05 08:28:32 -10:00
udf \n 2023-11-02 08:19:51 -10:00
ufs fix ufs_get_locked_folio() breakage 2023-12-13 11:14:09 -05:00
unicode
vboxsf vboxsf: convert to new timestamp accessors 2023-10-18 14:08:29 +02:00
verity fsverity: skip PKCS#7 parser when keyring is empty 2023-08-20 10:33:43 -07:00
xfs Code changes for 6.7-rc2: 2023-11-25 08:57:09 -08:00
zonefs zonefs: convert to new timestamp accessors 2023-10-18 14:08:29 +02:00
aio.c aio: Annotate struct kioctx_table with __counted_by 2023-09-20 14:22:01 +02:00
anon_inodes.c treewide: mark stuff as __ro_after_init 2023-10-18 14:43:23 -07:00
attr.c fs: convert core infrastructure to new timestamp accessors 2023-10-18 13:26:15 +02:00
bad_inode.c fs: convert core infrastructure to new timestamp accessors 2023-10-18 13:26:15 +02:00
binfmt_elf_fdpic.c execve updates for v6.7-rc1 2023-10-30 19:28:19 -10:00
binfmt_elf_test.c
binfmt_elf.c binfmt_elf: Only report padzero() errors when PROT_WRITE 2023-10-03 19:48:44 -07:00
binfmt_flat.c
binfmt_misc.c execve updates for v6.7-rc1 2023-10-30 19:28:19 -10:00
binfmt_script.c
buffer.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
char_dev.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
compat_binfmt_elf.c
coredump.c v6.5/vfs.misc 2023-06-26 09:50:21 -07:00
d_path.c
dax.c mm: convert DAX lock/unlock page to lock/unlock folio 2023-10-04 10:32:20 -07:00
dcache.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
direct-io.c treewide: mark stuff as __ro_after_init 2023-10-18 14:43:23 -07:00
drop_caches.c fs: drop_caches: draining pages before dropping caches 2023-08-18 10:12:11 -07:00
eventfd.c eventfd: prevent underflow for eventfd semaphores 2023-07-11 11:41:34 +02:00
eventpoll.c treewide: mark stuff as __ro_after_init 2023-10-18 14:43:23 -07:00
exec.c mm/mremap: allow moves within the same VMA for stack moves 2023-10-04 10:32:20 -07:00
fcntl.c treewide: mark stuff as __ro_after_init 2023-10-18 14:43:23 -07:00
fhandle.c exportfs: add helpers to check if filesystem can encode/decode file handles 2023-10-24 17:57:45 +02:00
file_table.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
file.c file, i915: fix file reference for mmap_singleton() 2023-10-25 22:17:04 +02:00
filesystems.c
fs_context.c fs: factor out vfs_parse_monolithic_sep() helper 2023-10-12 18:53:36 +03:00
fs_parser.c
fs_pin.c
fs_struct.c kill do_each_thread() 2023-08-21 13:46:25 -07:00
fs_types.c
fs-writeback.c vfs-6.7.misc 2023-10-30 09:14:19 -10:00
fsopen.c fsconfig: ensure that dirfd is set to aux 2023-09-22 14:09:06 +02:00
init.c fs: add a new SB_I_NOUMASK flag 2023-10-19 11:02:47 +02:00
inode.c filemap: add a per-mapping stable writes flag 2023-11-20 15:05:18 +01:00
internal.h fs: store real path instead of fake path in backing file f_path 2023-10-19 11:03:15 +02:00
ioctl.c v6.6-vfs.super 2023-08-28 11:04:18 -07:00
Kconfig mm/hugetlb: have CONFIG_HUGETLB_PAGE select CONFIG_XARRAY_MULTI 2023-12-06 16:12:49 -08:00
Kconfig.binfmt riscv: support the elf-fdpic binfmt loader 2023-08-23 14:17:43 -07:00
kernel_read_file.c fs: Fix kernel-doc warnings 2023-08-19 12:12:12 +02:00
libfs.c libfs: getdents() should return 0 after reaching EOD 2023-11-20 15:34:22 +01:00
locks.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
Makefile bcachefs: Initial commit 2023-10-22 17:08:07 -04:00
mbcache.c mbcache: dynamically allocate the mbcache shrinker 2023-10-04 10:32:25 -07:00
mnt_idmapping.c fs: export mnt_idmap_get/mnt_idmap_put 2023-11-03 23:28:33 +01:00
mount.h
mpage.c buffer: remove folio_create_empty_buffers() 2023-10-25 16:47:10 -07:00
namei.c vfs-6.7.misc 2023-10-30 09:14:19 -10:00
namespace.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
nsfs.c fs: convert core infrastructure to new timestamp accessors 2023-10-18 13:26:15 +02:00
open.c cred: get rid of CONFIG_DEBUG_CREDENTIALS 2023-12-15 14:19:48 -08:00
pipe.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
pnode.c
pnode.h
posix_acl.c fs: convert to ctime accessor functions 2023-07-13 10:28:04 +02:00
proc_namespace.c
read_write.c fs: Fix one kernel-doc comment 2023-08-15 08:32:45 +02:00
readdir.c vfs: get rid of old '->iterate' directory operation 2023-08-06 15:08:35 +02:00
remap_range.c
select.c
seq_file.c
signalfd.c
splice.c - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
stack.c fs: convert core infrastructure to new timestamp accessors 2023-10-18 13:26:15 +02:00
stat.c fs: Pass AT_GETATTR_NOSEC flag to getattr interface function 2023-11-18 14:54:07 +01:00
statfs.c
super.c overlayfs update for 6.7-rc1 2023-11-07 11:46:31 -08:00
sync.c
sysctls.c
timerfd.c
userfaultfd.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
utimes.c
xattr.c xattr: make the xattr array itself const 2023-10-09 16:24:16 +02:00